Elasticsearch: Searching Multiple Queries in a Single Field - ruby-on-rails

I'm new to Elasticsearch. I have a field named clearance in my users table, and I'm trying to filter my results based on it.
match: {
clearance: {
query: 'None',
type: 'phrase'
}
}
When I run the match query above, I get 3 results. What I want is to pass one more string along with None. For example, I want to find the users whose clearance is either None or First Level.
I tried this:
multi_match: {
clearance: {
query: 'None OR First Level',
type: 'phrase'
}
}
But this ended in an error. Please help, and correct me if my question is wrong.

One way would be to map clearance as a not_analyzed field and use a terms filter.
Example:
PUT test
{
"mappings": {
"e1":{
"properties": {
"clearance":{
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Some test data:
PUT test/e1/1
{
"clearance":"None"
}
PUT test/e1/2
{
"clearance":"First Level"
}
PUT test/e1/3
{
"clearance":"Second Level"
}
Now query part:
GET test/e1/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"terms": {
"clearance": [
"None",
"First Level"
],
"execution": "or"
}
}
}
}
}
Result verification:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "e1",
"_id": "1",
"_score": 1,
"_source": {
"clearance": "None"
}
},
{
"_index": "test",
"_type": "e1",
"_id": "2",
"_score": 1,
"_source": {
"clearance": "First Level"
}
}
]
}
}
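Since the question is tagged ruby-on-rails, the same request body can be assembled as a plain Ruby hash before it is handed to whichever client is in use. This is only a sketch: the helper name clearance_terms_query is hypothetical (it is not part of any gem), it assumes the not_analyzed mapping shown above, and it builds the body without sending anything to Elasticsearch.

```ruby
require 'json'

# Hypothetical helper: builds the filtered/terms request body from the answer
# for any list of clearance values. It only constructs a hash.
def clearance_terms_query(values)
  {
    query: {
      filtered: {
        query: { match_all: {} },
        filter: {
          terms: {
            clearance: values,  # exact, un-analyzed values
            execution: 'or'     # copied from the answer above
          }
        }
      }
    }
  }
end

body = clearance_terms_query(['None', 'First Level'])
puts JSON.pretty_generate(body)
```

The terms filter compares exact terms, which is why the field has to be not_analyzed: with the default analyzer, "First Level" would be indexed as the separate tokens first and level.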

Related

elasticsearch with painless script to return extra fields

I am following this example https://www.compose.com/articles/how-to-script-painless-ly-in-elasticsearch/
where BOTH the ORIGINAL fields plus the calculated field (some_scores) are presented in the result document.
{
"_index": "sat",
"_type": "scores",
"_id": "AV3CYR8JFgEfgdUCQSON",
"_score": 1,
"_source": {
"cds": 1611760130062,
"rtype": "S",
"sname": "American High",
"dname": "Fremont Unified",
"cname": "Alameda",
"enroll12": 444,
"NumTstTakr": 298,
"AvgScrRead": 576,
"AvgScrMath": 610,
"AvgScrWrit": 576,
"NumGE1500": 229,
"PctGE1500": 76.85,
"year": 1516
},
"fields": {
"some_scores": [
1152
]
}
}
Now I am doing a _search with the following POST body:
{
"query": {
"match_all": {}
},
"script_fields": {
"some_scores": {
"script": {
"lang": "painless",
"inline": "\"hello\""
}
}
}
}
But the results I am getting don't contain the original fields; they contain only the test field, which I hardcoded to hello. Is there anything wrong with my query?
"hits": [
{
"_index": "abcIndex",
"_type": "data",
"_id": "id_00000025",
"_score": 1.0,
"fields": {
"some_scores": [
"hello"
]
}
}]
You need to explicitly pass "_source": ["*"] when using script fields.
I was not able to find the reason for this; it looks like some kind of optimization.
{
"_source": ["*"],
"query": {
"match_all": {}
},
"script_fields": {
"some_scores": {
"script": {
"lang": "painless",
"inline": "doc['authorization']+\"hello\""
}
}
}
}
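For callers building the request from Ruby, the fix can be captured in a small helper that always includes _source next to the scripted field. This is a sketch under assumptions: search_with_script_field is a hypothetical name, and the script is passed through as an opaque string in the same "inline" form used above.

```ruby
require 'json'

# Hypothetical helper: returns a search body that asks for every _source
# field in addition to one scripted field.
def search_with_script_field(field_name, painless_source)
  {
    _source: ['*'],  # without this, only the scripted field comes back
    query: { match_all: {} },
    script_fields: {
      field_name => {
        script: { lang: 'painless', inline: painless_source }
      }
    }
  }
end

body = search_with_script_field('some_scores', '"hello"')
puts JSON.pretty_generate(body)
```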

How to make Elasticsearch sort/prefer hits with exactly matching strings first

I'm using default analyzers and indexing. So let's say I have this simple mapping:
"question": {
"properties": {
"title": {
"type": "string"
},
"answer": {
"properties": {
"text": {
"type": "string"
}
}
}
}
}
(that was an example. sorry if it has typos)
Now, I perform the following search.
GET _search
{
"query": {
"query_string": {
"query": "yes correct",
"fields": ["answer.text"]
}
}
}
The results rank a text value like "yes correct." (doc id 1) above simply "yes correct" (without a period, doc id 181). Both hits have the same score value, but the hits array lists the one with the smaller doc id first. I understand that the default index option includes sorting by doc id, so how do I exclude that one attribute and still use the rest of the default options?
I'm not setting any custom analyzers, so everything is using default values for Elasticsearch 2.0.
This is probably a use case for the Dis Max Query:
A query that generates the union of documents produced by its
subqueries, and that scores each document with the maximum score for
that document as produced by any subquery, plus a tie breaking
increment for any additional matching subqueries.
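The quoted definition can be read as a small formula: take the best subquery score, then add tie_breaker times the scores of the other matching subqueries. The sketch below illustrates only that arithmetic. The helper name dis_max_score and the input scores are made up, and real Elasticsearch scores also involve boosts and query normalization, which are ignored here.

```ruby
# Hypothetical helper mirroring the quoted definition: the maximum subquery
# score, plus tie_breaker times the scores of the other matching subqueries.
def dis_max_score(subquery_scores, tie_breaker)
  return 0.0 if subquery_scores.empty?

  best = subquery_scores.max
  best + tie_breaker * (subquery_scores.sum - best)
end

# Made-up scores for a document matched by two subqueries.
puts dis_max_score([0.3, 0.1], 0.7)
# With tie_breaker 0.0 only the best subquery counts.
puts dis_max_score([0.3, 0.1], 0.0)
```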
Following that, you need to score an exact match on the answer separately and give it the highest boost. You'll need a custom analyzer for that. These would be your mappings:
PUT /test
{
"settings": {
"analysis": {
"analyzer": {
"my_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"asciifolding",
"lowercase"
]
}
}
}
},
"mappings": {
"question": {
"properties": {
"title": {
"type": "string"
},
"answer": {
"type": "object",
"properties": {
"text": {
"type": "string",
"analyzer": "my_keyword",
"fields": {
"stemmed": {
"type": "string",
"analyzer": "standard"
}
}
}
}
}
}
}
}
}
Your test data:
PUT /test/question/1
{
"title": "title nr1",
"answer": [
{
"text": "yes correct."
}
]
}
PUT /test/question/2
{
"title": "title nr2",
"answer": [
{
"text": "yes correct"
}
]
}
Now, when you query for "yes correct." with a query like this:
POST /test/_search
{
"query": {
"dis_max": {
"tie_breaker": 0.7,
"boost": 1.2,
"queries": [
{
"match": {
"answer.text": {
"query": "yes correct.",
"type": "phrase"
}
}
},
{
"match": {
"answer.text.stemmed": {
"query": "yes correct.",
"operator": "and"
}
}
}
]
}
}
}
You get this output:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.37919715,
"hits": [
{
"_index": "test",
"_type": "question",
"_id": "1",
"_score": 0.37919715,
"_source": {
"title": "title nr1",
"answer": [
{
"text": "yes correct."
}
]
}
},
{
"_index": "test",
"_type": "question",
"_id": "2",
"_score": 0.11261705,
"_source": {
"title": "title nr2",
"answer": [
{
"text": "yes correct"
}
]
}
}
]
}
}
If you run the very same query without the trailing dot, so that it becomes "yes correct", you get this result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.37919715,
"hits": [
{
"_index": "test",
"_type": "question",
"_id": "2",
"_score": 0.37919715,
"_source": {
"title": "title nr2",
"answer": [
{
"text": "yes correct"
}
]
}
},
{
"_index": "test",
"_type": "question",
"_id": "1",
"_score": 0.11261705,
"_source": {
"title": "title nr1",
"answer": [
{
"text": "yes correct."
}
]
}
}
]
}
}
Hopefully this is what you're looking for.
By the way, I'd recommend always using the match query when performing text search. Taken from the documentation:
Comparison to query_string / field The match family of queries
does not go through a "query parsing" process. It does not support
field name prefixes, wildcard characters, or other "advanced"
features. For this reason, chances of it failing are very small / non
existent, and it provides an excellent behavior when it comes to just
analyze and run that text as a query behavior (which is usually what a
text search box does). Also, the phrase_prefix type can provide a
great "as you type" behavior to automatically load search results.
Elasticsearch, or rather Lucene, scoring does not take the relative positioning of the tokens into account. It uses three criteria to compute the score:
Term frequency - How often the search term appears in the document.
Inverse document frequency - How often the search term appears across the entire index. The more often it occurs, the more common the term is and the less weight it carries in the search.
Field length normalization - The number of tokens present in the target field.
You can learn more about it here.
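The three criteria can be sketched with the formulas documented for Lucene's classic TF/IDF similarity, which is the default in the Elasticsearch 2.0 setup the question mentions (later versions default to BM25). The function names are illustrative, and real scores also include boosts, query normalization, and a lossy encoding of the field norm, so treat this as an approximation rather than a way to reproduce exact _score values.

```ruby
# Term frequency: grows with the square root of the occurrence count.
def tf(term_count)
  Math.sqrt(term_count)
end

# Inverse document frequency: terms found in many documents weigh less.
def idf(total_docs, docs_with_term)
  1.0 + Math.log(total_docs.to_f / (docs_with_term + 1))
end

# Field-length norm: a match in a short field weighs more than in a long one.
def field_norm(tokens_in_field)
  1.0 / Math.sqrt(tokens_in_field)
end

# A rare term versus a common one in a hypothetical 100-document index.
puts idf(100, 1)
puts idf(100, 50)
# "yes correct" has two tokens, so its norm is larger than a ten-token field's.
puts field_norm(2)
puts field_norm(10)
```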

First letter match in elasticsearch aggregations

I'm migrating my elasticsearch from using facets to using aggregations, and I want to create a query where the aggregations represent all the creator names that begin with a certain letter.
I've created a nested index like so:
indexes creators, type: 'nested' do
indexes :name, type: 'string', analyzer: 'caseinsensitive', index: 'not_analyzed'
end
The following query returns all the items where a creator's name begins with "b". This works well so far.
{
"query": {
"filtered": {
"query": {"match_all": {}},
"filter": {
"nested": {
"path": "creators",
"query": {
"prefix": {
"creators.name": {
"value": "b"
}
}
}
}
}
}
},
"aggregations": {
"creators": {
"nested": {
"path": "creators"
},
"aggs": {
"name": {
"terms": {
"field": "creators.name",
"size": 100
}
}
}
}
}
}
However, the aggregations part of the query returns buckets for ALL creator names in the results, including names that do not begin with "b". For instance, if I had an item with two creators:
"creators": [
{
"name": "Beyonce"
},
{
"name": "JayZ"
}
],
The aggregation results would include both JayZ and Beyonce. Like most people, I only want Beyonce.
Try this query and see how it goes:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "creators",
"query": {
"prefix": {
"creators.name": {
"value": "b"
}
}
}
}
}
}
},
"aggregations": {
"creators": {
"nested": {
"path": "creators"
},
"aggs": {
"NAME": {
"filter": {
"prefix": {
"creators.name": "b"
}
},
"aggs": {
"name": {
"terms": {
"field": "creators.name",
"size": 100
}
}
}
}
}
}
}
}
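The key point of the query above is that the same prefix condition appears twice: once in the nested filter, to select items, and once in a filter aggregation inside the nested aggregation, to restrict which creator names are bucketed. A Ruby sketch that builds both from one letter follows. The helper name creators_by_prefix and the aggregation name matching are hypothetical, and the short form of the prefix clause is used in both places.

```ruby
require 'json'

# Hypothetical helper: builds the request body for a given first letter.
def creators_by_prefix(prefix, size = 100)
  prefix_clause = { prefix: { 'creators.name' => prefix } }
  {
    query: {
      filtered: {
        query: { match_all: {} },
        # Selects items that have at least one matching creator.
        filter: { nested: { path: 'creators', query: prefix_clause } }
      }
    },
    aggregations: {
      creators: {
        nested: { path: 'creators' },
        aggs: {
          matching: {
            # Keeps only the matching creators before bucketing by name.
            filter: prefix_clause,
            aggs: { name: { terms: { field: 'creators.name', size: size } } }
          }
        }
      }
    }
  }
end

body = creators_by_prefix('b')
puts JSON.pretty_generate(body)
```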

Elasticsearch DSL with multiple multi_match query in ruby

I have a scenario in which there are two multi_match searches within the same query. The trouble is that when I create the JSON for it in Ruby, a JSON object with non-unique keys doesn't seem possible, so only one of them appears.
Here is my query:
{
"fields": ["id", "title",
"address.city", "address.state", "address.country", "address.state_code", "address.country_code", "proxy_titles", "location"],
"size":2,
"query":{
"filtered":{
"filter": {
"range": {
"custom_score": {
"gte": 100
}
}
},
"query":{
"bool": {
"must": {
"multi_match":{
"query": "term 1",
"type": "cross_fields",
"fields": ["title^2", "proxy_titles^2","description"]
}
},
"must": {
"multi_match": {
"query": "us",
"fields": ["address.city", "address.country", "address.state",
"address.zone", "address.country_code", "address.state_code", "address.zone_code"]
}
}
}
}
}
},
"sort": {
"_score": { "order": "desc" },
"variation": {"order": "asc"},
"updated_at": { "order": "desc" }
}
}
I have only recently started using Elasticsearch, so it would be very helpful if you could also suggest a better query to accomplish the same thing.
You have the syntax wrong. For multiple "must" values in a "bool", they need to be in an array. The documentation is not always terribly helpful, unfortunately (the bool query page shows this for "should" but not "must").
Try this:
{
"fields": ["id","title","address.city","address.state","address.country","address.state_code","address.country_code","proxy_titles","location"],
"size": 2,
"query": {
"filtered": {
"filter": {
"range": {
"custom_score": {
"gte": 100
}
}
},
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "term 1",
"type": "cross_fields",
"fields": ["title^2","proxy_titles^2","description"]
}
},
{
"multi_match": {
"query": "us",
"fields": ["address.city","address.country","address.state","address.zone","address.country_code","address.state_code","address.zone_code"]
}
}
]
}
}
}
},
"sort": {
"_score": {
"order": "desc"
},
"variation": {
"order": "asc"
},
"updated_at": {
"order": "desc"
}
}
}
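On the Ruby side, the problem from the question goes away once must holds an array, because a Ruby hash can hold a given key only once, while an array can hold any number of clauses. A minimal sketch of just the bool part follows. The variable names are illustrative, and the second field list is shortened for readability.

```ruby
require 'json'

# Each clause is its own hash, so no key is ever repeated.
title_clause = {
  multi_match: {
    query: 'term 1',
    type: 'cross_fields',
    fields: ['title^2', 'proxy_titles^2', 'description']
  }
}
address_clause = {
  multi_match: {
    query: 'us',
    fields: ['address.city', 'address.country', 'address.state']
  }
}

# "must" is a single key whose value is an array of clauses.
bool_query = { bool: { must: [title_clause, address_clause] } }
puts JSON.pretty_generate(bool_query)
```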

Elasticsearch Facet List doesn't Match Results

Problem
When I filter by a particular facet, that specific field's facets are correctly filtered in the result but the other facet fields remain the same. Best way to explain this is with the query and the response.
Query
{
query: {
match_all: {}
},
filter: {
and: [{
term: {
"address.state": "oregon"
}
}]
},
facets: {
"address.city": {
terms: {
field: "address.city"
},
facet_filter: {}
},
"address.state": {
terms: {
field: "address.state"
},
facet_filter: {
and: [{
term: {
"address.state": "oregon"
}
}]
}
},
"address.country": {
terms: {
field: "address.country"
},
facet_filter: {}
}
}
}
Result
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "races",
"_type": "race",
"_id": "6",
"_score": 1,
"_source": {
"id": 6,
"name": "Eugene Marathon",
"description": "...",
"created_at": "2015-05-24T19:41:45.043Z",
"updated_at": "2015-05-24T19:41:45.046Z",
"address": {
"race_id": 6,
"id": 7,
"line1": null,
"line2": null,
"city": "Eugene",
"state": "oregon",
"country": "united_states",
"zip": null,
"user_id": null,
"created_at": "2015-05-24T19:41:45.044Z",
"updated_at": "2015-05-24T19:41:45.044Z"
},
"race_years": []
}
}
]
},
"facets": {
"address.city": {
"_type": "terms",
"missing": 0,
"total": 7,
"other": 0,
"terms": [
{
"term": "long beach",
"count": 1
},
{
"term": "lincoln",
"count": 1
},
{
"term": "las vegas",
"count": 1
},
{
"term": "jackson",
"count": 1
},
{
"term": "eugene",
"count": 1
},
{
"term": "duluth",
"count": 1
},
{
"term": "denver",
"count": 1
}
]
},
"address.state": {
"_type": "terms",
"missing": 0,
"total": 1,
"other": 0,
"terms": [
{
"term": "oregon",
"count": 1
}
]
},
"address.country": {
"_type": "terms",
"missing": 0,
"total": 7,
"other": 0,
"terms": [
{
"term": "united_states",
"count": 7
}
]
}
}
}
As you can see, it returns all the address.city facets even though the only result is located in Eugene. It also returns a count of 7 for united_states. Why would it return all of these extra facets, and with incorrect counts? My Ruby mapping is below.
Ruby Mapping
settings index: {
number_of_shards: 1,
analysis: {
analyzer: {
facet_analyzer: {
type: 'custom',
tokenizer: 'keyword',
filter: ['lowercase', 'trim']
}
}
}
} do
mapping do
indexes :name, type: 'string', analyzer: 'english', boost: 10
indexes :description, type: 'string', analyzer: 'english'
indexes :address do
indexes :city, type: 'string', analyzer: 'facet_analyzer'
indexes :state, type: 'string'
indexes :country, type: 'string'
end
end
end
This is the normal behavior of facets when run against a filter. From the official documentation:
There’s one important distinction to keep in mind. While search
queries restrict both the returned documents and facet counts, search
filters restrict only returned documents — but not facet counts.
In your case, your query matches all documents (i.e. match_all) so the facet counts are counted against all documents, too.
Change your query to this and your facet counts will change (in this case you don't need the facet_filter anymore):
{
query: {
term: {
"address.state": "oregon"
}
},
facets: {
"address.city": {
terms: {
field: "address.city"
}
},
"address.state": {
terms: {
field: "address.state"
}
},
"address.country": {
terms: {
field: "address.country"
}
}
}
}
Another thing worth noting is that facets are deprecated and have been replaced by the much more powerful aggregations.
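As a rough illustration of that migration, the corrected request can be written with terms aggregations in place of terms facets. This is a sketch: state_search_with_aggs is a hypothetical helper, and it keeps the restriction in the query, as in the corrected request above, so that the counts are computed over the matching documents only.

```ruby
require 'json'

# Hypothetical helper: a term query on address.state plus one terms
# aggregation per field, replacing the terms facets used above.
def state_search_with_aggs(state, fields)
  aggs = fields.each_with_object({}) do |field, acc|
    acc[field] = { terms: { field: field } }
  end
  { query: { term: { 'address.state' => state } }, aggs: aggs }
end

body = state_search_with_aggs(
  'oregon',
  ['address.city', 'address.state', 'address.country']
)
puts JSON.pretty_generate(body)
```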
