prevent BooleanQuery$TooManyClauses error elasticsearch - elasticsearch-5

Below is my index setting. I'm using shingle filter for xyz type of index for field synonym.
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"asciifolding",
"lowercase"
]
},
"my_analyzer_shingle": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"asciifolding",
"lowercase",
"shingle"
]
}
}
}
}
},
abc: {
"abc": {
"properties": {
"value": {
"type": "string",
"search_analyzer": "my_analyzer_keyword",
"analyzer": "my_analyzer_keyword"
}
}
}
},
xyz: {
"xyz": {
"properties": {
"synonym": {
"type": "string",
"search_analyzer": "my_analyzer_shingle",
"analyzer": "my_analyzer_keyword"
}
}
}
}
I have input text in which no of words can be 30 or more. My requirement is to get all the synonym field of type xyz from this particular input text which im providing. So im using the following query but it is throwing BooleanQuery$TooManyClauses exception.
{
"query": {
"match": {
"synonym": {
"query": "abas asas asas qwqw ererer asas asas kjjkkj hhha asas nnn jhhha kkka nnna asas qwqw asas qwqw sdsd qwqw erer rtrtr fgfg asas nnn jhhha kkka nnna asas qwqw asas qwqw sdsd qwqw erer rtrtr fgfg "
}
}
}
}
Also i need to identify all one letter synonym as well as two letter synonym from this input text.
I have also tried increasing the indices.query.bool.max_clause_count 4096.
BUt still its throwing error.

For the input text given, it is exceeding too many clauses/terms which is more than max clause count 4096 setting provided while creating the index. solution is to break the input text into two or more queries and combining the result of these is working fine. 2 shingle is working fine with 13 letter input text with setting of max clause count as 4096.

Related

Exact search getting less precedence than phonetic search?

I have an elasticsearch index and am using the following query:
"_source": [
"title",
"content"
],
"size": 15,
"from": 0,
"query": {
"bool": {
"must": {
"multi_match": {
"query": "{{query}}",
"fields": [
"title",
"content"
],
"operator": "or"
}
},
"should": [
{
"multi_match": {
"query": "{{query}}",
"fields": [
"title.standard^16",
"content.standard^2"
],
"operator": "and"
}
},
{
"match_phrase": {
"content.standard": {
"query": "{{query}}",
"_name": "Phrase on title",
"boost": 1000
}
}
}
]
}
},
"highlight": {
"fields": {
"content": {}
},
"fragment_size": 100
}
}
Here is the mapping I set:
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"my_metaphone"
]
}
},
"filter": {
"my_metaphone": {
"type": "phonetic",
"encoder": "metaphone",
"replace": true
}
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"term_vector": "with_positions_offsets",
"analyzer": "my_analyzer",
"fields": {
"standard": {
"type": "text"
},
"stemmer": {
"type": "text",
"analyzer": "english"
}
}
},
"content": {
"type": "text",
"term_vector": "with_positions_offsets",
"analyzer": "my_analyzer",
"fields": {
"standard": {
"type": "text"
},
"stemmer": {
"type": "text",
"analyzer": "english"
}
}
}
}
}
}
Here is my logic with the query:
1) It will give the highest precedence to a phrase if it appears.
2) If not it will use the standard analyzer (that is the text, as is) and give it the highest precedence.
3) If all else doesn't match up, it will use the phonetic analyzer to get the results, that is the least precedence.
But obviously there is some fault to this as it seems to give higher precedence to the phonetic analyzer than the standard or phrase. For example, if I search for "Person of Indian Origin" it returns results on the top highlighting "Pursuant" "pursuing" and very, very less number of results with person of Indian origin although I know a large number of them exists. How do I solve this?

Elasticsearch exact match only for specific fields on multi_match fields

Im trying to search only on the following fields:
name (product name)
vendor.username
vendor.name
categories_name
But the results is to wide, I want the results to be exactly what user is typed.
Example:
I type Cloth A I want the result to be exactly Cloth A not something else contain Cloth or A
Here is my attempt:
```
GET /products/_search
{
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "cloth A",
"fields": [
"name",
"vendor.name",
"vendor.username",
"categories_name"
]
}
},
"filter": [
{
"term": {
"is_available": true
}
},
{
"term": {
"is_ready": true
}
},
{
"missing": {
"field": "deleted_at"
}
}
]
}
}
}
```
How do I do that? Thanks in advance
Put this in your multi_match
"multi_match": {
"type": "best_fields"
}
This one works:
"multi_match": {
"type": "phrase"
}

First letter match in elasticsearch aggregations

I'm migrating my elasticsearch from using facets to using aggregations, and I want to create a query where the aggregations represent all the creator names that begin with a certain letter.
I've created a nested index like so:
indexes creators, type: 'nested' do
indexes :name, type: 'string', analyzer: 'caseinsensitive', index: 'not_analyzed'
end
The following query will return all the items where a creator's name begins with a "b". Great working so far.
{
"query": {
"filtered": {
"query": {"match_all": {}},
"filter": {
"nested": {
"path": "creators",
"query": {
"prefix": {
"creators.name": {
"value": "b"
}
}
}
}
}
}
},
"aggregations": {
"creators": {
"nested": {
"path": "creators"
},
"aggs": {
"name": {
"terms": {
"field": "creators.name",
"size": 100
}
}
}
}
}
}
However, the aggregations part of the query returns ALL of the aggregations for the results, including instances creator names that do not begin with a "b." For instance, if I had an item with two creators:
"creators": [
{
"name": "Beyonce"
},
{
"name": "JayZ"
}
],
The aggregation results would include both JayZ and Beyonce. Like most people, I only want Beyonce.
Try this query and see how it goes:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "creators",
"query": {
"prefix": {
"creators.name": {
"value": "b"
}
}
}
}
}
}
},
"aggregations": {
"creators": {
"nested": {
"path": "creators"
},
"aggs": {
"NAME": {
"filter": {
"prefix": {
"creators.name": "b"
}
},
"aggs": {
"name": {
"terms": {
"field": "creators.name",
"size": 100
}
}
}
}
}
}
}
}

Elasticsearch DSL with multiple multi_match query in ruby

I have this scenario wherein there are two multi_match searches within the same query. The trouble is, when I create the JSON for it in ruby, a json with non-unique keys doesn't seem possible so only one of them appear.
Here is my query:
{
"fields": ["id", "title",
"address.city", "address.state", "address.country", "address.state_code", "address.country_code", "proxy_titles", "location"],
"size":2,
"query":{
"filtered":{
"filter": {
"range": {
"custom_score": {
"gte": 100
}
}
},
"query":{
"bool": {
"must": {
"multi_match":{
"query": "term 1",
"type": "cross_fields",
"fields": ["title^2", "proxy_titles^2","description"]
}
},
"must": {
"multi_match": {
"query": "us",
"fields": ["address.city", "address.country", "address.state",
"address.zone", "address.country_code", "address.state_code", "address.zone_code"]
}
}
}
}
}
},
"sort": {
"_score": { "order": "desc" },
"variation": {"order": "asc"},
"updated_at": { "order": "desc" }
}
}
I have also only recently started using elasticsearch so it be very helpful if you could suggest me a better query to accomplish the same as well.
You have the syntax wrong. For multiple "must" values in a "bool", they need to be in an array. The documentation is not always terribly helpful, unfortunately (the bool query page shows this for "should" but not "must").
Try this:
{
"fields": ["id","title","address.city","address.state","address.country","address.state_code","address.country_code","proxy_titles","location"],
"size": 2,
"query": {
"filtered": {
"filter": {
"range": {
"custom_score": {
"gte": 100
}
}
},
"query": {
"bool": [
{
"must": {
"multi_match": {
"query": "term 1",
"type": "cross_fields",
"fields": ["title^2","proxy_titles^2","description"]
}
}
},
{
"must": {
"multi_match": {
"query": "us",
"fields": ["address.city","address.country","address.state","address.zone","address.country_code","address.state_code","address.zone_code"]
}
}
}
]
}
}
},
"sort": {
"_score": {
"order": "desc"
},
"variation": {
"order": "asc"
},
"updated_at": {
"order": "desc"
}
}
}

ElasticSearch: Multi field "upgrade" yields error:

Okay, here is the task:
I've already read the whole documentation and I noticed that I can "upgrade" a data type like string to a multi field - in a test scenario it already worked.
My documents structure is currently:
{
"name": "test",
"words": [
{
"words": "hello world",
"verts": [
1,
2,
3
]
}
]
}
These documents were created using the default mappings - so no mapping has been set explicitly.
I am issuing a XDELETE command with data like:
{
"article": {
"properties": {
"words": {
"type": "multi_field",
"fields": {
"words": {
"type": "string",
"index": "analyzed"
},
"untouched": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
But I receive this error here:
{"error":"MergeMappingException[Merge failed with failures {[Can't
merge a non multi_field / non simple mapping [words] with a
multi_field mapping [words]]}]","status":400}
Can someone explain to me, why this happens? When I issue this mapping to a clean index, it works and the not_analyzed filter is being applied.
Thanks :)
Jan
Because the "words" field in your document has properties of its own ("words" and "verts"), you can't "upgrade" it to a multi_field. However, if you had a mapping like
{
"article": {
"properties": {
"words": {
"properties": {
"words": {
"type": "multi_field",
"fields": {
"words": {
"type": "string",
"index": "analyzed"
},
"untouched": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
then everything should work out.

Resources