Elasticsearch Facet List doesn't Match Results - ruby-on-rails

Problem
When I filter by a particular facet, that specific field's facets are correctly filtered in the result but the other facet fields remain the same. Best way to explain this is with the query and the response.
Query
{
query: {
match_all: {}
},
filter: {
and: [{
term: {
"address.state": "oregon"
}
}]
},
facets: {
"address.city": {
terms: {
field: "address.city"
},
facet_filter: {}
},
"address.state": {
terms: {
field: "address.state"
},
facet_filter: {
and: [{
term: {
"address.state": "oregon"
}
}]
}
},
"address.country": {
terms: {
field: "address.country"
},
facet_filter: {}
}
}
}
Result
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "races",
"_type": "race",
"_id": "6",
"_score": 1,
"_source": {
"id": 6,
"name": "Eugene Marathon",
"description": "...",
"created_at": "2015-05-24T19:41:45.043Z",
"updated_at": "2015-05-24T19:41:45.046Z",
"address": {
"race_id": 6,
"id": 7,
"line1": null,
"line2": null,
"city": "Eugene",
"state": "oregon",
"country": "united_states",
"zip": null,
"user_id": null,
"created_at": "2015-05-24T19:41:45.044Z",
"updated_at": "2015-05-24T19:41:45.044Z"
},
"race_years": []
}
}
]
},
"facets": {
"address.city": {
"_type": "terms",
"missing": 0,
"total": 7,
"other": 0,
"terms": [
{
"term": "long beach",
"count": 1
},
{
"term": "lincoln",
"count": 1
},
{
"term": "las vegas",
"count": 1
},
{
"term": "jackson",
"count": 1
},
{
"term": "eugene",
"count": 1
},
{
"term": "duluth",
"count": 1
},
{
"term": "denver",
"count": 1
}
]
},
"address.state": {
"_type": "terms",
"missing": 0,
"total": 1,
"other": 0,
"terms": [
{
"term": "oregon",
"count": 1
}
]
},
"address.country": {
"_type": "terms",
"missing": 0,
"total": 7,
"other": 0,
"terms": [
{
"term": "united_states",
"count": 7
}
]
}
}
}
So as you can see it returns all the address.city facets even though the only result is located in Eugene. It is also returning a count of 7 on the united_states. Why would it be returning all of these extra facets and with incorrect counts? My ruby mapping is found below.
Ruby Mapping
settings index: {
number_of_shards: 1,
analysis: {
analyzer: {
facet_analyzer: {
type: 'custom',
tokenizer: 'keyword',
filter: ['lowercase', 'trim']
}
}
}
} do
mapping do
indexes :name, type: 'string', analyzer: 'english', boost: 10
indexes :description, type: 'string', analyzer: 'english'
indexes :address do
indexes :city, type: 'string', analyzer: 'facet_analyzer'
indexes :state, type: 'string'
indexes :country, type: 'string'
end
end
end

This is the normal behavior of facets when run against a filter. From the official documentation:
There’s one important distinction to keep in mind. While search
queries restrict both the returned documents and facet counts, search
filters restrict only returned documents — but not facet counts.
In your case, your query matches all documents (i.e. match_all) so the facet counts are counted against all documents, too.
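The distinction quoted above can be illustrated with a small in-memory model. This is only a sketch of the scoping rule, not how Elasticsearch is implemented: the documents (apart from the Eugene/oregon pair), the `search` helper and the lambdas are hypothetical names made up for the illustration.

```ruby
# Hypothetical documents; only the eugene/oregon pair comes from the question.
docs = [
  { city: "eugene",    state: "oregon" },
  { city: "denver",    state: "colorado" },
  { city: "duluth",    state: "minnesota" },
  { city: "las vegas", state: "nevada" }
]

# Count how often each value of `field` occurs in `docs` (a terms facet).
def terms_facet(docs, field)
  docs.each_with_object(Hash.new(0)) { |doc, counts| counts[doc[field]] += 1 }
end

# Simplified model of a request: the query decides which documents the
# facets are counted over, the top-level filter only narrows the hits.
def search(docs, query:, filter: ->(_doc) { true })
  matched = docs.select(&query)
  { hits: matched.select(&filter), facets: terms_facet(matched, :city) }
end

match_all = ->(_doc) { true }
in_oregon = ->(doc) { doc[:state] == "oregon" }

filtered = search(docs, query: match_all, filter: in_oregon)
queried  = search(docs, query: in_oregon)

puts filtered[:hits].size     # one hit is returned ...
puts filtered[:facets].size   # ... but every city still has a bucket
puts queried[:facets].inspect # only the eugene bucket remains
```

With the condition in the filter, the hit list shrinks while the city buckets are still counted over everything the query matched. Moving the same condition into the query narrows both.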
Change your query to this and your facet counts will change (in this case you don't need the facet_filter anymore):
{
query: {
term: {
"address.state": "oregon"
}
},
facets: {
"address.city": {
terms: {
field: "address.city"
}
},
"address.state": {
terms: {
field: "address.state"
}
},
"address.country": {
terms: {
field: "address.country"
}
}
}
}
Another thing worth noting is that facets are deprecated and have been replaced by the much more powerful aggregations.
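As a rough sketch of what the aggregation version of the request body could look like, the snippet below builds one `terms` aggregation per field. It assumes an Elasticsearch version that supports `aggs`; the `terms_aggs` helper is a hypothetical name used only here.

```ruby
require "json"

# Build one terms aggregation per field, keyed by the field name.
def terms_aggs(fields)
  fields.each_with_object({}) do |field, aggs|
    aggs[field] = { terms: { field: field } }
  end
end

body = {
  query: { term: { "address.state" => "oregon" } },
  aggs: terms_aggs(%w[address.city address.state address.country])
}

puts JSON.pretty_generate(body)
```

Because the state condition lives in the query, the aggregation buckets are computed over the same documents that are returned as hits.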

Related

Rails Searchkick not returning results when I use a where statement

I run
Post.search("daniel")
I get 60+ results
Post.where(archive: true)
I get 60+ results
Post.search("daniel", where: { archive: true })
I get 0 results
Here is the full searchkick query.
{
"query": {
"bool": {
"must": {
"bool": {
"should": [
{
"dis_max": {
"queries": [
{
"multi_match": {
"query": "daniel",
"boost": 10,
"operator": "and",
"analyzer": "searchkick_search",
"fields": [
"*.analyzed"
],
"type": "best_fields"
}
},
{
"multi_match": {
"query": "daniel",
"boost": 10,
"operator": "and",
"analyzer": "searchkick_search2",
"fields": [
"*.analyzed"
],
"type": "best_fields"
}
},
{
"multi_match": {
"query": "daniel",
"boost": 1,
"operator": "and",
"analyzer": "searchkick_search",
"fuzziness": 1,
"prefix_length": 0,
"max_expansions": 3,
"fuzzy_transpositions": true,
"fields": [
"*.analyzed"
],
"type": "best_fields"
}
},
{
"multi_match": {
"query": "daniel",
"boost": 1,
"operator": "and",
"analyzer": "searchkick_search2",
"fuzziness": 1,
"prefix_length": 0,
"max_expansions": 3,
"fuzzy_transpositions": true,
"fields": [
"*.analyzed"
],
"type": "best_fields"
}
}
]
}
}
]
}
},
"filter": [
{
"term": {
"archive": {
"value": true
}
...
I looked at the searchkick gem doc and I am following exactly what they have listed to do. The normal search works fine and it only returns 0 posts when I add the where clause.
Without the where clause it shows all the posts that contain "daniel"; it only breaks when the where clause is added.
What am I doing wrong here? Is more information needed?
require 'elasticsearch/model'
class Post < ApplicationRecord
searchkick text_start: [:title]

elasticsearch with painless script to return extra fields

I am following this example https://www.compose.com/articles/how-to-script-painless-ly-in-elasticsearch/
where BOTH the ORIGINAL fields plus the calculated field (some_scores) are presented in the result document.
{
"_index": "sat",
"_type": "scores",
"_id": "AV3CYR8JFgEfgdUCQSON",
"_score": 1,
"_source": {
"cds": 1611760130062,
"rtype": "S",
"sname": "American High",
"dname": "Fremont Unified",
"cname": "Alameda",
"enroll12": 444,
"NumTstTakr": 298,
"AvgScrRead": 576,
"AvgScrMath": 610,
"AvgScrWrit": 576,
"NumGE1500": 229,
"PctGE1500": 76.85,
"year": 1516
},
"fields": {
"some_scores": [
1152
]
}
}
Now I am doing a _search with the following POST body:
{
"query": {
"match_all": {}
},
"script_fields": {
"some_scores": {
"script": {
"lang": "painless",
"inline": "\"hello\""
}
}
}
}
but the results I am getting don't contain the original fields; they only contain the test field, which I hardcoded to hello. Is there anything wrong with my query?
"hits": [
{
"_index": "abcIndex",
"_type": "data",
"_id": "id_00000025",
"_score": 1.0,
"fields": {
"some_scores": [
"hello"
]
}
}]
You need to explicitly pass "_source": ["*"] when using script fields.
I was not able to find the reason for this; it looks like some kind of optimization.
{
"_source": ["*"],
"query": {
"match_all": {}
},
"script_fields": {
"some_scores": {
"script": {
"lang": "painless",
"inline": "doc['authorization']+\"hello\""
}
}
}
}
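From Ruby, the same idea could be wrapped in a small helper that adds `_source` whenever `script_fields` is present. This is a minimal sketch: `with_full_source` is a hypothetical helper, and the body simply mirrors the question's request.

```ruby
require "json"

# Return a search body that also requests the full _source whenever
# script_fields is present. An explicit _source in `body` is kept as-is.
def with_full_source(body)
  return body unless body.key?(:script_fields)

  { _source: ["*"] }.merge(body)
end

body = {
  query: { match_all: {} },
  script_fields: {
    some_scores: { script: { lang: "painless", inline: '"hello"' } }
  }
}

puts JSON.pretty_generate(with_full_source(body))
```

A body without `script_fields` is returned unchanged, so the helper can be applied to every request.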

Function Score attribute to rank searches based on clicks not working with elastic search and rails

I have implemented function_score in my Document model, which has a clicks field that keeps track of the number of views per document. Now I want documents with more clicks to get higher priority and appear at the top of the search results.
My document.rb code
require 'elasticsearch/model'
def self.search(query)
__elasticsearch__.search(
{
query: {
function_score: {
query: {
multi_match: {
query: query,
fields: ['name', 'service'],
fuzziness: "AUTO"
}
},
field_value_factor: {
field: 'clicks',
modifier: 'log1p',
factor: 2
}
}
}
}
)
end
settings index: { "number_of_shards": 1,
analysis: {
analyzer: {
edge_ngram_analyzer: { type: "custom", tokenizer: "standard", filter:
["lowercase", "edge_ngram_filter", "stop", "kstem" ] },
}
},
filter: { ascii_folding: { type: 'asciifolding', preserve_original: true
},
edge_ngram_filter: { type: "edgeNGram", min_gram: "3", max_gram:
"20" }
}
} do
mapping do
indexes :name, type: "string", analyzer: "edge_ngram_analyzer",
term_vector: "with_positions"
indexes :service, type: "string", analyzer: "edge_ngram_analyzer",
term_vector: "with_positions"
end
end
end
Search View is here
<h1>Document Search</h1>
<%= form_for search_path, method: :get do |f| %>
<p>
<%= f.label "Search for" %>
<%= text_field_tag :query, params[:query] %>
<%= submit_tag "Go", name: nil %>
</p>
<% end %>
<% if @documents %>
<ul class="search_results">
<% @documents.each do |document| %>
<li>
<h3>
<%= link_to document.name, controller: "documents", action: "show",
id: document._id %>
</h3>
</li>
<% end %>
</ul>
<% else %>
<p>Your search did not match any documents.</p>
<% end %>
<br/>
When I search for Estamp, I get the results in the following order:
Franking and Estamp # clicks 5
Notary and Estamp #clicks 8
So even though Notary and Estamp has more clicks, it does not come to the top of the search. How can I achieve this?
This is what I get when I run it on the console.
POST _search
"hits": {
"total": 2,
"max_score": 1.322861,
"hits": [
{
"_index": "documents",
"_type": "document",
"_id": "13",
"_score": 1.322861,
"_source": {
"id": 13,
"name": "Franking and Estamp",
"service": "Estamp",
"user_id": 1,
"clicks": 7
},
{
"_index": "documents",
"_type": "document",
"_id": "14",
"_score": 0.29015404,
"_source": {
"id": 14,
"name": "Notary and Estamp",
"service": "Notary",
"user_id": 1,
"clicks": 12
}
}
]
Here the score of the documents is not getting updated based on the clicks.
Without seeing your indexed data it's not easy to answer, but looking at the query one thing comes to mind. I'll show it with a short example:
Example 1:
I've indexed the following documents:
{"name":"Franking and Estampy", "service" :"text", "clicks": 5}
{"name":"Notary and Estamp", "service" :"text", "clicks": 8}
Running the same query you provided gave this result:
"hits": {
"total": 2,
"max_score": 4.333119,
"hits": [
{
"_index": "script",
"_type": "test",
"_id": "AV2iwkems7jEvHyvnccV",
"_score": 4.333119,
"_source": {
"name": "Notary and Estamp",
"service": "text",
"clicks": 8
}
},
{
"_index": "script",
"_type": "test",
"_id": "AV2iwo6ds7jEvHyvnccW",
"_score": 3.6673431,
"_source": {
"name": "Franking and Estampy",
"service": "text",
"clicks": 5
}
}
]
}
So everything is fine - document with 8 clicks got higher scoring (_score field value) and the order is correct.
Example 2:
I noticed in your query that the name field is boosted with a high factor. So what would happen if I had the following data indexed?
{"name":"Franking and Estampy", "service" :"text", "clicks": 5}
{"name":"text", "service" :"Notary and Estamp", "clicks": 8}
And result:
"hits": {
"total": 2,
"max_score": 13.647502,
"hits": [
{
"_index": "script",
"_type": "test",
"_id": "AV2iwo6ds7jEvHyvnccW",
"_score": 13.647502,
"_source": {
"name": "Franking and Estampy",
"service": "text",
"clicks": 5
}
},
{
"_index": "script",
"_type": "test",
"_id": "AV2iwkems7jEvHyvnccV",
"_score": 1.5597181,
"_source": {
"name": "text",
"service": "Notary and Estamp",
"clicks": 8
}
}
]
}
Although Franking and Estampy has only 5 clicks, it scores much higher than the second document, which has more clicks.
So the point is that in your query, the number of clicks is not the only factor that has an impact on scoring and final order of documents. Without the real data it's only a guess from my side. You can run the query yourself with some REST client and check scoring/field/matching phrases.
Update
Based on your search result, you can see that the document with id=13 has the Estamp term in both fields (name and service). That is why this document got a higher score: in the scoring calculation, having the term in both fields counts for more than having a higher number of clicks. If you want the clicks field to have a bigger impact on the score, try experimenting with factor (it should probably be higher) and modifier ("modifier": "square" could work in your case). You can check possible values here.
Try for example this combination:
{
"query": {
"function_score": {
"query": {
... // same as before
},
"field_value_factor": {
"field": "clicks",
"modifier": "square",
"factor": 3
}
}
}
}
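To get a feel for how much the modifier changes the contribution of clicks, the arithmetic can be reproduced outside Elasticsearch. The sketch below is a simplified model: it assumes the documented behaviour that the field value is multiplied by factor before the modifier is applied, and that log1p is the base-10 logarithm of that value plus one. The `field_value_factor` Ruby method is a hypothetical helper, and only three modifiers are modelled.

```ruby
# Simplified model of field_value_factor: scale the field value by
# `factor`, then apply the modifier to the scaled value.
def field_value_factor(value, factor:, modifier:)
  scaled = factor * value
  case modifier
  when "none"   then scaled.to_f
  when "log1p"  then Math.log10(scaled + 1)
  when "square" then (scaled**2).to_f
  else raise ArgumentError, "modifier not modelled in this sketch: #{modifier}"
  end
end

# Click counts taken from the search result in the question.
[7, 12].each do |clicks|
  log1p  = field_value_factor(clicks, factor: 2, modifier: "log1p")
  square = field_value_factor(clicks, factor: 3, modifier: "square")
  puts format("clicks=%-2d log1p(factor 2)=%.3f square(factor 3)=%.0f", clicks, log1p, square)
end
```

With log1p the two documents end up with nearly the same multiplier (about 1.18 versus 1.40), so the text score decides the order. With square the gap is much wider (441 versus 1296). Whether that is enough to outweigh the text score still depends on the data.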
Update 2 - scoring based only on number of clicks
If the only parameter that should have an impact on scoring should be the value in clicks field, you can try to use "boost_mode": "replace" - in this case only function score is used, the query score is ignored. So the frequency of Estamp term in name and service fields will have no impact on the scoring. Try this query:
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "Estamp",
"fields": [ "name", "service"],
"fuzziness": "AUTO"
}
},
"field_value_factor": {
"field": "clicks",
"factor": 1
},
"boost_mode": "replace"
}
}
}
It gave me:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 5,
"hits": [
{
"_index": "script",
"_type": "test",
"_id": "AV2nI0HkJPYn0YKQxRvd",
"_score": 5,
"_source": {
"name": "Notary and Estamp",
"service": "Notary",
"clicks": 5
}
},
{
"_index": "script",
"_type": "test",
"_id": "AV2nIwKvJPYn0YKQxRvc",
"_score": 4,
"_source": {
"name": "Franking and Estamp",
"service": "Estamp",
"clicks": 4
}
}
]
}
}
This may be the one you are looking for (note the values "_score": 5 and "_score": 4 are matching the number of clicks).
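The effect of boost_mode can be sketched the same way. This is a simplified model covering only three of the available modes; `combine` is a hypothetical helper, and the query scores 1.2 and 0.3 are made-up values standing in for the text relevance of each document.

```ruby
# Simplified model of how function_score merges the query score with the
# function score. "multiply" is the default mode.
def combine(query_score, function_score, boost_mode: "multiply")
  case boost_mode
  when "multiply" then query_score * function_score
  when "sum"      then query_score + function_score
  when "replace"  then function_score
  else raise ArgumentError, "boost_mode not modelled in this sketch: #{boost_mode}"
  end
end

# name => [hypothetical query score, clicks]
docs = {
  "Franking and Estamp" => [1.2, 4],
  "Notary and Estamp"   => [0.3, 5]
}

%w[multiply replace].each do |mode|
  ranked = docs.sort_by { |_name, (score, clicks)| -combine(score, clicks, boost_mode: mode) }
  puts "#{mode}: #{ranked.map(&:first).join(' > ')}"
end
```

With multiply, the document with the stronger text match stays on top despite fewer clicks. With replace, the order follows clicks alone.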

Cypher results return certain structure

I'm trying to return a certain structure.
Here is my query:
MATCH (tracker:tracker { active: true }) OPTIONAL MATCH (tracker { active: true })--(timer:timer) RETURN { tracker:tracker, timers:COLLECT(timer) } as trackers
Here is what I am returning so far:
{
"results": [{
"columns": ["trackers"],
"data": [{
"row": [{
"tracker": {
"title": "a",
"id": "04e3fddc-5aef-4c3a-9aeb-62a9fb15bd75",
"active": true
},
"timers": []
}]
}]
}],
"errors": []
}
I would like the timers to be nested under the "tracker" with the tracker's properties, like this:
{
"results": [{
"columns": ["trackers"],
"data": [{
"row": [{
"tracker": {
"title": "a",
"id": "04e3fddc-5aef-4c3a-9aeb-62a9fb15bd75",
"active": true,
"timers": []
}
}]
}]
}],
"errors": []
}
Try this:
MATCH (tr:tracker {active: true})
OPTIONAL MATCH (tr)--(ti:timer)
WITH {
title: tr.title,
id: tr.id,
active: tr.active,
timers: COLLECT(ti)
} as trackers
RETURN trackers
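If changing the query is not an option, the same nesting could also be done on the client after the original query returns. The following is a minimal Ruby sketch that reshapes the response shown in the question; `nest_timers` is a hypothetical helper, not part of any Neo4j client.

```ruby
require "json"

# Move the sibling "timers" array inside the tracker's own properties.
def nest_timers(row)
  row.fetch("tracker").merge("timers" => row.fetch("timers"))
end

# Response copied from the question.
response = JSON.parse(<<~JSON)
  {
    "results": [{
      "columns": ["trackers"],
      "data": [{
        "row": [{
          "tracker": {
            "title": "a",
            "id": "04e3fddc-5aef-4c3a-9aeb-62a9fb15bd75",
            "active": true
          },
          "timers": []
        }]
      }]
    }],
    "errors": []
  }
JSON

trackers = response["results"].flat_map do |result|
  result["data"].flat_map { |entry| entry["row"].map { |row| nest_timers(row) } }
end

puts JSON.pretty_generate(trackers)
```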

Elastic Search- Searching Multiple Queries in Single Field

I'm new to Elasticsearch. I have a field named clearance in my users table and I'm trying to filter my results based on it.
match: {
clearance: {
query: 'None',
type: 'phrase'
}
}
When I run the above match query I get 3 results. What I'm trying to do is pass one more string along with None. For example, I want to find the users with clearance None or First Level.
I tried this:
multi_match: {
clearance: {
query: 'None OR First Level',
type: 'phrase'
}
}
But it ended in an error. Please help, and correct me if my question is wrong.
One way would be to make clearance a not_analyzed field in the mapping and use a terms filter.
Example:
PUT test
{
"mappings": {
"e1":{
"properties": {
"clearance":{
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Some test data:
PUT test/e1/1
{
"clearance":"None"
}
PUT test/e1/2
{
"clearance":"First Level"
}
PUT test/e1/3
{
"clearance":"Second Level"
}
Now query part:
GET test/e1/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"terms": {
"clearance": [
"None",
"First Level"
],
"execution": "or"
}
}
}
}
}
Result verification:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "e1",
"_id": "1",
"_score": 1,
"_source": {
"clearance": "None"
}
},
{
"_index": "test",
"_type": "e1",
"_id": "2",
"_score": 1,
"_source": {
"clearance": "First Level"
}
}
]
}
}
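From a Rails app, the same request body could be built from an array of clearance values. This is a sketch only: `clearance_search` is a hypothetical helper, and it reproduces the `filtered` query and `execution` option exactly as used above, which assumes the same older Elasticsearch version as this answer.

```ruby
require "json"

# Build the terms-filter request from the answer for any list of values.
# Values must match the stored strings exactly because the field is
# mapped as not_analyzed.
def clearance_search(values)
  {
    query: {
      filtered: {
        query: { match_all: {} },
        filter: { terms: { clearance: values, execution: "or" } }
      }
    }
  }
end

body = clearance_search(["None", "First Level"])
puts JSON.pretty_generate(body)
```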

Resources