Elasticsearch : Multi match query on nested fields - ruby-on-rails

I am having a problem with multi-match query in RoR. I have Elastic Search configured and working however I am working on setting up aggregations which so far seem to work, but for whatever reason I am not able to search on the field which I am aggregating. This is the extract from my model:
settings :index => { :number_of_shards => 1 } do
mapping do
indexes :id, index: :not_analyzed
indexes :name
indexes :summary
indexes :description
indexes :occasions, type: 'nested' do
indexes :id, type: 'integer'
indexes :occasion_name, type: 'string', index: :not_analyzed
...
end
end
end
def as_indexed_json(options = {})
self.as_json(only: [:id, :name, :summary, :description],
include: {
occasions: { only: [:id, :occasion_name] },
courses: { only: [:id, :course_name] },
allergens: { only: [:id, :allergen_name] },
cookingtechniques: { only: [:id, :name] },
cuisine: { only: [:id, :cuisine_name]}
})
end
class << self
def custom_search(query)
__elasticsearch__.search(query: multi_match_query(query), aggs: aggregations)
end
def multi_match_query(query)
{
multi_match:
{
query: query,
type: "best_fields",
fields: ["name^9", "summary^8", "cuisine_name^7", "description^6", "occasion_name^6", "course_name^6", "cookingtechniques.name^5"],
operator: "and"
}
}
end
I am able to search on all fields as specified in the multi_match_query apart of "occasion_name" which happens to be the field I am aggregating. I have checked that the field is correctly indexed (using elastic search-head plugin). I am also able to display the facets with the aggregated occasion_names in my view. I tried everything I can think of, including removing the aggregation and searching on occasion_name, but still no luck.
(I am using the elasticsearch-rails gem)
Any help will be much appreciated.
Edit:
I got this ES query from rails:
#search=
#<Elasticsearch::Model::Searching::SearchRequest:0x007f91244df460
#definition=
{:index=>"recipes",
:type=>"recipe",
:body=>
{:query=>
{:multi_match=>
{:query=>"Christmas",
:type=>"best_fields",
:fields=>["name^9", "summary^8", "cuisine_name^7", "description^6", "occasion_name^6", "course_name^6", "cookingtechniques.name^5"],
:operator=>"and"}},
:aggs=>
{:occasion_aggregation=>
{:nested=>{:path=>"occasions"}, :aggs=>{:id_and_name=>{:terms=>{:script=>"doc['occasions.id'].value + '|' + doc['occasions.occasion_name'].join(' ')", :size=>35}}}}}}},
This is an example of all that gets indexed for 1 of my dummy recipes I use for testing (the contents are meaningless - I use this only for testing):
{
"_index": "recipes",
"_type": "recipe",
"_id": "7",
"_version": 1,
"_score": 1,
"_source": {
"id": 7,
"name": "Mustard-stuffed chicken",
"summary": "This is so good we'd be surprised if this chicken fillet recipe doesn't become a firm favourite. Save it to your My Good Food collection and enjoy",
"description": "Heat oven to 200C/fan 180C/gas 6. Mix the cheeses and mustard together. Cut a slit into the side of each chicken breast, then stuff with the mustard mixture. Wrap each stuffed chicken breast with 2 bacon rashers – not too tightly, but enough to hold the chicken together. Season, place on a baking sheet and roast for 20-25 mins.",
"occasions": [
{
"id": 9,
"occasion_name": "Christmas"
}
,
{
"id": 7,
"occasion_name": "Halloween"
}
,
{
"id": 8,
"occasion_name": "Bonfire Night"
}
,
{
"id": 10,
"occasion_name": "New Year"
}
],
"courses": [
{
"id": 9,
"course_name": "Side Dish"
}
,
{
"id": 7,
"course_name": "Poultry"
}
,
{
"id": 8,
"course_name": "Salad"
}
,
{
"id": 10,
"course_name": "Soup"
}
],
"allergens": [
{
"id": 6,
"allergen_name": "Soya"
}
,
{
"id": 7,
"allergen_name": "Nut"
}
,
{
"id": 8,
"allergen_name": "Other"
}
,
{
"id": 1,
"allergen_name": "Dairy"
}
],
"cookingtechniques": [
{
"id": 15,
"name": "Browning"
}
],
"cuisine": {
"id": 1,
"cuisine_name": "African"
}
}
}
EDIT 2:
I managed to make the search work for occasions as suggested by #rahulroc, but now I can't search on anything else...
def multi_match_query(query)
{
nested:{
path: 'occasions',
query:{
multi_match:
{
query: query,
type: "best_fields",
fields: ["name^9", "summary^8", "cuisine_name^7", "description^6", "occasion_name^6", "course_name^6", "cookingtechniques.name^5"],
operator: "and"
}
}
}
}
end
UPDATE: Adding multiple nested fields - I am trying to add the rest of my aggregations but I am facing similar problem as before. My end goal will be to use the aggregations as filters so I need to add about 4 more nested fields to my query (I also would like to have the fields searchable) Here is the working query as provided by #rahulroc + the addition of another nested field which I can't search on. As before in terms of indexing everything is working and I can display the aggregations for the newly added field, but I can't search on it. I tried different variations of this query but I couldn't make it work (the rest of the fields are still working and searchable - the problem is just the new field):
def multi_match_query(query)
{
bool: {
should: [
{
nested:{
path: 'occasions',
query: {
multi_match:
{
query: query,
type: "best_fields",
fields: ["occasion_name"]
}
}
}
},
{
nested:{
path: 'courses',
query: {
multi_match:
{
query: query,
type: "best_fields",
fields: ["course_name"]
}
}
}
},
{
multi_match: {
query: query,
fields:["name^9", "summary^8", "cuisine_name^7", "description^6"],
}
}
]
}
}
end

You need to create a separate nested clause for matching a nested field
"query": {
"bool": {
"should": [
{
"nested": {
"path": "occassions",
"query": {
"multi_match": {
"query": "Christmas",
"fields": ["occassion_name^2"]
}
}
}
},
{
"multi_match": {
"query": "Christmas",
"fields":["name^9", "summary^8", "cuisine_name^7", "description^6","course_name^6"] }
}
]
}
}

Related

Elasticsearch Find Out does user stops or moving - Possible?

I want to use elasticsearch configuration about mapping to display user location and his/her direction to admin in my web app. so I create an index in elasticsearch like:
{
"settings": {
"index": {
"number_of_shards": 5,
"number_of_replicas": 1
},
"analysis": {
"analyzer": {
"analyzer-name": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
},
"mappings": {
"properties": {
"driver_id": { "type": "integer" },
"email": { "type": "text" },
"location": { "type": "geo_point" },
"app-platform": { "type": "text" },
"app-version": { "type": "text" },
"created_at": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"}
}
}
}
and start to inserting user location to elasticsearch with this curl
{
"driver_id": 357,
"driver_email": "Andrew#mailinatior.com",
"location": {
"lat": 37.3,
"lon": 59.52
},
"created_at": "2021-06-04 00:09:00"
}
this structure came from user mobile to my elasticsearch, after that I wrote these services to fetch data for my web-end part of my designing:
module Api
module V1
module Drivers
module Elastic
class LiveLocation
include Peafowl
attribute :driver_id, ::Integer
def call
#driver = ::Driver.find(driver_id) if driver_id.present?
result = []
options = {
headers: {
'Content-Type' => 'application/json'
},
body: #driver.present? ? options_with_driver : options
}
begin
response = HTTParty.get(elasticseach_url.to_s, options)
records = JSON.parse(response.body)['hits']['hits']
if records.present?
records.group_by { |r| r['_source']['driver_id'] }.to_a.each do |record|
driver = ::Driver.where(id: record[0]).first
if driver.present?
location = record[1][0]['_source']['location']
app_platform = record[1][0]['_source']['app-platform']
app_version = record[1][0]['_source']['app-version']
result.push(driver_id: driver.id, driver_email: driver.profile.email, location: location, app_platform: app_platform, app_version: app_version)
end
end
end
rescue StandardError => error
Rails.logger.info "Error => #{error}"
result = []
end
context[:response] = result
end
def elasticseach_url
"#{ENV.fetch('ELASTICSEARCH_BASE_URL', 'http://127.0.0.1:9200')}/#{ENV.fetch('ELASTICSEARCH_DRIVER_POSITION_INDEX', 'live_location')}/_search"
end
def options
{
query: {
bool: {
filter: [
{
range: {
created_at: {
gte: (Time.now.beginning_of_day.strftime '%Y-%m-%d %H:%M:%S')
}
}
}
]
}
},
sort: [
{
created_at: {
order: 'desc'
}
}
]
}.to_json
end
def optinos_with_driver
{
query: {
bool: {
must: [
{
term: {
driver_id: {
value: #driver.id
}
}
}
],
filter: [
{
range: {
created_at: {
gte: (Time.now.beginning_of_day.strftime '%Y-%m-%d %H:%M:%S')
}
}
}
]
}
},
sort: [
{
created_at: {
order: 'desc'
}
}
]
}.to_json
end
end
end
end
end
end
this structure working perfectly but even if the user stops while elasticsearch saves his location but I need to filter user data that if the user stops for one hour in place elasticsearch understand and not saving data. Is it possible?
I use elsticsearch 7.1
and ruby 2.5
I know it's possible in kibana but I could not using kibana at this tim.
I am not sure if this can be done via a single ES query...
However you can use 2 queries:
one to check if the user's location's during the last hour is the same
Second same then don't insert
But i don't recommend that
What you could do:
Use REDIS or any in-mem cache to maintain the user's last geo-location duration
Basis that, update or skip update to Elastic Search
PS: I am not familiar with ES geo-location API

How to setup date and fuzzy title search on elasticsearch

I am building an Rails 5 app with an Angular 7 frontent.
In this app I am using Searchkick (an Elasticsearch gem) and I have indexed a model called Event that got attributes title (string) and starts_at (datetime).
I want to be able to build a query in the search controller where I am able to do the following:
Search the title with a fuzzy search meaning it do not have to match 100% (which it now require).
Search with a date range matching starts_at for the indexed Events.
This is my controller index method
def index
args = {}
args[:eventable_id] = params[:id]
args[:eventable_type] = params[:type]
args[:title] = params[:title] if params[:title].present?
if params[:starts_at].present?
args[:starts_at] = {}
args[:starts_at][:gte] = params[:starts_at].to_date.beginning_of_day
args[:starts_at][:lte] = params[:ends_at].to_date.end_of_day
end
#events = Event.search where: args, page: params[:page], per_page: params[:per_page]
end
I have added this line to my Event model
searchkick text_middle: [:title]
This is the actual query that is run
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": [{
"term": {
"eventable_id": "2"
}
}, {
"term": {
"eventable_type": "Space"
}
}, {
"term": {
"title": "nice event"
}
}, {
"range": {
"starts_at": {
"from": "2020-02-01T00:00:00.000Z",
"include_lower": true,
"to": "2020-02-29T23:59:59.999Z",
"include_upper": true
}
}
}]
}
},
"timeout": "11s",
"_source": false,
"size": 10000
}
The date search does not work (but I get no errors) and the title search must match 100% (even the case).
Thankful for all help!
Rather than using Fuzzy queries, I would recommend an ngram analyzer.
Here is an example of an ngram analyzer:
analyzer: {
ngram_analyzer: {
type: "custom",
tokenizer: "standard",
filter: ["lowercase", "ngram_filter"],
char_filter: [
"replace_dots"
]
}
},
filter: {
ngram_filter: {
type: "ngram",
min_gram: "3",
max_gram: "20",
}
}
You will also have to add this code to your settings index:
max_ngram_diff: 17
Then on your mapping, make sure you create two fields. 1 mapping for your regular field such as name and then another mapping for your ngram field such as name.ngram.
In my query, I like to give my name field a boost of 10 and my name.ngram field a boost of 5 so that the exact matches will be rendered first. You will have to play with this though.
In regard to your range query, I am using gte and lte. Here is an example:
query:{
bool: {
must: {
range: {date: {gte: params[:date], lte: params[:date], boost: 10}}
}
}
}
I hope this helps.

Spell check Ngram for elastic Search not working with rails

I have used in my model to include spell check such that if the user inputs data like "Rentaal" then it should fetch the correct data as "Rental"
document.rb code
require 'elasticsearch/model'
class Document < ApplicationRecord
include Elasticsearch::Model
include Elasticsearch::Model::Callbacks
belongs_to :user
Document.import force: true
def self.search(query)
__elasticsearch__.search({
query: {
multi_match: {
query: query,
fields: ['name^10', 'service']
}
}
})
end
settings index: {
"number_of_shards": 1,
analysis: {
analyzer: {
edge_ngram_analyzer: { type: "custom", tokenizer: "standard", filter:
["lowercase", "edge_ngram_filter", "stop", "kstem" ] },
}
},
filter: {
edge_ngram_filter: { type: "edgeNGram", min_gram: "3", max_gram:
"20" }
}
} do
mapping do
indexes :name, type: "string", analyzer: "edge_ngram_analyzer"
indexes :service, type: "string", analyzer: "edge_ngram_analyzer"
end
end
end
search controller code:
def search
if params[:query].nil?
#documents = []
else
#documents = Document.search params[:query]
end
end
However, if I enter Rentaal or any misspelled word, it does not display anything.
In my console
#documents.results.to_a
gives an empty array.
What am I doing wrong here? Let me know if more data is required.
Try to add fuzziness in your multi_match query:
{
"query": {
"multi_match": {
"query": "Rentaal",
"fields": ["name^10", "service"],
"fuzziness": "AUTO"
}
}
}
Explanation
Kstem filter is used for reducing words to their root forms and it does not work as you expected here - it would handle corectly phrases like Renta or Rent, but not the misspelling you provided.
You can check how stemming works with following query:
curl -X POST \
'http://localhost:9200/my_index/_analyze?pretty=true' \
-d '{
"analyzer" : "edge_ngram_analyzer",
"text" : ["rentaal"]
}'
As a result I see:
{
"tokens": [
{
"token": "ren"
},
{
"token": "rent"
},
{
"token": "renta"
},
{
"token": "rentaa"
},
{
"token": "rentaal"
}
]
}
So typical misspelling will be handled much better with applying fuzziness.

Elasticsearch doesn't apply the NOT filter

I've been knocking my head against a wall with Elasticsearch today, trying to fix a failing test case.
I am using Rails 3.2.14, Ruby 1.9.3, the Tire gem and ElasticSearch 0.90.2
The objective is to have the query return matching results EXCLUDING the item where
vid == "ABC123xyz"
The Ruby code in the Video model looks like this:
def related_videos(count)
Video.search load: true do
size(count)
filter :term, :category_id => self.category_id
filter :term, :live => true
filter :term, :public => true
filter :not, {:term => {:vid => self.vid}}
query do
boolean do
should { text(:_all, self.title, boost: 2) }
should { text(:_all, self.description) }
should { terms(:tags, self.tag_list, minimum_match: 1) }
end
end
end
end
The resulting search query generated by Tire looks like this:
{
"query":{
"bool":{
"should":[
{
"text":{
"_all":{
"query":"Top Gun","boost":2
}
}
},
{
"text":{
"_all":{
"query":"The macho students of an elite US Flying school for advanced fighter pilots compete to be best in the class, and one romances the teacher."
}
}
},
{
"terms":{
"tags":["top-gun","80s"],
"minimum_match":1
}
}
]
}
},
"filter":{
"and":[
{
"term":{
"category_id":1
}
},
{
"term":{
"live":true
}
},
{
"term":{
"public":true
}
},
{
"not":{
"term":{
"vid":"ABC123xyz"
}
}
}
]
},
"size":10
}
The resulting JSON from ElasticSearch:
{
"took": 7,
"timed_out": false,
"_shards":{
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total":1,
"max_score":0.2667512,
"hits":[
{
"_index":"test_videos",
"_type":"video",
"_id":"8",
"_score":0.2667512,
"_source":{
"vid":"ABC123xyz",
"title":"Top Gun",
"description":"The macho students of an elite US Flying school for advanced fighter pilots compete to be best in the class, and one romances the teacher.",
"tags":["top-gun","80s"],
"category_id":1,
"live":true,
"public":true,
"featured":false,
"last_video_view_count":0,
"boost_factor":0.583013698630137,
"created_at":"2013-08-28T14:24:47Z"
}
}
]
}
}
Could somebody help! The docs for Elasticsearch are sparse around this topic and I'm running out of ideas.
Thanks
Using a top-level filter the way you are doesn't filter the results of your query - it just filters results out of things like facet counts. There's a fuller description in the elasticsearch documentation for filter.
You need to do a filtered query which is slightly different and filters the results of your query clauses:
Video.search load: true do
query do
filtered do
boolean do
should { text(:_all, self.title, boost: 2) }
should { text(:_all, self.description) }
should { terms(:tags, self.tag_list, minimum_match: 1) }
end
filter :term, :category_id => self.category_id
filter :term, :live => true
filter :term, :public => true
filter :not, {:term => {:vid => self.vid}}
end
end
end

EdgeNGram with Tire and ElasticSearch

If I have two strings:
Doe, Joe
Doe, Jonathan
I want to implement a search such that:
"Doe" > "Doe, Joe", "Doe, Jonathan"
"Doe J" > "Doe, Joe", "Doe, Jonathan"
"Jon Doe" > "Doe, Jonathan"
"Jona Do" > "Doe, Jonathan"
Here's the code that I have:
settings analysis: {
filter: {
nameNGram: {
type: "edgeNGram",
min_gram: 1,
max_gram: 20,
}
},
tokenizer: {
non_word: {
type: "pattern",
pattern: "[^\\w]+"
}
},
analyzer: {
name_analyzer: {
type: "custom",
tokenizer: "non_word",
filter: ["lowercase", "nameNGram"]
},
}
} do
mapping do
indexes :name, type: "multi_field", fields: {
analyzed: { type: "string", index: :analyzed, index_analyzer: "name_analyzer" }, # for indexing
unanalyzed: { type: "string", index: :not_analyzed, :include_in_all => false } # for sorting
}
end
end
def self.search(params)
tire.search(:page => params[:page], :per_page => 20) do
query do
string "name.analyzed:" + params[:query], default_operator: "AND"
end
sort do
by "name.unanalyzed", "asc"
end
end
end
Unfortunately, this doesn't appear to be working... The tokenizing looks great, for "Doe, Jonathan" I get something like "d", "do", "doe", "j", "jo", "jon", "jona" etc. but if I search for "do AND jo", I get back nothing. If I, however, search for "jona", I get back "Doe, Jonathan." What am I doing wrong?
You should likely only be using EdgeNGram if you want to create an autocomplete. I suspect that you want to use a pattern filter to separate words my commas.
Something like this:
"tokenizer": {
"comma_pattern_token": {
"type": "pattern",
"pattern": ",",
"group": -1
}
}
If I am mistaken and you need edgeNGrams for some other reason then your problem is that your index analyzer is ignoring stop words (such as the word AND) and your search analyzer is not. You need to create a custom analyzer for your search_analyzer that does not include the stop word filter.

Resources