How to set up date and fuzzy title search on Elasticsearch - ruby-on-rails

I am building a Rails 5 app with an Angular 7 frontend.
In this app I am using Searchkick (an Elasticsearch gem), and I have indexed a model called Event that has the attributes title (string) and starts_at (datetime).
I want to build a query in the search controller that can do the following:
Search the title with a fuzzy search, meaning it does not have to match 100% (which it currently requires).
Search with a date range on starts_at for the indexed Events.
This is my controller's index method:
def index
  args = {}
  args[:eventable_id] = params[:id]
  args[:eventable_type] = params[:type]
  args[:title] = params[:title] if params[:title].present?
  if params[:starts_at].present?
    args[:starts_at] = {}
    args[:starts_at][:gte] = params[:starts_at].to_date.beginning_of_day
    args[:starts_at][:lte] = params[:ends_at].to_date.end_of_day
  end
  @events = Event.search where: args, page: params[:page], per_page: params[:per_page]
end
I have added this line to my Event model
searchkick text_middle: [:title]
This is the actual query that is run
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": [{
"term": {
"eventable_id": "2"
}
}, {
"term": {
"eventable_type": "Space"
}
}, {
"term": {
"title": "nice event"
}
}, {
"range": {
"starts_at": {
"from": "2020-02-01T00:00:00.000Z",
"include_lower": true,
"to": "2020-02-29T23:59:59.999Z",
"include_upper": true
}
}
}]
}
},
"timeout": "11s",
"_source": false,
"size": 10000
}
The date search does not work (though I get no errors), and the title search must match 100% (even the case).
Thanks for any help!

Rather than using Fuzzy queries, I would recommend an ngram analyzer.
Here is an example of an ngram analyzer:
analyzer: {
  ngram_analyzer: {
    type: "custom",
    tokenizer: "standard",
    filter: ["lowercase", "ngram_filter"],
    char_filter: [
      "replace_dots"
    ]
  }
},
filter: {
  ngram_filter: {
    type: "ngram",
    min_gram: "3",
    max_gram: "20",
  }
}
You will also have to add this to your index settings:
max_ngram_diff: 17
Then in your mapping, make sure you create two fields: one mapping for your regular field, such as name, and another for your ngram field, such as name.ngram.
In my query, I like to give my name field a boost of 10 and my name.ngram field a boost of 5 so that exact matches are ranked first. You will have to experiment with this, though.
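A minimal sketch of those settings and the two-field mapping, written as a Ruby hash in the Elasticsearch 7.x create-index shape (an assumption; adjust it for your client or gem). It leaves out the `replace_dots` char filter because the answer does not show its definition, and the constant and method names are hypothetical.

```ruby
# Sketch (assumed Elasticsearch 7.x request-body shape) of the settings and
# mapping described above: `name` for exact matches plus a `name.ngram`
# sub-field analyzed with the ngram analyzer.
MIN_GRAM = 3
MAX_GRAM = 20

NGRAM_INDEX_BODY = {
  settings: {
    # ES 7 rejects ngram filters whose max_gram - min_gram exceeds this value.
    index: { max_ngram_diff: MAX_GRAM - MIN_GRAM },
    analysis: {
      filter: {
        ngram_filter: { type: "ngram", min_gram: MIN_GRAM, max_gram: MAX_GRAM }
      },
      analyzer: {
        ngram_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "ngram_filter"]
        }
      }
    }
  },
  mappings: {
    properties: {
      name: {
        type: "text",
        fields: {
          ngram: { type: "text", analyzer: "ngram_analyzer" }
        }
      }
    }
  }
}.freeze

# Boost matches on `name` above partial matches on `name.ngram`.
def boosted_name_query(text)
  { query: { multi_match: { query: text, fields: ["name^10", "name.ngram^5"] } } }
end
```

With the elasticsearch-ruby client, a body like this could be passed to `client.indices.create(index: "events", body: NGRAM_INDEX_BODY)`; the index name is only illustrative.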
For your range query, I am using gte and lte. Here is an example:
query: {
  bool: {
    must: {
      range: { date: { gte: params[:date], lte: params[:date], boost: 10 } }
    }
  }
}
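For the Searchkick setup in the question, one possible sketch keeps the title out of `where` (where it becomes an exact `term` filter, as the generated query shows) and passes it as the full-text query instead, using Searchkick's `fields:` and `match: :text_middle` search options together with the model's existing `searchkick text_middle: [:title]`. The helper name `build_event_search`, the UTC day boundaries, and falling back to the starts_at day when ends_at is missing are assumptions, not part of the answer. Using `lt` on the start of the following day avoids having to express the end of the day as 23:59:59.999.

```ruby
require "date"

# Hypothetical helper: turns controller params into arguments for
# Event.search(query, **options). Only the title is used as the full-text
# query; the remaining params stay in `where` as exact filters.
def build_event_search(params)
  where = { eventable_id: params[:id], eventable_type: params[:type] }

  starts = params[:starts_at].to_s.strip
  unless starts.empty?
    first_day = Date.parse(starts)
    ends = params[:ends_at].to_s.strip
    # Assumption: a missing ends_at means "just the starts_at day".
    last_day = ends.empty? ? first_day : Date.parse(ends)
    day_after = last_day + 1
    # Half-open UTC range [first_day 00:00, day_after 00:00).
    where[:starts_at] = {
      gte: Time.utc(first_day.year, first_day.month, first_day.day),
      lt: Time.utc(day_after.year, day_after.month, day_after.day)
    }
  end

  title = params[:title].to_s.strip
  options = {
    fields: [:title],
    match: :text_middle, # needs `searchkick text_middle: [:title]` in the model
    where: where,
    page: params[:page],
    per_page: params[:per_page]
  }
  # "*" asks Searchkick for all documents when no title was given.
  [title.empty? ? "*" : title, options]
end
```

In the controller this could be used as `query, options = build_event_search(params)` followed by `@events = Event.search(query, **options)`.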
I hope this helps.

Related

Elasticsearch: find out whether a user has stopped or is moving - possible?

I want to use Elasticsearch mappings to display each user's location and direction to the admin in my web app, so I created an index in Elasticsearch like this:
{
"settings": {
"index": {
"number_of_shards": 5,
"number_of_replicas": 1
},
"analysis": {
"analyzer": {
"analyzer-name": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
},
"mappings": {
"properties": {
"driver_id": { "type": "integer" },
"email": { "type": "text" },
"location": { "type": "geo_point" },
"app-platform": { "type": "text" },
"app-version": { "type": "text" },
"created_at": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"}
}
}
}
and started inserting user locations into Elasticsearch with this curl request:
{
  "driver_id": 357,
  "driver_email": "Andrew@mailinatior.com",
  "location": {
    "lat": 37.3,
    "lon": 59.52
  },
  "created_at": "2021-06-04 00:09:00"
}
This data comes from the user's mobile app into my Elasticsearch index. After that, I wrote this service to fetch the data for the web side of my app:
module Api
  module V1
    module Drivers
      module Elastic
        class LiveLocation
          include Peafowl
          attribute :driver_id, ::Integer
          def call
            @driver = ::Driver.find(driver_id) if driver_id.present?
            result = []
            # Named request_options (not options) so the `options` call below reaches
            # the #options method; inside `options = {...}` a bare `options` would
            # refer to the still-nil local variable.
            request_options = {
              headers: {
                'Content-Type' => 'application/json'
              },
              body: @driver.present? ? options_with_driver : options
            }
            begin
              response = HTTParty.get(elasticsearch_url.to_s, request_options)
              records = JSON.parse(response.body)['hits']['hits']
              if records.present?
                # Hits are sorted by created_at desc, so the first hit per driver is the latest.
                records.group_by { |r| r['_source']['driver_id'] }.each do |id, hits|
                  driver = ::Driver.find_by(id: id)
                  next unless driver.present?
                  latest = hits.first['_source']
                  result.push(driver_id: driver.id,
                              driver_email: driver.profile.email,
                              location: latest['location'],
                              app_platform: latest['app-platform'],
                              app_version: latest['app-version'])
                end
              end
            rescue StandardError => error
              Rails.logger.info "Error => #{error}"
              result = []
            end
            context[:response] = result
          end
          def elasticsearch_url
            "#{ENV.fetch('ELASTICSEARCH_BASE_URL', 'http://127.0.0.1:9200')}/#{ENV.fetch('ELASTICSEARCH_DRIVER_POSITION_INDEX', 'live_location')}/_search"
          end
          def options
            {
              query: {
                bool: {
                  filter: [
                    {
                      range: {
                        created_at: {
                          gte: (Time.now.beginning_of_day.strftime '%Y-%m-%d %H:%M:%S')
                        }
                      }
                    }
                  ]
                }
              },
              sort: [
                {
                  created_at: {
                    order: 'desc'
                  }
                }
              ]
            }.to_json
          end
          def options_with_driver
            {
              query: {
                bool: {
                  must: [
                    {
                      term: {
                        driver_id: {
                          value: @driver.id
                        }
                      }
                    }
                  ],
                  filter: [
                    {
                      range: {
                        created_at: {
                          gte: (Time.now.beginning_of_day.strftime '%Y-%m-%d %H:%M:%S')
                        }
                      }
                    }
                  ]
                }
              },
              sort: [
                {
                  created_at: {
                    order: 'desc'
                  }
                }
              ]
            }.to_json
          end
        end
      end
    end
  end
end
This setup works, but Elasticsearch keeps saving the user's location even when the user has stopped. I need to filter the data so that if a user stays in one place for an hour, Elasticsearch recognizes it and stops saving data. Is that possible?
I use Elasticsearch 7.1
and Ruby 2.5.
I know it's possible in Kibana, but I can't use Kibana at this time.
I am not sure if this can be done via a single ES query...
However, you can use two queries:
First, check whether the user's locations during the last hour are all the same.
Second, if they are the same, don't insert the new location.
But I don't recommend that.
What you could do instead:
Use Redis or another in-memory cache to track how long the user has been at their last geo-location.
Based on that, update Elasticsearch or skip the update.
PS: I am not familiar with the ES geo-location API.
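The cache-based approach can be sketched in plain Ruby. This is a minimal illustration, not a drop-in implementation: a Hash stands in for Redis, and the class name `StopDetector`, the 50 m radius (to absorb GPS jitter) and the one-hour limit are assumptions. Before indexing each incoming location, the service would call `should_index?` and skip the Elasticsearch write when it returns false.

```ruby
# Hypothetical in-memory stand-in for the Redis cache: remembers, per driver,
# where the current stationary period started and when.
class StopDetector
  EARTH_RADIUS_M = 6_371_000.0

  # radius_m and max_stop_s are assumptions; tune them for your data.
  def initialize(radius_m: 50, max_stop_s: 3600)
    @radius_m = radius_m
    @max_stop_s = max_stop_s
    @anchors = {} # driver_id => { lat:, lon:, since: }
  end

  # Returns true if this location should be written to Elasticsearch.
  def should_index?(driver_id, lat, lon, at)
    anchor = @anchors[driver_id]
    if anchor.nil? || distance_m(anchor[:lat], anchor[:lon], lat, lon) > @radius_m
      # First point, or the driver moved: start a new stationary window here.
      @anchors[driver_id] = { lat: lat, lon: lon, since: at }
      return true
    end
    # Still within the radius: keep indexing until the stop lasts max_stop_s.
    (at - anchor[:since]) < @max_stop_s
  end

  private

  # Great-circle distance in meters (haversine formula).
  def distance_m(lat1, lon1, lat2, lon2)
    to_rad = Math::PI / 180
    dlat = (lat2 - lat1) * to_rad
    dlon = (lon2 - lon1) * to_rad
    a = Math.sin(dlat / 2)**2 +
        Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
    2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a))
  end
end
```

With Redis, the same anchor (lat, lon, since) would be stored per driver key, so that every app server sees the same state.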

Group a Searchkick result?

I have a basic Searchkick setup. I want to take the results, group them by one attribute, and sum another attribute, etc.
This question is close to my issue:
Elasticsearch + searckick
and the only answer was to use aggregations. I could do that, but then I would be making an Active Record call for each of the aggregation keys returned.
Here is what I have so far:
BudgetItem.all.search("*", body_options: { aggs: { cbs_item_id: { terms: { field: "cbs_item_id" }, aggs: { "total": { "sum": { "field": "total" } } } } } } )
which results in:
"aggregations"=>{"cbs_item_id"=>{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>0, "buckets"=>[{"key"=>5, "doc_count"=>2, "total"=>{"value"=>2956.0}}, {"key"=>6, "doc_count"=>2, "total"=>{"value"=>7734.0}}]}}}>
In my search_data I have a term 'cbs', which is a text value that relates to 'cbs_item_id'. I am looking for this result:
"aggregations"=>
{"cbs_item_id"=>
{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>0, "buckets"=>
[{"key"=>5, "doc_count"=>2, "total"=>{"value"=>2956.0}, "cbs"=>{"value"=>"MY CBS Related Field" }},
{"key"=>6, "doc_count"=>2, "total"=>{"value"=>7734.0}, "cbs"=>{"value"=>"MY OTHER CBS Related Field" }}]}}}
Think of it like an inventory of cars with a separate car_colors table ([id = 1, color = red], [id = 3, color = blue]). I want to search for the cars of a given color, then group them, sum, etc.
I am sure I am missing something simple here.
UPDATE
Getting close:
BudgetItem.all.search("*", body_options: { aggs: { cbs_item_id: { terms: { field: "cbs_item_id" }, aggs: { cbs: { terms: { field: "cbs" } }, "total": { "sum": { "field": "total" } } } } } } )
which results in:
"buckets"=>
[{"key"=>5, "doc_count"=>2, "total"=>{"value"=>2956.0}, "cbs"=>{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>0, "buckets"=>[{"key"=>"001", "doc_count"=>2}]}},
{"key"=>6, "doc_count"=>2, "total"=>{"value"=>7734.0}, "cbs"=>{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>0, "buckets"=>[{"key"=>"002", "doc_count"=>2}]}}]}}
The inner keys, "001" and "002", are the data I am looking for.
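One way to get rows shaped like the desired result is to flatten the nested terms aggregation from the UPDATE in plain Ruby, with no extra Active Record calls. This sketch assumes the raw aggregation hash from the response is passed in (for example the Searchkick results' `response["aggregations"]`; check this against your Searchkick version) and that each cbs_item_id maps to exactly one cbs value. `cbs_rows` is a hypothetical helper name.

```ruby
# Flattens the cbs_item_id -> cbs terms aggregation into one row per
# cbs_item_id bucket. `aggregations` is the raw "aggregations" hash.
def cbs_rows(aggregations)
  aggregations.fetch("cbs_item_id").fetch("buckets").map do |bucket|
    inner = bucket.dig("cbs", "buckets") || []
    {
      cbs_item_id: bucket["key"],
      # Assumes a 1:1 mapping, so the first inner bucket is the cbs value.
      cbs: inner.dig(0, "key"),
      doc_count: bucket["doc_count"],
      total: bucket.dig("total", "value")
    }
  end
end
```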

Spell-check ngram for Elasticsearch not working with Rails

I have set up my model to include spell checking, so that if the user enters something like "Rentaal", it should fetch the correct data, "Rental".
document.rb:
require 'elasticsearch/model'
class Document < ApplicationRecord
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
  belongs_to :user
  Document.import force: true
  def self.search(query)
    __elasticsearch__.search({
      query: {
        multi_match: {
          query: query,
          fields: ['name^10', 'service']
        }
      }
    })
  end
  settings index: {
    "number_of_shards": 1,
    analysis: {
      analyzer: {
        edge_ngram_analyzer: { type: "custom", tokenizer: "standard", filter:
          ["lowercase", "edge_ngram_filter", "stop", "kstem" ] },
      }
    },
    filter: {
      edge_ngram_filter: { type: "edgeNGram", min_gram: "3", max_gram:
        "20" }
    }
  } do
    mapping do
      indexes :name, type: "string", analyzer: "edge_ngram_analyzer"
      indexes :service, type: "string", analyzer: "edge_ngram_analyzer"
    end
  end
end
Search controller code:
def search
  if params[:query].nil?
    @documents = []
  else
    @documents = Document.search params[:query]
  end
end
However, if I enter Rentaal or any other misspelled word, nothing is displayed.
In my console,
@documents.results.to_a
returns an empty array.
What am I doing wrong here? Let me know if more data is required.
Try adding fuzziness to your multi_match query:
{
"query": {
"multi_match": {
"query": "Rentaal",
"fields": ["name^10", "service"],
"fuzziness": "AUTO"
}
}
}
Explanation
The kstem filter reduces words to their root forms, and it does not work the way you expect here: it would correctly handle words like Renta or Rent, but not the misspelling you provided.
You can check how stemming works with the following query:
curl -X POST \
  'http://localhost:9200/my_index/_analyze?pretty=true' \
  -H 'Content-Type: application/json' \
  -d '{
    "analyzer" : "edge_ngram_analyzer",
    "text" : ["rentaal"]
  }'
As a result I see:
{
"tokens": [
{
"token": "ren"
},
{
"token": "rent"
},
{
"token": "renta"
},
{
"token": "rentaa"
},
{
"token": "rentaal"
}
]
}
So typical misspellings are handled much better by applying fuzziness.
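A small worked check of why fuzziness helps: "rentaal" is one edit (a deleted "a") away from "rental", and with "fuzziness": "AUTO" Elasticsearch allows up to two edits for terms longer than five characters (one edit for 3 to 5 characters, none for shorter terms). The helper names are hypothetical, and the sketch uses plain Levenshtein distance, whereas Elasticsearch by default also counts a transposition of two adjacent characters as a single edit.

```ruby
# Plain Levenshtein distance (insertions, deletions, substitutions),
# computed row by row.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Edits allowed by "fuzziness": "AUTO" (i.e. AUTO:3,6) for a term of this length.
def auto_fuzziness(term)
  case term.length
  when 0..2 then 0
  when 3..5 then 1
  else 2
  end
end
```

Here `levenshtein("rentaal", "rental")` is 1, which is within the 2 edits that `auto_fuzziness("rentaal")` allows.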

Elasticsearch : Multi match query on nested fields

I am having a problem with a multi_match query in RoR. I have Elasticsearch configured and working, and I am setting up aggregations, which so far seem to work, but for some reason I can't search on the field I am aggregating. This is an extract from my model:
settings :index => { :number_of_shards => 1 } do
  mapping do
    indexes :id, index: :not_analyzed
    indexes :name
    indexes :summary
    indexes :description
    indexes :occasions, type: 'nested' do
      indexes :id, type: 'integer'
      indexes :occasion_name, type: 'string', index: :not_analyzed
      ...
    end
  end
end
def as_indexed_json(options = {})
  self.as_json(only: [:id, :name, :summary, :description],
    include: {
      occasions: { only: [:id, :occasion_name] },
      courses: { only: [:id, :course_name] },
      allergens: { only: [:id, :allergen_name] },
      cookingtechniques: { only: [:id, :name] },
      cuisine: { only: [:id, :cuisine_name]}
    })
end
class << self
  def custom_search(query)
    __elasticsearch__.search(query: multi_match_query(query), aggs: aggregations)
  end
  def multi_match_query(query)
    {
      multi_match:
      {
        query: query,
        type: "best_fields",
        fields: ["name^9", "summary^8", "cuisine_name^7", "description^6", "occasion_name^6", "course_name^6", "cookingtechniques.name^5"],
        operator: "and"
      }
    }
  end
I am able to search on all fields specified in multi_match_query apart from "occasion_name", which happens to be the field I am aggregating. I have checked that the field is correctly indexed (using the elasticsearch-head plugin). I am also able to display the facets with the aggregated occasion_names in my view. I have tried everything I can think of, including removing the aggregation and searching on occasion_name, but still no luck.
(I am using the elasticsearch-rails gem)
Any help will be much appreciated.
Edit:
I got this ES query from Rails:
@search=
#<Elasticsearch::Model::Searching::SearchRequest:0x007f91244df460
@definition=
{:index=>"recipes",
:type=>"recipe",
:body=>
{:query=>
{:multi_match=>
{:query=>"Christmas",
:type=>"best_fields",
:fields=>["name^9", "summary^8", "cuisine_name^7", "description^6", "occasion_name^6", "course_name^6", "cookingtechniques.name^5"],
:operator=>"and"}},
:aggs=>
{:occasion_aggregation=>
{:nested=>{:path=>"occasions"}, :aggs=>{:id_and_name=>{:terms=>{:script=>"doc['occasions.id'].value + '|' + doc['occasions.occasion_name'].join(' ')", :size=>35}}}}}}},
This is an example of everything that gets indexed for one of the dummy recipes I use for testing (the contents are meaningless; I use this only for testing):
{
"_index": "recipes",
"_type": "recipe",
"_id": "7",
"_version": 1,
"_score": 1,
"_source": {
"id": 7,
"name": "Mustard-stuffed chicken",
"summary": "This is so good we'd be surprised if this chicken fillet recipe doesn't become a firm favourite. Save it to your My Good Food collection and enjoy",
"description": "Heat oven to 200C/fan 180C/gas 6. Mix the cheeses and mustard together. Cut a slit into the side of each chicken breast, then stuff with the mustard mixture. Wrap each stuffed chicken breast with 2 bacon rashers – not too tightly, but enough to hold the chicken together. Season, place on a baking sheet and roast for 20-25 mins.",
"occasions": [
{
"id": 9,
"occasion_name": "Christmas"
}
,
{
"id": 7,
"occasion_name": "Halloween"
}
,
{
"id": 8,
"occasion_name": "Bonfire Night"
}
,
{
"id": 10,
"occasion_name": "New Year"
}
],
"courses": [
{
"id": 9,
"course_name": "Side Dish"
}
,
{
"id": 7,
"course_name": "Poultry"
}
,
{
"id": 8,
"course_name": "Salad"
}
,
{
"id": 10,
"course_name": "Soup"
}
],
"allergens": [
{
"id": 6,
"allergen_name": "Soya"
}
,
{
"id": 7,
"allergen_name": "Nut"
}
,
{
"id": 8,
"allergen_name": "Other"
}
,
{
"id": 1,
"allergen_name": "Dairy"
}
],
"cookingtechniques": [
{
"id": 15,
"name": "Browning"
}
],
"cuisine": {
"id": 1,
"cuisine_name": "African"
}
}
}
EDIT 2:
I managed to make search work for occasions as suggested by @rahulroc, but now I can't search on anything else...
def multi_match_query(query)
  {
    nested: {
      path: 'occasions',
      query: {
        multi_match:
        {
          query: query,
          type: "best_fields",
          fields: ["name^9", "summary^8", "cuisine_name^7", "description^6", "occasion_name^6", "course_name^6", "cookingtechniques.name^5"],
          operator: "and"
        }
      }
    }
  }
end
UPDATE: Adding multiple nested fields. I am trying to add the rest of my aggregations, but I am facing a similar problem as before. My end goal is to use the aggregations as filters, so I need to add about 4 more nested fields to my query (I would also like these fields to be searchable). Here is the working query provided by @rahulroc, plus another nested field that I can't search on. As before, indexing works and I can display the aggregations for the newly added field, but I can't search on it. I tried different variations of this query but couldn't make it work (the rest of the fields are still searchable; the problem is just the new field):
def multi_match_query(query)
  {
    bool: {
      should: [
        {
          nested: {
            path: 'occasions',
            query: {
              multi_match:
              {
                query: query,
                type: "best_fields",
                fields: ["occasion_name"]
              }
            }
          }
        },
        {
          nested: {
            path: 'courses',
            query: {
              multi_match:
              {
                query: query,
                type: "best_fields",
                fields: ["course_name"]
              }
            }
          }
        },
        {
          multi_match: {
            query: query,
            fields: ["name^9", "summary^8", "cuisine_name^7", "description^6"],
          }
        }
      ]
    }
  }
end
You need to create a separate nested clause for matching a nested field:
"query": {
  "bool": {
    "should": [
      {
        "nested": {
          "path": "occasions",
          "query": {
            "multi_match": {
              "query": "Christmas",
              "fields": ["occasion_name^2"]
            }
          }
        }
      },
      {
        "multi_match": {
          "query": "Christmas",
          "fields": ["name^9", "summary^8", "cuisine_name^7", "description^6", "course_name^6"]
        }
      }
    ]
  }
}
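To extend this pattern to more nested paths, the query can be generated with one nested clause per path. This sketch assumes every listed path is mapped with `type: 'nested'` (Elasticsearch rejects a nested query whose path is not a nested field) and uses full dotted field names such as `occasions.occasion_name` inside each nested clause. `NESTED_FIELDS` and `nested_aware_query` are hypothetical names; the top-level field list is copied from the question.

```ruby
# Hypothetical map of nested path => fields searched inside that path.
# Every path here must be mapped as type: 'nested'.
NESTED_FIELDS = {
  "occasions" => ["occasions.occasion_name"],
  "courses" => ["courses.course_name"]
}.freeze

TOP_LEVEL_FIELDS = ["name^9", "summary^8", "cuisine_name^7", "description^6"].freeze

# One `nested` clause per path plus one multi_match over the top-level
# fields, combined with bool/should so that any clause can match.
def nested_aware_query(query)
  nested_clauses = NESTED_FIELDS.map do |path, fields|
    { nested: { path: path, query: { multi_match: { query: query, fields: fields } } } }
  end
  top_level = { multi_match: { query: query, fields: TOP_LEVEL_FIELDS } }
  { bool: { should: nested_clauses + [top_level] } }
end
```

Adding another nested aggregation field then only means adding an entry to `NESTED_FIELDS`, provided the mapping declares that path as nested.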

EdgeNGram with Tire and ElasticSearch

If I have two strings:
Doe, Joe
Doe, Jonathan
I want to implement a search such that:
"Doe" → "Doe, Joe", "Doe, Jonathan"
"Doe J" → "Doe, Joe", "Doe, Jonathan"
"Jon Doe" → "Doe, Jonathan"
"Jona Do" → "Doe, Jonathan"
Here's the code that I have:
settings analysis: {
  filter: {
    nameNGram: {
      type: "edgeNGram",
      min_gram: 1,
      max_gram: 20,
    }
  },
  tokenizer: {
    non_word: {
      type: "pattern",
      pattern: "[^\\w]+"
    }
  },
  analyzer: {
    name_analyzer: {
      type: "custom",
      tokenizer: "non_word",
      filter: ["lowercase", "nameNGram"]
    },
  }
} do
  mapping do
    indexes :name, type: "multi_field", fields: {
      analyzed: { type: "string", index: :analyzed, index_analyzer: "name_analyzer" }, # for indexing
      unanalyzed: { type: "string", index: :not_analyzed, :include_in_all => false } # for sorting
    }
  end
end
def self.search(params)
  tire.search(:page => params[:page], :per_page => 20) do
    query do
      string "name.analyzed:" + params[:query], default_operator: "AND"
    end
    sort do
      by "name.unanalyzed", "asc"
    end
  end
end
Unfortunately, this doesn't appear to work... The tokenizing looks great: for "Doe, Jonathan" I get something like "d", "do", "doe", "j", "jo", "jon", "jona", etc., but if I search for "do AND jo", I get nothing back. If I search for "jona", however, I get back "Doe, Jonathan". What am I doing wrong?
You should probably only use EdgeNGram if you want to create autocomplete. I suspect that you want to use a pattern tokenizer to separate words by commas.
Something like this:
"tokenizer": {
"comma_pattern_token": {
"type": "pattern",
"pattern": ",",
"group": -1
}
}
If I am mistaken and you need edgeNGrams for some other reason, then your problem is that your index analyzer ignores stop words (such as the word AND) and your search analyzer does not. You need to create a custom search_analyzer that does not include the stop word filter.
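To see which searches should match with the question's name_analyzer, here is a pure-Ruby emulation (not Elasticsearch itself) of its tokenization. It assumes the query terms are only split and lowercased, not n-grammed, and that every term must match, as with default_operator AND; `edge_ngrams` and `matches?` are hypothetical helpers. Under those assumptions it reproduces the four expected results listed in the question.

```ruby
# Emulates name_analyzer: split on non-word characters, lowercase, then
# emit edge n-grams of length min_gram..max_gram for each token.
def edge_ngrams(text, min_gram: 1, max_gram: 20)
  text.split(/[^\w]+/).reject(&:empty?).flat_map do |token|
    t = token.downcase
    (min_gram..[max_gram, t.length].min).map { |n| t[0, n] }
  end
end

# Assumption: query terms are split and lowercased only (no n-grams),
# and all of them must be present among the indexed tokens (AND).
def matches?(query, name)
  indexed = edge_ngrams(name)
  query.split(/[^\w]+/).reject(&:empty?).all? { |term| indexed.include?(term.downcase) }
end
```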
