Spell check with ngrams for Elasticsearch not working with Rails

I have set up my model to include spell check, so that if the user inputs data like "Rentaal" it should still fetch the correct data for "Rental".
document.rb code:
require 'elasticsearch/model'

class Document < ApplicationRecord
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks

  belongs_to :user

  Document.import force: true

  def self.search(query)
    __elasticsearch__.search({
      query: {
        multi_match: {
          query: query,
          fields: ['name^10', 'service']
        }
      }
    })
  end

  settings index: {
    number_of_shards: 1,
    analysis: {
      analyzer: {
        edge_ngram_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "edge_ngram_filter", "stop", "kstem"]
        }
      }
    },
    filter: {
      edge_ngram_filter: { type: "edgeNGram", min_gram: "3", max_gram: "20" }
    }
  } do
    mapping do
      indexes :name, type: "string", analyzer: "edge_ngram_analyzer"
      indexes :service, type: "string", analyzer: "edge_ngram_analyzer"
    end
  end
end
Search controller code:
def search
  if params[:query].nil?
    @documents = []
  else
    @documents = Document.search params[:query]
  end
end
However, if I enter "Rentaal" or any other misspelled word, it does not display anything.
In my console,
@documents.results.to_a
returns an empty array.
What am I doing wrong here? Let me know if more data is required.

Try adding fuzziness to your multi_match query:
{
  "query": {
    "multi_match": {
      "query": "Rentaal",
      "fields": ["name^10", "service"],
      "fuzziness": "AUTO"
    }
  }
}
Explanation
The kstem filter reduces words to their root forms, and it does not work the way you expected here - it would correctly handle forms like "Renta" or "Rent", but not the misspelling you provided.
You can check how the analysis works with the following query:
curl -X POST \
  'http://localhost:9200/my_index/_analyze?pretty=true' \
  -d '{
    "analyzer": "edge_ngram_analyzer",
    "text": ["rentaal"]
  }'
As a result I see:
{
  "tokens": [
    { "token": "ren" },
    { "token": "rent" },
    { "token": "renta" },
    { "token": "rentaa" },
    { "token": "rentaal" }
  ]
}
So a typical misspelling will be handled much better by applying fuzziness.
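Applied to the model from the question, the change is a one-line addition to the search definition. A minimal sketch of the query body, extracted as a plain method so the hash is easy to inspect (the field names come from the question):

```ruby
# Builds the search definition from the question, with fuzziness added.
# "AUTO" picks the allowed edit distance from the term length (longer
# terms tolerate more edits), so "Rentaal" can still match "Rental".
def build_search_definition(query)
  {
    query: {
      multi_match: {
        query: query,
        fields: ['name^10', 'service'],
        fuzziness: "AUTO"
      }
    }
  }
end
```

In the model, this hash would be passed to `__elasticsearch__.search` exactly as before.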

Related

How to setup date and fuzzy title search on elasticsearch

I am building a Rails 5 app with an Angular 7 frontend.
In this app I am using Searchkick (an Elasticsearch gem) and I have indexed a model called Event that got attributes title (string) and starts_at (datetime).
I want to be able to build a query in the search controller where I am able to do the following:
Search the title with a fuzzy search, meaning it does not have to match 100% (which it currently requires).
Search with a date range matching starts_at for the indexed Events.
This is my controller index method
def index
  args = {}
  args[:eventable_id] = params[:id]
  args[:eventable_type] = params[:type]
  args[:title] = params[:title] if params[:title].present?

  if params[:starts_at].present?
    args[:starts_at] = {}
    args[:starts_at][:gte] = params[:starts_at].to_date.beginning_of_day
    args[:starts_at][:lte] = params[:ends_at].to_date.end_of_day
  end

  @events = Event.search where: args, page: params[:page], per_page: params[:per_page]
end
I have added this line to my Event model
searchkick text_middle: [:title]
This is the actual query that is run
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": [
        { "term": { "eventable_id": "2" } },
        { "term": { "eventable_type": "Space" } },
        { "term": { "title": "nice event" } },
        {
          "range": {
            "starts_at": {
              "from": "2020-02-01T00:00:00.000Z",
              "include_lower": true,
              "to": "2020-02-29T23:59:59.999Z",
              "include_upper": true
            }
          }
        }
      ]
    }
  },
  "timeout": "11s",
  "_source": false,
  "size": 10000
}
The date search does not work (but I get no errors) and the title search must match 100% (even the case).
Thankful for all help!
Rather than using fuzzy queries, I would recommend an ngram analyzer.
Here is an example of an ngram analyzer:
analyzer: {
  ngram_analyzer: {
    type: "custom",
    tokenizer: "standard",
    filter: ["lowercase", "ngram_filter"],
    char_filter: ["replace_dots"]
  }
},
filter: {
  ngram_filter: {
    type: "ngram",
    min_gram: "3",
    max_gram: "20"
  }
}
You will also have to add this line to your index settings (the allowed difference must cover max_gram - min_gram, here 20 - 3 = 17):
max_ngram_diff: 17
Then in your mapping, make sure you create two fields: one mapping for your regular field, such as name, and another mapping for your ngram field, such as name.ngram.
In my query, I like to give my name field a boost of 10 and my name.ngram field a boost of 5, so that exact matches are rendered first. You will have to play with this, though.
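A minimal sketch of such a boosted query body, with the field names `name` and `name.ngram` assumed from the description above:

```ruby
# Boost the exact field above the ngram field so exact matches rank first.
# The boost values (10 and 5) are starting points to tune, as noted above.
def boosted_multi_match(query)
  {
    query: {
      multi_match: {
        query: query,
        fields: ["name^10", "name.ngram^5"]
      }
    }
  }
end
```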
In regard to your range query, I am using gte and lte. Here is an example:
query: {
  bool: {
    must: {
      range: { date: { gte: params[:date], lte: params[:date], boost: 10 } }
    }
  }
}
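Putting the two parts together, the body sent to Elasticsearch might look like the sketch below. The field names (`name`, `name.ngram`, `date`) are assumptions carried over from the examples above, not a definitive implementation:

```ruby
# Combines the boosted ngram match with a date range clause, both inside
# a bool/must so a hit must satisfy the text match AND the date range.
def search_body(term, from_date, to_date)
  {
    query: {
      bool: {
        must: [
          { multi_match: { query: term, fields: ["name^10", "name.ngram^5"] } },
          { range: { date: { gte: from_date, lte: to_date } } }
        ]
      }
    }
  }
end
```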
I hope this helps.

Elastic Search "Did you mean" for auto correction of words implementation not working with rails

I am trying to implement a full-text search engine for my Rails app using Elasticsearch for a Document class. It should have auto-correction for misspelled words.
This is my document.rb
require 'elasticsearch/model'

class Document < ApplicationRecord
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks

  belongs_to :user

  Document.import force: true

  def self.search(query)
    __elasticsearch__.search({
      query: {
        multi_match: {
          query: query,
          fields: ['name^10', 'service']
        }
      }
    })
  end

  settings index: {
    number_of_shards: 1,
    analysis: {
      analyzer: {
        string_lowercase: { tokenizer: 'keyword', filter: %w(lowercase ascii_folding) },
        did_you_mean: { filter: ['lowercase'], char_filter: ['html_strip'], type: 'custom', tokenzier: 'standard' },
        autocomplete: { filter: ["lowercase", "autocompleteFilter"], char_filter: ["html_strip"], type: "custom", tokenizer: "standard" },
        default: { filter: ["lowercase", "stopwords", "stemmer"], char_filter: ["html_strip"], type: "custom", tokenizer: "standard" }
      }
    },
    filter: {
      ascii_folding: { type: 'asciifolding', preserve_original: true },
      stemmer: { type: 'stemmer', language: 'english' },
      autocompleteFilter: { max_shingle_size: 5, min_shingle_size: 2, type: 'shingle' },
      stopwords: { type: 'stop', stopwords: ['_english_'] }
    }
  } do
    mapping do
      {
        document: {
          properties: {
            autocomplete: { type: "string", analyzer: "autocomplete" },
            name: { type: "string", copy_to: ["did_you_mean", "autocomplete"] },
            did_you_mean: { type: "string", analyzer: "didYouMean" },
            service: { type: "string", copy_to: ["autocomplete", "did_you_mean"] }
          }
        }
      }
    end
  end
end
It lets me search data. However, the "did you mean" phrase is not working here.
What can I do further to improve this code? I am using Elasticsearch for the very first time.

Elasticsearch : Multi match query on nested fields

I am having a problem with a multi-match query in RoR. I have Elasticsearch configured and working. I am now setting up aggregations, which so far seem to work, but for whatever reason I am not able to search on the field that I am aggregating. This is the extract from my model:
settings :index => { :number_of_shards => 1 } do
  mapping do
    indexes :id, index: :not_analyzed
    indexes :name
    indexes :summary
    indexes :description
    indexes :occasions, type: 'nested' do
      indexes :id, type: 'integer'
      indexes :occasion_name, type: 'string', index: :not_analyzed
      ...
    end
  end
end
def as_indexed_json(options = {})
  self.as_json(only: [:id, :name, :summary, :description],
    include: {
      occasions: { only: [:id, :occasion_name] },
      courses: { only: [:id, :course_name] },
      allergens: { only: [:id, :allergen_name] },
      cookingtechniques: { only: [:id, :name] },
      cuisine: { only: [:id, :cuisine_name] }
    })
end

class << self
  def custom_search(query)
    __elasticsearch__.search(query: multi_match_query(query), aggs: aggregations)
  end

  def multi_match_query(query)
    {
      multi_match: {
        query: query,
        type: "best_fields",
        fields: ["name^9", "summary^8", "cuisine_name^7", "description^6", "occasion_name^6", "course_name^6", "cookingtechniques.name^5"],
        operator: "and"
      }
    }
  end
I am able to search on all fields specified in multi_match_query apart from "occasion_name", which happens to be the field I am aggregating. I have checked that the field is correctly indexed (using the elasticsearch-head plugin). I am also able to display the facets with the aggregated occasion_names in my view. I tried everything I can think of, including removing the aggregation and searching on occasion_name, but still no luck.
(I am using the elasticsearch-rails gem)
Any help will be much appreciated.
Edit:
I got this ES query from rails:
#search=
#<Elasticsearch::Model::Searching::SearchRequest:0x007f91244df460
#definition=
{:index=>"recipes",
:type=>"recipe",
:body=>
{:query=>
{:multi_match=>
{:query=>"Christmas",
:type=>"best_fields",
:fields=>["name^9", "summary^8", "cuisine_name^7", "description^6", "occasion_name^6", "course_name^6", "cookingtechniques.name^5"],
:operator=>"and"}},
:aggs=>
{:occasion_aggregation=>
{:nested=>{:path=>"occasions"}, :aggs=>{:id_and_name=>{:terms=>{:script=>"doc['occasions.id'].value + '|' + doc['occasions.occasion_name'].join(' ')", :size=>35}}}}}}},
This is an example of everything that gets indexed for one of my dummy recipes (the contents are meaningless - I use this only for testing):
{
  "_index": "recipes",
  "_type": "recipe",
  "_id": "7",
  "_version": 1,
  "_score": 1,
  "_source": {
    "id": 7,
    "name": "Mustard-stuffed chicken",
    "summary": "This is so good we'd be surprised if this chicken fillet recipe doesn't become a firm favourite. Save it to your My Good Food collection and enjoy",
    "description": "Heat oven to 200C/fan 180C/gas 6. Mix the cheeses and mustard together. Cut a slit into the side of each chicken breast, then stuff with the mustard mixture. Wrap each stuffed chicken breast with 2 bacon rashers – not too tightly, but enough to hold the chicken together. Season, place on a baking sheet and roast for 20-25 mins.",
    "occasions": [
      { "id": 9, "occasion_name": "Christmas" },
      { "id": 7, "occasion_name": "Halloween" },
      { "id": 8, "occasion_name": "Bonfire Night" },
      { "id": 10, "occasion_name": "New Year" }
    ],
    "courses": [
      { "id": 9, "course_name": "Side Dish" },
      { "id": 7, "course_name": "Poultry" },
      { "id": 8, "course_name": "Salad" },
      { "id": 10, "course_name": "Soup" }
    ],
    "allergens": [
      { "id": 6, "allergen_name": "Soya" },
      { "id": 7, "allergen_name": "Nut" },
      { "id": 8, "allergen_name": "Other" },
      { "id": 1, "allergen_name": "Dairy" }
    ],
    "cookingtechniques": [
      { "id": 15, "name": "Browning" }
    ],
    "cuisine": { "id": 1, "cuisine_name": "African" }
  }
}
EDIT 2:
I managed to make the search work for occasions as suggested by @rahulroc, but now I can't search on anything else...
def multi_match_query(query)
  {
    nested: {
      path: 'occasions',
      query: {
        multi_match: {
          query: query,
          type: "best_fields",
          fields: ["name^9", "summary^8", "cuisine_name^7", "description^6", "occasion_name^6", "course_name^6", "cookingtechniques.name^5"],
          operator: "and"
        }
      }
    }
  }
end
UPDATE: Adding multiple nested fields. I am trying to add the rest of my aggregations, but I am facing a similar problem as before. My end goal is to use the aggregations as filters, so I need to add about four more nested fields to my query (I would also like those fields to be searchable). Here is the working query as provided by @rahulroc, plus the addition of another nested field that I can't search on. As before, the indexing works and I can display the aggregations for the newly added field, but I can't search on it. I tried different variations of this query but couldn't make it work (the rest of the fields are still working and searchable - the problem is just the new field):
def multi_match_query(query)
  {
    bool: {
      should: [
        {
          nested: {
            path: 'occasions',
            query: {
              multi_match: {
                query: query,
                type: "best_fields",
                fields: ["occasion_name"]
              }
            }
          }
        },
        {
          nested: {
            path: 'courses',
            query: {
              multi_match: {
                query: query,
                type: "best_fields",
                fields: ["course_name"]
              }
            }
          }
        },
        {
          multi_match: {
            query: query,
            fields: ["name^9", "summary^8", "cuisine_name^7", "description^6"]
          }
        }
      ]
    }
  }
end
You need to create a separate nested clause for matching a nested field:
"query": {
  "bool": {
    "should": [
      {
        "nested": {
          "path": "occasions",
          "query": {
            "multi_match": {
              "query": "Christmas",
              "fields": ["occasion_name^2"]
            }
          }
        }
      },
      {
        "multi_match": {
          "query": "Christmas",
          "fields": ["name^9", "summary^8", "cuisine_name^7", "description^6", "course_name^6"]
        }
      }
    ]
  }
}

Rails Elasticsearch analyzer mappings defined in model are not reported in elasticsearch

In my Recipe model I have :
class Recipe < ActiveRecord::Base
  index_name "recipes-#{Rails.env}"

  settings do
    mappings dynamic: 'false' do
      indexes :title, type: 'string', analyzer: 'french'
      indexes :description, type: 'string', analyzer: 'french'
    end
  end

  def as_indexed_json(options = {})
    self.as_json({ only: [:title, :description] })
  end
end
Then, in the Rails console, I run Recipe.import. When I ask Elasticsearch for the mapping via curl or Sense (GET /recipes-development/_mapping), I get:
{
  "recipes-development": {
    "mappings": {
      "recipe": {
        "properties": {
          "description": { "type": "string" },
          "title": { "type": "string" }
        }
      }
    }
  }
}
All information about the analyzers has been lost. Any idea would be appreciated.
Before Recipe.import you have to execute
Recipe.__elasticsearch__.create_index! force: true
This deletes and recreates the index using the settings and mappings defined in the model. If you skip it, import creates the index implicitly and Elasticsearch falls back to dynamic mappings, which is why your analyzer information never shows up.

EdgeNGram with Tire and ElasticSearch

If I have two strings:
Doe, Joe
Doe, Jonathan
I want to implement a search such that:
"Doe" > "Doe, Joe", "Doe, Jonathan"
"Doe J" > "Doe, Joe", "Doe, Jonathan"
"Jon Doe" > "Doe, Jonathan"
"Jona Do" > "Doe, Jonathan"
Here's the code that I have:
settings analysis: {
  filter: {
    nameNGram: {
      type: "edgeNGram",
      min_gram: 1,
      max_gram: 20
    }
  },
  tokenizer: {
    non_word: {
      type: "pattern",
      pattern: "[^\\w]+"
    }
  },
  analyzer: {
    name_analyzer: {
      type: "custom",
      tokenizer: "non_word",
      filter: ["lowercase", "nameNGram"]
    }
  }
} do
  mapping do
    indexes :name, type: "multi_field", fields: {
      analyzed: { type: "string", index: :analyzed, index_analyzer: "name_analyzer" }, # for indexing
      unanalyzed: { type: "string", index: :not_analyzed, include_in_all: false }      # for sorting
    }
  end
end
def self.search(params)
  tire.search(page: params[:page], per_page: 20) do
    query do
      string "name.analyzed:" + params[:query], default_operator: "AND"
    end
    sort do
      by "name.unanalyzed", "asc"
    end
  end
end
Unfortunately, this doesn't appear to be working... The tokenizing looks great: for "Doe, Jonathan" I get something like "d", "do", "doe", "j", "jo", "jon", "jona", etc. But if I search for "do AND jo", I get back nothing. If I search for "jona", however, I get back "Doe, Jonathan". What am I doing wrong?
You should probably only be using edgeNGram if you want to build an autocomplete. I suspect that you actually want a pattern tokenizer to separate words by commas.
Something like this:
"tokenizer": {
  "comma_pattern_token": {
    "type": "pattern",
    "pattern": ",",
    "group": -1
  }
}
If I am mistaken and you need edgeNGrams for some other reason, then your problem is that your index analyzer is ignoring stop words (such as the word AND) while your search analyzer is not. You need to create a custom search analyzer that does not include the stop-word filter.
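A minimal sketch of what that added analyzer might look like, reusing the non_word tokenizer from the question (name_search_analyzer is a hypothetical name, not from the original code):

```ruby
# Hypothetical search-side analyzer: tokenizes the same way as the index
# side (non_word pattern tokenizer) but applies no edgeNGram filter and
# no stop-word filter, so query terms like "jo" are matched literally
# against the indexed grams.
def search_analyzer_settings
  {
    analyzer: {
      name_search_analyzer: {
        type: "custom",
        tokenizer: "non_word",
        filter: ["lowercase"]
      }
    }
  }
end
```

In the mapping, the analyzed field would then declare index_analyzer: "name_analyzer" and search_analyzer: "name_search_analyzer" so the two sides are analyzed differently.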
