ElasticSearch/Tire: How to properly set partial word searches up - ruby-on-rails

Although many accounts describe this as relatively straightforward, I haven't managed to get it working properly. Let's say I have this:
class Car < ActiveRecord::Base
  settings analysis: {
    filter: {
      ngram_filter: { type: "nGram", min_gram: 3, max_gram: 12 }
    },
    analyzer: {
      partial_analyzer: {
        type: "snowball",
        tokenizer: "standard",
        filter: ["standard", "lowercase", "ngram_filter"]
      }
    }
  } do
    mapping do
      indexes :name, index_analyzer: "partial_analyzer"
    end
  end
end
And let's say I have a car named "Ford" and I update my index. Now, if I search for "Ford":
Car.tire.search { query { string "Ford" } }
My car is in my results. Now, if I search for "For":
Car.tire.search { query { string "For" } }
My car isn't found anymore. I thought the nGram filter would take care of this automatically, but apparently it doesn't. As a temporary workaround I'm using the wildcard (*) for such searches, but that is definitely not the best approach, since the min_gram and max_gram definitions are meant to be key elements of my search. Can anyone tell me how they solved this?
I'm using Rails 3.2.12 with Ruby 1.9.3. The ElasticSearch version is 0.20.5.

You want to use the custom analyzer instead of the snowball one: Elasticsearch custom analyzer
The other analyzers come with a predefined set of filters and tokenizers, which is why the tokenizer and filter settings in your definition have no effect.
You probably also want to use the Edge-Ngram filter: Edge-Ngram filter
The difference between Edge-NGram and NGram is that Edge-NGram only produces grams anchored at an "edge" of a term, so they start at the front or at the back. For 3-character grams: Ford -> [For] instead of -> [For, ord]
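To make the difference concrete, here is a minimal pure-Ruby sketch of the two strategies. The helpers `ngrams` and `edge_ngrams` are hypothetical names used for illustration only; they are not part of Tire or Elasticsearch, and they only approximate what the real token filters emit (front side only, input already lowercased).

```ruby
# Illustrative helpers only; not part of Tire or Elasticsearch.

# Every substring of +term+ whose length lies between min and max.
def ngrams(term, min, max)
  (min..[max, term.length].min).flat_map do |size|
    term.chars.each_cons(size).map(&:join)
  end
end

# Only the substrings anchored at the front of +term+.
def edge_ngrams(term, min, max)
  (min..[max, term.length].min).map { |size| term[0, size] }
end

term = "ford" # already lowercased, as a lowercase filter would do

p ngrams(term, 3, 12)      # => ["for", "ord", "ford"]
p edge_ngrams(term, 3, 12) # => ["for", "ford"]
```

With min_gram 3 and max_gram 12, the edge variant indexes fewer tokens and only matches prefixes, which is usually what an autocomplete needs.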
Some more advanced links on the topic of autocompletion:
Autocompletion with fuzziness (pure elasticsearch, no tire, but very good read)
Another useful question with links provided
Edit
I have a setup very similar to yours, but with a second analyzer for the title and a multi-field covering both. And because of multi-language support, there is an array of names instead of just one name.
I also specify the search_analyzer, and I use string keys instead of symbols. This is what I actually have:
settings "analysis" => {
"filter" => {
"name_ngrams" => {
"side" => "front",
"max_gram" => 20,
"min_gram" => 2,
"type" => "edgeNGram"
}
},
"analyzer" => {
"full_name" => {
"filter" => %w(standard lowercase asciifolding),
"type" => "custom",
"tokenizer" => "letter"
},
"partial_name" => {
"filter" => %w(standard lowercase asciifolding name_ngrams),
"type" => "custom",
"tokenizer" => "standard"
}
}
} do
mapping do
indexes :names do
mapping do
indexes :name, :type => 'multi_field',
:fields => {
"partial" => {
"search_analyzer" => "full_name",
"index_analyzer" => "partial_name",
"type" => "string"
},
"title" => {
"type" => "string",
"analyzer" => "full_name"
}
}
end
end
end
end
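The reason for giving the "partial" field a different search_analyzer can be sketched in plain Ruby. This is a simplified simulation, not the real analyzers: tokenization is approximated with a regex, asciifolding is left out, and all helper names are hypothetical.

```ruby
# Simplified stand-ins for the analyzers above. Tokenization is approximated
# with a regex and asciifolding is omitted; helper names are hypothetical.

def tokenize(text)
  text.downcase.scan(/[[:alnum:]]+/)
end

def edge_ngrams(token, min, max)
  (min..[max, token.length].min).map { |size| token[0, size] }
end

# Index time ("partial_name"): every token is expanded into front-anchored grams.
def index_tokens(text)
  tokenize(text).flat_map { |token| edge_ngrams(token, 2, 20) }.uniq
end

# Search time ("full_name"): query tokens are left whole.
def search_tokens(text)
  tokenize(text)
end

# Simplification: a document matches when every query token was indexed.
def matches?(document, query)
  indexed = index_tokens(document)
  search_tokens(query).all? { |token| indexed.include?(token) }
end

# What would happen if the query were n-grammed too (any shared gram matches).
def ngrammed_query_matches?(document, query)
  indexed = index_tokens(document)
  index_tokens(query).any? { |token| indexed.include?(token) }
end

p index_tokens("Ford")                     # => ["fo", "for", "ford"]
p matches?("Ford", "For")                  # => true
p matches?("Ford", "Focus")                # => false
p ngrammed_query_matches?("Ford", "Focus") # => true, via the shared gram "fo"
```

If the query were also split into grams, "Focus" would match "Ford" through the shared gram "fo", which is the kind of false positive a separate search analyzer avoids.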

Related

Elasticsearch different behaviour on test server

My Elasticsearch search currently gives different results in different environments, even though I'm running the same search.
It works fine in development on my localhost, but on my test server it doesn't return the expected records (and yes, the database is seeded).
As far as I understand, the query should check whether any of the three clauses produces a hit and, if so, return all the hits.
I'm running Windows 10, just using rails s.
The server is running Ubuntu 16, using nginx and unicorn.
Here's my mapping: (note: I'm not completely sure whether the analyzer does anything but it shouldn't matter)
settings index: { number_of_shards: 1 } do
mappings dynamic: 'true' do
indexes :reportdate, type: 'date'
indexes :client do
indexes :id
indexes :name, analyzer: 'dutch'
end
indexes :animal do
indexes :id
indexes :species, analyzer: 'dutch'
indexes :other_species, analyzer: 'dutch'
indexes :chip_code
end
indexes :locations do
indexes :id
indexes :street, analyzer: 'dutch'
indexes :city, analyzer: 'dutch'
indexes :postalcode
end
end
end
Here's my search:
__elasticsearch__.search({
sort: [
{ reportdate: { order: "desc" }},
"_score"
],
query: {
bool: {
should: [
{ multi_match: {
query: query,
type: "phrase_prefix",
fields: [ "other_species", "name"]
}},
{ prefix: {
chip_code: query
}},
{ match_phrase: {
"_all": {
query: query,
fuzziness: "AUTO"
}
}}
]
}
}
})
EDIT #1: Note: I'm fairly new to Ruby on Rails. I started about 2 weeks ago doing maintenance work on an old project, and a search function was requested as well.
It turns out the problem was that I was using foreign tables (well, kind of) and a nested mapping (probably the latter).
Here's the updated code that works both on the server and locally:
__elasticsearch__.search({
sort: [
{ reportdate: { order: "desc" }},
"_score"
],
query: {
bool: {
should: [
{ multi_match: {
query: query,
type: "phrase_prefix",
fields: [ "animal.other_species", "client.name"]
}},
{ prefix: {
"animal.chip_code": query
}},
{ match_phrase: {
"_all": {
query: query,
fuzziness: "AUTO"
}
}}
]
}
}
})
I'm not sure why the animal and client parents don't need to be prepended for this to work locally while they are needed on my test server. Written this way, however, it works on both.
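One possible explanation, which is an assumption and not confirmed by the post, is a version difference between the two machines: Elasticsearch 2.0 and later require fields inside an object to be referenced by their full path, while older versions also resolved the short name. The sketch below builds the search body with fully qualified paths. The helper `search_body` is hypothetical, and the `_all` clause is left out for brevity.

```ruby
require "json"

# Hypothetical helper: builds the search body using full dotted paths for
# fields that live inside the `animal` and `client` objects of the mapping.
def search_body(query)
  {
    sort: [{ reportdate: { order: "desc" } }, "_score"],
    query: {
      bool: {
        should: [
          { multi_match: { query: query,
                           type: "phrase_prefix",
                           fields: ["animal.other_species", "client.name"] } },
          { prefix: { "animal.chip_code" => query } }
        ]
      }
    }
  }
end

puts JSON.pretty_generate(search_body("labrador"))
```

Comparing the version reported by each Elasticsearch instance would confirm or rule out this explanation.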

Find model with part of title using ElasticSearch / Rails

There is the following Post model:
class Post < ActiveRecord::Base
include Elasticsearch::Model
include Elasticsearch::Model::Callbacks
def self.search query
__elasticsearch__.search(
{
query: {
multi_match: {
query: query,
fields: ['title']
}
},
filter: {
and: [
{ term: { deleted: false } },
{ term: { enabled: true } }
]
}
}
)
end
settings index: { number_of_shards: 1 } do
mappings dynamic: 'false' do
indexes :title, analyzer: 'english'
end
end
end
Post.import
I have one Post with the title 'Amsterdam'. When I execute Post.search('Amsterdam') I get one record, which is fine. But if I execute Post.search('Amster') I get no records. What am I doing wrong? How can I fix it? Thanks!
OS: OS X. I installed Elasticsearch using Homebrew.
You will have to use an nGram tokenizer in order to support partial text search. A very good example of how to do this can be found here. That said, I would be very careful with nGram, as it can often turn up unrelated results.
For example, the substring "mon" is contained in all of the strings "monkey", "money", and "monday", none of which are related to each other.
Alternatively (what I would do):
You could try making it a fuzzy search, which tends to return relevant results. However, the maximum edit distance for a fuzzy search is only two, so it still doesn't return anything in your example: "Amster" is three edits away from "Amsterdam".
The example I found: How to use Fuzzy Search
# Perform a fuzzy search!
POST /fuzzy_products/product/_search
{
"query": {
"match": {
"name": {
"query": "Vacuummm",
"fuzziness": 2,
"prefix_length": 1
}
}
}
}
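The limit of two edits can be checked with a short worked example. The sketch below computes the plain Levenshtein distance in Ruby. It is an illustration only: Elasticsearch's own fuzzy matching may also treat a transposition as a single edit, which this version does not.

```ruby
# Plain Levenshtein distance: the number of single-character insertions,
# deletions, and substitutions needed to turn +a+ into +b+.
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1,          # deletion
                  current[j - 1] + 1,       # insertion
                  previous[j - 1] + cost].min # substitution or match
    end
    previous = current
  end
  previous.last
end

p levenshtein("vacuummm", "vacuum")  # => 2, within a fuzziness of 2
p levenshtein("amster", "amsterdam") # => 3, outside a fuzziness of 2
```

This is why "Vacuummm" finds the product in the example above while "Amster" still misses "Amsterdam".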

Elasticsearch and Rails: Using ngram to search for part of a word

I am trying to use the Elasticsearch gem in my project. As I understand it, there is no longer any need for the Tire gem, or am I wrong?
In my project I have a search (obviously), which currently applies to one model. I am trying to avoid wildcards, since they don't scale well, but I can't seem to get the ngram analyzers to work properly. If I search for whole words, the search still works, but not for parts of words.
class Pictures < ActiveRecord::Base
include Elasticsearch::Model
include Elasticsearch::Model::Callbacks
settings :analysis => {
:analyzer => {
:my_index_analyzer => {
:tokenizer => "keyword",
:filter => ["lowercase", "substring"]
},
:my_search_analyzer => {
:tokenizer => "keyword",
:filter => ["lowercase", "substring"]
}
},
:filter => {
:substring => {
:type => "nGram",
:min_gram => 2,
:max_gram => 50
}
}
} do
mapping do
indexes :title,
:properties => {
:type => "string",
:index_analyzer => 'my_index_analyzer',
:search_analyzer => "my_search_analyzer"
}
end
end
end
Maybe somebody can give me a hint in the right direction.
I have given up on defining the schema in the model class. In fact, it does not make much sense either.
So here is what I have done: a schema/mapping definition in the db/ folder and a rake task to build it.
https://gist.github.com/geordee/9313f4867d61ce340a08
In the model
def as_indexed_json(options={})
self.as_json(only: [:id, :name, :description, :price])
end
I'm using an index for suggestions based on edgeNGram (like nGram, but always starting at the left side of the word) with these settings:
{
"en_suggestions": {
"settings": {
"index": {
"analysis": {
"filter": {
"tpNGramFilter": {
"min_gram": "4",
"type": "edgeNGram",
"max_gram": "50"
}
},
"analyzer": {
"tpNGramAnalyzer": {
"type": "custom",
"filter": [
"tpNGramFilter"
],
"tokenizer": "lowercase"
}
}
}
}
}
}
}
and this mapping:
{
"en_suggestions": {
"mappings": {
"suggest": {
"properties": {
"proposal": {
"type": "string",
"analyzer": "tpNGramAnalyzer"
}
}
}
}
}
}

Elasticsearch doesn't apply the NOT filter

I've been knocking my head against a wall with Elasticsearch today, trying to fix a failing test case.
I am using Rails 3.2.14, Ruby 1.9.3, the Tire gem and ElasticSearch 0.90.2
The objective is to have the query return matching results EXCLUDING the item where
vid == "ABC123xyz"
The Ruby code in the Video model looks like this:
def related_videos(count)
Video.search load: true do
size(count)
filter :term, :category_id => self.category_id
filter :term, :live => true
filter :term, :public => true
filter :not, {:term => {:vid => self.vid}}
query do
boolean do
should { text(:_all, self.title, boost: 2) }
should { text(:_all, self.description) }
should { terms(:tags, self.tag_list, minimum_match: 1) }
end
end
end
end
The resulting search query generated by Tire looks like this:
{
"query":{
"bool":{
"should":[
{
"text":{
"_all":{
"query":"Top Gun","boost":2
}
}
},
{
"text":{
"_all":{
"query":"The macho students of an elite US Flying school for advanced fighter pilots compete to be best in the class, and one romances the teacher."
}
}
},
{
"terms":{
"tags":["top-gun","80s"],
"minimum_match":1
}
}
]
}
},
"filter":{
"and":[
{
"term":{
"category_id":1
}
},
{
"term":{
"live":true
}
},
{
"term":{
"public":true
}
},
{
"not":{
"term":{
"vid":"ABC123xyz"
}
}
}
]
},
"size":10
}
The resulting JSON from ElasticSearch:
{
"took": 7,
"timed_out": false,
"_shards":{
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total":1,
"max_score":0.2667512,
"hits":[
{
"_index":"test_videos",
"_type":"video",
"_id":"8",
"_score":0.2667512,
"_source":{
"vid":"ABC123xyz",
"title":"Top Gun",
"description":"The macho students of an elite US Flying school for advanced fighter pilots compete to be best in the class, and one romances the teacher.",
"tags":["top-gun","80s"],
"category_id":1,
"live":true,
"public":true,
"featured":false,
"last_video_view_count":0,
"boost_factor":0.583013698630137,
"created_at":"2013-08-28T14:24:47Z"
}
}
]
}
}
Could somebody help? The Elasticsearch docs are sparse on this topic, and I'm running out of ideas.
Thanks
Using a top-level filter the way you are doesn't filter the results of your query - it just filters results out of things like facet counts. There's a fuller description in the elasticsearch documentation for filter.
You need to do a filtered query which is slightly different and filters the results of your query clauses:
Video.search load: true do
  query do
    filtered do
      # The scored clauses go inside the filtered query's own query block.
      query do
        boolean do
          should { text(:_all, self.title, boost: 2) }
          should { text(:_all, self.description) }
          should { terms(:tags, self.tag_list, minimum_match: 1) }
        end
      end
      # Filters declared here are part of the filtered query itself.
      filter :term, :category_id => self.category_id
      filter :term, :live => true
      filter :term, :public => true
      filter :not, {:term => {:vid => self.vid}}
    end
  end
end
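For comparison with the JSON in the question, the request body of a filtered query has roughly the shape sketched below. It is written as a plain Ruby hash so that it can be inspected without Tire. The helper `related_videos_body` and the sample values are hypothetical, and the exact JSON that Tire emits may differ in detail.

```ruby
require "json"

# Hypothetical helper: +video+ is a plain hash standing in for the model.
def related_videos_body(video, count)
  {
    query: {
      filtered: {
        # Scored clauses.
        query: {
          bool: {
            should: [
              { text: { _all: { query: video[:title], boost: 2 } } },
              { text: { _all: { query: video[:description] } } },
              { terms: { tags: video[:tags], minimum_match: 1 } }
            ]
          }
        },
        # Filters now sit inside the query instead of at the top level.
        filter: {
          and: [
            { term: { category_id: video[:category_id] } },
            { term: { live: true } },
            { term: { public: true } },
            { not: { term: { vid: video[:vid] } } }
          ]
        }
      }
    },
    size: count
  }
end

video = { title: "Top Gun", description: "Fighter pilots compete.",
          tags: ["top-gun", "80s"], category_id: 1, vid: "ABC123xyz" }
puts JSON.pretty_generate(related_videos_body(video, 10))
```

The structural difference from the original request is that there is no top-level "filter" key any more; the filters are nested under "query" > "filtered".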

EdgeNGram with Tire and ElasticSearch

If I have two strings:
Doe, Joe
Doe, Jonathan
I want to implement a search such that:
"Doe" > "Doe, Joe", "Doe, Jonathan"
"Doe J" > "Doe, Joe", "Doe, Jonathan"
"Jon Doe" > "Doe, Jonathan"
"Jona Do" > "Doe, Jonathan"
Here's the code that I have:
settings analysis: {
filter: {
nameNGram: {
type: "edgeNGram",
min_gram: 1,
max_gram: 20,
}
},
tokenizer: {
non_word: {
type: "pattern",
pattern: "[^\\w]+"
}
},
analyzer: {
name_analyzer: {
type: "custom",
tokenizer: "non_word",
filter: ["lowercase", "nameNGram"]
},
}
} do
mapping do
indexes :name, type: "multi_field", fields: {
analyzed: { type: "string", index: :analyzed, index_analyzer: "name_analyzer" }, # for indexing
unanalyzed: { type: "string", index: :not_analyzed, :include_in_all => false } # for sorting
}
end
end
def self.search(params)
tire.search(:page => params[:page], :per_page => 20) do
query do
string "name.analyzed:" + params[:query], default_operator: "AND"
end
sort do
by "name.unanalyzed", "asc"
end
end
end
Unfortunately, this doesn't appear to be working... The tokenizing looks great, for "Doe, Jonathan" I get something like "d", "do", "doe", "j", "jo", "jon", "jona" etc. but if I search for "do AND jo", I get back nothing. If I, however, search for "jona", I get back "Doe, Jonathan." What am I doing wrong?
You should probably only use EdgeNGram if you want to build an autocomplete. I suspect that you want to use a pattern tokenizer to separate words by commas.
Something like this:
"tokenizer": {
"comma_pattern_token": {
"type": "pattern",
"pattern": ",",
"group": -1
}
}
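As a rough illustration of what such a tokenizer produces, the Ruby sketch below approximates it with String#split. This is an assumption-laden stand-in, not the Elasticsearch implementation, and the helper names are hypothetical. Note that splitting on a bare comma keeps the space that follows it.

```ruby
# Rough stand-in for a pattern tokenizer that splits on ",".
def comma_tokens(text)
  text.split(",").reject(&:empty?)
end

# Variant that also swallows whitespace after the comma, as a pattern
# such as ",\s*" would.
def comma_tokens_trimmed(text)
  text.split(/,\s*/).reject(&:empty?)
end

p comma_tokens("Doe, Jonathan")         # => ["Doe", " Jonathan"]
p comma_tokens_trimmed("Doe, Jonathan") # => ["Doe", "Jonathan"]
```

If the leading space matters for matching, a pattern that includes the whitespace or a trim token filter would be needed in the real analyzer.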
If I am mistaken and you need edgeNGrams for some other reason then your problem is that your index analyzer is ignoring stop words (such as the word AND) and your search analyzer is not. You need to create a custom analyzer for your search_analyzer that does not include the stop word filter.
