Rails, Tire, Elasticsearch: how to use synonyms? - ruby-on-rails

I have no idea how to use synonyms/plurals with Elasticsearch through the Tire gem. Do I need to download a synonyms file (is an English one enough)? Is there something to set up in ES regardless of whether I use Tire or not?
class Story < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks
  attr_accessible :author, :content, :title
  mapping do
    indexes :id, :index => :not_analyzed
    indexes :author, :analyzer => 'keyword'
    indexes :title, :analyzer => 'snowball'
    indexes :content, :analyzer => 'snowball'
  end
end
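class StoriesController < ApplicationController
  def index
    if params[:q].present?
      # Copy params into a local: the search block is evaluated inside Tire's DSL, where controller methods such as params are not reachable
      p = params
      @stories = Story.search(per_page: 30, page: params[:page], load: true) do
        query { string p[:q], default_operator: 'AND' }
      end
    end
  end
end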
I found nothing in the documentation.
Thanks!

I guess you mean Elasticsearch's synonym token filter: http://www.elasticsearch.org/guide/reference/index-modules/analysis/synonym-tokenfilter/
{
  "index" : {
    "analysis" : {
      "analyzer" : {
        "synonym" : {
          "tokenizer" : "whitespace",
          "filter" : ["synonym"]
        }
      },
      "filter" : {
        "synonym" : {
          "type" : "synonym",
          "synonyms_path" : "analysis/synonym.txt"
        }
      }
    }
  }
}
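The file that synonyms_path points to is a plain text list of rules, in Solr synonym format by default (a WordNet format is also supported). To make the two kinds of Solr rules concrete, here is a toy Ruby model of how they rewrite a token stream. It is a sketch, not Elasticsearch's implementation: multi-word synonyms, token positions and options such as expand or ignore_case are ignored, and the sample rules are hypothetical.

```ruby
# Toy model of Solr-format synonym rules rewriting a token stream.
# NOT Elasticsearch's implementation: multi-word synonyms, positions and
# options such as expand/ignore_case are ignored.

# Returns a Hash mapping each source token to the tokens that replace it.
def parse_synonyms(text)
  rules = {}
  text.each_line do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?('#') # skip blank lines and comments
    if line.include?('=>')
      # Explicit mapping: "i-pod, ipod => ipod" rewrites the left side to the right side
      lhs, rhs = line.split('=>', 2).map { |side| side.split(',').map(&:strip) }
      lhs.each { |word| rules[word] = (rules[word] || []) | rhs }
    else
      # Equivalent synonyms: "car, automobile" makes each word expand to all of them
      words = line.split(',').map(&:strip)
      words.each { |word| rules[word] = (rules[word] || []) | words }
    end
  end
  rules
end

# Replaces each token by its synonyms, or keeps it when no rule applies.
def expand(tokens, rules)
  tokens.flat_map { |token| rules.fetch(token, [token]) }
end

SAMPLE_RULES = <<~TXT
  # hypothetical contents of config/analysis/synonym.txt
  car, automobile
  i-pod, ipod => ipod
TXT

rules = parse_synonyms(SAMPLE_RULES)
p expand(%w[red car], rules) # => ["red", "car", "automobile"]
p expand(%w[i-pod], rules)   # => ["ipod"]
```

Equivalence rules widen a match in both directions, while `=>` rules normalize variants to one form.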
AFAIK, in Tire the same settings go in the settings block:
settings :analysis => {
  :filter => {
    :synonym => {
      "type" => "synonym",
      # Must be readable on the Elasticsearch node; a relative path resolves against its config directory
      "synonyms_path" => Rails.root.join("config/analysis/synonym.txt").to_s
    }
  },
  :analyzer => {
    :synonym => {
      "tokenizer" => "lowercase",
      "filter" => ["synonym"],
      "type" => "custom"
    }
  }
} do
  mapping { indexes :the_field, :type => 'string', :analyzer => "synonym" }
end
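If you would rather not place a file on every Elasticsearch node, the synonym filter also accepts the rules inline through its `synonyms` option. Below is a sketch of the same settings hash in that form, as plain Ruby so you can inspect the JSON it serializes to. It assumes Tire passes the hash through unchanged as the index settings; the rules and the constant name are made-up examples.

```ruby
require 'json'

# Same analysis settings as the Tire example, but with the synonym rules
# inlined via the filter's "synonyms" option instead of "synonyms_path".
# The rules below are hypothetical examples.
SYNONYM_SETTINGS = {
  analysis: {
    filter: {
      synonym: {
        type: 'synonym',
        synonyms: ['car, automobile', 'tv, television']
      }
    },
    analyzer: {
      synonym: {
        type: 'custom',
        tokenizer: 'lowercase',
        filter: ['synonym']
      }
    }
  }
}.freeze

puts JSON.pretty_generate(SYNONYM_SETTINGS)
```

In the model you would then write `settings SYNONYM_SETTINGS do ... end` around the same mapping block.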

Related

rails - Elasticsearch completion suggester and search API

I'm using the search API and now need to add the completion suggester. I'm using the elasticsearch-rails gem.
When I search for an article, everything works:
http://localhost:9200/articles/_search
{
  "query": {
    "multi_match": {
      "query": "test",
      "fields": [
        "title", "tags", "content"
      ]
    }
  }
}
But since implementing the completion suggester, I had to edit as_indexed_json to make it work, and now the search API no longer works; only the suggestions do.
Here is my Article model:
def self.search(query)
  __elasticsearch__.search(
    {
      query: {
        multi_match: {
          query: query,
          fields: ['title', 'content', 'tags']
        }
      }
    })
end
def self.suggest(query)
  Article.__elasticsearch__.client.suggest(:index => Article.index_name, :body => {
    :suggestions => {
      :text => query,
      :completion => {
        :field => 'suggest'
      }
    }
  })
end
def as_indexed_json(options={})
  {
    :name => self.title,
    :suggest => {
      :input => self.title,
      :output => self.title,
      :payload => {
        :content => self.content,
        :tags => self.tags,
        :title => self.title
      }
    }
  }.as_json
end
Is it possible to have _search and _suggest working together with the same model?
I'm just digging into Elasticsearch, but as far as I understand, you can add back the fields you had before modifying the serializer function and recreate the indices; the search fields and the suggest field will coexist fine in the same document. For example:
def as_indexed_json(options={})
  {
    :name => self.title,
    :suggest => {
      :input => self.title,
      :output => self.title,
      :payload => {
        :content => self.content,
        :tags => self.tags,
        :title => self.title
      }
    }
  }.as_json.merge(self.as_json) # or the customized hash you used
end
To avoid index redundancy, you can look at aliases and routing.
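The reason the search broke is visible in the question's as_indexed_json: it only emits name and suggest, so the title, content and tags fields that multi_match queries are never indexed. A minimal runnable sketch of a document that serves both APIs is below; a plain Struct stands in for the ActiveRecord model (an assumption so it runs without Rails), and string keys mimic what as_json produces.

```ruby
require 'json'

# Plain Struct standing in for the ActiveRecord Article model.
Article = Struct.new(:title, :content, :tags) do
  # Keep the fields multi_match searches (title, content, tags) and add the
  # completion-suggester field next to them in the same document.
  def as_indexed_json(_options = {})
    {
      'title'   => title,
      'content' => content,
      'tags'    => tags,
      'suggest' => {
        'input'   => title,
        'output'  => title,
        'payload' => { 'content' => content, 'tags' => tags, 'title' => title }
      }
    }
  end
end

SEARCH_FIELDS = %w[title content tags].freeze

doc = Article.new('Test article', 'Some body text', ['test']).as_indexed_json
missing = SEARCH_FIELDS.reject { |field| doc.key?(field) }
puts missing.empty? ? 'all search fields indexed' : "missing: #{missing.join(', ')}"
puts JSON.generate(doc)
```

After changing the serializer, the index still has to be recreated and the documents reimported before _search sees the restored fields.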

Elasticsearch: Tire to Elasticsearch Persistence migration

I would like to migrate from the Tire (retire) gem to the Elasticsearch Persistence gem. In Tire I used to set the index settings from inside the model, as shown below:
settings :number_of_shards => 5,
         :number_of_replicas => 1,
         :analysis => {
           :analyzer => {
             :my_pattern => {
               "type" => "custom",
               "tokenizer" => "keyword",
               "filter" => ["url_ngram", "lowercase"]
             }
           },
           :filter => {
             :url_stop => {
               :type => "stop",
               :stopwords => ["="]
             },
             :url_ngram => {
               :type => "nGram",
               :min_gram => 4,
               :max_gram => 40
             }
           }
         } do
  mapping {
    indexes :msgpriority, :type => 'string', :analyzer => 'snowball'
    indexes :msghostname, :type => 'string', :analyzer => 'snowball'
    indexes :msgtext, :type => 'string', :analyzer => 'my_pattern'
    indexes :msgdatetime, :type => 'date', :include_in_all => false
  }
end
Now I'm using the Repository object and I want to apply the same settings (mainly the analyzer).
The code below doesn't work: even when I change the number of shards, it behaves as if I wrote nothing.
REPOSITORY = Elasticsearch::Persistence::Repository.new do
  # Configure the Elasticsearch client
  client Elasticsearch::Client.new url: ENV['ELASTICSEARCH_URL'], log: true
  now_time = Time.now
  # Set a custom index name
  index "ip_logstreams_#{now_time.year}_#{now_time.month}_#{now_time.day}"
  # Set a custom document type
  type :log_entry
  # Specify the class to initialize when deserializing documents
  klass LogEntry
  # Configure the settings and mappings for the Elasticsearch index
  settings number_of_shards: 2, :analysis => {
    :analyzer => {
      :my_pattern => {
        "type" => "custom",
        "tokenizer" => "keyword",
        "filter" => ["url_ngram", "lowercase"]
      }
    },
    :filter => {
      :url_stop => {
        :type => "stop",
        :stopwords => ["="]
      },
      :url_ngram => {
        :type => "nGram",
        :min_gram => 4,
        :max_gram => 40
      }
    }
  } do
    mapping {
      indexes :msgpriority, :type => 'string', :analyzer => 'snowball'
      indexes :msghostname, :type => 'string', :analyzer => 'snowball'
      indexes :msgtext, :type => 'string', :analyzer => 'my_pattern'
      indexes :msgdatetime, :type => 'date', :include_in_all => false
    }
  end
end
UPDATE:
When I issue
REPOSITORY.create_index! force: true
the changes are applied, but I think the settings in Elasticsearch are messed up, as shown in the screenshot (grabbed from the head plugin).
Have you considered just using elasticsearch/elasticsearch-model? It provides automatic callbacks that are supposed to help you persist data.
When using the repository object from the Elasticsearch Persistence gem, you need to issue
REPOSITORY.create_index!
This creates the index with the supplied settings; add force: true if you want to re-create the index.
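As a plain-Ruby sketch (no client calls), the create-index body for the log index might look like the hash below, assuming create_index! sends the repository's settings and mappings in the usual `{ settings: ..., mappings: ... }` shape. The `undefined_analyzers` helper and the partial built-in list are hypothetical; the helper just checks that every analyzer named in the mapping exists before you run `create_index! force: true`.

```ruby
require 'json'

ANALYSIS = {
  analyzer: {
    my_pattern: { type: 'custom', tokenizer: 'keyword', filter: %w[url_ngram lowercase] }
  },
  filter: {
    url_stop:  { type: 'stop', stopwords: ['='] },
    url_ngram: { type: 'nGram', min_gram: 4, max_gram: 40 }
  }
}.freeze

PROPERTIES = {
  msgpriority: { type: 'string', analyzer: 'snowball' },
  msghostname: { type: 'string', analyzer: 'snowball' },
  msgtext:     { type: 'string', analyzer: 'my_pattern' },
  msgdatetime: { type: 'date', include_in_all: false }
}.freeze

# Assumed shape of the create-index request body.
BODY = {
  settings: { number_of_shards: 2, analysis: ANALYSIS },
  mappings: { log_entry: { properties: PROPERTIES } }
}.freeze

# Partial list of built-in analyzers, only for this check.
BUILT_IN_ANALYZERS = %w[standard simple whitespace keyword snowball].freeze

# Hypothetical helper: analyzer names used by the mapping that are neither
# defined in the custom analysis settings nor in the built-in list.
def undefined_analyzers(body)
  custom = body[:settings][:analysis][:analyzer].keys.map(&:to_s)
  used = body[:mappings].values.flat_map do |type_mapping|
    type_mapping[:properties].values.map { |field| field[:analyzer] }.compact
  end
  used.uniq - custom - BUILT_IN_ANALYZERS
end

puts JSON.pretty_generate(BODY)
puts "undefined analyzers: #{undefined_analyzers(BODY).inspect}"
```

Comparing this body with what the index's `_settings` endpoint returns after creation is one way to see whether the analyzer was actually applied.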

ElasticSearch : Combining query string with term filters

I'm using Tire and Elasticsearch in my Rails project, which is a retail site for car parts. ES powers a faceted search page used for browsing the parts catalog. My question is: how can I make the term filters return only exact matches while searching the same fields with a query string, applying analyzers only to the query string?
I hope that makes sense. I will try to provide an example:
The model/index in question is called Parts. Assume that a part has mappings called categories and sub_categories. If a user selects the Brake category and the Brake Caliper Carrier subcategory (creating term filters) I have to make sure that parts from the Brake Caliper subcategory are not also returned – it is a separate subcategory. I do, however, want the user to be able to simply enter something like "brakes" into the search field (creating a query_string) and get results from products within all of those categories.
Here is the relevant code from the Part model:
def to_indexed_json
  fits = fitments.try(:map) do |fit|
    {
      make: fit.try(:make).try(:name),
      make_id: fit.try(:make).try(:id),
      model: fit.try(:model).try(:name),
      model_id: fit.try(:model).try(:id),
      year: fit.year,
      sub_model: fit.sub_model
    }
  end
  {
    id: id,
    name: name,
    description: description,
    fitments: fits,
    categories: root_categories,
    sub_categories: sub_categories,
    price: price,
    condition_id: condition_id,
    country_of_origin: country_of_origin,
    brand: brand,
    oem: oem,
    thumb_url: part_images.first.try(:image).try(:thumb).try(:url),
    city: user.try(:city),
    inventory: inventory,
    part_number: part_number,
    user: user.try(:public_name)
  }.to_json
end
mapping do
  indexes :id, type: 'integer'
  indexes :name, analyzer: 'snowball', boost: 40
  indexes :description, analyzer: 'snowball', boost: 12
  indexes :price, type: "integer"
  indexes :country_of_origin, index: :not_analyzed
  indexes :condition_id, type: "integer"
  indexes :brand, index: :not_analyzed
  indexes :oem, type: "boolean"
  indexes :city, index: :not_analyzed
  indexes :inventory, type: "integer"
  indexes :part_number, index: :not_analyzed
  indexes :user, index: :not_analyzed
  indexes :thumb_url, index: :not_analyzed
  indexes :fitments do
    indexes :make
    indexes :make_id, type: "integer" #, index: 'not_analyzed'
    indexes :model
    indexes :model_id, type: "integer" #, index: 'not_analyzed'
    indexes :year, type: "integer"
    indexes :sub_model
  end
  indexes :categories do
    indexes :name, index: :not_analyzed
    indexes :id, type: "integer"
  end
  indexes :sub_categories do
    indexes :name, index: :not_analyzed
    indexes :id, type: "integer"
  end
end
def search(params={})
  query_filters = []
  tire.search(:page => params[:page], :per_page => 20) do
    query_filters << { :term => { 'fitments.make_id' => params[:make] }} if params[:make].present?
    query_filters << { :term => { 'fitments.model_id' => params[:model] }} if params[:model].present?
    query_filters << { :term => { 'categories.name' => params[:category] }} if params[:category].present?
    query_filters << { :term => { 'sub_categories.name' => params[:sub_category] }} if params[:sub_category].present?
    query_filters << { :term => { 'city' => params[:city] }} if params[:city].present?
    query_filters << { :term => { 'condition_id' => params[:condition] }} if params[:condition].present?
    query_filters << { :term => { 'brand' => params[:brand] }} if params[:brand].present?
    query_filters << { :term => { 'oem' => params[:oem] }} if params[:oem].present?
    query do
      filtered do
        query {
          if params[:query].present?
            string params[:query]
          else
            all
          end
        }
        filter :and, query_filters unless query_filters.empty?
      end
    end
    facet("categories") { terms 'categories.name', size: 50 } unless params[:category].present?
    facet("cities") { terms 'city', size: 50 } unless params[:city].present?
    if params[:category].present? && !params[:sub_category].present?
      facet("sub_categories") { terms 'sub_categories.name', size: 50 }
    end
    facet("condition_id") { terms 'condition_id', size: 50 } unless params[:condition].present?
    facet("brand") { terms 'brand', size: 50 } unless params[:brand].present?
    facet("oem") { terms 'oem', size: 2 } unless params[:oem].present?
    size params[:size] if params[:size]
  end
end
You have to use Elasticsearch's multi_field feature and filter on the non-analyzed fields; see e.g. "Why multi-field mapping is not working with tire gem for elasticsearch?"
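A minimal sketch of that idea as plain Ruby hashes, assuming Elasticsearch 0.90-era syntax (the multi_field type, a filtered query and an and filter); from 1.0 on, the same thing is expressed with a `fields` entry on the string field instead. The sub-field name `untouched` and the helper names are hypothetical choices, not part of Tire or Elasticsearch.

```ruby
require 'json'

# Index the same value twice: analyzed under the field's own name (for the
# query string) and not_analyzed under a sub-field (for exact term filters).
# "untouched" is a hypothetical sub-field name.
def multi_field_mapping(field_name, analyzer)
  {
    type: 'multi_field',
    fields: {
      field_name => { type: 'string', analyzer: analyzer },
      untouched: { type: 'string', index: 'not_analyzed' }
    }
  }
end

MAPPING = {
  properties: {
    categories:     { properties: { name: multi_field_mapping(:name, 'snowball') } },
    sub_categories: { properties: { name: multi_field_mapping(:name, 'snowball') } }
  }
}.freeze

# Exact-match filters point at the not_analyzed sub-field.
def exact_filters(params)
  filters = []
  filters << { term: { 'categories.name.untouched' => params[:category] } } if params[:category]
  filters << { term: { 'sub_categories.name.untouched' => params[:sub_category] } } if params[:sub_category]
  filters
end

# The free-text part still goes through the analyzers; the filters do not.
def search_body(params)
  query = params[:query] ? { query_string: { query: params[:query] } } : { match_all: {} }
  filters = exact_filters(params)
  return { query: query } if filters.empty?
  { query: { filtered: { query: query, filter: { and: filters } } } }
end

puts JSON.pretty_generate(search_body(query: 'brakes', sub_category: 'Brake Caliper Carrier'))
```

With this layout, a filter on "Brake Caliper Carrier" matches only that exact subcategory, while "brakes" in the search box is stemmed and can hit every brake-related category. The terms facets would likewise point at the `.untouched` sub-fields to return whole category names.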

Fuzzy String Matching with Rails (Tire) and ElasticSearch

I have a Rails application set up with ElasticSearch and the Tire gem to search on a model, and I was wondering how I should set up my application to do fuzzy string matching on certain indexes in the model. My model indexes things like title and description, but I want to do fuzzy string matching on some of those and I'm not sure where to do this. I will include my code below if you would like to comment! Thanks!
In the controller:
def search
  @resource = Resource.search(params[:q], :page => (params[:page] || 1),
                              :per_page => 15, load: true)
end
In the Model:
class Resource < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks
  belongs_to :user
  has_many :resource_views, :class_name => 'UserResourceView'
  has_reputation :votes, source: :user, aggregated_by: :sum
  attr_accessible :title, :description, :link, :tag_list, :user_id, :youtubeID
  acts_as_taggable
  mapping do
    indexes :id, :index => :not_analyzed
    indexes :title, :analyzer => 'snowball', :boost => 40
    indexes :tag_list, :analyzer => 'snowball', :boost => 8
    indexes :description, :analyzer => 'snowball', :boost => 2
    indexes :user_id, :analyzer => 'snowball'
  end
end
Try creating custom analyzers to achieve other stemming features, etc.
Check out my example (it also uses Mongoid and attachments; skip those parts if you don't need them):
class Document
  include Mongoid::Document
  include Mongoid::Timestamps
  include Tire::Model::Search
  include Tire::Model::Callbacks
  field :filename, type: String
  field :md5, type: String
  field :tags, type: String
  field :size, type: String
  index({md5: 1}, {unique: true})
  validates_uniqueness_of :md5
  DEFAULT_PAGE_SIZE = 10
  settings :analysis => {
    :filter => {
      :ngram_filter => {
        :type => "edgeNGram",
        :min_gram => 2,
        :max_gram => 12
      },
      :custom_word_delimiter => {
        :type => "word_delimiter",
        :preserve_original => "true",
        :catenate_all => "true"
      },
      # Referenced by the :suggestions analyzer below, so it has to be defined here too
      :suggestions_shingle => {
        :type => "shingle"
      }
    },
    :analyzer => {
      :index_ngram_analyzer => {
        :type => "custom",
        :tokenizer => "standard",
        :filter => ["lowercase", "ngram_filter", "asciifolding", "custom_word_delimiter"]
      },
      :search_ngram_analyzer => {
        :type => "custom",
        :tokenizer => "standard",
        :filter => ["standard", "lowercase", "ngram_filter", "custom_word_delimiter"]
      },
      :suggestions => {
        :tokenizer => "standard",
        :filter => ["suggestions_shingle"]
      }
    }
  } do
    mapping {
      indexes :id, index: :not_analyzed
      indexes :filename, :type => 'string', :store => 'yes', :boost => 100, :search_analyzer => :search_ngram_analyzer, :index_analyzer => :index_ngram_analyzer
      indexes :tags, :type => 'string', :store => 'yes', :search_analyzer => :search_ngram_analyzer, :index_analyzer => :index_ngram_analyzer
      indexes :attachment, :type => 'attachment',
        :fields => {
          :content_type => {:store => 'yes'},
          :author => {:store => 'yes', :analyzer => 'keyword'},
          :title => {:store => 'yes'},
          :attachment => {:term_vector => 'with_positions_offsets', :boost => 90, :store => 'yes', :search_analyzer => :search_ngram_analyzer, :index_analyzer => :index_ngram_analyzer},
          :date => {:store => 'yes'}
        }
    }
  end
  def to_indexed_json
    self.to_json(:methods => [:attachment])
  end
  def attachment
    path_to_file = "#{Rails.application.config.document_library}#{path}/#{filename}"
    # Read the file as binary and Base64-encode it for the attachment field
    Base64.encode64(File.binread(path_to_file))
  end
  def self.search(query, options)
    tire.search do
      query { string "#{query}", :default_operator => :AND, :default_field => 'attachment', :fields => ['filename', 'attachment', 'tags'] }
      highlight :attachment
      page = (options[:page] || 1).to_i
      # to_i so a per_page value coming from request params (a String) still works
      search_size = (options[:per_page] || DEFAULT_PAGE_SIZE).to_i
      from((page - 1) * search_size)
      size search_size
      sort { by :_score, :desc }
      if options[:facet]
        filter :terms, :tags => [options[:facet]]
        facet 'global-tags', :global => true do
          terms :tags
        end
        facet 'current-tags' do
          terms :tags
        end
      end
    end
  end
end
Hope it helps!
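The example above gets its tolerance from edge n-grams rather than from an edit-distance fuzzy query. To see what `:ngram_filter` (edgeNGram, min_gram 2, max_gram 12) contributes, here is a toy Ruby model that only applies lowercasing, a crude word split and that filter. The real analyzers also run asciifolding and the word delimiter, which this sketch ignores, and the helper names are hypothetical.

```ruby
# Toy model of the edgeNGram filter: each token becomes its prefixes of
# length min_gram..max_gram (in this model, tokens shorter than min_gram
# produce nothing).
def edge_ngrams(token, min_gram: 2, max_gram: 12)
  upper = [max_gram, token.length].min
  (min_gram..upper).map { |n| token[0, n] }
end

# Crude stand-in for the standard tokenizer plus the lowercase filter.
def index_terms(text)
  text.downcase.split(/\W+/).reject(&:empty?).flat_map { |token| edge_ngrams(token) }
end

p edge_ngrams('report')                             # => ["re", "rep", "repo", "repor", "report"]
p index_terms('Quarterly Report').include?('quart') # => true
# The search analyzer applies the same filter to the query, so a typo
# still shares its leading prefixes with the indexed word.
shared = edge_ngrams('reprot') & edge_ngrams('report')
p shared                                            # => ["re", "rep"]
```

Partial words therefore match directly, and a misspelling scores lower but can still match through its shared prefixes; a typo in the first couple of characters will not.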

ElasticSearch Scoring (Tire gem)

I want ElasticSearch (the Tire gem, to be specific) to return results ranked by the number of times a keyword appears in the fields. For example, I index the field title in a model called Article. I have two objects: the first has the title 'Funny Funny subject' and the second has the title 'Funny subject'. I want to index in such a way that if I search for the keyword 'Funny', the first object is returned first, since 'Funny' appears twice in its title. Is it possible to do this via Tire? And what is this indexing method called?
Here is a working sample. The key factor is the boost value, which has to be high enough, and you can't use wildcards in the query.
require 'tire'
require 'yajl/json_gem'
articles = [
  { :id => '0', :type => 'article', :title => 'nothing funny'},
  { :id => '1', :type => 'article', :title => 'funny'},
  { :id => '2', :type => 'article', :title => 'funny funny funny'}
]
Tire.index 'articles' do
  import articles
end
Tire.index 'articles' do
  delete
  create :mappings => {
    :article => {
      :properties => {
        :id => { :type => 'string', :index => 'not_analyzed', :include_in_all => false },
        :title => { :type => 'string', :boost => 50.0, :analyzer => 'snowball' },
        :tags => { :type => 'string', :analyzer => 'keyword' },
        :content => { :type => 'string', :analyzer => 'snowball' }
      }
    }
  }
  import articles do |documents|
    documents.map { |document| document.update(:title => document[:title].downcase) }
  end
  refresh
end
s = Tire.search('articles') do
  query do
    string "title:funny"
  end
end
s.results.each do |document|
  puts "* id:#{ document.id } #{ document.title } score: #{document._score}"
end
This gives:
* id:2 funny funny funny score: 14.881571
* id:1 funny score: 14.728935
* id:0 nothing funny score: 9.81929
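Why would the boost matter? A plausible explanation, assuming Lucene's classic TF-IDF similarity (the default in Elasticsearch 0.90/1.x) with the mapping boost applied at index time: for a single-term query every document shares the same idf and query norm, so the order comes down to tf × fieldNorm, where tf = sqrt(term frequency) and fieldNorm = boost / sqrt(number of terms), stored in one byte with only 3 mantissa bits. Before encoding, 'funny' and 'funny funny funny' tie exactly (1 × 1 versus √3 × 1/√3); the lossy byte encoding breaks the tie, and the direction depends on the boost. The sketch models only that, with whitespace tokens standing in for the snowball analyzer.

```ruby
# Simplified model of Lucene's classic TF-IDF scoring for a one-term query.
# idf, queryNorm and coord are identical for every document here, so only
# tf * fieldNorm decides the order.

# Lucene keeps fieldNorm in one byte: the float is truncated to 3 mantissa
# bits. This mimics that truncation (ignoring the underflow/overflow clamps).
def encode_norm(value)
  fraction, exponent = Math.frexp(value) # value = fraction * 2**exponent, 0.5 <= fraction < 1
  mantissa = fraction * 2                # 1 <= mantissa < 2
  truncated = (mantissa * 8).floor / 8.0 # keep 3 bits after the leading 1
  Math.ldexp(truncated, exponent - 1)
end

# tf = sqrt(frequency); fieldNorm = boost / sqrt(number of terms), encoded.
def relative_score(title, term, boost)
  tokens = title.downcase.split
  tf = Math.sqrt(tokens.count(term))
  tf * encode_norm(boost / Math.sqrt(tokens.size))
end

TITLES = ['nothing funny', 'funny', 'funny funny funny'].freeze

[50.0, 1.0].each do |boost|
  ranked = TITLES.sort_by { |title| -relative_score(title, 'funny', boost) }
  puts "boost #{boost}: #{ranked.inspect}"
end
# boost 50.0: ["funny funny funny", "funny", "nothing funny"]
# boost 1.0: ["funny", "funny funny funny", "nothing funny"]
```

With boost 50 the modelled ratios between the documents (about 1.0104 and 0.6667) match the ratios between the three scores printed above, which supports this reading; with boost 1 'funny' would outrank 'funny funny funny'. From Elasticsearch 5.0 onward BM25 is the default similarity and index-time boosts are deprecated, so this behaviour is specific to older versions.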
