Fuzzy String Matching with Rails (Tire) and ElasticSearch

I have a Rails application set up with ElasticSearch and the Tire gem to search a model, and I'm wondering how to configure it to do fuzzy string matching on certain indexed fields. The model indexes fields such as title and description, but I want fuzzy matching on only some of them, and I'm not sure where to configure that. My code is below; comments are welcome. Thanks!
In the controller:
def search
@resource = Resource.search(params[:q], :page => (params[:page] || 1),
:per_page =>15, load: true )
end
In the Model:
class Resource < ActiveRecord::Base
include Tire::Model::Search
include Tire::Model::Callbacks
belongs_to :user
has_many :resource_views, :class_name => 'UserResourceView'
has_reputation :votes, source: :user, aggregated_by: :sum
attr_accessible :title, :description, :link, :tag_list, :user_id, :youtubeID
acts_as_taggable
mapping do
indexes :id, :index => :not_analyzed
indexes :title, :analyzer => 'snowball', :boost => 40
indexes :tag_list, :analyzer => 'snowball', :boost => 8
indexes :description, :analyzer => 'snowball', :boost => 2
indexes :user_id, :analyzer => 'snowball'
end
end

Try defining custom analyzers. An n-gram based analyzer gives you partial matching that stemming alone does not.
Here is my example (it also uses Mongoid and attachments; ignore those parts if you don't need them):
class Document
include Mongoid::Document
include Mongoid::Timestamps
include Tire::Model::Search
include Tire::Model::Callbacks
field :filename, type: String
field :md5, type: String
field :tags, type: String
field :size, type: String
index({md5: 1}, {unique: true})
validates_uniqueness_of :md5
DEFAULT_PAGE_SIZE = 10
settings :analysis => {
:filter => {
:ngram_filter => {
:type => "edgeNGram",
:min_gram => 2,
:max_gram => 12
},
:custom_word_delimiter => {
:type => "word_delimiter",
:preserve_original => "true",
:catenate_all => "true",
}
}, :analyzer => {
:index_ngram_analyzer => {
:type => "custom",
:tokenizer => "standard",
:filter => ["lowercase", "ngram_filter", "asciifolding", "custom_word_delimiter"]
},
:search_ngram_analyzer => {
:type => "custom",
:tokenizer => "standard",
:filter => ["standard", "lowercase", "ngram_filter", "custom_word_delimiter"]
},
:suggestions => {
:tokenizer => "standard",
:filter => ["suggestions_shingle"]
}
}
} do
mapping {
indexes :id, index: :not_analyzed
indexes :filename, :type => 'string', :store => 'yes', :boost => 100, :search_analyzer => :search_ngram_analyzer, :index_analyzer => :index_ngram_analyzer
indexes :tags, :type => 'string', :store => 'yes', :search_analyzer => :search_ngram_analyzer, :index_analyzer => :index_ngram_analyzer
indexes :attachment, :type => 'attachment',
:fields => {
:content_type => {:store => 'yes'},
:author => {:store => 'yes', :analyzer => 'keyword'},
:title => {:store => 'yes'},
:attachment => {:term_vector => 'with_positions_offsets', :boost => 90, :store => 'yes', :search_analyzer => :search_ngram_analyzer, :index_analyzer => :index_ngram_analyzer},
:date => {:store => 'yes'}
}
}
end
def to_indexed_json
self.to_json(:methods => [:attachment])
end
def attachment
path_to_file = "#{Rails.application.config.document_library}#{path}/#{filename}"
Base64.encode64(open(path_to_file) { |file| file.read })
end
def self.search(query, options)
tire.search do
query { string "#{query}", :default_operator => :AND, :default_field => 'attachment', :fields => ['filename', 'attachment', 'tags'] }
highlight :attachment
page = (options[:page] || 1).to_i
search_size = options[:per_page] || DEFAULT_PAGE_SIZE
from (page -1) * search_size
size search_size
sort { by :_score, :desc }
if (options[:facet])
filter :terms, :tags => [options[:facet]]
facet 'global-tags', :global => true do
terms :tags
end
facet 'current-tags' do
terms :tags
end
end
end
end
end
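To make the analyzer settings above more concrete, here is a minimal plain-Ruby sketch of what an edge n-gram filter with min_gram 2 and max_gram 12 emits for a single token. The helper name edge_ngrams is hypothetical and exists only for illustration; the real work happens inside ElasticSearch at index time.

```ruby
# Illustrative only: imitates the prefixes an edgeNGram token filter
# would emit for one lowercased token. edge_ngrams is a hypothetical helper,
# not part of Tire or ElasticSearch.
def edge_ngrams(token, min_gram = 2, max_gram = 12)
  longest = [token.length, max_gram].min
  # An empty range (token shorter than min_gram) yields no grams at all.
  (min_gram..longest).map { |length| token[0, length] }
end

puts edge_ngrams("search").inspect
# prints ["se", "sea", "sear", "searc", "search"]
```

Because each prefix is indexed as its own term, a partial query such as "sear" can match "search". Note that this is prefix matching rather than edit-distance fuzziness.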
Hope it helps.

Related

Return time difference in method as JSON

I return all information in a scream as JSON.
I want to return how long ago it was created.
include ActionView::Helpers::DateHelper
def as_json(options={})
super(:only => [:id, :yell_type, :status, :payment_type],
:include => {
:trade_offer => {:only => [:id, :title, :description, :price],
:include => [:photos => {:only => [:id, :url]}]
},
:categories => {:only => [:id, :name]},
:user => {:only => [:id, :name, :avatar]}
},
:methods => [ times_ago(:create_at) ]
)
end
def times_ago(create_at)
time_ago_in_words(create_at)
end
This returns an error:
comparison of Symbol with Time failed
How should I do that?
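The :methods option goes at the same level as :include and :only, and it takes method names as symbols; the return value of each named method is then included in the JSON. Your version calls times_ago(:create_at) while the options are being built, which passes the symbol :create_at to time_ago_in_words and causes the comparison error. Instead, implement a times_ago method in the model that returns what you want, and list its name: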
def as_json(options={})
super(
:only => [:id, :yell_type, :status, :payment_type],
:include => {
:trade_offer => {:only => [:id, :title, :description, :price],
:include => [:photos => {:only => [:id, :url]}]
},
:categories => {:only => [:id, :name]},
:user => {:only => [:id, :name, :avatar]}
},
:methods => [ :times_ago ]
)
end
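To illustrate why :methods needs a name rather than a call, here is a plain-Ruby sketch, not Rails code. The Scream class, its attributes and the simplified as_json are assumptions made for the example: they only imitate how each listed name is sent to the object at serialization time. In a real model, times_ago would presumably wrap time_ago_in_words(created_at).

```ruby
# Plain-Ruby stand-in for a model; Scream and its attributes are hypothetical.
class Scream
  attr_reader :id, :created_at

  def initialize(id, created_at)
    @id = id
    @created_at = created_at
  end

  # Stand-in for time_ago_in_words(created_at): whole minutes since creation.
  def times_ago
    minutes = ((Time.now - created_at) / 60).floor
    "#{minutes} minutes"
  end

  # Simplified imitation of how :only and :methods are resolved:
  # every listed name is sent to the object when the hash is built.
  def as_json(options = {})
    names = Array(options[:only]) + Array(options[:methods])
    names.each_with_object({}) { |name, hash| hash[name.to_s] = public_send(name) }
  end
end

scream = Scream.new(1, Time.now - 300)
json = scream.as_json(:only => [:id], :methods => [:times_ago])
puts json.inspect
```

The method takes no arguments and reads the timestamp from the record itself, so nothing is evaluated until the object is serialized.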

Elasticsearch: Tire to Elasticsearch Persistence migration

I would like to migrate from the Tire (retire) gem to the Elasticsearch Persistence gem. In Tire I used to set the index settings from inside the model, as shown below:
settings :number_of_shards => 5,
:number_of_replicas => 1,
:analysis => {
:analyzer => {
:my_pattern => {
"type" => "custom",
"tokenizer" => "keyword",
"filter" => ["url_ngram", "lowercase"]
}
}, :filter => {
:url_stop => {
:type => "stop",
:stopwords => ["="]
},
:url_ngram => {
:type => "nGram",
:min_gram => 4,
:max_gram => 40
}
}
} do
mapping {
indexes :msgpriority, :type => 'string', :analyzer => 'snowball'
indexes :msghostname, :type => 'string', :analyzer => 'snowball'
indexes :msgtext, :type => 'string', :analyzer => 'my_pattern'
indexes :msgdatetime, :type => 'date', :include_in_all => false
}
end
Now I'm using the Repository object and I want to apply the same settings (mainly the analyzer).
The code below doesn't work: even when I change the number of shards, the index is created as if I had specified no settings at all.
REPOSITORY = Elasticsearch::Persistence::Repository.new do
# Configure the Elasticsearch client
client Elasticsearch::Client.new url: ENV['ELASTICSEARCH_URL'], log: true
now_time = Time.now
# Set a custom index name
index "ip_logstreams_#{now_time.year}_#{now_time.month}_#{now_time.day}"
# Set a custom document type
type :log_entry
# Specify the class to initialize when deserializing documents
klass LogEntry
# Configure the settings and mappings for the Elasticsearch index
settings number_of_shards: 2, :analysis => {
:analyzer => {
:my_pattern => {
"type" => "custom",
"tokenizer" => "keyword",
"filter" => ["url_ngram", "lowercase"]
}
}, :filter => {
:url_stop => {
:type => "stop",
:stopwords => ["="]
},
:url_ngram => {
:type => "nGram",
:min_gram => 4,
:max_gram => 40
}
}
} do
mapping {
indexes :msgpriority, :type => 'string', :analyzer => 'snowball'
indexes :msghostname, :type => 'string', :analyzer => 'snowball'
indexes :msgtext, :type => 'string', :analyzer => 'my_pattern'
indexes :msgdatetime, :type => 'date', :include_in_all => false
}
end
end
UPDATE:
When I issue
REPOSITORY.create_index! force: true
changes are applied, but I think the settings in elasticsearch are messed up as shown in screenshot (grabbed from head plugin)
Have you considered using elasticsearch/elasticsearch-model instead? It provides automatic callbacks that are supposed to help you persist your data.
When using the repository object from the Elasticsearch Persistence gem, you need to call
REPOSITORY.create_index!
This creates the index with the supplied settings. Add force: true if you want to delete and re-create an existing index.

uninitialized constant Gmaps4rails::ActsAsGmappable

I've set up everything according to the README, and here's my model:
class Building
include Gmaps4rails::ActsAsGmappable
include Mongoid::Document
include Geocoder::Model::Mongoid
acts_as_gmappable :lat => 'location[0]', :lng => 'location[1]',
:address => "address", :normalized_address => "full_address",
:msg => "Sorry, not even Google could figure out where that is"
field :gmaps, :type => Boolean
field :address, :type => String, :default => ""
field :city, :type => String, :default => ""
field :province, :type => String, :default => ""
field :country, :type => String, :default => ""
field :postal_code, :type => Integer
field :location, :type => Array, spacial: {lat: :latitude, lng: :longitude, return_array: true }
## Building index
index({location: "2d"})
def full_address
"#{address}, #{city}, #{province}, #{country}, #{postal_code}"
end
def gmaps4rails_address
full_address
end
end
The controller:
@hash = Gmaps4rails.build_markers(@building) do |building, marker|
marker.lat building.location[0]
marker.lng building.location[1]
end
And the view:
= gmaps4rails( "markers" => { "data" => @hash.to_json, "options" => { "draggable" => true }})
When I access the controller action, I get "uninitialized constant Gmaps4rails::ActsAsGmappable".
There is no ActsAsGmappable module or class defined in the gem, hence the error.
It seems to have been removed in newer versions of the gem. Try removing that include line and see if everything works.

Rails, Tire, Elasticsearch: how to use synonyms?

I have no idea how to use synonyms or plurals with Elasticsearch through the Tire gem. Do I need to download a synonyms file (an English one is enough)? Is there something to set up in ES regardless of whether I use Tire?
class Story < ActiveRecord::Base
include Tire::Model::Search
include Tire::Model::Callbacks
attr_accessible :author, :content, :title
mapping do
indexes :id, :index => :not_analyzed
indexes :author, :analyzer => 'keyword'
indexes :title, :analyzer => 'snowball'
indexes :content, :analyzer => 'snowball'
end
end
class StoriesController < ApplicationController
def index
if params[:q].present?
p = params
@stories = Story.search(per_page: 30, page: params[:page], load: true) do
query { string p[:q], default_operator: 'AND' }
end
end
end
end
I found nothing in documentation...
Thanks!
I guess you mean the synonym token filter of Elasticsearch: http://www.elasticsearch.org/guide/reference/index-modules/analysis/synonym-tokenfilter/
{
"index" : {
"analysis" : {
"analyzer" : {
"synonym" : {
"tokenizer" : "whitespace",
"filter" : ["synonym"]
}
},
"filter" : {
"synonym" : {
"type" : "synonym",
"synonyms_path" : "analysis/synonym.txt"
}
}
}
}
}
As far as I know, in Tire this goes in the settings configuration:
settings :analysis => {
:filter => {
:synonym => {
"type" => "synonym",
"synonyms_path" => Rails.root.join("config/analysis/synonym.txt").to_s
}
},
:analyzer => {
:synonym => {
"tokenizer" => "lowercase",
"filter" => ["synonym"],
"type" => "custom" }
}
} do
mapping { indexes :the_field, :type => 'string', :analyzer => "synonym" }
end
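If you would rather not maintain a separate file, the synonym filter also accepts rules inline through a "synonyms" list. The following is a sketch of such an analysis hash in the same style as the settings above; the constant name SYNONYM_ANALYSIS and the two example rules are made up for illustration.

```ruby
# Assumption: rules are listed inline under "synonyms" (Solr synonym format)
# instead of pointing "synonyms_path" at a file. The rules are examples only.
SYNONYM_ANALYSIS = {
  :filter => {
    :synonym => {
      "type"     => "synonym",
      "synonyms" => [
        "universe, cosmos",   # equivalent terms
        "tv => television"    # one-way rewrite
      ]
    }
  },
  :analyzer => {
    :synonym => {
      "type"      => "custom",
      "tokenizer" => "lowercase",
      "filter"    => ["synonym"]
    }
  }
}
```

This hash would take the place of the :analysis value passed to settings. Plurals are usually handled by stemming, such as the snowball analyzer already used in the mapping, rather than by synonym rules.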

ElasticSearch Scoring (Tire gem)

I want ElasticSearch (through the Tire gem, to be specific) to rank results by the number of times a keyword appears in a field. For example, I index the title field of a model called Article. I have two objects: the first has the title 'Funny Funny subject' and the second has the title 'Funny subject'. I want to index them so that a search for the keyword 'Funny' returns the first object first, since 'Funny' appears twice in its title. Is it possible to do this with Tire? And what is this indexing method called?
Here is a working sample. The key factors are that the boost value has to be high enough and that you can't use wildcards in the query.
require 'tire'
require 'yajl/json_gem'
articles = [
{ :id => '0', :type => 'article', :title => 'nothing funny'},
{ :id => '1', :type => 'article', :title => 'funny'},
{ :id => '2', :type => 'article', :title => 'funny funny funny'}
]
Tire.index 'articles' do
import articles
end
Tire.index 'articles' do
delete
create :mappings => {
:article => {
:properties => {
:id => { :type => 'string', :index => 'not_analyzed', :include_in_all => false },
:title => { :type => 'string', :boost => 50.0, :analyzer => 'snowball' },
:tags => { :type => 'string', :analyzer => 'keyword' },
:content => { :type => 'string', :analyzer => 'snowball' }
}
}
}
import articles do |documents|
documents.map { |document| document.update(:title => document[:title].downcase) }
end
refresh
end
s = Tire.search('articles') do
query do
string "title:funny"
end
end
s.results.each do |document|
puts "* id:#{ document.id } #{ document.title } score: #{document._score}"
end
gives
* id:2 funny funny funny score: 14.881571
* id:1 funny score: 14.728935
* id:0 nothing funny score: 9.81929
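The small gap between the first two scores is consistent with how classic Lucene TF-IDF scoring (which, as far as I know, was the default similarity for ElasticSearch versions of the Tire era) treats repetition: the term-frequency factor grows with the square root of the count, while a field-length norm shrinks as the field gets longer. The following is a simplified plain-Ruby sketch of just those two factors. It ignores IDF, boosts, query normalization and the lossy norm encoding, so it will not reproduce the exact scores above, and the function name is hypothetical.

```ruby
# Simplified sketch of two factors from classic TF-IDF scoring.
# Not the full formula: no IDF, no boost, no norm quantization.
def tf_times_length_norm(title, term)
  tokens = title.downcase.split
  frequency = tokens.count(term)
  return 0.0 if frequency.zero?

  tf = Math.sqrt(frequency)                   # grows slowly with repeats
  length_norm = 1.0 / Math.sqrt(tokens.size)  # penalizes longer fields
  tf * length_norm
end

["funny funny funny", "funny", "nothing funny"].each do |title|
  puts "#{title}: #{tf_times_length_norm(title, 'funny').round(3)}"
end
```

In this simplified view the three-word title and the one-word title come out about equal (roughly 1.0 each), while 'nothing funny' lands near 0.707, which matches the ordering and the near-tie in the real scores.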
