Elasticsearch, tire and autocomplete - ruby-on-rails

Good day. I have elasticsearch in my rails app using tire.
I have many names in my db, and I want to search them with a query like search_query: "alex ivan" and get back ["Alexander Ivanov", "Alex Ivanenko"], etc. (real names from the db).
I tried to make it with this article but it's not searching. So I've made a quick hack:
params[:search_query] = params[:search_query].split(" ").map{|a|a<<("*")}.join(" ")
Is this a good approach, or can I do it with analyzers, etc.?
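For reference, the quick hack can be written without touching the incoming string. This is a minimal sketch in plain Ruby: `wildcard_query` is a hypothetical helper name, and it only builds the query string (it does not talk to Elasticsearch). The one-liner above appends to each token in place with `<<`, whereas this version returns new strings.

```ruby
# Hypothetical helper: append "*" to every whitespace-separated token so that
# "alex ivan" becomes "alex* ivan*" (one prefix query per word).
def wildcard_query(raw)
  raw.to_s.split.map { |token| "#{token}*" }.join(" ")
end

puts wildcard_query("alex ivan") # prints "alex* ivan*"
```

Calling `to_s` first means a missing parameter (`nil`) simply yields an empty query string instead of raising.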

Here's what I did using analyzers for doing a search on names of businesses when I used ElasticSearch. Place this inside your mapping block and modify the index appropriately -- I think this will give you what you want:
indexes :name, :type => 'multi_field', :fields => {
  :name => { :type => 'string', :analyzer => 'standard' },
  :"name.exact" => { :type => 'string', :index => :not_analyzed }
}
Then inside your search and query blocks, something like:
search do
  query do
    # either a must match for exact match
    boolean(:minimum_number_should_match => 1) do
      must { string "name:#{<variable>}" }
    end
    # or a broader match
    string "name:#{<variable>}*"
  end
end

Related

Can't access data in ActiveHash

I'm using the Gem active_hash https://github.com/zilkey/active_hash to create models for simple data that I don't want to create DB tables for.
For example, I have this model setup for FieldTypes:
class FieldType < ActiveHash::Base
  self.data = [
    {:id => 1, :name => "text", :friendly_name => "Text"},
    {:id => 2, :name => "textarea", :friendly_name => "Text Area"},
    {:id => 3, :name => "image", :friendly_name => "Image"},
  ]
end
And I'm trying to list these field types for a select:
def field_types_for_select
  #FieldType.all.order('name asc').collect { |t| [t.friendly_name, t.name] }
  FieldType.pluck(:friendly_name, :name)
end
But I get an error that order, collect or pluck are not defined.
How do I access this data? This works fine on other models, just not ActiveHash ones. According to the docs the model should work the same as ActiveRecord but I don't seem to be able to access it the same. FieldType.all works, but other methods do not.
Pluck isn't defined on ActiveHash::Base. It is defined on ActiveRecord::Relation::Calculations, and its purpose is to produce a SQL select for the columns you specify. You will not be able to get it to work with ActiveHash.
You can, however, define your own pluck on your FieldType model.
def self.pluck(*columns)
  data.map { |row| row.values_at(*columns) }
end
Or query the data directly:
FieldType.data.map { |row| row.values_at(:friendly_name, :name) }
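To see the idea in isolation, here is a small sketch that runs without ActiveHash. `FIELD_TYPE_ROWS` stands in for `FieldType.data`, and `pluck_rows` and `field_types_for_select` are hypothetical helpers; the sorting step is an assumption about what `order('name asc')` was meant to achieve.

```ruby
# Stand-in for FieldType.data: a plain array of hashes (no ActiveHash needed).
FIELD_TYPE_ROWS = [
  { id: 1, name: "text",     friendly_name: "Text" },
  { id: 2, name: "textarea", friendly_name: "Text Area" },
  { id: 3, name: "image",    friendly_name: "Image" }
].freeze

# Hypothetical helper: pick the given keys out of each row, like self.pluck above.
def pluck_rows(rows, *columns)
  rows.map { |row| row.values_at(*columns) }
end

# Hypothetical helper: emulate order('name asc') with sort_by before plucking.
def field_types_for_select(rows)
  pluck_rows(rows.sort_by { |row| row[:name] }, :friendly_name, :name)
end

p field_types_for_select(FIELD_TYPE_ROWS)
```

Because the data is just an array of hashes, ordinary Enumerable methods (`sort_by`, `map`, `select`) cover most of what `order` and `pluck` would do on an ActiveRecord relation.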

How to make fields on my model not searchable but they should still be available in the _source?

I am using the tire gem for ElasticSearch in Rails.
Ok so I have been battling with this the whole day and this is how far I have got. I would like to make fields on my model not searchable but they should still be available in the _source so I can use them for sorting on the search result.
My mappings:
mapping do
  indexes :created_at, :type => 'date', :index => :not_analyzed
  indexes :vote_score, :type => 'integer', :index => :not_analyzed
  indexes :title
  indexes :description
  indexes :tags
  indexes :answers do
    indexes :description
  end
end
My to_indexed_json method:
def to_indexed_json
  {
    vote_score: vote_score,
    created_at: created_at,
    title: title,
    description: description,
    tags: tags,
    answers: answers.map{|answer| answer.description}
  }.to_json
end
My Search query:
def self.search(term='', order_by, page: 1)
  tire.search(page: page, per_page: PAGE_SIZE, load: true) do
    query { term.present? ? string(term) : all }
    sort {
      by case order_by
         when LAST_POSTED then {created_at: 'desc'}
         else {vote_score: 'desc', created_at: 'desc'}
         end
    }
  end
end
The only issue I am battling with now is how do I make vote_score and created_at field not searchable but still manage to use them for sorting when I'm searching.
I tried indexes :created_at, :type => 'date', :index => :no but that did not work.
If I understand you correctly, you are not specifying a field when you send your search query to Elasticsearch. This means it will be executed against the _all field. This is a "special" field that makes Elasticsearch a little easier to get started with. By default, all fields are indexed twice: once in their own field, and once in the _all field. (You can even have different mappings/analyzers applied to these two indexings.)
I think setting the field's mapping to "include_in_all": "false" should work for you (remove the "index": "no" part). The field will still be tokenized (and you can search on it) under its own field name, but a search directed at the _all field won't be affected by it (as none of its tokens are stored in the _all field).
Have a read of the es docs on mappings, scroll down to the parameters for each type
Good luck!
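As a rough illustration of that suggestion, the properties for the two sort-only fields might look like the hash below. It is a sketch of the mapping JSON only, built in plain Ruby; `sort_only_properties` is a hypothetical name. With Tire, the assumption is that an extra option such as `:include_in_all => false` given to `indexes` ends up in the mapping, which is worth verifying against the mapping the index actually reports.

```ruby
require 'json'

# Sketch of the "properties" entries for the sort-only fields.
# include_in_all: false keeps their values out of the _all field,
# while the fields themselves stay indexed and usable for sorting.
def sort_only_properties
  {
    created_at: { type: 'date',    include_in_all: false },
    vote_score: { type: 'integer', include_in_all: false }
  }
end

puts JSON.pretty_generate(sort_only_properties)
```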
I ended up going with the approach of only matching on the fields I want and that worked. This matches on multiple fields.
tire.search(page: page, per_page: PAGE_SIZE, load: true) do
  query { term.present? ? (match [:title, :description, :tags, :answers], term) : all }
  sort {
    by case order_by
       when LAST_POSTED then {created_at: 'desc'}
       else {vote_score: 'desc', created_at: 'desc'}
       end
  }
end
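For readers who want to see roughly what such a search amounts to on the Elasticsearch side, here is a sketch of a request body with a `multi_match` query and the same sort logic. It is written by hand in plain Ruby, not captured from Tire; `build_search_body`, `SEARCH_FIELDS`, and the value of `LAST_POSTED` are assumptions for illustration.

```ruby
require 'json'

LAST_POSTED = 'last_posted' # assumed value; the original constant is not shown

SEARCH_FIELDS = %w[title description tags answers].freeze

# Hypothetical helper: build a request body that only matches on SEARCH_FIELDS
# and sorts on fields that are never searched.
def build_search_body(term, order_by)
  query =
    if term.to_s.strip.empty?
      { match_all: {} }
    else
      { multi_match: { query: term, fields: SEARCH_FIELDS } }
    end

  sort =
    if order_by == LAST_POSTED
      [{ created_at: 'desc' }]
    else
      [{ vote_score: 'desc' }, { created_at: 'desc' }]
    end

  { query: query, sort: sort }
end

puts JSON.pretty_generate(build_search_body('rails', LAST_POSTED))
```

Since vote_score and created_at never appear in the list of queried fields, they cannot influence matching, yet they remain available for sorting.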

ruby on rails: ElasticSearch / Tire dynamic search on multiple indices

I've done a bunch of searching and I haven't been able to get an answer to this question - hopefully this isn't a repeat (apologies if it is)...
Preface: I'm using Rails & Tire to perform ElasticSearch.
I have an object, Place, with attributes "name", "city", "state", and "zip". They are indexed as follows:
indexes :name, :type => 'multi_field', :fields => {
  :name => { :type => 'string', :analyzer => 'snowball' },
  :"name.exact" => { :type => 'string', :index => :not_analyzed }
}
indexes :city
indexes :state
indexes :zip
There are three conditions for searching: 1. Name only, 2. (City, State OR Zip), 3. Name AND (City, State OR Zip).
My code for the "query" block is:
if (City, State).present?
  boolean do
    must { string "name:#{name}*" } if name.present?
    must { string "city:#{city_state}*" }
    must { string "state:#{city_state}*" }
  end
elsif (Zip).present?
  boolean do
    must { string "name:#{name}*" } if name.present?
    must { string "zip:#{query_parameters["zip"]}*" }
  end
else
  string "name:#{name}*"
end
The aforementioned search conditions #1 and #2 work as expected against multiple tests. However, condition 3 does not - it seems to only pay attention to the "name" field. I'm assuming it has something to do with using the "city_state" variable to search on both "city" and "state"... But I'm doing this because a user can enter either "Chicago" or "Illinois" in the City, State / Zip text box and the search should still work, using either the geographic center of Chicago or the geographic center of Illinois, respectively.
Anything obvious I'm doing wrong?
However, condition 3 does not - it seems to only pay attention to the "name" field
Errr, isn't
string "name:#{name}*"
telling it to do exactly that?
or did you mean to just do
string "#{name}"
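One thing that may be worth checking in the question's code: two `must` clauses on `city` and `state` with the same input require that input to match both fields, so "Chicago" would also have to appear in `state`. A possible alternative, which is not what the answer above proposes, is to OR the two geographic fields inside a single query string. The sketch below only builds that string in plain Ruby; `place_query` is a hypothetical helper and the field names are taken from the question.

```ruby
# Hypothetical helper: build a Lucene-style query string for the three cases
# (name only, location only, name AND location). Blank inputs are skipped.
def place_query(name: nil, city_state: nil, zip: nil)
  blank = ->(value) { value.to_s.strip.empty? }
  clauses = []

  clauses << "name:#{name}*" unless blank.call(name)

  if !blank.call(city_state)
    # Either field may hold the value, so OR them instead of requiring both.
    clauses << "(city:#{city_state}* OR state:#{city_state}*)"
  elsif !blank.call(zip)
    clauses << "zip:#{zip}*"
  end

  clauses.join(" AND ")
end

puts place_query(name: "joe", city_state: "chicago")
```

The sketch does no escaping, so multi-word input or input containing query-string operators would need extra handling before being sent to Elasticsearch.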

Elastic Search nested

I'm using Elasticsearch through the tire gem.
Given this structure to index my resource model
mapping do
  indexes :_id
  indexes :version, analyzer: 'snowball', boost: 100
  indexes :resource_files do
    indexes :_id
    indexes :name, analyzer: 'snowball', boost: 100
    indexes :resource_file_category do
      indexes :_id
      indexes :name, analyzer: 'snowball', boost: 100
    end
  end
end
How can I retrieve all the resources that have resource_files with a given resource_file_category id?
I've looked in the Elasticsearch docs and I think it could be done with the has_child filter
http://www.elasticsearch.org/guide/reference/query-dsl/has-child-filter.html
I've tried it this way:
filter :has_child, :type => 'resource_files', :query => {:filter => {:has_child => {:type => 'resource_file_category', :query => {:filter => {:term => {'_id' => params[:resource_file_category_id]}}}}}}
but I'm not sure if it is possible/valid to make a "nested has_child filter" or if there is a better/simpler way to do this... any advice is welcome ;)
I'm afraid I don't know what your mapping definition means. It'd be easier to read if you just posted the output of:
curl -XGET 'http://127.0.0.1:9200/YOUR_INDEX/_mapping?pretty=1'
But you probably want something like this:
curl -XGET 'http://127.0.0.1:9200/YOUR_INDEX/YOUR_TYPE/_search?pretty=1' -d '
{
  "query" : {
    "term" : {
      "resource_files.resource_file_category._id" : "YOUR VALUE"
    }
  }
}
'
Note: The _id fields should probably be mapped as {"index": "not_analyzed"} so that they don't get analyzed, but instead store the exact value. Otherwise if you do a term query for 'FOO BAR' the doc won't be found, because the actual terms that are stored are: ['foo','bar']
Note: The has_child query is used to search for parent docs who have child docs (ie docs which specify a parent type and ID) that match certain search criteria.
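The first note can be illustrated without a running cluster. The sketch below uses a deliberately crude stand-in for an analyzer (lowercase, split on non-word characters); the real standard analyzer does more than this, and `analyzed_terms` and `term_match?` are hypothetical names.

```ruby
# Very rough stand-in for an analyzer: lowercase and split on non-word
# characters. The real standard analyzer does more than this.
def analyzed_terms(value)
  value.downcase.scan(/\w+/)
end

# A term query compares the query value against the stored terms as-is.
def term_match?(stored_terms, query_value)
  stored_terms.include?(query_value)
end

analyzed     = analyzed_terms('FOO BAR') # stored when the field is analyzed
not_analyzed = ['FOO BAR']               # stored when index is not_analyzed

puts term_match?(analyzed, 'FOO BAR')     # false: only 'foo' and 'bar' exist
puts term_match?(not_analyzed, 'FOO BAR') # true: the exact value was kept
```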
The dot operator can be used to access nested data.
You can try something like this:
curl -XGET 'http://localhost:port/INDEX/TYPE/_search?pretty=1' -d '
{
  "query": {
    "match": {
      "resource_files.resource_file_category.name": "VALUE"
    }
  }
}'
If the resource_file_category field is not_analyzed, the value is not tokenized and is stored as a single value, hence giving you an exact match.
You can also use the elasticsearch-head plugin to validate data and as a reference for building queries.
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/modules-plugins.html or
https://mobz.github.io/elasticsearch-head/

ElasticSearch filter to match a single date

I've been working with Elasticsearch for some time now and I've hit a roadblock: I have to search for events that match a particular start date (start_at). I've indexed my fields as
mapping do
  indexes :name, :type => 'string', :analyzer => 'snowball'
  indexes :description, :type => 'string', :analyzer => 'snowball'
  indexes :start_at, :type => 'date'
  indexes :end_at, :type => 'date'
  indexes :tag_list, :type => 'string', :analyzer => 'snowball'
  indexes :lat_lon, :type => 'geo_point'
  indexes :user_details, :type => 'string'
end
def to_indexed_json
  to_hash.merge({
    :user_details => (user ? user.to_index : nil),
    :artist_details => (artists ? artists.map{|artist| artist.to_index }: nil),
    :primary_genre => (genre ? genre.name : nil),
    :lat_lon => [lat, lng].join(',')
  }).to_json
end
So when I run
Tire.search('events') do
  # ignore search query keywords
  filter range: {start_at: {gte: Date.today, lt: Date.tomorrow}}
end
this returns nothing, but it works great with a single-bound range. That is:
Tire.search('events') do
  # ignore search query keywords
  filter range: {start_at: {gte: Date.today}}
end
I mapped start_at and end_at as dates in the events index so that Elasticsearch would not perform term matches on them, but something like this is not the answer either:
Tire.search('events') do
  query do
    string "start_at: #{Date.today}"
  end
end
Since this performs a string match, it returns all records: the tokenizer breaks the date into 2012, 05, and 16, and since 2012 and 16 may match in many places, everything comes back.
I know I'm missing something very basic. I would appreciate any help on this.
Update
Event.find_all_by_start_at(Date.tomorrow + 1.day).size
Event Load (0.7ms) SELECT `events`.* FROM `events` WHERE `events`.`start_at` = '2012-05-19'
=> 1
So I have events for that day. Now when I run it with elastic search
ruby-1.9.2-p180 :024 > Tire.search('events') do
ruby-1.9.2-p180 :025 > filter :range, :start_at => {gte: Date.tomorrow + 1.days, lt: Date.tomorrow + 2.days}
ruby-1.9.2-p180 :026?> end
ruby-1.9.2-p180 :029 > x.to_curl
=> "curl -X GET \"http://localhost:9200/events/_search?pretty=true\" -d '{\"filter\":{\"range\":{\"start_at\":{\"gte\":\"2012-05-19\",\"lt\":\"2012-05-20\"}}}}'"
{"events":{"event":{"properties":{"allow_comments":{"type":"boolean"},"artist_details":{"type":"string"},"artist_id":{"type":"long"},"city":{"type":"string"},"comments_count":{"type":"long"},"confirm":{"type":"boolean"},"created_at":{"type":"date","format":"dateOptionalTime"},"description":{"type":"string","analyzer":"snowball"},"end_at":{"type":"string"},"event_attendees_count":{"type":"long"},"event_content_type":{"type":"string"},"event_file_name":{"type":"string"},"event_file_size":{"type":"long"},"genre_id":{"type":"long"},"hits":{"type":"long"},"id":{"type":"long"},"interview":{"type":"boolean"},"lat":{"type":"double"},"lat_lon":{"type":"geo_point"},"lng":{"type":"double"},"location":{"type":"string"},"name":{"type":"string","analyzer":"snowball"},"online_tix":{"type":"boolean"},"primary_genre":{"type":"string"},"private":{"type":"boolean"},"start_at":{"type":"string"},"state":{"type":"string"},"tag_list":{"type":"string","analyzer":"snowball"},"updated_at":{"type":"date","format":"dateOptionalTime"},"user_details":{"type":"string"},"user_id":{"type":"long"},"venue_id":{"type":"long"},"zip":{"type":"string"}}}}}
Elasticsearch tries to be flexible in handling mappings. At the same time, it has to deal with the limitations of the underlying search engine, Lucene. As a result, when an existing mapping contradicts the updated mapping, the new mapping is ignored. Another Elasticsearch feature that probably played a role in this issue is automatic mapping creation based on the data. So, if you
1. created a new index,
2. indexed a record whose start_at field held a string containing a date in a format that Elasticsearch didn't recognize, and
3. updated the mapping, assigning type "date" to the start_at field,
you ended up with a mapping where the field start_at has type "string". The only way around it is to delete the index and specify the mapping before adding the first record.
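A quick way to spot this situation is to compare the mapping the index reports with the types you intended. The following is a minimal sketch in plain Ruby that works on an already-parsed `_mapping` response shaped like the one in the question; `mismatched_fields` is a hypothetical helper, and fetching the mapping over HTTP is left out.

```ruby
require 'json'

# Hypothetical helper: compare the types in a parsed _mapping response with
# the types you expect, and return the fields that differ.
def mismatched_fields(mapping, index, type, expected)
  properties = mapping.fetch(index).fetch(type).fetch('properties')
  expected.each_with_object({}) do |(field, wanted), diff|
    actual = properties.dig(field, 'type')
    diff[field] = actual unless actual == wanted
  end
end

# Trimmed copy of the mapping output shown in the question.
mapping = JSON.parse(<<~JSON)
  {"events":{"event":{"properties":{
    "created_at":{"type":"date","format":"dateOptionalTime"},
    "start_at":{"type":"string"},
    "end_at":{"type":"string"}
  }}}}
JSON

p mismatched_fields(mapping, 'events', 'event',
                    'start_at' => 'date', 'end_at' => 'date', 'created_at' => 'date')
```

For the mapping in the question this reports start_at and end_at as "string", which is consistent with the explanation above: the declared date mapping never took effect.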
It does not seem you need to use a search query - but a filter. Try something like this:
filter(:range, date: {
  to: params[:date],
  from: params[:date]
}) if params[:date].present?
Where params[:date] should match the format:
>> Time.now.strftime('%F')
=> "2014-03-10"
and could come from anywhere: either hard-coded or passed in as a parameter.
Fields :start_at and :end_at should be mapped as :type => 'date' (just as you have now), no need to change to string or anything alike.
This approach works when the field is mapped as a date and should also be fine for a datetime, as Tire/Elasticsearch doesn't seem to distinguish between those two field types.
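A variation on the same idea, using the half-open gte/lt bounds from the question rather than from/to, can be sketched as a small hash builder. `single_day_range` is a hypothetical helper; it only produces the filter hash and assumes the field really is mapped as a date in the index.

```ruby
require 'date'

# Hypothetical helper: a range filter that covers exactly one calendar day,
# from the start of `day` (inclusive) to the start of the next day (exclusive).
def single_day_range(field, day)
  { range: { field => { gte: day.strftime('%F'), lt: (day + 1).strftime('%F') } } }
end

p single_day_range(:start_at, Date.new(2012, 5, 19))
```

For 2012-05-19 this yields the same bounds as the curl command generated in the question ("2012-05-19" and "2012-05-20").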
Bonus: you can find nice rails elasticsearch/tire production setup example here:
https://gist.github.com/psyxoz/4326881
