I'm using Elasticsearch through the tire gem.
Given this structure to index my resource model:
mapping do
indexes :_id
indexes :version, analyzer: 'snowball', boost: 100
indexes :resource_files do
indexes :_id
indexes :name, analyzer: 'snowball', boost: 100
indexes :resource_file_category do
indexes :_id
indexes :name, analyzer: 'snowball', boost: 100
end
end
end
How can I retrieve all the resources that have resource_files with a given resource_file_category id?
I've looked in the Elasticsearch docs and I think I could use the has_child filter:
http://www.elasticsearch.org/guide/reference/query-dsl/has-child-filter.html
I've tried it this way:
filter :has_child, :type => 'resource_files', :query => {:filter => {:has_child => {:type => 'resource_file_category', :query => {:filter => {:term => {'_id' => params[:resource_file_category_id]}}}}}}
But I'm not sure whether it is possible/valid to make a "nested has_child filter", or whether there is a better/simpler way to do this... any advice is welcome ;)
I'm afraid I don't know what your mapping definition means. It'd be easier to read if you just posted the output of:
curl -XGET 'http://127.0.0.1:9200/YOUR_INDEX/_mapping?pretty=1'
But you probably want something like this:
curl -XGET 'http://127.0.0.1:9200/YOUR_INDEX/YOUR_TYPE/_search?pretty=1' -d '
{
"query" : {
"term" : {
"resource_files.resource_file_category._id" : "YOUR VALUE"
}
}
}
'
Note: The _id fields should probably be mapped as {"index": "not_analyzed"} so that they don't get analyzed, but instead store the exact value. Otherwise if you do a term query for 'FOO BAR' the doc won't be found, because the actual terms that are stored are: ['foo','bar']
Note: The has_child query is used to search for parent docs who have child docs (ie docs which specify a parent type and ID) that match certain search criteria.
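A minimal sketch of the two notes above, written as plain Ruby hashes so it runs without Elasticsearch or tire. The type name "resource" and the helper category_query are assumptions made for illustration; only the field names come from the question's mapping.

```ruby
require 'json'

# Options that store a field as one exact, unanalyzed value.
ID_FIELD = { 'type' => 'string', 'index' => 'not_analyzed' }.freeze

# Mapping fragment for the nested _id fields.
# The type name "resource" is an assumption, not taken from the question.
RESOURCE_MAPPING = {
  'resource' => {
    'properties' => {
      'resource_files' => {
        'properties' => {
          '_id' => ID_FIELD,
          'resource_file_category' => {
            'properties' => { '_id' => ID_FIELD }
          }
        }
      }
    }
  }
}.freeze

# Builds the term query body for a given category id (hypothetical helper).
# The dotted path walks from the resource down to the category's _id.
def category_query(category_id)
  {
    'query' => {
      'term' => { 'resource_files.resource_file_category._id' => category_id }
    }
  }
end

puts JSON.pretty_generate(category_query('42'))
```

The printed JSON is the body that would be sent to the _search endpoint shown in the curl example.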
The dot operator can be used to access nested data.
You can try something like this:
curl -XGET 'http://localhost:port/INDEX/TYPE/_search?pretty=1' -d '{
"query": {
"match": {
"resource_files.resource_file_category.name": "VALUE"
}
}
}'
If resource_file_category is not_analyzed, the value is not tokenized and is stored as a single term, hence giving you an exact match.
You can also use the elasticsearch-head plugin to validate your data and as a reference when building queries.
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/modules-plugins.html or
https://mobz.github.io/elasticsearch-head/
Good day. I have elasticsearch in my rails app using tire.
I have many names in my DB, and I want to search for them with a search_query like "alex ivan"; the output should be ["Alexander Ivanov", "Alex Ivanenko"] etc. (real names from the DB).
I tried to make it work with this article, but it's not searching. So I've made a quick hack:
params[:search_query] = params[:search_query].split(" ").map{|a|a<<("*")}.join(" ")
Is it a good decision, or can I do it with analyzers etc.?
Here's what I did using analyzers for doing a search on names of businesses when I used ElasticSearch. Place this inside your mapping block and modify the index appropriately -- I think this will give you what you want:
indexes :name, :type => 'multi_field', :fields => {
:name => { :type => 'string', :analyzer => 'standard' },
:"name.exact" => { :type => 'string', :index => :not_analyzed }
}
Then inside your search and query blocks, something like:
search do
query do
# either a must match for exact match
boolean(:minimum_number_should_match => 1) do
must { string "name:#{<variable>}" }
end
# or a broader match
string "name:#{<variable>}*"
end
end
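For the broader match, note that the quick hack from the question appends "*" with <<, which mutates the strings inside params. A small non-mutating sketch of the same idea (the helper name prefix_query is hypothetical):

```ruby
# Appends "*" to every whitespace-separated term and returns a new string,
# leaving the input untouched. A nil input yields an empty query.
def prefix_query(search_query)
  search_query.to_s.split.map { |term| "#{term}*" }.join(' ')
end

puts prefix_query('alex ivan') # prints "alex* ivan*"
```

The result can then be passed to the string query in place of the raw parameter.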
I am using the tire gem for ElasticSearch in Rails.
Ok so I have been battling with this the whole day and this is how far I have got. I would like to make fields on my model not searchable but they should still be available in the _source so I can use them for sorting on the search result.
My mappings:
mapping do
indexes :created_at, :type => 'date', :index => :not_analyzed
indexes :vote_score, :type => 'integer', :index => :not_analyzed
indexes :title
indexes :description
indexes :tags
indexes :answers do
indexes :description
end
end
My to_indexed_json method:
def to_indexed_json
{
vote_score: vote_score,
created_at: created_at,
title: title,
description: description,
tags: tags,
answers: answers.map{|answer| answer.description}
}.to_json
end
My Search query:
def self.search(term='', order_by, page: 1)
tire.search(page: page, per_page: PAGE_SIZE, load: true) do
query { term.present? ? string(term) : all }
sort {
by case order_by
when LAST_POSTED then {created_at: 'desc'}
else {vote_score: 'desc', created_at: 'desc'}
end
}
end
end
The only issue I am battling with now is how do I make vote_score and created_at field not searchable but still manage to use them for sorting when I'm searching.
I tried indexes :created_at, :type => 'date', :index => :no but that did not work.
If I understand you, you are not specifying a field when you send your search query to Elasticsearch. This means it will be executed against the _all field. This is a "special" field that makes Elasticsearch a little easier to get started with quickly. By default all fields are indexed twice, once in their own field, and once in the _all field. (You can even have different mappings/analyzers applied to these two indexings.)
I think setting the field's mappings to "include_in_all": "false" should work for you (remove the "index": "no" part). Now the field will be tokenized (and you can search with it) under its field name, but when directing a search at the _all field it won't affect results (as none of its tokens are stored in the _all field).
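As a rough sketch of what that mapping could look like, here are the properties as a plain Ruby hash, roughly as they would be sent to Elasticsearch. The type name "question" and the helper excluded_from_all are assumptions; in tire the same option would presumably be passed as :include_in_all => false on the indexes line.

```ruby
require 'json'

# Sort-only fields stay indexed under their own names (so sorting works)
# but are kept out of the _all field (so a plain search ignores them).
# The type name "question" is an assumption, not taken from the question.
QUESTION_MAPPING = {
  'question' => {
    'properties' => {
      'created_at'  => { 'type' => 'date',    'include_in_all' => false },
      'vote_score'  => { 'type' => 'integer', 'include_in_all' => false },
      'title'       => { 'type' => 'string' },
      'description' => { 'type' => 'string' },
      'tags'        => { 'type' => 'string' }
    }
  }
}.freeze

# Lists the fields that a search against _all would no longer see
# (hypothetical helper, only used to inspect the hash above).
def excluded_from_all(mapping, type)
  mapping[type]['properties'].select { |_, opts| opts['include_in_all'] == false }.keys
end

puts JSON.pretty_generate(QUESTION_MAPPING)
```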
Have a read of the es docs on mappings, scroll down to the parameters for each type
Good luck!
I ended up going with the approach of only matching on the fields I want and that worked. This matches on multiple fields.
tire.search(page: page, per_page: PAGE_SIZE, load: true) do
query { term.present? ? (match [:title, :description, :tags, :answers], term) : all }
sort {
by case order_by
when LAST_POSTED then {created_at: 'desc'}
else {vote_score: 'desc', created_at: 'desc'}
end
}
end
Mapping:
include Tire::Model::Search
mapping do
indexes :name, :boost => 10
indexes :account_id
indexes :company_name
indexes :email, :index => :not_analyzed
end
def to_indexed_json
to_json( :only => [:name, :account_id, :email, :company_name],
)
end
From the above mapping it can be seen that the email field is set to not_analyzed (no broken tokens). I have a user with the email vamsikrishna#gmail.com.
Now when I search for vamsikrishna, the result shows the user... I guess it is using the default analyzer. Why?
But I would expect the user to be shown only when the complete email is specified (vamsikrishna#gmail.com). Why is :not_analyzed not taken into account in this case? Please help.
I need only the email field to be set as not_analyzed, other fields should use standard analyzer (which is done by default).
You are searching using the _all field. It means that you are using analyzer specified for _all, not for email. Because of this the analyzer specified for email doesn't affect your search.
There are a couple of ways to solve this issue. First, you can modify the analyzer for the _all field to treat emails differently. For example, you can switch to the uax_url_email tokenizer, which works like the standard tokenizer but doesn't split emails into tokens.
curl -XPUT 'http://localhost:9200/test-idx' -d '{
"settings" : {
"index": {
"analysis" :{
"analyzer": {
"default": {
"type" : "custom",
"tokenizer" : "uax_url_email",
"filter" : ["standard", "lowercase", "stop"]
}
}
}
}
}
}
'
The second way is to exclude email field from _all and use your query to search against both fields at the same time.
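A sketch of the second option, assuming email has been mapped with "include_in_all": false and that a query_string query with a fields list is an acceptable way to search both fields; the helper name user_search_body is hypothetical.

```ruby
require 'json'

# Builds a query_string search over _all plus the not_analyzed email field.
# Assumes email is excluded from _all, so a partial address such as
# "vamsikrishna" can no longer match through _all.
def user_search_body(text)
  {
    'query' => {
      'query_string' => {
        'query'  => text,
        'fields' => %w[_all email]
      }
    }
  }
end

puts JSON.pretty_generate(user_search_body('vamsikrishna'))
```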
Try :analyzer => 'keyword' instead of :index => :not_analyzed.
It emits the whole string as a single token instead of splitting it, so the field is searchable only as a whole.
Don't forget to reindex!
Ref - http://www.elasticsearch.org/guide/reference/index-modules/analysis/keyword-analyzer.html
And if you are still getting results when searching for vamsikrishna, check whether you have other searchable fields with the same value (e.g. name / company).
You're right, you should search for the whole field content in order to have a match on it if the specific field is not analyzed.
There are two options:
The mapping hasn't been submitted correctly. You can check your current mapping through the get mapping api: 'localhost:9200/_mapping' will give you the mapping of all your indexes. Not a tire expert, but shouldn't you provide not_analyzed as a string? 'not_analyzed' instead of :not_analyzed?
If you see that your mapping is there, that means you are searching on some other fields that match. Are you specifying the name of the field in your query?
Using Elasticsearch with Rails 3 and tire gem.
I have got facets to work on a couple of fields, but I now have a special requirement and not sure it is possible.
I have two fields on my model Project that both store the same values: Country1 and Country2
The user is allowed to store up to two countries for a project. The drop down menus on both are the same. Neither field is required.
What I would like is a single facet that 'merges' the values from Country1 and Country2 and would handle clicking on those facets intelligently (i.e. would find it whether it was in 1 or 2)
Here's my model so far: (note Country1/2 can be multiple words)
class Project < ActiveRecord::Base
mapping do
indexes :id
indexes :title, :boost => 100
indexes :subtitle
indexes :country1, :type => 'string', :index => 'not_analyzed'
indexes :country2, :type => 'string', :index => 'not_analyzed'
end
def self.search(params)
tire.search(load: true, page: params[:page], per_page: 10) do
query do
boolean do
must { string params[:query], default_operator: "AND" } if params[:query].present?
must { term :country1, params[:country] } if params[:country].present?
end
end
sort { by :display_type, "desc" }
facet "country" do
terms :country1
end
end
end
Any tips greatly appreciated!
This commit https://github.com/karmi/tire/commit/730813f in Tire brings support for aggregating over multiple fields in the "terms" facet.
The interface is:
Tire.search('articles-test') do
query { string 'foo' }
# Pass fields as an array, not string
facet('multi') { terms ['bar', 'baz'] }
end
According to the Elasticsearch docs for the terms facet (http://www.elasticsearch.org/guide/reference/api/search/facets/terms-facet.html), this should be possible:
Multi Fields:
The term facet can be executed against more than one field, returning
the aggregation result across those fields. For example:
{
"query" : {
"match_all" : { }
},
"facets" : {
"tag" : {
"terms" : {
"fields" : ["tag1", "tag2"],
"size" : 10
}
}
}
}
Did you try providing an array of fields to the terms facet, like terms :country1, :country2?
This seems to work but I need to test it more: facet('country') { terms fields: [:country1, :country2]}
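The question also asks that clicking a facet value finds projects whether the country is stored in country1 or country2. One possible sketch, written as a plain Ruby hash of the request body rather than tire DSL: a bool query with one should clause per field (a bool query that contains only should clauses requires at least one of them to match). The helper name project_search_body is hypothetical.

```ruby
require 'json'

COUNTRY_FIELDS = %w[country1 country2].freeze

# Search body with a single facet over both country fields and, when a
# country has been selected, a clause that matches it in either field.
def project_search_body(country = nil)
  body = {
    'query'  => { 'match_all' => {} },
    'facets' => {
      'country' => { 'terms' => { 'fields' => COUNTRY_FIELDS, 'size' => 10 } }
    }
  }
  if country
    body['query'] = {
      'bool' => {
        'should' => COUNTRY_FIELDS.map { |field| { 'term' => { field => country } } }
      }
    }
  end
  body
end

puts JSON.pretty_generate(project_search_body('United Kingdom'))
```

Because both fields are not_analyzed, the term clauses match multi-word country names exactly as stored.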
I've been working with Elasticsearch for some time now and I've hit a roadblock where I have to search for events that match a particular start date (start_at). I've indexed my fields as
mapping do
indexes :name, :type => 'string', :analyzer => 'snowball'
indexes :description, :type => 'string', :analyzer => 'snowball'
indexes :start_at, :type => 'date'
indexes :end_at, :type => 'date'
indexes :tag_list, :type => 'string', :analyzer => 'snowball'
indexes :lat_lon, :type => 'geo_point'
indexes :user_details, :type => 'string'
end
def to_indexed_json
to_hash.merge({
:user_details => (user ? user.to_index : nil),
:artist_details => (artists ? artists.map{|artist| artist.to_index } : nil),
:primary_genre => (genre ? genre.name : nil),
:lat_lon => [lat, lng].join(',')
}).to_json
end
So when I run
Tire.search('events') do
# ignore search query keywords
filter range: {start_at: {gte: Date.today, lt: Date.tomorrow}}
end
it returns nothing, but it works great with single-bound ranges. That is:
Tire.search('events') do
# ignore search query keywords
filter range: {start_at: {gte: Date.today}}
end
I mapped start_at and end_at as dates in the events index so that Elasticsearch would not perform term matches on them, but something like this would not be the answer:
Tire.search('events') do
query do
string "start_at: #{Date.today}"
end
end
Since this performs a string match, it returns all records: the tokenizer breaks the date into 2012, 05 and 16, and since 2012 and 16 may match in multiple places, it returns all matches.
I know I'm missing something very basic. I would appreciate any help on this.
Update
Event.find_all_by_start_at(Date.tomorrow + 1.day).size
Event Load (0.7ms) SELECT `events`.* FROM `events` WHERE `events`.`start_at` = '2012-05-19'
=> 1
So I have events for that day. Now when I run it with elastic search
ruby-1.9.2-p180 :024 > Tire.search('events') do
ruby-1.9.2-p180 :025 > filter :range, :start_at => {gte: Date.tomorrow + 1.days, lt: Date.tomorrow + 2.days}
ruby-1.9.2-p180 :026?> end
ruby-1.9.2-p180 :029 > x.to_curl
=> "curl -X GET \"http://localhost:9200/events/_search?pretty=true\" -d '{\"filter\":{\"range\":{\"start_at\":{\"gte\":\"2012-05-19\",\"lt\":\"2012-05-20\"}}}}'"
{"events":{"event":{"properties":{"allow_comments":{"type":"boolean"},"artist_details":{"type":"string"},"artist_id":{"type":"long"},"city":{"type":"string"},"comments_count":{"type":"long"},"confirm":{"type":"boolean"},"created_at":{"type":"date","format":"dateOptionalTime"},"description":{"type":"string","analyzer":"snowball"},"end_at":{"type":"string"},"event_attendees_count":{"type":"long"},"event_content_type":{"type":"string"},"event_file_name":{"type":"string"},"event_file_size":{"type":"long"},"genre_id":{"type":"long"},"hits":{"type":"long"},"id":{"type":"long"},"interview":{"type":"boolean"},"lat":{"type":"double"},"lat_lon":{"type":"geo_point"},"lng":{"type":"double"},"location":{"type":"string"},"name":{"type":"string","analyzer":"snowball"},"online_tix":{"type":"boolean"},"primary_genre":{"type":"string"},"private":{"type":"boolean"},"start_at":{"type":"string"},"state":{"type":"string"},"tag_list":{"type":"string","analyzer":"snowball"},"updated_at":{"type":"date","format":"dateOptionalTime"},"user_details":{"type":"string"},"user_id":{"type":"long"},"venue_id":{"type":"long"},"zip":{"type":"string"}}}}}
Elasticsearch tries to be flexible in handling mappings. At the same time, it has to deal with the limitations of the underlying search engine, Lucene. As a result, when an existing mapping contradicts the updated mapping, the new mapping is ignored. Another feature of Elasticsearch that probably played a role in this issue is automatic mapping creation based on the data. So, if you
created a new index,
indexed a record whose start_at field held a string containing a date in a format that Elasticsearch didn't recognize, and
updated the mapping, assigning type "date" to the start_at field,
you ended up with a mapping where the field start_at has type "string". The only way around it is to delete the index and specify the mapping before adding the first record.
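To confirm this diagnosis before deleting anything, the intended types can be compared with what the _mapping output above actually reports. A minimal sketch in plain Ruby; the helper name mapping_mismatches is hypothetical, and ACTUAL is a hand-copied excerpt of that output.

```ruby
# Returns the fields whose live mapping type differs from the intended one,
# as a hash of field name => type currently reported by Elasticsearch.
def mapping_mismatches(intended, actual_properties)
  intended.each_with_object({}) do |(field, type), diff|
    actual = actual_properties.fetch(field, {})['type']
    diff[field] = actual unless actual == type
  end
end

# Types declared in the model's mapping block.
INTENDED = { 'start_at' => 'date', 'end_at' => 'date', 'lat_lon' => 'geo_point' }.freeze

# Excerpt of the properties shown in the _mapping output above.
ACTUAL = {
  'start_at'   => { 'type' => 'string' },
  'end_at'     => { 'type' => 'string' },
  'lat_lon'    => { 'type' => 'geo_point' },
  'created_at' => { 'type' => 'date', 'format' => 'dateOptionalTime' }
}.freeze

p mapping_mismatches(INTENDED, ACTUAL)
```

Here start_at and end_at are reported as strings, which is consistent with the range filter on start_at returning nothing.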
It does not seem you need to use a search query - but a filter. Try something like this:
filter(:range, date: {
to: params[:date],
from: params[:date]
}) if params[:date].present?
Where params[:date] should match the format:
>> Time.now.strftime('%F')
=> "2014-03-10"
The value can come from anywhere: it can be hard-coded or passed in as a parameter.
Fields :start_at and :end_at should be mapped as :type => 'date' (just as you have now), no need to change to string or anything alike.
This approach works with a field mapped as a date; it should also be fine for datetime, as Tire/Elasticsearch doesn't seem to distinguish between those two field types.
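If a half-open range like the one in the question (gte the day, lt the next day) is preferred, the bounds can be computed from a single date string. A small sketch using only Ruby's standard library; the helper name day_range_filter is hypothetical.

```ruby
require 'date'

# Builds a range filter covering one calendar day: [day, day + 1).
# date_string must be in the "%F" (YYYY-MM-DD) format shown above.
def day_range_filter(field, date_string)
  day = Date.strptime(date_string, '%Y-%m-%d')
  {
    range: {
      field => {
        gte: day.strftime('%Y-%m-%d'),
        lt:  (day + 1).strftime('%Y-%m-%d')
      }
    }
  }
end

p day_range_filter(:start_at, '2012-05-19')
```

The returned hash has the same shape as the range filter passed to Tire.search in the question.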
Bonus: you can find nice rails elasticsearch/tire production setup example here:
https://gist.github.com/psyxoz/4326881