I'm using elasticsearch-model in my Rails application to run a search and have the results sorted.
The query works and returns unsorted results, but as soon as I add a sort everything breaks with:
Elasticsearch::Transport::Transport::Errors::BadRequest: [400] {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"profiles","node":"mad6gavaR3yTFabsF9m0rg","reason":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}}]},"status":400}
from /Users/ngw/.rvm/gems/ruby-2.2.2@utelier/gems/elasticsearch-transport-5.0.4/lib/elasticsearch/transport/transport/base.rb:202:in `__raise_transport_error'
which apparently is telling me that the way I configured the indexes is wrong.
Here is what I'm indexing:
def as_indexed_json(options={})
  {
    profile_type: profile_type,
    name: name,
    specialisation: specialisation,
    description: description,
    tags: tags,
    minimum_order: minimum_order,
    company_city: company_city,
    company_address: company_address,
    continent_id: country.try(:continent).try(:id),
    country_id: country.try(:id),
    industry: industry.try(:id)
  }
end
A query can use any of these fields, but not :name, which is only used for sorting purposes.
The configuration of my index is very simple:
settings index: { number_of_shards: 1 } do
  mapping dynamic: false do
    indexes :name, type: 'text'
    indexes :description, analyzer: 'english'
  end
end
I'm pretty sure my indexes are set up wrong, but after searching through the elasticsearch-model tests for some time I can't find anything relevant.
Can someone help me figure this out? Thanks in advance.
The problem is that the type of name is text.
As of Elasticsearch 5 you can't sort on an analyzed (text) field by default.
Sorting requires either fielddata or doc_values to be enabled on the field; these are the data structures Elasticsearch uses for sorting and aggregations.
doc_values can't be enabled on analyzed string fields,
and fielddata is disabled by default on analyzed string fields.
You can do one of two things:
Either change the mapping of name to keyword, which is not analyzed and has doc_values enabled by default.
Or enable fielddata on the name field by setting "fielddata": true.
These pages cover both:
https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/doc-values.html
You can use multi-fields.
In your index mapping:
settings index: { number_of_shards: 1 } do
  mapping dynamic: false do
    indexes :name, type: 'text', fields: { keyword: { type: :keyword } }
    indexes :description, analyzer: 'english'
  end
end
In your search:
Model.search(
  query: ...,
  sort: {
    'name.keyword': { order: 'asc' }
  }
)
Or just set fielddata to true (warning: not recommended, since it uses a lot more memory):
settings index: { number_of_shards: 1 } do
  mapping dynamic: false do
    indexes :name, type: 'text', fielddata: true
    indexes :description, analyzer: 'english'
  end
end
Take a look at these links:
https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html#fielddata-mapping-param
https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html
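As a rough sketch of the multi-field approach, the search body can be assembled as a plain Ruby hash before it is handed to the model's search method. The helper name sorted_search_body and its match_all default are assumptions made for illustration; only the name.keyword sub-field comes from the mapping above.

```ruby
# Hypothetical helper (not part of elasticsearch-model): builds a search body
# that sorts on the keyword sub-field of a text field.
def sorted_search_body(query: { match_all: {} }, field: 'name', order: 'asc')
  {
    query: query,
    # Sort on "<field>.keyword", the non-analyzed sub-field backed by doc values.
    sort: [{ "#{field}.keyword" => { order: order } }]
  }
end

body = sorted_search_body(query: { match: { description: 'tailor' } })
# With the gem loaded, this hash would be passed along as: Profile.search(body)
# (Profile is assumed to be the model behind the "profiles" index.)
```

Keeping the sort field in one place makes it harder to sort on the analyzed name field by accident.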
This seems like a really easy issue, but everything I've tried from other solutions and websites is not working. I have three fields I do not want indexed or queried--:p_s, :gender, and :part_of_speech--but elasticsearch is still returning values from those fields even though I don't specify that they should be indexed or queried. About halfway down, this article says to say no to indexing, but they don't indicate where this would occur.
Term Controller:
def search
  @terms = Term.search(params[:query]).page(params[:page])
end
Model:
require 'elasticsearch/model'
class Term < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
  settings index: { number_of_shards: 1, number_of_replicas: 0 } do
    mappings dynamic: 'false' do
      indexes :id, index: :not_analyzed
      indexes :name, analyzer: :spanish_analyzer
      indexes :definition, analyzer: :combined_analyzer
      indexes :etymology1, analyzer: :combined_analyzer
      indexes :etymology2, analyzer: :combined_analyzer
      indexes :uses, analyzer: :combined_analyzer
      indexes :notes1, analyzer: :combined_analyzer
      indexes :notes2, analyzer: :combined_analyzer
    end
  end
  def self.search(query)
    __elasticsearch__.search(
      {
        query: {
          multi_match: {
            query: query,
            fields: ['name^7', 'definition^6', 'etymology1^5', 'etymology2^4', 'uses^3', 'notes1^2', 'notes2^1'],
            operator: 'and'
          }
        }
      }
    )
  end
end
# Delete the previous term index in Elasticsearch
Term.__elasticsearch__.client.indices.delete index: Term.index_name rescue nil
# Create the new index with the new mapping
Term.__elasticsearch__.client.indices.create \
index: Term.index_name,
body: { settings: Term.settings.to_hash, mappings: Term.mappings.to_hash }
# Index all term records from the DB to Elasticsearch
Term.import(force: true)
To mark a field as non-indexed use this:
mappings dynamic: 'false' do
  ...
  indexes :p_s, index: :no
  indexes :gender, index: :no
  indexes :part_of_speech, index: :no
  ...
end
By default Elasticsearch returns every document field under the "_source" key. To get only specific fields, you can either specify a fields array at the top level of the search body, like this:
def self.search(query)
  __elasticsearch__.search(
    {
      query: {
        multi_match: {
          query: query,
          fields: ['name^7', 'definition^6', 'etymology1^5', 'etymology2^4', 'uses^3', 'notes1^2', 'notes2^1'],
          operator: 'and'
        }
      },
      fields: ['name', 'definition', 'etymology1', 'etymology2', 'uses', 'notes1', 'notes2']
    }
  )
end
or filter "_source":
def self.search(query)
  __elasticsearch__.search(
    {
      query: {
        multi_match: {
          query: query,
          fields: ['name^7', 'definition^6', 'etymology1^5', 'etymology2^4', 'uses^3', 'notes1^2', 'notes2^1'],
          operator: 'and'
        }
      },
      '_source': ['name', 'definition', 'etymology1', 'etymology2', 'uses', 'notes1', 'notes2']
    }
  )
end
See Elasticsearch source filtering docs for more.
When using multi_match clause, the inner fields element specifies the fields to run the search on and, optionally, the boost like in your example. The outer fields or '_source' clause in turn determines which fields to return and this is the one you're after.
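To keep the two field lists from drifting apart, one option is to derive the returned fields from the boosted list. This is only a sketch: the constant BOOSTED_FIELDS and the helper term_search_body are names invented here, and the hash would still have to be passed to __elasticsearch__.search.

```ruby
# Boosted field list, copied from the multi_match clause in the question.
BOOSTED_FIELDS = ['name^7', 'definition^6', 'etymology1^5', 'etymology2^4',
                  'uses^3', 'notes1^2', 'notes2^1'].freeze

# Hypothetical helper: the inner list drives matching and boosting, while the
# outer _source list (boost suffix removed) limits which fields are returned.
def term_search_body(query, boosted_fields = BOOSTED_FIELDS)
  {
    query: {
      multi_match: { query: query, fields: boosted_fields, operator: 'and' }
    },
    _source: boosted_fields.map { |field| field.split('^').first }
  }
end

body = term_search_body('casa')
```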
To get better visibility into what's going on while debugging Elasticsearch queries, use a tool like Sense. Once you get the result you want, it is usually much easier to transfer the query to Ruby code than the other way round.
Using the built-in Elasticsearch options makes a lot of sense. In my own model, though, I did something like this (adapted to your case):
def as_indexed_json(options = {})
  as_json(only: [:id, :name, :definition, :etymology1, :etymology2, :uses, :notes1, :notes2])
end
This should work because elasticsearch-model calls the as_indexed_json method on your model to get the data it needs to index, so fields left out of it never reach Elasticsearch.
I'm trying to sort my ES results by 2 fields: searchable and year.
The mapping in my Rails app:
# mapping
def as_indexed_json(options={})
  as_json(only: [:id, :searchable, :year])
end
settings index: { number_of_shards: 5, number_of_replicas: 1 } do
  mapping do
    indexes :id, index: :not_analyzed
    indexes :searchable
    indexes :year
  end
end
The query:
@records = Wine.search(
  query: { match: { searchable: { query: params[:search], fuzziness: 2, prefix_length: 1 } } },
  sort: { _score: { order: :desc }, year: { order: :desc } },
  size: 100
)
The interesting thing in the query:
sort: {_score: {order: :desc}, year: {order: :desc}}
I think the query is working well with the 2 sort params.
My problem is that the score is not the same for 2 documents with the same name (searchable field).
For example, I'm searching for "winery":
You can see very different scores, even though the searchable field is the same. I think the issue is due to the ID field (it's actually a UUID). It looks like this ID field influences the score.
But in my schema mapping I declared that ID should not be analyzed, and in my ES query I ask to search ONLY in the "searchable" field, not in ID.
What did I miss to get the same score for identical fields? (As it stands, sorting by year after score is useless because the scores differ for identical fields.)
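Scores are different because they are calculated independently for each shard: term statistics such as document frequency are local to a shard, so the same text can score differently depending on which shard holds the document. See here for more info.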
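The effect can be illustrated with a few lines of plain Ruby. This is a sketch under stated assumptions: the shard statistics are made up, and the formula is the classic Lucene-style inverse document frequency, used here only to show the mechanism (the exact scoring formula depends on the Elasticsearch version).

```ruby
# Classic Lucene-style inverse document frequency (illustrative only).
def idf(doc_count, doc_freq)
  1.0 + Math.log(doc_count.to_f / (doc_freq + 1))
end

# Made-up per-shard statistics for the term "winery".
shards = {
  shard_0: { doc_count: 1000, doc_freq: 10 },
  shard_1: { doc_count: 1000, doc_freq: 200 }
}

# Each shard scores with its own local statistics, so the weights differ.
per_shard_idf = shards.transform_values { |s| idf(s[:doc_count], s[:doc_freq]) }

# Index-wide statistics give one shared weight for every document.
total_docs = shards.values.sum { |s| s[:doc_count] }
total_freq = shards.values.sum { |s| s[:doc_freq] }
global_idf = idf(total_docs, total_freq)
```

Two common remedies are to use a single shard for a small index, or to request search_type dfs_query_then_fetch, which collects index-wide term statistics before scoring. With elasticsearch-model it should be possible to pass that as an option, e.g. Wine.search(body, search_type: 'dfs_query_then_fetch'), though that call is not verified against the setup in the question.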
The data structure is a Post which has_many Post_text. Following the great example at https://github.com/elasticsearch/elasticsearch-rails/blob/master/elasticsearch-model/examples/activerecord_associations.rb, I have defined the mapping as follows:
include SearchableModule
mapping do
  indexes :country
  indexes :post_texts do
    indexes :subject, type: 'string', boost: 10, analyzer: 'snowball'
    indexes :description, type: 'string', analyzer: 'snowball'
  end
end
And of course, in searchable_module.rb I just copied what's in the example, with some changes to as_indexed_json():
def as_indexed_json(options={})
  self.as_json(
    include: { post_texts: { only: [:subject, :description] } }
  )
end
And things seem OK. I have re-imported the data:
Post.import
Post.__elasticsearch__.
Then I compared the results of SQL's LIKE with those of Elasticsearch:
SQL LIKE:
PostText.where("subject LIKE '%Testing%' OR description LIKE '%Testing%'").each do |r|
  puts r.post_id
end
This approach yields 12 unique post_id values.
Elasticsearch:
Post.search("Testing").results.count
=> 10
Is there anything I have missed? Thank you!!!!
You could try Post.search("Testing").results.total, which should return the total number of matching documents. results.count only counts the records actually returned, which I suppose are limited by the page size (Elasticsearch returns 10 hits by default).
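The difference is easier to see on the raw response. The hash below is a trimmed-down, made-up response in the shape Elasticsearch returned at the time (with total as a plain integer); it is a sketch for illustration, not output from the index in the question.

```ruby
# Made-up response: 12 documents match, but only the first page of 10 hits
# is included because the request did not ask for more.
raw_response = {
  'hits' => {
    'total' => 12,
    'hits'  => Array.new(10) { |i| { '_id' => (i + 1).to_s } }
  }
}

returned = raw_response['hits']['hits'].size # documents on this page
matched  = raw_response['hits']['total']     # documents matching the query
```

Passing a larger size in the search body raises the number of hits per page, while the total stays the same.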
I am using the tire gem for ElasticSearch in Rails.
Ok so I have been battling with this the whole day and this is how far I have got. I would like to make fields on my model not searchable but they should still be available in the _source so I can use them for sorting on the search result.
My mappings:
mapping do
  indexes :created_at, :type => 'date', :index => :not_analyzed
  indexes :vote_score, :type => 'integer', :index => :not_analyzed
  indexes :title
  indexes :description
  indexes :tags
  indexes :answers do
    indexes :description
  end
end
My to_indexed_json method:
def to_indexed_json
  {
    vote_score: vote_score,
    created_at: created_at,
    title: title,
    description: description,
    tags: tags,
    answers: answers.map { |answer| answer.description }
  }.to_json
end
My Search query:
def self.search(term='', order_by, page: 1)
  tire.search(page: page, per_page: PAGE_SIZE, load: true) do
    query { term.present? ? string(term) : all }
    sort {
      by case order_by
         when LAST_POSTED then {created_at: 'desc'}
         else {vote_score: 'desc', created_at: 'desc'}
         end
    }
  end
end
The only issue I am battling with now is how do I make vote_score and created_at field not searchable but still manage to use them for sorting when I'm searching.
I tried indexes :created_at, :type => 'date', :index => :no but that did not work.
If I understand you correctly, you are not specifying a field when you send your search query to Elasticsearch. This means it will be executed against the _all field. This is a "special" field that makes Elasticsearch a little easier to get started with. By default all fields are indexed twice: once in their own field, and once in the _all field. (You can even have different mappings/analyzers applied to these two indexings.)
I think setting the field's mapping to "include_in_all": "false" should work for you (remove the "index": "no" part). The field will still be tokenized (and searchable) under its own field name, but a search directed at the _all field won't be affected by it, as none of its tokens are stored in the _all field.
Have a read of the ES docs on mappings and scroll down to the parameters for each type.
Good luck!
I ended up going with the approach of only matching on the fields I want and that worked. This matches on multiple fields.
tire.search(page: page, per_page: PAGE_SIZE, load: true) do
  query { term.present? ? (match [:title, :description, :tags, :answers], term) : all }
  sort {
    by case order_by
       when LAST_POSTED then {created_at: 'desc'}
       else {vote_score: 'desc', created_at: 'desc'}
       end
  }
end
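For comparison, the same idea can be written as a plain query DSL hash rather than with the tire block syntax. This is a sketch under assumptions: question_search_body and SEARCHABLE_FIELDS are invented names, and the value given to LAST_POSTED is a placeholder because the original constant is not shown.

```ruby
LAST_POSTED = 'last_posted'.freeze # placeholder value; the real constant is not shown
SEARCHABLE_FIELDS = %w[title description tags answers].freeze

# Hypothetical helper: match only on the searchable fields, so vote_score and
# created_at stay out of the query but remain usable for sorting.
def question_search_body(term, order_by)
  query = if term.to_s.strip.empty?
            { match_all: {} }
          else
            { multi_match: { query: term, fields: SEARCHABLE_FIELDS } }
          end

  sort = if order_by == LAST_POSTED
           [{ created_at: 'desc' }]
         else
           [{ vote_score: 'desc' }, { created_at: 'desc' }]
         end

  { query: query, sort: sort }
end

body = question_search_body('rails', LAST_POSTED)
```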
Mapping:
include Tire::Model::Search
mapping do
  indexes :name, :boost => 10
  indexes :account_id
  indexes :company_name
  indexes :email, :index => :not_analyzed
end
def to_indexed_json
  to_json(:only => [:name, :account_id, :email, :company_name])
end
From the above mapping it can be seen that the email field is set to not_analyzed (no tokenization). I have a user with the email vamsikrishna@gmail.com.
Now when I search for vamsikrishna, the result includes the user. I guess it is using the default analyzer. Why?
I would expect the user to be shown only when the complete email is specified (vamsikrishna@gmail.com). Why is :not_analyzed not taken into account in this case? Please help.
I need only the email field to be set as not_analyzed; the other fields should use the standard analyzer (which is the default).
You are searching using the _all field. It means that you are using analyzer specified for _all, not for email. Because of this the analyzer specified for email doesn't affect your search.
There are a couple of ways to solve this issue. First, you can modify the analyzer for the _all field to treat emails differently. For example, you can switch to the uax_url_email tokenizer, which works like the standard tokenizer but doesn't split emails into tokens.
curl -XPUT 'http://localhost:9200/test-idx' -d '{
  "settings" : {
    "index": {
      "analysis" : {
        "analyzer": {
          "default": {
            "type" : "custom",
            "tokenizer" : "uax_url_email",
            "filter" : ["standard", "lowercase", "stop"]
          }
        }
      }
    }
  }
}
'
The second way is to exclude email field from _all and use your query to search against both fields at the same time.
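A rough sketch of that second approach, written as plain Ruby hashes. Everything here is illustrative: the mapping is only a fragment, user_search_body is an invented helper, and include_in_all applies only to Elasticsearch versions that still have the _all field.

```ruby
# Mapping fragment: keep email out of _all so that partial tokens such as
# "vamsikrishna" cannot match through it.
mapping = {
  properties: {
    name:  { type: 'string', boost: 10 },
    email: { type: 'string', index: 'not_analyzed', include_in_all: false }
  }
}

# Hypothetical helper: search both fields at once.
def user_search_body(term)
  {
    query: {
      bool: {
        should: [
          { match: { _all: term } },  # analyzed match against the other fields
          { term:  { email: term } }  # exact match against the whole email
        ],
        minimum_should_match: 1
      }
    }
  }
end

body = user_search_body('vamsikrishna@gmail.com')
```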
Try :analyzer => 'keyword' instead of :index => :not_analyzed.
The keyword analyzer emits the entire string as a single token, so the field is searchable only as a whole.
Don't forget to reindex!
Ref - http://www.elasticsearch.org/guide/reference/index-modules/analysis/keyword-analyzer.html
If you still get results when searching for vamsikrishna, check whether other searchable fields hold the same value (e.g. name or company).
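To make the difference concrete, the sketch below contrasts a keyword-style analyzer with a crude splitting tokenizer. Both functions are hypothetical stand-ins written for illustration; they only approximate what Elasticsearch does with this particular input, and real analyzers do much more.

```ruby
# Keyword-style analysis: the whole input becomes one token.
def keyword_tokens(text)
  [text]
end

# Crude approximation of a word-splitting analyzer: lowercase, then split on
# "@" and whitespace.
def naive_split_tokens(text)
  text.downcase.split(/[@\s]+/).reject(&:empty?)
end

email = 'vamsikrishna@gmail.com'

whole_token = keyword_tokens(email)
split_tokens = naive_split_tokens(email)

# A search for "vamsikrishna" can only hit a field that contains that token.
matches_keyword_field = whole_token.include?('vamsikrishna')
matches_split_field   = split_tokens.include?('vamsikrishna')
```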
You're right, you should search for the whole field content in order to have a match on it if the specific field is not analyzed.
There are two options:
The mapping hasn't been submitted correctly. You can check your current mapping through the get mapping api: 'localhost:9200/_mapping' will give you the mapping of all your indexes. Not a tire expert, but shouldn't you provide not_analyzed as a string? 'not_analyzed' instead of :not_analyzed?
If you see that your mapping is there, that means you are searching on some other fields that match. Are you specifying the name of the field in your query?