not_analyzed is not working as expected - ruby-on-rails

Mapping:
include Tire::Model::Search
mapping do
indexes :name, :boost => 10
indexes :account_id
indexes :company_name
indexes :email, :index => :not_analyzed
end
def to_indexed_json
to_json( :only => [:name, :account_id, :email, :company_name],
)
end
From the mapping above you can see that the email field is set to not_analyzed (so it is not broken into tokens). I have a user with the email vamsikrishna@gmail.com.
Now when I search for vamsikrishna, the result includes that user. I guess it is using the default analyzer. Why?
I expected the user to be returned only when the complete email (vamsikrishna@gmail.com) is specified. Why is :not_analyzed not taken into account here? Please help.
I need only the email field to be not_analyzed; the other fields should use the standard analyzer (which is the default).

You are searching against the _all field, which means the analyzer configured for _all is used, not the one for email. That is why the setting on email doesn't affect your search.
There are a couple of ways to solve this. First, you can change the analyzer for the _all field to treat emails differently. For example, you can switch to the uax_url_email tokenizer, which works like the standard tokenizer but doesn't split emails into tokens.
curl -XPUT 'http://localhost:9200/test-idx' -d '{
"settings" : {
"index": {
"analysis" :{
"analyzer": {
"default": {
"type" : "custom",
"tokenizer" : "uax_url_email",
"filter" : ["standard", "lowercase", "stop"]
}
}
}
}
}
}
'
The second way is to exclude email field from _all and use your query to search against both fields at the same time.
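A minimal sketch of that second approach, written as plain Ruby hashes rather than Tire DSL calls. The `user` type name, the sample address, and the exact request shape are assumptions based on the older Elasticsearch API that Tire targeted, so adapt them to your version.

```ruby
require 'json'

# Assumed mapping fragment: email keeps its exact value and is left out of _all.
# The "user" type name is hypothetical.
USER_MAPPING = {
  user: {
    properties: {
      email: { type: 'string', index: 'not_analyzed', include_in_all: false }
    }
  }
}.freeze

# Builds a query that matches analyzed text in _all OR the exact email value.
def user_search_body(text)
  {
    query: {
      bool: {
        should: [
          { query_string: { query: text } }, # runs against _all by default
          { term: { email: text } }          # exact, unanalyzed comparison
        ]
      }
    }
  }
end

puts JSON.pretty_generate(user_search_body('jane@example.com'))
```

With this shape a partial word such as the local part of the address can only match through the other fields copied into _all, while the email clause matches only the complete address.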

Try :analyzer => 'keyword' instead of :index => :not_analyzed.
The keyword analyzer does not split the string; it emits the whole value as a single token, so the field is searchable only as a whole.
Don't forget to reindex!
Ref - http://www.elasticsearch.org/guide/reference/index-modules/analysis/keyword-analyzer.html
If you are still getting results when searching for vamsikrishna, check whether other searchable fields hold the same value (e.g. name or company_name).
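To make the difference concrete, here is a toy model in plain Ruby. It is not how Elasticsearch tokenizes (real analyzers follow Unicode segmentation rules); the two helper methods and the sample address are made up for illustration only.

```ruby
# Rough stand-in for an analyzed field: lowercase, split on whitespace and "@".
def toy_analyzed_terms(value)
  value.downcase.split(/[\s@]+/).reject(&:empty?)
end

# Stand-in for not_analyzed / the keyword analyzer: one term, the whole value.
def toy_exact_terms(value)
  [value]
end

email = 'jane@example.com' # hypothetical address

analyzed = toy_analyzed_terms(email)
exact    = toy_exact_terms(email)

puts analyzed.include?('jane')          # partial word is a term in the analyzed field
puts exact.include?('jane')             # no such term in the exact field
puts exact.include?('jane@example.com') # only the full value is a term there
```

If a partial word still matches after switching to an exact field, the hit is coming from some other analyzed field (or from _all).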

You're right, you should search for the whole field content in order to have a match on it if the specific field is not analyzed.
There are two options:
The mapping hasn't been submitted correctly. You can check your current mapping through the get mapping api: 'localhost:9200/_mapping' will give you the mapping of all your indexes. Not a tire expert, but shouldn't you provide not_analyzed as a string? 'not_analyzed' instead of :not_analyzed?
If you see that your mapping is there, that means you are searching on some other fields that match. Are you specifying the name of the field in your query?
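Once you have the output of the get mapping API, a small check like the one below can confirm whether the setting was applied. The sample response is hypothetical, and because the nesting of the response differs between Elasticsearch versions, the helper (a made-up name) simply looks for a "properties" hash at any depth.

```ruby
require 'json'

# Hypothetical _mapping response for an index "users" with a type "user".
SAMPLE_MAPPING_JSON = <<~JSON
  {
    "users": {
      "user": {
        "properties": {
          "name":  { "type": "string" },
          "email": { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }
JSON

# Returns the settings hash for +field+, or nil when it is not mapped.
def find_field_mapping(node, field)
  return nil unless node.is_a?(Hash)

  props = node['properties']
  return props[field] if props.is_a?(Hash) && props.key?(field)

  node.each_value do |child|
    found = find_field_mapping(child, field)
    return found if found
  end
  nil
end

email_settings = find_field_mapping(JSON.parse(SAMPLE_MAPPING_JSON), 'email')
not_analyzed   = !email_settings.nil? && email_settings['index'] == 'not_analyzed'
puts(not_analyzed ? 'email is not_analyzed' : 'email is analyzed or unmapped')
```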

Related

Don't know how to sort elasticsearch-model results

I'm using elasticsearch-model on my RoR application to perform a search and have the results sorted.
I can perform the query and have back unsorted results, but when I add sort everything breaks with:
Elasticsearch::Transport::Transport::Errors::BadRequest: [400] {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"profiles","node":"mad6gavaR3yTFabsF9m0rg","reason":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}}]},"status":400}
from /Users/ngw/.rvm/gems/ruby-2.2.2@utelier/gems/elasticsearch-transport-5.0.4/lib/elasticsearch/transport/transport/base.rb:202:in `__raise_transport_error'
which apparently is telling me that the way I configured the indexes is wrong.
Here is what I'm indexing
def as_indexed_json(options={})
{
profile_type: profile_type,
name: name,
specialisation: specialisation,
description: description,
tags: tags,
minimum_order: minimum_order,
company_city: company_city,
company_address: company_address,
continent_id: country.try(:continent).try(:id),
country_id: country.try(:id),
industry: industry.try(:id)
}
end
A query can use any of these fields, but not :name, which is only used for sorting purposes.
The configuration of my index is very simple:
settings index: { number_of_shards: 1 } do
mapping dynamic: false do
indexes :name, type: 'text'
indexes :description, analyzer: 'english'
end
end
I'm pretty sure my indexes are setup wrong, but after searching for some time inside elasticsearch-model tests I can't find anything relevant.
Can someone help me figure this out? Thanks in advance.
Sure, the problem is that the type of name is text.
From Elasticsearch 5 onward you can't sort on an analyzed (text) field by default.
Sorting requires either fielddata or doc_values to be enabled on the field; these are the data structures Elasticsearch uses for sorting and aggregations.
doc_values can't be enabled on analyzed string fields,
and fielddata is disabled by default on analyzed string fields.
You can do one of two things:
Either change the mapping of name to keyword, which is not analyzed and has doc_values enabled by default.
Or enable fielddata on the name field by setting "fielddata": true.
Here are links covering this:
https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/doc-values.html
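As a rough sketch of the first option, here are the mapping and the search body as plain Ruby hashes (not the elasticsearch-model DSL). The `match` query on description and the search word are placeholders; only the field names come from the question.

```ruby
require 'json'

# Assumed mapping: name as keyword, so doc_values back the sort.
NAME_AS_KEYWORD = {
  dynamic: false,
  properties: {
    name:        { type: 'keyword' },
    description: { type: 'text', analyzer: 'english' }
  }
}.freeze

# Search body that sorts on the keyword field. +query+ is whatever query hash you use.
def sorted_by_name(query, order = 'asc')
  { query: query, sort: [{ name: { order: order } }] }
end

puts JSON.generate(sorted_by_name({ match: { description: 'leather' } }))
```

Since name is only used for sorting here, losing full-text analysis on it should not matter; you need to recreate the index and reindex after changing the mapping.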
You can either use multi-fields:
On your indexes:
settings index: { number_of_shards: 1 } do
mapping dynamic: false do
indexes :name, type: 'text', fields: { keyword: { type: :keyword } }
indexes :description, analyzer: 'english'
end
end
On your search:
Model.search(
query: ...,
sort: {
'name.keyword': { order: 'asc' }
}
)
Or just set fielddata to true (warning: not recommended, since it can use significant memory):
settings index: { number_of_shards: 1 } do
mapping dynamic: false do
indexes :name, type: 'text', fielddata: true
indexes :description, analyzer: 'english'
end
end
Take a look at these links:
https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html#fielddata-mapping-param
https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html

How to make fields on my model not searchable but they should still be available in the _source?

I am using the tire gem for ElasticSearch in Rails.
Ok so I have been battling with this the whole day and this is how far I have got. I would like to make fields on my model not searchable but they should still be available in the _source so I can use them for sorting on the search result.
My mappings:
mapping do
indexes :created_at, :type => 'date', :index => :not_analyzed
indexes :vote_score, :type => 'integer', :index => :not_analyzed
indexes :title
indexes :description
indexes :tags
indexes :answers do
indexes :description
end
end
My to_indexed_json method:
def to_indexed_json
{
vote_score: vote_score,
created_at: created_at,
title: title,
description: description,
tags: tags,
answers: answers.map{|answer| answer.description}
}.to_json
end
My Search query:
def self.search(term='', order_by, page: 1)
tire.search(page: page, per_page: PAGE_SIZE, load: true) do
query { term.present? ? string(term) : all }
sort {
by case order_by
when LAST_POSTED then {created_at: 'desc'}
else {vote_score: 'desc', created_at: 'desc'}
end
}
end
end
The only issue I am battling with now is how do I make vote_score and created_at field not searchable but still manage to use them for sorting when I'm searching.
I tried indexes :created_at, :type => 'date', :index => :no but that did not work.
If I understand you, you are not specifying a field when you send your search query to Elasticsearch. This means it will be executed against the _all field. This is a "special" field that makes Elasticsearch a little easier to get started with. By default every field is indexed twice: once under its own name and once in the _all field. (You can even have different mappings/analyzers applied to these two indexings.)
I think setting the field's mapping to "include_in_all": "false" should work for you (remove the "index": "no" part). The field will still be indexed under its own field name (so you can search and sort on it), but a search directed at the _all field won't be affected by it, as none of its tokens are stored in _all.
Have a read of the ES docs on mappings and scroll down to the parameters for each type.
Good luck!
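For reference, the mapping fragment this suggestion leads to might look like the following, shown as a plain Ruby hash that serializes to the JSON Elasticsearch expects. This is a sketch of the idea for Elasticsearch versions that still have _all, not verified against a particular Tire release.

```ruby
require 'json'

# Assumed mapping fragment: both fields stay indexed (so sorting works)
# but are not copied into _all, so a plain query string ignores them.
SORT_ONLY_FIELDS = {
  created_at: { type: 'date',    include_in_all: false },
  vote_score: { type: 'integer', include_in_all: false }
}.freeze

puts JSON.pretty_generate(properties: SORT_ONLY_FIELDS)
```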
I ended up going with the approach of only matching on the fields I want and that worked. This matches on multiple fields.
tire.search(page: page, per_page: PAGE_SIZE, load: true) do
query { term.present? ? (match [:title, :description, :tags, :answers], term) : all }
sort {
by case order_by
when LAST_POSTED then {created_at: 'desc'}
else {vote_score: 'desc', created_at: 'desc'}
end
}
end

ruby on rails: ElasticSearch / Tire dynamic search on multiple indices

I've done a bunch of searching and I haven't been able to get an answer to this question - hopefully this isn't a repeat (apologies if it is)...
Preface: I'm using Rails & Tire to perform ElasticSearch.
I have an object, Place, with attributes "name", "city", "state", and "zip". They are indexed as follows:
indexes :name, :type => 'multi_field', :fields => {
:name => { :type => 'string', :analyzer => 'snowball' },
:"name.exact" => { :type => 'string', :index => :not_analyzed }
}
indexes :city
indexes :state
indexes :zip
There are three conditions for searching: 1. Name only, 2. (City, State OR Zip), 3. Name AND (City, State OR Zip).
My code for the "query" block is:
if (City, State).present?
boolean do
must { string "name:#{name}*" } if name.present?
must { string "city:#{city_state}*" }
must { string "state:#{city_state}*" }
end
elsif (Zip).present?
boolean do
must { string "name:#{name}*" } if name.present?
must { string "zip:#{query_parameters["zip"]}*" }
end
else
string "name:#{name}*"
end
The aforementioned search conditions #1 and #2 work as expected against multiple tests. However, condition 3 does not - it seems to only pay attention to the "name" field. I'm assuming it has something to do with using the "city_state" variable to search on both "city" and "state"... But I'm doing this because a user can enter either "Chicago" or "Illinois" in the City, State / Zip text box and the search should still work, using either the geographic center of Chicago or the geographic center of Illinois, respectively.
Anything obvious I'm doing wrong?
However, condition 3 does not - it seems to only pay attention to the "name" field
Errr, isn't
string "name:#{name}*"
telling it to do exactly that?
or did you mean to just do
string "#{name}"
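One possible reading of the three conditions is a single query string in which the location text may match either city or state. The helper below is only a sketch in plain Ruby: `place_query` is a made-up name, the sample words are placeholders, and the inputs are assumed to be single plain words (no escaping of query-string special characters is done).

```ruby
# Builds a Lucene-style query string for the three cases in the question.
def place_query(name: nil, city_state: nil, zip: nil)
  clauses = []
  clauses << "name:#{name}*" if name && !name.empty?
  if city_state && !city_state.empty?
    # The location text may be a city or a state, so either field may match.
    clauses << "(city:#{city_state}* OR state:#{city_state}*)"
  elsif zip && !zip.empty?
    clauses << "zip:#{zip}*"
  end
  clauses.join(' AND ')
end

puts place_query(name: 'pizza')
puts place_query(city_state: 'chicago')
puts place_query(name: 'pizza', city_state: 'chicago')
```

The resulting string could then be passed to a single `string` query instead of stacking `must` clauses that require the same text in both city and state.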

Exclude indexed data in search results - elasticsearch (tire)

I'm trying to use elasticsearch via tire gem for a multi-tenant app. The app has many accounts with each account having many users.
Now I would like to index the User based on account id.
User Model:
include Tire::Model::Search
mapping do
indexes :name, :boost => 10
indexes :account_id
indexes :company_name
indexes :description
end
def to_indexed_json
to_json( :only => [:name, :account_id, :description, :company_name],
)
end
Search Query:
User.tire.search do
query do
filtered do
query { string 'vamsikrishna' }
filter :term, :account_id => 1
end
end
end
The filter works fine and the results are displayed only for the filtered account id (1). But, when I search for a query string 1:
User.tire.search do
query do
filtered do
query { string '1' }
filter :term, :account_id => 1
end
end
end
Let's say there is a user named 1. In that case the results include all the users with account id 1, not just that user. This is because I added account_id in to_indexed_json. How can I return only the user named 1? (The other users should not appear in the hits.)
When no user has 1 as a name, company name, or description, I don't want any results at all. But as explained, in my case I get every user in account id 1.
You are searching on the _all special field, which contains a copy of the content of all the fields that you indexed. You should search on a specific field to have the right results like this: field_name:1.
If you want you can search on multiple fields using the query string:
{
"query_string" : {
"fields" : ["field1", "field2", "field3"],
"query" : "1"
}
}
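Combined with the account filter from the question, the request body might look like this. It is a sketch built as a plain Ruby hash; the `filtered` query only exists in Elasticsearch versions before 5.0, and the field list is taken from the mapping in the question.

```ruby
require 'json'

# Builds a filtered query that searches only the listed fields
# and keeps the account filter, so account_id itself is never searched.
def account_search_body(text, account_id, fields = %w[name company_name description])
  {
    query: {
      filtered: {
        query:  { query_string: { fields: fields, query: text } },
        filter: { term: { account_id: account_id } }
      }
    }
  }
end

puts JSON.pretty_generate(account_search_body('1', 1))
```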

Elastic Search nested

I'm using Elastic search through tire gem.
Given this structure to index my resource model
mapping do
indexes :_id
indexes :version, analyzer: 'snowball', boost: 100
indexes :resource_files do
indexes :_id
indexes :name, analyzer: 'snowball', boost: 100
indexes :resource_file_category do
indexes :_id
indexes :name, analyzer: 'snowball', boost: 100
end
end
end
How can I retrieve all the resources that have resource_files with a given resource_file_category id?
I've looked in the Elasticsearch docs and I think I could use the has_child filter:
http://www.elasticsearch.org/guide/reference/query-dsl/has-child-filter.html
I've tried it this way:
filter :has_child, :type => 'resource_files', :query => {:filter => {:has_child => {:type => 'resource_file_category', :query => {:filter => {:term => {'_id' => params[:resource_file_category_id]}}}}}}
But I'm not sure whether a nested has_child filter is possible/valid, or whether there is a better/simpler way to do this. Any advice is welcome ;)
I'm afraid I don't know what your mapping definition means. It'd be easier to read if you just posted the output of:
curl -XGET 'http://127.0.0.1:9200/YOUR_INDEX/_mapping?pretty=1'
But you probably want something like this:
curl -XGET 'http://127.0.0.1:9200/YOUR_INDEX/YOUR_TYPE/_search?pretty=1' -d '
{
"query" : {
"term" : {
"resource_files.resource_file_category._id" : "YOUR VALUE"
}
}
}
'
Note: The _id fields should probably be mapped as {"index": "not_analyzed"} so that they don't get analyzed, but instead store the exact value. Otherwise if you do a term query for 'FOO BAR' the doc won't be found, because the actual terms that are stored are: ['foo','bar']
Note: The has_child query is used to search for parent docs who have child docs (ie docs which specify a parent type and ID) that match certain search criteria.
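The first note can be illustrated with a toy model in plain Ruby. The analyzer here is a crude stand-in (lowercase and split on whitespace), not the real Elasticsearch behaviour, and `toy_analyze` is a made-up helper.

```ruby
# Toy analyzer: lowercase and split on whitespace (real analyzers do more).
def toy_analyze(text)
  text.downcase.split
end

indexed_terms = toy_analyze('FOO BAR') # what an analyzed field would store

# A term query compares its input to the stored terms as-is.
term_hit = indexed_terms.include?('FOO BAR')

# A match query analyzes its input first, then looks for at least one of the terms.
match_hit = toy_analyze('FOO BAR').any? { |t| indexed_terms.include?(t) }

puts indexed_terms.inspect
puts term_hit
puts match_hit
```

With a not_analyzed field the stored term would be the original value itself, so the term lookup would succeed.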
The dot operator can be used to access nested data.
You can try something like this:
curl -XGET 'http://localhost:port/INDEX/TYPE/_search?pretty=1' -d '{
"query": {
"match": {
"resource_files.resource_file_category.name": "VALUE"
}
}
}'
If resource_file_category.name is mapped as not_analyzed, the value is not tokenized and is stored as a single term, so you only get exact matches.
You can also use elasticsearch-head plugin for data validation and also query building reference.
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/modules-plugins.html or
https://mobz.github.io/elasticsearch-head/

Resources