Elasticsearch returns fewer results than SQL? - ruby-on-rails

The data structure is a Post that has_many post_texts. Following the great example at https://github.com/elasticsearch/elasticsearch-rails/blob/master/elasticsearch-model/examples/activerecord_associations.rb, I have defined the mapping as follows:
include SearchableModule
mapping do
indexes :country
indexes :post_texts do
indexes :subject, type: 'string', boost: 10, analyzer: 'snowball'
indexes :description, type: 'string', analyzer: 'snowball'
end
end
And of course, in searchable_module.rb I just copied what's in the example, with some changes to as_indexed_json():
def as_indexed_json(options={})
self.as_json(
include: { post_texts: { only: [:subject, :description]}
})
end
And things seem OK. I have re-imported the data:
Post.import
Post.__elasticsearch__.
Then I compared the results of SQL's LIKE and Elasticsearch:
SQL LIKE:
PostText.where("subject LIKE '%Testing%' OR description LIKE '%Testing%'").each do |r|
puts r.post_id
end
There are 12 unique post_ids with this approach.
Elasticsearch:
Post.search("Testing").results.count
=> 10
Is there anything I have missed? Thank you!!!!

You could try Post.search("Testing").total, which should return the total number of matching results. With results.count you only count the records actually returned, which are presumably limited by per_page (Elasticsearch returns 10 hits per request by default).
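The difference between the two numbers can be illustrated without a running cluster. The following is a minimal sketch that fakes a search response as a plain Ruby hash; the helper name fake_search_response and the constant DEFAULT_PAGE_SIZE are made up for illustration, and only the 12 and 10 come from the question.

```ruby
# Simulated search response, shaped like the JSON Elasticsearch returns.
# Assumption: 12 documents match, and the page size is the default of 10.
DEFAULT_PAGE_SIZE = 10

# Hypothetical helper: builds a response with `total_hits` matches,
# of which at most `size` are actually returned on this page.
def fake_search_response(total_hits, size = DEFAULT_PAGE_SIZE)
  returned = [total_hits, size].min
  {
    'hits' => {
      'total' => total_hits,
      'hits'  => Array.new(returned) { |i| { '_id' => (i + 1).to_s } }
    }
  }
end

response = fake_search_response(12)
puts response['hits']['hits'].count  # hits on the current page => 10
puts response['hits']['total']       # all matching documents   => 12
```

Counting the returned hits reflects the page size, while the total reflects every match, which is why the two approaches in the question disagree.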

Related

Don't know how to sort elasticsearch-model results

I'm using elasticsearch-model on my RoR application to perform a search and have the results sorted.
I can run the query and get back unsorted results, but when I add a sort everything breaks with:
Elasticsearch::Transport::Transport::Errors::BadRequest: [400] {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"profiles","node":"mad6gavaR3yTFabsF9m0rg","reason":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}}]},"status":400}
from /Users/ngw/.rvm/gems/ruby-2.2.2@utelier/gems/elasticsearch-transport-5.0.4/lib/elasticsearch/transport/transport/base.rb:202:in `__raise_transport_error'
which apparently is telling me that the way I configured the indexes is wrong.
Here is what I'm indexing
def as_indexed_json(options={})
{
profile_type: profile_type,
name: name,
specialisation: specialisation,
description: description,
tags: tags,
minimum_order: minimum_order,
company_city: company_city,
company_address: company_address,
continent_id: country.try(:continent).try(:id),
country_id: country.try(:id),
industry: industry.try(:id)
}
end
A query can use any of these fields, but not :name, which is only used for sorting purposes.
The configuration of my index is very simple:
settings index: { number_of_shards: 1 } do
mapping dynamic: false do
indexes :name, type: 'text'
indexes :description, analyzer: 'english'
end
end
I'm pretty sure my indexes are set up wrong, but after searching for some time through the elasticsearch-model tests I can't find anything relevant.
Can someone help me figure this out? Thanks in advance.
Sure, the problem is that the type of name is text.
As of Elasticsearch 5 you can't sort on an analyzed field by default.
Sorting can only be done on fields with fielddata or doc_values enabled; these are the data structures Elasticsearch uses for sorting and aggregations.
doc_values can't be enabled on analyzed string fields.
And fielddata is disabled by default on analyzed string fields.
You can do one of two things:
Either change the mapping of name to keyword, which is not analyzed and has doc_values enabled.
Or enable fielddata on the name field by setting "fielddata": true.
Here are links covering this:
https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/doc-values.html
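The first option (mapping name as keyword) could look roughly like the following. This is a minimal sketch that builds the mapping and the sort clause as plain Ruby hashes and prints them as JSON, so it runs without Elasticsearch. The two fields come from the question; the constant names PROFILE_MAPPING and SORT_BY_NAME are made up for illustration.

```ruby
require 'json'

# Assumption: only the two fields mapped in the question are needed.
# In the elasticsearch-model DSL the first entry would correspond to
#   indexes :name, type: 'keyword'
PROFILE_MAPPING = {
  dynamic: false,
  properties: {
    name:        { type: 'keyword' },                 # not analyzed, doc_values on
    description: { type: 'text', analyzer: 'english' } # analyzed, full-text search
  }
}.freeze

# With a keyword field, sorting refers to the field directly.
SORT_BY_NAME = [{ name: { order: 'asc' } }].freeze

puts JSON.pretty_generate(PROFILE_MAPPING)
puts JSON.generate(sort: SORT_BY_NAME)
```

The trade-off is that a keyword field is matched as a whole value, which fits the question because name is only used for sorting there.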
You can either use multi-fields:
In your index mapping:
settings index: { number_of_shards: 1 } do
mapping dynamic: false do
indexes :name, type: 'text', fields: { keyword: { type: :keyword } }
indexes :description, analyzer: 'english'
end
end
In your search:
Model.search(
query: ...,
sort: {
'name.keyword': { order: 'asc' }
}
)
Or just set fielddata to true (warning: not recommended, since it uses a lot more memory):
settings index: { number_of_shards: 1 } do
mapping dynamic: false do
indexes :name, type: 'text', fielddata: true
indexes :description, analyzer: 'english'
end
end
Take a look at these links:
https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html#fielddata-mapping-param
https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html

Return specific fields for elasticsearch in rails

This seems like a really easy issue, but nothing I've tried from other solutions and websites works. I have three fields I do not want indexed or queried (:p_s, :gender, and :part_of_speech), but Elasticsearch still returns values from those fields even though I don't specify that they should be indexed or queried. About halfway down, this article says to disable indexing for such fields, but it doesn't indicate where to do that.
Term Controller:
def search
@terms = Term.search(params[:query]).page(params[:page])
end
Model:
require 'elasticsearch/model'
class Term < ActiveRecord::Base
include Elasticsearch::Model
include Elasticsearch::Model::Callbacks
settings index: { number_of_shards: 1, number_of_replicas: 0 } do
mappings dynamic: 'false' do
indexes :id, index: :not_analyzed
indexes :name, analyzer: :spanish_analyzer
indexes :definition, analyzer: :combined_analyzer
indexes :etymology1, analyzer: :combined_analyzer
indexes :etymology2, analyzer: :combined_analyzer
indexes :uses, analyzer: :combined_analyzer
indexes :notes1, analyzer: :combined_analyzer
indexes :notes2, analyzer: :combined_analyzer
end
end
def self.search(query)
__elasticsearch__.search(
{
query: {
multi_match: {
query: query,
fields: ['name^7', 'definition^6', 'etymology1^5', 'etymology2^4', 'uses^3', 'notes1^2', 'notes2^1'],
operator: 'and'
}
}
}
)
end
end
# Delete the previous term index in Elasticsearch
Term.__elasticsearch__.client.indices.delete index: Term.index_name rescue nil
# Create the new index with the new mapping
Term.__elasticsearch__.client.indices.create \
index: Term.index_name,
body: { settings: Term.settings.to_hash, mappings: Term.mappings.to_hash }
# Index all term records from the DB to Elasticsearch
Term.import(force: true)
To mark a field as non-indexed use this:
mappings dynamic: 'false' do
...
indexes :p_s, index: :no
indexes :gender, index: :no
indexes :part_of_speech, index: :no
...
end
By default Elasticsearch returns all document fields under the "_source" key. To get only specific fields, you can either specify a fields array at the top level of the query, like this:
def self.search(query)
__elasticsearch__.search(
{
query: {
multi_match: {
query: query,
fields: ['name^7', 'definition^6', 'etymology1^5', 'etymology2^4', 'uses^3', 'notes1^2', 'notes2^1'],
operator: 'and'
}
},
fields: ['name', 'definition', 'etymology1', 'etymology2', 'uses', 'notes1', 'notes2']
}
)
end
or filter "_source":
def self.search(query)
__elasticsearch__.search(
{
query: {
multi_match: {
query: query,
fields: ['name^7', 'definition^6', 'etymology1^5', 'etymology2^4', 'uses^3', 'notes1^2', 'notes2^1'],
operator: 'and'
}
},
'_source': ['name', 'definition', 'etymology1', 'etymology2', 'uses', 'notes1', 'notes2']
}
)
end
See Elasticsearch source filtering docs for more.
When using multi_match clause, the inner fields element specifies the fields to run the search on and, optionally, the boost like in your example. The outer fields or '_source' clause in turn determines which fields to return and this is the one you're after.
To get better visibility into what's going on while debugging Elasticsearch queries, use a tool like Sense. Once you get the result you want, it may be much easier to transfer the query to Ruby code than vice versa.
I think using the built-in Elasticsearch options makes a lot of sense. However, in my own case I did something like this in my model, adapted here for your case:
def as_indexed_json(options = {})
as_json(only: [:id, :name, :definition, :etymology1, :etymology2, :uses, :notes1, :notes2])
end
This should work because elasticsearch-model calls the as_indexed_json method in your model to get the data it needs to index.
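The effect of whitelisting fields at indexing time can be shown with a plain-Ruby stand-in for the model. This is a minimal sketch, assuming no Rails or Elasticsearch is loaded: the Struct replaces the ActiveRecord class, Hash#slice replaces as_json(only: ...), and the shortened field list and the INDEXED_FIELDS constant are made up for illustration.

```ruby
# Fields that should end up in the indexed document (assumed subset).
INDEXED_FIELDS = %i[id name definition].freeze

# Stand-in for the ActiveRecord model; requires Ruby 2.5+ for keyword_init.
Term = Struct.new(:id, :name, :definition, :gender, :part_of_speech, keyword_init: true) do
  # Returns only the whitelisted attributes, so :gender and
  # :part_of_speech never reach the index (and hence never reach _source).
  def as_indexed_json(_options = {})
    to_h.slice(*INDEXED_FIELDS)
  end
end

term = Term.new(id: 1, name: 'casa', definition: 'house',
                gender: 'f', part_of_speech: 'noun')
p term.as_indexed_json
```

Fields left out here are not sent to Elasticsearch at all, so they cannot be searched or returned; if they are still needed in the results, filtering "_source" at query time is the gentler option.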

How to make fields on my model not searchable but they should still be available in the _source?

I am using the tire gem for ElasticSearch in Rails.
OK, so I have been battling with this the whole day and this is how far I have got. I would like to make some fields on my model not searchable, while keeping them available in the _source so I can use them for sorting the search results.
My mappings:
mapping do
indexes :created_at, :type => 'date', :index => :not_analyzed
indexes :vote_score, :type => 'integer', :index => :not_analyzed
indexes :title
indexes :description
indexes :tags
indexes :answers do
indexes :description
end
end
My to_indexed_json method:
def to_indexed_json
{
vote_score: vote_score,
created_at: created_at,
title: title,
description: description,
tags: tags,
answers: answers.map{|answer| answer.description}
}.to_json
end
My Search query:
def self.search(term='', order_by, page: 1)
tire.search(page: page, per_page: PAGE_SIZE, load: true) do
query { term.present? ? string(term) : all }
sort {
by case order_by
when LAST_POSTED then {created_at: 'desc'}
else {vote_score: 'desc', created_at: 'desc'}
end
}
end
end
The only issue I am battling with now is how to make the vote_score and created_at fields not searchable while still being able to sort on them when searching.
I tried indexes :created_at, :type => 'date', :index => :no, but that did not work.
If I understand you correctly, you are not specifying a field when you send your search query to Elasticsearch. This means it will be executed against the _all field. This is a "special" field that makes Elasticsearch a little easier to get started with. By default all fields are indexed twice, once in their own field, and once in the _all field. (You can even have different mappings/analyzers applied to these two indexings.)
I think setting the field's mapping to "include_in_all": "false" should work for you (remove the "index": "no" part). Now the field will be tokenized (and you can search with it) under its field name, but when directing a search at the _all field it won't affect results (as none of its tokens are stored in the _all field).
Have a read of the ES docs on mappings and scroll down to the parameters for each type.
Good luck!
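The include_in_all suggestion could be sketched as follows. This is only an illustration under stated assumptions: it applies to the older Elasticsearch versions that still have an _all field, the mapping is written as a plain Ruby hash rather than through the Tire DSL so that it runs on its own, and the names SORT_ONLY and QUESTION_MAPPING are made up.

```ruby
require 'json'

# Fields used only for sorting stay indexed under their own name
# but are kept out of the catch-all _all field.
SORT_ONLY = { include_in_all: false }.freeze

# Assumed mapping, based on the fields listed in the question.
QUESTION_MAPPING = {
  properties: {
    created_at:  { type: 'date' }.merge(SORT_ONLY),
    vote_score:  { type: 'integer' }.merge(SORT_ONLY),
    title:       { type: 'string' },
    description: { type: 'string' },
    tags:        { type: 'string' }
  }
}.freeze

puts JSON.pretty_generate(QUESTION_MAPPING)
```

A query that names no field would then only see title, description and tags through _all, while sorting on created_at and vote_score keeps working because those fields are still indexed individually.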
I ended up going with the approach of only matching on the fields I want and that worked. This matches on multiple fields.
tire.search(page: page, per_page: PAGE_SIZE, load: true) do
query { term.present? ? (match [:title, :description, :tags, :answers], term) : all }
sort {
by case order_by
when LAST_POSTED then {created_at: 'desc'}
else {vote_score: 'desc', created_at: 'desc'}
end
}
end

Elastic Search nested

I'm using Elasticsearch through the tire gem.
Given this structure to index my resource model:
mapping do
indexes :_id
indexes :version, analyzer: 'snowball', boost: 100
indexes :resource_files do
indexes :_id
indexes :name, analyzer: 'snowball', boost: 100
indexes :resource_file_category do
indexes :_id
indexes :name, analyzer: 'snowball', boost: 100
end
end
end
How can I retrieve all the resources that have resource_files with a given resource_file_category id?
I've looked in the Elasticsearch docs and I think I could use the has_child filter:
http://www.elasticsearch.org/guide/reference/query-dsl/has-child-filter.html
I've tried it this way:
filter :has_child, :type => 'resource_files', :query => {:filter => {:has_child => {:type => 'resource_file_category', :query => {:filter => {:term => {'_id' => params[:resource_file_category_id]}}}}}}
but I'm not sure whether it is possible/valid to make a "nested has_child filter", or whether there is a better/simpler way to do this... any advice is welcome ;)
I'm afraid I don't know what your mapping definition means. It'd be easier to read if you just posted the output of:
curl -XGET 'http://127.0.0.1:9200/YOUR_INDEX/_mapping?pretty=1'
But you probably want something like this:
curl -XGET 'http://127.0.0.1:9200/YOUR_INDEX/YOUR_TYPE/_search?pretty=1' -d '
{
"query" : {
"term" : {
"resource_files.resource_file_category._id" : "YOUR VALUE"
}
}
}
'
Note: The _id fields should probably be mapped as {"index": "not_analyzed"} so that they don't get analyzed, but instead store the exact value. Otherwise if you do a term query for 'FOO BAR' the doc won't be found, because the actual terms that are stored are: ['foo','bar']
Note: The has_child query is used to search for parent docs that have child docs (i.e. docs which specify a parent type and ID) matching certain search criteria.
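Since the question uses Ruby, the same term query can also be written as a plain hash. This is a minimal sketch: the helper name category_query is hypothetical, the dotted path follows the mapping shown in the question, and the code only builds and prints the request body rather than sending it anywhere.

```ruby
require 'json'

# Hypothetical helper: builds a term query on the nested category id,
# addressing the inner object with a dotted field path.
def category_query(category_id)
  {
    query: {
      term: { 'resource_files.resource_file_category._id' => category_id }
    }
  }
end

puts JSON.generate(category_query('42'))
# => {"query":{"term":{"resource_files.resource_file_category._id":"42"}}}
```

The resulting JSON could be passed as the body of a search request; whether the term matches still depends on the _id fields being mapped as not_analyzed, as noted above.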
The dot operator can be used to access nested data.
You can try something like this:
curl -XGET 'http://localhost:port/INDEX/TYPE/_search?pretty=1' -d '
{
"query": {
"match": {
"resource_files.resource_file_category.name": "VALUE"
}
}
}'
If resource_file_category is not_analyzed, the value is not tokenized and is stored as a single value, hence giving you an exact match.
You can also use the elasticsearch-head plugin to validate your data and as a query-building reference.
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/modules-plugins.html or
https://mobz.github.io/elasticsearch-head/

Tire/Elasticsearch search for association

I have the following code and I'm trying to use Elasticsearch to query it.
It works when I do Book.search(:q=>'Foo') but it doesn't work when I do Book.search(:author=>'Doctor'). In my database I have an entry with a name like "Barz, Foo, Doctor".
I'm not sure whether I should use terms or term in my query, because I'm breaking up the name using snowball. With terms I get an error; with term I get no results.
class Author < ActiveRecord::Base
has_many :books
end
class Book < ActiveRecord::Base
belongs_to :author
include Tire::Model::Search
include Tire::Model::Callbacks
mapping do
indexes :title
indexes :description
indexes :author, type: 'object', properties: {
name: { type: 'multi_field',
fields: { name: { type: 'string', analyzer: 'snowball' },
exact: { type: 'string', index: 'not_analyzed' }
}
} }
end
def to_indexed_json
to_json(:include=>{:author=>{:only=>[:name]}} )
end
def self.search(params = {})
tire.search(load:true) do
query do
boolean do
should { string params[:q] } if params[:q].present?
should { term params[:author] } if params[:author].present?
end
end
filter :term, :active=>true
end
end
end
You can do it like this:
should { terms :author, [params[:author]]} if params[:author].present?
OR
should { term :author, params[:author]} if params[:author].present?
OR
should { string "author:#{params[:author]}"} if params[:author].present?
As @Karmi stated:
Hi, yeah, your approach seems fine. A couple of things:
* unless you want to use Lucene query syntax (boosting, ranges, etc.), it's probably best to use the text query,
* yes, filters are more performant than queries, and the active=true in your example is a good fit for filters. Beware of the interplay between queries, filters and facets, though.
Your definition of the term query is incorrect, though -- it should be:
term :author, params[:author]
