Elasticsearch:Tire - If field is missing, put it last - ruby-on-rails

I am using Rails, with Tire and Elasticsearch for search. I have a string-type field which has a value in some records and is nil in others.
I'd like to sort so that all records with a null value in this field are shown last. As I see in this issue https://github.com/elasticsearch/elasticsearch/issues/896, in the current version this isn't possible through sorting in Elasticsearch.
Is there a workaround in Rails? I am trying to do it with two searches, using filters like the following example:
filter :not, :missing => { :field => :video_url } if params[:video].present?
filter :missing, { :field => :video_url } if params[:video].blank?
But it didn't work (I still can't figure out why; I'll keep debugging).
Another idea is to create two methods with the specific fields. Any other solution/idea?
Update 2/2/2013
I finally did it like the following:
if video == "yes"
filter :not, :missing => { :field => :video_url }
elsif video == "no"
filter :missing, { :field => :video_url }
end
And I am passing the video parameter in myself. I am sorting and boosting the search, but additionally I want all objects that don't have a video_url field to appear at the bottom, no matter how relevant they are. I don't actually need to sort by this field, just to show the nil-value records last.
So to solve this I call the search twice, and with the addition of the code above it works like a charm.
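Roughly, the two calls end up combined like this (the model and variable names here are just placeholders):

with_video    = Ad.search(params, "yes")
without_video = Ad.search(params, "no")
@ads          = with_video.to_a + without_video.to_a  # records without video_url come last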
Just for completeness, my search method is the following:
def self.search(params, video = nil)
  tire.search do
    query do
      boolean do
        must { string params[:query], default_operator: "AND" } if params[:query].present?
        must { term :active, true }
      end
    end
    sort { by :update_ad => "desc" } unless params[:query].present?
    facet "categories" do
      terms :category_id
    end
    if video == "yes"
      filter :not, :missing => { :field => :video_url }
    elsif video == "no"
      filter :missing, { :field => :video_url }
    end
  end
end
If you don't pass the video param, it won't apply any filter. In my mapping, I have set the boost, analyzers etc.
Thank you

First, the Elasticsearch issue you're linking to is still open and is only a feature suggestion.
Second, just as a note, are you really sure you want to sort as opposed to boost the score of certain records?
Third, if you indeed do want to sort on this field, the easiest way is to just index the field with some value which comes last ("ZZZ", weird Unicode chars, you get the picture). You probably don't want to do this by default, so it's a good idea to use the multi_field feature. Of course, you have to reindex your corpus to pick up the new settings.
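For illustration, a rough sketch of that idea with Tire could look like the following (the sortable sub-field, the sentinel string and the to_indexed_json override are assumptions for the sketch, not part of the original setup):

mapping do
  indexes :video_url, :type => 'multi_field', :fields => {
    :video_url => { :type => 'string' },                          # searched as before
    :sortable  => { :type => 'string', :index => 'not_analyzed' } # used only for sorting
  }
end

def to_indexed_json
  {
    :active    => active,
    :video_url => video_url.presence || 'zzzz_no_video'  # sentinel that sorts after real URLs
  }.to_json
end

# and in the search block:
sort { by 'video_url.sortable', 'asc' }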
Lastly, it is possible to sort by a script (see documentation), but it has the usual and obvious performance impact.
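For reference, such a script sort could look roughly like the following raw sort clause (a sketch only, not the Tire DSL; it relies on the old doc['field'].empty accessor in the scripting language):

:sort => [
  {
    :_script => {
      :script => "doc['video_url'].empty ? 1 : 0",  # records missing video_url get 1 and sort last
      :type   => "number",
      :order  => "asc"
    }
  }
]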

Related

pg_search negation: how to negate a string with whitespaces?

I'm using pg_search and trying to implement the negation option.
My search form contains a bunch of options, among them selecting keywords and selecting words that must not appear among the keywords. I get all search keywords and all negated keywords and format them for the pg_search method. This is how:
def self.find_words(words, not_words)
  if words.present? || not_words.present?
    a_not_words = not_words.gsub(/,/, "").split(" ").each { |l| l.prepend("!") }
    a_words = words.gsub(/,/, "").split(" ")
    a_all = a_not_words + a_words
    my_words = a_all.join(" ")
    pg_search_keywords(my_words)
  else
    order("id DESC")
  end
end
This works well with simple words. For example, if someone wants all results that contain "kitchen" and "drawer" but not "spoon", this is sent to the pg_search method: "kitchen drawer !spoon"
However, many of my keywords contain white spaces and stop words ("de" and "en" in French).
So if a person looks for a "rouleau de cuisine" (rolling pin) but doesn't want "assiette de table" (table dish), then this gets sent to the pg_search method: "assiette de table !rouleau !de !cuisine".
This creates 3 problems:
1/ the word "de" is within the search terms and within the negated terms
2/ this looks for the keyword "table", which doesn't exist - only "assiette de table" exists
3/ I can't remove the stop word "de" because then it will start looking for and negating terms that don't exist.
UPDATE
This is how I implement pg_search_keywords:
pg_search_scope :pg_search_keywords,
                :against => :summery,
                :associated_against => {
                  :keywords => [:keyword]
                },
                :ignoring => :accents,
                :using => {
                  :tsearch => {
                    :negation => true,
                    :dictionary => 'french'
                  }
                }
Thanks!

What is a good way to `update_or_initialize_with` in Mongoid?

Each user has one address.
class User
  include Mongoid::Document
  has_one :address
end

class Address
  include Mongoid::Document
  belongs_to :user
  field :street_name, type: String
end
u = User.find(...)
u.address.update(street_name: 'Main St')
If we have a User without an Address, this will fail.
So, is there a good (built-in) way to do u.address.update_or_initialize_with?
Mongoid 5
I am not familiar with Ruby, but I think I understand the problem. Your schema might look like this.
user = {
  _id: user1234,
  address: address789
}

address = {
  _id: address789,
  street_name: "",
  user: user1234
}

// in mongodb (javascript), you can get/update the address of a user this way
u = User.find({_id: user1234})
u.address // address789
db.address.update({_id: u.address}, {street_name: "new_street name"})

// but since the address has not been created, the variable u does not even have the property address.
u.address = undefined
Perhaps you can try to just create it and attach it manually like this:
#create an address document, to get _id of this address
address = address.insert({street_name: "something"});
#link or attached it to u.address
u.update({address: address._id})
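In Mongoid terms, a minimal sketch of the same idea (build_address is the builder Mongoid generates for the has_one association in the question; the variable names are placeholders):

u = User.find(user_id)
address = u.address || u.build_address  # reuse the existing address or initialize a new one
address.update(street_name: 'Main St')  # saves the newly built document as well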
I had this problem recently. There is a built-in way, but it differs from ActiveRecord's #find_or_initialize_by or #find_or_create_by methods.
In my case, I needed to bulk insert records and update or create if not found, but I believe the same technique can be used even if you are not bulk inserting.
# returns an array of query hashes:
def update_command(users)
  updates = []
  users.each do |user|
    updates << { 'q'      => { 'user_id' => user._id },
                 'u'      => { 'address' => 'address' },
                 'multi'  => false,
                 'upsert' => true }
  end
  { update: Address.collection_name.to_s, updates: updates, ordered: false }
end
def bulk_update(users)
  client = Mongoid.default_client
  command = update_command(users)
  client.command command
  client.close
end
Since you're not bulk updating, and assuming you have a foreign key field called user_id in your Address collection, you might be able to do:
Address.collection.update({ 'q'      => { 'user_id' => user._id },
                            'u'      => { 'address' => 'address' },
                            'multi'  => false,
                            'upsert' => true })
which will match against the user_id, update the given fields when found (address in this case) or create a new one when not found.
For this to work, there is one last crucial step though.
You must add an index to your Address collection with a special flag. The field you are querying on (user_id in this case) must be indexed with a flag of either { unique: true } or { sparse: true }. The unique flag will raise an error if you have 2 or more nil user_id fields; the sparse option won't. Use that if you think you may have nil values.
Access your Mongo database through the terminal:
show dbs
use your_db_name
Check if the addresses collection already has the index you are looking for:
db.addresses.getIndexes()
If it already has an index on user_id, you may want to remove it:
db.addresses.dropIndex( { user_id: 1} )
and create it again with the following flag:
db.addresses.createIndex( { user_id: 1}, { sparse: true } )
https://docs.mongodb.com/manual/reference/method/db.collection.update/
EDIT #1
There seem to have been changes in Mongoid 5: instead of User.collection.update you can use User.collection.update_one.
https://docs.mongodb.com/manual/reference/method/db.collection.updateOne/
The docs show you need a filter rather than a query as the first argument, but they seem to be the same.
Address.collection.update_one({ user_id: user_id },
                              { '$set' => { 'address' => 'the_address' } },
                              upsert: true)
PS:
If you only write { "address": 'the_address' } as your update clause without including an update operator such as $set, the whole document will get overwritten rather than updating just the address field.
EDIT #2
About why you may want to index with unique or sparse
If you look at the upsert section in the link below, you will see:
To avoid multiple upserts, ensure that the filter fields are uniquely
indexed.
https://docs.mongodb.com/manual/reference/method/db.collection.updateOne/

How to store big JSON hashes in Rails

I'm using the elasticsearch-rails gem and the elasticsearch-model gem and writing a query that happens to be really huge just because of the way the gem accepts queries.
The query itself isn't very long, but it's the filters that are very, very long, and I need to pass variables in to filter out the results correctly. Here is an example:
def search_for(input, question_id, tag_id)
  query = {
    :query => {
      :filtered => {
        :query => {
          :match => {
            :content => input
          }
        },
        :filter => {
          :bool => {
            :must => [
              {
                # another nested bool with should
              },
              {
                # another nested bool with must for question_id
              },
              {
                # another nested bool with must for tag_id
              }
            ]
          }
        }
      }
    }
  }
  User.search(query) # provided by elasticsearch-model gem
end
For brevity's sake, I've omitted the other nested bools, but as you can imagine, this can get quite long quite fast.
Does anyone have any ideas on how to store this? I was thinking of a yml file, but it seems wrong especially because I need to pass in question_id and tag_id. Any other ideas?
If anyone is familiar with those gems and knows whether the gem's search method accepts other formats, I'd like to know that, too. It looks to me like it just wants something that can be turned into a hash.
I think using a method is fine. I would separate the searching from the query:
def query_for(input, question_id, tag_id)
  query = {
    :query => {
      ...
end

search query_for(input, question_id, tag_id)
Also, I see that this search functionality is in the User model, but I wonder whether it belongs there. Would it make more sense to have a Search or Query model?
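As an illustration of that last suggestion, a bare-bones query object could look something like this (the class name and the exact clauses are a sketch, not something the gems provide):

class UserSearchQuery
  def initialize(input, question_id, tag_id)
    @input       = input
    @question_id = question_id
    @tag_id      = tag_id
  end

  # The hash that User.search (elasticsearch-model) accepts.
  def to_hash
    {
      :query => {
        :filtered => {
          :query  => { :match => { :content => @input } },
          :filter => { :bool => { :must => must_clauses } }
        }
      }
    }
  end

  private

  # The nested bool clauses from the question would be built here.
  def must_clauses
    [
      { :term => { :question_id => @question_id } },
      { :term => { :tag_id => @tag_id } }
    ]
  end
end

User.search(UserSearchQuery.new(input, question_id, tag_id).to_hash)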

re-tire elastic search multi table/index search

I'm trying to figure out what would be the best way to do a multi-table search with elastic.co.
In particular, I was wondering if I could add more indexes to this search method.
Chapter.rb
def self.search(params)
  fields = [:title, :description, :content]
  tables = [Chapter.index_name, Book.index_name]
  tire.search(tables, { load: true, page: params[:page], per_page: 5 }) do
    query do
      boolean do
        must { string params[:q], default_operator: "AND" } if params[:q].present?
      end
    end
    highlight *fields, :options => { :tag => '<strong>' }
  end
end
The above example works without the tables argument. How can I make it work with the tables?
If you're adding more indexes then you are moving away from it being a model-centric search. That's probably fine as I guess you'll be handling the search results differently on account of them being from different indexes.
In which case I think you can do:
Tire.search([Chapter.index_name, Book.index_name],
            page: params[:page],
            ... etc ...
) do
  query do
    ... etc ...
  end
end
It does mean that you won't be able to do stuff like load: true because you've moved outside of knowing what model to load the results for.
From digging around in the code (here) it looks like you might be able to specify multiple indexes even for a model-centric search. Something like:
tire.search({
  index: [Chapter.index_name, Book.index_name],
  load: true,
  ... etc ...
I haven't tried it though and I'm doubtful as to whether it will work - again because of not being able to load the results into a specific model once multiple indexes are involved.
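If load: true indeed breaks down, one workaround is to skip it and dispatch on the hit metadata yourself; a rough sketch, assuming Tire exposes _type on each result item:

results = Tire.search([Chapter.index_name, Book.index_name]) do
  query { string params[:q], default_operator: "AND" } if params[:q].present?
end.results

results.each do |hit|
  case hit._type
  when "chapter" then # handle a Chapter hit
  when "book"    then # handle a Book hit
  end
end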

Invalid results when searching emails using elasticsearch with Tire and Ruby on Rails

I'm trying to index and search by email using Tire and Elasticsearch.
The problem is that if I search for "something@example.com", I get strange results because of the @ and . symbols. I "solved" it by hacking the query string and adding "email:" before any string I suspect is an email address. If I don't do that, when searching for "something@example.com" I get results like "something@gmail.com" or "asd@example.com".
include Tire::Model::Search
include Tire::Model::Callbacks
settings :analysis => {
  :analyzer => {
    :whole_email => {
      'tokenizer' => 'uax_url_email'
    }
  }
} do
  mapping do
    indexes :id
    indexes :email, :analyzer => 'whole_email', :boost => 10
  end
end
def self.search(params)
  params[:query] = params[:query].split(" ").map { |x| x =~ EMAIL_REGEXP ? "email:#{x}" : x }.join(" ")
  tire.search(load: { :include => { 'event' => 'organizer' } }, page: params[:page], per_page: params[:per_page] || 10) do
    query do
      boolean do
        must { string params[:query] } if params[:query].present?
        must { term :event_id, params[:event_id] } if params[:event_id].present?
      end
    end
    sort do
      by :id, 'desc'
    end
  end
end

def to_indexed_json
  self.to_json
end
When searching with "email:" the analyzer works perfectly, but without it the string is searched in email without the specified analyzer, returning lots of undesired results.
I think your issue is to do with the _all field. By default, all fields get indexed twice, once under their field name, and again, using a different analyzer, in the _all field.
If you send a query without specifying which field you are searching in, then it will be executed against the _all field. When you index your doc, the email field's content is indexed again under the _all field (to stop this, set include_in_all: false in your mapping), where it is tokenized the standard way (split on @ and .). This means that unguided queries will give strange results.
The way I would fix this is to use a term query for the emails and make sure to specify the field to search on. A term query is faster as it doesn't have the query-parsing step that the query_string query has (which is why, when you prefix the string with "email:", it goes to the right field: that's the query parser working). Also, you don't need to specify a custom analyzer unless you are indexing a field that contains both free text and URLs and emails. If the field only contains emails, then just set index: not_analyzed and it will remain a single token. (You might want a custom analyzer that lowercases the email, though.)
Make your search query like this:
"term": {
"email": "example#domain.com"
}
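Translated back into the Tire DSL from the question, that could look roughly like this (the not_analyzed mapping with include_in_all: false and the downcasing are assumptions based on the advice above, not tested code):

mapping do
  indexes :email, :index => 'not_analyzed', :include_in_all => false
end

def self.search(params)
  tire.search(page: params[:page], per_page: params[:per_page] || 10) do
    query do
      boolean do
        must { term :email, params[:query].downcase } if params[:query].present?
        must { term :event_id, params[:event_id] }    if params[:event_id].present?
      end
    end
  end
end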
Good luck!
Add the field to _all and try searching with an escape character (\) added before the special characters of the email address.
Example: something\@example\.com
