pg_search negation: how to negate a string with whitespaces? - ruby-on-rails

I'm using pg_search and trying to implement the negation option.
My search form has several options, including keywords to search for and keywords that must not appear in the results. I collect the search keywords and the negated keywords and format them for the pg_search method. This is how:
def self.find_words(words, not_words)
  if words.present? || not_words.present?
    # to_s guards against nil when only one of the two fields is filled in
    a_not_words = not_words.to_s.gsub(/,/, "").split(" ").map { |l| "!#{l}" }
    a_words = words.to_s.gsub(/,/, "").split(" ")
    a_all = a_not_words + a_words
    my_words = a_all.join(" ")
    pg_search_keywords(my_words)
  else
    order("id DESC")
  end
end
This works well with simple words. For example, if someone wants all results that contain "kitchen" and "drawer" but not "spoon", this is sent to the pg_search method: "kitchen drawer !spoon"
However, many of my keywords contain whitespace and stop words ("de" and "en" in French).
So if a person looks for "assiette de table" (table dish) but doesn't want "rouleau de cuisine" (rolling pin), this gets sent to the pg_search method: "assiette de table !rouleau !de !cuisine".
This creates 3 problems:
1/ the word "de" is within the search terms and within the negated terms
2/ this looks for the keyword "table", which doesn't exist - only "assiette de table" exists
3/ I can't remove the stop word "de" because then it will start looking for and negating terms that don't exist.
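To make the problem concrete, here is a minimal plain-Ruby sketch that reproduces the per-word splitting and contrasts it with splitting on commas only. Both helper names are hypothetical, and the second helper assumes the form submits keywords as a comma-separated list, which the question does not state. It only shows how to keep multi-word keywords intact; how to hand such whole keywords to pg_search is left open.

```ruby
# Hypothetical helper: mirrors the per-word splitting used in find_words.
def per_word_query(words, not_words)
  positive = words.to_s.delete(",").split
  negative = not_words.to_s.delete(",").split.map { |w| "!#{w}" }
  (positive + negative).join(" ")
end

# Hypothetical helper: assumes keywords arrive comma-separated and keeps
# each multi-word keyword together instead of splitting on whitespace.
def split_keywords(list)
  list.to_s.split(",").map(&:strip).reject(&:empty?)
end

# Every word, including the stop word "de", becomes its own term.
puts per_word_query("assiette de table", "rouleau de cuisine")

# Each keyword stays whole.
p split_keywords("assiette de table, rouleau de cuisine")
```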
UPDATE
This is how I implement pg_search_keywords:
pg_search_scope :pg_search_keywords,
  :against => :summery,
  :associated_against => {
    :keywords => [:keyword]
  },
  :ignoring => :accents,
  :using => {
    :tsearch => {
      :negation => true,
      :dictionary => 'french'
    }
  }
Thanks!

Related

Invalid results when searching emails using elasticsearch with Tire and Ruby on Rails

I'm trying to index and search by email using Tire and elasticsearch.
The problem is that if I search for "something@example.com", I get strange results because of the @ and . symbols. I "solved" it by hacking the query string and adding "email:" before any string I suspect is an email. If I don't do that, searching for "something@example.com" returns results such as "something@gmail.com" or "asd@example.com".
include Tire::Model::Search
include Tire::Model::Callbacks
settings :analysis => {
  :analyzer => {
    :whole_email => {
      'tokenizer' => 'uax_url_email'
    }
  }
} do
  mapping do
    indexes :id
    indexes :email, :analyzer => 'whole_email', :boost => 10
  end
end
def self.search(params)
  params[:query] = params[:query].split(" ").map { |x| x =~ EMAIL_REGEXP ? "email:#{x}" : x }.join(" ")
  tire.search(load: {:include => {'event' => 'organizer'}}, page: params[:page], per_page: params[:per_page] || 10) do
    query do
      boolean do
        must { string params[:query] } if params[:query].present?
        must { term :event_id, params[:event_id] } if params[:event_id].present?
      end
    end
    sort do
      by :id, 'desc'
    end
  end
end
def to_indexed_json
  self.to_json
end
When searching with "email:" the analyzer works perfectly, but without it the string is searched for in the email without the specified analyzer, which returns lots of undesired results.
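The query-string rewrite described above can be tried in isolation. This is a minimal sketch: the question does not show EMAIL_REGEXP, so EMAIL_PATTERN below is a hypothetical and deliberately loose stand-in, and prefix_emails is a made-up helper name.

```ruby
# Hypothetical, loose pattern; the original EMAIL_REGEXP is not shown in the question.
EMAIL_PATTERN = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/

# Prefixes every token that looks like an email with "email:" so that the
# query_string parser routes it to the email field.
def prefix_emails(query)
  query.to_s.split(" ").map { |token| token =~ EMAIL_PATTERN ? "email:#{token}" : token }.join(" ")
end

puts prefix_emails("john something@example.com")
```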
I think your issue is to do with the _all field. By default, all fields get indexed twice: once under their field name, and again, using a different analyzer, in the _all field.
If you send a query without specifying which field you are searching in, it is executed against the _all field. When you index your doc, the email field's content is indexed again under the _all field (to stop this, set include_in_all: false in your mapping), where it is tokenized the standard way (split on @ and .). This means that unguided queries will give strange results.
The way I would fix this is to use a term query for the emails and make sure to specify the field to search on. A term query is faster because it skips the query parsing step that the query_string query has (that parser is why prefixing the string with "email:" sends it to the right field). Also, you don't need a custom analyzer unless you are indexing a field that contains free text as well as URLs and emails. If the field only contains emails, just set index: not_analyzed and it will remain a single token. (You might want a custom analyzer that lowercases the email, though.)
Make your search query like this:
"term": {
  "email": "example@domain.com"
}
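For illustration, the same term query can be built as a Ruby hash and serialized to JSON. This is only a sketch of the request body: email_term_query is a hypothetical helper, and sending the body to Elasticsearch (through Tire or any other client) is not shown.

```ruby
require "json"

# Hypothetical helper: wraps the term clause in a "query" key, which is the
# shape of an Elasticsearch search request body.
def email_term_query(email)
  { query: { term: { email: email.to_s.strip } } }
end

puts JSON.pretty_generate(email_term_query("example@domain.com"))
```

Because a term query is not analyzed, the value has to match the indexed token exactly, including case.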
Good luck!
Add the field to _all and try searching with an escape character (\) before the special characters in the email address.
Example: something\@example\.com

What is the Ruby on Rails-way to import a structured comma delimited file and then create records Activerecord

I have a structured comma delimited file that has two record types. The different records are differentiated by a header entry: H or P. The file format follows:
"H","USA","MD","20904"
"P","1","A","Female","W"
"P","2","A","Male","H"
I'd like to import the file and then create ActiveRecord models with the imported data. The approach that I am using is to create a field map that includes the number of fields, object name and columns:
$field_map = {
  'H' => {
    :count => 4,
    :object => :Header,
    :cols => [:record_type, :country_id, :state, :zip]
  },
  'P' => {
    :count => 4,
    :object => :RaceData,
    :cols => [:record_type, :household_size, :gender, :race]
  }
}
I then use FasterCSV to import the file and a case statement to decide how each row is transformed and then used in ActiveRecord create statements.
FasterCSV.foreach(filename) do |row|
  tbl_type = row[0]
  tbl_info = $field_map[tbl_type]
  unless tbl_info.nil?
    field_no = tbl_info[:count]
    object = tbl_info[:object]
    columns = tbl_info[:cols]
    # new_record is built from row and columns (that step is omitted in this summary)
    record_type = new_record[:record_type]
    case record_type
    when "H"
      factory_build_h_record(new_record)
    when "P"
      factory_build_p_record(new_record)
    end
  end
end
The code above is summarized due to space constraints. My approach works just fine, but I'm new to Ruby and I'm always interested in best practices and the "true" Ruby way of doing things. I'd be interested in hearing how more experienced programmers would tackle this problem. Thanks for your input.
I suggest the 'roo' gem.
There is example source code here, but I'd rather watch the 10-minute video.
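As a point of comparison, the field-map idea from the question can be sketched with only the standard library's CSV class. This is an illustrative sketch, not the asker's code: parse_records is a hypothetical helper, and because the "P" rows in the sample have five fields while the question names only four columns, the :field_3 name (and its position) is a placeholder assumption.

```ruby
require "csv"

# Column names per record type. The question does not name the fifth "P"
# column, so :field_3 is a placeholder, not the real meaning of that field.
FIELD_MAP = {
  "H" => [:record_type, :country_id, :state, :zip],
  "P" => [:record_type, :household_size, :field_3, :gender, :race]
}.freeze

# Turns each known, well-formed row into an attributes hash.
def parse_records(text)
  CSV.parse(text).filter_map do |row|
    columns = FIELD_MAP[row.first]
    next if columns.nil? || columns.size != row.size # skip unknown or malformed rows
    columns.zip(row).to_h
  end
end

data = <<~ROWS
  "H","USA","MD","20904"
  "P","1","A","Female","W"
  "P","2","A","Male","H"
ROWS

parse_records(data).each { |attrs| p attrs }
```

In a Rails app, each hash could then be passed to the matching model's create method, with the model chosen by record type.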

Elasticsearch:Tire - If field is missing, put it last

I am using Rails, and for search I am using Tire and elasticsearch. I have a string field which has a value in some records and is nil in others.
I'd like to sort so that all the records with a null value in this field are shown last. According to this issue, https://github.com/elasticsearch/elasticsearch/issues/896, this isn't possible in the current version through elasticsearch's sort.
Is there a workaround with rails? I am trying to do it using two searches and using filters like the following example:
filter :not, :missing => { :field => :video_url } if params[:video].present?
filter :missing, { :field => :video_url } if params[:video].blank?
But it didn't work (I can't understand why until now, I'll continue debugging).
Another idea is to create two methods with the specific fields. Any other solution/idea?
Update 2/2/2013
I finally did it like the following:
if video == "yes"
  filter :not, :missing => { :field => :video_url }
elsif video == "no"
  filter :missing, { :field => :video_url }
end
I pass the video parameter myself. I am sorting and boosting the search, but additionally I want all the objects that have no video_url to appear at the bottom, no matter how relevant they are. I don't actually need to sort by this field, just to show the records where it is nil last.
So to solve this I call the search twice, and with the addition of the code above, it works like a charm.
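The effect of running the search twice and showing the "has video" results before the "no video" results can be pictured in plain Ruby. This is only a sketch on in-memory hashes with made-up data; missing_last is a hypothetical helper and no Elasticsearch call is involved.

```ruby
# Hypothetical helper: keeps the incoming (relevance) order inside each group,
# but moves records whose field is nil to the end.
def missing_last(records, field)
  present, missing = records.partition { |record| record[field] }
  present + missing
end

# Made-up results, already ordered by relevance (best first).
results = [
  { id: 1, video_url: nil },
  { id: 2, video_url: "http://example.com/a" },
  { id: 3, video_url: nil },
  { id: 4, video_url: "http://example.com/b" }
]

p missing_last(results, :video_url).map { |record| record[:id] }
```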
Just for completeness, my search method is the following:
def self.search(params, video = nil)
  tire.search do
    query do
      boolean do
        must { string params[:query], default_operator: "AND" } if params[:query].present?
        must { term :active, true }
      end
    end
    sort { by :update_ad => "desc" } unless params[:query].present?
    facet "categories" do
      terms :category_id
    end
    if video == "yes"
      filter :not, :missing => { :field => :video_url }
    elsif video == "no"
      filter :missing, { :field => :video_url }
    end
  end
end
If you don't pass the video param, it won't apply any filter. In my mapping, I have set the boost, analyzers etc.
Thank you
First, the Elasticsearch issue you're linking to is still open and is only a feature suggestion.
Second, just as a note, are you really sure you want to sort as opposed to boost the score of certain records?
Third, if you indeed do want to sort on this field, the easiest way is to just index the field with some value which comes last ("ZZZ", weird Unicode chars, you get the picture). You probably don't want to do this by default, so it's a good idea to use the multi_field feature. Of course, you have to reindex your corpus to pick up the new settings.
Lastly, it is possible to sort by a script (see documentation), but it has the usual and obvious performance impact.
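The sentinel-value idea from the third point can be sketched in plain Ruby. The constant and helper names are hypothetical, and the sketch only shows how a blank value is replaced by something that compares after ordinary strings; wiring the result into a multi_field mapping and reindexing is not shown.

```ruby
# Hypothetical sentinel: a code point that compares after ordinary text.
SORTS_LAST = "\uFFFF"

# Value to index for sorting: the real URL, or the sentinel when it is blank.
def sortable_video_url(url)
  url.nil? || url.empty? ? SORTS_LAST : url
end

urls = ["b.mp4", nil, "a.mp4"]
p urls.sort_by { |url| sortable_video_url(url) }
```

Note that this only puts blank values last for an ascending sort; with a descending sort the sentinel would come first.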

PG full text search on rails using pg_search gem for substring

I am using PostgreSQL full text search. As I am using Ruby on Rails, I am using the pg_search gem. How do I configure it to also give a hit for a substring?
pg_search_scope :search_by_detail,
  :against => [
    [:first_name, 'A'],
    [:last_name, 'B'],
    [:email, 'C']
  ],
  :using => {
    :tsearch => {:prefix => true}
  }
Right now it gives a hit if the substring is at the start, but not if the substring is in the middle.
For example, it gives a hit for sdate@example.com but not for example.com
I'm the author and maintainer of pg_search.
Unfortunately, PostgreSQL's tsearch by default doesn't split up email addresses and allow you to match against parts. It might work if you turned on :trigram search, though, since it matches arbitrary sub-strings that appear anywhere in the searchable text.
pg_search_scope :search_by_detail,
  :against => [
    [:first_name, 'A'],
    [:last_name, 'B'],
    [:email, 'C']
  ],
  :using => {
    :tsearch => {:prefix => true},
    :trigram => {}
  }
I confirmed this by running the following command in psql:
grant=# SELECT plainto_tsquery('example.com') @@ to_tsvector('english', 'name@example.com');
 ?column?
----------
 f
(1 row)
I know that the parser does detect email addresses, so I think it must be possible. But it would involve building a text search dictionary in PostgreSQL that would properly split the email address up into tokens.
Here is evidence that the text search parser knows that it is an email address:
grant=# SELECT ts_debug('english', 'name@example.com');
                                  ts_debug
-----------------------------------------------------------------------------
 (email,"Email address",name@example.com,{simple},simple,{name@example.com})
(1 row)
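To see why trigram search can match a part of an email address where tsearch does not, here is a rough plain-Ruby approximation of trigram similarity. It assumes the behaviour described in the pg_trgm documentation (lowercase the text, split it on non-alphanumeric characters, pad each word with two leading spaces and one trailing space, then divide the shared trigrams by all distinct trigrams). The function names are made up, and the real extension may differ in details.

```ruby
require "set"

# Approximates pg_trgm's trigram extraction for a piece of text.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).flat_map do |word|
    "  #{word} ".chars.each_cons(3).map(&:join)
  end.to_set
end

# Shared trigrams divided by all distinct trigrams of both strings.
def similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = ta | tb
  return 0.0 if union.empty?
  (ta & tb).size.to_f / union.size
end

puts similarity("example.com", "name@example.com").round(2)
```

Every trigram of "example.com" also occurs in "name@example.com", so the score lands well above pg_trgm's default similarity threshold of 0.3.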

Why does the 'amazon-ecs' gem return no results when the search string is too detailed?

I'm trying to pull search results from the Amazon Product Advertising API (amazon-ecs) gem. I am having issues with my search string, but only when it is too detailed.
Now assume that a user enters this search:
search_string = 'big book of birth'
In that case, this works:
res = Amazon::Ecs.item_search(search_string, {:response_group => 'Large', :search_index => 'Books'})
In other words, in my console I get the following:
res.has_error?
=> false
Even this works:
search_string = 'big book of birth by'
res = Amazon::Ecs.item_search(search_string, {:response_group => 'Large', :search_index => 'Books'})
res.has_error?
=> false
Mysteriously, this DOES NOT WORK:
search_string = 'big book of birth by erika lyons'
res = Amazon::Ecs.item_search(search_string, {:response_group => 'Large', :search_index => 'Books'})
res.has_error?
=> true
res.error
=> "We did not find any matches for your request."
Is there an option or parameter that I need to include to make this search "fuzzy" like the one on Amazon.com (e.g., spellchecker, truncating unnecessary words, etc.)? There, searching 'big book of birth by erika lyons' results in the exact book at the top of the list after truncating some words.
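One client-side workaround, sketched here as an assumption and not as a feature of the API or the gem, is to retry with progressively shorter queries until one returns results. Both helper names are hypothetical, and the search is passed in as any callable so that the sketch runs without network access.

```ruby
# Hypothetical helper: the full query first, then versions with trailing words dropped.
def shortened_queries(query)
  words = query.to_s.split
  words.size.downto(1).map { |count| words.first(count).join(" ") }
end

# `search` is any callable returning a result, or nil when nothing matched.
# It stands in for the real API call.
def first_match(query, search)
  shortened_queries(query).each do |candidate|
    result = search.call(candidate)
    return [candidate, result] if result
  end
  nil
end

# Stand-in search that only "finds" one specific query.
fake_search = ->(q) { q == "big book of birth by" ? :found : nil }
p first_match("big book of birth by erika lyons", fake_search)
```

With the gem, the callable could wrap Amazon::Ecs.item_search and return nil whenever res.has_error? is true. Each retry is a separate API request, so the number of attempts may need a cap.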

Resources