Thinking Sphinx sql like query in a condition for a field - ruby-on-rails

I had these queries, but now I'm trying to use Sphinx and need to replace them, and I can't find a way to do this:
p1 = Product.where "category LIKE ?", "#{WORD}"
p2 = Product.where "category LIKE ?", "#{WORD}.%"
product_list = p1 + p2
I'm searching the "category" field of a model named "Product", and I need a way to replace the "#" and "%" parts with something Sphinx understands. I have a basic idea of how to do that, but this isn't working:
Product.search conditions: {category: "('WORD' | 'WORD.*')"}

There are a few things to note.
If you want to match on prefixes, make sure min_prefix_len is set to 1 or greater (the smaller the value, the more accurate the matching, but also the slower your searches and the larger your index files). You also need enable_star set to true. Both settings belong in config/thinking_sphinx.yml (there are examples in the docs).
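To see why a smaller min_prefix_len makes the index bigger, here is a toy, pure-Ruby model of prefix indexing. It is a simplified illustration, not Sphinx's actual implementation. It assumes every word is stored whole, plus every prefix at least min_prefix_len characters long, and that a trailing * in the query switches to a prefix lookup. The helpers build_index and search are hypothetical.

```ruby
# Toy model of Sphinx-style prefix indexing (hypothetical helpers,
# not Sphinx's real data structures).
def build_index(docs, min_prefix_len)
  exact    = Hash.new { |h, k| h[k] = [] }
  prefixes = Hash.new { |h, k| h[k] = [] }
  docs.each do |id, text|
    text.downcase.scan(/[[:alnum:]]+/).each do |word|
      exact[word] << id unless exact[word].include?(id)
      # Store every prefix that is at least min_prefix_len characters long
      (min_prefix_len..word.length).each do |len|
        key = word[0, len]
        prefixes[key] << id unless prefixes[key].include?(id)
      end
    end
  end
  { exact: exact, prefixes: prefixes }
end

# "term*" is a prefix lookup; a bare "term" must match a whole word
def search(index, term)
  term = term.downcase
  if term.end_with?("*")
    index[:prefixes].fetch(term.chomp("*"), [])
  else
    index[:exact].fetch(term, [])
  end
end

docs = { 1 => "shoes", 2 => "shoelaces", 3 => "boots" }
index = build_index(docs, 3)
p search(index, "sho*") # => [1, 2]
p search(index, "shoe") # => [] (no whole word "shoe")
p search(index, "sh*")  # => [] ("sh" is shorter than min_prefix_len)
# A smaller min_prefix_len stores more prefixes, i.e. a larger index
p build_index(docs, 1)[:prefixes].size > index[:prefixes].size # => true
```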
Single quotes have no purpose in Sphinx searches and will be ignored, but I don't think that affects what you're trying to search for.
Full stops, however, are treated as word separators by default. You can change this with charset_table - but that means all full stops in all fields will be treated as part of words (say, at the end of sentences), so I wouldn't recommend it.
However, because full stops are treated as separators, each word in the category field is indexed separately, so without any extra settings this should work:
Product.search conditions: {category: WORD}
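As a rough illustration of why a plain term match can replace both LIKE queries, the sketch below models the default behavior in plain Ruby. It assumes a token is a run of letters and digits, so the full stop acts as a separator. categories_matching is a hypothetical helper, not part of Thinking Sphinx. Note one difference from the SQL version: a term match also finds WORD when it comes after a dot, e.g. "fiction.books".

```ruby
# Assumption: a token is a run of letters/digits, so "." separates words
def tokens(text)
  text.downcase.scan(/[[:alnum:]]+/)
end

# Hypothetical helper: which categories contain `word` as a whole token?
def categories_matching(categories, word)
  categories.select { |category| tokens(category).include?(word.downcase) }
end

categories = ["books", "books.fiction", "bookshelves", "fiction.books"]
p categories_matching(categories, "books")
# => ["books", "books.fiction", "fiction.books"]

# For comparison, the two LIKE queries: equal to WORD, or starting with "WORD."
p categories.select { |c| c == "books" || c.start_with?("books.") }
# => ["books", "books.fiction"]
```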

Related

How to do synonym matching on regular expressions in solr 5.3.1?

I'm writing a Sunspot application for a large gene database. Ligands and receptors for genes are named with the normal gene name followed by an 'l' or an 'r', respectively; for example, a ligand for the gene 'MIP2' would be called 'MIP2l'. However, I want to account for cases where scientists search using the syntax "MIP2 ligand". How can I combine the two tokens "MIP2" and "ligand" into the single token "MIP2l"?
I tried using SynonymGraphFilterFactory, but my Solr version is 5.3.1, so it won't load, and a quick upgrade is not feasible. I also tried the technique illustrated in this article (https://lucidworks.com/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/), but the database is too large for a simple synonyms.txt file. I want to use regular expressions for this, but I can't without first combining the two tokens into one.
This is my current search function. The SQL lookup and weird hashing are there because it's replacing an old search function, and the SQL lookup is how I get properly formatted data for the view.
search = GeneName.search do
fulltext params[:search][:search_str]
order_by(:use_name, :asc)
order_by(:score, :desc)
end
gene_ids = []
for gene_name in search.results
gene_ids << gene_name.gene_id unless gene_name.nil? or gene_ids.include? gene_name.gene_id
end
gene_ids_to_s = gene_ids.to_s.gsub("[","(").gsub("]",")")
#raise gene_ids_to_s.inspect
#genes = Gene.find_by_sql("select distinct g.id gene_id from genes g, gene_names gn where g.id = gn.gene_id and g.id in #{gene_ids_to_s} order by use_name desc") unless gene_ids_to_s == "()"
I believe I fixed it, but it's a lame workaround: I just added the following before the code above, where str is a parsed version of params[:search][:search_str]:
str.downcase!
str.gsub!(" ligand", "l")
str.gsub!(" receptor", "r")
params[:search][:search_str] = str
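A regex-based version of this workaround might look like the sketch below. normalize_gene_query and GENE_SUFFIXES are hypothetical names, and the sketch assumes the gene name is the whitespace-delimited token just before "ligand" or "receptor". Unlike the plain gsub calls, it is case-insensitive on the keyword, keeps the gene name's original casing, and handles several genes in one query. It only rewrites the query string before it reaches fulltext; it does not change how Solr analyzes tokens.

```ruby
# Hypothetical mapping from keyword to the suffix used in gene names
GENE_SUFFIXES = { "ligand" => "l", "receptor" => "r" }.freeze

# Rewrite "MIP2 ligand" -> "MIP2l" and "MIP2 receptor" -> "MIP2r"
# before the string is passed to Sunspot's fulltext search.
def normalize_gene_query(str)
  GENE_SUFFIXES.reduce(str.strip) do |query, (keyword, suffix)|
    # \1 is the gene name captured right before the keyword;
    # \b stops "ligands" from being treated as "ligand"
    query.gsub(/(\S+)\s+#{keyword}\b/i, "\\1#{suffix}")
  end
end

p normalize_gene_query("MIP2 ligand")                    # => "MIP2l"
p normalize_gene_query("MIP2 Receptor")                  # => "MIP2r"
p normalize_gene_query("MIP2 ligand and CXCR2 receptor") # => "MIP2l and CXCR2r"
```

In the controller this would be used as params[:search][:search_str] = normalize_gene_query(params[:search][:search_str]).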
I realize this isn't really your question, but it seems like here:
gene_ids = []
for gene_name in search.results
gene_ids << gene_name.gene_id unless gene_name.nil? or gene_ids.include? gene_name.gene_id
end
You could be using map, compact, and uniq, like:
gene_ids = search.results.map do |result|
  result.gene_id unless result.nil?
end.compact.uniq
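To check that a map/compact/uniq chain gives the same ids as the original loop, here is a self-contained sketch. The GeneName Struct is a hypothetical stand-in for the Sunspot result objects (only gene_id is modeled), and nil entries play the role of results whose record no longer exists; the nil check is on the result itself, as in the original loop. One behavioral difference: if a record's gene_id is itself nil, the loop keeps one nil while compact drops it.

```ruby
# Hypothetical stand-in for a search result: only gene_id is modeled
GeneName = Struct.new(:gene_id)

# The loop from the question
def ids_with_loop(results)
  gene_ids = []
  for gene_name in results
    gene_ids << gene_name.gene_id unless gene_name.nil? or gene_ids.include? gene_name.gene_id
  end
  gene_ids
end

# map / compact / uniq version
def ids_with_map(results)
  results.map { |result| result.gene_id unless result.nil? }.compact.uniq
end

results = [GeneName.new(7), nil, GeneName.new(3), GeneName.new(7)]
p ids_with_loop(results) # => [7, 3]
p ids_with_map(results)  # => [7, 3]
```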
Also, I never use find_by_sql and don't really understand what you're doing there, but I wonder if you could use a standard ActiveRecord query instead, such as Gene.where(id: gene_ids), which builds the IN (...) list for you?

How to store regex or search terms in Postgres database and evaluate in Rails Query?

I am having trouble with a DB query in a Rails app. I want to store various search terms (say 100 of them) and then evaluate them against a value dynamically. All the examples of SIMILAR TO or ~ (regex) in Postgres that I can find use a fixed pattern within the query, whereas I want to look the pattern up from a row.
Example:
Table: Post
column term varchar(256)
(plus regular id, Rails stuff etc)
input = "Foo bar"
Post.where("term ~* ?", input)
So term is a VARCHAR column, and at least one row contains the value:
^foo.*$
Unless I store an exact match (e.g. "Foo bar") in term, this never returns a result.
Ideally I would also like to use expressions like
(^foo.*$|^second.*$)
i.e. multiple search terms, so it would match 'Foo Bar' or 'Search Example'.
I think this has something to do with Ruby or ActiveRecord stripping something out? Or am I on the wrong track, and regex or SIMILAR TO can't be used with row data values like this?
Alternative suggestions on how to do this also appreciated.
The Postgres regular expression match operators have the regex on the right and the string on the left. See the examples: https://www.postgresql.org/docs/9.3/static/functions-matching.html#FUNCTIONS-POSIX-TABLE
But in your query you're treating term as the string and the 'Foo bar' as the regex (you've swapped them). That's why the only term that matches is the exact match. Try:
Post.where("? ~* term", input)
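The difference in operand order can be mimicked in plain Ruby by compiling each stored term as the pattern and using the input as the subject, as with input ~* term. This is only an illustration: Ruby's regex engine is not Postgres's POSIX engine, so complex patterns can behave differently. The TERMS list and both helpers are hypothetical stand-ins; in the database the filtering happens in SQL via the query above.

```ruby
# Stand-ins for rows of the term column
TERMS = ["^foo.*$", "^bar$", "(^foo.*$|^second.*$)"].freeze

# Like `input ~* term`: each stored term is the case-insensitive pattern
def matching_terms(terms, input)
  terms.select { |term| Regexp.new(term, Regexp::IGNORECASE).match?(input) }
end

# Like the original `term ~* input`: the input is (wrongly) the pattern
def swapped_matching_terms(terms, input)
  pattern = Regexp.new(input, Regexp::IGNORECASE)
  terms.select { |term| pattern.match?(term) }
end

p matching_terms(TERMS, "Foo bar")         # => ["^foo.*$", "(^foo.*$|^second.*$)"]
p matching_terms(TERMS, "Second Example")  # => ["(^foo.*$|^second.*$)"]
p swapped_matching_terms(TERMS, "Foo bar") # => []
```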

Postgresql text searching, matching multiple words

I don't know the name for this kind of search, but I see that it's getting pretty common.
Let's say I have records with the following file names:
'order_spec.rb', 'order.sass', 'orders_controller_spec.rb'
If I search with the string 'oc', I would like the result to return 'orders_controller_spec.rb' because it matches the o in orders and the c in controller.
If the string is 'os' then I'd like all 3 to match, 'order_spec.rb', 'order.sass', 'orders_controller_spec.rb'.
If the string is 'oco' then I'd like 'orders_controller_spec.rb'
What is the name for this kind of search and how would I go about getting this done in Postgresql?
This is called a subsequence search. One simple way to do it in Postgres is to use the LIKE operator (or one of the other pattern-matching options in the Postgres docs) and fill the gaps between your letters with a wildcard, which for LIKE is %. To match anything with an o followed by an s in the words column, that would look like this:
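SELECT * FROM files WHERE words LIKE '%o%s%';
This is a relatively expensive search. A varchar_pattern_ops or text_pattern_ops index can speed up pattern matching, but only for patterns anchored at the start of the string; a pattern with a leading % like the one above cannot use it. If the first letter should always match the start of the name, drop the leading % (LIKE 'o%s%') and the index applies:
CREATE INDEX pattern_index ON files (words varchar_pattern_ops);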
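Here is a small Ruby sketch of the same idea with two hypothetical helpers. like_pattern turns a query such as "oc" into the LIKE pattern "%o%c%", escaping %, _ and backslash, which LIKE would otherwise treat specially (the underscore matters here because file names contain it). subsequence? performs the equivalent check in memory. Running it on the example names also shows that a plain subsequence match for 'oc' returns 'order_spec.rb' as well, because of the c in spec; if only word-initial letters should count, you would need a stricter pattern.

```ruby
# Build a LIKE pattern matching `query` as a subsequence, e.g. "oc" -> "%o%c%".
# Backslash, % and _ are escaped because LIKE treats them specially
# (backslash is Postgres's default LIKE escape character).
def like_pattern(query)
  escaped = query.chars.map { |ch| ch.gsub(/[\\%_]/) { |m| "\\#{m}" } }
  "%#{escaped.join('%')}%"
end

# In-memory equivalent: are all query characters present, in order?
def subsequence?(query, text)
  pos = 0
  query.each_char do |ch|
    idx = text.index(ch, pos)
    return false if idx.nil?
    pos = idx + 1
  end
  true
end

files = ["order_spec.rb", "order.sass", "orders_controller_spec.rb"]
p like_pattern("oc")                          # => "%o%c%"
p files.select { |f| subsequence?("os", f) }  # all three names
p files.select { |f| subsequence?("oco", f) } # => ["orders_controller_spec.rb"]
p files.select { |f| subsequence?("oc", f) }  # => ["order_spec.rb", "orders_controller_spec.rb"]

# With ActiveRecord (hypothetical model name):
# FileRecord.where("words LIKE ?", like_pattern(params[:q]))
```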

Configure Sphinx to index dash and search it with and without it

I have a record
Item id: 1, name: "wd-40"
How do I configure Sphinx to match this record on the following queries:
Item.search("wd40")
Item.search("wd-40")
To answer your title question, charset_table is what you want.
http://sphinxsearch.com/docs/current.html#charsets
But that doesn't actually solve the problem of matching both queries: indexing the dash as part of the word wouldn't help, since "wd40" still wouldn't match "wd-40".
Instead, you probably want ignore_chars
http://sphinxsearch.com/docs/current.html#conf-ignore-chars
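To make the difference concrete, here is a toy tokenizer in Ruby comparing three treatments of the dash (U+2D): as a separator (the default), as a word character (added to charset_table), and as an ignored character (ignore_chars). It assumes, as Sphinx does, that the query goes through the same tokenization as the indexed text. tokenize and matches? are hypothetical helpers, not Sphinx APIs. Only the ignore_chars variant matches both queries.

```ruby
# Toy tokenizer; `dash` selects how "-" is treated (hypothetical model)
def tokenize(text, dash)
  text = text.downcase
  case dash
  when :separator then text.split(/[^[:alnum:]]+/)             # default: "-" splits words
  when :word_char then text.split(/[^[:alnum:]\-]+/)           # "-" added to charset_table
  when :ignore    then text.delete("-").split(/[^[:alnum:]]+/) # "-" listed in ignore_chars
  else raise ArgumentError, "unknown dash mode: #{dash}"
  end.reject(&:empty?)
end

# A record matches if every query token appears among the record's tokens
def matches?(record, query, dash)
  (tokenize(query, dash) - tokenize(record, dash)).empty?
end

%i[separator word_char ignore].each do |mode|
  results = ["wd40", "wd-40"].map { |q| matches?("wd-40", q, mode) }
  puts "#{mode}: #{results.inspect}"
end
# separator: [false, true]
# word_char: [false, true]
# ignore: [true, true]
```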
First indexing:
By default, Sphinx treats punctuation such as the dash as a word separator rather than as part of a word. To change that, add the dash (U+2D) to the charset_table parameter so it is indexed as a word character.
Second searching:
AFAIK, it is not possible to make Sphinx treat both searches the way you're asking. However, you can build the query yourself with something like:
# in Python, but I believe it's understandable
query = word
if '-' in word:
    query += " | " + word.replace('-', '')
Item.search(query)  # if word = 'wd-40', query = 'wd-40 | wd40'
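Since the rest of the thread is Ruby, the same query-building step might look like this in Ruby. dash_query is a hypothetical helper name, and the sketch assumes the query is parsed with Sphinx's extended syntax, where | means OR.

```ruby
# Hypothetical helper: search both the dashed and the dash-free spelling
def dash_query(word)
  return word unless word.include?("-")
  "#{word} | #{word.delete('-')}"
end

p dash_query("wd-40") # => "wd-40 | wd40"
p dash_query("wd40")  # => "wd40"
# Item.search(dash_query(params[:q]))
```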

Sphinx, Rails, ThinkingSphinx and making some words matter more than others in your query

I have a list of keywords that I need to search against using ThinkingSphinx.
Some of them are more important than others, so I need a way to weight those words.
So far, the only solution I've come up with is to repeat the same word x times in my query to increase its relevance.
E.g.:
3 keywords, each of them having a level of importance: Blue(1) Recent(2) Fun(3)
I run this query:
MyModel.search "Blue Recent Recent Fun Fun Fun", :match_mode => :any
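For reference, the repetition trick can at least be generated from a weights hash instead of being written by hand. This is a minimal sketch with a hypothetical weighted_query helper; it produces exactly the query string above and has the same limitations.

```ruby
# Hypothetical helper: repeat each keyword according to its weight
def weighted_query(weights)
  weights.flat_map { |word, times| [word] * times }.join(" ")
end

p weighted_query({ "Blue" => 1, "Recent" => 2, "Fun" => 3 })
# => "Blue Recent Recent Fun Fun Fun"
```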
Not very elegant, and quite limiting.
Does anyone have a better idea?
If you can get those keywords into a separate field, then you could weight those fields to be more important. That's about the only good approach I can think of, though.
MyModel.search "Blue Recent Fun", :field_weights => {"keywords" => 100}
Recently I've been using Sphinx extensively, and since the death of UltraSphinx, I started using Pat's great plugin (Thanks Pat, I'll buy you a coffee in Melbourne soon!)
I see a possible solution based on your original idea, but you need to make the changes to the data at index time, not at run time.
Try this:
Modify your Sphinx SQL query to replace "Blue" with "Blue Blue Blue Blue", "Recent" with "Recent Recent Recent" and "Fun" with "Fun Fun". This will magnify any occurrences of your special keywords.
e.g. SELECT REPLACE(my_text_col,"blue","blue blue blue") as my_text_col ...
You probably want to do them all at once, so just nest the replace calls.
e.g. SELECT REPLACE(REPLACE(my_text_col,"fun","fun fun"),"blue","blue blue blue") as my_text_col ...
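With more than a couple of keywords, nesting the calls by hand gets error-prone. Here is a small Ruby sketch that builds the nested expression from a hash of keyword => repetition count. replace_expression is a hypothetical helper, my_text_col is the example column from above, and the output uses standard single-quoted SQL strings. Keep in mind that REPLACE works on substrings, so "blue" inside "bluebird" is repeated too, and in MySQL it is case-sensitive.

```ruby
# Hypothetical helper: wrap `column` in one REPLACE call per keyword
def replace_expression(column, weights)
  weights.reduce(column) do |expr, (word, times)|
    quoted = word.gsub("'", "''") # escape single quotes for SQL
    repeated = ([quoted] * times).join(" ")
    "REPLACE(#{expr}, '#{quoted}', '#{repeated}')"
  end
end

sql = replace_expression("my_text_col", { "fun" => 2, "blue" => 3 })
puts "SELECT #{sql} AS my_text_col"
# SELECT REPLACE(REPLACE(my_text_col, 'fun', 'fun fun'), 'blue', 'blue blue blue') AS my_text_col
```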
Next, change your ranking mode to SPH_RANK_WORDCOUNT. That way, relevance is driven by how often the keywords occur.
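Roughly, a word-count ranker scores a document by how many times the query keywords occur in it, so the index-time repetition translates directly into higher scores. The sketch below is a simplified single-field model (every field weight is 1) with a hypothetical wordcount_score helper; it is not Sphinx's implementation.

```ruby
# Simplified word-count ranking: total occurrences of the query keywords
def wordcount_score(document, keywords)
  tokens = document.downcase.scan(/[[:alnum:]]+/)
  keywords.sum { |keyword| tokens.count(keyword.downcase) }
end

# Documents as they look after the index-time REPLACE (blue x3, fun x2)
docs = {
  "blue shirt" => "blue blue blue shirt",
  "fun shirt"  => "fun fun shirt"
}
query = %w[blue fun shirt]
ranked = docs.sort_by { |_name, text| -wordcount_score(text, query) }.map(&:first)
p ranked # => ["blue shirt", "fun shirt"]
```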
(Optional) Imagine you have a list of keywords related to your special keywords. For example "pale blue" relates to "blue" and "pleasant" relates to "fun". At run time, rewrite the query text to look for the target word instead. You can store these words easily in a hash, and then loop through it to make the replacements.
# Add trigger words (or phrases) as the key,
# and the related special keyword as the value
trigger_words = {}
trigger_words['pale blue'] = 'blue'
trigger_words['pleasant'] = 'fun'
# Now replace each trigger phrase found in the query with its keyword.
# Looping over the hash rather than over query.split also catches
# multi-word triggers such as 'pale blue'.
new_query = query
trigger_words.each do |trigger, keyword|
  new_query = new_query.gsub(/\b#{Regexp.escape(trigger)}\b/i, keyword)
end
Now you have quasi-keyword-clustering too. Sphinx is really a fantastic technology, enjoy!
