Fuzzy matching using sphinx in rails - ruby-on-rails

I am using Sphinx for searching in Rails 2.3.4, with the thinking-sphinx gem (1.4.4) and thinking-sphinx-raspell (1.0.0).
I added the following to the configuration:
morphology = metaphone, stem_en, libstemmer_sv, soundex
min_stemming_len = 4
charset_type = utf-8
min_infix_len = 3
enable_star = 1
Now I search for the string "sny".
It returns results like "syn" but not "sony".
If I use double metaphone in PostgreSQL, the results include "sony".
How can I configure Sphinx for fuzzy matching so that it returns results like that?

Related

How to sort by salary with Ransack and ignoring the dollar symbol?

I am trying to create multiple sort functions for my Rails project using the ransack gem. The issue I am having with ransacker is that I cannot parse the salary string, because some of the posts include a dollar sign ($) and commas. What I would like to do is sort on that attribute while ignoring both the dollar sign and the thousands-separator commas (which may not be present in every case), and combine it with the current input from the search box.
For example:
string = "$30,000" -> parse it to remove the $ and the comma, leaving only 30000, so the search engine finds the records that include that number along with what was written in the search_form input (job.job_title). The code I wrote is below; it may not be correct, as I was trying multiple approaches. Final result: Ransack should search for "30000 marketing position".
rails view
<li>$30,000+ <%= sort_link(@q, :salary_between_30_and_40k, default_order: :desc) %></li>
job.rb
ransacker :salary_between_30_and_40k do
Arel.sql('SELECT * FROM JOBS WHERE job.hourly_wage_salary BETWEEN 30000 AND 40000')
end
The correct approach here is to migrate your database so that salary details are stored as a numeric value rather than a string with formatting.
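As a rough sketch of the conversion such a migration would need, the formatted strings can be reduced to integers in plain Ruby. The helper name `parse_salary` is hypothetical, and the sketch assumes the only formatting present is a leading `$` and thousands-separator commas; anything else is treated as unparseable and returns nil.

```ruby
# Hypothetical helper: converts strings such as "$30,000" into an Integer.
# Returns nil when the cleaned text is not made up of digits only.
def parse_salary(text)
  cleaned = text.to_s.delete("$,").strip
  return nil unless cleaned.match?(/\A\d+\z/)

  cleaned.to_i
end

puts parse_salary("$30,000").inspect
puts parse_salary("45000").inspect
puts parse_salary("negotiable").inspect
```

Once the column is numeric, a range such as 30000 to 40000 can be expressed with ordinary numeric predicates instead of string parsing inside a ransacker.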

Rails NOT IN query and regexp

I have an array of strings:
a = ['*@foo.com', '*@bar.com', '*@baz.com']
I would like to query my model so that I get all the records whose email isn't in any of the above domains.
I could do:
Model.where.not(email: a)
That would work if the list were a list of exact strings, but the entries are really patterns, more like regexps.
It depends on your database adapter. You will probably be able to use raw SQL to write this type of query. For example, in Postgres you could do:
Model.where("email NOT SIMILAR TO '%@foo.com'")
I'm not saying that's exactly how you should be doing it, but it's worth looking up your database's query language to see if anything matches your needs.
In your example you would have to join your matchers together as a single string and pass it into the query as a bound parameter.
a = ['%@foo.com', '%@bar.com', '%@baz.com']
Model.where("email NOT SIMILAR TO ?", a.join("|"))
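Use this code:
a = ['%@foo.com', '%@bar.com', '%@baz.com']
Model.where.not("email SIMILAR TO ?", a.join("|"))
Replace * with % in the array. Note that LIKE does not support | alternation, so the joined pattern needs SIMILAR TO (PostgreSQL) rather than LIKE.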
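The pattern-building step can be sketched in plain Ruby so that it runs without a database. The helper name `similar_to_pattern` is hypothetical; the sketch assumes `*` is the only wildcard used in the input and that the result is handed to PostgreSQL's SIMILAR TO as a bound parameter.

```ruby
# Hypothetical helper: turns shell-style globs into a single SIMILAR TO pattern
# by swapping "*" for "%" and joining the alternatives with "|".
def similar_to_pattern(globs)
  globs.map { |glob| glob.tr("*", "%") }.join("|")
end

pattern = similar_to_pattern(["*@foo.com", "*@bar.com", "*@baz.com"])
puts pattern

# In a Rails app the pattern could then be bound as a parameter, for example:
#   Model.where("email NOT SIMILAR TO ?", pattern)
```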

how to get solr results in given order specified in query

I have built a query to submit to Solr in the following format:
id:95154 OR id:68209 OR id:89482 OR id:94233 OR id:112481 OR id:93843
I want to get the records in the order given in the query: the document with id 95154 first, then id 68209, and so on. That is not happening right now; it returns the last id, 93843, first, and sometimes the order is random. I am using Solr in Grails 2.1 and my Solr version is 1.4.0. Here is a sample of how I am getting documents from Solr:
def server = solrService.getServer('provider')
SolrQuery sponsorSolrQuery = new SolrQuery(solarQuery)
def queryResponse = server.query(sponsorSolrQuery);
documentsList = queryResponse.getResults()
As @injecteer mentions, there is nothing built into Lucene to consider the sequence of clauses in a boolean query, but:
You are able to apply boosts to each term, and as long as the field is a basic field (meaning, not a TextField), the boosts will apply cleanly to give you a decent sort by score.
id:95154^6 OR id:68209^5 OR id:89482^4 OR id:94233^3 OR id:112481^2 OR id:93843
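A boosted query like the one above can be generated from the ordered id list instead of being written by hand. This is a minimal sketch in Ruby (the question itself uses Groovy, where the same idea applies); the helper name `boosted_id_query` and the `field:` keyword are hypothetical. It gives the first id the highest boost and counts down to 1, which is equivalent to leaving the last clause unboosted.

```ruby
# Hypothetical helper: builds "id:A^n OR id:B^(n-1) ... OR id:Z^1"
# so that earlier ids score higher than later ones.
def boosted_id_query(ids, field: "id")
  ids.each_with_index
     .map { |id, index| "#{field}:#{id}^#{ids.length - index}" }
     .join(" OR ")
end

puts boosted_id_query([95154, 68209, 89482, 94233, 112481, 93843])
```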
There's no such thing in Lucene (and, I strongly assume, not in Solr either). In Lucene you can sort the results based on the contents of the documents' fields, but not on the order of clauses in a query.
That means you have to sort the results yourself:
documentsList = queryResponse.getResults()
def sortedByIdOrder = solarQueryAsList.collect{ id -> documentsList.find{ it.id == id } }
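The same client-side reordering can be sketched in Ruby with a lookup table, which avoids scanning the result list once per id. The helper name `order_by_ids` is hypothetical, and the documents are assumed to be plain hashes with an `:id` key rather than real Solr document objects.

```ruby
# Hypothetical helper: returns the documents in the order given by ids.
# Ids with no matching document are skipped.
def order_by_ids(documents, ids)
  by_id = documents.each_with_object({}) { |doc, index| index[doc[:id]] = doc }
  ids.map { |id| by_id[id] }.compact
end

results = [{ id: 93843 }, { id: 95154 }, { id: 68209 }]
ordered = order_by_ids(results, [95154, 68209, 89482, 93843])
puts ordered.map { |doc| doc[:id] }.inspect
```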

Ruby On Rails - Geocoder - Near with condition

I'm using Geocoder in my application. Now I need to search for objects in my database which are close to a position OR have a specific attribute set. I would like to perform this in one database query, because the database is really huge.
I would like to have something like
Spot.near([lat,long],distance).where("visited = ?",true).
The distance and the visited attribute should be combined with an OR, not with an AND.
Does anyone have an idea how to do this?
Thank you!
Based off of this answer, you should be able to do something like:
near = Spot.near([lat, long], distance)
visited = Spot.where(visited: true)
near = near.where_values.reduce(:and)
visited = visited.where_values.reduce(:and)
Spot.where(near.or(visited))
I'm in the process of upgrading a Rails application from Rails 4 to Rails 7 and ran into this problem. While I have no doubt Luke's suggestion worked in earlier versions, it doesn't work in Rails 7 (I'm currently running activerecord-7.0.3.1).
In my particular case, I am using the geocoder near() method to return results that are within a 20 mile radius of the query, but I also wanted to use OR conditions to return results where the query was similar to the text values in either the name or location columns from the items table in an attempt to return relevant items that haven't been assigned latitude and longitude values.
In Rails 4, my solution was:
select("items.*").near(q, 20, select: :geo_only).tap do |near_query|
near_query.where_values.last << sanitize_sql([" OR items.location LIKE ? OR items.name LIKE ?", "%#{q}%", "%#{q}%"])
end
In Rails/ActiveRecord 7, the where_values() method no longer exists. Searching for an alternative solution led me to this post. I wound up spending a fair amount of time perusing the latest ActiveRecord and Arel code for a solution. Here's what I came up with.
Rails 7 solution:
t = Item.arel_table
arel_geo_conditions = Item.near(q, 20).where_clause.ast # ast: Abstract Syntax Tree
Item.where(arel_geo_conditions.or(t[:location].matches("%#{q}%").or(t[:name].matches("%#{q}%"))))

Thinking Sphinx : Relevance - infix vs. complete word

Using Thinking Sphinx in my rails app, I set it up to allow partial match with infix (for example, searching for "tray" would match "ashtray").
However, I'd like complete word match to have more weight (relevance) than infix match.
So, if my search for 'tray' returns these 3 results: "Silver Tray", "Ashtray" and "Some other tray", I want "Ashtray" to be the last result when sorting by relevance.
Is there a way to configure Sphinx to do that?
You need to define your own ranker. Here is what the built-in ones look like:
SPH_RANK_PROXIMITY_BM25 = sum(lcs*user_weight)*1000+bm25
SPH_RANK_BM25 = bm25
SPH_RANK_NONE = 1
SPH_RANK_WORDCOUNT = sum(hit_count*user_weight)
SPH_RANK_PROXIMITY = sum(lcs*user_weight)
SPH_RANK_MATCHANY = sum((word_count+(lcs-1)*max_lcs)*user_weight)
SPH_RANK_FIELDMASK = field_mask
SPH_RANK_SPH04 = sum((4*lcs+2*(min_hit_pos==1)+exact_hit)*user_weight)*1000+bm25
http://sphinxsearch.com/docs/2.0.6/weighting.html
