Thinking Sphinx: Relevance - infix vs. complete word

Using Thinking Sphinx in my Rails app, I set it up to allow partial matches with infixes (for example, searching for "tray" would match "ashtray").
However, I'd like a complete-word match to have more weight (relevance) than an infix match.
So, if my search for 'tray' returns these 3 results: "Silver Tray", "Ashtray" and "Some other tray", I want "Ashtray" to be the last result when sorting by relevance.
Is there a way to configure Sphinx to do that?

You need to define your own ranker expression (SPH_RANK_EXPR). For reference, this is how the built-in rankers are expressed:
SPH_RANK_PROXIMITY_BM25 = sum(lcs*user_weight)*1000+bm25
SPH_RANK_BM25 = bm25
SPH_RANK_NONE = 1
SPH_RANK_WORDCOUNT = sum(hit_count*user_weight)
SPH_RANK_PROXIMITY = sum(lcs*user_weight)
SPH_RANK_MATCHANY = sum((word_count+(lcs-1)*max_lcs)*user_weight)
SPH_RANK_FIELDMASK = field_mask
SPH_RANK_SPH04 = sum((4*lcs+2*(min_hit_pos==1)+exact_hit)*user_weight)*1000+bm25
http://sphinxsearch.com/docs/2.0.6/weighting.html
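If a custom ranker is not an option, the same ordering can be approximated in application code. The sketch below is a swapped-in technique, not a Sphinx ranker: it re-sorts result titles that have already been fetched so that whole-word matches come before infix-only matches. It is written in Python for illustration only, and the function name whole_word_first is hypothetical.

```python
import re

def whole_word_first(titles, term):
    """Stable re-sort: titles that contain `term` as a whole word come first.

    Titles that only contain `term` inside a longer word (infix match)
    keep their relative order but move to the end.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return sorted(titles, key=lambda title: 0 if pattern.search(title) else 1)

results = whole_word_first(["Silver Tray", "Ashtray", "Some other tray"], "tray")
print(results)
```

Because sorted() is stable, the relevance order returned by Sphinx is preserved within each of the two groups.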

Related

selecting range of values based upon first few characters in spss?

I know that with
select if char.substr(variable_name,1,3)="I22".
I can select values based on their first few characters, but that is not exactly my question. I need to select a range of values that start with the same few characters. Here is an example of what I want.
If I have the following cases:
I22A33
I22B33
I22C33
I22D33
I want to select I22B33 and I22C33 out of the above four values, that is, the range of cases between B and C.
One way to flag any cases that meet your criteria is using INDEX and a series of OR conditions. Not particularly modular, but if you just have a couple of conditions you're searching for it could get you on your way.
Edit: These searches are case-insensitive (due to UPCASE) and search for matches at the start of the string. To search for matches anywhere within the string set the condition to > 0 (instead of = 1).
COMPUTE f_I22 = (INDEX(UPCASE(var_name),'I22B33') = 1)
OR (INDEX(UPCASE(var_name),'I22C33') = 1) .
EXE .
Assuming that all the values in the range you want to select start with either "I22B" or "I22C", you can simply use:
select if char.substr(variable_name,1,4)="I22B" or
  char.substr(variable_name,1,4)="I22C".
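The selection logic can also be read as a range test on the prefix rather than a list of OR conditions. The following is a minimal sketch of that idea in Python, used only to illustrate the logic (it is not SPSS syntax); the function name select_prefix_range is hypothetical, and it assumes both bounds have the same length.

```python
def select_prefix_range(values, low, high):
    """Keep values whose prefix sorts between `low` and `high`, inclusive.

    The comparison is case-insensitive and uses as many leading characters
    as `low` has (assumption: `low` and `high` have the same length).
    """
    n = len(low)
    low, high = low.upper(), high.upper()
    return [v for v in values if low <= v[:n].upper() <= high]

cases = ["I22A33", "I22B33", "I22C33", "I22D33"]
print(select_prefix_range(cases, "I22B", "I22C"))
```

A range test of this kind scales better than chained OR conditions when the range covers many letters.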

Configure Sphinx to index dash and search it with and without it

I have a record
Item id: 1, name: "wd-40"
How do I configure Sphinx to match this record on the following queries:
Item.search("wd40")
Item.search("wd-40")
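To answer your title question, charset_table is what you want.
http://sphinxsearch.com/docs/current.html#charsets
But that doesn't actually make both of those queries match: indexing "-" as a regular character would not help, because the record would then be stored as the single word "wd-40" and a search for "wd40" still would not match.
Instead, you probably want ignore_chars, which drops the dash at both indexing and query time, so "wd-40" and "wd40" are both treated as "wd40".
http://sphinxsearch.com/docs/current.html#conf-ignore-chars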
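To see why ignoring the dash helps where indexing it does not, here is a toy model of the tokenization step. It is a simplified sketch in Python, not Sphinx's actual tokenizer: it assumes only lowercase letters, digits and underscore are word characters, and the function name normalize is hypothetical.

```python
import re

def normalize(text, ignore_chars="-"):
    """Toy tokenizer: drop ignored characters, then split on non-word characters."""
    stripped = "".join(ch for ch in text.lower() if ch not in ignore_chars)
    return [token for token in re.split(r"[^0-9a-z_]+", stripped) if token]

# With the dash ignored, the record and both queries yield the same word.
print(normalize("wd-40"), normalize("wd40"))
# With the dash acting as a separator, the record is split into two words.
print(normalize("wd-40", ignore_chars=""))
```

In this model the record and both queries reduce to the same single word once the dash is ignored, which is what makes both searches match.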
First, indexing:
By default, Sphinx indexes only letters, digits and the underscore; other characters, including the dash, are treated as word separators. To change that, add the dash to the charset_table parameter so that it is indexed as part of the word.
Second, searching:
AFAIK, it is not possible to make Sphinx treat both searches the way you are asking. However, you can just use something like:
# in Python, but it should be understandable
query = word
if '-' in word:
    # also search for the word with the dashes removed
    query += " | " + word.replace('-', '')
Item.search(query)  # if word = 'wd-40', query = 'wd-40 | wd40'

String array in database. match and return values

I have the following data in a column named letters in a MySQL database. The values are stored as varchar:
letters
["a","b"]
["a","b","d"]
["a","d"]
["d","c","e"]
["e","c","f"]
["c","f"]
["f","e"]
I am trying to match some elements. When params[:lttrs] is "a", I want to return:
["a","b"]
["a","b","d"]
["a","d"]
When I have params[:lttrs] as "c,e", I want to return:
["d","c","e"]
["e","c","f"]
My attempt is to retrieve all the rows and then match each of them with include?('a'), but that way I can only check one element at a time. Is that the right approach?
You can use the LIKE operator with wildcard matching. This will be faster than filtering in Ruby, but it will still be slow on a large table, because a pattern with a leading % cannot use an index. If you provide more details about the reason for your data design, we will be able to suggest alternative approaches.
# one pattern per requested letter, e.g. "%c%"
like_params = params[:lttrs].split(",").map { |letter| "%#{letter}%" }
# AND the conditions: "c,e" should return only rows that contain both letters
like = like_params.map { "letters LIKE ?" }.join(" AND ")
Model.where(like, *like_params)
You could try doing something like:
query = ""
r = 1
letters_arr = params[:lttrs].split(",")
letters_arr.each do |l|
  if r == 1
    query << "letters like '%#{l}%'"
  else
    # every condition needs the column name; "and" keeps only rows with all letters
    query << " and letters like '%#{l}%'"
  end
  r += 1
end
letters_found = WhateverModel.where(query)
Obviously you need to make sure that the params are handled more safely than that (interpolating them into the SQL string allows SQL injection), but that should get you on your way.
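As a self-contained illustration of the parameterized version, here is a minimal sketch in Python using an in-memory SQLite database instead of ActiveRecord and MySQL. The table name items and the function name find_rows are hypothetical. The conditions are joined with AND because the question's "c,e" example returns only rows that contain both letters, and each pattern includes the surrounding double quotes so that it matches a whole array element.

```python
import sqlite3

def find_rows(conn, lttrs):
    """Return the `letters` values that contain every requested letter."""
    letters = [l.strip() for l in lttrs.split(",") if l.strip()]
    if not letters:
        return []
    # One placeholder per letter; values are bound, never interpolated.
    where = " AND ".join("letters LIKE ?" for _ in letters)
    params = ['%"' + l + '"%' for l in letters]
    sql = "SELECT letters FROM items WHERE " + where + " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, letters TEXT)")
data = ['["a","b"]', '["a","b","d"]', '["a","d"]', '["d","c","e"]',
        '["e","c","f"]', '["c","f"]', '["f","e"]']
conn.executemany("INSERT INTO items (letters) VALUES (?)", [(d,) for d in data])

print(find_rows(conn, "a"))
print(find_rows(conn, "c,e"))
```

Note that % and _ inside a requested value would still act as LIKE wildcards; escaping them is left out of this sketch.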

Sphinx (via SphinxQL) match without asterisk, but not with asterisk

I have an index in Sphinx, one of the words in this index is an article number. In this case 04.007.00964.
When I query my index like this:
SELECT * FROM myIndex WHERE MATCH('04.007.00964')
I have one result, this is as expected.
However, when I query it like this:
SELECT * FROM myIndex WHERE MATCH('*04.007.00964*')
I have zero results.
My index configuration is:
index myIndex
{
source = myIndex
path = D:\Tools\Sphinx\data\myIndex
morphology = none
min_word_len = 3
min_prefix_len = 0
min_infix_len = 2
enable_star = 1
}
I'm using v2.0.4-release.
What am I doing wrong, or what don't I understand?
Because of
min_word_len = 3
the first query is effectively:
SELECT * FROM myIndex WHERE MATCH('007 00964')
Words shorter than that are ignored, both when indexing and when querying.
Edit to add: "." is not in the default charset_table, which is why it acts as a separator.
However, "*04" is not stripped, because it is 3 characters long,
but then there is nothing for it to match, because "04" is not in the index (it is shorter than min_word_len).
So it is an unfortunate combination of word and infix lengths. You can easily fix it by setting min_word_len = 2.
Edit to add: Alternatively, add "." to charset_table so that it no longer separates words; the whole article number is then indexed as one word, which is longer than both min_word_len and min_infix_len.
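The interaction described above can be traced with a toy model. The Python sketch below is not Sphinx's real tokenizer or matcher: it assumes that "." splits words, that tokens shorter than min_word_len are dropped (counting the star, as the answer describes), and that every remaining query token must match some indexed word. min_infix_len is not modeled, and all function names are hypothetical.

```python
import re

def tokenize(text, min_word_len):
    """Split on non-word characters (keeping '*') and drop short tokens."""
    tokens = [t for t in re.split(r"[^0-9A-Za-z_*]+", text.lower()) if t]
    return [t for t in tokens if len(t) >= min_word_len]

def token_matches(token, word):
    """Interpret leading/trailing stars as suffix/prefix/infix wildcards."""
    core = token.strip("*")
    if token.startswith("*") and token.endswith("*"):
        return core in word
    if token.startswith("*"):
        return word.endswith(core)
    if token.endswith("*"):
        return word.startswith(core)
    return word == core

def matches(query, document, min_word_len):
    """True if every surviving query token matches some indexed word."""
    indexed = tokenize(document, min_word_len)
    wanted = tokenize(query, min_word_len)
    return bool(wanted) and all(
        any(token_matches(t, w) for w in indexed) for t in wanted
    )

doc = "04.007.00964"
print(matches("04.007.00964", doc, 3))    # "04" dropped on both sides
print(matches("*04.007.00964*", doc, 3))  # "*04" kept, but "04" was never indexed
print(matches("*04.007.00964*", doc, 2))  # "04" is indexed, so "*04" can match
```

Under these assumptions the star query fails only when min_word_len is 3, which mirrors the behaviour reported in the question.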

Fuzzy matching using sphinx in rails

I am using Sphinx for search in Rails 2.3.4, with the thinking-sphinx gem (1.4.4) and thinking-sphinx-raspell (1.0.0).
I added the following to the configuration:
morphology = metaphone, stem_en, libstemmer_sv, soundex
min_stemming_len = 4
charset_type = utf-8
min_infix_len = 3
enable_star = 1
Now I search for the string "sny".
It returns results like "syn", but not "sony".
If I use double metaphone in PostgreSQL, the results do include "sony".
How do I configure Sphinx for fuzzy matching so that I get results like that?
