I am using a raw SQL query for a more specific and faster search in my Rails app. Currently I have the following query, which works wonderfully for gathering the data from the initial search:
SELECT * FROM selected_tables
WHERE field1 LIKE '%#{present_param}'
AND field2 LIKE '%#{present_param2}'
And so on, with each LIKE line appearing only if the relevant parameter is present from the form.
So I am now able to get back a large number of results from this query, but they're not ordered in any helpful way. I need some way of ordering the results by their relevance to the original user input from the form, but I can't seem to find anything on Google about it. Is there a way in SQL (specifically PostgreSQL) to order the results like this?
To be clear, when I say relevance I mean that a given search keyword should be in the title or company name for the result, not just present somewhere in the content.
For example: if you search "Sony" you get Sony Electronics first, not another listing containing Sony somewhere in the middle of its name.
I ended up using a series of CASE/WHEN expressions weighted with integer scores to prioritize my results. It works wonderfully and turned out something like this:
SELECT title, company, user, CASE
    WHEN upper(company_name) LIKE '%#{word[0].upcase}%' THEN 3
    WHEN upper(company_name) LIKE '%#{company_name.upcase}%' THEN 2
    ELSE 0
END AS score
FROM selected_tables
WHERE company_name LIKE '%#{company_name}%'
ORDER BY score DESC;
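For anyone doing this with bound parameters rather than string interpolation, here is a minimal sketch of the same idea; the companies table, company_name column, and :term placeholder are assumptions, and ILIKE is PostgreSQL-specific:

SELECT company_name,
       CASE
         WHEN company_name ILIKE :term || '%'         THEN 3  -- name starts with the search term ("Sony Electronics")
         WHEN company_name ILIKE '% ' || :term || '%' THEN 2  -- a later word starts with the term
         ELSE 1                                               -- term appears somewhere else in the name
       END AS score
FROM companies
WHERE company_name ILIKE '%' || :term || '%'
ORDER BY score DESC, company_name;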
I am using indices on the columns I am searching on. The indices are created like this:
CREATE INDEX index1 on <TABLE>(<col1> COLLATE NOCASE ASC)
CREATE INDEX index2 on <TABLE>(<col2> COLLATE NOCASE ASC)
CREATE INDEX index3 on <TABLE>(<col3> COLLATE NOCASE ASC)
Now, the select query to search for records is like this:
select <col1> from <TABLE> where <col1> like '%monit%' AND <col2> like '%84%' GROUP BY <col1> limit 0,501;
When I run EXPLAIN QUERY PLAN on my sqlite database like this:
EXPLAIN QUERY PLAN select <col1> from <TABLE> where <col1> like '%monit%' AND <col2> like '%84%' GROUP BY <col1> limit 0,501;
It returns the output as:
0|0|0|SCAN TABLE USING INDEX (~250000 rows)
and when I drop the index, the output this EXPLAIN QUERY PLAN produces is:
0|0|0|SCAN TABLE (~250000 rows)
0|0|0|USE TEMP B-TREE FOR GROUP BY
Shouldn't the number of rows scanned (~250000 rows) be smaller when the index is used to search the table?
I guess the problem here is with the LIKE keyword, because I have read somewhere that LIKE prevents the use of indices... Here is the link
EDIT: For indices to work on a query that uses LIKE, the right-hand side of the LIKE must be a string literal that does not begin with a wildcard character. So, in the above query, I tried using the search parameter in LIKE without the '%' at the beginning:
EXPLAIN QUERY PLAN select <col1> from <TABLE> where <col1> like 'monit%' AND <col2> like '84%' GROUP BY <col1> limit 0,501;
and the output I got was this:
0|0|0|SEARCH TABLE partnumber USING INDEX model_index_partnumber (model>? AND model<?) (~15625 rows)
So, you see: the number of rows being searched (rather than scanned) is ~15625 here.
But the problem is that I cannot do away with the % wildcard at the beginning. Can anyone please suggest an alternative way to achieve the same result?
EDIT:
I have tried using FTS3 from the terminal, but when I typed this query:
CREATE VIRTUAL TABLE <tbl> USING FTS3 (<col_list>);
it throws this error:
Error: no such module: FTS3
Can someone please help me enable FTS3 from the terminal as well as in Xcode? (I need the steps for both.)
I am using SQLCipher and have already performed this from the terminal:
CFLAGS="-DSQLITE_ENABLE_FTS3=1" ./configure
EDIT:
Please see the question sqlite table taking time to fetch the records in LIKE query, posted by me.
EDIT:
Hey all, I got some success. I modified my SELECT query to look like this:
select distinct description collate nocase as description from partnumber where rowid BETWEEN 1 AND (select max(rowid) from partnumber) AND description like '%a%' order by description;
And bingo, the search time was like never before. But the problem now is that when I run EXPLAIN QUERY PLAN like this, it shows a temporary B-tree being used for DISTINCT, which I don't want:
explain query plan select distinct description collate nocase as description from partnumber where rowid BETWEEN 1 AND (select max(rowid) from partnumber) AND description like '%a%' order by description;
Output:
0|0|0|SEARCH TABLE partnumber USING INTEGER PRIMARY KEY (rowid>? AND rowid<?) (~15625 rows)
0|0|0|EXECUTE SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE partnumber USING INTEGER PRIMARY KEY (~1 rows)
0|0|0|USE TEMP B-TREE FOR DISTINCT
A couple of other options ...
Full Text Indexes:
http://sqlite.org/fts3.html
The most common (and effective) way to describe full-text searches is
"what Google, Yahoo and Altavista do with documents placed on the
World Wide Web".
SELECT count(*) FROM enrondata1 WHERE content MATCH 'linux'; /* 0.03 seconds */
SELECT count(*) FROM enrondata2 WHERE content LIKE '%linux%'; /* 22.5 seconds */
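As a rough sketch against the partnumber table from the question (treat the column list as an assumption), an FTS3 setup could look like this; MATCH with a trailing * does a token-prefix search instead of a table scan:

CREATE VIRTUAL TABLE partnumber_fts USING fts3(description, model);

-- copy the searchable columns across, keeping the same rowid
INSERT INTO partnumber_fts(rowid, description, model)
SELECT rowid, description, model FROM partnumber;

-- token-prefix search, served by the full-text index
SELECT rowid FROM partnumber_fts WHERE partnumber_fts MATCH 'monit*';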
Word Breaking:
If you're looking for whole words (or words that start with a prefix), you can break the text blobs into words yourself and store your own indexed word tables. But even then, you'll only be able to do word LIKE 'monit%' to get hits like "monitor".
If possible, use the full-text option; it will be much less code. But if that's not an option for some reason, you can fall back to your own word-breaking tables, though you are then limited to word-begins-with searches to avoid scans (still better than whole-text-block-begins-with).
Be aware that the SQLite that comes with iOS does not have full-text search enabled. You can work around that; there are instructions on that and its use at:
http://longweekendmobile.com/2010/06/16/sqlite-full-text-search-for-iphone-ipadyour-own-sqlite-for-iphone-and-ipad/
The full docs on creating and querying full text tables are here: http://sqlite.org/fts3.html
To get FTS3 to also work from the terminal, see:
Compiling the command line interface # http://www.sqlite.org/howtocompile.html
sqlite3 using fts3 create table in my mac terminal and how to use it in iphone xcode project?
This is quite simple. You are telling SQLite to examine every record in the table. It is faster to do this without using an index, because using an index would involve additional I/O. An index is used when you want to examine a subset of the records in a table, where the extra I/O of using the index is paid back by not having to examine every record in the table.
When you say LIKE '%something', that means all records with anything at all at the beginning of the field, followed by "something". The only way to do this is to examine every single record. Note that indexes should still be used if you write LIKE 'something%', because in this case SQLite can use the index to find the subset of records beginning with "something". In the old days, when databases were not so clever, we used to write it like this to force the use of an index: SELECT * WHERE col1 >= 'something' AND col1 < 'somethinh'. Note the intentional misspelling of "something" in the second condition.
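A minimal sketch of that old trick; the table name is a placeholder, and the upper bound is simply the prefix with its last character bumped up by one:

SELECT col1
FROM parts
WHERE col1 >= 'monit'
  AND col1 <  'moniu';   -- equivalent to col1 LIKE 'monit%' against a case-sensitive index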
If you can, it is best to avoid using % at the beginning of a LIKE condition. In some cases you may be able to change your schema so that data is stored in two columns rather than one; then you can use a LIKE 'something%' search on the second of the two columns. Of course, this depends on your data being structured in the right way.
But even if splitting into two columns is not possible, it may be possible to divide and conquer the data in another way. For instance, you could split the search fields into words and index every word in a single column of a separate search table. That way "look for something or other" becomes a list of records where "something" is an exact match on a row in the search table; no LIKE required. You would then get a record ID to retrieve the original record. This is one of the things SOLR does internally, so if you must stick with SQLite and cannot leverage SOLR or Lucene in any way, you can always read up on how they build inverted indices and do the same thing yourself in your SQLite db.
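A minimal sketch of such a hand-rolled word table, assuming your application code breaks the text into words and keeps this table in sync whenever a partnumber row changes:

CREATE TABLE search_words (
    word      TEXT    NOT NULL,
    record_id INTEGER NOT NULL   -- rowid of the original partnumber row
);
CREATE INDEX idx_search_words_word ON search_words(word);

-- exact word match, no LIKE needed
SELECT p.*
FROM search_words w
JOIN partnumber p ON p.rowid = w.record_id
WHERE w.word = 'monitor';

-- word-prefix match; subject to the same LIKE/index rules discussed above
SELECT DISTINCT p.*
FROM search_words w
JOIN partnumber p ON p.rowid = w.record_id
WHERE w.word LIKE 'monit%';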
Remember that LIKE '%something%' must examine every record, but if you can select a subset of the data first and then apply the LIKE search, it will run a lot faster. Filling the cache has the same effect, which is what your experiments with DISTINCT were doing. Maybe all you need to do is enlarge the cache to get acceptable search times. The first search will still be slow, but people are often quite forgiving of problems that go away when you retry.
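For the cache suggestion, a one-line sketch; cache_size is set per connection, and a negative value is a size in KiB, so this asks for roughly 64 MB of page cache:

PRAGMA cache_size = -65536;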
When you use arbitrary wildcards like that, you are getting very close to a full-text search engine requirement like SOLR. Those work by indexing the data 100% in RAM. With SQLite you might be able to do something similar by creating a second in-memory database, reading all the data from the disk tables into the in-memory db, and then using the in-memory db for searching with wildcards. You would still have full-table scans with queries such as LIKE '%monit%', but that scan takes place in RAM, where it is not as time-consuming. You don't need to import all your data into RAM, only the parts where you need '%something%' searches, because SQLite can do cross-database joins. SQLite makes it easy to create an in-memory database, and the ATTACH DATABASE and DETACH DATABASE commands make it easy to connect a second database to your app. There is some example code for iOS in this question: Can iPhone sqlite apps attach to other databases?
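A minimal sketch of the in-memory approach (the table and column names follow the question; the search table name is made up): attach a fresh memory database, copy across only the searched column plus the rowid, then search in RAM and join back to the on-disk table:

ATTACH DATABASE ':memory:' AS mem;

CREATE TABLE mem.partnumber_search AS
SELECT rowid AS src_rowid, description
FROM main.partnumber;

SELECT p.*
FROM mem.partnumber_search s
JOIN main.partnumber p ON p.rowid = s.src_rowid
WHERE s.description LIKE '%monit%';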
I'm not sure why you don't like EXPLAIN showing B-trees, since a B-tree is probably the fastest search structure available when your data has to be read from a filesystem.
I have a MySQL book that suggests storing REVERSE() of the text (in a separate column, if your application permits) and then searching the reversed text using LIKE REVERSE('%something').
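A sketch of that idea, assuming a database with a REVERSE() function (MySQL or PostgreSQL; stock SQLite has none) and an extra indexed column holding the reversed text, maintained by a trigger or by the application; the table and column names are placeholders:

-- REVERSE('%thing') evaluates to 'gniht%', so the suffix search on name
-- becomes an index-friendly prefix search on name_reversed.
SELECT *
FROM companies
WHERE name_reversed LIKE REVERSE('%thing');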
I have a SOLR instance with millions of documents. The schema is well defined (i.e. all fields are typed). All the searching/faceting etc. works ok without any issues.
However, I am trying to do something new which I "think" is not supported in the current version. I am running SOLR 3.5 on Windows using Jetty.
To simplify the question, my document contains some fields like:
Id,
Name,
City,
JobTitle
Let's say I have sample data like:
P Wood, London, Director
J Smith, London, Project Manager
D Lock, Brighton, Developer
K Pracy, London, Developer
For the sake of example, assume that this is a matching system which allows people to find each other. Also assume that Id is a unique identifier.
I want to write a "sampling" query that should find, for each record, the set of other records it would match under a given set of criteria.
So, for example, I want to define criteria like:
Find me the people who would match people in different cities with different job titles.
If the above schema were an RDBMS table (let's say People), the approximate query would be something like this:
SELECT P.Id,
       (SELECT COUNT(1)
        FROM People PI
        WHERE PI.Id != P.Id
          AND PI.City != P.City
          AND PI.JobTitle != P.JobTitle) AS FindCount
FROM People P
The query may not be exactly workable, but you get the idea. There is also a further requirement that FindCount should be greater than x and less than y.
Can someone let me know whether this is possible in SOLR, or whether it is something SOLR is not meant for? I know SOLR 4 is coming with a join operator, but that seems to me more like an IN clause, which limits its use. For example, consider that I want the matching Ids in the above query as well, rather than just the counts.
I don't think that is doable in one query; you might end up running the "inner select" as a separate query for every person.
My question is about how to perform progressively narrower searches against a database while limiting the number of queries.
Let's start simple:
@companies = Company.where("active = ?", true)
Let's say we display records from this set. Then, we need:
@clientcompanies = @companies.where("client_id = ?", @client.id)
We display something from @clientcompanies. Then, we want to drill down further.
@searchcompanies = @clientcompanies.where("name LIKE ? OR notes LIKE ?", "#{params[:search]}%", "#{params[:search]}%")
Are these three statements the most efficient way to go about this?
If indeed the database is starting with the entire Company table each time around, is there a way to limit the scope so each of the above statements would take a shorter amount of time as the size of the set diminishes?
In case it matters, I'm running Rails 3 on both MySQL and PostgreSQL.
It doesn't get much more optimized than what you're already doing. None of those statements will execute a SQL query until you try to iterate over the results; the query is executed when you call methods like all, first, inspect, any?, or each.
Each time you chain on a new where or other Arel method, it is appended to the SQL query that will be executed at the end. If, somewhere in the middle, you want to see the query that will be executed, you can do puts @searchcompanies.to_sql
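For illustration, the chained relation above compiles down to a single query roughly like this (the exact output of to_sql will vary by database adapter, and the client id 42 and search term 'term' are placeholders):

SELECT companies.*
FROM companies
WHERE (active = 1)
  AND (client_id = 42)
  AND (name LIKE 'term%' OR notes LIKE 'term%')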
Note that if you run these commands in the console, each statement appears to run a SQL query, but only because the console automatically calls .inspect on the line you entered.
Hopefully I answered your question :)
There's a great railscast here: http://railscasts.com/episodes/239-activerecord-relation-walkthrough that explains how ActiveRelation works, and what you can do with it.
EDIT:
I may have misunderstood your question. You indicated that after each where call you were displaying information from the query. What's the use case for this? Are you displaying all companies on the same page as the companies filtered by the search? If you display something from that very first query, you will be pulling every single company row from your database, which is not going to be very scalable or performant with larger numbers of company records.
Would it not make sense to only display information from the @searchcompanies variable?
Is there a way I can query the database to return only the records that have occurred consecutively (in time) according to a given attribute?
Specifically, I need to find out how many times the user has won a game in a row. The games are stored in the database with an attribute win_loss (1 is win, 0 is loss).
I would like to see if the user has won 3 games in a row. Is there a way to do this in the database?
If not, what would it look like in the application?
I apologize ahead of time if this is confusing. Please ask questions and I will try to clear it up. I'm using Ruby on Rails.
I would take a different approach: add a column consecutive_wins to your User model (negative numbers could represent consecutive losses, if you need that as well).
EDIT: if you really need to query it...
Two queries:
Select the newest win and the newest loss for the given user. This can be done in one query; in SQL: SELECT max(created_at) AS newest, win_loss FROM games GROUP BY win_loss (I can't recall how to do grouping functions in Rails right now, but you should get the idea). A sketch of both queries follows this list.
Then, depending on the result of query (1):
If both a newest win and a newest loss are found, count the games with created_at > newest_loss; those are the consecutive wins (zero if the newest game was a loss).
If only a newest win or only a newest loss is found, the user has always won or always lost, so just count the number of their games.
In any other case there were no games.
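Here is a sketch of both queries in plain SQL; the games table, user_id column, and the user id 123 are assumptions, win_loss = 1 means a win as in the question, and :newest_loss stands for the value returned by the first query:

-- 1) newest win and newest loss for the user
SELECT win_loss, MAX(created_at) AS newest
FROM games
WHERE user_id = 123
GROUP BY win_loss;

-- 2) if the newest game is a win, the current streak is the number
--    of wins recorded after the newest loss from query (1)
SELECT COUNT(*) AS consecutive_wins
FROM games
WHERE user_id = 123
  AND win_loss = 1
  AND created_at > :newest_loss;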
I still recommend adding the column to the User model; it will be better when you need to display a whole table of players and their consecutive wins/losses.
I'd like to ask a question here that I think will be easy for some people.
OK, I have a query that returns records from two related tables (one to many).
In this query I have 3 or 4 calculated fields that are based on fields from the two tables.
Now I want to add a GROUP BY clause on the names and a SUM over the calculated fields, but it ends up with an error message saying:
"You tried to execute a query that is not part of aggregate function"
So I decided to just run the query without the totals (i.e. no GROUP BY, SUM, etc.).
Then I created another query that totals my previous query (i.e. using a GROUP BY clause on the names and SUM on the calculated fields; no calculation here). This works fine (I have done it this way before), but I don't like having two queries just to get a summary total. Is there any other way of doing this in the design view, creating only one query?
I would very much appreciate it.
Thank you,
JM
Sounds like Access thinks the calculated fields need to be part of the grouping, or something like that. You might need to look into sub-querying.
Can you post the SQL (before and after)? It would help in understanding what the issue is.
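For example, a minimal single-query sketch of the sub-query approach; the tables, join, and calculated fields here are invented placeholders, and older versions of Access may want the inner query saved separately or written with its bracketed subquery syntax:

SELECT sub.CustomerName,
       SUM(sub.CalcA) AS TotalA,
       SUM(sub.CalcB) AS TotalB
FROM (
    SELECT t1.CustomerName,
           t1.Qty * t2.Price AS CalcA,   -- example calculated fields
           t1.Qty * t2.Cost  AS CalcB
    FROM Orders AS t1
    INNER JOIN Products AS t2 ON t2.ProductId = t1.ProductId
) AS sub
GROUP BY sub.CustomerName;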