Rails & Mongoid - Merging multiple criteria - ruby-on-rails

It took me a bit to figure this out and I'm sure others out there are curious how to do this as well.
I have a case where I need to run an .and() query using user input that I converted to an array. My problem was that the query was searching each and every field for BOTH words that were read in from the input.

So what I did was break up the queries by field. I.e. if you have the fields :tags, :story and :author, you would have 3 queries, e.g. tag_query = Book.any_in(:tags => @user_search)
I created an empty hash: conditions = {}
Then I would merge each query into the conditions hash using conditions.merge!(tag_query.selector)
I decided which queries to merge by checking whether the query returned any Book documents: tag_query.exists? ? conditions.merge!(tag_query.selector) : nil. If the query returned a Book document its selector was merged into the hash; if not, nothing happens.
The final step is running the actual query we care about: @book = Book.where(conditions). This combines all the queries that actually found something and smashes them together, just like an .and() query!
Instead of returning 0 results because both words weren't found in every single field, this intelligently brings together the fields that actually did match, so a document counts as long as both words are found somewhere in the whole document.
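To make the merging concrete, here's a minimal sketch with plain hashes. Mongoid criteria expose their query as a hash via #selector, so merging selectors is just merging hashes; the field names and search terms below are made-up examples, not from the original code:

```ruby
# Stand-ins for tag_query.selector and story_query.selector.
user_search = ["ruby", "rails"]
tag_selector   = { "tags"  => { "$in" => user_search } }
story_selector = { "story" => { "$in" => user_search } }

# Pretend only the tag query matched any documents (tag_query.exists? => true).
matched = { tag_selector => true, story_selector => false }

# Merge only the selectors whose queries found something.
conditions = {}
matched.each { |selector, found| conditions.merge!(selector) if found }

conditions # => {"tags"=>{"$in"=>["ruby", "rails"]}}
```

The final Book.where(conditions) call then runs a single query whose criteria are the union of the selectors that actually matched.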

Related

Analyze similarities in model data using Elasticsearch and Rails

I would like to use Elasticsearch to analyze data and display it to the user.
When a user views a record for a model, I want to display a list of 'similar' records in the database for that model, and the percentage of similarity. This would match against every field on the model.
I am aware that with the Searchkick gem I can use a command to find similar records:
product = Product.first
product.similar(fields: ["name"], where: {size: "12 oz"})
I would like to take this further and compare entire records (and eventually associations).
Is this feasible with Elasticsearch / Searchkick in Rails, or should I use another method to analyze the data?
There is a feature built exactly for this purpose in Elasticsearch called more_like_this. The documentation for the mlt query goes into great detail about how you can achieve exactly what you want to do.
The content you provide to the like field will be analyzed, and the most relevant terms for each field will be used to retrieve documents containing as many of those relevant terms as possible. If you have all your records stored in Elasticsearch, you can use the Multi GET syntax to specify a document already in your index as the content of the like field, like this:
"like" : [
{
"_index" : "model",
"_type" : "model",
"_id" : "1"
}
]
Remember that you cannot use index aliases when using this syntax (so you'll have to do a document lookup first if you are not sure which index your document is currently residing in).
If you don't specify the fields field, all fields in the source document will be used. To avoid bad surprises, my suggestion is to always specify the list of fields you want your similar documents to match.
If you have non-textual fields that you want to match perfectly with the source document, you might want to consider using a bool query, programmatically creating the filter section to limit documents returned by the mlt query to only a filtered subset of your entire index.
You can build these queries in Searchkick using the advanced search feature, manually specifying the body of search requests.
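As a rough sketch of what such a request body might look like (the index name, field names, and filter value are placeholders, not from the question), a bool query wrapping an mlt clause plus an exact-match filter could be built as a plain Ruby hash and passed to Searchkick's advanced search:

```ruby
# Sketch of an Elasticsearch bool query: more_like_this for text fields,
# plus a term filter for a non-textual field that must match exactly.
# All names here are illustrative placeholders.
mlt_body = {
  query: {
    bool: {
      must: {
        more_like_this: {
          fields: ["name", "description"],
          like: [{ _index: "products", _id: "1" }],
          min_term_freq: 1
        }
      },
      filter: [
        { term: { size: "12 oz" } }
      ]
    }
  }
}

# With Searchkick's advanced search the body can be passed through as-is:
# Product.search(body: mlt_body)
```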
Read up on using More Like This Query. This is the query produced by product.similar(). It operates only on text fields. If you also want to compare numeric or date fields, you'll have to incorporate these rules into a scoring script to do what you're asking.

Chain of multiple where Active Record clauses not working

I am fairly new to Ruby on Rails and ActiveRecord. I have a database model named location which describes a point of interest on a map. A location has a location_type field that can have three different location types (business, dispensary or contact). A location also has an owner_id as well which is the user_id of the user who created the location.
In the controller the user requests all of their locations by providing their ID. The dispensary and business locations are public so all users should be able to view them, while the contacts should only be shown to the user who is the owner of them. Therefore I am tasked with creating an ActiveRecord query that returns all dispensaries and businesses in the database and all contacts that were created by that user. I was trying to do this by chaining together where clauses but for some reason this has failed:
@locations = Location.where(:location_type => ["business", "dispensary"]).where(:location_type => "contact", :owner_id => params[:id])
Which generates this PostgreSQL:
SELECT "locations".* FROM "locations" WHERE "locations"."location_type" IN ('business', 'dispensary') AND "locations"."location_type" = 'contact' AND "locations"."owner_id" = 1
I suspect that this failed because the first where returns just the locations of type business and dispensary and the second where queries that returned data which has no locations of type contact within it. How can I query for all dispensaries and businesses combined with a set of filtered contacts?
Chaining where calls like that will result in ANDs at the SQL level. where can take raw SQL for an argument, in which case you could explicitly add an OR, but parameterizing it properly is rather messy IMO (although it can be done). So, for this type of query, I think it would probably be best to drop down into raw SQL with sanitized inputs (to guard against SQL injection).
i.e. something like this:
x = ActiveRecord::Base.connection.raw_connection.prepare(
  "SELECT * FROM locations
   WHERE location_type IN ('business', 'dispensary') OR
         (location_type = 'contact' AND owner_id = ?)")
x.execute(params[:id])
x.close
This will select all rows from the locations table where the location_type is either 'business' or 'dispensary', regardless of the owner_id, plus all rows where the location_type is 'contact' and the owner_id matches the one passed in.
Edit in response to comment from OP:
I tend to prefer raw SQL whenever possible for more complex queries, as I find it easier to control the behavior. ORMs can sometimes do things that are less than desirable, such as executing the same query 1000 times to get 1000 entries instead of issuing one SQL query once, resulting in terrible performance. However, if you'd prefer to stay within the bounds of ActiveRecord, you can use the form of where that takes arguments. It'll be somewhat raw SQL, in that you need to specify the where clause yourself, but you won't need to get a raw_connection and explicitly execute -- it'll work within the framework of the ActiveRecord query you were doing.
So, that would look something like this:
@locations = Location.where("location_type IN ('business', 'dispensary') OR
                             (location_type = 'contact' AND owner_id = ?)", params[:id])
See this Active Record guide page for more info, section 2.2.
Edit in response to follow-up question from OP:
Regarding the ? in the SQL, you can think of it as a placeholder of sorts (there's really no formatting to be done with it; rather, it signifies that a parameter goes there).
The reason it's important is that when a ? is placed in the query and the actual value you want to use is passed as an argument to where (and certain other functions as well), the underlying SQL driver will interpolate the parameter into the query in a way that prevents SQL injection, which could otherwise allow for all kinds of problems. If you were to instead do the interpolation yourself directly into the query string, you would still be potentially susceptible to SQL injection. So not only is ? safe from SQL injection, it's specifically intended to prevent it.
You can have a bunch of ? in your query, as long as you pass the corresponding number of parameters as arguments after the query string (otherwise the SQL driver should error out).
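To make the intended OR logic concrete, here's a plain-Ruby sketch of the visibility rule the parameterized where expresses. The hashes stand in for Location rows; the data is made up:

```ruby
# Field names match the question; the rows themselves are invented.
locations = [
  { location_type: "business",   owner_id: 2 },
  { location_type: "dispensary", owner_id: 3 },
  { location_type: "contact",    owner_id: 1 },
  { location_type: "contact",    owner_id: 2 }
]

current_user_id = 1

# Businesses and dispensaries are public; contacts only show for their owner.
visible = locations.select do |loc|
  %w[business dispensary].include?(loc[:location_type]) ||
    (loc[:location_type] == "contact" && loc[:owner_id] == current_user_id)
end

visible.size # => 3 (the contact owned by user 2 is excluded)
```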

Ruby map/collect is slow on a large Mongoid collection

I have a Mongoid collection on which I run a where query.
Now, I would like to build an array containing the values of a specific field from all the documents in the collection.
e.g. if my Mongoid model is
class Foo
  include Mongoid::Document
  field :color, type: String
end
I'd like to do something like this -
red_ducks = Foo.where(color: 'red')
red_duck_ids = red_ducks.map(&:_id)
Unfortunately, when the result of the query is large it takes a long time. It takes 6 seconds for 10,000 documents in my case, for example.
Is there any way to speed this up?
Can't you just call distinct on your scope with _id as an attribute?
red_duck_ids = Foo.where(color: 'red').distinct(:_id)
Which will return you a list of all _ids that meet your conditions. You can find more information on Mongo's distinct documentation.
You can also have a look at only, and if you are using version 3.1 or newer you can also use Criteria#pluck.
Have you tried
Foo.where(color: 'red').pluck(:id)
It might be faster (not sure).
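For intuition about why map(&:_id) is slow, here's a plain-Ruby sketch: the map approach builds a full model object per document and then discards everything but the id, while pluck/distinct project just the one field. Duck and the hash documents below are stand-ins, not real Mongoid API:

```ruby
# Stand-in for a Mongoid model class.
Duck = Struct.new(:id, :color, :bio)

# Stand-ins for raw documents coming back from MongoDB.
raw_docs = Array.new(1_000) { |i| { "_id" => i, "color" => "red", "bio" => "x" * 100 } }

# What criteria.map(&:_id) effectively does: instantiate every document
# as a full object, then throw away all but the id.
mapped_ids = raw_docs
  .map { |doc| Duck.new(doc["_id"], doc["color"], doc["bio"]) }
  .map(&:id)

# What pluck/distinct effectively do: project the field directly.
plucked_ids = raw_docs.map { |doc| doc["_id"] }

mapped_ids == plucked_ids # => true, but the second path skips object creation
```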

How to check for uniqueness on existing data in Ruby on Rails?

validates :name, :uniqueness => true is the quick and effective way to enforce uniqueness before saving a model.
Is there an easy way to check uniqueness of existing data? Iterating through each item and comparing it to the others seems so clunky...
I most often do this with direct SQL - but here's a Rails (version 3, using arel) way of doing it - use the group operation and get a count of the field you want.
e.g. if I have a bunch of Events - and I wanted to get a count of events that have unique titles:
Event.group(:title).count
That returns an ActiveSupport::OrderedHash of the titles and the counts found. You can then do a select on the hash to filter the list of titles. e.g.
Event.group(:title).count.select{|title,count| count >= 2}
That gives you the titles that you can go back and do something with, finding each and deleting one, etc.
You can also do a "having" operation (which I do in raw sql) e.g.
Event.group(:title).having('count(title) >= 2')
which is:
SELECT `events`.* FROM `events` GROUP BY title HAVING count(title) >= 2
in SQL
The nice thing about that is you get the full object list, which you can enumerate over without going back to the db: delete items, print timestamps, whatever.
It's just a little harder to read in the console, because the full record is loaded and not just a hash of the titles and counts.
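The group-and-count idea can be sketched in plain Ruby, where tally plays the role of Event.group(:title).count (the titles below are made up):

```ruby
# tally returns a hash of value => count, just like group(...).count.
titles = ["RailsConf", "RubyKaigi", "RailsConf", "GoGaRuCo"]
counts = titles.tally
# => {"RailsConf"=>2, "RubyKaigi"=>1, "GoGaRuCo"=>1}

# Keep only the values that appear more than once, as in the select above.
duplicates = counts.select { |_title, count| count >= 2 }
duplicates # => {"RailsConf"=>2}
```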

Ruby on Rails/will_paginate: Ordering by custom columns

I've got a rather complicated SQL query whose results should be paginated and displayed to the user. I don't want to go into details here, but to put it simply, I just select all columns of a model, plus an additional column that is used just for sorting purposes.
Note.select( ['notes.*', "<rather complicated clause> AS 'x'"] ).joins(...)...
After some .join()s and a .group(), I finally call .order() and .paginate on the relation. If I order by any of the model's natural columns everything works fine; if, however, I order by the "artificial" column x, Rails gives me the error:
no such column: x
This seems to occur because will_paginate issues a COUNT(*) statement before fetching the actual data, simply to get the amount of data it has to process. This COUNT(*) statement is (of course) generated from the original statement, which includes the ORDER BY x. The problem is that will_paginate keeps the ORDER BY in the statement but simply replaces the column definitions with COUNT(*), and thus cannot find the column x.
Using Rails 3 and will_paginate 3.0.pre2.
Is there any way to get around this?
Thanks for any help.
You can disable the initial count query by passing in the value manually:
total_entries = Note.select(...).joins(...).count
Note.select( ... ).joins(...).group().paginate(:page => params[:page], :total_entries => total_entries)
The above is not a literal code sample and you may need to tweak your count query to get the correct results for your join. This should let you do a straight up complex query on the pagination.
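For reference, this is roughly what will_paginate derives from a manually supplied :total_entries value, which is why the COUNT(*) query can be skipped entirely (the numbers are made up):

```ruby
# With total_entries supplied up front, the page count is plain arithmetic;
# no COUNT(*) over the complex query is needed.
total_entries = 57
per_page      = 20

total_pages = (total_entries.to_f / per_page).ceil
total_pages # => 3
```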
