Ruby map/collect is slow on a large Mongoid collection - ruby-on-rails

I have a Mongoid collection on which I run a where query.
Now, I would like to build an array containing the values of a specific field from all the documents in the collection.
e.g. if my Mongoid model is
class Foo
  include Mongoid::Document
  field :color, type: String
end
I'd like to do something like this -
red_ducks = Foo.where(color: 'red')
red_duck_ids = red_ducks.map(&:_id)
Unfortunately, when the result of the query is large it takes a long time. It takes 6 seconds for 10,000 documents in my case, for example.
Is there any way to speed this up?

Can't you just call distinct on your scope with _id as an attribute?
red_duck_ids = Foo.where(color: 'red').distinct(:_id)
Which will return a list of all _ids that meet your conditions. You can find more information in Mongo's distinct documentation.
You can also have a look at only, and if you are using version 3.1 or newer you can also use Criteria#pluck.
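For example, a rough sketch of those two approaches (pluck requires Mongoid 3.1 or newer; only still instantiates documents, but with just the projected field):
# Project only the _id field so full documents are not hydrated
red_duck_ids = Foo.where(color: 'red').only(:_id).map(&:_id)
# Or, on Mongoid 3.1+, pull the values straight from the driver
red_duck_ids = Foo.where(color: 'red').pluck(:_id)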

Have you tried
Foo.where(color: 'red').pluck(:id)
It might be faster (not sure).

Related

Ruby's .where vs. detect

I'm looking for a method that is faster and uses less server processing. In my application, I can use both .where and .detect:
Where:
User.where(id: 1)
# User Load (0.5ms)
Detect:
User.all.detect{ |u| u.id == 1 }
# User Load (0.7ms). Sometimes increases more than .where
I understand that .detect returns the first item in the list for which the block returns TRUE, but how does it compare with .where if I have thousands of Users?
Edited for clarity.
.where is used in this example because I may not query for the id alone. What if I have a table column called "name"?
In this example
User.find(1) # or
User.find_by(id: 1)
will be the fastest solutions. Because both queries tell the database to return exactly one record with a matching id. As soon as the database finds a matching record, it doesn't look further but returns that one record immediately.
Whereas
User.where(id: 1)
would return an array of objects matching the condition. That means: After a matching record was found the database would continue looking for other records to match the query and therefore always scan the whole database table. In this case – since id is very likely a column with unique values – it would return an array with only one instance.
In contrast to
User.all.detect { |u| u.id == 1 }
that would load all users from the database. This will result in loading thousands of users into memory, building ActiveRecord instances, iterating over that array and then throwing away all records that do not match the condition. This will be very slow compared to just loading matching records from the database.
Database management systems are optimized to run selection queries, and you can improve their ability to do so by designing a useful schema and adding appropriate indexes. Every record loaded from the database needs to be translated into an instance of ActiveRecord and consumes memory; neither operation is free. Therefore the rule of thumb should be: whenever possible, run queries directly in the database instead of in Ruby.
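As a rough illustration of that rule of thumb (role is a hypothetical column here):
# Filter in the database: only matching rows are loaded and instantiated
admins = User.where(role: 'admin')
# Filter in Ruby: every row is loaded, instantiated, then mostly thrown away
admins = User.all.select { |u| u.role == 'admin' }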
NB: One should use ActiveRecord#find in this particular case; please refer to the answer by @spickermann instead.
User.where is executed at the DB level, returning one record.
User.all.detect will return all the records to the application, and only then iterate through them at the Ruby level.
That said, one must use where here. The former is insensitive to the number of records: there might be billions, and the execution time and memory consumption would stay nearly the same (O(1)). The latter might even fail outright on billions of records.
Here's a general guide:
Use .find(id) whenever you are looking for a unique record. You can use something like .find_by_email(email) or .find_by_name(name) or similar (these finder methods are automatically generated) when searching non-ID fields, as long as there is only one record with that particular value.
Use .where(...).limit(1) if your query is too complex for a .find_by query or you need to use ordering but you are still certain that you only want one record to be returned.
Use .where(...) when retrieving multiple records.
Use .detect only if you cannot avoid it. Typical use cases for .detect are on non-ActiveRecord enumerables, or when you have a set of records but are unable to write the matching condition in SQL (e.g. if it involves a complex function). As .detect is the slowest, make sure that before calling .detect you have used SQL to narrow down the query as much as possible. Ditto for .any? and other enumerable methods. Just because they are available for ActiveRecord objects doesn't mean that they are a good idea to use ;)
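A minimal sketch of these guidelines, using a hypothetical User model with email and active columns (complex_ruby_check? stands in for a condition you cannot express in SQL):
# Unique record by primary key
user = User.find(1)
# Unique record by another unique column
user = User.find_by(email: 'alice@example.com')
# Complex query, but only one record wanted
user = User.where('created_at < ?', 1.year.ago).order(:created_at).limit(1).first
# Multiple records
users = User.where(active: true)
# detect only after SQL has narrowed the set as far as possible
user = User.where(active: true).detect { |u| complex_ruby_check?(u) }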

Rails & Mongoid - Merging multiple criteria

It took me a bit to figure this out and I'm sure others out there are curious how to do this as well.
I have a case where I need to run an .and() query using user input that I converted to an array. My problem was that the query was searching each and every field for BOTH words that were read in from the input.
So what I did was break up the queries based on the fields. I.e. if you have the fields :tags, :story and :author, you would have 3 queries, e.g. tag_query = Book.any_in(:tags => @user_search)
I created an empty hash: conditions = {}
Then I would merge each query into the conditions hash using conditions.merge!(tag_query.selector)
I decided which queries to merge by checking whether the query returned any Book documents: tag_query.exists? ? conditions.merge!(tag_query.selector) : nil. If the query returned a Book document it was merged into the hash; if not, nothing happens.
The final step is running the actual query we care about: @book = Book.where(conditions). Doing this combines all the queries that actually found something and smashes them together just like an .and() query!
Instead of returning 0 because both words weren't found in each field, this intelligently brings together the fields that actually did find something and makes it so it only counts where both words have been found in the whole document.
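Putting those steps together, a rough sketch of the whole approach (the field names and @user_search come from the example above; adjust them to your own models):
# One criteria object per searchable field
tag_query    = Book.any_in(:tags   => @user_search)
story_query  = Book.any_in(:story  => @user_search)
author_query = Book.any_in(:author => @user_search)

conditions = {}

# Only merge the selectors of criteria that actually matched something
[tag_query, story_query, author_query].each do |query|
  conditions.merge!(query.selector) if query.exists?
end

# Run the combined query, as if the pieces had been chained with .and()
@book = Book.where(conditions)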

Get array of attribute values, along with an attribute from a join

I've gotten close, I believe. My current query is this
items = Item.select("items.icon, items.name, item_types.name AS type, items.level, items.rarity, items.vendor_value")
.joins(:item_type)
.where("item_types.name = '#{params[:item_type]}'")
This gets me an array of Item objects that at least respond to :type with the item_type.name.
What I am looking for is an array of arrays that look so:
[icon, name, item_type.name, level, rarity, vendor_value]
I've already had it working fairly easily, but it is important to me that this be done in one fell swoop via SQL, instead of creating a map afterwards, because there are times when I need to respond with 40k+ items and need this to be as fast as possible.
Not sure how to go from the above to an array of attributes, without performing a map.
Thanks for your help!
The pluck method does precisely what you want. In your case, it would look like this:
items = Item.joins(:item_type)
.where("item_types.name = ?", params[:item_type])
.pluck("items.icon", "items.name", "item_types.name AS type",
"items.level", "items.rarity", "items.vendor_value")
I also changed the where call to use parameterization instead of string interpolation; interpolation isn't recommended, especially when you're getting a value from the user.
Further reading:
Official documentation for pluck
An in-depth explanation of how to use pluck

Can I use limits, orders or date queries with the Ruby DBF gem?

I'm referring to this gem: https://github.com/infused/dbf
I've read the readme and scanned through the API documentation. It feels like I should be able to use ActiveRecord style queries on a DBF Table but it doesn't look like it.
I'm hoping to get the last X records, query by date or use order in some way to help with synching the DBF file to another database without going through all of it.
The only examples seem to be simple "finds" and I can't get any comparison or otherwise to work:
widgets.find :first, :slot_number => 's42'
Does anyone know how to do this? (another gem/technique is fair suggestion too).
While you cannot use ActiveRecord style queries, DBF::Table is an Enumerable, so you can utilize any of the enumerable methods to do most of what you want.
For example, let's say you only want widgets that were created between 1/1/2005 and 7/15/2005. Let's also assume that the widgets table has a created_date column. You could do something like:
range = (Date.new(2005,1,1)..Date.new(2005,7,15))
widgets.select { |w| range.include?(w.created_date) }
Or to sort all widgets by their size column:
sorted_widgets = widgets.sort_by { |w| w.size }
If your DBF file is not too huge, you can also convert it to an array to get access to all of Ruby's Array methods. For example, let's get the last 10 records:
widgets.to_a.last(10)
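Combining those ideas, the ten most recent widgets by the assumed created_date column could be fetched like this:
# Sort in memory by created_date, then keep the ten newest records
latest_widgets = widgets.sort_by(&:created_date).last(10)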

Select Rails objects that include value in array attribute

I have a Rails model called Box. Each Box object has a column :products, which is an array of strings that includes all the products that are being stored inside it at the time.
For each Box object, it is possible that the same value was stored in another Box.
Is there a query I can use to return all the Boxes that have value x stored in :products?
I know "where" works for finding objects with certain values, and with an array you might use "include?", but I'm having trouble working out a way to use either in this case, if it's at all possible.
There was an answer posted here before that worked well enough, but I looked around and found another query that was more succinct.
selected_boxes = Box.where("?=ANY(products)", x)
Where x is the value you are seeking in each object.
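Note that this assumes products is a native PostgreSQL array column, declared in a migration roughly like the following (the migration name is just an example):
class AddProductsToBoxes < ActiveRecord::Migration
  def change
    # Native Postgres string array; this is what makes the ANY(products) query work
    add_column :boxes, :products, :string, array: true, default: []
  end
end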
Scope!
scope :contains, ->(items) { where("products LIKE ?", "%#{items.to_yaml}%") } # items is an array of your potential strings
So you'd call this as Box.contains(%w(foo bar)) or Box.contains(['some thing'])
Passing the array should let you search for multiple items at a time...
You can name the scope anything you want, obviously
LIKE for MySQL... ILIKE for PostgreSQL
