Query array vs. Active Record relation - ruby-on-rails

How can I call something like all on this? I would like to call the check_other_notification method on all the notifications that query return.
Could someone suggest a good description on this Active Relation vs array topic? I read about it in many different places but I'm still a bit confused.
Notification
.between_other_recipient(current_user, #user)
.last
.check_other_notification

As I understand, you want to call a check_other_notification method on each object, returned by the query.
If so, use find_each for this:
Notification
.between_other_recipient(current_user, #user)
.find_each do |notification|
notification.check_other_notification
end
find_each if very efficient method, since it process objects in batches (by default the batch size is 1000 records, but you can specify any other amount).
In your case each would work, since I do not think there are hundreds of thousands of notifications, but if so - find_each is a perfect match.
Edit
Difference between collect and find_each.
Quoting docs on find_each:
find_each is only intended to use for batch processing of large
amounts of records that wouldn’t fit in memory all at once. If you
just need to loop over less than 1000 records, it’s probably better
just to use the regular find methods.
If you use collect (map), which is a method from Array class - it would first load the whole collection of records into the memory before processing. This can eat to much memory and lead to problems, when the collection is big.
Important point is: do not use Ruby to process database stuff, when it is possible to use ORM (when not, SQL will do).
Here is a short article showing few examples of using Array's vs AR's methods, and also describing few other things to be aware of when querying AR collection.

Related

Is there a lazy select_all option in Rails?

I've got a complicated query that I need to run, and it can potentially yield a large result set. I need to iterate linearly through this result set in order to crunch some numbers.
I'm executing the query like so:
ActiveRecord::Base.connection.select_all(query)
find_in_batches Won't work for my use case, as it's critical that I get the records in a custom order. Also, my query returns some fields that aren't part of any models, so I need to get the records as hashes.
The problem is, select_all is not lazy (from what I can tell). It loads all of the records into memory. Does Rails have a way to lazily get the results for a custom SQL query? .lazy doesn't seem applicable here, as I need custom ordering of the results.
This is possible in other languages (C#, Haskell, JavaScript), so it seems like it would be possible in Ruby.
Not sure but maybe you're asking for eager_load or preload.
http://blog.arkency.com/2013/12/rails4-preloading/
Hope this can help you.
You can try find_each or find_in_batches ActiveRecord methods.
Both query database in configurable-sized batches.
The difference it that find_each yields objects one-by-one to block (they are lazy initialized).
find_in_batches yields whole batch group.
If you can't use above methods due to custom sorting, what you can do is query the database using limit and offset. This way you will deal with data in portions. Memory consumption will decrease, but number of queries will increase.
Other solution may be to let database engine perform arithmetic operations, that you need and return calculated result.

Ruby's .where vs. detect

I'm looking for a method that is faster and uses less server processing. In my application, I can use both .where and .detect:
Where:
User.where(id: 1)
# User Load (0.5ms)
Detect:
User.all.detect{ |u| u.id == 1 }
# User Load (0.7ms). Sometimes increases more than .where
I understand that .detect returns the first item in the list for which the block returns TRUE but how does it compares with .where if I have thousands of Users?
Edited for clarity.
.where is used in this example because I may not query for the id alone. What if I have a table column called "name"?
In this example
User.find(1) # or
User.find_by(id: 1)
will be the fastest solutions. Because both queries tell the database to return exactly one record with a matching id. As soon as the database finds a matching record, it doesn't look further but returns that one record immediately.
Whereas
User.where(id: 1)
would return an array of objects matching the condition. That means: After a matching record was found the database would continue looking for other records to match the query and therefore always scan the whole database table. In this case – since id is very likely a column with unique values – it would return an array with only one instance.
In opposite to
User.all.detect { |u| u.id == 1 }
that would load all users from the database. This will result in loading thousands of users into memory, building ActiveRecord instances, iterating over that array and then throwing away all records that do not match the condition. This will be very slow compared to just loading matching records from the database.
Database management systems are optimized to run selection queries and you can improve their ability to do so by designing a useful schema and adding appropriate indexes. Every record loaded from the database will need to be translated into an instance of ActiveRecord and will consume memory - both operations are not for free. Therefore the rule of thumb should be: Whenever possible run queries directly in the database instead of in Ruby.
NB One should use ActiveRecord#find in this particular case, please refer to the answer by #spickermann instead.
User.where is executed on DB level, returning one record.
User.all.detect will return all the records to the application, and only then iterate through on ruby level.
That said, one must use where. The former is resistant to an amount of records, there might be billions and the execution time / memory consumption would be nearly the same (O(1).) The latter might even fail on billions of records.
Here's a general guide:
Use .find(id) whenever you are looking for a unique record. You can use something like .find_by_email(email) or .find_by_name(name) or similar (these finders methods are automatically generated) when searching non-ID fields, as long as there is only one record with that particular value.
Use .where(...).limit(1) if your query is too complex for a .find_by query or you need to use ordering but you are still certain that you only want one record to be returned.
Use .where(...) when retrieving multiple records.
Use .detect only if you cannot avoid it. Typical use cases for .detect are on non-ActiveRecord enumerables, or when you have a set of records but are unable to write the matching condition in SQL (e.g. if it involves a complex function). As .detect is the slowest, make sure that before calling .detect you have used SQL to narrow down the query as much as possible. Ditto for .any? and other enumerable methods. Just because they are available for ActiveRecord objects doesn't mean that they are a good idea to use ;)

.map(&:dup) Calculations Slow

I have an ActiveRecord query user.loans, and am using user.loans.map(&:dup) to duplicate the result. This is so that I can loop through each Loan (100+ times) and run several calculations.
These calculations take several seconds longer compared to when I run them directly on user.loans or user.loans.dup. If I do this however, all queries user.loans are affected, even when querying with different methods.
Is there an alternative to .map(&:dup) that can achieve the same result with faster calculations? I'd like to preserve the relations so that I can retrieve associated records to each Loan.
The fastest way you can achieve what you want is making calculations directly on ActiveRecord, this way you would not have to loop through resulting Array.
If you still want to loop through Array elements, maybe you should not use map to duplicate each Array element. You could use each instead, which does not affect original Array element. Here is what I think you should do:
def calculate_loans
calculated_loans = Array.new
user.loans.each do |loan|
# Here you make your calculations. For example:
calculated_loans.push(loan.value += 10)
end
calculated_loans
end
This way, you will have original user.loans elements, and a duplicated Array with calculated_loans.
Please, let me know if this improve your performance :)
To resolve conflicts with other calls to user.loans, I wound up using user.loans.reload in the Presenter I have for this particular view. This way I was able to continue making calculations directly on Active Record elsewhere(per Daniel Batalla's suggestion), but without the conflicts I mentioned in my original question.

Avoid heroku R14 error with large create loop

I have an async Resque job that creates many associated objects inside a loop that I can't seem to avoid heroku's ever-popular R14 error with.
has_many :associated_things
...
def populate_things
reference_things = ReferenceThings.where(some_criteria).map(&:name) # usually between 10 k and 20k strings
reference_things.each do |rt|
self.associated_things << AssociatedThing.create name: rt
end
end
Some things I've tried:
wrapping the create loop in an ActiveRecord::Base.uncached block
manually running GC.start at the end of the loop
adding an each_slice before .each
Is there a way to rewrite this loop to minimize memory usage?
#Alex Peachey had some good suggestions, but ultimately, #mu had the right idea in the first comment.
Transitioning to raw SQL is the only way I could find to make this work. Some suggested methods are here:
http://coffeepowered.net/2009/01/23/mass-inserting-data-in-rails-without-killing-your-performance/
I used the mass insert method and it works fine.
It should be said that it's far from clear to me why this is necessary. Apparently instantiating hundreds of thousands of AR objects -- even outside of a web request, asynchronously -- causes a memory leak. Maybe this just simply isn't the sort of thing Rails/AR was designed to do.
Related question, perhaps the same issue: ActiveRecord bulk data, memory grows forever
Some ideas that may help:
Since you are just pulling name from ReferenceThings, don't grab the full object and then just grab the name. Instead do something like this:
reference_things = ReferenceThings.where(some_criteria).pluck(:name)
That will do a better query grabbing just the names and give you an array. Much cheaper memory wise.
I noticed you are putting all the AssociatedThings you are creating into an array as you go. If you don't actually need an array of them then just creating them will be better. If you do need them, depending on what you need them for you could create them all and then query the database to grab them again and loop over them with find_each which will grab them in batches.

why is Model.all different to Model.where('true') in rails 3

I have a query, which works fine:
ModelName.where('true')
I can chain this with other AR calls such as where, order etc. However when I use:
ModelName.all
I receive the "same" response but can't chain a where or order to it as it's an array rather than a AR collection.
Whereas I have no pragmatic problem using the first method it seems a bit ugly/unnecessary. Is there a cleaner way of doing this maybe a .to_active_record_collection or something?
There is an easy solution. Instead of using
ModelName.where('true')
Use:
ModelName.scoped
As you said:
ModelName.where('true').class #=> ActiveRecord::Relation
ModelName.all.class #=> Array
So you can make as many lazy loading as long as you don't use all, first or last which trigger the query.
It's important to catch these differences when you consider caching.
Still I can't understand what kind of situation could lead you to something like:
ModelName.all.where(foobar)
... Unless you need the whole bunch of assets for one purpose and get it loaded from the database and need a subset of it to other purposes. For this kind of situation, you'd need to use ruby's Array filtering methods.
Sidenote:
ModelName.all
should never be used, it's an anti-pattern since you don' control how many items you'll retrieve. And hopefully:
ModelName.limit(20).class #=> ActiveRecord::Relation
As you said, the latter returns an array of elements, while the former is an ActiveRecord::Relation. You can order and filter array using Ruby methods. For example, to sort by id you can call sort_by(&:id). To filter elements you can call select or reject. For ActiveRecord::Relation you can chain where or order to it, as you said.
The difference is where the sorting and processing goes. For Array, it is done by the application; for Relation - by the database. The latter is usually faster, when there is more records. It is also more memory efficient.

Resources