Avoid heroku R14 error with large create loop - ruby-on-rails

I have an async Resque job that creates many associated objects inside a loop that I can't seem to avoid heroku's ever-popular R14 error with.
has_many :associated_things
...
def populate_things
reference_things = ReferenceThings.where(some_criteria).map(&:name) # usually between 10 k and 20k strings
reference_things.each do |rt|
self.associated_things << AssociatedThing.create name: rt
end
end
Some things I've tried:
wrapping the create loop in an ActiveRecord::Base.uncached block
manually running GC.start at the end of the loop
adding an each_slice before .each
Is there a way to rewrite this loop to minimize memory usage?

#Alex Peachey had some good suggestions, but ultimately, #mu had the right idea in the first comment.
Transitioning to raw SQL is the only way I could find to make this work. Some suggested methods are here:
http://coffeepowered.net/2009/01/23/mass-inserting-data-in-rails-without-killing-your-performance/
I used the mass insert method and it works fine.
It should be said that it's far from clear to me why this is necessary. Apparently instantiating hundreds of thousands of AR objects -- even outside of a web request, asynchronously -- causes a memory leak. Maybe this just simply isn't the sort of thing Rails/AR was designed to do.
Related question, perhaps the same issue: ActiveRecord bulk data, memory grows forever

Some ideas that may help:
Since you are just pulling name from ReferenceThings, don't grab the full object and then just grab the name. Instead do something like this:
reference_things = ReferenceThings.where(some_criteria).pluck(:name)
That will do a better query grabbing just the names and give you an array. Much cheaper memory wise.
I noticed you are putting all the AssociatedThings you are creating into an array as you go. If you don't actually need an array of them then just creating them will be better. If you do need them, depending on what you need them for you could create them all and then query the database to grab them again and loop over them with find_each which will grab them in batches.

Related

Hash/Array to Active Record

I have been searching everywhere but I can't seem to find this problem anywhere. In Rails 5.0.0.beta3 I need to sort a my #record = user.records with an association and it's record.
The sort goes something like this.
#record = #record.sort_by { |rec|
If user.fav_record.find(rec.id)
User.fav_record(rec.id).created_at
Else
rec.created_at
End
This is just an example of what I do. But everything sorts fine.
The problem:
This returns an array and not an Active Record Class.
I've tried everything to get this to return an Active Record Class. I've pushed the sorted elements into an ID array and tried to extract it them in that order, I've tried mapping. Every result that I get turns my previous active record into an array or hash. Now I need it to go back into an active record. Does anyone know how to convert an array or hash of that active record back into an Active Record class?
There isn't a similarly easy way to convert ActiveRecord to array.
If you want to optimize the performance of your app, you should try to avoid converting arrays to ActiveRecord queries. Try and keep the object as a query as long as possible.
That being said, working with arrays is generally easier than queries, and it can feel like a hassle to convert a lot of array operations to ActiveRecord query (SQL) code.
It'd be better to write the sort code using ActiveRecord::Query methods or even writing it in plain SQL using find_by_sql.
I don't know what code you should specifically use here, but I do see that your code could be refactored to be clearer. First of all, If and Else should not be capitalized, but I'm assuming that this is just pseudocode and you already realize this. Second, your variable names should be pluralized if they are queries or arrays (i.e. #record.sort_by should be #records.sort_by instead).
It's worth mentioning that ActiveRecord queries are difficult to master and a lot of people just use array operations instead since they're easier to write. If "premature optimization is the root of all evil", it's really not the end of the world if you sacrifice a bit of performance and just keep your array implementation if you're just trying to make an initial prototype. Just make sure that you're not making "n+1" SQL calls, i.e. do not make a database call every iteration of your loop.
Here's an example of an array implementation which avoids the N+1 SQL issue:
# first load all the user's favorites into memory
user_fav_records = user.fav_records.select(:id, :created_at)
#records = #records.sort_by do |record|
matching_rec = user.fav_records.find { |x| x.id.eql?(rec.id) }
# This is using Array#find, not the ActiveRecord method
if matching_rec
matching_rec.created_at
else
rec.created_at
end
end
The main difference between this implementation and the code in your question is that I'm avoiding calling ActiveRecord's find each iteration of the loop. SQL read/writes are computationally expensive, and you want your code to make as little of them as possible.

Query array vs. Active Record relation

How can I call something like all on this? I would like to call the check_other_notification method on all the notifications that query return.
Could someone suggest a good description on this Active Relation vs array topic? I read about it in many different places but I'm still a bit confused.
Notification
.between_other_recipient(current_user, #user)
.last
.check_other_notification
As I understand, you want to call a check_other_notification method on each object, returned by the query.
If so, use find_each for this:
Notification
.between_other_recipient(current_user, #user)
.find_each do |notification|
notification.check_other_notification
end
find_each if very efficient method, since it process objects in batches (by default the batch size is 1000 records, but you can specify any other amount).
In your case each would work, since I do not think there are hundreds of thousands of notifications, but if so - find_each is a perfect match.
Edit
Difference between collect and find_each.
Quoting docs on find_each:
find_each is only intended to use for batch processing of large
amounts of records that wouldn’t fit in memory all at once. If you
just need to loop over less than 1000 records, it’s probably better
just to use the regular find methods.
If you use collect (map), which is a method from Array class - it would first load the whole collection of records into the memory before processing. This can eat to much memory and lead to problems, when the collection is big.
Important point is: do not use Ruby to process database stuff, when it is possible to use ORM (when not, SQL will do).
Here is a short article showing few examples of using Array's vs AR's methods, and also describing few other things to be aware of when querying AR collection.

.map(&:dup) Calculations Slow

I have an ActiveRecord query user.loans, and am using user.loans.map(&:dup) to duplicate the result. This is so that I can loop through each Loan (100+ times) and run several calculations.
These calculations take several seconds longer compared to when I run them directly on user.loans or user.loans.dup. If I do this however, all queries user.loans are affected, even when querying with different methods.
Is there an alternative to .map(&:dup) that can achieve the same result with faster calculations? I'd like to preserve the relations so that I can retrieve associated records to each Loan.
The fastest way you can achieve what you want is making calculations directly on ActiveRecord, this way you would not have to loop through resulting Array.
If you still want to loop through Array elements, maybe you should not use map to duplicate each Array element. You could use each instead, which does not affect original Array element. Here is what I think you should do:
def calculate_loans
calculated_loans = Array.new
user.loans.each do |loan|
# Here you make your calculations. For example:
calculated_loans.push(loan.value += 10)
end
calculated_loans
end
This way, you will have original user.loans elements, and a duplicated Array with calculated_loans.
Please, let me know if this improve your performance :)
To resolve conflicts with other calls to user.loans, I wound up using user.loans.reload in the Presenter I have for this particular view. This way I was able to continue making calculations directly on Active Record elsewhere(per Daniel Batalla's suggestion), but without the conflicts I mentioned in my original question.

Display a record sequentially with every refresh

I have a Rails 3 application that currently shows a single "random" record with every refresh, however, it repeats records too often, or will never show a particular record. I was wondering what a good way would be to loop through each record and display them such that all get shown before any are repeated. I was thinking somehow using cookies or session_ids to sequentially loop through the record id's, but I'm not sure if that would work right, or exactly how to go about that.
The database consists of a single table with a single column, and currently only about 25 entries, but more will be added. ID's are generated automatically and are sequential.
Some suggestions would be appreciated.
Thanks.
The funny thing about 'random' is that it doesn't usually feel random when you get the same answer twice in short succession.
The usual answer to this problem is to generate a queue of responses, and make sure when you add entries to the queue that they aren't already on the queue. This can either be a queue of entries that you will return to the user, or a queue of entries that you have already returned to the user. I like your idea of using the record ids, but with only 25 entries, that repeating loop will also be annoying. :)
You could keep track of the queue of previous entries in memcached if you've already got one deployed or you could stuff the queue into the session (it'll probably just be five or six integers, not too excessive data transfer) or the database.
I think I'd avoid the database, because it sure doesn't need to be persistent, it doesn't need to take database bandwidth or compute time, and using the database just to keep track of five or six integers seems silly. :)
UPDATE:
In one of your controllers (maybe ApplicationController), add something like this to a method that you run in a before_filter:
class ApplicationController < ActionController::Base
before_filter :find_quip
def find_quip:
last_quip_id = session[:quip_id] || Quips.find(:first).id
new_quip_id = Quips.find(last_quip.id + 1).id || Quips.find(:first)
session[:quip_id] = new_quip
end
end
I'm not so happy with the code to wrap around when you run out of quips; it'll completely screw up if there is ever a hole in the sequence. Which is probably going to happen someday. And it will start on number 2. But I'm getting too tired to sort it out. :)
If there are only going to be not too many like you say, you could store the entire array of IDs as a session variable, with another variable for the current index, and loop through them sequentially, incrementing the index.

Ruby on Rails: Model.all.each vs find_by_sql("SELECT * FROM model").each?

I'm fairly new to RoR. In my controller, I'm iterating over every tuple in the database. For every table, for every column I used to call
SomeOtherModel.find_by_sql("SELECT column FROM model").each {|x| #etc }
which worked fine enough. When I later changed this to
Model.all(:select => "column").each {|x| #etc }
the loop starts out at roughly the same speed but quickly slows down to something like 100 times slower than the the find_by_sql command. These calls should be identical so I really don't know what's happening.
I know these calls are not the most efficient but this is just an intermediate step and I will optimize it more once this works correctly.
So to clarify: Why in the world does calling Model.all.each run so much slower than using find_by_sql.each?
Thanks!
Both calls do make the same SQL call, so they both should be roughly the same speed. Model.all does go through an extra level of indirection. I think that Rails in all collects all the rows from the DB connection, then calls that collection as a param to another method that loops over that collection and creates the model objects. In find_by_sql it does it all in 1 method. Maybe that is a little faster. How large is your data set? If the set is over 100 or so records you shouldn't use all, but use find_in_batches. It uses limits and offsets to run your code in batches and not load the entire table at once into memory. You do need to include the primary key in the select since it uses that to do the order and limit. Example:
Model.find_in_batches(:batch_size => 100) do |group|
group.each {|item| item.do_something_interesting }
end
ScottD is correct. The find_by_sql call does not create the slowdown you are seeing B_. The speed issues are due to the fact that Model.all loads a model instance into memory for every single record fetched. This can result in complete failure with a large enough result set. FYI, this is covered very well in the Rails Guides here: http://guides.rubyonrails.org/active_record_querying.html#retrieving-multiple-objects

Resources