Ruby on Rails #find_each and #each. is this bad practice? - ruby-on-rails

I know that find_each has been designed to consume smaller memory than each.
I found some code that other people wrote long ago. and I think that it's wrong.
Think about this codes.
users = User.where(:active => false) # What this line does actually? Nothing?
users.find_each do |user|
# update or do something..
user.update(:do_something => "yes")
end
in this case, It will store all user objects to the users variable. so we already populated the full amount of memory space. There is no point using find_each later on.
Am I correct?
so in other words, If you want to use find_each, you always need to use it with ActiveRecord::Relation object. Like this.
User.where(:active => false).find_each do |user|
# do something...
end
What do you think guys?
Update
in users = User.where(:active => false) line,
Some developer insists that rails never execute query unless we don't do anything with that variable.
What if we have a class with initialize method that has query?
class Test
def initialize
#users = User.where(:active => true)
end
def do_something
#user.find_each do |user|
# do something really..
end
end
end
If we call Test.new, what would happen? Nothing will happen?

users = User.where(:active => false) doesn't run a query against the database and it doesn't return an array with all inactive users. Instead, where returns an ActiveRecord::Relation. Such a relation basically describes a database query that hasn't run yet. The defined query is only run against the database when the actual records are needed. This happens for example when you run one of the following methods on that relation: find, to_a, count, each, and many others.
That means the change you did isn't a huge improvement, because it doesn't change went and how the database is queried.
But IMHO that your code is still slightly better because when you do not plan to reuse the relation then why assign it to a variable in the first place.

users = User.where(:active => false)
users.find_each do |user|
User.where(:active => false).find_each do |user|
Those do the same thing.
The only difference is the first one stores the ActiveRecord::Relation object in users before calling #find_each on it.
This isn't a Rails thing, it applies to all of Ruby. It's method chaining common to most object-oriented languages.
array = Call.some_method
array.each{ |item| do_something(item) }
Call.some_method.each{ |item| do_something(item) }
Again, same thing. The only difference is in the first the intermediate array will persist, whereas in the second the array will be built and then eventually deallocated.
If we call Test.new, what would happen? Nothing will happen?
Exactly. Rails will make an ActiveRecord::Relation and it will defer actually contacting the database until you actually do a query.
This lets you chain queries together.
#inactive_users = User.where(active: false).order(name: :asc)
Later you can to the query
# Inactive users whose favorite color is green ordered by name.
#inactive_users.where(favorite_color: :green).find_each do |user|
...
end
No query is made until find_each is called.
In general, pass around relations rather than arrays of records. Relations are more flexible and if it's never used there's no cost.
find_each is special in that it works in batches to avoid consuming too much memory on large tables.
A common mistake is to write this:
User.where(:active => false).each do |user|
Or worse:
User.all.each do |user|
Calling each on an ActiveRecord::Relation will pull all the results into memory before iterating. This is bad for large tables.
find_each will load the results in batches of 1000 to avoid using too much memory. It hides this batching from you.
There are other methods which work in batches, see ActiveRecord::Batches.
For more see the Rails Style Guide and use rubocop-rails to scan your code for issues and make suggestions and corrections.

Related

Ruby on Rails - ActiveRecord::Relation count method is wrong?

I'm writing an application that allows users to send one another messages about an 'offer'.
I thought I'd save myself some work and use the Mailboxer gem.
I'm following a test driven development approach with RSpec. I'm writing a test that should ensure that only one Conversation is allowed per offer. An offer belongs_to two different users (the user that made the offer, and the user that received the offer).
Here is my failing test:
describe "after a message is sent to the same user twice" do
before do
2.times { sending_user.message_user_regarding_offer! offer, receiving_user, random_string }
end
specify { sending_user.mailbox.conversations.count.should == 1 }
end
So before the test runs a user sending_user sends a message to the receiving_user twice. The message_user_regarding_offer! looks like this:
def message_user_regarding_offer! offer, receiver, body
conversation = offer.conversation
if conversation.nil?
self.send_message(receiver, body, offer.conversation_subject)
else
self.reply_to_conversation(conversation, body)
# I put a binding.pry here to examine in console
end
offer.create_activity key: PublicActivityKeys.message_received, owner: self, recipient: receiver
end
On the first iteration in the test (when the first message is sent) the conversation variable is nil therefore a message is sent and a conversation is created between the two users.
On the second iteration the conversation created in the first iteration is returned and the user replies to that conversation, but a new conversation isn't created.
This all works, but the test fails and I cannot understand why!
When I place a pry binding in the code in the location specified above I can examine what is going on... now riddle me this:
self.mailbox.conversations[0] returns a Conversation instance
self.mailbox.conversations[1] returns nil
self.mailbox.conversations clearly shows a collection containing ONE object.
self.mailbox.conversations.count returns 2?!
What is going on there? the count method is incorrect and my test is failing...
What am I missing? Or is this a bug?!
EDIT
offer.conversation looks like this:
def conversation
Conversation.where({subject: conversation_subject}).last
end
and offer.conversation_subject:
def conversation_subject
"offer-#{self.id}"
end
EDIT 2 - Showing the first and second iteration in pry
Also...
Conversation.all.count returns 1!
and:
Conversation.all == self.mailbox.conversations returns true
and
Conversation.all.count == self.mailbox.conversations.count returns false
How can that be if the arrays are equal? I don't know what's going on here, blown hours on this now. Think it's a bug?!
EDIT 3
From the source of the Mailboxer gem...
def conversations(options = {})
conv = Conversation.participant(#messageable)
if options[:mailbox_type].present?
case options[:mailbox_type]
when 'inbox'
conv = Conversation.inbox(#messageable)
when 'sentbox'
conv = Conversation.sentbox(#messageable)
when 'trash'
conv = Conversation.trash(#messageable)
when 'not_trash'
conv = Conversation.not_trash(#messageable)
end
end
if (options.has_key?(:read) && options[:read]==false) || (options.has_key?(:unread) && options[:unread]==true)
conv = conv.unread(#messageable)
end
conv
end
The reply_to_convesation code is available here -> http://rubydoc.info/gems/mailboxer/frames.
Just can't see what I'm doing wrong! Might rework my tests to get around this. Or ditch the gem and write my own.
see this Rails 3: Difference between Relation.count and Relation.all.count
In short Rails ignores the select columns (if more than one) when you apply count to the query. This is because
SQL's COUNT allows only one or less columns as parameters.
From Mailbox code
scope :participant, lambda {|participant|
select('DISTINCT conversations.*').
where('notifications.type'=> Message.name).
order("conversations.updated_at DESC").
joins(:receipts).merge(Receipt.recipient(participant))
}
self.mailbox.conversations.count ignores the select('DISTINCT conversations.*') and counts the join table with receipts, essentially counting number of receipts with duplicate conversations in it.
On the other hand, self.mailbox.conversations.all.count first gets the records applying the select, which gets unique conversations and then counts it.
self.mailbox.conversations.all == self.mailbox.conversations since both of them query the db with the select.
To solve your problem you can use sending_user.mailbox.conversations.all.count or sending_user.mailbox.conversations.group('conversations.id').length
I have tended to use the size method in my code. As per the ActiveRecord code, size will use a cached count if available and also returns the correct number when models have been created through relations and have not yet been saved.
# File activerecord/lib/active_record/relation.rb, line 228
def size
loaded? ? #records.length : count
end
There is a blog on this here.
In Ruby, #length and #size are synonyms and both do the same thing: they tell you how many elements are in an array or hash. Technically #length is the method and #size is an alias to it.
In ActiveRecord, there are several ways to find out how many records are in an association, and there are some subtle differences in how they work.
post.comments.count - Determine the number of elements with an SQL COUNT query. You can also specify conditions to count only a subset of the associated elements (e.g. :conditions => {:author_name => "josh"}). If you set up a counter cache on the association, #count will return that cached value instead of executing a new query.
post.comments.length - This always loads the contents of the association into memory, then returns the number of elements loaded. Note that this won't force an update if the association had been previously loaded and then new comments were created through another way (e.g. Comment.create(...) instead of post.comments.create(...)).
post.comments.size - This works as a combination of the two previous options. If the collection has already been loaded, it will return its length just like calling #length. If it hasn't been loaded yet, it's like calling #count.
It is also worth mentioning to be careful if you are not creating models through associations, as the related model will not necessarily have those instances in its association proxy/collection.
# do this
mailbox.conversations.build(attrs)
# or this
mailbox.conversations << Conversation.new(attrs)
# or this
mailbox.conversations.create(attrs)
# or this
mailbox.conversations.create!(attrs)
# NOT this
Conversation.new(mailbox_id: some_id, ....)
I don't know if this explains what's going on, but the ActiveRecord count method queries the database for the number of records stored. The length of the Relation could be different, as discussed in http://archive.railsforum.com/viewtopic.php?id=6255, although in that example, the number of records in the database was less than the number of items in the Rails data structure.
Try
self.mailbox.conversations.reload; self.mailbox.conversations.count
or perhaps
self.mailbox.reload; self.mailbox.conversations.count
or, if neither of those work, just try reloading as many of the objects as possible to see if you can get it to work (self, mailbox, conversations, etc.).
My guess is that something is messed up between memory and the DB. This is definitely a really weird error though, might wanna put in an issue on Rails to see why this would be the case.
The result of mailbox.conversations is cached after the first call. To reload it write mailbox.conversations(true)

Rails - given an array of Users - how to get a output of just emails?

I have the following:
#users = User.all
User has several fields including email.
What I would like to be able to do is get a list of all the #users emails.
I tried:
#users.email.all but that errors w undefined
Ideas? Thanks
(by popular demand, posting as a real answer)
What I don't like about fl00r's solution is that it instantiates a new User object per record in the DB; which just doesn't scale. It's great for a table with just 10 emails in it, but once you start getting into the thousands you're going to run into problems, mostly with the memory consumption of Ruby.
One can get around this little problem by using connection.select_values on a model, and a little bit of ARel goodness:
User.connection.select_values(User.select("email").to_sql)
This will give you the straight strings of the email addresses from the database. No faffing about with user objects and will scale better than a straight User.select("email") query, but I wouldn't say it's the "best scale". There's probably better ways to do this that I am not aware of yet.
The point is: a String object will use way less memory than a User object and so you can have more of them. It's also a quicker query and doesn't go the long way about it (running the query, then mapping the values). Oh, and map would also take longer too.
If you're using Rails 2.3...
Then you'll have to construct the SQL manually, I'm sorry to say.
User.connection.select_values("SELECT email FROM users")
Just provides another example of the helpers that Rails 3 provides.
I still find the connection.select_values to be a valid way to go about this, but I recently found a default AR method that's built into Rails that will do this for you: pluck.
In your example, all that you would need to do is run:
User.pluck(:email)
The select_values approach can be faster on extremely large datasets, but that's because it doesn't typecast the returned values. E.g., boolean values will be returned how they are stored in the database (as 1's and 0's) and not as true | false.
The pluck method works with ARel, so you can daisy chain things:
User.order('created_at desc').limit(5).pluck(:email)
User.select(:email).map(&:email)
Just use:
User.select("email")
While I visit SO frequently, I only registered today. Unfortunately that means that I don't have enough of a reputation to leave comments on other people's answers.
Piggybacking on Ryan's answer above, you can extend ActiveRecord::Base to create a method that will allow you to use this throughout your code in a cleaner way.
Create a file in config/initializers (e.g., config/initializers/active_record.rb):
class ActiveRecord::Base
def self.selected_to_array
connection.select_values(self.scoped)
end
end
You can then chain this method at the end of your ARel declarations:
User.select('email').selected_to_array
User.select('email').where('id > ?', 5).limit(4).selected_to_array
Use this to get an array of all the e-mails:
#users.collect { |user| user.email }
# => ["test#example.com", "test2#example.com", ...]
Or a shorthand version:
#users.collect(&:email)
You should avoid using User.all.map(&:email) as it will create a lot of ActiveRecord objects which consume large amounts of memory, a good chunk of which will not be collected by Ruby's garbage collector. It's also CPU intensive.
If you simply want to collect only a few attributes from your database without sacrificing performance, high memory usage and cpu cycles, consider using Valium.
https://github.com/ernie/valium
Here's an example for getting all the emails from all the users in your database.
User.all[:email]
Or only for users that subscribed or whatever.
User.where(:subscribed => true)[:email].each do |email|
puts "Do something with #{email}"
end
Using User.all.map(&:email) is considered bad practice for the reasons mentioned above.

Empty Scope with Ruby on Rails

Following Problem:
I need something like an empty scope. Which means that this scope is emtpy, but responds to all methods a scope usually responds to.
I'm currently using a little dirty hack. I simply supply "1=0" as conditions. I find this realy ugly, since it hits the database. Simply returning an empty array won't work, since the result must respond to the scoped methods.
Is there a better existing solution for this or will I need to code this myself?
Maybe some example code could help explain what i need:
class User < ActiveRecord::Base
named_scope :admins, :conditions => {:admin => true }
named_scope :none_dirty, :conditions => "1=0" # this scope is always empty
def none_broken
[]
end
def self.sum_score # okay, a bit simple, but a method like this should work!
total = 0
self.all.each do |user|
total += user.score
end
return total
end
end
User.admin.sum_score # the score i want to know
User.none_drity.sum_score # works, but hits the db
User.none_broken.sum_score # ...error, since it doesn't respond to sum_score
Rails 4 introduces the none scope.
It is to be used in instances where you have a method which returns a relation, but there is a condition in which you do not want the database to be queried.
If you want a scope to return an unaltered scope use all:
No longer will a call to Model.all execute a query immediately and return an array of records. In Rails 4, calls to Model.all is equivalent to now deprecated Model.scoped. This means that more relations can be chained to Model.all and the result will be lazily evaluated.
User.where('false')
returns an ActiveRecord::Relation with zero elements, that is a chain-able scope that won't hit the database until you actually try to access one of its elements. This is similar to PhilT's solution with ('1=0') but a little more elegant.
Sorry User.scoped is not what you want. As commented this returns everything. Should have paid more attention to the question.
I've seen where('1 = 0') suggested before and Rails should probably cache it as well.
Also, where('1 = 0') won't hit the database until you do .all, .each, or one of the calculations methods.
I thing you need User.scoped({})
How about User.where(id: nil) ?
Or User.where(_id: nil) for mongoid.
The thing you are looking for does not exist. You could implement something like this by monky patching the find method. Yet, this would be an overkill, so I recomend keeping this unless it's performance critical.
Looking at your example code indicates you may not know about aggregated queries in SQL which are exposed as calculations methods in Rails:
User.sum(:score) will give you the sum of all users' scores
Take a look at Rails Guides for more info:
http://guides.rubyonrails.org/active_record_querying.html#sum

Retrieving all objects in code upfront for performance reasons

How do you folks retrieve all objects in code upfront?
I figure you can increase performance if you bundle all the model calls together?
This makes for a bigger deal, especially if your DB cannot keep everything in memory
def hitDBSeperately {
get X users
...code
get Y users... code
get Z users... code
}
Versus:
def hitDBInSingleCall {
get X+Y+Z users
code for X
code for Y...
}
Are you looking for an explanation between the approach where you load in all users at once:
# Loads all users into memory simultaneously
#users = User.all
#users.each do |user|
# ...
end
Where you could load them individually for a smaller memory footprint?
#user_ids = User.connection.select_values("SELECT id FROM users")
#user_ids.each do |user_id|
user = User.find(user_id)
# ...
end
The second approach would be slower since it requires N+1 queries for N users, where the first loads them all with 1 query. However, you need to have sufficient memory for creating model instances for each and every User record at the same time. This is not a function of "DB memory", but of application memory.
For any application with a non-trivial number of users, you should use an approach where you load users either individually or in groups. You can do this using:
#user_ids.in_groups_of(10) do |user_ids|
User.find_all_by_id(user_ids).each do |user|
# ...
end
end
By tuning to use an appropriate grouping factor, you can balance between memory usage and performance.
Can you give a snipet of actual ruby on rails code, our pseudo code is a little confusing?
You can avoid the n+1 problem by using eager loading. You can accomplish this by using :includes => tag in your model.find method.
http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html

rails/activerecord search eager loaded associations

I have a simple find statement as such:
m = MyModel.find(1, :include => :my_children)
With m.mychildren being an Array; is there anyway to find a particular record from within the array without having to iterate over the entire thing. If I do mychildren.find(1), a new DB query is issues, which doesn't make sense, since they are all loaded already
It looks like there's a little Rails magic going on here. Where Enumerable#find is being overridden by ActiveRecord::Base#find on methods created for associations.
On the upside Enumerable#find is aliased to Enumerable#detect.
Unfortunately Enumerable#find/Enumerable#detect have significantly different syntax from ActiveRecord::Base#find.
So you can't just do mychildren.find(1), instead you've got to do mychildren.detect{|c| c.id == 1} if you want to avoid hitting the database again. You may also want to consider extending Array for a more DRY way of doing this.
class Array
def id_find id
self.detect{|element| element.id == id}
end
end
I'm not quite sure what your asking, but have you tried select:
m.mychildren.select{ |child| child == <<some_statement>> }
This won't hit the database assuming you've used the :include option as you stated in your question.
Alternatively, if you know the number of the child you want, you should be able to just use
m.mychildren[1]

Resources