I'm trying to understand the Active Record find_each method to retrieve records in batches.
If a Post has_many comments (thousands of comments) and I want to render each comment, I can do:
@post.comments.each { |comment| ... }
But is it faster if I do:
@post.comments.find_each { |comment| ... }
Is the second query valid?
The way I see it, calling @post.comments before find_each already fetches all the comments for the post, so ending the query with find_each is probably redundant?
This is the full block of code in a rails app views folder:
json.comments @post.comments.find_each do |comment|
  json.rating comment.rating
  json.content comment.content
end
Thank you
@post.comments.find_each is correct. When you call @post.comments, the query it builds does not execute right away. That's why you can chain other query methods such as where and order onto this association. (SQL log truncated for readability.)
>> comments = Post.first.comments; nil;
Post Load (0.8ms)
=> nil
# NOTE: ^ no comments are loaded yet, until we need to output the result
>> comments
Comment Load (2.0ms)
=> [#<Comment:0x00007f2ef1733328 id: 1, post_id: 1>, ...] # <= array of 2000 comments
When each is chained, all of the comments are loaded into memory at once, which is a problem when you have thousands of records.
>> Post.first.comments.each{|c| print "." }
Post Load (0.6ms)
# NOTE: executes one query for comments
Comment Load (1.4ms)
.................................................. # <= 2000 dots
=> [#<Comment:0x00007f2ef1560be0 id: 1, post_id: 1>, ...]
When find_each is chained, comments are retrieved in batches of 1000 by default:
>> Post.first.comments.find_each{|c| print "." }
Post Load (0.6ms)
# NOTE: executes query to get 1000 comments at a time
Comment Load (0.8ms)
.................................................. # <= 1000 dots
Comment Load (0.8ms)
.................................................. # <= 1000 dots
Comment Load (0.4ms)
# no more comments
=> nil
You can also specify the :batch_size:
>> Post.first.comments.find_each(batch_size: 100){|c| print "." }
Post Load (0.7ms)
# NOTE: get comments in batches of 100
Comment Load (0.6ms)
.................................................. # <= 100 dots
Comment Load (0.5ms)
.................................................. # <= 100 dots
# 21 queries in total
=> nil
https://api.rubyonrails.org/classes/ActiveRecord/Batches.html#method-i-find_each
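To make the batching above concrete, here is a minimal plain-Ruby sketch of the idea. It is not the Active Record implementation: ROWS, fetch_batch and each_in_batches are hypothetical names standing in for the comments table, a single "id greater than the last seen id, ordered by id, limited" query, and find_each respectively.

```ruby
# Hypothetical in-memory stand-in for the comments table: ids 1..2000.
ROWS = (1..2000).map { |id| { id: id } }.freeze

# Stands in for one query: rows with id > last_id, ordered by id, limited.
def fetch_batch(rows, last_id, batch_size)
  rows.select { |row| row[:id] > last_id }.first(batch_size)
end

# Yields every row, holding at most batch_size rows at a time.
# Returns the number of simulated queries that were issued.
def each_in_batches(rows, batch_size: 1000)
  queries = 0
  last_id = 0
  loop do
    batch = fetch_batch(rows, last_id, batch_size)
    queries += 1
    batch.each { |row| yield row }
    break if batch.size < batch_size # a short (or empty) batch means we are done
    last_id = batch.last[:id]
  end
  queries
end

seen = 0
queries = each_in_batches(ROWS, batch_size: 100) { seen += 1 }
puts "#{seen} rows in #{queries} queries"
```

With 2000 rows and a batch size of 100 this sketch issues 20 full batches plus one final empty one, which lines up with the 21 queries noted in the log above.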
Related
So let's say I have the following models in Rails.
class Post < ApplicationRecord
belongs_to :user
end
class User < ApplicationRecord
has_many :posts
end
When I put a Post instance in a variable and call user on it, the following SQL query runs once; after that the result is saved/cached.
post.user
User Load (0.9ms) SELECT `users`.* FROM `users` WHERE `users`.`id` = 1 LIMIT 1
#<User id: 1 ...>
post.user
#<User id: 1 ...>
etc.
However, when I go the other way around with user.posts, it always runs the query.
user.posts
Post Load (1.0ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`user_id` = 1 LIMIT 11
user.posts
Post Load (1.0ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`user_id` = 1 LIMIT 11
etc.
Unless I convert it to an array, in which case it does get saved.
user.posts.to_a
Post Load (1.0ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`user_id` = 1 LIMIT 11
user.posts
# No query here.
But user.posts still produces an ActiveRecord::Associations::CollectionProxy object on which I can call other Active Record methods, so there is no real downside here.
Why does it work like this? It seems like an unnecessary cost in terms of SQL optimization. The obvious reason I can think of is that user.posts then updates correctly when a Post is created after user is set. But in practice that doesn't happen very often, in my opinion, since variables are mostly set and reset in consecutively run controller actions. I know there is a caching system in place that shows a CACHE Post SQL entry in the server logs, but I can't really rely on that when I'm working with Active Record objects between models or with more complex queries.
Am I missing some best practice here? Or an obvious setting that fixes exactly this?
You're examining this in irb, and that's why you see the query running every time.
In an actual block of code in a controller or other class, if you were to write
@users_posts = user.posts
the query is NOT executed until you iterate through the collection or request #count.
irb, to be helpful, prints the return value of every expression, and printing the collection is what runs the query immediately.
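The difference between building a collection, printing it, and iterating it can be modeled in plain Ruby. This is a simplified sketch under assumptions, not how Active Record is implemented: LazyCollection and query_count are hypothetical names, and the block passed to new stands in for the SQL query.

```ruby
# Simplified, hypothetical model of a lazily loaded collection.
class LazyCollection
  include Enumerable

  attr_reader :query_count

  def initialize(&loader)
    @loader = loader # stands in for the SQL query
    @records = nil   # nothing loaded yet
    @query_count = 0
  end

  # Iterating loads the records once and keeps them.
  def each(&block)
    load_records.each(&block)
  end

  # Printing (what irb does with every return value) runs a fresh, limited
  # query unless the records are already loaded.
  def inspect
    source = @records || run_query
    source.first(11).inspect
  end

  private

  def run_query
    @query_count += 1
    @loader.call
  end

  def load_records
    @records ||= run_query
  end
end

posts = LazyCollection.new { %w[post_1 post_2] }
posts.inspect # what irb does: runs a query
posts.inspect # ...and runs it again
posts.to_a    # loads and caches the records
posts.inspect # already loaded, no new query
puts posts.query_count
```

In this model, assigning the collection to a variable costs nothing; only printing or iterating it triggers the simulated query, which mirrors what the question observed in the console.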
When I make the following call in my controller's action, I see the output "posts hitting the db" in the console the first time I load the page.
cache_key = "posts"
@posts = Rails.cache.fetch(cache_key, expires_in: 5.minutes) do
  puts "posts hitting the db"
  Post.includes(:tags).where("post_status = 1").order("id desc")
end
If I reload the page, I don't see the "posts hitting the db" message but I can still see the queries like:
Processing by PostsController#index as HTML
  Rendering posts/index.html.erb within layouts/main
  Post Load (0.5ms)  SELECT "posts".* FROM "posts" WHERE (post_status = 1) ORDER BY id desc
  ↳ app/views/posts/index.html.erb:6
  Label Load (0.3ms)  SELECT "tags".* FROM "tags" WHERE "tags"."id" = $1  [["id", 7]]
  ↳ app/views/posts/index.html.erb:6
  Rendered posts/index.html.erb within layouts/posts (31.4ms)
Completed 200 OK in 87ms (Views: 62.6ms | ActiveRecord: 8.2ms)
Is this because it is caching @posts, but since the @posts object wasn't actually used, the DB call was never made? So when the view does @posts.each, it then hits the DB?
For example, if I remove all my html in my view page then it doesn't hit the db at all.
When you do
Post.includes(:tags).where("post_status = 1").order("id desc")
you aren't actually calling the database. You are creating an ActiveRecord scope, and that is what you are caching. If you want to cache the results of the database call, you need to do something like this instead:
cache_key = "posts"
@posts = Rails.cache.fetch(cache_key, expires_in: 5.minutes) do
  puts "posts hitting the db"
  Post.includes(:tags).where("post_status = 1").order("id desc").all.to_a
end
This makes ActiveRecord actually populate @posts with table data.
As @bkimble said, you need to add .all to turn your query into a database call. However, since Rails 4, .all is executed lazily. As I explained in this Memcache guide for Rails, you need to force its execution, e.g. by adding .to_a, which converts the result into an array.
Your code would then look like:
cache_key = "posts"
@posts = Rails.cache.fetch(cache_key, expires_in: 5.minutes) do
  puts "posts hitting the db"
  Post.includes(:tags).where("post_status = 1").order("id desc").all.to_a
end
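Why caching the lazy query does not save any database work can be shown with a small plain-Ruby simulation. Everything here is a hypothetical stand-in: TinyCache plays the role of Rails.cache, a lambda plays the role of the unexecuted query, and db_hits_for is a helper invented for this sketch.

```ruby
# Hypothetical, minimal stand-in for Rails.cache: stores whatever the block returns.
class TinyCache
  def initialize
    @store = {}
  end

  def fetch(key)
    @store.fetch(key) { @store[key] = yield }
  end
end

# Simulates `requests` page loads and returns how often the "database" was hit.
# A lambda stands in for the lazy, not yet executed query.
def db_hits_for(requests:, materialize:)
  hits = 0
  lazy_query = -> { hits += 1; %w[post_a post_b] }
  cache = TinyCache.new

  requests.times do
    cached = cache.fetch("posts") { materialize ? lazy_query.call : lazy_query }
    # The view iterates the result; a lazy query only runs at this point.
    rows = cached.respond_to?(:call) ? cached.call : cached
    rows.each { |_post| nil }
  end
  hits
end

puts db_hits_for(requests: 3, materialize: false) # lazy query cached: hit on every request
puts db_hits_for(requests: 3, materialize: true)  # array cached: hit only on the first request
```

The second call corresponds to adding .to_a inside the cache block: the rows themselves are stored, so later requests never reach the simulated database.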
I would like to write a class function for my model that returns one random record that meets my condition and excludes some records. The idea is that I will make a "random articles section."
I would like my function to look like this
Article.randomArticle([1, 5, 10]) # array of article ids to exclude
Some pseudo code:
ids_to_exclude = [1,2,3]
returned_article = nil
loop do
  returned_article = Article.where(published: true).sample
  break unless ids_to_exclude.include?(returned_article.id)
end
Let's look at a DB-specific option.
class Article
  # ...
  def self.random(limit: 10)
    scope = Article.where(published: true)
    # postgres, sqlite
    scope.limit(limit).order('RANDOM()')
    # mysql: use RAND() instead
    # scope.limit(limit).order('RAND()')
  end
end
Article.random asks the database to get 10 random records for us.
So let's look at how we would add an option to exclude some records:
class Article
  # ...
  def self.random(limit: 10, except: nil)
    scope = Article.where(published: true)
    if except
      scope = scope.where.not(id: except)
    end
    scope.limit(limit).order('RANDOM()')
  end
end
Now Article.random(except: [1,2,3]) would get 10 records whose id is not in [1,2,3].
This is because .where in rails returns a scope which is chain-able. For example:
> User.where(email: 'test@example.com').where.not(id: 1)
User Load (0.7ms) SELECT "users".* FROM "users" WHERE "users"."email" = $1 AND ("users"."id" != $2) [["email", "test@example.com"], ["id", 1]]
=> #<ActiveRecord::Relation []>
We could even pass a scope here:
# cause everyone hates Bob
Article.random( except: Article.where(author: 'Bob') )
See Rails Quick Tips - Random Records for why a DB specific solution is a good choice here.
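The "exclude first, then pick at random" idea from this answer can also be sketched without a database. This is only an in-memory illustration of the logic, not a replacement for the SQL version: ARTICLES and pick_random_article are hypothetical names, and the hashes stand in for database rows.

```ruby
# Hypothetical in-memory articles; in the real app these would be database rows.
ARTICLES = [
  { id: 1, published: true },
  { id: 2, published: true },
  { id: 3, published: false },
  { id: 4, published: true },
  { id: 5, published: true }
].freeze

# Filters first, then samples, so no retry loop is needed.
# Returns nil when nothing is left to pick from.
def pick_random_article(articles, except: [])
  candidates = articles.select do |article|
    article[:published] && !except.include?(article[:id])
  end
  candidates.sample
end

article = pick_random_article(ARTICLES, except: [1, 2])
puts article[:id] # 4 or 5, never an excluded or unpublished id
```

Compared with the retry loop in the question, filtering before sampling cannot spin forever when every published article is excluded; it simply returns nil.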
You can use something like this:
ids_to_exclude = [1,2,3,4]
Article.where("published = ? AND id NOT IN (?)", true , ids_to_exclude ).order( "RANDOM()" ).first
I'm running into a strange issue creating a scope and using the first finder. It seems as though using first as part of the query in a scope will make it return all results if no results are found. If any results are found, it will correctly return the first result.
I have set up a very simple test to demonstrate this:
class Activity::MediaGroup < ActiveRecord::Base
  scope :test_fail, -> { where('1 = 0').first }
  scope :test_pass, -> { where('1 = 1').first }
end
Note for this test, I have set where conditions to match records or not. In reality, I am querying based on real conditions, and getting the same strange behavior.
Here are the results from the failing scope. As you can see, it makes the correct query, which has no results, so it then queries for all matching records and returns that instead:
irb(main):001:0> Activity::MediaGroup.test_fail
Activity::MediaGroup Load (0.0ms) SELECT "activity_media_groups".* FROM "activity_media_groups" WHERE (1 = 0) ORDER BY "activity_media_groups"."id" ASC LIMIT 1
Activity::MediaGroup Load (0.0ms) SELECT "activity_media_groups".* FROM "activity_media_groups"
=> #<ActiveRecord::Relation [#<Activity::MediaGroup id: 1, created_at: "2014-01-06 01:00:06", updated_at: "2014-01-06 01:00:06", user_id: 1>, #<Activity::MediaGroup id: 2, created_at: "2014-01-06 01:11:06", updated_at: "2014-01-06 01:11:06", user_id: 1>, #<Activity::MediaGroup id: 3, created_at: "2014-01-06 01:26:41", updated_at: "2014-01-06 01:26:41", user_id: 1>, #<Activity::MediaGroup id: 4, created_at: "2014-01-06 01:28:58", updated_at: "2014-01-06 01:28:58", user_id: 1>]>
The other scope operates as expected:
irb(main):002:0> Activity::MediaGroup.test_pass
Activity::MediaGroup Load (1.0ms) SELECT "activity_media_groups".* FROM "activity_media_groups" WHERE (1 = 1) ORDER BY "activity_media_groups"."id" ASC LIMIT 1
=> #<Activity::MediaGroup id: 1, created_at: "2014-01-06 01:00:06", updated_at: "2014-01-06 01:00:06", user_id: 1>
If I perform this same logic outside of a scope, I get the expected results:
irb(main):003:0> Activity::MediaGroup.where('1=0').first
Activity::MediaGroup Load (0.0ms) SELECT "activity_media_groups".* FROM "activity_media_groups" WHERE (1=0) ORDER BY "activity_media_groups"."id" ASC LIMIT 1
=> nil
Am I missing something here? This seems like a bug in Rails/ActiveRecord/Scopes to me, unless there is some expected behavior I am unaware of.
This is not a bug or weirdness; after some research I found that it is designed this way on purpose.
First of all:
A scope returns an ActiveRecord::Relation.
If the scope body returns nil (which is what first does when there are zero matching records), the scope is programmed to return all records instead,
which is again an ActiveRecord::Relation rather than nil.
The idea behind this is to keep scopes chainable, which is one of the key differences between scopes and class methods.
Example:
Let's use the following scenario: users will be able to filter posts by status, ordered by the most recently updated ones. Simple enough, let's write scopes for that:
class Post < ActiveRecord::Base
  scope :by_status, -> status { where(status: status) }
  scope :recent, -> { order("posts.updated_at DESC") }
end
And we can call them freely like this:
Post.by_status('published').recent
# SELECT "posts".* FROM "posts" WHERE "posts"."status" = 'published'
# ORDER BY posts.updated_at DESC
Or with a user provided param:
Post.by_status(params[:status]).recent
# SELECT "posts".* FROM "posts" WHERE "posts"."status" = 'published'
# ORDER BY posts.updated_at DESC
So far, so good. Now let's move them to class methods, just for the sake of comparing:
class Post < ActiveRecord::Base
  def self.by_status(status)
    where(status: status)
  end
  def self.recent
    order("posts.updated_at DESC")
  end
end
Besides using a few extra lines, no big improvements. But now what happens if the :status parameter is nil or blank?
Post.by_status(nil).recent
# SELECT "posts".* FROM "posts" WHERE "posts"."status" IS NULL
# ORDER BY posts.updated_at DESC
Post.by_status('').recent
# SELECT "posts".* FROM "posts" WHERE "posts"."status" = ''
# ORDER BY posts.updated_at DESC
Oooops, I don’t think we wanted to allow these queries, did we? With scopes, we can easily fix that by adding a presence condition to our scope:
scope :by_status, -> status { where(status: status) if status.present? }
There we go:
Post.by_status(nil).recent
# SELECT "posts".* FROM "posts" ORDER BY posts.updated_at DESC
Post.by_status('').recent
# SELECT "posts".* FROM "posts" ORDER BY posts.updated_at DESC
Awesome. Now let's try to do the same with our beloved class method:
class Post < ActiveRecord::Base
  def self.by_status(status)
    where(status: status) if status.present?
  end
end
Running this:
Post.by_status('').recent
NoMethodError: undefined method `recent' for nil:NilClass
And :bomb:. The difference is that a scope will always return a relation, whereas our simple class method implementation will not. The class method should look like this instead:
def self.by_status(status)
  if status.present?
    where(status: status)
  else
    all
  end
end
Notice that I’m returning all for the nil/blank case, which in Rails 4 returns a relation (it previously returned the Array of items from the database). In Rails 3.2.x, you should use scoped there instead. And there we go:
Post.by_status('').recent
# SELECT "posts".* FROM "posts" ORDER BY posts.updated_at DESC
So the advice here is: never return nil from a class method that should work like a scope, otherwise you’re breaking the chainability condition implied by scopes, that always return a relation.
Long Story Short:
No matter what, scopes are intended to return an ActiveRecord::Relation so that they stay chainable. If you are expecting first, last or find results, you should use class methods.
Source: http://blog.plataformatec.com.br/2013/02/active-record-scopes-vs-class-methods/
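The fallback that surprised the asker can be modeled in a few lines of plain Ruby. This is a hypothetical, simplified model of a scope macro and not the Rails source code: MiniModel, its RECORDS and this scope method are invented for illustration, and arrays stand in for relations.

```ruby
# Hypothetical, simplified model of a scope macro; not the Rails source code.
class MiniModel
  RECORDS = [1, 2, 3, 4].freeze

  # Stands in for the relation containing every record.
  def self.all
    RECORDS.dup
  end

  # Defines a class method that runs the body and falls back to `all`
  # when the body returns nil or false.
  def self.scope(name, body)
    define_singleton_method(name) do |*args|
      body.call(*args) || all
    end
  end

  scope :test_fail, -> { all.find { |id| id > 10 } } # no match, body returns nil
  scope :test_pass, -> { all.find { |id| id >= 1 } } # match, body returns 1
end

p MiniModel.test_fail # prints [1, 2, 3, 4]
p MiniModel.test_pass # prints 1
```

Under this model, a scope body that ends in a finder returning nil is indistinguishable from a conditional scope that chose not to add a condition, so both fall back to all records.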
You can use limit instead of first, because:
When no data is found, first returns nil, and first(<number>) returns an array, which is not a chainable object.
limit, on the other hand, returns an ActiveRecord::Relation object.
More details in this post:
https://sagarjunnarkar.github.io/blogs/2019/09/15/activerecord-scope/
I always get an error message like this:
undefined method 'firm_size' for nil:NilClass
This happens when I iterate over a collection and come upon a nil case.
I usually have to go into my view and add an if statement around this particular attribute to handle nil cases.
This seems like a very un-DRY approach.
Is there a more elegant way to handle these types of cases in Rails? Not just nil objects in a collection, but objects that may have an attribute that is nil - which is actually what is happening here.
Thanks.
Edit 1
For more context on this particular instance of the error:
This is in my Scores#index view -
<% @clients.each do |client| %>
  <tr>
    <td><%= "First Client" %></td>
  </tr>
  <tr>
    <td>Firm Size</td>
    <td><%= best_in_place client.score, :firm_size, :type => :input, :nil => "Add Score for Firm Size" %></td>
These are the relevant parts of my scores_controller.rb:
class ScoresController < ApplicationController
  before_filter :load_clients
  def index
    @weight = current_user.weight
    @scores = current_user.scores
    respond_to do |format|
      format.html # index.html.erb
      format.json { render json: @scores }
    end
  end
  private
  def load_clients
    @clients = current_user.clients
  end
end
This is the server log:
Started GET "/scores" for 127.0.0.1 at 2012-10-10 18:38:05 -0500
Processing by ScoresController#index as HTML
User Load (0.3ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1
Weight Load (0.2ms) SELECT "weights".* FROM "weights" WHERE "weights"."user_id" = 1 LIMIT 1
Client Load (0.3ms) SELECT "clients".* FROM "clients" WHERE "clients"."user_id" = 1
Score Load (10.9ms) SELECT "scores".* FROM "scores" WHERE "scores"."client_id" = 1 LIMIT 1
Score Load (0.3ms) SELECT "scores".* FROM "scores" WHERE "scores"."client_id" = 2 LIMIT 1
Rendered scores/index.html.erb within layouts/application (276.6ms)
Completed 500 Internal Server Error in 283ms
This is the record in question (i.e. client.score when client.id = 2)
1.9.3p194 :089 > d = Client.find(2)
Client Load (0.2ms) SELECT "clients".* FROM "clients" WHERE "clients"."id" = ? LIMIT 1 [["id", 2]]
=> #<Client id: 2, name: "Jack Daniels", email: "jack@abc.com", phone: 1234567890, firm_id: 2, created_at: "2012-09-05 19:26:07", updated_at: "2012-10-07 02:44:51", user_id: 1, last_contact: "2012-02-10", vote: false, vote_for_user: false, next_vote: "2012-07-12", weighted_score: nil>
1.9.3p194 :090 > d.score
Score Load (0.4ms) SELECT "scores".* FROM "scores" WHERE "scores"."client_id" = 2 LIMIT 1
=> nil
As I said before, this error is raised whenever a nil record (or attribute) is encountered. In this particular case, the nil record is for the 2nd Client record that has been assigned no score.
In your particular case, assuming best_in_place is a helper, you could handle the nil in the helper. Just pass in client, rather than the client.score.
The point is that somewhere you will have an if statement. Just put it in the best spot.
If your code is consistently returning nil for something that shouldn't be nil, then it might be best to rethink how the functions are being called. It is often nice to have one or more custom nil classes, though. You could write extensions on NilClass to handle the methods you call, but this can be an issue if the same method name means different things in different situations. You could also potentially create a singleton in each of your classes that helper methods point to when particular attributes are nil.
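One way to read the "custom nil class" suggestion is the null object pattern. The following is a minimal plain-Ruby sketch under assumptions: Score and Client are Struct stand-ins for the real models, and NullScore, score_or_null and the "Jim Beam" record are hypothetical additions made up for this example.

```ruby
# Hypothetical plain-Ruby stand-ins for the Client and Score records.
Score = Struct.new(:firm_size)

# Null object: answers the same message as Score, with a "nothing there" value.
class NullScore
  def firm_size
    nil
  end
end

Client = Struct.new(:name, :score) do
  # Always returns something that responds to firm_size.
  def score_or_null
    score || NullScore.new
  end
end

clients = [Client.new("Jim Beam", Score.new(7)), Client.new("Jack Daniels", nil)]
clients.each do |client|
  size = client.score_or_null.firm_size || "Add Score for Firm Size"
  puts "#{client.name}: #{size}"
end
```

The if statement has not disappeared; it now lives in one place (score_or_null) instead of being repeated around every attribute in the view.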