eager loading the first record of an association - ruby-on-rails

In a very simple forum built as a Rails app, I fetch 30 topics from the database in the index action like this:
def index
  @topics = Topic.all.page(params[:page]).per_page(30)
end
However, when I list them in views/topics/index.html.erb, I also want access to the first post in each topic so I can display it in a tooltip; that way users can hover over a link and read the first post without having to click on it. Therefore, in the link to each topic in the index, I add the following to a data attribute:
topic.posts.first.body
Each of the links looks like this:
<%= link_to simple_format(topic.name), posts_path(:topic_id => topic), :data => { :toggle => 'tooltip', :placement => 'top', :'original-title' => "#{ topic.posts.first.body }" }, :class => 'tool' %>
While this works fine, I'm worried that it's an n+1 query, namely that with 30 topics it runs the following 30 times:
User Load (0.8ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1 ORDER BY "users"."id" ASC LIMIT 1
Post Load (0.4ms) SELECT "posts".* FROM "posts" WHERE "posts"."topic_id" = $1 ORDER BY "posts"."id" ASC LIMIT 1 [["topic_id", 7]]
I've noticed that Rails automatically caches some of these queries, but I think there might be a way to write the index action differently to avoid the n+1 problem, and I can't figure out how. I found out that I can use
includes(:posts)
to eager load the posts, like this:
@topics = Topic.all.page(params[:page]).per_page(30).includes(:posts)
However, if I know that I only want the first post for each topic, is there a way to specify that? If a topic has 30 posts, I don't want to eager load all of them.
I tried to do
.includes(:posts).first
but it broke the code

This appears to work for me, so give this a shot and see if it works for you:
Topic.includes(:posts).where("posts.id = (select id from posts where posts.topic_id = topics.id limit 1)").references(:posts)
This creates a dependent (correlated) subquery in which the posts' topic_id in the subquery is matched against the topic's id in the parent query. With the limit 1 clause in the subquery, each Topic row ends up with only 1 matching Post row, eager loaded thanks to includes(:posts). The subquery has no ORDER BY, so which post the database picks is not guaranteed; add order by id inside the subquery if you specifically need the earliest post.
Note that when you pass .where an SQL string that references an eager loaded relation, you should append the references method to tell ActiveRecord which association you're referencing, so that it knows to perform the appropriate joins in the query. Apparently it technically works without that method, but you get a deprecation warning, so you might as well throw it in lest you encounter problems in future Rails updates.

To my knowledge, you can't. A custom association is often used to put conditions on an includes, but limit is the exception.
If you eager load an association with a specified :limit option, it will be ignored, returning all the associated objects. http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html
class Picture < ActiveRecord::Base
  has_many :most_recent_comments, -> { order('id DESC').limit(10) },
                                  class_name: 'Comment'
end
Picture.includes(:most_recent_comments).first.most_recent_comments
# => returns all associated comments.

There are a few issues with trying to solve this "natively" via Rails, which are detailed in this question.
We solved it with an SQL scope, for your case something like:
class Topic < ApplicationRecord
  has_one :first_post, class_name: "Post", primary_key: :first_post_id, foreign_key: :id
  scope :with_first_post, lambda {
    select(
      "topics.*,
      (
        SELECT id as first_post_id
        FROM posts
        WHERE topic_id = topics.id
        ORDER BY id asc
        LIMIT 1
      )"
    )
  }
end
Topic.with_first_post.includes(:first_post)
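To make the two steps of that scope easier to follow, here is a rough plain-Ruby sketch of the same idea, with no database involved. The Struct definitions and sample rows are made up for illustration and are not ActiveRecord models: step 1 mirrors the scalar subquery (smallest post id per topic), and step 2 mirrors what includes(:first_post) would do with the collected ids.

```ruby
# Plain-Ruby stand-ins for the models (hypothetical, not ActiveRecord classes).
Topic = Struct.new(:id, :name, :first_post_id)
Post  = Struct.new(:id, :topic_id, :body)

# Made-up sample data: topic 3 has no posts at all.
posts = [
  Post.new(1, 1, "first in topic 1"),
  Post.new(2, 2, "first in topic 2"),
  Post.new(3, 1, "reply in topic 1")
]
topics = [Topic.new(1, "Ruby"), Topic.new(2, "Rails"), Topic.new(3, "Empty")]

# Step 1 (the scope's subquery): smallest post id per topic, nil when there are none.
topics.each do |topic|
  topic.first_post_id = posts.select { |p| p.topic_id == topic.id }.map(&:id).min
end

# Step 2 (the eager load): a single lookup for all collected ids, indexed by id.
wanted_ids  = topics.map(&:first_post_id).compact
posts_by_id = posts.select { |p| wanted_ids.include?(p.id) }.map { |p| [p.id, p] }.to_h

# Each topic can now reach its first post without a per-topic query.
tooltips = topics.map { |t| [t.name, posts_by_id[t.first_post_id]&.body] }.to_h
```

A topic without posts simply ends up with a nil tooltip, so the view would need to handle that case.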

Related

Rails: Why are has_many relations not saved in the variable/attribute like belongs_to?

So let's say I have the following models in Rails.
class Post < ApplicationRecord
  belongs_to :user
end
class User < ApplicationRecord
  has_many :posts
end
When I put a Post instance in a variable and call user on it, the following SQL query runs once; after that, the result is saved/cached.
post.user
User Load (0.9ms) SELECT `users`.* FROM `users` WHERE `users`.`id` = 1 LIMIT 1
#<User id: 1 ...>
post.user
#<User id: 1 ...>
etc.
However, when I go the other way around with user.posts, it always runs the query.
user.posts
Post Load (1.0ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`user_id` = 1 LIMIT 11
user.posts
Post Load (1.0ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`user_id` = 1 LIMIT 11
etc.
Unless I convert it to an array, in which case it does get saved.
user.posts.to_a
Post Load (1.0ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`user_id` = 1 LIMIT 11
user.posts
# No query here.
But user.posts still produces an ActiveRecord::Associations::CollectionProxy object on which I can call other ActiveRecord methods. So no real downside here.
Why does this work like this? It seems like an unnecessary impact on SQL optimization. The obvious reason I can think of is that user.posts updates correctly when a Post is created after user is set. But in reality that doesn't happen much IMO, since variables are mostly set and reset in consecutively run controller actions. Now I know there is a caching system in place that shows a CACHE Post SQL line in the server logs, but I can't really rely on that when I'm working with ActiveRecord objects between models or with more complex queries.
Am I missing some best practice here? Or an obvious setting that fixes exactly this?
You're examining this in irb, and that's why you see the query run every time.
In an actual block of code in a controller or other class, if you were to write
@users_posts = user.posts
the query is NOT executed until you iterate through the collection or call count on it.
irb prints the return value of every expression by calling inspect on it, and for a relation that means running a query immediately (which is also where the LIMIT 11 in your log comes from).
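The behaviour described above can be imitated in plain Ruby. The LazyRelation class below is a hypothetical toy, not how ActiveRecord is implemented; it only assumes three things taken from the question's logs: assigning the object runs nothing, inspect on an unloaded object runs a fresh preview query each time, and loading the records once (as to_a does) makes later calls free.

```ruby
# A toy stand-in for a lazily loaded relation (hypothetical, not ActiveRecord).
class LazyRelation
  include Enumerable

  attr_reader :query_count

  def initialize(rows)
    @rows = rows        # pretend this is the table on the database side
    @records = nil      # nothing loaded yet
    @query_count = 0
  end

  # Iterating loads the records once and reuses them afterwards.
  def each(&block)
    load_records.each(&block)
  end

  # What a console calls to print the value: reuse loaded records if present,
  # otherwise run a separate preview query capped at 11 rows and cache nothing.
  def inspect
    preview = @records || run_query.first(11)
    "#<LazyRelation #{preview.inspect}>"
  end

  private

  def load_records
    @records ||= run_query
  end

  def run_query
    @query_count += 1
    @rows.dup
  end
end

posts = LazyRelation.new([{ id: 1 }, { id: 2 }])
after_assignment = posts.query_count     # assignment alone runs no query

2.times { posts.inspect }                # what irb does after each expression
after_inspecting = posts.query_count

2.times { posts.to_a }                   # first call loads, second reuses
after_loading = posts.query_count

posts.inspect                            # loaded now, so no new query
after_final_inspect = posts.query_count
```

In this sketch the counter goes up once per inspect until the records are loaded, which matches the repeated Post Load lines in the console session and their disappearance after to_a.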

Includes still result in second database query when using relation with limited columns

I'm trying to use includes on a query to limit the number of subsequent database calls that fire when rendering but I also want the include calls to select a subset of columns from the related tables. Specifically, I want to get a set of posts, their comments, and just the name of the user who wrote each comment.
So I added
belongs_to :user
belongs_to :user_for_display, :select => "users.id, users.name", :class_name => "User", :foreign_key => "user_id"
to my comments model.
From the console, when I do
p = Post.where(:id => 1).includes(comments: [:user_for_display])
I see that the correct queries fire:
SELECT posts.* FROM posts WHERE posts.id = 1
SELECT comments.* FROM comments WHERE comments.attachable_type = "Post" AND comments.attachable_id IN (1)
SELECT users.id, users.name FROM users WHERE users.id IN (1,2,3)
but calling
p.first.comments.first.user.name
still results in a full user load database call:
User Load (0.5ms) SELECT `users`.* FROM `users` WHERE `users`.`id` = 11805 LIMIT 1
=> "John"
Referencing just p.first.comments does not fire a second comments query. And if I include the full :user relation instead of :user_for_display, the call to get the user name doesn't fire a second users query (but I'd prefer not to load the full user record).
Is there any way to use SELECT to limit fields in an includes?
You need to query with user_for_display instead of user.
p.first.comments.first.user_for_display.name

Rails 4 Eager load limit subquery

Is there a way to avoid the n+1 problem when eager loading and also applying a limit to the subquery?
I want to avoid lots of sql queries like this:
Category.all.each do |category|
category.posts.limit(10)
end
But I also want to only get 10 posts per category, so the standard eager loading, which gets all the posts, does not suffice:
Category.includes(:posts).all
What is the best way to solve this problem? Is N+1 the only way to limit the amount of posts per category?
From the Rails docs
If you eager load an association with a specified :limit option, it will be ignored, returning all the associated objects
So given the following model definition
class Category < ActiveRecord::Base
  has_many :posts
  has_many :included_posts, -> { limit 10 }, class_name: "Post"
end
Calling Category.find(1).included_posts would work as expected and apply the limit of 10 in the query. However, if you try to do Category.includes(:included_posts).all the limit option will be ignored. You can see why this is the case if you look at the SQL generated by an eager load
Category.includes(:posts).all
Category Load (0.2ms) SELECT "categories".* FROM "categories"
Post Load (0.4ms) SELECT "posts".* FROM "posts" WHERE "posts"."category_id" IN (1, 2, 3)
If you added the LIMIT clause to the posts query, it would return a total of 10 posts and not 10 posts per category as you might expect.
Getting back to your problem, I would eager load all posts and then limit the loaded collection using first(10)
categories = Category.includes(:posts).all
categories.first.posts.first(10)
Although you're loading more models into memory, this is bound to be more performant since you're only making 2 calls against the database vs. n+1. Cheers.
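The difference between a LIMIT on the single eager-load query and a per-category limit is easy to see with in-memory data. The following is a small plain-Ruby sketch with made-up rows (the Struct and the numbers are assumptions for illustration, not ActiveRecord): the first expression imitates adding LIMIT 10 to the IN (...) query, the second imitates loading everything and calling first(10) on each category's posts.

```ruby
# Plain-Ruby stand-in for a posts table (hypothetical sample data).
Post = Struct.new(:id, :category_id)

# 3 categories with 15 posts each: ids 1..15 -> category 1, 16..30 -> 2, 31..45 -> 3.
posts = (1..45).map { |id| Post.new(id, (id - 1) / 15 + 1) }
category_ids = [1, 2, 3]

# Imitates "... WHERE category_id IN (1, 2, 3) LIMIT 10":
# one limit for the whole result set, so only 10 posts in total.
limited_query = posts.select { |p| category_ids.include?(p.category_id) }.first(10)

# Imitates eager loading everything, then trimming each collection in Ruby.
first_ten_each = posts.group_by(&:category_id)
                      .transform_values { |category_posts| category_posts.first(10) }

categories_covered_by_limit = limited_query.map(&:category_id).uniq
posts_per_category = first_ten_each.transform_values(&:size)
```

With this data the limited query only ever reaches the first category, while the grouped version keeps 10 posts for each of the three.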

Rails --> an n+1 database issue that won't go away

I'm trying to optimise some N+1 queries in active record for the first time. There are 3 to kill - 2 went very easily with a .includes call, but I can't for the life of me figure out why the third is still calling a bunch of queries. Relevant code below - if anyone has any suggestions, I'd be really appreciative.
CONTROLLER:
@enquiries = Comment.includes(:children).faqs_for_project(@project)
MODEL:
def self.faqs_for_project(project)
  Comment.for_project_and_enquiries(project, project.enquiries).where(:published => true).order("created_at DESC")
end
(and the relevant scope)
scope :for_project_and_enquiries, lambda{|p, qs| where('(commentable_type = ? and commentable_id = ?) or (commentable_type = ? and commentable_id IN (?))', "Project", p.id, "Enquiry", qs.collect{|q| q.id})}
VIEW:
...
= render :partial => 'comments/comment', :collection => @enquiries
...
(and that offending line in the partial)
...
= 'Read by ' + pluralize(comment.acknowledgers.count, 'lead')
...
Two SQL queries are called for each comment. The 2 queries are:
SQL (2.8ms) SELECT COUNT(*) FROM "users" INNER JOIN "acknowledgements" ON "users".id = "acknowledgements".user_id WHERE (("acknowledgements".feedback_type = 'Comment') AND ("acknowledgements".feedback_id = 177621))
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1295 LIMIT 1
I would have thought appending (:user, :acknowledgements) into the controller's .includes would have solved the problem, but it doesn't seem to have any effect. If anyone has any suggestions on what I'm missing, I'd be really appreciative
I believe in your Comment table you want to add a :acknowledgers_count column as a counter cache
has_many :acknowledgers, ....., counter_cache: true
You will need to create a migration to add the :acknowledgers_count column to the comments table. Rails should take care of the rest.
You can learn more about the ActiveRecord::CounterCache api here.
With a counter cache column in place, comment.acknowledgers.size returns the cached value straight from the model (in this case the Comment model) without touching the database again. Note that count itself always issues a COUNT query, so the view should call size rather than count.
Finally, there was very recently a great Railscast about a gem called Bullet that can help you identify these query issues and guide you toward a solution. It covers both counter caches and N+1 queries.
As @ismaelga pointed out in a comment on this answer, it's generally better practice to call .size instead of .count on a relation. Check out the source for size:
def size
  loaded? ? @records.length : count
end
If the relation is already loaded it will just call length on it, otherwise it will call count. It's an extra check to try and prevent the database from unnecessarily being queried.
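As a rough illustration of that check, here is a self-contained toy class in plain Ruby. FakeAssociation and its query counter are invented for this sketch and are not part of Rails; only the body of size follows the snippet quoted above.

```ruby
# A toy collection that tracks how often it would hit the database
# (hypothetical class, not the Rails implementation).
class FakeAssociation
  attr_reader :count_queries

  def initialize(rows)
    @rows = rows          # pretend these live in the database
    @records = nil        # nothing loaded into memory yet
    @count_queries = 0
  end

  def loaded?
    !@records.nil?
  end

  # Simulates loading the rows into memory.
  def load
    @records ||= @rows.dup
    self
  end

  # Always goes to the "database", like a SELECT COUNT(*).
  def count
    @count_queries += 1
    @rows.length
  end

  # Same shape as the quoted source: use memory when possible, else count.
  def size
    loaded? ? @records.length : count
  end
end

acknowledgers = FakeAssociation.new(%w[ann bob cho])

size_before_load    = acknowledgers.size           # not loaded: falls back to count
queries_before_load = acknowledgers.count_queries

acknowledgers.load
size_after_load     = acknowledgers.size           # loaded: answered from memory
queries_after_load  = acknowledgers.count_queries

acknowledgers.count                                # count still queries every time
queries_after_count = acknowledgers.count_queries
```

The point of the sketch is only the asymmetry: once the records are in memory, size stops adding queries while count keeps adding one per call.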

Rails 3 Limiting Included Objects

For example, I have a blog object, and that blog has many posts. I want to eager load, say, the first blog object and include, say, its first 10 posts. Currently I would do @blogs = Blog.limit(4) and then in the view use @blogs.posts.limit(10). I am pretty sure there is a better way to do this via an include, such as Blog.include(:posts).limit(:posts=>10). Is it just not possible to limit the number of included objects, or am I missing something basic here?
Looks like you can't apply a limit to :has_many when eager loading associations for multiple records.
Example:
class Blog < ActiveRecord::Base
  has_many :posts, :limit => 5
end
class Post < ActiveRecord::Base
  belongs_to :blog
end
This works fine for limiting the number of posts for a single blog:
ruby-1.9.2-p290 :010 > Blog.first.posts
Blog Load (0.5ms) SELECT `blogs`.* FROM `blogs` LIMIT 1
Post Load (0.6ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`blog_id` = 1 LIMIT 5
However, if you try to load all blogs and eager load the posts with them:
ruby-1.9.2-p290 :011 > Blog.includes(:posts)
Blog Load (0.5ms) SELECT `blogs`.* FROM `blogs`
Post Load (1.1ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`blog_id` IN (1, 2)
Note that there's no limit on the second query, and there couldn't be - it would limit the number of posts returned to 5 across all blogs, which is not at all what you want.
EDIT:
A look at the Rails docs confirms this. You always find these things the minute you've figured them out :)
If you eager load an association with a specified :limit option, it
will be ignored, returning all the associated objects
You need to limit the number of posts in your blog model like this:
class Blog < ActiveRecord::Base
  has_many :included_posts, :class_name => 'Post', :limit => 10
  has_many :posts
end
So then you can do:
$ Blog.first.included_posts.count
=> 10
$ Blog.first.posts.count
=> 999
