Rails 4 Eager load limit subquery - ruby-on-rails

Is there a way to avoid the N+1 problem when eager loading while also applying a limit to the subquery?
I want to avoid lots of SQL queries like this:
Category.all.each do |category|
  category.posts.limit(10)
end
But I also want to only get 10 posts per category, so the standard eager loading, which gets all the posts, does not suffice:
Category.includes(:posts).all
What is the best way to solve this problem? Is N+1 the only way to limit the amount of posts per category?

From the Rails docs
If you eager load an association with a specified :limit option, it will be ignored, returning all the associated objects
So, given the following model definition:
class Category < ActiveRecord::Base
  has_many :posts
  has_many :included_posts, -> { limit 10 }, class_name: "Post"
end
Calling Category.find(1).included_posts works as expected and applies the limit of 10 in the query. However, if you try Category.includes(:included_posts).all, the limit option is ignored. You can see why if you look at the SQL generated by an eager load:
Category.includes(:posts).all
Category Load (0.2ms) SELECT "categories".* FROM "categories"
Post Load (0.4ms) SELECT "posts".* FROM "posts" WHERE "posts"."category_id" IN (1, 2, 3)
If you added the LIMIT clause to the posts query, it would return a total of 10 posts and not 10 posts per category as you might expect.
Getting back to your problem, I would eager load all posts and then limit each loaded collection using first(10):
categories = Category.includes(:posts).all
categories.first.posts.first(10)
Although you're loading more models into memory, this is usually faster, since you're only making 2 calls to the database instead of N+1. Cheers.
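A minimal plain-Ruby sketch of the difference between putting LIMIT 10 on the single eager-load query and calling first(10) on each preloaded collection. `Post` here is a hypothetical Struct standing in for the model and the 60 rows are made up; no database or Rails is involved, so this only models what happens to the rows.

```ruby
# Hypothetical stand-in for the Post model; no database or Rails involved.
Post = Struct.new(:id, :category_id)

# Pretend these 60 rows came back from the single eager-load query
#   SELECT "posts".* FROM "posts" WHERE "posts"."category_id" IN (1, 2, 3)
# (20 posts in each of the three categories).
posts = (1..60).map { |i| Post.new(i, (i % 3) + 1) }

# Adding LIMIT 10 to that one query would cap the TOTAL at 10 rows,
# spread across the categories, so no category would get its 10 posts.
global_limit = posts.first(10)

# first(10) on each preloaded collection slices every category's group
# separately, after the rows have been grouped by foreign key.
per_category = posts.group_by(&:category_id).transform_values { |group| group.first(10) }

puts "LIMIT on the IN query: #{global_limit.size} rows in total"
per_category.each do |category_id, group|
  puts "category #{category_id}: #{group.size} posts kept"
end
```

The trade-off mentioned in the answer is visible here: all 60 rows are held in memory even though only 30 are used.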

Related

Rails: Why are has_many relations not saved in the variable/attribute like belongs_to?

So let's say I have the following models in Rails.
class Post < ApplicationRecord
  belongs_to :user
end
class User < ApplicationRecord
  has_many :posts
end
When I put the Post instance in a variable and call user on it, the following SQL query runs once; after that, the result is saved/cached.
post.user
User Load (0.9ms) SELECT `users`.* FROM `users` WHERE `users`.`id` = 1 LIMIT 1
#<User id: 1 ...>
post.user
#<User id: 1 ...>
etc.
However, when I go the other way around with user.posts, it always runs the query.
user.posts
Post Load (1.0ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`user_id` = 1 LIMIT 11
user.posts
Post Load (1.0ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`user_id` = 1 LIMIT 11
etc.
Unless I convert it to an array, in which case it does get saved.
user.posts.to_a
Post Load (1.0ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`user_id` = 1 LIMIT 11
user.posts
# No query here.
But user.posts still produces an ActiveRecord::Associations::CollectionProxy object on which I can call other ActiveRecord methods, so there's no real downside here.
Why does it work like this? It seems like an unnecessary cost in terms of SQL. The obvious reason I can think of is that user.posts updates correctly when a Post is created after user is set. But in reality that doesn't happen much, imo, since variables are mostly set and reset in consecutively run controller actions. I know there is a caching system in place that shows a CACHE Post line in the server logs, but I can't really rely on that when I'm working with ActiveRecord objects between models or with more complex queries.
Am I missing some best practice here? Or an obvious setting that fixes exactly this?
You're examining this in irb and that's why you're seeing the query always running.
In an actual block of code in a controller or other class, if you were to write
@users_posts = user.posts
the query is NOT executed until you iterate through the collection or request #count.
irb, to be helpful, prints the result of every line with inspect, and inspecting the association runs the query immediately.
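To make the lazy-loading point concrete, here is a toy model. The `LazyPosts` class is hypothetical and far simpler than Rails' real CollectionProxy; it only mimics the logs in the question: assigning it runs nothing, each inspect of an unloaded collection runs a fresh LIMIT 11 query, and to_a loads the rows once and caches them.

```ruby
# Hypothetical, heavily simplified stand-in for an association proxy.
# It is NOT Rails' CollectionProxy; it only mimics the question's logs.
class LazyPosts
  attr_reader :queries

  def initialize(rows)
    @rows = rows    # stands in for the posts table
    @records = nil  # stays nil until the collection is loaded
    @queries = 0    # number of simulated SQL queries
  end

  def loaded?
    !@records.nil?
  end

  # irb prints each result with inspect; unloaded, this re-queries every time.
  def inspect
    shown = loaded? ? @records.first(10) : run_query(limit: 11).first(10)
    "#<LazyPosts #{shown.inspect}>"
  end

  # Loading runs the full query once and memoizes the result.
  def to_a
    @records ||= run_query
  end

  def each(&block)
    to_a.each(&block)
  end

  private

  def run_query(limit: nil)
    @queries += 1
    limit ? @rows.first(limit) : @rows.dup
  end
end

in_code = LazyPosts.new([1, 2, 3])    # like `@users_posts = user.posts` in a controller
in_console = LazyPosts.new([1, 2, 3])
in_console.inspect                    # what irb does after `user.posts`
in_console.inspect                    # typing it again: another query
in_console.to_a                       # loads and caches the rows
in_console.inspect                    # served from the cache, no new query

puts in_code.queries    # => 0
puts in_console.queries # => 3
```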

Count records after where clause

I have three models: Catalog, Upload and Product. A product belongs to a catalog, and an upload belongs to a product.
I need to count the number of uploads for all the products of a given catalog.
This is the way I've been doing it so far, which is incredibly slow for a large amount of uploads or products:
@products = Product.where(catalog_id: 123)
@uploads_count = Upload.where(product_id: @products.pluck(:id)).count
I'd like to avoid loading all the products just for a count.
Should I use raw SQL, or is there a better way to do this with ActiveRecord?
This should do it for you:
Upload.joins(:product).where(products: { catalog_id: 123 }).count
Using joins creates an INNER JOIN between the two tables, allowing you to query the products table as above.
Note the singular and plural uses of product - the joins should reflect the association (the upload belongs to one product), while the where clause always uses the table name, typically pluralised.
The SQL will look similar to:
SELECT COUNT(*) FROM "uploads"
INNER JOIN "products"
ON "products"."id" = "uploads"."product_id"
WHERE "products"."catalog_id" = 123
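As a plain-Ruby illustration (hypothetical Structs and made-up rows, no database), both formulations count the same uploads; the join version simply lets the database do the whole thing in one statement instead of shipping every product id to Ruby and back.

```ruby
# Hypothetical stand-ins for the products and uploads tables (made-up rows).
Product = Struct.new(:id, :catalog_id)
Upload  = Struct.new(:id, :product_id)

products = [Product.new(1, 123), Product.new(2, 123), Product.new(3, 456)]
uploads  = [Upload.new(1, 1), Upload.new(2, 1), Upload.new(3, 2), Upload.new(4, 3)]

# Original approach: pluck the product ids, then count uploads WHERE product_id IN (...).
ids = products.select { |product| product.catalog_id == 123 }.map(&:id)
count_via_ids = uploads.count { |upload| ids.include?(upload.product_id) }

# Join approach: INNER JOIN products ON products.id = uploads.product_id,
# WHERE products.catalog_id = 123, then COUNT(*).
products_by_id = products.to_h { |product| [product.id, product] }
joined = uploads.filter_map do |upload|
  product = products_by_id[upload.product_id]
  [upload, product] if product # INNER JOIN drops uploads with no matching product
end
count_via_join = joined.count { |_upload, product| product.catalog_id == 123 }

puts count_via_ids  # => 3
puts count_via_join # => 3
```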
If you need to have more information on the catalog you can also include this, something like the following:
Upload.joins(product: :catalog).where(products: { catalogs: { whatever: 'you want to query' } }).count
Bear in mind, using joins is just for a query such as this. If you need to access attributes of the product or catalog, you should use another approach, such as includes, to preload the data and avoid N + 1 queries. There's a good read here if you're interested.
Another way to avoid selecting records is to use a subquery. This can be done the following way:
query = User.where(id: 1..100)
User.where(id: query.select(:id)).count
# [DEBUG] (10.5ms) SELECT COUNT(*) FROM "users" WHERE "users"."id" IN (SELECT "users"."id" FROM "users" WHERE ("users"."id" BETWEEN $1 AND $2)) [["id", 1], ["id", 100]]
# => 33
So, User.where(id: 1..100) builds a relation that can be used as a sub-select, and .select(:field) tells it which field the sub-select returns.
Though for a basic count, SRack provides a good answer.

eager loading the first record of an association

In a very simple forum built with Rails, I get 30 topics from the database in the index action like this:
def index
  @topics = Topic.all.page(params[:page]).per_page(30)
end
However, when I list them in the views/topics/index.html.erb, I also want to have access to the first post in each topic to display in a tooltip, so that when users scroll over, they can read the first post without having to click on the link. Therefore, in the link to each post in the index, I add the following to a data attribute
topic.posts.first.body
each of the links looks like this
<%= link_to simple_format(topic.name), posts_path(
:topic_id => topic), :data => { :toggle => 'tooltip', :placement => 'top', :'original-title' => "#{ topic.posts.first.body }"}, :class => 'tool' %>
While this works fine, I'm worried that it's an N+1 query, namely that if there are 30 topics, it runs these queries 30 times:
User Load (0.8ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1 ORDER BY "users"."id" ASC LIMIT 1
Post Load (0.4ms) SELECT "posts".* FROM "posts" WHERE "posts"."topic_id" = $1 ORDER BY "posts"."id" ASC LIMIT 1 [["topic_id", 7]]
I've noticed that Rails does automatic caching on some of these, but I think there might be a way to write the index action differently to avoid some of this N+1 problem, and I can't figure out how. I found out that I can use
includes(:posts)
to eager load the posts, like this:
@topics = Topic.all.page(params[:page]).per_page(30).includes(:posts)
However, if I know that I only want the first post for each topic, is there a way to specify that? if a topic had 30 posts, I don't want to eager load all of them.
I tried to do
.includes(:posts).first
but it broke the code
This appears to work for me, so give this a shot and see if it works for you:
Topic.includes(:posts).where("posts.id = (select id from posts where posts.topic_id = topics.id order by posts.id limit 1)").references(:posts)
This creates a dependent (correlated) subquery in which the posts' topic_id in the subquery is matched up with the topics' id in the parent query. With the limit 1 clause in the subquery, each Topic row contains only 1 matching Post row, eager loaded thanks to includes(:posts). Ordering by posts.id makes "first" well defined; without an ORDER BY, LIMIT 1 may return any matching post.
Note that when passing an SQL string that references an eager-loaded association to .where, you should append the references method to tell ActiveRecord that you're referencing an association, so that it knows to perform the appropriate joins. Apparently it technically works without that method, but you get a deprecation warning, so you might as well add it lest you run into problems in future Rails updates.
To my knowledge you can't. A custom scoped association is often used to put conditions on an includes, but a limit is the exception:
If you eager load an association with a specified :limit option, it will be ignored, returning all the associated objects. http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html
class Picture < ActiveRecord::Base
  has_many :most_recent_comments, -> { order('id DESC').limit(10) },
           class_name: 'Comment'
end
Picture.includes(:most_recent_comments).first.most_recent_comments
# => returns all associated comments.
There are a few issues when trying to solve this "natively" via Rails, which are detailed in this question.
We solved it with an SQL scope, for your case something like:
class Topic < ApplicationRecord
  has_one :first_post, class_name: "Post", primary_key: :first_post_id, foreign_key: :id
  scope :with_first_post, lambda {
    select(
      "topics.*,
      (
        SELECT id as first_post_id
        FROM posts
        WHERE topic_id = topics.id
        ORDER BY id asc
        LIMIT 1
      )"
    )
  }
end
Topic.with_first_post.includes(:first_post)
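A plain-Ruby model of the two steps this scope relies on, assuming hypothetical Structs for Topic and Post and made-up rows (this is not ActiveRecord's preloader, just the same logic on arrays): the select subquery computes a first_post_id per topic, then includes(:first_post) fetches all of those posts in one IN query and matches them back by id.

```ruby
# Hypothetical stand-ins for the Topic and Post models (plain Structs).
Topic = Struct.new(:id, :first_post_id)
Post  = Struct.new(:id, :topic_id, :body)

posts = [
  Post.new(1, 10, "first in 10"), Post.new(2, 20, "first in 20"),
  Post.new(3, 10, "reply in 10"), Post.new(4, 20, "reply in 20"),
  Post.new(5, 30, "only post in 30")
]
topic_ids = [10, 20, 30, 40] # topic 40 has no posts yet

# Query 1, the with_first_post scope: for each topic the correlated subquery
# picks the lowest post id (ORDER BY id ASC LIMIT 1), or NULL if there is none.
topics = topic_ids.map do |topic_id|
  earliest = posts.select { |post| post.topic_id == topic_id }.min_by(&:id)
  Topic.new(topic_id, earliest&.id)
end

# Query 2, includes(:first_post): one query for all the ids, roughly
# SELECT posts.* FROM posts WHERE posts.id IN (...), then matched in memory.
wanted_ids = topics.map(&:first_post_id).compact
loaded = posts.select { |post| wanted_ids.include?(post.id) }.to_h { |post| [post.id, post] }
first_posts = topics.to_h { |topic| [topic.id, loaded[topic.first_post_id]] }

first_posts.each do |topic_id, post|
  puts "topic #{topic_id}: #{post ? post.body : '(no posts)'}"
end
```

However many topics are on the page, only the posts that are actually needed are loaded, and the query count stays fixed.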

Eager loading generating slower queries

I'm optimizing my app and noticed something interesting. I originally had this statement in my controller
@votes = Vote.paginate(:page => params[:page], :order=>"created_at DESC")
and this in my view:
<% @votes.each do |vote| %>
  <tr>
    <td><%= vote.user.display_name %></td>
...
I tried changing the controller to use eager loading:
@votes = Vote.includes(:user).paginate(:page => params[:page],
                                       :order=>"created_at DESC")
In doing so, I noticed that my ActiveRecord query time to load votes/index more than doubled, from 180 ms to 440 ms. The number of queries was successfully cut down with eager loading. However, I found this one time-consuming query in the eager load case only:
SQL (306.5ms) SELECT COUNT(DISTINCT "votes"."id") FROM "votes" LEFT OUTER JOIN "users" ON "users"."id" = "votes"."user_id"
Why is my code requesting a count on a left outer join? It's not present in the non-eager-load case. In the non-eager-load case, this is the closest statement I can find:
SQL (30.5ms) SELECT COUNT(*) FROM "votes"
Is this something related to paginate? Is it some combination of the two?
Yes, that query seems to be generated by the pagination plugin. This query is necessary to estimate the total number of pages.
But if you know the number of records anyway (by doing a simple SELECT COUNT(*) FROM "votes" before), you can pass that number to will_paginate with the :total_entries option!
(See WillPaginate::Finder::ClassMethods for more info.)
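For context, a paginator needs the total row count because the number of pages is derived from it. A minimal sketch; `total_pages` is a hypothetical helper for illustration, not will_paginate's API, and the 95 votes are a made-up number.

```ruby
# Hypothetical helper: how a page count follows from the total row count.
def total_pages(total_entries, per_page)
  # Integer ceiling division: a partial last page still counts as a page.
  (total_entries + per_page - 1) / per_page
end

# If the count is already known (e.g. from one cheap SELECT COUNT(*) FROM "votes"),
# handing it over as :total_entries means the paginator doesn't have to compute it.
known_count = 95 # made-up number for illustration
puts total_pages(known_count, 30) # => 4
puts total_pages(90, 30)          # => 3
```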
Btw, have you created an index on votes.user_id? Maybe a missing index is slowing down the query. I'm also wondering why the DISTINCT clause should take up so much time, as id probably already has a unique constraint (if not, try adding one).

How do I do a join in ActiveRecord after records have been returned?

I am using ActiveRecord in Rails 3 to pull data from two different tables in two different databases. These databases cannot join on each other, but I need to do a simple join after the fact. I would like to preserve the relation so that I can chain onto it down the line.
here is a simplified version of what I am doing
browsers = Browser.all # <-- this is fairly small and can reside in memory
events = Event.where(:row_date=>Date.today).select(:name, :browser_id)
So as you can see, I want to join browsers in on the events relation, where browser_id should equal browsers.name. events is a relation and I can still add clauses to it down the line, so I don't want to run the query on the db just yet. How would I accomplish this?
Edit
For those that would like to see some code for the answer I accepted below, here is what I came up with:
class EventLog < ActiveRecord::Base
  belongs_to :browser
  def get_todays_events
    Event.where(:row_date=>Date.today).select(:name, :browser_id).includes(:browser)
  end
end
would let me get the browser name for each event in the following manner:
get_todays_events.each { |event| event.browser.name }
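I would accomplish this by using an :include. Attempting to do this in Ruby will cause you nothing but grief. You can chain onto an include just fine. When your conditions don't reference the browsers table, the include loads the browsers with a separate query rather than a SQL JOIN, which is why it works even though the two tables live in different databases.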
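To show why a preloading include works across two databases, here is a plain-Ruby model of the two queries and the in-memory matching step. It is a sketch with hypothetical Structs and made-up rows, not a replacement for includes (which does this for you); it assumes browser_id refers to a browser's id.

```ruby
# Hypothetical stand-ins for rows from the two databases.
Event   = Struct.new(:name, :browser_id)
Browser = Struct.new(:id, :name)

# Query 1 (events database): today's events, only the selected columns.
events = [Event.new("click", 1), Event.new("view", 2), Event.new("click", 1)]

# Query 2 (browsers database): only the browsers those events reference,
# roughly SELECT * FROM browsers WHERE id IN (1, 2).
browser_table = [Browser.new(1, "Firefox"), Browser.new(2, "Chrome"), Browser.new(3, "Safari")]
needed_ids = events.map(&:browser_id).uniq
browsers_by_id = browser_table.select { |b| needed_ids.include?(b.id) }.to_h { |b| [b.id, b] }

# Matching in memory: each event is paired with its browser via a hash lookup,
# so no SQL JOIN between the two databases is ever needed.
rows = events.map { |event| [event.name, browsers_by_id[event.browser_id]&.name] }
rows.each { |name, browser| puts "#{name} #{browser}" }
```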
joins does create SQL joins as expected in the current Rails 5:
pry(main)> Customer.joins(:orders).limit(5)
Customer Load (0.2ms) SELECT `customers`.* FROM `customers` INNER JOIN `orders` ON `orders`.`customer_id` = `customers`.`id` LIMIT 5
=> [#<Customer:0x007fb869f11fe8
...
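This can be much faster, because it only requires a single database query, whereas includes performs one query for the main table plus one per included association (or a single large LEFT OUTER JOIN when the conditions reference the included tables) and instantiates every associated record.
Here's an example where includes takes roughly 1,450x as long as joins. Keep in mind the two calls do different work: .all.size on the joins relation only runs a COUNT query, while the includes version loads every order and its associations and then maps over them: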
pry(main)> benchmark do
  Order.joins(:address, :payments, :customer, :packages).all.size
end
=> 0.02456 seconds
pry(main)> benchmark do
  Order.includes(:address, :payments, :customer, :packages).all.map(&:zip).max
end
=> 35.607257 seconds
