Eager loading generating slower queries - ruby-on-rails

I'm optimizing my app and noticed something interesting. I originally had this statement in my controller
#votes = Vote.paginate(:page => params[:page], :order=>"created_at DESC")
and this in my view
<% #votes.each do |vote| %>
<tr>
<td><%= vote.user.display_name %></td>
...
I tried changing the controller to use eager loading:
#votes = Vote.includes(:user).paginate(:page => params[:page],
:order=>"created_at DESC")
In doing so, I noticed that my ActiveRecord query time to load votes/index doubled from 180 ms to 440 ms. The number of queries was successfully cut down with eager loading. However, I found this one time-consuming query in the eager load situation only:
SQL (306.5ms) SELECT COUNT(DISTINCT "votes"."id") FROM "votes" LEFT OUTER JOIN "users" ON "users"."id" = "votes"."user_id"
Why is my code requesting a count on a left outer join? It's not present in the non-eager-load case. In the non-eager-load case, this is the closest statement I can find:
SQL (30.5ms) SELECT COUNT(*) FROM "votes"
Is this something related to paginate? Is it some combination of the two?

Yes, that query seems to be generated by the pagination plugin. This query is necessary to estimate the total number of pages.
But if you know the number of records anyway (by doing a simple SELECT COUNT(*) FROM "votes" before), you can pass that number to will_paginate with the :total_entries option!
(See WillPaginate::Finder::ClassMethods for more info.)
Btw, have you created an index for votes.user_id? May be that is slowing down the query. I'm wondering why the DISTINCT clause should take up so much time as id probably already has a unique constraint (if not, try adding one).

Related

Count records after where clause

I have three models: Catalog, Upload and Product. A product belongs to a catalog, and an upload belongs to a product.
I need to count the number of uploads for all the products of a given catalog.
This is the way I've been doing it so far, which is incredibly slow for a large amount of uploads or products:
#products = Product.where(catalog_id: 123)
#uploads_count = Upload.where(product_id: #products.pluck(:id)).count
I'd like to avoid loading all the products just for a count.
Should I use raw SQL or is there a better way to do this with ActiveRecord ?
This should do it for you:
Upload.joins(:product).where(products: { catalog_id: 123 }).count
Using joins creates an INNER JOIN between the two tables, allowing you to query the products table as above.
Note the singular and plural uses of product - the joins should reflect the association (the upload belongs to one product), while the where clause always uses the table name, typically pluralised.
The SQL will look similar to:
SELECT "uploads".* FROM "uploads"
INNER JOIN "products"
ON "products"."id" = "uploads"."product_id"
WHERE "products"."catalog_id" = 123
If you need to have more information on the catalog you can also include this, something like the following:
Upload.joins(product: :catalog).where(products: { catalogs: { whatever: 'you want to query' } }).count
Bear in mind, using joins is just for a query such as this. If you need to access attributes of the product or catalog, you should use another approach, such as includes, to preload the data and avoid N + 1 queries. There's a good read here if you're interested.
Another way to avoid selecting records is to use sub-query. This can be done the following way:
query = User.where(id: 1..100)
User.where(id: query.select(:id)).count
# [DEBUG] (10.5ms) SELECT COUNT(*) FROM "users" WHERE "users"."id" IN (SELECT "users"."id" FROM "users" WHERE ("users"."id" BETWEEN $1 AND $2)) [["id", 1], ["id", 100]]
# => 33
So, User.where(id: 1..100) prepares a query, that can be used as a sub-select. .select(:field) tells what field you are interested in.
Though for a basic count, SRack provides a good answer.

Additive scope conditions for has_many :through

I want a user to be able to find all posts that have one or more tags. And I'd like the tags to be additive criteria, so for example you could search for posts that have just the 'News' tag, or you could search for posts that have both the 'News' and 'Science' tags.
Currently what I have, and it works, is a Post model, a Tag model, and a join model called Marking. Post has_many :tags, through: :markings. I get what I need by passing an array of Tag ids to a Post class method:
post.rb
def self.from_tag_id_array array
post_array = []
Marking.where(tag_id: array).group_by(&:post_id).each do |p_id,m_array|
post_array << p_id if m_array.map(&:tag_id).sort & array.sort == array.sort
end
where id: post_array
end
This seems like a clunky way to get there. Is there a way I can do this with a scope on an association or something of the like?
So the general rule of thumb with building these kinds of queries is to minimize work in "Ruby-land" and maximize work in "Database-land". In your solution above, you're fetching a set of markings with any tags in the set array, which presumably will be a very large set (all posts that have any of those tags). This is represented in a ruby array and processed (group_by is in Ruby-world, group is the equivalent in Database-land).
So aside from being hard-to-read, that solution is going to be slow for any large set of markings.
There are a couple ways to solve the problem without doing any heavy lifting in Ruby-world. One way is using subqueries, like this:
scope :with_tag_ids, ->(tag_ids) {
tag_ids.map { |tag_id|
joins(:markings).where(markings: { tag_id: tag_id })
}.reduce(all) { |scope, subquery| scope.where(id: subquery) }
}
This generates a query like this (again for tag_ids 5 and 8)
SELECT "posts".*
FROM "posts"
WHERE "posts"."id" IN (SELECT "posts"."id" FROM "posts" INNER JOIN "markings" ON "markings"."post_id" = "posts"."id" WHERE "markings"."tag_id" = 5)
AND "posts"."id" IN (SELECT "posts"."id" FROM "posts" INNER JOIN "markings" ON "markings"."post_id" = "posts"."id" WHERE "markings"."tag_id" = 8)
Note that since everything here is calculated directly in SQL, no arrays are generated or processed in Ruby. This will generally scale much better.
Alternatively, you can use COUNT and do it in a single query without subqueries:
scope :with_tag_ids, ->(tag_ids) {
joins(:markings).where(markings: { tag_id: tag_ids }).
group(:post_id).having('COUNT(posts.id) = ?', tag_ids.count)
}
Which generates SQL like this:
SELECT "posts".*
FROM "posts"
INNER JOIN "markings" ON "markings"."post_id" = "posts"."id"
WHERE "markings"."tag_id" IN (5, 8)
GROUP BY "post_id"
HAVING (COUNT(posts.id) = 2)
This assumes that you don't have multiple markings with the same pair of tag_id and post_id, which would throw off the count.
I would imagine that the last solution is probably the most efficient, but you should try different solutions and see what works best for your data.
See also: Query intersection with activerecord

Rails 4 Eager load limit subquery

Is there a way to avoid the n+1 problem when eager loading and also applying a limit to the subquery?
I want to avoid lots of sql queries like this:
Category.all.each do |category|
category.posts.limit(10)
end
But I also want to only get 10 posts per category, so the standard eager loading, which gets all the posts, does not suffice:
Category.includes(:posts).all
What is the best way to solve this problem? Is N+1 the only way to limit the amount of posts per category?
From the Rails docs
If you eager load an association with a specified :limit option, it will be ignored, returning all the associated objects
So given the following model definition
class Category < ActiveRecord::Base
has_many :posts
has_many :included_posts, -> { limit 10 }, class_name: "Post"
end
Calling Category.find(1).included_posts would work as expected and apply the limit of 10 in the query. However, if you try to do Category.includes(:included_posts).all the limit option will be ignored. You can see why this is the case if you look at the SQL generated by an eager load
Category.includes(:posts).all
Category Load (0.2ms) SELECT "categories".* FROM "categories"
Post Load (0.4ms) SELECT "posts".* FROM "posts" WHERE "posts"."category_id" IN (1, 2, 3)
If you added the LIMIT clause to the posts query, it would return a total of 10 posts and not 10 posts per category as you might expect.
Getting back to your problem, I would eager load all posts and then limit the loaded collection using first(10)
categories = Category.includes(:posts).all
categories.first.posts.first(10)
Although you're loading more models into memory, this is bound to be more performant since you're only making 2 calls against the database vs. n+1. Cheers.

eager loading the first record of an association

In a very simple forum made from Rails app, I get 30 topics from the database in the index action like this
def index
#topics = Topic.all.page(params[:page]).per_page(30)
end
However, when I list them in the views/topics/index.html.erb, I also want to have access to the first post in each topic to display in a tooltip, so that when users scroll over, they can read the first post without having to click on the link. Therefore, in the link to each post in the index, I add the following to a data attribute
topic.posts.first.body
each of the links looks like this
<%= link_to simple_format(topic.name), posts_path(
:topic_id => topic), :data => { :toggle => 'tooltip', :placement => 'top', :'original-title' => "#{ topic.posts.first.body }"}, :class => 'tool' %>
While this works fine, I'm worried that it's an n+1 query, namely that if there's 30 topics, it's doing this 30 times
User Load (0.8ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1 ORDER BY "users"."id" ASC LIMIT 1
Post Load (0.4ms) SELECT "posts".* FROM "posts" WHERE "posts"."topic_id" = $1 ORDER BY "posts"."id" ASC LIMIT 1 [["topic_id", 7]]
I've noticed that Rails does automatic caching on some of these, but I think there might be a way to write the index action differently to avoid some of this n+1 problem but I can figure out how. I found out that I can
include(:posts)
to eager load the posts, like this
#topics = Topic.all.page(params[:page]).per_page(30).includes(:posts)
However, if I know that I only want the first post for each topic, is there a way to specify that? if a topic had 30 posts, I don't want to eager load all of them.
I tried to do
.includes(:posts).first
but it broke the code
This appears to work for me, so give this a shot and see if it works for you:
Topic.includes(:posts).where("posts.id = (select id from posts where posts.topic_id = topics.id limit 1)").references(:posts)
This will create a dependent subquery in which the posts topic_id in the subquery is matched up with the topics id in the parent query. With the limit 1 clause in the subquery, the result is that each Topic row will contain only 1 matching Post row, eager loaded thanks to the includes(:post).
Note that when passing an SQL string to .where, that references an eager loaded relation, the references method should be appended to inform ActiveRecord that we're referencing an association, so that it knows to perform appropriate joins in the subsequent query. Apparently it technically works without that method, but you get a deprecation warning, so you might as well throw it in lest you encounter problems in future Rails updates.
To my knowledge you can't. Custom association is often used to allow conditions on includes except limit.
If you eager load an association with a specified :limit option, it will be ignored, returning all the associated objects. http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html
class Picture < ActiveRecord::Base
has_many :most_recent_comments, -> { order('id DESC').limit(10) },
class_name: 'Comment'
end
Picture.includes(:most_recent_comments).first.most_recent_comments
# => returns all associated comments.
There're a few issues when trying to solve this "natively" via Rails which are detailed in this question.
We solved it with an SQL scope, for your case something like:
class Topic < ApplicationRecord
has_one :first_post, class_name: "Post", primary_key: :first_post_id, foreign_key: :id
scope :with_first_post, lambda {
select(
"topics.*,
(
SELECT id as first_post_id
FROM posts
WHERE topic_id = topics.id
ORDER BY id asc
LIMIT 1
)"
)
}
end
Topic.with_first_post.includes(:first_post)

Rails --> an n+1 database issue that won't go away

I'm trying to optimise some N+1 queries in active record for the first time. There are 3 to kill - 2 went very easily with a .includes call, but I can't for the life of me figure out why the third is still calling a bunch of queries. Relevant code below - if anyone has any suggestions, I'd be really appreciative.
CONTROLLER:
#enquiries = Comment.includes(:children).faqs_for_project(#project)
MODEL;
def self.faqs_for_project(project)
Comment.for_project_and_enquiries(project, project.enquiries).where(:published => true).order("created_at DESC")
end
(and the relevant scope)
scope :for_project_and_enquiries, lambda{|p, qs| where('(commentable_type = ? and commentable_id = ?) or (commentable_type = ? and commentable_id IN (?))', "Project", p.id, "Enquiry", qs.collect{|q| q.id})}
VIEW:
...
= render :partial => 'comments/comment', :collection => #enquries
...
(and that offending line in the partial)
...
= 'Read by ' + pluralize(comment.acknowledgers.count, 'lead')
...
Two SQL queries are called for each comment. The 2 queries are:
SQL (2.8ms) SELECT COUNT(*) FROM "users" INNER JOIN "acknowledgements" ON "users".id = "acknowledgements".user_id WHERE (("acknowledgements".feedback_type = 'Comment') AND ("acknowledgements".feedback_id = 177621))
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1295 LIMIT 1
I would have thought appending (:user, :acknowledgements) into the controller's .includes would have solved the problem, but it doesn't seem to have any effect. If anyone has any suggestions on what I'm missing, I'd be really appreciative
I believe in your Comment table you want to add a :acknowledgers_count column as a counter cache
has_many :acknowledgers, ....., counter_cache: true
You will need to create a migration to add the :acknowledgers_count column to the comments table. Rails should take care of the rest.
You can learn more about the ActiveRecord::CounterCache api here.
The count method in comment.acknowledgers.count is overloaded in ActiveRecord to first check if a counter cache column exists, and if it does, it returns that directly from the model (in this case the Comment model) without having to touch the database again.
Finally, there was very recently a great Railscast about a gem call Bullet that can help you identify these query issues and guide you toward a solution. It covers both counter caches and N+1 queries.
As #ismaelga pointed out in a comment to this answer, it's a generally better practice to call .size instead of .count on a relation. Check out the source for size:
def size
loaded? ? #records.length : count
end
If the relation is already loaded it will just call length on it, otherwise it will call count. It's an extra check to try and prevent the database from unnecessarily being queried.

Resources