Rails --> an n+1 database issue that won't go away - ruby-on-rails

I'm trying to optimise some N+1 queries in active record for the first time. There are 3 to kill - 2 went very easily with a .includes call, but I can't for the life of me figure out why the third is still calling a bunch of queries. Relevant code below - if anyone has any suggestions, I'd be really appreciative.
CONTROLLER:
@enquiries = Comment.includes(:children).faqs_for_project(@project)
MODEL:
def self.faqs_for_project(project)
  Comment.for_project_and_enquiries(project, project.enquiries).where(:published => true).order("created_at DESC")
end
(and the relevant scope)
scope :for_project_and_enquiries, lambda{|p, qs| where('(commentable_type = ? and commentable_id = ?) or (commentable_type = ? and commentable_id IN (?))', "Project", p.id, "Enquiry", qs.collect{|q| q.id})}
VIEW:
...
= render :partial => 'comments/comment', :collection => @enquiries
...
(and that offending line in the partial)
...
= 'Read by ' + pluralize(comment.acknowledgers.count, 'lead')
...
Two SQL queries are called for each comment. The 2 queries are:
SQL (2.8ms) SELECT COUNT(*) FROM "users" INNER JOIN "acknowledgements" ON "users".id = "acknowledgements".user_id WHERE (("acknowledgements".feedback_type = 'Comment') AND ("acknowledgements".feedback_id = 177621))
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1295 LIMIT 1
I would have thought appending (:user, :acknowledgements) to the controller's .includes would have solved the problem, but it doesn't seem to have any effect. If anyone has any suggestions on what I'm missing, I'd be really appreciative.

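I believe you want to add an acknowledgers_count column to your comments table as a counter cache:
has_many :acknowledgers, ....., counter_cache: true
You will need to create a migration to add the acknowledgers_count column to the comments table. Rails should take care of the rest. Bear in mind that the counter is maintained from the belongs_to side of the association, so the counter_cache option has to be declared there too.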
You can learn more about the ActiveRecord::CounterCache api here.
Once the column exists, ActiveRecord can read the number of acknowledgers straight from the model (in this case the Comment model) without having to touch the database again. Note that it is comment.acknowledgers.size that consults the counter cache column; comment.acknowledgers.count always issues a SELECT COUNT(*).
Finally, there was very recently a great Railscast about a gem called Bullet that can help you identify these query issues and guide you toward a solution. It covers both counter caches and N+1 queries.
As @ismaelga pointed out in a comment to this answer, it's generally better practice to call .size instead of .count on a relation. Check out the source for size:
def size
  loaded? ? @records.length : count
end
If the relation is already loaded it will just call length on it, otherwise it will call count. It's an extra check to try and prevent the database from unnecessarily being queried.
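To make the difference concrete, here is a minimal plain-Ruby sketch. ToyCollection is a hypothetical stand-in written for this illustration, not ActiveRecord itself; it only counts how many simulated database round trips each method causes.

```ruby
# Toy stand-in for an association collection (hypothetical, not ActiveRecord).
class ToyCollection
  attr_reader :queries

  def initialize(rows)
    @rows = rows      # pretend these rows live in the database
    @records = nil    # filled in only once the collection has been loaded
    @queries = 0      # number of simulated database round trips
  end

  def loaded?
    !@records.nil?
  end

  # Simulates fetching every row, as eager loading would do up front.
  def load
    @queries += 1
    @records = @rows.dup
    self
  end

  # Always goes to the "database", like SELECT COUNT(*).
  def count
    @queries += 1
    @rows.length
  end

  # Same shape as the quoted source: use memory when loaded, else count.
  def size
    loaded? ? @records.length : count
  end
end

acknowledgers = ToyCollection.new(%w[alice bob carol]).load
3.times { acknowledgers.size }
size_queries = acknowledgers.queries    # still just the initial load

3.times { acknowledgers.count }
count_queries = acknowledgers.queries   # the load plus one query per count
```

In this toy model, calling size on a loaded collection never adds a query, while every call to count does, which is the per-comment query pattern seen in the question's log.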

Related

eager loading the first record of an association

In a very simple forum made from Rails app, I get 30 topics from the database in the index action like this
def index
  @topics = Topic.all.page(params[:page]).per_page(30)
end
However, when I list them in the views/topics/index.html.erb, I also want to have access to the first post in each topic to display in a tooltip, so that when users scroll over, they can read the first post without having to click on the link. Therefore, in the link to each post in the index, I add the following to a data attribute
topic.posts.first.body
each of the links looks like this
<%= link_to simple_format(topic.name), posts_path(
:topic_id => topic), :data => { :toggle => 'tooltip', :placement => 'top', :'original-title' => "#{ topic.posts.first.body }"}, :class => 'tool' %>
While this works fine, I'm worried that it's an n+1 query, namely that if there's 30 topics, it's doing this 30 times
User Load (0.8ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1 ORDER BY "users"."id" ASC LIMIT 1
Post Load (0.4ms) SELECT "posts".* FROM "posts" WHERE "posts"."topic_id" = $1 ORDER BY "posts"."id" ASC LIMIT 1 [["topic_id", 7]]
I've noticed that Rails does automatic caching on some of these, and I think there might be a way to write the index action differently to avoid some of this n+1 problem, but I can't figure out how. I found out that I can use
includes(:posts)
to eager load the posts, like this
@topics = Topic.all.page(params[:page]).per_page(30).includes(:posts)
However, if I know that I only want the first post for each topic, is there a way to specify that? If a topic had 30 posts, I don't want to eager load all of them.
I tried to do
.includes(:posts).first
but it broke the code.
This appears to work for me, so give this a shot and see if it works for you:
Topic.includes(:posts).where("posts.id = (select id from posts where posts.topic_id = topics.id limit 1)").references(:posts)
This will create a dependent subquery in which the posts topic_id in the subquery is matched up with the topics id in the parent query. With the limit 1 clause in the subquery, the result is that each Topic row will contain only 1 matching Post row, eager loaded thanks to the includes(:posts). Since the subquery has no ORDER BY, which post comes back is up to the database; add an ORDER BY to the subquery if it has to be the oldest one.
Note that when you pass .where an SQL string that references an eager-loaded relation, the references method should be appended to inform ActiveRecord that we're referencing an association, so that it knows to perform the appropriate joins in the subsequent query. Apparently it technically works without that method, but you get a deprecation warning, so you might as well throw it in lest you encounter problems in future Rails updates.
To my knowledge you can't. A custom association is often used to put conditions on includes, but that does not work for limit.
If you eager load an association with a specified :limit option, it will be ignored, returning all the associated objects. http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html
class Picture < ActiveRecord::Base
  has_many :most_recent_comments, -> { order('id DESC').limit(10) },
           class_name: 'Comment'
end
Picture.includes(:most_recent_comments).first.most_recent_comments
# => returns all associated comments.
There're a few issues when trying to solve this "natively" via Rails which are detailed in this question.
We solved it with an SQL scope, for your case something like:
class Topic < ApplicationRecord
  has_one :first_post, class_name: "Post", primary_key: :first_post_id, foreign_key: :id
  scope :with_first_post, lambda {
    select(
      "topics.*,
      (
        SELECT id as first_post_id
        FROM posts
        WHERE topic_id = topics.id
        ORDER BY id asc
        LIMIT 1
      )"
    )
  }
end
Topic.with_first_post.includes(:first_post)
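For intuition, the idea behind this scope can be mimicked on plain Ruby data: work out one post id per topic, then fetch only those posts in a single batch. This is only a sketch; ToyTopic, ToyPost and the sample rows are hypothetical and stand in for the real tables.

```ruby
# Hypothetical in-memory stand-ins for the topics and posts tables.
ToyTopic = Struct.new(:id, :name)
ToyPost  = Struct.new(:id, :topic_id, :body)

topics = [ToyTopic.new(1, "Ruby"), ToyTopic.new(2, "Rails"), ToyTopic.new(3, "Empty")]
posts  = [
  ToyPost.new(10, 1, "first ruby post"),
  ToyPost.new(11, 1, "second ruby post"),
  ToyPost.new(12, 2, "first rails post")
]

# Step 1: per topic, the lowest post id (the role of the correlated subquery).
first_post_ids = topics.to_h do |topic|
  ids = posts.select { |post| post.topic_id == topic.id }.map(&:id)
  [topic.id, ids.min]   # nil when the topic has no posts
end

# Step 2: fetch just those posts in one pass (the role of the eager load).
wanted = first_post_ids.values.compact
posts_by_id = posts.select { |post| wanted.include?(post.id) }
                   .to_h { |post| [post.id, post] }

# Each topic now finds its first post without any further lookups per topic.
tooltips = topics.to_h do |topic|
  first_post = posts_by_id[first_post_ids[topic.id]]
  [topic.name, first_post&.body]
end
```

The point of the two steps is that the number of lookups stays fixed no matter how many topics are on the page, and posts other than the first are never loaded.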

Eager loading generating slower queries

I'm optimizing my app and noticed something interesting. I originally had this statement in my controller
@votes = Vote.paginate(:page => params[:page], :order=>"created_at DESC")
and this in my view
<% @votes.each do |vote| %>
<tr>
<td><%= vote.user.display_name %></td>
...
I tried changing the controller to use eager loading:
@votes = Vote.includes(:user).paginate(:page => params[:page],
                                       :order=>"created_at DESC")
In doing so, I noticed that my ActiveRecord query time to load votes/index doubled from 180 ms to 440 ms. The number of queries was successfully cut down with eager loading. However, I found this one time-consuming query in the eager load situation only:
SQL (306.5ms) SELECT COUNT(DISTINCT "votes"."id") FROM "votes" LEFT OUTER JOIN "users" ON "users"."id" = "votes"."user_id"
Why is my code requesting a count on a left outer join? It's not present in the non-eager-load case. In the non-eager-load case, this is the closest statement I can find:
SQL (30.5ms) SELECT COUNT(*) FROM "votes"
Is this something related to paginate? Is it some combination of the two?
Yes, that query seems to be generated by the pagination plugin. This query is necessary to estimate the total number of pages.
But if you know the number of records anyway (by doing a simple SELECT COUNT(*) FROM "votes" before), you can pass that number to will_paginate with the :total_entries option!
(See WillPaginate::Finder::ClassMethods for more info.)
Btw, have you created an index for votes.user_id? Maybe that is what's slowing down the query. I'm wondering why the DISTINCT clause should take up so much time, as id probably already has a unique constraint (if not, try adding one).
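A small plain-Ruby sketch of why the cheaper count is enough here, assuming each vote belongs to at most one user: the left outer join then never multiplies vote rows, so the distinct count over the join equals the plain count, and the page total follows from that number. The sample data and the total_pages helper are hypothetical, not part of will_paginate.

```ruby
# Hypothetical sample rows; a vote may have no user at all.
votes = [{ id: 1, user_id: 7 }, { id: 2, user_id: 7 }, { id: 3, user_id: nil }]
users = [{ id: 7, name: "ann" }]

# LEFT OUTER JOIN on a belongs_to: each vote appears exactly once,
# paired with its user or with nil.
joined = votes.map { |vote| [vote, users.find { |user| user[:id] == vote[:user_id] }] }

distinct_count = joined.map { |vote, _user| vote[:id] }.uniq.length  # COUNT(DISTINCT votes.id)
plain_count    = votes.length                                        # COUNT(*) on votes alone

# The only thing pagination needs the total for: how many pages exist.
def total_pages(total_entries, per_page)
  (total_entries.to_f / per_page).ceil
end

pages = total_pages(plain_count, 2)
```

Since both counts agree in this situation, running the simple count yourself and handing it over via :total_entries gives the same page total without the join.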

PostgreSQL, Rails + Heroku, Column must appear in "group by"

I'm getting this error when I deploy my app on Heroku:
Started GET "/collections/transect/search?utf8=%E2%9C%93&search%5Btagged_with%5D=village&commit=Search" for 98.201.59.6 at 2011-03-27 17:02:12 -0700
ActionView::Template::Error (PGError: ERROR: column "photos.custom_title" must appear in the GROUP BY clause or be used in an aggregate function
: SELECT "photos".* FROM "photos" INNER JOIN "taggings" ON "photos"."id" = "taggings"."photo_id" INNER JOIN "tags" ON "tags"."id" = "taggings"."tag_id" WHERE "tags"."name" IN ('village') AND ("photos".collection_id = 1) GROUP BY photos.id LIMIT 20 OFFSET 0):
17:
18: - @bodyclass = 'dark'
19: #search_view.photo_tiles
20: = render :partial => 'collections/photos/alt_tiles', :collection => @photos, :as => :photo
app/views/collections/search.html.haml:20:in `_app_views_collections_search_html_haml__2343730670144375006_16241280__2249843891577483539'
I saw these similar questions (1,2).
The problem is, nothing in this view is asking for the custom_title attribute, nor am I executing a query with a "group_by" clause.
Here's the partial that seems to trigger the error:
- ((photo_counter+1) % 5 == 0) ? @class = 'last' : @class = ''
.photo{ :class => @class }
.alt_tile
= link_to( image_tag(photo.file.url(:tile)), collection_photo_path(@collection,photo), :class => 'img_container' )
.location= photo.location(:min)
.tags= photo.tag_array.join(' | ')
Here's the collections#search action which is what raised the error:
def search
  @curator_toolbar = true
  @collection = Collection.find(params[:id])
  @search = @collection.photos.search(params[:search])
  @photos = @search.page(params[:page]).per(20)
end
So it looks like maybe this is a plugin issue? I'm using MetaSearch for search functionality and Kaminari for pagination. Does anyone have any ideas or suggestions as to what would cause this specifically and how I can possibly fix it?
--EDIT--
Ok, I seem to have found the real problem:
Using MetaSearch with my keyword tags model, I created a search method that looks like this:
def self.tagged_with( string )
  array = string.split(',').map{ |s| s.lstrip }
  joins(:tags).where('tags.name' => array ).group('photos.id')
end
Now, I was given a lot of help in creating this method -- as I mentioned before I'm a total SQL moron.
This method works on SQLite but not on PostgreSQL because whenever keywords are included in a search it triggers the "group_by" problem.
So, this question seems to indicate that I need to put every column that is part of my photo model in the "group" argument or PostgreSQL will break.
That horrifies me for several reasons:
My photo model is pretty complex and has a ton of fields.
My app is still in development and the photo model changes more than any other.
I don't want to have my code breaking every time someone touches the photo model in the future if they forget to add the columns to the group statement on the tag searching argument.
So, can anyone help me understand how to rewrite this method so that it won't break PostgreSQL -- and ideally so that I won't have to include a list of all the fields that belong to this model in the solution, or at least not a manually maintained list?
So, it turns out I could solve this problem by replacing "group" with "select" in my tagged_with method.
def self.tagged_with( string )
  array = string.split(',').map{ |s| s.lstrip }
  select('distinct photos.*').joins(:tags).where('tags.name' => array )
end
Problem solved! See this article for a great explanation as to why this is a better idea anyway. (Sorry, web site was removed later on and I don't recall what it said.) Also, thanks to Mark Westling for his answer on a spinoff question that solved my problem.
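To see why the join needs either a group or a distinct in the first place, here is a rough plain-Ruby sketch. The sample photos and taggings and the parse_tags helper are hypothetical; parse_tags also strips both sides of each tag, a small variation on the lstrip used above.

```ruby
# Hypothetical helper: split a comma-separated tag string into clean names.
def parse_tags(string)
  string.split(',').map(&:strip).reject(&:empty?)
end

# Hypothetical sample rows standing in for the photos and taggings tables.
photos   = [{ id: 1, title: "market" }, { id: 2, title: "bridge" }]
taggings = [[1, "village"], [1, "river"], [2, "river"]]  # [photo_id, tag name]

wanted = parse_tags("village, river")

# The inner join yields one row per matching tagging, so a photo that
# carries two of the requested tags shows up twice.
joined = taggings
         .select { |_photo_id, tag| wanted.include?(tag) }
         .map { |photo_id, _tag| photos.find { |photo| photo[:id] == photo_id } }

# "select distinct photos.*" collapses identical rows again.
distinct = joined.uniq
```

Here joined holds three rows for two photos, and uniq plays the role of DISTINCT; no column list has to be maintained by hand because whole rows are compared.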

Rails 3 Conditions on Eager Loaded Association

I'm having trouble with Rails 3 using conditions on an associated table while eager loading. It appears that Rails is applying the condition when it loads the original model data, so it won't load the parent model unless a non-zero number of the child/associated models match the condition. This is easier to explain in code (simplified for example):
@post = Post.includes(:comments).where(:comments => { :approved => true }).find(1)
This would generate a SQL query similar to:
SELECT DISTINCT `posts`.id FROM `posts`
LEFT OUTER JOIN `comments` ON `comments`.`post_id` = `posts`.`id`
WHERE (`comments`.`approved` = 1) AND (`posts`.`id` = '1')
LIMIT 1
In the case that there aren't any comments that meet the approved = 1 condition, no rows are returned, and thus the Post never gets loaded at all.
What is the right way to load a post and the associated comments eagerly with a condition on the comments?
Update
I'd still love to hear a better way of doing this, but for now I'm using the following to work around it (works with deeply nested eager loading):
@post = Post.find(1)
@comments = @post.comments.where(:approved => true).all
# allows deeper/more complex nesting without getting into SQL:
@post = Post.includes(:author => [ :websites, :photo ]).find(1)
@comments = @post.comments.includes(:editor).where(:approved => true).all
I guess what you are looking for is the joins method; it lets you put your condition inside the join definition rather than outside of it. For example:
@post = Post.joins("LEFT JOIN comments on posts.id = comments.post_id AND comments.approved = 1").first
Not sure about the correctness of the condition itself but you get my point.
Unfortunately you have to use that ugly string as joins is using INNER JOIN if you pass array/hash.
There's more about joins in the Rails guides.
Update: There might be some nugget of wisdom in this post on includes vs eager_load vs preload.
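The difference between filtering in WHERE and filtering in the join condition can be sketched in plain Ruby. The left_join helper and the sample rows are hypothetical; they only imitate what a LEFT OUTER JOIN does and are not how ActiveRecord builds its queries.

```ruby
# Hypothetical sample rows: one post whose only comment is unapproved.
posts    = [{ id: 1, title: "Hello" }]
comments = [{ id: 5, post_id: 1, approved: false }]

# Imitates LEFT OUTER JOIN: every left row survives; it is paired with nil
# when no right row satisfies the ON condition given as the block.
def left_join(left, right)
  left.flat_map do |l|
    matches = right.select { |r| yield(l, r) }
    matches.empty? ? [[l, nil]] : matches.map { |r| [l, r] }
  end
end

# Condition in WHERE: the join keeps the post, then the filter throws the
# whole row away because its comment is not approved.
where_rows = left_join(posts, comments) { |post, comment| comment[:post_id] == post[:id] }
             .select { |_post, comment| comment && comment[:approved] }

# Condition in ON: the unapproved comment simply fails to match, and the
# post still comes back, paired with nil.
on_rows = left_join(posts, comments) do |post, comment|
  comment[:post_id] == post[:id] && comment[:approved]
end
```

With the condition in WHERE the result is empty, which is the missing-post behaviour from the question; with the condition in ON the post is still returned, just without comments.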
I'd still love to hear a better way of doing this, but for now I'm using the following to work around it (works with deeply nested eager loading, unlike using joins):
@post = Post.find(1)
@comments = @post.comments.where(:approved => true).all
# allows deeper/more complex nesting without getting into SQL:
@post = Post.includes(:author => [ :websites, :photo ]).find(1)
@comments = @post.comments.includes(:editor).where(:approved => true).all

How to ensure sqlite isn't caching specific select queries?

I'm in the situation that I'm using sqlite with ActiveRecord and Rails (also, this is JRuby and so I'm actually using the jdbcsqlite adapter, in case that matters). Now, I'm trying to insert a row into the table attention_seekers, but only if there is no other existing similar row. Accordingly,
unless AttentionSeeker.find(:first, :conditions => {:key_id => key.id, :locale_id => l.id})
  item = AttentionSeeker.new(:key_id => key.id, :locale_id => l.id)
  item.save
end
This is the generated output in the log:
CACHE (0.0ms) SELECT * FROM attention_seekers WHERE (attention_seekers.key_id = 318 AND attention_seekers.locale_id = 20)
AttentionSeeker Create (1.0ms) INSERT INTO attention_seekers (key_id, locale_id) VALUES(318, 20)
CACHE (0.0ms) SELECT * FROM attention_seekers WHERE (attention_seekers.key_id = 318 AND attention_seekers.locale_id = 20)
AttentionSeeker Create (2.0ms) INSERT INTO attention_seekers (key_id, locale_id) VALUES(318, 20)
As you can see, for some reason the find is being cached, even though I'm inserting elements which affect it. What am I doing wrong/how can I stop this behaviour?
I did some digging and came across this helpful blog post, with more information available here. My solution (using the validation that Mike Buckbee suggested - thanks!):
AttentionSeeker.uncached do
  item = AttentionSeeker.new(:key_id => key.id, :locale_id => l.id)
  item.save
end
Instead of putting this code in your controller (which is where I'm guessing it is), you may want to consider using a validation instead, which I think would solve the problem:
class AttentionSeeker < ActiveRecord::Base
  validates_uniqueness_of :key_id, :scope => :locale_id
end
Please note the "scope" option in the validation rule.
Failing that, you could try wrapping the query in a transaction.
Failing that, and this seems incredibly janky, you could add a cachebuster to the query itself. Something like
buster = rand(Time.now.to_i)
attention_seeker = AttentionSeeker.find(:first, :conditions => ["#{buster} = #{buster}"])
This should give you a different query string almost every time through your loop, so the cached result is not reused.
Unique indices on the schema-level are safer than validates_uniqueness_of.
See http://railswarts.blogspot.com/2007/11/validatesuniquenessof-is-broken-and.html
Stephan
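The reason a schema-level unique index is safer than a check-then-insert (which is what a uniqueness validation amounts to) can be shown with a toy model. ToyTable and racing_inserts are hypothetical names for this sketch; the unique flag stands in for a unique index on (key_id, locale_id).

```ruby
class DuplicateKey < StandardError; end

# Toy table; `unique: true` plays the role of a unique index.
class ToyTable
  attr_reader :rows

  def initialize(unique:)
    @unique = unique
    @rows = []
  end

  def exists?(pair)
    @rows.include?(pair)
  end

  # The "database" itself rejects duplicates when the index is present.
  def insert(pair)
    raise DuplicateKey, "duplicate #{pair.inspect}" if @unique && exists?(pair)
    @rows << pair
  end
end

# Two writers both run the existence check before either one inserts,
# which is the interleaving a check-then-insert cannot rule out.
def racing_inserts(table, pair)
  a_clear = !table.exists?(pair)
  b_clear = !table.exists?(pair)
  rejected = 0
  [a_clear, b_clear].each do |clear|
    next unless clear
    begin
      table.insert(pair)
    rescue DuplicateKey
      rejected += 1
    end
  end
  rejected
end

plain = ToyTable.new(unique: false)
plain_rejected = racing_inserts(plain, [318, 20])

indexed = ToyTable.new(unique: true)
indexed_rejected = racing_inserts(indexed, [318, 20])
```

Without the index both writers pass the check and the row is stored twice; with it the second insert is rejected, so the application only has to handle the error.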

Resources