Rails Eager Loading Question Find(:all, :include => [:model]) - ruby-on-rails

I have a Topic model and a Project model, with a many-to-many association between them (a HABTM one).
On the Topic index page, I want to display the number of projects that each topic has. So I have
@topics = Topic.all(:include => [:projects])
in my controller, and so far so good. The problem is that the Project model is so big that the query is still really slow:
Topic Load (1.5ms) SELECT * FROM "topics"
Project Load (109.2ms) SELECT "projects".*, t0.topic_id as the_parent_record_id FROM "projects" INNER JOIN "projects_topics" t0 ON "projects".id = t0.project_id WHERE (t0.topic_id IN (1,2,3,4,5,6,7,8,9,10,11))
Is there a way to make the second query select just the name or the ID instead of *? counter_cache is not supported by HABTM associations, and I don't really want to implement it myself... so is there a way to make this second query faster?
I just need to pull the count without loading the whole project object...
Thanks in advance,
Nicolás Hock Isaza

1. counter_cache is very easy to implement.
2. You can convert the HABTM into a double has_many, i.e. has_many :projects_topics in both the Project and Topic models (and belongs_to in the projects_topics join model), and then use counter_cache or do eager loading only on projects_topics.
3. You can do :select => "count(projects_topics.id)", :group => "topics.id", but this won't work well with PostgreSQL, if you care about it...
The second option is the best IMO. I usually don't use habtm at all, only double has_many :)
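A rough plain-Ruby sketch of why loading only the join rows is enough for a count. The pairs below stand in for rows of projects_topics (the data is invented), and no project record is ever built. In Rails 3 and later, a grouped count on a hypothetical ProjectsTopic join model, something like ProjectsTopic.group(:topic_id).count, should return a hash of the same shape.

```ruby
# Each pair stands in for one row of the projects_topics join table:
# [topic_id, project_id]. The values are invented for illustration.
JOIN_ROWS = [[1, 10], [1, 11], [2, 10], [3, 12], [1, 12]].freeze

# Count projects per topic using only the join rows.
# The returned hash defaults to 0 for topics that have no projects.
def project_counts(join_rows)
  counts = Hash.new(0)
  join_rows.each { |topic_id, _project_id| counts[topic_id] += 1 }
  counts
end

COUNTS = project_counts(JOIN_ROWS)
```

In the index view, COUNTS[topic.id] would then give the number to display for each topic without touching the projects table.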

To expand on Devenv's answer: a counter cache is what you would typically use for this kind of scenario.
From the api docs:
Caches the number of belonging objects on the associate class through the use of increment_counter and decrement_counter. The counter cache is incremented when an object of this class is created and decremented when it's destroyed. This requires that a column named #{table_name}_count (such as comments_count for a belonging Comment class) is used on the associate class (such as a Post class). You can also specify a custom counter cache column by providing a column name instead of a true/false value to this option (e.g., :counter_cache => :my_custom_counter.)
Note: Specifying a counter cache will add it to that model's list of readonly attributes using attr_readonly.
Here is a screencast from Ryan Bates' Railscasts on counter_cache.
Here is an answer to a question I asked half a year ago where the solution was an easily implemented home-brew counter cache.
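To make the quoted description concrete, here is a toy model of the bookkeeping in plain Ruby, with no database and no Rails. The Post and Comment classes and their methods are simplified stand-ins invented for this sketch; the real Rails counterparts are class-level methods that update a column.

```ruby
# Plain-Ruby stand-in for the parent ("associate") class.
class Post
  attr_reader :comments_count

  def initialize
    @comments_count = 0 # plays the role of the comments_count column
  end

  def increment_counter
    @comments_count += 1
  end

  def decrement_counter
    @comments_count -= 1
  end
end

# Plain-Ruby stand-in for the belonging class.
class Comment
  def initialize(post)
    @post = post
    @post.increment_counter # counter goes up when the child is created
  end

  def destroy
    @post.decrement_counter # and down when it is destroyed
  end
end
```

Reading post.comments_count never has to look at the comments themselves, which is the whole point of the cache.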

Related

Rails subquery reduce amount of raw SQL

I have two ActiveRecord models: Post and Vote. I want to make a simple query:
SELECT *,
(SELECT COUNT(*)
FROM votes
WHERE votes.id = posts.id) AS vote_count
FROM posts
I am wondering what's the best way to do it in activerecord DSL. My goal is to minimize the amount of SQL I have to write.
I can do Post.select("COUNT(*) from votes where votes.id = posts.id as vote_count")
Two problems with this:
Raw SQL. Any way to write this in the DSL?
This returns only the attribute vote_count and not "*" + vote_count. I can append .select("*"), but I will be repeating this every time. Is there a better/DRY way to do this?
Thanks
Well, if you want to reduce the amount of SQL, you can split that query into two smaller ones and execute them separately. For instance, the vote-counting part could be extracted into the query:
SELECT votes.id, COUNT(*) FROM votes GROUP BY votes.id;
which you may write with ActiveRecord methods as:
Vote.group(:id).count
You can store the result for later use and access it directly from Post model, for example you may define #votes_count as a method:
class Post
  def votes_count
    # Class-level cache shared by all posts: {id => count}
    @@votes_count_cache ||= Vote.group(:id).count
    @@votes_count_cache[id] || 0
  end
end
(Of course every use of cache raises a question about invalidating or updating it, but this is out of the scope of this topic.)
But I strongly encourage you to consider yet another approach.
I believe that writing complicated queries like yours with ActiveRecord methods (even if it were possible) and splitting queries into two as I proposed earlier are both bad ideas. They result in extremely cluttered code, far less readable than raw SQL. Instead, I suggest introducing query objects. IMO there is nothing wrong with using raw, complicated SQL when it's hidden behind a nice interface. See: M. Fowler's P of EAA and Brynary's post on the Code Climate Blog.
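A minimal sketch of what such a query object might look like. The class name PostsWithVoteCount and its interface are hypothetical, and the SQL simply repeats the query from the question. The connection is injected; it is assumed to respond to select_all(sql) and return rows as hashes, which ActiveRecord::Base.connection does.

```ruby
# A hypothetical query object: the raw SQL lives in one place, behind #call.
class PostsWithVoteCount
  SQL = <<~SQL
    SELECT posts.*,
           (SELECT COUNT(*) FROM votes WHERE votes.id = posts.id) AS vote_count
    FROM posts
  SQL

  # connection: any object responding to select_all(sql) and returning
  # an enumerable of row hashes.
  def initialize(connection)
    @connection = connection
  end

  # Runs the query and returns an array of row hashes.
  def call
    @connection.select_all(SQL).to_a
  end
end
```

In an application this would be called as PostsWithVoteCount.new(ActiveRecord::Base.connection).call, and callers never see the SQL.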
How about doing this with no additional SQL at all - consider using the Rails counter_cache feature.
If you add an integer votes_count column to the posts table, you can get Rails to automatically increment and decrement that counter by changing the belongs_to declaration in Vote to:
belongs_to :post, counter_cache: true
Rails will then keep each Post updated with the number of votes it has. That way the count is already calculated and no sub-query is needed.
Maybe you can create a MySQL view and map it to a new AR model. It works in a similar way to a table; you just need to specify it with set_table_name "your_view_name". It may be faster at the DB level, and the counts will be recalculated automatically.
Just stumbled upon postgres_ext gem which adds support for Common Table Expressions in Arel and ActiveRecord which is exactly what you asked. Gem is not for SQLite, but perhaps some portions could be extracted or serve as examples.

Sorting elements of model depending upon habtm association

I have two models named attachments and users, associated by has_and_belongs_to_many. Now I have to find all attachments sorted in such a way that attachments having an association are displayed first, followed by those with no association. How can I do this?
One simple and relatively efficient way to do this would be to add a counter cache to your Attachment model. The counter cache would store and keep up-to-date the number of associations in a column on your attachments table, so you could do Attachment.order( 'user_attachments_count DESC' ).
Unfortunately HABTM does not support counter cache, so you would have to pop up a "middle-man" model between the two others just to get access to the join table.
Another way (though with poor performance) is simply to use:
@attachments = Attachment.includes(:users)
@sorted = @attachments.sort_by { |r| r.users.size }.reverse
Well, if it doesn't fit, you can always start sweating over a SQL query...
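If all that matters is "associated first, unassociated last", an in-memory partition is a possible alternative to sorting by size. This is a different technique from the one in the answer, sketched here with a Struct standing in for Attachment and invented sample data; it assumes the users are already loaded.

```ruby
# Struct stand-in for an Attachment with its (already loaded) users.
Attachment = Struct.new(:name, :users)

ATTACHMENTS = [
  Attachment.new("a.pdf", []),
  Attachment.new("b.png", %w[ann bob]),
  Attachment.new("c.txt", []),
  Attachment.new("d.doc", %w[cy])
].freeze

# Attachments with at least one user come first; within each group the
# original order is preserved, because partition keeps elements in order.
def associated_first(attachments)
  with_users, without_users = attachments.partition { |a| a.users.any? }
  with_users + without_users
end
```

Like the sort_by version, this loads every record into memory, so it does not combine well with database-level pagination.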

Rails, data structure and performance

Let's say I have a rails app with 3 tables, one for questions, one for options (possible answers to this question), and one for votes.
Currently, when requesting the statistics on a given question, I have to make a SQL query for each option which will look in the "votes" table (around 1.5 million entries) and count the number of times this option has been selected. It's slow and takes 4/5 seconds.
I was thinking of adding a column directly in the question table which would store the statistics and update them each time someone makes a vote. Is that good practice ? Because it seems redundant to the information that is already in the votes table, only it would be faster to load.
Or maybe I should create another table which would save these statistics for each question ?
Thanks for your advice !
Rails offers a feature called counter_cache which will serve your purpose
Add the counter_cache option to the Vote model
class Vote < ActiveRecord::Base
  belongs_to :question, :counter_cache => true
end
and the following migration
add_column :questions, :votes_count, :integer, :default => 0
This should increment the votes_count field in questions table for every new record in votes table
For more info: RailsCast
It would be a wise decision; ActiveRecord::CounterCache is made just for that purpose.
Also, there's a Railscast for that
You can probably write a "clever" SQL query using GROUP BY that will give you the expected result in one query. If your query is that slow, you'll probably need to add some indexes to your table.
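One detail to watch with the GROUP BY approach: options that received no votes do not appear in the grouped result at all. The sketch below assumes the single query has already produced a hash of option id to vote count (in Rails, something like Vote.where(:question_id => id).group(:option_id).count, where the column names are assumptions) and fills in the zeros in plain Ruby. The sample data is invented.

```ruby
# Option ids for one question, and the kind of hash a single grouped COUNT
# would produce ({option_id => votes}). Both are invented sample data.
OPTION_IDS = [1, 2, 3].freeze
GROUPED_COUNTS = { 1 => 40, 3 => 2 }.freeze

# Options nobody voted for are absent from a GROUP BY result, so fill in 0.
def votes_per_option(option_ids, grouped_counts)
  option_ids.map { |id| [id, grouped_counts.fetch(id, 0)] }.to_h
end
```

This replaces one COUNT query per option with one query per question plus a cheap in-memory merge.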

How do I get Rails to eager load counts?

This is related to a question a year and change ago.
I put up an example of the question that should work out of the box, provided you have sqlite3 available: https://github.com/cairo140/rails-eager-loading-counts-demo
Installation instructions (for the main branch)
git clone git://github.com/cairo140/rails-eager-loading-counts-demo.git
cd rails-eager-loading-counts-demo
rails s
I have a fuller write-up in the repository, but my general question is this.
How can I make Rails eager load counts in a way that minimizes db queries across the board?
The n+1 problem emerges whenever you use #count on an association, despite having included that association via #includes(:associated) in the ActiveRelation. A workaround is to use #length, but this works well only when the object it's being called on has already been loaded up, not to mention that I suspect it duplicates something that the Rails internals have done already. Also, an issue with using #length is that it results in an unfortunate over-loading when the association was not loaded to begin with and the count is all you need.
From the readme:
We can dodge this issue by running #length on the posts array (see appendix), which is already loaded, but it would be nice to have count readily available as well. Not only is it more consistent; it provides a path of access that doesn't necessarily require posts to be loaded. For instance, if you have a partial that displays the count no matter what, but half the time, the partial is called with posts loaded and half the time without, you are faced with the following scenario:
Using #count
n COUNT style queries when posts are already loaded
n COUNT style queries when posts are not already loaded
Using #length
Zero additional queries when posts are already loaded
n * style queries when posts are not already loaded
Between these two choices, there is no dominant option. But it would be nice to revise #count to defer to #length, or to access a length that is stored some other way behind the scenes, so that we can have the following scenario:
Using revised #count
Zero additional queries when posts are already loaded
n COUNT style queries when posts are not already loaded
So what's the correct approach here? Is there something I've overlooked (very, very likely)?
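The three behaviours compared above can be modelled with a toy class that only counts how many "queries" it runs. ToyAssociation and load_records are invented for this sketch and are not Rails APIs. As far as I know, the #size method on ActiveRecord associations already behaves like the "revised #count" modelled here: it uses the loaded records when present and issues a COUNT otherwise.

```ruby
# A toy stand-in for an association; it only counts how many "queries" ran.
class ToyAssociation
  attr_reader :queries

  def initialize(rows)
    @rows = rows    # what the database holds
    @loaded = nil   # records in memory; nil until loaded
    @queries = 0
  end

  # Stands in for SELECT * (an eager load or a lazy load).
  def load_records
    @queries += 1
    @loaded = @rows.dup
    self
  end

  # Stands in for SELECT COUNT(*); always goes to the database.
  def count
    @queries += 1
    @rows.size
  end

  # Loads everything if needed, then measures the in-memory array.
  def length
    load_records unless @loaded
    @loaded.size
  end

  # The "revised #count": free when loaded, a single COUNT otherwise.
  def size
    @loaded ? @loaded.size : count
  end
end
```

With this shape, a partial that always calls size costs nothing extra when the records were eager loaded and one COUNT query when they were not.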
As @apneadiving suggested, counter_cache works well because the counter column gets automatically updated when records are added or removed. So when you load the parent object, the count is included in the object without needing to access the other table.
However, if for whatever reason you don't like that approach, you could do this:
Post.find(:all,
  :select => "posts.*, count(comments.id) `comments_count`",
  :joins => "left join comments on comments.post_id = posts.id",
  :group => "posts.id")
An alternative approach to the one of Zubin:
Post.select('posts.*, count(comments.id) `comments_count`').joins(:comments).group('posts.id')
It appears that the best way to implement this sort of facility might be to create SQL views (ref: here and here) for the separate model-and-child-count objects that you want, along with their associated ActiveRecord models.
You might be able to be very clever and use subclassing on the original model combined with set_table_name :sql_view_name to retain all the original methods on the objects, and maybe even some of their associations.
For instance, say we were to add 'Post.has_many :comments' to your example, like in @Zubin's answer above; then one might be able to do:
class CreatePostsWithCommentsCountsView < ActiveRecord::Migration
def self.up
#Create SQL View called posts_with_comments_counts which maps over
# select posts.*, count(comments.id) as comments_count from posts
# left outer join comments on comments.post_id = posts.id
# group by posts.id
# (As zubin pointed out above.)
#*Except* this is in SQL so perhaps we'll be able to do further
# reducing queries against it *as though it were any other table.*
end
end
class PostWithCommentsCount < Post # Here there be cleverness.
  # The class definition sets up PWCC with all the regular methods of Post
  # (pointing to the posts table due to Rails' STI facility.)
  # But then we point it to the SQL view instead.
  set_table_name :posts_with_comments_counts
  # If you don't really care about the methods of Post being in PWCC,
  # then you could just make it a normal subclass of AR::Base.
end
PostWithCommentsCount.all(:include => :user) #Obviously, this sort of "upward
# looking" include is best used in big lists like "latest posts" rather than
# "These posts for this user." But hopefully it illustrates the improved
# activerecordiness of this style of solution.
PostWithCommentsCount.all(:include => :comments) #And I'm pretty sure you
# should be able to do this without issue as well. And it _should_ only be
# the two queries.
I have set up a small gem that adds an includes_count method to ActiveRecord, that uses a SELECT COUNT to fetch the number of records in an association, without resorting to a JOIN which might be expensive (depending on the case).
See https://github.com/manastech/includes-count
Hope it helps!

How to find all items not related to another model - Rails 3

I have a fairly complicated lookup I'm trying to do in Rails, and I'm not entirely sure how to do it. I'm hoping someone can help.
I have two models, User and Place.
A user is related to Place twice: once for visited_places and once for planned_places. It's a many-to-many relationship, but using has_many :through. Here's the relationship from User.
has_many :visited_places
has_many :visited, :class_name=>"Place", :through=>:visited_places, :source=>:place
has_many :planned_places
has_many :planned, :class_name=>"Place", :through=>:planned_places, :source=>:place
In place the relationship is also defined. Here's the definition there
has_many :visited_users, :class_name=>"User", :through=>:visited_places
has_many :planned_users, :class_name=>"User", :through=>:planned_places
I'm trying to write a find on Place that returns all places in the database that aren't related to a User through either visited or planned. Right now I'm accomplishing this by simply querying all Places and then subtracting visited and planned from the results but I want to add in pagination and I'm worried this could complicate that. Here's my current code.
all_places = Place.find(:all)
all_places = all_places - user.visited - user.planned
Does anyone know how I can accomplish this in just a call to Place.find? Also, this is a Rails 3 app, so if any of the Active Record improvements make this easier, they are an option.
How about something like:
unvisited_places = Place.find(:all, :conditions => ["id NOT IN (?)", user.visited_places.map(&:place_id)])
That's the general idea -- it can be made more efficient and convenient depending on your final needs.
You don't show it, but if I am right in assuming that the VisitedPlace and PlannedPlace models have a belongs_to :user relationship, then those tables have a user_id foreign key, right?
In that case I would think it would be most efficient to do this in the database: you are looking for a select on places where the place id is in neither visited_places nor planned_places for that user.
in sql:
select * from places where id not in
(
(select place_id from visited_places where user_id = ?)
union
(select place_id from planned_places where user_id=?)
)
If that query works, you can use as follows:
Place.find_by_sql(...the complete sql query ...)
I would not know how to write such a query, with an exclusion, in Rails 3 otherwise.
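One way to pass that query to find_by_sql without interpolating the user id by hand is the array form, where the first element is the SQL and the rest are bind values. A small sketch follows; the constant and helper names are made up, and the inner parentheses around each subselect are dropped, which some databases require.

```ruby
# SQL with two bind placeholders, one per subquery.
UNRELATED_PLACES_SQL = <<~SQL
  SELECT * FROM places WHERE id NOT IN (
    SELECT place_id FROM visited_places WHERE user_id = ?
    UNION
    SELECT place_id FROM planned_places WHERE user_id = ?
  )
SQL

# Returns the [sql, bind, bind] array that find_by_sql accepts, so the
# user id is quoted by ActiveRecord rather than pasted into the string.
def unrelated_places_query(user_id)
  [UNRELATED_PLACES_SQL, user_id, user_id]
end
```

It would be used as Place.find_by_sql(unrelated_places_query(user.id)).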
I ran into a similar desire recently... I wanted to get all Model1s that weren't associated with a Model2. Using Rails 4.1, here's what I did:
Model1.where.not(id: Model2.select(:model1_id).uniq)
This creates a nested SELECT, like @nathanvda suggested, effectively letting the database do all the work. Example SQL produced is:
SELECT "model1s".* FROM "model1s" WHERE ("model1s"."id" NOT IN (SELECT DISTINCT "model2s"."model1_id" FROM "model2s"))

Resources