Rails ActiveRecord helper find method not eager loading association - ruby-on-rails

I have the following models: Game and Pick. There's a one to many association between Game and Pick. There's a third model called Player, a Player has many Picks.
There's a method in the Player class that finds a pick for a given game or creates a new one if it doesn't exist.
class Player < ActiveRecord::Base
  has_many :picks

  def pick_for_game(game)
    game_id = game.instance_of?(Game) ? game.id : game
    picks.find_or_initialize_by_game_id(game_id)
  end
end
I want to eager load the games for each pick. However if I do
picks.find_or_initialize_by_game_id(game_id, :include => :game)
It first fetches the picks when this query is run (the method is run multiple times), then fetches the games as each pick is accessed. If I add a default_scope to the Pick class
class Pick < ActiveRecord::Base
  belongs_to :game
  belongs_to :player
  default_scope :include => :game
end
It still generates two SELECT statements for each pick. The game is now loaded right after the pick, but there is still no join like I'm expecting.
Pick Load (0.2ms) SELECT "picks".* FROM "picks" WHERE "picks"."game_id" = 1 AND ("picks".player_id = 1) LIMIT 1
Game Load (0.4ms) SELECT "games".* FROM "games" WHERE ("games"."id" = 1)

First, find doesn't support having include or join as a parameter. (As mipsy said, it doesn't make sense for find to support include as it would be the same number of queries as loading it later.)
Second, include eagerly loads the association, so something like
Person.includes(:company)
is roughly equivalent to doing:
Person.all.each { |person| Company.find(person.company_id) }
I say "roughly equivalent" because the former issues a constant number of queries (two, in fact), whereas the latter issues O(n) queries, where n is the number of people.
A join, however, would be just one query, but the downside of a join is you can't always use the retrieved data to update the model. To do a join you would do:
Person.joins(:company)
You can read more on joining tables in the Rails Guide.
To sum up, joining isn't eager loading: it doesn't load the association, it loads both pieces of data together in one query. I realize the line between the two is a fine one. Eager loading fetches other data preemptively; with a join there is nothing left to fetch later, because you already got it in your original query. Hope that makes sense.
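To make the query counts concrete, here is a rough plain-Ruby simulation. It is not ActiveRecord: FakeDb and its methods are hypothetical stand-ins that only count how many "queries" each strategy would issue.

```ruby
# Plain-Ruby stand-ins for two tables; FakeDb and its methods are hypothetical.
Person  = Struct.new(:id, :name, :company_id)
Company = Struct.new(:id, :name)

class FakeDb
  attr_reader :query_count

  def initialize(people, companies)
    @people = people
    @companies = companies
    @query_count = 0
  end

  # Stands in for: SELECT * FROM people
  def all_people
    @query_count += 1
    @people
  end

  # Stands in for: SELECT * FROM companies WHERE id = ?
  def find_company(id)
    @query_count += 1
    @companies.find { |c| c.id == id }
  end

  # Stands in for: SELECT * FROM companies WHERE id IN (...)
  def find_companies(ids)
    @query_count += 1
    @companies.select { |c| ids.include?(c.id) }
  end
end

def build_db
  companies = [Company.new(1, "Acme"), Company.new(2, "Globex")]
  people = [
    Person.new(1, "Ann", 1),
    Person.new(2, "Bob", 2),
    Person.new(3, "Cy", 1)
  ]
  FakeDb.new(people, companies)
end

# Lazy loading: one query for the people, then one more per person.
lazy_db = build_db
lazy_names = lazy_db.all_people.map { |p| lazy_db.find_company(p.company_id).name }

# Eager loading: one query for the people, one for all of their companies.
eager_db = build_db
people = eager_db.all_people
company_ids = people.map(&:company_id).uniq
by_id = eager_db.find_companies(company_ids).map { |c| [c.id, c] }.to_h
eager_names = people.map { |p| by_id[p.company_id].name }

puts "lazy:  #{lazy_db.query_count} queries"
puts "eager: #{eager_db.query_count} queries"
```

With three people the lazy version counts four lookups and the eager version two. For a single record, as in the question, both come out at two, which is why includes makes no difference there.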

This is the way it's meant to work, I think. Eager loading is primarily used to make iterations over large collections of models more efficient by fetching them all at once. It won't make any difference if you're just dealing with a single object.

Related

Query Optimization with ActiveRecord for each method

The query below is taking too much time, and I can't work out how to optimize it.
Code and Associations :
temp = []
platforms = current_user.company.advisory_platforms
platforms.each{ |x| temp << x.advisories.published.collect(&:id) }
class Advisory
  has_many :advisory_platforms, :through => :advisory_advisory_platforms
end
class AdvisoryPlatform
  has_many :companies, :through => :company_advisory_platforms
  has_many :company_advisory_platforms, :dependent => :destroy
  has_many :advisory_advisory_platforms, :dependent => :destroy
  has_many :advisories, :through => :advisory_advisory_platforms
end
There are three glaring performance issues in your example.
First, you are iterating the records using each which means that you are loading the entire record set into memory at once. If you must iterate records in this way you should always use find_each so it is done in batches.
Second, every iteration of your each loop is performing an additional SQL call to get its results. You want to limit SQL calls to the bare minimum.
Third, you are instantiating entire Rails models simply to collect a single value, which is very wasteful. Instantiating Rails models is expensive.
I'm going to solve these problems in two ways. First, construct an ActiveRecord relation that will access all the data you need in one query. Second, use pluck to grab the id you need without paying the model instantiation cost.
You didn't specify what published is doing so I am going to assume it is a scope on Advisory. You also left out some of the data model so I am going to have to make assumptions about your join models.
advisory_ids = AdvisoryAdvisoryPlatform
  .where(advisory_platform_id: current_user.company.advisory_platforms)
  .where(advisory_id: Advisory.published)
  .pluck(:advisory_id)
If you pass a Relation object as the value of a field, ActiveRecord will convert it into a subquery.
So
where(advisory_id: Advisory.published)
is analogous to
WHERE advisory_id IN (SELECT id FROM advisories WHERE published = true)
(or whatever it is published is doing).
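If it helps to see what that single query computes, here is a plain-Ruby sketch of the same filtering done on in-memory rows. The Struct names, the published flag and the sample ids are assumptions for illustration, not your actual schema.

```ruby
require "set"

# Hypothetical rows standing in for the tables; column names are assumptions.
Advisory = Struct.new(:id, :published)
Link     = Struct.new(:advisory_id, :advisory_platform_id) # advisory_advisory_platforms

advisories = [Advisory.new(1, true), Advisory.new(2, false), Advisory.new(3, true)]
links = [Link.new(1, 10), Link.new(2, 10), Link.new(3, 20), Link.new(3, 30)]
company_platform_ids = Set[10, 20] # platforms of the current user's company

# Inner SELECT: ids of published advisories.
published_ids = advisories.select(&:published).map(&:id).to_set

# Outer query: filter the join rows, then keep only one column (the "pluck").
advisory_ids = links
  .select { |l| company_platform_ids.include?(l.advisory_platform_id) }
  .select { |l| published_ids.include?(l.advisory_id) }
  .map(&:advisory_id)

p advisory_ids
```

The difference in the real thing is that the database does this filtering in one round trip, and no Advisory models are built along the way.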

Queries with include in Rails

I have the following problem. I need to do a massive query of table named professionals but I need to optimize the query because for each professional I call the associated tables.
But I have a problem with two associated tables: comments and tariffs.
Comments:
I need to fetch 3 comments for each professional. I tried:
@professionals.includes(:comments).where(:comments => { type: 0 } ).last(3)
The problem is that the query only brings back 3 professionals. What I need is all the professionals, each with only three comments whose type equals zero.
And when I try:
@professionals.includes(:comments).where(:comments => { type: 0 } )
The result is only the professionals that have comments (with all of their comments), when I need all the professionals, with or without comments. If a professional has comments, I only need the last three comments whose type equals zero.
Tariffs:
With tariffs I have a similar problem. In this case I need the last 4 tariffs for each professional. I tried:
@professionals.includes(:tariffs).last(4)
But it only brings back the last 4 professionals.
Models:
class Comment < ActiveRecord::Base
  belongs_to :client
  belongs_to :professional
end
class Professional < ActiveRecord::Base
  has_many :comments
end
You can't use limit on the joining table in ActiveRecord. The limit is applied to the first relation, which in this case happens to be @professionals.
You have a few choices:
1. Preload all comments for each professional and limit them on output (reduces the number of queries needed but increases memory consumption since you are potentially preloading a lot of AR objects).
2. Lazy load the required number of comments (increases the number of queries to n+1, but reduces the potential memory consumption).
3. Write a custom query with raw SQL.
If you preload everything, then you don't have to change much. Just limit the number of comments while iterating through each professional.
@professionals.includes(:comments).each do |professional|
  # Slice the already-loaded records; calling .limit here would issue a new query.
  professional.comments.to_a.last(3)
end
If you lazy load only what you need, then you would apply the limit scope to the comments relation.
@professionals.all
@professionals.each do |professional|
  professional.comments.where(type: 0).limit(3)
end
Writing a custom query is a bit more complex, and you may find that it performs worse, depending on the number of joins you have to make and how well indexed your tables are.
I suggest you take approach two, and use query and fragment caching to improve performance. For example:
- cache @professionals do
  - @professionals.each do |professional|
    - cache professional do
      = professional.name
This approach will hit the database the first time, but on subsequent loads the comments will be read from the cache, avoiding the DB hit. You can read more about caching in the Rails Guides.
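For the first option (preload everything, limit on output), the in-memory step could look roughly like this. It is a plain-Ruby sketch on Structs rather than ActiveRecord models, and it assumes "last" means highest id; the sample data is made up.

```ruby
# Hypothetical in-memory rows; attribute names mirror the question's models.
Professional = Struct.new(:id, :name)
Comment      = Struct.new(:id, :professional_id, :type, :body)

professionals = [Professional.new(1, "Ana"), Professional.new(2, "Luis")]
comments = [
  Comment.new(1, 1, 0, "a"),
  Comment.new(2, 1, 1, "b"),
  Comment.new(3, 1, 0, "c"),
  Comment.new(4, 1, 0, "d"),
  Comment.new(5, 1, 0, "e")
]

# One pass over the preloaded comments: keep type 0 and bucket by professional.
by_professional = comments
  .select { |c| c.type == 0 }
  .group_by(&:professional_id)

# Every professional appears, with at most three comments (the newest by id).
last_three = professionals.map do |pro|
  recent = by_professional.fetch(pro.id, []).sort_by(&:id).last(3)
  [pro.name, recent.map(&:body)]
end.to_h

p last_three
```

Note that a professional without comments still shows up with an empty list, which is the behaviour the where(:comments => { type: 0 }) attempt loses.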

Nested association/join in rails

I have a seat object that has a car object that has an owner that has a name. I want to display the car brand and the car's owner's name together. How do I do this in one query?
eg:
class Seat < ActiveRecord::Base
  belongs_to :car

  def description
    "I am in a #{car.brand} belonging to #{car.owner.name}"
    # --> how do I replace this with one query?
  end
end
I'll note that this is a highly contrived example to simplify my question. I'm doing this thousands of times in a row, hence the need for more efficiency.
Let us say you are querying the Seat model and want to eager load the car and owner objects. You can use the includes clause:
Seat.includes(:car => :owner).where(:color => :red).each do |seat|
  "I am in a #{seat.car.brand} belonging to #{seat.car.owner.name}"
end
Use default_scope
class Seat
  default_scope includes([:car])
end
class Car
  default_scope includes([:owner, :seats])
end
For multi-table joins that are often used in my application, I create a View in MySQL. Then create an ActiveRecord Rails model based on the view.
Depending on the SQL statement that powers the view, MySQL may even let the View be read/write. But I just go the simple route and always treat the view as being read-only. You can set the AR model as read only.
By using the Active Record model which uses the view, you get quick single query reads of the database. And they're even faster than normal since MySQL computes the SQL "plan" once for the view, enabling faster use of it.
Remember to also check that your foreign keys are all indexed. You don't want any table scans.

Is it possible to use a condition with count on a has_many relation?

I have a project list and want to display only projects that have tasks. Is it possible to use a condition with count on a has_many relation?
# get my project list
Project.includes(:tasks).where(...)
class Project < ActiveRecord::Base
  has_many :tasks
end
class Task < ActiveRecord::Base
  belongs_to :project
end
Currently I am doing this through a loop, but I don't think that this is the right way.
Since you are already eager loading the tasks for a project you can use the following statement to get the projects with tasks.
# get my project list
Project.includes(:tasks).where("tasks.id IS NOT NULL")
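This works because includes switches to a LEFT OUTER JOIN when a condition references the included table. (On Rails 4 and later you also need to add .references(:tasks) for a string condition like this one.)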
On the other hand if you don't want to eager load the tasks, you can use joins as it uses INNER JOIN.
Project.joins(:tasks).where(...)
The includes directive usually just eager-loads those associations in separate queries rather than JOINing them in the database, so you can't really put conditions on them without some additional work.
One way that scales well is to use the counter_cache feature of the association so you always have a numerical count of the number of tasks. You can even add an index on these to further improve the performance of your query.
The alternative is to try and work backwards from the tasks table, perhaps like:
Project.where('id IN (SELECT DISTINCT project_id FROM tasks)')
Presumably you have an index on project_id in your tasks table to make that a fairly inexpensive operation.
If the question is as simple as the title suggests, not sure why this wouldn't do the trick:
Project.joins(:tasks)
Unless specified otherwise, the join will be an inner join, and thus exclude any results whose projects do not have tasks, so perhaps that's all you need ... if you want to display all projects with tasks.
If you have some condition (for example, projects whose status is active) you can also specify a condition like
Project.joins(:tasks).where("projects.status = 'active'")
Or have I missed something?
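One thing to watch with a plain inner join is that a project comes back once per matching task. A rough plain-Ruby sketch of the row logic (Structs and sample data are made up, this is not ActiveRecord):

```ruby
# Hypothetical in-memory rows standing in for the projects and tasks tables.
Project = Struct.new(:id, :name)
Task    = Struct.new(:id, :project_id)

projects = [Project.new(1, "site"), Project.new(2, "app"), Project.new(3, "empty")]
tasks = [Task.new(1, 1), Task.new(2, 1), Task.new(3, 2)]

# INNER JOIN: one output row per matching (project, task) pair.
joined = projects.flat_map do |project|
  tasks.select { |t| t.project_id == project.id }.map { project }
end

# IN (SELECT DISTINCT project_id FROM tasks): each project at most once.
project_ids_with_tasks = tasks.map(&:project_id).uniq
semi_joined = projects.select { |p| project_ids_with_tasks.include?(p.id) }

puts joined.map(&:name).inspect      # join rows, duplicates included
puts joined.uniq.map(&:name).inspect # de-duplicated join rows
puts semi_joined.map(&:name).inspect # subquery-style result
```

In ActiveRecord the usual way to drop the duplicates is to chain uniq (Rails 3.2) or distinct (Rails 4 and later) onto the joins call.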

Rails Eager Loading Question Find(:all, :include => [:model])

I have a Topic and a Project model, with a many-to-many association between them (a HABTM one).
In the Topic's index page, I want to display the number of projects that each topic has. So I have
@topics = Topic.all(:include => [:projects])
in my controller, and so far so good. The problem is that the Project model is so big that the query is still really slow.
Topic Load (1.5ms) SELECT * FROM "topics"
Project Load (109.2ms) SELECT "projects".*, t0.topic_id as the_parent_record_id FROM "projects" INNER JOIN "projects_topics" t0 ON "projects".id = t0.project_id WHERE (t0.topic_id IN (1,2,3,4,5,6,7,8,9,10,11))
Is there a way to make the second query select just the name or the ID instead of *? counter_cache is not supported by the HABTM association, and I don't really want to implement it myself... so is there a way to make this second query faster?
I just need to pull the count without loading the whole project object...
Thanks in advance,
Nicolás Hock Isaza
1. counter_cache is very easy to implement.
2. You can convert habtm to double has_many, i.e. has_many :projects_topics in both the Project and Topic models (and belongs_to in projects_topics), and then use counter_cache or do eager loading only on projects_topics.
3. You can do :select => "count(projects_topics.id)", :group => "topics.id", but this won't work well with PostgreSQL if you care about it...
The second option is the best IMO. I usually don't use habtm at all, only double has_many :)
To expand on Devenv's answer, a counter cache is what you would typically use for this kind of scenario.
From the api docs:
Caches the number of belonging objects on the associate class through the use of increment_counter and decrement_counter. The counter cache is incremented when an object of this class is created and decremented when it's destroyed. This requires that a column named #{table_name}_count (such as comments_count for a belonging Comment class) is used on the associate class (such as a Post class). You can also specify a custom counter cache column by providing a column name instead of a true/false value to this option (e.g., :counter_cache => :my_custom_counter.)
Note: Specifying a counter cache will add it to that model's list of readonly attributes using attr_readonly.
Here is a screencast from Ryan Bates' Railscasts on counter_cache.
Here is an answer to a question I asked half a year ago where the solution was an easily implemented home-brew counter cache.
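For a feel of what a home-brew counter cache does, here is a minimal plain-Ruby sketch. The classes are hypothetical in-memory stand-ins: in a Rails app the counter would live in a column on topics (projects_count is an assumed name) and the increment and decrement would typically hang off callbacks on the join model.

```ruby
# Hypothetical in-memory models, not ActiveRecord classes.
class Topic
  attr_reader :name, :projects_count

  def initialize(name)
    @name = name
    @projects_count = 0 # stands in for a topics.projects_count column
  end

  def increment_projects_count
    @projects_count += 1
  end

  def decrement_projects_count
    @projects_count -= 1
  end
end

# Stand-in for a row of the projects_topics join table.
class ProjectTopic
  def initialize(project_name, topic)
    @project_name = project_name
    @topic = topic
    @topic.increment_projects_count # counter goes up when the link is created
  end

  def destroy
    @topic.decrement_projects_count # and down when the link is destroyed
  end
end

ruby = Topic.new("ruby")
link_a = ProjectTopic.new("blog", ruby)
ProjectTopic.new("shop", ruby)
link_a.destroy

puts ruby.projects_count
```

The index page can then read the stored count for each topic without loading any project rows. This sketch does not guard against destroying the same link twice, which a real implementation would need to handle.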
