How do I avoid multiple queries with :include in Rails? - ruby-on-rails

If I do this
post = Post.find_by_id(post_id, :include => :comments)
two queries are performed (one for the post data and another for the post's comments). Then when I do post.comments, another query is not performed because the data is already cached.
Is there a way to do just one query and still access the comments via post.comments?

No, there is not. This is the intended behavior of :include, since the JOIN approach ultimately comes out to be inefficient.
For example, consider the following scenario: the Post model has 3 fields that you need to select, 2 fields for Comment, and this particular post has 100 comments. Rails could run a single JOIN query along the lines of:
SELECT posts.id, posts.title, posts.author_id, comments.id, comments.body
FROM posts
INNER JOIN comments ON comments.post_id = posts.id
WHERE posts.id = 1
This would return the following table of results:
posts.id | posts.title | posts.author_id | comments.id | comments.body
---------+-------------+-----------------+-------------+--------------
       1 | Hello!      |               1 |           1 | First!
       1 | Hello!      |               1 |           2 | Second!
       1 | Hello!      |               1 |           3 | Third!
       1 | Hello!      |               1 |           4 | Fourth!
...96 more...
You can see the problem already. The single-query JOIN approach, though it returns the data you need, returns it redundantly. When the database server sends the result set to Rails, it will send the post's ID, title, and author ID 100 times each. Now, suppose that the Post had 10 fields you were interested in, 8 of which were text blocks. Eww. That's a lot of data. Transferring data from the database to Rails does take work on both sides, both in CPU cycles and RAM, so minimizing that data transfer is important for making the app run faster and leaner.
The Rails devs crunched the numbers, and most applications run better when using multiple queries that only fetch each bit of data once rather than one query that has the potential to get hugely redundant.
Of course, there comes a time in every developer's life when a join is necessary in order to run complex conditions, and that can be achieved by replacing :include with :joins. For prefetching relationships, however, the approach Rails takes in :include is much better for performance.
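To make that concrete, here is a minimal sketch in the newer ActiveRecord query syntax (the spam column on comments is purely hypothetical, just to show a condition that needs the join):

post = Post.includes(:comments).find(post_id)   # 2 queries
post.comments.each { |c| puts c.body }          # no extra query, already cached

# :joins issues a single INNER JOIN query, handy for filtering on comment
# columns, but it does NOT populate post.comments for later use
posts = Post.joins(:comments).where(comments: { spam: false })  # hypothetical column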

If you force the eager-loaded association to be fetched in a single joined query (as shown below), you'll get one efficient query.
Here is an example:
Say you have the following model (where :user is the foreign reference):
class Item < ActiveRecord::Base
  attr_accessible :name, :user_id
  belongs_to :user
end
Then executing this (note: the where part is crucial, as it tricks Rails into producing that single query):
@items = Item.includes(:user).where("users.id IS NOT NULL").all
will result in a single SQL query (the syntax below is that of PostgreSQL):
SELECT "items"."id" AS t0_r0, "items"."user_id" AS t0_r1,
       "items"."name" AS t0_r2, "items"."created_at" AS t0_r3,
       "items"."updated_at" AS t0_r4, "users"."id" AS t1_r0,
       "users"."email" AS t1_r1, "users"."created_at" AS t1_r4,
       "users"."updated_at" AS t1_r5
FROM "items"
LEFT OUTER JOIN "users" ON "users"."id" = "items"."user_id"
WHERE (users.id IS NOT NULL)
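For reference (not part of the original answer): on Rails 4 and later the same single-query eager load can be requested explicitly, without relying on the WHERE trick. A minimal sketch:

@items = Item.eager_load(:user).to_a                     # forces one LEFT OUTER JOIN query
# or, equivalently for this case:
@items = Item.includes(:user).references(:users).to_a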

Related

postgresql: get unique list with order by another table column

Tables:
#leads
id | user_id | created_at | updated_at
#users
id | first_name
#todos
id | deadline_at | target_id
I want to get a unique list of leads between two dates (deadline_at), ordered by todos.deadline_at desc.
I do:
SELECT distinct(leads.*), todos.deadline_at
FROM leads
INNER JOIN users ON users.id = leads.user_id
LEFT JOIN todos ON todos.target_id = leads.user_id
WHERE (todos.deadline_at between '2015-11-26T00:00:00+00:00' and '2015-11-26T23:59:59+00:00')
ORDER BY todos.deadline_at DESC;
This query returns a correctly ordered list, but with duplicates. If I use distinct or distinct on with leads.id, then PostgreSQL requires me to use it in ORDER BY, and in that case I get the wrong order.
How can I achieve the expected result?
Since you don't really need the users table, maybe try this?
Lead.joins("INNER JOIN todos ON leads.user_id = todos.target_id")
    .where("todos.deadline_at" => (date_a..date_b))
    .select("leads.*, todos.deadline_at")
    .order("todos.deadline_at desc")
It seems that you're confusing the raw result of a SQL query with joins and the same result after ActiveRecord has built up the associations.
I presume Lead has_many :todos, through: :user, so you can do this:
Lead.eager_load(:todos).
  where("todos.deadline_at" => (date_a..date_b)).
  order("todos.deadline_at")
No need to apply distinct or anything else: ActiveRecord will sort out the leads from the todos, and you'll have them in the right order with no duplicates. The raw SQL result, however, will have plenty of duplicates.
If you want to achieve something similar in SQL alone, you can use distinct or group by on leads.id, but then you'll lose all the todos each lead "contains". However, you can use aggregate functions to calculate/extract things from the "lost" todo data.
For example:
Lead.joins(:todos).
  group("leads.id").
  select("leads.*, min(todos.deadline_at) as first_todo_deadline").
  order("first_todo_deadline")
Notice that todos data is only available through the aggregate functions (min, count, avg, etc.), since the todos are "compressed", if you wish, into each lead!
Hope it makes sense.
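(A small usage sketch of my own, not from the answer: the aggregate alias selected above shows up as an extra read-only attribute on each returned Lead.)

leads = Lead.joins(:todos).group("leads.id").
             select("leads.*, min(todos.deadline_at) as first_todo_deadline")
leads.first.first_todo_deadline   # earliest todo deadline for that lead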

eager loading the first record of an association

In a very simple forum made with a Rails app, I get 30 topics from the database in the index action like this:
def index
  @topics = Topic.all.page(params[:page]).per_page(30)
end
However, when I list them in the views/topics/index.html.erb, I also want to have access to the first post in each topic to display in a tooltip, so that when users scroll over, they can read the first post without having to click on the link. Therefore, in the link to each post in the index, I add the following to a data attribute
topic.posts.first.body
each of the links looks like this
<%= link_to simple_format(topic.name), posts_path(:topic_id => topic),
    :data => { :toggle => 'tooltip', :placement => 'top',
               :'original-title' => "#{ topic.posts.first.body }" },
    :class => 'tool' %>
While this works fine, I'm worried that it's an N+1 query, namely that if there are 30 topics, it's doing this 30 times:
User Load (0.8ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1 ORDER BY "users"."id" ASC LIMIT 1
Post Load (0.4ms) SELECT "posts".* FROM "posts" WHERE "posts"."topic_id" = $1 ORDER BY "posts"."id" ASC LIMIT 1 [["topic_id", 7]]
I've noticed that Rails does automatic caching on some of these, but I think there might be a way to write the index action differently to avoid some of this N+1 problem; I just can't figure out how. I found out that I can add
includes(:posts)
to eager load the posts, like this:
@topics = Topic.all.page(params[:page]).per_page(30).includes(:posts)
However, if I know that I only want the first post for each topic, is there a way to specify that? if a topic had 30 posts, I don't want to eager load all of them.
I tried to do
.includes(:posts).first
but it broke the code
This appears to work for me, so give this a shot and see if it works for you:
Topic.includes(:posts).where("posts.id = (select id from posts where posts.topic_id = topics.id limit 1)").references(:posts)
This will create a dependent subquery in which the posts topic_id in the subquery is matched up with the topics id in the parent query. With the limit 1 clause in the subquery, the result is that each Topic row will contain only 1 matching Post row, eager loaded thanks to the includes(:posts).
Note that when passing an SQL string to .where that references an eager-loaded relation, the references method should be appended to inform ActiveRecord that we're referencing an association, so that it knows to perform the appropriate joins in the subsequent query. Apparently it technically works without that method, but you get a deprecation warning, so you might as well throw it in lest you encounter problems in future Rails updates.
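For instance, a sketch based on the question's controller (not part of the original answer): the subquery relation can be chained with the existing pagination, and topic.posts.first then comes from the eager-loaded data.

@topics = Topic.includes(:posts).
                where("posts.id = (select id from posts where posts.topic_id = topics.id limit 1)").
                references(:posts).
                page(params[:page]).per_page(30)
@topics.each { |topic| topic.posts.first.body }   # no per-topic query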
To my knowledge you can't. A custom association is the usual way to put conditions on an includes, but :limit is the exception: "If you eager load an association with a specified :limit option, it will be ignored, returning all the associated objects." http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html
class Picture < ActiveRecord::Base
  has_many :most_recent_comments, -> { order('id DESC').limit(10) },
           class_name: 'Comment'
end
Picture.includes(:most_recent_comments).first.most_recent_comments
# => returns all associated comments.
There're a few issues when trying to solve this "natively" via Rails which are detailed in this question.
We solved it with an SQL scope, for your case something like:
class Topic < ApplicationRecord
  has_one :first_post, class_name: "Post", primary_key: :first_post_id, foreign_key: :id

  scope :with_first_post, lambda {
    select(
      "topics.*,
       (
         SELECT id AS first_post_id
         FROM posts
         WHERE topic_id = topics.id
         ORDER BY id ASC
         LIMIT 1
       )"
    )
  }
end
Topic.with_first_post.includes(:first_post)
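A possible usage sketch (my own, assuming the scope and association above; page/per_page come from the question's pagination):

@topics = Topic.with_first_post.includes(:first_post).page(params[:page]).per_page(30)
@topics.each do |topic|
  topic.first_post.body if topic.first_post   # eager-loaded, no N+1
end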

Chaining Rails 3 scopes in has_many through association

Is this doable?
I have the following scope:
class Thing < ActiveRecord::Base
  scope :with_tag, lambda { |tag|
    joins(:tags).where('tags.name = ?', tag.name).group('things.id')
  }

  def self.withtag_search(tags)
    tags.inject(scoped) do |tagged_things, tag|
      tagged_things.with_tag(tag)
    end
  end
end
I get a result if there's a single tag in the array of tags passed in with Thing.withtag_search(array_of_tags) but if I pass multiple tags in that array I get an empty relation as the result. In case it helps:
Thing.withtag_search(["test_tag_1", "test_tag_2"])
SELECT "things".*
FROM "things"
INNER JOIN "things_tags" ON "things_tags"."thing_id" = "things"."id"
INNER JOIN "tags" ON "tags"."id" = "things_tags"."tag_id"
WHERE (tags.name = 'test_tag_1') AND (tags.name = 'test_tag_2')
GROUP BY things.id
=> [] # class is ActiveRecord::Relation
whereas
Thing.withtag_search(["test_tag_1"])
SELECT "things".*
FROM "things"
INNER JOIN "things_tags" ON "things_tags"."thing_id" = "things"."id"
INNER JOIN "tags" ON "tags"."id" = "things_tags"."tag_id"
WHERE (tags.name = 'test_tag_1')
GROUP BY things.id
=> [<Thing id:1, ... >, <Thing id:2, ... >] # Relation including correctly all
# Things with that tag
I want to be able to chain these relations together so that (among other reasons) I can use the Kaminari gem for pagination which only works on relations not arrays - so I need a scope to be returned.
I also ran into this problem. The problem is not Rails; the problem is definitely MySQL:
Your SQL will create the following temporary JOIN table (only the necessary fields are shown):
+-----------+-------------+---------+------------+
| things.id | things.name | tags.id | tags.name  |
+-----------+-------------+---------+------------+
|         1 | ...         |       1 | test_tag_1 |
+-----------+-------------+---------+------------+
|         1 | ...         |       2 | test_tag_2 |
+-----------+-------------+---------+------------+
So instead of joining all Tags to one specific Thing, it generates one row for each Tag-Thing combination (if you don't believe it, just run COUNT(*) on this SQL statement).
The problem is that your query criteria look like this: WHERE (tags.name = 'test_tag_1') AND (tags.name = 'test_tag_2'), which is checked against each of these rows and will never be true. It's not possible for tags.name to equal both test_tag_1 and test_tag_2 at the same time!
The standard SQL solution is the INTERSECT operator... but unfortunately MySQL doesn't support it.
The best solution is to run Thing.withtag_search for each of your tags, collect the returned objects, and select only the objects which are included in each of the results, like so:
%w[test_tag_1 test_tag_2].collect do |tag|
  Thing.withtag_search(tag)
end.inject(&:&)
If you want to get this as an ActiveRecord relation you can probably do this like so:
ids = %w[test_tag_1 test_tag_2].collect do |tag|
  Thing.withtag_search(tag).collect(&:id)
end.inject(&:&)
Thing.where(:id => ids)
The other solution (which I'm using) is to cache the tags in the Thing table, and do MySQL boolean search on it. I will give you more details on this solution if you want.
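(The answer doesn't show that second approach; a minimal sketch of the idea, assuming a cached_tags string column maintained on things with a MySQL FULLTEXT index, might look like:)

Thing.where("MATCH(cached_tags) AGAINST(? IN BOOLEAN MODE)", "+test_tag_1 +test_tag_2")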
Anyways I hope this will help you. :)
This is rather complicated at a glance, but based on your SQL, you want:
WHERE (tags.name IN ( 'test_tag_1', 'test_tag_2'))
I haven't dealt much with Rails 3, but if you can adjust your JOIN appropriately, this should fix your issue. Have you tried a solution akin to:
joins(:tags).where('tags.name IN (?)', tags.map { |tag| tag.name })
This way, you will JOIN the way you are expecting (UNION instead of INTERSECTION). I hope this is a helpful way of thinking about this problem.
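If what you actually need is the intersection (things carrying every tag) as a single chainable relation, one common pattern, sketched here under the same joins(:tags) association, is to combine IN with GROUP BY and a HAVING count check:

names = %w[test_tag_1 test_tag_2]
Thing.joins(:tags).
      where('tags.name IN (?)', names).
      group('things.id').
      having('COUNT(DISTINCT tags.name) = ?', names.size)

Since this is still an ActiveRecord::Relation, pagination with Kaminari keeps working.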
I don't seem to be able to find a solution to this problem, so instead of using Kaminari and rolling my own tagging, I've switched to acts-as-taggable-on and will-paginate.

How do I do a join in ActiveRecord after records have been returned?

I am using ActiveRecord in Rails 3 to pull data from two different tables in two different databases. These databases can not join on each other, but I have the need to do a simple join after-the-fact. I would like to preserve the relation so that I can chain it down the line.
here is a simplified version of what I am doing
browsers = Browser.all # <-- this is fairly small and can reside in memory
events = Event.where(:row_date=>Date.today).select(:name, :browser_id)
So as you can see, I want to join browsers in on the events relation, where browser_id should equal browsers.name. events is a relation and I can still add clauses to it down the line, so I don't want to run the query on the db just yet. How would I accomplish this?
Edit
For those that would like to see some code for the answer I accepted below, here is what I came up with:
class EventLog < ActiveRecord::Base
  belongs_to :browser

  def get_todays_events
    Event.where(:row_date => Date.today).select(:name, :browser_id).includes(:browser)
  end
end
would let me get the browser name in the following manner
get_todays_events.browser.name
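(For context, this works across the two databases because :include issues separate queries, each on its own model's connection, rather than a SQL join. A rough sketch, with hypothetical connection names defined in database.yml:)

class Browser < ActiveRecord::Base
  establish_connection :browsers_db   # hypothetical connection name
end

class Event < ActiveRecord::Base
  establish_connection :events_db     # hypothetical connection name
  belongs_to :browser
end

events = Event.where(:row_date => Date.today).includes(:browser)
events.each { |e| e.browser.name }    # browsers fetched in one IN(...) query on their own DB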
I would accomplish this by using an :include. Attempting to do this in Ruby will cause you nothing but grief. You can chain onto an include just fine.
joins does create SQL joins as expected in the current Rails 5:
pry(main)> Customer.joins(:orders).limit(5)
Customer Load (0.2ms) SELECT `customers`.* FROM `customers` INNER JOIN `orders` ON `orders`.`customer_id` = `customers`.`id` LIMIT 5
=> [#<Customer:0x007fb869f11fe8
...
This can be vastly faster when you only need the joined tables for filtering, because it requires a single database query and doesn't instantiate the associated records, whereas includes runs an extra query for each eager-loaded association and builds an ActiveRecord object for every associated row.
Here's an example where includes takes roughly 1,450 times as long as joins (35.6 s vs. 0.025 s):
pry(main)> benchmark do
pry(main)*   Order.joins(:address, :payments, :customer, :packages).all.size
pry(main)* end
=> 0.02456 seconds
pry(main)> benchmark do
pry(main)*   Order.includes(:address, :payments, :customer, :packages).all.map(&:zip).max
pry(main)* end
=> 35.607257 seconds

activerecord has_many :through find with one sql call

I have these 3 models:
class User < ActiveRecord::Base
  has_many :permissions, :dependent => :destroy
  has_many :roles, :through => :permissions
end

class Permission < ActiveRecord::Base
  belongs_to :role
  belongs_to :user
end

class Role < ActiveRecord::Base
  has_many :permissions, :dependent => :destroy
  has_many :users, :through => :permissions
end
I want to find a user and its roles in one SQL statement, but I can't seem to achieve this.
The following statement:
user = User.find_by_id(x, :include => :roles)
Gives me the following queries:
User Load (1.2ms) SELECT * FROM `users` WHERE (`users`.`id` = 1) LIMIT 1
Permission Load (0.8ms) SELECT `permissions`.* FROM `permissions` WHERE (`permissions`.user_id = 1)
Role Load (0.8ms) SELECT * FROM `roles` WHERE (`roles`.`id` IN (2,1))
Not exactly ideal. How do I do this so that it runs one SQL query with joins and loads the user's roles into memory, so that saying:
user.roles
doesn't issue a new SQL query?
Loading the Roles in a separate SQL query is actually an optimization called "Optimized Eager Loading".
Role Load (0.8ms) SELECT * FROM `roles` WHERE (`roles`.`id` IN (2,1))
(It is doing this instead of loading each role separately, the N+1 problem.)
The Rails team found it was usually faster to use an IN query with the associations looked up previously instead of doing a big join.
A join will only happen in this query if you add conditions on one of the other tables. Rails will detect this and do the join.
For example:
User.all(:include => :roles, :conditions => "roles.name = 'Admin'")
See the original ticket, this previous Stack Overflow question, and Fabio Akita's blog post about Optimized Eager Loading.
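For reference (not part of the original answer): on Rails 4 and later you can ask for the single-query version explicitly. A minimal sketch:

user = User.eager_load(:roles).find(x)
user.roles   # loaded by the same LEFT OUTER JOIN query, no further SQL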
As Damien pointed out, if you really want a single query every time, you should use :joins.
But you might not want a single SQL call. Here's why (from here):
Optimized Eager Loading
Let’s take a look at this:
Post.find(:all, :include => [:comments])
Until Rails 2.0 we would see something like the following SQL query in the log:
SELECT `posts`.`id` AS t0_r0, `posts`.`title` AS t0_r1, `posts`.`body` AS t0_r2, `comments`.`id` AS t1_r0, `comments`.`body` AS t1_r1 FROM `posts` LEFT OUTER JOIN `comments` ON comments.post_id = posts.id
But now, in Rails 2.1, the same command will deliver different SQL queries. Actually at least 2, instead of 1. “And how can this be an improvement?” Let’s take a look at the generated SQL queries:
SELECT `posts`.`id`, `posts`.`title`, `posts`.`body` FROM `posts`
SELECT `comments`.`id`, `comments`.`body` FROM `comments` WHERE (`comments`.post_id IN (130049073,226779025,269986261,921194568,972244995))
The :include keyword for Eager Loading was implemented to tackle the dreaded 1+N problem. This problem happens when you have associations, then you load the parent object and start loading one association at a time, thus the 1+N problem. If your parent object has 100 children, you would run 101 queries, which is not good. One way to try to optimize this is to join everything using an OUTER JOIN clause in the SQL, that way both the parent and children objects are loaded at once in a single query.
Seemed like a good idea and actually still is. But for some situations, the monster outer join becomes slower than many smaller queries. A lot of discussion has been going on and you can take a look at the details at the tickets 9640, 9497, 9560, L109.
The bottom line is: generally it seems better to split a monster join into smaller ones, as you've seen in the above example. This avoids the cartesian product overload problem. For the uninitiated, let's run the outer join version of the query:
mysql> SELECT `posts`.`id` AS t0_r0, `posts`.`title` AS t0_r1, `posts`.`body` AS t0_r2, `comments`.`id` AS t1_r0, `comments`.`body` AS t1_r1 FROM `posts` LEFT OUTER JOIN `comments` ON comments.post_id = posts.id ;
+-----------+-----------------+--------+-----------+---------+
| t0_r0     | t0_r1           | t0_r2  | t1_r0     | t1_r1   |
+-----------+-----------------+--------+-----------+---------+
| 130049073 | Hello RailsConf | MyText |      NULL | NULL    |
| 226779025 | Hello Brazil    | MyText | 816076421 | MyText5 |
| 269986261 | Hello World     | MyText |  61594165 | MyText3 |
| 269986261 | Hello World     | MyText | 734198955 | MyText1 |
| 269986261 | Hello World     | MyText | 765025994 | MyText4 |
| 269986261 | Hello World     | MyText | 777406191 | MyText2 |
| 921194568 | Rails 2.1       | NULL   |      NULL | NULL    |
| 972244995 | AkitaOnRails    | NULL   |      NULL | NULL    |
+-----------+-----------------+--------+-----------+---------+
8 rows in set (0.00 sec)
Pay attention to this: do you see lots of duplications in the first 3 columns (t0_r0 up to t0_r2)? Those are the Post model columns, the remaining being each post’s comment columns. Notice that the “Hello World” post was repeated 4 times. That’s what a join does: the parent rows are repeated for each children. That particular post has 4 comments, so it was repeated 4 times.
The problem is that this hits Rails hard, because it will have to deal with several small and short-lived objects. The pain is felt in the Rails side, not that much on the MySQL side. Now, compare that to the smaller queries:
mysql> SELECT `posts`.`id`, `posts`.`title`, `posts`.`body` FROM `posts` ;
+-----------+-----------------+--------+
| id        | title           | body   |
+-----------+-----------------+--------+
| 130049073 | Hello RailsConf | MyText |
| 226779025 | Hello Brazil    | MyText |
| 269986261 | Hello World     | MyText |
| 921194568 | Rails 2.1       | NULL   |
| 972244995 | AkitaOnRails    | NULL   |
+-----------+-----------------+--------+
5 rows in set (0.00 sec)
mysql> SELECT `comments`.`id`, `comments`.`body` FROM `comments` WHERE (`comments`.post_id IN (130049073,226779025,269986261,921194568,972244995));
+-----------+---------+
| id        | body    |
+-----------+---------+
|  61594165 | MyText3 |
| 734198955 | MyText1 |
| 765025994 | MyText4 |
| 777406191 | MyText2 |
| 816076421 | MyText5 |
+-----------+---------+
5 rows in set (0.00 sec)
Actually I am cheating a little bit: I manually removed the created_at and updated_at fields from all the above queries so you can understand them a little more clearly. So, there you have it: the posts result set, separated and not duplicated, and the comments result set with the same size as before. The longer and more complex the result set, the more this matters, because the more objects Rails would have to deal with. Allocating and deallocating several hundreds or thousands of small duplicated objects is never a good deal.
But this new feature is smart. Let’s say you want something like this:
>> Post.find(:all, :include => [:comments], :conditions => ["comments.created_at > ?", 1.week.ago.to_s(:db)])
In Rails 2.1, it will understand that there is a filtering condition for the ‘comments’ table, so it will not break it down into the small queries, but instead, it will generate the old outer join version, like this:
SELECT `posts`.`id` AS t0_r0, `posts`.`title` AS t0_r1, `posts`.`body` AS t0_r2, `posts`.`created_at` AS t0_r3, `posts`.`updated_at` AS t0_r4, `comments`.`id` AS t1_r0, `comments`.`post_id` AS t1_r1, `comments`.`body` AS t1_r2, `comments`.`created_at` AS t1_r3, `comments`.`updated_at` AS t1_r4 FROM `posts` LEFT OUTER JOIN `comments` ON comments.post_id = posts.id WHERE (comments.created_at > '2008-05-18 18:06:34')
So, nested joins, conditions, and so forth on join tables should still work fine. Overall it should speed up your queries. Some reported that because of more individual queries, MySQL seems to receive a stronger punch CPU-wise. Do your homework, run your own stress tests and benchmarks, and see what happens.
Including a model loads the data, but it makes a second query.
For what you want to do, you should use the :joins parameter.
user = User.find_by_id(x, :joins => :roles)
