activerecord has_many :through find with one sql call


I have these 3 models:
class User < ActiveRecord::Base
  has_many :permissions, :dependent => :destroy
  has_many :roles, :through => :permissions
end
class Permission < ActiveRecord::Base
  belongs_to :role
  belongs_to :user
end
class Role < ActiveRecord::Base
  has_many :permissions, :dependent => :destroy
  has_many :users, :through => :permissions
end
I want to find a user and its roles in one SQL statement, but I can't seem to achieve this.
The following statement:
user = User.find_by_id(x, :include => :roles)
Gives me the following queries:
User Load (1.2ms) SELECT * FROM `users` WHERE (`users`.`id` = 1) LIMIT 1
Permission Load (0.8ms) SELECT `permissions`.* FROM `permissions` WHERE (`permissions`.user_id = 1)
Role Load (0.8ms) SELECT * FROM `roles` WHERE (`roles`.`id` IN (2,1))
Not exactly ideal. How do I make this a single SQL query with joins that loads the user's roles into memory, so that calling:
user.roles
doesn't issue a new SQL query?

Loading the Roles in a separate SQL query is actually an optimization called "Optimized Eager Loading".
Role Load (0.8ms) SELECT * FROM `roles` WHERE (`roles`.`id` IN (2,1))
(It does this instead of loading each role separately, which would be the N+1 problem.)
The Rails team found it was usually faster to use an IN query with the associations looked up previously instead of doing a big join.
A join will only happen in this query if you add conditions on one of the other tables. Rails will detect this and do the join.
For example:
User.all(:include => :roles, :conditions => "roles.name = 'Admin'")
See the original ticket, this previous Stack Overflow question, and Fabio Akita's blog post about Optimized Eager Loading.
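To make the difference between N+1 loading and the IN-based preload concrete, here is a minimal pure-Ruby sketch. It does not use ActiveRecord: `FakeDB`, `where_in`, and the helper methods are hypothetical stand-ins that only count how many "queries" each strategy issues, using the Permission/Role data from the question (user 1 with roles 1 and 2).

```ruby
# Hypothetical in-memory "database" that counts issued queries (not ActiveRecord).
class FakeDB
  attr_reader :query_count

  def initialize(tables)
    @tables = tables
    @query_count = 0
  end

  # Simulates: SELECT * FROM table WHERE column IN (values)
  def where_in(table, column, values)
    @query_count += 1
    @tables.fetch(table).select { |row| values.include?(row[column]) }
  end
end

TABLES = {
  permissions: [{ user_id: 1, role_id: 2 }, { user_id: 1, role_id: 1 }],
  roles: [{ id: 1, name: "Admin" }, { id: 2, name: "Editor" }]
}

# N+1 style: one query for the permissions, then one query per role.
def roles_one_by_one(db, user_id)
  perms = db.where_in(:permissions, :user_id, [user_id])
  perms.flat_map { |p| db.where_in(:roles, :id, [p[:role_id]]) }
end

# Preload style: one query for the permissions, then a single IN query for all roles.
def roles_preloaded(db, user_id)
  perms = db.where_in(:permissions, :user_id, [user_id])
  db.where_in(:roles, :id, perms.map { |p| p[:role_id] }.uniq)
end

naive = FakeDB.new(TABLES)
roles_one_by_one(naive, 1)
preload = FakeDB.new(TABLES)
roles_preloaded(preload, 1)

puts naive.query_count   # 3
puts preload.query_count # 2
```

With N roles the one-by-one version issues N + 1 queries, while the preload version stays at two regardless of N (the User query itself is not modeled here).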

As Damien pointed out, if you really want a single query every time you should use join.
But you might not want a single SQL call. Here's why (from here):
Optimized Eager Loading
Let’s take a look at this:
Post.find(:all, :include => [:comments])
Until Rails 2.0 we would see something like the following SQL query in the log:
SELECT `posts`.`id` AS t0_r0, `posts`.`title` AS t0_r1, `posts`.`body` AS t0_r2, `comments`.`id` AS t1_r0, `comments`.`body` AS t1_r1 FROM `posts` LEFT OUTER JOIN `comments` ON comments.post_id = posts.id
But now, in Rails 2.1, the same command will deliver different SQL queries. Actually at least 2, instead of 1. “And how can this be an improvement?” Let’s take a look at the generated SQL queries:
SELECT `posts`.`id`, `posts`.`title`, `posts`.`body` FROM `posts`
SELECT `comments`.`id`, `comments`.`body` FROM `comments` WHERE (`comments`.post_id IN (130049073,226779025,269986261,921194568,972244995))
The :include keyword for Eager Loading was implemented to tackle the dreaded 1+N problem. This problem happens when you have associations, then you load the parent object and start loading one association at a time, thus the 1+N problem. If your parent object has 100 children, you would run 101 queries, which is not good. One way to try to optimize this is to join everything using an OUTER JOIN clause in the SQL, that way both the parent and children objects are loaded at once in a single query.
Seemed like a good idea and actually still is. But for some situations, the monster outer join becomes slower than many smaller queries. A lot of discussion has been going on and you can take a look at the details at the tickets 9640, 9497, 9560, L109.
The bottom line is: generally it seems better to split a monster join into smaller ones, as you’ve seen in the above example. This avoids the cartesian product overload problem. For the uninitiated, let’s run the outer join version of the query:
mysql> SELECT `posts`.`id` AS t0_r0, `posts`.`title` AS t0_r1, `posts`.`body` AS t0_r2, `comments`.`id` AS t1_r0, `comments`.`body` AS t1_r1 FROM `posts` LEFT OUTER JOIN `comments` ON comments.post_id = posts.id ;
+-----------+-----------------+--------+-----------+---------+
| t0_r0 | t0_r1 | t0_r2 | t1_r0 | t1_r1 |
+-----------+-----------------+--------+-----------+---------+
| 130049073 | Hello RailsConf | MyText | NULL | NULL |
| 226779025 | Hello Brazil | MyText | 816076421 | MyText5 |
| 269986261 | Hello World | MyText | 61594165 | MyText3 |
| 269986261 | Hello World | MyText | 734198955 | MyText1 |
| 269986261 | Hello World | MyText | 765025994 | MyText4 |
| 269986261 | Hello World | MyText | 777406191 | MyText2 |
| 921194568 | Rails 2.1 | NULL | NULL | NULL |
| 972244995 | AkitaOnRails | NULL | NULL | NULL |
+-----------+-----------------+--------+-----------+---------+
8 rows in set (0.00 sec)
Pay attention to this: do you see lots of duplication in the first 3 columns (t0_r0 up to t0_r2)? Those are the Post model columns; the remaining ones are each post’s comment columns. Notice that the “Hello World” post was repeated 4 times. That’s what a join does: the parent rows are repeated for each child. That particular post has 4 comments, so it was repeated 4 times.
The problem is that this hits Rails hard, because it will have to deal with several small and short-lived objects. The pain is felt in the Rails side, not that much on the MySQL side. Now, compare that to the smaller queries:
mysql> SELECT `posts`.`id`, `posts`.`title`, `posts`.`body` FROM `posts` ;
+-----------+-----------------+--------+
| id | title | body |
+-----------+-----------------+--------+
| 130049073 | Hello RailsConf | MyText |
| 226779025 | Hello Brazil | MyText |
| 269986261 | Hello World | MyText |
| 921194568 | Rails 2.1 | NULL |
| 972244995 | AkitaOnRails | NULL |
+-----------+-----------------+--------+
5 rows in set (0.00 sec)
mysql> SELECT `comments`.`id`, `comments`.`body` FROM `comments` WHERE (`comments`.post_id IN (130049073,226779025,269986261,921194568,972244995));
+-----------+---------+
| id | body |
+-----------+---------+
| 61594165 | MyText3 |
| 734198955 | MyText1 |
| 765025994 | MyText4 |
| 777406191 | MyText2 |
| 816076421 | MyText5 |
+-----------+---------+
5 rows in set (0.00 sec)
Actually, I am cheating a little bit: I manually removed the created_at and updated_at fields from all the above queries to make them a little clearer. So, there you have it: the posts result set, separated and not duplicated, and the comments result set with the same size as before. The longer and more complex the result set, the more this matters, because the more objects Rails would have to deal with. Allocating and deallocating several hundreds or thousands of small duplicated objects is never a good deal.
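The duplication can be counted directly. This plain-Ruby sketch rebuilds the LEFT OUTER JOIN from the posts and comments shown above and counts how many copies of post data come back; each comment's post_id is inferred from the joined output, and the code is an illustration only, not how ActiveRecord builds its result set.

```ruby
# Rows from the example result sets (timestamps omitted).
# Each comment's post_id is inferred from the joined output above.
posts = [
  { id: 130049073, title: "Hello RailsConf" },
  { id: 226779025, title: "Hello Brazil" },
  { id: 269986261, title: "Hello World" },
  { id: 921194568, title: "Rails 2.1" },
  { id: 972244995, title: "AkitaOnRails" }
]
comments = [
  { id: 816076421, post_id: 226779025 },
  { id: 61594165,  post_id: 269986261 },
  { id: 734198955, post_id: 269986261 },
  { id: 765025994, post_id: 269986261 },
  { id: 777406191, post_id: 269986261 }
]

# LEFT OUTER JOIN: a post appears once per matching comment,
# or once with a nil comment when it has none.
def left_outer_join(posts, comments)
  posts.flat_map do |post|
    matches = comments.select { |c| c[:post_id] == post[:id] }
    matches.empty? ? [[post, nil]] : matches.map { |c| [post, c] }
  end
end

joined = left_outer_join(posts, comments)

puts joined.size                                              # 8 rows, each carrying post columns
puts joined.count { |post, _| post[:title] == "Hello World" } # 4 copies of "Hello World"
puts posts.size                                               # 5 post rows in the separate query
```

The join returns 8 copies of post data for 5 distinct posts; the split queries return each post exactly once.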
But this new feature is smart. Let’s say you want something like this:
>> Post.find(:all, :include => [:comments], :conditions => ["comments.created_at > ?", 1.week.ago.to_s(:db)])
In Rails 2.1, it will understand that there is a filtering condition for the ‘comments’ table, so it will not break it down into the small queries, but instead, it will generate the old outer join version, like this:
SELECT `posts`.`id` AS t0_r0, `posts`.`title` AS t0_r1, `posts`.`body` AS t0_r2, `posts`.`created_at` AS t0_r3, `posts`.`updated_at` AS t0_r4, `comments`.`id` AS t1_r0, `comments`.`post_id` AS t1_r1, `comments`.`body` AS t1_r2, `comments`.`created_at` AS t1_r3, `comments`.`updated_at` AS t1_r4 FROM `posts` LEFT OUTER JOIN `comments` ON comments.post_id = posts.id WHERE (comments.created_at > '2008-05-18 18:06:34')
So, nested joins, conditions, and so forth on join tables should still work fine. Overall it should speed up your queries. Some reported that because of more individual queries, MySQL seems to receive a stronger punch CPU-wise. Do your homework: run stress tests and benchmarks to see what happens.
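The rule described above (fall back to one outer join when the conditions mention an included table) can be sketched as a plain string check. This is a hypothetical simplification for illustration, not Rails 2.1's actual code; `referenced_tables` and `eager_load_strategy` are made-up names.

```ruby
# Hypothetical: collect every "table" that appears as "table.column" in a condition string.
def referenced_tables(conditions)
  conditions.scan(/\b([a-z_][a-z0-9_]*)\.[a-z_]/i).flatten.map(&:downcase).uniq
end

# If the conditions mention an included table, one LEFT OUTER JOIN is needed;
# otherwise the association can be loaded with separate preload queries.
def eager_load_strategy(included_tables, conditions)
  if (referenced_tables(conditions) & included_tables).any?
    :single_outer_join
  else
    :separate_queries
  end
end

eager_load_strategy(["comments"], "comments.created_at > '2008-05-18 18:06:34'")
# => :single_outer_join
eager_load_strategy(["comments"], "posts.title = 'Hello World'")
# => :separate_queries
```

A naive scan like this only sees the text of the conditions, so it cannot tell a real table reference from something that merely looks like one inside a quoted string literal.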

Including an association loads the data, but it makes a second query.
For what you want to do, you should use the :joins parameter.
user = User.find_by_id(x, :joins => :roles)

Related

cannot chain joins using string to left_joins

Summary:
I have a many to many relationship between attachments and rules, through alerts.
I have a given rule and a given selection of attachments (those with a given bug_id).
I need to go through all the selected attachments and indicate whether there is an alert for the rule or not, with a different CSS background-color.
Outer Join
I get the correct results with the following query:
SELECT attachments.*, alerts.rule_id
FROM attachments
LEFT OUTER JOIN alerts ON alerts.attachment_id = attachments.id
and alerts.rule_id = 9
WHERE attachments.bug_id;
I'm looking for something like:
bug.attachments
.left_joins(alerts: {'rules.id' => 9})
.select('attachments.*, alerts.rule_id')
Database
class Alert < ApplicationRecord
  belongs_to :attachment
end
class Attachment < ApplicationRecord
  has_many :alerts
end
attachments
| id | bug_id |
| 14612 | 38871 |
| 14613 | 38871 |
| 14614 | 38871 |
alerts
| attachment_id | rule_id |
| 14612 | 9 |
| 14614 | 8 |
Condition in the From Clause
Without the alerts.rule_id = 9 condition in the FROM clause, we get the following result:
| id | rule_id |
| 14612 | 9 |
| 14614 | 8 |
| 14613 | NULL |
So a WHERE clause such as WHERE alerts.rule_id = 9 OR alerts.rule_id IS NULL would lose the result for 14614.
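The difference between putting the rule filter in the join condition and putting it in the WHERE clause can be reproduced in plain Ruby with the sample attachments and alerts data above. `left_join` is a hypothetical helper that mimics `LEFT OUTER JOIN ... ON ... AND <condition>`; it is not an ActiveRecord method.

```ruby
attachments = [14612, 14613, 14614] # attachment ids for bug 38871
alerts = [
  { attachment_id: 14612, rule_id: 9 },
  { attachment_id: 14614, rule_id: 8 }
]

# Mimics: LEFT OUTER JOIN alerts ON alerts.attachment_id = attachments.id AND <on_cond>
# Every attachment survives; unmatched ones get a nil rule_id.
def left_join(attachments, alerts, &on_cond)
  attachments.flat_map do |id|
    matches = alerts.select { |a| a[:attachment_id] == id && on_cond.call(a) }
    matches.empty? ? [[id, nil]] : matches.map { |a| [id, a[:rule_id]] }
  end
end

# Rule filter inside the ON clause: all three attachments are kept.
on_clause = left_join(attachments, alerts) { |a| a[:rule_id] == 9 }
# => [[14612, 9], [14613, nil], [14614, nil]]

# Join without the filter, then apply WHERE rule_id = 9 OR rule_id IS NULL.
unfiltered = left_join(attachments, alerts) { true }
where_clause = unfiltered.select { |_, rule_id| rule_id == 9 || rule_id.nil? }
# => [[14612, 9], [14613, nil]]  (14614 is gone, because it joined to rule 8)
```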
So the following won't work:
bug.attachments
.joins(:alerts)
.select('attachments.*, alerts.rule_id')
.where( ??? )
Edit
The above is a simplified and corrected version of my original question.
The original question is below:
Alerts belong to rules and attachments, and attachments belong to bugs.
class Alert < ApplicationRecord
  belongs_to :attachment
  belongs_to :rule
end
class Attachment < ApplicationRecord
  belongs_to :bug
  has_many :alerts
end
class Bug < ApplicationRecord
  has_many :attachments
end
For a given rule, I need to show all the attachments for a given bug, and whether there is an alert or not. I want the following SQL:
SELECT attachments.*, alerts.id as alert_id
FROM `attachments`
LEFT OUTER JOIN `alerts` ON `alerts`.`attachment_id` = `attachments`.`id`
LEFT OUTER JOIN `rules` ON `rules`.`id` = `alerts`.`rule_id` AND rules.id = 9
WHERE `attachments`.`bug_id` = 38871
I can get this from:
bug.attachments
.joins("LEFT OUTER JOIN `alerts` ON `alerts`.`attachment_id` = `attachments`.`id`")
.joins("LEFT OUTER JOIN `rules` ON `rules`.`id` = `alerts`.`rule_id` AND rules.id = 9")
.select('attachments.*, alerts.id as alert_id')
.map{|attach| [attach.file_name, attach.alert_id]}
What I want to know is how to avoid calling joins with a string SQL fragment.
I'm looking for something like:
bug.attachments
.left_joins(alerts: {rule: {'rules.id' => 9}})
.select('attachments.*, alerts.id as alert_id')
.map{|attach| [attach.file_name, attach.alert_id]}
Is there any way to avoid passing an SQL string?

Actually, I think you will be able to get the right results by putting rules.id = 9 in the WHERE clause.
SELECT attachments.*, alerts.id as alert_id
FROM `attachments`
LEFT OUTER JOIN `alerts` ON `alerts`.`attachment_id` = `attachments`.`id`
LEFT OUTER JOIN `rules` ON `rules`.`id` = `alerts`.`rule_id`
WHERE `attachments`.`bug_id` = 38871 AND (rules.id = 9 OR rules.id IS NULL)

How to find posts tagged with more than one tag in Rails and Postgresql

I have the models Post, Tag, and PostTag. A post has many tags through post tags. I want to find posts that are exclusively tagged with more than one tag.
has_many :post_tags
has_many :tags, through: :post_tags
For example, given this data set:
posts table
--------------------
id | title |
--------------------
1 | Carb overload |
2 | Heart burn |
3 | Nice n Light |
tags table
-------------
id | name |
-------------
1 | tomato |
2 | potato |
3 | basil |
4 | rice |
post_tags table
-----------------------
id | post_id | tag_id |
-----------------------
1 | 1 | 1 |
2 | 1 | 2 |
3 | 2 | 1 |
4 | 2 | 3 |
5 | 3 | 1 |
I want to find posts tagged with tomato AND basil. This should return only the "Heart burn" post (id 2). Likewise, if I query for posts tagged with tomato AND potato, it should return the "Carb overload" post (id 1).
I tried the following:
Post.joins(:tags).where(tags: { name: ['basil', 'tomato'] })
SQL
SELECT "posts".* FROM "posts"
INNER JOIN "post_tags" ON "post_tags"."post_id" = "posts"."id"
INNER JOIN "tags" ON "tags"."id" = "post_tags"."tag_id"
WHERE "tags"."name" IN ('basil', 'tomato')
This returns all three posts because all share the tag tomato. I also tried this:
Post.joins(:tags).where(tags: { name: 'basil' }).where(tags: { name: 'tomato' })
SQL
SELECT "posts".* FROM "posts"
INNER JOIN "post_tags" ON "post_tags"."post_id" = "posts"."id"
INNER JOIN "tags" ON "tags"."id" = "post_tags"."tag_id"
WHERE "tags"."name" = 'basil' AND "tags"."name" = 'tomato'
This returns no records.
How can I query for posts tagged with multiple tags?

You may want to review the possible ways to write this kind of query in this answer for applying conditions to multiple rows in a join. Here is one possible option for implementing your query in Rails using 1B, the sub-query approach...
Define a query in the PostTag model that will grab up the Post ID values for a given Tag name:
# PostTag.rb
def self.post_ids_for_tag(tag_name)
  joins(:tag).where(tags: { name: tag_name }).select(:post_id)
end
Define a query in the Post model that will grab up the Post records for a given Tag name, using a sub-query structure:
# Post.rb
def self.for_tag(tag_name)
  where("id IN (#{PostTag.post_ids_for_tag(tag_name).to_sql})")
end
Then you can use a query like this:
Post.for_tag("basil").for_tag("tomato")
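Chaining `for_tag` adds one `id IN (subquery)` condition per tag, so a post must appear in every sub-query's result. That is a set intersection, which this plain-Ruby sketch reproduces with the tables from the question; `post_ids_for_tag` here is an in-memory stand-in for the PostTag query, not the ActiveRecord method itself.

```ruby
# Data from the tables in the question.
TAGS = { 1 => "tomato", 2 => "potato", 3 => "basil", 4 => "rice" }
POST_TAGS = [[1, 1], [1, 2], [2, 1], [2, 3], [3, 1]] # [post_id, tag_id]

# In-memory stand-in for PostTag.post_ids_for_tag: post ids carrying one tag name.
def post_ids_for_tag(name)
  POST_TAGS.select { |_, tag_id| TAGS[tag_id] == name }.map(&:first)
end

# Each chained for_tag narrows the result to ids present in every per-tag set.
def posts_with_all_tags(*names)
  names.map { |n| post_ids_for_tag(n) }.reduce(:&)
end

posts_with_all_tags("basil", "tomato")  # => [2] ("Heart burn")
posts_with_all_tags("tomato", "potato") # => [1] ("Carb overload")
```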

Use method .includes, like this:
Item.where(xpto: "test")
.includes({:orders =>[:suppliers, :agents]}, :manufacturers)
See the documentation for .includes.

Postgres - How to create index for simple association directly (outside of activerecord)?

We have a Postgres database that is populated through a Node app that parses XML and loads our dataset for us.
We have built a Sinatra app to view the data. We have a number of archive_objects which have a number of tags.
We have associated the two classes via their models, eg:
class ArchiveObject < ActiveRecord::Base
  has_and_belongs_to_many :tags
end
class Tag < ActiveRecord::Base
  has_and_belongs_to_many :archive_objects
end
We have noticed that calling, for example, current_archive_object.tags is quite slow (400+ ms on average), and after reading Using indexes in rails: Index your associations, I see the recommendation to create the index for this simple association in the ActiveRecord migration (names modified for relevance here):
add_index :tags, :archive_object_id, :name => 'archive_object_id_idx'
I'm wondering, how can I create this index directly in psql since our database is not generated through an AR migration?
EDIT:
Information regarding our junction table, in case it is relevant:
\d+ archive_objects_tags
Table "public.archive_objects_tags"
Column | Type | Modifiers | Storage | Stats target | Description
-------------------+--------------------------+-----------+---------+--------------+-------------
created_at | timestamp with time zone | not null | plain | |
updated_at | timestamp with time zone | not null | plain | |
tag_id | integer | not null | plain | |
archive_object_id | integer | not null | plain | |
Indexes:
"archive_objects_tags_pkey" PRIMARY KEY, btree (tag_id, archive_object_id)
Has OIDs: no
And the SQL call from the rack console:
Tag Load (397.4ms) SELECT "tags".* FROM "tags" INNER JOIN "archive_objects_tags" ON "tags"."id" = "archive_objects_tags"."tag_id" WHERE "archive_objects_tags"."archive_object_id" = $1 [["archive_object_id", 4823]]

From the PostgreSQL docs, the equivalent to
add_index :tags, :archive_object_id, :name => 'archive_object_id_idx'
would be:
CREATE INDEX archive_object_id_idx ON tags (archive_object_id);
I don't believe that is what you want in your case, because your tags table does not have an archive_object_id column. You probably want to create a multicolumn index on your "junction table".
CREATE UNIQUE INDEX archive_objects_tags_tag_id_archive_object_id_idx ON archive_objects_tags (archive_object_id, tag_id);
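The existing primary key already covers both columns, so the reason a second index helps is column order: a B-tree index on (tag_id, archive_object_id) is sorted by tag_id first, so the entries for one archive_object_id are scattered through it. PostgreSQL's documentation notes that a multicolumn B-tree index is most efficient when there are constraints on its leading columns. This plain-Ruby sketch models the two orderings with sorted arrays; the sample rows and helper names are hypothetical.

```ruby
# Hypothetical rows of archive_objects_tags as [tag_id, archive_object_id].
rows = [[1, 4823], [2, 17], [3, 4823], [4, 99], [5, 4823], [6, 17]]

# Existing primary key: entries ordered by (tag_id, archive_object_id).
pk_index = rows.sort
# Proposed index: entries ordered by (archive_object_id, tag_id).
ao_index = rows.map { |tag_id, ao_id| [ao_id, tag_id] }.sort

# Lookup column is the leading column: binary-search to the first entry,
# then read the contiguous run of matching entries.
def tag_ids_via_leading_column(index, ao_id)
  start = index.bsearch_index { |entry| entry[0] >= ao_id }
  return [] if start.nil?
  index[start..-1].take_while { |entry| entry[0] == ao_id }.map { |entry| entry[1] }
end

# Lookup column is the second column: matching entries are scattered,
# so every entry has to be examined.
def tag_ids_via_full_scan(index, ao_id)
  index.select { |entry| entry[1] == ao_id }.map { |entry| entry[0] }
end

tag_ids_via_leading_column(ao_index, 4823) # => [1, 3, 5]
tag_ids_via_full_scan(pk_index, 4823)      # => [1, 3, 5]
```

Both return the same tag ids; the first only touches the matching run of entries, which is what an index led by archive_object_id provides for the `WHERE "archive_objects_tags"."archive_object_id" = $1` lookup in the log.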

Chaining Rails 3 scopes in has_many through association

Is this doable?
I have the following scope:
class Thing < ActiveRecord::Base
  scope :with_tag, lambda { |tag| joins(:tags).where('tags.name = ?', tag.name)
                                              .group('things.id') }
  def self.withtag_search(tags)
    tags.inject(scoped) do |tagged_things, tag|
      tagged_things.with_tag(tag)
    end
  end
end
I get a result if there's a single tag in the array of tags passed in with Thing.withtag_search(array_of_tags) but if I pass multiple tags in that array I get an empty relation as the result. In case it helps:
Thing.withtag_search(["test_tag_1", "test_tag_2"])
SELECT "things".*
FROM "things"
INNER JOIN "things_tags" ON "things_tags"."thing_id" = "things"."id"
INNER JOIN "tags" ON "tags"."id" = "things_tags"."tag_id"
WHERE (tags.name = 'test_tag_1') AND (tags.name = 'test_tag_2')
GROUP BY things.id
=> [] # class is ActiveRecord::Relation
whereas
Thing.withtag_search(["test_tag_1"])
SELECT "things".*
FROM "things"
INNER JOIN "things_tags" ON "things_tags"."thing_id" = "things"."id"
INNER JOIN "tags" ON "tags"."id" = "things_tags"."tag_id"
WHERE (tags.name = 'test_tag_1')
GROUP BY things.id
=> [<Thing id:1, ... >, <Thing id:2, ... >] # Relation including correctly all
# Things with that tag
I want to be able to chain these relations together so that (among other reasons) I can use the Kaminari gem for pagination which only works on relations not arrays - so I need a scope to be returned.

I also ran into this problem. The problem is not Rails; it is the SQL query itself.
Your SQL will create the following temporary join table (only the necessary fields are shown):
+-----------+-------------+---------+------------+
| things.id | things.name | tags.id | tags.name |
+-----------+-------------+---------+------------+
| 1 | ... | 1 | test_tag_1 |
+-----------+-------------+---------+------------+
| 1 | ... | 2 | test_tag_2 |
+-----------+-------------+---------+------------+
So instead of joining all tags to one specific Thing, it generates one row for each Tag-Thing combination (if you don't believe it, just run COUNT(*) on this SQL statement).
The problem is that your query criteria look like this: WHERE (tags.name = 'test_tag_1') AND (tags.name = 'test_tag_2'), which will be checked against each of these rows and will never be true. It's not possible for tags.name to equal both test_tag_1 and test_tag_2 at the same time!
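The row-by-row evaluation is easy to reproduce. In this plain-Ruby sketch, the two joined rows from the table above are filtered once with the AND condition and once with an IN condition; the row structure is a simplification for illustration.

```ruby
# The joined rows: one row per Thing-Tag pair.
joined = [
  { thing_id: 1, tag_name: "test_tag_1" },
  { thing_id: 1, tag_name: "test_tag_2" }
]

# WHERE (tags.name = 'test_tag_1') AND (tags.name = 'test_tag_2'):
# each row has a single tag name, so no row can satisfy both.
and_matches = joined.select do |row|
  row[:tag_name] == "test_tag_1" && row[:tag_name] == "test_tag_2"
end

# WHERE tags.name IN ('test_tag_1', 'test_tag_2'): a row matches if it has either name.
in_matches = joined.select do |row|
  %w[test_tag_1 test_tag_2].include?(row[:tag_name])
end

and_matches.size # => 0
in_matches.size  # => 2
```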
The standard SQL solution is to use the SQL statement INTERSECT... but unfortunately not with MySQL.
The best solution is to run Thing.withtag_search for each of your tags, collect the returning objects, and select only objects which are included in each of the results, like so:
%w[test_tag_1 test_tag_2].collect do |tag|
Thing.withtag_search(tag)
end.inject(&:&)
If you want to get this as an ActiveRecord relation you can probably do this like so:
ids = %w[test_tag_1 test_tag_2].collect do |tag|
Thing.withtag_search(tag).collect(&:id)
end.inject(&:&)
Thing.where(:id => ids)
The other solution (which I'm using) is to cache the tags in the Thing table, and do MySQL boolean search on it. I will give you more details on this solution if you want.
Anyways I hope this will help you. :)

This is rather complicated at a glance, but based on your SQL, you want:
WHERE (tags.name IN ( 'test_tag_1', 'test_tag_2'))
I haven't dealt much with Rails 3, but if you can adjust your JOIN appropriately, this should fix your issue. Have you tried a solution akin to:
joins(:tags).where('tags.name IN (?)', tags.map { |tag| tag.name })
This way, you will JOIN the way you are expecting (UNION instead of INTERSECTION). I hope this is a helpful way of thinking about this problem.
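Note that IN on its own returns Things tagged with any of the names, not all of them. One common way to get "all of them" (a different technique from the ones in these answers) is to keep the IN filter, group by thing, and require a match count equal to the number of requested tags, which in SQL corresponds to `GROUP BY things.id HAVING COUNT(DISTINCT tags.name) = 2`. The plain-Ruby sketch below models that on hypothetical joined rows.

```ruby
# Hypothetical joined rows (one per Thing-Tag pair).
joined = [
  { thing_id: 1, tag_name: "test_tag_1" },
  { thing_id: 1, tag_name: "test_tag_2" },
  { thing_id: 2, tag_name: "test_tag_1" },
  { thing_id: 3, tag_name: "test_tag_2" }
]

# IN filter only: things carrying at least one of the wanted tags (a union).
def things_with_any(rows, wanted)
  rows.select { |r| wanted.include?(r[:tag_name]) }.map { |r| r[:thing_id] }.uniq
end

# IN filter plus grouping and a count check: things carrying every wanted tag.
def things_with_all(rows, wanted)
  # WHERE tags.name IN (...)
  matching = rows.select { |r| wanted.include?(r[:tag_name]) }
  # GROUP BY things.id
  groups = matching.group_by { |r| r[:thing_id] }
  # HAVING COUNT(DISTINCT tags.name) = number of wanted tags
  groups.select { |_, rs| rs.map { |r| r[:tag_name] }.uniq.size == wanted.uniq.size }.keys
end

wanted = %w[test_tag_1 test_tag_2]
things_with_any(joined, wanted) # => [1, 2, 3]
things_with_all(joined, wanted) # => [1]
```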

I don't seem to be able to find a solution to this problem. So, instead of using Kaminari and rolling my own tagging, I've switched to acts-as-taggable-on and will_paginate.

How do I avoid multiple queries with :include in Rails?

If I do this
post = Post.find_by_id(post_id, :include => :comments)
two queries are performed (one for the post data and another for the post's comments). Then when I do post.comments, another query is not performed because the data is already cached.
Is there a way to do just one query and still access the comments via post.comments?

No, there is not. This is the intended behavior of :include, since the JOIN approach ultimately comes out to be inefficient.
For example, consider the following scenario: the Post model has 3 fields that you need to select, 2 fields for Comment, and this particular post has 100 comments. Rails could run a single JOIN query along the lines of:
SELECT posts.id, posts.title, posts.author_id, comments.id, comments.body
FROM posts
INNER JOIN comments ON comments.post_id = posts.id
WHERE posts.id = 1
This would return the following table of results:
post.id | post.title | post.author_id | comment.id | comment.body
---------+------------+----------------+------------+--------------
1 | Hello! | 1 | 1 | First!
1 | Hello! | 1 | 2 | Second!
1 | Hello! | 1 | 3 | Third!
1 | Hello! | 1 | 4 | Fourth!
...96 more...
You can see the problem already. The single-query JOIN approach, though it returns the data you need, returns it redundantly. When the database server sends the result set to Rails, it will send the post's ID, title, and author ID 100 times each. Now, suppose that the Post had 10 fields you were interested in, 8 of which were text blocks. Eww. That's a lot of data. Transferring data from the database to Rails does take work on both sides, both in CPU cycles and RAM, so minimizing that data transfer is important for making the app run faster and leaner.
The Rails devs crunched the numbers, and most applications run better when using multiple queries that only fetch each bit of data once rather than one query that has the potential to get hugely redundant.
Of course, there comes a time in every developer's life when a join is necessary in order to run complex conditions, and that can be achieved by replacing :include with :joins. For prefetching relationships, however, the approach Rails takes in :include is much better for performance.

If you use this behaviour of eagerly-loaded associations, you'll get a single (and efficient) query.
Here is an example:
Say you have the following model (where :user is the foreign reference):
class Item < ActiveRecord::Base
  attr_accessible :name, :user_id
  belongs_to :user
end
Then executing this (note: the where part is crucial, as it tricks Rails into producing that single query):
@items = Item.includes(:user).where("users.id IS NOT NULL").all
will result in a single SQL query (the syntax below is that of PostgreSQL):
SELECT "items"."id" AS t0_r0, "items"."user_id" AS t0_r1,
"items"."name" AS t0_r2, "items"."created_at" AS t0_r3,
"items"."updated_at" AS t0_r4, "users"."id" AS t1_r0,
"users"."email" AS t1_r1, "users"."created_at" AS t1_r4,
"users"."updated_at" AS t1_r5
FROM "items"
LEFT OUTER JOIN "users" ON "users"."id" = "items"."user_id"
WHERE (users.id IS NOT NULL)
