Rails Eager loading explanation - ruby-on-rails

I have simplified my project to this so i can understand the better the eager loading part of RoR 3.2.13 . These are my classes:
class Person < ActiveRecord::Base
attr_accessible :name
has_many :posts
end
And
class Post < ActiveRecord::Base
attr_accessible :name, :person_id
belongs_to :person
end
When i do something like
people_data = Person.includes(:posts)
IRB shows the following SQL:
Person Load (1.3ms) SELECT `people`.* FROM `people`
Post Load (0.7ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`person_id` IN (1, 2)
And the resulting object its like:
=> [#<Person id: 1, name: "Ernie">, #<Person id: 2, name: "Bert">]
Notice that there are only 2 person objects, no posts. How can i get a simple data structure with person AND its posts. I want to do this in a single instruction, i dont want to do a foreach in the people array.
Im expecting something like this:
[#<Person id: 1, name: "Ernie">, [#<Array 0 => #<Post id:1, name: "Hi">, 1 => #<Post id:2, name: "Hello"> > ....

Eager loading is working. It's retrieving all the posts eagerly (in one query). Eager loading does not construct an array with all posts and persons.
# With eager loading
people_data = Person.includes(:posts) # 2 queries: 1 for persons, 1 for posts
people_data.first.posts.each do |post|
# No queries
end
# Without eager loading
people_data = Person.all # 1 query for persons
people_data.first.posts.each do |post|
# Query for each post
end
Note that in both cases people_data will contain (a similar looking) ActiveRecord::Relation object. (It's not an array!) You will only see the benefit of the eager loading when you start using the associated records.
Unless you have a special reason for wanting to build an array with all persons and posts, you should use eager loading like it was meant to be used. This is more efficient than trying to build your own array.

You can map your results to receive expected array
Person.includes(:posts).all.map{|person| [person, person.posts]}

The result of your query eager loads the posts as the result of the Person#posts method. I'm not quite clear on the format of the output you are expecting. If you simply want two arrays, one of people and one of all posts by those people you can split it into two queries.
#people = Person.where(...)
#posts = Post.where(person_id: #people)
This is the same query done by eager loading, but it will return posts in their own array.

Related

Rails 5 select from two different tables and get one result

I have 3 models, Shop, Client, Product.
A shop has many clients, and a shop has many products.
Then I have 2 extra models, one is ShopClient, that groups the shop_id and client_id. The second is ShopProduct, that groups the shop_id and product_id.
Now I have a controller that receives two params, the client_id and product_id. So I want to select all the shops (in one instance variable #shops) filtered by client_id and product_id without shop repetition. How can I do this??
I hope I was clear, thanks.
ps: I'm using Postgresql as database.
Below query will work for you.
class Shop
has_many :shop_clients
has_many :clients, through: :shop_clients
has_many :shop_products
has_many :products, through: :shop_products
end
class Client
end
class Product
end
class ShopClient
belongs_to :shop
belongs_to :client
end
class ShopProduct
belongs_to :shop
belongs_to :product
end
#shops = Shop.joins(:clients).where(clients: {id: params[:client_id]}).merge(Shop.joins(:products).where(products: {id: params[:product_id]}))
Just to riff on the answer provided by Prince Bansal. How about creating some class methods for those joins? Something like:
class Shop
has_many :shop_clients
has_many :clients, through: :shop_clients
has_many :shop_products
has_many :products, through: :shop_products
class << self
def with_clients(clients)
joins(:clients).where(clients: {id: clients})
end
def with_products(products)
joins(:products).where(products: {id: products})
end
end
end
Then you could do something like:
#shops = Shop.with_clients(params[:client_id]).with_products(params[:product_id])
By the way, I'm sure someone is going to say you should make those class methods into scopes. And you certainly can do that. I did it as class methods because that's what the Guide recommends:
Using a class method is the preferred way to accept arguments for scopes.
But, I realize some people strongly prefer the aesthetics of using scopes instead. So, whichever pleases you most.
I feel like the best way to solve this issue is to use sub-queries. I'll first collect all valid shop ids from ShopClient, followed by all valid shop ids from ShopProduct. Than feed them into the where query on Shop. This will result in one SQL query.
shop_client_ids = ShopClient.where(client_id: params[:client_id]).select(:shop_id)
shop_product_ids = ShopProduct.where(product_id: params[:product_id]).select(:shop_id)
#shops = Shop.where(id: shop_client_ids).where(id: shop_product_ids)
#=> #<ActiveRecord::Relation [#<Shop id: 1, created_at: "2018-02-14 20:22:18", updated_at: "2018-02-14 20:22:18">]>
The above query results in the SQL query below. I didn't specify a limit, but this might be added by the fact that my dummy project uses SQLite.
SELECT "shops".*
FROM "shops"
WHERE
"shops"."id" IN (
SELECT "shop_clients"."shop_id"
FROM "shop_clients"
WHERE "shop_clients"."client_id" = ?) AND
"shops"."id" IN (
SELECT "shop_products"."shop_id"
FROM "shop_products"
WHERE "shop_products"."product_id" = ?)
LIMIT ?
[["client_id", 1], ["product_id", 1], ["LIMIT", 11]]
Combining the two sub-queries in one where doesn't result in a correct response:
#shops = Shop.where(id: [shop_client_ids, shop_product_ids])
#=> #<ActiveRecord::Relation []>
Produces the query:
SELECT "shops".* FROM "shops" WHERE "shops"."id" IN (NULL, NULL) LIMIT ? [["LIMIT", 11]]
note
Keep in mind that when you run the statements one by one in the console this will normally result in 3 queries. This is due to the fact that the return value uses the #inspect method to let you see the result. This method is overridden by Rails to execute the query and display the result.
You can simulate the behavior of the normal application by suffixing the statements with ;nil. This makes sure nil is returned and the #inspect method is not called on the where chain, thus not executing the query and keeping the chain in memory.
edit
If you want to clean up the controller you might want to move these sub-queries into model methods (inspired by jvillians answer).
class Shop
# ...
def self.with_clients(*client_ids)
client_ids.flatten! # allows passing of multiple arguments or an array of arguments
where(id: ShopClient.where(client_id: client_ids).select(:shop_id))
end
# ...
end
Rails sub-query vs join
The advantage of a sub-query over a join is that using joins might end up returning the same record multiple times if you query on a attribute that is not unique. For example, say a product has an attribute product_type that is either 'physical' or 'digital'. If you want to select all shops selling a digital product you must not forget to call distinct on the chain when you're using a join, otherwise the same shop may return multiple times.
However if you'll have to query on multiple attributes in product, and you'll use multiple helpers in the model (where each helper joins(:products)). Multiple sub-queries are likely slower. (Assuming you set has_many :products, through: :shop_products.) Since Rails reduces all joins to the same association to a single one. Example: Shop.joins(:products).joins(:products) (from multiple class methods) will still end up joining the products table a single time, whereas sub-queries will not be reduced.
Below sql query possibly gonna work for you.
--
-- assuming
-- tables: shops, products, clients, shop_products, shop_clients
--
SELECT DISTINCT * FROM shops
JOIN shop_products
ON shop_products.shop_id = shops.id
JOIN shop_clients
ON shop_clients.shop_id = shops.id
WHERE shop_clients.client_id = ? AND shop_products.product_id = ?
If you'll face difficulties while creating an adequate AR expression for this sql query, let me know.
Btw, here is a mock

Mysql equivalent rails query

I am new to rails. What is the equivalent rails query for the following sql query?
mysql> select a.code, b.name, b.price, b.page from article a inner join books b on a.id =b.article_id;
I am trying
Article.joins(:books).select("books.name, books.price, books.page, articles.code")
The active record relation returns only table one data
=> #<ActiveRecord::Relation [#<Article id: 1, code: "x", created_at: "2014-11-12 13:28:08", updated_at: "2014-11-14 04:16:06">, #<Article id: 2, code: "y", created_at: "2014-11-12 13:28:08", updated_at: "2014-11-14 04:00:16">]>
What is the solution to join both table?
You normally don't really query like that directly with Rails. Instead, you'd set up your models and use other associated models to achieve this. If speed is an issue, you can use eager loading. If you absolutely need exactly this join, it's:
class Article < ActiveRecord::Base
has_many :books
scope :with_books, lambda {
joins(:books)
.select('articles.code, books.name, books.price, books.page')
}
end
class Book < ActiveRecord::Base
belongs_to :article
end
But this is not so useful. It generates the join you want, but retrieving the book details like that won't be fun. You could do something like:
a = Article.with_books.where("books.name = 'Woken Furies'").first
a.code
And that should give you the article code. Depending on what you need, a better way could be to remove the scope from the Article class and instead query like:
b = Book.where(name: 'Woken Furies')
.joins(:article)
.where("articles.code = 'something'")
b = Book.where("name = 'Woken Furies' AND articles.code = 'something'")
.joins(:article)
Both of these queries should be equivalent. You can get from one associated record to the other with:
book = b.first
article_code = book.article.code
I'm not sure what you need to accomplish with the join, but I think you might get more beautiful code by using plain ActiveRecord. If you need the speed gain, avoid n+1 problems, etc., it might make sense to write those joins out by hand.
I hope I understood your question correctly.
There's more about joining in the Rails guides:
http://guides.rubyonrails.org/active_record_querying.html#joining-tables
Update: You can use pluck if you need to retrieve e.g. just the code and the name:
Article.with_books
.where("books.name = 'Woken Furies'")
.pluck('articles.code, books.name')

eager loading the first record of an association

In a very simple forum made from Rails app, I get 30 topics from the database in the index action like this
def index
#topics = Topic.all.page(params[:page]).per_page(30)
end
However, when I list them in the views/topics/index.html.erb, I also want to have access to the first post in each topic to display in a tooltip, so that when users scroll over, they can read the first post without having to click on the link. Therefore, in the link to each post in the index, I add the following to a data attribute
topic.posts.first.body
each of the links looks like this
<%= link_to simple_format(topic.name), posts_path(
:topic_id => topic), :data => { :toggle => 'tooltip', :placement => 'top', :'original-title' => "#{ topic.posts.first.body }"}, :class => 'tool' %>
While this works fine, I'm worried that it's an n+1 query, namely that if there's 30 topics, it's doing this 30 times
User Load (0.8ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1 ORDER BY "users"."id" ASC LIMIT 1
Post Load (0.4ms) SELECT "posts".* FROM "posts" WHERE "posts"."topic_id" = $1 ORDER BY "posts"."id" ASC LIMIT 1 [["topic_id", 7]]
I've noticed that Rails does automatic caching on some of these, but I think there might be a way to write the index action differently to avoid some of this n+1 problem but I can figure out how. I found out that I can
include(:posts)
to eager load the posts, like this
#topics = Topic.all.page(params[:page]).per_page(30).includes(:posts)
However, if I know that I only want the first post for each topic, is there a way to specify that? if a topic had 30 posts, I don't want to eager load all of them.
I tried to do
.includes(:posts).first
but it broke the code
This appears to work for me, so give this a shot and see if it works for you:
Topic.includes(:posts).where("posts.id = (select id from posts where posts.topic_id = topics.id limit 1)").references(:posts)
This will create a dependent subquery in which the posts topic_id in the subquery is matched up with the topics id in the parent query. With the limit 1 clause in the subquery, the result is that each Topic row will contain only 1 matching Post row, eager loaded thanks to the includes(:post).
Note that when passing an SQL string to .where, that references an eager loaded relation, the references method should be appended to inform ActiveRecord that we're referencing an association, so that it knows to perform appropriate joins in the subsequent query. Apparently it technically works without that method, but you get a deprecation warning, so you might as well throw it in lest you encounter problems in future Rails updates.
To my knowledge you can't. Custom association is often used to allow conditions on includes except limit.
If you eager load an association with a specified :limit option, it will be ignored, returning all the associated objects. http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html
class Picture < ActiveRecord::Base
has_many :most_recent_comments, -> { order('id DESC').limit(10) },
class_name: 'Comment'
end
Picture.includes(:most_recent_comments).first.most_recent_comments
# => returns all associated comments.
There're a few issues when trying to solve this "natively" via Rails which are detailed in this question.
We solved it with an SQL scope, for your case something like:
class Topic < ApplicationRecord
has_one :first_post, class_name: "Post", primary_key: :first_post_id, foreign_key: :id
scope :with_first_post, lambda {
select(
"topics.*,
(
SELECT id as first_post_id
FROM posts
WHERE topic_id = topics.id
ORDER BY id asc
LIMIT 1
)"
)
}
end
Topic.with_first_post.includes(:first_post)

How to filter association_ids for an ActiveRecord model?

In a domain like this:
class User
has_many :posts
has_many :topics, :through => :posts
end
class Post
belongs_to :user
belongs_to :topic
end
class Topic
has_many :posts
end
I can read all the Topic ids through user.topic_ids but I can't see a way to apply filtering conditions to this method, since it returns an Array instead of a ActiveRecord::Relation.
The problem is, given a User and an existing set of Topics, marking the ones for which there is a post by the user. I am currently doing something like this:
def mark_topics_with_post(user, topics)
# only returns the ids of the topics for which this user has a post
topic_ids = user.topic_ids
topics.each {|t| t[:has_post]=topic_ids.include(t.id)}
end
But this loads all the topic ids regardless of the input set. Ideally, I'd like to do something like
def mark_topics_with_post(user, topics)
# only returns the topics where user has a post within the subset of interest
topic_ids = user.topic_ids.where(:id=>topics.map(&:id))
topics.each {|t| t[:has_post]=topic_ids.include(t.id)}
end
But the only thing I can do concretely is
def mark_topics_with_post(user, topics)
# needlessly create Post objects only to unwrap them later
topic_ids = user.posts.where(:topic_id=>topics.map(&:id)).select(:topic_id).map(&:topic_id)
topics.each {|t| t[:has_post]=topic_ids.include(t.id)}
end
Is there a better way?
Is it possible to have something like select_values on a association or scope?
FWIW, I'm on rails 3.0.x, but I'd be curious about 3.1 too.
Why am I doing this?
Basically, I have a result page for a semi-complex search (which happens based on the Topic data only), and I want to mark the results (Topics) as stuff on which the user has interacted (wrote a Post).
So yeah, there is another option which would be doing a join [Topic,Post] so that the results come out as marked or not from the search, but this would destroy my ability to cache the Topic query (the query, even without the join, is more expensive than fetching only the ids for the user)
Notice the approaches outlined above do work, they just feel suboptimal.
I think that your second solution is almost the optimal one (from the point of view of the queries involved), at least with respect to the one you'd like to use.
user.topic_ids generates the query:
SELECT `topics`.id FROM `topics`
INNER JOIN `posts` ON `topics`.`id` = `posts`.`topic_id`
WHERE `posts`.`user_id` = 1
if user.topic_ids.where(:id=>topics.map(&:id)) was possible it would have generated this:
SELECT topics.id FROM `topics`
INNER JOIN `posts` ON `topics`.`id` = `posts`.`topic_id`
WHERE `posts`.`user_id` = 1 AND `topics`.`id` IN (...)
this is exactly the same query that is generated doing: user.topics.select("topics.id").where(:id=>topics.map(&:id))
while user.posts.select(:topic_id).where(:topic_id=>topics.map(&:id)) generates the following query:
SELECT topic_id FROM `posts`
WHERE `posts`.`user_id` = 1 AND `posts`.`topic_id` IN (...)
which one of the two is more efficient depends on the data in the actual tables and indices defined (and which db is used).
If the topic ids list for the user is long and has topics repeated many times, it may make sense to group by topic id at the query level:
user.posts.select(:topic_id).group(:topic_id).where(:topic_id=>topics.map(&:id))
Suppose your Topic model has a column named id you can do something like this
Topic.select(:id).join(:posts).where("posts.user_id = ?", user_id)
This will run only one query against your database and will give you all the topics ids that have posts for a given user_id

Rails: how to load 2 models via join?

I am new to rails and would appreciate some help optimizing my database usage.
Is there a way to load two models associated with each other with one DB query?
I have two models Person and Image:
class Person < ActiveRecord::Base
has_many :images
end
class Image < ActiveRecord::Base
belongs_to :person
end
I would like to load a set of people and their associated images with a single trip to the DB using a join command. For instance, in SQL, I can load all the data I need with the following query:
select * from people join images on people.id = images.person_id where people.id in (2, 3) order by timestamp;
So I was hoping that this rails snippet would do what I need:
>> people_and_images = Person.find(:all, :conditions => ["people.id in (?)", "2, 3"], :joins => :images, :order => :timestamp)
This code executes the SQL statement I am expecting and loads the instances of Person I need. However, I see that accessing a a Person's images leads to an additional SQL query.
>> people_and_images[0].images
Image Load (0.004889) SELECT * FROM `images` WHERE (`images`.person_id = 2)
Using the :include option in the call to find() does load both models, however it will cost me an additional SELECT by executing it along with the JOIN.
I would like to do in Rails what I can do in SQL which is to grab all the data I need with one query.
Any help would be greatly appreciated. Thanks!
You want to use :include like
Person.find(:all, :conditions => ["people.id in (?)", "2, 3"], :include => :images, :order => :timestamp)
Check out the find documentation for more details
You can use :include for eager loading of associations and indeed it does call exactly 2 queries instead of one as with the case of :joins; the first query is to load the primary model and the second is to load the associated models. This is especially helpful in solving the infamous N+1 query problem, which you will face if you doesn't use :include, and :joins doesn't eager-load the associations.
the difference between using :joins and :include is 1 query more for :include, but the difference of not using :include will be a whole lot more.
you can check it up here: http://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations

Resources