Additive scope conditions for has_many :through - ruby-on-rails

I want a user to be able to find all posts that have one or more tags. And I'd like the tags to be additive criteria, so for example you could search for posts that have just the 'News' tag, or you could search for posts that have both the 'News' and 'Science' tags.
Currently what I have, and it works, is a Post model, a Tag model, and a join model called Marking. Post has_many :tags, through: :markings. I get what I need by passing an array of Tag ids to a Post class method:
post.rb
def self.from_tag_id_array array
post_array = []
Marking.where(tag_id: array).group_by(&:post_id).each do |p_id,m_array|
post_array << p_id if m_array.map(&:tag_id).sort & array.sort == array.sort
end
where id: post_array
end
This seems like a clunky way to get there. Is there a way I can do this with a scope on an association or something of the like?

So the general rule of thumb with building these kinds of queries is to minimize work in "Ruby-land" and maximize work in "Database-land". In your solution above, you're fetching a set of markings with any tags in the set array, which presumably will be a very large set (all posts that have any of those tags). This is represented in a ruby array and processed (group_by is in Ruby-world, group is the equivalent in Database-land).
So aside from being hard-to-read, that solution is going to be slow for any large set of markings.
There are a couple ways to solve the problem without doing any heavy lifting in Ruby-world. One way is using subqueries, like this:
scope :with_tag_ids, ->(tag_ids) {
tag_ids.map { |tag_id|
joins(:markings).where(markings: { tag_id: tag_id })
}.reduce(all) { |scope, subquery| scope.where(id: subquery) }
}
This generates a query like this (for tag_ids 5 and 8):
SELECT "posts".*
FROM "posts"
WHERE "posts"."id" IN (SELECT "posts"."id" FROM "posts" INNER JOIN "markings" ON "markings"."post_id" = "posts"."id" WHERE "markings"."tag_id" = 5)
AND "posts"."id" IN (SELECT "posts"."id" FROM "posts" INNER JOIN "markings" ON "markings"."post_id" = "posts"."id" WHERE "markings"."tag_id" = 8)
Note that since everything here is calculated directly in SQL, no arrays are generated or processed in Ruby. This will generally scale much better.
Alternatively, you can use COUNT and do it in a single query without subqueries:
scope :with_tag_ids, ->(tag_ids) {
joins(:markings).where(markings: { tag_id: tag_ids }).
group(:post_id).having('COUNT(posts.id) = ?', tag_ids.count)
}
Which generates SQL like this:
SELECT "posts".*
FROM "posts"
INNER JOIN "markings" ON "markings"."post_id" = "posts"."id"
WHERE "markings"."tag_id" IN (5, 8)
GROUP BY "post_id"
HAVING (COUNT(posts.id) = 2)
This assumes that you don't have multiple markings with the same pair of tag_id and post_id, which would throw off the count.
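To see why duplicates matter, here is a minimal plain-Ruby sketch (no database involved) that mimics the HAVING COUNT check on a made-up list of markings. MARKINGS, posts_by_row_count and posts_by_distinct_count are hypothetical names used only for illustration. The second method mimics what counting distinct tag ids (COUNT(DISTINCT markings.tag_id) in SQL) would do instead.

```ruby
# In-memory stand-in for the markings table; the rows are made up.
MARKINGS = [
  { post_id: 1, tag_id: 5 },
  { post_id: 1, tag_id: 8 },
  { post_id: 2, tag_id: 5 },
  { post_id: 2, tag_id: 5 }, # duplicate (post_id, tag_id) pair
].freeze

# Mimics HAVING COUNT(*) = n: counts joined rows, so duplicates inflate it.
def posts_by_row_count(markings, tag_ids)
  markings
    .select { |m| tag_ids.include?(m[:tag_id]) }
    .group_by { |m| m[:post_id] }
    .select { |_post_id, rows| rows.size == tag_ids.size }
    .keys
end

# Mimics HAVING COUNT(DISTINCT tag_id) = n: counts distinct tags per post.
def posts_by_distinct_count(markings, tag_ids)
  markings
    .select { |m| tag_ids.include?(m[:tag_id]) }
    .group_by { |m| m[:post_id] }
    .select { |_post_id, rows| rows.map { |m| m[:tag_id] }.uniq.size == tag_ids.uniq.size }
    .keys
end

p posts_by_row_count(MARKINGS, [5, 8])      # => [1, 2] (post 2 is a false positive)
p posts_by_distinct_count(MARKINGS, [5, 8]) # => [1]
```

In a real app, a unique index on (post_id, tag_id) would be another way to make sure the assumption holds.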
I would imagine that the last solution is probably the most efficient, but you should try different solutions and see what works best for your data.
See also: Query intersection with activerecord

Related

How to pluck id of has_one associations?

class Post
has_one :latest_comment, -> { order(created_at: :desc) }, class_name: 'Comment'
end
I want to do something like:
Post.joins(:latest_comment).pluck('latest_comment.id')
but it's not valid syntax and it doesn't work.
Post.joins(:latest_comment).pluck('comments.id')
Above works but it returns ids of all comments for a post, not only of the latest.
ActiveRecord associations are a very leaky abstraction around SQL joins, so your has_one :latest_comment association won't actually return a single row from the joined table per record unless you're calling it on an instance of Post.
Instead, when you run Post.joins(:latest_comment).pluck('comments.id') you get:
SELECT "comments"."id"
FROM "posts"
INNER JOIN "comments" ON "comments"."post_id" = "posts"."id"
ActiveRecord isn't smart enough to know that you want only the latest row per post from the comments table, so in a join the association behaves just like a has_many. In its defence, this isn't something that's realistic to do in a polyglot fashion.
What you can do instead is select the rows from the comments table and get distinct values:
Comment.order(:post_id, created_at: :desc)
.pluck(Arel.sql('DISTINCT ON (post_id) id'))
DISTINCT ON is Postgres-specific. The exact approach will vary between RDBMSs, and there are many alternatives such as lateral joins or window functions, depending on your performance requirements.
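If the semantics of DISTINCT ON are unfamiliar, this plain-Ruby sketch shows what the query above computes: one comment id per post_id, taken from the newest row in each group. It only illustrates the result and is not a replacement for doing the work in the database. COMMENT_ROWS and latest_comment_ids are hypothetical names, and the rows are made up.

```ruby
# Hypothetical in-memory rows standing in for the comments table.
COMMENT_ROWS = [
  { id: 1, post_id: 10, created_at: Time.utc(2020, 1, 1) },
  { id: 2, post_id: 10, created_at: Time.utc(2020, 1, 3) },
  { id: 3, post_id: 11, created_at: Time.utc(2020, 1, 2) },
  { id: 4, post_id: 10, created_at: Time.utc(2020, 1, 2) },
].freeze

# Same result as ORDER BY post_id, created_at DESC with DISTINCT ON (post_id):
# for each post_id, keep only the id of the row with the latest created_at.
def latest_comment_ids(rows)
  rows
    .group_by { |row| row[:post_id] }
    .sort_by { |post_id, _group| post_id }
    .map { |_post_id, group| group.max_by { |row| row[:created_at] }[:id] }
end

p latest_comment_ids(COMMENT_ROWS) # => [2, 3]
```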

Count records after where clause

I have three models: Catalog, Upload and Product. A product belongs to a catalog, and an upload belongs to a product.
I need to count the number of uploads for all the products of a given catalog.
This is the way I've been doing it so far, which is incredibly slow for a large amount of uploads or products:
@products = Product.where(catalog_id: 123)
@uploads_count = Upload.where(product_id: @products.pluck(:id)).count
I'd like to avoid loading all the products just for a count.
Should I use raw SQL, or is there a better way to do this with ActiveRecord?
This should do it for you:
Upload.joins(:product).where(products: { catalog_id: 123 }).count
Using joins creates an INNER JOIN between the two tables, allowing you to query the products table as above.
Note the singular and plural uses of product - the joins should reflect the association (the upload belongs to one product), while the where clause always uses the table name, typically pluralised.
The SQL will look similar to:
SELECT COUNT(*) FROM "uploads"
INNER JOIN "products"
ON "products"."id" = "uploads"."product_id"
WHERE "products"."catalog_id" = 123
If you need to filter on the catalog as well, you can join it too, something like the following:
Upload.joins(product: :catalog).where(catalogs: { whatever: 'you want to query' }).count
Bear in mind, using joins is just for a query such as this. If you need to access attributes of the product or catalog, you should use another approach, such as includes, to preload the data and avoid N + 1 queries. There's a good read here if you're interested.
Another way to avoid selecting records is to use sub-query. This can be done the following way:
query = User.where(id: 1..100)
User.where(id: query.select(:id)).count
# [DEBUG] (10.5ms) SELECT COUNT(*) FROM "users" WHERE "users"."id" IN (SELECT "users"."id" FROM "users" WHERE ("users"."id" BETWEEN $1 AND $2)) [["id", 1], ["id", 100]]
# => 33
So, User.where(id: 1..100) prepares a query, that can be used as a sub-select. .select(:field) tells what field you are interested in.
Though for a basic count, SRack provides a good answer.

eager loading the first record of an association

In a very simple forum made from Rails app, I get 30 topics from the database in the index action like this
def index
@topics = Topic.all.page(params[:page]).per_page(30)
end
However, when I list them in the views/topics/index.html.erb, I also want to have access to the first post in each topic to display in a tooltip, so that when users scroll over, they can read the first post without having to click on the link. Therefore, in the link to each post in the index, I add the following to a data attribute
topic.posts.first.body
each of the links looks like this
<%= link_to simple_format(topic.name), posts_path(
:topic_id => topic), :data => { :toggle => 'tooltip', :placement => 'top', :'original-title' => "#{ topic.posts.first.body }"}, :class => 'tool' %>
While this works fine, I'm worried that it's an n+1 query, namely that if there's 30 topics, it's doing this 30 times
User Load (0.8ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1 ORDER BY "users"."id" ASC LIMIT 1
Post Load (0.4ms) SELECT "posts".* FROM "posts" WHERE "posts"."topic_id" = $1 ORDER BY "posts"."id" ASC LIMIT 1 [["topic_id", 7]]
I've noticed that Rails does automatic caching on some of these, but I think there might be a way to write the index action differently to avoid some of this n+1 problem, and I can't figure out how. I found out that I can use
includes(:posts)
to eager load the posts, like this
@topics = Topic.all.page(params[:page]).per_page(30).includes(:posts)
However, if I know that I only want the first post for each topic, is there a way to specify that? If a topic had 30 posts, I don't want to eager load all of them.
I tried to do
.includes(:posts).first
but it broke the code
This appears to work for me, so give this a shot and see if it works for you:
Topic.includes(:posts).where("posts.id = (select id from posts where posts.topic_id = topics.id order by id limit 1)").references(:posts)
This creates a dependent (correlated) subquery in which posts.topic_id in the subquery is matched against topics.id in the parent query. With order by id and limit 1 in the subquery, each Topic row is joined to only its first Post row, which is eager loaded thanks to includes(:posts).
Note that when passing an SQL string to .where, that references an eager loaded relation, the references method should be appended to inform ActiveRecord that we're referencing an association, so that it knows to perform appropriate joins in the subsequent query. Apparently it technically works without that method, but you get a deprecation warning, so you might as well throw it in lest you encounter problems in future Rails updates.
To my knowledge you can't. A custom association is often used to put conditions on includes, but that does not work for limit.
If you eager load an association with a specified :limit option, it will be ignored, returning all the associated objects. http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html
class Picture < ActiveRecord::Base
has_many :most_recent_comments, -> { order('id DESC').limit(10) },
class_name: 'Comment'
end
Picture.includes(:most_recent_comments).first.most_recent_comments
# => returns all associated comments.
There're a few issues when trying to solve this "natively" via Rails which are detailed in this question.
We solved it with an SQL scope, for your case something like:
class Topic < ApplicationRecord
has_one :first_post, class_name: "Post", primary_key: :first_post_id, foreign_key: :id
scope :with_first_post, lambda {
select(
"topics.*,
(
SELECT id as first_post_id
FROM posts
WHERE topic_id = topics.id
ORDER BY id asc
LIMIT 1
)"
)
}
end
Topic.with_first_post.includes(:first_post)

Specifying conditions on eager loaded associations returns ActiveRecord::RecordNotFound

The problem is that when a Restaurant does not have any MenuItems that match the condition, ActiveRecord says it can't find the Restaurant. Here's the relevant code:
class Restaurant < ActiveRecord::Base
has_many :menu_items, dependent: :destroy
has_many :meals, through: :menu_items
def self.with_meals_of_the_week
includes({menu_items: :meal}).where(:'menu_items.date' => Time.now.beginning_of_week..Time.now.end_of_week)
end
end
And the sql code generated:
Restaurant Load (0.0ms) SELECT DISTINCT "restaurants".id FROM "restaurants"
LEFT OUTER JOIN "menu_items" ON "menu_items"."restaurant_id" = "restaurants"."id"
LEFT OUTER JOIN "meals" ON "meals"."id" = "menu_items"."meal_id" WHERE
"restaurants"."id" = ? AND ("menu_items"."date" BETWEEN '2012-10-14 23:00:00.000000'
AND '2012-10-21 22:59:59.999999') LIMIT 1 [["id", "1"]]
However, according to this part of the Rails Guides, this shouldn't be happening:
Post.includes(:comments).where("comments.visible", true)
If, in the case of this includes query, there were no comments for any posts, all the posts would still be loaded.
The SQL generated is a correct translation of your query. But look at it,
just at the SQL level (I shortened it a bit):
SELECT *
FROM
"restaurants"
LEFT OUTER JOIN
"menu_items" ON "menu_items"."restaurant_id" = "restaurants"."id"
LEFT OUTER JOIN
"meals" ON "meals"."id" = "menu_items"."meal_id"
WHERE
"restaurants"."id" = ?
AND
("menu_items"."date" BETWEEN '2012-10-14' AND '2012-10-21')
The left outer joins do the work you expect them to do: restaurants
are combined with menu_items and meals; if there is no menu_item to
go with a restaurant, the restaurant is still kept in the result, with
all the missing pieces (menu_items.id, menu_items.date, ...) filled in with NULL.
Now look at the second part of the WHERE: the BETWEEN condition is only
true when menu_items.date is not NULL (NULL BETWEEN ... evaluates to NULL,
which WHERE treats as false), and this is where all the restaurants
without meals get filtered out.
So we need to change the query in a way that makes NULL dates OK.
Going back to Ruby, you can write:
def self.with_meals_of_the_week
includes({menu_items: :meal})
.where('menu_items.date is NULL or menu_items.date between ? and ?',
Time.now.beginning_of_week,
Time.now.end_of_week
)
end
The resulting SQL is now
.... WHERE (menu_items.date is NULL or menu_items.date between '2012-10-21' and '2012-10-28')
and the restaurants without meals stay in.
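The effect of the two WHERE clauses can be mimicked in plain Ruby, with nil playing the role of NULL. This is only a sketch of the filtering logic on made-up rows; JOINED_ROWS, WEEK, strict_filter and null_tolerant_filter are hypothetical names, not anything from Rails.

```ruby
require "date"

WEEK = (Date.new(2012, 10, 15)..Date.new(2012, 10, 21)).freeze

# Rows as a LEFT OUTER JOIN would return them: a restaurant with no
# menu_items still appears once, with the joined date column set to nil.
JOINED_ROWS = [
  { restaurant_id: 1, menu_item_date: Date.new(2012, 10, 16) }, # item this week
  { restaurant_id: 2, menu_item_date: nil },                    # no items at all
  { restaurant_id: 3, menu_item_date: Date.new(2012, 9, 1) },   # only an old item
].freeze

# Mimics WHERE date BETWEEN a AND b: a NULL date never satisfies the condition.
def strict_filter(rows, week)
  rows.select { |r| !r[:menu_item_date].nil? && week.cover?(r[:menu_item_date]) }
end

# Mimics WHERE date IS NULL OR date BETWEEN a AND b.
def null_tolerant_filter(rows, week)
  rows.select { |r| r[:menu_item_date].nil? || week.cover?(r[:menu_item_date]) }
end

def restaurant_ids(rows)
  rows.map { |r| r[:restaurant_id] }.uniq
end

p restaurant_ids(strict_filter(JOINED_ROWS, WEEK))        # => [1]
p restaurant_ids(null_tolerant_filter(JOINED_ROWS, WEEK)) # => [1, 2]
```

Note that restaurant 3, which has menu items but none in the given week, is dropped by both filters. If such restaurants should be kept too, the IS NULL condition alone is not enough.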
As the Rails Guide says, all Posts will be returned only if you do not combine a where clause on the included table with includes. Such a where clause produces a LEFT OUTER JOIN with a WHERE condition on the joined (right-hand) table, so rows that have no match are filtered out and the DB returns nothing for them.
This behaviour is very helpful when you need some objects (all of them, or a subset selected by a where on the base model) together with their related models if there are any; if there are none, you simply get the list of base models.
On the other hand, if you put conditions on the included tables, then in most cases you want only the objects that satisfy those conditions, i.e. you want only the Restaurants that have menu_items.
So in your case, if you still want to use only 2 queries (and not N+1), I would probably do something like this:
class Restaurant < ActiveRecord::Base
has_many :menu_items, dependent: :destroy
has_many :meals, through: :menu_items
# Per-instance holder for this week's menu items (not a database column)
attr_accessor :meals_of_the_week
def self.with_meals_of_the_week
# First query: load the restaurants
restaurants = Restaurant.all.to_a
week = Time.now.beginning_of_week..Time.now.end_of_week
# Second query: this week's menu items (with their meals), grouped by restaurant
by_restaurant = MenuItem.includes(:meal).where(date: week, restaurant_id: restaurants.map(&:id)).group_by(&:restaurant_id)
# Restaurants without menu items this week get an empty array
restaurants.each { |r| r.meals_of_the_week = by_restaurant.fetch(r.id, []) }
end
end
Update: Rails 4 will raise a deprecation warning when you simply try to put conditions on included models.
Sorry for possible typo.
I think there is some misunderstanding of this:
If there was no where condition, this would generate the normal set of two queries.
If, in the case of this includes query, there were no comments for any
posts, all the posts would still be loaded. By using joins (an INNER
JOIN), the join conditions must match, otherwise no records will be
returned.
[from guides]
I think this statement doesn't refer to the example Post.includes(:comments).where("comments.visible", true)
but to the one without a where clause, Post.includes(:comments).
So everything works correctly! This is how LEFT OUTER JOIN works.
So... you wrote: "If, in the case of this includes query, there were no comments for any posts, all the posts would still be loaded." OK! But this is true ONLY when there is NO where clause! You missed the context of the phrase.

Rails: How to get objects with at least one child?

After googling, browsing SO and reading, there doesn't seem to be a Rails-style way to efficiently get only those Parent objects which have at least one Child object (through a has_many :children relation). In plain SQL:
SELECT *
FROM parents
WHERE EXISTS (
SELECT 1
FROM children
WHERE parent_id = parents.id)
The closest I've come is
Parent.all.reject { |parent| parent.children.empty? }
(based on another answer), but it's really inefficient because it runs a separate query for each Parent.
Parent.joins(:children).uniq.all
Relation#uniq was deprecated in Rails 5.0 and removed in Rails 5.1; distinct should be used instead.
Parent.joins(:children).distinct
This is a follow-up on Chris Bailey's answer. .all is removed as well from the original answer as it doesn't add anything.
The accepted answer (Parent.joins(:children).uniq) generates SQL using DISTINCT, but it can be a slow query. For better performance, you should write SQL using EXISTS:
Parent.where <<-SQL
EXISTS (SELECT * FROM children c WHERE c.parent_id = parents.id)
SQL
EXISTS can be much faster than DISTINCT. For example, here is a Post model which has comments and likes:
class Post < ApplicationRecord
has_many :comments
has_many :likes
end
class Comment < ApplicationRecord
belongs_to :post
end
class Like < ApplicationRecord
belongs_to :post
end
In the database there are 100 posts, each with 50 comments and 50 likes, plus one post that has no comments or likes:
# Create posts with comments and likes
100.times do |i|
post = Post.create!(title: "Post #{i}")
50.times do |j|
post.comments.create!(content: "Comment #{j} for #{post.title}")
post.likes.create!(user_name: "User #{j} for #{post.title}")
end
end
# Create a post without comment and like
Post.create!(title: 'Hidden post')
If you want to get posts which have at least one comment and one like, you might write it like this:
# NOTE: uniq method will be removed in Rails 5.1
Post.joins(:comments, :likes).distinct
The query above generates SQL like this:
SELECT DISTINCT "posts".*
FROM "posts"
INNER JOIN "comments" ON "comments"."post_id" = "posts"."id"
INNER JOIN "likes" ON "likes"."post_id" = "posts"."id"
But this SQL generates 250000 rows (100 posts * 50 comments * 50 likes) and then filters out the duplicated rows, so it can be slow.
In this case you should write it like this:
Post.where <<-SQL
EXISTS (SELECT * FROM comments c WHERE c.post_id = posts.id)
AND
EXISTS (SELECT * FROM likes l WHERE l.post_id = posts.id)
SQL
This query generates SQL like this:
SELECT "posts".*
FROM "posts"
WHERE (
EXISTS (SELECT * FROM comments c WHERE c.post_id = posts.id)
AND
EXISTS (SELECT * FROM likes l WHERE l.post_id = posts.id)
)
This query does not generate useless duplicated rows, so it could be faster.
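The row blow-up is easy to reproduce on a small scale. The sketch below is plain Ruby on made-up data (3 posts with 2 comments and 2 likes each, plus one empty post) and only mimics what the two SQL strategies do; SAMPLE_POSTS, SAMPLE_COMMENTS, SAMPLE_LIKES, join_then_distinct and exists_filter are hypothetical names.

```ruby
SAMPLE_POSTS = [1, 2, 3, 4].freeze # post 4 has no comments or likes
SAMPLE_COMMENTS = [1, 2, 3].flat_map { |id| Array.new(2) { { post_id: id } } }.freeze
SAMPLE_LIKES    = [1, 2, 3].flat_map { |id| Array.new(2) { { post_id: id } } }.freeze

# Mimics the double INNER JOIN followed by DISTINCT: every comment row is
# paired with every like row of the same post before duplicates are removed.
def join_then_distinct(posts, comments, likes)
  joined = posts.flat_map do |post_id|
    cs = comments.select { |c| c[:post_id] == post_id }
    ls = likes.select { |l| l[:post_id] == post_id }
    cs.product(ls).map { post_id }
  end
  { intermediate_rows: joined.size, post_ids: joined.uniq }
end

# Mimics the two EXISTS subqueries: each post is checked once and the
# check can stop at the first matching comment and like.
def exists_filter(posts, comments, likes)
  posts.select do |post_id|
    comments.any? { |c| c[:post_id] == post_id } &&
      likes.any? { |l| l[:post_id] == post_id }
  end
end

p join_then_distinct(SAMPLE_POSTS, SAMPLE_COMMENTS, SAMPLE_LIKES)
# => {intermediate_rows: 12, post_ids: [1, 2, 3]}  (3 posts * 2 comments * 2 likes)
p exists_filter(SAMPLE_POSTS, SAMPLE_COMMENTS, SAMPLE_LIKES)
# => [1, 2, 3]
```

Both approaches return the same posts; the difference is only in how many intermediate rows have to be produced along the way.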
Here is a benchmark:
user system total real
Uniq: 0.010000 0.000000 0.010000 ( 0.074396)
Exists: 0.000000 0.000000 0.000000 ( 0.003711)
It shows EXISTS is about 20 times faster than DISTINCT on this data set.
I pushed the sample application to GitHub, so you can confirm the difference yourself:
https://github.com/JunichiIto/exists-query-sandbox
I have just modified this solution for your need.
Parent.joins("left join children on children.parent_id = parents.id").where("children.parent_id is not null")
You just want an inner join with a distinct qualifier
SELECT DISTINCT parents.*
FROM parents
JOIN children
ON children.parent_id = parents.id
This can be done in standard active record as
Parent.joins(:children).uniq
However, if you want the more complex result of finding all parents with no children,
you need an outer join:
Parent.joins("LEFT OUTER JOIN children ON children.parent_id = parents.id").
where(:children => { :id => nil })
which is a solution that sucks for many reasons. I recommend Ernie Miller's Squeel library, which will allow you to do:
Parent.joins{children.outer}.where{children.id == nil}
Try including the children with #includes():
Parent.includes(:children).all.reject { |parent| parent.children.empty? }
This will make 2 queries:
SELECT * FROM parents;
SELECT * FROM children WHERE parent_id IN (5, 6, 8, ...);
[UPDATE]
The above solution is useful when you need to have the Child objects loaded.
But children.empty? can also use a counter cache to determine the number of children.
For this to work you need to add a new column to the parents table:
# a new migration
def up
change_table :parents do |t|
t.integer :children_count, :default => 0
end
Parent.reset_column_information
Parent.all.each do |p|
Parent.update_counters p.id, :children_count => p.children.length
end
end
def down
change_table :parents do |t|
t.remove :children_count
end
end
Now change your Child model:
class Child < ActiveRecord::Base
belongs_to :parent, :counter_cache => true
end
At this point you can use size and empty? without touching the children table:
Parent.all.reject { |parent| parent.children.empty? }
Note that length doesn't use the counter cache whereas size and empty? do.

Resources