Count records after where clause - ruby-on-rails

I have three models: Catalog, Upload and Product. A product belongs to a catalog, and an upload belongs to a product.
I need to count the number of uploads for all the products of a given catalog.
This is the way I've been doing it so far, which is incredibly slow for a large amount of uploads or products:
@products = Product.where(catalog_id: 123)
@uploads_count = Upload.where(product_id: @products.pluck(:id)).count
I'd like to avoid loading all the products just for a count.
Should I use raw SQL, or is there a better way to do this with ActiveRecord?

This should do it for you:
Upload.joins(:product).where(products: { catalog_id: 123 }).count
Using joins creates an INNER JOIN between the two tables, allowing you to query the products table as above.
Note the singular and plural uses of product - the joins should reflect the association (the upload belongs to one product), while the where clause always uses the table name, typically pluralised.
Because of the .count, the SQL will look similar to:
SELECT COUNT(*) FROM "uploads"
INNER JOIN "products"
ON "products"."id" = "uploads"."product_id"
WHERE "products"."catalog_id" = 123
If you need to have more information on the catalog you can also include this, something like the following:
Upload.joins(product: :catalog).where(products: { catalogs: { whatever: 'you want to query' } }).count
Bear in mind, using joins is just for a query such as this. If you need to access attributes of the product or catalog, you should use another approach, such as includes, to preload the data and avoid N + 1 queries. There's a good read here if you're interested.

Another way to avoid loading records is to use a subquery. This can be done as follows:
query = User.where(id: 1..100)
User.where(id: query.select(:id)).count
# [DEBUG] (10.5ms) SELECT COUNT(*) FROM "users" WHERE "users"."id" IN (SELECT "users"."id" FROM "users" WHERE ("users"."id" BETWEEN $1 AND $2)) [["id", 1], ["id", 100]]
# => 33
So User.where(id: 1..100) builds a query that can be used as a sub-select, and .select(:field) picks the column that the sub-select returns.
Though for a basic count, SRack provides a good answer.

Related

Additive scope conditions for has_many :through

I want a user to be able to find all posts that have one or more tags. And I'd like the tags to be additive criteria, so for example you could search for posts that have just the 'News' tag, or you could search for posts that have both the 'News' and 'Science' tags.
Currently what I have, and it works, is a Post model, a Tag model, and a join model called Marking. Post has_many :tags, through: :markings. I get what I need by passing an array of Tag ids to a Post class method:
post.rb
def self.from_tag_id_array array
  post_array = []
  Marking.where(tag_id: array).group_by(&:post_id).each do |p_id,m_array|
    post_array << p_id if m_array.map(&:tag_id).sort & array.sort == array.sort
  end
  where id: post_array
end
This seems like a clunky way to get there. Is there a way I can do this with a scope on an association or something of the like?
So the general rule of thumb with building these kinds of queries is to minimize work in "Ruby-land" and maximize work in "Database-land". In your solution above, you're fetching every marking whose tag is in the array, which presumably will be a very large set (all posts that have any of those tags). That set is loaded into a Ruby array and processed there: group_by runs in Ruby-world, while group is the equivalent in Database-land.
So aside from being hard-to-read, that solution is going to be slow for any large set of markings.
There are a couple ways to solve the problem without doing any heavy lifting in Ruby-world. One way is using subqueries, like this:
scope :with_tag_ids, ->(tag_ids) {
  tag_ids.map { |tag_id|
    joins(:markings).where(markings: { tag_id: tag_id })
  }.reduce(all) { |scope, subquery| scope.where(id: subquery) }
}
This generates a query like this (for tag_ids 5 and 8):
SELECT "posts".*
FROM "posts"
WHERE "posts"."id" IN (SELECT "posts"."id" FROM "posts" INNER JOIN "markings" ON "markings"."post_id" = "posts"."id" WHERE "markings"."tag_id" = 5)
AND "posts"."id" IN (SELECT "posts"."id" FROM "posts" INNER JOIN "markings" ON "markings"."post_id" = "posts"."id" WHERE "markings"."tag_id" = 8)
Note that since everything here is calculated directly in SQL, no arrays are generated or processed in Ruby. This will generally scale much better.
Alternatively, you can use COUNT and do it in a single query without subqueries:
scope :with_tag_ids, ->(tag_ids) {
  joins(:markings).where(markings: { tag_id: tag_ids }).
    group('posts.id').having('COUNT(posts.id) = ?', tag_ids.count)
}
Which generates SQL like this:
SELECT "posts".*
FROM "posts"
INNER JOIN "markings" ON "markings"."post_id" = "posts"."id"
WHERE "markings"."tag_id" IN (5, 8)
GROUP BY posts.id
HAVING (COUNT(posts.id) = 2)
This assumes that you don't have multiple markings with the same pair of tag_id and post_id, which would throw off the count.
I would imagine that the last solution is probably the most efficient, but you should try different solutions and see what works best for your data.
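To see why the COUNT-based approach depends on unique (post_id, tag_id) pairs, the same logic can be modelled in plain Ruby. This is only a sketch: the marking rows are made up, and post_ids_with_all_tags is a hypothetical helper that mimics what the database does, not a replacement for the scope.

```ruby
# Made-up marking rows as [post_id, tag_id] pairs (not from the question).
MARKINGS = [
  [1, 5], [1, 8],  # post 1 has both tags
  [2, 5],          # post 2 has only tag 5
  [3, 8], [3, 9],  # post 3 has tag 8 plus an unrelated tag
  [4, 5], [4, 5]   # post 4 has a duplicated (post_id, tag_id) pair
].freeze

# Mimics: WHERE tag_id IN (...) GROUP BY post_id HAVING COUNT(*) = n
def post_ids_with_all_tags(markings, tag_ids, dedupe: false)
  rows = markings.select { |_post_id, tag_id| tag_ids.include?(tag_id) }
  rows = rows.uniq if dedupe # comparable to counting DISTINCT tag ids per post
  rows.group_by(&:first)
      .select { |_post_id, group| group.size == tag_ids.size }
      .keys
end

p post_ids_with_all_tags(MARKINGS, [5, 8])               # => [1, 4]
p post_ids_with_all_tags(MARKINGS, [5, 8], dedupe: true) # => [1]
```

Post 4 slips through in the first call because its duplicated marking is counted twice. If duplicates can occur in your data, a unique index on the pair or a condition like HAVING COUNT(DISTINCT markings.tag_id) = n is one way to guard against that.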
See also: Query intersection with activerecord

Rails group_by and sorting upon field in related model

I have a User that belongs to a User_type. In the user_type model, there's a field called position to handle the default sorting when displaying user_types and their users.
Unfortunately this does not work when searching with Ransack, so I need to search from the User model and use group_by to group the records based on their user_type_id.
This works perfectly, but I need a way to respect the sorting that is defined in the user_type model. This is also dynamic, so there's no way of telling what the sorting is from the user model.
Therefore I think I need to loop through the group_by array and do the sorting manually. But I have no clue where to start. This is the controller method:
def index
  @q = User.ransack(params[:q])
  @users = @q.result(distinct: true).group_by &:user_type
end
How do I manipulate that array to sort on a field in the related model?
Try adding this line to the UserType model:
default_scope { order('position') }
First of all, there is an N+1 query problem. You are not joining the user_types table to users, so the application runs one SELECT to fetch the users and then a SELECT on user_types for each of the n users:
...
UserType Load (0.2ms) SELECT "user_types".* FROM "user_types" WHERE "user_types"."id" = $1 LIMIT 1 [["id", 29]]
UserType Load (0.2ms) SELECT "user_types".* FROM "user_types" WHERE "user_types"."id" = $1 LIMIT 1 [["id", 7]]
...
So you need to include user_types and order by user_types.position:
@q.result(distinct: true).includes(:user_type).order('user_types.position')
There are a lot of examples for ordering here:
http://apidock.com/rails/ActiveRecord/QueryMethods/order
Your case (ordering on associations) is covered there as well.
Information about n+1 query:
What is SELECT N+1?
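The difference between the N+1 pattern and preloading can be sketched without a database. Everything here is an assumption for illustration: FakeDb is a hypothetical in-memory stand-in that only counts "queries", and the sample users and positions are made up.

```ruby
# Hypothetical in-memory stand-in for the database; it only counts "queries".
class FakeDb
  USER_TYPES = { 7 => { position: 2 }, 29 => { position: 1 } }.freeze
  USERS = [
    { name: 'ann', user_type_id: 7 },
    { name: 'bob', user_type_id: 29 },
    { name: 'cat', user_type_id: 7 }
  ].freeze

  attr_reader :query_count

  def initialize
    @query_count = 0
  end

  # One query that returns every user.
  def users
    @query_count += 1
    USERS
  end

  # One query per call, like the repeated UserType Load lines in the log.
  def user_type(id)
    @query_count += 1
    USER_TYPES.fetch(id)
  end

  # One query for many ids, like WHERE id IN (...).
  def user_types(ids)
    @query_count += 1
    USER_TYPES.slice(*ids)
  end
end

# N+1: look up the type separately for each user.
def sorted_names_lazily(db)
  db.users.sort_by { |u| [db.user_type(u[:user_type_id])[:position], u[:name]] }
          .map { |u| u[:name] }
end

# Preloading: fetch all needed types once, then sort in memory.
def sorted_names_eagerly(db)
  users = db.users
  types = db.user_types(users.map { |u| u[:user_type_id] }.uniq)
  users.sort_by { |u| [types.fetch(u[:user_type_id])[:position], u[:name]] }
       .map { |u| u[:name] }
end

lazy = FakeDb.new
eager = FakeDb.new
p sorted_names_lazily(lazy), lazy.query_count    # ["bob", "ann", "cat"] then 4
p sorted_names_eagerly(eager), eager.query_count # ["bob", "ann", "cat"] then 2
```

Both versions produce the same ordering, but the lazy one costs one query per user on top of the initial load, while the preloading one stays at two queries however many users there are.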

Combine a join query with a normal condition

Users have a main category and multiple sub-categories
I would like to get all users who belong to a category, regardless if it is their main or sub.
@users = User.joins(:sub_categories).where('sub_category_id = ? OR type = ?', @sub_category.id, "User::#{category}User").page(params[:page])
A user's main category is also their STI type.
The query works, however I am getting duplicate results when trying to include the user's main type.
The query that is generated is:
User Load (0.0ms) SELECT "users".* FROM "users" INNER JOIN "user_sub_categories_users" ON "user_sub_categories_users"."user_id" = "users"."id" INNER JOIN "user_sub_categories" ON "user_sub_categories"."id" = "user_sub_categories_users"."sub_category_id" WHERE "users"."deleted_at" IS NULL AND (sub_category_id = 1 OR type = 'User::ModelUser') ORDER BY "users"."created_at" DESC LIMIT 20 OFFSET 0
EDIT: A user cannot belong to a sub-category if it's their main one, so it's safe to simply join the two conditions together.
Because there is more than one group for each user, your join is creating multiple rows for each user, e.g.:
User  | Group
------|-------
UserA | GroupA
UserA | GroupB
UserB | GroupA
UserC | GroupA
UserC | GroupB
Three users, but five rows!
If you're only interested in the distinct users, you can safely add .uniq to the end of your query. On an ActiveRecord relation, .uniq is translated to SQL's DISTINCT keyword; newer Rails versions name this method .distinct.
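The effect of the join on the row count can be reproduced in plain Ruby. This is just a sketch using the sample rows from the table above; the constant names are made up for the illustration.

```ruby
# One [user, group] pair per row the join produces (sample data from the table).
JOINED_ROWS = [
  %w[UserA GroupA],
  %w[UserA GroupB],
  %w[UserB GroupA],
  %w[UserC GroupA],
  %w[UserC GroupB]
].freeze

# Roughly what SELECT users.* returns after the join: one entry per joined row.
USERS_FROM_JOIN = JOINED_ROWS.map(&:first).freeze

# Roughly what SELECT DISTINCT users.* returns: each user once.
DISTINCT_USERS = USERS_FROM_JOIN.uniq.freeze

puts USERS_FROM_JOIN.size # => 5
puts DISTINCT_USERS.size  # => 3
```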

How to eager load child model's sum value for ruby on rails?

I have an Order model, it has many items, it looks like this
class Order < ActiveRecord::Base
  has_many :items
  def total
    items.sum('price * quantity')
  end
end
And I have an order index view, querying order table like this
def index
  @orders = Order.includes(:items)
end
Then, in the view, I access the total of each order. As a result, you will see tons of SUM queries like this:
SELECT SUM(price * quantity) FROM "items" WHERE "items"."order_id" = $1 [["order_id", 1]]
SELECT SUM(price * quantity) FROM "items" WHERE "items"."order_id" = $1 [["order_id", 2]]
SELECT SUM(price * quantity) FROM "items" WHERE "items"."order_id" = $1 [["order_id", 3]]
...
It's pretty slow to load order.total one by one. How can I load the sums eagerly in a single query while still being able to call order.total just like before?
Try this:
subquery = Order.joins(:items).select('orders.id, sum(items.price * items.quantity) AS total').group('orders.id')
@orders = Order.includes(:items).joins("INNER JOIN (#{subquery.to_sql}) totals ON totals.id = orders.id")
This will create a subquery that sums the total of the orders, and then you join that subquery to your other query.
I wrote up two options for this in this blog post on using find_by_sql or joins to solve this.
For your example above, using find_by_sql you could write something like this:
Order.find_by_sql("select
orders.id,
SUM(items.price * items.quantity) as total
from orders
join items
on orders.id = items.order_id
group by
orders.id")
Using joins, you could rewrite as:
Order.all.select("orders.id, SUM(items.price * items.quantity) as total").joins(:items).group("orders.id")
Include all the fields you want in your select list in both the select clause and the group by clause. Hope that helps!
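What the grouped SUM computes can be sketched in plain Ruby: one pass over all items that builds a lookup of totals keyed by order id. The item rows and the totals_by_order helper below are made up for illustration. As far as I know, a grouped calculation such as Item.group(:order_id).sum('price * quantity') returns a similar hash from a single query, which is one way to get these numbers without a per-order SUM.

```ruby
# Made-up item rows; the column names follow the question's schema.
ITEMS = [
  { order_id: 1, price: 10, quantity: 2 },
  { order_id: 1, price: 5,  quantity: 1 },
  { order_id: 2, price: 3,  quantity: 4 }
].freeze

# One pass over all items, in the spirit of
# SELECT order_id, SUM(price * quantity) ... GROUP BY order_id.
def totals_by_order(items)
  items.each_with_object(Hash.new(0)) do |item, totals|
    totals[item[:order_id]] += item[:price] * item[:quantity]
  end
end

TOTALS = totals_by_order(ITEMS)
puts TOTALS[1] # => 25
puts TOTALS[2] # => 12
puts TOTALS[3] # => 0 (an order without items falls back to the default)
```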

Rails 4 Eager load limit subquery

Is there a way to avoid the n+1 problem when eager loading and also applying a limit to the subquery?
I want to avoid lots of sql queries like this:
Category.all.each do |category|
  category.posts.limit(10)
end
But I also want to only get 10 posts per category, so the standard eager loading, which gets all the posts, does not suffice:
Category.includes(:posts).all
What is the best way to solve this problem? Is N+1 the only way to limit the amount of posts per category?
From the Rails docs
If you eager load an association with a specified :limit option, it will be ignored, returning all the associated objects
So given the following model definition
class Category < ActiveRecord::Base
  has_many :posts
  has_many :included_posts, -> { limit 10 }, class_name: "Post"
end
Calling Category.find(1).included_posts works as expected and applies the limit of 10 in the query. However, if you try Category.includes(:included_posts).all, the limit option is ignored. You can see why if you look at the SQL generated by an eager load:
Category.includes(:posts).all
Category Load (0.2ms) SELECT "categories".* FROM "categories"
Post Load (0.4ms) SELECT "posts".* FROM "posts" WHERE "posts"."category_id" IN (1, 2, 3)
If you added the LIMIT clause to the posts query, it would return a total of 10 posts and not 10 posts per category as you might expect.
Getting back to your problem, I would eager load all posts and then limit the loaded collection using first(10)
categories = Category.includes(:posts).all
categories.first.posts.first(10)
Although you're loading more models into memory, this is usually faster since you're making only 2 calls against the database instead of n+1. Cheers.
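The difference between one LIMIT on the eager-load query and a limit inside each category can be shown with plain arrays. This is a sketch under made-up data: 3 categories with 15 posts each, represented as [category_id, post_id] pairs.

```ruby
# Made-up rows for the posts query: 3 categories with 15 posts each.
POSTS = (1..3).flat_map do |category_id|
  (1..15).map { |n| [category_id, category_id * 100 + n] }
end.freeze

# A single LIMIT 10 applies to the whole result set ...
GLOBAL_LIMIT = POSTS.first(10).freeze

# ... whereas the goal is up to 10 posts inside each category,
# which is what first(10) on each loaded collection gives you.
PER_CATEGORY = POSTS.group_by(&:first)
                    .transform_values { |rows| rows.first(10) }
                    .freeze

puts GLOBAL_LIMIT.map(&:first).uniq.inspect  # => [1]
puts PER_CATEGORY.values.map(&:size).inspect # => [10, 10, 10]
```

With the global limit, all 10 rows here happen to come from the first category, so the other categories would end up with no posts at all.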

Resources