Nested SQL SELECT in Rails 4 - ruby-on-rails

I'm looking for a way to generate the following SQL in Rails (to make it a scope), so that I could chain it with further scopes (e.g. Article.published.most_comments):
SELECT *, cs.count
FROM articles, (
SELECT article_id, count(*)
FROM comments
GROUP BY comments.article_id
) cs
WHERE articles.id = cs.article_id
ORDER BY cs.count DESC;
I've tried something along the lines of Article.joins(:comments).select('*').group('comments.article_id'), but that doesn't generate the desired SQL:
SELECT * FROM "articles"
INNER JOIN "comments" ON "comments"."article_id" = "articles"."id"
GROUP BY comments.article_id
(PSQL): PG::GroupingError: ERROR: column "articles.id"
must appear in the GROUP BY clause or be used in
an aggregate function
And there doesn't appear to be a .from method in which I could specify the nested SQL SELECT.

Actually, there's a .from method:
scope :most_comments, -> {
Article.select('*, cs.count').from(
'articles, (
SELECT article_id, count(*)
FROM comments
GROUP BY comments.article_id
) cs'
)
.where('articles.id = cs.article_id')
.order('cs.count DESC')
}
Not sure if this is the best way but it works...
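One thing worth noting about the query above: because articles are matched against the subquery with articles.id = cs.article_id, articles without any comments never show up in the result. Here is a minimal plain-Ruby sketch of the same semantics. ArticleRow, CommentRow and the sample data are hypothetical in-memory stand-ins for the real tables, not part of the original code:

```ruby
# Hypothetical in-memory stand-ins for the articles and comments tables.
ArticleRow = Struct.new(:id, :title)
CommentRow = Struct.new(:article_id)

# Mirrors the SQL: count comments per article_id, keep only articles that
# have a matching count row (inner-join behaviour), order by count descending.
def most_commented(articles, comments)
  counts = Hash.new(0)
  comments.each { |c| counts[c.article_id] += 1 }
  articles.select { |a| counts.key?(a.id) }
          .sort_by { |a| -counts[a.id] }
end

def sample_articles
  [ArticleRow.new(1, 'One'), ArticleRow.new(2, 'Two'), ArticleRow.new(3, 'Three')]
end

def sample_comments
  [CommentRow.new(2), CommentRow.new(1), CommentRow.new(2)]
end

# Article 3 has no comments, so it is absent from the result.
puts most_commented(sample_articles, sample_comments).map(&:id).inspect
```

If articles with zero comments should be listed too, the usual SQL-side change is a LEFT JOIN onto the subquery with COALESCE(cs.count, 0).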

Related

How can I make a recursive CTE query in an ActiveRecord scope?

I have a simple table to represent a hierarchy of organizations:
CREATE TABLE public.organizations (
id integer NOT NULL,
name character varying(255) NOT NULL,
parent_id integer,
deleted_at timestamp without time zone
)
The objective is, given some query for some organizations, to expand those results to include all the descendants. In Postgres, this can be done with a recursive CTE, like this:
WITH RECURSIVE "all_orgs" AS (
SELECT "organizations".*
FROM "organizations"
/* some filter, maybe some joins here */
UNION
SELECT "organizations".*
FROM "organizations"
INNER JOIN "all_orgs" ON "all_orgs"."id" = "organizations"."parent_id"
)
SELECT "all_orgs".* FROM "all_orgs"
I would like to program a reusable way to include all the descendants for any arbitrary initial set of organizations. So naturally, I attempted to implement a scope:
class Organization < ApplicationRecord
scope :recursive_child_orgs, ->(root_orgs) do
all_orgs = Arel::Table.new(:all_orgs)
join_constraint = Arel::Nodes::On.new(all_orgs[:id].eq(arel_table[:parent_id]))
join_node = Arel::Nodes::InnerJoin.new(all_orgs, join_constraint)
child_orgs = joins(join_node)
union = root_orgs.arel.union(child_orgs.arel)
recursive_cte = Arel::SelectManager.new(all_orgs).tap do |sm|
sm.with(:recursive, Arel::Nodes::As.new(all_orgs, union))
sm.project(all_orgs[Arel::star])
end
from(recursive_cte.as(table_name))
end
end
It would be used like Organization.recursive_child_orgs(Organization.where(id: 3)). (In practice, the inner part of that expression would be something less trivial.)
This is tantalizingly close, generating the query:
SELECT "organizations".* FROM (
WITH RECURSIVE "all_orgs" AS (
SELECT "organizations".*
FROM "organizations"
WHERE "organizations"."deleted_at" IS NULL
AND "organizations"."id" = /* note: missing value! */
UNION
SELECT "organizations".*
FROM "organizations"
INNER JOIN "all_orgs" ON "all_orgs"."id" = "organizations"."parent_id"
WHERE "organizations"."deleted_at" IS NULL
)
SELECT "all_orgs".* FROM "all_orgs"
) organizations
WHERE "organizations"."deleted_at" IS NULL;
The default scope for Organization was included, nice! And the whole CTE mess is bundled up as a thing named organizations, which means it looks to ActiveRecord to be the organizations table, so adding sorting and whatnot after the recursive bit should work.
But unfortunately, the bound parameter that was introduced with Organization.where(id: 3) does not survive the translation, and a syntactically invalid statement is generated as a result. How might this be fixed?
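This does not address the bound-parameter problem, but when checking whatever fix you end up with it may help to have a reference for what the recursive CTE is supposed to return. Below is a plain-Ruby sketch of the same expansion (start from the roots, keep adding rows whose parent_id is already in the result). OrgRow and sample_orgs are hypothetical in-memory stand-ins for the table, and the deleted_at filter is left out:

```ruby
require 'set'

# Hypothetical in-memory stand-in for rows of the organizations table.
OrgRow = Struct.new(:id, :name, :parent_id)

# Expands a set of root ids to include every descendant, mirroring what the
# recursive CTE computes: seed with the roots, then repeatedly add rows whose
# parent_id is already in the result until no new rows appear.
def with_descendants(orgs, root_ids)
  found = Set.new(root_ids)
  frontier = found.dup
  until frontier.empty?
    children = orgs.select { |o| frontier.include?(o.parent_id) }.map(&:id)
    frontier = Set.new(children) - found # UNION semantics: skip rows already seen
    found.merge(frontier)
  end
  orgs.select { |o| found.include?(o.id) }
end

def sample_orgs
  [
    OrgRow.new(1, 'Root', nil),
    OrgRow.new(2, 'Other', nil),
    OrgRow.new(3, 'Division', 1),
    OrgRow.new(4, 'Team', 3),
    OrgRow.new(5, 'Squad', 4),
    OrgRow.new(6, 'Unrelated', 2)
  ]
end

# Starting from organization 3, the expansion picks up 4 and then 5.
puts with_descendants(sample_orgs, [3]).map(&:id).inspect
```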

Remove duplicated records keeping last using ActiveRecord

I've been trying to remove the records that are duplicated (same value in the column shopify_order_id) keeping the most recent one.
I wrote it in sql:
select orders.id from (
select shopify_order_id, min(shopify_created_at) as min_created
from orders group by shopify_order_id having count(*) > 1 limit 5000
) as keep_orders
join orders
on
keep_orders.shopify_order_id = orders.shopify_order_id and
orders.shopify_created_at <> keep_orders.min_created
and now I'm trying to get it to Active Record but can't seem to join the two parts.
The first nested select is
keep_orders = Order.select('shopify_order_id, MIN(shopify_created_at) as min_created').
group(:shopify_order_id).
having('count(*) > 1').
limit(5000)
but then the following doesn't work:
Order.select('orders.id').from(keep_orders, :keep_orders).
joins('orders ON keep_orders.shopify_order_id = orders.shopify_order_id').
where.not('orders.shopify_created_at = keep_orders.min_created')
it builds the query:
SELECT orders.id FROM (SELECT shopify_order_id, MIN(shopify_created_at) as min_created FROM "orders" GROUP BY "orders"."shopify_order_id" HAVING (count(*) > 1) LIMIT $1) keep_orders orders ON keep_orders.shopify_order_id = orders.shopify_order_id WHERE NOT (orders.shopify_created_at = keep_orders.min_created) ORDER BY "orders"."id" ASC LIMIT $2 [["LIMIT", 5000], ["LIMIT", 1]]
which is missing the keyword join.
Any help on how to refactor the query/do it in another way would be more than appreciated.
If you call joins with a string SQL fragment you need to specify the type of join you want:
Order.select('orders.id').from(keep_orders, :keep_orders)
.joins('JOIN orders ON keep_orders.shopify_order_id = orders.shopify_order_id')
.where.not('orders.shopify_created_at = keep_orders.min_created')
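A side note on the query itself: as written it uses MIN(shopify_created_at), so the row that survives is the earliest one per shopify_order_id, while the question asks to keep the most recent; MAX would match that. A plain-Ruby sketch of which ids get selected, with the keep rule as a parameter. OrderRow, duplicate_ids and the sample data are hypothetical names for illustration, and the LIMIT 5000 is ignored:

```ruby
# Hypothetical in-memory stand-in for rows of the orders table.
OrderRow = Struct.new(:id, :shopify_order_id, :shopify_created_at)

# Returns the ids the query would select: for every shopify_order_id that
# occurs more than once, all rows except those at the kept timestamp.
# keep: :min mirrors the original SQL (earliest row survives);
# keep: :max keeps the most recent row instead.
def duplicate_ids(rows, keep: :min)
  rows.group_by(&:shopify_order_id).flat_map do |_, group|
    next [] if group.size < 2 # HAVING count(*) > 1
    kept_time = group.map(&:shopify_created_at).public_send(keep)
    # Same as "shopify_created_at <> kept value" in the join condition.
    group.reject { |r| r.shopify_created_at == kept_time }.map(&:id)
  end
end

def sample_orders
  [
    OrderRow.new(1, 'A', Time.utc(2020, 1, 1)),
    OrderRow.new(2, 'A', Time.utc(2020, 1, 2)),
    OrderRow.new(3, 'A', Time.utc(2020, 1, 3)),
    OrderRow.new(4, 'B', Time.utc(2020, 1, 1))
  ]
end

puts duplicate_ids(sample_orders).inspect            # ids to delete if the earliest is kept
puts duplicate_ids(sample_orders, keep: :max).inspect # ids to delete if the latest is kept
```

Note that rows sharing the exact kept timestamp all survive, in the sketch as in the SQL, so a tie-breaker on id may be needed if timestamps can collide.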

How to eager load child model's sum value for ruby on rails?

I have an Order model, it has many items, it looks like this
class Order < ActiveRecord::Base
has_many :items
def total
items.sum('price * quantity')
end
end
And I have an order index view, querying order table like this
def index
@orders = Order.includes(:items)
end
Then, in the view, I access the total of each order. As a result, you will see tons of SUM queries like this:
SELECT SUM(price * quantity) FROM "items" WHERE "items"."order_id" = $1 [["order_id", 1]]
SELECT SUM(price * quantity) FROM "items" WHERE "items"."order_id" = $1 [["order_id", 2]]
SELECT SUM(price * quantity) FROM "items" WHERE "items"."order_id" = $1 [["order_id", 3]]
...
Loading order.total one by one is pretty slow. I wonder how I can load the sums eagerly in a single query while still being able to access order.total just like before.
Try this:
subquery = Order.joins(:items).select('orders.id, sum(items.price * items.quantity) AS total').group('orders.id')
@orders = Order.includes(:items).joins("INNER JOIN (#{subquery.to_sql}) totals ON totals.id = orders.id")
This will create a subquery that sums the total of the orders, and then you join that subquery to your other query.
I wrote up two options for this in a blog post: using find_by_sql or using joins.
For your example above, using find_by_sql you could write something like this:
Order.find_by_sql("select
orders.id,
SUM(items.price * items.quantity) as total
from orders
join items
on orders.id = items.order_id
group by
orders.id")
Using joins, you could rewrite as:
Order.all.select("orders.id, SUM(items.price * items.quantity) as total").joins(:items).group("orders.id")
Include all the fields you want in your select list in both the select clause and the group by clause. Hope that helps!
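The idea behind both answers is the same: compute every order's total in one grouped pass instead of one SUM per order. A plain-Ruby sketch of that idea follows. ItemRow, totals_by_order and the sample data are hypothetical in-memory stand-ins, not Rails code:

```ruby
# Hypothetical in-memory stand-in for rows of the items table.
ItemRow = Struct.new(:order_id, :price, :quantity)

# One pass over all items, like a single
#   SELECT order_id, SUM(price * quantity) FROM items GROUP BY order_id
# instead of one SUM query per order.
def totals_by_order(items)
  totals = Hash.new(0) # looking up an order without items yields 0
  items.each { |i| totals[i.order_id] += i.price * i.quantity }
  totals
end

def sample_items
  [ItemRow.new(1, 10, 2), ItemRow.new(1, 5, 1), ItemRow.new(2, 3, 3)]
end

totals = totals_by_order(sample_items)
puts totals[1] # order 1: 10*2 + 5*1
puts totals[3] # order 3 has no items
```

In Rails, Item.group(:order_id).sum('price * quantity') returns a hash of this shape in a single query. Also keep in mind that the INNER JOIN variants above drop orders that have no items at all.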

column "users.id" must appear in the GROUP BY clause or be used in an aggregate function

Relationships:
Item belongs to Product
Product belongs to User
Item model scope:
scope :search, ->(search_term) {
select('products.name, users.*, products.brand, COUNT(products.id)')
.joins(:product => :user)
.where('users.name = ? OR products.brand = ?', search_term, search_term)
.group('products.id')
}
The above results in the following SQL statement:
SELECT products.name, users.*, products.brand, COUNT(products.id) FROM "items"
INNER JOIN "products" ON "products"."id" = "items"."product_id"
INNER JOIN "users" ON "users"."id" = "products"."user_id"
WHERE (users.name = 'Atsuete Lipstick' OR products.brand = 'Atsuete Lipstick')
GROUP BY products.id
The problem here is that an error occurs:
ActiveRecord::StatementInvalid: PG::Error: ERROR: column "users.id"
must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT products.name, users.*, products.brand, COUNT(product...
What could be a fix for this?
From the error you can see that you should try including users.id in the GROUP BY clause:
.group('products.id, users.id')

Nested query in squeel

Short version: How do I write this query in squeel?
SELECT OneTable.*, my_count
FROM OneTable JOIN (
SELECT DISTINCT one_id, count(*) AS my_count
FROM AnotherTable
GROUP BY one_id
) counts
ON OneTable.id=counts.one_id
Long version: rocket_tag is a gem that adds simple tagging to models. It adds a method tagged_with. Supposing my model is User, with an id and name, I could invoke User.tagged_with ['admin','sales']. Internally it uses this squeel code:
select{count(~id).as(tags_count)}
.select("#{self.table_name}.*").
joins{tags}.
where{tags.name.in(my{tags_list})}.
group{~id}
Which generates this query:
SELECT count(users.id) AS tags_count, users.*
FROM users INNER JOIN taggings
ON taggings.taggable_id = users.id
AND taggings.taggable_type = 'User'
INNER JOIN tags
ON tags.id = taggings.tag_id
WHERE tags.name IN ('admin','sales')
GROUP BY users.id
Some RDBMSs are happy with this, but postgres complains:
ERROR: column "users.name" must appear in the GROUP BY
clause or be used in an aggregate function
I believe a more agreeable way to write the query would be:
SELECT users.*, tags_count FROM users INNER JOIN (
SELECT DISTINCT taggable_id, count(*) AS tags_count
FROM taggings INNER JOIN tags
ON tags.id = taggings.tag_id
WHERE tags.name IN ('admin','sales')
GROUP BY taggable_id
) tag_counts
ON users.id = tag_counts.taggable_id
Is there any way to express this using squeel?
I wouldn't know about Squeel, but the error you see could be fixed by upgrading PostgreSQL.
Starting with PostgreSQL 9.1, once you list a primary key in the GROUP BY you can skip additional columns for this table and still use them in the SELECT list. The release notes for version 9.1 tell us:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause
BTW, your alternative query can be simplified: the additional DISTINCT is redundant, since GROUP BY one_id already yields one row per one_id.
SELECT o.*, c.my_count
FROM onetable o
JOIN (
SELECT one_id, count(*) AS my_count
FROM anothertable
GROUP BY one_id
) c ON o.id = c.one_id
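To make the GROUP BY rule discussed here (and in the users.id question further up) concrete, here is a small plain-Ruby model of it. This is a deliberate simplification for illustration, not how PostgreSQL actually implements the check: ungrouped_columns and PRIMARY_KEYS are hypothetical names, columns are plain "table.column" strings, and only the 9.1+ behaviour is modelled:

```ruby
# Simplified model (an assumption for illustration, not the server's real
# algorithm): a selected column is accepted when it is listed in GROUP BY,
# or when the primary key of its own table is listed in GROUP BY (9.1+).
def ungrouped_columns(selected, group_by, primary_keys)
  selected.reject do |column|
    table = column.split('.').first
    group_by.include?(column) || group_by.include?(primary_keys[table])
  end
end

# Hypothetical primary keys for the tables used in the examples.
PRIMARY_KEYS = { 'users' => 'users.id', 'products' => 'products.id' }.freeze

# GROUP BY products.id covers products.* but not users.id:
puts ungrouped_columns(%w[products.name users.id products.brand],
                       %w[products.id], PRIMARY_KEYS).inspect

# Adding users.id to GROUP BY leaves nothing to complain about:
puts ungrouped_columns(%w[products.name users.id products.brand],
                       %w[products.id users.id], PRIMARY_KEYS).inspect
```

The same model explains why GROUP BY users.id is enough for SELECT users.* on 9.1 and later: every users column depends on the primary key that is being grouped on.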
