Rails join table and multi sum - ruby-on-rails

I want to join a table and sum three columns.
self.document_products.joins("JOIN products ON products.id = document_products.product_id").group("products.tax_id").select("sum(a), sum(b), sum(c)")
Gives me
#<ActiveRecord::Relation [#<DocumentProduct id: nil>]>
Something like that works:
self.document_products.joins("JOIN products ON products.id = document_products.product_id").group("products.tax_id").sum("a")
But I want to have 3 sums. I can't do sum("a, b, c"). Where is the problem?

The code builds a SQL query using ActiveRecord's chained method syntax. You can append .to_sql to most such chains (as long as the result is still an ActiveRecord relation rather than, say, an Array) to see the generated SQL, or you can inspect the log if logging is on. Consider the common part of the chain:
self.document_products.joins("JOIN products ON products.id = document_products.product_id").group("products.tax_id")
This generates something like (might not be exact, because I'm guessing a little about your application):
SELECT "document_products".* FROM "document_products" JOIN products ON products.id = document_products.product_id WHERE "document_products"."document_id" = 1497 GROUP BY products.tax_id
The two final methods you list are very different: select chooses which columns the query returns, whereas sum is an aggregate calculation that expects a single value to be returned for each group. With select, something like the following is generated:
SELECT SUM(products.a), SUM(products.b), SUM(products.c) FROM "document_products" JOIN products ON products.id = document_products.product_id WHERE "document_products"."document_id" = 1497 GROUP BY products.tax_id
When this query is interpreted, the expected data cannot be found, leading to the problem described. Ensuring that the GROUP BY column is included in the SELECT list, however, yields the necessary information. Try something like this:
self.document_products.joins("JOIN products ON products.id = document_products.product_id").group("products.tax_id").select("products.tax_id, sum(a), sum(b), sum(c)")
This generates SQL something like:
SELECT products.tax_id, SUM(products.a), SUM(products.b), SUM(products.c) FROM "document_products" JOIN products ON products.id = document_products.product_id WHERE "document_products"."document_id" = 1497 GROUP BY products.tax_id
This appears to return the necessary information, and is, I think, what you're looking for (or close to it).
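To make the grouping concrete, here is a minimal plain-Ruby sketch of what the final query computes. The rows and their values are hypothetical stand-ins for document_products joined to products; no database or ActiveRecord is involved.

```ruby
# Hypothetical rows standing in for document_products joined to products.
rows = [
  { tax_id: 1, a: 10, b: 1, c: 100 },
  { tax_id: 1, a: 5,  b: 2, c: 50 },
  { tax_id: 2, a: 7,  b: 3, c: 70 }
]

# GROUP BY tax_id, then SUM(a), SUM(b), SUM(c) within each group.
sums_by_tax = rows.group_by { |r| r[:tax_id] }.map do |tax_id, group|
  {
    tax_id: tax_id,
    sum_a: group.sum { |r| r[:a] },
    sum_b: group.sum { |r| r[:b] },
    sum_c: group.sum { |r| r[:c] }
  }
end
```

Each result row carries its tax_id alongside the three sums, which is why selecting the grouping column matters: without it, the sums cannot be matched back to a group.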

Related

Rails: How to force ActiveRecord generate alias for an association every time (just like Hibernate in Java does it), not only when it's ambiguous?

I work on a project where there is an STI Item with 5 subclasses (Item1, Item2 ... Item5). This STI (items table) is mapped over a join table item_parents to a Parent record (parents table). The mapping is done via has_many through:.
Each item has two fields, name and code; both are strings. Parent has many fields, but for the sake of example let's say it has name and created_at.
On the frontend, they are displayed in one table, like this:
Parent.name | Parent.created_at | Item1.name | Item1.code | Item2.name | Item2.code | ...
Users can configure filtering for each of the columns. It can be any combination or no filter at all. For example, they can choose the following combination:
Parent.created_at before 2020.02.22
Item1.name containing 'abc'
Item2.name containing 'xyz'
Item3.code equals 'Z12'
The filtering code is implemented like this:
def search(filters)
  # Fold every filter into the relation, starting from all parents.
  filters.reduce(Parent.all) { |query, (key, value)| apply_filter(query, key, value) }
end
def apply_filter(query, key, value)
  case key # the original read `filter_key`, which is not defined in this method
  when :parent_name_contains
    query.where(Parent.arel_table[:name].matches("%#{value}%"))
  when :parent_created_at_before
    query.where(Parent.arel_table[:created_at].lt(value))
  when :item1_name_contains
    query.joins(:item1s).where(Item1.arel_table[:name].matches("%#{value}%"))
  when :item2_name_contains
    query.joins(:item2s).where(Item2.arel_table[:name].matches("%#{value}%"))
  when :item1_code_equals
    # compares the code column (the original compared :name, presumably a typo)
    query.joins(:item1s).where(Item1.arel_table[:code].eq(value))
  when :item2_code_equals
    query.joins(:item2s).where(Item2.arel_table[:code].eq(value))
  # ... and so on for all the filters
  else
    query
  end
end
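For readers unfamiliar with the reduce pattern used above, here is a minimal sketch of the same idea in plain Ruby, with arrays of hashes in place of ActiveRecord relations. The FILTERS table, the row data, and the filter names are all hypothetical.

```ruby
# Hypothetical filter handlers: each takes the current rows and a value,
# and returns the narrowed rows.
FILTERS = {
  name_contains:  ->(rows, v) { rows.select { |r| r[:name].include?(v) } },
  created_before: ->(rows, v) { rows.select { |r| r[:created_at] < v } }
}.freeze

# Fold every filter into the result; unknown keys leave it unchanged.
def search(rows, filters)
  filters.reduce(rows) do |result, (key, value)|
    handler = FILTERS[key]
    handler ? handler.call(result, value) : result
  end
end

parents = [
  { name: "alpha",    created_at: 2019 },
  { name: "alphabet", created_at: 2021 },
  { name: "beta",     created_at: 2018 }
]

found = search(parents, { name_contains: "alph", created_before: 2020 })
```

Each filter sees only the output of the previous one, so any combination of filters (or none) can be applied in a single pass.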
The problem
When I query by fields of two or more different subclasses of Item, ActiveRecord fails to generate a correct WHERE clause. It does not use the alias that it has assigned to the association in the JOIN clause.
Let's say I want to filter by Item1.name = 'i1' and Item2.name = 'i2'; then what Rails generates is this:
SELECT "parents".*
FROM "parents"
INNER JOIN "item_parents"
ON "item_parents"."parent_id" = "parents"."id"
INNER JOIN "items"
ON "items"."id" = "item_parents"."item_id"
AND "items"."item_type" = 'Item::Item1'
INNER JOIN "item_parents" "item_parents_parents_join"
ON "item_parents_parents_join"."parent_id" = "parents"."id"
INNER JOIN "items" "item2s_parents" -- OK. join has an alias
ON "item2s_parents"."id" = "item_parents_parents_join"."item_id"
AND "item2s_parents"."item_type" = 'Item::Item2'
WHERE "items"."name" = 'i1'
AND "items"."name" = 'i2' -- Wrong! Must be "item2s_parents"."name" = 'i2'
As a result, I have zero rows returned, because it's impossible to have an item with name equal to 'i1' AND 'i2' at the same time.
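The effect can be reproduced without a database. The following is a small plain-Ruby sketch, using hypothetical item rows, that contrasts applying both conditions to the same row (what the generated WHERE clause does) with applying each condition to its own set of joined rows (what the aliased WHERE clause would do).

```ruby
# Hypothetical joined rows: one parent with an Item1 and an Item2.
items = [
  { parent_id: 1, type: "Item1", name: "i1" },
  { parent_id: 1, type: "Item2", name: "i2" }
]

# Both conditions tested against the same row, as in the generated SQL.
same_alias = items.select { |i| i[:name] == "i1" && i[:name] == "i2" }

# Each condition tested against its own join, then combined per parent.
item1_parents = items.select { |i| i[:type] == "Item1" && i[:name] == "i1" }
                     .map { |i| i[:parent_id] }
item2_parents = items.select { |i| i[:type] == "Item2" && i[:name] == "i2" }
                     .map { |i| i[:parent_id] }
matching_parents = item1_parents & item2_parents
```

Here same_alias is empty, while matching_parents contains the parent that the query was meant to find.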
What I tried
It seemed to be a good idea to write a custom joins_item method that would dig into the query and check whether other joins had been called on it before (AR stores such information in query.values[:joins] and query.values[:left_outer_joins]) and, if there are any, return another Arel::Table instance with the correct alias. If nothing has been joined before, then I don't need an alias and can return the default Arel::Table.
But then I found out that AR resolves aliases at the moment of building SQL. So even though I could guess the correct alias (or no alias) at the moment of joining it can change in the end. And this is actually what happens when you do left_outer_joins first and then joins. AR always places INNER JOINs before LEFT OUTER JOINs in the resulting SQL.
So the question is...
Is there a way to force AR to alias everything when I do joins or left_outer_joins with Arel, or any other more or less maintainable workaround/fix/monkey patch for this issue?

Why does Hive warn that this subquery would cause a Cartesian product?

According to Hive's documentation it supports NOT IN subqueries in a WHERE clause, provided that the subquery is an uncorrelated subquery (does not reference columns from the main query).
However, when I attempt to run the trivial query below, I get an error FAILED: SemanticException Cartesian products are disabled for safety reasons.
-- sample data
CREATE TEMPORARY TABLE foods (name STRING);
CREATE TEMPORARY TABLE vegetables (name STRING);
INSERT INTO foods VALUES ('steak'), ('eggs'), ('celery'), ('onion'), ('carrot');
INSERT INTO vegetables VALUES ('celery'), ('onion'), ('carrot');
-- the problematic query
SELECT *
FROM foods
WHERE foods.name NOT IN (SELECT vegetables.name FROM vegetables)
Note that if I use an IN clause instead of a NOT IN clause, it actually works fine, which is perplexing because the query evaluation structure should be the same in either case.
Is there a workaround for this, or another way to filter values from a query based on their presence in another table?
This is Hive 2.3.4 btw, running on an Amazon EMR cluster.
Not sure why you would get that error. One workaround is to use NOT EXISTS.
SELECT f.*
FROM foods f
WHERE NOT EXISTS (SELECT 1
FROM vegetables v
WHERE v.name = f.name)
or a left join
SELECT f.*
FROM foods f
LEFT JOIN vegetables v ON v.name = f.name
WHERE v.name is NULL
You get a Cartesian join because that is what Hive does in this case. The vegetables table is very small (just one row) and it is broadcast to perform the cross join (most probably a map join; check the plan). Hive performs the cross (map) join first and then applies the filter. The explicit left join syntax with a filter, as @VamsiPrabhala suggested, will force Hive to perform a left join, but in this case it works the same, because the table is very small and the CROSS JOIN does not multiply rows.
Execute EXPLAIN on your query and you will see what is exactly happening.
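As a sanity check on what both workarounds should return, here is a plain-Ruby sketch of the same anti-join over the sample data from the question. It only mirrors the intended result and says nothing about how Hive plans the query.

```ruby
require "set"

# Sample data from the question.
foods      = %w[steak eggs celery onion carrot]
vegetables = %w[celery onion carrot]

# Anti-join: keep each food that has no matching vegetable row,
# which is what NOT EXISTS and LEFT JOIN ... IS NULL both express.
vegetable_names = vegetables.to_set
non_vegetables  = foods.reject { |name| vegetable_names.include?(name) }
```

With this data, non_vegetables holds steak and eggs, in the original order.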

Properly format an ActiveRecord query with a subquery in Postgres

I have a working SQL query for Postgres v10.
SELECT *
FROM
(
SELECT DISTINCT ON (title) products.title, products.*
FROM "products"
) subquery
WHERE subquery.active = TRUE AND subquery.product_type_id = 1
ORDER BY created_at DESC
The goal of the query is to do a distinct on the title column, then filter and order the results. (I used the subquery in the first place because there seemed to be no way to combine DISTINCT ON with ORDER BY without one.)
I am trying to express said query in ActiveRecord.
I have been doing
Product.select("*")
.from(Product.select("DISTINCT ON (product.title) product.title, meals.*"))
.where("subquery.active IS true")
.where("subquery.meal_type_id = ?", 1)
.order("created_at DESC")
and, that works! But, it's fairly messy with the string where clauses in there. Is there a better way to express this query with ActiveRecord/Arel, or am I just running into the limits of what ActiveRecord can express?
I think the resulting ActiveRecord call can be improved, but I would start by improving the original SQL query.
Subquery
SELECT DISTINCT ON (title) products.title, products.* FROM products
(I think meals should be products there?) selects products.title twice, which is unnecessary. Worse, it lacks an ORDER BY clause. As the PostgreSQL documentation says:
Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first
I would rewrite sub-query as:
SELECT DISTINCT ON (title) * FROM products ORDER BY title ASC
which gives us a call:
Product.select('DISTINCT ON (title) *').order(title: :asc)
In the main query, the where calls use the Rails-generated alias for the subquery. I would not rely on Rails' internal convention for aliasing subqueries, as it may change at any time. If you are willing to accept that risk, you can merge these conditions into one where call using hash-style argument syntax.
The final result:
Product.select('*')
.from(Product.select('DISTINCT ON (title) *').order(title: :asc))
.where(subquery: { active: true, meal_type_id: 1 })
.order('created_at DESC')
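To illustrate why the ORDER BY inside the subquery matters, here is a plain-Ruby sketch of the two stages over hypothetical product rows. It assumes a tie-breaker of created_at DESC within each title, which is not in the query above; ordering by title alone still leaves the choice among rows with the same title unspecified.

```ruby
# Hypothetical product rows; created_at is a simple integer for brevity.
products = [
  { title: "Soup", active: true,  created_at: 3 },
  { title: "Soup", active: false, created_at: 1 },
  { title: "Cake", active: true,  created_at: 2 }
]

# Subquery: DISTINCT ON (title) with ORDER BY title, created_at DESC
# (the created_at tie-breaker is an assumption) keeps the newest row per title.
distinct_by_title = products
  .sort_by { |p| [p[:title], -p[:created_at]] }
  .uniq { |p| p[:title] }

# Outer query: filter on active, then ORDER BY created_at DESC.
result = distinct_by_title
  .select { |p| p[:active] }
  .sort_by { |p| -p[:created_at] }

titles = result.map { |p| p[:title] }
```

If the inner sort instead put the inactive Soup row first, Soup would be dropped by the outer filter, so the ordering inside the subquery decides which rows survive.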

Order by foreign key in activerecord: without a join?

I want to expand this question.
order by foreign key in activerecord
I'm trying to order a set of records based on a value in a really large table.
When I use a join, it brings all of the other record's data into the objects, as a join should.
#table users 30+ columns
#table bids 5 columns
record = Bid.find(:all,:joins=>:users, :order=>'users.ranking DESC' ).first
Now record holds 35 fields..
Is there a way to do this without the join?
Here's my thinking..
With the join I get this query
SELECT * FROM "bids"
left join users on runner_id = users.id
ORDER BY ranking LIMIT 1
Now I can add a select to the code so I don't get the full user table, but putting a select in a scope is dangerous IMHO.
When I write sql by hand.
SELECT * FROM bids
order by (select users.ranking from users where users.id = runner_id) DESC
limit 1
I believe this is a faster query, based on the "explain" it seems simpler.
More important than speed though is that the second method doesn't have the 30 extra fields.
If I build in a custom select inside the scope, it could explode other searches on the object if they too have custom selects (there can be only one)
What you would like to achieve with ActiveRecord is something along the lines of
SELECT b.* from bids b inner join users u on u.id=b.user_id order by u.ranking desc
In ActiveRecord I would write it as:
Bid.joins("inner join users u on bids.user_id=u.id").order("u.ranking desc")
I think it's the only way to make a join without fetching all attributes from the user model.
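The intent, ordering bids by a value that lives on users while returning only bid fields, can be sketched in plain Ruby. The data below is hypothetical, and runner_id is used as the foreign key to match the question.

```ruby
# Hypothetical users keyed by id; only ranking is needed for ordering.
users = {
  1 => { ranking: 10 },
  2 => { ranking: 30 }
}

# Hypothetical bids; runner_id points at a user.
bids = [
  { id: 100, runner_id: 1 },
  { id: 101, runner_id: 2 }
]

# ORDER BY users.ranking DESC LIMIT 1, without copying user fields into the bid.
top_bid = bids.max_by { |bid| users.fetch(bid[:runner_id])[:ranking] }
```

The ranking is consulted only for the comparison, so top_bid still has just the bid's own keys.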

How to get records from multiple condition from a same column through associated table

Let's say a Book model HABTM categories; for example, book A has categories "CA" and "CB". How can I retrieve book A if I query using "CA" and "CB" only? I know about .where("category_id in (1,2)"), but it uses an OR operation. I need something like an AND operation.
Edited
I also need to be able to get books from category CA only. And how do I include query criteria such as .where("book.p_year = 2012")?
ca = Category.find_by_name('CA')
cb = Category.find_by_name('CB')
Book.where(:id => (ca.book_ids & cb.book_ids)) # & returns elements common to both arrays.
Otherwise you'd need to abuse the join table directly in SQL, group the results by book_id, count them, and only return rows where the count is at least equal to the number of categories... something like this (but I'm sure it's wrong so double check the syntax if you go this route. Also not sure it would be any faster than the above):
SELECT book_id, count(*) as c from books_categories where category_id IN (1,2) group by book_id having count(*) >= 2;
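Both approaches can be checked against each other on a small in-memory join table. This is a plain-Ruby sketch with hypothetical [book_id, category_id] pairs, and it assumes each pair appears at most once.

```ruby
# Hypothetical rows of the books_categories join table: [book_id, category_id].
books_categories = [
  [1, 1], [1, 2], # book 1 is in categories 1 and 2
  [2, 1],         # book 2 is in category 1 only
  [3, 2], [3, 3]  # book 3 is in categories 2 and 3
]
wanted = [1, 2]

# Approach 1: intersect the book ids of each category (the & operator).
ids_per_category = wanted.map do |category_id|
  books_categories.select { |_, cat| cat == category_id }.map(&:first)
end
by_intersection = ids_per_category.reduce(:&)

# Approach 2: WHERE category_id IN (...) GROUP BY book_id HAVING count(*) >= n.
counts = books_categories
  .select { |_, cat| wanted.include?(cat) }
  .group_by(&:first)
  .transform_values(&:size)
by_count = counts.select { |_, count| count >= wanted.size }.keys
```

Both by_intersection and by_count contain only book 1, the one book that belongs to every wanted category.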
