Rewrite a query to avoid PG::GroupingError: ERROR: column must appear in the GROUP BY clause or be used in an aggregate function - ruby-on-rails

I tried many alternatives before posting this question.
I have a query on a table A with the columns id, num and user_id.
id is the primary key; user_id values can repeat.
I need one row per user_id, namely the row with the highest num value. For this I came up with the SQL below, which works in an Oracle database. I am on the Ruby on Rails platform with a PostgreSQL database.
select A.* from stats as A
where A.num > (
select B.num
from stats as B
where A.user_id = B.user_id
group by B.user_id
having B.num > min(B.num) )
I tried writing this query with an Active Record method but still ran into:
PG::GroupingError: ERROR: column "b.num" must appear in the GROUP BY clause or be used in an aggregate function
Stat.where("stats.num > ( select B.nums from stats as B where stats.user_id = B.user_id group by B.user_id having B.num < max(B.num) )")
Can someone tell me an alternative way of writing this query?

The SELECT clause of your subquery in Rails (B.nums) doesn't match that of your example (B.num). More importantly, since the subquery groups by B.user_id, every other column it references in SELECT or HAVING must be wrapped in an aggregate function; the bare B.num next to max(B.num) is exactly what PostgreSQL rejects. Comparing against the aggregate itself avoids the error:
Stat.where("stats.num = ( select max(B.num) from stats as B where stats.user_id = B.user_id )")
Because that subquery yields a single value per user, it can't return more than one row. If two rows tie for the highest num, both are returned, so add a tie-breaker if you need exactly one row per user.
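To make the grouping rule concrete: once rows are collapsed by user_id, a bare num has no single value per group, while max(num) does. The plain-Ruby sketch below is only an illustration of those semantics, not Active Record code; the StatRow struct and its sample data are hypothetical, shaped like the question's stats table (id, num, user_id). It keeps a row when its num equals the maximum num for its user_id.

```ruby
# Hypothetical rows shaped like the question's stats table (id, num, user_id).
StatRow = Struct.new(:id, :num, :user_id)

STATS = [
  StatRow.new(1, 5, 10),
  StatRow.new(2, 9, 10),
  StatRow.new(3, 7, 20),
  StatRow.new(4, 7, 20),
  StatRow.new(5, 3, 30)
].freeze

# One aggregated value per group, like
#   SELECT user_id, max(num) FROM stats GROUP BY user_id
# A bare num has no single value per group; max(num) does.
def max_num_per_user(rows)
  rows.group_by(&:user_id).transform_values { |group| group.map(&:num).max }
end

# Keep a row when its num equals its user's maximum, like
#   WHERE A.num = (SELECT max(B.num) FROM stats B WHERE B.user_id = A.user_id)
# Ties are all kept, just as the SQL comparison keeps them.
def rows_with_max_num(rows)
  maxima = max_num_per_user(rows)
  rows.select { |row| row.num == maxima[row.user_id] }
end

p rows_with_max_num(STATS).map(&:id) # => [2, 3, 4, 5]
```

User 20 has two rows tied at num 7, and both survive, which is the tie case a database query would also need to handle.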

Related

Find n most referenced records by foreign_key in related table

I have a table skills and a table programs_skills that references skills through the foreign key skill_id. I want to retrieve the 10 most frequently referenced skills in programs_skills (I need to count the occurrences of each skill_id in programs_skills and then order by that count in descending order).
I wrote this in my Skill model:
def self.most_used(limit)
  Skill.find(
    ActiveRecord::Base.connection.execute(
      'SELECT programs_skills.skill_id, count(*) FROM programs_skills GROUP BY skill_id ORDER BY count DESC'
    ).to_a.first(limit).map { |record| record['skill_id'] }
  )
end
This works, but I would like to find a more elegant, performant, "ActiveRecord-like" way to perform this query.
Could you help me rewrite it?
Just replace your query with:
WITH
T AS
(
SELECT skill_id, COUNT(*) AS NB, RANK() OVER(ORDER BY COUNT(*) DESC) AS RNK
FROM programs_skills
GROUP BY skill_id
)
SELECT skill_id, NB
FROM T
WHERE RNK <= 10
This uses a CTE and a window function. Because RANK() gives tied skills the same rank, it can return more than 10 rows when counts tie at the cutoff.
Or, staying in Active Record:
ProgramsSkills.select("skill_id, COUNT(*) AS nb_skills")
.group(:skill_id).order("nb_skills DESC").limit(limit)
.map(&:skill_id)
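Both answers count references per skill_id and keep the largest counts. Here is an in-memory Ruby sketch of that aggregation, with a hypothetical array of skill_id values standing in for programs_skills rows. It breaks ties by skill_id so the output is deterministic, a choice the SQL versions above leave to the database.

```ruby
# Hypothetical skill_id values, one per programs_skills row.
PROGRAMS_SKILLS = [3, 1, 3, 2, 3, 1, 4].freeze

# Mirrors: SELECT skill_id, COUNT(*) FROM programs_skills
#          GROUP BY skill_id ORDER BY COUNT(*) DESC LIMIT n
def most_used_skill_ids(skill_ids, limit)
  counts = skill_ids.each_with_object(Hash.new(0)) { |id, acc| acc[id] += 1 }
  # Count descending, then skill_id ascending so ties come out in a fixed order.
  counts.sort_by { |id, count| [-count, id] }.first(limit).map(&:first)
end

p most_used_skill_ids(PROGRAMS_SKILLS, 2) # => [3, 1]
```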

only show highest value user entry [duplicate]

This question already has answers here:
Select first row in each GROUP BY group?
Closed 7 years ago.
I am creating a contest where a user can submit multiple entries. Only each user's entry with the highest tonnage will be shown. In the index view, all the entries have to be sorted in descending order of tonnage.
My submissions controller contains the following:
@submissions = @contest.submissions.maximum(:tonnage, group: User)
The problem is that I don't get back an array of submissions; I need something I can iterate through,
e.g. a list that contains only one submission per user, namely the submission with the highest tonnage value.
When I just group, I get the following error:
GroupingError: ERROR: column "submissions.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "submissions".* FROM "submissions" WHERE "submission...
UPDATE:
I found an SQL query that does approximately what I want.
select *
from submissions a
inner join (
  select user_id, max(tonnage) as max_tonnage
  from submissions
  group by user_id
) b
on a.user_id = b.user_id
and a.tonnage = b.max_tonnage
How can I express this in ActiveRecord?
Comment info:
Simpler with DISTINCT ON:
SELECT DISTINCT ON (user_id) *
FROM submissions
ORDER BY user_id, tonnage DESC NULLS LAST;
NULLS LAST is only relevant if tonnage can be NULL.
Detailed explanation:
Select first row in each GROUP BY group?
Syntax in ActiveRecord:
Submission.select("DISTINCT ON (user_id) *").order("user_id, tonnage DESC NULLS LAST")
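DISTINCT ON (user_id) keeps the first row of each user_id group according to the ORDER BY, so the ordering decides which submission survives. A plain-Ruby sketch of the same rule, on hypothetical submission hashes: sort by user_id, then by tonnage descending with nil values last, and keep the first row per user.

```ruby
# Hypothetical submissions; tonnage may be nil, as the NULLS LAST note allows.
SUBMISSIONS = [
  { id: 1, user_id: 1, tonnage: 120 },
  { id: 2, user_id: 1, tonnage: nil },
  { id: 3, user_id: 1, tonnage: 150 },
  { id: 4, user_id: 2, tonnage: nil },
  { id: 5, user_id: 3, tonnage: 90 }
].freeze

# Sort key for "tonnage DESC NULLS LAST": non-nil values first (flag 0),
# larger tonnage first (negated), nil values last (flag 1).
def tonnage_desc_nulls_last(tonnage)
  tonnage.nil? ? [1, 0] : [0, -tonnage]
end

# Emulates: SELECT DISTINCT ON (user_id) * FROM submissions
#           ORDER BY user_id, tonnage DESC NULLS LAST
def best_submission_per_user(rows)
  sorted = rows.sort_by { |r| [r[:user_id], *tonnage_desc_nulls_last(r[:tonnage])] }
  # Array#uniq with a block keeps the first occurrence of each key.
  sorted.uniq { |r| r[:user_id] }
end

p best_submission_per_user(SUBMISSIONS).map { |r| r[:id] } # => [3, 4, 5]
```

User 2 only has a NULL tonnage, so that row is still returned for them; it would only lose to a non-NULL entry.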
More in the Ruby documentation or this related answer:
Get a list of first record for each group
Possible performance optimization:
Optimize GROUP BY query to retrieve latest record per user
Sort result rows
Per request in comment.
SELECT * FROM (
  SELECT DISTINCT ON (user_id) *
  FROM submissions
  ORDER BY user_id, tonnage DESC NULLS LAST
) sub
ORDER BY tonnage DESC NULLS LAST, user_id; -- 2nd item to break ties
Alternatively use row_number() in a subquery:
SELECT * FROM (
SELECT *
, row_number() OVER (PARTITION BY user_id ORDER BY tonnage DESC NULLS LAST) AS rn
FROM submissions
) sub
WHERE rn = 1
ORDER BY tonnage DESC NULLS LAST, user_id;
Or the query you have, plus ORDER BY.
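The row_number() variant runs in two stages: rank each user's entries and keep rn = 1, then sort the survivors for display. A hypothetical in-memory Ruby version of both stages follows, with user_id as the tie-breaker just as in the outer ORDER BY. The entry hashes are made-up sample data.

```ruby
# Hypothetical contest entries; tonnage may be nil.
ENTRIES = [
  { id: 1, user_id: 1, tonnage: 120 },
  { id: 2, user_id: 1, tonnage: nil },
  { id: 3, user_id: 1, tonnage: 150 },
  { id: 4, user_id: 2, tonnage: nil },
  { id: 5, user_id: 3, tonnage: 90 },
  { id: 6, user_id: 4, tonnage: 150 }
].freeze

# Sort key for "tonnage DESC NULLS LAST": non-nil before nil, larger first.
def desc_nulls_last_key(tonnage)
  tonnage.nil? ? [1, 0] : [0, -tonnage]
end

# Inner query: row_number() OVER (PARTITION BY user_id
#   ORDER BY tonnage DESC NULLS LAST) = 1, i.e. the best entry per user.
# Outer query: ORDER BY tonnage DESC NULLS LAST, user_id (user_id breaks ties).
def leaderboard(rows)
  groups = rows.group_by { |r| r[:user_id] }.values
  best = groups.map { |group| group.min_by { |r| desc_nulls_last_key(r[:tonnage]) } }
  best.sort_by { |r| [*desc_nulls_last_key(r[:tonnage]), r[:user_id]] }
end

p leaderboard(ENTRIES).map { |r| r[:id] } # => [3, 6, 5, 4]
```

Users 1 and 4 tie at 150, so user_id decides their order, and user 2's NULL-only entry lands at the bottom.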

Using multiple column names in where with RoR and ActiveRecord

I want to produce the following SQL using Active Record:
WHERE (column_name1, column_name2) IN (SELECT ....)
I don't know how to do this in Active Record.
I've tried these so far:
where('column_name1, column_name2' => {})
where([:column_name1, :column_name2] => {})
This is the full query I'd like to create:
SELECT a, Count(1)
FROM table
WHERE ( a, b ) IN (SELECT a,
Max(b)
FROM table
GROUP BY a)
GROUP BY a
HAVING Count(1) > 1
I've already written a scope for the subquery.
Thanks in advance.
WHERE (column_name1, column_name2) IN (SELECT ....) is a row-value comparison. PostgreSQL accepts it, so it can be passed to where as an SQL string:
where("(column_name1, column_name2) IN (select ...)")
If you'd rather test each column separately, the closest per-column form is:
WHERE column_name1 IN (select ....) OR column_name2 IN (select ...)
but it is looser than the tuple comparison: it matches a row when either value appears in its subquery, not only when the pair appears together.
The same condition can be used directly in Active Record:
where("column_name1 IN (select ...) OR column_name2 IN (select...)")
Avoiding duplication:
selected_values = select ...
where("column_name1 IN (?) OR column_name2 IN (?)", selected_values, selected_values)
So I decided to use an inner join to get the same functionality. Here is my solution:
select(:column1, 'Count(1)').
joins("INNER JOIN (#{subquery.to_sql}) AS table2 ON
table1.column1=table2.column1
AND table1.column2=table2.column2").
group('table1.column1').
having('Count(1) > 1')
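Here the tuple IN and the two-column INNER JOIN pick the same rows. The subquery is grouped by a, so each (a, max(b)) pair occurs once, and the join cannot duplicate rows. Ruby arrays compare element-wise, which makes them a compact stand-in for SQL row values. The sketch below uses hypothetical (a, b) rows: it builds the set of [a, max(b)] pairs, keeps rows whose pair is in that set, then applies GROUP BY a HAVING count > 1.

```ruby
require "set"

# Hypothetical rows of the question's table, as [a, b] pairs.
TABLE = [[1, 5], [1, 5], [1, 2], [2, 7], [2, 3], [3, 4]].freeze

# SELECT a, max(b) FROM table GROUP BY a  -> set of [a, max_b] tuples
def max_pairs(rows)
  rows.group_by(&:first).map { |a, group| [a, group.map(&:last).max] }.to_set
end

def duplicated_maxima(rows)
  pairs = max_pairs(rows)
  # Row-value IN: keep a row when its [a, b] tuple is one of the pairs.
  matching = rows.select { |pair| pairs.include?(pair) }
  # GROUP BY a HAVING count(1) > 1, returned as { a => count }
  matching.group_by(&:first)
          .transform_values(&:size)
          .select { |_a, count| count > 1 }
end

p duplicated_maxima(TABLE)
```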

Nested query in squeel

Short version: How do I write this query in squeel?
SELECT OneTable.*, my_count
FROM OneTable JOIN (
SELECT DISTINCT one_id, count(*) AS my_count
FROM AnotherTable
GROUP BY one_id
) counts
ON OneTable.id=counts.one_id
Long version: rocket_tag is a gem that adds simple tagging to models. It adds a method tagged_with. Supposing my model is User, with an id and name, I could invoke User.tagged_with ['admin','sales']. Internally it uses this squeel code:
select{count(~id).as(tags_count)}.
select("#{self.table_name}.*").
joins{tags}.
where{tags.name.in(my{tags_list})}.
group{~id}
Which generates this query:
SELECT count(users.id) AS tags_count, users.*
FROM users INNER JOIN taggings
ON taggings.taggable_id = users.id
AND taggings.taggable_type = 'User'
INNER JOIN tags
ON tags.id = taggings.tag_id
WHERE tags.name IN ('admin','sales')
GROUP BY users.id
Some RDBMSs are happy with this, but postgres complains:
ERROR: column "users.name" must appear in the GROUP BY
clause or be used in an aggregate function
I believe a more agreeable way to write the query would be:
SELECT users.*, tags_count FROM users INNER JOIN (
SELECT DISTINCT taggable_id, count(*) AS tags_count
FROM taggings INNER JOIN tags
ON tags.id = taggings.tag_id
WHERE tags.name IN ('admin','sales')
GROUP BY taggable_id
) tag_counts
ON users.id = tag_counts.taggable_id
Is there any way to express this using squeel?
I wouldn't know about Squeel, but the error you see could be fixed by upgrading PostgreSQL.
Starting with PostgreSQL 9.1, once you list a primary key in the GROUP BY you can skip additional columns for this table and still use them in the SELECT list. The release notes for version 9.1 tell us:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause
BTW, your alternative query can be simplified: the additional DISTINCT is redundant, since GROUP BY one_id already yields one row per one_id.
SELECT o.*, c.my_count
FROM onetable o
JOIN (
SELECT one_id, count(*) AS my_count
FROM anothertable
GROUP BY one_id
) c ON o.id = c.one_id
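The simplified query aggregates anothertable once per one_id and only then joins the counts to the parent rows. Because of that, no parent column ever has to appear in a GROUP BY. Below is a hypothetical in-memory Ruby version of the same two steps; the table contents are made up for illustration.

```ruby
# Hypothetical parent rows (onetable) and child foreign keys (anothertable.one_id).
ONE_TABLE = [
  { id: 1, name: "alice" },
  { id: 2, name: "bob" },
  { id: 3, name: "carol" }
].freeze
ANOTHER_TABLE_ONE_IDS = [1, 1, 3, 1, 3].freeze

# Step 1: SELECT one_id, count(*) AS my_count FROM anothertable GROUP BY one_id
def counts_by_one_id(one_ids)
  one_ids.each_with_object(Hash.new(0)) { |id, acc| acc[id] += 1 }
end

# Step 2: inner JOIN on o.id = c.one_id. Parents without children drop out,
# as they would with an inner join; key? ignores the hash's default of 0.
def join_counts(parents, one_ids)
  counts = counts_by_one_id(one_ids)
  parents
    .select { |row| counts.key?(row[:id]) }
    .map { |row| row.merge(my_count: counts[row[:id]]) }
end

p join_counts(ONE_TABLE, ANOTHER_TABLE_ONE_IDS)
```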

Ruby: group and count the number of results

How can I count the number of results returned by a "group" query without fetching the data? So far, I just get a hash of results. Is it possible to optimize this query in Rails 3?
Vote.group("question_id, user_id").where("question_id = 3").count.count
=> 2
In this case we are counting the entries of this hash => {1=>10, 15=>1}
Query is:
SELECT COUNT(*) AS count_all, question_id, user_id AS question_id_user_id
FROM `votes`
WHERE (question_id = 3)
GROUP BY question_id, user_id
You can use count_by_sql:
Vote.count_by_sql("select count(*) from ( select 1 from votes group by question_id, user_id ) as grouped_votes")
Or, you can build up the query using Rails, and then run it:
query = Vote.group(:question_id, :user_id).to_sql
count = Vote.count_by_sql("select count(*) from ( #{query} ) as grouped_votes")
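The wrapped query counts distinct (question_id, user_id) groups, which is the same number .count.count arrives at after loading the whole hash. A plain-Ruby sketch of that quantity, using hypothetical vote rows modelled on the question's {1=>10, 15=>1} hash with the question_id = 3 filter applied, can serve as a cross-check for a count_by_sql result.

```ruby
# Hypothetical vote rows with the question's grouping columns.
VoteRow = Struct.new(:question_id, :user_id)

# Ten votes by user 1 and one by user 15 on question 3,
# plus one unrelated vote on question 4.
VOTES = Array.new(10) { VoteRow.new(3, 1) } + [VoteRow.new(3, 15), VoteRow.new(4, 1)]

# SELECT count(*) FROM (SELECT 1 FROM votes WHERE question_id = 3
#                       GROUP BY question_id, user_id) AS grouped_votes
# i.e. the number of distinct (question_id, user_id) groups.
def group_count(votes, question_id)
  votes.select { |v| v.question_id == question_id }
       .map { |v| [v.question_id, v.user_id] }
       .uniq
       .size
end

p group_count(VOTES, 3) # => 2
```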
