I want to create a leaderboard for highest sum of purchases. Every Purchase has a user_id and a price. Every Purchase, belongs_to a User. We want to query all the purchases, group the records by user_id, and sum the price totals. I have tried a million things. The closest I've been able to come is
Purchase.joins(:user).select('users.*, sum(price) as total').group('user_id'). order('total DESC').limit(20)
which returns the error ActiveRecord::StatementInvalid: PG::Error: ERROR: column "users.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT users.*, sum(price) as total FROM "purchase...
I'm running PostgreSQL 9 and Rails 3. Any and all help appreciated. Thanks!
You need to include users.id in group clause instead of user_id.
Try this:
Purchase.joins(:user).select('users.*, sum(price) as total').group('users.id').order('total DESC').limit(20)
Related
I have got difficulties group in PostGreSQL. My code below
Invoice.select("customer_id, due_date, sum(balance) AS total_balance, total_mount").group(:customer_id)
I have got the errors
Invoice Load (1.8ms) SELECT customer_id, due_date, sum(balance) AS total_balance, total_amount FROM "records" WHERE "records"."type" IN ('Invoice') GROUP BY "records"."customer_id"
ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: column "records.due_date" must appear in the GROUP BY clause or be used in an aggregate function
enter code here
LINE 1: SELECT customer_id, due_date, sum(balance) AS total_balance,...
UPDATE
Thanks for #slicedpan for the solution. I have update like below code.
Invoice.select("customer_id, MAX(due_date), SUM(balance) AS total_balance").group(:customer_id)
I have read many of articles about that but It still show the error.
This error is due to the fact that Postgres doesn't know what to do with the due_date column. The query you have written is basically:
for each customer, show me the total sum of the balances for all their invoices, and the due date for one of their invoices.
The problem here is that in PostgreSQL you have to be explicit about which invoice you want to show the due_date of, whereas in MySQL or SQLite it will pick one (can't remember how off the top of my head). In this case I think it would make more sense to leave out the due_date from the select, otherwise use MAX(due_date) to get the most recent, or something like that.
You can get the same functionality as MySQL/SQLite if you do it like this:
Invoice.select("DISTINCT ON (customer_id) customer_id, due_date, sum(balance) over(partition by customer_id) AS total_balance, total_mount")
It will sum balance and select "random" due_date/total_mount for each customer_id. Just like non-standard GROUP BY possible in those 2 other RDBMS.
So here's the lay of the land:
I have a Applicant model which has_many Lead records.
I need to group leads by applicant email, i.e. for each specific applicant email (there may be 2+ applicant records with the email) i need to get a combined list of leads.
I already have this working using an in-memory / N+1 solution
I want to do this in a single query, if possible. Right now I'm running one for each lead which is maxing out the CPU.
Here's my attempt right now:
Lead.
all.
select("leads.*, applicants.*").
joins(:applicant).
group("applicants.email").
having("count(*) > 1").
limit(1).
to_a
And the error:
Lead Load (1.2ms) SELECT leads.*, applicants.* FROM "leads" INNER
JOIN "applicants" ON "applicants"."id" = "leads"."applicant_id"
GROUP BY applicants.email HAVING count(*) > 1 LIMIT 1
ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: column
"leads.id" must appear in the GROUP BY clause or be used in an
aggregate function
LINE 1: SELECT leads.*, applicants.* FROM "leads" INNER JOIN
"appli...
This is a postgres specific issue. "the selected fields must appear in the GROUP BY clause".
must appear in the GROUP BY clause or be used in an aggregate function
You can try this
Lead.joins(:applicant)
.select('leads.*, applicants.email')
.group_by('applicants.email, leads.id, ...')
You will need to list all the fields in leads table in the group by clause (or all the fields that you are selecting).
I would just get all the records and do the grouping in memory. If you have a lot of records, I would paginate them or batch them.
group_by_email = Hash.new { |h, k| h[k] = [] }
Applicant.eager_load(:leads).each_batch(10_000) do |batch|
batch.each do |applicant|
group_by_email[:applicant.email] << applicant.leads
end
end
You need to use a .where rather than using Lead.all. The reason it is maxing out the CPU is you are trying to load every lead into memory at once. That said I guess I am still missing what you actually want back from the query so it would be tough for me to help you write the query. Can you give more info about your associations and the expected result of the query?
I have the following code to join two tables microposts and activities with micropost_id column and then order based on created_at of activities table with distinct micropost id.
Micropost.joins("INNER JOIN activities ON
(activities.micropost_id = microposts.id)").
where('activities.user_id= ?',id).order('activities.created_at DESC').
select("DISTINCT (microposts.id), *")
which should return whole micropost columns.This is not working in my developement enviornment.
(PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
If I add activities.created_at in SELECT DISTINCT, I will get repeated micropost ids because the have distinct activities.created_at column. I have done a lot of search to reach here. But the problem always persist because of this postgres condition to avoid random selection.
I want to select based on order of activities.created_at with distinct micropost _id.
Please help..
To start with, we need to quickly cover what SELECT DISTINCT is actually doing. It looks like just a nice keyword to make sure you only get back distinct values, which shouldn't change anything, right? Except as you're finding out, behind the scenes, SELECT DISTINCT is actually acting more like a GROUP BY. If you want to select distinct values of something, you can only order that result set by the same values you're selecting -- otherwise, Postgres doesn't know what to do.
To explain where the ambiguity comes from, consider this simple set of data for your activities:
CREATE TABLE activities (
id INTEGER PRIMARY KEY,
created_at TIMESTAMP WITH TIME ZONE,
micropost_id INTEGER REFERENCES microposts(id)
);
INSERT INTO activities (id, created_at, micropost_id)
VALUES (1, current_timestamp, 1),
(2, current_timestamp - interval '3 hours', 1),
(3, current_timestamp - interval '2 hours', 2)
You stated in your question that you want "distinct micropost_id" "based on order of activities.created_at". It's easy to order these activities by descending created_at (1, 3, 2), but both 1 and 2 have the same micropost_id of 1. So if you want the query to return just micropost IDs, should it return 1, 2 or 2, 1?
If you can answer the above question, you need to take your logic for doing so and move it into your query. Let's say that, and I think this is pretty likely, you want this to be a list of microposts which were most recently acted on. In that case, you want to sort the microposts in descending order of their most recent activity. Postgres can do that for you, in a number of ways, but the easiest way in my mind is this:
SELECT micropost_id
FROM activities
JOIN microposts ON activities.micropost_id = microposts.id
GROUP BY micropost_id
ORDER BY MAX(activities.created_at) DESC
Note that I've dropped the SELECT DISTINCT bit in favor of using GROUP BY, since Postgres handles them much better. The MAX(activities.created_at) bit tells Postgres to, for each group of activities with the same micropost_id, sort by only the most recent.
You can translate the above to Rails like so:
Micropost.select('microposts.*')
.joins("JOIN activities ON activities.micropost_id = microposts.id")
.where('activities.user_id' => id)
.group('microposts.id')
.order('MAX(activities.created_at) DESC')
Hope this helps! You can play around with this sqlFiddle if you want to understand more about how the query works.
Try the below code
Micropost.select('microposts.*, activities.created_at')
.joins("INNER JOIN activities ON (activities.micropost_id = microposts.id)")
.where('activities.user_id= ?',id)
.order('activities.created_at DESC')
.uniq
I am trying to generate a report to screen of accounting transaction history. In most situations it is one display row per record in the AccountingTransaction table. But occasionally there are transactions that I wish to display to the end user as one transaction which are really, behind the scenes, two accounting transactions. This is caused by deferral of revenues and fund splitting since this app is a fund accounting app.
If I display all rows one by one, those double entries look odd to the user since the fund splitting and deferral is "behind the scenes". So I want to roll up all the related transactions into one display row on screen.
I have my query now using group by to group the related transactions
#history = AccountingTransaction.where("customer_id in (?) AND no_download <> 1", customers_in_account).group(:transaction_type_id, :reference_id).order(:created_at)
as I loop through I get the transactions grouped as I want but I am struggling with how to display the total sum of the 'credit' field for all records in the group. (It is only showing the credit for the first record of the group) If I add a .sum(:credit) to my query, of course, it returns the sums just as I want but not all the other data.
Is there a way for me to group these records like in my #history query and also get the sum of the credit field for each respective group?
* Addition *
What I really want is what the following SQL query would give me.
SELECT transaction_type_id, reference_id, sum(credit)
WHERE customer_id in (21,22,23,24) AND no_download <> 1
GROUP BY reference_id, transaction_type_id ORDER BY created_at
I'm not sure you can do "ORDER BY created_at" and not include it in the select fields, but here is an example.
#history = AccountingTransaction.
select([:reference_id, :transaction_type_id, :created_at]).
select(AccountingTransaction.arel_table[:credit].sum.as("credit_sum")).
where("customer_id in (?) AND no_download <> 1", customers_in_account).
group(:transaction_type_id, :reference_id).
order(:created_at)
To access the credit_sum you could do:
#history[0].attributes["credit_sum"]
I guess if you'd like, you could create a method:
def credit_sum
attributes["credit_sum"]
end
EDIT *
As stated in comments you can access the attribute directly:
#history[0].credit_sum
How do I retrieve a set of records, ordered by count in Arel? I have a model which tracks how many views a product get. I want to find the X most frequently viewed products over the last Y days.
This problem has cropped up while migrating to PostgreSQL from MySQL, due to MySQL being a bit forgiving in what it will accept. This code, from the View model, works with MySQL, but not PostgreSQL due to non-aggregated columns being included in the output.
scope :popular, lambda { |time_ago, freq|
where("created_on > ?", time_ago).group('product_id').
order('count(*) desc').limit(freq).includes(:product)
}
Here's what I've got so far:
View.select("id, count(id) as freq").where('created_on > ?', 5.days.ago).
order('freq').group('id').limit(5)
However, this returns the single ID of the model, not the actual model.
Update
I went with:
select("product_id, count(id) as freq").
where('created_on > ?', time_ago).
order('freq desc').
group('product_id').
limit(freq)
On reflection, it's not really logical to expect a complete model when the results are made up of GROUP BY and aggregate functions results, as returned data will (most likely) match no actual model (row).
you have to extend your select clause with all column you wish to retrieve. or
select("views.*, count(id) as freq")
SQL would be:
SELECT product_id, product, count(*) as freq
WHERE created_on > '$5_days_ago'::timestamp
GROUP BY product_id, product
ORDER BY count(*) DESC, product
LIMIT 5;
Extrapolating from your example, it should be:
View.select("product_id, product, count(*) as freq").where('created_on > ?', 5.days.ago).
order("count(*) DESC" ).group('product_id, product').limit(5)
Disclaimer: Ruby syntax is a foreign language to me.