In PostgreSQL, if you have a table with two columns, one listing purchases and one giving the state in which each purchase was made, how would you count the number of purchases by state?
If your column names are state and purchases, you can group by the state column and use count(purchases) to count the purchases within each state. Note that count(purchases) skips rows where purchases is NULL; use count(*) if you want to count every row. I have posted an example below. You will just need to fill in the table name that you are pulling from.
SELECT
state,
count(purchases) as purchase_count
FROM
[table_name]
GROUP BY
state
Secondly, you can order the states from most purchases to least by using ORDER BY and referencing the column number. Example below:
SELECT
state,
count(purchases) as purchase_count
FROM
[table_name]
GROUP BY
state
ORDER BY
2 DESC
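To try the grouped count end to end, here is a minimal sketch that runs the same query shape against an in-memory SQLite database (used only as a stand-in for PostgreSQL). The table name `orders` and the sample rows are made up for illustration. It orders by the alias rather than the column position, and it shows that `COUNT(purchases)` does not count NULL values.

```python
import sqlite3


def purchases_by_state(rows):
    """Return (state, purchase_count) pairs, most purchases first."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for [table_name] from the answer.
    conn.execute("CREATE TABLE orders (state TEXT, purchases TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT state, COUNT(purchases) AS purchase_count
        FROM orders
        GROUP BY state
        ORDER BY purchase_count DESC, state
        """
    ).fetchall()
    conn.close()
    return result


# Rows with None model a NULL purchases value, which COUNT(purchases) skips.
sample = [("NY", "book"), ("NY", "lamp"), ("CA", "desk"), ("NY", None), ("TX", None)]
print(purchases_by_state(sample))
```

Swapping `COUNT(purchases)` for `COUNT(*)` would count the NULL rows as well, which may or may not be what you want.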
I have a table that deliberately has duplicates in it. In this instance the things that will be duplicated are a deviceId and the datetime. The table has three columns: deviceId, datetime and value (there is also an incremental primary key). Sometimes when the customer re-evaluates their data, they notice that a value is incorrect, so they update it and send the data for re-processing. As a consequence, I need to be able to delete records that are not the very latest records. I can't do it by datetime, as this will also be duplicated in some cases, and I can't truncate the staging table.
To delete the dupes I have the following:
;WITH DupeData AS (
SELECT ROW_NUMBER() OVER(PARTITION BY tblMeterData_Id,fldDateTime, fldValue, [fldBatchId],[fldProcessed] ORDER BY fldDateTime) AS ROW
FROM [Stage.tblMeterData])
DELETE FROM DupeData
WHERE ROW > 1
The problem with this is that it seems to delete a random duplicate.
I want to keep the latest record that is in the staging area and delete any others that are not the latest record. I can then update the relevant row with the latest value when I take it from staging into prod.
Is there a primary or unique key on the table?
If there is a unique id, the easiest way is below.
I'm not sure about performance, but it should work fine on small amounts of data.
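-- id is assumed to be the incremental primary key
DELETE FROM [Stage.tblMeterData]
WHERE id IN
(SELECT q.id FROM
( SELECT id,
-- ordering by id DESC makes row 1 the most recently inserted duplicate
ROW_NUMBER() OVER(PARTITION BY tblMeterData_Id, fldDateTime, fldValue, [fldBatchId], [fldProcessed] ORDER BY id DESC) AS rn
FROM [Stage.tblMeterData]
) q
WHERE q.rn > 1)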
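The random-looking deletes in the question come from ordering by a column that is itself part of the partition, so every row in a group ties. The sketch below shows the same idea with the tie broken by the incremental key, so that only the newest row per device and datetime survives. It runs on an in-memory SQLite database (3.25 or later for window functions) rather than SQL Server, and the table and column names (`meter_data`, `device_id`, `reading_time`) are hypothetical stand-ins for the ones in the question.

```python
import sqlite3


def delete_older_duplicates(rows):
    """Insert rows, delete all but the newest per (device, time), return survivors."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE meter_data ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "device_id INTEGER, reading_time TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO meter_data (device_id, reading_time, value) VALUES (?, ?, ?)",
        rows,
    )
    conn.execute(
        """
        DELETE FROM meter_data
        WHERE id IN (
            SELECT id FROM (
                SELECT id,
                       ROW_NUMBER() OVER (
                           PARTITION BY device_id, reading_time
                           ORDER BY id DESC      -- newest insert gets rn = 1
                       ) AS rn
                FROM meter_data
            ) AS q
            WHERE q.rn > 1                       -- everything older is removed
        )
        """
    )
    remaining = conn.execute(
        "SELECT device_id, reading_time, value FROM meter_data ORDER BY id"
    ).fetchall()
    conn.close()
    return remaining


readings = [
    (1, "2020-01-01 00:00", 10.0),
    (1, "2020-01-01 00:00", 12.5),  # corrected value, sent later
    (2, "2020-01-01 00:00", 7.0),
]
print(delete_older_duplicates(readings))
```

Note that the value column is left out of the partition here on purpose: a corrected row has a different value, so partitioning on it would put the old and new rows in separate groups and nothing would be deleted.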
I want to create a leaderboard for highest sum of purchases. Every Purchase has a user_id and a price. Every Purchase, belongs_to a User. We want to query all the purchases, group the records by user_id, and sum the price totals. I have tried a million things. The closest I've been able to come is
Purchase.joins(:user).select('users.*, sum(price) as total').group('user_id'). order('total DESC').limit(20)
which returns the error ActiveRecord::StatementInvalid: PG::Error: ERROR: column "users.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT users.*, sum(price) as total FROM "purchase...
I'm running PostgreSQL 9 and Rails 3. Any and all help appreciated. Thanks!
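You need to group by users.id instead of user_id, because the select list references columns of the users table. On PostgreSQL 9.1 and later, grouping by a table's primary key lets you select its other columns without listing them in the GROUP BY clause.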
Try this:
Purchase.joins(:user).select('users.*, sum(price) as total').group('users.id').order('total DESC').limit(20)
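For reference, this is roughly the SQL that such a leaderboard query boils down to, sketched against an in-memory SQLite database instead of PostgreSQL. The `name` column and the sample data are assumptions for illustration; the grouping lists every selected users column explicitly, which is accepted by any PostgreSQL version.

```python
import sqlite3


def leaderboard(users, purchases, limit=20):
    """Return (user_id, name, total) rows ordered by total spend, highest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE purchases (id INTEGER PRIMARY KEY, user_id INTEGER, price REAL)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    conn.executemany("INSERT INTO purchases (user_id, price) VALUES (?, ?)", purchases)
    rows = conn.execute(
        """
        SELECT users.id, users.name, SUM(purchases.price) AS total
        FROM purchases
        INNER JOIN users ON users.id = purchases.user_id
        GROUP BY users.id, users.name   -- every non-aggregated column is grouped
        ORDER BY total DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return rows


sample_users = [(1, "ann"), (2, "bob")]
sample_purchases = [(1, 5.0), (2, 20.0), (1, 7.5)]
print(leaderboard(sample_users, sample_purchases))
```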
I am trying to generate a report to screen of accounting transaction history. In most situations it is one display row per record in the AccountingTransaction table. But occasionally there are transactions that I wish to display to the end user as one transaction which are really, behind the scenes, two accounting transactions. This is caused by deferral of revenues and fund splitting since this app is a fund accounting app.
If I display all rows one by one, those double entries look odd to the user since the fund splitting and deferral is "behind the scenes". So I want to roll up all the related transactions into one display row on screen.
I have my query now using group by to group the related transactions
@history = AccountingTransaction.where("customer_id in (?) AND no_download <> 1", customers_in_account).group(:transaction_type_id, :reference_id).order(:created_at)
As I loop through, I get the transactions grouped as I want, but I am struggling with how to display the total of the 'credit' field for all records in the group. (It only shows the credit for the first record of the group.) If I add a .sum(:credit) to my query, it returns the sums just as I want, but none of the other data.
Is there a way for me to group these records like in my @history query and also get the sum of the credit field for each respective group?
* Addition *
What I really want is what the following SQL query would give me.
SELECT transaction_type_id, reference_id, sum(credit)
WHERE customer_id in (21,22,23,24) AND no_download <> 1
GROUP BY reference_id, transaction_type_id ORDER BY created_at
I'm not sure you can do "ORDER BY created_at" and not include it in the select fields, but here is an example.
@history = AccountingTransaction.
select([:reference_id, :transaction_type_id, :created_at]).
select(AccountingTransaction.arel_table[:credit].sum.as("credit_sum")).
where("customer_id in (?) AND no_download <> 1", customers_in_account).
group(:transaction_type_id, :reference_id).
order(:created_at)
To access the credit_sum you could do:
@history[0].attributes["credit_sum"]
I guess if you'd like, you could create a method:
def credit_sum
attributes["credit_sum"]
end
* EDIT *
As stated in comments you can access the attribute directly:
@history[0].credit_sum
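On the doubt about ordering by created_at: once rows are grouped, each group can hold several created_at values, so strict databases such as PostgreSQL want an aggregate there. One option, sketched below, is to order by the earliest created_at in each group. This runs on an in-memory SQLite database; the table name `accounting_transactions`, the column types and the sample rows are assumptions based on the question.

```python
import sqlite3


def grouped_credit_sums(rows, customer_ids):
    """Sum credit per (transaction_type_id, reference_id), oldest group first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounting_transactions ("
        "customer_id INTEGER, transaction_type_id INTEGER, reference_id INTEGER, "
        "credit REAL, no_download INTEGER, created_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO accounting_transactions VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    # One "?" placeholder per customer id for the IN (...) list.
    placeholders = ", ".join("?" for _ in customer_ids)
    result = conn.execute(
        f"""
        SELECT transaction_type_id, reference_id,
               SUM(credit) AS credit_sum,
               MIN(created_at) AS first_created_at
        FROM accounting_transactions
        WHERE customer_id IN ({placeholders}) AND no_download <> 1
        GROUP BY transaction_type_id, reference_id
        ORDER BY first_created_at
        """,
        list(customer_ids),
    ).fetchall()
    conn.close()
    return result


transactions = [
    (21, 1, 100, 30.0, 0, "2021-01-02"),
    (21, 1, 100, 70.0, 0, "2021-01-03"),  # same group as the row above
    (22, 2, 200, 15.0, 0, "2021-01-01"),
    (22, 2, 201, 99.0, 1, "2021-01-01"),  # excluded: no_download = 1
    (30, 1, 100, 5.0, 0, "2021-01-01"),   # excluded: customer not in the list
]
print(grouped_credit_sums(transactions, [21, 22]))
```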
In my ETL process I am using Change Data Capture (CDC) to discover only the rows that have changed in the source tables since the last extraction. Then I do the transformation only for these rows. The problem is when I have, for example, two tables which I want to join into one dimension, and only one of them has changed. For example, I have tables Countries and Towns as follows:
Countries:
ID Name
1 France
Towns:
ID Name Country_ID
1 Lyon 1
Now let's say a new row is added to the Towns table:
ID Name Country_ID
1 Lyon 1
2 Paris 2
The Countries table has not been changed, so CDC for these tables shows me only the row from the Towns table. The problem is that when I do the join between Countries and Towns, there is no row in the Countries change set, so the join will result in an empty set.
Do you have an idea how to solve it? Of course there might be more difficult cases, involving three or more tables and chained joins.
This is a typical problem found when doing real-time Change Data Capture, or even incremental-only daily loads.
There are multiple ways to solve this.
One way would be to do your joins on the natural keys in the dimension or mapping table, to get the associated country (SELECT distinct country_name, [..other attributes..] from dim_table where country_id = X).
Another alternative would be to do the join as part of the change capture process - when a row is loaded to towns, a trigger goes off that loads the foreign key values into the associated staging tables (country, etc).
There is a lot more I could say on this, but I will stick to what is in your question. I would suggest the following to get the results:
1st pass: everything that matches via the join
UNION ALL
2nd pass: all towns where there isn't a country (left outer join with a WHERE condition that requires the ID in the countries table to be null/missing).
You would default the Country ID value in that unmatched join to something designated as an "unmatched value". Typically 0 or -1 is used, or a series of standard negative numbers that you can assign descriptions to later to identify why the data is bad. For your example, -1 could be "Found Town Without Country".
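The two passes can be sketched as a single UNION ALL query. In this illustration the change set for towns is joined against the full countries table (not its change set), and unmatched towns get the placeholder key -1. It runs on an in-memory SQLite database; the table name `towns_changes`, the output column names and the sample rows are hypothetical.

```python
import sqlite3

UNMATCHED_KEY = -1
UNMATCHED_LABEL = "Found Town Without Country"


def build_town_dimension(countries, changed_towns):
    """Join changed towns to the full countries table, defaulting missing countries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE towns_changes (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER)"
    )
    conn.executemany("INSERT INTO countries VALUES (?, ?)", countries)
    conn.executemany("INSERT INTO towns_changes VALUES (?, ?, ?)", changed_towns)
    rows = conn.execute(
        """
        -- 1st pass: towns whose country exists
        SELECT t.id AS town_id, t.name AS town_name,
               c.id AS country_key, c.name AS country_name
        FROM towns_changes t
        INNER JOIN countries c ON c.id = t.country_id
        UNION ALL
        -- 2nd pass: towns with no matching country get the default key
        SELECT t.id, t.name, ?, ?
        FROM towns_changes t
        LEFT OUTER JOIN countries c ON c.id = t.country_id
        WHERE c.id IS NULL
        ORDER BY town_id
        """,
        (UNMATCHED_KEY, UNMATCHED_LABEL),
    ).fetchall()
    conn.close()
    return rows


all_countries = [(1, "France")]
town_changes = [(2, "Paris", 1), (3, "Atlantis", 9)]  # country 9 does not exist
print(build_town_dimension(all_countries, town_changes))
```

A single LEFT OUTER JOIN with COALESCE on the country columns would give the same rows; the two-pass form just keeps the matched and unmatched cases visibly separate.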
I have two tables, one containing a list of different options users can select from. For example:
tbl_options
id_option
option
The next table I use to store which of these options the user selects. For example:
tbl_selected
id_selected
id_option
id_user
I use PHP to loop through the tbl_options table to generate a full list of checkboxes that the user can select from. When a user selects an option, the id_option and id_user are stored in the tbl_selected table. When a user deselects an option, the id_selected record is deleted from the tbl_selected table.
The challenge I am having is the best way to retrieve the full list of options in tbl_options, plus having the query indicate the associated records stored in the tbl_selected table.
I've tried LEFT JOIN'ing tbl_options to tbl_selected which provides me with the full list of options, but as soon as I add the WHERE id_user = ### the query only returns those records with values in tbl_selected. Ideally, I would like to see the results from a query as follows:
id_option   option    id_user
1           Apples    3
2           Oranges   3
3           Bananas
4           Pears
5           Peaches   3
This would indicate that user #3 has stored Apples, Oranges and Peaches. This also indicates that user #3 has not selected Bananas or Pears.
Is this possible using a SQL statement or should I pursue a different technique?
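Your problem is that the user restriction in the WHERE clause is applied to the whole query, so it filters out the rows where the left join found no match (their id_user is NULL). To apply it only to the join condition, you need to add it to the ON clause like this: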
select o.id_option, o.[option], s.id_user
from tbl_options o
left outer join tbl_selected s
on o.id_option = s.id_option and s.id_user = 3
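The difference between the two placements of the filter is easy to see side by side. The sketch below builds the two tables in an in-memory SQLite database and runs the join both ways. The column is named `option_name` here to avoid quoting, and the sample selections (including one by a second user, #7) are made up for illustration.

```python
import sqlite3


def options_for_user(user_id, filter_in_on=True):
    """Return (id_option, option_name, id_user) rows for one user."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl_options (id_option INTEGER PRIMARY KEY, option_name TEXT)"
    )
    conn.execute(
        "CREATE TABLE tbl_selected ("
        "id_selected INTEGER PRIMARY KEY, id_option INTEGER, id_user INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tbl_options VALUES (?, ?)",
        [(1, "Apples"), (2, "Oranges"), (3, "Bananas"), (4, "Pears"), (5, "Peaches")],
    )
    conn.executemany(
        "INSERT INTO tbl_selected (id_option, id_user) VALUES (?, ?)",
        [(1, 3), (2, 3), (5, 3), (3, 7)],  # user 7 picked Bananas
    )
    if filter_in_on:
        # Filter is part of the join: unmatched options are kept with NULL id_user.
        sql = """
            SELECT o.id_option, o.option_name, s.id_user
            FROM tbl_options o
            LEFT OUTER JOIN tbl_selected s
              ON o.id_option = s.id_option AND s.id_user = ?
            ORDER BY o.id_option
        """
    else:
        # Filter runs after the join: rows with NULL id_user are discarded.
        sql = """
            SELECT o.id_option, o.option_name, s.id_user
            FROM tbl_options o
            LEFT OUTER JOIN tbl_selected s
              ON o.id_option = s.id_option
            WHERE s.id_user = ?
            ORDER BY o.id_option
        """
    rows = conn.execute(sql, (user_id,)).fetchall()
    conn.close()
    return rows


print(options_for_user(3, filter_in_on=True))
print(options_for_user(3, filter_in_on=False))
```

With the filter in the ON clause, all five options come back and id_user is None for the ones user #3 has not selected; with the filter in WHERE, only the three selected options remain.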