Get latest record for multiple ids in psql

I have a table with multiple records for the same id. I would like to get the latest record for each id, filtered by a WHERE clause.
Sample table records:
 vendor_id | data        | created_at                 | id
-----------+-------------+----------------------------+----
         1 | some-data   | 2014-01-12 16:32:54.084505 |  2
         1 | Notsome-dat | 2014-01-13 16:32:54.084505 |  3
Multiple vendors have the same data, so I want the latest record for every vendor and then filter those records by data. I have been using the following query:
SELECT VENDOR_ID, MAX(CREATED_AT) FROM TABLE WHERE DATA ILIKE '%Not some%' GROUP BY VENDOR_ID;
However, this query also returns a vendor_id when "Not some" appears only in an older record (for example the second latest one), not in the latest one.
Please help.

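You have to use DISTINCT ON, which keeps the first row of each vendor_id group in ORDER BY order, here the newest one. Apply the filter on data in an outer query so that it runs after the latest row has been picked:
SELECT *
FROM (
    SELECT DISTINCT ON (vendor_id) vendor_id, data, created_at
    FROM TABLE
    ORDER BY vendor_id, created_at DESC
) AS latest
WHERE data ILIKE '%Not some%';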
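The difference between filtering before and after picking the latest row can be sketched in plain Ruby, with in-memory hashes standing in for table rows. The rows, the `matches?` helper and the `latest_per_vendor` helper are made up for illustration; this only shows the order of operations, not how PostgreSQL executes the query.

```ruby
require "time"

# Hypothetical rows: vendor 1 matched the filter in the past, but not in its latest record.
ROWS = [
  { id: 1, vendor_id: 1, data: "Not some data", created_at: Time.parse("2014-01-11 16:32:54") },
  { id: 2, vendor_id: 1, data: "some-data",     created_at: Time.parse("2014-01-12 16:32:54") },
  { id: 3, vendor_id: 2, data: "Not some data", created_at: Time.parse("2014-01-13 16:32:54") }
].freeze

# Case-insensitive substring match, similar in spirit to ILIKE '%Not some%'.
def matches?(row)
  row[:data].match?(/not some/i)
end

# Keep only the newest row of each vendor.
def latest_per_vendor(rows)
  rows.group_by { |r| r[:vendor_id] }
      .map { |_vendor, group| group.max_by { |r| r[:created_at] } }
end

# Filter first, then take the newest: vendor 1 sneaks in through an older row.
filter_first = latest_per_vendor(ROWS.select { |r| matches?(r) })

# Take the newest first, then filter: vendor 1 drops out.
latest_first = latest_per_vendor(ROWS).select { |r| matches?(r) }

puts filter_first.map { |r| r[:vendor_id] }.inspect
puts latest_first.map { |r| r[:vendor_id] }.inspect
```

The first line printed lists vendors 1 and 2, the second only vendor 2, which is the behaviour the question asks for.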

Related

Using Crosstab to Generate Data for Charts

I'm trying to write an efficient query for a view that contains counts of successful logins per day, broken down by type of user, with no user counted more than once per day.
I have 3 tables involved in this query. One table that contains all successful login attempts, one table for standard user accounts, and one table for admin user accounts. All user_id values are unique across the entire database so there are no user accounts that will share the same user_id with an admin account:
TABLE 1: user_account
user_id | username
---------|----------
1 | user1
2 | user2
TABLE 2: admin_account
user_id | username
---------|----------
6 | admin6
7 | admin7
TABLE 3: successful_logins
user_id | timestamp
---------|------------------------------
1 | 2022-01-23 14:39:12.63798-07
1 | 2022-01-28 11:16:45.63798-07
1 | 2022-01-28 01:53:51.63798-07
2 | 2022-01-28 15:19:21.63798-07
6 | 2022-01-28 09:42:36.63798-07
2 | 2022-01-23 03:46:21.63798-07
7 | 2022-01-28 19:52:16.63798-07
2 | 2022-01-29 23:12:41.63798-07
2 | 2022-01-29 18:50:10.63798-07
The resulting view I would like to generate would contain the following information from the above 3 tables:
VIEW: login_counts
date_of_login | successful_user_logins | successful_admin_logins
---------------|------------------------|-------------------------
2022-01-23 | 2 | 0
2022-01-28 | 2 | 2
2022-01-29 | 1 | 0
I'm currently reading up on how crosstabs work but having trouble figuring out how to write the query based on my table setups.
I actually was able to get the values I needed by using the following query:
SELECT
    to_char(s.timestamp, 'YYYY-MM-DD') AS login_date,
    count(DISTINCT u.user_id) AS successful_user_logins,
    count(DISTINCT a.user_id) AS successful_admin_logins
FROM successful_logins s
LEFT JOIN user_account u ON u.user_id = s.user_id
LEFT JOIN admin_account a ON a.user_id = s.user_id
GROUP BY login_date
However, I was told it would be even quicker using crosstabs, especially since the successful_logins table contains millions of records. So I'm also trying to write a crosstab version of the query and then compare the execution times of both.
Any help would be greatly appreciated. Thanks!
Turns out it isn't possible to do what I was asking about using crosstabs, so the original query I have will have to do.
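For reference, what the query computes can be sketched in plain Ruby: group the logins by day, drop repeated user ids within a day, and count how many of the remaining ids belong to each account type. The constants and the `login_counts` helper are hypothetical names, and the timestamps from the sample are assumed to be already truncated to their date, as `to_char` does in the query.

```ruby
USER_IDS  = [1, 2].freeze # ids from user_account
ADMIN_IDS = [6, 7].freeze # ids from admin_account

# [user_id, login date] pairs taken from the successful_logins sample.
LOGINS = [
  [1, "2022-01-23"], [1, "2022-01-28"], [1, "2022-01-28"],
  [2, "2022-01-28"], [6, "2022-01-28"], [2, "2022-01-23"],
  [7, "2022-01-28"], [2, "2022-01-29"], [2, "2022-01-29"]
].freeze

# Returns { date => [distinct user logins, distinct admin logins] }, ordered by date.
def login_counts(logins, user_ids, admin_ids)
  logins.group_by { |_id, date| date }
        .sort_by { |date, _rows| date }
        .map { |date, rows|
          ids = rows.map(&:first).uniq # count each account once per day
          [date, [ids.count { |id| user_ids.include?(id) },
                  ids.count { |id| admin_ids.include?(id) }]]
        }
        .to_h
end

login_counts(LOGINS, USER_IDS, ADMIN_IDS).each do |date, (users, admins)|
  puts "#{date} | #{users} | #{admins}"
end
```

With the sample rows as listed, both logins on 2022-01-23 belong to standard users, so that day comes out as 2 user logins and 0 admin logins.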

How to retrieve the most recent Active Record objects per column combination?

I have a model called Event which belongs to Area and Task. I'm attempting to retrieve a collection of events that only contains the most recent event per area and task combination. That is, I only want the most recent event of the events that have the same area_id and task_id. Example collection of events:
|event_id|area_id|task_id| ... |
|--------|-------|-------|-----|
| 5 | 3 | 2 | ... |
| 4 | 3 | 1 | ... |
| 3 | 3 | 2 | ... |
Here I want only events 5 and 4 to be returned, since event 3 is older.
I've tried using Event.select(:area_id,:task_id).distinct which seems to work, but strips all other attributes of the returned events, including :id. Grateful for any help or suggestions!
You can use raw SQL inside select, so you could try something like this:
Event.select("DISTINCT(CONCAT(area_id, task_id)), id, attr1, attr2")
Where id, attr1 and attr2 are the other attributes from your Event table.
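One caveat with a key built by concatenation: without a separator, different (area_id, task_id) pairs can produce the same string. A small plain-Ruby sketch of the effect, using made-up ids and a hypothetical `concat_key` helper that mimics CONCAT(area_id, task_id):

```ruby
# Hypothetical (area_id, task_id) pairs, chosen to show the collision.
PAIRS = [[1, 23], [12, 3], [1, 23]].freeze

# Key built like CONCAT(area_id, task_id): no separator between the parts.
def concat_key(area_id, task_id)
  "#{area_id}#{task_id}"
end

concat_keys = PAIRS.map { |area_id, task_id| concat_key(area_id, task_id) }.uniq
pair_keys   = PAIRS.uniq

puts concat_keys.inspect # two different pairs collapse into one key
puts pair_keys.inspect   # comparing the pair itself keeps them apart
```

Comparing the two columns as a pair avoids the problem without needing a separator.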
Or you could use .group instead of .distinct and forget about using raw SQL:
Event.all.group(:area_id,:task_id)
You will get the same result as using DISTINCT and all attributes will be available.
UPDATE
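To order before grouping, you can use find_by_sql with nested queries (again, raw SQL). Note that this relies on MySQL's permissive GROUP BY, where the row returned for each group is not guaranteed:
Event.find_by_sql(
  "SELECT * FROM (
     SELECT * FROM `events`
     ORDER BY `events`.`created_at` DESC) AS t1
   GROUP BY t1.`area_id`, t1.`task_id`"
)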
In other words, you need to group events by area_id and task_id and select the most recent record in each group. This is really a question about building the SQL query for that case. The PostgreSQL version for your table would look like this:
SELECT DISTINCT ON (area_id, task_id) events.*
FROM events
ORDER BY area_id ASC, task_id ASC, created_at DESC
And Rails code for this query:
Event.
select("DISTINCT ON (area_id, task_id) events.*").
order("area_id ASC, task_id ASC, created_at DESC")
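One detail the ORDER BY above leaves open is what happens when two events in the same group share a created_at value: nothing says which of them comes first. The plain-Ruby sketch below does the same per-pair selection and breaks ties with the id, assuming (and this is only an assumption) that a higher id means a newer event. The events and the `latest_per_pair` helper are hypothetical.

```ruby
# Hypothetical events; created_at is a plain integer to keep the sketch short.
EVENTS = [
  { id: 3, area_id: 3, task_id: 2, created_at: 100 },
  { id: 4, area_id: 3, task_id: 1, created_at: 200 },
  { id: 5, area_id: 3, task_id: 2, created_at: 100 } # same timestamp as event 3
].freeze

# Newest event per (area_id, task_id); ties on created_at fall back to the higher id.
def latest_per_pair(events)
  events.group_by { |e| [e[:area_id], e[:task_id]] }
        .map { |_pair, group| group.max_by { |e| [e[:created_at], e[:id]] } }
end

puts latest_per_pair(EVENTS).map { |e| e[:id] }.sort.inspect
```

In the SQL version, the corresponding change would be to append id DESC to the ORDER BY list.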

How to return distinct rows in rails?

I have a model Program containing fields program_title, department_id and date. I have inserted two rows having same program title and date but different department_id.
INSERT INTO programs (program_title, date, department_id) VALUES ('prog1', '4/2/2017', '1');
INSERT INTO programs (program_title, date, department_id) VALUES ('prog1', '4/2/2017', '2');
Now I want to return rows which will be distinct by program_title whatever be the department_id. I have tried,
@event_contents = Program.select(:id, :date, :program_title).distinct(:program_title)
But still it returns both the rows. Any help is appreciated.
SQL can only collapse rows where all values are the same when using DISTINCT. Because you are selecting id, which is different for every record, the rows are not distinct. E.g.:
---------------------------------
| id | program_title | date |
---------------------------------
| 1 | prog1 | 4/2/2017 |
| 2 | prog1 | 4/2/2017 |
---------------------------------
You'll need to exclude the id from your #select for it to work:
Program.select(:date, :program_title).distinct
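The same effect can be reproduced in plain Ruby, where `uniq` also removes only entries that are equal in every field. This is a minimal sketch with the two rows from the question as hashes; `select_distinct` is a hypothetical helper, not an Active Record method.

```ruby
# The two rows from the question, as hashes.
PROGRAMS = [
  { id: 1, program_title: "prog1", date: "4/2/2017" },
  { id: 2, program_title: "prog1", date: "4/2/2017" }
].freeze

# Project each row onto the given columns, then drop exact duplicates,
# which is roughly what SELECT DISTINCT col1, col2 ... does.
def select_distinct(rows, *columns)
  rows.map { |row| row.slice(*columns) }.uniq
end

puts select_distinct(PROGRAMS, :id, :date, :program_title).size # both rows survive
puts select_distinct(PROGRAMS, :date, :program_title).size      # collapses to one row
```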
Try this:
Program.all.distinct(:program_title).pluck(:id, :program_title, :date)
It returns the data as an array of arrays rather than model objects, though.
Hope it helps.

Rails 3: How to skip duplicate rows on joins with ORDER by multiple columns in Postgres

I have a simple HABTM relationship Books <-> Authors, and I'd like to find books by title and by author name:
Book.joins(:authors)
.where("books.title ILIKE 'roman%' OR authors.name ILIKE 'pushkin%'")
.order("books.id DESC").limit(20).group("books.id")
That works perfectly.
But if I additionally want to sort by author, I get duplicate rows for books that have several authors:
Book.joins(:authors)
.where("books.title ILIKE 'roman%' OR authors.name ILIKE 'pushkin%'")
.order("books.id DESC, authors.id DESC").limit(20).group("books.id, authors.id")
I got something like:
id | title | ...
123 | Roman1 | // this book has only 1 author
55 | roman2 |
55 | roman2 | // this one has 2 authors
177 | Roman5 | ...
etc.
How can I merge those rows by id in sql query (btw, Postgres 9.1)?
The problem is not the ORDER BY part but the GROUP BY part. If you group by the author as well, every (book, author) pair becomes its own group, so a book with two authors appears twice.
I recommend omitting authors.id from the GROUP BY and ordering the list in code, not in SQL.
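Ordering in code might look like the following plain-Ruby sketch. The hashes stand in for the rows of the book/author join; the field names, the author ids and the `books_with_authors` helper are assumptions for illustration, not part of the original models.

```ruby
# Hypothetical join rows: one row per (book, author) pair.
JOIN_ROWS = [
  { book_id: 123, title: "Roman1", author_id: 9 },
  { book_id: 55,  title: "roman2", author_id: 4 },
  { book_id: 55,  title: "roman2", author_id: 7 },
  { book_id: 177, title: "Roman5", author_id: 2 }
].freeze

# Collapse the join rows to one entry per book, keep each book's author ids
# sorted in descending order, and order the books by id, also descending.
def books_with_authors(rows)
  rows.group_by { |r| r[:book_id] }
      .map { |book_id, group|
        { book_id: book_id,
          title: group.first[:title],
          author_ids: group.map { |r| r[:author_id] }.sort.reverse }
      }
      .sort_by { |book| -book[:book_id] }
end

books_with_authors(JOIN_ROWS).each do |book|
  puts "#{book[:book_id]} | #{book[:title]} | authors: #{book[:author_ids].inspect}"
end
```

Each book appears once, and the per-author ordering lives inside the book entry instead of multiplying the rows.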

Rails created_at timestamp order disagrees with id order

I have a Rails 2.3.5 app with a table containing id and created_at columns. The table records state changes to entities over time, so I occasionally use it to look up the state of an entity at a particular time, by looking for state changes that occurred before the time, and picking the latest one according to the created_at timestamp.
For 10 out of 1445 entities, the timestamps of the state changes are in a different order to the ids, and the state of the last state change differs from the state which is stored with the entity itself, e.g.
id | created_at | entity_id | state |
------+---------------------+-----------+-------+
1151 | 2009-01-26 10:27:02 | 219 | 1 |
1152 | 2009-01-26 10:27:11 | 219 | 2 |
1153 | 2009-01-26 10:27:17 | 219 | 4 |
1154 | 2009-01-26 10:26:41 | 219 | 5 |
I can probably get around this by ordering on id instead of timestamp, but can't think of an explanation as to how it could have happened. The app uses several mongrel instances, but they're all on the same machine (Debian Lenny); am I missing something obvious? DB is Postgres.
Rails gets the new value for the id field from a database sequence (at least in PostgreSQL), either on insert or through the RETURNING keyword if the database supports it.
The created_at and updated_at fields, however, are set on create by the ActiveRecord::Timestamp#create_with_timestamps method, which uses the system time.
So row 1154 was inserted later, but the timestamp for its created_at field was calculated earlier.
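A deterministic sketch of one interleaving that would produce this, in plain Ruby. The counter `clock` stands in for the system time and `next_id` for the database sequence; both, like the function name, are illustrative stand-ins rather than anything Rails exposes.

```ruby
# Two app processes, A and B, each save one row.
# The timestamp is taken in the app; the id is assigned when the INSERT reaches the database.
def simulate_interleaved_inserts
  clock = 0      # stand-in for the system time
  next_id = 1150 # stand-in for the database sequence
  table = []

  stamp  = -> { clock += 1 } # the app reads the current time
  insert = lambda do |created_at|
    next_id += 1             # the database hands out the next id
    table << { id: next_id, created_at: created_at }
  end

  a_time = stamp.call # A computes created_at first...
  b_time = stamp.call # ...B computes its own a moment later
  insert.call(b_time) # B reaches the database first and gets the lower id
  insert.call(a_time) # A inserts last: higher id, earlier timestamp
  table
end

simulate_interleaved_inserts.each { |row| puts row.inspect }
```

The row with the higher id carries the earlier timestamp, matching the pattern seen for id 1154 in the question.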
