I have an app that includes music charts to showcase the top tracks (it shows the top 10).
However, I'm trying to limit the charts so that any particular user cannot have more than one track on the top charts at the same time.
If you need any more info, just let me know.
You can use the row_number() function, which gives a running number that restarts whenever the user id changes. Filtering on that number in an outer WHERE clause gives you a per-user limit (here, one track per user, as the question asks):
SELECT * FROM (
  SELECT COALESCE(sum(plays.clicks), 0) AS total_clicks,
         row_number() OVER (PARTITION BY users.id ORDER BY COALESCE(sum(plays.clicks), 0) DESC) AS rn,
         users.id AS user_id,
         tracks.*
  FROM tracks
  JOIN plays
    ON tracks.id = plays.track_id
   AND plays.created_at > now() - interval '14 days'
  INNER JOIN albums
    ON tracks.album_id = albums.id
  INNER JOIN users
    ON albums.user_id = users.id
  GROUP BY users.id, tracks.id
) sq1
WHERE rn = 1
ORDER BY total_clicks DESC
LIMIT 10;
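To see the per-user limit in isolation, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or later is assumed for window function support). The track_totals table and its rows are made up for illustration and stand in for the already-aggregated play counts; the limit is set to one track per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, pre-aggregated data: one row per track with its total clicks.
    CREATE TABLE track_totals (track TEXT, user_id INTEGER, clicks INTEGER);
    INSERT INTO track_totals VALUES
        ('a1', 1, 90), ('a2', 1, 80),
        ('b1', 2, 70), ('b2', 2, 10),
        ('c1', 3, 50);
""")

# Number each user's tracks by clicks, keep only the best one per user,
# then order the survivors and cut the chart off at 10 entries.
rows = conn.execute("""
    SELECT track, user_id, clicks
    FROM (
        SELECT track, user_id, clicks,
               row_number() OVER (PARTITION BY user_id ORDER BY clicks DESC) AS rn
        FROM track_totals
    ) AS ranked
    WHERE rn = 1
    ORDER BY clicks DESC
    LIMIT 10
""").fetchall()

print(rows)
```

User 1 has the two most-played tracks overall, but only the first of them makes the chart.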
I am playing around with ClickHouse and I am trying to figure out why the query below gives me a DB::Exception: Memory limit (for query) exceeded. I could use some help.
SELECT * FROM
(
SELECT created_at, rates.car_id, MIN(rates.price) FROM rates
WHERE
pickup_location_id = 198
AND created_at = '2020-10-01'
GROUP BY created_at, car_id
) r
JOIN cars c2 ON r.car_id = c2.id
The inner query bit performs almost instantly (millions of records) and yields only 212 results. However, adding the JOIN causes the query to fail (memory exception, 45GB)
Looks like the JOIN happens on the whole of rates/cars - and not on the "result"?
ClickHouse uses a hash join: it loads the right-hand table (here, all of cars) into an in-memory hash table.
For an inner join you can swap the tables so that the small subquery result ends up on the right:
SELECT * FROM cars c2 JOIN
(
SELECT created_at, rates.car_id, MIN(rates.price) FROM rates
WHERE
pickup_location_id = 198
AND created_at = '2020-10-01'
GROUP BY created_at, car_id
) r
ON r.car_id = c2.id
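Why the side matters can be illustrated with a toy hash join. This is a simplified sketch of the general technique, not ClickHouse's actual implementation; the hash_join function and the sample rows are hypothetical.

```python
from collections import defaultdict

def hash_join(left_rows, right_rows, left_key, right_key):
    """Inner join: build a hash table from right_rows, then stream left_rows."""
    table = defaultdict(list)
    for r in right_rows:              # the whole right side is held in memory
        table[r[right_key]].append(r)
    for l in left_rows:               # the left side is only iterated once
        for r in table.get(l[left_key], []):
            yield {**l, **r}

# Large table on the left, small aggregated result on the right.
cars = [{"id": i, "model": f"car{i}"} for i in range(1000)]
rates = [{"car_id": 3, "min_price": 40}, {"car_id": 7, "min_price": 55}]

joined = list(hash_join(cars, rates, "id", "car_id"))
print(joined)
```

With the small result on the right, the hash table holds two entries; with the sides reversed it would hold every row of cars.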
I have two queries that return the same data.
Query1, which is normal join takes a long time to execute:
SELECT TOP 1000 bigtable.*, tbl1.name, tbl2.name FROM
bigtable INNER JOIN tbl1 on bigtable.id1 = tbl1.id1
INNER JOIN tbl2 on tbl1.id1 = tbl2.id1
order by bigtable.id desc
Query2 that uses a sub-query returns fairly quickly:
SELECT subtable.*, tbl1.name, tbl2.name FROM
(SELECT TOP 1000 * FROM bigtable) subtable
INNER JOIN tbl1 on subtable.id1 = tbl1.id1
INNER JOIN tbl2 on tbl1.id1 = tbl2.id1
order by subtable.id desc
bigtable contains 100k rows or so. tbl1 is a very small table (less than 10 rows). I would rather not use subqueries. If I skip the order by clause, both queries run quickly. I have tried adding indexes to the fields being joined, adding a DESC index on id etc. but nothing seems to help.
Any help is appreciated!
===> Update:
This turned out to be a non-issue. After creating another table similar to tbl1 with the same rows, I found that Query1 ran in under a second with the copied table. Rebuilding the statistics on tbl1 fixed it.
I think that the two queries are not equivalent - try to write the second one as
SELECT subtable.*, tbl1.name, tbl2.name FROM
(SELECT TOP 1000 * FROM bigtable order by bigtable.id desc) subtable
INNER JOIN tbl1 on subtable.id1 = tbl1.id1
INNER JOIN tbl2 on tbl1.id1 = tbl2.id1
order by subtable.id desc
I expect the expensive operation to be the ordering of the big table, which is now present in both versions.
Thanks in advance for any help with this, it is highly appreciated.
So, basically, I have a Greenplum database and I want to select the sizes of the 10 largest tables. This isn't a problem using the query below:
select
sotaidschemaname schema_name
,sotaidtablename table_name
,pg_size_pretty(sotaidtablesize) table_size
from gp_toolkit.gp_size_of_table_and_indexes_disk
order by 3 desc
limit 10
;
However, I have several partitioned tables in my database, and with the above SQL these show up as their individual 'child tables', split into small fragments (though I know they add up to the 2 largest tables). Is there a way to write a script that selects tables (partitioned or otherwise) and their total size?
Note: I'd be happy to include some sort of join where I specify the partitioned table name explicitly, as there are only 2 partitioned tables. However, I would still need to take the top 10 (where I cannot assume the partitioned tables are up there), and I cannot specify any other table names since there are nearly a thousand of them.
Thanks again,
Vinny.
The pg_relation_size() function returns a relation's size, and you can join pg_class, pg_namespace and pg_partitions to roll partitions up to their parent table, like this:
select schemaname,
tablename,
sum(size_mb) as size_mb,
sum(num_partitions) as num_partitions
from (
select coalesce(p.schemaname, n.nspname) as schemaname,
coalesce(p.tablename, c.relname) as tablename,
1 as num_partitions,
pg_relation_size(n.nspname || '.' || c.relname)/1000000. as size_mb
from pg_class as c
inner join pg_namespace as n on c.relnamespace = n.oid
left join pg_partitions as p on c.relname = p.partitiontablename and n.nspname = p.partitionschemaname
) as q
group by 1, 2
order by 3 desc
limit 10;
select * from
(
select schemaname,tablename,
pg_relation_size(schemaname||'.'||tablename) as Size_In_Bytes
from pg_tables
where schemaname||'.'||tablename not in (select schemaname||'.'||partitiontablename from pg_partitions)
and schemaname||'.'||tablename not in (select distinct schemaname||'.'||tablename from pg_partitions )
union all
select schemaname,tablename,
sum(pg_relation_size(schemaname||'.'||partitiontablename)) as Size_In_Bytes
from pg_partitions
group by 1,2) as foo
where Size_In_Bytes >= '0' order by 3 desc;
I'm building a paginated scoreboard using postgres 9.1.
There are several criteria that users can sort the scoreboard by, in ascending or descending order. There is a feature to let users find "their row" across the multiple pages of the scoreboard, and it must reflect the user's selected sorting criteria.
I am using postgres's row_number function to find their offset into the result set to return the page where the user can find their row.
Everything I'm reading about row_number seems to imply that bad things happen to people who don't specify an ordering within the row_number window. E.g. row_number() OVER (ORDER BY score_1) is OK, row_number() OVER () is bad.
My case differs from the examples I've read about in that I am explicitly ordering my query. I realize the DB engine may not return the results in any particular order if I don't.
But I'd like to specify the ordering only at the level of the entire query and get the row_number of the results, without having to duplicate my ordering specification in the row_number window.
So this is what I'd like to do, and it "seems to work".
SELECT
id,
row_number() OVER () AS player_position,
score_1,
score_2,
score_3
FROM my_table
ORDER BY (score_1 ASC | score_1 DESC | score_2 ASC | score_2 DESC | score_3 ASC | score_3 DESC)
Here player_position reflects the player's rank under whatever criterion I'm ordering by.
But the documentation I've read tells me I should do it like this:
SELECT
id,
row_number() OVER (ORDER BY score_1 ASC) AS player_position,
score_1,
score_2,
score_3
FROM my_table
ORDER BY score_1 ASC
or
SELECT
id,
row_number() OVER (ORDER BY score_2 DESC) AS player_position,
score_1,
score_2,
score_3
FROM my_table
ORDER BY score_2 DESC
The real reason I'd like to avoid redundantly specifying the ordering in the row_number window is to keep my query compatible with the ActiveRecord ORM. I want to have my base scoreboard query and chain the ordering onto it.
e.g. Ultimately, I want to be able to do this:
Players.scoreboard.order('score_1 ASC')
Players.scoreboard.order('score_2 DESC')
etc...
Is it possible?
Try moving your main query into a subquery with an ORDER BY and apply the ROW_NUMBER() to the outermost query.
SELECT y.*,
ROW_NUMBER() OVER () as player_position
FROM
(SELECT
id,
score_1,
score_2,
score_3
FROM my_table
ORDER BY <whatever>) as y
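If relying on the subquery's ordering feels fragile, one alternative is to generate the query from a single ordering string, so the window and the outer ORDER BY cannot drift apart. The sketch below uses Python's sqlite3 module (SQLite 3.25 or later assumed) rather than Postgres and ActiveRecord; scoreboard_sql, ALLOWED_ORDERS and the sample rows are hypothetical names and data for illustration only.

```python
import sqlite3

# Hypothetical whitelist: interpolating ORDER BY text is only safe for known values.
ALLOWED_ORDERS = {"score_1 ASC", "score_1 DESC", "score_2 ASC", "score_2 DESC"}

def scoreboard_sql(order):
    """Build the scoreboard query, using the same ordering in both places."""
    if order not in ALLOWED_ORDERS:
        raise ValueError(f"unsupported ordering: {order}")
    return (
        "SELECT id, row_number() OVER (ORDER BY " + order + ") AS player_position "
        "FROM my_table ORDER BY " + order
    )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER, score_1 INTEGER, score_2 INTEGER);
    INSERT INTO my_table VALUES (1, 10, 1), (2, 30, 3), (3, 20, 2);
""")

by_score_1 = conn.execute(scoreboard_sql("score_1 DESC")).fetchall()
by_score_2 = conn.execute(scoreboard_sql("score_2 ASC")).fetchall()
print(by_score_1, by_score_2)
```

The ordering is written once by the caller and appears twice only in the generated SQL.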
I have the following query:
SELECT name, rank() OVER (PARTITION BY user_id ORDER BY love_count DESC) AS position FROM items
And I'd now like to do a where clause on the rank() function:
SELECT name, rank() OVER (PARTITION BY user_id ORDER BY love_count DESC) AS position FROM items WHERE position = 1
That is, I want to query the most loved item for each user. However, this results in:
PGError: ERROR: column "position" does not exist
Also, I'm using Rails AREL to do this and would like to enable chaining. This is the Ruby code that creates the query:
Item.select("name, rank() OVER (PARTITION BY user_id ORDER BY love_count DESC) AS position").where("position = 1")
Any ideas?
You need to "wrap" it into a derived table:
SELECT *
FROM (
SELECT name,
rank() OVER (PARTITION BY user_id ORDER BY love_count DESC) AS position
FROM items
) t
WHERE position = 1
My first thought was, "Use a common table expression", like this untested one.
WITH badly_named_cte AS (
SELECT name,
rank() OVER (PARTITION BY user_id
ORDER BY love_count DESC) AS position
FROM items
)
SELECT * FROM badly_named_cte WHERE position = 1;
The problem you're seeing has to do with the logical order of evaluation required by SQL standards. SQL has to act as if column aliases (arguments to the AS operator) don't exist until after the WHERE clause is evaluated.
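Both workarounds can be tried side by side. This is a small sketch with Python's sqlite3 module (SQLite 3.25 or later assumed) rather than Postgres; the sample rows are invented, and the alias is called pos here only to keep the example free of anything that might be treated as a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (name TEXT, user_id INTEGER, love_count INTEGER);
    INSERT INTO items VALUES ('x', 1, 5), ('y', 1, 9), ('z', 2, 3);
""")

window = "rank() OVER (PARTITION BY user_id ORDER BY love_count DESC) AS pos"

# Derived table: the alias exists by the time the outer WHERE is evaluated.
derived = conn.execute(
    f"SELECT name FROM (SELECT name, {window} FROM items) AS t "
    "WHERE pos = 1 ORDER BY name"
).fetchall()

# Common table expression: same idea, different syntax.
cte = conn.execute(
    f"WITH ranked AS (SELECT name, {window} FROM items) "
    "SELECT name FROM ranked WHERE pos = 1 ORDER BY name"
).fetchall()

print(derived, cte)
```

Either form returns the most loved item for each user, because the filter runs against a result set in which the ranking column has already been computed.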