Converting a SQL statement to a Rails command

I have a situation where I need to fetch only a few records from a particular Active Record query response.
@annotations = Annotation.select('*', 'ROW_NUMBER() OVER (PARTITION BY column_a ORDER BY column_b) AS rn')
Above is the query; @annotations is the Active Record response to which I would like to apply the logic below. Is there a better way to write this logic the Rails way?
with some_table as
(
select *, row_number() over (partition by column_a order by column_b) rn
from the_table
)
select * from some_table
where (column_a = 'ABC' and rn <= 10) or (column_b <> 'AAA')

ActiveRecord does not provide CTEs in its high level API; however with a little Arel we can make this a sub query in the FROM clause
annotations_table = Annotation.arel_table
sub_query = annotations_table.project(
Arel.star,
Arel.sql('row_number() over (partition by column_a order by column_b)').as('rn')
)
query = Arel::Nodes::As.new(sub_query, Arel.sql(annotations_table.name))
Annotation.from(query).where(
annotations_table[:column_a].eq('ABC').and(
annotations_table[:rn].lteq(10)
).or(annotations_table[:column_b].not_eq('AAA'))
)
The result will be a collection of Annotation objects, built from the sub query (in place of your CTE) and the filters you described.
SQL:
select annotations.*
from (
select *, row_number() over (partition by column_a order by column_b) AS rn
from annotations
) AS annotations
where (annotations.column_a = 'ABC' and annotations.rn <= 10) or (annotations.column_b <> 'AAA')
Notes:
With a little extra work we could make this a CTE but it does not seem needed in this case
We could use a bunch of hacks to transition this row_number() over (partition by column_a order by column_b) to Arel as well but it did not seem pertinent to the question.
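To make the intent of the query concrete, here is a minimal plain-Ruby sketch that emulates the same logic on an in-memory array. The sample rows, the ROWS constant, and the filter_rows helper are hypothetical and exist only for illustration; ties on column_b are broken by id here, whereas the SQL leaves that order unspecified.

```ruby
# Hypothetical sample rows standing in for the annotations table.
ROWS = [
  { id: 1, column_a: 'ABC', column_b: 'AAA' },
  { id: 2, column_a: 'ABC', column_b: 'AAA' },
  { id: 3, column_a: 'ABC', column_b: 'AAB' },
  { id: 4, column_a: 'XYZ', column_b: 'AAA' },
  { id: 5, column_a: 'XYZ', column_b: 'BBB' }
].freeze

# Emulates row_number() over (partition by column_a order by column_b),
# then keeps rows where (column_a = 'ABC' and rn <= limit) or column_b <> 'AAA'.
def filter_rows(rows, limit:)
  numbered = rows.group_by { |r| r[:column_a] }.flat_map do |_key, group|
    # Assumption: ties on column_b are broken by id to keep the result stable.
    sorted = group.sort_by { |r| [r[:column_b], r[:id]] }
    sorted.each_with_index.map { |r, i| r.merge(rn: i + 1) }
  end
  numbered.select do |r|
    (r[:column_a] == 'ABC' && r[:rn] <= limit) || r[:column_b] != 'AAA'
  end
end

p filter_rows(ROWS, limit: 1).map { |r| r[:id] }
```

With limit: 1, only the first 'ABC' row passes the rn test; the other rows are kept only when column_b differs from 'AAA'.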

Related

Find records with ID in array of IDS and keep the order of records matching that of IDs

I have a simple SQL query in PostgreSQL 8.3 that grabs a bunch of comments. I provide a sorted list of values to the IN construct in the WHERE clause:
SELECT * FROM comments WHERE (comments.id IN (1,3,2,4));
This returns comments in an arbitrary order, which in my case happens to be ids like 1,2,3,4.
I want the resulting rows sorted like the list in the IN construct: (1,3,2,4).
How to achieve that?
You can do it quite easily with VALUES (), (), which was introduced in PostgreSQL 8.2.
Syntax will be like this:
select c.*
from comments c
join (
values
(1,1),
(3,2),
(2,3),
(4,4)
) as x (id, ordering) on c.id = x.id
order by x.ordering
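Writing the (id, ordering) pairs by hand gets tedious for longer lists. As a rough sketch, the query text can be generated from the id list in application code. The ordered_comments_sql helper below is hypothetical, and it assumes integer ids; Integer() is used so that non-numeric strings raise instead of being interpolated into the SQL.

```ruby
# Builds the VALUES-based ordering query for a list of integer ids.
# Integer() raises ArgumentError for non-numeric strings, which keeps
# arbitrary text out of the generated SQL.
def ordered_comments_sql(ids)
  raise ArgumentError, 'ids must not be empty' if ids.empty?

  # Pair every id with its 1-based position in the list.
  pairs = ids.each_with_index.map { |id, i| "(#{Integer(id)}, #{i + 1})" }
  <<~SQL
    select c.*
    from comments c
    join (values #{pairs.join(', ')}) as x (id, ordering) on c.id = x.id
    order by x.ordering
  SQL
end

puts ordered_comments_sql([1, 3, 2, 4])
```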
In Postgres 9.4 or later, this is simplest and fastest:
SELECT c.*
FROM comments c
JOIN unnest('{1,3,2,4}'::int[]) WITH ORDINALITY t(id, ord) USING (id)
ORDER BY t.ord;
WITH ORDINALITY was introduced in Postgres 9.4.
No need for a subquery, we can use the set-returning function like a table directly. (A.k.a. "table-function".)
A string literal to hand in the array instead of an ARRAY constructor may be easier to implement with some clients.
For convenience (optionally), copy the column name we are joining to ("id" in the example), so we can join with a short USING clause to only get a single instance of the join column in the result.
Works with any input type. If your key column is of type text, provide something like '{foo,bar,baz}'::text[].
Detailed explanation:
PostgreSQL unnest() with element number
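As an illustration of handing in the array as a string literal, the sketch below formats a list of integer ids as a PostgreSQL array literal that could be passed as a single bind parameter. The helper name pg_int_array_literal and the ORDERED_COMMENTS_SQL constant are made up for this example, and it assumes a client that supports positional parameters such as $1.

```ruby
# Formats integer ids as a PostgreSQL array literal such as {1,3,2,4}.
# Integer() raises ArgumentError for non-numeric strings, so arbitrary text
# cannot end up inside the literal.
def pg_int_array_literal(ids)
  "{#{ids.map { |id| Integer(id) }.join(',')}}"
end

# $1 is a positional bind parameter; the array literal is passed as its value.
ORDERED_COMMENTS_SQL = <<~SQL
  SELECT c.*
  FROM comments c
  JOIN unnest($1::int[]) WITH ORDINALITY t(id, ord) USING (id)
  ORDER BY t.ord
SQL

puts pg_int_array_literal([1, 3, 2, 4]) # prints {1,3,2,4}
```

The query text stays constant this way, and only the parameter value changes between calls.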
Just because it is so difficult to find and it deserves to be spread: in MySQL this can be done much more simply, but I don't know if it works in other SQL dialects.
SELECT * FROM `comments`
WHERE `comments`.`id` IN ('12','5','3','17')
ORDER BY FIELD(`comments`.`id`,'12','5','3','17')
With Postgres 9.4 this can be done a bit shorter:
select c.*
from comments c
join (
select *
from unnest(array[43,47,42]) with ordinality
) as x (id, ordering) on c.id = x.id
order by x.ordering;
Or a bit more compact without a derived table:
select c.*
from comments c
join unnest(array[43,47,42]) with ordinality as x (id, ordering)
on c.id = x.id
order by x.ordering
Removing the need to manually assign/maintain a position to each value.
With Postgres 9.5 or later this can be done using array_position():
with x (id_list) as (
values (array[42,48,43])
)
select c.*
from comments c, x
where id = any (x.id_list)
order by array_position(x.id_list, c.id);
The CTE is used so that the list of values only needs to be specified once. If that is not important this can also be written as:
select c.*
from comments c
where id in (42,48,43)
order by array_position(array[42,48,43], c.id);
I think this way is better :
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY id=1 DESC, id=3 DESC, id=2 DESC, id=4 DESC
Another way to do it in Postgres would be to use the idx function.
SELECT *
FROM comments
ORDER BY idx(array[1,3,2,4], comments.id)
Don't forget to create the idx function first, as described here: http://wiki.postgresql.org/wiki/Array_Index
In Postgresql:
select *
from comments
where id in (1,3,2,4)
order by position(id::text in '1,3,2,4')
On researching this some more I found this solution:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY CASE "comments"."id"
WHEN 1 THEN 1
WHEN 3 THEN 2
WHEN 2 THEN 3
WHEN 4 THEN 4
END
However this seems rather verbose and might have performance issues with large datasets.
Can anyone comment on these issues?
To do this, I think you should probably have an additional "ORDER" table which defines the mapping of IDs to order (effectively doing what your response to your own question said), which you can then use as an additional column on your select which you can then sort on.
In that way, you explicitly describe the ordering you desire in the database, where it should be.
Without a sequence (requires 8.4 or later, since it uses a window function):
select * from comments c
join
(
select id, row_number() over() as id_sorter
from (select unnest(ARRAY[1,3,2,4]) as id) as y
) x on x.id = c.id
order by x.id_sorter
SELECT * FROM "comments" JOIN (
SELECT 1 as "id",1 as "order" UNION ALL
SELECT 3,2 UNION ALL SELECT 2,3 UNION ALL SELECT 4,4
) j ON "comments"."id" = j."id" ORDER BY j."order"
or if you prefer evil over good:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY POSITION(',' || "comments"."id"::text || ',' IN ',1,3,2,4,')
And here's another solution that works and uses a constant table (http://www.postgresql.org/docs/8.3/interactive/sql-values.html):
SELECT * FROM comments AS c,
(VALUES (1,1),(3,2),(2,3),(4,4) ) AS t (ord_id,ord)
WHERE (c.id IN (1,3,2,4)) AND (c.id = t.ord_id)
ORDER BY ord
But again I'm not sure that this is performant.
I've got a bunch of answers now. Can I get some voting and comments so I know which is the winner!
Thanks All :-)
create sequence serial start 1;
select * from comments c
join (select unnest(ARRAY[1,3,2,4]) as id, nextval('serial') as id_sorter) x
on x.id = c.id
order by x.id_sorter;
drop sequence serial;
[EDIT]
unnest is not built in before 8.4, but in 8.3 you can create one yourself (the beauty of any*):
create function unnest(anyarray) returns setof anyelement
language sql as
$$
select $1[i] from generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
That function works with any element type:
select unnest(array['John','Paul','George','Ringo']) as beatle;
select unnest(array[1,3,2,4]) as id;
Slight improvement over the version that uses a sequence I think:
CREATE OR REPLACE FUNCTION in_sort(anyarray, out id anyelement, out ordinal int)
LANGUAGE SQL AS
$$
SELECT $1[i], i FROM generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
SELECT
*
FROM
comments c
INNER JOIN (SELECT * FROM in_sort(ARRAY[1,3,2,4])) AS in_sort
USING (id)
ORDER BY in_sort.ordinal;
select * from comments where comments.id in
(select unnest(ids) from bbs where id=19795)
order by array_position((select ids from bbs where id=19795),comments.id)
Here, bbs is the main table, which has a column called ids,
and ids is the array that stores the comments.id values.
Tested on PostgreSQL 9.6.
Let's get a visual impression of what was already said. For example, you have a table with some tasks:
SELECT a.id,a.status,a.description FROM minicloud_tasks as a ORDER BY random();
id | status | description
----+------------+------------------
4 | processing | work on postgres
6 | deleted | need some rest
3 | pending | garden party
5 | completed | work on html
And you want to order the list of tasks by status.
The status is a list of string values:
(processing, pending, completed, deleted)
The trick is to give each status value an integer and order the list numerically:
SELECT a.id,a.status,a.description FROM minicloud_tasks AS a
JOIN (
VALUES ('processing', 1), ('pending', 2), ('completed', 3), ('deleted', 4)
) AS b (status, id) ON (a.status = b.status)
ORDER BY b.id ASC;
Which leads to:
id | status | description
----+------------+------------------
4 | processing | work on postgres
3 | pending | garden party
5 | completed | work on html
6 | deleted | need some rest
Credit: @user80168
I agree with all other posters that say "don't do that" or "SQL isn't good at that". If you want to sort by some facet of comments then add another integer column to one of your tables to hold your sort criteria and sort by that value. eg "ORDER BY comments.sort DESC " If you want to sort these in a different order every time then... SQL won't be for you in this case.
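If the ordering changes on every request, one option is to fetch the rows with a plain IN query and reorder them in application code. A minimal sketch, assuming the records respond to id; the Comment struct and the in_id_order helper are hypothetical stand-ins, not part of any library:

```ruby
# Hypothetical stand-in for a fetched comment record.
Comment = Struct.new(:id, :body)

# Reorders already-fetched records to follow the order of `ids`.
# Records whose id is not in `ids` are dropped; ids without a record are skipped.
def in_id_order(records, ids)
  by_id = records.map { |record| [record.id, record] }.to_h
  ids.map { |id| by_id[id] }.compact
end

# The database returned the rows in an arbitrary order (here: by id).
fetched = [
  Comment.new(1, 'first'), Comment.new(2, 'second'),
  Comment.new(3, 'third'), Comment.new(4, 'fourth')
]
p in_id_order(fetched, [1, 3, 2, 4]).map(&:id)
```

The hash lookup keeps the reordering linear in the number of ids, which matters little for a page of comments but avoids a nested scan for larger lists.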

BigQuery Subqueries Efficient Join

I am trying to analyse firebase analytics data in BigQuery. I need to update a table in BigQuery using StandardSQL.
I have to update order_flag in table cart where key = 'item_id' by joining it to another table order.
Below is the query:
#standardSQL
UPDATE `dataset.cart` c
SET c.order_flag = true
WHERE (SELECT value.string_value
FROM UNNEST(c.event_dim.params)
WHERE key = 'item_id') IN
(SELECT
(SELECT value.string_value
FROM UNNEST(o.event_dim.params)
WHERE key = 'item_id')
FROM `dataset.order` o
WHERE (SELECT key FROM UNNEST(o.event_dim.params)
WHERE key = 'item_id') =
(SELECT value.string_value FROM UNNEST(c.event_dim.params)
WHERE key = 'item_id'))
But I am getting the error:
Error: Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
How to do an efficient join in this scenario?
Your query looks a bit strange because it has an IN clause with a correlated subquery (the subquery uses both the o and c tables).
That is something of an antipattern and usually indicates a mistake in the query, because normally the subquery of an IN clause is NOT correlated across tables.
An EXISTS clause usually requires correlation in its subquery, but IN does not.
This would work most likely:
UPDATE
`dataset.cart` c
SET
c.order_flag=TRUE
WHERE
(
SELECT
value.string_value
FROM
UNNEST(c.event_dim.params)
WHERE
key = 'item_id') IN (
SELECT
(
SELECT
value.string_value
FROM
UNNEST(o.event_dim.params)
WHERE
key = 'item_id')
FROM
`dataset.order` o
)
If you decide to switch to EXISTS then I would recommend storing
(SELECT
value.string_value
FROM
UNNEST(o.event_dim.params)
WHERE
key = 'item_id')
in a separate column, to keep things simple and easy for the query optimizer to optimize.

Solving a PG::GroupingError: ERROR

The following code gets all the residences that have all the amenities listed in id_list. It works without a problem with SQLite but raises an error with PostgreSQL:
id_list = [48, 49]
Residence.joins(:listed_amenities).
where(listed_amenities: {amenity_id: id_list}).
references(:listed_amenities).
group(:residence_id).
having("count(*) = ?", id_list.size)
The error on the PostgreSQL version:
What do I have to change to make it work with PostgreSQL?
A few things:
references should only be used with includes; it tells ActiveRecord to perform a join, so it's redundant when using an explicit joins.
You need to fully qualify the argument to group, i.e. group('residences.id').
For example,
id_list = [48, 49]
Residence.joins(:listed_amenities).
where(listed_amenities: { amenity_id: id_list }).
group('residences.id').
having('COUNT(*) = ?', id_list.size)
The query the Ruby (?) code is expanded to is selecting all fields from the residences table:
SELECT "residences".*
FROM "residences"
INNER JOIN "listed_amenities"
ON "listed_amenities"."residence_id" = "residences"."id"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residences"."id" ASC
LIMIT 1;
From the Postgres manual, When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or if the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column.
You'll need to either group by all fields that aggregate functions aren't applied to, or do this differently. From the query, it looks like you only need to scan the listed_amenities table to get the residence ID you're looking for:
SELECT "residence_id"
FROM "listed_amenities"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residence_id" ASC
LIMIT 1
And then fetch your residence data with that ID. Or, in one query:
SELECT "residences".*
FROM "residences"
WHERE "id" IN (SELECT "residence_id"
FROM "listed_amenities"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residence_id" ASC
LIMIT 1
);
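To see what the GROUP BY ... HAVING count(*) = n filter selects, here is a small plain-Ruby sketch of the same logic on hypothetical in-memory data. The LISTED_AMENITIES rows and the residences_with_all helper are invented for illustration, and, like the SQL, the sketch assumes each (residence_id, amenity_id) pair appears at most once.

```ruby
# Hypothetical rows standing in for the listed_amenities table.
LISTED_AMENITIES = [
  { residence_id: 1, amenity_id: 48 },
  { residence_id: 1, amenity_id: 49 },
  { residence_id: 2, amenity_id: 48 },
  { residence_id: 3, amenity_id: 49 },
  { residence_id: 3, amenity_id: 48 },
  { residence_id: 3, amenity_id: 50 }
].freeze

# Emulates:
#   WHERE amenity_id IN (id_list)
#   GROUP BY residence_id
#   HAVING count(*) = id_list.size
def residences_with_all(listed, id_list)
  listed
    .select { |row| id_list.include?(row[:amenity_id]) }
    .group_by { |row| row[:residence_id] }
    .select { |_residence_id, rows| rows.size == id_list.size }
    .keys
end

p residences_with_all(LISTED_AMENITIES, [48, 49])
```

Residence 2 drops out because only one of its rows survives the IN filter, so its group count does not reach the size of id_list.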

row_number() with an unspecified window `row_number() OVER ()`

I'm building a paginated scoreboard using postgres 9.1.
There are several criteria that users can sort the scoreboard by, and they can sort in ascending or descending order. There is a feature to let users find "their row" across the multiple pages in the scoreboard, and it must reflect the user's selected sorting criteria.
I am using postgres's row_number function to find their offset into the result set to return the page where the user can find their row.
Everything I'm reading about row_number seems to imply that bad things happen to people who don't specify an ordering within the row_number window. E.g. row_number() OVER (ORDER BY score_1) is OK, row_number() OVER () is bad.
My case is different from the examples I've read about in that I am explicitly ordering my query; I realize the DB engine may not return the results in any particular order if I don't.
But I'd like to just specify ordering at the level of the entire query and get the row_number of the results, without having to duplicate my ordering specification with the row_number's window.
So this is what I'd like to do, and it "seems to work".
SELECT
id,
row_number() OVER () AS player_position,
score_1,
score_2,
score_3
FROM my_table
ORDER BY (score_1 ASC | score_1 DESC | score_2 ASC | score_2 DESC | score_3 ASC | score_3 DESC)
Where player_position reflects the player's rank in whatever criteria I'm ordering by.
But the documentation I've read tells me I should do it like this:
SELECT
id,
row_number() OVER (ORDER BY score_1 ASC) AS player_position,
score_1,
score_2,
score_3
FROM my_table
ORDER BY score_1 ASC
or
SELECT
id,
row_number() OVER (ORDER BY score_2 DESC) AS player_position,
score_1,
score_2,
score_3
FROM my_table
ORDER BY score_2 DESC
The real reason that I'd like to avoid redundantly specifying the ordering for the row_number window is to keep my query compatible with the ActiveRecord ORM. I want to have my base scoreboard query, and chain on the ordering.
e.g. Ultimately, I want to be able to do this:
Players.scoreboard.order('score_1 ASC')
Players.scoreboard.order('score_2 DESC')
etc...
Is it possible?
Try moving your main query into a subquery with an ORDER BY and apply the ROW_NUMBER() to the outermost query.
SELECT y.*,
ROW_NUMBER() OVER () as player_position
FROM
(SELECT
id,
score_1,
score_2,
score_3
FROM my_table
ORDER BY <whatever>) as y
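A different way to avoid writing the ordering twice is to generate both ORDER BY clauses from a single argument in application code. This is only a sketch of that alternative, not the subquery approach above; scoreboard_sql and the two whitelist constants are hypothetical names, and the column list is taken from the question.

```ruby
# Whitelists keep user-selected sort options out of raw SQL interpolation.
SORTABLE_COLUMNS = %w[score_1 score_2 score_3].freeze
DIRECTIONS = %w[ASC DESC].freeze

# Builds the scoreboard query so that the window ordering and the final
# ORDER BY always come from the same (column, direction) pair.
def scoreboard_sql(column, direction)
  raise ArgumentError, "unknown column: #{column}" unless SORTABLE_COLUMNS.include?(column)
  raise ArgumentError, "unknown direction: #{direction}" unless DIRECTIONS.include?(direction)

  ordering = "#{column} #{direction}"
  <<~SQL
    SELECT id,
           row_number() OVER (ORDER BY #{ordering}) AS player_position,
           score_1, score_2, score_3
    FROM my_table
    ORDER BY #{ordering}
  SQL
end

puts scoreboard_sql('score_2', 'DESC')
```

The caller still specifies the ordering once, while the generated SQL keeps an explicit ORDER BY inside the window.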

Postgres Rank As Column

I have the following query:
SELECT name, rank() OVER (PARTITION BY user_id ORDER BY love_count DESC) AS position FROM items
And I'd now like to do a where clause on the rank() function:
SELECT name, rank() OVER (PARTITION BY user_id ORDER BY love_count DESC) AS position FROM items WHERE position = 1
That is, I want to query the most loved item for each user. However, this results in:
PGError: ERROR: column "position" does not exist
Also, I'm using Rails AREL to do this and would like to enable chaining. This is the Ruby code that creates the query:
Item.select("name, rank() OVER (PARTITION BY user_id ORDER BY love_count DESC) AS position").where("position = 1")
Any ideas?
You need to "wrap" it into a derived table:
SELECT *
FROM (
SELECT name,
rank() OVER (PARTITION BY user_id ORDER BY love_count DESC) AS position
FROM items
) t
WHERE position = 1
My first thought was, "Use a common table expression", like this untested one.
WITH badly_named_cte AS (
SELECT name,
rank() OVER (PARTITION BY user_id
ORDER BY love_count DESC) AS position
FROM items
)
SELECT * FROM badly_named_cte WHERE position = 1;
The problem you're seeing has to do with the logical order of evaluation required by SQL standards. SQL has to act as if column aliases (arguments to the AS operator) don't exist until after the WHERE clause is evaluated.
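The two-step evaluation can be mimicked in plain Ruby: the rank is computed first (the derived table), and only then can it be filtered on (the outer WHERE). The ITEMS data and both helper names below are hypothetical and serve only to illustrate the order of evaluation; note that rank() gives tied rows the same position.

```ruby
# Hypothetical rows standing in for the items table.
ITEMS = [
  { name: 'mug',   user_id: 1, love_count: 5 },
  { name: 'lamp',  user_id: 1, love_count: 5 },
  { name: 'chair', user_id: 1, love_count: 2 },
  { name: 'desk',  user_id: 2, love_count: 7 },
  { name: 'rug',   user_id: 2, love_count: 1 }
].freeze

# Step 1 (the derived table): emulate
# rank() OVER (PARTITION BY user_id ORDER BY love_count DESC) AS position.
def with_position(items)
  items.group_by { |item| item[:user_id] }.flat_map do |_user_id, group|
    group.map do |item|
      # rank() is 1 plus the number of partition rows that sort strictly ahead.
      ahead = group.count { |other| other[:love_count] > item[:love_count] }
      item.merge(position: ahead + 1)
    end
  end
end

# Step 2 (the outer WHERE): position exists only after step 1 has run.
def most_loved(items)
  with_position(items).select { |item| item[:position] == 1 }.map { |item| item[:name] }
end

p most_loved(ITEMS)
```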

Resources