Postgres Common Table Expression query with Ruby on Rails - ruby-on-rails

I'm trying to find the best way to do a Postgres query with Common Table Expressions in a Rails app, knowing that apparently ActiveRecord doesn't support CTEs.
I have a table called user_activity_transitions which contains a series of records of a user activity being started and stopped (each row refers to a change of state, e.g. started or stopped).
One user_activity_id might have many started-stopped pairs, each pair stored as 2 different rows.
It's also possible that there is only a "started" row if the activity is currently going on and hasn't been stopped. The sort_key starts at 0 with the first ever state and increments by 10 for each state change.
id  to_state  sort_key  user_activity_id  created_at
1   started   0         18                2014-11-15 16:56:00
2   stopped   10        18                2014-11-15 16:57:00
3   started   20        18                2014-11-15 16:58:00
4   stopped   30        18                2014-11-15 16:59:00
5   started   40        18                2014-11-15 17:00:00
What I want is the following output, grouping couples of started-stopped together to be able to calculate duration etc.
user_activity_id  started_created_at   stopped_created_at
18                2014-11-15 16:56:00  2014-11-15 16:57:00
18                2014-11-15 16:58:00  2014-11-15 16:59:00
18                2014-11-15 17:00:00  null
The way the table is implemented makes it much harder to run that query but much more flexible for future changes (e.g new intermediary states), so that's not going to be revised.
My Postgres query (and the associated code in Rails):
query = <<-SQL
  with started as (
    select
      id,
      sort_key,
      user_activity_id,
      created_at as started_created_at
    from
      user_activity_transitions
    where
      sort_key % 4 = 0
  ), stopped as (
    select
      id,
      sort_key-10 as sort_key2,
      user_activity_id,
      created_at as stopped_created_at
    from
      user_activity_transitions
    where
      sort_key % 4 = 2
  )
  select
    started.user_activity_id AS user_activity_id,
    started.started_created_at AS started_created_at,
    stopped.stopped_created_at AS stopped_created_at
  FROM
    started
    left join stopped on stopped.sort_key2 = started.sort_key
      and stopped.user_activity_id = started.user_activity_id
SQL
results = ActiveRecord::Base.connection.execute(query)
What it does is "trick" SQL into joining 2 consecutive rows based on a modulus check on the sort key.
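For comparison, the same pairing can be sketched in plain Ruby, without a database. This is only an illustration of the join logic, assuming the rows are already loaded as hashes with the columns from the sample table. The `pair_transitions` name is hypothetical, and the sketch filters on to_state rather than on the modulus of sort_key.

```ruby
# Hypothetical in-memory copy of the sample rows from the question.
TRANSITIONS = [
  { id: 1, to_state: "started", sort_key: 0,  user_activity_id: 18, created_at: "2014-11-15 16:56:00" },
  { id: 2, to_state: "stopped", sort_key: 10, user_activity_id: 18, created_at: "2014-11-15 16:57:00" },
  { id: 3, to_state: "started", sort_key: 20, user_activity_id: 18, created_at: "2014-11-15 16:58:00" },
  { id: 4, to_state: "stopped", sort_key: 30, user_activity_id: 18, created_at: "2014-11-15 16:59:00" },
  { id: 5, to_state: "started", sort_key: 40, user_activity_id: 18, created_at: "2014-11-15 17:00:00" }
].freeze

# Pair each "started" row with the "stopped" row whose sort_key is 10 higher
# for the same user_activity_id, mirroring the LEFT JOIN in the query.
def pair_transitions(rows)
  # Index stopped rows by [user_activity_id, sort_key - 10], like sort_key2.
  stopped_by_key = rows
    .select { |r| r[:to_state] == "stopped" }
    .map { |r| [[r[:user_activity_id], r[:sort_key] - 10], r] }
    .to_h

  rows.select { |r| r[:to_state] == "started" }.map do |started|
    stopped = stopped_by_key[[started[:user_activity_id], started[:sort_key]]]
    {
      user_activity_id: started[:user_activity_id],
      started_created_at: started[:created_at],
      # nil when the activity has not been stopped yet
      stopped_created_at: stopped && stopped[:created_at]
    }
  end
end

PAIRS = pair_transitions(TRANSITIONS)
```

Doing this in the database is still preferable for large tables; the sketch is only meant to make the pairing rule explicit.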
The query works fine. But using this raw AR call annoys me, especially since what connection.execute returns is quite messy. I basically need to loop through the results and put it in the right hash.
2 questions:
Is there a way to get rid of the CTE and run the same query using
Rails magic?
If not, is there a better way to get the results I want in a nice-looking hash?
Bear in mind that I'm quite new to Rails and not a query expert so there might be an obvious improvement...
Thanks a lot!

While Rails does not directly support CTEs, you can emulate a single CTE and still take advantage of ActiveRecord. Instead of a CTE, use a from subquery.
Thing
  .from(
    # Using a subquery in place of a single CTE
    Thing
      .select(
        '*',
        %{row_number() over(
          partition by
            this, that
          order by
            created_at desc
        ) as rank
        }
      ),
    :things
  )
  .where(rank: 1)
This is not exactly the same as, but equivalent to, the following CTE:
with ranked_things as (
  select
    *,
    row_number() over(
      partition by
        this, that
      order by
        created_at desc
    ) as rank
  from things
)
select *
from ranked_things
where rank = 1

I'm trying to find the best way to do a Postgres query with Common Table Expressions in a Rails app, knowing that apparently ActiveRecord doesn't support CTEs.
As far as I know ActiveRecord doesn't support CTE. Arel, which is used by AR under the hood, supports them, but they're not exposed to AR's interface.
Is there a way to get rid of the CTE and run the same query using Rails magic?
Not really. You could write it in AR's APIs but you'd just write the same SQL split into a few method calls.
If not, is there a better way to get the results I want in a nice-looking hash?
I tried to run the query and I'm getting the following which seems nice enough to me. Are you getting a different result?
[
{"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 16:56:00", "stopped_created_at"=>"2014-11-15 16:57:00"},
{"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 16:58:00", "stopped_created_at"=>"2014-11-15 16:59:00"},
{"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 17:00:00", "stopped_created_at"=>nil}
]
I assume you have a model called UserActivityTransition you use for manipulating the data. You can use the model to get the results as well.
results = UserActivityTransition.find_by_sql(query)
results.size # => 3
results.first.started_created_at # => 2014-11-15 16:56:00 UTC
Note that these "virtual" attributes will not be visible when inspecting the result but they're there.
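Since the goal was to calculate durations, here is a small sketch of that step in plain Ruby. It assumes rows shaped like the hashes shown above (string keys, string timestamps) and assumes the timestamps are UTC; `duration_in_seconds` is a hypothetical helper name.

```ruby
require "time"

# Rows shaped like the result hashes shown above.
RESULT_ROWS = [
  { "user_activity_id" => "18", "started_created_at" => "2014-11-15 16:56:00", "stopped_created_at" => "2014-11-15 16:57:00" },
  { "user_activity_id" => "18", "started_created_at" => "2014-11-15 16:58:00", "stopped_created_at" => "2014-11-15 16:59:00" },
  { "user_activity_id" => "18", "started_created_at" => "2014-11-15 17:00:00", "stopped_created_at" => nil }
].freeze

# Returns the duration in seconds, or nil while the activity is still running.
# Assumption: the stored timestamps are UTC.
def duration_in_seconds(row)
  return nil if row["stopped_created_at"].nil?

  started = Time.parse("#{row['started_created_at']} UTC")
  stopped = Time.parse("#{row['stopped_created_at']} UTC")
  stopped - started
end

DURATIONS = RESULT_ROWS.map { |row| duration_in_seconds(row) }
```

With find_by_sql the attributes may already be typecast to time objects, in which case the parsing step would not be needed.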

Related

Transform query that works in SQLite into something that works in Postgres

Message.order("created_at DESC").where(user_id: current_user.id).group(:sender_id, :receiver_id).count
Works with my dev environment SQLite3 for a Rails 3.2 app but fails when pushed to Heroku using Postgres with this error:
PG::GroupingError: ERROR: column "messages.created_at" must appear in the GROUP BY clause or be used in an aggregate function
It seems it wants :created_at in the query. Unable to come up with anything. Any ideas?
Thanks
PostgreSQL can't figure out how to order by your created_at.
Suppose that you find 2 groups of (:sender_id, :receiver_id), for instance [1, 2] and
[1, 3].
Now suppose that in the first group you have 2 messages, one from 1 day ago and one from 1 minute ago. And let's say you have 1 message in the second group from 12 hours ago.
Then ORDER BY created_at DESC doesn't make any sense: do you take the message from 1 day ago as the created_at of the first group (hence the first group appears after the second one), or the one from 1 minute ago (in which case the first group now appears first)?
That's why PostgreSQL says that you need to either have created_at in the GROUP BY (in which case you now have 3 different groups, as the first one is now split in two), or you need to use an aggregate function to transform multiple values of created_at into a single one.
This will run, though I don't know what results you expect, so MAX(created_at) might not be the aggregate you want (the PostgreSQL documentation lists the available aggregate functions):
Message.order("MAX(created_at) DESC")
.where(user_id: current_user.id)
.group(:sender_id, :receiver_id)
.count
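To make the ambiguity concrete, here is a plain Ruby sketch of the example above. The data is hypothetical: created_at is an integer number of minutes, with "now" at 1440, so the three messages are 1 day, 1 minute and 12 hours old. The `counts_ordered_by` helper is an illustration, not part of ActiveRecord.

```ruby
# Hypothetical messages: two in the (1, 2) group, one in the (1, 3) group.
MESSAGES = [
  { sender_id: 1, receiver_id: 2, created_at: 0 },    # 1 day ago
  { sender_id: 1, receiver_id: 2, created_at: 1439 }, # 1 minute ago
  { sender_id: 1, receiver_id: 3, created_at: 720 }   # 12 hours ago
].freeze

# Groups by (sender_id, receiver_id), counts each group, and orders the groups
# by an aggregate of created_at, descending. `aggregate` is :max or :min.
def counts_ordered_by(messages, aggregate)
  messages
    .group_by { |m| [m[:sender_id], m[:receiver_id]] }
    .sort_by do |_key, msgs|
      value = msgs.map { |m| m[:created_at] }.public_send(aggregate)
      -value # negate for descending order
    end
    .map { |key, msgs| [key, msgs.size] }
end

BY_MAX = counts_ordered_by(MESSAGES, :max) # (1, 2) group first
BY_MIN = counts_ordered_by(MESSAGES, :min) # (1, 3) group first
```

The counts are identical in both cases; only the order of the groups changes, which is exactly the choice PostgreSQL refuses to make for you.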

PG::Error in GROUP BY clause

I couldn't think of a better way to refactor the below code (see this question), though I know it's very ugly. However, it's throwing a Postgres error (not with SQLite):
ActiveRecord::StatementInvalid:
PG::Error: ERROR:
column "articles.id" must appear in the GROUP BY clause or be used in an aggregate function
The query itself is:
SELECT "articles".*
FROM "articles"
WHERE "articles"."user_id" = 1
GROUP BY publication
Which comes from the following view code:
= @user.articles.group(:publication).map do |p|
  = p.publication
  = @user.articles.where("publication = ?", p.publication).sum(:twitter_count)
  = @user.articles.where("publication = ?", p.publication).sum(:facebook_count)
  = @user.articles.where("publication = ?", p.publication).sum(:linkedin_count)
In SQLite, this gives the output (e.g.) NYT 12 18 14 BBC 45 46 47 CNN 75 54 78, which is pretty much what I need.
How can I improve the code to remove this error?
When using GROUP BY you cannot SELECT fields that are not either part of the GROUP BY or used in an aggregate function. This is specified by the SQL standard, though some databases choose to execute such queries anyway. Since there's no single correct way to execute such a query they tend to just pick the first row they find and return that, so results will vary unpredictably.
It looks like you're trying to say:
"For each publication get me the sum of the twitter, facebook and linkedin counts for that publication".
If so, you could write:
SELECT publication,
sum(twitter_count) AS twitter_sum,
sum(linkedin_count) AS linkedin_sum,
sum(facebook_count) AS facebook_sum
FROM "articles"
WHERE "articles"."user_id" = 1
GROUP BY publication;
Translating that into ActiveRecord/Rails ... up to you, I don't use it. It looks like it's pretty much what you tried to write but ActiveRecord seems to be mangling it, perhaps trying to execute the sums locally.
Craig's answer explains the issue well. Active Record will select * by default, but you can override it easily:
@user.articles.select("publication, sum(twitter_count) as twitter_count").group(:publication).each do |row|
p row.publication # "BBC"
p row.twitter_count # 45
end
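If it helps to see what the grouped query computes, here is a plain Ruby sketch of the same aggregation over in-memory rows. The sample rows are made up so that the totals match the NYT and BBC figures from the question, and `sums_by_publication` is a hypothetical helper.

```ruby
# Hypothetical article rows; the totals match the question's NYT and BBC output.
ARTICLES = [
  { publication: "NYT", twitter_count: 5,  facebook_count: 10, linkedin_count: 4 },
  { publication: "NYT", twitter_count: 7,  facebook_count: 8,  linkedin_count: 10 },
  { publication: "BBC", twitter_count: 45, facebook_count: 46, linkedin_count: 47 }
].freeze

COUNT_COLUMNS = [:twitter_count, :facebook_count, :linkedin_count].freeze

# One entry per publication, with each count column summed over its rows.
# This is what GROUP BY publication plus sum(...) does in the database.
def sums_by_publication(articles)
  articles.group_by { |a| a[:publication] }.map do |publication, rows|
    sums = COUNT_COLUMNS.map { |col| [col, rows.sum { |r| r[col] }] }.to_h
    [publication, sums]
  end.to_h
end

PUBLICATION_SUMS = sums_by_publication(ARTICLES)
```

In a real app the database should do this work; the sketch only shows why each selected column must be either the grouping key or an aggregate.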

executing query from ruby on rails the right way

I'm just beginning with ruby on rails and have a question regarding a bit more complex query. So far I've done simple queries while looking at rails guide and it worked really well.
Right now I'm trying to get some Ids from database and I would use those Ids to get the real objects and do something with them. Getting those is a bit more complex than simple Object.find method.
Here is what my query looks like:
select * from quotas q, requests r
where q.id=r.quota_id
and q.status=3
and r.text is not null
and q.id in
(
select A.id from (
select max(id) as id, name
from quotas
group by name) A
)
order by q.created_at desc
limit 1000;
This would give me 1000 ids when executing this query from sql manager. And I was thinking to obtain the list of ids first and then find objects by id.
Is there a way to get these objects directly by using this query? Avoiding ids lookup? I googled that you can execute query like this :
ActiveRecord::Base.connection.execute(query);
Assuming Quota has_many :requests,
Quota.includes(:requests).
where(status:3).
where('requests.text is not null').
where("quotas.id in (#{subquery_string_here})").
order('quotas.created_at desc').limit(1000)
I'm by no means an expert but most basic SQL functionality is baked into ActiveRecord. You might also want to look at the #group and #pluck methods for ways to eliminate the ugly string subquery.
Calling #to_sql on a relationship object will show you the SQL command it is equivalent to, and may help with your debugging.
I would use find_by_sql for this. I wouldn't swear that this is exactly right, but as I recall you can pretty much plonk an SQL statement into a find_by_sql and the resulting columns will be returned as attributes of an array of objects of the class you call it on:
status = 3
# find_by_sql takes the SQL and its bind values together as one array
Quota.find_by_sql(['
  select *
  from quotas q, requests r
  where q.id=r.quota_id
  and q.status= ?
  and r.text is not null
  and q.id in
  (
    select A.id from (
      select max(id) as id, name
      from quotas
      group by name) A
  )
  order by q.created_at desc
  limit 1000', status])
If you come to Rails as someone used to writing raw SQL, you're probably better off using this syntax than stringing together a bunch of ActiveRecord methods - the result is the same, so it's just a matter of what you find more readable.
Btw, you shouldn't use string interpolation (i.e. #{variable} syntax) inside an SQL query. Use the '?' syntax instead (see my example) to avoid SQL injection potential.
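A toy illustration of the difference, in plain Ruby. Both helpers are hypothetical, and `toy_bind` is not a safe sanitizer; it only shows the idea that a bound value stays a value instead of becoming part of the SQL text. Real code should let the database adapter do this (for example through the '?' syntax).

```ruby
# Naive interpolation: whatever is passed in becomes part of the SQL itself.
def interpolated_sql(status)
  "select * from quotas where status = #{status}"
end

# Toy placeholder binding, for illustration only: integers are inserted as
# numbers, anything else is quoted as a string with single quotes doubled.
def toy_bind(sql, value)
  quoted = value.is_a?(Integer) ? value.to_s : "'#{value.to_s.gsub("'", "''")}'"
  sql.sub("?") { quoted } # block form avoids backslash handling in the replacement
end

MALICIOUS_INPUT = "3 or 1=1"

# The condition is now always true, so every row matches.
UNSAFE_QUERY = interpolated_sql(MALICIOUS_INPUT)

# The whole input is treated as a single string value.
BOUND_QUERY = toy_bind("select * from quotas where status = ?", MALICIOUS_INPUT)
```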

Get top results by SUM column in Rails

I am trying to write the following SQL in Rails (via ActiveRecord) and having no luck. The SQL is as follows, and it works as written:
select main_section_id, district_id, sum(answer)
from section_inputs
where year = 2012
and main_section_id= 2
group by main_section_id, district_id
order by 3 desc
limit 5
I think that column names are descriptive, in any case following Rails conventions. To sum the problem up, I am trying to get top 5 Districts for specific MainSection, answer column here is integer which represents my score system.
I know question is little too specific (doing my job for me), but I really hit the wall here and if asking for solution is too much some guidance would be great help as well.
Thanks
This should work
SectionInput.select([:main_section_id, :district_id, 'sum(answer) as total']).where(:year=>2012).where(:main_section_id=>2).group(:main_section_id).group(:district_id).order('3 desc').limit(5)
Else, you can directly include the sql to run
SectionInput.find_by_sql('select main_section_id, district_id,
sum(answer) from section_inputs where year = 2012 and main_section_id=
2 group by main_section_id, district_id order by 3 desc limit 5')
Also, look at the guide to see all Rails 3 querying basics
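As a rough cross-check of what the query should return, here is the same "top 5 by sum" logic sketched in plain Ruby over made-up rows. The data and the `top_districts` helper are hypothetical.

```ruby
# Hypothetical rows from section_inputs.
SECTION_INPUTS = [
  { year: 2012, main_section_id: 2, district_id: 1, answer: 3 },
  { year: 2012, main_section_id: 2, district_id: 1, answer: 4 },
  { year: 2012, main_section_id: 2, district_id: 2, answer: 10 },
  { year: 2012, main_section_id: 2, district_id: 3, answer: 1 },
  { year: 2012, main_section_id: 2, district_id: 4, answer: 8 },
  { year: 2012, main_section_id: 2, district_id: 5, answer: 5 },
  { year: 2012, main_section_id: 2, district_id: 6, answer: 2 },
  { year: 2011, main_section_id: 2, district_id: 3, answer: 100 }, # wrong year
  { year: 2012, main_section_id: 1, district_id: 3, answer: 100 }  # wrong section
].freeze

# Returns [district_id, total] pairs, highest total first, at most `limit` entries.
def top_districts(inputs, year:, main_section_id:, limit: 5)
  totals = Hash.new(0)
  inputs.each do |row|
    next unless row[:year] == year && row[:main_section_id] == main_section_id

    totals[row[:district_id]] += row[:answer] # sum(answer) per district
  end
  totals.sort_by { |_district_id, total| -total }.first(limit)
end

TOP_FIVE = top_districts(SECTION_INPUTS, year: 2012, main_section_id: 2)
```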

Rails: Order with nulls last

In my Rails app I've run into an issue a couple times that I'd like to know how other people solve:
I have certain records where a value is optional, so some records have a value and some are null for that column.
If I order by that column on some databases the nulls sort first and on some databases the nulls sort last.
For instance, I have Photos which may or may not belong to a Collection, ie there are some Photos where collection_id=nil and some where collection_id=1 etc.
If I do Photo.order('collection_id desc') then on SQLite I get the nulls last but on PostgreSQL I get the nulls first.
Is there a nice, standard Rails way to handle this and get consistent performance across any database?
I'm no expert at SQL, but why not just sort by if something is null first then sort by how you wanted to sort it.
Photo.order('collection_id IS NULL, collection_id DESC') # Null's last
Photo.order('collection_id IS NOT NULL, collection_id DESC') # Null's first
If you are only using PostgreSQL, you can also do this
Photo.order('collection_id DESC NULLS LAST') #Null's Last
Photo.order('collection_id DESC NULLS FIRST') #Null's First
If you want something universal (for example, because you're running the same query across several databases), you can use the following (courtesy of @philT):
Photo.order('CASE WHEN collection_id IS NULL THEN 1 ELSE 0 END, collection_id')
Even though it's 2017 now, there is still yet to be a consensus on whether NULLs should take precedence. Without you being explicit about it, your results are going to vary depending on the DBMS.
The standard doesn't specify how NULLs should be ordered in comparison with non-NULL values, except that any two NULLs are to be considered equally ordered, and that NULLs should sort either above or below all non-NULL values.
source, comparison of most DBMSs
To illustrate the problem, I compiled a list of a few most popular cases when it comes to Rails development:
PostgreSQL
NULLs have the highest value.
By default, null values sort as if larger than any non-null value.
source: PostgreSQL documentation
MySQL
NULLs have the lowest value.
When doing an ORDER BY, NULL values are presented first if you do ORDER BY ... ASC and last if you do ORDER BY ... DESC.
source: MySQL documentation
SQLite
NULLs have the lowest value.
A row with a NULL value is higher than rows with regular values in ascending order, and it is reversed for descending order.
source
Solution
Unfortunately, Rails itself doesn't provide a solution for it yet.
PostgreSQL specific
For PostgreSQL you could quite intuitively use:
Photo.order('collection_id DESC NULLS LAST') # NULLs come last
MySQL specific
For MySQL, you could put the minus sign upfront, yet this feature seems to be undocumented. Appears to work not only with numerical values, but with dates as well.
Photo.order('-collection_id DESC') # NULLs come last
PostgreSQL and MySQL specific
To cover both of them, this appears to work:
Photo.order('collection_id IS NULL, collection_id DESC') # NULLs come last
Still, this one does not work in SQLite.
Universal solution
To provide cross-support for all DBMSs you'd have to write a query using CASE, already suggested by @PhilIT:
Photo.order('CASE WHEN collection_id IS NULL THEN 1 ELSE 0 END, collection_id')
which translates to sorting the records first by the CASE result (ascending by default, which means NULL values will be the last ones) and second by collection_id.
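The two-level sort can be mimicked in plain Ruby to see how it behaves. This is only an in-memory sketch; `nulls_last_ascending` is a hypothetical name, and nil stands in for NULL.

```ruby
# Mirrors: ORDER BY CASE WHEN collection_id IS NULL THEN 1 ELSE 0 END, collection_id
# The first array element is the CASE result, the second is the value itself.
def nulls_last_ascending(values)
  values.sort_by { |v| [v.nil? ? 1 : 0, v || 0] }
end

SORTED_IDS = nulls_last_ascending([3, nil, 1, nil, 2])
```

Because every non-nil value gets 0 as its first sort key and every nil gets 1, the nils end up after all real values no matter how the database would order them by default.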
Photo.order('collection_id DESC NULLS LAST')
I know this is an old one but I just found this snippet and it works for me.
Put minus sign in front of column_name and reverse the order direction. It works on mysql. More details
Product.order('something_date ASC') # NULLS came first
Product.order('-something_date DESC') # NULLS came last
Bit late to the show but there is a generic SQL way to do it. As usual, CASE to the rescue.
Photo.order('CASE WHEN collection_id IS NULL THEN 1 ELSE 0 END, collection_id')
The easiest way is to use:
.order('name nulls first')
For posterity's sake, I wanted to highlight an ActiveRecord error relating to NULLS FIRST.
If you try to call:
Model.scope_with_nulls_first.last
Rails will attempt to call reverse_order.first, and reverse_order is not compatible with NULLS LAST, as it tries to generate the invalid SQL:
PG::SyntaxError: ERROR: syntax error at or near "DESC"
LINE 1: ...dents" ORDER BY table_column DESC NULLS LAST DESC LIMIT...
This was referenced a few years ago in some still-open Rails issues (one, two, three). I was able to work around it by doing the following:
scope :nulls_first, -> { order("table_column IS NOT NULL") }
scope :meaningfully_ordered, -> { nulls_first.order("table_column ASC") }
It appears that by chaining the two orders together, valid SQL gets generated:
Model Load (12.0ms) SELECT "models".* FROM "models" ORDER BY table_column IS NULL DESC, table_column ASC LIMIT 1
The only downside is that this chaining has to be done for each scope.
Rails 6.1 adds nulls_first and nulls_last methods to Arel for PostgreSQL.
Example:
User.order(User.arel_table[:login_count].desc.nulls_last)
Source: https://www.bigbinary.com/blog/rails-6-1-adds-nulls-first-and-nulls-last-to-arel
Here are some Rails 6 solutions.
The answer by @Adam Sibik is a great summary about the difference between various database systems.
Unfortunately, though, some of the presented solutions, including "Universal solution" and "PostgreSQL and MySQL specific", would not work any more with Rails 6 (ActiveRecord 6) as a result of its changed specification of order() not accepting some raw SQLs (I confirm the "PostgreSQL specific" solution still works as of Rails 6.1.4). For the background of this change, see, for example,
"Updates for SQL Injection in Rails 6.1" by Justin.
To circumvent the problem, you can wrap around the SQL statements with Arel.sql as follows, where NULLs come last, providing you are 100% sure the SQL statements you give are safe.
Photo.order(Arel.sql('CASE WHEN collection_id IS NULL THEN 1 ELSE 0 END, collection_id'))
Just for reference, if you want to sort by a Boolean column (is_ok, as an example) in the order of [TRUE, FALSE, NULL] regardless of the database systems, either of these should work:
Photo.order(Arel.sql('CASE WHEN is_ok IS NULL THEN 1 ELSE 0 END, is_ok DESC'))
Photo.order(Arel.sql('CASE WHEN is_ok IS NULL THEN 1 WHEN is_ok IS TRUE THEN -1 ELSE 0 END'))
(n.b., SQLite does not have the Boolean type and so the former may be safer arguably, though it should not matter because Rails should guarantee the value is either 0 or 1 (or NULL).)
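The ranking used by the second CASE expression can be checked in plain Ruby. This is an in-memory sketch with hypothetical names; true, false and nil stand in for TRUE, FALSE and NULL.

```ruby
# Same ranks as the second CASE expression: TRUE => -1, FALSE => 0, NULL => 1.
BOOLEAN_RANK = { true => -1, false => 0, nil => 1 }.freeze

# Orders values as [true, false, nil] by sorting on the rank, ascending.
def true_false_null_order(values)
  values.sort_by { |v| BOOLEAN_RANK.fetch(v) }
end

ORDERED_FLAGS = true_false_null_order([nil, false, true, nil, true])
```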
In my case I needed to sort lines by start and end date in ascending order, but in a few cases end_date was null and those lines should come first, so I used:
@invoice.invoice_lines.order('start_date ASC, end_date ASC NULLS FIRST')
Adding arrays together will preserve order:
@nonull = Photo.where("collection_id is not null").order("collection_id desc")
@yesnull = Photo.where("collection_id is null")
@wanted = @nonull+@yesnull
http://www.ruby-doc.org/core/classes/Array.html#M000271
It seems like you'd have to do it in Ruby if you want consistent results across database types, as the database itself interprets whether or not the NULLS go at the front or end of the list.
Photo.all.sort {|a, b| a.collection_id.to_i <=> b.collection_id.to_i}
But that is not very efficient.
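Note also that to_i turns nil into 0, so with that sort the nulls are ordered as if their collection_id were 0. If you do sort in Ruby, a sketch along these lines keeps the nulls explicitly at the end. It uses a hypothetical `PhotoRow` struct as a stand-in for loaded Photo records.

```ruby
# Hypothetical stand-in for loaded Photo records.
PhotoRow = Struct.new(:id, :collection_id)

# Sorts by collection_id descending and appends the rows without a collection.
def sort_desc_nulls_last(photos)
  with_collection, without_collection = photos.partition { |p| p.collection_id }
  with_collection.sort_by(&:collection_id).reverse + without_collection
end

SAMPLE_PHOTOS = [
  PhotoRow.new(1, nil),
  PhotoRow.new(2, 5),
  PhotoRow.new(3, 9),
  PhotoRow.new(4, nil)
].freeze

SORTED_PHOTO_IDS = sort_desc_nulls_last(SAMPLE_PHOTOS).map(&:id)
```

Like the version above, this loads every record into memory, so it is only reasonable for small result sets.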
