Get top results by SUM column in Rails - ruby-on-rails

I am trying to write the following SQL in Rails (via ActiveRecord) and having no luck. The SQL itself works:
select main_section_id, district_id, sum(answer)
from section_inputs
where year = 2012
and main_section_id= 2
group by main_section_id, district_id
order by 3 desc
limit 5
I think the column names are descriptive; in any case they follow Rails conventions. To sum up the problem: I am trying to get the top 5 Districts for a specific MainSection, where the answer column is an integer that represents my scoring system.
I know the question is a little too specific (doing my job for me), but I have really hit a wall here, and if asking for a solution is too much, some guidance would be a great help as well.
Thanks

This should work
SectionInput.select([:main_section_id, :district_id, 'sum(answer) as total'])
  .where(:year => 2012)
  .where(:main_section_id => 2)
  .group(:main_section_id, :district_id)
  .order('total desc')
  .limit(5)
Otherwise, you can run the SQL directly with find_by_sql:
SectionInput.find_by_sql('select main_section_id, district_id, sum(answer) as total
  from section_inputs
  where year = 2012 and main_section_id = 2
  group by main_section_id, district_id
  order by 3 desc
  limit 5')
Also, look at the guide to see all Rails 3 querying basics
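To make the aggregation concrete, here is a plain-Ruby sketch of what the query computes, with each step commented with the SQL clause it mirrors. It runs on in-memory Structs rather than ActiveRecord; the `SectionInputRow` struct and its rows are hypothetical sample data.

```ruby
# Hypothetical rows standing in for section_inputs records.
SectionInputRow = Struct.new(:year, :main_section_id, :district_id, :answer)

rows = [
  SectionInputRow.new(2012, 2, 1, 5),
  SectionInputRow.new(2012, 2, 1, 3),
  SectionInputRow.new(2012, 2, 2, 9),
  SectionInputRow.new(2012, 2, 3, 4),
  SectionInputRow.new(2012, 1, 3, 100), # different main section: filtered out
  SectionInputRow.new(2011, 2, 2, 50)   # different year: filtered out
]

# WHERE year = 2012 AND main_section_id = 2
filtered = rows.select { |r| r.year == 2012 && r.main_section_id == 2 }

# GROUP BY main_section_id, district_id
grouped = filtered.group_by { |r| [r.main_section_id, r.district_id] }

# SELECT main_section_id, district_id, SUM(answer)
totals = grouped.map { |(section, district), rs| [section, district, rs.sum(&:answer)] }

# ORDER BY 3 DESC LIMIT 5
top = totals.sort_by { |_, _, total| -total }.first(5)

top.each { |row| p row }
```

The database does all of this in one pass, which is why pushing the work into SQL (via either call above) beats loading records into Ruby.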

Related

Transform query that works in SQLite into something that works in Postgres

Message.order("created_at DESC").where(user_id: current_user.id).group(:sender_id, :receiver_id).count
It works in my SQLite3 dev environment for a Rails 3.2 app but fails when pushed to Heroku using Postgres, with this error:
PG::GroupingError: ERROR: column "messages.created_at" must appear in the GROUP BY clause or be used in an aggregate function
It seems to want :created_at in the query, but I haven't been able to come up with anything. Any ideas?
Thanks
PostgreSQL can't figure out how to order by your created_at.
Suppose that you find 2 groups of (:sender_id, :receiver_id), for instance [1, 2] and
[1, 3].
Now suppose that in the first group you have 2 messages, one from 1 day ago and one from 1 minute ago. And let's say you have 1 message in the second group from 12 hours ago.
Then ORDER BY created_at DESC doesn't make any sense: do you take the message from 1 day ago as the created_at of the first group (hence the first group appears after the second one), or the one from 1 minute ago (in which case the first group now appears first)?
That's why PostgreSQL says that you need to either put created_at in the GROUP BY (in which case you now have 3 different groups, as the first one is split in two) or use an aggregate function to turn the multiple values of created_at into a single one.
This will run (I don't know what results you expect, so you might not want MAX(created_at)! The PostgreSQL documentation lists all available aggregate functions):
Message.order("MAX(created_at) DESC")
.where(user_id: current_user.id)
.group(:sender_id, :receiver_id)
.count
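The ambiguity described above can be reproduced in plain Ruby with the three messages from the example. The fixed `now` timestamp and the hash layout are assumptions for illustration. Ordering the groups by their latest message and by their earliest message gives opposite results, which is why Postgres makes you choose an aggregate.

```ruby
# Fixed reference time so the sketch is deterministic.
now = Time.utc(2014, 1, 1, 12, 0, 0)
minute = 60
hour = 60 * minute
day = 24 * hour

# The example from the answer: group [1, 2] has two messages, group [1, 3] has one.
messages = [
  { sender_id: 1, receiver_id: 2, created_at: now - day },
  { sender_id: 1, receiver_id: 2, created_at: now - minute },
  { sender_id: 1, receiver_id: 3, created_at: now - 12 * hour }
]

# GROUP BY sender_id, receiver_id
groups = messages.group_by { |m| [m[:sender_id], m[:receiver_id]] }

latest = ->(ms) { ms.map { |m| m[:created_at] }.max }
earliest = ->(ms) { ms.map { |m| m[:created_at] }.min }

# ORDER BY MAX(created_at) DESC vs. ORDER BY MIN(created_at) DESC
by_max = groups.sort_by { |_, ms| latest.(ms) }.reverse.map(&:first)
by_min = groups.sort_by { |_, ms| earliest.(ms) }.reverse.map(&:first)

# COUNT(*) per group, as .count returns
counts = groups.transform_values(&:size)

p by_max
p by_min
p counts
```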

Postgres Common Table Expression query with Ruby on Rails

I'm trying to find the best way to do a Postgres query with Common Table Expressions in a Rails app, knowing that apparently ActiveRecord doesn't support CTEs.
I have a table called user_activity_transitions, which contains a series of records of a user activity being started and stopped (each row records a change of state, e.g. started or stopped).
One user_activity_id might have many started-stopped pairs, each pair spread over 2 different rows.
It's also possible that there is only "started" if the activity is currently going on and hasn't been stopped. The sort_key starts at 0 with the first ever state and increments by 10 for each state change.
id to_state sort_key user_activity_id created_at
1 started 0 18 2014-11-15 16:56:00
2 stopped 10 18 2014-11-15 16:57:00
3 started 20 18 2014-11-15 16:58:00
4 stopped 30 18 2014-11-15 16:59:00
5 started 40 18 2014-11-15 17:00:00
What I want is the following output, grouping each started-stopped pair together so that I can calculate durations etc.
user_activity_id started_created_at stopped_created_at
18 2014-11-15 16:56:00 2014-11-15 16:57:00
18 2014-11-15 16:58:00 2014-11-15 16:59:00
18 2014-11-15 17:00:00 null
The way the table is implemented makes it much harder to run that query but much more flexible for future changes (e.g new intermediary states), so that's not going to be revised.
My Postgres query (and the associated code in Rails):
query = <<-SQL
with started as (
select
id,
sort_key,
user_activity_id,
created_at as started_created_at
from
user_activity_transitions
where
sort_key % 4 = 0
), stopped as (
select
id,
sort_key-10 as sort_key2,
user_activity_id,
created_at as stopped_created_at
from
user_activity_transitions
where
sort_key % 4 = 2
)
select
started.user_activity_id AS user_activity_id,
started.started_created_at AS started_created_at,
stopped.stopped_created_at AS stopped_created_at
FROM
started
left join stopped on stopped.sort_key2 = started.sort_key
and stopped.user_activity_id = started.user_activity_id
SQL
results = ActiveRecord::Base.connection.execute(query)
What it does is "trick" SQL into joining 2 consecutive rows based on a modulus check on the sort key.
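Here is a plain-Ruby sketch of the same pairing, using the sample rows from the table above. It assumes, as the query does, that states strictly alternate starting with "started" at sort_key 0, so a started row's sort_key is a multiple of 20 and the matching stopped row is exactly 10 higher. The `Transition` struct is an illustrative stand-in for the table rows.

```ruby
# Rows copied from the sample table in the question.
Transition = Struct.new(:id, :to_state, :sort_key, :user_activity_id, :created_at)

transitions = [
  Transition.new(1, "started", 0, 18, "2014-11-15 16:56:00"),
  Transition.new(2, "stopped", 10, 18, "2014-11-15 16:57:00"),
  Transition.new(3, "started", 20, 18, "2014-11-15 16:58:00"),
  Transition.new(4, "stopped", 30, 18, "2014-11-15 16:59:00"),
  Transition.new(5, "started", 40, 18, "2014-11-15 17:00:00")
]

# sort_key is 10 * k, and 10 * k % 4 is 0 for even k and 2 for odd k.
started = transitions.select { |t| t.sort_key % 4 == 0 }
stopped = transitions.select { |t| t.sort_key % 4 == 2 }

# LEFT JOIN stopped ON stopped.sort_key - 10 = started.sort_key
#                  AND stopped.user_activity_id = started.user_activity_id
pairs = started.map do |s|
  stop = stopped.find do |t|
    t.sort_key - 10 == s.sort_key && t.user_activity_id == s.user_activity_id
  end
  {
    user_activity_id: s.user_activity_id,
    started_created_at: s.created_at,
    stopped_created_at: stop&.created_at # nil when unmatched, like NULL from a LEFT JOIN
  }
end

pairs.each { |row| p row }
```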
The query works fine. But using this raw AR call annoys me, especially since what connection.execute returns is quite messy. I basically need to loop through the results and put them into the right hash.
2 questions:
Is there a way to get rid of the CTE and run the same query using Rails magic?
If not, is there a better way to get the results I want in a nice-looking hash?
Bear in mind that I'm quite new to Rails and not a query expert so there might be an obvious improvement...
Thanks a lot!
While Rails does not directly support CTEs, you can emulate a single CTE and still take advantage of ActiveRecord. Instead of a CTE, use a from subquery.
Thing
  .from(
    # Using a subquery in place of a single CTE
    Thing
      .select(
        '*',
        %{row_number() over(
            partition by
              this, that
            order by
              created_at desc
          ) as rank
        }
      ),
    :things
  )
  .where(rank: 1)
This is not exactly the same as, but equivalent to...
with ranked_things as (
  select
    *,
    row_number() over(
      partition by
        this, that
      order by
        created_at desc
    ) as rank
  from things
)
select *
from ranked_things
where rank = 1
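For intuition about filtering on `row_number() ... = 1`, here is a plain-Ruby sketch that keeps only the newest row in each (this, that) partition. The `ThingRow` struct and its rows are hypothetical, and `created_at` is a plain integer to keep it short.

```ruby
# Hypothetical rows; created_at is an integer stand-in for a timestamp.
ThingRow = Struct.new(:id, :this, :that, :created_at)

things = [
  ThingRow.new(1, "a", "x", 1),
  ThingRow.new(2, "a", "x", 3),
  ThingRow.new(3, "a", "y", 2),
  ThingRow.new(4, "b", "x", 5),
  ThingRow.new(5, "b", "x", 4)
]

# row_number() over (partition by this, that order by created_at desc)
partitions = things.group_by { |t| [t.this, t.that] }
ranked = partitions.flat_map do |_, part|
  part.sort_by { |t| -t.created_at }.each_with_index.map { |t, i| [t, i + 1] }
end

# where rank = 1
newest_ids = ranked.select { |_, rank| rank == 1 }.map { |t, _| t.id }

p newest_ids
```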
I'm trying to find the best way to do a Postgres query with Common Table Expressions in a Rails app, knowing that apparently ActiveRecord doesn't support CTEs.
As far as I know, ActiveRecord doesn't support CTEs. Arel, which AR uses under the hood, supports them, but they're not exposed through AR's interface.
Is there a way to get rid of the CTE and run the same query using Rails magic?
Not really. You could write it in AR's APIs but you'd just write the same SQL split into a few method calls.
If not, is there a better way to get the results I want in a nice-looking hash?
I tried to run the query and I'm getting the following which seems nice enough to me. Are you getting a different result?
[
{"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 16:56:00", "stopped_created_at"=>"2014-11-15 16:57:00"},
{"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 16:58:00", "stopped_created_at"=>"2014-11-15 16:59:00"},
{"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 17:00:00", "stopped_created_at"=>nil}
]
I assume you have a model called UserActivityTransition you use for manipulating the data. You can use the model to get the results as well.
results = UserActivityTransition.find_by_sql(query)
results.size # => 3
results.first.started_created_at # => 2014-11-15 16:56:00 UTC
Note that these "virtual" attributes will not be visible when inspecting the result but they're there.
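If you stick with connection.execute, a small mapping step turns string-keyed rows like the ones shown above into symbol-keyed hashes with typed values, which makes durations easy to compute. This plain-Ruby sketch assumes the values arrive as strings, as in that output, and that the timestamps are UTC.

```ruby
require "time"

# The string-keyed rows shown above.
raw = [
  { "user_activity_id" => "18", "started_created_at" => "2014-11-15 16:56:00", "stopped_created_at" => "2014-11-15 16:57:00" },
  { "user_activity_id" => "18", "started_created_at" => "2014-11-15 16:58:00", "stopped_created_at" => "2014-11-15 16:59:00" },
  { "user_activity_id" => "18", "started_created_at" => "2014-11-15 17:00:00", "stopped_created_at" => nil }
]

# Assumption: timestamps are UTC, so " UTC" is appended before parsing.
parse_utc = ->(s) { s && Time.parse("#{s} UTC") }

rows = raw.map do |r|
  started = parse_utc.(r["started_created_at"])
  stopped = parse_utc.(r["stopped_created_at"])
  {
    user_activity_id: Integer(r["user_activity_id"]),
    started_at: started,
    stopped_at: stopped,
    duration: stopped && (stopped - started) # seconds, nil while still running
  }
end

rows.each { |row| p row }
```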

Rails Postgres Error GROUP BY clause or be used in an aggregate function

In SQLite (development) I don't have any errors, but in production with Postgres I get the following error. I don't really understand the error.
PG::Error: ERROR: column "commits.updated_at" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...mmits"."user_id" = 1 GROUP BY mission_id ORDER BY updated_at...
^
: SELECT COUNT(*) AS count_all, mission_id AS mission_id FROM "commits" WHERE "commits"."user_id" = 1 GROUP BY mission_id ORDER BY updated_at DESC
My controller method:
def show
  @user = User.find(params[:id])
  @commits = @user.commits.order("updated_at DESC").page(params[:page]).per(25)
  @missions_commits = @commits.group("mission_id").count.length
end
UPDATE:
So I dug further into this PostgreSQL-specific annoyance, and I am surprised that this exception is not mentioned in the Ruby on Rails Guides.
I am using psql (PostgreSQL) 9.1.11.
So from what I understand, I need to specify which columns should be used whenever I use a GROUP BY clause. I thought using SELECT would help, though that can be annoying if you need to SELECT a lot of columns.
Interesting discussion here
Anyway, when I look at the error, the cursor points to updated_at every time. In the SQL query, Rails always adds ORDER BY updated_at. So I tried this horrible query:
@commits.group("mission_id, date(updated_at)")
  .select("date(updated_at), count(mission_id)")
  .having("count(mission_id) > 0")
  .order("count(mission_id)").length
which gives me the following SQL
SELECT date(updated_at), count(mission_id)
FROM "commits"
WHERE "commits"."user_id" = 1
GROUP BY mission_id, date(updated_at)
HAVING count(mission_id) > 0
ORDER BY updated_at DESC, count(mission_id)
LIMIT 25 OFFSET 0
the error is the same.
Note that no matter what, it will ORDER BY updated_at, even if I want to order by something else.
Also, I don't want to group the records by updated_at, just by mission_id.
This PostgreSQL error is misleading and gives little explanation of how to solve it. I have tried many approaches from the Stack Overflow sidebar; nothing works, and I always get the same error.
UPDATE 2:
So I got it to work, but it needs to group by updated_at because of the automatic ORDER BY updated_at. How do I count by mission_id only?
@missions_commits = @commits.group("mission_id, updated_at").count("mission_id").size
I guess you want to show the total number of distinct Missions related to the Commits; in any case, that won't be the number on the current page.
Try this:
@commits = @user.commits.order("updated_at DESC").page(params[:page]).per(25)
@missions_commits = @user.commits.distinct.count(:mission_id)
However, if you want the number of distinct Missions on the current page, I suppose it should be:
@missions_commits = @commits.collect(&:mission_id).uniq.count
Update
In Rails 3, distinct did not exist as a relation method, so pass the distinct option to count instead:
@missions_commits = @user.commits.count(:mission_id, distinct: true)
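The difference between the two counts can be sketched in plain Ruby: distinct mission_ids over all of a user's commits versus over one page of them. The mission ids and the page size of 3 (standing in for per(25)) are hypothetical.

```ruby
# Hypothetical commits; only mission_id matters here.
CommitRow = Struct.new(:id, :mission_id)

mission_ids = [1, 1, 2, 3, 3, 3, 4, 5]
commits = mission_ids.each_with_index.map { |m, i| CommitRow.new(i + 1, m) }

per_page = 3
page = commits.first(per_page)

# Like COUNT(DISTINCT mission_id) over all commits
all_distinct = commits.map(&:mission_id).uniq.count

# Like collect(&:mission_id).uniq.count on the current page
page_distinct = page.map(&:mission_id).uniq.count

# Number of groups from GROUP BY mission_id, which equals the distinct count
groups_count = commits.group_by(&:mission_id).size

puts all_distinct, page_distinct, groups_count
```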
See the docs for PostgreSQL GROUP BY here:
http://www.postgresql.org/docs/9.3/interactive/sql-select.html#SQL-GROUPBY
Basically, unlike SQLite (and MySQL), Postgres requires that any column you select or order by appear either in an aggregate function or in the GROUP BY clause.
If you think it through, you'll see that this actually makes sense. SQLite/MySQL are lenient here: instead of raising an error, they return the value from an arbitrary row of each group.
Or, thinking about it another way: if you are grouping by one field, what's the point of ordering by another, non-grouped one? How would that even make sense unless you also applied an aggregate function to the ordered field?

PG::Error in GROUP BY clause

I couldn't think of a better way to refactor the below code (see this question), though I know it's very ugly. However, it's throwing a Postgres error (not with SQLite):
ActiveRecord::StatementInvalid:
PG::Error: ERROR:
column "articles.id" must appear in the GROUP BY clause or be used in an aggregate function
The query itself is:
SELECT "articles".*
FROM "articles"
WHERE "articles"."user_id" = 1
GROUP BY publication
Which comes from the following view code:
= @user.articles.group(:publication).map do |p|
  = p.publication
  = @user.articles.where("publication = ?", p.publication).sum(:twitter_count)
  = @user.articles.where("publication = ?", p.publication).sum(:facebook_count)
  = @user.articles.where("publication = ?", p.publication).sum(:linkedin_count)
In SQLite, this gives the output (e.g.) NYT 12 18 14 BBC 45 46 47 CNN 75 54 78, which is pretty much what I need.
How can I improve the code to remove this error?
When using GROUP BY you cannot SELECT fields that are not either part of the GROUP BY or used in an aggregate function. This is specified by the SQL standard, though some databases choose to execute such queries anyway. Since there's no single correct way to execute such a query they tend to just pick the first row they find and return that, so results will vary unpredictably.
It looks like you're trying to say:
"For each publication get me the sum of the twitter, facebook and linkedin counts for that publication".
If so, you could write:
SELECT publication,
sum(twitter_count) AS twitter_sum,
sum(linkedin_count) AS linkedin_sum,
sum(facebook_count) AS facebook_sum
FROM "articles"
WHERE "articles"."user_id" = 1
GROUP BY publication;
Translating that into ActiveRecord/Rails ... up to you, I don't use it. It looks like it's pretty much what you tried to write but ActiveRecord seems to be mangling it, perhaps trying to execute the sums locally.
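The grouped query above can be mirrored in plain Ruby to show its shape: one row per publication, carrying only the grouping column and the sums, with no per-article columns such as id. The `ArticleRow` struct and its values are hypothetical.

```ruby
# Hypothetical article rows; only user 1's rows match the WHERE clause.
ArticleRow = Struct.new(:user_id, :publication, :twitter_count, :facebook_count, :linkedin_count)

articles = [
  ArticleRow.new(1, "NYT", 5, 10, 4),
  ArticleRow.new(1, "NYT", 7, 8, 10),
  ArticleRow.new(1, "BBC", 45, 46, 47),
  ArticleRow.new(2, "CNN", 1, 1, 1)
]

# WHERE "articles"."user_id" = 1
mine = articles.select { |a| a.user_id == 1 }

# GROUP BY publication, with one SUM per counter column
sums = mine.group_by(&:publication).map do |publication, rows|
  {
    publication: publication,
    twitter_sum: rows.sum(&:twitter_count),
    linkedin_sum: rows.sum(&:linkedin_count),
    facebook_sum: rows.sum(&:facebook_count)
  }
end

sums.each { |row| p row }
```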
Craig's answer explains the issue well. Active Record will select * by default, but you can override it easily:
@user.articles.select("publication, sum(twitter_count) as twitter_count").group(:publication).each do |row|
  p row.publication # "BBC"
  p row.twitter_count # 45
end

Count of a relation using arel in active record

I'm having a really rough time figuring out how to do this query and others like it in arel from active record.
select users.id,
  users.name,
  maps.count as map_count
from users
left join (select user_id, count(map_id) as count from maps_users group by user_id) maps on users.id = maps.user_id
On the surface, it looks just like Nik's example here (http://magicscalingsprinkles.wordpress.com/2010/01/28/why-i-wrote-arel/):
photo_counts = photos.
group(photos[:user_id]).
project(photos[:user_id], photos[:id].count)
users.join(photo_counts).on(users[:id].eq(photo_counts[:user_id]))
But I can't get it to work in rails using active record. I think the equivalent should be something like this, but it errors out :(
maps = Map.arel_table
map_counts = Map.group(maps[:owner_id]).
select(maps[:owner_id]).
select(maps[:id].count.as("map_count"))
users = User.joins(map_counts).on(User.arel_table[:id].eq(map_counts[:map_count]))
Any ideas on how to do it?
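For reference, the result the SQL at the top of this question is after can be sketched in plain Ruby: a per-user count from the grouped subquery, then a left join that leaves users without maps with nil (NULL in SQL). The `UserRow` struct and the users and maps_users rows are hypothetical.

```ruby
# Hypothetical users and join-table rows.
UserRow = Struct.new(:id, :name)

users = [UserRow.new(1, "ann"), UserRow.new(2, "bob"), UserRow.new(3, "cy")]
maps_users = [
  { user_id: 1, map_id: 10 },
  { user_id: 1, map_id: 11 },
  { user_id: 3, map_id: 12 }
]

# Subquery: select user_id, count(map_id) as count from maps_users group by user_id
map_counts = maps_users.group_by { |mu| mu[:user_id] }.transform_values(&:size)

# left join ... on users.id = maps.user_id (no match gives nil, like NULL)
rows = users.map { |u| { id: u.id, name: u.name, map_count: map_counts[u.id] } }

rows.each { |row| p row }
```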
Well, first replace select with project. In relational algebra, SELECT (restriction) is the WHERE clause.
Secondly, you can do subselects.
a = Arel::Table.new(:a)
b = Arel::Table.new(:b)

sub_restriction = b.
  where( b[:domain].eq(1) ).
  project( b[:domain] )

restriction = a.
  where( a[:domain].in(sub_restriction) )
"sub selections" DONE! :-)
Yeah, that article really made me want to learn Arel magic, too.
All the "do something intelligent with Arel" questions on Stack Overflow are getting answered with SQL. From articles and research, then, I can say that Arel is not ActiveRecord. Despite the dynamic formulation of queries, ActiveRecord doesn't have the power to map the results of a fully formed Arel projection.
You get the ability to specify operators with
https://github.com/activerecord-hackery/squeel
but no subselects.
Updated: OMG, I answered this question 5 years ago. No wonder the link was dead :)
