Grouping a timestamp field by date in Ruby On Rails / PostgreSQL - ruby-on-rails

I am trying to convert the following bit of code to work with PostgreSQL.
After doing some digging around I realized that PostgreSQL is much stricter (in a good way) with the GROUP BY than MySQL but for the life of me I cannot figure out how to rewrite this statement to satisfy Postgres.
def show
  show! do
    @recent_tasks = resource.jobs.group(:task).order(:created_at).limit(5)
  end
end
PG::Error: ERROR: column "jobs.created_at" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...ND "jobs"."project_id" = 1 GROUP BY task ORDER BY created_at...
^
: SELECT COUNT(*) AS count_all, task AS task FROM "jobs" WHERE "jobs"."deleted_at" IS NULL AND "jobs"."project_id" = 1 GROUP BY task ORDER BY created_at DESC, created_at LIMIT 5

You cannot order by a column that is neither in the GROUP BY clause nor wrapped in an aggregate function.
You can do something like
@recent_tasks = resource.jobs.group(:task, :created_at).order(:created_at).limit(5)
but that changes the result, because rows are then grouped per task and timestamp rather than per task.
You can also order by the grouped column
@recent_tasks = resource.jobs.group(:task).order(:task).limit(5)
or by an aggregate
@recent_tasks = resource.jobs.group(:task).order('count(*) desc').limit(5)
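To see why ordering by created_at is ambiguous after grouping by task, here is a minimal in-memory sketch in plain Ruby. The Job struct and the sample rows are hypothetical stand-ins for the jobs table, and choosing the newest timestamp per group is an assumption about what "recent tasks" means; in SQL terms it corresponds to ordering by an aggregate such as MAX(created_at).

```ruby
# Hypothetical stand-in for rows of the jobs table.
Job = Struct.new(:task, :created_at)

jobs = [
  Job.new("deploy", Time.utc(2013, 1, 1)),
  Job.new("build",  Time.utc(2013, 1, 2)),
  Job.new("deploy", Time.utc(2013, 1, 5)),
  Job.new("test",   Time.utc(2013, 1, 3)),
]

# GROUP BY task: each group holds several created_at values, so
# "ORDER BY created_at" alone does not say which one to sort by.
groups = jobs.group_by(&:task)

# Picking an aggregate (here the newest timestamp per group) removes the ambiguity.
recent_tasks = groups
  .map { |task, rows| [task, rows.size, rows.map(&:created_at).max] }
  .sort_by { |_task, _count, latest| latest }
  .reverse
  .first(5)

recent_tasks.each do |task, count, latest|
  puts "#{task}: #{count} job(s), latest #{latest}"
end
```

Each group ends up with exactly one sort key, which is the property PostgreSQL insists on.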

Related

Can you do a group by with find_each in rails?

I am trying to write a function that groups by some columns in a very large table (millions of rows). Is there any way to get find_each to work with this, or is it impossible given that I do not want to order by the id column?
The SQL of my query is:
SELECT derivable_type, derivable_id FROM "mytable" GROUP BY derivable_type, derivable_id ORDER BY "mytable"."id" ASC;
The rails find_each automatically adds the ORDER BY clause using a reorder statement. I have tried changing the SQL to:
SELECT MAX(id) AS "mytable"."id", derivable_type, derivable_id FROM "mytable" GROUP BY derivable_type, derivable_id ORDER BY "mytable"."id" ASC;
but that doesn't work either. Any ideas other than writing my own find_each function or overriding the private batch_order function in batches.rb?
There are at least two approaches to solve this problem:
I. Use a subquery:
# query the table and select id, derivable_type and derivable_id
my_table_ids = MyTable
  .group("derivable_type, derivable_id")
  .select("MAX(id) AS my_table_id, derivable_type, derivable_id")
# use subquery to allow rails to use ORDER BY in find_each
MyTable
  .where(id: my_table_ids.select('my_table_id'))
  .find_each { |row| do_something(row) }
II. Write a custom find_each function:
def find_each_grouped(rows, columns, batch_size: 1_000, &block)
  offset = 0
  loop do
    # Order by the grouped columns so that every batch sees the same ordering.
    batch = rows
      .order(*columns)
      .offset(offset)
      .limit(batch_size)
      .to_a
    batch.each(&block)
    # A short batch means there are no more rows to fetch.
    break if batch.size < batch_size
    offset += batch_size
  end
end
rows = MyTable
  .group("derivable_type, derivable_id")
  .select("derivable_type, derivable_id")
find_each_grouped(rows, ['derivable_type', 'derivable_id']) do |row|
  do_something(row)
end
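The termination logic of the offset/limit loop above can be checked without a database. The following is only a sketch: each_in_batches is a hypothetical helper that slices a plain array the way OFFSET and LIMIT would slice an ordered result set.

```ruby
# Hypothetical helper: walks an array in fixed-size slices,
# mirroring an OFFSET/LIMIT loop without touching a database.
def each_in_batches(rows, batch_size: 1_000)
  offset = 0
  loop do
    batch = rows[offset, batch_size] || []
    batch.each { |row| yield row }
    # A short (or empty) batch means the end was reached.
    break if batch.size < batch_size
    offset += batch_size
  end
end

seen = []
each_in_batches((1..7).to_a, batch_size: 3) { |n| seen << n }

# When the row count is an exact multiple of the batch size,
# one extra empty batch is fetched before the loop stops.
seen_exact = []
each_in_batches((1..6).to_a, batch_size: 3) { |n| seen_exact << n }
```

Keep in mind that with a real database each OFFSET query still has to skip the preceding rows, so this pattern tends to slow down as the offset grows.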
I'm not sure I'm 100% clear on what you're trying to do, but your query looks equivalent to a SELECT DISTINCT over the same columns:
SELECT derivable_type, derivable_id FROM "mytable" GROUP BY derivable_type, derivable_id ORDER BY "mytable"."id" ASC;
---- vv
SELECT DISTINCT(derivable_type, derivable_id) FROM "mytable" ORDER BY "mytable"."id" ASC;
You should be able to use Active Record to accomplish this, combined with find_each (if Mytable is your model):
Mytable.all.group(:derivable_type, :derivable_id).distinct.find_each
# gives => #<Enumerator: #<ActiveRecord::Relation [...]>:find_each({:start=>nil, :finish=>nil, :batch_size=>1000, :error_on_ignore=>nil})>

Custom scope in ActiveRecord - Reverse sorting is produced with invalid syntax

Rails version 4.1.6, Postgres version not important.
I use a custom sorting, where strings come before integers and then integers get sorted as numbers:
sample sorting:
A0101
BD330
BE124
1
2
3
10
Since there is no direct way to achieve this with the query interface, I've found this postgres specific syntax which, in general, works fine:
default_scope {
  order("substring(entries.code, '[^0-9_].*$') ASC").
  order("(substring(entries.code, '^[0-9]+'))::int ASC")
}
For example, to get the first record:
2.0.0p247 :001 > Entry.first
Entry Load (3.6ms) SELECT "entries".* FROM "entries" ORDER BY substring(entries.code, '[^0-9_].*$') ASC, (substring(entries.code, '^[0-9]+'))::int ASC LIMIT 1
=> #<Entry id: ...............>
However, when I want to do a reverse search, I get DESC keywords scattered all over the query string. This is quite annoying, since I haven't yet found a way to get rid of them:
2.0.0p247 :002 > Entry.last
Entry Load (0.8ms) SELECT "entries".* FROM "entries" ORDER BY substring(entries.code DESC, '[^0-9_].*$') DESC, (substring(entries.code DESC, '^[0-9]+'))::int DESC LIMIT 1
PG::Error: ERROR: syntax error at or near "DESC"
LINE 1: ... FROM "entries" ORDER BY substring(entries.code DESC, '[^0...
^
: SELECT "entries".* FROM "entries" ORDER BY substring(entries.code DESC, '[^0-9_].*$') DESC, (substring(entries.code DESC, '^[0-9]+'))::int DESC LIMIT 1
ActiveRecord::StatementInvalid: PG::Error: ERROR: syntax error at or near "DESC"
LINE 1: ... FROM "entries" ORDER BY substring(entries.code DESC, '[^0...
To be more specific (though I believe it is not necessary), I would like to get rid of the DESC keywords inside the substring() calls.
EDIT:
I see in the definition of reverse_sql_order that the string is split at the commas, and ASC or DESC is applied to each piece.
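The effect of splitting at commas can be reproduced in a few lines. This is a simplified imitation written for illustration, not the Rails source: naive_reverse_order is a hypothetical name, and it only mimics the behaviour described above (split on commas, then flip or append the direction on each piece).

```ruby
# Simplified imitation of a comma-splitting order reversal.
# NOT the Rails implementation, only a sketch of the behaviour described above.
def naive_reverse_order(order_sql)
  order_sql.split(",").map do |part|
    s = part.strip
    if s =~ /\sasc\z/i
      s.sub(/\sasc\z/i, " DESC")
    elsif s =~ /\sdesc\z/i
      s.sub(/\sdesc\z/i, " ASC")
    else
      # No explicit direction on this piece, so DESC is appended.
      "#{s} DESC"
    end
  end.join(", ")
end

puts naive_reverse_order("substring(entries.code, '[^0-9_].*$') ASC")
# prints: substring(entries.code DESC, '[^0-9_].*$') DESC
```

The comma inside substring(...) is treated as a separator between order clauses, which is how a DESC ends up inside the function call.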
Using extensive database-specific functions in a Rails project is never a good idea. Those kinds of composite statements can drive you crazy.
order("substring(entries.code, '[^0-9_].*$') ASC").
order("(substring(entries.code, '^[0-9]+'))::int ASC")
IMHO, the simplest and most effective solution is a helper column. Define, for instance, a table column called weight of type integer.
Define a model callback that, every time you save an object, stores 0 in that column if the value of the sorting field is a string, or the number itself if the value is numeric. That is your sort index.
Run the sort queries against that weight column. You can even index the attribute, and your queries will be much cleaner and faster. You will also be able to sort by DESC or ASC with no complexity at all.
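A minimal sketch of the weight idea in plain Ruby, with no database involved. The helper name weight_for and the secondary sort on the code itself are assumptions added for illustration; the sample codes come from the question.

```ruby
# Hypothetical helper computing the value for the suggested weight column:
# 0 for codes that are not purely numeric, the integer value otherwise.
def weight_for(code)
  code.match?(/\A\d+\z/) ? code.to_i : 0
end

codes = %w[10 BE124 2 A0101 1 BD330 3]

# Equivalent of ORDER BY weight ASC, code ASC.
ascending = codes.sort_by { |code| [weight_for(code), code] }

# Reversing needs no string manipulation of the ORDER BY clause at all.
descending = ascending.reverse

puts ascending.inspect
```

One caveat under these assumptions: a numeric code of "0" would also get weight 0 and sort among the strings, so a real callback might reserve a negative weight for non-numeric codes.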

Why does rails add "order_by id" to all queries? Which ends up breaking Postgres

The following rails code:
class User < MyModel
  def top?
    data = self.answers.select("sum(up_votes) total_up_votes").first
    return (data.total_up_votes.present? && data.total_up_votes >= 10)
  end
end
Generates the following query (note the order_by added by Rails):
SELECT
sum(up_votes) total_up_votes
FROM
"answers"
WHERE
"answers"."user_id" = 100
ORDER BY
"answers"."id" ASC
This throws an error in Postgres:
PG::GroupingError: ERROR: column "answers.id" must appear in the GROUP BY clause or be used in an aggregate function
Is rails' database abstraction only made with MySQL in mind?
No, the 'order by id' is added to ensure .first always returns the same result. Without an ORDER BY clause, the result is not guaranteed to be the same under the SQL spec.
For your case, you should use .sum() instead of .select() to do this more simply:
def top?
  self.answers.sum(:up_votes) >= 10
end
You called #first at the end. That's the reason.
self.answers.select("sum(up_votes) total_up_votes").first # <~~~
Model.first finds the first record ordered by the primary key.
Look at the clause
ORDER BY
"answers"."id" ASC # this is the primary key of your answers table.
Check the documentation of 1.1.3 first or #first.

"Order by" result of "group by" count?

This query
Message.where("message_type = ?", "incoming").group("sender_number").count
returns a hash:
OrderedHash {"1234"=>21, "2345"=>11, "3456"=>63, "4568"=>100}
Now I want to order by the count of each group. How can I do that within the query?
The easiest way to do this is to just add an order clause to the original query. If you give the count method a specific field, it will generate an output column with the name count_{column}, which can be used in the sql generated by adding an order call:
Message.where('message_type = ?', 'incoming')
  .group('sender_number')
  .order('count_id asc').count('id')
When I tried this, Rails gave me this error:
SQLite3::SQLException: no such column: count_id: SELECT COUNT(*) AS count_all, state AS state FROM "ideas" GROUP BY state ORDER BY count_id desc LIMIT 3
Notice that it says SELECT ... AS count_all.
So I updated the query from @Simon's answer to look like this, and it works for me:
.order('count_all desc')
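If the grouped result is small, another option is to sort the returned hash in Ruby instead of in SQL. The sketch below uses the counts from the question as literal sample data; it is an alternative to the order clause, not a replacement for it on large result sets.

```ruby
# Sample data copied from the question's output.
counts = { "1234" => 21, "2345" => 11, "3456" => 63, "4568" => 100 }

# Ruby hashes keep insertion order, so sorting the pairs and converting
# back with to_h yields a hash ordered by count.
by_count_desc = counts.sort_by { |_sender, count| -count }.to_h
by_count_asc  = counts.sort_by { |_sender, count| count }.to_h

puts by_count_desc.inspect
```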

Rails 3.1 with PostgreSQL: GROUP BY must be used in an aggregate function

I am trying to load the latest 10 Arts grouped by user_id and ordered by created_at. This works fine with SQLite and MySQL, but gives an error on my new PostgreSQL database.
Art.all(:order => "created_at desc", :limit => 10, :group => "user_id")
ActiveRecord error:
Art Load (18.4ms) SELECT "arts".* FROM "arts" GROUP BY user_id ORDER BY created_at desc LIMIT 10
ActiveRecord::StatementInvalid: PGError: ERROR: column "arts.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "arts".* FROM "arts" GROUP BY user_id ORDER BY crea...
Any ideas?
The SQL generated by the expression is not a valid query: you are grouping by user_id and selecting a lot of other fields, but not telling the database how it should aggregate those other fields. For example, if your data looks like this:
a | b
---|---
1 | 1
1 | 2
2 | 3
Now when you ask the database to group by a and also return b, it doesn't know how to aggregate the values 1 and 2 for a = 1. You need to tell it whether to select the min, max, average, sum or something else. Just as I was writing this answer, two other answers appeared that might explain all this better.
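The same ambiguity shows up when the small table above is grouped in plain Ruby. This is only an illustration with the three sample rows written out as hashes.

```ruby
# The three sample rows from the table above.
rows = [{ a: 1, b: 1 }, { a: 1, b: 2 }, { a: 2, b: 3 }]

# Grouping by a leaves a list of b values per group, not a single value.
grouped = rows.group_by { |r| r[:a] }
              .transform_values { |rs| rs.map { |r| r[:b] } }

# An aggregate turns each list into one value; which one is your choice.
maxima = grouped.transform_values(&:max)
sums   = grouped.transform_values(&:sum)

puts grouped.inspect
```

For a = 1 there are two candidate values of b, and only an explicit aggregate decides between them.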
In your use case though, I think you don't want a group by on db level. As there are only 10 arts, you can group them in your application. Don't use this method with thousands of arts though:
arts = Art.all(:order => "created_at desc", :limit => 10)
grouped_arts = arts.group_by { |art| art.user_id }
# now you have a hash with the following structure in grouped_arts
# {
#   user_id1 => [art1, art4],
#   user_id2 => [art3],
#   user_id3 => [art5],
#   ....
# }
EDIT: Select latest_arts, but only one art per user
Just to give you an idea of the SQL (I have not tested it, as I don't have an RDBMS installed on my system):
SELECT arts.* FROM arts
WHERE (arts.user_id, arts.created_at) IN
  (SELECT user_id, MAX(created_at) FROM arts
   GROUP BY user_id
   ORDER BY MAX(created_at) DESC
   LIMIT 10)
ORDER BY created_at DESC
LIMIT 10
This solution is based on the practical assumption that no two arts for the same user share the same highest created_at, but that may well be wrong if you are importing or programmatically creating arts in bulk. If the assumption doesn't hold, the SQL might get more contrived.
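What that query computes (the newest art per user, newest first) can be sketched in memory as follows. The Art struct and the sample rows are hypothetical, and this form is only sensible for small collections, since it loads every row into Ruby.

```ruby
# Hypothetical stand-in for rows of the arts table.
Art = Struct.new(:id, :user_id, :created_at)

arts = [
  Art.new(1, 10, Time.utc(2012, 1, 1)),
  Art.new(2, 10, Time.utc(2012, 1, 4)),
  Art.new(3, 20, Time.utc(2012, 1, 3)),
  Art.new(4, 30, Time.utc(2012, 1, 2)),
]

latest_per_user = arts
  .group_by(&:user_id)                                           # one group per user
  .map { |_user_id, user_arts| user_arts.max_by(&:created_at) }  # newest art in each group
  .sort_by(&:created_at)
  .reverse                                                       # newest first
  .first(10)

puts latest_per_user.map(&:id).inspect
```

Because max_by returns a single element, this version keeps exactly one art per user even when two share the same created_at.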
EDIT: Attempt to change the query to Arel:
Art.where("(arts.user_id, arts.created_at) IN
    (SELECT user_id, MAX(created_at) FROM arts
     GROUP BY user_id
     ORDER BY MAX(created_at) DESC
     LIMIT 10)").
  order("created_at DESC").
  page(params[:page]).
  per(params[:per])
You need to select only the specific columns you need:
Art.select(:user_id).group(:user_id).limit(10)
It will raise an error when you try to select title in the query, for example:
Art.select(:user_id, :title).group(:user_id).limit(10)
column "arts.title" must appear in the GROUP BY clause or be used in an aggregate function
That is because when you group by user_id, the query has no idea how to handle title within a group, since the group contains several titles.
As the exception says, the column needs to either appear in the GROUP BY clause
Art.select(:user_id, :title).group(:user_id, :title).limit(10)
or be used in an aggregate function
Art.select("user_id, array_agg(title) as titles").group(:user_id).limit(10)
Take a look at this post: "SQLite to Postgres (Heroku) GROUP BY".
PostgreSQL is actually following the SQL standard here, whilst SQLite and MySQL break from the standard.
Have a look at this question: "Converting MySQL select to PostgreSQL". Postgres won't allow a column to be listed in the select statement that isn't in the GROUP BY clause or inside an aggregate function.

Resources