Rails: Order with nulls last - ruby-on-rails

In my Rails app I've run into an issue a couple times that I'd like to know how other people solve:
I have certain records where a value is optional, so some records have a value and some are null for that column.
If I order by that column on some databases the nulls sort first and on some databases the nulls sort last.
For instance, I have Photos which may or may not belong to a Collection, ie there are some Photos where collection_id=nil and some where collection_id=1 etc.
If I do Photo.order('collection_id desc) then on SQLite I get the nulls last but on PostgreSQL I get the nulls first.
Is there a nice, standard Rails way to handle this and get consistent performance across any database?

I'm no expert at SQL, but why not just sort by if something is null first then sort by how you wanted to sort it.
Photo.order('collection_id IS NULL, collection_id DESC') # Null's last
Photo.order('collection_id IS NOT NULL, collection_id DESC') # Null's first
If you are only using PostgreSQL, you can also do this
Photo.order('collection_id DESC NULLS LAST') #Null's Last
Photo.order('collection_id DESC NULLS FIRST') #Null's First
If you want something universal (like you're using the same query across several databases, you can use (courtesy of #philT)
Photo.order('CASE WHEN collection_id IS NULL THEN 1 ELSE 0 END, collection_id')

Even though it's 2017 now, there is still yet to be a consensus on whether NULLs should take precedence. Without you being explicit about it, your results are going to vary depending on the DBMS.
The standard doesn't specify how NULLs should be ordered in comparison with non-NULL values, except that any two NULLs are to be considered equally ordered, and that NULLs should sort either above or below all non-NULL values.
source, comparison of most DBMSs
To illustrate the problem, I compiled a list of a few most popular cases when it comes to Rails development:
PostgreSQL
NULLs have the highest value.
By default, null values sort as if larger than any non-null value.
source: PostgreSQL documentation
MySQL
NULLs have the lowest value.
When doing an ORDER BY, NULL values are presented first if you do ORDER BY ... ASC and last if you do ORDER BY ... DESC.
source: MySQL documentation
SQLite
NULLs have the lowest value.
A row with a NULL value is higher than rows with regular values in ascending order, and it is reversed for descending order.
source
Solution
Unfortunately, Rails itself doesn't provide a solution for it yet.
PostgreSQL specific
For PostgreSQL you could quite intuitively use:
Photo.order('collection_id DESC NULLS LAST') # NULLs come last
MySQL specific
For MySQL, you could put the minus sign upfront, yet this feature seems to be undocumented. Appears to work not only with numerical values, but with dates as well.
Photo.order('-collection_id DESC') # NULLs come last
PostgreSQL and MySQL specific
To cover both of them, this appears to work:
Photo.order('collection_id IS NULL, collection_id DESC') # NULLs come last
Still, this one does not work in SQLite.
Universal solution
To provide cross-support for all DBMSs you'd have to write a query using CASE, already suggested by #PhilIT:
Photo.order('CASE WHEN collection_id IS NULL THEN 1 ELSE 0 END, collection_id')
which translates to first sorting each of the records first by CASE results (by default ascending order, which means NULL values will be the last ones), second by calculation_id.

Photo.order('collection_id DESC NULLS LAST')
I know this is an old one but I just found this snippet and it works for me.

Put minus sign in front of column_name and reverse the order direction. It works on mysql. More details
Product.order('something_date ASC') # NULLS came first
Product.order('-something_date DESC') # NULLS came last

Bit late to the show but there is a generic SQL way to do it. As usual, CASE to the rescue.
Photo.order('CASE WHEN collection_id IS NULL THEN 1 ELSE 0 END, collection_id')

The easiest way is to use:
.order('name nulls first')

For posterity's sake, I wanted to highlight an ActiveRecord error relating to NULLS FIRST.
If you try to call:
Model.scope_with_nulls_first.last
Rails will attempt to call reverse_order.first, and reverse_order is not compatible with NULLS LAST, as it tries to generate the invalid SQL:
PG::SyntaxError: ERROR: syntax error at or near "DESC"
LINE 1: ...dents" ORDER BY table_column DESC NULLS LAST DESC LIMIT...
This was referenced a few years ago in some still-open Rails issues (one, two, three). I was able to work around it by doing the following:
scope :nulls_first, -> { order("table_column IS NOT NULL") }
scope :meaningfully_ordered, -> { nulls_first.order("table_column ASC") }
It appears that by chaining the two orders together, valid SQL gets generated:
Model Load (12.0ms) SELECT "models".* FROM "models" ORDER BY table_column IS NULL DESC, table_column ASC LIMIT 1
The only downside is that this chaining has to be done for each scope.

Rails 6.1 adds nulls_first and nulls_last methods to Arel for PostgreSQL.
Example:
User.order(User.arel_table[:login_count].desc.nulls_last)
Source: https://www.bigbinary.com/blog/rails-6-1-adds-nulls-first-and-nulls-last-to-arel

Here are some Rails 6 solutions.
The answer by #Adam Sibik is a great summary about the difference between various database systems.
Unfortunately, though, some of the presented solutions, including "Universal solution" and "PostgreSQL and MySQL specific", would not work any more with Rails 6 (ActiveRecord 6) as a result of its changed specification of order() not accepting some raw SQLs (I confirm the "PostgreSQL specific" solution still works as of Rails 6.1.4). For the background of this change, see, for example,
"Updates for SQL Injection in Rails 6.1" by Justin.
To circumvent the problem, you can wrap around the SQL statements with Arel.sql as follows, where NULLs come last, providing you are 100% sure the SQL statements you give are safe.
Photo.order(Arel.sql('CASE WHEN collection_id IS NULL THEN 1 ELSE 0 END, collection_id'))
Just for reference, if you want to sort by a Boolean column (is_ok, as an example) in the order of [TRUE, FALSE, NULL] regardless of the database systems, either of these should work:
Photo.order(Arel.sql('CASE WHEN is_ok IS NULL THEN 1 ELSE 0 END, is_ok DESC'))
Photo.order(Arel.sql('CASE WHEN is_ok IS NULL THEN 1 WHEN is_ok IS TRUE THEN -1 ELSE 0 END'))
(n.b., SQLite does not have the Boolean type and so the former may be safer arguably, though it should not matter because Rails should guarantee the value is either 0 or 1 (or NULL).)

In my case I needed sort lines by start and end date by ASC, but in few cases end_date was null and that lines should be in above, I used
#invoice.invoice_lines.order('start_date ASC, end_date ASC NULLS FIRST')

Adding arrays together will preserve order:
#nonull = Photo.where("collection_id is not null").order("collection_id desc")
#yesnull = Photo.where("collection_id is null")
#wanted = #nonull+#yesnull
http://www.ruby-doc.org/core/classes/Array.html#M000271

It seems like you'd have to do it in Ruby if you want consistent results across database types, as the database itself interprets whether or not the NULLS go at the front or end of the list.
Photo.all.sort {|a, b| a.collection_id.to_i <=> b.collection_id.to_i}
But that is not very efficient.

Related

Rails 4 - Order by the presence of an attribute

Using Rails 4. Psql DB.
I have a model Article with an attribute amazon_title. I am having trouble understanding how I can order my articles so articles with the amazon_title present are first, and ones without are second.
I've tried ordering them like this with no success:
Article.all.order(amazon_title: :desc)
The above orders it alphabetically, showing blank first, present second, and nil third.
I feel like this is very simple, but for some reason I cannot find the answer. Thanks!
For PostgreSQL (the order will be true, false, nil):
Article.order('amazon_title DESC NULLS LAST')
Another option (database agnostic):
Article.order('(CASE WHEN amazon_title THEN 1 WHEN amazon_title IS NULL THEN 2 ELSE 3 END) ASC')
In PostgreSQL you can pass NULLS FIRST OR NULLS LAST depending on your requirement. That's why I asked you about your DB.
Article.order('amazon_title DESC NULLS FIRST')
the above will list the NULLS first and
Article.order('amazon_title DESC NULLS LAST')
and this one will list the NULL records last.
Hope that helps!

Rails Postgres Error GROUP BY clause or be used in an aggregate function

In SQLite (development) I don't have any errors, but in production with Postgres I get the following error. I don't really understand the error.
PG::Error: ERROR: column "commits.updated_at" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...mmits"."user_id" = 1 GROUP BY mission_id ORDER BY updated_at...
^
: SELECT COUNT(*) AS count_all, mission_id AS mission_id FROM "commits" WHERE "commits"."user_id" = 1 GROUP BY mission_id ORDER BY updated_at DESC
My controller method:
def show
#user = User.find(params[:id])
#commits = #user.commits.order("updated_at DESC").page(params[:page]).per(25)
#missions_commits = #commits.group("mission_id").count.length
end
UPDATE:
So i digged further into this PostgreSQL specific annoyance and I am surprised that this exception is not mentioned in the Ruby on Rails Guide.
I am using psql (PostgreSQL) 9.1.11
So from what I understand, I need to specify which column that should be used whenever you use the GROUP_BY clause. I thought using SELECT would help, which can be annoying if you need to SELECT a lot of columns.
Interesting discussion here
Anyways, when I look at the error, everytime the cursor is pointed to updated_at. In the SQL query, rails will always ORDER BY updated_at. So I have tried this horrible query:
#commits.group("mission_id, date(updated_at)")
.select("date(updated_at), count(mission_id)")
.having("count(mission_id) > 0")
.order("count(mission_id)").length
which gives me the following SQL
SELECT date(updated_at), count(mission_id)
FROM "commits"
WHERE "commits"."user_id" = 1
GROUP BY mission_id, date(updated_at)
HAVING count(mission_id) > 0
ORDER BY updated_at DESC, count(mission_id)
LIMIT 25 OFFSET 0
the error is the same.
Note that no matter what it will ORDER BY updated_at, even if I wanted to order by something else.
Also I don't want to group the records by updated_at just by mission_id.
This PostgreSQL error is just misleading and has little explanation to solving it. I have tried many formulas from the stackoverflow sidebar, nothing works and always the same error.
UPDATE 2:
So I got it to work, but it needs to group the updated_at because of the automatic ORDER BY updated_at. How do I count only by mission_id?
#missions_commits = #commits.group("mission_id, updated_at").count("mission_id").size
I guest you want to show general number of distinct Missions related with Commits, anyway it won't be number on page.
Try this:
#commits = #user.commits.order("updated_at DESC").page(params[:page]).per(25)
#missions_commits = #user.commits.distinct.count(:mission_id)
However if you want to get the number of distinct Missions on page I suppose it should be:
#missions_commits = #commits.collect(&:mission_id).uniq.count
Update
In Rails 3, distinct did not exist, but pure SQL counting should be used this way:
#missions_commits = #user.commits.count(:mission_id, distinct: true)
See the docs for PostgreSQL GROUP BY here:
http://www.postgresql.org/docs/9.3/interactive/sql-select.html#SQL-GROUPBY
Basically, unlike Sqlite (and MySQL) postgres requires that any columns selected or ordered on must appear in an aggregate function or the group by clause.
If you think it through, you'll see that this actually makes sense. Sqlite/MySQL cheat under the hood and silently drop those fields (not sure that's technically what happens).
Or thinking about it another way if you are grouping by a field, what's the point of ordering it? How would that even make sense unless you also had an aggregate function on the ordered field?

Model.first does not retrieve first record from table

Model.first doesnot retrive first record from table. Instead it retrives any random record from table.
eg:
Merchant.first
Query
SELECT "merchants".* FROM "merchants" LIMIT 1
=> <Merchant id: 6, merchant_name: "Bestylish", description: "", description_html: "" >
Instead the query should be
SELECT "merchants".* FROM "merchants" ORDER BY "merchants"."id" ASC LIMIT 1;
Why it doesnot retrive the first record
Model.first will use the default sorting of your database.
For example. In Postgresql default sorting is not necessarily an id.
This seems to be default behaviour with Postgres, as some active-record versions do not add a default ordering to the query for first, while adding one for last.
https://github.com/rails/rails/issues/9885
PostgreSQL does not by default apply a sort, which is generally a good thing for performance.
So in this context "first" means "the first row returned", not "the first row when ordered by some meaningless key value".
Curiously "last" does seem to order by id.
It is defined here, in Rails 4, to order by primary key if no other order conditions are specified.
In Rails 3.2.11, it is as such:
def find_first
if loaded?
#records.first
else
#first ||= limit(1).to_a[0]
end
end
Without the order method, which will just apply the limit and then leave the ordering up to your database.
You need to apply the ordering yourself. Try calling Merchant.order('id ASC').first
It may be possible to automate this using default scopes in your model but I'm not sure about that.

when I do order on a counter_cache it returns nil objects first?

I have owns_count counter_cache on my items model. When i do
Items.order("owns_cache DESC")
, it returns me objects that are nil before other results. If I do
"owns_cache ASC"
, it is correct.
What should I be doing?
How NULLs get ordered depends on the underlying database.
For PostgreSQL, you could do this:
Items.order("owns_cache DESC NULLS LAST")
For MySQL and SQLite:
Items.order("COALESCE(owns_cache, 0) DESC")
I think MySQL sorts NULLs at the bottom of a DESC ordering though so you might not need anything special there. This COALESCE approach will also work in PostgreSQL so this would be a portable solution that should give you consistent results everywhere.
If you wanted NULLs at the bottom on an ASC sort, you'd replace the 0 with something larger than the largest owns_cache number.

Rails expanding fields with scope, PG does not like it

I have a model of Widgets. Widgets belong to a Store model, which belongs to an Area model, which belongs to a Company. At the Company model, I need to find all associated widgets. Easy:
class Widget < ActiveRecord::Base
def self.in_company(company)
includes(:store => {:area => :company}).where(:companies => {:id => company.id})
end
end
Which will generate this beautiful query:
> Widget.in_company(Company.first).count
SQL (50.5ms) SELECT COUNT(DISTINCT "widgets"."id") FROM "widgets" LEFT OUTER JOIN "stores" ON "stores"."id" = "widgets"."store_id" LEFT OUTER JOIN "areas" ON "areas"."id" = "stores"."area_id" LEFT OUTER JOIN "companies" ON "companies"."id" = "areas"."company_id" WHERE "companies"."id" = 1
=> 15088
But, I later need to use this scope in more complex scope. The problem is that AR is expanding the query by selecting individual fields, which fails in PG because selected fields must in the GROUP BY clause or the aggregate function.
Here is the more complex scope.
def self.sum_amount_chart_series(company, start_time)
orders_by_day = Widget.in_company(company).archived.not_void.
where(:print_datetime => start_time.beginning_of_day..Time.zone.now.end_of_day).
group(pg_print_date_group).
select("#{pg_print_date_group} as print_date, sum(amount) as total_amount")
end
def self.pg_print_date_group
"CAST((print_datetime + interval '#{tz_offset_hours} hours') AS date)"
end
And this is the select it is throwing at PG:
> Widget.sum_amount_chart_series(Company.first, 1.day.ago)
SELECT "widgets"."id" AS t0_r0, "widgets"."user_id" AS t0_r1,<...BIG SNIP, YOU GET THE IDEA...> FROM "widgets" LEFT OUTER JOIN "stores" ON "stores"."id" = "widgets"."store_id" LEFT OUTER JOIN "areas" ON "areas"."id" = "stores"."area_id" LEFT OUTER JOIN "companies" ON "companies"."id" = "areas"."company_id" WHERE "companies"."id" = 1 AND "widgets"."archived" = 't' AND "widgets"."voided" = 'f' AND ("widgets"."print_datetime" BETWEEN '2011-04-24 00:00:00.000000' AND '2011-04-25 23:59:59.999999') GROUP BY CAST((print_datetime + interval '-7 hours') AS date)
Which generates this error:
PGError: ERROR: column
"widgets.id" must appear in the
GROUP BY clause or be used in an
aggregate function LINE 1: SELECT
"widgets"."id" AS t0_r0,
"widgets"."user_id...
How do I rewrite the Widget.in_company scope so that AR does not expand the select query to include every Widget model field?
As Frank explained, PostgreSQL will reject any query which doesn't return a reproducible set of rows.
Suppose you've a query like:
select a, b, agg(c)
from tbl
group by a
PostgreSQL will reject it because b is left unspecified in the group by statement. Run that in MySQL, by contrast, and it will be accepted. In the latter case, however, fire up a few inserts, updates and deletes, and the order of the rows on disk pages ends up different.
If memory serves, implementation details are so that MySQL will actually sort by a, b and return the first b in the set. But as far as the SQL standard is concerned, the behavior is unspecified -- and sure enough, PostgreSQL does not always sort before running aggregate functions.
Potentially, this might result in different values of b in result set in PostgreSQL. And thus, PostgreSQL yields an error unless you're more specific:
select a, b, agg(c)
from tbl
group by a, b
What Frank highlighted is that, in PostgreSQL 9.1, if a is the primary key, than you can leave b unspecified -- the planner has been taught to ignore subsequent group by fields when applicable primary keys imply a unique row.
For your problem in particular, you need to specify your group by as you currently do, plus every field that you're basing your aggregate onto, i.e. "widgets"."id", "widgets"."user_id", [snip] but not stuff like sum(amount), which are the aggregate function calls.
As an off topic side note, I'm not sure how your ORM/model works but the SQL it's generating isn't optimal. Many of those left outer joins seem like they should be inner joins. This will result in allowing the planner to pick an appropriate join order where applicable.
PostgreSQL version 9.1 (beta at this moment) might fix your problem, but only if there is a functional dependency on the primary key.
From the release notes:
Allow non-GROUP BY columns in the
query target list when the primary key
is specified in the GROUP BY clause
(Peter Eisentraut)
Some other database system already
allowed this behavior, and because of
the primary key, the result is
unambiguous.
You could run a test and see if it fixes your problem. If you can wait for the production release, this can fix the problem without changing your code.
Firstly simplify your life by storing all dates in a standard time-zone. Changing dates with time-zones should really be done in the view as a user convenience. This alone should save you a lot of pain.
If you're already in production write a migration to create a normalised_date column wherever it would be helpful.
nrI propose that the other problem here is the use of raw SQL, which rails won't poke around for you. To avoid this try using the gem called Squeel (aka Metawhere 2) http://metautonomo.us/projects/squeel/
If you use this you should be able to remove hard coded SQL and let rails get back to doing its magic.
For example:
.select("#{pg_print_date_group} as print_date, sum(amount) as total_amount")
becomes (once your remove the need for normalising the date):
.select{sum(amount).as(total_amount)}
Sorry to answer my own question, but I figured it out.
First, let me apologize to those who thought I might be having an SQL or Postgres issue, it is not. The issue is with ActiveRecord and the SQL it is generating.
The answer is... use .joins instead of .includes. So I just changed the line in the top code and it works as expected.
class Widget < ActiveRecord::Base
def self.in_company(company)
joins(:store => {:area => :company}).where(:companies => {:id => company.id})
end
end
I'm guessing that when using .includes, ActiveRecord is trying to be smart and use JOINS in the SQL, but it's not smart enough for this particular case and was generating that ugly SQL to select all associated columns.
However, all the replies have taught me quite a bit about Postgres that I did not know, so thank you very much.
sort in mysql:
> ids = [11,31,29]
=> [11, 31, 29]
> Page.where(id: ids).order("field(id, #{ids.join(',')})")
in postgres:
def self.order_by_ids(ids)
order_by = ["case"]
ids.each_with_index.map do |id, index|
order_by << "WHEN id='#{id}' THEN #{index}"
end
order_by << "end"
order(order_by.join(" "))
end
User.where(:id => [3,2,1]).order_by_ids([3,2,1]).map(&:id)
#=> [3,2,1]

Resources