Why does Rails add "order_by id" to all queries? Which ends up breaking Postgres

The following Rails code:
class User < MyModel
  def top?
    data = self.answers.select("sum(up_votes) total_up_votes").first
    return (data.total_up_votes.present? && data.total_up_votes >= 10)
  end
end
generates the following query (note the ORDER BY added by Rails):
SELECT
sum(up_votes) total_up_votes
FROM
"answers"
WHERE
"answers"."user_id" = 100
ORDER BY
"answers"."id" ASC
This throws an error in Postgres:
PG::GroupingError: ERROR: column "answers.id" must appear in the GROUP BY clause or be used in an aggregate function
Is rails' database abstraction only made with MySQL in mind?

No. The ORDER BY on id is added so that .first always returns the same row. Without an ORDER BY clause, SQL does not guarantee the order of the result, so "first" would be arbitrary.
For your case, use .sum instead of .select, which is simpler (it returns 0 when there are no rows, so the presence check is no longer needed):
def top?
  self.answers.sum(:up_votes) >= 10
end

You called #first at the end. That's the reason.
self.answers.select("sum(up_votes) total_up_votes").first # <~~~
Model.first finds the first record ordered by the primary key.
Look at the clause:
ORDER BY
"answers"."id" ASC # this is the primary key of your answers table
Check the documentation: section 1.1.3, "first", or #first.

Related

Check if ActiveRecord::Relation already includes JOIN

I'm inside a method that adds a filter (user.type) to my query/relation.
Sometimes, if grouping by user (which needs an INNER JOIN to the users table, added in another module) is selected before filtering, I get an error:
PostgreSQL: PG::DuplicateAlias: ERROR: table name "users" specified more than once
Before the error happens, the JOIN is already in the query:
$ pry> relation.to_sql
SELECT \"posts\".* FROM \"posts\"
INNER JOIN users ON users.id = posts.user_id
WHERE \"posts\".\"created_at\" BETWEEN '2019-05-01 00:00:00'
AND '2020-05-01 23:59:59' AND \"users\".\"type\" = 'Guest'"
I want to fix it by checking whether the table is already joined in my ActiveRecord::Relation object. I added:
def join_users
  return relation if /JOIN users/.match? relation.to_sql
  relation.joins('LEFT JOIN users ON users.id = posts.user_id')
end
This solution works, but I wonder: is there a better way to check whether a JOIN is already in the relation?
Perhaps you can use joins_values. It isn't documented, but it is a public method on ActiveRecord::Relation that returns an array of the arguments passed to joins on the current relation (association names in this example):
Post.joins(:user).joins_values # [:user]
Post.all.joins_values # []
For a simple join
Post.joins(:user)
you can find it via joins_values:
Post.joins(:user).joins_values # [:user]
If the relation uses a left join
Post.left_joins(:user)
you can find it via left_outer_joins_values. In this case joins_values comes back empty:
Post.left_joins(:user).joins_values # []
so check the other accessor instead:
Post.left_joins(:user).left_outer_joins_values # [:user]
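The two accessors above can be combined into one check. This is a minimal sketch, assuming a hypothetical helper named joined? (not part of Rails). It also looks inside raw SQL strings, on the assumption that joins_values holds whatever was passed to joins, including a fragment such as the LEFT JOIN users string from the question. The fake_relation struct is only a stand-in so the snippet runs without a database, and nested hash joins such as joins(user: :profile) are not handled.

```ruby
# Hypothetical helper, not part of Rails: true when the relation already
# joins the association, or carries a raw SQL join on the given table.
def joined?(relation, association, table)
  values = relation.joins_values + relation.left_outer_joins_values
  join_on_table = /\bJOIN\s+"?#{Regexp.escape(table)}"?(?:\s|\z)/i
  values.any? do |value|
    value == association || (value.is_a?(String) && value.match?(join_on_table))
  end
end

# Stand-in exposing the same two readers as ActiveRecord::Relation,
# used here only so the sketch runs without a database.
fake_relation = Struct.new(:joins_values, :left_outer_joins_values)

by_association = fake_relation.new([:user], [])
by_left_join   = fake_relation.new([], [:user])
by_raw_sql     = fake_relation.new(["LEFT JOIN users ON users.id = posts.user_id"], [])
not_joined     = fake_relation.new([], [])
```

With a real relation, join_users from the question could then start with something like return relation if joined?(relation, :user, "users"), avoiding the to_sql round trip.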

ActiveRecord select with OR and exclusive limit

I have the need to query the database and retrieve the last 10 objects that are either active or declined. We use the following:
User.where(status: [:active, :declined]).limit(10)
Now we need to get the last 10 of each status (total of 20 users)
I've tried the following:
User.where(status: :active).limit(10).or(User.where(status: :declined).limit(10))
# SELECT "users".* FROM "users" WHERE ("users"."status" = $1 OR "users"."status" = $2) LIMIT $3
This does the same as the previous query and returns only 10 users, of mixed statuses.
How can I get the last 10 active users and the last 10 declined users with a single query?
I'm not sure that SQL allows doing what you want. The first thing I would try is a subquery, something like this:
class User < ApplicationRecord
  scope :active, -> { where status: :active }
  scope :declined, -> { where status: :declined }
  scope :last_active_or_declined, -> {
    where(id: active.limit(10).pluck(:id))
      .or(where(id: declined.limit(10).pluck(:id)))
  }
end
Then somewhere else you could just do
User.last_active_or_declined()
This runs two separate queries, one for each group of users, and then fetches the users whose ids are in either result. I would say you could even drop the pluck(:id) parts, since ActiveRecord is smart enough to add the proper select clause to your SQL, but I'm not 100% sure and I don't have a Rails project at hand to try this.
limit is one of the values that must be identical on both relations passed to #or. If you check the Rails code, the error raised comes from here:
def or!(other) # :nodoc:
  incompatible_values = structurally_incompatible_values_for_or(other)
  unless incompatible_values.empty?
    raise ArgumentError, "Relation passed to #or must be structurally compatible. Incompatible values: #{incompatible_values}"
  end
  # more code
end
You can check which methods are restricted further down in the code:
STRUCTURAL_OR_METHODS = Relation::VALUE_METHODS - [:extending, :where, :having, :unscope, :references]
def structurally_incompatible_values_for_or(other)
  STRUCTURAL_OR_METHODS.reject do |method|
    get_value(method) == other.get_value(method)
  end
end
You can see in the Relation class that limit is one of these values:
SINGLE_VALUE_METHODS = [:limit, :offset, :lock, :readonly, :reordering,
                        :reverse_order, :distinct, :create_with, :skip_query_cache,
                        :skip_preloading]
So you will have to resort to raw SQL, I'm afraid.
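One possible shape for that raw SQL is a UNION ALL with a separate ORDER BY and LIMIT per status. This is a rough sketch under several assumptions that the question does not confirm: the status column stores the strings 'active' and 'declined' (a Rails enum would store integers instead), "last" means highest id, and the database accepts parenthesized SELECT branches as PostgreSQL does. The helper name last_per_status_sql is made up for illustration.

```ruby
# Hypothetical helper: builds one UNION ALL statement with a separate
# ORDER BY / LIMIT per status. Statuses are checked against an allow-list
# because they are interpolated into the SQL text.
def last_per_status_sql(statuses, per_status: 10, allowed: %w[active declined])
  limit = Integer(per_status)
  branches = statuses.map do |status|
    name = status.to_s
    raise ArgumentError, "unknown status: #{name}" unless allowed.include?(name)

    %((SELECT * FROM users WHERE status = '#{name}' ORDER BY id DESC LIMIT #{limit}))
  end
  branches.join(" UNION ALL ")
end

sql = last_per_status_sql(%i[active declined], per_status: 10)
# With ActiveRecord loaded, the statement could be run as: User.find_by_sql(sql)
```

Note that find_by_sql returns an array of records rather than a relation, so further chaining (where, order, and so on) is not available on the result.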
I don't think you can do it with a single query, but you can do it with two queries, get the record ids, and then build a query using those record ids.
It's not ideal but as you're just plucking ids the impact isn't too bad.
user_ids = User.where(status: :active).limit(10).pluck(:id) + User.where(status: :declined).limit(10).pluck(:id)
users = User.where(id: user_ids)
I think you can use UNION. Install active_record_union and replace or with union:
User.where(status: :active).limit(10).union(User.where(status: :declined).limit(10))

Order and limit clauses unexpectedly passed down to scope

(The queries here make no semantic sense; I chose them for the sake of simplicity.)
Project.limit(10).where(id: Project.select(:id))
generates as expected the following SQL query:
SELECT
"projects".*
FROM
"projects"
WHERE
"projects"."id" IN (
SELECT
"projects"."id"
FROM
"projects"
) LIMIT 10
But if I define this method in my Project class:
def self.my_filter
  where(id: Project.select(:id))
end
Then
Project.limit(10).my_filter
generates the following query
SELECT
"projects".*
FROM
"projects"
WHERE
"projects"."id" IN (
SELECT
"projects"."id"
FROM
"projects" LIMIT 10
) LIMIT 10
See how the LIMIT 10 has now also been applied to the subquery.
The same issue occurs with an .order clause.
It happens with Rails 4.2.2 and Rails 3.2.20. It happens when the subquery is on the projects table; it does not happen if the subquery is on another table.
Is there something I'm doing wrong here, or do you think it is a Rails bug?
A workaround is to build my_filter by explicitly adding limit(nil).reorder(nil) to it, but that is hackish.
EDIT: another workaround is to append the limit clause after the my_filter scope: Project.my_filter.limit(10).
This is actually a feature. Class methods work similarly to scopes in ActiveRecord models.
If you want to remove the scopes that have already been applied, you can either use unscoped or call the method on the class directly, not on a relation:
def self.my_filter
  unscoped.where(id: Project.select(:id))
end
# or
Project.my_filter
Your class method is applied in a way you may not be expecting:
Project.limit(10) # => a relation, not the Project class
  .my_filter # => calling a class method on a relation
# Behind the scenes, this does the following:
# scoping { Project.my_filter }
# where scoping is a wrapper provided by the relation:
From: .../ruby-2.0.0-p598/gems/activerecord-4.1.6/lib/active_record/relation.rb # line 281:
Owner: ActiveRecord::Relation
Visibility: public
Signature: scoping()
Scope all queries to the current scope.
Comment.where(post_id: 1).scoping do
Comment.first
end
# => SELECT "comments".* FROM "comments"
# WHERE "comments"."post_id" = 1 ORDER BY "comments"."id" ASC LIMIT 1
Please check unscoped if you want to remove all previous scopes (including
the default_scope) during the execution of a block.
Inside that scoping block, your class will include all the scoping rules of a relation it was built from into all queries, as scoping will enforce context. This is done so class methods can be properly chained, while still retaining the correct self. Of course, when you try using a class method inside the class method, stuff blows up.
In your first, "expected outcome" example, where is "natively" defined on relations, so no scope enforcement takes place: it's just not necessary.
As the documentation hints, you can use unscoped in your nested query, like so:
def self.my_filter
  where(id: Project.unscoped.select(:id))
end
...since that's where you need the "bare basis". Or, as you've already found out, you can just place limit at the end:
Project.my_filter.limit(10)
...here, at the time my_filter gets to execute, scoping will do effectively nothing: there will be no context built up to this point.
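The mechanism described above can be mimicked in a few lines of plain Ruby. This is a toy model only: ToyRelation and ToyProject are hypothetical stand-ins, not ActiveRecord classes, and the real implementation differs in detail. It shows a class method called on a relation running inside scoping, so that a nested query started from the class inherits the relation's limit.

```ruby
# Toy model only: none of these classes are ActiveRecord.
class ToyRelation
  attr_reader :limit_value

  def initialize(model, limit_value = nil)
    @model = model
    @limit_value = limit_value
  end

  # Returns a new relation carrying the limit, like a chained query method.
  def limit(n)
    ToyRelation.new(@model, n)
  end

  # Makes this relation the model's current scope while the block runs.
  def scoping
    previous = @model.current_scope
    @model.current_scope = self
    yield
  ensure
    @model.current_scope = previous
  end

  # A class method called on a relation is delegated to the model inside scoping.
  def my_filter
    scoping { @model.my_filter }
  end
end

class ToyProject
  class << self
    attr_accessor :current_scope

    # Every query started from the class picks up the current scope, if any.
    def all
      current_scope || ToyRelation.new(self)
    end

    def limit(n)
      all.limit(n)
    end

    # Stand-in for the class method from the question: it reports the limit
    # seen by the outer query and by the nested query started from the class.
    def my_filter
      { outer_limit: all.limit_value, subquery_limit: ToyProject.all.limit_value }
    end
  end
end

leaked = ToyProject.limit(10).my_filter # class method called on a relation
clean  = ToyProject.my_filter           # class method called on the class
```

In this model, leaked reports the limit on both the outer and the nested query, while clean reports none, which mirrors the two SQL statements in the question.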

ActiveRecord change or reset ordering defined in scope

I have a function which uses another function's output: an ActiveRecord::Relation object. This relation already has an order clause:
# This function cannot be changed
def black_box
  Product.where('...').order("name")
end
def my_func
  black_box.order("id")
end
When I execute the relation, the ORDER BY clause lists the columns in the order the order calls were made:
SELECT * FROM products
WHERE ...
ORDER BY name, id -- the first order call, then the second
Is there any way to tell the relation to insert my order call BEFORE the previous one, so that the SQL looks like this?
SELECT * FROM products
WHERE ...
ORDER BY id, name
You could use the reorder method to reset the original order and then add your new ORDER BY column.
reorder(*args)
Replaces any existing order defined on the relation with the specified order.
User.order('email DESC').reorder('id ASC') # generated SQL has 'ORDER BY id ASC'
Subsequent calls to order on the same relation will be appended. For example:
User.order('email DESC').reorder('id ASC').order('name ASC')
# generates a query with 'ORDER BY id ASC, name ASC'.
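Applied to the question, reorder can also re-add the existing ordering after the new column. This is a minimal sketch, assuming a hypothetical helper named prepend_order (not part of Rails) and assuming that the relation's order_values reader returns the ordering it already carries in a form reorder accepts. The fake_relation struct only mimics order_values, order and reorder so the snippet runs without a database.

```ruby
# Hypothetical helper, not part of Rails: puts the given columns in front of
# whatever ordering the relation already carries.
def prepend_order(relation, *columns)
  relation.reorder(*columns, *relation.order_values)
end

# Stand-in mimicking only order_values, order and reorder, so the sketch
# runs without a database.
fake_relation = Struct.new(:order_values) do
  def order(*columns)
    self.class.new(order_values + columns)
  end

  def reorder(*columns)
    self.class.new(columns)
  end
end

black_box = fake_relation.new([]).order("name") # plays the role of black_box
result = prepend_order(black_box, "id")         # ordering becomes id, then name
```

If the existing column is known in advance, black_box.reorder("id", "name") achieves the same thing without reading order_values.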

Grouping a timestamp field by date in Ruby On Rails / PostgreSQL

I am trying to convert the following bit of code to work with PostgreSQL.
After doing some digging around I realized that PostgreSQL is much stricter (in a good way) with the GROUP BY than MySQL but for the life of me I cannot figure out how to rewrite this statement to satisfy Postgres.
def show
  show! do
    @recent_tasks = resource.jobs.group(:task).order(:created_at).limit(5)
  end
end
PG::Error: ERROR: column "jobs.created_at" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...ND "jobs"."project_id" = 1 GROUP BY task ORDER BY created_at...
^
: SELECT COUNT(*) AS count_all, task AS task FROM "jobs" WHERE "jobs"."deleted_at" IS NULL AND "jobs"."project_id" = 1 GROUP BY task ORDER BY created_at DESC, created_at LIMIT 5
You cannot order by a column that is not in the GROUP BY clause, unless it is wrapped in an aggregate function.
You can do something like
@recent_tasks = resource.jobs.group(:task, :created_at).order(:created_at).limit(5)
but this changes the result.
You can also use
@recent_tasks = resource.jobs.group(:task).order(:task).limit(5)
or
@recent_tasks = resource.jobs.group(:task).order('count(*) desc').limit(5)
