I have to update a search builder that builds a relation with a CTE. This is necessary because a complex relation (which includes DISTINCT, JOINs, etc.) is built first and then its results have to be ordered, all in one query.
Here's a simplified look at things:
rel = User.select('DISTINCT ON (users.id) users.*').where(<lotsastuff>)
rel.to_sql
# SELECT DISTINCT ON (users.id) users.*
# FROM "users"
# WHERE <lotsastuff>
rel2 = User.from_cte('cte_table', rel).order(:created_at)
rel2.to_sql
# WITH "cte_table" AS (
# SELECT DISTINCT ON (users.id) users.*
# FROM "users"
# WHERE <lotsastuff>
# ) SELECT "cte_table".* FROM "cte_table"
# ORDER BY "cte_table"."created_at" ASC
The beauty of it is that rel2 responds as expected, e.g. to count.
The from_cte method is provided by the postgres_ext gem, which appears to have been abandoned. I'm therefore looking for another way to build the relation rel2 from rel.
The Arel docs mention a case which doesn't seem to help here.
Any hints on how to get there? Thanks a bunch!
PS: I know how to do this with two queries: select all user IDs in the first, then build a second query with IN over those IDs and order there (a sketch of that follows below). However, I'm curious whether this is possible in one query (with or without a CTE) as well.
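For reference, a minimal sketch of that two-query fallback (it reuses the <lotsastuff> placeholder from above):
# first round-trip: fetch just the matching IDs
ids = User.where(<lotsastuff>).distinct.pluck(:id)
# second round-trip: order over an IN list of those IDs
rel2 = User.where(id: ids).order(:created_at)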
Since your CTE is non-recursive, you can rewrite it as a subquery in the FROM clause. The only change is that Postgres's planner will optimize it as part of the main query instead of separately (because a CTE is an optimization fence). In ActiveRecord this works for me (tested on 5.1.4):
2.4.1 :001 > rel = User.select("DISTINCT ON (users.id) users.*").where("1=1")
2.4.1 :002 > puts User.from(rel, 'users').order(:created_at).to_sql
SELECT "users".* FROM (SELECT DISTINCT ON (users.id) users.* FROM "users" WHERE (1=1)) users ORDER BY "users"."created_at" ASC
I don't see any way to squeeze a CTE into ActiveRecord without extending it though, like what postgres_ext does. Sorry!
From what you've described, I don't understand why you need to use a CTE instead of just a nested query.
rel = User.select('DISTINCT ON (users.id) users.*').where(<lotsastuff>).arel

inner_query  = Arel::Table.new(:inner_query)
composed_cte = Arel::Nodes::As.new(inner_query, rel)

rel2 = Arel::SelectManager.new(inner_query).with(composed_cte).project('COUNT(*)')
rel2.to_sql
# WITH "inner_query" AS (SELECT ...) SELECT COUNT(*) FROM "inner_query"

rel3 = Arel::SelectManager.new(inner_query).with(composed_cte).project(Arel.star).order('created_at ASC')
rel3.to_sql
# WITH "inner_query" AS (SELECT ...) SELECT * FROM "inner_query" ORDER BY created_at ASC
You can then execute the resulting SQL.
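For example, a minimal sketch of handing those SQL strings back to standard ActiveRecord APIs (select_value and find_by_sql), reusing rel2 and rel3 from above:
total = ActiveRecord::Base.connection.select_value(rel2.to_sql).to_i  # the COUNT(*) query
users = User.find_by_sql(rel3.to_sql)                                 # the ordered rows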
Related
I have the following relationship in Rails: Campaign has_many Promises.
And I have the following Ruby code to return verbose list of campaigns with promises count:
def campaigns
  results = Campaign
            .where(user_id: current_user.id)
            .left_outer_joins(:promises)
            .select('campaigns.*', 'COUNT(DISTINCT promises.id) AS promises_count')
            .group('campaigns.id')
            .ransack(params[:q])
            .result(distinct: true)

  render json: {
    results: results.page(params[:page]).per(params[:per_page]),
    total_results: results.count(:id)
  }
end
Everything works fine, unless I try to sort by promises_count. Ransack (or something else?) generates the following SQL for Postgres:
SELECT DISTINCT
campaigns.*,
COUNT(DISTINCT promises.id) AS promises_count
FROM "campaigns"
LEFT OUTER JOIN "promises"
ON "promises"."campaign_id" = "campaigns"."id"
LEFT OUTER JOIN "promises" "promises_campaigns"
ON "promises_campaigns"."campaign_id" = "campaigns"."id"
WHERE "campaigns"."user_id" = 1 GROUP BY campaigns.id;
It works, but there is no ORDER BY for some reason. When I sort by other properties, it works fine. I think Ransack is missing something and handles promises_count in a different way because it's a generated property and not a real column.
It's possible to sort in Postgres; for example, a manual query with an added ORDER BY works:
SELECT DISTINCT
campaigns.*,
COUNT(DISTINCT promises.id) AS promises_count
FROM "campaigns"
LEFT OUTER JOIN "promises"
ON "promises"."campaign_id" = "campaigns"."id"
LEFT OUTER JOIN "promises" "promises_campaigns"
ON "promises_campaigns"."campaign_id" = "campaigns"."id"
WHERE "campaigns"."user_id" = 1
GROUP BY campaigns.id
ORDER BY promises_count desc;
How do I make Ransack work? I tried different combinations of queries without too much luck.
I have a working SQL query for Postgres v10.
SELECT *
FROM
(
SELECT DISTINCT ON (title) products.title, products.*
FROM "products"
) subquery
WHERE subquery.active = TRUE AND subquery.product_type_id = 1
ORDER BY created_at DESC
The goal of the query is to do a DISTINCT based on the title column, then filter and order the results. (I used the subquery in the first place, as it seemed there was no way to combine DISTINCT ON with ORDER BY without a subquery.)
I am trying to express said query in ActiveRecord.
I have been doing
Product.select("*")
.from(Product.select("DISTINCT ON (product.title) product.title, meals.*"))
.where("subquery.active IS true")
.where("subquery.meal_type_id = ?", 1)
.order("created_at DESC")
And that works! But it's fairly messy with the string where clauses in there. Is there a better way to express this query with ActiveRecord/Arel, or am I just running into the limits of what ActiveRecord can express?
I think the resulting ActiveRecord call can be improved.
But I would start improving with original SQL query first.
Subquery
SELECT DISTINCT ON (title) products.title, products.* FROM products
(I think that instead of meals there should be products?) has a duplicate products.title, which is not necessary there. Worse, it is missing an ORDER BY clause. As the PostgreSQL documentation says:
Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first
I would rewrite sub-query as:
SELECT DISTINCT ON (title) * FROM products ORDER BY title ASC
which gives us a call:
Product.select('DISTINCT ON (title) *').order(title: :asc)
The where calls in the main query use the Rails-generated alias for the subquery (subquery). I would not rely on this internal aliasing convention, as it may change at any time. Setting that concern aside, you can at least merge these conditions into one where call with hash-style argument syntax.
The final result:
Product.select('*')
.from(Product.select('DISTINCT ON (title) *').order(title: :asc))
.where(subquery: { active: true, meal_type_id: 1 })
.order('created_at DESC')
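If you would rather not depend on the default alias at all, from also accepts an explicit alias as a second argument (the same two-argument form used in the User.from(rel, 'users') example earlier); a sketch of that variant:
Product.select('*')
  .from(Product.select('DISTINCT ON (title) *').order(title: :asc), 'subquery')
  .where(subquery: { active: true, meal_type_id: 1 })
  .order('created_at DESC')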
I have the following statement:
Customer.where(city_id: cities)
which results in the following SQL statement:
SELECT customers.* FROM customers WHERE customers.city_id IN (SELECT cities.id FROM cities...
Is this intended behavior? Is it documented somewhere? Should I avoid the Rails code above and use one of the following instead:
Customer.where(city_id: cities.pluck(:id))
or
Customer.where(city: cities)
which results in the exact same SQL statement.
The AREL querying library allows you to pass in ActiveRecord objects as a short-cut. It'll then pass their primary key attributes into the SQL it uses to contact the database.
When looking for multiple objects, the AREL library will attempt to find the information in as few database round-trips as possible. It does this by holding the query you're making as a set of conditions, until it's time to retrieve the objects.
This way would be inefficient:
users = User.where(age: 30).all
# ^^^ get all these users from the database
memberships = Membership.where(user_id: users)
# ^^^^^ This will pass in each of the ids as a condition
Basically, this way would issue two SQL statements:
select * from users where age = 30;
select * from memberships where user_id in (1, 2, 3);
Each of these involves a call across a network port between applications, and the data then has to be passed back across that same port.
This would be more efficient:
users = User.where(age: 30)
# This is still a query object, it hasn't asked the database for the users yet.
memberships = Membership.where(user_id: users)
# Note: this line is the same, but users is an AREL query, not an array of users
It will instead build a single, nested query so it only has to make a round-trip to the database once.
select * from memberships
where user_id in (
select id from users where age = 30
);
So, yes, it's expected behaviour. It's a bit of Rails magic, it's designed to improve your application's performance without you having to know about how it works.
There's also some cool optimisations, like if you call first or last instead of all, it will only retrieve one record.
User.where(name: 'bob').all
# SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob'
User.where(name: 'bob').first
# SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' AND ROWNUM <= 1
Or if you set an order and call last, it will reverse the order and grab just the first row of the reversed list (instead of fetching all the records and giving you only the last one).
User.where(name: 'bob').order(:login).first
# SELECT * FROM (SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' ORDER BY login) WHERE ROWNUM <= 1
User.where(name: 'bob').order(:login).last
# SELECT * FROM (SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' ORDER BY login DESC) WHERE ROWNUM <= 1
# Notice, login DESC
Why does it work?
Something deep in the ActiveRecord query builder is smart enough to see that if you pass an array or a query/criteria, it needs to build an IN clause.
Is this documented anywhere?
Yes, http://guides.rubyonrails.org/active_record_querying.html#hash-conditions
2.3.3 Subset conditions
If you want to find records using the IN expression you can pass an array to the conditions hash:
Client.where(orders_count: [1,3,5])
This code will generate SQL like this:
SELECT * FROM clients WHERE (clients.orders_count IN (1,3,5))
The following code gets all the residences that have all of the amenities listed in id_list. It works without a problem with SQLite but raises an error with PostgreSQL:
id_list = [48, 49]
Residence.joins(:listed_amenities).
  where(listed_amenities: { amenity_id: id_list }).
  references(:listed_amenities).
  group(:residence_id).
  having("count(*) = ?", id_list.size)
The error on the PostgreSQL version:
What do I have to change to make it work with PostgreSQL?
A few things:
references should only be used with includes; it tells ActiveRecord to perform a join, so it's redundant when using an explicit joins.
You need to fully qualify the argument to group, i.e. group('residences.id').
For example,
id_list = [48, 49]
Residence.joins(:listed_amenities).
  where(listed_amenities: { amenity_id: id_list }).
  group('residences.id').
  having('COUNT(*) = ?', id_list.size)
The query that the Ruby code expands to selects all fields from the residences table:
SELECT "residences".*
FROM "residences"
INNER JOIN "listed_amenities"
ON "listed_amentities"."residence_id" = "residences"."id"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residences"."id" ASC
LIMIT 1;
From the Postgres manual: "When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or if the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column."
You'll need to either group by all fields that aggregate functions aren't applied to, or do this differently. From the query, it looks like you only need to scan the listed_amenities table to get the residence ID you're looking for:
SELECT "residence_id"
FROM "listed_amenities"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residences"."id" ASC
LIMIT 1
And then fetch your residence data with that ID. Or, in one query:
SELECT "residences".*
FROM "residences"
WHERE "id" IN (SELECT "residence_id"
FROM "listed_amenities"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residences"."id" ASC
LIMIT 1
);
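In ActiveRecord terms, a rough sketch of that last form, assuming the :listed_amenities association is backed by a ListedAmenity model (adjust the class name to whatever your join model is actually called):
matching_ids = ListedAmenity.where(amenity_id: id_list).
  group(:residence_id).
  having('COUNT(*) = ?', id_list.size).
  select(:residence_id)

Residence.where(id: matching_ids)
# => ... WHERE "residences"."id" IN (SELECT "residence_id" FROM "listed_amenities" ...)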
I have a model of Widgets. Widgets belong to a Store model, which belongs to an Area model, which belongs to a Company. At the Company model, I need to find all associated widgets. Easy:
class Widget < ActiveRecord::Base
  def self.in_company(company)
    includes(:store => {:area => :company}).where(:companies => {:id => company.id})
  end
end
Which will generate this beautiful query:
> Widget.in_company(Company.first).count
SQL (50.5ms) SELECT COUNT(DISTINCT "widgets"."id") FROM "widgets" LEFT OUTER JOIN "stores" ON "stores"."id" = "widgets"."store_id" LEFT OUTER JOIN "areas" ON "areas"."id" = "stores"."area_id" LEFT OUTER JOIN "companies" ON "companies"."id" = "areas"."company_id" WHERE "companies"."id" = 1
=> 15088
But I later need to use this scope in a more complex scope. The problem is that AR is expanding the query by selecting individual fields, which fails in PG because selected fields must be in the GROUP BY clause or used in an aggregate function.
Here is the more complex scope.
def self.sum_amount_chart_series(company, start_time)
  orders_by_day = Widget.in_company(company).archived.not_void.
    where(:print_datetime => start_time.beginning_of_day..Time.zone.now.end_of_day).
    group(pg_print_date_group).
    select("#{pg_print_date_group} as print_date, sum(amount) as total_amount")
end

def self.pg_print_date_group
  "CAST((print_datetime + interval '#{tz_offset_hours} hours') AS date)"
end
And this is the select it is throwing at PG:
> Widget.sum_amount_chart_series(Company.first, 1.day.ago)
SELECT "widgets"."id" AS t0_r0, "widgets"."user_id" AS t0_r1,<...BIG SNIP, YOU GET THE IDEA...> FROM "widgets" LEFT OUTER JOIN "stores" ON "stores"."id" = "widgets"."store_id" LEFT OUTER JOIN "areas" ON "areas"."id" = "stores"."area_id" LEFT OUTER JOIN "companies" ON "companies"."id" = "areas"."company_id" WHERE "companies"."id" = 1 AND "widgets"."archived" = 't' AND "widgets"."voided" = 'f' AND ("widgets"."print_datetime" BETWEEN '2011-04-24 00:00:00.000000' AND '2011-04-25 23:59:59.999999') GROUP BY CAST((print_datetime + interval '-7 hours') AS date)
Which generates this error:
PGError: ERROR: column "widgets.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "widgets"."id" AS t0_r0, "widgets"."user_id...
How do I rewrite the Widget.in_company scope so that AR does not expand the select query to include every Widget model field?
As Frank explained, PostgreSQL will reject any query which doesn't return a reproducible set of rows.
Suppose you've a query like:
select a, b, agg(c)
from tbl
group by a
PostgreSQL will reject it because b is left unspecified in the group by statement. Run that in MySQL, by contrast, and it will be accepted. In the latter case, however, fire up a few inserts, updates and deletes, and the order of the rows on disk pages ends up different.
If memory serves, the implementation is such that MySQL will actually sort by a, b and return the first b in the set. But as far as the SQL standard is concerned, the behavior is unspecified, and sure enough, PostgreSQL does not always sort before running aggregate functions.
Potentially, this might result in different values of b in the result set in PostgreSQL. And thus, PostgreSQL yields an error unless you're more specific:
select a, b, agg(c)
from tbl
group by a, b
What Frank highlighted is that, in PostgreSQL 9.1, if a is the primary key, then you can leave b unspecified: the planner has been taught to ignore subsequent GROUP BY fields when an applicable primary key already implies a unique row.
For your problem in particular, you need to keep the group by you currently have and add every selected field that isn't wrapped in an aggregate, i.e. "widgets"."id", "widgets"."user_id", [snip], but not expressions like sum(amount), which are aggregate function calls.
As an off-topic side note, I'm not sure how your ORM/model works, but the SQL it's generating isn't optimal. Many of those left outer joins look like they should be inner joins. Using inner joins would allow the planner to pick an appropriate join order where applicable.
PostgreSQL version 9.1 (beta at this moment) might fix your problem, but only if there is a functional dependency on the primary key.
From the release notes:
Allow non-GROUP BY columns in the query target list when the primary key is specified in the GROUP BY clause (Peter Eisentraut)
Some other database systems already allowed this behavior, and because of the primary key, the result is unambiguous.
You could run a test and see if it fixes your problem. If you can wait for the production release, this can fix the problem without changing your code.
Firstly, simplify your life by storing all dates in a standard time zone. Converting dates between time zones should really be done in the view, as a user convenience. This alone should save you a lot of pain.
If you're already in production write a migration to create a normalised_date column wherever it would be helpful.
I propose that the other problem here is the use of raw SQL, which Rails won't inspect or rewrite for you. To avoid this, try using the gem called Squeel (aka MetaWhere 2): http://metautonomo.us/projects/squeel/
If you use this you should be able to remove hard coded SQL and let rails get back to doing its magic.
For example:
.select("#{pg_print_date_group} as print_date, sum(amount) as total_amount")
becomes (once you remove the need for normalising the date):
.select{sum(amount).as(total_amount)}
Sorry to answer my own question, but I figured it out.
First, let me apologize to those who thought I might be having an SQL or Postgres issue: it is not that. The issue is with ActiveRecord and the SQL it is generating.
The answer is... use .joins instead of .includes. So I just changed the line in the top code and it works as expected.
class Widget < ActiveRecord::Base
  def self.in_company(company)
    joins(:store => {:area => :company}).where(:companies => {:id => company.id})
  end
end
I'm guessing that when using .includes, ActiveRecord is trying to be smart and use JOINS in the SQL, but it's not smart enough for this particular case and was generating that ugly SQL to select all associated columns.
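To make the difference concrete, a rough comparison of the two variants (the exact SQL varies by Rails version, but the shape is the point):
# .includes eager-loads, so every column of every joined table ends up in the
# SELECT list under aliases like t0_r0, t0_r1, ..., which is what clashes with GROUP BY.
Widget.includes(:store => {:area => :company}).where(:companies => {:id => company.id})

# .joins only adds INNER JOINs and keeps the default projection, roughly:
#   SELECT "widgets".* FROM "widgets"
#   INNER JOIN "stores" ON "stores"."id" = "widgets"."store_id"
#   INNER JOIN "areas" ON "areas"."id" = "stores"."area_id"
#   INNER JOIN "companies" ON "companies"."id" = "areas"."company_id"
#   WHERE "companies"."id" = 1
# so a later select(...)/group(...) can supply its own projection without conflict.
Widget.joins(:store => {:area => :company}).where(:companies => {:id => company.id})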
However, all the replies have taught me quite a bit about Postgres that I did not know, so thank you very much.
Sort in MySQL:
> ids = [11,31,29]
=> [11, 31, 29]
> Page.where(id: ids).order("field(id, #{ids.join(',')})")
In Postgres:
def self.order_by_ids(ids)
  order_by = ["case"]
  ids.each_with_index do |id, index|
    order_by << "WHEN id='#{id}' THEN #{index}"
  end
  order_by << "end"
  order(order_by.join(" "))
end
User.where(:id => [3,2,1]).order_by_ids([3,2,1]).map(&:id)
#=> [3,2,1]