Rails: Why does where(id: objects) work? - ruby-on-rails

I have the following statement:
Customer.where(city_id: cities)
which results in the following SQL statement:
SELECT customers.* FROM customers WHERE customers.city_id IN (SELECT cities.id FROM cities...
Is this intended behavior? Is it documented somewhere? I will not use the Rails code above and will use one of the following instead:
Customer.where(city_id: cities.pluck(:id))
or
Customer.where(city: cities)
which results in the exact same SQL statement.

The Arel querying library allows you to pass in ActiveRecord objects as a shortcut. It then passes their primary key values into the SQL it sends to the database.
When looking for multiple objects, Arel will attempt to find the information in as few database round-trips as possible. It does this by holding the query you're making as a set of conditions until it's time to retrieve the objects.
This way would be inefficient:
users = User.where(age: 30).to_a
# ^^^ loads all these users from the database (in Rails 3, .all did the same; since Rails 4, .all returns a lazy relation)
memberships = Membership.where(user_id: users)
# ^^^^^ This will pass in each of the ids as a condition
Basically, this way would issue two SQL statements:
select * from users where age = 30;
select * from memberships where user_id in (1, 2, 3);
Each of these involves a call on a network port between the application and the database, and the data then has to be passed back across that same port.
This would be more efficient:
users = User.where(age: 30)
# This is still a query object, it hasn't asked the database for the users yet.
memberships = Membership.where(user_id: users)
# Note: this line is the same, but users is an Arel query, not an array of users
It will instead build a single, nested query so it only has to make a round-trip to the database once.
select * from memberships
where user_id in (
select id from users where age = 30
);
So, yes, it's expected behaviour. It's a bit of Rails magic, designed to improve your application's performance without you having to know how it works.
There are also some cool optimisations: if you call first or last instead of all, it will only retrieve one record.
User.where(name: 'bob').all
# SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob'
User.where(name: 'bob').first
# SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' AND ROWNUM <= 1
Or if you set an order, and call last, it will reverse the order then only grab the last one in the list (instead of grabbing all the records and only giving you the last one).
User.where(name: 'bob').order(:login).first
# SELECT * FROM (SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' ORDER BY login) WHERE ROWNUM <= 1
User.where(name: 'bob').order(:login).last
# SELECT * FROM (SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' ORDER BY login DESC) WHERE ROWNUM <= 1
# Notice, login DESC

Why does it work?
Something deep in the ActiveRecord query builder is smart enough to see that if you pass an array or a query/criteria, it needs to build an IN clause.
Is this documented anywhere?
Yes, http://guides.rubyonrails.org/active_record_querying.html#hash-conditions
2.3.3 Subset conditions
If you want to find records using the IN expression you can pass an array to the conditions hash:
Client.where(orders_count: [1,3,5])
This code will generate SQL like this:
SELECT * FROM clients WHERE (clients.orders_count IN (1,3,5))
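To make the dispatch described above concrete, here is a minimal plain-Ruby sketch of how a hash-condition value could be turned into SQL depending on its type. This is a toy model, not ActiveRecord's actual implementation: LazyQuery and condition_sql are hypothetical names invented for illustration, and values are interpolated without any quoting or escaping.

```ruby
# Toy model only: ActiveRecord's real predicate builder is more involved.
# LazyQuery is a hypothetical stand-in for a relation that has not been loaded.
LazyQuery = Struct.new(:table, :condition) do
  # SQL for a subquery that selects only the primary key.
  def to_id_subquery
    "SELECT #{table}.id FROM #{table} WHERE #{condition}"
  end
end

# Build one WHERE fragment from a column name and a hash-condition value.
# No quoting or escaping is done here; integers only, for illustration.
def condition_sql(column, value)
  case value
  when LazyQuery
    "#{column} IN (#{value.to_id_subquery})" # nested query, single round trip
  when Array
    "#{column} IN (#{value.join(', ')})"     # literal list of values
  else
    "#{column} = #{value}"                   # plain equality
  end
end

puts condition_sql("customers.city_id", 7)
puts condition_sql("clients.orders_count", [1, 3, 5])
puts condition_sql("memberships.user_id", LazyQuery.new("users", "age = 30"))
```

The third call corresponds to passing an unloaded query as the value: the condition becomes a subquery instead of a list of ids.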

Related

Ruby on Rails - Limit Database Query to One Result only

I want to query the database but only find out if there is at least one result or not. I am trying to minimize the cost for this transaction. What would the structure be in Rails to have the query be SELECT TOP or SELECT FIRST in SQL?
You could try exists?
Person.exists?(5) # by primary key
Person.exists?(name: 'David')
Person.exists? # is there at least one row in the table?
Person.where(name: 'Spartacus', rating: 4).exists?
Person.active.exists? # if you have an "active" scope
Note that this limits the result set to 1 in the SQL query and the select clause is something like SELECT 1 AS one

Arel and CTE to Wrap Query

I have to update a search builder which builds a relation with a CTE. This is necessary because a complex relation (which includes DISTINCT, JOINs etc.) is first built and then its results have to be ordered, all in one query.
Here's a simplified look at things:
rel = User.select('DISTINCT ON (users.id) users.*').where(<lotsastuff>)
rel.to_sql
# SELECT DISTINCT ON (users.id) users.*
# FROM "users"
# WHERE <lotsastuff>
rel2 = User.from_cte('cte_table', rel).order(:created_at)
rel2.to_sql
# WITH "cte_table" AS (
# SELECT DISTINCT ON (users.id) users.*
# FROM "users"
# WHERE <lotsastuff>
# ) SELECT "cte_table".* FROM "cte_table"
# ORDER BY "cte_table"."created_at" ASC
The beauty of it is that rel2 responds as expected e.g. to count.
The from_cte method is provided by the "postgres_ext" gem, which appears to have been abandoned. I'm therefore looking for another way to build the relation rel2 from rel.
The Arel docs mention a case which doesn't seem to help here.
Any hints on how to get there? Thanks a bunch!
PS: I know how to do this with two queries by selecting all user IDs in the first, then building a query with IN over the IDs and ordering there. However, I'm curious whether this is possible with one query (with or without a CTE) as well.
Since your CTE is non-recursive, you can rewrite it as a subquery in the FROM clause. The only change is that Postgres's planner will optimize it as part of the main query instead of separately (because a CTE is an optimization fence). In ActiveRecord this works for me (tested on 5.1.4):
2.4.1 :001 > rel = User.select("DISTINCT ON (users.id) users.*").where("1=1")
2.4.1 :002 > puts User.from(rel, 'users').order(:created_at).to_sql
SELECT "users".* FROM (SELECT DISTINCT ON (users.id) users.* FROM "users" WHERE (1=1)) users ORDER BY "users"."created_at" ASC
I don't see any way to squeeze a CTE into ActiveRecord without extending it though, like what postgres_ext does. Sorry!
From what you've mentioned, I don't understand why you need to use a CTE instead of just a nested query.
rel = User.select('DISTINCT ON (users.id) users.*').where(<lotsastuff>).arel
inner_query = Arel::Table.new(:inner_query)
composed_cte = Arel::Nodes::As.new(inner_query, rel)
select_manager = Arel::SelectManager.new(composed_cte)
rel2 = select_manager.project('COUNT(*)')
rel2.to_sql
rel3 = select_manager.order('created_at ASC')
rel3.to_sql
You can then execute that SQL.

Solving a PG::GroupingError: ERROR

The following code gets all the residences which have all the amenities listed in id_list. It works without a problem with SQLite but raises an error with PostgreSQL:
id_list = [48, 49]
Residence.joins(:listed_amenities).
where(listed_amenities: {amenity_id: id_list}).
references(:listed_amenities).
group(:residence_id).
having("count(*) = ?", id_list.size)
The error on the PostgreSQL version:
What do I have to change to make it work with PostgreSQL?
A few things:
references should only be used with includes; it tells ActiveRecord to perform a join, so it's redundant when using an explicit joins.
You need to fully qualify the argument to group, i.e. group('residences.id').
For example,
id_list = [48, 49]
Residence.joins(:listed_amenities).
where(listed_amenities: { amenity_id: id_list }).
group('residences.id').
having('COUNT(*) = ?', id_list.size)
The query the Ruby (?) code is expanded to is selecting all fields from the residences table:
SELECT "residences".*
FROM "residences"
INNER JOIN "listed_amenities"
ON "listed_amenities"."residence_id" = "residences"."id"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residences"."id" ASC
LIMIT 1;
From the Postgres manual, When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or if the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column.
You'll need to either group by all fields that aggregate functions aren't applied to, or do this differently. From the query, it looks like you only need to scan the listed_amenities table to get the residence ID you're looking for:
SELECT "residence_id"
FROM "listed_amenities"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residence_id" ASC
LIMIT 1
And then fetch your residence data with that ID. Or, in one query:
SELECT "residences".*
FROM "residences"
WHERE "id" IN (SELECT "residence_id"
FROM "listed_amenities"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residence_id" ASC
LIMIT 1
);
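The WHERE / GROUP BY / HAVING combination above can be easier to follow when replayed on in-memory data. The following is a minimal plain-Ruby sketch, not database code: the rows are made-up sample data, and it assumes each (residence_id, amenity_id) pair appears at most once, which is also what the count(*) = n test relies on.

```ruby
# Hypothetical rows from listed_amenities: one row per (residence, amenity) pair.
listed_amenities = [
  { residence_id: 1, amenity_id: 48 },
  { residence_id: 1, amenity_id: 49 },
  { residence_id: 2, amenity_id: 48 },
  { residence_id: 3, amenity_id: 49 },
  { residence_id: 3, amenity_id: 50 },
]
id_list = [48, 49]

# WHERE amenity_id IN (48, 49)
matching = listed_amenities.select { |row| id_list.include?(row[:amenity_id]) }

# GROUP BY residence_id, then HAVING count(*) = id_list.size
residence_ids = matching
  .group_by { |row| row[:residence_id] }
  .select { |_id, rows| rows.size == id_list.size }
  .keys

p residence_ids # => [1]
```

Only residence 1 has a matching row for every id in id_list, so it is the only one that survives the HAVING step.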

Order by foreign key in activerecord: without a join?

I want to expand this question.
order by foreign key in activerecord
I'm trying to order a set of records based on a value in a really large table.
When I use a join, it brings all the "other" record's data into the objects, as a join should.
#table users 30+ columns
#table bids 5 columns
record = Bid.find(:all,:joins=>:users, :order=>'users.ranking DESC' ).first
Now record holds 35 fields..
Is there a way to do this without the join?
Here's my thinking..
With the join I get this query
SELECT * FROM "bids"
left join users on runner_id = users.id
ORDER BY ranking LIMIT 1
Now I can add a select to the code so I don't get the full user table, but putting a select in a scope is dangerous IMHO.
When I write the SQL by hand:
SELECT * FROM bids
order by (select users.ranking from users where users.id = runner_id) DESC
limit 1
I believe this is a faster query, based on the "explain" it seems simpler.
More important than speed though is that the second method doesn't have the 30 extra fields.
If I build in a custom select inside the scope, it could explode other searches on the object if they too have custom selects (there can be only one)
What you would like to achieve in Active Record is something along the lines of:
SELECT b.* from bids b inner join users u on u.id=b.user_id order by u.ranking desc
In Active Record I would write it as:
Bid.joins("inner join users u on bids.user_id=u.id").order("u.ranking desc")
I think it's the only way to make a join without fetching all attributes from the user model.

rails 4 - postgres query syntax

I am working on a rails 4 demo app, and I'm stumped by an activerecord query.
There are two main tables: widgets and orders. The customer can enter an order for widgets, which have a variable price. Until a site admin closes a deal and sets the final_price for the widget, widgets.final_price is NULL and the order is still open.
I want to select closed orders so I write something like this:
@closed = Order.joins(:widget).where("orders.buyer_id = :user_id AND widgets.final_price IS NOT NULL", {user_id: current_user})
In my rails log this queries the DB like so:
SELECT COUNT(*) FROM "orders" INNER JOIN "widgets" ON "widgets"."id" = "orders"."widget_id" WHERE (orders.buyer_id = 2 AND widgets.final_price IS NOT NULL)
However, at the moment none of the widgets have a final price. I verified this in the PostgreSQL admin by querying:
SELECT * FROM "widgets" WHERE widgets.final_price IS NOT NULL -- returns 0 results
Yet in the ActiveRecord query above, @closed.count returns 5, and @closed.inspect shows 5 records. How can this be, when I verified that there are no values for widgets.final_price?
