I am working on a Rails 4 demo app, and I'm stumped by an ActiveRecord query.
There are two main tables: widgets and orders. The customer can enter an order for widgets, which have a variable price. Until a site admin closes a deal and sets the final_price for the widget, widgets.final_price is NULL and the order is still open.
I want to select closed orders, so I write something like this:
@closed = Order.joins(:widget).where("orders.buyer_id = :user_id AND widgets.final_price IS NOT NULL", {user_id: current_user})
In my Rails log this queries the DB like so:
SELECT COUNT(*) FROM "orders" INNER JOIN "widgets" ON "widgets"."id" = "orders"."widget_id" WHERE (orders.buyer_id = 2 AND widgets.final_price IS NOT NULL)
However, at the moment none of the widgets have a final price. I verified this in the PostgreSQL admin by querying:
SELECT * FROM "widgets" WHERE widgets.final_price IS NOT NULL -- returns 0 results
Yet in the ActiveRecord query above, @closed.count returns 5, and @closed.inspect shows 5 records. How can this be, when I verified that there are no values for widgets.final_price?
I have the following statement:
Customer.where(city_id: cities)
which results in the following SQL statement:
SELECT customers.* FROM customers WHERE customers.city_id IN (SELECT cities.id FROM cities...
Is this intended behavior? Is it documented somewhere? I will not use the Rails code above; I'll use one of the following instead:
Customer.where(city_id: cities.pluck(:id))
or
Customer.where(city: cities)
which results in the exact same SQL statement.
The AREL querying library allows you to pass in ActiveRecord objects as a short-cut. It'll then pass their primary key attributes into the SQL it uses to contact the database.
When looking for multiple objects, the AREL library will attempt to find the information in as few database round-trips as possible. It does this by holding the query you're making as a set of conditions, until it's time to retrieve the objects.
This way would be inefficient:
users = User.where(age: 30).all
# ^^^ get all these users from the database
memberships = Membership.where(user_id: users)
# ^^^^^ This will pass in each of the ids as a condition
Basically, this way would issue two SQL statements:
select * from users where age = 30;
select * from memberships where user_id in (1, 2, 3);
Each of these involves a call over a network port between the application and the database, with the data then passed back across that same port.
This would be more efficient:
users = User.where(age: 30)
# This is still a query object, it hasn't asked the database for the users yet.
memberships = Membership.where(user_id: users)
# Note: this line is the same, but users is an AREL query, not an array of users
It will instead build a single, nested query so it only has to make a round-trip to the database once.
select * from memberships
where user_id in (
select id from users where age = 30
);
So, yes, it's expected behaviour. It's a bit of Rails magic, designed to improve your application's performance without you having to know how it works.
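To make the "hold the conditions until it's time to retrieve" idea concrete, here is a toy sketch in plain Ruby. It is not ActiveRecord or Arel: the ToyQuery class and its methods are hypothetical names made up for illustration. It only shows why passing an unexecuted query object can produce a nested subquery, while passing an array of already-loaded ids produces a literal IN list.

```ruby
# Toy stand-in for a lazy query object. NOT ActiveRecord or Arel;
# every name here is hypothetical and exists only to illustrate the idea.
class ToyQuery
  attr_reader :table, :conditions

  def initialize(table, conditions = [])
    @table = table
    @conditions = conditions
  end

  # Returns a new query object; nothing is "executed" at this point.
  def where(column, value)
    ToyQuery.new(table, conditions + [[column, value]])
  end

  # SQL text is rendered only when asked for. A ToyQuery value is nested
  # as a subquery, an Array becomes a literal IN list.
  def to_sql(columns = "*")
    sql = "SELECT #{columns} FROM #{table}"
    return sql if conditions.empty?

    clauses = conditions.map do |column, value|
      case value
      when ToyQuery then "#{column} IN (#{value.to_sql('id')})"
      when Array    then "#{column} IN (#{value.join(', ')})"
      else               "#{column} = #{value}"
      end
    end
    "#{sql} WHERE #{clauses.join(' AND ')}"
  end
end

users  = ToyQuery.new("users").where("age", 30) # still just a description
nested = ToyQuery.new("memberships").where("user_id", users).to_sql
listed = ToyQuery.new("memberships").where("user_id", [1, 2, 3]).to_sql

puts nested
puts listed
```

The first string contains a single nested statement, and the second contains the ids inline, which presumes an earlier statement already fetched them.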
There's also some cool optimisations, like if you call first or last instead of all, it will only retrieve one record.
User.where(name: 'bob').all
# SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob'
User.where(name: 'bob').first
# SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' AND ROWNUM <= 1
Or if you set an order, and call last, it will reverse the order then only grab the last one in the list (instead of grabbing all the records and only giving you the last one).
User.where(name: 'bob').order(:login).first
# SELECT * FROM (SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' ORDER BY login) WHERE ROWNUM <= 1
User.where(name: 'bob').order(:login).last
# SELECT * FROM (SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' ORDER BY login DESC) WHERE ROWNUM <= 1
# Notice, login DESC
Why does it work?
Something deep in the ActiveRecord query builder is smart enough to see that if you pass an array or a query/criteria, it needs to build an IN clause.
Is this documented anywhere?
Yes, http://guides.rubyonrails.org/active_record_querying.html#hash-conditions
2.3.3 Subset conditions
If you want to find records using the IN expression you can pass an array to the conditions hash:
Client.where(orders_count: [1,3,5])
This code will generate SQL like this:
SELECT * FROM clients WHERE (clients.orders_count IN (1,3,5))
My relationship is a Client can have many ClientJobs. I want to be able to find clients that perform both Job a and Job b. I'm using 3 select boxes so I can pick a maximum of three jobs to select from. The select boxes are populated from the database.
I know how to test for 1 job with the query below. But I need a way to use an AND operator to test that both jobs exist for that client.
@clients = Client.includes("client_jobs").where(
client_jobs: { job_name: params[:job1]})
Unfortunately it's easy to do an IN operation like below, but I'm thinking the syntax for AND should be similar... I hope
@clients = Client.includes("client_jobs").where(
client_jobs: { job_name: [params[:job1], params[:job2]]})
EDIT: Posting the sql statement that hits the database from the answer below
Core Load (0.6ms) SELECT `clients`.* FROM `clients`
CoreStatistic Load (1.9ms) SELECT `client_jobs`.* FROM `client_jobs`
WHERE `client_jobs`.`client_id` IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10,........)
The second query runs through every client_job in the database. It's never tested against params[:job1], params[:job2] etc. So @clients returns nil, crashing my view template
(undefined method `map' for nil:NilClass)
In my opinion, a better approach than self-joins is to simply join ClientJobs and then use GROUP BY and HAVING clauses to keep only those records that exactly match the given associated records.
performed_jobs = %w(job job2 job3)
Client.joins(:client_jobs).
where(client_jobs: { job_name: performed_jobs }).
group("clients.id").
having("count(*) = #{performed_jobs.count}")
Let's walk through this query:
The first two clauses join ClientJobs to Clients and keep only the rows that have any of the three jobs defined (this uses the IN clause).
Next, we group these joined records by clients.id so that we get the clients back.
Finally, the having clause ensures we only return those clients that had exactly 3 ClientJob records joined in, i.e. only those that had all three client jobs defined.
It is the trick with HAVING(COUNT(*) = ...) that turns the IN clause (which is essentially an OR-ed list of options) into a "must have all these" clause.
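The same filter, group, count logic can be sketched on in-memory rows in plain Ruby. This is only an illustration of what the SQL does, not a replacement for the query; the sample rows and the helper name clients_with_all_jobs are made up, and I'm assuming client_id is the foreign key, as in the logged SQL from the question.

```ruby
# Hypothetical sample rows standing in for the client_jobs table.
client_jobs = [
  { client_id: 1, job_name: "job1" },
  { client_id: 1, job_name: "job2" },
  { client_id: 2, job_name: "job1" },
  { client_id: 3, job_name: "job2" },
  { client_id: 3, job_name: "job1" },
  { client_id: 3, job_name: "job3" }
]

# Mirrors WHERE ... IN, GROUP BY and HAVING count(*) = N, step by step.
def clients_with_all_jobs(client_jobs, performed_jobs)
  client_jobs
    .select { |row| performed_jobs.include?(row[:job_name]) } # WHERE job_name IN (...)
    .group_by { |row| row[:client_id] }                       # GROUP BY clients.id
    .select { |_id, rows| rows.size == performed_jobs.size }  # HAVING count(*) = N
    .keys
end

two_jobs   = clients_with_all_jobs(client_jobs, %w[job1 job2])
three_jobs = clients_with_all_jobs(client_jobs, %w[job1 job2 job3])

puts two_jobs.inspect
puts three_jobs.inspect
```

One caveat: the count only equals "has all jobs" when a client cannot have the same job_name twice. If duplicates are possible, counting distinct job names (COUNT(DISTINCT client_jobs.job_name) in SQL) is the safer comparison.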
To do this in a single SQL query try the following:
jobs_with_same_user = ClientJob.select(:user_id).where(job_name: "<job_name1>", user_id: ClientJob.select(:user_id).where(job_name: "<job_name2>"))
@clients = Client.where(id: jobs_with_same_user)
Here's what this query is doing:
1. Select the user_ids of all client jobs with [job_name2].
2. Select the user_ids of all client jobs with user_id IN the result set from (1) AND having [job_name1].
3. Select all clients, using (2) as a subquery.
Not many know this but Rails 4+ supports subqueries. Basically this is a self join acting as subquery for the clients:
SELECT *
FROM clients
WHERE id IN <jobs_with_same_user>
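As a rough illustration of the three steps, here is the same nesting done on in-memory arrays in plain Ruby. The data is invented, and I'm calling the foreign key client_id (as in the logged SQL in the question) where the snippet above calls it user_id; treat that as an assumption about the schema.

```ruby
# Hypothetical rows standing in for the clients and client_jobs tables.
clients = [
  { id: 1, name: "Acme" },
  { id: 2, name: "Bolt" },
  { id: 3, name: "Crate" }
]
client_jobs = [
  { client_id: 1, job_name: "job1" },
  { client_id: 1, job_name: "job2" },
  { client_id: 2, job_name: "job1" },
  { client_id: 3, job_name: "job2" }
]

# Step 1: ids of clients that have job2 (the innermost subquery).
with_job2 = client_jobs
  .select { |row| row[:job_name] == "job2" }
  .map { |row| row[:client_id] }

# Step 2: ids that have job1 AND appear in the result of step 1.
with_both = client_jobs
  .select { |row| row[:job_name] == "job1" && with_job2.include?(row[:client_id]) }
  .map { |row| row[:client_id] }
  .uniq

# Step 3: the clients whose id is in the result of step 2.
matching_clients = clients.select { |client| with_both.include?(client[:id]) }

puts matching_clients.map { |client| client[:name] }.inspect
```

In the database version all three steps travel as one statement; the intermediate arrays here exist only to make each step visible.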
Also, I'm not sure if you're referencing the client_jobs association in your view, but if you are, add the includes statement to avoid an N+1 query:
@clients = Client.includes(:client_jobs).where(id: jobs_with_same_user)
EDIT
If you prefer, the same result can be achieved with a self-referencing inner join:
jobs_with_same_user = ClientJob
.select("client_jobs.user_id AS user_id")
.joins("JOIN client_jobs inner_client_jobs ON inner_client_jobs.user_id=client_jobs.user_id")
.where(client_jobs: { job_name: "<first_job_name1>" }, inner_client_jobs: { job_name: "<job_name2>" })
@clients = Client.where(id: jobs_with_same_user)
I have the following code to join two tables, microposts and activities, on the micropost_id column and then order by the activities table's created_at, with distinct micropost ids.
Micropost.joins("INNER JOIN activities ON
(activities.micropost_id = microposts.id)").
where('activities.user_id= ?',id).order('activities.created_at DESC').
select("DISTINCT (microposts.id), *")
which should return all micropost columns. This is not working in my development environment.
(PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list)
If I add activities.created_at to the SELECT DISTINCT, I get repeated micropost ids because they have distinct activities.created_at values. I have searched a lot to get this far, but the problem always persists because of this Postgres rule, which exists to avoid random selection.
I want to select distinct micropost_id values, ordered by activities.created_at.
Please help..
To start with, we need to quickly cover what SELECT DISTINCT is actually doing. It looks like just a nice keyword to make sure you only get back distinct values, which shouldn't change anything, right? Except as you're finding out, behind the scenes, SELECT DISTINCT is actually acting more like a GROUP BY. If you want to select distinct values of something, you can only order that result set by the same values you're selecting -- otherwise, Postgres doesn't know what to do.
To explain where the ambiguity comes from, consider this simple set of data for your activities:
CREATE TABLE activities (
id INTEGER PRIMARY KEY,
created_at TIMESTAMP WITH TIME ZONE,
micropost_id INTEGER REFERENCES microposts(id)
);
INSERT INTO activities (id, created_at, micropost_id)
VALUES (1, current_timestamp, 1),
(2, current_timestamp - interval '3 hours', 1),
(3, current_timestamp - interval '2 hours', 2)
You stated in your question that you want "distinct micropost_id" "based on order of activities.created_at". It's easy to order these activities by descending created_at (1, 3, 2), but both 1 and 2 have the same micropost_id of 1. So if you want the query to return just micropost IDs, should it return 1, 2 or 2, 1?
If you can answer the above question, you need to take your logic for doing so and move it into your query. Let's say that, and I think this is pretty likely, you want this to be a list of microposts which were most recently acted on. In that case, you want to sort the microposts in descending order of their most recent activity. Postgres can do that for you, in a number of ways, but the easiest way in my mind is this:
SELECT micropost_id
FROM activities
JOIN microposts ON activities.micropost_id = microposts.id
GROUP BY micropost_id
ORDER BY MAX(activities.created_at) DESC
Note that I've dropped the SELECT DISTINCT bit in favor of using GROUP BY, since Postgres handles them much better. The MAX(activities.created_at) bit tells Postgres to, for each group of activities with the same micropost_id, sort by only the most recent.
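If it helps to see the grouping logic outside of SQL, here is a small plain-Ruby sketch using the same three sample activities as above. It is only an illustration of what GROUP BY plus ORDER BY MAX(created_at) DESC computes; the variable names are mine.

```ruby
now  = Time.now
hour = 3600 # seconds

# The same three rows as the INSERT above.
activities = [
  { id: 1, created_at: now,            micropost_id: 1 },
  { id: 2, created_at: now - 3 * hour, micropost_id: 1 },
  { id: 3, created_at: now - 2 * hour, micropost_id: 2 }
]

ordered_micropost_ids = activities
  .group_by { |activity| activity[:micropost_id] }          # GROUP BY micropost_id
  .map { |micropost_id, group| [micropost_id, group.map { |a| a[:created_at] }.max] } # MAX(created_at)
  .sort_by { |_micropost_id, latest| latest }               # ORDER BY ... ascending
  .reverse                                                  # ... DESC
  .map(&:first)

puts ordered_micropost_ids.inspect
```

Each micropost collapses to one row carrying its most recent activity time, so there is exactly one sort key per micropost and the ambiguity described above goes away.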
You can translate the above to Rails like so:
Micropost.select('microposts.*')
.joins("JOIN activities ON activities.micropost_id = microposts.id")
.where('activities.user_id' => id)
.group('microposts.id')
.order('MAX(activities.created_at) DESC')
Hope this helps! You can play around with this sqlFiddle if you want to understand more about how the query works.
Try the below code
Micropost.select('microposts.*, activities.created_at')
.joins("INNER JOIN activities ON (activities.micropost_id = microposts.id)")
.where('activities.user_id= ?',id)
.order('activities.created_at DESC')
.uniq
I want to expand this question.
order by foreign key in activerecord
I'm trying to order a set of records based on a value in a really large table.
When I use join, it brings all the "other" records data into the objects.. As join should..
#table users 30+ columns
#table bids 5 columns
record = Bid.find(:all,:joins=>:users, :order=>'users.ranking DESC' ).first
Now record holds 35 fields..
Is there a way to do this without the join?
Here's my thinking..
With the join I get this query
SELECT * FROM "bids"
left join users on runner_id = users.id
ORDER BY ranking LIMIT 1
Now I can add a select to the code so I don't get the full user table, but putting a select in a scope is dangerous IMHO.
When I write sql by hand.
SELECT * FROM bids
order by (select users.ranking from users where users.id = runner_id) DESC
limit 1
I believe this is a faster query, based on the "explain" it seems simpler.
More important than speed though is that the second method doesn't have the 30 extra fields.
If I build in a custom select inside the scope, it could explode other searches on the object if they too have custom selects (there can be only one)
What you would like to achieve with ActiveRecord is something along the lines of:
SELECT b.* from bids b inner join users u on u.id=b.user_id order by u.ranking desc
In ActiveRecord I would write it as:
Bid.joins("inner join users u on bids.user_id=u.id").order("u.ranking desc")
I think it's the only way to make a join without fetching all attributes from the user model.
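For what it's worth, the goal here (order by a value in the other table, but get back only the bid's own columns) looks like this on in-memory data in plain Ruby. The rows are invented, and I'm using runner_id as the foreign key as in the question; this is a sketch of the intent, not of how the database executes it.

```ruby
# Hypothetical rows; users is keyed by id for lookup.
users = {
  10 => { ranking: 3 },
  11 => { ranking: 9 },
  12 => { ranking: 5 }
}
bids = [
  { id: 1, runner_id: 10, amount: 50 },
  { id: 2, runner_id: 11, amount: 20 },
  { id: 3, runner_id: 12, amount: 70 }
]

# ORDER BY users.ranking DESC LIMIT 1: the ranking is looked up per bid
# and used only as the sort key, so it never becomes part of the result.
top_bid = bids.max_by { |bid| users.fetch(bid[:runner_id])[:ranking] }

puts top_bid.inspect
```

The result has only the bid's own fields, which is the same effect as selecting b.* in the SQL above.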
We have 2 tables: users and statuses
The status table has a user_id, status and occured_on. The status is either 'removed' or 'added' and occured_on is the date the user was removed or added.
I need the current added users. That is, all the (distinct) users whose newest status record is 'added'.
I'm using Rails, and have tried:
User
.joins(:statuses)
.where('statuses.status = ?', 'added')
.order('statuses.occured_on DESC')
.uniq
Which translates to the SQL:
SELECT DISTINCT users.*
FROM users
INNER JOIN statuses
ON statuses.user_id = users.id
WHERE statuses.status = 'added'
ORDER BY statuses.occured_on DESC
That gives me the error:
PG::Error: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ...statuses.status = 'added') ORDER BY statuses.oc...
I'd be happy knowing either the Rails code that would work or the straight SQL.
Also, I'd prefer no sub-selects if possible.
Consider the following database schema change:
StatusTable:
StatusId
Status
UserId
ActiveFrom
ActiveTo
Afterwards you can add additional checks such as:
CONSTRAINT chk_from_to CHECK (ActiveFrom <= ActiveTo)
Then your query would look something like:
SELECT users.*
FROM users
JOIN statuses ON UserId = users.user_id AND ActiveFrom < CURRENT_TIMESTAMP AND ActiveTo > CURRENT_TIMESTAMP
WHERE statuses.Status = 'active'
With such structure you might need to change the way you change statuses, but from my own experience, this structure is much more flexible, and easier to query.
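A minimal plain-Ruby sketch of the interval check, on invented rows. Assumptions on my part: an open-ended status is stored with a far-future active_to date, and the status values are 'added' and 'removed' as in the question rather than 'active'.

```ruby
require "date"

far_future = Date.new(9999, 12, 31) # assumed marker for "still active"

# Hypothetical rows in the proposed StatusTable shape.
statuses = [
  { user_id: 1, status: "added",   active_from: Date.new(2024, 1, 1), active_to: Date.new(2024, 3, 1) },
  { user_id: 1, status: "removed", active_from: Date.new(2024, 3, 1), active_to: far_future },
  { user_id: 2, status: "added",   active_from: Date.new(2024, 2, 1), active_to: far_future }
]

# Mirrors: ActiveFrom < now AND ActiveTo > now AND Status = <wanted>
def user_ids_with_status(statuses, wanted, at)
  statuses
    .select { |s| s[:active_from] < at && at < s[:active_to] }
    .select { |s| s[:status] == wanted }
    .map { |s| s[:user_id] }
end

added_in_june = user_ids_with_status(statuses, "added", Date.new(2024, 6, 1))
added_in_feb  = user_ids_with_status(statuses, "added", Date.new(2024, 2, 15))

puts added_in_june.inspect
puts added_in_feb.inspect
```

Because each row carries its own validity window, "current" is a simple range test per row and no comparison between rows is needed.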
SELECT * FROM users INNER JOIN statuses ON users.id=statuses.user_id WHERE statuses.status='added' ORDER BY statuses.occured_on
After clarification, I don't think the schema is well designed for your goal. Can you clarify why you want the status change history contained in that table? My general approach to this would be that active users should be contained in a table called projects_users, containing project_id, user_id. When they are "removed" they should be removed from that table. Logs of the actions - adding and remove users from projects - should be stored in a separate table.
There's no good way that I'm aware of to write this query given your current design. Even if you fixed the errors so that it runs error-free, as this version of your query does in MySQL:
SELECT DISTINCT `users`.* FROM `users`
INNER JOIN `projects_users`
ON `users`.`id`=`projects_users`.`user_id`
WHERE `status`='added'
ORDER BY `projects_users`.`occured_on` DESC
it still won't get you the correct results. The ORDER BY clause will just get you the most recent change to "added"; it won't guarantee there is not a more recent "removed" action. To do that you'd need to compare the date of each most recent added record to the date of the most recent removed record, for each user, which is a nightmare.
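To show what "newest status per user is 'added'" means as logic, here is a plain-Ruby sketch over invented rows (the column name occured_on is spelled as in the question). It says nothing about how to express this efficiently in SQL; it only pins down the expected result.

```ruby
require "date"

# Hypothetical rows standing in for the statuses table.
statuses = [
  { user_id: 1, status: "added",   occured_on: Date.new(2024, 1, 1) },
  { user_id: 1, status: "removed", occured_on: Date.new(2024, 2, 1) },
  { user_id: 2, status: "added",   occured_on: Date.new(2024, 1, 15) },
  { user_id: 3, status: "removed", occured_on: Date.new(2024, 1, 5) },
  { user_id: 3, status: "added",   occured_on: Date.new(2024, 3, 1) }
]

currently_added_user_ids = statuses
  .group_by { |row| row[:user_id] }                                  # one group per user
  .select { |_user_id, rows| rows.max_by { |r| r[:occured_on] }[:status] == "added" } # newest row decides
  .keys

puts currently_added_user_ids.inspect
```

User 1 is excluded because the later "removed" row wins, which is exactly the case the WHERE status = 'added' query gets wrong.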