Selecting distinct through join

Selecting distinct through join - ruby-on-rails

We have 2 tables: users and statuses
The status table has a user_id, status and occured_on. The status is either 'removed' or 'added' and occured_on is the date the user was removed or added.
I need the current added users. That is, all the (distinct) users whose newest status record is 'added'.
I'm using Rails, and have tried:
User
.joins(:statuses)
.where('statuses.status = ?', 'added')
.order('statuses.occured_on DESC')
.uniq
Which translates to the SQL:
SELECT DISTINCT users.*
FROM users
INNER JOIN statuses
ON statuses.user_id = users.id
WHERE statuses.status = 'added'
ORDER BY statuses.occured_on DESC
That gives me the error:
PG::Error: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ...statuses.status = 'added') ORDER BY statuses.oc...
I'd be happy knowing either the Rails code that would work or the straight SQL.
Also, I'd prefer no sub-selects if possible.

Concider the following database schema change:
StatusTable:
StatusId
Status
UserId
ActiveFrom
ActiveTo
Afterwards you can add additional checks such as:
CONSTRAINT chk_from_to CHECK (ActiveFrom <= ActiveTo)
Then your query would look something like:
SELECT users.*
FROM users
JOIN statuses ON UserId = users.user_id AND ActiveFrom < CURRENT_TIMESTAMP AND ActiveTo > CURRENT_TIMESTAMP
WHERE statuses.Status = 'active'
With such structure you might need to change the way you change statuses, but from my own experience, this structure is much more flexible, and easier to query.

SELECT * FROM users INNER JOIN statuses ON users.id=statuses.user_id WHERE statuses.status='added' ORDER BY statuses.occured_on
After clarification, I don't think the schema is well designed for your goal. Can you clarify why you want the status change history contained in that table? My general approach to this would be that active users should be contained in a table called projects_users, containing project_id, user_id. When they are "removed" they should be removed from that table. Logs of the actions - adding and remove users from projects - should be stored in a separate table.
There's no good way that I'm aware of to write this query given your current design. Even if you fixed the errors, this runs error free in MySQL (which is exactly what you have)
SELECT DISTINCT `users`.* FROM `users`
INNER JOIN `projects_users`
ON `users`.`id`=`projects_users`.`user_id`
WHERE `status`='added'
ORDER BY `projects_users`.`occured_on` DESC
it still won't get you the correct results. The ORDER BY clause will just get you the most recent change to "added", it won't guarantee there is not a more recent "removed" action. To do that you'd need to compare the date of each most recent added record to the date of the most recent removed record, for each user, a nightmare.

Related

RoR PostgresQL - Get latest, distinct values from database

I am trying to query my PostgreSQL database to get the latest (by created_at) and distinct (by user_id) Activity objects, where each user has multiple activities in the database. The activity object is structured as such:
Activity(id, user_id, created_at, ...)
I first tried to get the below query to work:
Activity.order('created_at DESC').select('DISTINCT ON (activities.user_id) activities.*')
however, kept getting the below error:
ActiveRecord::StatementInvalid: PG::InvalidColumnReference: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
According to this post: PG::Error: SELECT DISTINCT, ORDER BY expressions must appear in select list, it looks like The ORDER BY clause can only be applied after the DISTINCT has been applied. This does not help me, as I want to get the distinct activities by user_id, but also want the activities to be the most recently created activities. Thus, I need the activities to be sorted before getting the distinct activities.
I have come up with a solution that works, but first grouping the activities by user id, and then ordering the activities within the groups by created_at. However, this takes two queries to do.
I was wondering if what I want is possible in just one query?

This should work, try the following
Solution 1
Activity.select('DISTINCT ON (activities.user_id) activities.*').order('created_at DESC')
Solution 2
If not work Solution 1 then this is helpful if you create a scope for this
activity model
scope :latest, -> {
select("distinct on(user_id) activities.user_id,
activities.*").
order("user_id, created_at desc")
}
Now you can call this anywhere like below
Activity.latest
Hope it helps

Order with DISTINCT ids in rails with postgres

I have the following code to join two tables microposts and activities with micropost_id column and then order based on created_at of activities table with distinct micropost id.
Micropost.joins("INNER JOIN activities ON
(activities.micropost_id = microposts.id)").
where('activities.user_id= ?',id).order('activities.created_at DESC').
select("DISTINCT (microposts.id), *")
which should return whole micropost columns.This is not working in my developement enviornment.
(PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
If I add activities.created_at in SELECT DISTINCT, I will get repeated micropost ids because the have distinct activities.created_at column. I have done a lot of search to reach here. But the problem always persist because of this postgres condition to avoid random selection.
I want to select based on order of activities.created_at with distinct micropost _id.
Please help..

To start with, we need to quickly cover what SELECT DISTINCT is actually doing. It looks like just a nice keyword to make sure you only get back distinct values, which shouldn't change anything, right? Except as you're finding out, behind the scenes, SELECT DISTINCT is actually acting more like a GROUP BY. If you want to select distinct values of something, you can only order that result set by the same values you're selecting -- otherwise, Postgres doesn't know what to do.
To explain where the ambiguity comes from, consider this simple set of data for your activities:
CREATE TABLE activities (
id INTEGER PRIMARY KEY,
created_at TIMESTAMP WITH TIME ZONE,
micropost_id INTEGER REFERENCES microposts(id)
);
INSERT INTO activities (id, created_at, micropost_id)
VALUES (1, current_timestamp, 1),
(2, current_timestamp - interval '3 hours', 1),
(3, current_timestamp - interval '2 hours', 2)
You stated in your question that you want "distinct micropost_id" "based on order of activities.created_at". It's easy to order these activities by descending created_at (1, 3, 2), but both 1 and 2 have the same micropost_id of 1. So if you want the query to return just micropost IDs, should it return 1, 2 or 2, 1?
If you can answer the above question, you need to take your logic for doing so and move it into your query. Let's say that, and I think this is pretty likely, you want this to be a list of microposts which were most recently acted on. In that case, you want to sort the microposts in descending order of their most recent activity. Postgres can do that for you, in a number of ways, but the easiest way in my mind is this:
SELECT micropost_id
FROM activities
JOIN microposts ON activities.micropost_id = microposts.id
GROUP BY micropost_id
ORDER BY MAX(activities.created_at) DESC
Note that I've dropped the SELECT DISTINCT bit in favor of using GROUP BY, since Postgres handles them much better. The MAX(activities.created_at) bit tells Postgres to, for each group of activities with the same micropost_id, sort by only the most recent.
You can translate the above to Rails like so:
Micropost.select('microposts.*')
.joins("JOIN activities ON activities.micropost_id = microposts.id")
.where('activities.user_id' => id)
.group('microposts.id')
.order('MAX(activities.created_at) DESC')
Hope this helps! You can play around with this sqlFiddle if you want to understand more about how the query works.

Try the below code
Micropost.select('microposts.*, activities.created_at')
.joins("INNER JOIN activities ON (activities.micropost_id = microposts.id)")
.where('activities.user_id= ?',id)
.order('activities.created_at DESC')
.uniq

Count, empty? fails for ActiveRecord with outer joins

I have two models, Monkey and Session, where Monkey has_many Session. I have a scope for Monkey:
scope :with_session_counts, -> {
joins("LEFT OUTER JOIN `sessions` ON `sessions`.`monkey_id` = `monkeys`.`id`")
.group(:id)
.select("`monkeys`.*, COUNT(DISTINCT `sessions`.`id`) as session_count")
}
in order to grab the number of associated Sessions (even when 0).
Querying #monkeys = Monkey.with_session_counts works as expected. However, when I test in my view:
<% unless #monkeys.empty?%>
I get this error:
Mysql2::Error: Column 'id' in field list is ambiguous:
SELECT COUNT(*) AS count_all, id AS id FROM `monkeys`
LEFT OUTER JOIN `sessions` ON `sessions`.`monkey_id` = `monkeys`.`id`
GROUP BY `monkeys`.`id`
How would I convince Rails to prefix id with the table name in presence of the JOIN?
Or is there a better alternative for the OUTER JOIN?
This applies equally to calling #monkeys.count(:all). I'm using RoR 4.2.1.
Update:
I have a partial fix for my issue (specify group("monkeys.id") explicitly) I wonder whether this is a bug in the code that generates the SELECT clause for count(:all). Note that in both cases (group("monkeys.id") and group(:id)) the GROUP BY part is generated correctly (i.e. with monkeys.id), but in the latter case the SELECT only contains id AS id. The reason I say 'partial' is because it works in that it does not break a call to empty?, but a call to count(:all) returns a Hash {monkey_id => number_of_sessions} instead of the number of records.
Update 2:
I guess my real question is: How can I get the number of associated sessions for each monkey, so that for all intents and purposes I can work with the query result as with Monkey.all? I know about counter cache but would prefer not to use it.

I believe it is not a bug. Like you added on your update, you have to specify the table that the id column belongs to. In this case group('monkeys.id') would do it.
How would the code responsible for generating the statement know the table to use? Without the count worked fine because it adds points.* to the projection and that is the one used by group by. However, if you actually wanted to group by Sessions id, you would have to specify it anyway.

Order by foreign key in activerecord: without a join?

I want to expand this question.
order by foreign key in activerecord
I'm trying to order a set of records based on a value in a really large table.
When I use join, it brings all the "other" records data into the objects.. As join should..
#table users 30+ columns
#table bids 5 columns
record = Bid.find(:all,:joins=>:users, :order=>'users.ranking DESC' ).first
Now record holds 35 fields..
Is there a way to do this without the join?
Here's my thinking..
With the join I get this query
SELECT * FROM "bids"
left join users on runner_id = users.id
ORDER BY ranking LIMIT 1
Now I can add a select to the code so I don't get the full user table, but putting a select in a scope is dangerous IMHO.
When I write sql by hand.
SELECT * FROM bids
order by (select users.ranking from users where users.id = runner_id) DESC
limit 1
I believe this is a faster query, based on the "explain" it seems simpler.
More important than speed though is that the second method doesn't have the 30 extra fields.
If I build in a custom select inside the scope, it could explode other searches on the object if they too have custom selects (there can be only one)

What you would like to achieve in active record writing is something along
SELECT b.* from bids b inner join users u on u.id=b.user_id order by u.ranking desc
In active record i would write such as:
Bids.joins("inner join users u on bids.user_id=u.id").order("u.ranking desc")
I think it's the only to make a join without fetching all attributes from the user models.

Rails Postgres Error GROUP BY clause or be used in an aggregate function

In SQLite (development) I don't have any errors, but in production with Postgres I get the following error. I don't really understand the error.
PG::Error: ERROR: column "commits.updated_at" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...mmits"."user_id" = 1 GROUP BY mission_id ORDER BY updated_at...
^
: SELECT COUNT(*) AS count_all, mission_id AS mission_id FROM "commits" WHERE "commits"."user_id" = 1 GROUP BY mission_id ORDER BY updated_at DESC
My controller method:
def show
#user = User.find(params[:id])
#commits = #user.commits.order("updated_at DESC").page(params[:page]).per(25)
#missions_commits = #commits.group("mission_id").count.length
end
UPDATE:
So i digged further into this PostgreSQL specific annoyance and I am surprised that this exception is not mentioned in the Ruby on Rails Guide.
I am using psql (PostgreSQL) 9.1.11
So from what I understand, I need to specify which column that should be used whenever you use the GROUP_BY clause. I thought using SELECT would help, which can be annoying if you need to SELECT a lot of columns.
Interesting discussion here
Anyways, when I look at the error, everytime the cursor is pointed to updated_at. In the SQL query, rails will always ORDER BY updated_at. So I have tried this horrible query:
#commits.group("mission_id, date(updated_at)")
.select("date(updated_at), count(mission_id)")
.having("count(mission_id) > 0")
.order("count(mission_id)").length
which gives me the following SQL
SELECT date(updated_at), count(mission_id)
FROM "commits"
WHERE "commits"."user_id" = 1
GROUP BY mission_id, date(updated_at)
HAVING count(mission_id) > 0
ORDER BY updated_at DESC, count(mission_id)
LIMIT 25 OFFSET 0
the error is the same.
Note that no matter what it will ORDER BY updated_at, even if I wanted to order by something else.
Also I don't want to group the records by updated_at just by mission_id.
This PostgreSQL error is just misleading and has little explanation to solving it. I have tried many formulas from the stackoverflow sidebar, nothing works and always the same error.
UPDATE 2:
So I got it to work, but it needs to group the updated_at because of the automatic ORDER BY updated_at. How do I count only by mission_id?
#missions_commits = #commits.group("mission_id, updated_at").count("mission_id").size

I guest you want to show general number of distinct Missions related with Commits, anyway it won't be number on page.
Try this:
#commits = #user.commits.order("updated_at DESC").page(params[:page]).per(25)
#missions_commits = #user.commits.distinct.count(:mission_id)
However if you want to get the number of distinct Missions on page I suppose it should be:
#missions_commits = #commits.collect(&:mission_id).uniq.count
Update
In Rails 3, distinct did not exist, but pure SQL counting should be used this way:
#missions_commits = #user.commits.count(:mission_id, distinct: true)

See the docs for PostgreSQL GROUP BY here:
http://www.postgresql.org/docs/9.3/interactive/sql-select.html#SQL-GROUPBY
Basically, unlike Sqlite (and MySQL) postgres requires that any columns selected or ordered on must appear in an aggregate function or the group by clause.
If you think it through, you'll see that this actually makes sense. Sqlite/MySQL cheat under the hood and silently drop those fields (not sure that's technically what happens).
Or thinking about it another way if you are grouping by a field, what's the point of ordering it? How would that even make sense unless you also had an aggregate function on the ordered field?

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Selecting distinct through join - ruby-on-rails

Related

RoR PostgresQL - Get latest, distinct values from database

Order with DISTINCT ids in rails with postgres

Count, empty? fails for ActiveRecord with outer joins

Order by foreign key in activerecord: without a join?

Rails Postgres Error GROUP BY clause or be used in an aggregate function

Categories

Resources