I have a table, people, with a column named age.
How can I get a count of the people with each age, ordered from oldest to youngest, keeping only the ages that have at least 2 people?
How I would write it in raw SQL:
SELECT
COUNT(1) AS people_count,
age
FROM people
GROUP BY age
HAVING people_count > 1
ORDER BY age DESC
In Rails (I'm not sure how to do it):
Person.group(:age).count will get me the counts by age, but I can't figure out how to order the result by age in descending order, or how to add the HAVING clause.
Try something like:
Person.select("id, age").group(:id, :age).having("count(id) > 1").order("age desc")
I find the other answers didn't work for me. I had to do something like this
Person.group(:age).having('count(*) > 1').order('age desc').count
Person.select('COUNT(1) as people_count').order('age DESC').group(:age).having('people_count > 1')
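For readers unsure what the group/having/order/count chain is meant to return, here is a minimal plain-Ruby sketch of the same three steps on an in-memory array. It does not touch ActiveRecord or a database; the sample ages and the helper name counts_by_age are made up for illustration. The resulting hash has the same shape as the one a grouped count returns (group value => count).

```ruby
# Plain-Ruby illustration only (no database, no ActiveRecord).
# The sample data and the method name are hypothetical.
def counts_by_age(ages)
  # GROUP BY age + COUNT(*): tally how many people share each age
  counts = ages.each_with_object(Hash.new(0)) { |age, acc| acc[age] += 1 }
  # HAVING COUNT(*) > 1: keep only ages shared by at least two people
  kept = counts.select { |_age, n| n > 1 }
  # ORDER BY age DESC: oldest first
  kept.sort_by { |age, _n| -age }.to_h
end

ages = [30, 25, 30, 41, 25, 30, 19]
result = counts_by_age(ages)
puts result.inspect
```

Ages 41 and 19 occur once each, so they are dropped by the HAVING step; the remaining keys come out oldest first.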
Related
I'm trying to display only the people aged 20 and above. How would I do this? I've got code that displays the age:
I've tried to use IF statements, but I'm not quite sure where to place them.
select idstaff, name, date_of_birth,
date_format(curdate(), '%Y') - date_format(date_of_birth, '%Y') -
(date_format(curdate(), '00-%m-%d') < date_format(date_of_birth, '00-%m-%d')) as age
Try this:
Select idstaff, name, date_of_birth,
TIMESTAMPDIFF(YEAR, date_of_birth, CURDATE()) as age
FROM your_table
WHERE
TIMESTAMPDIFF(YEAR, date_of_birth, CURDATE()) >= 20
TIMESTAMPDIFF(YEAR, date_of_birth, CURDATE()) calculates the number of whole years (hence the age). You don't need an IF; just put the condition in a WHERE clause to display only people aged 20 and above.
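The age arithmetic itself (subtract the years, then take one off if this year's birthday has not happened yet, as in the question's expression) can be sketched in plain Ruby. This is only an illustration of the calculation and the >= 20 filter on in-memory data; the staff rows, the fixed "today" date, and the helper name age_in_years are assumptions, and it makes no claim about how MySQL treats edge cases such as 29 February.

```ruby
require 'date'

# Whole years between date_of_birth and today: the current year only
# counts once the birthday has been reached. Hypothetical helper.
def age_in_years(date_of_birth, today)
  years = today.year - date_of_birth.year
  birthday_pending =
    ([today.month, today.day] <=> [date_of_birth.month, date_of_birth.day]) < 0
  birthday_pending ? years - 1 : years
end

# Hypothetical rows standing in for the staff table.
staff = [
  { name: "Ann", date_of_birth: Date.new(2000, 6, 15) },
  { name: "Bob", date_of_birth: Date.new(2010, 1, 1) },
  { name: "Cy",  date_of_birth: Date.new(2004, 12, 31) }
]

today = Date.new(2024, 6, 14) # fixed date so the example is reproducible
# Equivalent of the WHERE clause: keep rows whose age is 20 or more.
adults = staff.select { |row| age_in_years(row[:date_of_birth], today) >= 20 }
puts adults.map { |row| row[:name] }.inspect
```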
I have a class called Membership. There could be multiple records with the same email. To retrieve all Membership records by email, I am creating an index on email and doing:
Membership.where(email: "example@example.com")
How expensive is the above operation? Is there a more efficient way of doing this?
Thanks!
I think the most efficient way is with group query like this:
Membership.select(:email).group(:email).having("count(*) > 1")
This will generate the following query:
SELECT "memberships"."email" FROM "memberships" GROUP BY "memberships"."email" HAVING (count(*) > 1)
That should work on PG and MySQL.
Hope it helps
EDIT
If you want to see how many duplicates each email has, you can do this (put a count at the end):
Membership.select(:email).group(:email).having("count(*) > 1").count
This will give a collection with each email and number of duplicates listed like this:
{ 'some@duplicate.com' => 2, ...etc}
I have the following code to join two tables microposts and activities with micropost_id column and then order based on created_at of activities table with distinct micropost id.
Micropost.joins("INNER JOIN activities ON
(activities.micropost_id = microposts.id)").
where('activities.user_id= ?',id).order('activities.created_at DESC').
select("DISTINCT (microposts.id), *")
which should return all micropost columns. This is not working in my development environment:
PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
If I add activities.created_at to the SELECT DISTINCT, I get repeated micropost ids because each row has a distinct activities.created_at value. I have searched a lot to get this far, but the problem always persists because of this Postgres restriction, which exists to avoid arbitrary selection.
I want to select distinct micropost_id values, ordered by activities.created_at.
Please help.
To start with, we need to quickly cover what SELECT DISTINCT is actually doing. It looks like just a nice keyword to make sure you only get back distinct values, which shouldn't change anything, right? Except as you're finding out, behind the scenes, SELECT DISTINCT is actually acting more like a GROUP BY. If you want to select distinct values of something, you can only order that result set by the same values you're selecting -- otherwise, Postgres doesn't know what to do.
To explain where the ambiguity comes from, consider this simple set of data for your activities:
CREATE TABLE activities (
id INTEGER PRIMARY KEY,
created_at TIMESTAMP WITH TIME ZONE,
micropost_id INTEGER REFERENCES microposts(id)
);
INSERT INTO activities (id, created_at, micropost_id)
VALUES (1, current_timestamp, 1),
(2, current_timestamp - interval '3 hours', 1),
(3, current_timestamp - interval '2 hours', 2)
You stated in your question that you want "distinct micropost_id" "based on order of activities.created_at". It's easy to order these activities by descending created_at (1, 3, 2), but both 1 and 2 have the same micropost_id of 1. So if you want the query to return just micropost IDs, should it return 1, 2 or 2, 1?
If you can answer the above question, you need to take your logic for doing so and move it into your query. Let's say that, and I think this is pretty likely, you want this to be a list of microposts which were most recently acted on. In that case, you want to sort the microposts in descending order of their most recent activity. Postgres can do that for you, in a number of ways, but the easiest way in my mind is this:
SELECT micropost_id
FROM activities
JOIN microposts ON activities.micropost_id = microposts.id
GROUP BY micropost_id
ORDER BY MAX(activities.created_at) DESC
Note that I've dropped SELECT DISTINCT in favor of GROUP BY, which lets the ORDER BY use an aggregate. MAX(activities.created_at) tells Postgres to sort each group of activities sharing a micropost_id by its most recent activity only.
You can translate the above to Rails like so:
Micropost.select('microposts.*')
.joins("JOIN activities ON activities.micropost_id = microposts.id")
.where('activities.user_id' => id)
.group('microposts.id')
.order('MAX(activities.created_at) DESC')
Hope this helps! You can play around with this sqlFiddle if you want to understand more about how the query works.
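The ordering rule described in this answer (one entry per micropost, sorted by its most recent activity) can also be sketched in plain Ruby on the three sample activities. This is an in-memory illustration, not the Rails query; the fixed timestamp and the method name micropost_ids_by_latest_activity are assumptions.

```ruby
# In-memory illustration of GROUP BY micropost_id
# ORDER BY MAX(created_at) DESC. Names here are hypothetical.
def micropost_ids_by_latest_activity(activities)
  latest = activities
    .group_by { |a| a[:micropost_id] }                          # GROUP BY micropost_id
    .map { |mid, rows| [mid, rows.map { |r| r[:created_at] }.max] } # MAX(created_at)
  latest.sort_by { |_mid, t| t }.reverse.map(&:first)            # ORDER BY ... DESC
end

now = Time.utc(2024, 1, 1, 12, 0, 0) # fixed so the example is reproducible
activities = [
  { id: 1, created_at: now,            micropost_id: 1 },
  { id: 2, created_at: now - 3 * 3600, micropost_id: 1 },
  { id: 3, created_at: now - 2 * 3600, micropost_id: 2 }
]

ordered = micropost_ids_by_latest_activity(activities)
puts ordered.inspect
```

Micropost 1 comes first because its newest activity (id 1) is more recent than micropost 2's only activity; the older activity on micropost 1 (id 2) no longer affects the order.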
Try the below code
Micropost.select('microposts.*, activities.created_at')
.joins("INNER JOIN activities ON (activities.micropost_id = microposts.id)")
.where('activities.user_id= ?',id)
.order('activities.created_at DESC')
.uniq
I'm trying to find, in a table of 3M rows, all the users who have the same username. I read that something like this may do the trick:
User.find(:all, :group => [:username], :having => "count(*) > 1" )
However, since I'm using Postgres, this returns ActiveRecord::StatementInvalid: PG::Error: ERROR: column "users.id" must appear in the GROUP BY clause or be used in an aggregate function.
I'm trying something like this
User.select('users.id, users.username').having("count(*) > 1").group('users.username')
But still get the same error. Any idea what I'm doing wrong?
Update: I somehow got it to run using User.select('users.*').group('users.id').having('count(users.username) > 1'), but this query returns what looks like an empty array, even though it reports 5 rows:
GroupAggregate (cost=9781143.40..9843673.60 rows=3126510 width=1365)
Filter: (count(username) > 1)
-> Sort (cost=9781143.40..9788959.68 rows=3126510 width=1365)
Sort Key: id
-> Seq Scan on users (cost=0.00..146751.10 rows=3126510 width=1365)
(5 rows)
=> []
Any idea why this is happening and how to get those 5 rows?
I think the best you could get is to get usernames for duplicate records. That can be achieved with
User.select(:username).group(:username).having('COUNT(username) > 1')
"group by" in a database collapses each group into one row of output. Most likely what you intend will be produced by the following query:
User.where("username in (select username from users group by username having count(*)>1)").order(:username)
The inner query above finds all usernames that appear more than once. Then we find all rows with those usernames. Ordering by username will make your further processing easier. To speed this up, add an index on the username column of the users table.
There are alternate Postgres specific ways to solve this, however the above will work across all databases.
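To make the two-step logic concrete (first find the values that occur more than once, then fetch every row carrying one of those values), here is a small plain-Ruby sketch on an in-memory array. It is an illustration under assumptions, not the ActiveRecord query: the sample users and the method name rows_with_duplicate_usernames are hypothetical.

```ruby
# In-memory illustration of "rows whose username appears more than once".
# Sample data and method name are hypothetical.
def rows_with_duplicate_usernames(users)
  # Inner query: usernames that occur more than once
  counts = users.each_with_object(Hash.new(0)) { |u, acc| acc[u[:username]] += 1 }
  duplicated = counts.select { |_name, n| n > 1 }.keys
  # Outer query: every row with one of those usernames, ordered by username
  users.select { |u| duplicated.include?(u[:username]) }
       .sort_by { |u| [u[:username], u[:id]] }
end

users = [
  { id: 1, username: "alice" },
  { id: 2, username: "bob" },
  { id: 3, username: "alice" },
  { id: 4, username: "carol" },
  { id: 5, username: "bob" }
]

dupes = rows_with_duplicate_usernames(users)
puts dupes.map { |u| u[:id] }.inspect
```

Unlike a plain GROUP BY, this keeps the full rows (including ids), which is what the question was after.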
How do I retrieve a set of records, ordered by count, in Arel? I have a model which tracks how many views a product gets. I want to find the X most frequently viewed products over the last Y days.
This problem has cropped up while migrating to PostgreSQL from MySQL, due to MySQL being a bit forgiving in what it will accept. This code, from the View model, works with MySQL, but not PostgreSQL due to non-aggregated columns being included in the output.
scope :popular, lambda { |time_ago, freq|
where("created_on > ?", time_ago).group('product_id').
order('count(*) desc').limit(freq).includes(:product)
}
Here's what I've got so far:
View.select("id, count(id) as freq").where('created_on > ?', 5.days.ago).
order('freq').group('id').limit(5)
However, this returns the single ID of the model, not the actual model.
Update
I went with:
select("product_id, count(id) as freq").
where('created_on > ?', time_ago).
order('freq desc').
group('product_id').
limit(freq)
On reflection, it's not really logical to expect a complete model when the results are made up of GROUP BY and aggregate functions results, as returned data will (most likely) match no actual model (row).
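As a rough illustration of that point, the following plain-Ruby sketch computes the most viewed products in a time window from an in-memory array. What comes back is a list of [product_id, freq] pairs, not view rows, which matches the observation above. The sample data, the fixed date, and the method name popular_product_ids are assumptions made for the example.

```ruby
require 'date'

# In-memory illustration of: WHERE created_on > since
# GROUP BY product_id ORDER BY count DESC LIMIT limit.
# Names and data are hypothetical.
def popular_product_ids(views, since, limit)
  recent = views.select { |v| v[:created_on] > since }
  freq = recent.each_with_object(Hash.new(0)) { |v, acc| acc[v[:product_id]] += 1 }
  # Highest count first; ties broken by product_id for a stable result
  freq.sort_by { |product_id, n| [-n, product_id] }.first(limit)
end

today = Date.new(2024, 3, 10) # fixed so the example is reproducible
views = [
  { product_id: 1, created_on: today - 1 },
  { product_id: 2, created_on: today - 2 },
  { product_id: 1, created_on: today - 3 },
  { product_id: 3, created_on: today - 9 }, # outside the 5-day window
  { product_id: 2, created_on: today - 4 },
  { product_id: 1, created_on: today - 4 }
]

top = popular_product_ids(views, today - 5, 2)
puts top.inspect
```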
You have to extend your select clause with every column you wish to retrieve, or use:
select("views.*, count(id) as freq")
SQL would be:
SELECT product_id, product, count(*) as freq
FROM views
WHERE created_on > '$5_days_ago'::timestamp
GROUP BY product_id, product
ORDER BY count(*) DESC, product
LIMIT 5;
Extrapolating from your example, it should be:
View.select("product_id, product, count(*) as freq").where('created_on > ?', 5.days.ago).
order("count(*) DESC" ).group('product_id, product').limit(5)
Disclaimer: Ruby syntax is a foreign language to me.