How to count group by rows in rails? - ruby-on-rails

When I use User.count(:all, :group => "name"), I get multiple rows, but it's not what I want. What I want is the count of the rows. How can I get it?

Currently (18.03.2014 - Rails 4.0.3) this is the correct syntax:
Model.group("field_name").count
It returns a hash with the grouped values as keys and the counts as values,
e.g.
SurveyReport.find(30).reports.group("status").count
#=> {
"pdf_generated" => 56
}
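Because the grouped count comes back as an ordinary Hash, the number of groups (which seems to be what the question asks for) is just the size of that hash, i.e. Model.group("field_name").count.size. A minimal plain-Ruby sketch of the idea, with no database involved; the STATUSES data and the grouped_count helper are made up for illustration:

```ruby
# Made-up data standing in for the "status" column of some rows.
STATUSES = %w[pdf_generated pdf_generated queued failed queued].freeze

# Builds a hash of the same shape as Model.group("status").count:
# { grouped value => number of rows with that value }.
def grouped_count(values)
  values.each_with_object(Hash.new(0)) { |value, counts| counts[value] += 1 }
end

counts = grouped_count(STATUSES)
puts counts.inspect     # one entry per group
puts counts.size        # number of groups
puts counts.values.sum  # number of rows across all groups
```

Counting the groups this way happens in Ruby after the grouped query has run; for a very large number of groups a COUNT(DISTINCT ...) in SQL may be cheaper.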

User.count will give you the total number of users and translates to the following SQL: SELECT count(*) AS count_all FROM "users"
User.count(:all, :group => 'name') will give you the list of unique names, along with their counts, and translates to this SQL: SELECT count(*) AS count_all, name AS name FROM "users" GROUP BY name
I suspect you want option 1 above, but I'm not clear on what exactly you want/need.

Perhaps you want to count the distinct user names?
User.count(:name, :distinct => true)
would return 3 if you have users named John, John, Jane and Joey (for example) in the database. In Rails 4 and later the same count is written User.distinct.count(:name).
________
| name |
|--------|
| John |
| John |
| Jane |
| Joey |
|________|

Try using User.find(:all, :group => "name").count
Good luck!

I found an odd way that seems to work: count the rows returned from the grouped counts.
User Table Example
________
| name |
|--------|
| Bob |
| Bob |
| Joe |
| Susan |
|________|
Counts in the Groups
User.group(:name).count
# SELECT COUNT(*) AS count_all, "users"."name" AS users_name
# FROM "users"
# GROUP BY "users"."name"
=> {
"Bob" => 2,
"Joe" => 1,
"Susan" => 1
}
Row Count from the Counts in the Groups
User.group(:name).count.count
=> 3
Something Hacky
Here's something interesting I ran into, but it's quite hacky as it will add the count to every row, and doesn't play too well in active record land. I don't remember if I was able to get this into an Arel / ActiveRecord query.
SELECT COUNT(*) OVER() AS count, COUNT(*) AS count_all, "users"."name" AS name
FROM "users"
GROUP BY "users"."name"
[
{ count: 3, count_all: 2, name: "Bob" },
{ count: 3, count_all: 1, name: "Joe" },
{ count: 3, count_all: 1, name: "Susan" }
]

Related

Select rows with no match in join table with where condition

In a Rails app with Postgres I have users and jobs tables and a followings join table. I want to select jobs that are not followed by a specific user, including jobs that have no rows in the join table at all.
Tables:
users:
id: bigint (pk)
jobs:
id: bigint (pk)
followings:
id: bigint (pk)
job_id: bigint (fk)
user_id: bigint (fk)
Data:
sandbox_development=# SELECT id FROM jobs;
id
----
1
2
3
(3 rows)
sandbox_development=# SELECT id FROM users;
id
----
1
2
(2 rows)
sandbox_development=# SELECT id, user_id, job_id FROM followings;
id | user_id | job_id
----+---------+--------
1 | 1 | 1
2 | 2 | 2
(2 rows)
Expected result
# jobs
id
----
2
3
(2 rows)
Can I create a join query that is the equivalent of this?
sandbox_development=# SELECT j.id FROM jobs j
WHERE NOT EXISTS(
SELECT 1 FROM followings f
WHERE f.user_id = 1 AND f.job_id = j.id
);
id
----
2
3
(2 rows)
Which does the job but is a PITA to create with ActiveRecord.
So far I have:
Job.joins(:followings).where.not(followings: { user_id: 1 })
SELECT "jobs".* FROM "jobs"
INNER JOIN "followings"
ON "followings"."job_id" = "jobs"."id"
WHERE "followings"."user_id" != 1
But since it's an inner join it does not include jobs with no followers (job id 3). I have also tried various outer joins that either give all the rows or no rows.
In Rails 5, you can use #left_outer_joins with where.not to achieve the result. A left join does return the jobs that have no followings, with NULL in the followings columns, but the condition user_id != 1 is not true for NULL, so those rows are filtered out again. We therefore need to add an explicit NULL condition to keep them.
Rails 5 Query:
Job.left_outer_joins(:followings).where.not(followings: {user_id: 1}).or(Job.left_outer_joins(:followings).where(followings: {user_id: nil}))
Alternate Query:
Job.left_outer_joins(:followings).where("followings.user_id != 1 OR followings.user_id is NULL")
Postgres Query:
SELECT "jobs".* FROM "jobs" LEFT OUTER JOIN "followings" ON "followings"."job_id" = "jobs"."id" WHERE "followings"."user_id" != 1 OR followings.user_id is NULL;
I'm not sure I understand, but this has the output you want and use outer join:
SELECT j.*
FROM jobs j LEFT JOIN followings f ON f.job_id = j.id
LEFT JOIN users u ON u.id = f.user_id AND u.id = 1
WHERE u.id IS NULL;
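One caveat about the outer-join answers above: they give the expected rows for the sample data, but if job 1 were also followed by user 2, the joined row for user 2 would pass the filter and job 1 would come back even though user 1 follows it. The NOT EXISTS form from the question does not have this problem. In ActiveRecord one way to get an equivalent anti-join is Job.where.not(id: Following.where(user_id: 1).select(:job_id)), assuming a Following model exists for the join table. The plain-Ruby sketch below (no database; the helper names and the extra following row are hypothetical) mimics both strategies so the difference is visible:

```ruby
JOBS = [1, 2, 3].freeze
FOLLOWINGS = [
  { user_id: 1, job_id: 1 },
  { user_id: 2, job_id: 2 }
].freeze

# NOT EXISTS semantics: keep a job unless some following row links it to the user.
def not_followed_by(jobs, followings, user_id)
  followed = followings.select { |f| f[:user_id] == user_id }.map { |f| f[:job_id] }
  jobs.reject { |job_id| followed.include?(job_id) }
end

# LEFT JOIN + "user_id != ? OR user_id IS NULL" semantics: filter joined rows one by one.
def left_join_filter(jobs, followings, user_id)
  jobs.flat_map do |job_id|
    rows = followings.select { |f| f[:job_id] == job_id }
    rows = [{ user_id: nil, job_id: job_id }] if rows.empty? # unmatched job: NULL columns
    rows.select { |f| f[:user_id].nil? || f[:user_id] != user_id }.map { job_id }
  end
end

# Hypothetical extra row: user 2 also follows job 1.
with_second_follower = FOLLOWINGS + [{ user_id: 2, job_id: 1 }]

puts not_followed_by(JOBS, FOLLOWINGS, 1).inspect             # same as the join on the sample data
puts left_join_filter(JOBS, FOLLOWINGS, 1).inspect
puts not_followed_by(JOBS, with_second_follower, 1).inspect   # job 1 stays excluded
puts left_join_filter(JOBS, with_second_follower, 1).inspect  # job 1 leaks back in
```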

ActiveRecord: Filtering duplicates

Given this table:
Users
| id | name | active |
| 1 | bob | true |
| 2 | bob | false |
| 3 | alice | false |
How can I query this table using ActiveRecord (Rails 4.2, PostgreSQL), if the resulting relation should:
have all attributes populated
not contain duplicate names
prefer active records over inactive ones
be capable of calling .count, where count returns an integer
remain an ActiveRecord::Relation
The correct result set should look like this:
Users
| id | name | active |
| 1 | bob | true |
| 3 | alice | false |
What I tried so far:
# Works as for the result set, but raises when calling .count
User.select('DISTINCT ON (users.name) *')
.order('users.name, users.active DESC')
This should work
User.select('DISTINCT name').order('name, active DESC').count
Maybe
User.where.not(name: nil).where.not(active: nil)
This returns all users with a populated name and active boolean (ids will always be populated)
Then on that we call
.sort_by { |a| a.active ? 0 : 1 }
This puts records with active: true at the start, so that the next step keeps them in preference to inactive ones.
And finally
.uniq(&:name)
This keeps only the first record for each name. So the method is:
User.where.not(name: nil).where.not(active: nil).sort_by { |a| a.active ? 0 : 1 }.uniq(&:name)
On this result we can call count to get the number of objects.
EDIT:
uniq_by is not available in Rails 4, so use Array#uniq with a block as above. Because sort_by and uniq run in Ruby on the loaded records, the result is an Array, not an ActiveRecord::Relation.
Since you're using PostgreSQL, you can use DISTINCT ON. Here's the SQL:
select distinct on (name) * from users order by name, active desc
And in Active Record:
>> User.order(name: :desc, active: :desc).select("distinct on (name) *")
User Load (2.5ms) SELECT distinct on (name) * FROM "users" ORDER BY "users"."name" DESC, "users"."active" DESC LIMIT $1 [["LIMIT", 11]]
=> #<ActiveRecord::Relation [#<User id: 2, name: "bob", active: true, created_at: "2017-09-15 03:58:23", updated_at: "2017-09-15 03:58:23">, #<User id: 3, name: "alice", active: false, created_at: "2017-09-15 03:58:35", updated_at: "2017-09-15 03:58:35">]>

Select unique record with latest created date

| id | user_id | created_at (datetime) |
| 1 | 1 | 17 May 2016 10:31:34 |
| 2 | 1 | 17 May 2016 12:41:54 |
| 3 | 2 | 18 May 2016 01:13:57 |
| 4 | 1 | 19 May 2016 07:21:24 |
| 5 | 2 | 20 May 2016 11:23:21 |
| 6 | 1 | 21 May 2016 03:41:29 |
How can I get the latest record (by created_at) for each unique user_id, which would be the records with id 5 and 6 in the above case?
What I have tried so far
So far I am trying to use group_by to return a hash like this:
Table.all.group_by(&:user_id)
# => { 1 => [record 1, record 2, record 4, record 6], ... }
And select the record with maximum date from it? Thanks.
Updated solution
Thanks to Gordon's answer, I am using find_by_sql to run a raw SQL query in Rails.
@table = Table.find_by_sql("Select distinct on (user_id) *
From tables
Order by user_id, created_at desc")
# To include eager loading with find_by_sql, we can add this
ActiveRecord::Associations::Preloader.new.preload(@table, :user)
In Postgres, you can use DISTINCT ON:
SELECT DISTINCT ON (user_id) *
FROM tables
ORDER BY user_id, created_at DESC;
I am not sure how to express this in ruby.
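In ActiveRecord the same query can likely be written as Table.select("DISTINCT ON (user_id) *").order("user_id, created_at DESC"), which keeps the result a relation. What DISTINCT ON does here is keep the first row of each user_id group after sorting by created_at descending. A minimal plain-Ruby sketch of that selection on the sample rows from the question (no database; the LogRow struct and latest_per_user helper are names made up for illustration):

```ruby
LogRow = Struct.new(:id, :user_id, :created_at)

# The sample rows from the question.
LOG_ROWS = [
  LogRow.new(1, 1, Time.utc(2016, 5, 17, 10, 31, 34)),
  LogRow.new(2, 1, Time.utc(2016, 5, 17, 12, 41, 54)),
  LogRow.new(3, 2, Time.utc(2016, 5, 18, 1, 13, 57)),
  LogRow.new(4, 1, Time.utc(2016, 5, 19, 7, 21, 24)),
  LogRow.new(5, 2, Time.utc(2016, 5, 20, 11, 23, 21)),
  LogRow.new(6, 1, Time.utc(2016, 5, 21, 3, 41, 29))
].freeze

# For each user_id keep the row with the greatest created_at,
# then order the result by user_id like the SQL does.
def latest_per_user(rows)
  rows
    .group_by(&:user_id)
    .map { |_user_id, group| group.max_by(&:created_at) }
    .sort_by(&:user_id)
end

puts latest_per_user(LOG_ROWS).map(&:id).inspect
```

This Ruby version loads every row into memory, so it is only meant to show the semantics; the SQL form does the same work in the database.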
Table
.select('user_id, MAX(created_at) AS created_at')
.group(:user_id)
.order('created_at DESC')
Notice created_at is passed in as string in order call, since it's a result of aggregate function, not a column value.
1) Extract the unique user ids from the table
Table.distinct.pluck(:user_id)
2) Find all records of each user
Table.distinct.pluck(:user_id).map { |user_id| Table.where(user_id: user_id) }
3) Select the latest created_at of each user
Table.distinct.pluck(:user_id).map { |user_id| Table.where(user_id: user_id).order(:created_at).last.created_at }
4) Return the result in the form [[id, user_id], [id, user_id], ...]
Table.distinct.pluck(:user_id).map { |user_id| [Table.where(user_id: user_id).order(:created_at).last.id, user_id] }
For the sample data this should return [[6, 1], [5, 2]] (the order of the pairs is not guaranteed), at the cost of one extra query per user.

What is the difference between count and select('DISTINCT COUNT(xxx)') in ActiveRecord?

I have two queries that are similar:
StoreQuery.group(:location).count(:name)
vs
StoreQuery.group(:location).select('DISTINCT COUNT(name)')
I was expecting the results to be exactly the same but they're not. What is the difference between the two?
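The difference is where the distinction applies. The first query returns a hash mapping each location to the number of non-NULL names in it. In the second query DISTINCT applies to the returned rows, that is to the counts themselves, not to the names inside COUNT, so it only removes duplicate count values. To count unique names per location the SQL would need COUNT(DISTINCT name) instead.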
With this sample data
id | name | location |
---+------+----------+
1 | NULL | US
2 | A | UK
3 | A | UK
4 | B | AUS
Let's check the generated queries and their results
1st query
StoreQuery.group(:location).count(:name)
Generated query:
SELECT location, COUNT(name) AS count FROM store_queries GROUP BY location
Result:
{"US" => 0, "UK" => 2, "AUS" => 1}
2nd query
StoreQuery.group(:location).select('DISTINCT COUNT(name)')
Generated query:
SELECT DISTINCT COUNT(name) FROM store_queries GROUP BY location
Result:
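ActiveRecord::Relation [StoreQuery count: 0, StoreQuery count: 2, StoreQuery count: 1]
# DISTINCT applies to the returned rows, i.e. to the counts themselves (0, 2 and 1 here), not to the names. The location is not returned, and two locations with the same count would collapse into one row.
So the differences will be:
                  | 1st query | 2nd query                          |
------------------+-----------+------------------------------------+
# returned fields | 2         | 1                                  |
distinction       | no        | yes (on result rows, not on names) |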
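By the way, Rails 3 supports counting distinct names per group like this (in Rails 4 and later, write StoreQuery.group(:location).distinct.count(:name) instead):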
StoreQuery.group(:location).count(:name, distinct: true)
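To make the two notions of counting concrete, here is a small plain-Ruby sketch over the sample rows above (no database; the helper names are made up for illustration). It computes what COUNT(name) and COUNT(DISTINCT name) would give per location, both of which skip NULL names:

```ruby
# The sample rows from the answer above.
STORE_ROWS = [
  { name: nil, location: "US" },
  { name: "A", location: "UK" },
  { name: "A", location: "UK" },
  { name: "B", location: "AUS" }
].freeze

# Non-nil names per location; nil is dropped just as COUNT(name) ignores NULL.
def names_by_location(rows)
  rows
    .group_by { |row| row[:location] }
    .transform_values { |group| group.map { |row| row[:name] }.compact }
end

# Like GROUP BY location with COUNT(name).
def count_names(rows)
  names_by_location(rows).transform_values(&:size)
end

# Like GROUP BY location with COUNT(DISTINCT name).
def count_distinct_names(rows)
  names_by_location(rows).transform_values { |names| names.uniq.size }
end

puts count_names(STORE_ROWS).inspect
puts count_distinct_names(STORE_ROWS).inspect
```

Only the UK group differs between the two, because it is the only location with a repeated name.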

How to find posts tagged with more than one tag in Rails and Postgresql

I have the models Post, Tag, and PostTag. A post has many tags through post tags. I want to find posts that are exclusively tagged with more than one tag.
has_many :post_tags
has_many :tags, through: :post_tags
For example, given this data set:
posts table
--------------------
id | title |
--------------------
1 | Carb overload |
2 | Heart burn |
3 | Nice n Light |
tags table
-------------
id | name |
-------------
1 | tomato |
2 | potato |
3 | basil |
4 | rice |
post_tags table
-----------------------
id | post_id | tag_id |
-----------------------
1 | 1 | 1 |
2 | 1 | 2 |
3 | 2 | 1 |
4 | 2 | 3 |
5 | 3 | 1 |
I want to find posts tagged with tomato AND basil. This should return only the "Heart burn" post (id 2). Likewise, if I query for posts tagged with tomato AND potato, it should return the "Carb overload" post (id 1).
I tried the following:
Post.joins(:tags).where(tags: { name: ['basil', 'tomato'] })
SQL
SELECT "posts".* FROM "posts"
INNER JOIN "post_tags" ON "post_tags"."post_id" = "posts"."id"
INNER JOIN "tags" ON "tags"."id" = "post_tags"."tag_id"
WHERE "tags"."name" IN ('basil', 'tomato')
This returns all three posts because all share the tag tomato. I also tried this:
Post.joins(:tags).where(tags: { name: 'basil' }).where(tags: { name: 'tomato' })
SQL
SELECT "posts".* FROM "posts"
INNER JOIN "post_tags" ON "post_tags"."post_id" = "posts"."id"
INNER JOIN "tags" ON "tags"."id" = "post_tags"."tag_id"
WHERE "tags"."name" = 'basil' AND "tags"."name" = 'tomato'
This returns no records.
How can I query for posts tagged with multiple tags?
You may want to review the possible ways to write this kind of query in this answer for applying conditions to multiple rows in a join. Here is one possible option for implementing your query in Rails using 1B, the sub-query approach...
Define a query in the PostTag model that will grab up the Post ID values for a given Tag name:
# PostTag.rb
def self.post_ids_for_tag(tag_name)
joins(:tag).where(tags: { name: tag_name }).select(:post_id)
end
Define a query in the Post model that will grab up the Post records for a given Tag name, using a sub-query structure:
# Post.rb
def self.for_tag(tag_name)
where("id IN (#{PostTag.post_ids_for_tag(tag_name).to_sql})")
end
Then you can use a query like this:
Post.for_tag("basil").for_tag("tomato")
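Another common way to express "has all of these tags" is to group by post and require that the number of matching tags equals the number of requested tags. In ActiveRecord that would look roughly like Post.joins(:tags).where(tags: { name: names }).group("posts.id").having("COUNT(DISTINCT tags.name) = ?", names.size); treat it as an untested sketch. The plain-Ruby version below (no database; POST_TAGS and posts_with_all_tags are names made up for illustration) applies the same rule to the sample data from the question:

```ruby
# post id => tag names, taken from the posts, tags and post_tags sample tables.
POST_TAGS = {
  1 => %w[tomato potato],
  2 => %w[tomato basil],
  3 => %w[tomato]
}.freeze

# Keep the posts whose tags include every wanted tag. This mirrors
# WHERE name IN (...) followed by HAVING COUNT(DISTINCT name) = number of wanted tags.
def posts_with_all_tags(post_tags, wanted)
  wanted = wanted.uniq
  post_tags
    .select { |_post_id, tags| (tags & wanted).size == wanted.size }
    .keys
end

puts posts_with_all_tags(POST_TAGS, %w[tomato basil]).inspect
puts posts_with_all_tags(POST_TAGS, %w[tomato potato]).inspect
```

Like the sub-query approach, this matches posts that carry at least the requested tags; it does not exclude posts that have additional tags.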
Use method .includes, like this:
Item.where(xpto: "test")
.includes({:orders =>[:suppliers, :agents]}, :manufacturers)
Documentation to .includes here.

Resources