Rails - Active Record: Find all records which have a count on has_many association with certain attributes - ruby-on-rails

A user has many identities.
class User < ActiveRecord::Base
  has_many :identities
end
class Identity < ActiveRecord::Base
  belongs_to :user
end
An identity has a confirmed:boolean column. I'd like to query all users that have only ONE identity. This identity must also have confirmed set to false.
I've tried this
User.joins(:identities).group("users.id").having( 'count(user_id) = 1').where(identities: { confirmed: false })
But this returns users with one identity confirmed:false, and those users could also have additional identities with confirmed:true. I only want users with exactly one identity, which is confirmed:false, and no additional identities with confirmed:true.
I've also tried this but obviously it's slow and I'm looking for the right SQL to just do this in one query.
def self.new_users
  users = User.joins(:identities).where(identities: { confirmed: false })
  users.select { |user| user.identities.count == 1 }
end
Apologies upfront if this was already answered but I could not find a similar post.

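One solution is to use a nested query (subquery). This assumes Identity has an unconfirmed scope, e.g. scope :unconfirmed, -> { where(confirmed: false) }: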
User.joins(:identities).where(id: Identity.select(:user_id).unconfirmed).group("users.id").having( 'count(user_id) = 1')
And here's the SQL generated by the query
SELECT "users".* FROM "users"
INNER JOIN "identities" ON "identities"."user_id" = "users"."id"
WHERE "users"."id" IN (SELECT "identities"."user_id" FROM "identities" WHERE "identities"."confirmed" = 'f')
GROUP BY users.id HAVING count(user_id) = 1
I still don't think this is the most efficient way. While I'm able to generate only one SQL query (meaning only one network call to the db), I still have to do two scans: one on the USERS table and one on the IDENTITIES table. Indexing the identities.confirmed column can help, but it still doesn't solve the two-full-scans problem.
For those who understand the query plan here it is:
                                         QUERY PLAN
-------------------------------------------------------------------------------------------
 HashAggregate  (cost=32.96..33.09 rows=10 width=3149)
   Filter: (count(identities.user_id) = 1)
   ->  Hash Semi Join  (cost=21.59..32.91 rows=10 width=3149)
         Hash Cond: (identities.user_id = identities_1.user_id)
         ->  Hash Join  (cost=10.45..21.61 rows=20 width=3149)
               Hash Cond: (identities.user_id = users.id)
               ->  Seq Scan on identities  (cost=0.00..10.70 rows=70 width=4)
               ->  Hash  (cost=10.20..10.20 rows=20 width=3145)
                     ->  Seq Scan on users  (cost=0.00..10.20 rows=20 width=3145)
         ->  Hash  (cost=10.70..10.70 rows=35 width=4)
               ->  Seq Scan on identities identities_1  (cost=0.00..10.70 rows=35 width=4)
                     Filter: (NOT confirmed)
(12 rows)

def self.new_users
  joins(:identities).group("identities.user_id").having("count(identities.user_id) = 1").where(identities: {confirmed: false}).distinct
end

I think group_concat may be the answer here, if you have the function in your DBMS (if not, there may be an equivalent). It collects all the values of the field within the group into a comma-separated string. We want the groups where this string equals false, i.e. there's just one identity and it's false (which I think is your requirement; it's a little unclear). This should work if we let Rails handle the translation of false into however the DB stores it.
User.joins(:identities).group("identities.user_id").having("group_concat(identities.confirmed) = ?", false)
EDIT - if your database stores false as 0, the above will generate SQL like having group_concat(identities.confirmed) = 0. Because the result of group_concat is a string, some DBMSs will cast it to an integer before comparing it to 0, which returns false positives because strings that start with 0, such as '0,1', also cast to 0. In that case you can try this:
User.joins(:identities).group("identities.user_id").having("group_concat(identities.confirmed) = '?'", false)
(note quotes around ?)
EDIT2 - postgres version.
I've not tried this, but it looks like recent versions of Postgres have an aggregate function array_agg() that does much the same as MySQL's group_concat(), except that it returns an array. Since the result is a boolean[] rather than a string, compare it with a one-element array instead of quoting the ?, and group by users.id so that Postgres accepts selecting users.*. Try this:
User.joins(:identities).group("users.id").having("array_agg(identities.confirmed) = ARRAY[?]", false)

Related

ActiveRecord select with OR and exclusive limit

I have the need to query the database and retrieve the last 10 objects that are either active or declined. We use the following:
User.where(status: [:active, :declined]).limit(10)
Now we need to get the last 10 of each status (total of 20 users)
I've tried the following:
User.where(status: :active).limit(10).or(User.where(status: :declined).limit(10))
# SELECT "users".* FROM "users" WHERE ("users"."status" = $1 OR "users"."status" = $2) LIMIT $3
This does the same as the previous query and returns only 10 users, of mixed statuses.
How can I get the last 10 active users and the last 10 declined users with a single query?
I'm not sure that SQL allows doing what you want. First thing I would try would be to use a subquery, something like this:
class User < ApplicationRecord
  scope :active, -> { where status: :active }
  scope :declined, -> { where status: :declined }
  scope :last_active_or_declined, -> {
    where(id: active.limit(10).pluck(:id))
      .or(where(id: declined.limit(10).pluck(:id)))
  }
end
Then somewhere else you could just do
User.last_active_or_declined()
What this does is run two separate queries, one for each group of users, and then fetch the users whose ids are in either list. I would say you could even drop the pluck(:id) parts, since ActiveRecord is smart enough to add the proper select clause to your SQL, but I'm not 100% sure and I don't have a Rails project at hand where I can try this.
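limit is one of the structural values that #or checks: both relations must use the same limit, and that single LIMIT is then applied to the combined query, so #or cannot limit each branch separately. If you check the Rails code, the error raised for incompatible relations comes from here: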
def or!(other) # :nodoc:
  incompatible_values = structurally_incompatible_values_for_or(other)
  unless incompatible_values.empty?
    raise ArgumentError, "Relation passed to #or must be structurally compatible. Incompatible values: #{incompatible_values}"
  end
  # more code
end
You can check which methods are restricted further down in the code here:
STRUCTURAL_OR_METHODS = Relation::VALUE_METHODS - [:extending, :where, :having, :unscope, :references]
def structurally_incompatible_values_for_or(other)
  STRUCTURAL_OR_METHODS.reject do |method|
    get_value(method) == other.get_value(method)
  end
end
You can see in the Relation class here that limit is restricted:
SINGLE_VALUE_METHODS = [:limit, :offset, :lock, :readonly, :reordering,
                        :reverse_order, :distinct, :create_with, :skip_query_cache,
                        :skip_preloading]
So I'm afraid you will have to resort to raw SQL.
I don't think you can do it with a single query, but you can do it with two queries, get the record ids, and then build a query using those record ids.
It's not ideal but as you're just plucking ids the impact isn't too bad.
user_ids = User.where(status: :active).limit(10).pluck(:id) + User.where(status: :declined).limit(10).pluck(:id)
users = User.where(id: user_ids)
I think you can use UNION. Install active_record_union and replace or with union:
User.where(status: :active).limit(10).union(User.where(status: :declined).limit(10))

How to get a most recent value group by year by using SQL

I have a Company model that has_many statements.
class Company < ActiveRecord::Base
  has_many :statements
end
I want to get, for each fiscal_year_end, the statement with the latest date field.
I implemented the function like this:
c = Company.first
c.statements.to_a.group_by{|s| s.fiscal_year_end }.map{|k,v| v.max_by(&:date) }
It works OK, but if possible I'd like to use an ActiveRecord query (SQL), so that I don't need to load unnecessary instances into memory.
How can I write it by using SQL?
select t.username, t.date, t.value
from MyTable t
inner join (
select username, max(date) as MaxDate
from MyTable
group by username
) tm on t.username = tm.username and t.date = tm.MaxDate
For these kinds of things, I find it helpful to get the raw SQL working first, and then translate it into ActiveRecord afterwards. It sounds like a textbook case of GROUP BY:
SELECT fiscal_year_end, MAX(date) AS max_date
FROM statements
WHERE company_id = 1
GROUP BY fiscal_year_end
Now you can express that in ActiveRecord like so:
c = Company.first
c.statements.
  group(:fiscal_year_end).
  reorder(nil). # might not be necessary, depending on your association and Rails version
  select("fiscal_year_end, MAX(date) AS max_date")
The reason for reorder(nil) is to drop any ORDER BY that your association or a default_scope adds (for example ORDER BY id); order(nil) would not do that, because order only appends to the existing ordering. Since you aren't grouping by id, such an ORDER BY will cause the error you're seeing. You could also use reorder(:fiscal_year_end) if that is what you want.
That will give you a bunch of Statement objects. They will be read-only, and every attribute will be nil except for fiscal_year_end and the magically-present new field max_date. These instances don't represent specific statements, but statement "groups" from your query. So you can do something like this:
- @statements_by_fiscal_year_end.each do |s|
  %tr
    %td= s.fiscal_year_end
    %td= s.max_date
Note there is no n+1 query problem here, because you fetched everything you need in one query.
If you decide that you need more than just the max date, e.g. you want the whole statement with the latest date, then you should look at your options for the greatest-n-per-group problem. For raw SQL I like LATERAL JOIN, but the easiest approach to use with ActiveRecord is DISTINCT ON (a PostgreSQL-specific feature).
Oh one more tip: For debugging weird errors, I find it helpful to confirm what SQL ActiveRecord is trying to use. You can use to_sql to get that:
c = Company.first
puts c.statements.
group(:fiscal_year_end).
select("fiscal_year_end, MAX(date) AS max_date").
to_sql
In that example, I'm leaving off the order-clearing call so you can see whether ActiveRecord is adding an ORDER BY clause you don't want.
For example, if you want to group statements by the start of the month, you could do this:
@company = Company.first
@statements = @company.statements.order(:due_at, :id).limit(50)
then group them as you want:
@monthly_statements = @statements.group_by { |statement| statement.due_at.beginning_of_month }
Building upon Bharat's answer you can do this type of query in Rails using find_by_sql in this way:
Statement.find_by_sql ["SELECT t.* FROM statements t INNER JOIN (
  SELECT fiscal_year_end, MAX(date) AS max_date FROM statements
  WHERE company_id = ? GROUP BY fiscal_year_end
) tm ON t.fiscal_year_end = tm.fiscal_year_end AND
  t.date = tm.max_date WHERE t.company_id = ?", company.id, company.id]
Note the last where part to make sure the statements belong to a specific company instance, and that this is called from the class. I haven't tested this with the array form, but I believe you can turn this into a scope and use it like this:
# In Statement model
scope :latest_from_fiscal_year, ->(enterprise_id) {
  find_by_sql([..., enterprise_id]) # Query above
}
# Wherever you need these statements for a particular company
company = Company.find(params[:id])
latest_statements = Statement.latest_from_fiscal_year(company.id)
Note that if you somehow need all the latest statements for all companies, this will most likely leave you with an N+1 queries problem. But that is a beast for another day.
Note: if anyone else has a way to make this query work on the association without the last WHERE part (company.statements.latest_from_year and such), let me know and I'll edit this; in my case, in Rails 3, it just pulled them from the whole table without filtering.

How to write query in active record to select from two or more tables in rails 3

I don't want to use join
I want to manually compare any field with other table field
for example
SELECT u.user_id, t.task_id
FROM tasks t, users u
WHERE u.user_id = t.user_id
How can I write this query in Rails?
Assuming you have associations in your models, you can simply do as follows:
User.joins(:tasks).select('users.user_id, tasks.task_id')
You can also do it as follows:
User.includes(:tasks).references(:tasks).where("users.id = tasks.user_id")
includes does eager loading (references makes it load everything in one JOINed query, which a string condition on tasks needs). Check the example below or read more about eager loading.
users = User.limit(10)
users.each do |user|
  puts user.address.postcode
end
This will run 11 queries: one to load the users and then one per user to load its address. This is called the N+1 query problem. With includes, Active Record ensures that all of the specified associations are loaded using the minimum possible number of queries.
Now when you do:
users = User.includes(:address).limit(10)
users.each do |user|
  puts user.address.postcode
end
it will generate just 2 queries:
SELECT * FROM users LIMIT 10
SELECT addresses.* FROM addresses
WHERE (addresses.user_id IN (1,2,3,4,5,6,7,8,9,10))
If you don't have associations yet, read on.
You should take a look at http://guides.rubyonrails.org/association_basics.html
Assuming you are trying to do an inner join: by default in Rails, when we associate two models and then query them with joins, we are doing an inner join on those tables.
You have to create associations between the models; an example is given below:
class User < ActiveRecord::Base
  has_many :reservations
  # ... your code
end
And in reservations:
class Reservation < ActiveRecord::Base
  belongs_to :user
  # ... your code
end
Now when you do
User.joins(:reservations)
the generated query would look like as follow
"SELECT `users`.* FROM `users` INNER JOIN `reservations` ON `reservations`.`user_id` = `users`.`id`"
You can check the query by running User.joins(:reservations).to_sql in the Rails console.
Hopefully this answers your question.
User.find_by_sql("YOUR SQL QUERY HERE")
You can use it as follows:
User.includes(:tasks).references(:tasks).where("users.id = tasks.user_id").order("users.id")

Sequel -- How To Construct This Query?

I have a users table, which has a one-to-many relationship with a user_purchases table via the foreign key user_id. That is, each user can make many purchases (or may have none, in which case he will have no entries in the user_purchases table).
user_purchases has only one other field that is of interest here, which is purchase_date.
I am trying to write a Sequel ORM statement that will return a dataset with the following columns:
user_id
date of the user's SECOND purchase, if it exists
So users who have not made at least 2 purchases will not appear in this dataset. What is the best way to write this Sequel statement?
Please note I am looking for a dataset with ALL users returned who have >= 2 purchases
Thanks!
EDIT FOR CLARITY
Here is a similar statement I wrote to get users and their first purchase date (as opposed to 2nd purchase date, which I am asking for help with in the current post):
DB[:users].join(:user_purchases, :user_id => :id)
.select{[:user_id, min(:purchase_date)]}
.group(:user_id)
You don't seem to be worried about the dates, just the counts so
DB[:user_purchases].group_and_count(:user_id).having(Sequel[:count] > 1).all
will return a list of user_ids and counts where the count (of purchases) is >= 2. Something like
[{:count=>2, :user_id=>1}, {:count=>7, :user_id=>2}, {:count=>2, :user_id=>3}, ...]
If you want to get the users with that, the easiest way with Sequel is probably to extract just the list of user_ids and feed that back into another query:
DB[:users].where(:id => DB[:user_purchases].group_and_count(:user_id).
  having(Sequel[:count] > 1).all.map{|row| row[:user_id]}).all
Edit:
I felt like there should be a more succinct way and then I saw this answer (from Sequel author Jeremy Evans) to another question using select_group and select_more : https://stackoverflow.com/a/10886982/131226
This should do it without the subselect:
DB[:users].
  left_join(:user_purchases, :user_id=>:id).
  select_group(:id).
  select_more{count(:purchase_date).as(:purchase_count)}.
  having(Sequel[:purchase_count] > 1)
It generates this SQL:
SELECT `id`, count(`purchase_date`) AS 'purchase_count'
FROM `users` LEFT JOIN `user_purchases`
ON (`user_purchases`.`user_id` = `users`.`id`)
GROUP BY `id` HAVING (`purchase_count` > 1)
Generally, this could be the SQL query that you need:
SELECT u.id, up1.purchase_date FROM users u
LEFT JOIN user_purchases up1 ON u.id = up1.user_id
LEFT JOIN user_purchases up2 ON u.id = up2.user_id AND up2.purchase_date < up1.purchase_date
GROUP BY u.id, up1.purchase_date
HAVING COUNT(up2.purchase_date) = 1;
Try converting that to sequel, if you don't get any better answers.
The date of the user's second purchase would be the second row retrieved if you do an order_by(:purchase_date) as part of your query.
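To access that, do a limit(2) to constrain the query to two results, then take the second one: index 1 of the array returned by all, which is nil when the user has fewer than two purchases (Dataset#[] looks rows up by conditions, not by position, so [-1] on the dataset itself won't work). So, if you're not using models and are working with datasets only, and know the user_id you're interested in, your (untested) query would be:
DB[:user_purchases].where(:user_id => user_id).order_by(Sequel[:user_purchases][:purchase_date]).limit(2).all[1]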
Here's some output from Sequel's console:
DB[:user_purchases].where(:user_id => 1).order_by(:purchase_date).limit(2).sql
=> "SELECT * FROM user_purchases WHERE (user_id = 1) ORDER BY purchase_date LIMIT 2"
Add the appropriate select clause:
.select(:user_id, :purchase_date)
and you should be done:
DB[:user_purchases].select(:user_id, :purchase_date).where(:user_id => 1).order_by(:purchase_date).limit(2).sql
=> "SELECT user_id, purchase_date FROM user_purchases WHERE (user_id = 1) ORDER BY purchase_date LIMIT 2"

Merge 2 relations on OR instead of AND

I have these two pieces of code that each return a relation inside the Micropost model.
scope :including_replies, lambda { |user| where("microposts.in_reply_to = ?", user.id)}
def self.from_users_followed_by(user)
followed_user_ids = user.followed_user_ids
where("user_id IN (?) OR user_id = ?", followed_user_ids, user)
end
When I run r1 = Micropost.including_replies(user) I get a relation with two results with the following SQL:
SELECT `microposts`.* FROM `microposts` WHERE (microposts.in_reply_to = 102) ORDER BY
microposts.created_at DESC
When I run r2 = Micropost.from_users_followed_by(user) I get a relation with one result with the following SQL:
SELECT `microposts`.* FROM `microposts` WHERE (user_id IN (NULL) OR user_id = 102) ORDER
BY microposts.created_at DESC
Now when I merge the relations like so r3 = r1.merge(r2) I got zero results but was expecting three. The reason for this is that the SQL looks like this:
SELECT `microposts`.* FROM `microposts` WHERE (microposts.in_reply_to = 102) AND
(user_id IN (NULL) OR user_id = 102) ORDER BY microposts.created_at DESC
Now what I need is (microposts.in_reply_to = 102) OR (user_id IN (NULL) OR user_id = 102)
I need an OR instead of an AND in the merged relation.
Is there a way to do this?
Not directly with Rails. Rails does not expose any way to merge ActiveRelation (scoped) objects with OR. The reason is that ActiveRelation may contain not only conditions (what is described in the WHERE clause), but also joins and other SQL clauses for which merging with OR is not well-defined.
You can do this either with Arel directly (which ActiveRelation is built on top of), or you can use Squeel, which exposes Arel functionality through a DSL (which may be more convenient). With Squeel, it is still relevant that ActiveRelations cannot be merged. However Squeel also provides Sifters, which represent conditions (without any other SQL clauses), which you can use. It would involve rewriting the scopes as sifters though.
