How to select max(date) and group by client_id? - ruby-on-rails

So, I got this working the way I want in pure sql:
select * from clients c
join insurance_providers p on c.id = p.client_id
where p.effective_on =
(select max(effective_on)
from insurance_providers group by client_id having p.client_id = client_id)
and user_id = 2; #user_id =2 where 2 represents current_user.id
Here's what I tried in the console:
Client.joins(:insurance_providers)
.select('max(insurance_providers.effective_on)')
.group(:client_id)
.where('user_id = 2')
It promptly exploded in my face with:
NoMethodError: undefined method `group' for Mon, 08 Jul 2013:Date
It looks like I'm just getting the date itself returned from the select statement. I need something like "where effective_on = .select('max..."
any help would be greatly appreciated.
UPDATE: I'm getting closer with this:
InsuranceProvider.maximum(:effective_on, :group => 'client_id')
but I'm not sure how to join the clients table in to get all the info I need.
In the rails console, both of these:
Client.joins(:insurance_providers).maximum(:effective_on, :group => 'client_id')
Client.joins(:insurance_providers.maximum(:effective_on, :group => 'client_id'))
cause this error:
NoMethodError: undefined method `group' for Mon, 08 Jul 2013:Date
UPDATE:
this is closer, but I need a having clause on the end - just not sure how to tie the inner table and outer table together (like in the sql: p.client_id = client_id):
insuranceProvider = InsuranceProvider.where("effective_on = (SELECT MAX(effective_on) FROM insurance_providers group by client_id)")
InsuranceProvider Load (1.0ms) SELECT "insurance_providers".* FROM "insurance_providers" WHERE (effective_on = (SELECT MAX(effective_on) FROM insurance_providers group by client_id))
PG::CardinalityViolation: ERROR: more than one row returned by a subquery used as an expression
: SELECT "insurance_providers".* FROM "insurance_providers" WHERE (effective_on = (SELECT MAX(effective_on) FROM insurance_providers group by client_id))
ActiveRecord::StatementInvalid: PG::CardinalityViolation: ERROR: more than one row returned by a subquery used as an expression
: SELECT "insurance_providers".* FROM "insurance_providers" WHERE (effective_on = (SELECT MAX(effective_on) FROM insurance_providers group by client_id))
UPDATE: here's some progress.
This seems to be what I need, but I don't know how to join it to the clients table:
InsuranceProvider.where("effective_on = (SELECT MAX(effective_on) FROM insurance_providers p group by client_id having p.client_id = insurance_providers.client_id)")
This gives me the insurance_providers grouped by the client_id. I need to join to this resultset on the client_id.
This does NOT work:
InsuranceProvider.where("effective_on = (SELECT MAX(effective_on) FROM insurance_providers p group by client_id having p.client_id = insurance_providers.client_id)").client
resulting in a "NoMethodError":
undefined method `client' for #<ActiveRecord::Relation::ActiveRecord_Relation_InsuranceProvider:0x007fe987725790>
UPDATE:
This is getting me the clients I need!
Client.joins(:insurance_providers).where("insurance_providers.effective_on = (SELECT MAX(effective_on) FROM insurance_providers p group by client_id having p.client_id = insurance_providers.client_id)")
But I can't get to the insurance_providers table. UGH! This is getting stickier....
UPDATE:
client.insurance_providers.order('effective_on DESC').first.copay
sometimes taking a break is all you need.

So, in my controller, I have this:
@clients = Client.joins(:insurance_providers)
.where("insurance_providers.effective_on = (
SELECT MAX(effective_on)
FROM insurance_providers p
GROUP BY client_id
HAVING p.client_id = insurance_providers.client_id
)")
Then in my view, I have this:
client.insurance_providers.order('effective_on DESC').first.copay
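To make the intent of the correlated subquery concrete, here is a minimal plain-Ruby sketch of the same "latest row per client" selection. It runs on in-memory structs instead of ActiveRecord; `ClientRow`, `ProviderRow`, and the sample data are hypothetical stand-ins, not part of the original application.

```ruby
require 'date'

# Hypothetical in-memory stand-ins for rows of the two tables.
ClientRow = Struct.new(:id, :name, :user_id)
ProviderRow = Struct.new(:client_id, :effective_on, :copay)

clients = [
  ClientRow.new(1, 'Ann', 2),
  ClientRow.new(2, 'Bob', 2),
  ClientRow.new(3, 'Cid', 7)
]
providers = [
  ProviderRow.new(1, Date.new(2013, 1, 1), 10),
  ProviderRow.new(1, Date.new(2013, 7, 8), 25),
  ProviderRow.new(2, Date.new(2012, 5, 1), 15),
  ProviderRow.new(3, Date.new(2013, 3, 3), 30)
]

# For each client_id keep only the row with the greatest effective_on,
# which is what the correlated MAX(effective_on) subquery selects.
latest_by_client = providers
  .group_by(&:client_id)
  .transform_values { |rows| rows.max_by(&:effective_on) }

# Mirror "user_id = 2": restrict to one user's clients, then pair each
# client with the copay of its latest provider row.
result = clients
  .select { |client| client.user_id == 2 }
  .map { |client| [client.name, latest_by_client[client.id].copay] }

p result
```

In the database the same work happens in SQL; the sketch is only meant to show which rows survive the filter.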

I'm not sure why you were getting the undefined method 'group' error with the select in your first query, it shouldn't return anything at that point until you attempt to use one of the fields explicitly, or call .load or .first, etc.
Maximum is not the way to go either, because that WILL immediately return the data, and it's basically the same as saying in SQL "SELECT MAX(effective_on) FROM insurance_providers", which is obviously not what you want.
To answer your question directly, and to summarize what you're looking to do: I believe you're trying to find the Insurance Provider with the highest effective_on date, and then view the client associated with that provider. Please correct me if I am mistaken.
I think this will work:
insuranceProvider =
InsuranceProvider.where("effective_on = (SELECT MAX(effective_on) FROM insurance_providers)")
.group(:client_id)
You shouldn't need anything else beyond that. Notice how I'm invoking the query on the InsuranceProvider model instead. Remember that with Rails, you can easily get a model object's associated records using your has_many and belongs_to relationship descriptors.
Therefore, in order to get the associated Client model information, it is important that your InsuranceProvider class has a line that looks like belongs_to :client. This is required in this case, otherwise Rails doesn't know that this model relates to anything else. If you have that in there, then in order to get the Client information, all you simply need to do is
client = insuranceProvider.client
This will result in a second query which lazy-loads the client information for that insurance provider, and you're good to go.
EDIT
Based on the discussion in the comments, my original solution is not quite what you're looking for (and syntactically invalid for non-MySQL databases).
I answered a similar question here once that is somewhat related to this, so maybe that information could be helpful.
Basically, what I think you'll need to do is grab your list of Clients, or grab your single client object, whatever you need to do, and invoke the association with a .where clause, like so:
client = Client.first # this is for illustration so that we have a client object to work with
insurance_provider =
client.insurance_providers
.where("effective_on = (SELECT MAX(effective_on) FROM insurance_providers WHERE client_id = ?)", client.id)
Or, if you want to avoid having to inject the client.id manually, you can cause a dependent subquery, like so:
insurance_provider =
client.insurance_providers
.joins(:client)
.where("effective_on = (SELECT MAX(effective_on) FROM insurance_providers WHERE client_id = clients.id)")

Related

How to get a most recent value group by year by using SQL

I have a Company model that has_many Statement.
class Company < ActiveRecord::Base
has_many :statements
end
I want to get the statements that have the latest date field within each fiscal_year_end group.
I implemented the function like this:
c = Company.first
c.statements.to_a.group_by{|s| s.fiscal_year_end }.map{|k,v| v.max_by(&:date) }
It works OK, but if possible I want to use an ActiveRecord query (SQL), so that I don't need to load unnecessary instances into memory.
How can I write it by using SQL?
select t.username, t.date, t.value
from MyTable t
inner join (
select username, max(date) as MaxDate
from MyTable
group by username
) tm on t.username = tm.username and t.date = tm.MaxDate
For these kinds of things, I find it helpful to get the raw SQL working first, and then translate it into ActiveRecord afterwards. It sounds like a textbook case of GROUP BY:
SELECT fiscal_year_end, MAX(date) AS max_date
FROM statements
WHERE company_id = 1
GROUP BY fiscal_year_end
Now you can express that in ActiveRecord like so:
c = Company.first
c.statements.
group(:fiscal_year_end).
order(nil). # might not be necessary, depending on your association and Rails version
select("fiscal_year_end, MAX(date) AS max_date")
The reason for order(nil) is to prevent ActiveRecord from adding ORDER BY id to the query. Rails 4+ does this automatically. Since you aren't grouping by id, it will cause the error you're seeing. You could also order(:fiscal_year_end) if that is what you want.
That will give you a bunch of Statement objects. They will be read-only, and every attribute will be nil except for fiscal_year_end and the magically-present new field max_date. These instances don't represent specific statements, but statement "groups" from your query. So you can do something like this:
- @statements_by_fiscal_year_end.each do |s|
%tr
%td= s.fiscal_year_end
%td= s.max_date
Note there is no n+1 query problem here, because you fetched everything you need in one query.
If you decide that you need more than just the max date, e.g. you want the whole statement with the latest date, then you should look at your options for the greatest n per group problem. For raw SQL I like LATERAL JOIN, but the easiest approach to use with ActiveRecord is DISTINCT ON.
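The difference between "just the max date per group" and "the whole row with the max date per group" can be sketched in plain Ruby. This is an in-memory illustration only; `StatementRow` and its sample data are hypothetical, and no database or ActiveRecord is involved.

```ruby
require 'date'

# Hypothetical stand-in for rows of the statements table.
StatementRow = Struct.new(:id, :fiscal_year_end, :date)

statements = [
  StatementRow.new(1, 2012, Date.new(2013, 1, 10)),
  StatementRow.new(2, 2012, Date.new(2013, 2, 20)),
  StatementRow.new(3, 2013, Date.new(2014, 1, 5))
]

groups = statements.group_by(&:fiscal_year_end)

# Like GROUP BY fiscal_year_end with MAX(date): one date per group,
# with no link back to a specific statement row.
max_dates = groups.transform_values { |rows| rows.map(&:date).max }

# Greatest-n-per-group with n = 1: the whole row that holds that date.
latest_rows = groups.values.map { |rows| rows.max_by(&:date) }

p max_dates
p latest_rows.map(&:id)
```

The first result only carries the grouping key and the aggregate, which is why the ActiveRecord objects from the GROUP BY query have nil for every other attribute.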
Oh one more tip: For debugging weird errors, I find it helpful to confirm what SQL ActiveRecord is trying to use. You can use to_sql to get that:
c = Company.first
puts c.statements.
group(:fiscal_year_end).
select("fiscal_year_end, MAX(date) AS max_date").
to_sql
In that example, I'm leaving off order(nil) so you can see that ActiveRecord is adding an ORDER BY clause you don't want.
For example, if you want to group statements by the start of each month, you could use this:
@company = Company.first
@statements = @company.statements.find(:all, :order => 'due_at, id', :limit => 50)
Then group them as you want:
@monthly_statements = @statements.group_by { |statement| statement.due_at.beginning_of_month }
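As a rough illustration of grouping by month start, here is a plain-Ruby sketch that runs without Rails. `DueStatement` and `first_of_month` are hypothetical names; `first_of_month` stands in for ActiveSupport's `beginning_of_month`, which is not available in plain Ruby.

```ruby
require 'date'

# Hypothetical stand-in for statement records with a due_at date.
DueStatement = Struct.new(:id, :due_at)

statements = [
  DueStatement.new(1, Date.new(2013, 7, 8)),
  DueStatement.new(2, Date.new(2013, 7, 21)),
  DueStatement.new(3, Date.new(2013, 8, 2))
]

# Plain-Ruby replacement for ActiveSupport's Date#beginning_of_month.
def first_of_month(date)
  Date.new(date.year, date.month, 1)
end

# Bucket statements by the first day of the month they are due in.
monthly = statements.group_by { |statement| first_of_month(statement.due_at) }
ids_by_month = monthly.transform_values { |rows| rows.map(&:id) }

p ids_by_month
```

Note that this grouping happens in Ruby after the rows are loaded, so it does not reduce what is fetched from the database.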
Building upon Bharat's answer you can do this type of query in Rails using find_by_sql in this way:
Statement.find_by_sql ["SELECT t.* FROM statements t INNER JOIN (
SELECT fiscal_year_end, MAX(date) AS MaxDate FROM statements GROUP BY fiscal_year_end
) tm ON t.fiscal_year_end = tm.fiscal_year_end AND
t.date = tm.MaxDate WHERE t.company_id = ?", company.id]
Note the last where part to make sure the statements belong to a specific company instance, and that this is called from the class. I haven't tested this with the array form, but I believe you can turn this into a scope and use it like this:
# In Statement model
scope :latest_from_fiscal_year, lambda { |enterprise_id|
find_by_sql([..., enterprise_id]) # Query above
}
# Wherever you need these statements for a particular company
company = Company.find(params[:id])
latest_statements = Statement.latest_from_fiscal_year(company.id)
Note that if you somehow need all the latest statements for all companies, then this will most likely leave you with an N+1 queries problem. But that is a beast for another day.
Note: If anyone else has a way to have this query work on the association without using the last where part (company.statements.latest_from_year and such) let me know and I'll edit this, in my case in rails 3 it just pulled em from the whole table without filtering.

Sequel -- How To Construct This Query?

I have a users table, which has a one-to-many relationship with a user_purchases table via the foreign key user_id. That is, each user can make many purchases (or may have none, in which case he will have no entries in the user_purchases table).
user_purchases has only one other field that is of interest here, which is purchase_date.
I am trying to write a Sequel ORM statement that will return a dataset with the following columns:
user_id
date of the users SECOND purchase, if it exists
So users who have not made at least 2 purchases will not appear in this dataset. What is the best way to write this Sequel statement?
Please note I am looking for a dataset with ALL users returned who have >= 2 purchases
Thanks!
EDIT FOR CLARITY
Here is a similar statement I wrote to get users and their first purchase date (as opposed to 2nd purchase date, which I am asking for help with in the current post):
DB[:users].join(:user_purchases, :user_id => :id)
.select{[:user_id, min(:purchase_date)]}
.group(:user_id)
You don't seem to be worried about the dates, just the counts so
DB[:user_purchases].group_and_count(:user_id).having(:count > 1).all
will return a list of user_ids and counts where the count (of purchases) is >= 2. Something like
[{:count=>2, :user_id=>1}, {:count=>7, :user_id=>2}, {:count=>2, :user_id=>3}, ...]
If you want to get the users with that, the easiest way with Sequel is probably to extract just the list of user_ids and feed that back into another query:
DB[:users].where(:id => DB[:user_purchases].group_and_count(:user_id).
having(:count > 1).all.map{|row| row[:user_id]}).all
Edit:
I felt like there should be a more succinct way and then I saw this answer (from Sequel author Jeremy Evans) to another question using select_group and select_more : https://stackoverflow.com/a/10886982/131226
This should do it without the subselect:
DB[:users].
left_join(:user_purchases, :user_id=>:id).
select_group(:id).
select_more{count(:purchase_date).as(:purchase_count)}.
having(:purchase_count > 1)
It generates this SQL
SELECT `id`, count(`purchase_date`) AS 'purchase_count'
FROM `users` LEFT JOIN `user_purchases`
ON (`user_purchases`.`user_id` = `users`.`id`)
GROUP BY `id` HAVING (`purchase_count` > 1)
Generally, this could be the SQL query that you need:
SELECT u.id, up1.purchase_date FROM users u
LEFT JOIN user_purchases up1 ON u.id = up1.user_id
LEFT JOIN user_purchases up2 ON u.id = up2.user_id AND up2.purchase_date < up1.purchase_date
GROUP BY u.id, up1.purchase_date
HAVING COUNT(up2.purchase_date) = 1;
Try converting that to sequel, if you don't get any better answers.
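Before translating the SQL, it can help to pin down the expected result with a plain-Ruby sketch. The one below computes each user's second purchase date from in-memory hashes. The sample data is hypothetical, and the sketch assumes a user's purchase dates are distinct (ties would need a tie-breaking rule, in SQL as well).

```ruby
require 'date'

# Hypothetical rows from user_purchases.
purchases = [
  { user_id: 1, purchase_date: Date.new(2013, 1, 5) },
  { user_id: 1, purchase_date: Date.new(2013, 3, 9) },
  { user_id: 1, purchase_date: Date.new(2013, 2, 1) },
  { user_id: 2, purchase_date: Date.new(2013, 4, 4) },
  { user_id: 3, purchase_date: Date.new(2013, 6, 1) },
  { user_id: 3, purchase_date: Date.new(2013, 5, 1) }
]

# Group by user, sort each user's dates, and keep the second one.
# Users with fewer than two purchases are left out entirely.
second_purchases = purchases
  .group_by { |row| row[:user_id] }
  .each_with_object({}) do |(user_id, rows), out|
    dates = rows.map { |row| row[:purchase_date] }.sort
    out[user_id] = dates[1] if dates.size >= 2
  end

p second_purchases
```

A Sequel or SQL version can then be checked against this output on the same sample rows.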
The date of the user's second purchase would be the second row retrieved if you do an order_by(:purchase_date) as part of your query.
To access that, do a limit(2) to constrain the query to two results then take the [-1] (or last) one. So, if you're not using models and are working with datasets only, and know the user_id you're interested in, your (untested) query would be:
DB[:user_purchases].where(:user_id => user_id).order_by(:user_purchases__purchase_date).limit(2)[-1]
Here's some output from Sequel's console:
DB[:user_purchases].where(:user_id => 1).order_by(:purchase_date).limit(2).sql
=> "SELECT * FROM user_purchases WHERE (user_id = 1) ORDER BY purchase_date LIMIT 2"
Add the appropriate select clause:
.select(:user_id, :purchase_date)
and you should be done:
DB[:user_purchases].select(:user_id, :purchase_date).where(:user_id => 1).order_by(:purchase_date).limit(2).sql
=> "SELECT user_id, purchase_date FROM user_purchases WHERE (user_id = 1) ORDER BY purchase_date LIMIT 2"

Merge 2 relations on OR instead of AND

I have these two pieces of code that each return a relation inside the Micropost model.
scope :including_replies, lambda { |user| where("microposts.in_reply_to = ?", user.id)}
def self.from_users_followed_by(user)
followed_user_ids = user.followed_user_ids
where("user_id IN (?) OR user_id = ?", followed_user_ids, user)
end
When I run r1 = Micropost.including_replies(user) I get a relation with two results with the following SQL:
SELECT `microposts`.* FROM `microposts` WHERE (microposts.in_reply_to = 102) ORDER BY
microposts.created_at DESC
When I run r2 = Micropost.from_users_followed_by(user) I get a relation with one result with the following SQL:
SELECT `microposts`.* FROM `microposts` WHERE (user_id IN (NULL) OR user_id = 102) ORDER
BY microposts.created_at DESC
Now when I merge the relations like so r3 = r1.merge(r2) I got zero results but was expecting three. The reason for this is that the SQL looks like this:
SELECT `microposts`.* FROM `microposts` WHERE (microposts.in_reply_to = 102) AND
(user_id IN (NULL) OR user_id = 102) ORDER BY microposts.created_at DESC
Now what I need is (microposts.in_reply_to = 102) OR (user_id IN (NULL) OR user_id = 102)
I need an OR instead of an AND in the merged relation.
Is there a way to do this?
Not directly with Rails. Rails does not expose any way to merge ActiveRelation (scoped) objects with OR. The reason is that ActiveRelation may contain not only conditions (what is described in the WHERE clause), but also joins and other SQL clauses for which merging with OR is not well-defined.
You can do this either with Arel directly (which ActiveRelation is built on top of), or you can use Squeel, which exposes Arel functionality through a DSL (which may be more convenient). With Squeel, it is still relevant that ActiveRelations cannot be merged. However Squeel also provides Sifters, which represent conditions (without any other SQL clauses), which you can use. It would involve rewriting the scopes as sifters though.
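The gap between the AND-merged and the OR-merged conditions can be shown with a small plain-Ruby sketch. It filters in-memory structs rather than building SQL; `PostRow` and the sample posts are hypothetical, chosen to mirror the counts in the question (two replies, one own post).

```ruby
# Hypothetical stand-in for rows of the microposts table.
PostRow = Struct.new(:id, :user_id, :in_reply_to)

posts = [
  PostRow.new(1, 102, nil), # written by user 102
  PostRow.new(2, 55, 102),  # reply to user 102
  PostRow.new(3, 56, 102),  # reply to user 102
  PostRow.new(4, 57, nil)   # unrelated
]

user_id = 102
followed_user_ids = [] # the user follows nobody, as in "IN (NULL)"

# The two scope conditions, expressed as predicates.
replies   = ->(post) { post.in_reply_to == user_id }
from_feed = ->(post) { followed_user_ids.include?(post.user_id) || post.user_id == user_id }

# merge combines the conditions with AND; the desired result needs OR.
anded = posts.select { |post| replies.call(post) && from_feed.call(post) }
ored  = posts.select { |post| replies.call(post) || from_feed.call(post) }

p anded.map(&:id)
p ored.map(&:id)
```

Rails 5.0 and later added `ActiveRecord::Relation#or`, which postdates this answer and covers the simple case where both relations differ only in their WHERE conditions.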

How do I get Rails ActiveRecord to generate optimized SQL?

Let's say that I have 4 models which are related in the following ways:
Schedule has foreign key to Project
Schedule has foreign key to User
Project has foreign key to Client
In my Schedule#index view I want the most optimized SQL so that I can display links to the Schedule's associated Project, Client, and User. So, I should not pull all of the columns for the Project, Client, and User; only their IDs and Name.
If I were to manually write the SQL it might look like this:
select
s.id,
s.schedule_name,
s.schedule_type,
s.project_id,
p.name project_name,
p.client_id client_id,
c.name client_name,
s.user_id,
u.login user_login,
s.created_at,
s.updated_at,
s.data_count
from
Users u inner join
Clients c inner join
Schedules s inner join
Projects p
on p.id = s.project_id
on c.id = p.client_id
on u.id = s.user_id
order by
s.created_at desc
My question is: What would the ActiveRecord code look like to get Rails 3 to generate that SQL? For example, something like:
@schedules = Schedule. # ?
I already have the associations setup in the models (i.e. has_many / belongs_to).
I think this will build (or at least help) you get what you're looking for:
Schedule.select("schedules.id, schedules.schedule_name, projects.name as project_name").joins(:user, :project=>:client).order("schedules.created_at DESC")
should yield:
SELECT schedules.id, schedules.schedule_name, projects.name as project_name FROM `schedules` INNER JOIN `users` ON `users`.`id` = `schedules`.`user_id` INNER JOIN `projects` ON `projects`.`id` = `schedules`.`project_id` INNER JOIN `clients` ON `clients`.`id` = `projects`.`client_id`
The main problem I see in your approach is that you're looking for schedule objects but basing your initial "FROM" clause on "User" and your associations given are also on Schedule, so I built this solution based on the plain assumption that you want schedules!
I also didn't include all of your selects to save some typing, but you get the idea. You will simply have to add each one qualified with its full table name.

Ruby/Rails - Find Foreign Key With Most Instances in Model

I have a joined model that has a foreign key for the model event.
The joined model is called Goals. I'm trying to find the proper find condition to figure out which event_id has the most instances in the Goal join model. Essentially, which foreign key id has the most entries in the join model.
Is there a way to do this?
Goal.where(:event.id => ??????? ).first
Couldn't come up with a more elegant solution but try this:
results = Goal.connection.select_all('SELECT COUNT(*) as amount, event_id FROM goals GROUP BY event_id ORDER BY amount DESC LIMIT 0, xx')
raise results.inspect
If you just want the single event_id with the most entries, you can also use:
event_id = Goal.connection.select_one('SELECT COUNT(*) as amount, event_id FROM goals GROUP BY event_id ORDER BY amount DESC LIMIT 1')['event_id']
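What the GROUP BY / ORDER BY / LIMIT 1 query computes can be sketched in plain Ruby as a count per event_id followed by picking the largest. The list of ids below is hypothetical sample data standing in for the event_id column of the goals table.

```ruby
# Hypothetical event_id values, one per row of the goals table.
goal_event_ids = [3, 1, 3, 2, 3, 1]

# COUNT(*) ... GROUP BY event_id
counts = goal_event_ids.each_with_object(Hash.new(0)) do |event_id, acc|
  acc[event_id] += 1
end

# ORDER BY amount DESC LIMIT 1
top_event_id, amount = counts.max_by { |_event_id, n| n }

p counts
p [top_event_id, amount]
```

On a large table the SQL version is preferable, because the counting happens in the database instead of after loading every row.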
If you have set up your models correctly, you should be able to do this (if that is not the case, then set up your models correctly):
if Event.all.length == 0
return
end
eventMax = Event.first
Event.all.each do |e|
eventMax = e.goals.length > eventMax.goals.length ? e : eventMax
end
#output or do whatever with your newly found event
puts eventMax.to_json
The solution from Danny is not really good.
You should never (or at least very rarely) have to write SQL yourself in Rails.
