ARel: Add additional conditions to an outer join - ruby-on-rails

I have the following models in my Rails application:
class Shift < ActiveRecord::Base
has_many :schedules
scope :active, where(:active => true)
end
class Schedule < ActiveRecord::Base
belongs_to :shift
end
I wish to generate a collection of all active shifts and eager load any associated schedules that have occurs_on between two given dates. If a shift has no schedules between those dates, it should still be returned in the results.
Essentially, I want to generate SQL equivalent to:
SELECT shifts.*, schedules.*
FROM shifts
LEFT JOIN schedules ON schedules.shift_id = shifts.id
AND schedules.occurs_on BETWEEN '01/01/2012' AND '01/31/2012'
WHERE shifts.active = 't';
My first attempt was:
Shift.active.includes(:schedules).where("schedules.occurs_on BETWEEN '01/01/2012' AND '01/31/2012'")
The problem is that the occurs_on filtering is done in the where clause, and not in the join. If a shift has no schedules in that period, it is not returned at all.
My second attempt was to use the joins method, but this does an inner join. Again, this will drop all shifts that have no schedules for that period.
I'm frustrated because I know the SQL I want AREL to generate, but I can't figure out how to express it with the API. Anyone?
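To make the difference between the two placements concrete, here is a plain-Ruby sketch that imitates a LEFT JOIN in memory. It does not touch a database; the Struct definitions, the sample rows and the left_join helper are all made up for illustration.

```ruby
require 'date'

# Hypothetical in-memory stand-ins for the shifts and schedules tables.
Shift    = Struct.new(:id, :active)
Schedule = Struct.new(:shift_id, :occurs_on)

shifts = [Shift.new(1, true), Shift.new(2, true), Shift.new(3, false)]
schedules = [
  Schedule.new(1, Date.new(2012, 1, 10)), # inside the range
  Schedule.new(2, Date.new(2012, 3, 5))   # outside the range
]
range = Date.new(2012, 1, 1)..Date.new(2012, 1, 31)

# LEFT JOIN: every shift is kept; a shift with no matching schedule is
# paired with nil. The block plays the role of the extra ON condition.
def left_join(shifts, schedules)
  shifts.flat_map do |shift|
    matches = schedules.select { |s| s.shift_id == shift.id && yield(s) }
    matches.empty? ? [[shift, nil]] : matches.map { |s| [shift, s] }
  end
end

# Date condition inside ON, WHERE only checks shifts.active.
on_rows = left_join(shifts, schedules) { |s| range.cover?(s.occurs_on) }
in_on_ids = on_rows.select { |shift, _| shift.active }
                   .map { |shift, _| shift.id }.uniq

# Date condition in WHERE: rows whose schedule is nil or out of range are dropped.
where_rows = left_join(shifts, schedules) { |_s| true }
in_where_ids = where_rows
  .select { |shift, sched| shift.active && sched && range.cover?(sched.occurs_on) }
  .map { |shift, _| shift.id }.uniq

# in_on_ids    is [1, 2]  (shift 2 survives with no schedule attached)
# in_where_ids is [1]     (shift 2 is filtered out entirely)
```

Shift 2 is active but has no schedule in January, so it only survives when the date condition is part of the join.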

You could try some fairly raw Arel. Disclaimer: I didn't have actual Schedule and Shift classes, so I couldn't test this properly, but I used some existing tables to troubleshoot it on my own machine.
on = Arel::Nodes::On.new(
Arel::Nodes::Equality.new(Schedule.arel_table[:shift_id], Shift.arel_table[:id]).\
and(Arel::Nodes::Between.new(
Schedule.arel_table[:occurs_on],
Arel::Nodes::And.new(2.days.ago, Time.now)
))
)
join = Arel::Nodes::OuterJoin.new(Schedule.arel_table, on)
Shift.joins(join).where(active: true).to_sql

You can use a SQL fragment as the argument of your joins method call:
Shift.active.joins('LEFT OUTER JOIN schedules ON schedules.occurs_on...')

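You can construct a raw SQL query using Arel as follows:
# @start_date and @end_date are assumed to hold the bounds of the date range
@shift = Shift.arel_table
@schedule = Schedule.arel_table
# Pass Arel::Nodes::OuterJoin; a bare join(...) would produce an INNER JOIN
@shift.join(@schedule, Arel::Nodes::OuterJoin)
.on(@schedule[:shift_id].eq(@shift[:id])
.and(@schedule[:occurs_on].between(@start_date..@end_date)))
.to_sql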

Related

Count number of associations with a status in Ruby on Rails

I have a model named Project, and a Project has many Tasks.
A Task can have one of 3 different statuses (stored as an integer).
I want to get a list of Projects with counts of the associated Tasks in status 1, 2 and 3.
The best I can come up with is a method on Project:
def open_tasks
self.tasks.where(:status => 1).count
end
But this issues another SQL query for each count, which performs very badly when loading 100 projects.
Is there a way to get it all in one SQL statement?
I can think of a couple of ways to do this...
(It's not a single sql statement but two, still quite performant though)...
Task.where(status: 1).group(:project_id).count
will give you a hash where the keys are project ids and the values are the task counts. You can then combine this with the list of projects.
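As a rough sketch of the "combine" step: grouping by two columns, e.g. Task.group(:project_id, :status).count, returns a hash keyed by [project_id, status] pairs, so all three statuses can come from one query. The hash and the id list below are hypothetical sample data standing in for the query results, and the reshaping is plain Ruby.

```ruby
# Hypothetical result of Task.group(:project_id, :status).count
grouped = { [1, 1] => 4, [1, 2] => 1, [3, 1] => 2, [3, 3] => 5 }

# Hypothetical list of project ids, e.g. from Project.pluck(:id)
project_ids = [1, 2, 3]

# Build { project_id => { status => count } }, defaulting missing pairs to 0
# so projects without tasks in a given status still get an entry.
counts = project_ids.map do |id|
  per_status = [1, 2, 3].map { |status| [status, grouped.fetch([id, status], 0)] }.to_h
  [id, per_status]
end.to_h

# counts[1] is { 1 => 4, 2 => 1, 3 => 0 }
# counts[2] is { 1 => 0, 2 => 0, 3 => 0 }
```

Using fetch with a default of 0 matters here because the grouped query simply omits combinations that have no rows.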
You can use the ActiveRecord counter_cache to store a task count on each project record; ActiveRecord will update it for you automatically. I believe you will need to add an association to the project model like this:
# app/models/project.rb
# needs to include a column called open_task_count
class Project < ActiveRecord::Base
# the scope lambda has to come before the options hash
has_many :open_tasks, -> { where status: 1 }, class_name: 'Task'
end
class Task < ActiveRecord::Base
# note: counter_cache: true maintains projects.tasks_count for all tasks, regardless of status
belongs_to :project, counter_cache: true
end
Project.select(
'projects.*',
'(SELECT COUNT(tasks.*) FROM tasks WHERE tasks.project_id = projects.id AND tasks.status = 0) AS status_0_count',
'(SELECT COUNT(tasks.*) FROM tasks WHERE tasks.project_id = projects.id AND tasks.status = 1) AS status_1_count'
).left_joins(:tasks)
Although there are more elegant ways (like lateral joins and CTEs), subqueries work on most DBs. If status is an ActiveRecord::Enum, you can construct the subqueries by looping over the enum mapping:
class Project < ApplicationRecord
has_many :tasks
def self.with_task_counts
# constructs an array of SQL strings
statuses = Task.statuses.map do |key, int|
sql = Task.select('COUNT(*)')
.where('tasks.project_id = projects.id')
.where(status: key)
.to_sql
"(#{sql}) AS #{key}_tasks_count"
end
select(
'projects.*',
*statuses # * turns the array into a list of args
).left_joins(:tasks)
end
end
In Rails 4 you can still do a LEFT OUTER JOIN by using a SQL string:
class Project
def self.left_joins_tasks(*args)
deprecator = ActiveSupport::Deprecation.new("5.0", "MyApp")
# first argument is the deprecated method name, second is the hint shown to the caller
deprecator.deprecation_warning(:left_joins_tasks, "use `.left_joins(:tasks)` instead")
joins('LEFT OUTER JOIN tasks ON tasks.project_id = projects.id')
end
end
Using .joins works as well but gives an INNER join so rows with no tasks are filtered out. You can also use .includes.
I ended up using the counter_culture gem.
https://github.com/magnusvk/counter_culture

SQL not working for pg

I'm trying to use SQL to get information from a Postgres database using Rails.
This is what I've tried:
Select starts_at, ends_at, hours, employee.maxname, workorder.wonum from events where starts_at>'2018-03-14'
inner join employees on events.employee_id = employees.id
inner join workorders on events.workorder_id = workorders.id;
I get the following error:
ERROR: syntax error at or near "inner"
LINE 2: inner join employees on events.employee_id = employees.id
Sami's comment is correct, but since this question is tagged with ruby-on-rails you can try to use ActiveRecord's API to do the same:
Make sure that your model relations are defined:
class Event < ActiveRecord::Base
belongs_to :employee
belongs_to :workorder
end
And then you can do something like:
Event
.where('starts_at > ?', '2018-03-14')
.joins(:employee, :workorder)
or
Event
.joins(:employee, :workorder)
.where('starts_at > ?', '2018-03-14')
And you don't need to worry which one goes first.
In general, it's suboptimal to create the SQL queries in rails if you don't absolutely need to because they're harder to maintain.
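Your query should look like this. The WHERE clause has to come after the JOINs, and the selected columns must use the joined table names (employees, workorders):
select starts_at, ends_at, hours, employees.maxname, workorders.wonum
from events
inner join employees on events.employee_id = employees.id
inner join workorders on events.workorder_id = workorders.id
where starts_at > '2018-03-14';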

Increase performance: avoid looking for the right element in a collection

I have this situation.
activity.rb
belongs_to :user
belongs_to :cause
belongs_to :sub_cause
belongs_to :client
def amount
duration / 60.0 * user.hourly_cost_by_year(date.year).amount rescue 0
end
user.rb
has_many :hourly_costs # one hourly_cost for year
has_many :activities
def hourly_cost_by_year(year = Date.today.year)
hourly_costs.find { |hc| hc.year == year }
end
hourly_cost.rb
belongs_to :user
I have a big report where I achieved good performance (the number of SQL queries is fixed) but I think I could do better. The query I use is
activities = Activity.includes(:client, :cause, :sub_cause, user: :hourly_costs)
And this is OK, it's fast, but I think it can be improved because of the hourly_cost_by_year method. I mean, an activity has a date, and I can use that date to know which of those hourly costs I should use. Something like this in Activity:
def self.user_with_single_hourly_cost
joins('LEFT JOIN users u ON u.id = activities.user_id').
joins('LEFT JOIN hourly_costs hc ON hc.user_id = u.id AND hc.year = EXTRACT(year from activities.date)')
end
But I don't know how to integrate this into my query. Whatever I tried did not work. I could use raw SQL, but I'm trying to use ActiveRecord. I even thought of using Redis to cache every hourly cost by user and year, which could work, but I think this query, with the EXTRACT part, should do the best job because I'd have a flat table.
Update: let me try to clarify. Whatever query I use in my action, at some point I have to do
activities.sum(&:amount)
and that method, as you know, is
def amount
duration / 60.0 * user.hourly_cost_by_year(date.year).amount rescue 0
end
And I don't know how to pick the hourly_cost I want directly, without searching through hourly_costs. Is this possible?
You may consider using Arel for this. Arel is the underlying query assembler for rails/activerecord (so no new dependencies) and can be very useful when building complex queries because it offers far more depth than the high level ActiveRecord::QueryMethods.
Obviously with a broader API comes more verbosity (which actually adds quite a bit to the readability) and less syntactical sugar which takes some getting used to but has proven indispensable for me on multiple occasions.
While I did not take the time to recreate your data structure something like this may work for you
activities = Activity.arel_table
users = User.arel_table
hourly_costs = HourlyCost.arel_table
activity_users_hourly_cost = activities
.join(users,Arel::Nodes::OuterJoin)
.on(activities[:user_id].eq(users[:id]))
.join(hourly_costs,Arel::Nodes::OuterJoin)
.on(hourly_costs[:user_id].eq(users[:id])
.and(hourly_costs[:year].eq(Arel::Nodes::Extract.new(activities[:date],'year'))
)
)
Activity.includes(:client, :cause, :sub_cause).joins(activity_users_hourly_cost.join_sources)
This will add the requested join e.g.
activity_users_hourly_cost.to_sql
#=> SELECT
FROM [activities]
LEFT OUTER JOIN [users] ON [activities].[user_id] = [users].[id]
LEFT OUTER JOIN [hourly_costs] ON [hourly_costs].[user_id] = [users].[id]
AND [hourly_costs].[year] = EXTRACT(YEAR FROM [activities].[date])
Update
If you just want to add the "hourly_cost" this should work for you
Activity.includes(:client, :cause, :sub_cause)
.joins(activity_users_hourly_cost.join_sources)
.select("activities.*, activities.duration / 60.0 * ISNULL([hourly_costs].[amount],0) as hourly_cost_by_year")
Please note that this will only return Activity objects but they will now have a method called hourly_cost_by_year which will return the result of that calculation. Full SQL will look like
SELECT
[activities].*,
activities.duration / 60.0 * ISNULL([hourly_costs].[amount],0) as hourly_cost_by_year
FROM [activities]
-- Dependent upon WHERE clause
LEFT OUTER JOIN causes ON [activities].[cause_id] = [causes].[id]
LEFT OUTER JOIN sub_causes ON [activities].[sub_cause_id] = [sub_causes].[id]
LEFT OUTER JOIN clients ON [activities].[client_id] = [clients].[id]
--
LEFT OUTER JOIN [users] ON [activities].[user_id] = [users].[id]
LEFT OUTER JOIN [hourly_costs] ON [hourly_costs].[user_id] = [users].[id]
AND [hourly_costs].[year] = EXTRACT(YEAR FROM [activities].[date])
You could build the select portion in Arel too if you like but seems overkill for such a simple statement.
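If you would rather keep the eager-loaded hourly_costs and only avoid the linear search, a different and simpler technique than the join above is to index the costs by year once per user. The sketch below is plain Ruby with hypothetical stand-in classes (HourlyCost as a Struct, a bare User); in a Rails model the same idea could probably be written with hourly_costs.index_by(&:year).

```ruby
# Hypothetical stand-ins for the ActiveRecord models.
HourlyCost = Struct.new(:year, :amount)

class User
  def initialize(hourly_costs)
    @hourly_costs = hourly_costs
  end

  # Build the year => cost lookup once, then answer each call with a
  # hash access instead of scanning the whole collection.
  def hourly_cost_by_year(year)
    @hourly_costs_by_year ||= @hourly_costs.map { |hc| [hc.year, hc] }.to_h
    @hourly_costs_by_year[year]
  end
end

user = User.new([HourlyCost.new(2014, 30.0), HourlyCost.new(2015, 32.5)])

# Same formula as Activity#amount, with an explicit nil check
# in place of the blanket `rescue 0`.
duration = 90 # minutes
cost = user.hourly_cost_by_year(2015)
amount = cost ? duration / 60.0 * cost.amount : 0

missing = user.hourly_cost_by_year(1999) # nil when no cost exists for that year
```

This keeps the number of SQL queries unchanged; it only replaces the per-activity find with a hash lookup.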

ActiveRecord sort model on attribute of last has_many relation

I've been digging around for this for awhile... I can't find a graceful solution. I have loans and loans has_many :decisions. decisions has an attribute that I care about, called risk_rating.
I'd like to sort loans based on the most recent decision (based on created_at, per usual), but by the risk_rating.
Loan.includes(:decisions).references(:decisions).order('decisions.risk_rating DESC') doesn't work...
I want loans... sorted by their most recent decision's risk_rating. This seems like it should be easier than it is.
I'm currently doing this outside of the database like this, but it's chewing up time and memory:
Loan.all.sort do |x,y|
x.decisions.last.try(:risk_rating).to_f <=> y.decisions.last.try(:risk_rating).to_f
end
I'd like to show the performance I'm getting with the proposed answer, along with an inaccuracy...
Benchmark.bm do |x|
x.report{ Loan.joins('LEFT JOIN decisions ON decisions.loan_id = loans.id').group('loans.id').order('MAX(decisions.risk_rating) DESC').limit(10).map{|l| l.decisions.last.try(:risk_rating)} }
end
user system total real
0.020000 0.000000 0.020000 ( 20.573096)
=> [0.936775, 0.934465, 0.932088, 0.922352, 0.921882, 0.794724, 0.919432, 0.918385, 0.916952, 0.914938]
The order isn't right. That 0.794724 is out of place.
To that extent... I'm only seeing one attribute in the proposed answer. I don't see the connection =/
Alright, it looks like I'm working late tonight because I couldn't help but jump in:
class Loan < ApplicationRecord
has_many :decisions
has_one :latest_decision, -> { merge(Decision.latest) }, class_name: 'Decision'
end
class Decision < ApplicationRecord
belongs_to :loan
def self.latest
t1 = arel_table
t2 = arel_table.alias('t2')
# Self join based on `loan_id` prefer latest `created_at`
join_on = t1[:loan_id].eq(t2[:loan_id]).and(
t1[:created_at].lt(t2[:created_at]))
where(t2[:loan_id].eq(nil)).joins(
t1.create_join(t2, t1.create_on(join_on), Arel::Nodes::OuterJoin)
)
)
end
end
Loan.includes(:latest_decision)
This doesn't sort, just provides the latest decision for each loan. Throwing an order that references access_codes messes things up because of the table aliasing. I don't have the time to work that kink out now, but I bet you can figure it out if you check out some of the great resources on Arel and how to use it with ActiveRecord. I really enjoy this one.
First, let's write an SQL query that selects the necessary data. SO has a question that may help here: Select most recent row with GROUP BY in MySQL. My best version:
SELECT loans.*
FROM loans
LEFT JOIN (
SELECT loan_id, MAX(id) as id
FROM decisions
GROUP BY loan_id) d ON d.loan_id = loans.id
LEFT JOIN decisions ON decisions.id = d.id
ORDER BY decisions.risk_rating DESC
This code assumes that MAX(id) gives the id of the most recent row in each group.
You can build the same query with this Rails code:
sub_query =
Decision.select('loan_id, MAX(id) as id').
group(:loan_id).to_sql
Loan.
joins("LEFT JOIN (#{sub_query}) d ON d.loan_id = loans.id").
joins("LEFT JOIN decisions ON decisions.id = d.id").
order("decisions.risk_rating DESC")
Unfortunately, I don't have MySQL at hand, so I can't try this code. I hope it works.
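The logic of that query can be checked in plain Ruby without a database. The sketch below uses a hypothetical Decision Struct and made-up rows, and (as an assumption, since NULL ordering differs between databases) treats loans without decisions as rating 0.0, like the to_f call in the question. It also shows why ordering by MAX(decisions.risk_rating) can produce a different order than ordering by the latest decision's rating.

```ruby
# Hypothetical in-memory stand-in for the decisions table.
Decision = Struct.new(:id, :loan_id, :risk_rating)

decisions = [
  Decision.new(1, 10, 0.95), # older decision for loan 10
  Decision.new(2, 10, 0.40), # latest decision for loan 10
  Decision.new(3, 20, 0.80)  # only decision for loan 20
]
loan_ids = [10, 20, 30] # loan 30 has no decisions

by_loan = decisions.group_by(&:loan_id)

# Equivalent of the subquery: the row with MAX(id) per loan_id.
latest = by_loan.map { |loan_id, ds| [loan_id, ds.max_by(&:id)] }.to_h

# ORDER BY the latest decision's risk_rating DESC.
by_latest = loan_ids.sort_by { |id| -(latest[id] ? latest[id].risk_rating : 0.0) }

# ORDER BY MAX(decisions.risk_rating) DESC, as in the benchmarked query.
by_max = loan_ids.sort_by do |id|
  ratings = (by_loan[id] || []).map(&:risk_rating)
  -(ratings.max || 0.0)
end

# by_latest is [20, 10, 30]
# by_max    is [10, 20, 30]  (loan 10 is ranked by an outdated decision)
```

The two orders differ as soon as a loan's highest rating does not belong to its most recent decision.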

Rails - can't access custom join data

I've got a really complicated query (which finds bus connections between two towns) and I haven't got any idea how to access data from joins (I'd like to know at which stop does the connection start and at which does it end). Is it possible to access this data using ActiveRecord?
Course.joins("INNER JOIN stop_times as start_stop ON start_stop.course_id=courses.id")
.joins("INNER JOIN stop_times as end_stop ON end_stop.course_id = courses.id")
.joins('INNER JOIN stops as start_stopi ON start_stop.stop_id = start_stopi.id')
.joins('INNER JOIN stops as end_stopi ON end_stop.stop_id = end_stopi.id')
.where('start_stop.hour>= ? OR (start_stop.hour>= ? AND start_stop.minute>= ?)',hour,(hour+1)%24,minute)
.where('start_stopi.town_id = ? and end_stopi.town_id = ?',start_town,end_town)
.where('start_stop."order"<end_stop."order"').order('start_stop.minute ASC').order('start_stop.hour ASC')
EDIT:
I've managed to rewrite it to use ActiveRecord joins. Although it broke my alias names, it works.
Course.joins(end_stop_times: :stop).joins(start_stop_times: :stop)
.where('start_stop_times_courses.hour>= ? OR (start_stop_times_courses.hour>= ? AND start_stop_times_courses.minute>= ?)',hour,(hour+1)%24,minute)
.where('stops_stop_times.town_id = ? and stops.town_id = ?',start_town,end_town)
.where('start_stop_times_courses."order"<stop_times."order"')
.order('start_stop_times_courses.minute ASC').order('start_stop_times_courses.hour ASC')
With this new query, the models are:
class Course < ActiveRecord::Base
belongs_to :carrier
has_many :end_stop_times, class_name: 'StopTime'
has_many :start_stop_times, class_name: 'StopTime'
end
class Stop < ActiveRecord::Base
belongs_to :town
end
class StopTime < ActiveRecord::Base
belongs_to :stop
belongs_to :course
end
You need to add something like:
your_query.select('courses.*, start_stopi.id as start_stop_id, end_stopi.id as end_stop_id')
and then you can access the values by calling start_stop_id and end_stop_id on each course object.
However, you should probably use associations for this kind of operation. Could you show us your models?
Check your log for the output of this query, you should find that it starts with select courses.* - therefore it will not bring through data from the included tables.
You can add some select other_table.some_column statements to your query, but this isn't the rails way.
I would suggest you separate your scope into the relevant models - put scopes in the stop_times model (and others) so that you can call the scopes on the object you actually want to get data from.
When you're constructing custom SQL of that complexity I think you've taken the Rails-way of doing things too far. You're using practically no activerecord association information to construct it, and you've built a programming construct that is horribly ugly and difficult to read.
I'd advise that you rewrite it as well formatted SQL
results = ActiveRecord::Base.connection.execute(
"select c.*
from courses c
join stop_times ss on ss.course_id = c.id
join stop_times es on es.course_id = c.id
... etc ...
where (ss.hour >= #{ActiveRecord::Base.sanitize(hour)} or
... etc ...")
Now it could be that you can improve your models and associations to the point where this level of complexity is not required (e.g. the associations between courses, stop_times (start) and stop_times (end) could probably be encapsulated in ActiveRecord pretty well), but at the moment you seem to be falling between the pure SQL and the pure ActiveRecord approaches in a very uncomfortable way.

Resources