Avoiding N+1 queries in a Rails multi-table query - ruby-on-rails

This is the query I've got at present:
SELECT t1.discipline_id AS discipline1,
t2.discipline_id AS discipline2,
COUNT(DISTINCT t1.product_id) as product_count
FROM (SELECT "product_disciplines".* FROM "product_disciplines") t1
INNER JOIN (SELECT "product_disciplines".* FROM "product_disciplines") t2
ON t1.product_id = t2.product_id
WHERE (t1.discipline_id < t2.discipline_id)
GROUP BY t1.discipline_id, t2.discipline_id
ORDER BY "product_count" DESC
Basically, I've got a list of Products and Disciplines, and each Product may be associated with one or more Disciplines. This query lets me figure out, for each possible (distinct) pair of disciplines, how many products are associated with them. I'll use this as input to a dependency wheel in Highcharts.
The problem arises when I involve Active Model Serializers. This is my controller:
class StatsController < ApplicationController
before_action :get_relationships, only: [:relationships]
def relationships
x = @relationships
.select('t1.discipline_id AS discipline1, t2.discipline_id AS discipline2, COUNT(DISTINCT t1.product_id) as product_count')
.order(product_count: :DESC)
.group('t1.discipline_id, t2.discipline_id')
render json: x, each_serializer: RelationshipSerializer
end
private
def get_relationships
query = ProductDiscipline.all
@relationships = ProductDiscipline
.from(query, :t1)
.joins("INNER JOIN (#{query.to_sql}) t2 on t1.product_id = t2.product_id")
.where('t1.discipline_id < t2.discipline_id')
end
end
each_serializer points to this class:
class RelationshipSerializer < ApplicationSerializer
has_many :disciplines do
Discipline.where(id: [object.discipline1, object.discipline2])
end
attribute :product_count
end
When I query the database, there are ~1300 possible pairs, which turns my single query into ~1300 additional Discipline lookups (one per serialized pair).
Is there a way to avoid the N+1 queries problem with this structure?

I ended up splitting this into two separate API queries. RelationshipSerializer now exposes just the discipline IDs:
class RelationshipSerializer < ApplicationSerializer
# has_many :disciplines do
# # Discipline.where(id: [object.discipline1, object.discipline2])
# [object.discipline1, object.discipline2].to_json
# end
attributes :discipline1, :discipline2
attribute :product_count
end
Since in my app I already need the list of available disciplines, I chose to correlate them client-side.
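The same correlation could also be done server-side with a single lookup table instead of one query per pair. A minimal sketch in plain Ruby, assuming the pair rows and the discipline list are already loaded; `PairRow`, `DISCIPLINES` and `attach_disciplines` are hypothetical stand-ins and not part of the original app:

```ruby
# Hypothetical stand-ins for the rows returned by the pair query and for
# the Discipline records; in the real app both would come from the database.
PairRow = Struct.new(:discipline1, :discipline2, :product_count)

DISCIPLINES = [
  { id: 1, name: "Physics" },
  { id: 2, name: "Chemistry" },
  { id: 3, name: "Biology" }
].freeze

# Builds the serialized pairs from one in-memory lookup table, so the
# number of discipline lookups no longer grows with the number of pairs.
def attach_disciplines(rows, disciplines)
  by_id = disciplines.to_h { |d| [d[:id], d] }
  rows.map do |row|
    {
      disciplines: [by_id.fetch(row.discipline1), by_id.fetch(row.discipline2)],
      product_count: row.product_count
    }
  end
end

rows = [PairRow.new(1, 2, 10), PairRow.new(2, 3, 4)]
result = attach_disciplines(rows, DISCIPLINES)
puts result.first[:disciplines].map { |d| d[:name] }.join(" / ")
```

In a Rails app the lookup table would presumably come from one query, for example `Discipline.where(id: ids).index_by(&:id)`, which gives two queries in total regardless of how many pairs there are.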

Related

Count number of associations with a status in Ruby on Rails

I have a model named Project and Project has many Tasks
A Task can have one of 3 different statuses (stored as an integer).
I want to get a list of Projects with counts of associated Tasks in status = 1, 2 and 3.
The best I've come up with is a method on Project:
def open_tasks
self.tasks.where(:status => 1).count
end
But this issues another SQL query for each count, which performs very badly when loading 100 projects.
Is there a way to get it out in one SQL statement?
I can think of a couple of ways to do this...
(It's not a single sql statement but two, still quite performant though)...
Task.where(status: 1).group(:project_id).count
will give you a hash where the keys are project ids and the values are the task counts. You can then combine this with the list of projects.
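A minimal sketch of that combining step in plain Ruby, extended to several statuses. It assumes the counts come from a single grouped query such as `Task.group(:project_id, :status).count`, which returns a hash keyed by `[project_id, status]`. The sample data and the `counts_by_project` helper are made up for illustration:

```ruby
# Sample of what Task.group(:project_id, :status).count returns:
# keys are [project_id, status] pairs, values are task counts.
grouped_counts = {
  [1, 1] => 3,
  [1, 2] => 1,
  [2, 3] => 5
}

# Hypothetical project list; project 3 has no tasks at all.
projects = [
  { id: 1, name: "Alpha" },
  { id: 2, name: "Beta" },
  { id: 3, name: "Gamma" }
]

STATUSES = [1, 2, 3].freeze

# Pivots the grouped counts into one entry per project, defaulting to 0
# for any status (or project) that has no tasks.
def counts_by_project(projects, grouped_counts, statuses)
  projects.map do |project|
    counts = statuses.to_h do |status|
      [status, grouped_counts.fetch([project[:id], status], 0)]
    end
    project.merge(task_counts: counts)
  end
end

report = counts_by_project(projects, grouped_counts, STATUSES)
report.each { |row| puts "#{row[:name]}: #{row[:task_counts]}" }
```

Projects without any tasks still appear in the report with zero counts, which is the behaviour an INNER JOIN based query would lose.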
You can use the ActiveRecord counter_cache to save in the project records a value for the number of open tasks. ActiveRecord will automatically update this for you. I believe you will need to add an association to the project model like this:
# app/models/project.rb
# needs to include a column called open_task_count
class Project < ActiveRecord::Base
has_many :open_tasks, -> { where(status: 1) }, class_name: 'Task'
end
class Task < ActiveRecord::Base
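# Note: counter_cache: true maintains projects.tasks_count and counts every task,
# not only those with status 1; a per-status count needs custom callbacks or a gem.
belongs_to :project, counter_cache: true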
end
Project.select(
'projects.*',
'(SELECT COUNT(tasks.*) FROM tasks WHERE tasks.project_id = projects.id AND tasks.status = 0) AS status_0_count',
'(SELECT COUNT(tasks.*) FROM tasks WHERE tasks.project_id = projects.id AND tasks.status = 1) AS status_1_count'
).left_joins(:tasks)
Although there are more elegant ways (like lateral joins and CTEs), subqueries work on most DBs. If status is an ActiveRecord::Enum, you can construct the subqueries by looping over the enum mapping:
class Project < ApplicationRecord
has_many :tasks
def self.with_task_counts
# constructs an array of SQL strings
statuses = Task.statuses.map do |key, int|
sql = Task.select('COUNT(*)')
.where('tasks.project_id = projects.id')
.where(status: key)
.to_sql
"(#{sql}) AS #{key}_tasks_count"
end
select(
'projects.*',
*statuses # * turns the array into a list of args
).left_joins(:tasks)
end
end
In Rails 4 you can still do a LEFT OUTER JOIN by using a SQL string:
class Project
def self.left_joins_tasks(*args)
deprecator = ActiveSupport::Deprecation.new("5.0", "MyApp")
deprecator.deprecation_warning("left_joins_tasks is deprecated, use `.left_joins(:tasks)` instead")
joins('LEFT OUTER JOIN tasks ON tasks.project_id = projects.id')
end
end
Using .joins works as well but gives an INNER join so rows with no tasks are filtered out. You can also use .includes.
I ended up using the counter_culture gem.
https://github.com/magnusvk/counter_culture

How to order cumulated payments in ActiveRecord?

In my Rails app I have the following models:
class Person < ApplicationRecord
has_many :payments
end
class Payment < ApplicationRecord
belongs_to :person
end
How can I get the payments for each person and order them by sum?
This is my controller:
class SalesController < ApplicationController
def index
@people = current_account.people.includes(:payments).where(:payments => { :date => @range }).order("payments.amount DESC")
end
end
It gives me the correct numbers but the order is wrong. I want it to start with the person having the highest sum of payments within a range.
This is the current Payments table:
How can this be done?
This should work for you:
payments = Payment.arel_table
sum_payments = Arel::Table.new('sum_payments')
payments_total = payments.join(
payments.project(
payments[:person_id],
payments[:amount].sum.as('total')
)
.where(payments[:date].between(@range))
.group( payments[:person_id])
.as('sum_payments'))
.on(sum_payments[:person_id].eq(Person.arel_table[:id]))
This will create broken SQL (selects nothing from payments which is syntactically incorrect and joins to people which does not even exist in this query) but we really only need the join e.g.
payments_total.join_sources.first.to_sql
#=> INNER JOIN (SELECT payments.person_id,
# SUM(payments.amount) AS total
# FROM payments
# WHERE
# payments.date BETWEEN ... AND ...
# GROUP BY payments.person_id) sum_payments
# ON sum_payments.person_id = people.id
So knowing this we can pass the join_sources to ActiveRecord::QueryMethods#joins and let rails and arel handle the rest like so
current_account
.people
.includes(:payments)
.joins(payments_total.join_sources)
.where(:payments => { :date => @range })
.order("sum_payments.total DESC")
Which should result in SQL akin to
SELECT
-- ...
FROM
people
INNER JOIN payments ON payments.person_id = people.id
INNER JOIN ( SELECT payments.person_id,
SUM(payments.amount) as total
FROM payments
WHERE
payments.date BETWEEN -- ... AND ...
GROUP BY payments.person_id) sum_payments ON
sum_payments.person_id = people.id
WHERE
payments.date BETWEEN -- ... AND ..
ORDER BY
sum_payments.total DESC
This will show all the people having made payments in a given date range (along with those payments) sorted by the sum of those payments in descending order.
This is untested as I did not bother to set up a whole rails application but it should be functional.
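To make the intended result concrete, here is a small in-memory sketch of what the subquery and the ORDER BY compute. It is plain Ruby, not the Arel code above; `PaymentRow` and the sample data are hypothetical:

```ruby
require "date"

# Hypothetical stand-in for rows of the payments table.
PaymentRow = Struct.new(:person_id, :amount, :date)

payments = [
  PaymentRow.new(1, 50, Date.new(2020, 1, 5)),
  PaymentRow.new(2, 80, Date.new(2020, 1, 10)),
  PaymentRow.new(1, 70, Date.new(2020, 1, 20)),
  PaymentRow.new(2, 500, Date.new(2019, 12, 1)) # outside the range below
]

range = Date.new(2020, 1, 1)..Date.new(2020, 1, 31)

# Same steps as the subquery: filter by date, group by person, sum amounts.
totals = payments
  .select { |payment| range.cover?(payment.date) }
  .group_by(&:person_id)
  .transform_values { |list| list.sum(&:amount) }

# Same as ORDER BY sum_payments.total DESC.
ranking = totals.sort_by { |_person_id, total| -total }

ranking.each { |person_id, total| puts "person #{person_id}: #{total}" }
```

Person 2's large payment falls outside the range, so it does not count towards the total, which is why the date filter has to sit inside the aggregated subquery and not only in the outer WHERE clause.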

How to use `join` method to access columns from multiple tables in rails

I have two tables, and I want to display all these columns on page.
Tables:
1. Users:
name, email, sex_id
abc, abc@q.com, 0
2. Masters:
type, sex, sexn
8, 0, female
I want to display:
name, email, sex
abc, abc@q.com, female
Models' definition:
class Master < ApplicationRecord
has_many :users
end
class User < ApplicationRecord
belongs_to :master
def self.search(search)
where("name LIKE ?", "%#{search}%")
end
end
Using @users = User.joins("INNER JOIN masters ON masters.sex = users.sex_id AND masters.type = 8"), I can only access columns from Users.
I want to access data from Masters. Using @users.first.master, I just get nil.
Using @users = User.find_by_sql("SELECT * FROM users INNER JOIN masters ON masters.sex = users.sex_id AND masters.type = 8"), I can access columns from both tables, so there's no problem with my data.
How do I use join method to access columns from multiple tables?
You have missed the select statement, try this one.
@users = User.select("users.*, masters.*").joins("INNER JOIN masters ON masters.sex = users.sex_id AND masters.type = 8")
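For reference, the row set this join is expected to produce can be sketched in plain Ruby. This only illustrates the join condition (masters.sex = users.sex_id AND masters.type = 8) on the sample data from the question; the second `masters` row is a made-up extra to show the type filter:

```ruby
users = [{ name: "abc", email: "abc@q.com", sex_id: 0 }]

masters = [
  { type: 8, sex: 0, sexn: "female" },
  { type: 9, sex: 0, sexn: "unrelated" } # hypothetical row of another type
]

# Only masters rows with type 8 take part in the join.
sex_names = masters
  .select { |master| master[:type] == 8 }
  .to_h { |master| [master[:sex], master[:sexn]] }

# INNER JOIN semantics: users without a matching masters row are dropped.
rows = users.filter_map do |user|
  sexn = sex_names[user[:sex_id]]
  { name: user[:name], email: user[:email], sex: sexn } if sexn
end

rows.each { |row| puts row.values.join(", ") }
```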
You're almost there! You can have Rails do the join for you with the .includes() method:
def self.search(search)
# :master is singular
includes(:master).where("name LIKE ?", "%#{search}%")
end
Then:
@users.each do |user|
# singular again
puts user.master.sexn
end

grouping with a non-primary key in postgres / activerecord

I have a model Lap:
class Lap < ActiveRecord::Base
belongs_to :car
def self.by_carmodel(carmodel)
scoped = joins(:car_model).where(:car_models => {:name => carmodel})
scoped
end
def self.fastest_per_car
scoped = select("laps.id",:car_id, :time, :mph).group("laps.id", :car_id, :time, :mph).order("time").limit(1)
scoped
end
end
I want to only return the fastest lap for each car.
So, I need to group the Laps by Lap.car_id and then return only the fastest lap for each car, which would be determined by the column Lap.time.
Basically I would like to stack my methods in my controller:
@corvettes = Lap.by_carmodel("Corvette").fastest_per_car
Hopefully that makes sense...
When I run just Lap.fastest_per_car, I am limiting everything to 1 result, rather than 1 result per Car.
Another thing I had to do was select "laps.id", as :id was showing up empty in my results. If I just select(:id), I get an error saying the column is ambiguous.
I think a decent approach to this would be to add a where clause based on an efficient SQL syntax for returning the single fastest lap.
Something like this correlated subquery ...
select ...
from laps
where id = (select id
from laps laps_inner
where laps_inner.car_id = laps.car_id
order by time asc,
created_at desc
limit 1)
It's a little complex because of the need to tie-break on created_at.
The rails scope would just be:
where("laps.id = (select id
from laps laps_inner
where laps_inner.car_id = laps.car_id
order by time asc,
created_at desc
limit 1)")
An index on car_id would be pretty essential, and if that was a composite index on (car_id, time asc) then so much the better.
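The selection rule behind that subquery (lowest time per car, newest created_at as the tie-break) can be sketched in memory as follows. This is plain Ruby for illustration only; `LapRow` is a hypothetical stand-in and created_at is simplified to an integer:

```ruby
# Hypothetical stand-in for rows of the laps table.
LapRow = Struct.new(:id, :car_id, :time, :created_at)

laps = [
  LapRow.new(1, 10, 62.5, 1),
  LapRow.new(2, 10, 61.0, 2),
  LapRow.new(3, 10, 61.0, 3), # ties lap 2 on time, but is newer
  LapRow.new(4, 20, 70.2, 4)
]

# Per car: lowest time first, newest created_at breaking ties
# (ORDER BY time ASC, created_at DESC LIMIT 1).
fastest = laps
  .group_by(&:car_id)
  .map { |_car_id, car_laps| car_laps.min_by { |lap| [lap.time, -lap.created_at] } }

fastest_ids = fastest.map(&:id)
puts fastest_ids.inspect
```

Doing this in Ruby loads every lap into memory, so it only serves to show the rule; the correlated subquery keeps the work in the database.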
You are using limit, which returns a single row overall, not one row per car. To return one value per lap, you just have to join the table and group by a set of columns that identifies one lap (id is the simplest).
Also, you can write this in a more ActiveRecord-friendly way:
class Lap < ActiveRecord::Base
belongs_to :car
def self.by_carmodel(carmodel)
joins(:car_model).where(:car_models => {:name => carmodel})
end
def self.fastest_per_car
joins(:car_model)
.select("laps.*, MIN(car_models.time) AS min_time")
.group("laps.id")
.order("min_time ASC")
end
end
This is what I did, and it's working. If there is a better way to go about this, please post your answer:
in my model:
def self.fastest_per_car
select('DISTINCT ON (car_id) *').order('car_id, time ASC').sort_by! {|ts| ts.time}
end

Rails: How do you sort a model by a column in a table two associations away?

The problem I'm having is like this: The model to sort is SchoolClass which has_many Students which in turn has_many Projects and each project has an end_date. I need to sort the SchoolClasses four ways: First by the earliest project end_date sort ascending and descending, and second by the latest project end_date sort ascending and descending. Does this make sense?
class SchoolClass < ActiveRecord::Base
has_many :students
end
class Student < ActiveRecord::Base
has_many :projects
belongs_to :school_class
end
class Project < ActiveRecord::Base
belongs_to :student
end
The only way I can think of doing it is very brute force and involves having methods in the SchoolClass model that return the earliest and latest project dates for that instance, like so:
students.collect(&:projects).flatten.map(&:end_date).compact.sort.last
to find the latest project end_date for that class, and then fetching all the classes out of the database and sorting them by that method. Surely this is just awful though, right? I would really like to find the Rails way to get this ordering (with scopes, maybe?). I thought something like SchoolClasses.joins(:students).joins(:projects).order('projects.end_date ASC') might work, but that will crash Rails (and looking at it now, the logic is wrong anyway, I think).
Any suggestions?
Try this:
scs = SchoolClass.joins({:students => :projects}).
select("school_classes.id,
MIN(projects.end_date) AS earliest_end_date,
MAX(projects.end_date) AS latest_end_date").
group("school_classes.id").
order("earliest_end_date ASC")
The objects in the scs array have the following attributes:
id
earliest_end_date
latest_end_date
If you need additional attributes you can do the following
1) Add the additional attributes to the group and select methods
2) Query the full SchoolClass object using the id
3) Rewrite the query to use a nested JOIN
scs = SchoolClass.joins(
"JOIN (
SELECT a.id,
MIN(c.end_date) AS earliest_end_date,
MAX(c.end_date) AS latest_end_date
FROM school_classes a
JOIN students b ON b.school_class_id = a.id
JOIN projects c ON c.student_id = b.id
GROUP BY a.id
) d ON d.id = school_classes.id
").select("school_classes.*,
d.earliest_end_date AS earliest_end_date,
d.latest_end_date AS latest_end_date").
order("earliest_end_date ASC")
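As a cross-check of what these queries aggregate, here is a small in-memory sketch in plain Ruby covering the requested orderings. `StudentRow`, `ProjectRow` and the sample data are hypothetical, and the foreign key is assumed to be `school_class_id`:

```ruby
require "date"

# Hypothetical stand-ins for rows of the students and projects tables.
StudentRow = Struct.new(:id, :school_class_id)
ProjectRow = Struct.new(:student_id, :end_date)

students = [StudentRow.new(1, 100), StudentRow.new(2, 100), StudentRow.new(3, 200)]
projects = [
  ProjectRow.new(1, Date.new(2021, 3, 1)),
  ProjectRow.new(2, Date.new(2021, 1, 15)),
  ProjectRow.new(3, Date.new(2021, 2, 1)),
  ProjectRow.new(3, Date.new(2021, 6, 30))
]

# Maps each student id to its school class id (the students JOIN).
class_of = students.to_h { |student| [student.id, student.school_class_id] }

# GROUP BY school class, then MIN/MAX over the project end dates.
summary = projects
  .group_by { |project| class_of.fetch(project.student_id) }
  .map do |class_id, list|
    dates = list.map(&:end_date)
    { id: class_id, earliest_end_date: dates.min, latest_end_date: dates.max }
  end

by_earliest_asc = summary.sort_by { |row| row[:earliest_end_date] }
by_latest_desc  = summary.sort_by { |row| row[:latest_end_date] }.reverse

puts by_earliest_asc.map { |row| row[:id] }.inspect
puts by_latest_desc.map { |row| row[:id] }.inspect
```

The other two orderings follow by reversing `by_earliest_asc` or dropping the `reverse` on the latest-date sort, just as the SQL version only needs a different ORDER BY column and direction.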
