grouping with a non-primary key in postgres / activerecord - ruby-on-rails

I have a model Lap:
class Lap < ActiveRecord::Base
  belongs_to :car
  def self.by_carmodel(carmodel)
    scoped = joins(:car_model).where(:car_models => {:name => carmodel})
    scoped
  end
  def self.fastest_per_car
    scoped = select("laps.id",:car_id, :time, :mph).group("laps.id", :car_id, :time, :mph).order("time").limit(1)
    scoped
  end
end
I want to only return the fastest lap for each car.
So, I need to group the Laps by Lap.car_id and then only return the fastest lap time for that car, which would be determined by the column Lap.time.
Basically I would like to stack my methods in my controller:
@corvettes = Lap.by_carmodel("Corvette").fastest_per_car
Hopefully that makes sense...
When I run just Lap.fastest_per_car, everything is limited to 1 result, rather than 1 result per Car.
Another thing I had to do was select "laps.id", as :id was showing up empty in my results. If I just used select(:id), it complained that the column was ambiguous.

I think a decent approach to this would be to add a where clause based on an efficient SQL syntax for returning the single fastest lap.
Something like this correlated subquery ...
select ...
from   laps
where  id = (select   id
             from     laps laps_inner
             where    laps_inner.car_id = laps.car_id
             order by time asc,
                      created_at desc
             limit    1)
It's a little complex because of the need to tie-break on created_at.
The rails scope would just be:
where("laps.id = (select   id
                  from     laps laps_inner
                  where    laps_inner.car_id = laps.car_id
                  order by time asc,
                           created_at desc
                  limit    1)")
An index on car_id would be pretty essential, and if that was a composite index on (car_id, time asc) then so much the better.
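To make the intended result concrete, here is a minimal in-memory sketch in plain Ruby of what the correlated subquery picks: for each car, the row that sorts first by time ascending and then by created_at descending. The Lap struct and the sample rows are hypothetical stand-ins for the laps table, not part of the original model.

```ruby
# Hypothetical stand-in for rows of the laps table.
Lap = Struct.new(:id, :car_id, :time, :created_at)

laps = [
  Lap.new(1, 10, 61.2, Time.utc(2020, 1, 1)),
  Lap.new(2, 10, 59.8, Time.utc(2020, 1, 2)),
  Lap.new(3, 10, 59.8, Time.utc(2020, 1, 3)), # same time as lap 2, but newer
  Lap.new(4, 20, 63.5, Time.utc(2020, 1, 1)),
]

# For each car_id keep the row that sorts first by (time asc, created_at desc),
# mirroring the subquery's ORDER BY ... LIMIT 1.
def fastest_per_car(laps)
  laps.group_by(&:car_id).map do |_car_id, car_laps|
    car_laps.min_by { |lap| [lap.time, -lap.created_at.to_f] }
  end
end

fastest_ids = fastest_per_car(laps).map(&:id).sort
```

With this data, car 10 has two laps tied at 59.8, and the tie-break keeps the more recently created one.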

You are using limit, which returns a single row, not one row per car. To return one car value per lap you just have to join the table and group by a set of columns that identifies one lap (id is the simplest).
Also, you can have a more ActiveRecord-friendly version with:
class Lap < ActiveRecord::Base
  belongs_to :car
  def self.by_carmodel(carmodel)
    joins(:car_model).where(:car_models => {:name => carmodel})
  end
  def self.fastest_per_car
    joins(:car_model)
      .select("laps.*, MIN(car_models.time) AS min_time")
      .group("laps.id")
      .order("min_time ASC")
  end
end

This is what I did and it's working. If there is a better way to go about this, please post your answer.
In my model:
def self.fastest_per_car
  select('DISTINCT ON (car_id) *').order('car_id, time ASC').sort_by { |lap| lap.time }
end

Related

ActiveRecord sort model on attribute of last has_many relation

I've been digging around for this for a while... I can't find a graceful solution. I have loans, and loans has_many :decisions. decisions has an attribute that I care about, called risk_rating.
I'd like to sort loans based on the most recent decision (based on created_at, per usual), but by the risk_rating.
Loan.includes(:decisions).references(:decisions).order('decisions.risk_rating DESC') doesn't work...
I want loans... sorted by their most recent decision's risk_rating. This seems like it should be easier than it is.
I'm currently doing this outside of the database like this, but it's chewing up time and memory:
Loan.all.sort do |x,y|
  x.decisions.last.try(:risk_rating).to_f <=> y.decisions.last.try(:risk_rating).to_f
end
I'd like to show the performance I'm getting with the proposed answer, along with an inaccuracy...
Benchmark.bm do |x|
x.report{ Loan.joins('LEFT JOIN decisions ON decisions.loan_id = loans.id').group('loans.id').order('MAX(decisions.risk_rating) DESC').limit(10).map{|l| l.decisions.last.try(:risk_rating)} }
end
user system total real
0.020000 0.000000 0.020000 ( 20.573096)
=> [0.936775, 0.934465, 0.932088, 0.922352, 0.921882, 0.794724, 0.919432, 0.918385, 0.916952, 0.914938]
The order isn't right. That 0.794724 is out of place.
To that extent... I'm only seeing one attribute in the proposed answer. I don't see the connection =/
Alright, it looks like I'm working late tonight because I couldn't help but jump in:
class Loan < ApplicationRecord
  has_many :decisions
  has_one :latest_decision, -> { merge(Decision.latest) }, class_name: 'Decision'
end
class Decision < ApplicationRecord
  belongs_to :loan
  # Class method, so it can be called as Decision.latest in the scope above
  def self.latest
    t1 = arel_table
    t2 = arel_table.alias('t2')
    # Self join based on `loan_id` prefer latest `created_at`
    join_on = t1[:loan_id].eq(t2[:loan_id]).and(
      t1[:created_at].lt(t2[:created_at]))
    where(t2[:loan_id].eq(nil)).joins(
      t1.create_join(t2, t1.create_on(join_on), Arel::Nodes::OuterJoin)
    )
  end
end
Loan.includes(:latest_decision)
This doesn't sort, just provides the latest decision for each loan. Throwing an order that references access_codes messes things up because of the table aliasing. I don't have the time to work that kink out now, but I bet you can figure it out if you check out some of the great resources on Arel and how to use it with ActiveRecord. I really enjoy this one.
First, let's write an SQL query that selects the necessary data. SO has a question that may help here: Select most recent row with GROUP BY in MySQL. My best version:
SELECT loans.*
FROM loans
LEFT JOIN (
  SELECT loan_id, MAX(id) as id
  FROM decisions
  GROUP BY loan_id) d ON d.loan_id = loans.id
LEFT JOIN decisions ON decisions.id = d.id
ORDER BY decisions.risk_rating DESC
This code assumes that MAX(id) gives the id of the most recent row in each group.
You can run the same query with this Rails code:
sub_query =
  Decision.select('loan_id, MAX(id) as id').
    group(:loan_id).to_sql
Loan.
  joins("LEFT JOIN (#{sub_query}) d ON d.loan_id = loans.id").
  joins("LEFT JOIN decisions ON decisions.id = d.id").
  order("decisions.risk_rating DESC")
Unfortunately, I don't have MySQL at hand and I can't try this code. Hope it will work.
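As a rough cross-check of what the query is meant to return, the same logic can be sketched in plain Ruby on in-memory data. The Loan and Decision structs and the sample values below are hypothetical, and the sketch assumes, like the SQL, that the highest id marks the most recent decision.

```ruby
# Hypothetical stand-ins for the loans and decisions tables.
Loan = Struct.new(:id)
Decision = Struct.new(:id, :loan_id, :risk_rating)

loans = [Loan.new(1), Loan.new(2), Loan.new(3)]
decisions = [
  Decision.new(1, 1, 0.9),
  Decision.new(2, 1, 0.2), # latest decision for loan 1
  Decision.new(3, 2, 0.5), # latest decision for loan 2
]

# Subquery equivalent: the decision with the highest id per loan_id.
latest = decisions.group_by(&:loan_id)
                  .transform_values { |ds| ds.max_by(&:id) }

# Sort loans by the latest decision's risk_rating, descending.
# Loans without any decision go last, as MySQL does for NULLs under DESC.
with_rating, without_rating = loans.partition { |loan| latest.key?(loan.id) }
sorted = with_rating.sort_by { |loan| -latest[loan.id].risk_rating } + without_rating
sorted_ids = sorted.map(&:id)
```

Note that loan 1 ends up behind loan 2 here even though one of its older decisions has the highest rating overall; ordering by MAX(decisions.risk_rating) would have put it first, which is the kind of misordering reported in the question.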

Rails: How to get objects with at least one child?

After googling, browsing SO and reading, there doesn't seem to be a Rails-style way to efficiently get only those Parent objects which have at least one Child object (through a has_many :children relation). In plain SQL:
SELECT *
FROM parents
WHERE EXISTS (
  SELECT 1
  FROM children
  WHERE parent_id = parents.id)
The closest I've come is
Parent.all.reject { |parent| parent.children.empty? }
(based on another answer), but it's really inefficient because it runs a separate query for each Parent.
Parent.joins(:children).uniq.all
uniq was deprecated in Rails 5.0 and removed in Rails 5.1, so distinct should be used instead.
Parent.joins(:children).distinct
This is a follow-up on Chris Bailey's answer. .all is removed as well from the original answer as it doesn't add anything.
The accepted answer (Parent.joins(:children).uniq) generates SQL using DISTINCT, which can be a slow query. For better performance, you should write SQL using EXISTS:
Parent.where(<<-SQL)
  EXISTS (SELECT * FROM children c WHERE c.parent_id = parents.id)
SQL
EXISTS is much faster than DISTINCT. For example, here is a post model which has comments and likes:
class Post < ApplicationRecord
has_many :comments
has_many :likes
end
class Comment < ApplicationRecord
belongs_to :post
end
class Like < ApplicationRecord
belongs_to :post
end
In the database there are 100 posts, each with 50 comments and 50 likes, plus one post that has no comments or likes:
# Create posts with comments and likes
100.times do |i|
  post = Post.create!(title: "Post #{i}")
  50.times do |j|
    post.comments.create!(content: "Comment #{j} for #{post.title}")
    post.likes.create!(user_name: "User #{j} for #{post.title}")
  end
end
# Create a post without comment and like
Post.create!(title: 'Hidden post')
If you want to get the posts that have at least one comment and at least one like, you might write it like this:
# NOTE: uniq method will be removed in Rails 5.1
Post.joins(:comments, :likes).distinct
The query above generates SQL like this:
SELECT DISTINCT "posts".*
FROM "posts"
INNER JOIN "comments" ON "comments"."post_id" = "posts"."id"
INNER JOIN "likes" ON "likes"."post_id" = "posts"."id"
But this SQL generates 250000 rows (100 posts * 50 comments * 50 likes) and then filters out the duplicated rows, so it can be slow.
In this case you should write it like this:
Post.where <<-SQL
EXISTS (SELECT * FROM comments c WHERE c.post_id = posts.id)
AND
EXISTS (SELECT * FROM likes l WHERE l.post_id = posts.id)
SQL
This query generates SQL like this:
SELECT "posts".*
FROM "posts"
WHERE (
EXISTS (SELECT * FROM comments c WHERE c.post_id = posts.id)
AND
EXISTS (SELECT * FROM likes l WHERE l.post_id = posts.id)
)
This query does not generate useless duplicated rows, so it could be faster.
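The row multiplication can be reproduced on a miniature data set in plain Ruby. This is only a sketch of the counting argument, with hypothetical in-memory arrays standing in for the tables; it does not touch a database.

```ruby
# Hypothetical miniature data set: posts 1..3 have 4 comments and 4 likes each,
# post 4 has neither.
post_ids = [1, 2, 3, 4]
comments = [1, 2, 3].flat_map { |id| Array.new(4) { { post_id: id } } }
likes    = [1, 2, 3].flat_map { |id| Array.new(4) { { post_id: id } } }

# Double INNER JOIN: one row per (post, comment, like) combination.
joined_rows = post_ids.flat_map do |id|
  cs = comments.select { |c| c[:post_id] == id }
  ls = likes.select { |l| l[:post_id] == id }
  cs.product(ls).map { id }
end

# DISTINCT has to boil all of those rows down to unique posts afterwards.
distinct_ids = joined_rows.uniq

# EXISTS: each post is tested once and emitted at most once.
exists_ids = post_ids.select do |id|
  comments.any? { |c| c[:post_id] == id } && likes.any? { |l| l[:post_id] == id }
end
```

Here the join produces 3 * 4 * 4 = 48 intermediate rows for 3 matching posts, while the EXISTS form never builds more than one row per post. Both approaches return the same posts.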
Here is a benchmark:
            user     system      total        real
Uniq:   0.010000   0.000000   0.010000 (  0.074396)
Exists: 0.000000   0.000000   0.000000 (  0.003711)
It shows EXISTS is 20.047661 times faster than DISTINCT.
I pushed the sample application to GitHub, so you can confirm the difference yourself:
https://github.com/JunichiIto/exists-query-sandbox
I have just modified this solution for your need.
Parent.joins("left join children on children.parent_id = parents.id").where("children.parent_id is not null")
You just want an inner join with a distinct qualifier:
SELECT DISTINCT parents.*
FROM parents
JOIN children
  ON children.parent_id = parents.id
This can be done in standard active record as
Parent.joins(:children).uniq
However, if you want the more complex result of finding all parents with no children,
you need an outer join:
Parent.joins("LEFT OUTER JOIN children ON children.parent_id = parents.id").
  where(:children => { :id => nil })
which is a solution that sucks for many reasons. I recommend Ernie Miller's Squeel library, which will allow you to do
Parent.joins{children.outer}.where{children.id == nil}
Try including the children with #includes()
Parent.includes(:children).all.reject { |parent| parent.children.empty? }
This will make 2 queries:
SELECT * FROM parents;
SELECT * FROM children WHERE parent_id IN (5, 6, 8, ...);
[UPDATE]
The above solution is useful when you need to have the Child objects loaded.
But children.empty? can also use a counter cache to determine the number of children.
For this to work you need to add a new column to the parents table:
# a new migration
def up
  change_table :parents do |t|
    t.integer :children_count, :default => 0
  end
  Parent.reset_column_information
  Parent.all.each do |p|
    Parent.update_counters p.id, :children_count => p.children.length
  end
end
def down
  change_table :parents do |t|
    t.remove :children_count
  end
end
Now change your Child model:
class Child < ActiveRecord::Base
  belongs_to :parent, :counter_cache => true
end
At this point you can use size and empty? without touching the children table:
Parent.all.reject { |parent| parent.children.empty? }
Note that length doesn't use the counter cache whereas size and empty? do.

ActiveRecord optimization - best way to query all at once?

I am trying to reduce the number of queries using ActiveRecord 3.0.9. I generated about 200K 'dummy' customers and 500K orders.
Here are the models:
class Customer < ActiveRecord::Base
  has_many :orders
end
class Order < ActiveRecord::Base
  belongs_to :customer
  has_many :products
end
class Product < ActiveRecord::Base
  belongs_to :order
end
When you use this code in the controller:
@customers = Customer.where(:active => true).paginate(:page => params[:page], :per_page => 100)
# SELECT * FROM customers ...
and use this in the view (I removed the HAML markup to make it easier to read):
@order = @customers.each do |customer|
  customer.orders.each do |order| # SELECT * FROM orders ...
    %td= order.products.count # SELECT COUNT(*) FROM products ...
    %td= order.products.sum(:amount) # SELECT SUM(amount) FROM products ...
  end
end
The page renders the table with 100 rows per page. The problem is that it is rather slow to load because it fires about 3-5 queries for each customer's orders. That's about 300 queries to load the page.
Is there an alternative way to reduce the number of queries and load the page faster?
Notes:
1) I have attempted to use includes(:orders), but it included more than 200,000 order_ids. That's an issue.
2) They are already indexed.
If you're only using COUNT and SUM(amount) then what you really need is to retrieve only that information and not the orders themselves. This is easily done with SQL:
SELECT orders.customer_id, orders.id AS order_id, COUNT(products.id) AS order_count, SUM(products.amount) AS order_total
FROM orders
LEFT JOIN products ON orders.id = products.order_id
GROUP BY orders.customer_id, orders.id
You can wrap this in a method that returns a nice, orderly hash by re-mapping the SQL results into a structure that fits your requirements:
class Order < ActiveRecord::Base
  def self.totals
    query = "..." # Query from above
    result = { }
    self.connection.select_rows(query).each do |row|
      # Build out an array for each unique customer_id in the results
      customer_set = result[row[0].to_i] ||= [ ]
      # Add a hash representing each order to this customer order set
      customer_set << { :order_id => row[1].to_i, :count => row[2].to_i, :total => row[3].to_i }
    end
    result
  end
end
This means you can fetch all order counts and totals in a single pass. If you have an index on customer_id, which is imperative in this case, then the query will usually be really fast even for large numbers of rows.
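The re-mapping step can be tried in isolation on plain arrays. In this sketch the rows are hypothetical and written as strings, on the assumption that the adapter returns untyped values (which is why to_i is applied); a real adapter may already return numbers.

```ruby
# Hypothetical rows in the column order of the query:
# [customer_id, order_id, count, total]
rows = [
  ["1", "10", "2", "30"],
  ["1", "11", "1", "5"],
  ["2", "12", "3", "45"],
]

# Group the per-order aggregates under their customer_id.
totals = rows.each_with_object({}) do |row, result|
  customer_id, order_id, count, total = row.map(&:to_i)
  (result[customer_id] ||= []) << { order_id: order_id, count: count, total: total }
end

# A customer without orders has no key, so fall back to an empty list.
orders_for_missing_customer = totals.fetch(3, [])
```

Using fetch with a default when rendering avoids calling each on nil for customers that have no orders.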
You can save the results of this method into a variable such as @order_totals and reference it when rendering your table:
- @order = @customers.each do |customer|
  - @order_totals[customer.id].each do |order|
    %td= order[:count]
    %td= order[:total]
You can try something like this (yes, it looks ugly, but you want performance):
orders = Order.find_by_sql([<<-EOD, customer.id])
SELECT os.id, os.name, COUNT(ps.amount) AS count, SUM(ps.amount) AS amount
FROM orders os
JOIN products ps ON ps.order_id = os.id
WHERE os.customer_id = ? GROUP BY os.id, os.name
EOD
%td= orders.name
%td= orders.count
%td= orders.amount
Added: Probably it is better to create count and amount cache in Orders, but you will have to maintain it (count can be counter-cache, but I doubt there is a ready recipe for amount).
You can join the tables in with Arel (I prefer to avoid writing raw sql when possible). I believe that for your example you would do something like:
Customer.joins(:orders => :products).select("customers.id, customers.name, count(products.id) as count, sum(products.amount) as total_amount")
The first method--
Customer.joins(:orders => :products)
--pulls in the nested association in one statement. Then the second part--
.select("customers.id, customers.name, count(products.id) as count, sum(products.amount) as total_amount")
--specifies exactly what columns you want back.
Chain those and I believe you'll get a list of Customer instances only populated with what you've specified in the select method. You have to be careful though because you now have in hand read-only objects that are possibly in an invalid state.
As with all the Arel methods what you get from those methods is an ActiveRecord::Relation instance. It's only when you start to access that data that it goes out and executes the SQL.
I have some basic nervousness that my syntax is incorrect but I'm confident that this can be done w/o relying on executing raw SQL.

How do I calculate the most popular combination of order lines? (or any similar order/order lines db arrangement)

I'm using Ruby on Rails. I have a couple of models which fit the normal order/order lines arrangement, i.e.
class Order
  has_many :order_lines
end
class OrderLine
  belongs_to :order
  belongs_to :product
end
class Product
  has_many :order_lines
end
(greatly simplified from my real model!)
It's fairly straightforward to work out the most popular individual products via order line, but what magical ruby-fu could I use to calculate the most popular combination(s) of products ordered?
Cheers,
Graeme
My suggestion is to create an array a of Product.id numbers for each order and then do the equivalent of
h = Hash.new(0)
# for each a
h[a.sort.hash] += 1
You will naturally need to consider the scale of your operation and how much you are willing to approximate the results.
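A minimal sketch of that counting idea on in-memory data follows. The orders array is hypothetical, and the sorted array itself is used as the hash key instead of its #hash value, so that the winning combination can be read back without a second lookup.

```ruby
# Hypothetical orders, each reduced to the product ids on its order lines.
orders = [
  [3, 1, 2],
  [2, 3, 1], # same combination as the first order, different line order
  [1, 2],
  [5],
  [1, 3, 2],
]

# Sorting makes the key independent of the order of the lines.
combo_counts = Hash.new(0)
orders.each { |product_ids| combo_counts[product_ids.sort] += 1 }

# Most popular combinations first.
most_popular = combo_counts.sort_by { |_combo, count| -count }
top_combo, top_count = most_popular.first
```

Keying on the array costs more memory than keying on an integer hash, which is the trade-off to weigh at larger scale.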
External Solution
Create a "Combination" model and index the table by the hash, then each order could increment a counter field. Another field would record exactly which combination that hash value referred to.
In-memory Solution
Look at the last 100 orders and recompute the order popularity in memory when you need it. Hash#sort will give you a sorted list of popularity hashes. You could either make a composite object that remembered what order combination was being counted, or just scan the original data looking for the hash value.
Thanks for the tip digitalross. I followed the external solution idea and did the following. It varies slightly from the suggestion as it keeps a record of individual order_combos, rather than storing a counter so it's possible to query by date as well e.g. most popular top 10 orders in the last week.
I created a method in my order which converts the list of order items to a comma separated string.
def to_s
  # Use the product ids so that two orders with the same products yield the same string
  order_lines.map { |ol| ol.product_id }.sort.join(",")
end
I then added a filter so the combo is created every time an order is placed.
after_save :create_order_combo
def create_order_combo
oc = OrderCombo.create(:user => user, :combo => self.to_s)
end
And finally my OrderCombo class looks something like below. I've also included a cached version of the method.
class OrderCombo
  belongs_to :user
  scope :by_user, lambda{ |user| where(:user_id => user.id) }
  def self.top_n_orders_by_user(user,count=10)
    OrderCombo.by_user(user).count(:group => :combo).sort { |a,b| a[1] <=> b[1] }.reverse[0..count-1]
  end
  def self.cached_top_orders_by_user(user,count=10)
    # Rails.cache.fetch takes :expires_in for the time-to-live
    Rails.cache.fetch("order_combo_#{user.id.to_s}_#{count.to_s}", :expires_in => 10.minutes) { OrderCombo.top_n_orders_by_user(user, count) }
  end
end
It's not perfect as it doesn't take into account increased popularity when someone orders more of one item in an order.

how to modify complex find_by_sql query w/ union into rails 3

Here's the current query:
@feed = RatedActivity.find_by_sql(["(select *, null as queue_id, 3 as model_table_type from rated_activities where user_id in (?)) " +
  "UNION (select *, null as queue_id, null as rating, 2 as model_table_type from watched_activities where user_id in (?)) " +
  "UNION (select *, null as rating, 1 as model_table_type from queued_activities where user_id in (?)) " +
  "ORDER BY activity_datetime DESC limit 100", friend_ids, friend_ids, friend_ids])
Now, this is a bit of a kludge, since there are actually models set up for:
class RatedActivity < ActiveRecord::Base
belongs_to :user
belongs_to :media
end
class QueuedActivity < ActiveRecord::Base
belongs_to :user
belongs_to :media
end
class WatchedActivity < ActiveRecord::Base
belongs_to :user
belongs_to :media
end
I would love to know how to use ActiveRecord in Rails 3.0 to achieve basically the same thing as is done with the crazy union I have there.
It sounds like you should consolidate these three separate models into a single model. Statuses such as "watched", "queued", or "rated" are then all implicit based on attributes of that model.
class Activity < ActiveRecord::Base
  belongs_to :user
  belongs_to :media
  scope :for_users, lambda { |u|
    where("user_id IN (?)", u)
  }
  scope :rated, where("rating IS NOT NULL")
  scope :queued, where("queue_id IS NOT NULL")
  scope :watched, where("watched IS NOT NULL")
end
Then, you can call Activity.for_users(friend_ids) to get all three groups as you are trying to accomplish above... or you can call Activity.for_users(friend_ids).rated (or queued or watched) to get just one group. This way, all of your Activity logic is consolidated in one place. Your queries become simpler (and more efficient) and you don't have to maintain three different models.
I think that your current solution is OK in the case of a legacy DB. As a native query it is also the most efficient, as your DBMS does all the hard work (union, sort, limit).
If you really want to get rid of the SQL UNION without changing the schema, then you can move the union to a Ruby array sum, but this may be slower.
result = RatedActivity.
select("*, null as queue_id, 3 as model_table_type").
where(:user_id=>friend_ids).
limit(100).all +
QueuedActivity...
Finally you need to sort and limit that combined array with
result.sort_by(&:activity_datetime).reverse[0..99]
This is just a proof of concept; as you see, it is inefficient in some points (3 queries, sorting in Ruby, limit). I would stay with find_by_sql.
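The merge step can be sketched on plain Ruby objects. The Activity struct and the sample rows below are hypothetical stand-ins for records loaded from the three tables, and the limit is set to 3 only to keep the example small.

```ruby
# Hypothetical stand-in for rows loaded from the three activity tables.
Activity = Struct.new(:kind, :activity_datetime)

rated   = [Activity.new(:rated, Time.utc(2011, 5, 1)),
           Activity.new(:rated, Time.utc(2011, 5, 4))]
watched = [Activity.new(:watched, Time.utc(2011, 5, 3))]
queued  = [Activity.new(:queued, Time.utc(2011, 5, 2))]

# Array concatenation plays the role of UNION; sorting and limiting happen in Ruby.
limit = 3
feed = (rated + watched + queued)
         .sort_by(&:activity_datetime)
         .reverse
         .first(limit)

feed_kinds = feed.map(&:kind)
```

For the merged result to match the SQL version, each of the three queries would need to be ordered by activity_datetime descending before its own limit is applied; otherwise recent rows can be cut off before the merge.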
