ActiveRecord optimization - best way to query all at once? - ruby-on-rails

I am trying to reduce the number of queries using ActiveRecord 3.0.9. I generated about 200K 'dummy' customers and 500K orders.
Here are the models:
class Customer < ActiveRecord::Base
has_many :orders
end
class Order < ActiveRecord::Base
belongs_to :customer
has_many :products
end
class Product < ActiveRecord::Base
belongs_to :order
end
When you use this code in the controller:
@customers = Customer.where(:active => true).paginate(:page => params[:page], :per_page => 100)
# SELECT * FROM customers ...
and use this in the view (I removed the HAML markup to make it easier to read):
@order = @customers.each do |customer|
customer.orders.each do |order| # SELECT * FROM orders ...
%td= order.products.count # SELECT COUNT(*) FROM products ...
%td= order.products.sum(:amount) # SELECT SUM(amount) FROM products ...
end
end
The page renders a table with 100 rows per page. The problem is that it is rather slow to load because it fires about 3-5 queries per customer's orders. That's about 300 queries to load the page.
Is there an alternative way to reduce the number of queries and load the page faster?
Notes:
1) I have attempted to use includes(:orders), but it generated a query with more than 200,000 order_ids. That's an issue.
2) The tables are already indexed.

If you're only using COUNT and SUM(amount) then what you really need is to retrieve only that information and not the orders themselves. This is easily done with SQL:
SELECT orders.customer_id, orders.id AS order_id, COUNT(products.id) AS order_count, SUM(products.amount) AS order_total FROM orders LEFT JOIN products ON orders.id=products.order_id GROUP BY orders.customer_id, orders.id
You can wrap this in a method that returns a nice, orderly hash by re-mapping the SQL results into a structure that fits your requirements:
class Order < ActiveRecord::Base
def self.totals
query = "..." # Query from above
result = { }
self.connection.select_rows(query).each do |row|
# Build out an array for each unique customer_id in the results
customer_set = result[row[0].to_i] ||= [ ]
# Add a hash representing each order to this customer order set
customer_set << { :order_id => row[1].to_i, :count => row[2].to_i, :total => row[3].to_i }
end
result
end
end
This means you can fetch all order counts and totals in a single pass. If you have an index on customer_id, which is imperative in this case, then the query will usually be really fast even for large numbers of rows.
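To make the shape of the result concrete, here is a minimal plain-Ruby sketch of the same re-mapping step, run against hand-written rows instead of a database. The sample rows and the group_totals helper name are made up for illustration; in the real method select_rows would supply the rows (often as strings, hence the to_i calls).

```ruby
# Hypothetical sample rows in the column order used above:
# [customer_id, order_id, order_count, order_total]
rows = [
  ["1", "10", "2", "30"],
  ["1", "11", "1", "5"],
  ["2", "12", "3", "45"]
]

# group_totals is an illustrative helper, not part of ActiveRecord.
def group_totals(rows)
  rows.each_with_object({}) do |row, result|
    # One array of order summaries per customer id
    customer_set = (result[row[0].to_i] ||= [])
    customer_set << { :order_id => row[1].to_i, :count => row[2].to_i, :total => row[3].to_i }
  end
end

totals = group_totals(rows)
totals[1].each do |order|
  puts "order #{order[:order_id]}: #{order[:count]} products, total #{order[:total]}"
end
```

A customer with no orders has no key in the hash, so a lookup for that customer returns nil rather than an empty array.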
You can save the results of this method into a variable such as @order_totals and reference it when rendering your table:
- @order = @customers.each do |customer|
- (@order_totals[customer.id] || []).each do |order|
%td= order[:count]
%td= order[:total]

You can try something like this (yes, it looks ugly, but you want performance):
orders = Order.find_by_sql([<<-EOD, customer.id])
SELECT os.id, os.name, COUNT(ps.amount) AS count, SUM(ps.amount) AS amount
FROM orders os
JOIN products ps ON ps.order_id = os.id
WHERE os.customer_id = ? GROUP BY os.id, os.name
EOD
- orders.each do |order|
%td= order.name
%td= order.count
%td= order.amount
Added: It is probably better to cache the count and amount on Order, but you will have to maintain them (the count can be a counter cache, but I doubt there is a ready recipe for the amount).

You can join the tables in with Arel (I prefer to avoid writing raw sql when possible). I believe that for your example you would do something like:
Customer.joins(:orders => :products).select("customers.id, customers.name, count(products.id) as count, sum(products.amount) as total_amount")
The first method--
Customer.joins(:orders => :products)
--pulls in the nested association in one statement. Then the second part--
.select("customers.id, customers.name, count(products.id) as count, sum(products.amount) as total_amount")
--specifies exactly what columns you want back.
Chain those and I believe you'll get a list of Customer instances populated only with what you've specified in the select method. You have to be careful, though, because you now have in hand read-only objects that are possibly in an invalid state.
As with all the Arel methods what you get from those methods is an ActiveRecord::Relation instance. It's only when you start to access that data that it goes out and executes the SQL.
I have some basic nervousness that my syntax is incorrect but I'm confident that this can be done w/o relying on executing raw SQL.

Related

grouping with a non-primary key in postgres / activerecord

I have a model Lap:
class Lap < ActiveRecord::Base
belongs_to :car
def self.by_carmodel(carmodel)
scoped = joins(:car_model).where(:car_models => {:name => carmodel})
scoped
end
def self.fastest_per_car
scoped = select("laps.id",:car_id, :time, :mph).group("laps.id", :car_id, :time, :mph).order("time").limit(1)
scoped
end
end
I want to only return the fastest lap for each car.
So, I need to group the Laps by Lap.car_id and then only return the fastest lap time for that car, which would be determined by the column Lap.time.
Basically I would like to stack my methods in my controller:
@corvettes = Lap.by_carmodel("Corvette").fastest_per_car
Hopefully that makes sense...
When trying to run just Lap.fastest_per_car I am limiting everything to 1 result, rather than 1 result per each Car.
Another thing I had to do was add "laps.id", as :id was showing up empty in my results. If I just use select(:id), it says the column is ambiguous.
I think a decent approach to this would be to add a where clause based on an efficient SQL syntax for returning the single fastest lap.
Something like this correlated subquery ...
select ...
from laps
where id = (select id
from laps laps_inner
where laps_inner.car_id = laps.car_id
order by time asc,
created_at desc
limit 1)
It's a little complex because of the need to tie-break on created_at.
The rails scope would just be:
where("laps.id = (select id
from laps laps_inner
where laps_inner.car_id = laps.car_id
order by time asc,
created_at desc
limit 1)")
An index on car_id would be pretty essential, and if that was a composite index on (car_id, time asc) then so much the better.
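As a sanity check on what the correlated subquery is meant to select, the same rule can be written in plain Ruby over in-memory records: for each car, take the lowest time, and break ties by the most recent created_at. The LapRow struct and the sample data are invented for illustration (created_at is a plain integer here), and this is not a replacement for doing the work in SQL.

```ruby
# Illustrative stand-in for the laps table; not an ActiveRecord model.
LapRow = Struct.new(:id, :car_id, :time, :created_at)

laps = [
  LapRow.new(1, 1, 62.5, 100),
  LapRow.new(2, 1, 61.0, 110),
  LapRow.new(3, 1, 61.0, 120), # same time as lap 2, but newer
  LapRow.new(4, 2, 70.2, 105)
]

# Mirrors "order by time asc, created_at desc limit 1" for each car_id.
def fastest_per_car(laps)
  laps.group_by(&:car_id).map do |_car_id, car_laps|
    car_laps.min_by { |lap| [lap.time, -lap.created_at] }
  end
end

fastest = fastest_per_car(laps)
fastest.each { |lap| puts "car #{lap.car_id}: lap #{lap.id} in #{lap.time}" }
```

Comparing the results of the scope against a small fixture like this is one way to confirm the tie-break behaves as intended.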
You are using limit, which returns one single row, not one row per car. To return one car value per lap you just have to join the table and group by a set of columns that identifies one lap (id is the simplest).
Also, you can have a more ActiveRecord-friendly version with:
class Lap < ActiveRecord::Base
belongs_to :car
def self.by_carmodel(carmodel)
joins(:car_model).where(:car_models => {:name => carmodel})
end
def self.fastest_per_car
joins(:car_model)
.select("laps.*, MIN(car_models.time) AS min_time")
.group("laps.id")
.order("min_time ASC")
end
end
This is what I did and it's working. If there is a better way to go about this, please post your answer:
In my model:
def self.fastest_per_car
select('DISTINCT ON (car_id) *').order('car_id, time ASC').sort_by! {|ts| ts.time}
end

How to filter association_ids for an ActiveRecord model?

In a domain like this:
class User
has_many :posts
has_many :topics, :through => :posts
end
class Post
belongs_to :user
belongs_to :topic
end
class Topic
has_many :posts
end
I can read all the Topic ids through user.topic_ids, but I can't see a way to apply filtering conditions to this method, since it returns an Array instead of an ActiveRecord::Relation.
The problem is: given a User and an existing set of Topics, mark the ones for which there is a post by the user. I am currently doing something like this:
def mark_topics_with_post(user, topics)
# only returns the ids of the topics for which this user has a post
topic_ids = user.topic_ids
topics.each {|t| t[:has_post]=topic_ids.include?(t.id)}
end
But this loads all the topic ids regardless of the input set. Ideally, I'd like to do something like
def mark_topics_with_post(user, topics)
# only returns the topics where user has a post within the subset of interest
topic_ids = user.topic_ids.where(:id=>topics.map(&:id))
topics.each {|t| t[:has_post]=topic_ids.include?(t.id)}
end
But the only thing I can do concretely is
def mark_topics_with_post(user, topics)
# needlessly create Post objects only to unwrap them later
topic_ids = user.posts.where(:topic_id=>topics.map(&:id)).select(:topic_id).map(&:topic_id)
topics.each {|t| t[:has_post]=topic_ids.include?(t.id)}
end
Is there a better way?
Is it possible to have something like select_values on a association or scope?
FWIW, I'm on rails 3.0.x, but I'd be curious about 3.1 too.
Why am I doing this?
Basically, I have a result page for a semi-complex search (which happens based on the Topic data only), and I want to mark the results (Topics) as stuff on which the user has interacted (wrote a Post).
So yeah, there is another option which would be doing a join [Topic,Post] so that the results come out as marked or not from the search, but this would destroy my ability to cache the Topic query (the query, even without the join, is more expensive than fetching only the ids for the user)
Notice the approaches outlined above do work, they just feel suboptimal.
I think that your second solution is almost the optimal one (from the point of view of the queries involved), at least with respect to the one you'd like to use.
user.topic_ids generates the query:
SELECT `topics`.id FROM `topics`
INNER JOIN `posts` ON `topics`.`id` = `posts`.`topic_id`
WHERE `posts`.`user_id` = 1
If user.topic_ids.where(:id=>topics.map(&:id)) were possible, it would have generated this:
SELECT topics.id FROM `topics`
INNER JOIN `posts` ON `topics`.`id` = `posts`.`topic_id`
WHERE `posts`.`user_id` = 1 AND `topics`.`id` IN (...)
This is exactly the same query that is generated by: user.topics.select("topics.id").where(:id=>topics.map(&:id))
whereas user.posts.select(:topic_id).where(:topic_id=>topics.map(&:id)) generates the following query:
SELECT topic_id FROM `posts`
WHERE `posts`.`user_id` = 1 AND `posts`.`topic_id` IN (...)
Which of the two is more efficient depends on the data in the actual tables and the indices defined (and on which database is used).
If the topic ids list for the user is long and has topics repeated many times, it may make sense to group by topic id at the query level:
user.posts.select(:topic_id).group(:topic_id).where(:topic_id=>topics.map(&:id))
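Whichever query produces the ids, the marking step itself can be sketched in plain Ruby. The version below is an illustration with invented data: plain hashes stand in for Topic records, and a Set (an addition not used in the original code) absorbs duplicate ids and makes each membership test cheap.

```ruby
require 'set'

# Hypothetical data: ids returned by one of the queries above (duplicates
# are possible when a user posted several times in the same topic).
posted_topic_ids = [3, 5, 5, 8]

# Plain hashes stand in for Topic records here.
topics = [{ :id => 3 }, { :id => 4 }, { :id => 8 }]

def mark_topics_with_post(topics, posted_topic_ids)
  lookup = Set.new(posted_topic_ids) # drops duplicates, fast include?
  topics.each { |t| t[:has_post] = lookup.include?(t[:id]) }
end

marked = mark_topics_with_post(topics, posted_topic_ids)
marked.each { |t| puts "topic #{t[:id]}: #{t[:has_post]}" }
```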
Assuming your Topic model has a column named id, you can do something like this:
Topic.select("topics.id").joins(:posts).where("posts.user_id = ?", user.id)
This runs only one query against your database and gives you the ids of all topics that have posts by the given user.

Simple ActiveRecord Question

I have a database model set up such that a post has many votes, a user has many votes, and a vote belongs to both a user and a post. I'm using will_paginate and I'm trying to create a filter so that the user can sort posts by either the date or the number of votes a post has. The date option is simple and looks like this:
@posts = Post.paginate :order => "date DESC"
However, I can't quite figure out how to do the ordering for the votes. If this were SQL, I would simply use GROUP BY on the votes user_id column, along with the count function, and then I would join the result with the posts table.
What's the correct way to do this with ActiveRecord?
1) Use the counter cache mechanism to store the vote count in Post model.
# add a column called votes_count
class Post
has_many :votes
end
class Vote
belongs_to :post, :counter_cache => true
end
Now you can sort the Post model by vote count as follows:
Post.order(:votes_count)
2) Use group by.
Post.select("posts.*, COUNT(votes.post_id) votes_count").
joins(:votes).group("posts.id").order(:votes_count)
If you want to include the posts without votes in the result set, then:
Post.select("posts.*, COUNT(votes.post_id) votes_count").
joins("LEFT OUTER JOIN votes ON votes.post_id=posts.id").
group("posts.id").order(:votes_count)
I prefer approach 1 as it is efficient and the cost of vote count calculation is front loaded (i.e. during vote casting).
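To illustrate the difference between the two group-by variants, here is a small plain-Ruby sketch over invented data. It counts votes per post the way COUNT ... GROUP BY would: the inner-join variant drops posts with no votes, while the left-outer-join variant keeps them with a count of 0.

```ruby
# Invented sample data standing in for the posts and votes tables.
post_ids = [1, 2, 3]
votes    = [{ :post_id => 1 }, { :post_id => 3 }, { :post_id => 3 }]

# Count votes per post_id, like COUNT(votes.post_id) with GROUP BY.
counts = Hash.new(0)
votes.each { |v| counts[v[:post_id]] += 1 }

# Inner join: only posts that have at least one vote, ordered by count.
inner = post_ids.select { |id| counts.key?(id) }.sort_by { |id| counts[id] }

# Left outer join: every post, with 0 for posts without votes.
outer = post_ids.sort_by { |id| counts[id] }

puts "inner join order: #{inner.inspect}"
puts "outer join order: #{outer.inspect}"
```

With a counter cache, the counts hash is effectively stored in the votes_count column, so the ordering needs no join at all.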
Just do all the normal SQL stuff as part of the query with options.
@posts = Post.paginate :order => "date DESC", :joins => " inner join votes on post.id..." , :group => " votes.user_id"
http://apidock.com/rails/ActiveRecord/Base/find/class
So I don't know much about your models, but you seem to know something about SQL, so:
Named scopes: you basically just put the query into a class method:
named_scope :index , :order => 'date DESC', :joins => .....
but they can take parameters
named_scope :blah, lambda { |param| ... } # base the query on param
For you, especially if you are more familiar with SQL, you can write your own query:
@posts = Post.find_by_sql( <<-SQL )
SELECT posts.*
....
SQL

ActiveRecord and SELECT AS SQL statements

I am developing in Rails an app where I would like to rank a list of users based on their current points. The table looks like this: user_id:string, points:integer.
Since I can't figure out how to do this "The Rails Way", I've written the following SQL code:
self.find_by_sql ['SELECT t1.user_id, t1.points, COUNT(t2.points) as user_rank FROM registrations as t1, registrations as t2 WHERE t1.points <= t2.points OR (t1.points = t2.points AND t1.user_id = t2.user_id) GROUP BY t1.user_id, t1.points ORDER BY t1.points DESC, t1.user_id DESC']
The thing is this: the only way to access the aliased column "user_rank" is by doing ranking[0].user_rank, which gives me lots of headaches if I want to easily display the resulting table.
Is there a better option?
How about:
@ranked_users = User.all :order => 'users.points DESC'
Then in your view you can say
<% @ranked_users.each_with_index do |user, index| %>
<%= "User ##{index + 1}, #{user.name} with #{user.points} points" %>
<% end %>
if for some reason you need to keep that numeric index in the database, you'll need to add an after_save callback to update the full list of users whenever the # of points anyone has changes. You might look into using the acts_as_list plugin to help out with that, or that might be total overkill.
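Note that each_with_index hands out consecutive positions, so users with equal points get different numbers. If ties should share a rank, one possible approach (a sketch with made-up data, not something taken from the answers here) is to count how many users have strictly more points:

```ruby
# Made-up users; in the app these would come from the query above.
users = [
  { :name => "ann", :points => 50 },
  { :name => "bob", :points => 70 },
  { :name => "cat", :points => 50 },
  { :name => "dan", :points => 20 }
]

# Competition-style ranking: rank = 1 + number of users with more points.
# This is quadratic in the number of users, which is fine for a short list.
def with_ranks(users)
  sorted = users.sort_by { |u| -u[:points] }
  sorted.map do |u|
    rank = 1 + users.count { |other| other[:points] > u[:points] }
    u.merge(:rank => rank)
  end
end

ranked = with_ranks(users)
ranked.each { |u| puts "##{u[:rank]} #{u[:name]} (#{u[:points]} points)" }
```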
Try adding user_rank to your model.
class User < ActiveRecord::Base
def rank
#determine rank based on self.points (switch statement returning a rank name?)
end
end
Then you can access it with @user.rank.
What if you did:
SELECT t1.user_id, COUNT(t1.points)
FROM registrations t1
GROUP BY t1.user_id
ORDER BY COUNT(t1.points) DESC
If you want to get all rails-y, then do
cool_users = self.find_by_sql ['(sql above)']
cool_users.each do |cool_user|
puts "#{cool_user[0]} scores #{cool_user[1]}"
end

Find all objects with no associated has_many objects

In my online store, an order is ready to ship if it is in the "authorized" state and doesn't already have any associated shipments. Right now I'm doing this:
class Order < ActiveRecord::Base
has_many :shipments, :dependent => :destroy
def self.ready_to_ship
unshipped_orders = Array.new
Order.all(:conditions => 'state = "authorized"', :include => :shipments).each do |o|
unshipped_orders << o if o.shipments.empty?
end
unshipped_orders
end
end
Is there a better way?
In Rails 3 using AREL
Order.includes('shipments').where(['orders.state = ?', 'authorized']).where('shipments.id IS NULL')
You can also query on the association using the normal find syntax:
Order.find(:all, :include => "shipments", :conditions => ["orders.state = ? AND shipments.id IS NULL", "authorized"])
One option is to put a shipment_count on Order, where it will be automatically updated with the number of shipments you attach to it. Then you just
Order.all(:conditions => {:state => "authorized", :shipment_count => 0})
Alternatively, you can get your hands dirty with some SQL:
Order.find_by_sql("SELECT * FROM
(SELECT orders.*, count(shipments.id) AS shipment_count FROM orders
LEFT JOIN shipments ON orders.id = shipments.order_id
WHERE orders.status = 'authorized' GROUP BY orders.id)
AS counted_orders WHERE shipment_count = 0")
Test that prior to using it, as SQL isn't exactly my bag, but I think it's close to right. I got it to work for similar arrangements of objects on my production DB, which is MySQL.
Note that if you don't have an index on orders.status I'd strongly advise it!
What the query does: the subquery grabs the shipment counts for all orders which are in authorized status. The outer query filters that list down to only the ones which have shipment counts equal to zero.
There's probably another way you could do it, a little counterintuitively:
"SELECT DISTINCT orders.* FROM orders
LEFT JOIN shipments ON orders.id = shipments.order_id
WHERE orders.status = 'authorized' AND shipments.id IS NULL"
Grab all orders which are authorized and don't have an entry in the shipments table ;)
This is going to work just fine if you're using Rails 6.1 or newer:
Order.where(state: 'authorized').where.missing(:shipments)
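All of the variants above express the same anti-join: keep the authorized orders whose id never appears as a shipment's order_id. A plain-Ruby sketch with invented data (hashes stand in for the Order and Shipment records) shows the intended result:

```ruby
require 'set'

# Invented sample data standing in for the orders and shipments tables.
orders = [
  { :id => 1, :state => "authorized" },
  { :id => 2, :state => "authorized" },
  { :id => 3, :state => "pending" }
]
shipments = [{ :id => 10, :order_id => 2 }]

# Keep authorized orders that have no matching shipment row.
def ready_to_ship(orders, shipments)
  shipped_ids = Set.new(shipments.map { |s| s[:order_id] })
  orders.select { |o| o[:state] == "authorized" && !shipped_ids.include?(o[:id]) }
end

ready = ready_to_ship(orders, shipments)
ready.each { |o| puts "order #{o[:id]} is ready to ship" }
```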
