Which of the following queries has the lower cost?
A.
def recent_followers
self.followers.recent.includes(:user).collect {|f| f.user.name }.to_sentence
end
B.
SELECT * FROM followers WHERE user_id = 1
SELECT * FROM users WHERE user_id IN (2,3,4,5)
Database querying is almost always faster than processing in Ruby.
Your first option uses collect, which is at a disadvantage because it has to load the whole collection into memory before processing it.
You could rewrite your first try as:
followers.recent.joins(:user).pluck('users.name') # no need for self, btw
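To see why, compare what each version does (a sketch; the joined SQL in the comment is approximate and assumes a conventional followers/users schema):

# collect instantiates a Follower and a User object for every row,
# then maps to names in Ruby
followers.recent.includes(:user).collect { |f| f.user.name }

# pluck issues one joined query and returns plain strings, skipping
# model instantiation entirely; roughly:
#   SELECT users.name FROM followers INNER JOIN users ON users.id = followers.user_id
followers.recent.joins(:user).pluck('users.name')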
Following up on my question here, I'm trying to improve a search further. We first search a replays table (about 2k records) and then get the unique players associated with those replays (10 per replay, so about 20k records) and render the result as JSON. This is done in the controller; the search reads as:
def index
  @replays = Replay.includes(:players).where(map_id: params['map_id'].to_i).order(id: :desc).limit(2000)
  render json: @replays[0..2000].to_json(include: [:players])
end
The performance:
Completed 200 OK in 254032ms (Views: 34.1ms | ActiveRecord: 20682.4ms)
The actual Active Record search reads as:
Replay Load (80.4ms) SELECT "replays".* FROM "replays" WHERE "replays"."map_id" = $1 ORDER BY "replays"."id" DESC LIMIT $2 [["map_id", 1], ["LIMIT", 2000]]
Player Load (20602.0ms) SELECT "players".* FROM "players" WHERE "players"."replay_id" IN (117217...
This mostly works, but still takes an exceptional amount of time. Is there a way to improve performance?
You're getting bitten by this issue: https://postgres.cz/wiki/PostgreSQL_SQL_Tricks_I#Predicate_IN_optimalization
I found a note on pg_performance about a possible optimization of the IN predicate when the list of values is longer than eighty numbers. For longer lists it is better to create constant subqueries using multiple VALUES:
SELECT *
FROM tab
WHERE x IN (1,2,3,..n); -- n > 70
-- faster case
SELECT *
FROM tab
WHERE x IN (VALUES(10),(20));
Using VALUES is faster for a bigger number of items, so don't use it for small sets of values.
Basically, SELECT * FROM tab WHERE x IN (1,2,...) with a long list of values is very slow. It's ridiculously faster if you can convert it to a list of constant rows, like SELECT * FROM tab WHERE x IN (VALUES (1),(2),...).
Unfortunately, since this is happening inside Active Record, it's a little tricky to exercise control over the query. You can avoid the includes call, manually construct the SQL to load all your child records, and then manually build up the associations.
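A minimal sketch of that manual approach, assuming the replays/players schema from the question (the grouping variable names are my own):

# load the parents normally
replays = Replay.where(map_id: params['map_id'].to_i).order(id: :desc).limit(2000).to_a

# load all children in one query, using the VALUES form to avoid the slow long-IN-list plan
values = replays.map { |r| "(#{r.id.to_i})" }.join(",")
players = Player.where("replay_id IN (VALUES #{values})")

# group children by parent id so serialization needs no further queries
players_by_replay = players.group_by(&:replay_id)
payload = replays.map do |r|
  r.as_json.merge("players" => (players_by_replay[r.id] || []).map(&:as_json))
end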
Alternatively, you can monkey patch Active Record. Here's what I've done on Rails 4.2, in an initializer:
module PreloaderPerformance
  private

  def query_scope(ids)
    if ids.count > 100
      type = klass.columns_hash[association_key_name.to_s].sql_type
      values_list = ids.map do |id|
        if id.kind_of?(Integer)
          " (#{id})"
        elsif type == "uuid"
          " ('#{id}'::uuid)"
        else
          " ('#{id}')"
        end
      end.join(",")
      scope.where("#{association_key_name} in (VALUES #{values_list})")
    else
      super
    end
  end
end

module ActiveRecord
  module Associations
    class Preloader
      class Association #:nodoc:
        prepend PreloaderPerformance
      end
    end
  end
end
Doing this I've seen a 50x speed up of some of my queries, with no issues as of yet. Note it's not fully battle tested, and I bet it will have some issues if your association uses an unusual data type for the foreign key relationship. In my database, I only use UUIDs or integers for our associations. The usual caveats about monkey patching core Rails behavior apply.
I know find_each can be used to batch queries, which might lighten the memory load here. Could you try out the following and see how it impacts the time?
Replay.where(map_id: params['map_id'].to_i).includes(:players).find_each(batch_size: 100).map do |replay|
  replay.to_json(include: :players)
end
I'm not sure this will work. It might be that the mapping negates the benefits of batching - there are certainly more queries, but it'll use less memory as it doesn't need to store more than 20k records at a time.
Have a play and see how it looks - fiddle with the batch size too, see how that affects things.
There's a caveat in that find_each won't let you apply a limit, so bear that in mind.
I'm sure someone else'll come up with a far slicker solution, but hope this might help in the meantime. If it's awful when you check the speed, let me know and I'll delete this answer :)
So I just want to make sure that this is an N+1 query, and I want to know how to fix it. On a high level, where am I losing the most time in an N+1 query? Is it the request over the network to the database URL that is costing me the most time? Or is it the actual query IN the database?
Say we have this:
products = Product.where(user_id: user.id) # This is one network database query, right?
# restrictions is another table. We're trying to filter out products that are restricted in the user's state.
products.select { |product| !product.restrictions.map(&:state).include?(user.address.state) }
Questions
So technically, is this an N+1 query? It is, because we're making one query to get all products for the user AND then another query per product to filter out the restricted ones by comparing each product's restrictions to the user's state.
So, on a high level, what can I do? Can I eager load the restrictions table in my first query? Or should I just make one trip and do everything in the first query? Are these my two options?
Update
So assume I did Product.includes(:restrictions).where(user_id: user.id), this is all still one query, right?
Is it also one query if this is all in one method:
products = Product.where(user_id: user.id)
products.includes(:restrictions).select do |product|
  !product.restrictions.map(&:state_name).include?("CT")
end
Following up on @codelitt's answer, here is how you solve this.
You should call
products = Product.includes(:restrictions).where(user_id: user.id) # or you can write the includes in your scope
This will fetch all the related restriction records along with the products into memory. So, in the next line, there won't be a bunch of costly DB queries; the records will be fetched from memory instead.
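Put together, a minimal sketch (state_name comes from the question's update; your column may be called state instead):

products = Product.includes(:restrictions).where(user_id: user.id)

# restrictions are already in memory, so this filter triggers no further queries
available = products.reject do |product|
  product.restrictions.map(&:state_name).include?(user.address.state)
end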
There are some great articles, like this one, that go into this in more detail, but the general concept is that any time you're iterating through a list and making queries based on that list, you're performing an N+1 query. You're losing the most time over the network, but each query also comes with a set amount of overhead.
Question 1. Yes, this is an N+1 query.
You're making an N+1 query when you return each product and then, for each product, query whether it is restricted or not. You can typically recognize them because you are iterating through queries like you do above with products.select { |product| }. This could potentially produce hundreds of queries when you could just batch them instead.
In your example, you are returning an array of products and making another request to filter out the list.
Your code currently produces SQL similar to:
SELECT * FROM products WHERE user_id=1;
and then performs another filter where you're checking product restrictions, which produces these queries:
SELECT restrictions FROM products WHERE product_id=1 AND user_id=1 AND state='x';
SELECT restrictions FROM products WHERE product_id=2 AND user_id=1 AND state='x';
SELECT restrictions FROM products WHERE product_id=3 AND user_id=1 AND state='x';
...
Question 2. You should just do this all in one query and batch the results. This is pseudocode, but you get the idea:
products = Product.where(user_id: @user).select(restrictions: state)
Response to your update: that will still create two queries. The method just runs from top to bottom. The only way to make it create a single query is to use and chain the provided Rails ORM methods, which build proxy objects and then issue a single query. Read more here: https://stackoverflow.com/a/10747692/1336988
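As a hedged sketch of pushing the whole filter into the database with chained relation methods (the Restriction model, its product_id foreign key, and the state column are assumptions based on the question; where.not needs Rails 4+):

# one SQL query: exclude products that have a restriction matching the user's state
state = user.address.state
available = Product.where(user_id: user.id)
                   .where.not(id: Restriction.where(state: state).select(:product_id))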
I have inherited a Rails 3 app that stores much of its data as a fairly sophisticated tree structure. The application works pretty well in general, but we are noticing some problems with performance, mostly around database interactions.
I am seeing a lot of queries along these lines showing up in my logs:
SELECT `messages`.* FROM `messages` WHERE `messages`.`context_type` = 'Node' AND `messages`.`context_id` IN (153740, 153741, /* about a thousand records... */ 154837, 154838, 154839, 154840, 154841, 154842, 154843)
Followed by many single queries where it looks as though the same record is being queried time and again:
CACHE (0.0ms) SELECT `plans`.* FROM `plans` WHERE `plans`.`type` IN ('Scenario') AND `plans`.`id` = 1435 LIMIT 1
My log has that exact query roughly eighty times. Now, I'm guessing that the initial CACHE tag means it is probably being pulled from a cache rather than going back to the database every time, but it still looks like a lot, and this type of thing is happening repeatedly.
I am guessing that the above queries are an association being pulled out backwards: message belongs_to plan, and it is loading all the messages and then pulling out the plan for each one rather than, as one might do in a sane world, starting with the plan and then loading all the messages.
Working in this vein, a single request contains 1641 SELECT statements, and it seems very likely to me that the sheer amount of database traffic (not to mention the number of sequential LIMIT 1 queries for neighbouring data in the same table) is a significant bottleneck. I am reluctant to post too much code, but this is a typical example of one of the larger queries:
def nodes
  include_objects = [:sector, :market, :messages, :node_user_prefs, :reference_node, :project,
                     { :available_events => :available_event_nodes }, :customer_factors,
                     :active_components, :tags, { :event_histories => :node },
                     { :event_histories => :user }]
  project = self
  @cached_nodes ||= begin
    all_nodes = orig_nodes.includes(include_objects)
    all_nodes = all_nodes.map { |n| n.tap { |i| i.cached_project = project } }
    all_node_ids = all_nodes.map(&:id)
    all_nodes.select { |n| n.type == 'InitialNode' || all_node_ids.include?(n.parent_node_id) }
  end
end
Obviously, the queries are pretty diverse and the application is large, but this is fairly representative of the standard approach taken.
What are the easy wins with ActiveRecord that I can use to try and speed things up? I can easily enough put together a query that would pull all the required data out in a single round trip, but how easy would it be to form that - redundancies and all - into my model hierarchy? What other techniques can I use to help out here?
Ancestry Gem
Not a direct fix by any means, but you may wish to consider the ancestry gem.
This will give you a way to create a tree structure whereby you'll be able to call single records and then have their descendants fetched as you wish. This will cut back on your SQL queries.
If you set up your nodes / objects in this fashion, it will allow you to call the records you require once, and ancestry will populate the rest, as sketched below. If you want me to divulge more information on this, please let me know in the comments and I'll detail more specifics.
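A minimal sketch of that setup, using ancestry's standard API (the Node model name is borrowed from the question):

# Gemfile:   gem 'ancestry'
# migration: add_column :nodes, :ancestry, :string
#            add_index  :nodes, :ancestry

class Node < ActiveRecord::Base
  has_ancestry
end

# the whole subtree comes back without recursive per-record loading
root = Node.roots.first
root.children    # direct children
root.descendants # everything beneath root
root.subtree     # root plus its descendants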
I'm building a store and would like to randomize a product page, but only change it once per day.
I know that a randomizer with a seed number can return consistent results, so perhaps using the current day as a seed would work.
Caching would also work, or storing the results in a table.
What would be a good way to do this?
Create a materialized view. That's just another table in current PostgreSQL, updated with the results of a query. I might install a cron job that triggers the refill. You can have any amount of caching on top of that.
The upcoming Postgres 9.3 will have materialized views as a built-in feature.
More on materialized views in the Postgres wiki.
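A minimal sketch of the native 9.3 syntax, issued through the ActiveRecord connection (daily_random_products and the LIMIT are hypothetical names/values):

# create once
ActiveRecord::Base.connection.execute(<<-SQL)
  CREATE MATERIALIZED VIEW daily_random_products AS
  SELECT * FROM products ORDER BY random() LIMIT 50;
SQL

# run from the daily cron job to reshuffle
ActiveRecord::Base.connection.execute("REFRESH MATERIALIZED VIEW daily_random_products;")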
For a fast method to pull random rows you may be interested in this related question:
Best way to select random rows PostgreSQL
You definitely want to cache the results. Sorting things randomly is slow (especially on large datasets). You could have a cron job that runs every night to clear out the old cache and pick new random products. A page cache is best if you can pull that off, but a fragment cache would work fine too.
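A minimal sketch of the caching route (the key, limit, and helper name are arbitrary; keying on the date guarantees a fresh shuffle each day):

def todays_random_products
  Rails.cache.fetch("random_products/#{Date.current}", expires_in: 1.day) do
    Product.order("RANDOM()").limit(20).to_a
  end
end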
I found a different way to accomplish this that will also let me use the will_paginate gem and have fresh info when the products are updated.
I added a sort_order integer column to the table. Then, once a day, I run a query to update that field with random numbers, and I sort on that field.
Conceptual Rails code:
# Pulling in the products in the specified random order
def show
  @category = Category.where(slug: params[:id].to_s).first
  if @category
    @random_products = @category.products.order(sort_order: :desc) # desc so new products are at the end
  end
end
# Elsewhere...
def update_product_order
  products = Product.order("RANDOM()").all
  order_index = 0
  products.each do |p|
    p.sort_order = order_index
    p.save! # this can be done much more efficiently, obviously
    order_index += 1
  end
end
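As a sketch of that "much more efficiently" aside, the whole reshuffle can be a single SQL statement (Postgres syntax; table and column names from the code above):

# one UPDATE instead of one save per product
ActiveRecord::Base.connection.execute(<<-SQL)
  UPDATE products
  SET sort_order = shuffled.rn
  FROM (
    SELECT id, row_number() OVER (ORDER BY random()) AS rn
    FROM products
  ) AS shuffled
  WHERE products.id = shuffled.id;
SQL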
In a Rails 2 app I'm building, I need to update a collection of records with specific attributes. I have a named scope to find the collection, but I have to iterate over each record to update the attributes. Instead of making one query to update several thousand records, I'll have to make several thousand queries.
What I've found so far is something like Model.find_by_sql("UPDATE products ...")
This feels really junior, but I've googled and looked around SO and haven't found my answer.
For clarity, what I have is:
ps = Product.last_day_of_freshness
ps.each { |p| p.update_attributes(:stale => true) }
What I want is:
Product.last_day_of_freshness.update_attributes(:stale => true)
It sounds like you are looking for ActiveRecord::Base.update_all - from the documentation:
Updates all records with details given if they match a set of conditions supplied, limits and order can also be supplied. This method constructs a single SQL UPDATE statement and sends it straight to the database. It does not instantiate the involved models and it does not trigger Active Record callbacks or validations.
Product.last_day_of_freshness.update_all(:stale => true)
Actually, since this is Rails 2.x (you didn't specify), the named_scope chaining may not work; you might need to pass the conditions for your named scope as the second parameter to update_all instead of chaining it onto the end of the Product scope, as sketched below.
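A minimal sketch of the Rails 2.x form (the condition is a placeholder; substitute whatever last_day_of_freshness actually checks):

# Rails 2.x signature: update_all(updates, conditions)
Product.update_all({ :stale => true }, ["created_at <= ?", 1.day.ago])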
Have you tried using update_all ?
http://api.rubyonrails.org/classes/ActiveRecord/Relation.html#method-i-update_all
For those who need to update a big number of records, one million or even more, there is a good way to update records in batches.
product_ids = Product.last_day_of_freshness.pluck(:id)
iterations_size = product_ids.count / 5000

puts "Products to update #{product_ids.count}"

product_ids.each_slice(5000).with_index do |batch_ids, i|
  puts "step #{i} of #{iterations_size}"
  Product.where(id: batch_ids).update_all(stale: true)
end
If your table has a lot of indexes, that will also increase the time for such operations, because the indexes need to be updated. When I called update_all for all records in a table with about two million records and twelve indexes, the operation didn't complete in over an hour. With this approach it took about 20 minutes in the development environment and about 4 minutes in production; of course, it depends on application settings and server hardware. You can put it in a rake task or some background worker.
Looks like update_all is the best option... though I'll maintain my hacky version in case you're curious:
You can use just plain-ole SQL to do what you want thus:
ps = Product.last_day_of_freshness
ps_ids = ps.map(&:id).join(',') # local var just for readability
Product.connection.execute("UPDATE `products` SET `stale` = TRUE WHERE id IN (#{ps_ids})")
Note that this is db-dependent - you may need to adjust quoting style to suit.