So, I've inherited a medium-sized legacy app that was built quickly to address a customer's need. Luckily it was well written for the most part, but because it was rushed there are a lot of places where ActiveRecord relations are not eager loaded. As the site's traffic increases, these n+1 bugs are really starting to surface.
My question is how can I easily find these problems and write something to the logs or generate a report, anything to alert me and other devs while developing?
What I have so far is an object that can wrap around any view or template that is being rendered:
class EagerLoadIssueLogger
  def track(&block)
    # Start tracking eager load issues
    result = block.call
    # Stop tracking eager load issues
    result
  end
end
Then call in a layout or view like so:
<body>
  <%= @eager_load_tracker.track { yield } %>
</body>
My issue is I can't figure out how to determine when an association is called that hasn't been eager loaded. I know there is a method loaded? that I can use to check any one relation like:
@team.users.loaded? # returns true or false
but I want to check every relation loaded while in my tracker block and, if it wasn't eager loaded, log it; otherwise, good job, just ignore it. I know I can probably accomplish this by monkey-patching ActiveRecord::Relation or some other ActiveRecord class/module, but I have been fruitlessly searching for where to get started.
Any ideas?
The Bullet gem can be used to log n+1 queries. It works pretty well for me.
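For reference, enabling it in development looks roughly like this; the option names below come from the gem's README, so double-check them against the version you install:

# config/environments/development.rb
config.after_initialize do
  Bullet.enable        = true   # turn n+1 detection on
  Bullet.bullet_logger = true   # write findings to log/bullet.log
  Bullet.rails_logger  = true   # also write them to the Rails log
  Bullet.add_footer    = true   # surface warnings in the rendered page
end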
In my code I join two models (e.g. Post and Comment) and then eager load a third model (e.g. Author). When I print each record, Rails uses the eager-loaded model data for the first instance of Post but triggers additional database queries for each additional instance of Post. This is a bit of an n+1 problem.
What I do not understand is why I'm only able to use the eager-loaded data for the first record of each post and not for the additional records returned by the join.
I have looked at similar questions and do not feel these address the exact same issue. These include:
why is this rails association loading individually after an eager load?
belongs_to association loaded individually even after eager loading
Eager loading not loading in rails, and others
I've also looked into the Rails source code, specifically ActiveRecord::Associations::Preloader, and have tried to modify it a bit locally to see if I could narrow down where the issue is or whether it happens there. I had thought that possibly this class was stripping out non-unique record instances by id (which I think it does at line 92), but when I changed this my issue was not resolved.
I did stumble across this whacky edge case which seemed applicable but in my actual code I'm using find_by_sql and have not been able to implement this effectively.
I have created a gist with steps to reproduce the issue I'm experiencing and the output I'm getting. Any help would be appreciated.
Note: This example is the simplest setup I could think of that demonstrates the same issue I'm having in my actual code. In this example I know I could eager load the comments and authors; however, in my actual code there is a much more complex join, and the join is required. I would not be able to eager load the Comment model from this example.
For those that find this in the future, here is what I've found and what I implemented.
Simple gist example
For the simple example outlined in my gist, I found that I can get the desired results by changing the following.
Gist:
# Triggers n+1 for author after first puts of an author.
posts = Post.joins(:comments).select("posts.*, comments.body").includes(:author)
posts.each do |x|
  puts "#{x.title} #{x.author.name} #{x.body}"
end
Changed to:
# Works as expected.
posts = Post.joins(:comments, :author).select("posts.*, comments.body").includes(:author)
posts.each do |x|
  puts "#{x.title} #{x.author.name} #{x.body}"
end
By joining the author I get the results I wanted without additional queries.
More complex example and Implementation
Although the above solved the simple example I used to demonstrate my problem, in my more complex real-world case (as noted in my question) I was not able to add this extra join to solve things.
Instead I implemented the Query Object Pattern and Decorator Pattern as outlined in this great Thoughtbot post and in the Rails handbook on GitHub (Query Object Pattern and Decorator Pattern).
These patterns allowed me to get all the benefits of eager loading the specific associations I wanted but for my much more complex query.
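To give a rough idea of the shape this took, here is a heavily simplified sketch of the query-object half; the class name, associations and the approved column are invented for illustration and are not my real code:

# app/queries/special_posts_query.rb (illustrative names only)
class SpecialPostsQuery
  def initialize(relation)
    @relation = relation # any Post scope to build on
  end

  # Performs the complex join once and eager loads the association the
  # view needs, so iterating over the result triggers no extra queries.
  def call
    @relation
      .joins(:comments)
      .includes(:author)
      .where("comments.approved = ?", true)
  end
end

# e.g. in a controller (Post.scoped is the Rails 3 way to get a bare relation):
# posts = SpecialPostsQuery.new(Post.scoped).call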
The special_item_id_list method is responsible for returning an array of ids. The query and logic are complicated enough that I only want to run it once per page request, but I'll be utilizing that resulting array of ids in many different places. The idea is to be able to use the is_special? method or the special_items scope freely without worrying about incurring overhead each time they are used, so they rely on the special_item_id_list method to do the heavy lifting and caching.
I don't want the results of this query to persist between page loads, but I'd like the query to run only once per page load. I don't want to use a global variable, and I thought memoizing it at the class level (as below) might work; however, it appears that the value does persist between page loads. I'm guessing the Item class is part of the Rails stack and stays in memory.
So where would be the preferred place for storing my id list so that it's rebuilt on each page load?
class Item < ActiveRecord::Base
  scope :special_items, lambda { where(:id => special_item_id_list) }

  def self.special_item_id_list
    @special_item_id_list ||= ... # some complicated queries
  end

  def is_special?
    self.class.special_item_id_list.include?(id)
  end
end
UPDATE: What about using Thread? I've done this before for tracking the current user and I think it could be applied here, but I wonder if there's another way. Here's a StackOverflow conversation discussing threads, which also mentions the request_store gem as possibly a cleaner way of doing so.
This railscast covers what you're looking for. In short, you're going to want to do something like this:
after_commit :flush_cache

def self.cached_special_item_list
  Rails.cache.fetch("special_items") do
    special_item_id_list
  end
end

private

def flush_cache
  Rails.cache.delete("special_items")
end
At first I went with a form of Jonathan Bender's suggestion of utilizing Rails.cache (thanks John), but wasn't quite happy with how I was having to expire it. For lack of a better idea I thought it might be better to use Thread after all. I ultimately installed the request_store gem to store the query results. This keeps the data around for the duration I wanted (the lifetime of the request/response) and no longer, without any need for expiration.
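For completeness, this is roughly what it ended up looking like; expensive_special_item_id_query is a made-up name standing in for the complicated queries from my original example:

class Item < ActiveRecord::Base
  def self.special_item_id_list
    # RequestStore.store is a plain hash that the gem clears after every
    # request, so this memoizes the result for one request/response cycle
    # and never leaks between page loads.
    RequestStore.store[:special_item_id_list] ||= expensive_special_item_id_query
  end
end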
Are you really sure this optimisation is necessary? Are you having performance issues because of it? Unless it's actually a problem I would not worry about it.
That said, you could create a new class, make special_item_id_list an instance method on that class, and then pass that object around to anything that needs to use that expensive-to-calculate data.
Or it might suffice to cache the data on instances of Item (possibly by making special_item_id_list an instance method), and not worry about different instances not being able to share the cache.
Introduction
I have a (mostly) single-page application built with BackboneJS and a Rails backend.
Because most of the interaction happens on one page of the webapp, when the user first visits the page I basically have to pull a ton of information out of the database in one large deeply joined query.
This is causing me some rather extreme load times on this one page.
NewRelic appears to be telling me that most of my problems are because of 457 individual fast method calls.
Now I've done all the eager loading I can do (I checked with the Bullet gem) and I still have a problem.
These method calls are most likely occurring in my Rabl serializer, which I use to serialize a bunch of JSON to embed into the page for initializing Backbone. You don't need to understand all of this, but suffice it to say it could add up to 457 method calls.
object @search
attributes :id, :name, :subscription_limit

# NOTE: Include a list of the members of this search.
child :searchers => :searchers do
  attributes :id, :name, :gravatar_icon
end

# Each search has many concepts (there could be over 100 of them).
child :concepts do |search|
  attributes :id, :title, :search_id, :created_at

  # The person who suggested each concept.
  child :suggester => :suggester do
    attributes :id, :name, :gravatar_icon
  end

  # Each concept has many suggestions (approx. 4 each).
  node :suggestions do |concept|
    # Here I'm scoping suggestions to only ones which meet certain conditions.
    partial "suggestions/show", object: concept.active_suggestions
  end

  # Add a boolean flag to tell if the concept is a favourite or not.
  node :favourite_id do |concept|
    # Another method call which occurs for each concept.
    concept.favourite_id_for(current_user)
  end
end

# Each search has subscriptions to certain services (approx. 4).
child :service_subscriptions do
  # This contains a few attributes and 2 fairly innocuous method calls.
  extends "service_subscriptions/show"
end
So it seems that I need to do something about this but I'm not sure what approach to take. Here is a list of potential ideas I have:
Performance Improvement Ideas
Dumb-Down the Interface
Maybe I can come up with ways to present information to the user which don't require the actual data to be present. I don't see why I should absolutely need to do this, though; other single-page apps such as Trello have incredibly complicated interfaces.
Concept Pagination
If I paginate concepts it will reduce the amount of data being extracted from the database each time. It would produce an inferior user interface, though.
Caching
At the moment, refreshing the page just extracts the entire search out of the DB again. Perhaps I can cache parts of the app to reduce DB hits. This seems messy, though, because not much of the data I'm dealing with is static.
Multiple Requests
It is technically bad to serve the page without embedding the JSON into the page but perhaps the user will feel like things are happening faster if I load the page unpopulated and then fetch the data.
Indexes
I should make sure that I have indexes on all my foreign keys. I should also try to think about places where it would help to have indexes (such as favourites?) and add them.
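To make that concrete, the migration would look something like this; the table and column names are guesses based on the serializer above rather than my actual schema:

class AddMissingForeignKeyIndexes < ActiveRecord::Migration
  def change
    add_index :concepts, :search_id
    add_index :suggestions, :concept_id
    add_index :favourites, [:concept_id, :user_id]
  end
end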
Move Method Calls into DB
Perhaps I can cache some of the results of the iteration I do in my view layer into the DB and just pull them out instead of computing them. Or I could sync things on write rather than on read.
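The simplest built-in form of syncing on write is a counter cache. It only maintains a plain count (a scoped count like active_suggestions would need hand-rolled callbacks), but it shows the shape of the idea:

class Suggestion < ActiveRecord::Base
  # Keeps concepts.suggestions_count up to date on create/destroy,
  # so reads never have to count at render time.
  belongs_to :concept, :counter_cache => true
end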
Question
Does anyone have any suggestions as to what I should be spending my time on?
This is a hard question to answer without being able to see the actual user interface, but I would focus on loading only as much data as is required to display the initial interface. For example, if the user has to drill down to see some of the data you're presenting, then you can load that data on demand rather than loading it as part of the initial payload. You mention that a search can have as many as 100 "concepts"; maybe you don't need to fetch all of those concepts initially?
Bottom line: it doesn't sound like your issue is really on the client side; it sounds like your server-side code is slowing things down, so I'd explore what you can do to fetch less data, or to defer the complex queries until they are definitely required.
I'd recommend separating your JS code-base into modules that are dynamically loaded using an asset loader like RequireJS. This way you won't have so many XHRs firing at load time.
When a specific module is needed it can load and initialize at an appropriate time instead of every page load.
If you complicate your code a little, each module should be able to start and stop. So, if you have any polling occurring or complex code executing you can stop the module to increase performance and decrease the network load.
I've got three nested models: user has many plates and plate has many fruits. I also have a current_user helper method that runs in the before filter to provide authentication. So when I get to my controller, I already have my user object. How can I load all the user's plates and fruits at once?
In other words, I'd like to do something like:
@plates = current_user.plates(include: :fruits)
How can I achieve this?
I'm using Rails 3.1.3.
You will probably want to use the provided #includes method on your relation. DO NOT USE #all unless you intend to work through the records immediately; it will defeat many forms of caching.
Perhaps something like: @plates = current_user.plates.includes(:fruits)
Unfortunately, there are portions of the Rails API that are not as well documented as they should be. I would recommend checking out the following resources if you have any further questions about the Rails query interface:
Query Interface Guide
ActiveRecord::Relation Walkthrough (screencast)
The query interface is possibly the most difficult part of the Rails stack to keep up with, especially with the changes made with Rails 3.0 and 3.1.
You can do
ActiveRecord::Associations::Preloader.new([current_user], :plates => :fruits).run
to eager load associations after current_user has been loaded. The second argument can be anything you would normally pass to includes: a symbol, an array of symbols, a hash, etc.
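For instance, assuming the same models as the question, any of these forms should work:

# bare symbol
ActiveRecord::Associations::Preloader.new([current_user], :plates).run
# array of symbols
ActiveRecord::Associations::Preloader.new([current_user], [:plates]).run
# hash, to preload nested associations
ActiveRecord::Associations::Preloader.new([current_user], :plates => :fruits).run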
@plates = current_user.plates.all(:include => :fruits)
should do it.
In my app there are objects, and they belong to countries, regions, cities, types, groups, companies and other sets. Every set is rather simple: it has an id, a name and sometimes some pointers to other sets, and it never changes. Some sets are small, so I load them in a before_filter like this:
@countries = Country.all
@regions = Region.all
But then I call, for example,
offer.country.name
or
region.country.name
and my app performs a separate DB query by id, even though I've already loaded them all. The same goes for queries made through :include: the id lookups generated by eager loading don't take into account whether I've already loaded that data with another query by id or not.
So I want some kind of cache. For example, I could build hashes keyed by record id in my before_filter and then call @countries[offer.country_id].name. In that case it seems I wouldn't need eager loading, and it would be easy to turn on Rails.cache here. But maybe there's some smart built-in Rails solution that doesn't require rewriting everything?
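To illustrate the hash idea above, what I have in mind is roughly this, using ActiveSupport's index_by:

@countries = Country.all.index_by(&:id)
@countries[offer.country_id].name # plain hash lookup, no extra query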
Caching lists of models like that won't cache the individual instances that exist in other models' associations.
The Rails team has worked on implementing Identity Maps in Rails 3.1 to solve this exact problem, but it is disabled by default for now. You can enable it and see if it works for your problem.
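If you want to experiment with it, it is a single config flag; note that the identity map only existed in the Rails 3.1/3.2 era (it was removed again later), so check your version:

# config/application.rb
config.active_record.identity_map = true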