I have an e-commerce Rails application where we need to display, on a single page, the orders placed by customers within the last year for reporting purposes. The data set is quite large, and displaying these orders on a single page takes a lot of SQL processing. The task was initially very slow, so I moved all the required order details to a Redis server. Fetching the data is now really fast, but we are still not quite there.
Here's what we have:
Rendered **path**/sales_orders.html.haml within layouts/admin (39421.1ms)
Completed 200 OK in 44925ms (Views: 39406.8ms | ActiveRecord: 417.2ms)
The application is hosted on Heroku and if a request takes more than 30s it is killed. As you can see we are well above that limit. Most of the time is lost in rendering the view.
The page contains a date filter where the user gets to choose what Date Range to select the Orders from. So, caching is not the ideal solution since Date Ranges might change every time.
Any ideas how this can be done?
The Redis keys are of the format (The following is a Redis Hash):
orders:2012-01-01:123
orders:yyyy-mm-dd:$order-id
The user simply provides a date range, and I get all the keys within that date range under the orders namespace.
Here's how I would get the Customer Name for instance from the Redis order key:
= REDIS.hget(order_key, "customer_name")
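As a minimal sketch of how a date range could map onto the key scheme above: `order_key_patterns` is a hypothetical helper (it is not part of any Redis client) that only builds one pattern string per day. How those patterns are then looked up depends on the client in use.

```ruby
require "date"

# Hypothetical helper: builds one "orders:yyyy-mm-dd:*" pattern per day in the range.
# It only produces strings; it does not talk to Redis.
def order_key_patterns(from, to)
  (from..to).map { |day| "orders:#{day.strftime('%Y-%m-%d')}:*" }
end

patterns = order_key_patterns(Date.new(2012, 1, 1), Date.new(2012, 1, 3))
```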
Consider building the report with a periodic task using the Heroku Scheduler addon.
As long as last-minute orders are not required to be included in the report, you can build your reports nightly and have them available for immediate download to read with your morning coffee, or even have them mailed to you (or whoever needs to read them.)
If you need interactive selection of periods for reports, you will need to queue the requests up and build the reports using background jobs.
Almost all your time is spent in rendering views. That probably means you have a lot of partials or other complex view logic. Some of your options are:
Paginate your output, but offer a PDF or CSV for unpaginated output.
Simplify your view logic...a lot.
Try a helper like cycle instead of rendering complex tables or nested partials.
Move your rendering into the client with JSON and JavaScript.
That's about it, really. If one or more of those don't get you where you need to go, it may be time to revisit your requirements.
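To illustrate the pagination option, here is a rough sketch in plain Ruby. `paginate` is a hypothetical helper and the integers stand in for order records. In a real application you would probably page in the data store itself rather than in memory.

```ruby
# Hypothetical helper: returns the slice of `items` for a 1-based page number.
def paginate(items, page:, per_page:)
  offset = (page - 1) * per_page
  items[offset, per_page] || []   # Array#[] returns nil past the end
end

orders = (1..95).to_a             # stand-in for 95 order records
first_page  = paginate(orders, page: 1, per_page: 25)
last_page   = paginate(orders, page: 4, per_page: 25)
beyond      = paginate(orders, page: 9, per_page: 25)
total_pages = (orders.size / 25.0).ceil
```

Only one page of rows is rendered per request, which keeps the view time bounded no matter how large the date range is.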
I suggest you use fragment caching. Reading a fragment is very fast (~0.5 ms) and in my experience you'll see a huge speedup by not re-rendering your partials again and again. It's also a fairly cheap solution, as Rails takes care of invalidating the fragments (if you use the model as part of the cache key) and it requires minimal changes in your template. I.e. the solution could be as simple as:
<% @orders.each do |order| %>
<% cache ["v1", order] do %>
<%= render order %>
<% end %>
<% end %>
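The idea behind that template can be sketched without Rails. `FragmentStore` below is a made-up in-memory stand-in, not the Rails implementation, and `render_row` is a hypothetical renderer. It shows that a second page load reads every row from the cache instead of rendering it again.

```ruby
# Minimal in-memory stand-in for a fragment cache (not the Rails implementation).
class FragmentStore
  attr_reader :hits, :misses

  def initialize
    @data = {}
    @hits = 0
    @misses = 0
  end

  # Returns the cached value for `key`, or runs the block once and stores its result.
  def fetch(key)
    if @data.key?(key)
      @hits += 1
      @data[key]
    else
      @misses += 1
      @data[key] = yield
    end
  end
end

Order = Struct.new(:id, :updated_at, :customer_name)

# Hypothetical renderer standing in for `render order`.
def render_row(order)
  "<tr><td>#{order.id}</td><td>#{order.customer_name}</td></tr>"
end

store = FragmentStore.new
orders = [Order.new(1, 100, "Ann"), Order.new(2, 100, "Bob")]

# Two page loads: the first renders every row, the second only reads.
2.times do
  orders.each { |o| store.fetch(["v1", o.id, o.updated_at]) { render_row(o) } }
end
```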
I have just started using caching in a production application to speed things up. I've read the primary Rails guide, various blogs, the source itself, etc. But my head is still not clear on one simple thing when it comes to fragment caching:
When you destroy the cache after updating the object, are you only updating the single object, or the class? I think just the single object.
Here's an example:
<% @jobs.each do |job| %>
<% cache("jobs_index_table_environment_#{session[:merchant_id]}_job_#{job}") do %>
stuff
<% end %>
<% end %>
I use the code above in my jobs index page. Each row is rendered with some information the user wants, some CSS, clickable to view the individual job, etc.
I wrote this in my Job class (model)
after_save do
Rails.cache.delete("jobs_index_table_environment_#{merchant_id}_job_#{self}")
end
after_destroy do
Rails.cache.delete("jobs_index_table_environment_#{merchant_id}_job_#{self}")
end
I want the individual job objects destroyed from the cache if they are updated or destroyed, and of course newly created jobs get their own cache key the first time they pop on the page.
I don't do the Russian doll thing with @jobs because this is my "god" object and is changing all the time. The cache would almost never be helpful as the collection probably morphs by the minute.
Is my understanding correct that in the above view, if I rendered, say, 25 jobs to the first page, I would get 25 objects in my cache, each with its own cache key? And if I then change only the first job, its cached value would be destroyed and re-cached the next time the jobs page loads, while the other 24 would just be pulled from the cache?
I'm a novice to fragment caching as well, and I just encountered a very similar use-case so I feel my (limited) knowledge is fresh enough to be of help.
Trosborn is correct, your terminal will highlight when you READ and when you WRITE, which shows you how many "hits" you got on your cache. It should only WRITE when you've changed an object. And based on what I see above, your delete is only deleting individual records.
However, I think there is a potentially simpler way to accomplish this, which is passing the ActiveRecord object to the cache, such as:
<% @jobs.each do |job| %>
<% cache(job) do %>
stuff
<% end %>
<% end %>
Read this post from DHH on how this works. In short, when an AR object is passed to cache, the key is generated not just on the model name, but also on the id and the updated_at fields.
Obsolete fragments eventually get pushed out of the cache when memory runs out, so you don't need to worry about deleting old cache objects.
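A small sketch of why no explicit delete is needed: the `Job` struct and its `cache_key` below are illustrative only. They roughly mirror the "model/id-timestamp" shape described above, and the exact format Rails produces varies by version.

```ruby
Job = Struct.new(:id, :updated_at) do
  # Illustrative key shape; not the exact string Rails generates.
  def cache_key
    "jobs/#{id}-#{updated_at.utc.strftime('%Y%m%d%H%M%S')}"
  end
end

job = Job.new(7, Time.utc(2013, 5, 1, 12, 0, 0))
old_key = job.cache_key

job.updated_at = Time.utc(2013, 5, 1, 12, 30, 0)  # simulates saving the record
new_key = job.cache_key
```

Because the key changes when the record changes, the next render misses the cache and writes a fresh fragment, while the entry under the old key is simply never read again.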
I have a model "Wrapper", which has_many of another model "Category", which in turn has_many of another model, "Thing".
"Thing" has the integer attributes :count and :number. It also has an instance method defined as such in models/thing.rb:
def ratio
(self.count + self.number).to_f / Thing.all.count.to_f
end
"Category", then, has this instance method, defined in models/category.rb:
def thing_ratios
self.things.sum(&:ratio)
end
Finally, my wrapper.html.erb view shows Categories, listed in order of thing_ratios:
<% @wrapper.categories.sort_by(&:thing_ratios).each do |category| %>
...
My question is this: every time someone reloads the page wrapper.html.erb, would every single relevant calculation have to be recalculated, all the way down to self.count for every Thing associated with every Category on the page?
In addition to the resources that @Kelseydh provided, you can also consider memoization for when you call the same method multiple times within a single request. However, the memoized value is not retained after the request is processed.
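A minimal sketch of memoization, assuming the total count is the expensive part. `ThingStats` and `expensive_count` are hypothetical names, and plain hashes stand in for Thing records.

```ruby
class ThingStats
  attr_reader :count_calls

  def initialize(things)
    @things = things
    @count_calls = 0
  end

  # Memoized: the expensive count runs once per ThingStats instance.
  def total
    @total ||= expensive_count
  end

  def ratio(thing)
    (thing[:count] + thing[:number]).to_f / total
  end

  private

  # Stand-in for a query that counts all things.
  def expensive_count
    @count_calls += 1
    @things.size
  end
end

things = [{ count: 1, number: 1 }, { count: 2, number: 2 }]
stats = ThingStats.new(things)
ratios = things.map { |t| stats.ratio(t) }
```

Note that `||=` recomputes when the stored value is nil or false, so it suits values like counts that are always truthy in Ruby.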
Yes it will be recalculated every time. If this is an expensive operation you should add a counter_cache (guide: http://railscasts.com/episodes/23-counter-cache-column) for the count and look into caching the query result using a service like memcache.
Many caching strategies exist, but for the database/Rails app itself Russian doll caching is considered the most flexible approach. If your data doesn't update often (meaning you don't need to worry about cache expiration often) you may be able to get away with page caching -- if so, count yourself lucky.
Some resources to get you started:
DHH on Russian Doll Caching: https://signalvnoise.com/posts/3113-how-key-based-cache-expiration-works
Railscast on cache keys: http://railscasts.com/episodes/387-cache-digests
Advanced caching guide: http://hawkins.io/2012/07/advanced_caching_revised/
Not free, but I found this series was what really let me understand various forms of caching correctly:
http://www.pluralsight.com/courses/rails-4-1-performance-fundamentals
I have a RoR application which contains an API to manage applications, each of which contain recipes (and groups, ingredients, measurements).
Once the user has finished managing the recipes, they download a JSON file of the entire application. Because each application could have hundreds of recipes, the files can be large. It also means there are a lot of DB calls to get all the required data to export.
Now because of this, the request to download the application can take upwards of 30 seconds, sometimes more.
My current code looks something like this:
application.categories.each do |c|
c.recipes.each do |r|
r.groups.each do |g|
g.ingredients.each do |i|
Within each loop I'm storing the data in a HASH and then giving it to the user.
My question is: where do I go from here?
Is there a way to grab all the data I require from the DB in one query? From looking at the log, I can see it is running hundreds of queries.
If the above solution is still slow, is this something I should put into a background process, and then email the user a link (or similar)?
There are of course ways to grab more data at once. This is done with Rails includes or joins, depending on your needs. See this article for some detailed information.
The basic idea is that you can join between your tables so that new queries aren't generated each time. When you do application.categories, that's one query. For each of those categories, you'll do another query: c.recipes. This creates N+1 queries, where N is the number of categories you have. Instead, you can include them from the get-go to create 1 or 2 queries (depending on what Rails does).
The basic syntax is easy:
Application.includes(:categories => :recipes).each do |application| ...
This generates 1 (or 2 - again, see article) query that grabs all applications, their categories, and each category's recipes all at once. You can tack on the groups and ingredients too.
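The difference in query counts can be illustrated with a simulation. `FakeDb` is a made-up class that merely counts calls. It is not ActiveRecord, and `recipes_for_all` only mimics the spirit of includes.

```ruby
# Fake data source that counts "queries"; a stand-in for the database.
class FakeDb
  attr_reader :queries

  RECIPES = { 1 => ["Soup", "Stew"], 2 => ["Cake"], 3 => ["Salad"] }.freeze

  def initialize
    @queries = 0
  end

  def category_ids
    @queries += 1
    RECIPES.keys
  end

  # One query per category: the N in N+1.
  def recipes_for(category_id)
    @queries += 1
    RECIPES.fetch(category_id)
  end

  # One query for all categories at once, similar in spirit to includes.
  def recipes_for_all(category_ids)
    @queries += 1
    RECIPES.slice(*category_ids)
  end
end

lazy = FakeDb.new
lazy.category_ids.each { |id| lazy.recipes_for(id) }

eager = FakeDb.new
ids = eager.category_ids
recipes_by_category = eager.recipes_for_all(ids)
```

With three categories the lazy version issues four queries and the eager version two. The gap grows with every extra category and every extra nesting level.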
As for putting the work in the background, my suggestion would be to just have a loading image, or get fancy by using a progress bar.
First of all I have to assume that the required has_many and belongs_to associations exist.
Generally you can do something like
c.recipes.includes(:groups)
or even
c.recipes.includes(:groups => :ingredients)
which will fetch recipes and groups (and ingredients) at once.
But since you have quite a big data set, IMO it would be better to limit that technique to the deepest levels.
The most useful approach would be to use find_each and includes together.
(find_each fetches the items in batches in order to keep memory usage low.)
perhaps something like
application.categories.each do |c|
c.recipes.find_each do |r|
r.groups.includes(:ingredients).each do |g|
g.ingredients.each do |i|
...
end
end
end
end
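To show what fetching in batches means, here is a rough plain-Ruby sketch. `each_in_batches` and `fetch_batch` are hypothetical. The lambda stands in for a query of the form "records with id greater than the last one seen, ordered by id, limited to the batch size", which is the general idea behind find_each.

```ruby
RECIPES = (1..10).map { |id| { id: id, name: "Recipe #{id}" } }

# Stand-in for "WHERE id > last_id ORDER BY id LIMIT batch_size".
fetch_batch = lambda do |last_id, batch_size|
  RECIPES.select { |r| r[:id] > last_id }.first(batch_size)
end

# Yields one batch at a time so only `batch_size` records are held at once.
def each_in_batches(fetch_batch, batch_size)
  last_id = 0
  loop do
    batch = fetch_batch.call(last_id, batch_size)
    break if batch.empty?
    yield batch
    last_id = batch.last[:id]
    break if batch.size < batch_size   # a short batch means we reached the end
  end
end

batch_sizes = []
seen = []
each_in_batches(fetch_batch, 4) do |batch|
  batch_sizes << batch.size
  seen.concat(batch.map { |r| r[:id] })
end
```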
Now even that can take quite a long time (for an http request) so you can consider using some async processing where the client will generate a request that is going to be processed by the server as a background job, and when that is ready, you can provide a download link (or send an email) to the client.
Resque is one possible solution for handling the async part.
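The request/worker split can be sketched in-process with Ruby's built-in Queue and a thread. This is only an illustration of the flow. Resque itself keeps jobs in Redis and runs workers as separate processes, and the `:stop` sentinel and file name below are invented for the sketch.

```ruby
jobs = Queue.new
results = {}

worker = Thread.new do
  # Queue#pop blocks until a job arrives; :stop is a sentinel this sketch invents.
  while (job = jobs.pop) != :stop
    # Stand-in for building the export and storing it somewhere downloadable.
    results[job[:id]] = "export-#{job[:application_id]}.json"
  end
end

# The "request" only enqueues the job and can return immediately.
jobs << { id: 1, application_id: 42 }
jobs << :stop
worker.join

download_link = results[1]
```

The web request stays short because the slow work happens in the worker. The client is told where to find the result once it is ready.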
I've taken the quote below, which I can see some sense in:
"Cached pages and fragments usually depend on model states. The cache doesn't care about which actions create, change or destroy the relevant model(s). So using a normal observer seems to be the best choice to me for expiring caches."
For example. I've got a resque worker that updates a model. I need a fragment cache to expire when a model is updated / created. This can't be done with a sweeper.
However, using an observer will mean I would need something like, either in the model or in the Resque job:
ActionController::Base.new.expire_fragment('foobar')
The model itself should not know about caching. That would break MVC principles and lead to ugly results down the road.
Use an ActiveRecord::Observer to watch for model changes. It can expire the cache.
You can auto-expire the cache by passing the model as an argument in your view template:
<% cache @model do %>
# your code here
<% end %>
What's happening behind the scenes is that a cache entry with a key of the form [model]/[id]-[updated_at] is created. Models have a method cache_key, which returns a string containing the model id and updated_at timestamp. When a model changes, its updated_at timestamp no longer matches the key of the cached fragment, so the fragment is regenerated.
This is a much nicer approach and you don't have to worry about background workers or expiring the cache in your controllers/observers.
Ryan Bates also has a paid Railscast on the topic: Fragment Caching
A good and simple solution would be not to expire but to cache it with a key that will be different if the content is different. Here is an example
<% cache ["post-#{@post.id}", @post.updated_at.to_i] do %>
When that post gets updated or deleted and you fetch it again, it will miss the cache since the key is different, so the old entry effectively expires and the new value gets cached. I think you can have some problems by doing this. For example, if you are using the Rails default cache, which creates HTML files as cache, you would end up with a lot of files in your public dir after some time. So you had better set your application to use something like memcached, which manages memory by deleting old cached records/pages/partials when it needs room to cache others.
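How a memory-bounded store lets stale keys fall away can be sketched with a tiny least-recently-used cache. `LruCache` is a simplified illustration, not how memcached is actually implemented.

```ruby
# Tiny least-recently-used cache; a simplification of a memory-bounded store.
class LruCache
  def initialize(max_size)
    @max_size = max_size
    @data = {}   # Ruby hashes keep insertion order, oldest first
  end

  def read(key)
    return nil unless @data.key?(key)
    value = @data.delete(key)   # re-insert to mark as most recently used
    @data[key] = value
  end

  def write(key, value)
    @data.delete(key)
    @data[key] = value
    @data.delete(@data.keys.first) if @data.size > @max_size
    value
  end

  def keys
    @data.keys
  end
end

cache = LruCache.new(2)
cache.write("post-1-100", "<p>old</p>")
cache.write("post-2-100", "<p>other</p>")
cache.read("post-2-100")
cache.write("post-1-200", "<p>new</p>")   # post 1 was updated, so its key changed
```

The entry under the old key for post 1 is never read again, so it is the first to be evicted once the cache is full.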
I'd recommend reviewing this section on sweepers in the Rails Guide - Caching with Rails: An overview
http://guides.rubyonrails.org/caching_with_rails.html#sweepers
It looks like this can be done without specifically creating lots of cache expiration observers.
Introduction
I have a (mostly) single-page application built with BackboneJS and a Rails backend.
Because most of the interaction happens on one page of the webapp, when the user first visits the page I basically have to pull a ton of information out of the database in one large deeply joined query.
This is causing me some rather extreme load times on this one page.
NewRelic appears to be telling me that most of my problems are because of 457 individual fast method calls.
Now I've done all the eager loading I can do (I checked with the Bullet gem) and I still have a problem.
These method calls are most likely occurring in my Rabl serializer which I use to serialize a bunch of JSON to embed into the page for initializing Backbone. You don't need to understand all this but suffice to say it could add up to 457 method calls.
object @search
attributes :id, :name, :subscription_limit
# NOTE: Include a list of the members of this search.
child :searchers => :searchers do
attributes :id, :name, :gravatar_icon
end
# Each search has many concepts (there could be over 100 of them).
child :concepts do |search|
attributes :id, :title, :search_id, :created_at
# The person who suggested each concept.
child :suggester => :suggester do
attributes :id, :name, :gravatar_icon
end
# Each concept has many suggestions (approx. 4 each).
node :suggestions do |concept|
# Here I'm scoping suggestions to only ones which meet certain conditions.
partial "suggestions/show", object: concept.active_suggestions
end
# Add a boolean flag to tell if the concept is a favourite or not.
node :favourite_id do |concept|
# Another method call which occurs for each concept.
concept.favourite_id_for(current_user)
end
end
# Each search has subscriptions to certain services (approx. 4).
child :service_subscriptions do
# This contains a few attributes and 2 fairly innocuous method calls.
extends "service_subscriptions/show"
end
So it seems that I need to do something about this but I'm not sure what approach to take. Here is a list of potential ideas I have:
Performance Improvement Ideas
Dumb-Down the Interface
Maybe I can come up with ways to present information to the user which don't require the actual data to be present. I don't see why I should absolutely need to do this though, other single-page apps such as Trello have incredibly complicated interfaces.
Concept Pagination
If I paginate concepts it will reduce the amount of data being extracted from the database each time. It would produce an inferior user interface though.
Caching
At the moment, refreshing the page just extracts the entire search out of the DB again. Perhaps I can cache parts of the app to reduce on DB hits. This seems messy though because not much of the data I'm dealing with is static.
Multiple Requests
It is technically bad to serve the page without embedding the JSON into the page but perhaps the user will feel like things are happening faster if I load the page unpopulated and then fetch the data.
Indexes
I should make sure that I have indexes on all my foreign keys. I should also try to think about places where it would help to have indexes (such as favourites?) and add them.
Move Method Calls into DB
Perhaps I can cache some of the results of the iteration I do in my view layer into the DB and just pull them out instead of computing them. Or I could sync things on write rather than on read.
Question
Does anyone have any suggestions as to what I should be spending my time on?
This is a hard question to answer without being able to see the actual user interface, but I would focus on loading exactly only as much data as is required to display the initial interface. For example, if the user has to drill down to see some of the data you're presenting, then you can load that data on demand, rather than loading it as part of the initial payload. You mention that a search can have as many as 100 "concepts," maybe you don't need to fetch all of those concepts initially?
Bottom line, it doesn't sound like your issue is really on the client side -- it sounds like your server-side code is slowing things down, so I'd explore what you can do to fetch less data, or to defer the complex queries until they are definitely required.
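As a rough sketch of loading only what the initial interface needs: `initial_payload` is a hypothetical helper and the hashes stand in for the search and its concepts. The `more` flag tells the client that further concepts can be fetched on demand.

```ruby
# Hypothetical helper: sends the first `limit` concepts plus enough
# metadata for the client to request the rest later.
def initial_payload(search, limit:)
  concepts = search[:concepts]
  {
    id: search[:id],
    concepts: concepts.first(limit),
    total_concepts: concepts.size,
    more: concepts.size > limit
  }
end

search = { id: 1, concepts: (1..100).map { |n| { id: n, title: "Concept #{n}" } } }
payload = initial_payload(search, limit: 20)
```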
I'd recommend separating your JS code-base into modules that are dynamically loaded using an asset loader like RequireJS. This way you won't have so many XHRs firing at load time.
When a specific module is needed it can load and initialize at an appropriate time instead of every page load.
If you complicate your code a little, each module should be able to start and stop. So, if you have any polling occurring or complex code executing you can stop the module to increase performance and decrease the network load.