Page caching using delayed job - ruby-on-rails

Hey all, if you've ever posted on craigslist, this question should make sense to you. Whenever you post a listing (to sell furniture or an apartment, for example), your listing is not immediately thrown up on the site. Rather, listings appear in batches (numbers vary) about every 10-15 minutes. At first I was really over-thinking this behavior, trying to hold records and then do mass inserts, but I realized it was much simpler. After talking with some colleagues, it made sense that Craigslist is caching their pages and then emptying that cache every 10-15 minutes. This severely decreases the load on their database.
Now, to my question: how do I accomplish the same thing in Rails? I know how to implement caching - I've read the caching with Rails guide. I plan to use action caching and fragment caching, because I still need to run validations and access controls and therefore can't cache the whole page.

To accomplish timed page caching you can use standard Rails caching plus a little timed cleverness.
First you want to determine your level of caching. You've got three options:
Page Caching - caches the whole page, and subsequent requests don't go through the Rails stack. So if this is a Craigslist-esque page that will be hit thousands of times a second, the request will only hit your webserver (e.g. Apache), not Rails or your db, making it much faster. The trade-off is that you lose authentication, session variables, etc. that Rails provides.
Action Caching - caches the whole page but brings the request into Rails so that it can execute any filters associated with that action.
Fragment Caching - caches a segment of the page, essentially bypassing the need to execute the code within the block (and any resulting calls to the DB).
Then you'll need to pick the appropriate level of caching and implement it in your app (see the caching guide for implementation examples).
Once you have implemented the caching you now have to figure out a way to expire the cache. I can think of two ways to do this, both come with benefits and drawbacks. For now let's assume you've chosen to use action caching.
Reliable but more involved - create an action within your controller that expires the cache, and a cron job that makes a request to that action. I've asked a similar question that addresses this kind of 'built-in' scheduled task. As a security precaution, you may want to include a generated hash or something similar so that someone can't manually expire your cache by going to '/products/expire_cache'.
class ProductsController < ApplicationController
  caches_action :index

  def index
    # implementation
  end

  def expire_cache
    if params[:verification_hash] == 'sa89sf8sfsfehiwaf89yfea98fh'
      expire_action :action => :index
    end
  end
end
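Rather than hard-coding the hash, one option (a sketch of my own, not part of the answer above) is to derive it from a server-side secret with an HMAC; `EXPIRE_SECRET` below is an assumed configuration value, not something Rails provides:

```ruby
require "openssl"

# Hypothetical helper: derive the verification hash from a server-side
# secret so the expire URL can't be guessed. EXPIRE_SECRET is an
# assumed configuration value.
EXPIRE_SECRET = "replace-with-a-long-random-secret"

def expire_verification_hash(action)
  OpenSSL::HMAC.hexdigest("SHA256", EXPIRE_SECRET, "expire:#{action}")
end

# In expire_cache you would then compare:
#   params[:verification_hash] == expire_verification_hash("index")
```

The cron job computes the same HMAC from the shared secret when it builds the request URL.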
Unreliable but easier - simply expire the cache in your action with an arbitrary conditional. This implementation assumes there will be enough traffic to ensure that someone regularly hits your site on the 0, 15, 30, and 45 minute marks. You could decrease this interval to make a cache reset more probable.
class ProductsController < ApplicationController
  caches_action :index

  def index
    # implementation
    expire_action :action => :index if Time.now.min % 15 == 0
  end
end

I think there are two approaches to what you want to accomplish:
Simply cache with fragments and actions so that on the first hit of the page the database is accessed and the page loads normally, but every subsequent hit is from the cached version. The major upside of this approach is that you don't need to deal with delayed jobs and rendering your pages outside of the regular flow of things.
Create a delayed job that renders your page or the individual fragments that get cached. During the rendering in the delayed job the page fragments will actually get cached as if a user were viewing them (provided you have implemented fragment and action caching normally). Once the delayed job is done, populate a column in your database that indicates that this record/page is ready for viewing.
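Option 2 might be sketched like this, assuming delayed_job is installed; `Listing`, `render_cached_page`, and the `ready_for_viewing` column are all hypothetical names standing in for your own:

```ruby
# Sketch of a cache-warming job: render the page (populating the
# fragment/action caches as a side effect), then flip a flag column so
# the record is shown as ready. Names are hypothetical.
class WarmListingCacheJob < Struct.new(:listing_id)
  def perform
    listing = Listing.find(listing_id)
    render_cached_page(listing)  # renders the templates, filling the cache
    listing.update_attribute(:ready_for_viewing, true)
  end
end

# Enqueued after the record is created:
#   Delayed::Job.enqueue(WarmListingCacheJob.new(listing.id))
```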

Probably the easiest way to do caching in Rails is to use Memcached with the :expires_in option.
You will probably need a VPS to use it, which could be expensive for smaller sites.
But you can do good time-based caching even without memcached. This little SimpleCache snippet has worked wonders for my shared-hosted sites.
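The idea behind such a snippet can be sketched in plain Ruby (this is an illustration of the approach, not the actual SimpleCache code):

```ruby
# A tiny in-process, time-based cache: a value is recomputed via the
# block only after its TTL has elapsed. Sketch only, not SimpleCache.
class SimpleTimedCache
  def initialize
    @store = {}
  end

  # Returns the cached value for +key+, recomputing it via the block
  # once +ttl+ seconds have passed since the last computation.
  def fetch(key, ttl)
    entry = @store[key]
    if entry.nil? || Time.now - entry[:at] > ttl
      entry = { :value => yield, :at => Time.now }
      @store[key] = entry
    end
    entry[:value]
  end
end
```

Unlike memcached, this cache lives inside a single Ruby process, which is exactly why it works on cheap shared hosting.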

Rails expire fragment cache from models

I am working with caching in my Rails project and want to expire the cache for a particular URL. I found the following command to expire the fragment corresponding to the URL passed:
ActionController::Base.new.expire_fragment("localhost:3000/users/55-testing-devise/boards/")
I am confused about where to put this code in my Rails project so that it executes as soon as a URL is entered in a text field and the expire button is clicked.
You should probably consider a different approach. Models should not be concerned with how caching works, and traditionally the whole sweeper approach tends to become complex, unwieldy and out of sync with the rest of the code.
Basically, you should never have to expire fragments manually. Instead, you change your cache key/url once your model is updated (so that you have a new cache entry for the new version).
Common wisdom nowadays is to use the Russian Doll Caching approach. The link goes to an article that explains the basics, and the upcoming Rails 4 will contain even better support.
This is probably the best way to go for the majority of standard Rails applications.
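The reason no manual expiry is needed is that the cache key embeds the record's `updated_at`, so saving the record makes old entries unreachable. A simplified imitation of what ActiveRecord's `cache_key` produces (a sketch, not the real implementation):

```ruby
require "time"

# Simplified imitation of ActiveRecord#cache_key: the timestamp is part
# of the key, so updating the record yields a brand-new key and the old
# cached fragment is simply never read again.
def cache_key_for(model_name, id, updated_at)
  "#{model_name}/#{id}-#{updated_at.utc.strftime("%Y%m%d%H%M%S")}"
end
```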
ActionController::Caching::Sweeper is a nice way to do this; it's part of Rails' observer mechanism.
http://api.rubyonrails.org/classes/ActionController/Caching/Sweeping.html
class MyModelSweeper < ActionController::Caching::Sweeper
  observe MyModel

  def after_save(object)
    expire_fragment('...')
  end
end
That expire_fragment call won't work, though, as you don't have the digest added to the key. See DHH here: https://github.com/rails/cache_digests/issues/35
I've posted a related answer about caching a json response: https://stackoverflow.com/a/23783119/252799

Caching large numbers of ActiveRecord objects

There's an oft-called method in my Rails application that retrieves ~200 items from the database. Rather than do this again and again, I store the results using Rails.cache.write. However, when I retrieve the results using Rails.cache.read, it's still very slow: about 400ms. Is there any way to speed this up?
This is happening in a controller action, and I'd prefer users not have to wait so long to load the page.
FYI regarding Rails caching, from the Rails Guides, "...It’s important to note that query caches are created at the start of an action and destroyed at the end of that action and thus persist only for the duration of the action."
If you can share the method, I may be able to help more quickly. Otherwise, a couple performance best practices:
Use .includes to avoid N+1 queries. Define this in the model and call it in the controller.
How are your indexes set up (if any)?

Keep value in memory across requests and across users in Rails controller? Use class variable?

We're on Rails 3.0.6.
We maintain a list of numbers that changes only once a month, but nearly every page request requires access to this list.
We store the list in the database.
Instead of hitting the database on every request and grabbing the list, we would like to grab the data once and stash it in memory for efficient access.
If we store the list in each user session, we still need to hit the database for each session.
Is there a way to only hit the database once and let the values persist in memory across all users and all sessions? We need access to the list from the controller. Should we define a class variable in the controller?
Thanks!
I think Rails.cache is the answer to your problem here. It's a simple interface with multiple backends, the default stores the cache in memory, but if you're already using Memcached, Redis or similar in your app you can plug it into those instead.
Try throwing something similar to this in your ApplicationController
def list_of_numbers
  @list_of_numbers ||= Rails.cache.fetch(:list_of_numbers, :expires_in => 24.hours) do
    # Read from database
  end
end
It will try to read from the cache, but if it doesn't find the value, it will do the expensive work and store the result for next time.
The pattern you're looking for is a form of memoization - a simple way to cache stuff that doesn't change over time. For example, you'll often see something like this in application_controller.rb; your code always calls the method:
def current_user(user_id)
  @current_user ||= User.find(user_id)
end
When called, it checks the instance variable @current_user and returns it if not nil; otherwise it does the database lookup, assigns the result to the instance variable, and returns that.
Your problem is similar, but broader, since it applies to all instances.
One solution is with a class variable, which is documented here http://www.ruby-doc.org/docs/ProgrammingRuby/html/tut_classes.html#S3 -- a similar solution to the one above applies here.
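A sketch of what the class-variable version might look like (names are hypothetical; `load_from_database` stands in for the real query):

```ruby
# Per-process cache of the monthly number list: the DB is hit once per
# Ruby process, and every subsequent request reads the class variable.
class NumberList
  @@numbers = nil

  def self.all
    @@numbers ||= load_from_database
  end

  def self.load_from_database
    # Placeholder for the real ActiveRecord query.
    [4, 8, 15, 16, 23, 42]
  end
end
```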
This might be a good solution in your case, but it has some issues. Specifically, (assuming this is a web app) depending on your configuration, you may have multiple instances of Rails loaded in different processes, and class variables only apply to their specific instance. The popular Passenger module (for Apache and Nginx) can be configured to allow class variables to be accessible to all of its instances... which works great if you have only one server.
But when you have multiple servers, things get a little tricky. Sure, you could use a class variable and accept that you'll have to make one hit to the database for each server. This works great except for when the variable... varies! You'll need some way of invalidating the variable across all servers. Depending on how critical it is, this could create gnarly and difficult-to-track-down errors (I learned the hard way :-).
Enter memcached. This is a wonderful tool that is a general purpose caching tool. It's very lightweight, and very, very smart. In particular, it can create distributed caches across a cluster of servers -- the value is only ever stored once (thus avoiding the synchronization problem noted above) and each server knows which server to look on to find any given cache key. It even handles when servers go down and all sorts of other unpleasantries.
Setup is remarkably easy, Rails almost assumes you'll use it for your various caching needs, and the Rails integration makes it as simple as pie.
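Pointing Rails at a memcached cluster is essentially a one-line config change (the server addresses below are placeholders for your own):

```ruby
# config/environments/production.rb -- point Rails.cache at memcached.
# Uses the built-in :mem_cache_store; hostnames are placeholders.
config.cache_store = :mem_cache_store, "cache-1.example.com:11211",
                                       "cache-2.example.com:11211"
```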
On the assumption that there will be other opportunities to cache stuff that might not be as simple as a value you can store in a class variable, that's probably the first place to start.

Best Practices for Optimizing Dynamic Page Load Times (JSON-generated HTML)

I have a Rails app where I load up a base HTML layout and I fill in the main content with rows of divs from JSON. This works in 2 steps:
Render the HTML
Ajax call to get the JSON
This has the benefit of being able to cache the HTML layout which doesn't change much, but it seems to have more drawbacks:
2 HTTP requests
The HTML isn't that complex; the generated HTML is where all the work is done, so I'm probably not saving much time.
Each request in my specific case requires that we check the current user, their roles, and some things related to that user, so those 2 calls are somewhat involved.
Granted, memcached will probably solve a lot of this, I am wondering if there are some best practices here. I'm thinking I could do this:
Render the first page of JSON inline, in a script block, along with the HTML. This would cut out the second server call requiring user authentication. And, assuming that 80% of the time you don't need to make a further ajax call (for pagination/sorting in this case), that seems like a fairly good solution.
What are your thoughts on how to approach this?
There are advantages and disadvantages to doing stuff like this. In general I'd say it's only a good idea if whatever you're delaying via an ajax call would delay the page load enough to annoy the end user for most of the use cases on your page.
A good example of this is browsing a repository on github. 90% of the time all you want is to navigate the files, so they use an ajax load to fill in the commit messages per file after the page load.
It sounds like you're trying to do this to speed things up or do something fancy for your users, but I think you should instead consider which part is slow, and what page-load speed (and perhaps which information on the page) your users are expecting. As you say, using memcached or fragment caching might well give you the improvements you're looking for.
Are you using some kind of monitoring tool? I'm using the free version of New Relic RPM on Heroku. It gives a lot of data on request times for individual controller actions. Data like that could help you focus your optimization process.

Is Rails Metal (& Rack) a good way to implement a high traffic web service api?

I am working on a very typical web application. The main component of the user experience is a widget that a site owner would install on their front page. Every time their front page loads, the widget talks to our server and displays some of the data that returns.
So there are two components to this web application:
the front end UI that the site owner uses to configure their widget
the back end component that responds to the widget's web api call
Previously we had all of this running in PHP. Now we are experimenting with Rails, which is fantastic for #1 (the front-end UI). The question is how to do #2, serving widget information, efficiently. Obviously this gets much higher load than the front end, since it is called every time the front page loads on one of our clients' websites.
I can see two obvious approaches:
A. Parallel Stack: Set up a parallel stack that uses something other than rails (e.g. our old PHP-based approach) but accesses the same database as the front end
B. Rails Metal: Use Rails Metal/Rack to bypass the Rails routing mechanism, but keep the api call responder within the Rails app
My main question:
Is Rails/Metal a reasonable approach for something like this?
But also...
Will the overhead of loading the Rails environment still be too heavy?
Is there a way to get even closer to the metal with Rails, bypassing most of the environment?
Will Rails/Metal performance approach the perf of a similar task on straight PHP (just looking for ballpark here)?
And...
Is there a 'C' option that would be much better than both A and B? That is, something before going to the lengths of C code compiled to binary and installed as an nginx or apache module?
Thanks in advance for any insights.
Not really the most elaborate answer but:
I would not use Metal for this; I would use page caching instead. That way, requests are served by the webserver with no dynamic language involved at all. When you create a resource, expire the corresponding index page. A very basic example:
class PostsController < ApplicationController
  caches_page :index

  def index
    @posts = Post.all
    respond_to do |format|
      format.html
      format.xml
    end
  end

  def create
    @post = Post.new(params[:post])
    respond_to do |format|
      if @post.save
        expire_page :action => :index
        format.html { redirect_to posts_path }
        format.xml
      else
        format.html { render :action => "new" }
      end
    end
  end
end
For more information read the Caching Guide.
I would only start pulling functionality down to Rack/Metal if I had determined the exact cause of the performance issue being encountered. Especially in recent versions of Rails (3, in particular) and Ruby, the stack itself is very rarely the bottleneck. Start measuring, get some real metrics, and optimise judiciously.
My rule of thumb: if you don't have metrics, you can't reason intelligently about your performance issue and any possible solution.
The issues in my experience are nearly always: the views and the database.
As Ryan suggests, caching can be incredibly effective... you can even move your architecture to use a reverse proxy in front of your Rails stack for even more capability. A cache like Varnish provides incredibly high performance. Rails has built-in support for etags and HTTP headers to facilitate a reverse-proxy solution.
The other thing to do is look at the db layer itself. The cache can go a long way here, but some optimisation may be useful too. Making sure you use Active Record's :include sensibly is a great step toward avoiding N+1 query situations, and there is fantastic support in Rails for dropping memcached into the stack with little or no configuration, which can provide excellent performance gains.
PHP loads the entire environment on each request. In production mode, Rails loads the entire environment once when the server starts up. There is certainly a fair amount of Ruby code being executed during normal controller-action invocations. However, in production mode, none of this code is related to loading the environment. And using Rails Metal instead of the usual Rails controller stack removes a number of those layers, yielding a few additional milliseconds of time saved per request.
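For a sense of what "closer to the metal" means, a bare Rack endpoint can answer the widget call before the rest of the stack is involved. A sketch with a hypothetical path and payload:

```ruby
# Minimal Rack app answering the widget API directly. The PATH_INFO
# check and the JSON payload are hypothetical stand-ins; a real Metal
# class would return an X-Cascade header to pass unmatched requests on
# to the rest of the Rails app.
class WidgetApi
  def self.call(env)
    if env["PATH_INFO"] == "/widget_data"
      [200, { "Content-Type" => "application/json" }, ['{"items":[]}']]
    else
      [404, { "Content-Type" => "text/html" }, ["Not Found"]]
    end
  end
end
```

Because the endpoint is just a `call` method returning a status/headers/body triple, it runs with none of the controller machinery on the request path.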