I have a page which renders a lot of partials.
I fragment cache them all, which makes it very fast. Hooray!
The problem is that because there are so many partials, the first run (while the cache is being written) takes so long that the request times out. Subsequent requests are really fast.
I also use Sidekiq (but the question is relevant to any background processor).
Is there a way to render and cache those partials in a background process, so that users who miss the cache (because it expired) don't hit a timeout? The idea would be to go over all the partials and re-cache those whose cache has expired (or is about to expire).
The only option I know of is the preheat gem, but I don't think it is sophisticated enough for my needs. Plus, it hasn't been maintained for ages.
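Below is a minimal sketch of the refresh-before-expiry idea. It is written in plain Ruby so it runs without Rails. It assumes you keep your own registry of cache keys together with the code that renders them, because a generic cache store is not assumed to let you enumerate its entries and their expiry times. `WarmableFragmentStore`, `fetch`, and `refresh_expiring` are hypothetical names, not Rails or Sidekiq APIs. The injectable clock only exists to make the timing testable.

```ruby
# Hypothetical in-memory fragment store that remembers how to re-render each
# key, so a background job can refresh entries before users see a miss.
class WarmableFragmentStore
  Entry = Struct.new(:value, :expires_at, :ttl, :renderer)

  def initialize(clock: -> { Time.now })
    @clock = clock
    @entries = {}
  end

  # Return the cached value; on a miss or after expiry, render it, store it,
  # and register the renderer for later background refreshes.
  def fetch(key, ttl:, &renderer)
    entry = @entries[key]
    return entry.value if entry && entry.expires_at > @clock.call

    store(key, ttl, renderer)
  end

  # Intended to run from a periodic background job: re-render every entry that
  # has expired or will expire within `window` seconds. Returns refreshed keys.
  def refresh_expiring(window:)
    deadline = @clock.call + window
    @entries.keys.select do |key|
      entry = @entries[key]
      next false if entry.expires_at > deadline

      store(key, entry.ttl, entry.renderer)
      true
    end
  end

  private

  def store(key, ttl, renderer)
    value = renderer.call
    @entries[key] = Entry.new(value, @clock.call + ttl, ttl, renderer)
    value
  end
end
```

In a real app the registry would have to live somewhere shared (the cache itself or a database). A periodic background job would then call `refresh_expiring`, so entries are re-rendered before a user request ever sees a miss.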
I was working on a project and had a similar problem. Actually, it affected only one page, and it happened when the page loaded right after the cache was cleared. I solved it in a different way. I didn't have anything like Sidekiq, so it may not be the right solution for you, but maybe it will be helpful.
What I did was call open() (from open-uri) right after clearing the cache, passing the problematic URL as the parameter, like:
require 'open-uri'
URI.open('http://my-website/some-url') # plain open(url) worked before Ruby 3.0
So after the cache was cleared, that URL was requested and the cache was rebuilt automatically. That solved the problem quickly for us. I know that a background worker would be a better solution, but this worked for me.
Note that our cache was cleared by cron, not manually.
UPDATE
Or, if you want to clear the cache manually, you could call open('http://my-website/some-url') after clearing it, but from a Sidekiq job (I haven't tried this; it's only an idea).
Of course, my problem was with only one page. If you need this for the whole website, things get more complicated.
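The warm-up approach from this answer can be sketched as a small class that requests each URL once, so the application renders and caches it. `CacheWarmer` and its `fetcher:` parameter are hypothetical. The fetcher is injectable only so the logic can be tried without a network; by default it uses `Net::HTTP.get_response`. A failure on one URL is recorded instead of aborting the rest.

```ruby
require 'net/http'
require 'uri'

# Hypothetical cache warmer: request each URL once so the app re-renders and
# re-caches it after the cache has been cleared.
class CacheWarmer
  DEFAULT_FETCHER = ->(uri) { Net::HTTP.get_response(uri).code.to_i }

  def initialize(urls, fetcher: DEFAULT_FETCHER)
    @urls = urls
    @fetcher = fetcher
  end

  # Returns { url => HTTP status code, or "error: <ExceptionClass>" }.
  def warm!
    @urls.each_with_object({}) do |url, results|
      results[url] =
        begin
          @fetcher.call(URI(url))
        rescue StandardError => e
          "error: #{e.class}"
        end
    end
  end
end
```

Following the UPDATE, you could call `CacheWarmer.new(urls).warm!` from the `perform` method of a Sidekiq worker that is enqueued right after the cache is cleared. That wiring is not shown or tested here.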
Related
Caching is something that I kind of ignored for a long time, as the projects I worked on were on local intranets with very little activity. I'm working on a much larger Rails 3 personal project now, and I'm trying to work out what I should cache and when.
How do people generally determine this?
If I know a site is going to be relatively low-activity, should I just cache every single page?
If I have a page that calls several partials, is it better to use fragment caching in those partials, or page caching for the whole page?
The Ruby on Rails guides did a fine job of explaining how caching in Rails 3 works, but I'm having trouble understanding the decision-making process associated with it.
Don't ever cache for the sake of it; cache because there's a need (with the exception of something like the homepage, which you know is going to be super popular). Launch the site, and either parse your logs or use something like New Relic to see what's slow. From there, you can work out what's worth caching.
Generally, though, if something takes 500ms to complete, you should cache it. If it takes over 1 second, you're probably doing too much in the request and should farm the work out to a background process (for example, fetching a Twitter feed or manipulating images).
EDIT: See apneadiving's answer too; he links to some great screencasts (albeit based on Rails 2, but the theory is the same).
You'll want to think about caching several kinds of things:
Requests that are hit a lot, and seldom change
Requests that are "expensive" to render (lots of database calls, etc.). Hopefully these also seldom change.
The other side of caching that shouldn't go unmentioned is expiration. It's also often the harder part. You have to know when a cache entry is no longer valid and clear it out so that fresh content will be generated. Sweepers or Observers, depending on how you implement your cache, can help you with this. You could also expire based purely on time: give each cache entry a max-age and clear it after that, no matter what.
As for fragment vs. full-page caching, think of it in terms of how often the parts are updated. If 3 partials of a page are never updated and one is, maybe you want to cache those 3 and let the fourth be fetched live so you have up-to-the-second accuracy. Or the different partials of a page may need different caching rules: maybe a "timeline" section is cached with a cache age of 1 minute, while the "friends" partial is cached for 12 hours.
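To illustrate per-partial rules, here is a plain-Ruby sketch that gives each fragment its own max-age. It uses the 1-minute timeline and 12-hour friends values from the answer. `FragmentCache` and `FRAGMENT_TTLS` are hypothetical. In Rails you would more likely pass `expires_in:` to `Rails.cache.fetch`, for example `Rails.cache.fetch("timeline", expires_in: 1.minute) { ... }`.

```ruby
# Hypothetical per-fragment max-ages, in seconds.
FRAGMENT_TTLS = { timeline: 60, friends: 12 * 60 * 60 }.freeze

class FragmentCache
  def initialize(ttls, clock: -> { Time.now })
    @ttls = ttls
    @clock = clock
    @store = {}
  end

  # Serve the cached fragment while it is younger than its own max-age,
  # otherwise re-render it with the given block.
  def fetch(name)
    ttl = @ttls.fetch(name) # unknown fragment names raise KeyError
    cached = @store[name]
    return cached[:value] if cached && @clock.call - cached[:written_at] < ttl

    value = yield
    @store[name] = { value: value, written_at: @clock.call }
    value
  end
end
```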
Hope this helps!
If the site is relatively low-activity, you shouldn't cache any pages. You cache because of performance problems, and performance problems come about because you have too much data to query, too many users, or worse, both at the same time.
Before you even think about caching, the first thing to do is look through your application for the requests that take up the most time. Not the slowest requests, but the requests your application spends the most aggregate time performing. That is, if you have a request A that runs 10 times at 1500ms and a request B that runs 5000 times at 250ms, you work on optimizing B first: A accounts for 15 seconds in total, B for 1,250 seconds.
It's actually pretty easy to grep through your production.log and extract rendering times and URLs to combine them into a simple report. You can even do that in real-time if you want.
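Here is a rough sketch of that report. It assumes the Rails 3 log format, where each request logs a `Started GET "/path" ...` line followed by a `Completed 200 OK in 250ms ...` line. It also assumes a single process, so requests don't interleave. It groups by verb and path (dropping the query string) and sorts by aggregate time, which is the ordering this answer recommends. `time_report` is a hypothetical helper name.

```ruby
# Assumed Rails 3 log line shapes.
STARTED   = /\AStarted (?<verb>[A-Z]+) "(?<path>[^"]+)"/
COMPLETED = /\ACompleted \d+ .*? in (?<ms>\d+(?:\.\d+)?)ms/

# Sum request time per "VERB /path"; largest aggregate time first.
def time_report(lines)
  totals = Hash.new { |h, k| h[k] = { count: 0, total_ms: 0.0 } }
  current = nil
  lines.each do |raw|
    line = raw.strip
    if (m = STARTED.match(line))
      current = "#{m[:verb]} #{m[:path].split('?').first}"
    elsif current && (m = COMPLETED.match(line))
      totals[current][:count] += 1
      totals[current][:total_ms] += m[:ms].to_f
      current = nil
    end
  end
  totals.sort_by { |_, stats| -stats[:total_ms] }
end
```

For example, `time_report(File.foreach("log/production.log")).first(10)` would list the ten URLs with the largest total time.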
Once you've identified a problematic request, you go about picking apart what it's doing to service the request. The first thing is to look for any queries that can be combined by using eager loading or by looking ahead a bit more to anticipate what you'll need. The next thing is to ensure you're not loading data that isn't used.
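To make the eager-loading point concrete, this sketch counts queries against a hypothetical in-memory `FakeDB` (not ActiveRecord). Loading posts user by user issues one query per user (the "N+1" pattern). Loading them for all users at once issues a single query and groups the rows in memory. In ActiveRecord, `includes` (for example `User.includes(:posts)`) is the usual way to get the second behavior.

```ruby
# Hypothetical stand-in for a posts table that counts issued queries.
class FakeDB
  attr_reader :query_count

  def initialize(posts)
    @posts = posts # array of { user_id:, title: }
    @query_count = 0
  end

  # Roughly: SELECT * FROM posts WHERE user_id IN (...)
  def posts_for(user_ids)
    @query_count += 1
    @posts.select { |p| user_ids.include?(p[:user_id]) }
  end
end

# N+1 style: one query per user.
def titles_naive(db, user_ids)
  user_ids.to_h { |id| [id, db.posts_for([id]).map { |p| p[:title] }] }
end

# Eager style: one query for all users, grouped in memory.
def titles_eager(db, user_ids)
  by_user = db.posts_for(user_ids).group_by { |p| p[:user_id] }
  user_ids.to_h { |id| [id, (by_user[id] || []).map { |p| p[:title] }] }
end
```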
So many times you'll see code to list users and it's loading 50KB per person of biographical data, their Facebook and Twitter handles, literally everything about them, and all you use is their name.
Fetch as little as you need, and fetch it in the most efficient way you can. Use connection.select_rows when you don't need models.
The next step is to look at what kind of queries you're running, and how they're under-performing. Ensure your indexes are all set properly and are being used. Check that you're not doing complicated JOIN operations that could be resolved by a bit of tactical de-normalization.
Have a look at what data you are storing in your application, and try and find things that can be removed from your production database and warehoused somewhere else. Cycle your data out regularly when it's no longer relevant, preserve it in a separate database if you need to.
Then go over and have a look at how your database server is tuned. Does it have sufficiently large buffers? Is it on hardware that could be upgraded with more memory at a nominal cost? Too many people are running a completely un-tuned database server and with a few simple settings they can get ten-fold performance increases.
If, and only if, you still have a performance problem at this point then you might want to consider caching.
You know why you don't cache first? It's because once you cache something, that cached data is immediately stale. If parts of your application use this data under the assumption it's always up to date, you will have problems. If you don't expire this cache when the data does change, you will have problems. If you cache the data and never use it again, you're just clogging up your cache and you will have problems. Basically you'll have lots of problems when you use caching, so it's often a last resort.
I have certain pieces of data that are updated infrequently - say, once every few weeks or so. Additionally, they're shared amongst users, and can take some time to render as JSON. This makes them ideal to cache. So, I cache their rendered output via caches_action.
Soon, I'll be updating this data from a Resque job, and after updating it successfully, I will need to invalidate the cache. I'm not sure where to do this, as it seems like the job of the controller. It doesn't seem right to put it in the model, as it's more of a presentation-layer concern. (After all, why should the model care that JSON output takes forever?)
I don't think a sweeper would work here, as it operates within a controller, correct? I've seen people suggest instantiating the controller in question within the job, but that really isn't nice either. Has anyone dealt with this in a DRY-ish way? The only way I see to do it is to manipulate Rails.cache manually.
Would an observer work for you? You can set up observers to watch for changes to records and then do something after they change, but mainly this just reduces clutter in the model. You'd have to use Rails.cache in either place; an observer just moves that code out of the model.
Check out:
http://www.daokaous.com/rails3.0.0_doc/classes/ActiveModel/Observer.html
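Here is a plain-Ruby sketch of what this answer suggests. The model only announces that it changed, and a separate observer decides which cache entries to drop. `Dataset`, `DatasetCacheObserver`, and the `"dataset/<id>.json"` key are hypothetical. The real key used by `caches_action` depends on the request and isn't reproduced here. The Hash stands in for `Rails.cache`, which also responds to `delete`. Note that `ActiveRecord::Observer` was extracted from Rails core into the `rails-observers` gem in Rails 4.

```ruby
# Hypothetical model: it notifies observers after an update and knows
# nothing about caching.
class Dataset
  attr_reader :id, :payload

  def initialize(id, payload)
    @id = id
    @payload = payload
    @observers = []
  end

  def add_observer(observer)
    @observers << observer
  end

  # e.g. called from a Resque job once the new data has been fetched
  def update_payload!(new_payload)
    @payload = new_payload
    @observers.each { |o| o.after_update(self) }
  end
end

# Hypothetical observer that owns the cache-invalidation concern.
class DatasetCacheObserver
  def initialize(cache)
    @cache = cache # anything responding to #delete(key)
  end

  def after_update(dataset)
    @cache.delete("dataset/#{dataset.id}.json")
  end
end
```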
P.S. I was doing a random search and came across detrusion.com.
What is this web application firewall?
How does it work?
Is there any performance hit? If so, how much?
Should I use detrusion.com, or is there something better available?
Anybody?
I quickly glanced at the code, and it doesn't appear to be doing all that much. Basically, it maintains a whitelist and a blacklist of IPs. While it probably isn't a big performance hit, you'd probably be better off doing this kind of request analysis in a Rack middleware, that is, before the request even reaches Rails' request handling.
That being said, I don't like the fact that it re-syncs every 5 minutes DURING the processing of a given request. That is, it blocks the current request while it re-syncs its ruleset and lists. This means you're at the mercy of the Detrusion.com team to keep their site/API up. When they go down, you go down.
While it's not as real-time, I'd feel more comfortable having the update process run out of band. Maybe you store the rules/lists in a flat file or a local DB (Redis would be perfect), which you load on app start. Then a frequent cron job reloads the ruleset from Detrusion and writes it locally.
Something like that. Just anything to de-couple your request handling from a Detrusion API check.
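Here is a sketch of the decoupled design described above. It is written as a Rack-style middleware (an object whose `call(env)` returns `[status, headers, body]`), so it can run without Rails. It only consults an in-memory set loaded from a local JSON file, and it assumes a separate process, such as a cron job, rewrites that file from Detrusion's data. `IpBlocklist` and the file format are hypothetical, and this is not Detrusion's own code. How `reload!` gets triggered (a timer thread, a signal, or a restart) is left open.

```ruby
require 'json'
require 'set'

# Hypothetical middleware: blocks IPs listed in a local JSON array file.
# No remote API is called while a request is being handled.
class IpBlocklist
  def initialize(app, path)
    @app = app
    @path = path
    reload!
  end

  # Re-read the local file. Meant to be triggered out of band, not per request.
  # The whole set is replaced at once rather than mutated in place.
  def reload!
    ips = File.exist?(@path) ? JSON.parse(File.read(@path)) : []
    @blocked = Set.new(ips)
  end

  def call(env)
    if @blocked.include?(env['REMOTE_ADDR'])
      [403, { 'content-type' => 'text/plain' }, ['Forbidden']]
    else
      @app.call(env)
    end
  end
end
```

In a Rails app it could be registered with something like `config.middleware.use IpBlocklist, "config/ip_blocklist.json"` (the path is a placeholder).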
I have been neglecting learning about caching for quite some time now, and although I've used caching here and there in the past it's not something I'm familiar with.
I found a great tutorial about what caching is and what kinds of cache there are (I know already what caching is), but...
How does one decide what and when to cache? Are there things that should always be cached? On what situations should you never use caching?
The first rule is: don't cache until you need it; doing so would be premature optimization (first link I found; google for more info).
The biggest problem with caching is cache invalidation. What happens when the data you have cached gets updated? You need to make sure your cache is updated as well, and if this isn't done correctly, it often becomes a mess.
I would:
Build the application without caching and make sure the functionality works as intended
Do some performance testing, and apply caching when needed
After applying caching, do performance testing again to check that you are getting the expected speed increase
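The second and third steps amount to measuring the same work with and without the cache. Below is a minimal sketch using Ruby's `Benchmark.realtime`. `slow_render` and `compare_timings` are hypothetical; `slow_render` sleeps instead of querying a database, and a memoizing Hash plays the role of the cache.

```ruby
require 'benchmark'

# Hypothetical stand-in for an expensive render.
def slow_render(id)
  sleep 0.01 # pretend this is several database queries plus templating
  "<li>item #{id}</li>"
end

# Time the same workload without and with a simple memoizing cache.
def compare_timings(ids)
  cache = {}
  uncached = Benchmark.realtime { ids.each { |id| slow_render(id) } }
  cached = Benchmark.realtime { ids.each { |id| cache[id] ||= slow_render(id) } }
  { uncached: uncached, cached: cached }
end

timings = compare_timings([1, 2, 3] * 10) # 30 renders, only 3 distinct items
puts format("uncached: %.3fs  cached: %.3fs", timings[:uncached], timings[:cached])
```

If the cached timing is not clearly lower on your real workload, the cache is adding invalidation complexity without paying for it.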
I think the easiest way is to ask yourself a bunch of questions:
Is this result ever going to change?
No? Then cache it permanently.
Yes? When is it going to change? For example, when a user updates something.
Is it going to impact only the particular user who changed the value, or all of the users? This should give you an indication of when to clear that particular cache.
You can keep going, but after a while you will end up with different profiles:
UserCache and GlobalCache are just two examples.
These profiles should tell you what to cache and have certain update criteria (when to refresh the cache).
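One way to sketch the UserCache/GlobalCache split is to namespace keys by scope, so each scope can be expired on its own. `ProfiledCache` and the key layout are hypothetical. Deleting by prefix works for this in-memory Hash. Stores like memcached can't list their keys, so with them you would typically embed a version number in the key and bump it instead.

```ruby
# Hypothetical cache with two "profiles": per-user entries and global entries.
class ProfiledCache
  def initialize
    @store = {}
  end

  def user_fetch(user_id, name)
    fetch("user/#{user_id}/#{name}") { yield }
  end

  def global_fetch(name)
    fetch("global/#{name}") { yield }
  end

  # A user changed something that only affects their own views.
  def expire_user(user_id)
    prefix = "user/#{user_id}/"
    @store.delete_if { |key, _| key.start_with?(prefix) }
  end

  # Something changed that every user sees.
  def expire_global
    @store.delete_if { |key, _| key.start_with?("global/") }
  end

  private

  def fetch(key)
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end
end
```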
I seem to be having some extremely odd cache_money interactions.
When I am on the console, and I create a new instance of a class and save it I see the cache misses and cache stores on my memcached console output. Then when the create finishes I see a bunch of cache deletions.
If I then try to do any kind of find for the newly created object (or any other objects for that matter) I never see any cache access.
This is highly confusing. I could kind of understand if all finds never hit the cache (though that in and of itself would be an issue requiring investigation), but finds do seem to hit the cache when the object is being created (checking for associations and such).
Anyone have this experience in the past at all? Any thoughts?
AFAIK there isn't really much in the way of configuration options for cache_money, and it certainly doesn't seem like there are any that would be on by default and be creating these kinds of symptoms.
My cache_money config is basically straight out of the docs.
Any help would be greatly appreciated.
Okay, it looks like this was a problem on my side. I had some failing tests and thought they were due to a line of code in cache_money. I changed that line in the cache_money code, made a few other changes, and my problem was fixed.
It seems, though, that my fix to cache_money actually broke things. I just installed a pristine copy of cache_money and all is well with the world.
If this is in your tests, make sure you are either mocking memcached or flushing memcache in your test setup/before filters. CREAM!