Why is there a need to set an expiry time for caching? - ruby-on-rails

I don't see this issue explained in the Rails caching guide (http://guides.rubyonrails.org/caching_with_rails.html), so I wonder if I might ask how caching works exactly in this example. On my user profile page, I cache the languages the user speaks and set an expiry of 15 minutes. When I did this, I assumed that if the user updated his languages before those 15 minutes were up, the updated languages wouldn't show, because the cache hadn't expired. However, when I test this in my app, the updated languages show immediately, so I assume that updating breaks the cache. If that's the case, why wouldn't I set the expiry to 1 hour or infinity?
@languages = Rails.cache.fetch("lang", :expires_in => 15.minutes) do
  Language.where({:user_id => @user.id})
end
Note, I'm using Rails 4 with memcached if that's important.
Update: if the expiry time is just about clearing the cache due to size limitations, how long should I set the expiry for?
I have a lot of information (about 15 queries similar to the ones below) on my profile pages that I'd prefer to cache in case a user keeps refreshing the page, so I was just going to do this:
@endorsements = Rails.cache.fetch("endorsements", :expires_in => 15.minutes) do
  Endorsement.where({:subject_id => @user.id})
end
@languages = Rails.cache.fetch("lang", :expires_in => 15.minutes) do
  Language.where({:user_id => @user.id})
end

Here's what you need to do in Rails 4 to get caching to work (in development) as you'd expect:
1. Add 'dalli' to your Gemfile.
2. Add config.cache_store = :mem_cache_store to your config/environments/development.rb.
3. Add config.action_controller.perform_caching = true to your config/environments/development.rb.
(I know you already have #3 done.)
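For reference, steps 2 and 3 together would look like this in the environment file (a sketch; YourApp stands in for your application's module name):
# config/environments/development.rb
YourApp::Application.configure do
  # Step 2: back Rails.cache with memcached through the dalli gem
  config.cache_store = :mem_cache_store
  # Step 3: actually perform caching in development
  config.action_controller.perform_caching = true
end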
Once this is complete, you won't see the "SELECT *" queries in your logs anymore, and when you update your models, the cache will not be updated automatically.
UPDATE:
As @FrederickCheung says, you need to cache objects, not relations (queries). The best way is to call to_a on them.
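Applied to the fetch from the question, that looks like the following (the per-user cache key is my addition; the original used a single "lang" key shared by all users):
@languages = Rails.cache.fetch("lang/#{@user.id}", :expires_in => 15.minutes) do
  # to_a runs the query, so the records themselves get cached
  Language.where({:user_id => @user.id}).to_a
end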

You're not actually caching anything: you are just caching ActiveRecord::Relation objects (a relation is pretty much just a Ruby description of a query), rather than the query results themselves.
Each time the code runs, the relation is pulled from the cache in its unexecuted state and the query is run again. To achieve what you wanted, you need to force the query to be executed, for example:
@endorsements = Rails.cache.fetch("endorsements", :expires_in => 15.minutes) do
  # to_a forces the query to run and caches the resulting records
  # (in Rails 4, .all on a relation just returns the relation again)
  Endorsement.where({:subject_id => @user.id}).to_a
end
Cache expiry can be tricky - it's sometimes easier just to have cached items expire automatically rather than to ensure that every single way of changing the data clears the cache. In some cases you may not even know when the data changes, for example if you are caching the results of an external API call.
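If you do want explicit invalidation instead, a callback-based sketch could look like this (the callback and the per-user key are my own illustration, not from the answer):
class Language < ActiveRecord::Base
  belongs_to :user

  # Drop the cached list whenever a language row changes
  after_commit :bust_language_cache

  private

  def bust_language_cache
    Rails.cache.delete("lang/#{user_id}")
  end
end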

Related

Proper Rails low-level caching with concurrency

I want to run my Rails 5 app on Puma. I use low-level caching, and I assume this is the way to get thread-safe caching:
# somewhere in a model ...
@@mutex = Mutex.new

def nice_stuff
  Rails.cache.fetch("a_key") do
    @@mutex.synchronize do
      Rails.cache.fetch("a_key", expires_in: 60.seconds) do
        Model.stuff.to_a
      end
    end
  end
end
Will this work fine?
The proper way to handle concurrent cache access is already built-in.
val_1 = Rails.cache.fetch('foo', expires_in: 1.minute, race_condition_ttl: 10.seconds) do
  # race_condition_ttl only takes effect on entries that can expire,
  # hence the explicit expires_in
  Model.stuff.to_a
end
Setting :race_condition_ttl is very useful in situations where a cache entry is used very frequently and is under heavy load. When an entry expires under heavy load, several different processes may try to regenerate the data and then all try to write to the cache. To avoid that, the first process to find an expired cache entry bumps the cache expiration time by the value set in :race_condition_ttl. Yes, this extends the life of a stale value by another few seconds, so other processes continue to use slightly stale data for just a bit longer. In the meantime the first process goes ahead and writes the new value into the cache, after which all processes start getting the new value. The key is to keep :race_condition_ttl small.
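To see the grace period in action outside a full app, here is a self-contained sketch of my own (not from the answer above); race_condition_ttl is implemented in the generic ActiveSupport::Cache::Store#fetch, so a MemoryStore works for the demonstration:
require "active_support/all"

cache = ActiveSupport::Cache::MemoryStore.new
opts  = { expires_in: 1.second, race_condition_ttl: 5.seconds }

cache.fetch("stats", opts) { "v1" }  # prime the cache
sleep 1.1                            # let the entry go stale

writer = Thread.new do
  # First fetch to see the stale entry: bumps the expiry, then recomputes
  cache.fetch("stats", opts) { sleep 0.5; "v2" }
end
sleep 0.1
# Runs while the recompute is in flight: served the stale "v1"
puts cache.fetch("stats", opts) { "v2" }
writer.join
puts cache.fetch("stats", opts) { "v3" }  # new value is in place: "v2"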

How does Rails 4 Russian doll caching prevent stampedes?

I am looking to find information on how the caching mechanism in Rails 4 prevents against multiple users trying to regenerate cache keys at once, aka a cache stampede: http://en.wikipedia.org/wiki/Cache_stampede
I haven't been able to find much information by Googling. Looking at other systems (such as Drupal), cache stampede prevention is implemented via a semaphore table in the database.
Rails does not have a built-in mechanism to prevent cache stampedes.
According to the README for atomic_mem_cache_store (a replacement for ActiveSupport::Cache::MemCacheStore that mitigates cache stampedes):
Rails (and any framework relying on active support cache store) does
not offer any built-in solution to this problem
Unfortunately, I'm guessing that this gem won't solve your problem either. It supports fragment caching, but it only works with time-based expiration.
Read more about it here:
https://github.com/nel/atomic_mem_cache_store
Update and possible solution:
I thought about this a bit more and came up with what seems to me to be a plausible solution. I haven't verified that this works, and there are probably better ways to do it, but I was trying to think of the smallest change that would mitigate the majority of the problem.
I assume you're doing something like cache model do in your templates as described by DHH (http://37signals.com/svn/posts/3113-how-key-based-cache-expiration-works). The problem is that when the model's updated_at column changes, the cache_key likewise changes, and all your servers try to re-create the template at the same time. In order to prevent the servers from stampeding, you would need to retain the old cache_key for a brief time.
You might be able to do this by (dum da dum) caching the cache_key of the object with a short expiration (say, 1 second) and a race_condition_ttl.
You could create a module like this and include it in your models:
module StampedeAvoider
  def cache_key
    orig_cache_key = super
    # Cache the cache key itself for 1 second; race_condition_ttl keeps all
    # but one server on the old key while the new one is being written
    Rails.cache.fetch("/cache-keys/#{self.class.table_name}/#{self.id}", expires_in: 1, race_condition_ttl: 2) { orig_cache_key }
  end
end
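Wiring it in would look like this (hypothetical model name; include works because super in the module resolves to ActiveRecord's cache_key):
class Post < ActiveRecord::Base
  include StampedeAvoider
end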
Let's review what would happen. There are a bunch of servers calling cache model. If your model includes StampedeAvoider, then its cache_key will now be fetching /cache-keys/models/1, and returning something like /models/1-111 (where 111 is the timestamp), which cache will use to fetch the compiled template fragment.
When you update the model, model.cache_key will begin returning /models/1-222 (assuming 222 is the new timestamp), but for the first second after that, cache will keep seeing /models/1-111, since that is what is returned by cache_key. Once 1 second passes, all of the servers will get a cache-miss on /cache-keys/models/1 and will try to regenerate it. If they all recreated it immediately, it would defeat the point of overriding cache_key. But because we set race_condition_ttl to 2, all of the servers except for the first will be delayed for 2 seconds, during which time they will continue to fetch the old cached template based on the old cache key. Once the 2 seconds have passed, fetch will begin returning the new cache key (which will have been updated by the first thread which tried to read/update /cache-keys/models/1) and they will get a cache hit, returning the template compiled by that first thread.
Ta-da! Stampede averted.
Note that if you did this, you would be doing twice as many cache reads, but depending on how common stampedes are, it could be worth it.
I haven't tested this. If you try it, please let me know how it goes :)
The :race_condition_ttl setting in ActiveSupport::Cache::Store#fetch should help avoid this problem. As the documentation says:
Setting :race_condition_ttl is very useful in situations where a cache entry is used very frequently and is under heavy load. If a cache expires and due to heavy load several different processes will try to read data natively and then they all will try to write to cache. To avoid that case the first process to find an expired cache entry will bump the cache expiration time by the value set in :race_condition_ttl. Yes, this process is extending the time for a stale value by another few seconds. Because of extended life of the previous cache, other processes will continue to use slightly stale data for just a bit longer. In the meantime that first process will go ahead and will write into cache the new value. After that all the processes will start getting the new value. The key is to keep :race_condition_ttl small.
Great question. A partial answer that applies to single multi-threaded Rails servers but not multiprocess(or) environments (thanks to Nick Urban for drawing this distinction) is that the ActionView template compilation code blocks on a per-template mutex. See line 230 in template.rb. Notice there is a check for completed compilation both before grabbing the lock and after.
The effect is to serialize attempts to compile the same template, where only the first will actually do the compilation and the rest will get the already completed result.
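The pattern in question is double-checked locking; a minimal standalone sketch of it (illustrative names, not ActionView's actual code):
class Template
  def initialize
    @compile_mutex = Mutex.new
    @compiled = false
  end

  def compile!
    return if @compiled          # fast path: already compiled, no locking
    @compile_mutex.synchronize do
      return if @compiled        # re-check: another thread may have won
      do_compile                 # only the first thread does the work
      @compiled = true
    end
  end

  private

  def do_compile
    # expensive template compilation would happen here
  end
end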
Very interesting question. I searched on Google (you get more results if you search for "dog pile" instead of "stampede"), but like you, I did not find any answers, except this one blog post: protecting from dogpile using memcache.
Basically it stores your fragment under two keys: key:timestamp (where timestamp would be updated_at for ActiveRecord objects) and key:last.
def custom_write_dogpile(key, timestamp, fragment, options)
  # Store the fragment under both the timestamped key and a :last fallback
  Rails.cache.write(key + ':' + timestamp.to_s, fragment)
  Rails.cache.write(key + ':last', fragment)
  # A fresh write means no refresh is in progress any more
  Rails.cache.delete(key + ':refresh-thread')
  fragment
end
Now, when reading from the cache and the timestamped fragment does not exist, it tries to fetch the key:last fragment instead:
def custom_read_dogpile(key, timestamp, options)
  result = Rails.cache.read(key + ':' + timestamp.to_s)
  if result.blank?
    # The first reader to increment the counter becomes the refresh thread
    Rails.cache.write(key + ':refresh-thread', 0, raw: true, unless_exist: true, expires_in: 5.seconds)
    if Rails.cache.increment(key + ':refresh-thread') == 1
      # The cache didn't exist; return nil so this thread regenerates it
      result = nil
    else
      # Fetch the last cache, as the new one has not been created yet
      result = Rails.cache.read(key + ':last')
    end
  end
  result
end
This is a simplified summary of the blog post by Moshe Bergman that I linked to above.
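A hypothetical wrapper showing how the two helpers fit together (render_expensive_fragment stands in for whatever builds the fragment):
def fetch_fragment_dogpile(key, timestamp, options = {})
  cached = custom_read_dogpile(key, timestamp, options)
  return cached if cached
  # nil means this thread won the :refresh-thread race and must regenerate
  custom_write_dogpile(key, timestamp, render_expensive_fragment, options)
end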
There is no protection against memcache stampedes. This is a real problem when multiple machines are involved and multiple processes on those multiple machines. -Ouch-.
The problem is compounded when one of the key processes has "died" leaving any "locking" ... locked.
In order to prevent stampedes you have to re-compute the data before it expires. So, if your data is valid for 10 minutes, you need to regenerate it at the 5th minute and re-set it with a new expiration of 10 more minutes. That way you don't wait until the data expires to set it again.
You should also not allow your data to expire at the 10-minute mark, but re-compute it every 5 minutes, so it never expires. :)
You can use wget & cron to periodically call the code.
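For example, the refresh-ahead idea could be wired up with a rake task instead of wget (task and class names here are hypothetical):
# lib/tasks/cache_refresh.rake
namespace :cache do
  desc "Recompute report data before the old entry expires"
  task :refresh_report => :environment do
    data = SlowReport.generate # your expensive computation
    # 10-minute TTL, rewritten every 5 minutes by cron, so readers
    # never hit an expired entry
    Rails.cache.write("slow_report", data, :expires_in => 10.minutes)
  end
end
A matching crontab entry would be something like: */5 * * * * cd /path/to/app && bin/rake cache:refresh_report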
I recommend using Redis, which will allow you to save the data and reload it in the event of a crash.
-daniel
A reasonable strategy would be to:
use a :race_condition_ttl of at least the expected time it takes to refresh the resource. Setting it to less time than a refresh takes is not advisable, as the angry mob will end up trying to refresh it anyway, resulting in a stampede.
use an :expires_in time calculated as the maximum acceptable staleness minus the :race_condition_ttl, so that the resource is refreshed by a single worker while the stampede is avoided.
Using the above strategy ensures that you don't exceed your expiry/staleness deadline and also avoid a stampede. It works because only one worker gets through to refresh the value, whilst the angry mob is held off with the cached value for the race_condition_ttl extension time, right up to the originally intended expiry time.
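A sketch of that arithmetic (the numbers and names are illustrative, not from the answer above):
MAX_STALENESS  = 10.minutes   # hard deadline on how stale the data may get
REFRESH_BUDGET = 30.seconds   # at least the worst-case time a refresh takes

stats = Rails.cache.fetch("dashboard_stats",
                          :expires_in => MAX_STALENESS - REFRESH_BUDGET,
                          :race_condition_ttl => REFRESH_BUDGET) do
  Stats.expensive_rollup.to_a # hypothetical slow query being cached
end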

Cache with expiring keys

I'm working on a mashup site and would like to limit the number of fetches used to scrape the source sites. There is essentially one bit of data I need, an integer, and I would like to cache it with a defined expiration period.
To clarify, I only want to cache the integer, not the entire page source.
Is there a Ruby or Rails feature or gem that already accomplishes this for me?
Yes, there is: ActiveSupport::Cache::Store.
An abstract cache store class. There are multiple cache store
implementations, each having its own additional features. See the
classes under the ActiveSupport::Cache module, e.g.
ActiveSupport::Cache::MemCacheStore. MemCacheStore is currently the
most popular cache store for large production websites.
Some implementations may not support all methods beyond the basic
cache methods of fetch, write, read, exist?, and delete.
ActiveSupport::Cache::Store can store any serializable Ruby object.
http://api.rubyonrails.org/classes/ActiveSupport/Cache/Store.html
cache = ActiveSupport::Cache::MemoryStore.new
cache.read('Chicago') # => nil
cache.write('Chicago', 2707000)
cache.read('Chicago') # => 2707000
Regarding the expiration time, this can be done by passing the time as an initialization parameter:
cache = ActiveSupport::Cache::MemoryStore.new(expires_in: 5.minutes)
If you want to cache a value with a different expiration time, you can also set this when writing the value to the cache:
cache.write(key, value, expires_in: 1.minute) # Set a lower value for one entry
See Caching with Rails, particularly the :expires_in option to ActiveSupport::Cache::Store.
For example, you might go:
value = Rails.cache.fetch('key', expires_in: 1.hour) do
  expensive_operation_to_compute_value()
end

What's the most optimal way to regenerate static cache with Ruby on Rails?

I have a pretty slow controller action, which does some reporting here and there. I only need to refresh the data every few days, so it was a no-brainer to statically cache the result.
The problem is that the action takes a solid few minutes to complete, and I am not sure what the most sensible way is to expire the old data and replace it with the new.
The problem with a plain expire/request cycle is that for a few minutes (while the action is running) the data is unavailable.
Is there any reasonable way to overcome this gap using just the static cache mechanisms in Rails? Or should I just rebuild the whole thing in a different way?
Rails has a built-in way to serve a stale cache for just a bit longer while the new cache value is being regenerated: the :race_condition_ttl setting used in conjunction with :expires_in, as described in the Rails Guides on Caching.
With Rails fragment caching, the syntax would be:
<% cache 'my_awesome_cache_key', :expires_in => 12.hours.to_i, :race_condition_ttl => 12.hours.to_i do %>
  <%# This block will be cached %>
<% end %>
Results:
1) First request: the cache is empty, so the block is executed and the result written to the cache.
2) All requests within the next 12 hours are served straight from the (fresh) cache.
3a) The first request after the 12 hours but within the grace period triggers regeneration: that request executes the block and writes the result to the cache as soon as it's finished, while concurrent requests arriving in the meantime are served the slightly stale cached value.
3b) The first request after both the 12 hours and the grace period have passed is handled like 1): the block is executed, the cache is written, and the request is served.
race_condition_ttl was introduced to prevent multiple processes from regenerating a cache simultaneously, which on a heavily requested resource would result in all processes reading the data from the db at once, but I think it works well for your situation too.
How to choose the values for :expires_in and :race_condition_ttl is then up to you; I'd suggest calculating them like this: expires_in + race_condition_ttl = the expires_in you would usually set. This way the cache is regenerated more often but is also fresher, especially if the action/view is not rendered that often.
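For example, a cache that you would usually let live for 12 hours could be split like this (the numbers are illustrative):
<% cache 'my_awesome_cache_key', :expires_in => 11.hours.to_i, :race_condition_ttl => 1.hour.to_i do %>
  <%# fresh for 11 hours, then a 1-hour grace window for regeneration,
      adding up to the usual 12 hours %>
<% end %>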

Rails cache expire

I have a Rails application in which I am using the simple Rails cache. My test is as follows:
Rails.cache.write('temp',Date.today,:expires_in => 60.seconds)
I can read it through Rails.cache.read('temp') and Rails.cache.fetch('temp').
The problem is that it doesn't expire. It is still alive after 60 seconds. Can anyone tell me what is missing here?
FYI: I have declared the following in my development.rb:
config.action_controller.perform_caching = true
config.cache_store = :memory_store
Is there anything I missed? I want my cache to expire.
After some searching, I have found one possible reason why the cache is not cleared after 60 seconds.
You call Rails.cache.write, which is documented in the Rails API.
It calls write_entry(namespaced_key(name, options), entry, options), where your option :expires_in is one part of the options argument.
The implementation of write_entry has the following condition:
if expires_in > 0 && !options[:raw]
  # Set the memcache expire a few minutes in the future to support race condition ttls on read
  expires_in += 5.minutes
end
So 5 minutes are added to your 60 seconds. Two possible solutions:
1. Just live with it :-)
2. Try including the option :raw => true; perhaps this will skip the condition, so that your expiry works as expected.
The :expires_in option only works with compatible stores (e.g. memcached), not the memory store.
From http://guides.rubyonrails.org/caching_with_rails.html:
Finally, if you are using memcached or Ehcache, you can also pass
:expires_in. In fact, all parameters not used by caches_action are
sent to the underlying cache store.
You should use the fetch method with a block instead of Rails.cache.write. Even though the name is "fetch", it writes the block's result to the cache on a miss. You also do not need to clear the cache entry manually; just use fetch with expires_in and race_condition_ttl. Keep the ttl value under the expiry time and as small as possible.
output_json = Rails.cache.fetch('temp', expires_in: 1.minute, race_condition_ttl: 3) do
  my_expensive_operation().to_json
end
