Cache with expiring keys - ruby-on-rails

I'm working on a mashup site and would like to limit the number of fetches to scrape the source sites. There is essentially one bit of data I need, an integer, and would like to cache it with a defined expiration period.
To clarify, I only want to cache the integer, not the entire page source.
Is there a ruby or rails feature or gem that already accomplishes this for me?

Yes, there is ActiveSupport::Cache::Store
An abstract cache store class. There are multiple cache store
implementations, each having its own additional features. See the
classes under the ActiveSupport::Cache module, e.g.
ActiveSupport::Cache::MemCacheStore. MemCacheStore is currently the
most popular cache store for large production websites.
Some implementations may not support all methods beyond the basic
cache methods of fetch, write, read, exist?, and delete.
ActiveSupport::Cache::Store can store any serializable Ruby object.
http://api.rubyonrails.org/classes/ActiveSupport/Cache/Store.html
cache = ActiveSupport::Cache::MemoryStore.new
cache.read('Chicago') # => nil
cache.write('Chicago', 2707000)
cache.read('Chicago') # => 2707000
Regarding the expiration time, this can be done by passing the time as a initialization parameter
cache = ActiveSupport::Cache::MemoryStore.new(expires_in: 5.minutes)
If you want to cache a value with a different expiration time, you can also set this when writing a value to the cache
cache.write(key, value, expires_in: 1.minute) # Set a lower value for one entry

See Caching with Rails, particularly the :expires_in option to ActiveSupport::Cache::Store.
For example, you might go:
value = Rails.cache.fetch('key', expires_in: 1.hour) do
expensive_operation_to_compute_value()
end

Related

Correct way to expire a fragment cache

In the view for Reports#index, I cache a fragment like this:
- cache("#{#country}-#{#category}-#{#benchmark}-#{#status}") do
Note that those four variables are strings, not model objects, so they will not expire automatically on touch.
To expire every copy of this cache, do I have to call expire_fragment for every string that could be generated by cocacenating possible values for #country, #category, #benchmark and #status? Or will expire_fragment reports_path do the trick?
Sorry the API docs aren't crystal clear on this and it's not the type of thing that's easily, robustly tested.
You would indeed need to expire each fragment from individually - there is no way in general to expire a range of keys (I think memorystore supports expiry by regexp but I don't think anyone uses that in production).
You may be better off choosing your cache keys so that they change when the underlying data has changed.

How does Rails 4 Russian doll caching prevent stampedes?

I am looking to find information on how the caching mechanism in Rails 4 prevents against multiple users trying to regenerate cache keys at once, aka a cache stampede: http://en.wikipedia.org/wiki/Cache_stampede
I've not been able to find out much information via Googling. If I look at other systems (such as Drupal) cache stampede prevention is implemented via a semaphores table in the database.
Rails does not have a built-in mechanism to prevent cache stampedes.
According to the README for atomic_mem_cache_store (a replacement for ActiveSupport::Cache::MemCacheStore that mitigates cache stampedes):
Rails (and any framework relying on active support cache store) does
not offer any built-in solution to this problem
Unfortunately, I'm guessing that this gem won't solve your problem either. It supports fragment caching, but it only works with time-based expiration.
Read more about it here:
https://github.com/nel/atomic_mem_cache_store
Update and possible solution:
I thought about this a bit more and came up with what seems to me to be a plausible solution. I haven't verified that this works, and there are probably better ways to do it, but I was trying to think of the smallest change that would mitigate the majority of the problem.
I assume you're doing something like cache model do in your templates as described by DHH (http://37signals.com/svn/posts/3113-how-key-based-cache-expiration-works). The problem is that when the model's updated_at column changes, the cache_key likewise changes, and all your servers try to re-create the template at the same time. In order to prevent the servers from stampeding, you would need to retain the old cache_key for a brief time.
You might be able to do this by (dum da dum) caching the cache_key of the object with a short expiration (say, 1 second) and a race_condition_ttl.
You could create a module like this and include it in your models:
module StampedeAvoider
def cache_key
orig_cache_key = super
Rails.cache.fetch("/cache-keys/#{self.class.table_name}/#{self.id}", expires_in: 1, race_condition_ttl: 2) { orig_cache_key }
end
end
Let's review what would happen. There are a bunch of servers calling cache model. If your model includes StampedeAvoider, then its cache_key will now be fetching /cache-keys/models/1, and returning something like /models/1-111 (where 111 is the timestamp), which cache will use to fetch the compiled template fragment.
When you update the model, model.cache_key will begin returning /models/1-222 (assuming 222 is the new timestamp), but for the first second after that, cache will keep seeing /models/1-111, since that is what is returned by cache_key. Once 1 second passes, all of the servers will get a cache-miss on /cache-keys/models/1 and will try to regenerate it. If they all recreated it immediately, it would defeat the point of overriding cache_key. But because we set race_condition_ttl to 2, all of the servers except for the first will be delayed for 2 seconds, during which time they will continue to fetch the old cached template based on the old cache key. Once the 2 seconds have passed, fetch will begin returning the new cache key (which will have been updated by the first thread which tried to read/update /cache-keys/models/1) and they will get a cache hit, returning the template compiled by that first thread.
Ta-da! Stampede averted.
Note that if you did this, you would be doing twice as many cache reads, but depending on how common stampedes are, it could be worth it.
I haven't tested this. If you try it, please let me know how it goes :)
The :race_condition_ttl setting in ActiveSupport::Cache::Store#fetch should help avoid this problem. As the documentation says:
Setting :race_condition_ttl is very useful in situations where a cache entry is used very frequently and is under heavy load. If a cache expires and due to heavy load seven different processes will try to read data natively and then they all will try to write to cache. To avoid that case the first process to find an expired cache entry will bump the cache expiration time by the value set in :race_condition_ttl. Yes, this process is extending the time for a stale value by another few seconds. Because of extended life of the previous cache, other processes will continue to use slightly stale data for a just a bit longer. In the meantime that first process will go ahead and will write into cache the new value. After that all the processes will start getting new value. The key is to keep :race_condition_ttl small.
Great question. A partial answer that applies to single multi-threaded Rails servers but not multiprocess(or) environments (thanks to Nick Urban for drawing this distinction) is that the ActionView template compilation code blocks on a mutex that is per template. See line 230 in template.rb here. Notice there is a check for completed compilation both before grabbing the lock and after.
The effect is to serialize attempts to compile the same template, where only the first will actually do the compilation and the rest will get the already completed result.
Very interesting question. I searched on google (you get more results if you search for "dog pile" instead of "stampede") but like you, did I not get any answers, except this one blog post: protecting from dogpile using memcache.
Basically does it store you fragment in two keys: key:timestamp (where timestamp would be updated_at for active record objects) and key:last.
def custom_write_dogpile(key, timestamp, fragment, options)
Rails.cache.write(key + ':' + timestamp.to_s, fragment)
Rails.cache.write(key + ':last', fragment)
Rails.cache.delete(key + ':refresh-thread')
fragment
end
Now when reading from the cache, and trying to fetch a non existing cache, will it instead try to fecth the key:last fragment instead:
def custom_read_dogpile(key, timestamp, options)
result = Rails.cache.read(timestamp_key(name, timestamp))
if result.blank?
Rails.cache.write(name + ':refresh-thread', 0, raw: true, unless_exist: true, expires_in: 5.seconds)
if Rails.cache.increment(name + ':refresh-thread') == 1
# The cache didn't exists
result = nil
else
# Fetch the last cache, as the new one has not been created yet
result = Rails.cache.read(name + ':last')
end
end
result
end
This is a simplified summary of the by Moshe Bergman that i linked to before, or you can find here.
There is no protection against memcache stampedes. This is a real problem when multiple machines are involved and multiple processes on those multiple machines. -Ouch-.
The problem is compounded when one of the key processes has "died" leaving any "locking" ... locked.
In order to prevent stampedes you have to re-compute the data before it expires. So, if your data is valid for 10 minutes, you need to regenerate again at the 5th minute and re-set the data with a new expiration for 10 more minutes. Thus you don't wait until the data expires to set it again.
Should also not allow your data to expire at the 10 minute mark, but re-compute it every 5 minutes, and it should never expire. :)
You can use wget & cron to periodically call the code.
I recommend using redis, which will allow you to save the data and reload it in the advent of a crash.
-daniel
A reasonable strategy would be to:
use a :race_condition_ttl with at least the expected time it takes to refresh the resource. Setting it to less time than expected to perform a refresh is not advisable as the angry mob will end up trying to refresh it, resulting in a stampede.
use an :expires_in time calculated as the maximum acceptable expiry time minus the :race_condition_ttl to allow for refreshing the resource by a single worker and avoiding a stampede.
Using the above strategy will ensure that you don't exceed your expiry/staleness deadline and also avoid a stampede. It works because only one worker gets through to refresh, whilst the angry mob are held off using the cache value with the race_condition_ttl extension time right up to the originally intended expiry time.

How to use two different memcache instances on heroku

I am storing database query results in memcache on heroku. I am using memcachier addon on heroku. For example if I have a cache User's tasks in memcache. I do something like this:
def cached_tasks
Rails.cache.fetch([:users, id, :tasks], :expires_in => 12.hours) { tasks.to_a }
end
This works perfectly fine but I want to use two different memcache instances to store data.
Why?
One that is used very frequently, basically data changes frequently and another for those which are big data objects and those will never change or very rarely.
How can I use two different instances and specify that cached_tasks should be stored in memcache_instance_1 and other like cached_images should be stored in memcache_instance_2
Why not to use the same one:
Because sometimes I need to flush the whole cache and that will flush the big data too which I don't want to.
Any suggestions?
Declare three new env variables for your heroku app
BIG_MEMCACHIER_SERVERS
BIG_MEMCACHIER_USERNAME
BIG_MEMCACHIER_PASSWORD
Add a file called big_cache.rb in the config\initializers directory:
module Rails
def self.big_cache
#big_cache ||= ActiveSupport::Cache::DalliStore.new(
(ENV["BIG_MEMCACHIER_SERVERS"] || "").split(","),
:username => ENV["BIG_MEMCACHIER_USERNAME"],
:password => ENV["BIG_MEMCACHIER_PASSWORD"])
end
end
Now you can access the 2nd cache as follows:
Rails.big_cache.fetch(..)
What i would do is to try and define what from the the things i am caching is actually something that requires cache, or it actually needs some life-limited persistency.
I would leave the Rails internal caching system to do what it does best: page, action and fragment caching, and use another persistency engine (such as Redis) to hold and maintain those large objects you talked about (implementation of this suggestion is a completely different question).
note that Redis also allows setting a TTL (time to live) on keys - so it can provide this life limited persistency that you want.

What is the default expiry time for Rails cache?

I've done some googling and couldn't find the answer to this question. Rails allows to specify expiry times for its cache like that:
Rails.cache.fetch("my_var", :expires_in => 10.seconds)
But what happens if I specify nothing:
Rails.cache.fetch("my_var")
It never expires? Is there a default value? How can I explicitly define something that never expires?
It really depends on which cache storage you're using. Rails provides several, one of them most popular is Memcached. One of key features of Memcached is that it automatically expires old unused records, so you can forget about :expire option.
Other Rails cache storages, like memory storage or redis storage will keep will not expire date unless you explicitly specify when to do that.
More about how cache key expiration works in Rails.
Using Dalli for memcached (who doesn't), the default expiry time is never, as #Rahul says. You don't have to worry about garbage collection, as #icem says, memcached throw out the old unused records.
See the official dalli documentation:
Expires_in default is 0, which means never
https://github.com/mperham/dalli#configuration
you can set the global expiry time for dalli
config.cache_store = :dalli_store, { expires_in: 1.day}
and for better individual control:
Rails.cache.write "some_cache_key", some_cachable_string, expires_in: 3.hours
the new doc http://apidock.com/rails/ActiveSupport/Cache/Store/write doesn't say much, but the old does:
http://apidock.com/rails/ActiveSupport/Cache/MemCacheStore/write
manually expire a cache (if some event occurred):
Rails.cache.delete "some_cache_key"
They never expires. (for FileStore based cache, which is default in Rails)
If they key is found in the cache store, the value would be used. Thus it is always recommended to add atleast any expiry time.

What is the right way to clear cache in Rails without sweepers

Observers and Sweepers are removed from Rails 4. Cool.
But what is the way to cache and clear cache then ?
I read about russian doll caching. It nice and all but it only concerns the view rendering cache. It doesn't prevent the database from being hit.
For instance:
<% cache #product do %>
Some HTML code here
<% end %>
You still need to get #product from the db to get its cache_key. So page or action caching can still be useful to prevent unnecessary load.
I could use some timeout to clear the cache sometimes but what for if the records didn't change ?
At least with sweepers you have control on that aspect. What is/will be the right way to do cache and to clear it ?
Thanks ! :)
Welcome to one of the two hard problems in computer science, cache invalidation :)
You would have to handle that manually since the logic for when a cached object, unlike a cached view which can be simply derived from the objects it displays, should be invalidated is application and situation dependent.
You goto method for this is the Rails.cache.fetch method. Rails.cache.fetch takes 3 arguments; the cache key, an options hash, and a block. It first tries to read a valid cache record based on the key; if that key exists and hasn’t expired it will return the value from the cache. If it can’t find a valid record it instead takes the return value from the block and stores it in the cache with your specified key.
For example:
#models = Rails.cache.fetch my_cache_key do
Model.where(condition: true).all
end
This will cache the block and reuse the result until something (tm) invalidates the key, forcing the block to be reevaluated. Also note the .all at the end of the method chain. Normally Rails would return an ActiveRecord relation object that would be cached and this would then be evaluated when you tried to use #models for the first time, neatly sidestepping the cache. The .all call forces Rails to eager load the records and ensure that it's the result that we cache, not the question.
So now that you get all your cache on and never talk to the database again we have to make sure we cover the other end, invalidating the cache. This is done with the Rails.cache.delete method that simply takes a cache key and removes it, causing a miss the next time you try to fetch it. You can also use the force: trueoption with fetch to force a re-evaluation of the block. Whichever suits you.
The science of it all is where to call Rails.cache.delete, in the naïve case this would be on update and delete for a single instance and update, delete, create on any member for a collection. There will always bee corner cases and they are always application specific, so I can't help you much there.
I assume in this answer that you will set up some sane cache store, like memcached or Redis.
Also remember to add this to config/environments/development.rb:
config.cache_store = :null_store
or you development environment will cache and you will end up hairless from frustration.
For further reference read: Everyone should be using low level caching in Rails and The rails API docs
It is also worth noting that functionality is not removed from Rails 4, merely extracted into a gem. If you need or would like the full features of the sweepers simply add it back to your app with a gem 'rails-observers' line in your Gemfile. That gem contains both the sweepers and observers that where removed from Rails 4 core.
I hope that helpt you get started.

Resources