Rails - how to cache data for server use, serving multiple users - ruby-on-rails

I have a class method (placed in /app/lib/) which performs some heavy calculations and sub-http requests until a result is received.
The result isn't too dynamic, and it is requested by multiple users accessing a specific view in the app.
So I want to schedule a periodic run of the method (using cron and the Whenever gem), store the results somewhere on the server in JSON format and then, on demand, just read the results into the view.
How can this be achieved? What would be the correct way of doing that?
What I currently have:
def heavyMethod
  response = {}
  # some calculations, eventually building the response
  File.open(File.expand_path('../../../tmp/cache/tests_queue.json', __FILE__), "w") do |f|
    f.write(response.to_json)
  end
end
and also a corresponding method to read this file.
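A reader along these lines (a rough sketch, not the exact code; the method name is arbitrary):
require 'json'

def read_heavy_method_result
  path = File.expand_path('../../../tmp/cache/tests_queue.json', __FILE__)
  # return an empty hash if the cron job hasn't written the file yet
  File.exist?(path) ? JSON.parse(File.read(path)) : {}
end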
I searched but couldn't find an example of achieving this using the Rails cache conventions (and not some private code that I wrote), for data which isn't related to ActiveRecord.
Thanks!

Your solution should work fine, but using Rails.cache should be cleaner and a bit faster. The Rails guides provide enough information about Rails.cache and how to get it to work with memcached; let me summarize how I would use it in your case.
Heavy method
def heavyMethod
  response = {}
  # some calculations, eventually building the response
  Rails.cache.write("heavy_method_response", response)
end
Request
response = Rails.cache.fetch("heavy_method_response")
The only problem here is that when your server starts for the first time, the cache will be empty. The same is true if/when memcached restarts.
One advantage is that somewhere along the way the data you pass in is marshalled into storage and then unmarshalled on the way out, meaning you can pass in complex data structures and don't need to serialize to JSON manually.
Edit: memcached will clear your item if it runs out of memory. That will be very rare, since it uses an LRU (I think) algorithm to expire things, and I presume you will read this entry often.
To prevent this:
set expires_in larger than your cron period,
change your fetch code to call the heavy method if your fetch fails (like Rails.cache.fetch("heavy_method_response") { heavy_method }), and change heavy_method to just return the object (see the sketch below),
or use something like Redis, which will not delete items.
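Putting those suggestions together, a minimal sketch (30.minutes is an illustrative value, longer than the cron period):
# heavy_method now just builds and returns the result
def heavy_method
  response = {}
  # some calculations, eventually building the response
  response
end

# cron / Whenever task: refresh the cache, with expires_in longer than the cron period
Rails.cache.write("heavy_method_response", heavy_method, expires_in: 30.minutes)

# request side: fall back to computing inline if the entry is missing or was evicted
response = Rails.cache.fetch("heavy_method_response", expires_in: 30.minutes) { heavy_method }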

Related

Can I use Rails.cache to store short-lived session data?

We are already using cookie-based sessions, and switching off them to file-store sessions is not an option. However, I need a way to store larger amounts of session data (up to 10 MB or so) -- beyond the limit of a cookie session and, even if it weren't, round-tripping that much data on multiple requests would be slow.
I am currently attempting to solve this by using (abusing?) Rails.cache. The basic setup is like this:
I post to a route, which has the following code:
# calculate some results...
Rails.cache.write('search_results' + session.id, search_results)
redirect_to '/results'
Inside GET /results, I read the cached data and send it to the client:
@results = Rails.cache.read('search_results' + session.id)
This works fine. However, if I subsequently make a request to another route like GET /results2 that also calls Rails.cache.read('search_results' + session.id), it will return nil. Even if all calls happen within a 5-10s span.
So my questions are:
Why does this happen? What determines when Rails.cache cleans itself?
Is there a way to make this work?
Is there a better approach altogether that doesn't involve using a DB or redis?
Answer to your questions:
The problem with the file cache store is that it stores files locally. Thus, if you have multiple servers, the cache can be written on one server while it is read on another server, which will return nil. The solution is to use a cache store that can be shared among multiple servers.
Using redis-store may be a solution: https://github.com/redis-store/redis-rails
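As a rough sketch (the URL and expiry are placeholders), pointing the cache store at a shared Redis instance looks like:
# config/environments/production.rb
# assumes the redis-rails gem; on Rails 5.2+ the built-in :redis_cache_store is similar
config.cache_store = :redis_store, ENV.fetch("REDIS_URL", "redis://localhost:6379/0/cache"), { expires_in: 1.hour }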

Rails 4 / Heroku smart expire cache

We have in our application some blocks which are cached.
According to some logic we sometimes modify some of them, and in that case we have logic that expires the relevant blocks.
When we perform changes in the code, we need to expire these blocks via the console. In that case we need to work out, precisely, the exact logic in order to expire all the modified blocks. For example, if we change the header HTML of closed streams, it will look like:
a = ActionController::Base.new
Stream.closed.each {|s| a.expire_fragment("stream_header_#{s.id}") }; nil
Actually, I think there must be a more generic way to simply compare cached blocks with how they would be rendered now, and expire only the blocks whose HTML differs from their cached version.
I wonder if there is a gem that does this task, and if not, whether somebody has already written a deploy hook to do it.
============== UPDATE ============
Some thought:
In a rake task one can get the cached fragment, as long as you know which fragments you have.
For example, in my case I can do:
a = ActionController::Base.new
Stream.find_each do |s|
  cached_html = a.read_fragment("stream_header_#{s.id}")
  :
  :
If I could generate the non-cached html I could simply compare them, and expire the cached fragment in case they are different.
Is it possible?
How heavy do you think this task will be?
Not at all easy to answer with so little code.
You can use the cache helper in the view:
cache @object do
  render something
end
Based on the hash of the object, the cache will invalidate itself. This is also true for the rendering template, as Rails creates a hash of the template too and combines it with the hash of the object to invalidate the fragment properly. This also works at a deeper level, and in this way it is possible to invalidate an entire branch of the render tree.
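In a view this looks something like the following (a sketch; the record and partial names are borrowed from the question above):
<%# key-based expiration: the key includes the record's updated_at and a digest of the template %>
<% cache [stream, "header"] do %>
  <%= render "streams/header", stream: stream %>
<% end %>
Touching the record or editing the template then invalidates the fragment without any manual expire_fragment calls.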
Let me point you toward the documentation of Rails and the Russian doll caching.
http://edgeguides.rubyonrails.org/caching_with_rails.html
There was also a great video on caching by these guys:
https://www.codeschool.com/courses/rails-4-zombie-outlaws
The courses are free, but it looks like you have to register now.
I hope this is in the right direction for your need.

How do I bypass Rails.cache for a single request or code block?

I have an API endpoint that aggregates a bunch of data from code that leverages Rails.cache for small pieces of data here and there. There are times, however, when I want 100% up-to-date data, as if Rails.cache was empty. Obviously I could clear cache prior to aggregating the data, but that will affect unrelated data and requests.
Is there a way for me to have a request in Rails act as if Rails.cache were empty, similar to how it would behave if Rails.cache were configured with :null_store?
The query cache in ActiveRecord has something like this - an "uncached" function that you can pass a block to, where the block will run w/o query cache enabled. I need something similar, but for Rails.cache in general.
Since it does not appear there is a solution to this out of the box, I coded a solution of my own by adding the following code as config/initializers/rails_cache.rb
module Rails
  class << self
    alias :default_rails_cache :cache

    def cache
      # Allow any thread to override Rails.cache with its own cache implementation.
      RequestStore.store[:rails_cache] || default_rails_cache
    end
  end
end
This allows any thread to specify its own cache store, which will then be used for all fetches, reads, and writes. As such, it will not read from the default Rails.cache, nor will its values be written to the default Rails.cache.
If the thread is long-running and benefits from having caching enabled, you can easily set this to its own MemoryStore instance:
RequestStore.store[:rails_cache] = ActiveSupport::Cache.lookup_store(:memory_store)
And if you want caching completely off for this thread, you can use :null_store instead of :memory_store.
If you are not using the request_store gem, RequestStore.store can be replaced with Thread.current for the same effect; you just have to be more careful about thread reuse across requests.
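With that initializer in place, bypassing the cache for a single request can look like this (the action and the aggregator class are made-up names):
# sketch: make this one aggregation behave as if Rails.cache were empty
def show
  RequestStore.store[:rails_cache] = ActiveSupport::Cache.lookup_store(:null_store)
  render json: ReportAggregator.new.fresh_data   # ReportAggregator is hypothetical
ensure
  # request_store clears this after the request anyway; with Thread.current
  # you would need to reset it yourself
  RequestStore.store[:rails_cache] = nil
end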

How does Rails 4 Russian doll caching prevent stampedes?

I am looking to find information on how the caching mechanism in Rails 4 prevents against multiple users trying to regenerate cache keys at once, aka a cache stampede: http://en.wikipedia.org/wiki/Cache_stampede
I've not been able to find out much information via Googling. If I look at other systems (such as Drupal) cache stampede prevention is implemented via a semaphores table in the database.
Rails does not have a built-in mechanism to prevent cache stampedes.
According to the README for atomic_mem_cache_store (a replacement for ActiveSupport::Cache::MemCacheStore that mitigates cache stampedes):
Rails (and any framework relying on active support cache store) does
not offer any built-in solution to this problem
Unfortunately, I'm guessing that this gem won't solve your problem either. It supports fragment caching, but it only works with time-based expiration.
Read more about it here:
https://github.com/nel/atomic_mem_cache_store
Update and possible solution:
I thought about this a bit more and came up with what seems to me to be a plausible solution. I haven't verified that this works, and there are probably better ways to do it, but I was trying to think of the smallest change that would mitigate the majority of the problem.
I assume you're doing something like cache model do in your templates as described by DHH (http://37signals.com/svn/posts/3113-how-key-based-cache-expiration-works). The problem is that when the model's updated_at column changes, the cache_key likewise changes, and all your servers try to re-create the template at the same time. In order to prevent the servers from stampeding, you would need to retain the old cache_key for a brief time.
You might be able to do this by (dum da dum) caching the cache_key of the object with a short expiration (say, 1 second) and a race_condition_ttl.
You could create a module like this and include it in your models:
module StampedeAvoider
  def cache_key
    orig_cache_key = super
    Rails.cache.fetch("/cache-keys/#{self.class.table_name}/#{self.id}", expires_in: 1, race_condition_ttl: 2) { orig_cache_key }
  end
end
Let's review what would happen. There are a bunch of servers calling cache model. If your model includes StampedeAvoider, then its cache_key will now be fetching /cache-keys/models/1, and returning something like /models/1-111 (where 111 is the timestamp), which cache will use to fetch the compiled template fragment.
When you update the model, model.cache_key will begin returning /models/1-222 (assuming 222 is the new timestamp), but for the first second after that, cache will keep seeing /models/1-111, since that is what is returned by cache_key. Once 1 second passes, all of the servers will get a cache-miss on /cache-keys/models/1 and will try to regenerate it. If they all recreated it immediately, it would defeat the point of overriding cache_key. But because we set race_condition_ttl to 2, all of the servers except for the first will be delayed for 2 seconds, during which time they will continue to fetch the old cached template based on the old cache key. Once the 2 seconds have passed, fetch will begin returning the new cache key (which will have been updated by the first thread which tried to read/update /cache-keys/models/1) and they will get a cache hit, returning the template compiled by that first thread.
Ta-da! Stampede averted.
Note that if you did this, you would be doing twice as many cache reads, but depending on how common stampedes are, it could be worth it.
I haven't tested this. If you try it, please let me know how it goes :)
The :race_condition_ttl setting in ActiveSupport::Cache::Store#fetch should help avoid this problem. As the documentation says:
Setting :race_condition_ttl is very useful in situations where a cache entry is used very frequently and is under heavy load. If a cache expires and due to heavy load several different processes will try to read data natively and then they all will try to write to cache. To avoid that case the first process to find an expired cache entry will bump the cache expiration time by the value set in :race_condition_ttl. Yes, this process is extending the time for a stale value by another few seconds. Because of extended life of the previous cache, other processes will continue to use slightly stale data for just a bit longer. In the meantime that first process will go ahead and will write into cache the new value. After that all the processes will start getting the new value. The key is to keep :race_condition_ttl small.
Great question. A partial answer that applies to single multi-threaded Rails servers but not multiprocess(or) environments (thanks to Nick Urban for drawing this distinction) is that the ActionView template compilation code blocks on a mutex that is per template. See line 230 in template.rb here. Notice there is a check for completed compilation both before grabbing the lock and after.
The effect is to serialize attempts to compile the same template, where only the first will actually do the compilation and the rest will get the already completed result.
Very interesting question. I searched on Google (you get more results if you search for "dog pile" instead of "stampede") but, like you, I did not get any answers, except this one blog post: protecting from dogpile using memcache.
Basically it stores your fragment under two keys: key:timestamp (where timestamp would be updated_at for ActiveRecord objects) and key:last.
def custom_write_dogpile(key, timestamp, fragment, options)
  Rails.cache.write(key + ':' + timestamp.to_s, fragment)
  Rails.cache.write(key + ':last', fragment)
  Rails.cache.delete(key + ':refresh-thread')
  fragment
end
Now, when reading from the cache and trying to fetch a non-existent entry, it will try to fetch the key:last fragment instead:
def custom_read_dogpile(key, timestamp, options)
  result = Rails.cache.read(key + ':' + timestamp.to_s)
  if result.blank?
    Rails.cache.write(key + ':refresh-thread', 0, raw: true, unless_exist: true, expires_in: 5.seconds)
    if Rails.cache.increment(key + ':refresh-thread') == 1
      # The cache didn't exist
      result = nil
    else
      # Fetch the last cache, as the new one has not been created yet
      result = Rails.cache.read(key + ':last')
    end
  end
  result
end
This is a simplified summary of the blog post by Moshe Bergman that I linked to above.
There is no protection against memcache stampedes. This is a real problem when multiple machines are involved and multiple processes on those multiple machines. -Ouch-.
The problem is compounded when one of the key processes has "died" leaving any "locking" ... locked.
In order to prevent stampedes you have to re-compute the data before it expires. So, if your data is valid for 10 minutes, you need to regenerate again at the 5th minute and re-set the data with a new expiration for 10 more minutes. Thus you don't wait until the data expires to set it again.
You should also not let your data expire at the 10-minute mark, but re-compute it every 5 minutes, so it should never expire. :)
You can use wget & cron to periodically call the code.
I recommend using Redis, which will allow you to save the data and reload it in the event of a crash.
-daniel
A reasonable strategy would be to:
use a :race_condition_ttl with at least the expected time it takes to refresh the resource. Setting it to less time than expected to perform a refresh is not advisable as the angry mob will end up trying to refresh it, resulting in a stampede.
use an :expires_in time calculated as the maximum acceptable expiry time minus the :race_condition_ttl to allow for refreshing the resource by a single worker and avoiding a stampede.
Using the above strategy will ensure that you don't exceed your expiry/staleness deadline and also avoid a stampede. It works because only one worker gets through to refresh, whilst the angry mob are held off using the cache value with the race_condition_ttl extension time right up to the originally intended expiry time.
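A sketch of that strategy (the 10-minute deadline and 30-second refresh time are illustrative, and build_expensive_report stands in for whatever recomputes the resource):
max_staleness = 10.minutes   # maximum acceptable expiry/staleness deadline
refresh_time  = 30.seconds   # at least the expected time to refresh the resource

Rails.cache.fetch("expensive_report",
                  expires_in: max_staleness - refresh_time,
                  race_condition_ttl: refresh_time) do
  build_expensive_report
end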

Rails per-request hash?

Is there a way to cache per-request data in Rails? For a given Rails/mongrel request I have the result of a semi-expensive operation that I'd like to access several times later in that request. Is there a hash where I can store and access such data?
It needs to be fairly global and accessible from views, controllers, and libs, like Rails.cache and I18n are.
I'm ok doing some monkey-patching if that's what it takes.
Memcached doesn't work because it'll be shared across requests, which I don't want.
A global variable similarly doesn't work because different requests would share the same data, which isn't what I want.
Instance variables don't work because I want to access the data from inside different classes.
There is also the request_store gem. From the documentation:
Add this line to your application's Gemfile:
gem 'request_store'
and use this code to store and retrieve data (confined to the request):
# Set
RequestStore.store[:foo] = 0
# Get
RequestStore.store[:foo]
Try PerRequestCache. I stole the design from the SQL Query Cache.
Configure it up in config/environment.rb with:
config.middleware.use PerRequestCache
then use it with:
PerRequestCache.fetch(:foo_cache){ some_expensive_foo }
One of the most popular options is to use the request_store gem, which gives you a global store that you can access from any part of your code. It uses Thread.current to store your data, and takes care of cleaning up the data after each request.
RequestStore[:items] = []
Be aware though, since it uses Thread.current, it won't work properly in a multi-threaded environment where you have more than one thread per request.
To circumvent this problem, I have implemented a store that can be shared between threads for the same request. It's called request_store_rails, it's thread-safe, and the usage is very similar:
RequestLocals[:items] = []
Have you considered flash? It uses Session but is automatically cleared.
Memoisation?
According to this railscast it's stored per request.
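A minimal memoisation sketch (the class and method names are made up); the instance variable lives only as long as the controller instance, i.e. one request:
class ApplicationController < ActionController::Base
  helper_method :expensive_result

  private

  # computed at most once per request, then reused by views and actions
  def expensive_result
    @expensive_result ||= ExpensiveOperation.call   # ExpensiveOperation is hypothetical
  end
end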
Global variables are evil. Work out how to cleanly pass the data you want to where you want to use it.
app/models/my_cacher.rb
class MyCacher
  def self.result
    @@result ||= begin
      # do expensive stuff
      # and cache in @@result
    end
  end
end
The ||= syntax basically means "do the following if @@result is nil" (i.e. not set to anything yet). Just make sure the last line in the begin/end block is returning the result.
Then in your views/models/whatever you would just reference the function when you need it:
MyCacher.result
This will cache the expensive action for the duration of a request.
