I'm calling an API method from a server:
def get_data
#.........
get_some_data_from_server
end
I want to cache the result of this call, obviously. So I created a field in a table and changed the get_data to look like this:
def fetch_data
key = get_cache_key
Rails.cache.fetch key, expires_in: 500.minutes do
get_some_data_from_server
end
end
The result of get_some_data_from_server doesn't change frequently, it's pretty much the same all the time. But it may change over time, even early when 500 minutes have passed. And thus the user may receive an outdated data from cache.
Is this strategy sensible? What do I do about get_some_data_from_server changing over time?
If it has changed, then the key should change. Ideally, you want your key to reflect, in some part, something unique about the data it represents.
In the case of request, for example. If you were making a request that included two dates, to receive data between two points. My key would include those two dates, so everytime I request data from a different range of dates, it wont use the same cache key.
In the case of when the request will always be the same, there should be a way to determine if the api has changed, perhaps through a less expensive api call. If this call says that new data is available, then you should clear the cache at that key and request the new data.
One way to check if data is changed, before you make a request is using ETag's and conditional gets. A detailed description can be found here http://fideloper.com/api-etag-conditional-get
If the result of get_some_data_from_server is changed than the call
Rails.cache.delete(get_cache_key)
https://stackoverflow.com/a/19603852/1289704
Related
We are working on a data visualization problem right now. Our customer wants us to show the last 6 months data for a honeybee hive on a graph.
Clearly it's gonna be a huge dataset. Adding indexes we overcame the database slowness problem in loading data though we still have problem in visualizing data on a graph.
Here is the related code:
def self.prepare_single_hive_messages_for_datatable_dygraph(messages, us_metric_enabled)
data = []
messages.each do |message|
record = []
record << message.occurance_time.to_s(:dygraph_format)
record << weight_according_to_metric(message.weight, us_metric_enabled)
record << temperature_according_to_metric(message.temperature, us_metric_enabled)
record << (message.humidity.nil? ? nil : message.humidity.to_f)
data << record
end
return data
end
The problem is that messages.each is very slow and takes more than 30 seconds. Is there any solution to overcome this?
Project Specification:
Rails Version: 4.1.9
Graph Library: Dygraph
Database: Postgres
There are two ways to attack a performance problem like this.
Find and correct the performance bottle neck
Break it into smaller pieces
Finding Performance issues
First, get a dataset large enough to reproduce the problem setup on your dev system. Then look at the logs so you can see how long the transaction is taking. You should be looking for a line like this:
Completed 200 OK in 432.1ms (Views: 367.7ms | ActiveRecord: 61.4ms)
Rerun the task a couple times since caching can cause variations. Write down your different times. Then remove everything in the loop and run it with just the loop. Do the numbers go back to looking reasonable? If that is the case then you know the problem is the work you are doing inside the loop. Next, add each line in the loop back on its own (or one at a time if they depend on each other). Figure out which line causes those numbers to jump the most.
This is the point where you should try to performance tune your code. Check for queries that could be smarter. Make sure you aren't querying the same data over and over. If you have a function in a model that computes something and you call it multiple times to get the same answer then use this to only compute once:
def something
return #savedvalue if #savedvalue
#savedvalue = really complex calculation
end
The goal is to find the worse offender so you can make changes that have the biggest impact. However, if you are working with a LOT of data this may only get you so far. It may be impossible to performance tune enough for all the data. In that case there is option 2.
Break it into smaller pieces
Write a second rails action who's only job is to render a single record on a graph. It will do the inner part of your loop but only on the message who's id was passed to it.
Call your original function to setup the view and pass the list of messages to the view. In the view loop through the list of messages to setup jquery ajax code to call the above action once for each message. Have this run in on document ready.
Then, the page will load with an empty graph... but as soon as it is up the individual processed records will be fed to it and appear one at a time on the page. It will still take just ask long (or even a little longer because of overhead) to complete the graph... but it will no longer time out. Each ajax call will be its own quick hit to the server instead of one big long hit.
I just used this very technique to load a rather long report on a site I work on. Ideally we'd like to fix any underlying performance issues... but what we really wanted was to have a report working right away and then fix the performance issues as we had time.
Ok you said every person sees the same set of data, which is great, means we can cache without worrying about who's logged in, first here's your method, with tiny improvements
def self.prepare_single_hive_messages_for_datatable_dygraph(messages, us_metric_enabled)
messages.inject([]) do |records, message|
records << [].tap do |record|
record << message.occurance_time.to_s(:dygraph_format)
record << weight_according_to_metric(message.weight, us_metric_enabled)
record << temperature_according_to_metric(message.temperature, us_metric_enabled)
record << (message.humidity.nil? ? nil : message.humidity.to_f)
end
end
end
Then create a caching function, that runs this method and caches it
# some class constants
CACHE_KEY = 'some_cache_key'
EXPIRY_TIME = 15.minutes
# the methods
def self.write_single_hive_messages_to_cache(messages, us_metric_enabled)
Rails.cache.write CACHE_KEY,
self.class.prepare_single_hive_messages_for_datatable_dygraph(messages, us_metric_enabled),
expires_in: EXPIRY_TIME
end
And a simple cache reading method
self.read_single_hive_messages_from_cache
Rails.cache.read CACHE_KEY
end
Then create a rake task that just fetches these messages and call the caching method, and rails will write the cache.
Create a cron job that calls this rake task, set the cron job to 5 minutes or so, the expiry time is longer just in case for some reason the cron job didn't run, the data will still be available for the next run.
This way your processing is run in the background, every 5 ( or whatever time you choose ) minutes, the page load should happen normally with no delay at all, since the array data will be loaded from the pre-calculated cache.
In case the cron stops working, the data will expire in the 15 minutes I've set, and then the read cache method will return nil, you could avoid this and set the data to never expire, but then the data will become stale and the old data will keep getting returned.
Another way to handle this is to tell the cache reading method how to generate the cache it self, so if it finds the cache empty it generates one and caches it itself before returning the data, the method would look like this
def self.read_single_hive_messages_from_cache(messages, us_metric_enabled)
Rails.cache.fetch CACHE_KEY, expires_in: EXPIRY_TIME do
self.class.write_single_hive_messages_to_cache(messages, us_metric_enabled)
end
end
But then make sure that messages is an ActiveRecord::Relation and not a processed array, because you don't want to query for 1+ million records and then find the cache already ready, if it's an ActiveRecord::Relation it will not touch the database until the array is started ( inside the caching block ), if the cache exists it will be returned before you enter the block and thus the data won't get fetched, saving you that huge query.
I know the answer got long, if you need more help tell me.
I have a class method (placed in /app/lib/) which performs some heavy calculations and sub-http requests until a result is received.
The result isn't too dynamic, and requested by multiple users accessing a specific view in the app.
So, I want to schedule a periodic run of the method (using cron and Whenever gem), store the results somewhere in the server using JSON format and, by demand, read the results alone to the view.
How can this be achieved? what would be the correct way of doing that?
What I currently have:
def heavyMethod
response = {}
# some calculations, eventually building the response
File.open(File.expand_path('../../../tmp/cache/tests_queue.json', __FILE__), "w") do |f|
f.write(response.to_json)
end
end
and also a corresponding method to read this file.
I searched but couldn't find an example of achieving this using Rails cache convention (and not some private code that I wrote), on data which isn't related with ActiveRecord.
Thanks!
Your solution should work fine, but using Rails.cache should be cleaner and a bit faster. Rails guides provides enough information about Rails.cache and how to get it to work with memcached, let me summarize how I would use it in your case
Heavy method
def heavyMethod
response = {}
# some calculations, eventually building the response
Rails.cache.write("heavy_method_response", response)
end
Request
response = Rails.cache.fetch("heavy_method_response")
The only problem here is that when ur server starts for the first time, the cache will be empty. Also if/when memcache restarts.
One advantage is that somewhere on the flow, the data u pass in is marshalled into storage, and then unmartialled on the way out. Meaning u can pass in complex datastructures, and dont need to serialize to json manually.
Edit: memcached will clear your item if it runs out of memory. Will be very rare since its using a LRU (i think) algoritm to expire things, and I presume you will use this often.
To prevent this,
set expires_in larger than your cron period,
change your fetch code to call the heavy_method if ur fetch fails (like Rails.cache.fetch("heavy_method_response") {heavy_method}, and change heavy_method to just return the object.
Use something like redis which will not delete items.
I am looking to find information on how the caching mechanism in Rails 4 prevents against multiple users trying to regenerate cache keys at once, aka a cache stampede: http://en.wikipedia.org/wiki/Cache_stampede
I've not been able to find out much information via Googling. If I look at other systems (such as Drupal) cache stampede prevention is implemented via a semaphores table in the database.
Rails does not have a built-in mechanism to prevent cache stampedes.
According to the README for atomic_mem_cache_store (a replacement for ActiveSupport::Cache::MemCacheStore that mitigates cache stampedes):
Rails (and any framework relying on active support cache store) does
not offer any built-in solution to this problem
Unfortunately, I'm guessing that this gem won't solve your problem either. It supports fragment caching, but it only works with time-based expiration.
Read more about it here:
https://github.com/nel/atomic_mem_cache_store
Update and possible solution:
I thought about this a bit more and came up with what seems to me to be a plausible solution. I haven't verified that this works, and there are probably better ways to do it, but I was trying to think of the smallest change that would mitigate the majority of the problem.
I assume you're doing something like cache model do in your templates as described by DHH (http://37signals.com/svn/posts/3113-how-key-based-cache-expiration-works). The problem is that when the model's updated_at column changes, the cache_key likewise changes, and all your servers try to re-create the template at the same time. In order to prevent the servers from stampeding, you would need to retain the old cache_key for a brief time.
You might be able to do this by (dum da dum) caching the cache_key of the object with a short expiration (say, 1 second) and a race_condition_ttl.
You could create a module like this and include it in your models:
module StampedeAvoider
def cache_key
orig_cache_key = super
Rails.cache.fetch("/cache-keys/#{self.class.table_name}/#{self.id}", expires_in: 1, race_condition_ttl: 2) { orig_cache_key }
end
end
Let's review what would happen. There are a bunch of servers calling cache model. If your model includes StampedeAvoider, then its cache_key will now be fetching /cache-keys/models/1, and returning something like /models/1-111 (where 111 is the timestamp), which cache will use to fetch the compiled template fragment.
When you update the model, model.cache_key will begin returning /models/1-222 (assuming 222 is the new timestamp), but for the first second after that, cache will keep seeing /models/1-111, since that is what is returned by cache_key. Once 1 second passes, all of the servers will get a cache-miss on /cache-keys/models/1 and will try to regenerate it. If they all recreated it immediately, it would defeat the point of overriding cache_key. But because we set race_condition_ttl to 2, all of the servers except for the first will be delayed for 2 seconds, during which time they will continue to fetch the old cached template based on the old cache key. Once the 2 seconds have passed, fetch will begin returning the new cache key (which will have been updated by the first thread which tried to read/update /cache-keys/models/1) and they will get a cache hit, returning the template compiled by that first thread.
Ta-da! Stampede averted.
Note that if you did this, you would be doing twice as many cache reads, but depending on how common stampedes are, it could be worth it.
I haven't tested this. If you try it, please let me know how it goes :)
The :race_condition_ttl setting in ActiveSupport::Cache::Store#fetch should help avoid this problem. As the documentation says:
Setting :race_condition_ttl is very useful in situations where a cache entry is used very frequently and is under heavy load. If a cache expires and due to heavy load seven different processes will try to read data natively and then they all will try to write to cache. To avoid that case the first process to find an expired cache entry will bump the cache expiration time by the value set in :race_condition_ttl. Yes, this process is extending the time for a stale value by another few seconds. Because of extended life of the previous cache, other processes will continue to use slightly stale data for a just a bit longer. In the meantime that first process will go ahead and will write into cache the new value. After that all the processes will start getting new value. The key is to keep :race_condition_ttl small.
Great question. A partial answer that applies to single multi-threaded Rails servers but not multiprocess(or) environments (thanks to Nick Urban for drawing this distinction) is that the ActionView template compilation code blocks on a mutex that is per template. See line 230 in template.rb here. Notice there is a check for completed compilation both before grabbing the lock and after.
The effect is to serialize attempts to compile the same template, where only the first will actually do the compilation and the rest will get the already completed result.
Very interesting question. I searched on google (you get more results if you search for "dog pile" instead of "stampede") but like you, did I not get any answers, except this one blog post: protecting from dogpile using memcache.
Basically does it store you fragment in two keys: key:timestamp (where timestamp would be updated_at for active record objects) and key:last.
def custom_write_dogpile(key, timestamp, fragment, options)
Rails.cache.write(key + ':' + timestamp.to_s, fragment)
Rails.cache.write(key + ':last', fragment)
Rails.cache.delete(key + ':refresh-thread')
fragment
end
Now when reading from the cache, and trying to fetch a non existing cache, will it instead try to fecth the key:last fragment instead:
def custom_read_dogpile(key, timestamp, options)
result = Rails.cache.read(timestamp_key(name, timestamp))
if result.blank?
Rails.cache.write(name + ':refresh-thread', 0, raw: true, unless_exist: true, expires_in: 5.seconds)
if Rails.cache.increment(name + ':refresh-thread') == 1
# The cache didn't exists
result = nil
else
# Fetch the last cache, as the new one has not been created yet
result = Rails.cache.read(name + ':last')
end
end
result
end
This is a simplified summary of the by Moshe Bergman that i linked to before, or you can find here.
There is no protection against memcache stampedes. This is a real problem when multiple machines are involved and multiple processes on those multiple machines. -Ouch-.
The problem is compounded when one of the key processes has "died" leaving any "locking" ... locked.
In order to prevent stampedes you have to re-compute the data before it expires. So, if your data is valid for 10 minutes, you need to regenerate again at the 5th minute and re-set the data with a new expiration for 10 more minutes. Thus you don't wait until the data expires to set it again.
Should also not allow your data to expire at the 10 minute mark, but re-compute it every 5 minutes, and it should never expire. :)
You can use wget & cron to periodically call the code.
I recommend using redis, which will allow you to save the data and reload it in the advent of a crash.
-daniel
A reasonable strategy would be to:
use a :race_condition_ttl with at least the expected time it takes to refresh the resource. Setting it to less time than expected to perform a refresh is not advisable as the angry mob will end up trying to refresh it, resulting in a stampede.
use an :expires_in time calculated as the maximum acceptable expiry time minus the :race_condition_ttl to allow for refreshing the resource by a single worker and avoiding a stampede.
Using the above strategy will ensure that you don't exceed your expiry/staleness deadline and also avoid a stampede. It works because only one worker gets through to refresh, whilst the angry mob are held off using the cache value with the race_condition_ttl extension time right up to the originally intended expiry time.
There is an array inside the session hash which I'm adding things to it. The problem is, sometimes multiple requests get processed at the same time (because ajax), and the changes a request makes to the array is then replaced by the changes made by the second request.
Example, the array first looks like this:
[63, 73, 92]
Then the first request adds something to it:
[63, 73, 92, 84]
The second request does the same thing (but works on the older version obviously):
[63, 73, 92, 102]
So in the end the array doesn't look like it should. Is there a way to avoid that?
I tried to use the cache store, the active record store and the cookie store. Same problem with all of them.
Thanks.
Session race conditions are very common in Rails. Redis session store doesn't help either!
The reason is Rails only reads and creates the session object when it receives the request and writes it back to session store when request is complete and is about to be returned to user.
Java has a solution for this called session replication. We can build our own concurrent redis based session store. The following is pretty much it. Except the update method is missing the lock. But it almost never happens to get race condition in it.
To get session hash just use concurrent_session which returns a hash.
To set some value in it, use update_concurrent_session method. It deep merges the new hash into the old value:
class ApplicationController < ActionController::Base
def concurrent_session
#concurrent_session ||= get_concurrent_session
end
def update_concurrent_session h
return unless session.id.present?
#concurrent_session = get_concurrent_session.deep_merge(h)
redis.set session_key, #concurrent_session.to_json
redis.expire session_key, session_timeout
end
private
def redis
Rails.cache.redis
end
def session_timeout
2.weeks.to_i
end
def session_key
'SESSION_' + session.id.to_s if session.id.present?
end
def get_concurrent_session
return {} unless session.id.present?
redis.expire session_key, session_timeout
redis.get(session_key).yield_self do |v|
if v
JSON.parse v, symbolize_names: true
else
{}
end
end
end
end
Usage example:
my_roles = concurrent_session[:my_roles]
update_concurrent_session({my_roles: ['admin', 'vip']})
There really isn't a great solution for this in Rails. My first suggestion would be to examine your use case and see if you can avoid this situation to begin with. If it's safe to do so, session data if often best kept on the client, as there are a number of challenges that can come up when dealing with server-side session stores. On the other hand, if this is data that might be useful long term across multiple page requests (and maybe multiple sessions), perhaps it should go in the database.
Of course, there is some data that really does belong in the session (a good example of this is the currently logged-in user). If this is the case, have a look at http://paulbutcher.com/2007/05/01/race-conditions-in-rails-sessions-and-how-to-fix-them/, and specifically https://github.com/fcheung/smart_session_store, which tries to deal with the situation you've described.
It is a simple race condition, just lock the request using any locking mechanism like
redis locker
RedisLocker.new('my_ajax').run! { session[:whatever] << number }
Load the current session, or create a new one if necessary
Save a copy of the unmodified session for future reference
Run the code of the action
Compare the modified session with the copy saved previously to determine what has changed
If the session has changed:
Lock the session
Reload the session
Apply the changes made to this session and save it
Unlock the session
From Race conditions in Rails sessions and how to fix them.
I have a long database query on one of our dashboard systems that I would like to cache as results do not need to be accurate in realtime but can give a "close enough" value from the cache.
I would like to do this without the user ever having to wait. I was looking at using something like
Rails.cache.write('my_val', 'val', :expires_in => 60.minutes)
to store this value, but I don't believe it gives the exact functionality that I want. I would like to call with
Rails.fetch('my_val') { create a background task to update my_val; return expired my_val}
It seems that my_val is removed from the cache when it expired though. Is there any way to access this expired value or perhaps another built in mechanism that would enable this functionality?
Thanks.
Just do this:
Rails.cache.write('my_val', 'val')
never expire
Now run your background job:
SomeLongJob.process
In the SomeLongJob.process job do this:
def SomeLongJob.process
some_long_calculation = Blah.calc
Rails.cache.write('my_val', some_long_calculation)
end
Now read the data with
def get_value
val = Rails.cache.read('my_val', 'val')
end
The :race_condition_ttl option for Rails.cache.fetch is REALLY close to what you are looking for: http://api.rubyonrails.org/classes/ActiveSupport/Cache/Store.html#method-i-fetch
But from what I can tell, the first request is still blocked (it's just subsequent ones that get the old value while it's updating). Not sure why they didn't go all the way with it. It would be better if the pattern #drhenner mentioned was abstracted into an option like that, but I haven't seen one yet.