I want to use the redis-store gem as the cache store in my Rails app. Unlike memcached, Redis doesn't clean up unused keys, but I can use the EXPIRE command on each key (via the :expires_in option) to limit each key's lifetime.
Then I want to use the cache_key of my model (which includes id and updated_at) as part of the Redis key used for caching. When the model is updated, a new cache key is created, and the old one is never used again.
So the question is: which expiration time should I choose? If it's too short, it eliminates the benefits of caching; if it's too long, it fills Redis with unused data, which can (probably) reduce performance. Where is the golden mean?
I would suggest using Redis's LRU eviction policy so that Redis itself evicts the least recently used keys once it reaches its memory limit. That way you don't need to manage key expiration yourself.
Using the cache_key of your model as you suggested will indeed generate a new key when your model changes. The 'old' key(s) for that model will no longer be used by your views, and Redis will evict them eventually.
See http://redis.io/topics/config for information on how to configure Redis as an LRU cache (in short, set maxmemory and a maxmemory-policy such as allkeys-lru).
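To make the eviction behaviour concrete, here is a minimal sketch of LRU eviction in plain Ruby. The LruCache class and its max_entries limit are illustrative assumptions, not part of Redis or redis-store, and Redis itself uses an approximated LRU rather than this exact bookkeeping. The point is only that a stale versioned key, once nothing reads it any more, is the first to go.

```ruby
# Toy LRU cache: illustrates the idea only, not Redis's real (approximated) algorithm.
class LruCache
  def initialize(max_entries)
    @max_entries = max_entries
    @data = {} # Ruby hashes keep insertion order, so the first key is the oldest
  end

  def read(key)
    return nil unless @data.key?(key)

    value = @data.delete(key) # re-insert to mark the key as most recently used
    @data[key] = value
  end

  def write(key, value)
    @data.delete(key)
    @data[key] = value
    @data.shift while @data.size > @max_entries # evict least recently used entries
    value
  end

  def keys
    @data.keys
  end
end

cache = LruCache.new(2)
cache.write("posts/1-20110501232725", "<old markup>")
cache.write("posts/2-20110501232725", "<markup>")
cache.read("posts/2-20110501232725")                   # post 2 is still being read
cache.write("posts/1-20110601000000", "<new markup>")  # post 1 changed: new key, stale key evicted
```

After the last write the cache holds only the key for post 2 and the new key for post 1; nobody had to set an expiration time on the stale entry.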
Surely this is app dependent? If it's a really expensive page, you want it re-calculated as rarely as possible, but if it changes rapidly, you have no options.
I'd personally experiment. Pick some numbers and see how they affect performance. Err high to start with (trust your database, in this case Redis), and then tweak them if you have problems.
Related
I was reading a few guides on caching in Rails but I am missing something important that I cannot reconcile.
I understand the concept of auto expiring cache keys, and how they are built off a derivative of the model's updated_at attribute, but I cannot figure out how it knows what the updated_at is without first doing a database look-up (which is exactly what the cache is partly designed to avoid)?
For example:
cache @post
Will store the result in a cache key something like:
posts/2-20110501232725
As I understand auto expiring cache keys in Rails, if the @post is updated (and the updated_at attribute is changed), then the key will change. But what I cannot figure out is how subsequent look-ups of @post will know how to get the key without doing a database look-up to GET the new updated_at value. Doesn't Rails have to KNOW what @post.updated_at is before it can access the cached version?
In other words, if the key contains the updated_at time stamp, how can you look-up the cache without first knowing what it is?
In your example, you can't avoid hitting the database. However, the intent of this kind of caching is to avoid doing additional work that is only necessary to do once every time the post changes. Looking up a single row from the database should be extremely quick, and then based on the results of that lookup, you can avoid doing extra work that is more expensive than that single lookup.
You haven't specified exactly, but I suspect you're doing this in a view. In that case, the goal would be to avoid fragment building that won't change until the post does. Iteration of various attributes associated with the post and emission of markup to render those attributes can be expensive, depending on the work being done, so given that you have a post already, being able to avoid that work is the gain achieved in this case.
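A minimal sketch of that trade-off in plain Ruby, without Rails: the Post struct, the PostRenderer class and its in-memory store are hypothetical stand-ins for an ActiveRecord model, a view fragment and the cache store. The post still has to be loaded before its key can be computed; what gets skipped is the rendering.

```ruby
Post = Struct.new(:id, :title, :updated_at) do
  # Mirrors the key shape from the question: posts/<id>-<timestamp>
  def cache_key
    "posts/#{id}-#{updated_at.strftime('%Y%m%d%H%M%S')}"
  end
end

class PostRenderer
  attr_reader :render_count, :store

  def initialize
    @store = {}        # stand-in for the real cache store
    @render_count = 0  # how often the expensive work actually ran
  end

  def render(post)
    @store[post.cache_key] ||= begin
      @render_count += 1
      "<h1>#{post.title}</h1>" # imagine something far more expensive here
    end
  end
end

renderer = PostRenderer.new
post = Post.new(2, "Hello", Time.utc(2011, 5, 1, 23, 27, 25)) # pretend this row came from the database

renderer.render(post) # miss: renders and stores under posts/2-20110501232725
renderer.render(post) # hit: the stored markup is reused
post.updated_at = Time.utc(2011, 5, 2, 8, 0, 0) # the post was edited
renderer.render(post) # new key, so a miss; the old entry is simply never read again
```

Three calls cost three cheap row lookups but only two renders, and no code ever had to delete the stale fragment.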
As I understand your question, you're trying to figure out the black magic of how caching works. Good luck.
But I think the underlying question is how do updates happen?
A cache element should have a logical key based on some part of the element, e.g. a compound key, or a key name based on the id of the item. You build this key to fetch the cache fragment when you need it. The key is always the same; otherwise you can't be certain that you're getting what you want.
One underlying assumption of caching is that the cached value is transient, i.e. if it goes away or is out of date, it's not a big deal. If it is a big deal, then caching isn't the solution to your problem. Caching is meant to alleviate high load, i.e. a lot of traffic hitting the same thing in your database, as with a weblog where 1,000,000 people might be reading a particular blog post. It's not meant to speed up your database. That is done through SQL optimization, sharding, etc.
If you use Dalli as your cache store then you can set the expiry.
https://stackoverflow.com/a/18088797/793330
http://www.ruby-doc.org/gems/docs/j/jashmenn-dalli-1.0.3/Dalli/Client.html
Essentially a caching loop in Rails AFAIK works like this:
So to answer your question:
The key is updated when you update the post; that write is tied to the post's update. You can also set an expiry time, which accomplishes much the same result by forcing a fresh lookup and cache write once the entry expires. As far as the cache is concerned, it's always reading the element that corresponds to the key. If that element gets updated, the cache returns the updated element, but it's not the cache's responsibility to check against the database.
What you might be looking for is something like a prepared statement (see Tenderlove on prepared statements), or a faster datastore, such as a less safe Postgres (i.e. tuned like NoSQL, without ACID guarantees) or a NoSQL type of database.
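The expiry-driven read path can be sketched in plain Ruby as follows. ExpiringCache, its injectable clock and the load_title lambda are hypothetical names for illustration; in a Rails app the same role is played by Rails.cache.fetch with the :expires_in option.

```ruby
class ExpiringCache
  Entry = Struct.new(:value, :expires_at)

  # The clock is injectable so the example can move time forward without sleeping.
  def initialize(clock: -> { Time.now })
    @clock = clock
    @entries = {}
  end

  # Return the cached value; on a miss or an expired entry, run the block and store its result.
  def fetch(key, expires_in:)
    entry = @entries[key]
    return entry.value if entry && entry.expires_at > @clock.call

    value = yield
    @entries[key] = Entry.new(value, @clock.call + expires_in)
    value
  end
end

now = Time.utc(2011, 5, 1, 12, 0, 0)
cache = ExpiringCache.new(clock: -> { now })
lookups = 0
load_title = -> { lookups += 1; "title v#{lookups}" } # stand-in for a database read

first  = cache.fetch("posts/2/title", expires_in: 60) { load_title.call } # miss: runs the lookup
second = cache.fetch("posts/2/title", expires_in: 60) { load_title.call } # hit: no lookup
now += 61                                                                  # the entry is now stale
third  = cache.fetch("posts/2/title", expires_in: 60) { load_title.call } # expired: lookup runs again
```

Note that the cache never asks the database whether the value changed; it only notices that the entry is too old and lets the caller rebuild it.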
Also do you have indexes in your database? DB requests will be slow without proper indexes. You might just need to "tune" your database.
Also there is a wonderful gem called cells which allows you to do a lot more with your views, including faster returns vs rendering partials, at least in my experience. It also has some caching functions.
I am trying to understand how memcached works when (if) you fill up the allocated memory buffer. In particular, I want to understand the lifecycle of a key/value pair in the cache. I am talking about low-level cache operations in Rails where I am directly creating the key/value pairs, e.g. commands like
Rails.cache.write key, cached_data
Rails.cache.fetch key
Assume for the sake of argument I have an infinite loop that was just generating random UUIDs as keys and storing random data. What happens when the cache fills up? Do older items just get bumped off or is there some specific algorithm behind the scenes that handles this eventuality?
I have read elsewhere "Cache Invalidation is a Hard Problem".
Just trying to understand how it actually works.
Maybe some simple code examples that illustrate the best way to create and destroy cached data? Do you have to explicitly define when entries should expire?
memcached handles this behind the scenes: when its memory is full, it evicts the least recently used items to make room for new ones. Check out the question "Memcache and expired items".
You can define expiration parameters, check out this wiki page -
http://code.google.com/p/memcached/wiki/NewProgramming#Cache_Invalidation
For cache invalidation specific to your application logic (and not just exhaustion of memory behind the scenes), the delete function will simply remove the data. As for when to delete cached data in your app, that's harder to say, hence the quote you referenced about cache invalidation being hard. I might suggest you start by thinking about ActiveRecord callbacks like after_commit (http://api.rubyonrails.org/classes/ActiveRecord/Callbacks.html) to let you easily invalidate cached data whenever your database changes.
But this is only a suggestion; there are many different cache invalidation schemes out there.
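As one possible shape for the callback approach, here is a plain-Ruby sketch. The Article class, its save method and the SUMMARY_CACHE hash are hypothetical stand-ins: in a Rails app the delete would sit in an after_commit callback and call Rails.cache.delete.

```ruby
SUMMARY_CACHE = {} # stand-in for Rails.cache

class Article
  attr_reader :id
  attr_accessor :title

  def initialize(id, title)
    @id = id
    @title = title
  end

  def cache_key
    "articles/#{id}/summary"
  end

  # Expensive derived value, cached until the record changes.
  def summary
    SUMMARY_CACHE[cache_key] ||= "Summary of #{title}"
  end

  # Stand-in for persisting the record; the invalidation happens right after the write.
  def save
    SUMMARY_CACHE.delete(cache_key)
    true
  end
end

article = Article.new(1, "Old title")
article.summary           # caches "Summary of Old title"
article.title = "New title"
stale = article.summary   # still the old value: nothing has invalidated it yet
article.save              # removes the cached entry
fresh = article.summary   # rebuilt from the current data
```

The stale read in the middle is the reason invalidation belongs next to the write: every code path that changes the data must also drop the entry.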
I am building a Rails app that has a site-wide counter variable (counter) that is read and written by different pages; basically, many pages can cause the counter to increment. What is a thread-safe way to store this variable so that:
1) it is thread-safe: many concurrent users might read and write this variable
2) it performs well: I originally thought about persisting this variable in the DB, but I wonder whether there is a better way, given that there can be a high volume of requests and I don't want this DB query to become the bottleneck of my app...
Suggestions?
It depends on whether it has to be perfectly accurate. Assuming not, you can store it in memcached and sync it to the database occasionally. If memcached crashes, the key expires (which shouldn't happen if it is configured properly), or you have to shut down, reload the value from the database on startup.
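A minimal in-process sketch of the "count in memory, sync occasionally" idea. BufferedCounter, flush_every and the database hash are hypothetical; with several server processes the in-memory part would have to be a shared store such as memcached, which offers an atomic incr command, rather than an instance variable.

```ruby
class BufferedCounter
  def initialize(store, flush_every:)
    @store = store            # hypothetical durable store; here a Hash with a :count key
    @flush_every = flush_every
    @pending = 0              # increments not yet written to the store
    @lock = Mutex.new
  end

  def increment
    @lock.synchronize do
      @pending += 1
      flush_locked if @pending >= @flush_every
    end
  end

  # Current total: what is persisted plus what is still buffered.
  def value
    @lock.synchronize { @store[:count] + @pending }
  end

  # Call on shutdown so buffered increments are not lost.
  def flush
    @lock.synchronize { flush_locked }
  end

  private

  def flush_locked
    @store[:count] += @pending
    @pending = 0
  end
end

database = { count: 0 }
counter = BufferedCounter.new(database, flush_every: 100)
threads = 8.times.map { Thread.new { 255.times { counter.increment } } }
threads.each(&:join)
```

With 2,040 increments and a flush every 100, the store is written 20 times instead of 2,040, and up to 99 increments can be lost in a crash; that is the accuracy trade-off mentioned above.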
You could also look at membase. I haven't tried it, but to simplify, it's a distributed memcached server that automatically persists to disk.
For better performance and accuracy, you could look at a sharded approach.
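One reading of "a sharded approach" is a sharded counter: the count is split over several slots so that writers rarely contend for the same one, and a read sums the slots. The sketch below assumes that reading; ShardedCounter and its shard count are illustrative names, and in practice the slots would be separate cache keys or database rows rather than in-memory hashes.

```ruby
class ShardedCounter
  def initialize(shard_count)
    @shards = Array.new(shard_count) { { count: 0, lock: Mutex.new } }
  end

  def increment
    shard = @shards.sample # a random shard spreads the writers out
    shard[:lock].synchronize { shard[:count] += 1 }
  end

  # Reads are more expensive than writes: every shard has to be visited.
  def value
    @shards.sum { |shard| shard[:lock].synchronize { shard[:count] } }
  end
end

counter = ShardedCounter.new(4)
threads = 8.times.map { Thread.new { 500.times { counter.increment } } }
threads.each(&:join)
total = counter.value
```

The design favours write throughput: each increment touches one shard, while the occasional read pays for summing all of them.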
Well you need persistence, so you have to store it in the Database/Session/File, AFAIK.
I have an application that needs to perform multiple network queries, each of which returns 100 records.
I'd like to keep all the results (several thousand or so) together in a single Memcached record named according to the user's request.
Is there a way to append data to a Memcached record, or do I need to read it, combine the old results with the new ones in my application, and write it back?
Thanks!
P.S. I'm using Rails 3.2
Through the Rails cache there's no practical way to append to a memcached entry: values are stored as serialized Ruby objects, so you'd have to read the value, combine it in your application, and write it back every time. (memcached itself does have an append command, but it concatenates raw bytes, so it only helps if you store raw strings.)
Redis does allow this sort of operation, however, as rubish points out: it has a native list type that lets you push new data onto it. Check out the Redis list documentation for information on how to do that.
You can write a class that emulates a list in memcached (which is actually what I did), but a read-modify-write append isn't an atomic operation, so concurrent writers will lose updates and the errors accumulate over time (at least in memcached). Besides, it'll be very slow.
As pointed out Redis has native lists, but it can be emulated in any noSQL / K-V storage solution.
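The lost-update problem with emulated appends can be shown deterministically in plain Ruby. The hash below is a stand-in for any key-value cache and the key name is made up; the second half uses Array#push only as an analogy for a store-side append such as Redis's RPUSH.

```ruby
store = { "results:user-42" => [] } # stand-in for a key-value cache

# Two workers read the same starting value...
seen_by_a = store["results:user-42"].dup
seen_by_b = store["results:user-42"].dup

# ...then each appends its own batch and writes the whole value back.
store["results:user-42"] = seen_by_a + ["batch A"]
store["results:user-42"] = seen_by_b + ["batch B"] # overwrites A's write

final = store["results:user-42"] # batch A is gone

# A store-side append never works from a stale copy, so nothing is lost.
list = []
list.push("batch A")
list.push("batch B")
```

The emulated version silently drops batch A because worker B wrote back a copy that predates A's write.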
I have inherited an app that generates a large array for every user that visits the app. I recently discovered that it is identical for nearly all the users!
Now I want to somehow make one copy of it so it is not built over and over again. I have thought of a few options and wanted input to see which one is the best:
1) Create a model and shove the data into the database
2) Create a YAML file and have the app load it when it initializes.
I personally like the model idea, but a few engineers at work feel that it does not deserve to be a full model. 97% of the time users will see exactly the same thing, but 3% of the time users will get a slightly different array (a few elements will have changed).
Are there any other approaches that I should consider? Thanks in advance.
Remember that if you store the data in the DB, each request which requires the data will have to execute a DB query to pull it out. If you are running multiple server threads, each thread could have its own copy in memory (if they are all handling requests which require the use of the array). In that case, you wouldn't be saving any memory (though you might save time from not having to regenerate the array).
If you are running multiple server processes (not threads), and if the array contents change as the application is running, and the changes have to be visible to all the processes, caching in memory won't work. You will have to use the DB in that case.
From the information in your comment, I suggest you try something like this:
Store the array in your DB, and make sure that the record(s) used have created/updated timestamps. Cache the contents in memory using a constant/global variable/class variable. Also store the last time the cache was updated.
Every time you need to use the array, retrieve the relevant "updated" timestamp from the DB. (You may need to use hand-coded SQL and ModelName.connection.execute to avoid pulling back all the data in the record, which ActiveRecord will probably do.) If the timestamp is later than the last time your cache was updated, pull the array from the DB and update your cache.
Use a Mutex ('require thread') when retrieving/updating the cached data, in case your server setup may use multiple threads. (I don't think that Passenger does, but I have had problems similar to threading problems when using Passenger+RMagick, so I would still use a Mutex to be safe.)
Wrap all the code which deals with the cached array in a library class (or a class method on the model used to store the data), so the details of cache management don't spill over into the rest of the application.
Do a little bit of performance testing on the cache setup using Benchmark.measure {}. If a bug in the setup actually made performance worse rather than better, that would be sad...
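The steps above could be wrapped up roughly like this. SharedArrayCache and its two callables are hypothetical: loader stands for the full database read and timestamp_reader for the cheap query that fetches only the "updated" timestamp. As a small variation, the sketch remembers the record's timestamp rather than the wall-clock time of the last refresh, which avoids comparing the app server's clock with the database's.

```ruby
class SharedArrayCache
  # loader and timestamp_reader are hypothetical callables standing in for the DB queries.
  def initialize(loader:, timestamp_reader:)
    @loader = loader
    @timestamp_reader = timestamp_reader
    @lock = Mutex.new
    @data = nil
    @loaded_at = nil
  end

  def fetch
    @lock.synchronize do
      updated_at = @timestamp_reader.call # cheap query: only the timestamp
      if @loaded_at.nil? || updated_at > @loaded_at
        @data = @loader.call.freeze       # expensive query: the full array
        @loaded_at = updated_at
      end
      @data
    end
  end
end

record = { items: [1, 2, 3], updated_at: Time.utc(2012, 1, 1) } # fake database row
loads = 0
cache = SharedArrayCache.new(
  loader: -> { loads += 1; record[:items].dup },
  timestamp_reader: -> { record[:updated_at] }
)

cache.fetch # first call loads the array
cache.fetch # timestamp unchanged, cached copy reused
record[:items] = [1, 2, 3, 4]
record[:updated_at] = Time.utc(2012, 1, 2)
latest = cache.fetch # newer timestamp triggers a reload
```

Freezing the cached array guards against one request mutating the copy that every other request shares.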
I'd go with option 2. You can add two constants (for the 97% and 3%) that load from a YAML file when the app initializes. That ought to shrink your memory footprint considerably.
Having said that, yikes, this is just a band-aid on a hack, but you knew that already. I'd consider putting some time into a redesign, if you have that luxury.
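For the YAML route (option 2), a minimal sketch could look like the following. The YAML text is inlined so the example is self-contained; in an app it would presumably live in a file read once by an initializer. The constant names, the keys and the items_for helper are all made up for illustration.

```ruby
require "yaml"

# Stand-in for the contents of a YAML file loaded once at startup.
SHARED_ARRAYS_YAML = <<~YAML
  common: [alpha, beta, gamma]
  variant: [alpha, beta, delta]
YAML

# Parse once and freeze, so every request shares the same immutable arrays.
SHARED_ARRAYS = YAML.safe_load(SHARED_ARRAYS_YAML).transform_values(&:freeze).freeze
COMMON_ITEMS  = SHARED_ARRAYS.fetch("common")  # what 97% of users see
VARIANT_ITEMS = SHARED_ARRAYS.fetch("variant") # the slightly different 3% case

def items_for(special_user)
  special_user ? VARIANT_ITEMS : COMMON_ITEMS
end
```

Each server process then holds two arrays in total instead of one per visitor.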