How do I limit Redis memory usage with Sidekiq over time?

I am using Sidekiq with Heroku Redis on a Rails 5 app. Over time, my memory utilization has grown linearly, forcing me to upgrade my plan every month or so. My load has been relatively flat.
I have an eviction policy of noeviction, as recommended in the Sidekiq wiki. Does that mean that my memory footprint will continue to grow unabated over time?

You should have two Redis instances running: one for caching and another dedicated to Sidekiq.
Here's an entry from the Sidekiq wiki explaining this:
Many people use Redis as a cache (it works great as a Rails cache store) but it's important that Sidekiq be run against a Redis instance that is not configured as a cache but as a persistent store. I recommend using two separate Redis instances, each configured appropriately, if you wish to use Redis for caching and Sidekiq. Redis namespaces do not allow for this configuration and come with many other problems, so using discrete Redis instances is always preferred.
That way, the cache-store Redis can manage evictions and memory as needed, while the Sidekiq Redis can persist jobs without any risk of eviction. As you said, the Sidekiq Redis instance should have noeviction set.
Just a guess, but the likely reason for your linear memory growth is that other parts of your Rails app write to Redis, and under the noeviction policy that data is never cleared. It may even be Rails itself doing this under the hood, although I believe Rails sets a TTL everywhere; I'm not certain.
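If you do split into two instances, the wiring is straightforward. A minimal sketch (the environment variable names are assumptions; `redis_cache_store` is built into Rails 5.2+, earlier apps can use the redis-store gem instead):

```ruby
# config/initializers/sidekiq.rb — point Sidekiq at the persistent,
# noeviction instance
Sidekiq.configure_server do |config|
  config.redis = { url: ENV["SIDEKIQ_REDIS_URL"] }
end

Sidekiq.configure_client do |config|
  config.redis = { url: ENV["SIDEKIQ_REDIS_URL"] }
end

# config/environments/production.rb — the cache instance, free to evict
config.cache_store = :redis_cache_store, { url: ENV["CACHE_REDIS_URL"] }
```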

noeviction is an eviction policy setting in Redis. With it, Redis returns errors on writes once the configured memory limit is reached, instead of evicting keys.
To limit memory usage in Redis, set the maxmemory directive in the redis.conf file.
See the Redis documentation on maxmemory for more information.
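For example, in redis.conf (the 256mb value is only an illustration):

```conf
# Cap Redis memory; with noeviction, writes will error out once this is hit
maxmemory 256mb
maxmemory-policy noeviction
```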

Related

Memcached vs Redis as Rails.cache when requiring an LRU cache

I currently use Redis as a work queue for Sidekiq. I'm interested in also using it as a caching mechanism for Rails.cache.
The recommended Rails caching mechanism never expires items and relies on evicting the least recently used (LRU) item. Unfortunately, Redis by default isn't configured to evict the least recently used item while the recommended cache store, memcached, is.
Furthermore, evicting items isn't a behavior I would want for my work queue, and configuring the same Redis instance to do this could lead to undesirable results. Nor would I want my queue to share cycles with my cache anyways.
What would you all recommend in this situation? A second redis store to act as a cache and have LRU configured? Or just use the rails recommended memcached cache store and only use redis alone to be a queue?
I'm leaning towards using both Redis and Memcached, despite plenty of Stack Overflow answers recommending otherwise. memcached supporting LRU eviction by default is what wins me over.
Some articles:
redis and memcache or just redis
why is memcached still used alongside rails
Hidden deeper in the comments, posters mention memcached's LRU eviction as a great reason to use it as a cache.
Ended up using both redis and memcached. Pretty happy with the results.
The main difference is that Memcached can run across parallel cores/machines, while Redis is so lightweight and fast that it takes a good amount of load to reach its limit if it's running on a decent machine, where it only uses a couple of cores. Since using both works for you, that's great, but it does add a bit of unnecessary complexity: if you need contractors to work on it, for example, you'll need someone with experience in both technologies rather than just one.

Proper activerecord connection pool size with sidekiq and postgres for multiple sidekiq processes?

I'm running 7 sidekiq processes (concurrency set to 40) plus a passenger webserver, connecting to a postgres database. The Rails pool setting is set to 100, and the postgres max_connections setting is also the default 100.
I just added a new job class where each job makes multiple postgres requests, and I started getting this error on many sidekiq jobs and sometimes on my webserver: PG::ConnectionBad: FATAL: remaining connection slots are reserved for non-replication superuser connections
I tried increasing postgres max_connections to 200, and the error still occurs. Then I tried reducing the activerecord pool setting to 25 (25 connections for each process = 200 total connections), figuring I might start getting DB connection timeout errors but at least it would stop the "no remaining connection slots" errors.
But I'm still getting the remaining connection slots are reserved error.
The smarter way to deal with this issue might be to load the important postgres data that I keep reusing into redis, and then access it from redis - which obviously plays much more nicely and quickly with sidekiq. But even as I do that, I'd like to understand what's going on here with the postgres connections:
Am I likely leaking connections, and is that something I should be managing inside the sidekiq jobs? (see Releasing ActiveRecord connection before the end of a Sidekiq job)
Should I look into more obscure things like locking/contention issues or threading issues with the PG driver? (see https://github.com/mperham/sidekiq/issues/594. I think I'm using ActiveRecord pretty simply, without much obscure or abnormal logic for a rails app...)
Or maybe I'm just not understanding how the ActiveRecord pool setting and postgres max_connections settings work together...?
My situation may be too specific to help many others running into this error, but I'll share what I've found out in case it helps to point you in the right direction.
Am I likely leaking connections, and is that something I should be managing inside the sidekiq jobs?
No, not likely. Sidekiq's default middleware includes a hook to close connections even if a job fails. It took me a long time to understand what that means in practice, so tl;dr: Sidekiq won't leak connections if you're using it normally.
Should I look into more obscure things like locking/contention issues or threading issues with the PG driver?
Unless you're using a very obscure setup, it's probably something simpler.
Or maybe I'm just not understanding how the ActiveRecord pool setting and postgres max_connection settings work together...?
Anyone can feel free to correct me if I'm wrong, but here are the guidelines I'm going on for pool settings, max_connections, and sidekiq processes:
Minimum DB pool size = sidekiq concurrency setting
Maximum DB pool size* = postgres max_connections / total sidekiq processes (+ leave a few connections for web processes)
*note that ActiveRecord will only create a new connection when a new thread needs one, so if 95% of your threads don't use postgres at the same time, you should be able to get away with far fewer max_connections than if every thread is trying to check out a connection at the same time.
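A quick sanity check of those guidelines against the numbers from the question (the web-process reserve is an assumption):

```ruby
# Rough pool-sizing arithmetic for multiple Sidekiq processes
sidekiq_processes   = 7
sidekiq_concurrency = 40    # per the guideline, the minimum pool per process
max_connections     = 200   # postgres max_connections
web_reserve         = 10    # leave a few slots for web processes (assumed)

# Maximum safe pool size per Sidekiq process (integer division):
max_pool = (max_connections - web_reserve) / sidekiq_processes
puts max_pool  # 27
```

Note that the guideline minimum (the concurrency of 40) already exceeds that per-process maximum of 27, which is exactly the bind the question describes: either lower the concurrency, run fewer processes, or raise max_connections further.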
What fixed my problem:
On my Ubuntu machine, I had changed the vm.overcommit_memory setting to 1, as recommended by redis, so that it can fork its background write-to-disk process without breaking the machine.
This is the right way to go, but it leaves postgres vulnerable to being killed by the OOM (out of memory) Killer if memory usage gets too high. It turns out that postgres will stop allowing new connections if it receives a kill signal from the OOM Killer.
Once I restarted postgres, sidekiq was able to connect again. The longer-term solution is simply to work on memory leaks and make sure memory usage doesn't get too high. It's also possible to configure the OOM Killer to prioritize killing my sidekiq processes before postgres.
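For reference, the overcommit setting and an OOM Killer adjustment might look like this (the postgres pid lookup is illustrative; run as root and persist the sysctl in /etc/sysctl.conf):

```shell
# Apply the Redis-recommended overcommit setting
sysctl vm.overcommit_memory=1

# Exempt postgres from the OOM Killer (-1000 disables OOM killing for a pid);
# pgrep -o picks the oldest (parent) postgres process
echo -1000 > /proc/$(pgrep -o postgres)/oom_score_adj
```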

How to scale with Rails

I would like to prepare a rails application to be scalable. Some features of this app are connecting to some APIs and sending emails; plus, I'm using PostgreSQL and it's on Heroku.
Now that code is clean, I would like to use caches and any technique that will help the app to scale.
Should I use Redis or Memcached? It's a little obscure to me, and I've seen similar questions on StackOverflow, but here I would like to know which one I should use purely for scaling purposes.
Also, I was thinking of using Sidekiq to process some jobs. Is it going to conflict with Memcached/Redis? Also, in which cases should I use it?
Any other things I should think of in terms of scalability ?
Many thanks
Redis is a very good choice for caching; it has performance similar to memcached (redis is slightly faster) and it takes only a few minutes to configure it that way.
If possible, I would advise against using the same redis instance to store both the cache and the message store.
If you really need to do that, make sure you configure redis with the volatile-lru maxmemory policy and that you always set your cache entries with a TTL; this way, when redis runs out of memory, cache keys will be evicted.
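Concretely, for the cache-only instance that configuration might look like (the size is illustrative):

```conf
# redis.conf for the cache instance — evict only keys that carry a TTL,
# least recently used first
maxmemory 256mb
maxmemory-policy volatile-lru
```

Note that volatile-lru only ever evicts keys with a TTL, which is why every cache entry needs one (e.g. `Rails.cache.write(key, value, expires_in: 1.hour)`).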
Sidekiq requires Redis as its message store. So you will need to have a Redis instance or use a Redis service if you want to use Sidekiq. Sidekiq is great, btw.
You can use either Memcached or Redis as your cache store. For caching I'd probably use Memcached, as its cache cleaning behavior is better. In Rails in general, and in Rails 4 apps in particular, one rarely explicitly expires an item from the cache or sets an explicit expiration time. Instead one depends on updates to the cache_key, which means the cached item isn't actually deleted from the store. Memcached handles this pretty well, by evicting the least recently used items when it hits memory limits.
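To illustrate that key-based expiration, here's a minimal sketch (not Rails' actual implementation; the format merely mimics its timestamped cache keys):

```ruby
require "time"

# A record whose cache key embeds updated_at, mimicking ActiveRecord#cache_key
Record = Struct.new(:id, :updated_at) do
  def cache_key
    "records/#{id}-#{updated_at.strftime('%Y%m%d%H%M%S')}"
  end
end

r = Record.new(42, Time.parse("2015-06-01 12:00:00"))
old_key = r.cache_key            # "records/42-20150601120000"

r.updated_at = Time.parse("2015-06-02 09:30:00")
new_key = r.cache_key            # a different key after the update

# The entry stored under old_key is never deleted explicitly — it simply
# stops being read, and memcached's LRU eviction reclaims it under pressure.
```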
That all said, scaling is a lot more than picking a few components. You'll have to do a fair amount of planning, identify bottlenecks as the app develops, and scale CPU/memory/disk by increasing servers as needed.
You should also look at two cool features of Redis that are yet to come in the near future:
Redis Cluster:
http://redis.io/topics/cluster-spec
http://redis.io/presentation/Redis_Cluster.pdf
Redis Sentinel: (Highly Available)
http://redis.io/topics/sentinel
http://redis.io/topics/sentinel-spec
These features will be a great help in scaling Redis.
In memcached these features are missing; also, as far as active development is concerned, the Redis community is far more vibrant.

Dalli vs Redis-Store for Rails App

I have been using Dalli until now for caching, and today I came across Redis-Store.
I am wondering whether I should switch to redis-store. My app already uses redis for certain stuff, so I have a redis server which is quite big (in terms of resources), and I also have another memcached server. So if I were to switch to redis-store, it would mean I could remove the memcached server (one less server to maintain + less cost).
Has anyone done a comparison of these two solutions?
Performance?
Is it a drop-in replacement (can I switch between the two at any time without code changes)?
Any other stuff I should know about?
Redis can be used as a cache or as a permanent store, but if you try to mix both, you can end up having "interesting issues".
When you have memcached, you have a maximum amount of memory for the process, so when memcached gets full it will automatically remove the least recently used entries to make room for new entries.
You can configure Redis to have that behaviour too, but you don't want to do that if you are using Redis for persistent storage, because in that case you would potentially lose keys that are meant to be persistent.
So if you are using Redis for persistent storage, you need two different Redis processes: one for your persistent keys, one for caching. Of course you could always have only one process and set expiry times on every cache item, but nothing would guarantee that you don't hit the memory limit before they expire and lose data, so in practice you need two processes. Besides, if you are setting up a master/slave configuration for your persistent data and you store the cache on the same server, you are basically wasting RAM, so separate processes are the way to go.
About performance, both redis and memcached are VERY performant, and on different tests they are on the same range when it comes to get/extract data, but memcached is better when you only need a cache.
Why is this so? First of all, since memcached only has one mission, which is storing key/value pairs, it doesn't have any overhead when it comes to storing metadata. Redis, on the other hand, offers different data structures, so it stores more metadata with each key. One example of this: it's much "cheaper" to store data in a hash in Redis than to use individual keys. You don't get any of this in memcached, since there is only one type of data. This means that with the same amount of memory in your servers you can store more data in memcached than in redis. If you have a relatively small installation you don't really care, but the moment you start seeing growth, believe me, you will want to keep that data under control.
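An illustrative redis-cli session showing the hash-vs-keys difference (requires a running Redis; not meant to be run here):

```
# One hash per user: small hashes get a compact internal encoding
HSET user:42 name alice
HSET user:42 email alice@example.com

# vs. one string key per field, each paying its own per-key metadata cost
SET user:42:name alice
SET user:42:email alice@example.com
```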
So, as much as I like Redis, I prefer to have memcached for my caching needs and redis for my persistent storage/temporary storage/queue needs. I still use redis as a "cache" but not a temporary one with expiration, but as a lookup cache to save reading from a more expensive storage. For example, I keep a mapping between user IDs and nicknames on Redis. I never expire these mappings, so Redis is a perfect place for it.
In the case that you are dealing with a small amount of data, then it might make sense your idea of having a single technology for everything, but the moment you start growing over a few hundreds MB, I would say go with both of them.

Redis and Memcache or just Redis?

I'm using memcached for some caching in my Rails 3 app through the simple Rails.cache interface and now I'd like to do some background job processing with redis and resque.
I think they're different enough to warrant using both. On heroku though, there are separate fees to use both memcached and redis. Does it make sense to use both or should I migrate to just using redis?
I like using memcached for caching because least recently used keys automatically get pushed out of the cache and I don't need the cache data to persist. Redis is mostly new to me, but I understand that it's persistent by default and that keys do not expire out of the cache automatically.
EDIT: Just wanted to be more clear with my question. I know it's feasible to use only Redis instead of both. I guess I just want to know if there are any specific disadvantages in doing so? Considering both implementation and infrastructure, are there any reasons why I shouldn't just use Redis? (I.e., is memcached faster for simple caching?) I haven't found anything definitive either way.
Assuming that migrating from memcached to redis for the caching you already do is easy enough, I'd go with redis only to keep things simple.
In redis persistence is optional, so you can use it much like memcached if that is what you want. You may even find that making your cache persistent is useful to avoid lots of cache misses after a restart. Expiry is available also - the algorithm is a bit different from memcached, but not enough to matter for most purposes - see http://redis.io/commands/expire for details.
I'm the author of redis-store. There is no need to use Redis commands directly; just use the :expires_in option like this:
ActionController::Base.cache_store = :redis_store, :expires_in => 5.minutes
The advantage of using Redis is speed, and with my gem you already have stores for Rack::Cache, Rails.cache or I18n.
I've seen a few large rails sites that use both Memcached and Redis. Memcached is used for ephemeral things that are nice to keep hot in memory but can be lost/regenerated if needed, and Redis for persistent storage. Both are used to take a load off the main DB for reading/write heavy operations.
More details:
Memcached: used for page/fragment/response caching, and it's OK to hit the memory limit on Memcached because it will use LRU (least recently used) eviction to expire the old stuff and keep frequently accessed keys hot in memory. It's important that anything in Memcached could be recreated from the DB if needed (it's not your only copy). But you can keep dumping things into it, and Memcached will figure out which are used most frequently and keep those hot in memory. You don't have to worry about removing things from Memcached.
redis: you use this for data that you would not want to lose, and is small enough to fit in memory. This usually includes resque/sidekiq jobs, counters for rate limiting, split test results, or anything that you wouldn't want to lose/recreate. You don't want to exceed the memory limit here, so you have to be a little more careful about what you store and clean up later.
Redis starts to suffer performance problems once it exceeds its memory limit (correct me if I'm wrong). It's possible to solve this by configuring Redis to act like Memcached and expire things with LRU, so it never reaches its memory limit. But you would not want to do this with everything you keep in Redis, like resque jobs. So instead, people often keep the default Rails.cache set to use Memcached (using the dalli gem), and then keep a separate $redis = ... global variable for redis operations.
# in config/application.rb
config.cache_store = :dalli_store # memcached
# in config/initializers/redis.rb
$redis = Redis.new(url: ENV['REDIS_URL'])
There might be an easy way to do this all in Redis - perhaps by having two separate Redis instances, one with an LRU hard memory limit, similar to Memcache, and another for persistent storage? I haven't seen this used, but I'm guessing it would be doable.
I would consider checking out my answer on this subject:
Rails and caching, is it easy to switch between memcache and redis?
Essentially, through my experience, I would advocate keeping them separate: memcached for caching and redis for data structures and more persistent storage.
I asked the team at Redis Labs (who provide the Memcached Cloud and Redis Cloud add-ons) which product they would recommend for Rails caching. They said that in general they would recommend Redis Cloud, that Memcached Cloud is mainly offered for legacy purposes, and pointed out that their Memcached Cloud service is in fact built on top of Redis Cloud.
I don't know what you're using them for, but using both may actually give you a performance advantage: Memcached has far better performance running across multiple cores than Redis, so caching the most important data with Memcached and keeping the rest in Redis, taking advantage of its capabilities as a database, could increase performance.
