Sidekiq utilizing memory even if no workers are working - ruby-on-rails

I have scheduled some background tasks using Sidekiq with a worker concurrency of 22,
but I am seeing about 13.7% memory consumption even when none of the workers are busy. Is this normal, or do I have to change some configuration in Sidekiq to avoid this?
ubuntu 9331 21.8 13.7 1505656 1082988 ? Sl Mar06 557:08 sidekiq 2.7.5 jobs [0 of 22 busy]
Thanks

13.7% seems like a lot of memory, but I don't know how many GB you have (I'm pretty bad at reading top output).
However, even if all your workers are idle, Sidekiq is still running, ready to process new jobs, and that consumes memory.
So depending on how much memory you have, it may be perfectly normal.
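For a rough sense of scale, the ps line above can be turned into absolute numbers (a back-of-the-envelope reading, assuming the RSS column is in KB, as ps reports it):

RSS ≈ 1082988 KB ≈ 1.0 GB resident for this one Sidekiq process
total RAM ≈ 1082988 KB / 0.137 ≈ 7.9 million KB ≈ 7.5-8 GB

So the process is holding roughly 1 GB on a machine with roughly 8 GB of RAM. That is plausible for a loaded Rails app with concurrency 22, since each of the 22 threads keeps its own job state and typically its own database connection; if the footprint matters more than throughput, lowering the -c value (or :concurrency: in sidekiq.yml) is the usual lever.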

Related

Sidekiq is not picking up jobs from the enqueued set

I am running a local Rails application on a VM with 64 cores and 192 GB RAM. We are running 12 Sidekiq processes with 10 threads each and 40 Puma processes.
App memory and CPU allocation is handled by Docker.
The problem we are facing: when we enqueue a large number of jobs, say 1000, Sidekiq picks up 10*12 = 120 jobs and starts processing them.
Mid-execution, the number of running jobs suddenly drops to 0, and Sidekiq stops picking up new jobs.
All the threads in each process show a busy status, but none of them are processing any jobs.
This problem goes away when we reduce the number of Sidekiq processes to 2. All the jobs process normally and Sidekiq even picks up new jobs from the enqueued set.
What is the correct way to debug this issue, or how should I proceed further?
I tried reducing the number of threads while keeping the number of processes the same, but that resulted in the same issue.
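One low-effort first step when every thread reports busy but nothing moves is Sidekiq's TTIN signal, which makes each process dump a backtrace for every one of its threads into its log (the same mechanism behind the TTIN log mentioned in a later question on this page). A minimal sketch, to be run on the host where the processes live; the pgrep pattern is only an assumption about what your proclines look like:

# send TTIN to every Sidekiq process on this host, then read the per-thread
# backtraces each process writes to its own log file
`pgrep -f sidekiq`.split.each do |pid|
  Process.kill("TTIN", pid.to_i)
end

If all 120 backtraces are parked in the same place (for example, waiting on a Redis or database connection), that points at a shared resource being exhausted rather than at Sidekiq itself, which would also explain why dropping to 2 processes makes the problem disappear.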

Multiple workers on machine - Memory management (Resque - Rails)

We've migrated our Resque background workers from a ton of individual instances on Heroku to between four and ten m4.2xl (32 GB memory) instances on EC2, which is much cheaper and faster.
A few things I've noticed: the Heroku instances we were using had 1 GB of RAM and rarely ran out of memory. I am currently allocating 24 workers to one machine, so about 1.3 GB of memory per worker. However, because the processing power on these machines is so much greater, I think the OS has trouble reclaiming the memory fast enough, and each worker ends up consuming more on average. Most of the day the system has 17-20 GB of memory free, but when the memory-intensive jobs run, all 24 workers grab a job almost at the same time and then start growing. They get through a few jobs, but then the system hasn't had time to reclaim memory and crashes if there is no intervention.
I've written a daemon to pause the workers before a crash and wait for the OS to free memory. I could reduce the number of workers per machine overall, or have half of them unsubscribe from the problematic queues; I just feel there must be a better way to manage this. I would prefer to be using more than 20% of the memory 99% of the day.
The workers are set up to fork a process when they pick up a job from the queue. The master worker processes are run as services managed with Upstart. I'm aware there are a number of process managers, such as God and Monit, which simply restart a process when it consumes a certain amount of memory. That seems like a heavy-handed solution which will end up killing too many jobs under normal circumstances.
Is there a better strategy I can use to get higher utilization with a lower risk of running into Errno::ENOMEM?
System specs:
OS : Ubuntu 12.04
Instance : m4.2xlarge
Memory : 32 GB
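For reference, the "pause the workers before a crash" daemon can be sketched in a few lines of Ruby, because Resque workers already understand USR2 (finish the current job, take no new ones) and CONT (resume). Everything below is illustrative rather than the author's actual daemon: the thresholds, the pgrep pattern for the worker proclines, and the use of MemAvailable (which needs a newer kernel than stock Ubuntu 12.04; MemFree plus Cached is a rough substitute there):

# hypothetical watchdog: pause Resque workers when available memory runs low,
# resume them once the OS has had time to reclaim memory
PAUSE_BELOW_MB  = 2_048
RESUME_ABOVE_MB = 6_144

def available_mb
  File.read("/proc/meminfo")[/MemAvailable:\s+(\d+) kB/, 1].to_i / 1024
end

def resque_worker_pids
  # assumes the default Resque procline, e.g. "resque-1.25.2: Waiting for images";
  # you may want to target only the parent workers, depending on your setup
  `pgrep -f '^resque-'`.split.map(&:to_i)
end

loop do
  mb = available_mb
  if mb < PAUSE_BELOW_MB
    resque_worker_pids.each { |pid| Process.kill("USR2", pid) }  # stop taking new jobs
  elsif mb > RESUME_ABOVE_MB
    resque_worker_pids.each { |pid| Process.kill("CONT", pid) }  # resume
  end
  sleep 15
end

Note that pausing only stops new work: the 24 forks already in flight keep their memory until they finish, so the pause threshold needs enough headroom to absorb them.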

How can I figure out which Sidekiq jobs are taking up so much memory?

I am running 4 Sidekiq processes on an r3.large AWS VM, with a total of 100 Sidekiq jobs. An r3.large has 15.25 GB of RAM and 2 vCPUs.
Jobs consistently fail due to extremely heavy swapping. The worst I've ever seen. I have just decided to shut down the production background environment while I bring up a bigger VM.
I don't want to keep doing this though. I need to do the hard work to find out where I can optimize my jobs. I am absolutely certain that there are only 1 or 2 kinds of jobs that are causing me these headaches. I just don't know how to find them.
How can I see which processes are causing the biggest problems with memory among my Sidekiq jobs?
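One way (a sketch, not something from the question) is to let Sidekiq report this itself: a small server middleware can record how much the process RSS grows while each job runs, so the one or two offending job classes stand out in the logs. The class name is made up and the /proc read is Linux-only:

# log the change in resident set size across every job this process runs
class RssLoggingMiddleware
  def call(worker, job, queue)
    before = rss_kb
    yield
  ensure
    Sidekiq.logger.info("rss_delta=#{rss_kb - before}KB class=#{job['class']} queue=#{queue}")
  end

  private

  def rss_kb
    # resident set size of the current process, in KB (Linux only)
    File.read("/proc/#{Process.pid}/status")[/VmRSS:\s+(\d+) kB/, 1].to_i
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add RssLoggingMiddleware
  end
end

Because threads interleave and Ruby rarely hands freed memory back to the OS, treat the deltas as a heuristic: job classes that repeatedly show large positive deltas are the ones worth profiling in isolation (for example with the memory_profiler gem in a one-off run).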

Sidekiq workers suddenly slowed down (almost like stuck workers)

I've been using Sidekiq for a while now and it was working flawlessly (up to 5 million jobs processed). However, in the past few days the workers have been getting stuck, leaving jobs unprocessed. Only by restarting the workers do they start consuming jobs again, but they eventually get stuck again (after roughly 10-30 minutes; I haven't done any exact measurements).
Here's my setup:
Rails v4.2.5.1, with ActiveJob.
MySQL DB, clustered (with 3 masters)
ActiveRecord::Base.connection_pool set to 32 (verified in Sidekiq process as well).
2 Sidekiq workers, 3 threads per worker (6 total).
Symptoms:
If the workers have just been restarted, they process jobs fast (~1s).
After several jobs have been processed, the time needed to complete a job (the same job that previously took only ~1s) suddenly spikes to ~2900s, which makes the worker look stuck.
The slowdown affects every kind of job (there is no specific offending job).
CPU usage and memory consumption are normal, and there is no swapping either.
Here is the TTIN log. It seems the process hangs in:
retrieve_connection
clear_active_connections
But I'm not sure why this is happening. Does anyone have a similar experience or know something about this issue? Thanks in advance.
EDIT:
Here's the relevant mysql show processlist log.
I'm running Sidekiq on 2 different machines:
20.7.4.5
20.7.4.6
Focusing on 20.7.4.5, there are 10 connections and all of them are currently sleeping. If I understand correctly:
1 is the Passenger connection
3 are the currently "busy" (stuck) Sidekiq workers.
6 are unclosed connections.
There's no long-running query here, since all the connections are currently sleeping (idle, waiting to be terminated after the default timeout of 8 hours); is this correct?
EDIT 2:
So it turns out the issue has something to do with our DB configuration. We are using this schema:
Sidekiq workers => Load balancer => DB clusters.
With this setup, Sidekiq workers start hanging after a while (completing jobs MUCH more slowly, up to 3000s, when they usually take only 1s).
However, if we set up the workers to talk directly to the DB cluster, it works flawlessly. So something is probably wrong with our setup, and this is not a Sidekiq issue.
Thanks for all the help guys.
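For anyone who hits the same symptoms, one quick way to tell connection-pool exhaustion apart from a problem further down the line (such as the load balancer issue found here) is to report pool pressure from inside the Sidekiq process itself, since ActiveRecord's retrieve_connection blocks when no connection is free. A rough sketch using ActiveJob, as in this setup; the class name is hypothetical:

# throwaway job: enqueue it when the slowdown starts and compare what it logs
# against the configured pool size (32) and total worker threads (6)
class PoolStatsJob < ActiveJob::Base
  queue_as :default

  def perform
    pool   = ActiveRecord::Base.connection_pool
    in_use = pool.connections.count(&:in_use?)
    Rails.logger.info("pool size=#{pool.size} open=#{pool.connections.size} in_use=#{in_use}")
  end
end

If in_use sits pinned at the pool size while jobs crawl, connections are being checked out and never returned; if the pool looks healthy, the stall is more likely in the path to the database, which is what EDIT 2 ended up confirming.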

Memory usage in a Rails app, how to monitor and troubleshoot?

I have a Rails app that, among other things, has several background jobs which are computationally expensive (image manipulation :O).
I am using Sidekiq to manage those jobs. I have currently set a concurrency of 5 threads per Sidekiq process, and here is what I do to see the memory usage:
ps faux | grep sidekiq
Results are this:
hommerzi 3874 3.5 5.7 287484 231756 pts/5 Sl+ 17:17 0:10 | \_ sidekiq 2.17.0 imageparzer [3 of 3 busy]
However, I have a feeling that there must be a way to monitor this correctly from within the Rails app, or am I wrong?
My question would be: How can I monitor my memory usage in this scenario?
My advice would be to use Monit (or God) to manage your processes. This goes for the database, server, and application, not just background jobs.
Here's an example: Monit Ruby on Rails Sidekiq
Monitor your application for a while and set realistic memory limits. Then, if one of your processes dies or stays above that limit for a given number of cycles (usually 2-minute checks), Monit will (re)start it.
You can also set up an alert email address and a web frontend (with basic HTTP auth). This will prove essential for running stable applications in production. For example, recently I had a Sidekiq process get carried away with itself and chew up 250 MB of memory. Monit restarted the process (which is now hovering around 150 MB) and sent me an alert. Now I can check the logs/system to see why that might have happened. This all happened while I was asleep. Much better than the alternative: waking up and finding your server on its knees with a runaway process or downed service.
https://www.digitalocean.com/community/articles/how-to-install-and-configure-monit
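To cover the "from within the Rails app" part of the question as well: the process can also report its own resident size, which pairs well with the external Monit check above. A minimal sketch for an initializer, assuming a Sidekiq version new enough to have lifecycle hooks (config.on(:startup)); the 60-second interval is arbitrary:

# once the Sidekiq server boots, log its resident set size every minute so
# memory growth shows up in the application's own logs
Sidekiq.configure_server do |config|
  config.on(:startup) do
    Thread.new do
      loop do
        rss_mb = `ps -o rss= -p #{Process.pid}`.to_i / 1024
        Sidekiq.logger.info("sidekiq rss=#{rss_mb}MB pid=#{Process.pid}")
        sleep 60
      end
    end
  end
end

The same ps one-liner can also be dropped into an individual worker to see how much a specific job inflates the process.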
