Multiple workers on machine - Memory management (Resque - Rails)

We've migrated our Resque background workers from a ton of individual instances on Heroku to between four and ten m4.2xl (32GB Mem) instances on EC2, which is much cheaper and faster.
A few things I've noticed: the Heroku instances we were using had 1GB of RAM and rarely ran out of memory. I am currently allocating 24 workers to one machine, so about 1.3GB of memory per worker. However, because the processing power on these machines is so much greater, I think the OS has trouble reclaiming memory fast enough and each worker ends up consuming more on average. Most of the day the system has 17-20GB of memory free, but when the memory-intensive jobs run, all 24 workers grab a job at almost the same time and then start growing. They get through a few jobs, but then the system hasn't had time to reclaim memory and crashes if there is no intervention.
I've written a daemon to pause the workers before a crash and wait for the OS to free memory. I could reduce the number of workers per machine overall or have half of them unsubscribe from the problematic queues, but I feel there must be a better way to manage this. I would prefer to be using more than 20% of memory for 99% of the day.
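For illustration, the pause-and-wait logic of such a daemon could be sketched as below. This is not the asker's actual daemon: the helper names (`parse_meminfo`, `reclaimable_kb`, `signal_for`) and the thresholds are hypothetical. It assumes a Linux host exposing /proc/meminfo and relies on Resque's documented USR2 (stop picking up new jobs) and CONT (resume) signals. MemFree, Buffers and Cached are summed because older kernels may not report MemAvailable.

```ruby
# Parse the text of /proc/meminfo into a hash of kB values.
def parse_meminfo(text)
  text.each_line.with_object({}) do |line, info|
    key, value = line.split(":", 2)
    info[key] = value.to_i if value # " 123 kB".to_i => 123
  end
end

# Rough estimate of memory the OS could hand out without swapping.
def reclaimable_kb(info)
  info.fetch("MemFree", 0) + info.fetch("Buffers", 0) + info.fetch("Cached", 0)
end

# Decide which signal (if any) to send to the Resque parent workers.
# Two thresholds give hysteresis so workers do not flap between states.
def signal_for(free_kb, paused:, low_kb:, high_kb:)
  return "USR2" if !paused && free_kb < low_kb  # pause: take no new jobs
  return "CONT" if paused && free_kb > high_kb  # resume after recovery
  nil
end

# Hypothetical usage inside a polling loop (worker_pids is an assumption):
#   info = parse_meminfo(File.read("/proc/meminfo"))
#   sig  = signal_for(reclaimable_kb(info), paused: paused,
#                     low_kb: 2_000_000, high_kb: 6_000_000)
#   worker_pids.each { |pid| Process.kill(sig, pid) } if sig
```

Using separate low and high thresholds means a paused worker only resumes once memory has clearly recovered, which avoids all workers restarting at the same moment the system barely clears the limit.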
The workers are set up to fork a process when they pick up a job from the queue. The master worker processes are run as services managed with Upstart. I'm aware there are a number of managers, such as God and Monit, which simply restart the process when it consumes a certain amount of memory. That seems like a heavy-handed solution which will end with too many jobs killed under normal circumstances.
Is there a better strategy I can use to get higher utilization with a lowered risk of running into Errno::ENOMEM?
System specs:
OS : Ubuntu 12.04
Instance : m4.2xlarge
Memory : 32 GB

Related

Dask: Would storage network speed cause a worker to die

I am running a process that writes large files across the storage network. I can run the process using a simple loop and I get no failures. I can run it using distributed and jobqueue during off-peak hours and no workers fail. However, when I run the same command during peak hours, I get workers killing themselves.
I have ample memory for the task and plenty of workers, so I am not sitting in a queue.
The error logs usually have a bunch of "over garbage collection limits" messages followed by a worker killed with Signal 9.
Signal 9 suggests that the process has violated some system limit, not that Dask has decided for the worker to die. Since this only happens on high disk IO at busy times, indeed I agree that the network storage is the likely culprit, e.g., a lot of writes have been buffered, but are not being cleared through the relatively low bandwidth.
Dask also uses local storage for temporary files, and "local" might be the network storage. If you have real local disks on the nodes, you should use that, or if not, maybe turn off disk-spilling altogether. https://docs.dask.org/en/latest/setup/hpc.html#local-storage

How do I allocate same amount of resource for all my tasks deployed on a Celery cluster?

To compare and contrast the performances of three different algorithms in a scientific experiment, I am planning to use Celery scheduler. These algorithms are implemented by three different tools. They may or may not have parallelism implemented which I don't want to make any prior assumption about. The dataset contains 10K data points. All three tools are supposed to run on all the data points; which translates to 30K tasks scheduled by the scheduler. All I want is to allocate the same amount of resources to all the tools, across all the executions.
Assume, my physical Ubuntu 18.04 server is equipped with 24 cores and 96 GB of RAM. Tasks are scheduled by 4 Celery workers, each handling a single task. I want to put an upper limit of 4 CPU cores and 16 GB of memory per task. Moreover, no two tasks should race for the same cores, i.e., 4 tasks should be using 16 cores in total, each scheduled on its own set of cores.
Is there any means to accomplish this setup, either through Celery, or cgroup, or by any other mechanism? I want to refrain from using docker, kubernetes, or any VM based approach, unless it is absolutely required.
Dealing with CPU cores should be fairly easy by setting concurrency to 6. Limiting memory usage is the hard part of the requirement, and I believe you can accomplish that by making the worker processes belong to a particular cgroup on which you have set a memory limit.
An alternative would be to run Celery workers in containers with specified limits.
I prefer not to do this, as there may be tasks (or tasks with particular arguments) that allocate only a tiny amount of RAM, so it would be wasteful if you can't use 4G of RAM while such a task runs.
It is a pity that Celery autoscaling is deprecated (it is one of the coolest features of Celery, IMHO). It should not be difficult to implement a Celery autoscaler that scales up or down depending on memory utilization.

Ruby requests more memory when there are plenty of free heap slots

We have a server running
Sidekiq 4.2.9
rails 4.2.8
MRI 2.1.9
This server periodically imports data from external APIs, performs some calculations on it and saves the resulting values to the database.
About 3 weeks ago the server started hanging. As I can see from NewRelic (and when ssh'ed into it), it consumes more and more memory over time, eventually occupying all available RAM, and then the server hangs.
I've read some articles about how the Ruby GC works, but I still can't understand why at ~5:30 AM the heap size jumps from ~2.3M to 3M when there are still 1M free heap slots available (GC settings are default).
similar behavior, 3:35PM:
So, the questions are:
How can I make Ruby fill free heap slots instead of requesting new slots from the OS?
How can I make it release free heap slots to the system?
"How to make Ruby fill free heap slots instead of requesting new slots from the OS?"
Your graph does not have "full" fidelity. It is a lot to assume that GC.stat was called by Newrelic or whatnot just at the exact right time.
It is incredibly likely that you ran out of slots, heap grew and since heaps don't shrink in Ruby you are stuck with a somewhat bloated heap.
To alleviate some of the pain you can limit RUBY_GC_HEAP_GROWTH_MAX_SLOTS to a sane number; something like 100,000 will do. I am trying to lobby for setting a default here in core.
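To make the effect of that cap concrete, here is a deliberately simplified model of one heap growth step. It is not MRI's real allocator, which works in pages and also weighs free-slot ratios. It only shows the capping rule: without a limit the heap grows by the growth factor (1.8 is MRI's default RUBY_GC_HEAP_GROWTH_FACTOR), and with RUBY_GC_HEAP_GROWTH_MAX_SLOTS set, a single step cannot add more than that many slots. The method name `next_heap_slots` is hypothetical.

```ruby
# Simplified model of a single heap growth step (an illustration only).
GROWTH_FACTOR = 1.8 # MRI default for RUBY_GC_HEAP_GROWTH_FACTOR

# max_growth_slots mirrors RUBY_GC_HEAP_GROWTH_MAX_SLOTS; 0 means "no cap".
def next_heap_slots(current_slots, max_growth_slots: 0)
  grown = (current_slots * GROWTH_FACTOR).round
  return grown if max_growth_slots.zero?

  # With a cap, one step adds at most max_growth_slots slots.
  [grown, current_slots + max_growth_slots].min
end

uncapped = next_heap_slots(2_300_000)
capped   = next_heap_slots(2_300_000, max_growth_slots: 100_000)
puts "uncapped step: #{uncapped} slots, capped step: #{capped} slots"
```

The variable has to be present in the environment before the Ruby process boots (for example in the service definition that starts the worker), since the GC reads it at startup.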
Also:
Create a persistent log of the jobs that run and when they ran (duration and so on), and gather GC.stat before and after each job runs.
Split up your jobs by queue: run one queue on one server and another queue on a different one, and see which queue and which job is responsible for the problem.
Profile your various jobs using flamegraph or other profiling tools.
Reduce the number of concurrent jobs you run as an experiment, or place a mutex between certain job types. It is possible that one "job a" at a time is OKish, while 20 concurrent "job a"s will bloat memory.
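The first suggestion in the list (logging GC.stat before and after each job) could be sketched roughly like this. The helper `measure_gc` is hypothetical and not part of Sidekiq or Ruby. It diffs every numeric counter in GC.stat instead of naming specific keys, because the key names changed between Ruby versions (for example between 2.1 and 2.2).

```ruby
require "json"

# Hypothetical helper: runs a block and reports how the GC counters moved
# while it ran. Returns [block_result, log_entry].
def measure_gc(job_name)
  before  = GC.stat
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result  = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  after   = GC.stat

  # Diff every numeric counter so the code does not depend on key names.
  delta = after.each_with_object({}) do |(key, value), acc|
    old = before[key]
    acc[key] = value - old if value.is_a?(Numeric) && old.is_a?(Numeric)
  end

  [result, { job: job_name, duration_s: elapsed.round(3), gc_delta: delta }]
end

# Example: wrap the body of a job and print one JSON line per run.
result, entry = measure_gc("ImportJob") { Array.new(10_000) { |i| "row #{i}" }.size }
puts JSON.generate(entry)
```

Appending one such line per job to a file gives the persistent log described above; a job whose runs repeatedly show large growth in the heap-related counters is a good candidate for profiling.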

How can I figure out which Sidekiq jobs are taking up so much memory?

I am running 4 Sidekiq processes on an R3 Large AWS VM, with a total of 100 Sidekiq jobs. An R3 Large has 15.25 GB of RAM assigned to it, and 2 CPUs.
Jobs consistently fail due to extremely heavy swapping. The worst I've ever seen. I have just decided to shut down the production background environment while I bring up a bigger VM.
I don't want to keep doing this though. I need to do the hard work to find out where I can optimize my jobs. I am absolutely certain that there are only 1 or 2 kinds of jobs that are causing me these headaches. I just don't know how to find them.
How can I see which processes are causing the biggest problems with memory among my Sidekiq jobs?

speed up php-cli

Why is a PHP CLI process using 25% of the CPU, and is there a way to reduce this? Right now I'm running 3 instances, but obviously I would like to run many more to finish the job faster.
Background info: I'm moving data from a Transbase database to a MySQL database.
EDIT: If I run this in a browser there isn't such a noticeable load on the CPU.
More processes don't necessarily mean faster processing. The PHP process takes as much CPU as it can to finish the task as quickly as possible. It's probably 25% because you have a quad-core processor and it's a single-threaded task.
Ideally, you would need 4 processes if you could assign each of them to a different core. Also, because of waiting for database or disk I/O, a single thread cannot fully use all CPU power all the time, so go ahead and run more processes. It's not that a 5th process will crash because all CPU power is used up; it will just take its share, while the OS divides processing power among all running processes.
Just don't start too many; every process has a little overhead, and you won't benefit from having 200 simultaneous processes.
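The 25% figure follows from simple arithmetic, shown here as a small sketch (in Ruby, to match the other examples on this page, although the question concerns PHP). The helper name `max_cpu_share_percent` is hypothetical.

```ruby
require "etc"

# A single-threaded process can keep at most one core busy, so its share
# of total CPU on an N-core machine tops out at 100 / N percent.
def max_cpu_share_percent(cores)
  100.0 / cores
end

puts max_cpu_share_percent(4) # quad-core, as assumed in the answer above

# Etc.nprocessors reports the number of cores on the current machine,
# which is a reasonable starting point for the number of processes.
puts "cores on this machine: #{Etc.nprocessors}"
```

Starting from one process per core and adding a few more to cover time spent waiting on the database or disk is a sensible way to experiment.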
