gearmand version 1.1.18 (the latest) becomes very slow when free RAM runs out. The server has 64 GB of RAM, and once gearmand has used up all free memory and spilled even 1 MB into swap, it starts working very, very slowly. Adding a task to a background queue can take up to 10 seconds.
Removing a task from the queue can take up to 30 seconds.
The queue holds more than 10,000,000 jobs.
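For reference, queue depth can be watched without touching the job pathway by querying gearmand's plain-text admin protocol (the status command, by default on port 4730). A minimal Ruby sketch; the host and port are assumptions for your setup:

require 'socket'

# Each status line is: function<TAB>total<TAB>running<TAB>available_workers,
# and the listing ends with a lone "." line.
sock = TCPSocket.new('localhost', 4730)  # assumed host/port
sock.write("status\n")
while (line = sock.gets) && line.strip != '.'
  function, total, running, workers = line.strip.split("\t")
  puts "#{function}: #{total} queued, #{running} running, #{workers} workers"
end
sock.close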
Related
I am running a local Rails application on a VM with 64 cores and 192 GB of RAM. We run 12 Sidekiq processes with 10 threads each, plus 40 Puma processes.
Memory and CPU allocation for the app is handled by Docker.
The problem we are facing: when we enqueue a large number of jobs, say 1000, Sidekiq picks up 120 (10 × 12) jobs and starts processing them.
Mid-execution, the number of running jobs suddenly drops to 0, and Sidekiq stops picking up new jobs.
All the threads in each process report a busy status, but none of them is processing any job.
The problem goes away when we reduce the number of Sidekiq processes to 2: all jobs process normally, and Sidekiq keeps picking up new jobs from the enqueued set.
What is the correct way to debug this issue, and how should I proceed?
I tried reducing the number of threads while keeping the number of processes the same, but that resulted in the same issue.
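One way to see what those busy threads are actually doing (a sketch against the Sidekiq 6-era API; details vary by version): send each local Sidekiq process a TTIN signal, which makes it dump every thread's backtrace to its log, then list in-flight work through Sidekiq's API:

require 'socket'
require 'sidekiq/api'

# Ask every Sidekiq process on this host to log its thread backtraces.
Sidekiq::ProcessSet.new.each do |process|
  Process.kill('TTIN', process['pid']) if process['hostname'] == Socket.gethostname
end

# List in-flight jobs as each process last reported them.
Sidekiq::Workers.new.each do |process_id, thread_id, work|
  puts "#{process_id} #{thread_id}: #{work['queue']} #{work['payload']}"
end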
Is it normal for Sidekiq to eat 25% of my RAM even when there are no jobs running (0/10 busy)?
I'm using jemalloc, as suggested online: consumption seems to have decreased a bit, but not by much.
RAM usage is a function of your app code and the gems you load. Use a tool like derailed_benchmarks to profile RAM usage in your app. Lowering the concurrency from 10 to 5 might help a little.
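If you go the lower-concurrency route, it's one line in the Sidekiq config file (a sketch; 5 is just the value suggested above):

# config/sidekiq.yml
:concurrency: 5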
We've migrated our Resque background workers from a ton of individual instances on Heroku to between four and ten m4.2xlarge (32 GB RAM) instances on EC2, which is much cheaper and faster.
A few things I've noticed: the Heroku instances we were using had 1 GB of RAM and rarely ran out of memory. I am currently allocating 24 workers to one machine, so about 1.3 GB of memory per worker. However, because the processing power on these machines is so much greater, I think the OS has trouble reclaiming the memory fast enough, and each worker ends up consuming more on average. Most of the day the system has 17-20 GB of memory free, but when the memory-intensive jobs run, all 24 workers grab a job almost at the same time and then start growing. They get through a few jobs, but then the system hasn't had time to reclaim memory and crashes if there is no intervention.
I've written a daemon to pause the workers before a crash and wait for the OS to free memory. I could reduce the number of workers per machine overall, or have half of them unsubscribe from the problematic queues, but I feel there must be a better way to manage this. I would prefer to be using more than 20% of the memory for 99% of the day.
The workers are set up to fork a process when they pick up a job from the queue. The master worker processes run as services managed with Upstart. I'm aware there are a number of managers, such as God and Monit, that simply restart a process when it consumes a certain amount of memory. That seems like a heavy-handed solution that will end with too many jobs killed under normal circumstances.
Is there a better strategy I can use to get higher utilization with a lower risk of running into Errno::ENOMEM?
System specs:
OS: Ubuntu 12.04
Instance: m4.2xlarge
Memory: 32 GB
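For what it's worth, instead of an external daemon the pause could live in the workers themselves: Resque exposes a before_fork hook, so each worker can block until the OS reports enough available memory before taking the next job. A minimal sketch, assuming Linux's /proc/meminfo; MIN_FREE_KB is an assumed threshold you would tune:

require 'resque'

MIN_FREE_KB = 2 * 1024 * 1024  # assumed threshold: ~2 GB free before forking

def available_memory_kb
  # MemAvailable needs a newer kernel; fall back to MemFree on Ubuntu 12.04.
  meminfo = File.read('/proc/meminfo')
  (meminfo[/MemAvailable:\s+(\d+) kB/, 1] || meminfo[/MemFree:\s+(\d+) kB/, 1]).to_i
end

Resque.before_fork = proc do |_job|
  sleep 5 while available_memory_kb < MIN_FREE_KB
end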
On our sites on an IIS 7.5 web server there is an increase in active requests on a 7-minute cycle: for 5 minutes everything is quite normal, then active requests climb for 2 minutes.
While active requests increase, ntoskrnl.exe also shows an increase in CPU load.
Can anybody give me any clues about what I should look for?
One thing we did notice was that the garbage collector was going nuts every 5 minutes.
After a server restart everything is fine again.
I was attempting to evaluate various Rails server solutions. First on my list was an nginx + passenger system. I spun up an EC2 instance with 8 gigs of RAM and 2 processors, installed nginx and passenger, and added this to the nginx.conf file:
passenger_max_pool_size 30;            # allow up to 30 application processes
passenger_pool_idle_time 0;            # never shut down idle processes
rails_framework_spawner_idle_time 0;   # keep the framework spawner alive
rails_app_spawner_idle_time 0;         # keep the app spawner alive
rails_spawn_method smart;              # preload framework/app code and fork
I added a little "awesome" controller to Rails that would just render :text => (2+2).to_s
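Spelled out, the whole thing was essentially this (a minimal reconstruction; the file and class names are my guesses):

# app/controllers/awesome_controller.rb
class AwesomeController < ApplicationController
  def index
    render :text => (2 + 2).to_s
  end
end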
Then I spun up a little box and ran this to test it:
ab -n 5000 -c 5 'http://server/awesome'
And the CPU while this was running on the box looked pretty much like this:
05:29:12 PM CPU %user %nice %system %iowait %steal %idle
05:29:36 PM all 62.39 0.00 10.79 0.04 21.28 5.50
And I'm noticing that it takes only 7-10 simultaneous requests to bring the CPU to <1% idle, and of course this begins to seriously drag down response times.
So I'm wondering, is a lot of CPU load just the cost of doing business with Rails? Can it only serve a half dozen or so super-cheap requests simultaneously, even with a giant pile of RAM and a couple of cores? Are there any great perf suggestions to get me serving 15-30 simultaneous requests?
Update: I tried the same thing on one of the "super mega lots and lots of CPUs" EC2 instances. Holy crap was that a lot of CPU power. The sweet spot seemed to be about 2 simultaneous requests per CPU; I was able to get it up to about 630 requests/second at 16 simultaneous requests. I don't know if that's actually cost-efficient compared to a lot of little boxes, though.
I must say that my Rails app got a massive boost, from supporting about 20 concurrent users initially to about 80, after adding some memcached servers (4 mediums at EC2). I run a high-traffic sports site that really took off a few months ago. The database is about 6 GB with heavy updates/inserts.
The MySQL (RDS large, high-usage) cache also helped a bit.
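For anyone wanting to reproduce the memcached side, it's a one-line cache-store setting in Rails (the hostnames here are placeholders):

# config/environments/production.rb
config.cache_store = :mem_cache_store, 'cache1.internal:11211', 'cache2.internal:11211'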
I've tried playing with the Passenger settings but got some curious results: for example, each thread eats up 250 MB of RAM, which is odd considering the application isn't that big.
You can also save massive $ by using spot instances, but don't rely on them entirely: their pricing seems to spike on occasion. I'd AutoScale with two policies, one with spot instances and another with on-demand (read: reserved) instances.