Nginx + Unicorn : difference between both worker_processes - ruby-on-rails

I have a Rails app running on an Nginx + Unicorn environment.
I see that both have a "worker_processes" setting in their config file, and I wonder what the optimal configuration is.
If I have 4 cores, should I put 4 for both? Or 1 for nginx and 4 for unicorn?
(By the way, I am using Sidekiq too, so what about Sidekiq concurrency?)

Nginx
Nginx is an event-based server. This means that one operating system (OS) process can manage a very large number of connections. It is possible because the usual state of a connection is waiting: while a connection is waiting for the other side, or sending/receiving a packet of data, nginx can work on other connections. One nginx worker can handle thousands or even tens of thousands of connections, so even worker_processes 1 can be enough.
More nginx workers allow more CPU cores to be used (which can be important if nginx is the main CPU eater). More workers are also good if nginx does a lot of disk IO.
Summary: you can safely start from 1 worker and increase up to the number of CPU cores.
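As a sketch of that starting point (the values are illustrative, not tuned for any particular machine):

```nginx
# Start from one worker; "auto" instead would match the CPU core count.
worker_processes 1;

events {
    # One event-based worker can juggle many simultaneous connections.
    worker_connections 1024;
}
```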
Unicorn
A Unicorn worker is a little different from an nginx worker, because one worker = one request. The number of unicorn workers determines how many Ruby processes execute at the same time, and this number depends on your application.
For example, suppose your application is CPU bound (doing only some math). In this case, setting the number of workers higher than the number of CPU cores can cause problems.
But a typical application works with a database and sleeps while waiting for its answer. If the database is on another server (so query processing does not eat our CPU), Ruby sleeps and the CPU idles. In this case we can increase the number of workers to CPU*3 ... CPU*5, or even CPU*20.
Summary: the best way to find this number is to load test your real application. Set the number of unicorn workers, start a load test with the same level of concurrency, and if the server feels good, increase the number of workers and test again.
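The sizing heuristic above can be sketched in a few lines of Ruby. This is only a rough starting guess; `IO_BOUND` is a hypothetical environment flag for illustration, and the real number must come from load testing:

```ruby
require 'etc'

# Starting guess for unicorn worker_processes before load testing.
cores = Etc.nprocessors
io_bound = ENV['IO_BOUND'] == '1'   # hypothetical flag: app mostly waits on a remote DB
workers = io_bound ? cores * 3 : cores
puts "start load testing with #{workers} workers"
```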
Sidekiq
Sidekiq concurrency is similar to unicorn workers. If the tasks are CPU bound, set the number of threads close to the number of CPU cores; if they are I/O bound, the number of threads can be greater than the number of cores. The other workloads on this server (like unicorn) matter as well. Just remember that the number of CPU cores does not change if you run sidekiq on the same server as unicorn :)
Summary: same as unicorn.
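As a sketch, the same starting point can go in Sidekiq's config file (the path and value are illustrative; Sidekiq also accepts `-c` on the command line, as shown further below):

```yaml
# config/sidekiq.yml
# Start near the core count for CPU-bound jobs; raise it for I/O-bound
# jobs, remembering that unicorn shares the same cores on this box.
:concurrency: 10
```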

There is no absolute best answer. (If there was, the software would tune itself automatically.)
The best possible solution depends on your operating system, environment, processor, memory, discs, the OS's buffer cache, the caching policy in nginx, hit rates, your application, and probably many other factors.
Which, not very surprisingly, is actually what the documentation of nginx says, too:
http://nginx.org/r/worker_processes
The optimal value depends on many factors including (but not limited to) the number of CPU cores, the number of hard disk drives that store data, and load pattern. When one is in doubt, setting it to the number of available CPU cores would be a good start (the value “auto” will try to autodetect it).
As for unicorn, a quick search for "worker_processes unicorn" reveals the following as the first hit:
http://bogomips.org/unicorn/TUNING.html
worker_processes should be scaled to the number of processes your backend system(s) can support. DO NOT scale it to the number of external network clients your application expects to be serving. unicorn is NOT for serving slow clients, that is the job of nginx.
worker_processes should be at least the number of CPU cores on a dedicated server (unless you do not have enough memory). If your application has occasionally slow responses that are /not/ CPU-intensive, you may increase this to workaround those inefficiencies.
…
Never, ever, increase worker_processes to the point where the system runs out of physical memory and hits swap. Production servers should never see heavy swap activity.
https://bogomips.org/unicorn/Unicorn/Configurator.html#method-i-worker_processes
sets the current number of #worker_processes to nr. Each worker process will serve exactly one client at a time. You can increment or decrement this value at runtime by sending SIGTTIN or SIGTTOU respectively to the master process without reloading the rest of your Unicorn configuration. See the SIGNALS document for more information.
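The runtime adjustment described above can be illustrated with a stand-in "master": a plain Ruby child process that traps TTIN/TTOU the way the Unicorn master does. This only simulates the counter; a real master actually forks and reaps worker processes:

```ruby
# Simulate a master that adjusts its worker count on SIGTTIN/SIGTTOU.
reader, writer = IO.pipe

pid = fork do
  reader.close
  workers = 4
  Signal.trap('TTIN') { workers += 1; writer.puts workers }
  Signal.trap('TTOU') { workers -= 1; writer.puts workers }
  writer.puts 'ready'          # tell the parent the traps are installed
  loop { sleep }
end

writer.close
reader.gets                    # wait for 'ready'
Process.kill('TTIN', pid)      # same signal you would send a real Unicorn master
after_ttin = reader.gets.to_i
Process.kill('TTOU', pid)
after_ttou = reader.gets.to_i
Process.kill('TERM', pid)
Process.wait(pid)
puts "workers after TTIN: #{after_ttin}, after TTOU: #{after_ttou}"
```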
In summary:
for nginx, it is best to keep worker_processes at or below the number of CPUs (and I'd probably not count the hyperthreaded ones, especially if you have other stuff running on the same server) and/or discs;
whereas for unicorn, it looks like it probably has to be at least the number of CPUs, and, if you have sufficient memory and depending on your workload, you may want to increase it much further than the raw number of CPUs.

The general rule of thumb is to use one worker process per core that your server has. So setting worker_processes 4; would be optimal in your scenario for both the nginx and Unicorn config files, as in these examples:
nginx.conf
# One worker process per CPU core is a good guideline.
worker_processes 4;
unicorn.rb
# The number of worker processes you have here should equal the number of CPU
# cores your server has.
worker_processes (ENV['RAILS_ENV'] == 'production' ? 4 : 1)
More information on the Sidekiq concurrency can be found here:
You can tune the amount of concurrency in your sidekiq process. By default, one sidekiq process creates 25 threads. If that's crushing your machine with I/O, you can adjust it down:
sidekiq -c 10
Don't set the concurrency higher than 50. I've seen stability issues with concurrency of 100, for example. Note that ActiveRecord has a connection pool which needs to be properly configured in config/database.yml to work well with heavy concurrency. Set the pool setting to something close or equal to the number of threads:
production:
  adapter: mysql2
  database: foo_production
  pool: 25
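A common way to keep the pool aligned with the thread count is to derive both from one environment variable (the variable name here is just a widespread convention, not required by Rails):

```yaml
production:
  adapter: mysql2
  database: foo_production
  pool: <%= ENV.fetch("RAILS_MAX_THREADS", 25) %>
```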

Related

Maximising use of available database connections

I just upgraded our Heroku Postgres database plan. On the new plan we have a lot more connections, and I'm trying to make sure we're making full use of them at scale.
Say we configured our Puma server with 40 threads:
puma -t 40:40
...and I set the pool size to 60 (just for a bit of buffer). My understanding is that because I've preallocated 40 Puma threads, each one will reserve a connection, resulting in 40 active connections. However, if I check the active connections there are only 5.
Am I completely misunderstanding how this works?
I am far from an expert in Puma, so I'll just share my own knowledge.
First, if you set the number of threads to 40, your Puma worker will have 40 threads. Be careful, though: because of the GIL (or GVL), your Puma worker can have only a single thread executing Ruby code at once. The 39 remaining threads just sit idle, UNLESS they are doing I/O (access to a database or such).
The common wisdom is that beyond 5 threads you gain nothing from adding more. Maybe this can be pushed to 10 if your app is really I/O oriented, but I wouldn't go further.
The real concurrency is set by the number of Puma workers (if you boot Puma in clustered mode). If you set the number of Puma workers to 40, your app can handle at least 40 users at a time.
But 40 workers require a huge Heroku dyno with quite a bit of RAM. Also, if you add 5 threads per Puma worker, then you need 200 DB connections!
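The arithmetic behind that warning, spelled out (each thread in each worker can check out its own connection from ActiveRecord's pool):

```ruby
# 40 Puma workers x 5 threads each: every thread may hold its own
# DB connection, so demand multiplies.
workers = 40
threads_per_worker = 5
needed_connections = workers * threads_per_worker
puts "DB connections needed: #{needed_connections}"  # 200
```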
What about the live DB connections
Because of the above, it is very hard for a single worker with 40 threads to have them all access the DB at the same time. This is probably why your live DB connections are only 5 (unless you have not redeployed your app after the change).
I have a small app and also see a varying number of live DB connections across time.
The buffer
Never add a buffer. You are just tying up connections that your app can never use. The pool size should equal the maximum number of threads.
My question: why so many DB connections?
What was your goal in increasing the DB connections? More concurrency? If you have a small app with a small web dyno, there is no point in having a big database plan behind it.
If you want to scale your app, get a bigger web dyno and add more Puma workers, while sticking to 5 threads per worker.
When the number of workers multiplied by the number of threads exceeds the number of allowed database connections, it is time to upgrade the database.
Nota bene: Rails may use a few connections for its internals. So if you have a database with 20 connections and a Puma config with 3 workers and 5 threads, it is better to upgrade before adding a fourth worker.

Best configuration of Auto Scaling Group for Rails application deployed using NGINX and Puma

I am using the Amazon Auto Scaling group for Rails application deployed on an EC2 instance using NGINX and Puma. I am facing some challenges with the configuring of the Auto Scaling policy.
I am using r5.xlarge for the main instance, which hosts my cron jobs, and r5.large for the autoscaling instances. My current scaling trigger is set at 50% CPU, but apparently that does not work, for the following reasons:
Since the main instance has 4 CPUs, overall consumption does not hit 50% unless some cron job is running that consumes all resources.
Even if the CPU hits 50%, the startup time of the Rails application is 30-40 seconds, and in the meantime all requests received by the server return 503.
If CPU consumption is less than 50% but the system receives a lot of concurrent requests, it does not start a new instance, and either starts returning 503 or the response time increases significantly.
I have tried switching the auto-scaling trigger from CPU consumption to the number of requests, but the instance start-time issue still remains, and sometimes it starts a new instance when it is not even needed.
Have you ever faced such an issue with a Rails deployment? Anything that you think worked for you out of the box?
We are running a Ruby application with Puma in ECS tasks, but the problem should be much the same as with EC2.
Since Ruby is effectively single threaded (because of the GVL), the Ruby process running your Puma server is only going to use one CPU at a time. If you have 4 CPUs, one Puma process will likely never manage to saturate more than 25% of the overall machine.
Note: also have a look at your configuration regarding the number of Puma threads. This is critical to configure: since you are doing auto-scaling, your application NEEDS to be able to saturate the CPU it is using for scaling to kick in. With too few Puma threads that will not happen; with too many, your application becomes unstable. This is something to fine-tune.
Recommendation:
Run one Puma process per CPU available on the EC2 class you have chosen, each Puma server listening on a different port, and have your load balancer manage that. This should allow your machine to reach potentially 100% CPU during saturation (in theory), allowing CPU-based auto-scaling to work.
Preferred solution: pick smaller machines with 1 CPU, so you only need to run one Puma server per machine.
From my experience with ECS, Ruby and other single-threaded languages should not use machines with more than 1 (v)CPU; you should instead rely on heavy horizontal scaling if necessary (some of our services run 50+ ECS instances).
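The preferred single-CPU setup could look roughly like this in a Puma config file (a sketch under the assumptions above; the exact thread count still needs fine-tuning):

```ruby
# config/puma.rb -- one single-process Puma per 1-vCPU machine/container.
workers 0                      # no clustered mode: one process per instance
threads 5, 5                   # enough threads to saturate the single CPU under load
port ENV.fetch('PORT') { 3000 }
```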
Hope this helps.

AWS/Ruby On Rails/Puma/Nginx Maximum number of requests

I recently changed from Passenger to Puma because it was constantly giving me a "request limit exceeded" error, and I read online that the free version of Passenger doesn't support multithreading. My backend application is hosted on two AWS c5.xlarge instances with an Elastic Load Balancer on top. Can someone help me with the number of workers and threads I should set in the Puma config, and the maximum number of concurrent requests I can serve with those settings?
There is no clear answer to your question. It depends on a lot of parameters.
You should create a benchmark script which sends a lot of requests from multiple processes and/or threads (and, if your server can handle a heavy load, even from multiple instances), and see how many requests are served per second.
Once you have this benchmark, try changing the number of threads and workers in order to increase the number of requests served.
I would start with one nginx worker and as many threads as there are CPU cores, then increase or decrease according to the benchmark.
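A toy version of such a benchmark script, assuming nothing beyond the Ruby standard library. It spins up a trivial HTTP server in-process just so the example is self-contained; in practice you would point it (or a tool like ab or wrk) at your real server:

```ruby
require 'socket'
require 'net/http'

# Trivial single-threaded HTTP server to benchmark against.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
Thread.new do
  loop do
    client = server.accept
    request = +''
    request << client.readpartial(4096) until request.include?("\r\n\r\n")
    client.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok"
    client.close
  end
end

# Fire requests from several client threads and measure throughput.
client_threads = 4
requests_each = 20
start = Time.now
threads = client_threads.times.map do
  Thread.new do
    requests_each.times { Net::HTTP.get(URI("http://127.0.0.1:#{port}/")) }
  end
end
threads.each(&:join)
elapsed = Time.now - start
total = client_threads * requests_each
puts "#{total} requests in #{elapsed.round(2)}s (#{(total / elapsed).round} req/s)"
```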

Is it bad to use unicorn without nginx? why?

I read that nginx is fast at serving static content, handling slow clients, and making redirects.
Why is nginx+unicorn better than running unicorn only and scaling the number of unicorn workers when needed?
Do you have any numbers showing how much faster nginx is at each of these things (redirecting, proxying, serving static content)?
As Heroku DevCenter claims, Unicorn workers are vulnerable to slow clients.
Each worker can process only a single request at a time, and if the client is not ready to accept the entire answer (aka a "slow client"), the Unicorn worker is blocked sending out the response and cannot handle the next one. Since each Unicorn worker takes up a substantial amount of RAM (again, see Heroku, which suggests 2-4 processes at 512 MiB of RAM), you cannot simply rely on the number of workers: a modest number of clients can render your application inoperable by pretending to have slow connections.
When behind nginx, Unicorn is able to dump the entire answer into nginx's buffer and switch immediately to handling the next request.
That said, nginx with a single Unicorn worker behind it is much more reliable than a bunch of Unicorn workers exposed directly.
NB, for the folks using ancient Rubies out there: if you'll be running a set of Unicorn workers, consider migrating to at least Ruby 2.0 to reduce RAM consumption by sharing common data across forked processes (ref).

Rails application servers

I've been reading about how different Rails application servers work for a while, and some things have me confused, probably because of my lack of knowledge in this field. The following points in particular:
Puma server has the following line about its clustered mode workers number in its readme:
On a ruby implementation that offers native threads, you should tune this number to match the number of cores available
So if I have, let's say, 2 cores and use Rubinius as the Ruby implementation, should I still use more than 1 process, considering that Rubinius uses native threads and doesn't have the lock, and thus uses all the CPU cores anyway, even with 1 process?
My understanding is that I'd only need to increase the thread pool of that single process if I upgrade to a machine with more cores and memory; if that's not correct, please explain it to me.
I've read some articles on using Server-Sent Events with Puma which, as far as I understand, block a Puma thread since the browser keeps the connection open. So if I have 16 threads and 16 people are using my site, the 17th would have to wait for one of those 16 to leave before connecting? That's not very efficient, is it? Or what am I missing?
If I have a 1-core machine with 3 GB of RAM, just for the sake of the question, and use unicorn as my application server, and 1 worker takes 300 MB of memory while its CPU usage is insignificant, how many workers should I have? Some say the number of workers should equal the number of cores, but if I set the worker count to, let's say, 7 (since I have enough RAM for it), it will be able to handle 7 concurrent requests, won't it? So is it just a question of memory, CPU usage, and amount of RAM? Or what am I missing?
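The memory side of that arithmetic can be checked directly (the OS headroom figure is an assumption for illustration):

```ruby
# 1-core box, 3 GB RAM, ~300 MB per unicorn worker.
ram_mb = 3 * 1024
per_worker_mb = 300
os_headroom_mb = 512                     # assumption: leave room for the OS and caches
max_by_memory = (ram_mb - os_headroom_mb) / per_worker_mb
puts "memory allows up to #{max_by_memory} workers"  # 8
```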
