I recently switched from Passenger to Puma because it kept giving me a "request limit exceeded" error, and I read online that the free version of Passenger doesn't support multithreading. My backend application is hosted on two AWS c5.xlarge instances with an Elastic Load Balancer on top. Can someone help me with the number of workers and threads I should set in the Puma config, and the maximum number of concurrent requests I can serve with those settings?
There is no clear answer to your question. It depends on a lot of parameters.
You should create a benchmark script that sends a lot of requests from multiple processes and/or threads (or even from multiple machines, so you can generate a genuinely heavy load) and measures how many requests are served per second.
Once you have this benchmark, vary the number of threads and workers and re-run it, looking for the combination that serves the most requests.
I would start with one Puma worker and as many threads as there are CPU cores, then increase or decrease according to the benchmark.
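As a concrete starting point, here is a minimal puma.rb sketch; the worker and thread counts are assumptions to be tuned against your own benchmark, not recommendations.
config/puma.rb
# Starting point to tune with a load test, not a final answer.
# A c5.xlarge has 4 vCPUs; begin with one worker and threads = cores, then adjust.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 1))
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 4))
threads threads_count, threads_count
preload_app!                      # copy-on-write savings once workers > 1
port ENV.fetch("PORT", 3000)
environment ENV.fetch("RAILS_ENV", "production")
The maximum number of in-flight requests is roughly workers x threads per instance (x 2 instances behind the load balancer), but the throughput you can actually sustain depends on how long each request takes, which is exactly what the benchmark tells you.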
Related
I am using an Amazon Auto Scaling group for a Rails application deployed on EC2 instances using NGINX and Puma. I am facing some challenges with configuring the Auto Scaling policy.
I am using r5.xlarge for the main instance that hosts my cron jobs and r5.large for the autoscaled instances. My current scaling trigger is set at 50% CPU, but apparently that does not work, for the following reasons:
Since the main instance has 4 CPUs, overall consumption does not hit 50% unless some cron job is running that consumes all resources.
Even if the CPU does hit 50%, the startup time of the Rails application is 30-40 seconds, and in the meantime all requests received by the new instance return 503.
If CPU consumption is below 50% but the system receives a lot of concurrent requests, no new instance is started, and the server either starts returning 503 or the response time increases significantly.
I have tried changing the auto-scaling trigger from CPU consumption to request count, but the instance startup time issue still remains, and sometimes a new instance is started when it is not even needed.
Have you ever faced such an issue with a Rails deployment? Is there anything that worked for you out of the box?
We run a Ruby application with Puma in ECS tasks, but the problem should be much the same as with EC2.
Since MRI Ruby executes only one thread at a time, a Ruby process running your Puma server will only use one CPU at a time. If you have 4 CPUs, one Puma process will likely never saturate more than 25% of the overall machine.
Note: also look at how many Puma threads you configure. This is critical to get right: since you are auto-scaling, your application needs to be able to saturate the CPU it is using for the scaling trigger to kick in. With too few Puma threads that will not happen; with too many, your application can become unstable. This is something to fine-tune.
Recommendation:
Run one Puma process per CPU available on the EC2 instance class you have chosen, with each Puma server listening on a different port, and have your load balancer spread traffic across them (see the sketch below). This should allow the machine to reach close to 100% CPU under saturation (in theory), letting CPU-based auto-scaling work.
Preferred solution: pick smaller machines with 1 CPU, so you only need to run one Puma server per machine.
From my experience with ECS, Ruby and other effectively single-threaded runtimes should not use machines with more than 1 vCPU; rely on heavy horizontal scaling instead if necessary (some of our services run on 50 ECS instances).
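A minimal puma.rb sketch of the first option (one single-mode Puma process per core, each started with its own PORT behind the load balancer); the port numbers are illustrative assumptions:
config/puma.rb
# One Puma process per vCPU, each started with its own PORT,
# e.g. PORT=3001..3004 on a 4-vCPU machine (ports are illustrative).
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads threads_count, threads_count
workers 0                         # single mode: one process per core, no forking
port ENV.fetch("PORT", 3000)
environment ENV.fetch("RAILS_ENV", "production")
If you do not need separate ports per process, Puma's built-in cluster mode (workers 4 under a single master) is the simpler way to get the same CPU saturation.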
Hope this helps.
I have a Rails app running in an Nginx + Unicorn environment.
I see that both have a "worker_processes" setting in their config files, and I wonder what the optimal configuration is.
If I have 4 cores, should I put 4 for both? Or 1 for nginx and 4 for unicorn?
(by the way, I am using sidekiq too, so what about sidekiq concurrency?)
Nginx
Nginx is an event-based server, which means one operating system (OS) process can manage a very large number of connections. This is possible because the usual state of a connection is waiting: while a connection is waiting for the other side, or sending/receiving a packet of data, nginx can work on other connections. One nginx worker can handle thousands or even tens of thousands of connections, so even worker_processes 1 can be enough.
More nginx workers let you use more CPU cores (which matters if nginx is the main CPU consumer). More workers also help if nginx does a lot of disk IO.
Summary: you can safely start with 1 worker and increase up to the number of CPU cores.
Unicorn
A Unicorn worker is a bit different from an nginx worker because one worker handles exactly one request at a time. The number of Unicorn workers is the number of Ruby processes executing at the same time, and the right number depends on your application.
For example, if your application is CPU bound (doing pure math), having more workers than CPU cores can cause problems.
But a typical application talks to a database and sleeps while waiting for the database to answer. If the database lives on another server (so request processing does not consume our CPU), Ruby sleeps and the CPU sits idle. In that case we can increase the number of workers to 3x, 5x, or even 20x the number of cores.
Summary: the best way to find this number is load testing your real application (see the sketch below). Set a number of Unicorn workers, start a load test with the same level of concurrency, and if the server copes well, increase the number of workers and test again.
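As a baseline to feed into that load-testing loop, here is a minimal unicorn.rb sketch; the multipliers are assumptions to validate, not fixed rules.
config/unicorn.rb
require "etc"

# Baseline: one worker per core; adjust after load testing.
cores = Etc.nprocessors
worker_processes cores
# For I/O-bound apps that mostly wait on a remote database, try cores * 3..5,
# but only if there is enough memory for that many Rails processes.

preload_app true
timeout 30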
Sidekiq
Sidekiq concurrency is similar to Unicorn workers. If the tasks are CPU bound, set the number of threads close to the number of CPU cores. If they are I/O bound, the number of threads can be greater than the number of cores. Other work running on the same server also matters (like Unicorn). Just remember that the number of CPU cores does not change if you run Sidekiq on the same server as Unicorn :)
Summary: same as Unicorn.
There is no absolute best answer. (If there was, the software would tune itself automatically.)
What the best possible configuration is depends on your operating system, environment, processor, memory, discs, the OS's buffer cache, the caching policy in nginx, hit rates, your application, and probably many other factors.
Which, not very surprisingly, is actually what the documentation of nginx says, too:
http://nginx.org/r/worker_processes
The optimal value depends on many factors including (but not limited to) the number of CPU cores, the number of hard disk drives that store data, and load pattern. When one is in doubt, setting it to the number of available CPU cores would be a good start (the value “auto” will try to autodetect it).
As for unicorn, a quick search for "worker_processes unicorn" reveals the following as the first hit:
http://bogomips.org/unicorn/TUNING.html
worker_processes should be scaled to the number of processes your backend system(s) can support. DO NOT scale it to the number of external network clients your application expects to be serving. unicorn is NOT for serving slow clients, that is the job of nginx.
worker_processes should be at least the number of CPU cores on a dedicated server (unless you do not have enough memory). If your application has occasionally slow responses that are /not/ CPU-intensive, you may increase this to workaround those inefficiencies.
…
Never, ever, increase worker_processes to the point where the system runs out of physical memory and hits swap. Production servers should never see heavy swap activity.
https://bogomips.org/unicorn/Unicorn/Configurator.html#method-i-worker_processes
sets the current number of #worker_processes to nr. Each worker process will serve exactly one client at a time. You can increment or decrement this value at runtime by sending SIGTTIN or SIGTTOU respectively to the master process without reloading the rest of your Unicorn configuration. See the SIGNALS document for more information.
In summary:
for nginx, it is best to keep it at or below the number of CPUs (and I'd probably not count the hyperthreaded ones, especially if you have other stuff running on the same server) and/or discs,
whereas for unicorn it probably has to be at least the number of CPUs; and if you have sufficient memory, depending on your workload, you may want to increase it well beyond the raw number of CPUs.
The general rule of thumb is to use one worker process per core that your server has. So with 4 cores, setting worker_processes 4; would be a sensible starting point for both the nginx and Unicorn config files, as in the examples here:
nginx.conf
# One worker process per CPU core is a good guideline.
worker_processes 4;
unicorn.rb
# The number of worker processes you have here should equal the number of CPU
# cores your server has.
worker_processes (ENV['RAILS_ENV'] == 'production' ? 4 : 1)
More information on Sidekiq concurrency can be found here:
You can tune the amount of concurrency in your sidekiq process. By default, one sidekiq process creates 25 threads. If that's crushing your machine with I/O, you can adjust it down:
sidekiq -c 10
Don't set the concurrency higher than 50. I've seen stability issues with concurrency of 100, for example. Note that ActiveRecord has a connection pool which needs to be properly configured in config/database.yml to work well with heavy concurrency. Set the pool setting to something close or equal to the number of threads:
production:
  adapter: mysql2
  database: foo_production
  pool: 25
I'm trying to scale up an app server to process over 20,000 requests per minute.
When I stress-test the requests, most requests are easily handling 20,000 RPM or more.
But, requests that need to make an external HTTP request (eg, Facebook Login) bring the server down to a crawl (3,000 RPM).
I conceptually understand the limitations of my current environment -- 3 load-balanced servers with 4 unicorn workers per server can only handle 12 requests at a time, even if all of them are waiting on HTTP requests.
What are my options for scaling this better? I'd like to handle many more connections at once.
Possible solutions as I understand it:
Brute force: use more unicorn workers (ie, more RAM) and more servers.
Push all the blocking operations into background/worker processes to free up the web processes. Clients will need to poll periodically to find when their request has completed.
Move to Puma instead of Unicorn (and probably to Rubinius from MRI), so that I can use threads instead of processes -- which may(??) improve memory usage per connection, and therefore allow the number of workers to be increased.
Fundamentally, what I'm looking for is: Is there a better way to increase the number of blocked/queued requests a single worker can handle so that I can increase the number of connections per server?
For example, I've heard discussion of using Thin with EventMachine. Does this open up the possibility of a Rails worker that can put down the web request it's currently working on (because that one is waiting on an external server) and pick up another request while it waits? If so, is this a worthwhile avenue to pursue for performance compared with Unicorn and Puma? (Does it strongly depend on the runtime behaviour of the app?)
Unicorn is a single-threaded, multi-process synchronous app server. It's not a good match for this kind of processing.
It sounds like your application is I/O bound. This argues for an event-oriented daemon to process your requests.
I'd recommend trying EventMachine with the em-http-request and em-http-server gems.
This will allow you to service both incoming requests to the http server and outgoing HTTP service calls asynchronously.
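A minimal sketch of the outbound half of that idea with EventMachine and em-http-request (the URL is a placeholder): several slow external calls run concurrently inside one process instead of tying up one worker each.
require "eventmachine"
require "em-http-request"

# Fire three slow outbound HTTP calls concurrently from a single process.
EventMachine.run do
  pending = 3
  3.times do |i|
    http = EventMachine::HttpRequest.new("https://example.com/slow/#{i}").get
    http.callback do
      puts "request #{i} finished with status #{http.response_header.status}"
      pending -= 1
      EventMachine.stop if pending.zero?
    end
    http.errback do
      puts "request #{i} failed"
      pending -= 1
      EventMachine.stop if pending.zero?
    end
  end
end
Whether this is worth restructuring your app around depends on how much of each request's time is really spent waiting on external services.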
I'm quite new to Ruby web apps (coming from Java).
I have a VPS with 1 CPU and 2GB of RAM and would like to play with some Rails/Sinatra stuff.
I'm using Ruby 2.1.0 MRI
How does the number of CPUs map to the number of web server processes I need to run? I use Puma as the web server with the default threads (0,16) set up. But I noticed there is also a "workers" option that forks another process to better handle multiple requests.
Do I understand correctly that for such a setup (1 CPU) there is no point in running 2 web server processes, and the only reasonable setup is 1 process with threads?
Oh now this is a pretty big question!
The number of processes and threads isn't necessarily tied to the number of CPUs. It's more a question of how much memory is available, how many concurrent requests you get, and how much 'locking' is going on.
If you're going to have long running requests that block other requests, then having additional processes can help with that. You can still have more than one process with a single CPU.
There are a number of different servers in Ruby that handle scaling in different ways; Unicorn, Puma and Thin are some of them. Searching for Unicorn vs Puma vs Thin turns up some useful blog posts on the topic.
Here are a couple:
http://ylan.segal-family.com/blog/2012/08/20/better-performance-on-heroku-thins-vs-unicorn-vs-puma/
https://www.engineyard.com/articles/rails-server
https://www.ruby-forum.com/topic/1822610
And some information on concurrency in Ruby
http://merbist.com/2011/02/22/concurrency-in-ruby-explained/
The TL;DR answer is: it depends!
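For the specific 1-CPU / 2GB VPS in the question, a minimal single-process, threaded puma.rb sketch (the thread counts are assumptions to tune against memory and workload):
config/puma.rb
# One process with a small thread pool for a 1-CPU, 2 GB VPS.
workers 0                  # no forked workers: one process is usually enough on one core
threads 0, 16              # default-style pool; lower the max if memory gets tight
port ENV.fetch("PORT", 3000)
environment ENV.fetch("RACK_ENV", "production")
A second process (workers 1 or 2) only starts to pay off when a single process keeps getting blocked by long-running requests.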
Sorry if this seems obvious. I've noticed that a web request on my Rails app uses 30-33% of the CPU every time. For example, if I load a web page, 30% of the CPU is used. Does that mean my box can only handle 3 concurrent web requests, and will stall if there are more than 3 (i.e. hit 100% CPU)?
If so, does that also mean that if I want to handle more than 3 concurrent web requests, then I'll have to get more servers to handle the load using a load balancer? (e.g. to handle 6 concurrent web requests, I'll need 2 servers; for 9 concurrent requests, I'll need 3 servers; for 12, I'll need 4 servers -- and so on?)
I think you should start with load tests. I wouldn't trust manual testing that much.
Load tests tell you how long the response takes for each client, and how many clients simply time out.
Also you will be able to measure the improvements objectively for any changes that you make.
Look at ab or httperf; there are many other tools available.
Stephan
Your Apache or Nginx in front of Passenger will queue requests until a Passenger worker becomes available. You can limit the number of concurrent workers so your server never stalls (but new visitors will have to wait longer until it's their turn).
It's difficult to tell from this information alone. It depends very much on the web server stack you're using and which environment you're running. Different servers (Mongrel, WEBrick, Apache using various mechanisms, Unicorn) all have different memory characteristics, and different environments (development vs. test vs. production) exhibit radically different memory usage characteristics.