Why are multiple Ruby processes on an EC2 server causing 100% CPU utilisation?

I have a Rails application which has 100% CPU utilization most of the time.
I am not able to figure out why there is so much load on the server. I am using the Puma web server with its default configuration, and I am running multiple background jobs using the sucker_punch gem. There are 7 job classes, each using sucker_punch with 5 workers, along these lines (class name varies):
    class SomeJob                 # hypothetical name; each of the 7 classes looks like this
      include SuckerPunch::Job
      workers 5
    end
I ran the top -i command and found the following processes running on the server. I can see multiple Ruby processes. Can someone tell me whether this is normal behavior on a server, or if something is wrong?

Some Ways to Reduce Resource Contention
Your user-space load is high (~48%), so you'll probably want to reduce the number of workers in your web application, increase the number of CPUs available on your instance, move to a Ruby implementation with better concurrency and real multi-core support (e.g. Rubinius or JRuby), or some combination of these options. Depending on what your code is actually doing, you may also need to re-architect your application to offload expensive I/O from the application server.
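As a concrete starting point, here is a minimal sketch of what "fewer workers" could look like; the counts and the WEB_CONCURRENCY/RAILS_MAX_THREADS variable names are assumptions to tune, not drop-in values:

    # config/puma.rb -- cap the web tier so it leaves CPU headroom for jobs
    workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))     # roughly one per vCPU
    max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
    threads max_threads, max_threads
    preload_app!

    # app/jobs/some_job.rb -- dial each sucker_punch pool down from 5;
    # 7 classes x 5 workers = up to 35 job threads competing with Puma
    class SomeJob
      include SuckerPunch::Job
      workers 2                                          # illustrative value
    end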
In addition, your steal time is quite high (~41%), which suggests the physical host behind your EC2 instance is oversubscribed. Simply moving your application to a less-loaded instance may free up sufficient resources to reduce application wait times.

Related

Best configuration of Auto Scaling Group for Rails application deployed using NGINX and Puma

I am using an Amazon Auto Scaling group for a Rails application deployed on EC2 instances using NGINX and Puma. I am facing some challenges with configuring the Auto Scaling policy.
I am using r5.xlarge for the main instance, which hosts my cron jobs, and r5.large for the autoscaled instances. My current scaling trigger is set at 50% CPU, but apparently that does not work, for the following reasons:
Since the main instance has 4 vCPUs, overall consumption does not hit 50% unless some cron job is running that consumes all resources.
Even if CPU does hit 50%, the Rails application takes 30-40 seconds to start, and in the meantime all requests received by the server return 503.
If CPU consumption is below 50% but the system receives a lot of concurrent requests, no new instance is started, and the server either starts returning 503 or the response time increases significantly.
I have tried switching the auto-scaling trigger from CPU consumption to request count, but the instance start-up issue still prevails, and sometimes a new instance is started when it is not even needed.
Have you ever faced such issues with a Rails deployment? Is there anything that worked for you out of the box?
We run a Ruby application with Puma in ECS tasks, but the problem should be much the same as with EC2.
Since MRI Ruby cannot execute on more than one core at a time, a single Puma process running your server will only do one CPU's worth of work. If you have 4 vCPUs, one Puma process will never manage to saturate much more than 25% of the overall machine.
Note: also look at your configuration for the number of Puma threads. This is critical to get right: since you are auto-scaling, your application NEEDS to be able to saturate the CPUs it is using for the trigger to kick in. With too few Puma threads it will not; with too many, your application becomes unstable. This is something to fine-tune.
Recommendation:
Run one Puma process per CPU available on the EC2 instance class you have chosen, each Puma server listening on a different port, and have your load balancer spread traffic across them. This should allow the machine to reach 100% CPU under saturation (in theory), letting CPU-based auto-scaling kick in; see the sketch below.
Preferred solution: pick smaller machines with 1 CPU, so you only need to run one Puma server per machine.
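A minimal sketch of the one-process-per-CPU idea, using Puma's built-in cluster mode rather than separate servers on separate ports (an alternative that keeps a single port for the load balancer); the thread counts are assumptions to tune:

    # config/puma.rb
    require "etc"

    # One worker process per vCPU, so CPU-bound load can approach 100%
    # and actually trip a CPU-based scaling alarm.
    workers Etc.nprocessors

    # Enough threads per worker to saturate its CPU under load, few
    # enough to stay stable; 5 is a starting guess, not a rule.
    threads 5, 5

    preload_app!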
From my experience with ECS, Ruby and other effectively single-threaded runtimes should not use machines with more than 1 vCPU; instead, rely on heavy horizontal scaling where necessary (some of our services run as many as 50 ECS tasks).
Hope this helps.

Rails application servers

I've been reading for a while about how different Rails application servers work, and some things have me confused, probably because of my lack of knowledge in this field. In particular:
Puma's README says the following about the worker count in clustered mode:
On a ruby implementation that offers native threads, you should tune this number to match the number of cores available
So if I have, let's say, 2 cores and use Rubinius as the Ruby implementation, should I still use more than 1 process, considering that Rubinius uses native threads and has no global lock, and thus uses all the CPU cores anyway, even with 1 process?
My understanding is that I'd then only need to grow the thread pool of that single process if I upgraded to a machine with more cores and memory; if that's not correct, please explain it to me.
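For example, something like the following is what I have in mind; a sketch assuming a GIL-free Ruby, with made-up numbers:

    # config/puma.rb -- single process, scaling via threads only
    workers 0        # stay in single mode, no forked workers
    threads 8, 32    # grow the pool as the machine grows (made-up range)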
I've read some articles on using Server-Sent Events with Puma, and as far as I understand, each open connection blocks a Puma thread since the browser keeps the connection open. So if I have 16 threads and 16 people are using my site, the 17th would have to wait for one of those 16 to leave before they could connect? That's not very efficient, is it? Or what am I missing?
If I have a 1-core machine with 3 GB of RAM, just for the sake of the question, and use Unicorn as my application server, and 1 worker takes 300 MB of memory while its CPU usage is insignificant, how many workers should I have? Some say the number of workers should equal the number of cores, but if I set the worker count to, let's say, 7 (since I have enough RAM for it), it will be able to handle 7 concurrent requests, won't it? So is it just a question of memory, CPU usage and the amount of RAM? Or what am I missing?

Ruby rack app on 1 CPU - how many processes should I run?

I'm quite new to Ruby web apps (coming from Java).
I have a VPS with 1 CPU and 2 GB of RAM and would like to play with some Rails/Sinatra stuff.
I'm using Ruby 2.1.0 (MRI).
How does the number of CPUs map to the number of web server processes I need to run? I use Puma as the web server with its default thread setting (0,16), but I noticed there is also a "workers" option that forks additional processes to better handle multiple requests.
Do I understand correctly that for such a setup (1 CPU) there is no point in running 2 web server processes, and the only reasonable setup is 1 process with threads?
Oh now this is a pretty big question!
The number of processes and threads isn't necessarily tied to the number of CPUs. It's more a question of the amount of memory available, the number of concurrent requests, and how much 'locking' is going on.
If you're going to have long-running requests that block other requests, then having additional processes can help with that. You can still run more than one process on a single CPU.
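A minimal sketch of what that looks like with Puma; the counts are illustrative, not prescriptive:

    # config/puma.rb on a 1-CPU VPS: a second process still helps when
    # the first is stuck in a slow request (e.g. waiting on I/O).
    workers 2
    threads 0, 16   # the default thread range mentioned in the question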
There are a number of Ruby servers that handle scaling in different ways; Unicorn, Puma and Thin are some of them. Searching for "Unicorn vs Puma vs Thin" turns up some useful blog posts on the topic.
Here are a couple:
http://ylan.segal-family.com/blog/2012/08/20/better-performance-on-heroku-thins-vs-unicorn-vs-puma/
https://www.engineyard.com/articles/rails-server
https://www.ruby-forum.com/topic/1822610
And some information on concurrency in Ruby:
http://merbist.com/2011/02/22/concurrency-in-ruby-explained/
The TL;DR answer is: it depends!

Mongrel does not use the full CPU power on Windows 2003 server

I have a deployment using:
rails 2.3.2
ruby 1.8.7
MySQL
3 Mongrel instances (Windows services) with Apache as a load balancer
[I know it is due for an upgrade...]
OS: Windows Server 2003
We have many CPU-intensive tasks, and when these run on the 4-core machine, the Mongrel process is only able to use a maximum of 25% of the CPU power, on the core the task was scheduled on.
After running many tests we noticed that it is only ever able to use the power of a single core, and therefore there is a lag in finishing tasks.
There is a suggestion to virtualize, which is difficult to do on the client's server.
Has anyone got any suggestions on how the situation can be improved? Memory for this process does reach 250 MB to 1 GB, but that is not such a big issue.
Thanks in advance
Linus
The typically used Ruby versions (MRI or YARV, i.e. 1.8 or 1.9) are not able to use more than one core at a time. MRI (1.8) is single-threaded and just provides green threads internally; YARV (1.9) uses real OS threads but has a GIL (global interpreter lock) that ensures only one thread executes Ruby code at a time.
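You can see this directly with a small, self-contained sketch (timings will vary by machine):

    require "benchmark"

    # CPU-bound busy work: increment a counter n times.
    work = lambda { |n| x = 0; n.times { x += 1 }; x }
    n = 10_000_000

    serial = Benchmark.realtime { 2.times { work.call(n) } }
    threaded = Benchmark.realtime do
      2.times.map { Thread.new { work.call(n) } }.each(&:join)
    end

    # On MRI/YARV the threaded run takes about as long as the serial one,
    # because only one thread runs Ruby code at a time. On JRuby or
    # Rubinius it finishes in roughly half the time on 2+ cores.
    puts format("serial: %.2fs, threaded: %.2fs", serial, threaded)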
Thus, your Mongrels are unable to use more than one core each (even if you had coded your Rails app to be thread-safe). There are alternative Ruby implementations, like JRuby or Rubinius, that provide native threads without a global interpreter lock and thus allow your app to use more than one core, but you'd probably need to adapt your app a bit.
That said, even then a single request runs in a single thread and thus uses only a single core. That is hard to get around without managing your own threads (or at least fibers in 1.9), which is most probably not worth the hassle.
So generally, the recommendation is to start multiple app server processes (Mongrels in your case). I personally use about 1.5-3 per core (depending on the app). That way, you can answer that many parallel requests and fully use the available CPU power, shared between them.

Ruby on Rails: How would I handle 10 concurrent users? Do I need more CPU?

Sorry if this might seem obvious. I've observed that each web request to my Rails app uses 30-33% of the CPU. For example, if I load a web page, 30% of the CPU is used. Does that mean my box can only handle 3 concurrent web requests and will stall if there are more than 3 (i.e. hit 100% CPU)?
If so, does that also mean that to handle more than 3 concurrent web requests I'd have to add servers behind a load balancer? (e.g. 2 servers for 6 concurrent requests, 3 servers for 9, 4 for 12, and so on?)
I think you should start with load tests. I wouldn't trust manual testing that much.
Load tests tell you how long the response takes for each client, and how many clients simply time out.
Also you will be able to measure the improvements objectively for any changes that you make.
Look at ab or httperf; there are many other tools available.
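For example, ab -n 1000 -c 10 http://your-host/some-page (the URL is a placeholder) fires 1000 requests at a concurrency of 10 and reports per-request latency and how many requests failed.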
Stephan
Your Apache or NGINX in front of Passenger will queue requests until a Passenger worker becomes available. You can limit the number of concurrent workers (e.g. PassengerMaxPoolSize under Apache) so your server never stalls, but new visitors will have to wait longer until it's their turn.
It's difficult to tell from this information alone. It depends very much on the web server stack you're using and which environment you're running. Different servers (Mongrel, WEBrick, Apache with various mechanisms, Unicorn) all have different memory characteristics, and different environments (development vs. test vs. production) exhibit radically different memory usage.
