Rails application servers - ruby-on-rails

I've been reading information about how different rails application servers work for a while and some things got me confused probably because of my lack of knowledge in this field. Anyway, the following things got me confused:
Puma server has the following line about its clustered mode workers number in its readme:
On a ruby implementation that offers native threads, you should tune this number to match the number of cores available
So if I have, lets say, 2 cores and use rubinius as a ruby implementation, should I still use more than 1 process considering that rubinius use native threads and doesn't have the lock thus it uses all the CPU cores anyway, even with 1 process?
I understand it that I'd need to only increase the threads pool of the only process if I upgrade to a machine with more cores and memory, if it's not correct, please explain it to me.
I've read some articles on using Server-Sent Events with puma which, as far as I understand, blocks the puma thread since the browser keeps the connection open, so if I have 16 threads and 16 people are using my site, then the 17th would have to wait before one of those 16 leaves so it could connect? That's not very efficient, is it? Or what do I miss?
If I have a 1 core machine with 3Gb of RAM, just for the sake of the question, and using unicorn as my application server and 1 worker takes 300 MBs of memory and its CPU usage is insignificant, how many workers should I have? Some say that the number of workers should be equal to the number of cores, but if I set the workers number to, lets say, 7 (since I have enough RAM for it), it will be able to handle 7 concurrent requests, won't it? So it's just a question of memory and cpu usage and amount of RAM? Or what do I miss?

Related

Why are multiple Ruby processes on an EC2 server causing 100% CPU utilisation?

I have a Rails application which has 100% CPU utilization most of the time.
I am not able to figure out why there is so much load on the server. I am using the Puma web server with a default configuration, and am running multiple background jobs using the sucker-punch gem. There are 7 files which are using sucker punch jobs with 5 workers:
include SuckerPunch::Job
workers 5
I ran the top -i query and found the following processes running on the server. I can see multiple Ruby commands on the server. Can someone tell me whether this is normal behavior on a server, or if something is wrong?
Some Ways to Reduce Resource Contention
Your user space load is high (~48%), so you'll probably want to reduce the number of workers in your web application, increase the number of CPUs available on your instance, move to a version of Ruby that has better concurrency and real multi-core support (e.g. Rubinius or JRuby), or some combination of these options. Depending on what your code is actually doing, you may also need to re-architect your application to offload expensive I/O from the application server.
In addition, your steal time is quite high (~41%), so your EC2 instance is probably overloaded. Simply moving your application to a less-loaded instance may free up sufficient resources to reduce application wait times.

ruby requests more memory when there are plenty free heap slots

We have a server running
Sidekiq 4.2.9
rails 4.2.8
MRI 2.1.9
This server periodically produce some amount of importing from external API's, perform some calculations on them and save these values to the database.
About 3 weeks ago server started hanging, as I see from NewRelic (and when ssh'ed to it) - it consumes more and more memory over time, eventually occupying all available RAM, then server hangs.
I've read some articles about how ruby GC works, but still can't understand, why at ~5:30 AM heap size jumps from ~2.3M to 3M , when there's still 1M free heap slots available(GC settings are default)
similar behavior, 3:35PM:
So, the questions are:
how to make Ruby fill free heap slots instead of requesting new slots from OS ?
how to make it release free heap slots to the system ?
how to make Ruby fill free heap slots instead of requesting new slots from OS ?
Your graph does not have "full" fidelity. It is a lot to assume that GC.stat was called by Newrelic or whatnot just at the exact right time.
It is incredibly likely that you ran out of slots, heap grew and since heaps don't shrink in Ruby you are stuck with a somewhat bloated heap.
To alleviate some of the pain you can limit RUBY_GC_HEAP_GROWTH_MAX_SLOTS to a sane number, something like 100,000 will do, I am trying to lobby setting a default here in core.
Also
Create a persistent log of jobs that run and time they ran (duration and so on), gather GC.stat before and after job runs
Split up your jobs by queue, run 1 queue on one server and other queue on another one, see which queue and which job is responsible for the problem
Profile various jobs you have using flamegraph or other profiling tools
Reduce the amount of concurrent jobs you run as an experiment, or place a mutex between certain job types. It is possible that 1 "job a" at a time is OKish, and 20 concurrent "job a"s at a time will bloat memory.

Why is my PostgreSQL server cpu constrained?

My database is very cpu constrained, and I can't find the root cause of the issue. I currently have two applications servers each wit a Rails api connecting to PostgreSQL via the ruby-pg gem. Both application server also have sidekiq running background jobs, and I have a handful of support servers processing new posts from a national feed via sidekiq. If I were running out of memory, the solution would seemingly be straight forward. Any general ideas why I am CPU constrained?
Database Specs:
Rackspace 8GB Performance Tier cloud VM (8GB RAM, 8x Core CPU, SSD)
Debian 7 Wheezy Linux OS
PostgreSQL 9.1 with PostGIS extension
Possible Problems:
PostgreSQL 9.1 is bad at indexes
The database has nearly 10GB of indexes. I am going to upgrade my database to PostgreSQL version >= 9.2. In version 9.2, index only scans were introduced.
Too many connections
In the postgresql.conf, I have set max connection equal to '500'. Usually throughout the day, only 175 connections are utilized, but during peak times, sidekiq tasks will increase the current connections to 350. How many connections are recommended with an 8GB server instance?
Idol Connections
When I take a look at pg_stat_activity in the psql console, I see sidekiq is leaving a lot of IDLE connections. Could these connections result in CPU inflation? Does the fix exist in the api or in sidekiq?
Need a more powerful server
Maybe there is not a bug. I might need to simply increase the server instance. Again this would make more sense if I was memory bound. However, both app servers and 3 of the support sidekiq servers are 4gb performance tier instances. Essentially, servers that interact with the database have combined more than double the resources of the database. Should this even matter?
Additional questions:
What tools/techniques should I employ to troubleshoot the issue?
Any basic settings in the postgresql.conf related to cpu usage?
Are there any known issues related to rails, sidekiq, or the pg gem that could be a contributing factor? (I havent seen any open issues.)
Are there any general postgreSQL guideline for CPU usage?
Any other ideas thoughts that might help my search?
You are using massively too many concurrent connections. PostgreSQL will be wasting lots of its time on housekeeping and juggling concurrent queries. All the concurrent work will be fighting for CPU and buffer space, there'll be heavy contention on spinlocks, and it'll all generally be a mess.
On an 8 core machine, you should probably not have more than 20 actively working connections if you're mostly CPU constrained. If you're I/O limited, you can go higher, but 350 is just ridiculous.
If possible, put a PgBouncer in transaction pooling mode in front of your PostgreSQL instance, so queries get queued up and executed rapidly in series instead of slowly in parallel.
See number of database connections (Pg wiki).
Additionally, PostGIS can be very CPU-heavy. It sometimes needs to do very complex calculations. I suggest using the auto_explain module to record long running queries, and using pg_stat_statements / pg_stat_plans to record what's taking up resources. Examine these queries to see if they need improvement.
Your idle in transaction sessions must be dealt with, too. Depending on why they're idle and whether they have a transaction ID or not, they might be causing serious table bloat. They're also creating unnecessary signalling overhead within PostgreSQL, as it has to do more co-ordination with backends that're actively doing things. Finally, the number of open transactions its self increases the cost of some internal housekeeping operations.
So. Your DB will probably perform better if you reduce the connection counts, put a PgBouncer in transaction pooling mode in front, and fix those idle connections.
Most likely you are CPU constrained because your work needs a lot of CPU. :)
9.1 is not generally bad at indexes. There may be some specific issues, as all versions might, which exactly what they are might change from version to version.
Index-only-scans are mostly a benefit when you are IO constrained. I wouldn't hold out much hope for that being a magic bullet for you.
350 connections are certainly not helpful, but probably are not very harmful, either. But when they are harmful, it can be downright catastrophic. The correct value is more determined by the number of cores, not the amount of RAM. If it is easy to throttle down the sidekiq connections, do it even if you can't prove that it helps.
If the connections are just IDLE, not IDLE in transaction, then they probably aren't very harmful, but again there are a few cases where they can be. That is pretty much the same issue as the number of connections.
The connection you showed from top was idle in transaction. That status shouldn't be taking up much CPU, so that probably means it is rapidly cycling through statements and top just happens to catch it while it is between them. But you didn't say how many similar lines there were in top, if it is just that one it suggests your code is not running concurrently and 7 of you 8 CPUs are wasted.
Regarding the db server versus the other servers, if the database is fundamentally the limit, beating on it with a bigger hammer is not going to help. Often there is some flexibility about where computation is done. If you can get the app servers to do more computation that is currently done on the db and let the db focus on ACID issues, that would be good. But no one but you can know if that is possible or feasible.
My first stop would be to use pg_stat_statements to see what SQL statements are taking the most time. Maybe just adding an index to the slowest/most frequent query would make the problem magically go away.

Ruby rack app on 1 CPU - how many processes should I run?

I'm quite new to Ruby web apps (coming from java).
I have VPS that has 1 CPU and 2GB of RAM and would like to play with some rails/sinatra stuff.
I'm using Ruby 2.1.0 MRI
How does number of CPUs maps to number of web server's processes I need to run? I use puma as a web server and have default threads (0,16) set up. But I noticed there is also "workers" option that forks another process to better handle multiple requests.
Do I understand correctly that for such setup (1 CPU) there is no point in running 2 web server processes? The only reasonable setup is 1 process with threads?
Oh now this is a pretty big question!
The number of processes and threads aren't necessarily linked to the number of CPU's. It's more a case of the amount of memory available, the amount of concurrent requests and the amount of 'locking' stuff that's going on.
If you're going to have long running requests that block other requests, then having additional processes can help with that. You can still have more than one process with a single CPU.
There are a number of different servers in Ruby that handle scaling in different ways, Unicorn, Puma, Thin are some of them. Doing a search on Unicorn vs Puma vs Thin can turn up some useful blog posts on the topic.
Here's a couple
http://ylan.segal-family.com/blog/2012/08/20/better-performance-on-heroku-thins-vs-unicorn-vs-puma/
https://www.engineyard.com/articles/rails-server
https://www.ruby-forum.com/topic/1822610
And some information on concurrency in Ruby
http://merbist.com/2011/02/22/concurrency-in-ruby-explained/
The TL:DR answer is, it depends!

Unicorn CPU usage spiking during load tests, ways to optimize

I am interested in ways to optimize my Unicorn setup for my Ruby on Rails 3.1.3 app. I'm currently spawning 14 worker processes on High-CPU Extra Large Instance since my application appears to be CPU bound during load tests. At about 20 requests per second replaying requests on a simulation load tests, all 8 cores on my instance get peaked out, and the box load spikes up to 7-8. Each unicorn instance is utilizing about 56-60% CPU.
I'm curious what are ways that I can optimize this? I'd like to be able to funnel more requests per second onto an instance of this size. Memory is completely fine as is all other I/O. CPU is getting tanked during my tests.
If you are CPU bound you want to use no more unicorn processes than you have cores, otherwise you overload the system and slow down the scheduler. You can test this on a dev box using ab. You will notice that 2 unicorns will outperform 20 (number depends on cores, but the concept will hold true).
The exception to this rule is if your IO bound. In which case add as many unicorns as memory can hold.
A good performance trick is to route IO bound requests to a different app server hosting many unicorns. For example, if you have a request that uses a slow sql query, or your waiting on an external request, such as a credit card transaction. If using nginx, define an upstream server for the IO bound requests, forward those urls to a box with 40 unicorns. CPU bound or really fast requests, forward to a box with 8 unicorns (you stated you have 8 cores, but on aws you might want to try 4-6 as their schedulers are hypervised and already very busy).
Also, I'm not sure you can count on aws giving you reliable CPU usage, as your getting a percentage of an obscure percentage.
First off, you probably don't want instances at 45-60% cpu. In that case, if you get a traffic spike, all of your instances will choke.
Next, 14 Unicorn instances seems large. Unicorn does not use threading. Rather, each process runs with a single thread. Unicorn's master process will only select a thread if it is able to handle it. Because of this, the number of cores isn't a metric you should use to measure performance with Unicorn.
A more conservative setup may use 4 or so Unicorn processes per instance, responding to maybe 5-8 requests per second. Then, adjust the number of instances until your CPU use is around 35%. This will ensure stability under the stressful '20 requests per second scenario.'
Lastly, you can get more gritty stats and details by using God.
For a high CPU extra large instance, 20 requests per second is very low. It is likely there is an issue with the code. A unicorn-specific problem seems less likely. If you are in doubt, you could try a different app server and confirm it still happens.
In this scenario, questions I'd be thinking about...
1 - Are you doing something CPU intensive in code--maybe something that should really be in the database. For example, if you are bringing back a large recordset and looping through it in ruby/rails to sort it or do some other operation, that would explain a CPU bottleneck at this level as opposed to within the database. The recommendation in this case is to revamp the query to do more and take the burden off of rails. For example, if you are sorting the result set in your controller, rather than through sql, that would cause an issue like this.
2 - Are you doing anything unusual compared to a vanilla crud app, like accessing a shared resource, or anything where contention could be an issue?
3 - Do you have any loops that might burn CPU, especially if there was contention for a resource?
4 - Try unhooking various parts of the controller logic in question. For example, how well does it scale if you hack your code to just return a static hello world response instead? I bet suddenly unicorn will be blazlingly fast. Then try adding back in parts of your code until you discover the source of the slowness.

Resources