High traffic rails perf tuning - ruby-on-rails

I was attempting to evaluate various Rails server solutions. First on my list was an nginx + passenger system. I spun up an EC2 instance with 8 gigs of RAM and 2 processors, installed nginx and passenger, and added this to the nginx.conf file:
passenger_max_pool_size 30;
passenger_pool_idle_time 0;
rails_framework_spawner_idle_time 0;
rails_app_spawner_idle_time 0;
rails_spawn_method smart;
I added a little "awesome" controller to rails that would just render :text => (2+2).to_s
Then I spun up a little box and ran this to test it:
ab -n 5000 -c 5 'http://server/awesome'
And the CPU while this was running on the box looked pretty much like this:
05:29:12 PM CPU %user %nice %system %iowait %steal %idle
05:29:36 PM all 62.39 0.00 10.79 0.04 21.28 5.50
And I'm noticing that it takes only 7-10 simultaneous requests to bring the CPU to <1% idle, which of course begins to seriously drag down response times.
So I'm wondering, is a lot of CPU load just the cost of doing business with Rails? Can it only serve a half dozen or so super-cheap requests simultaneously, even with a giant pile of RAM and a couple of cores? Are there any great perf suggestions to get me serving 15-30 simultaneous requests?
Update: I tried the same thing on one of the "super mega lots and lots of CPUs" EC2 instance types. Holy crap was that a lot of CPU power. The sweet spot seemed to be about 2 simultaneous requests per CPU; I was able to get it up to about 630 requests/second at 16 simultaneous requests. I don't know whether that's actually more cost-efficient than a lot of little boxes, though.

I must say that my Rails app got a massive boost, from about 20 concurrent users initially to about 80, after adding some memcached servers (4 mediums on EC2). I run a high-traffic sports site that really hit it a few months ago. Database size is about 6 GB with heavy updates/inserts.
MySQL (RDS large high usage) cache also helped a bit.
I've tried playing with the Passenger settings but got some curious results - for example, each Passenger process eats up 250 MB of RAM, which is odd considering the application isn't that big.
You can also save massive $ by using spot instances, but don't rely entirely on them - their pricing seems to spike on occasion. I'd AutoScale with two policies - one with spot instances and another with on-demand (or reserved) instances.

Related

Rails application servers

I've been reading about how different Rails application servers work for a while, and a few things have me confused, probably because of my lack of knowledge in this field. Anyway, here's what confuses me:
Puma server has the following line about its clustered mode workers number in its readme:
On a ruby implementation that offers native threads, you should tune this number to match the number of cores available
So if I have, let's say, 2 cores and use Rubinius as the Ruby implementation, should I still use more than 1 process, considering that Rubinius uses native threads and has no global interpreter lock, so it can use all the CPU cores even with 1 process?
My understanding is that I'd only need to increase the thread pool of that single process if I upgraded to a machine with more cores and memory; if that's not correct, please explain it to me.
I've read some articles on using Server-Sent Events with Puma. As far as I understand, an open SSE connection blocks a Puma thread since the browser keeps the connection open, so if I have 16 threads and 16 people are using my site, the 17th would have to wait until one of those 16 disconnects before they could connect? That's not very efficient, is it? Or what am I missing?
If I have a 1-core machine with 3 GB of RAM (just for the sake of the question) and use Unicorn as my application server, where 1 worker takes 300 MB of memory and its CPU usage is insignificant, how many workers should I have? Some say the number of workers should equal the number of cores, but if I set the worker count to, let's say, 7 (since I have enough RAM for it), it should be able to handle 7 concurrent requests, shouldn't it? So is it just a question of CPU usage and the amount of RAM? Or what am I missing?
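For the memory-bound side of that last question, the arithmetic can be sketched in a few lines (this is my own sketch, not from any server's docs; the 512 MB of OS headroom is an assumption I've added, not a figure from the question):

```ruby
# Rough Unicorn worker sizing for a memory-bound app: leave some RAM
# for the OS, then fit as many workers as the remainder allows.
# CPU-bound apps gain little beyond ~1 worker per core, but workers
# that spend their time waiting on I/O can usefully exceed the core count.
def max_workers_by_ram(total_ram_mb, per_worker_mb, os_headroom_mb = 512)
  (total_ram_mb - os_headroom_mb) / per_worker_mb
end

max_workers_by_ram(3072, 300)  # the 1-core / 3 GB box above => 8
```

So yes, RAM caps the worker count; whether those 7-8 workers actually serve 7-8 requests concurrently on 1 core depends on how much of each request is spent waiting rather than computing.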

Scaling Puppet - when is too much for WEBrick?

I've found the following at Docs: Scaling Puppet:
Are you using the default webserver?
WEBrick, the default web server used to enable Puppet’s web services connectivity, is essentially a reference implementation, and becomes unreliable beyond about ten managed nodes. In any sort of production environment serving many nodes, you should switch to a more efficient web server implementation such as Passenger or Mongrel.
Where does the number 10 come from in "ten managed nodes"?
I have a little over 20 nodes and I might soon have a little over 30. Should I change to Passenger or not?
You should change to Passenger when you start having problems with WEBrick (or a little before). When that happens for you will depend on your workload.
The biggest problem with WEBrick is that it's single-threaded and blocking; once it's started working on a request, it cannot handle any other requests until it's done with the first one. Thus, what will make the difference to you is how much of the time Puppet spends processing requests.
Each time a client asks for its catalog, that's a request. Each separate file retrieved via puppet:/// URLs is also a request. If you're using Puppet lightly, each catalog won't take too long to generate, you won't be distributing many files on any given Puppet run, and each client won't be taking more than four to six seconds of server time every hour. If each client takes four seconds of server time per hour, 10 clients have a 5% chance of collisions [0] -- of at least one client having to wait while another's request is processed. For 20 or 30 clients, those chances are 19% and 39%, respectively. As long as each request is short, you might be able to live with some contention, but the odds of collisions increase pretty quickly, so if you've got more than, say, 50 hosts (75% collision chance) you really ought to be using Passenger, unless you're doing active performance measuring that shows you're doing okay.
If, however, you're working your Puppet master harder -- taking longer to generate catalogs, serving lots of files, serving large files, or whatever -- you need to switch to Passenger sooner. I inherited a set of about thirty hosts with a WEBrick Puppet master where things were doing okay, but when I started deploying new systems, all of the Puppet traffic caused by a fresh deployment (including a couple of gigabyte files [1]) was preventing other hosts from getting their updates, so that's when I was forced to switch to Passenger.
In short, you'll probably be okay with 30 nodes if you're not doing anything too intense with Puppet, but at that point you need to be monitoring the performance of at least your Puppet master and preferably your clients' update status, too, so you'll know when you start running beyond the capabilities of WEBrick.
[0] This is a standard birthday-paradox calculation: if n is the number of clients and s is the average number of seconds of server time each client uses per hour, the hour contains m = 3600/s slots, and the chance of at least one collision during the hour is 1 - m!/(m^n * (m-n)!).
[1] Puppet isn't really a good avenue for distributing files of this size in any case. I eventually switched to putting them on an NFS share that all of the hosts had access to.
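The footnote's formula can be sanity-checked numerically; here's a small Ruby sketch of the same calculation (the function name is mine):

```ruby
# Chance that at least one of `clients` nodes collides with another,
# where each node occupies `seconds_per_client` seconds of a 3600-second
# hour at a random offset. Birthday-paradox approximation with
# m = 3600/s slots: P = 1 - m!/(m^n * (m-n)!), computed here as a
# running product to avoid huge factorials.
def collision_probability(clients, seconds_per_client)
  slots = 3600.0 / seconds_per_client
  p_clear = (0...clients).inject(1.0) { |p, k| p * (slots - k) / slots }
  1.0 - p_clear
end

collision_probability(10, 4)  # => ~0.049, the "5%" quoted for 10 clients
collision_probability(30, 4)  # => ~0.39
```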
For 20-30 nodes, there shouldn't be any problem. Note that Passenger provides some additional features. It may serve nodes faster, but I'm not sure how much improvement you'll see with only 30 nodes.
You should change to Passenger if you are using more than a hundred nodes. I started seeing problems when the number of nodes requesting service from the puppet-master reached about 200. In my case, with the default web server, about 5% of the nodes (at random) couldn't receive the catalog during the hourly run.

Apache+Passenger Optimization Questions?

I am working to optimize the speed of a Rails application we have in production.
It is built on Apache, Passenger, Rails 2.3.5, and Ubuntu on an m1.large EC2 instance (7.5 GB RAM, 4 Compute Units). We will be switching to nginx in the near term, but have some dependencies on Apache right now.
I'm running load tests against it with ab and httperf.
We are consistently seeing 45 requests/sec across 30-200 concurrent users for a simple API request that fetches a single record from an indexed database table with 100k records. That seems very slow to me.
We are focusing on both application code optimization as well as server configuration optimization.
I am currently focused on server configuration optimization. Here's what I've done so far:
Passenger: adjusted PassengerMaxPoolSize to (90% of total memory) / (memory per Passenger process, ~230 MB).
Apache: adjusted MaxClients to 256.
I made both changes independently and together, and saw no impact on req/sec.
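The pool-size formula above works out like this (a quick sketch; the function name is mine, and 7500 MB stands in for the m1.large's 7.5 GB):

```ruby
# PassengerMaxPoolSize heuristic from the question:
# (90% of total RAM) / (memory per Passenger process).
def passenger_max_pool_size(total_ram_mb, per_process_mb)
  ((0.9 * total_ram_mb) / per_process_mb).floor
end

passenger_max_pool_size(7500, 230)  # => 29 processes on the m1.large
```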
Another note: this seems to scale roughly linearly by adding servers, so it doesn't appear to be a database issue. 2 servers gave 90 req/sec, 3 servers around 120 req/sec, etc.
Any tips? It just seems like we should be getting better performance by adding more processes?

calculate the number of simultaneous Passenger instances

I need to find out whether a server I have can handle a given amount of traffic. I'm running Ruby on Rails with Passenger and Apache.
So let's say on average a page takes 2 seconds to render and there will be 200k visitors in a day. The busiest hour will see 300 page views in a minute. From this, how can I work out how many simultaneous Passenger instances I'll need to handle the expected load, and from that, how much RAM I'll need for the required number of Passenger processes?
Hopefully this will tell me what server(s) I'll need, and maybe load balancer(s)?
The only way to know for sure is to simulate the load with a benchmarking tool. Memory usage is highly application specific, and can even depend on the areas of the application you're exercising, so if you can generate reasonable diversity in your test data you'll have a much better idea of how it scales.
For a rough start try the ab tool that comes with Apache. For something more complete, there are a number of simulation systems that will perform a series of events like logging in, viewing pages, and so on, like Selenium.
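That said, for a first rough estimate before benchmarking, Little's law (concurrent requests in flight = arrival rate × average response time) gives a starting point. A sketch with the question's numbers (function names are mine; the 230 MB-per-process figure is an assumption borrowed from the Passenger memory numbers quoted elsewhere on this page, not from this question):

```ruby
# Little's law: average number of requests in flight =
# arrival rate (req/s) * average time each request takes (s).
def passenger_instances_needed(requests_per_minute, avg_response_seconds)
  rate_per_second = requests_per_minute / 60.0
  (rate_per_second * avg_response_seconds).ceil
end

def ram_needed_mb(instances, mb_per_process)
  instances * mb_per_process
end

instances = passenger_instances_needed(300, 2)  # peak minute from the question => 10
ram_needed_mb(instances, 230)                   # at ~230 MB per process => 2300 MB
```

This ignores queueing variance and traffic burstiness, so you'd provision headroom above the bare minimum; that is exactly why benchmarking with realistic traffic still matters.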

Is 100 or less requests per second (for non-cached pages) what one can expect with Rails?

Preface: Please don't start a discussion on premature optimization, or anything related. I'm just trying to understand what kind of performance I can get from a single server with rails.
I have been benchmarking ruby on rails 3 and it seems the highest rate of requests per second I can get is around 100 requests per second.
I used phusion passenger with nginx, and Ruby 1.8.7.
This is on a ec2 m1.large instance:
7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.large
The page was a very simple action that wrote a single row into mysql.
user = User.new
user.name = "test"
user.save
I am assuming no caching (memcache, etc), I just want to get a feel for the raw numbers.
I used apache bench on the same EC2 instance, varying the total number of requests (from 1000 to 10000) and the number of concurrent requests (1/5/10/25/50/100).
The EC2 m1.large instance is really not that fast, so these numbers aren't surprising. If you want performance, you can either spring for a larger instance, as there are now some with 80 ECUs, or try a different provider.
I've found that Linode generally offers higher performance for the same price, but is not as flexible and doesn't scale to very large numbers of servers as well. It's a much better fit if you're in the "under 20 servers" phase of rolling out.
Also, don't think that the MySQL write is a no-cost operation; each request here includes an INSERT.
