calculate the number of simultaneous Passenger instances - ruby-on-rails

I need to find out if a server I have is capable of handling a number of traffic. I'm running ruby on rails with passenger and apache.
So let say on average a page takes 2 seconds to render and their will be 200k visitors in a day. The busiest hour will see 300 page views in a minute. From this how can I work out how many simultaneous Passenger instances I'll need to handle the expected load and then from that how much RAM I'll need to handle the required number of Passenger processes.
Hopefully this will tell me what server(s) I'll need and maybe a load balancer(s)?

The only way to know for sure is to simulate the load with a benchmarking tool. Memory usage is highly application specific, and can even depend on the areas of the application you're exercising, so if you can generate reasonable diversity in your test data you'll have a much better idea of how it scales.
For a rough start try the ab tool that comes with Apache. For something more complete, there are a number of simulation systems that will perform a series of events like logging in, viewing pages, and so on, like Selenium.

Related

Heroku: How expensive could running my application be?

I'm about to release an iOS app, and deploy its backend (rails backend that serves the iOS app) to heroku.
I have very little knowledge when it comes to the practical price you will pay based on traffic, etc. This link (http://notes.ericjiang.com/posts/881) states... Nowadays, especially with faster code and faster computers, a standard 512MB dyno can power websites with tens of thousands of hits per hour.
I'm trying to get a rough estimate of how much running my backend on heroku could cost me. What's the best way to figure this out? The pricing is all very straightforward. It basically just comes down to how many dyno's I'm going to need.
If I get 5 beta testers to run my iOS app for a 10 minute window, can I extrapolate some statistics as to how much my backend is being used? Is it the 'hits' that matter, or the 'data' transferred, or the 'time' the backend is actively doing something, like queueing up some resulting data?
Is there a formula to figure it out? Let's say say a user averages 10 hits per minute, and I constantly have an average of 5000 users. That would be 3 million hits per hour. What exactly should I be looking for in trying to determine an accurate pricing for my first backend?
While Heroku do have some limits surrounding bandwidth (not requests) for the most part your cost is close to fixed.
Monthly pricing is typically made up from a combination of:
Dynos
Databases
Addons
Premium support
Heroku provide a price calculator on their website. Further, standard (non-hobby) dynos and up include metrics around CPU usage and memory usage.
My suggestion if you're just starting out? Start with one web dyno and a Postgres database. Beta test your app and check your metrics. A Rails app on a single Standard 1X dyno can handle a reasonable amount of traffic (depending on what else it might be doing) and if you need to add more dynos it's only a command line interface away.
Hope that helps.

Ruby on Rails hosting - Engineyard vs Enterprise-Rails

I've been running a Rails app on 1 big dedicated server. Now for scaling I want to switch to a cloud service hoster and serve the app on 3 instances - App, DB and Redis.
I have really bad experience with Heroku performance wise and hence cost efficiency. So for me 2 Alternatives remain: Engineyard and Enterprise-Rails.
What I find important is that Engineyard doesn't offer an autoscaling option to handle peaks. On the other hand Enterprise-Rails doesn't have too much of documentation, most of it is handled by a support crew which is setting up everything.
What are other differences and what should I use for my website? I don't need much of administration work and I am not experienced with it. Basically I just want my Site to run optimally safe, stable and cost efficient without much personal work involved.
I am running a massive Rails app off AWS at this time and I'm really happy with it. Previously I had a number of dedicated boxes that were always causing problems - sooner or later one of them would crash for some reason, Raid failures, database problems whatnot.
At AWS I use RDS for database, elastic cache for caching, I keep all my code on a fat instance that acts as staging server and get a variable number of reserved instances to load the code via NFS.
I also use autoscaling - we've prepaid for a number of reserved instances and autoscaling helps starting up nodes when CPU usage goes above 60%, then removing them when it goes below 25%. autoscaling rules are based on cloudwatch alerts that can be set to monitor a particular group of instances, memcache servers, and so on, you even get e-mails and SMS notifications via SNS when certain scaling activities take place, say when more than 100 instances are spammed in less than 1 hour (massive traffic spike). The instances also get added right up to the load balancers by the way and you don't need to mess with the session store as you can use the sticky session feature which is quite nice.
Recently I also started using a 2nd launch group with spot instances, this complicated things a bit in terms of cloudwatch rules but I'm able to save a lot every month as spot prices are much lower. When the spot price (minimum) I bid is not enough, the set-up I have switches back to reserved instances.
Even more recently I've also started using CloudFront which got my app's page assets to load real fast (about 2 megs of CSS, JS, some icon sprites). Previously I was serving directly from instances via the load balancers.
This took about 20 hours to deploy, test and tune for maximum performance and availability.
One of the problems I have with AWS is that there's no support unless you're prepared to foot a bill. They claim some support is offered without a subscription but the only option in the support area is Billing. Ha. Fortunately it's all stable enough not to put me in a position where I'd have to pay for it.
Overall Rails fits in quite nice with AWS. I spend less than 2 hours per month doing maintenance, where I was spending over 30 previously. Most important for me is that I know that I can GTFO on a vacation for X months knowing nothing will cause any trouble - haven't had a monitoring alert more than a year.
Later edit: the app is a sports site with white labeling feature, lots of users, lots of administrators working on content in the back-end, database intensive as we show market pricing data that should update every few seconds. I had an average load time of about 3 seconds per page with dedicated servers that were doing about the same thing - database, memcache, storage, load balancing, web app. Now my average is under 1 second. Monthly bill is about 8 times lower now.
While Engine Yard doesn't offer auto-scaling (it is in the pipeline), we do have a fairly easy to use scaling feature that allows you to spin up multiple instances at once in times of need.
The advantages over something like Enterprise-Rails is the full documentation, the choice to deploy from the CLI or the dashboard,and our amazing support team. It's also easier to use Engine Yard and move from a personal machine or from another cloud setup than it is using a service such as AWS directly.

Scaling Puppet - when is too much for WEBrick?

I've found the following at Docs: Scaling Puppet:
Are you using the default webserver?
WEBrick, the default web server used to enable Puppet’s web services connectivity, is essentially a reference implementation, and becomes unreliable beyond about ten managed nodes. In any sort of production environment serving many nodes, you should switch to a more efficient web server implementation such as Passenger or Mongrel.
Where does the the number 10 come from in "ten managed nodes"?
I have a little over 20 nodes and I might soon have little over 30. Should I change to Passenger or not?
You should change to Passenger when you start having problems with WEBrick (or a little before). When that happens for you will depend on your workload.
The biggest problem with WEBrick is that it's single-threaded and blocking; once it's started working on a request, it cannot handle any other requests until it's done with the first one. Thus, what will make the difference to you is how much of the time Puppet spends processing requests.
Each time a client asks for its catalog, that's a request. Each separate file retrieved via puppet:/// URLs is also a request. If you're using Puppet lightly, each catalog won't take too long to generate, you won't be distributing many files on any given Puppet run, and each client won't be taking more than four to six seconds of server time every hour. If each client takes four seconds of server time per hour, 10 clients have a 5% chance of collisions0--of at least one client having to wait while another's request is processed. For 20 or 30 clients, those chances are 19% and 39%, respectively. As long as each request is short, you might be able to live with some contention, but the odds of collisions increase pretty quickly, so if you've got more than, say, 50 hosts (75% collision chance) you really ought to by using Passenger unless you're doing active performance measuring that shows that you're doing okay.
If, however, you're working your Puppet master harder--taking longer to generate catalogs, serving lots of files, serving large files, or whatever--you need to switch to Passenger sooner. I inherited a set of about thirty hosts with a WEBrick Puppet master where things were doing okay, but when I started deploying new systems, all of the Puppet traffic caused by a fresh deployment (including a couple of gigabyte files1) was preventing other hosts from getting their updates, so that's when I was forced to switch to Passenger.
In short, you'll probably be okay with 30 nodes if you're not doing anything too intense with Puppet, but at that point you need to be monitoring the performance of at least your Puppet master and preferably your clients' update status, too, so you'll know when you start running beyond the capabilities of WEBrick.
0 This is a standard birthday paradox calculation; if n is the number of clients and s is the average number of seconds of server time each client uses per hour, then the chance of having at least one collision during an hour is given by 1-(s/3600)!/((s/3600)^n*((s/3600)-n)!).
1 Puppet isn't really a good avenue for distributing files of this size in any case. I eventually switched to putting them on an NFS share that all of the hosts had access to.
For 20-30 nodes, there shouldn't be any problem. Note that passenger provides some additional features. It may be faster serving the nodes, but I am not sure how much improvement you will get if you have only 30 nodes.
You should change to passenger if you are using more than hundred nodes. I started seeing problems when the number of nodes requesting service from the puppet-master reached about 200. In my case, with the default web-server, about 5% of the nodes (random) couldn't receive the catalog during hourly run.

High traffic volume in short amount of time

I'm looking for some advice here. My school's student section registration process is online and involves around 6,000 students
They base seating off first come first serve basis. Every year they open the site at noon and floods of people get on a try to register as fast as possible to get good seats. Every year without fail the server crashes and everyone is mad.
After several years of being frustrated myself, I've offered to redo their registration system.
My plan is to rewrite it in ruby on rails, and use heroku for hosting.
Does a heroku dyno only handle one request at a time?
Heroku scales up to 50 dynos. Will that be enough to handle around 6,000 users with about 5 pageviews per transaction in a short amount of time, say a half hour?
Any helpful strategies or tips you can give me before I dive into this project?
Does a heroku dyno only handle one request at a time?
Yes. Heroku dynos are single threaded.
Heroku scales up to 50 dynos. Will that be enough to handle around
6,000 users with about 5 pageviews per transaction in a short amount
of time, say a half hour?
This depends on how fast your page loads. For arguments sake let's pretend it takes 2s per page request (as per Google Analytics recommendation) and you need to load 6,000 users x 5 page views / 30 minutes - 1000 page views per minute.
At 2s per page load, one single dyno would load 30 page views per minute. At 50 dynos, this would be 1500 page views per minute. This would obviously allow you to exceed your overall goal and leave you some room for error, but if all 6000 users are hitting the page at once then a single Heroku app may not be able to keep up depending on your timeout. You would need to implement a user queue system - explained below.
Any helpful strategies or tips you can give me before I dive into this
project?
All that said, a 2s load time may vary depending on the assets your page needs to load, the amount it needs to interact with a database, it's queries, caching, etc. Your page can also potentially serve much faster.
You also need to worry about the initial hit of all the users. This could be taken care of via a first come first serve queue system - similar to that used by Ticketmaster if you've ever used their site. This could be accomplished via AWS SQS or your preference of queue system.
With a user queue and caching of your assets and common database queries, you should be able to accomplish this with 50 or less dynos.
EDIT: I'm taking your word for it that Heroku will run 50 web dynos. They show 24 as max on their pricing page, but I cannot find any info one way or another.
Does a heroku dyno only handle one request at a time?
It depends on web server you use (https://devcenter.heroku.com/articles/dynos#dynos-and-requests). If you want more concurrency within a dyno, I'd suggest taking a look at something like Puma.
Heroku scales up to 50 dynos. Will that be enough to handle around 6,000 users with about 5 pageviews per transaction in a short amount of time, say a half hour?
Any helpful strategies or tips you can give me before I dive into this project?
You can have more than 50 dynos. A specific answer for you app is going to be way better that a guess or generalization. Run a load test against your site (e.g., using Blitz) and find out the real numbers. Costs for add-ons are pro-rated per second, so you only pay for the period you have it installed. So make sure you uninstall or downgrade Blitz once you've finished your test.

Unicorn CPU usage spiking during load tests, ways to optimize

I am interested in ways to optimize my Unicorn setup for my Ruby on Rails 3.1.3 app. I'm currently spawning 14 worker processes on High-CPU Extra Large Instance since my application appears to be CPU bound during load tests. At about 20 requests per second replaying requests on a simulation load tests, all 8 cores on my instance get peaked out, and the box load spikes up to 7-8. Each unicorn instance is utilizing about 56-60% CPU.
I'm curious what are ways that I can optimize this? I'd like to be able to funnel more requests per second onto an instance of this size. Memory is completely fine as is all other I/O. CPU is getting tanked during my tests.
If you are CPU bound you want to use no more unicorn processes than you have cores, otherwise you overload the system and slow down the scheduler. You can test this on a dev box using ab. You will notice that 2 unicorns will outperform 20 (number depends on cores, but the concept will hold true).
The exception to this rule is if your IO bound. In which case add as many unicorns as memory can hold.
A good performance trick is to route IO bound requests to a different app server hosting many unicorns. For example, if you have a request that uses a slow sql query, or your waiting on an external request, such as a credit card transaction. If using nginx, define an upstream server for the IO bound requests, forward those urls to a box with 40 unicorns. CPU bound or really fast requests, forward to a box with 8 unicorns (you stated you have 8 cores, but on aws you might want to try 4-6 as their schedulers are hypervised and already very busy).
Also, I'm not sure you can count on aws giving you reliable CPU usage, as your getting a percentage of an obscure percentage.
First off, you probably don't want instances at 45-60% cpu. In that case, if you get a traffic spike, all of your instances will choke.
Next, 14 Unicorn instances seems large. Unicorn does not use threading. Rather, each process runs with a single thread. Unicorn's master process will only select a thread if it is able to handle it. Because of this, the number of cores isn't a metric you should use to measure performance with Unicorn.
A more conservative setup may use 4 or so Unicorn processes per instance, responding to maybe 5-8 requests per second. Then, adjust the number of instances until your CPU use is around 35%. This will ensure stability under the stressful '20 requests per second scenario.'
Lastly, you can get more gritty stats and details by using God.
For a high CPU extra large instance, 20 requests per second is very low. It is likely there is an issue with the code. A unicorn-specific problem seems less likely. If you are in doubt, you could try a different app server and confirm it still happens.
In this scenario, questions I'd be thinking about...
1 - Are you doing something CPU intensive in code--maybe something that should really be in the database. For example, if you are bringing back a large recordset and looping through it in ruby/rails to sort it or do some other operation, that would explain a CPU bottleneck at this level as opposed to within the database. The recommendation in this case is to revamp the query to do more and take the burden off of rails. For example, if you are sorting the result set in your controller, rather than through sql, that would cause an issue like this.
2 - Are you doing anything unusual compared to a vanilla crud app, like accessing a shared resource, or anything where contention could be an issue?
3 - Do you have any loops that might burn CPU, especially if there was contention for a resource?
4 - Try unhooking various parts of the controller logic in question. For example, how well does it scale if you hack your code to just return a static hello world response instead? I bet suddenly unicorn will be blazlingly fast. Then try adding back in parts of your code until you discover the source of the slowness.

Resources