How to do a rolling restart of a cluster of mongrels - ruby-on-rails

Anybody know a nice way to restart a mongrel cluster via capistrano in a "rolling" style, eg, one mongrel at a time. Would be great to have a bit of wait time in there as well for each, to let the mongrel load the rails app up as well.
I've done some searching, and haven't found too much, so looking for help before I dive into the mongrel_cluster gem myself.
Thanks!

I agree with the seesaw approach more than the rolling approach you are seeking. The problem is that you end up in situations where load balancing can throw users back and forth between different versions of the application while you are transitioning.
The solutions we came up with (before finding SeeSaw, which we don't use) was to take half of the mongrels off line from the load balancer. Shut them down. Update them. Start them up. Put those mongrels back online in the load balancer and take the other half off. Shut the second half down. Update the second half. Start them up. This greatly minimizes the time where you have two different versions of the application running simultaneously.
I wrote a windows bat file to do this. (Deploying on Windows is not recommended, btw)
It is very important to note that having database migrations can make the whole approach a little dangerous. If you have only additive migrations, you can run those at any time before the deployment. If you are removing columns, you need to do it after the deployment. If you are renaming columns, it is better to split it into a create a new column and copy data into it migration to run before deployment and a separate script to remove the old column after deployment. In fact, it may be dangerous to use your regular migrations on a production database in general if you don't make a specific effort to organize them. All of this points to making more frequent deliveries so each update is lower risk and less complex, but that's a subject for another response.

Seesaw is a gem found in the Rails Oceania Rubyforge Project that provides this kind of functionality to mongrel clusters. However, the project may be suffering from some bit-rot not havain had a release since 2007. Still worth a look even just to pinch the ideas :)

#!/bin/bash
for PIDFILE in /tmp/mongrel.*; do
PID=$(cat ${PIDFILE})
kill ${PID}
${RUN_MONGREL_CMD} ${PID}
sleep 2
done

Related

Zero downtime on Heroku

Is it possible to do something like the Github zero downtime deploy on Heroku using Unicorn on the Cedar stack?
I'm not entirely sure how the restart works on Heroku and what control we have over restarting processes, but I like the possibility of zero downtime deploys and up until now, from what I've read, it's not possible
There are a few things that would be required for this to work.
First off, we'd need backwards compatible migrations. I leave that up to our team to figure out.
Secondly, we'd want to migrate the db right after a push, but before the restart (assuming our migrations are fully backwards compatible, this should not affect anything)
Thirdly, we'd want to instruct Unicorn to launch a new master process and fork some workers, then swap the PIDs and gracefully shut down the old process/workers
I've scoured the docs but I can't find anything that would indicate this is possible on Heroku. Any thoughts?
I can't address migrations, but the part about restarting processes and avoiding wait time:
There is an beta feature for heroku called preboot. After a deploy, it boots your new dynos first and waits a while before switching traffic and killing the old ones:
https://devcenter.heroku.com/articles/labs-preboot/
I also wrote a blog post that has some measurements on my app's performance improvements using this feature:
http://ylan.segal-family.com/blog/2012/08/27/deploy-to-heroku-with-near-zero-downtime/
You might be interested in their feature called preboot.
Taken from their documentation:
This feature provides seamless deploys by booting web dynos with new code before killing existing web dynos.
Some apps take a long time to boot up, and this can cause unacceptable delays in serving HTTP requests during deployment.
There are a few caveats:
You must have at least two web dynos to use this feature. If you have your web process type scaled to 1 or 0, preboot will be disabled.
Whoever is doing the deployment will have to wait a few minutes before the new code starts serving user requests; this happens later than it would without preboot (but in the meanwhile, user requests are still served promptly by old dynos).
There will be a short period (a minute or two) where heroku ps shows the status of the new code, but user requests are still being served by old code.
There is much more information about it, so refer to their documentation.
It is possible, but requires a fair amount of forward planning. As of Rails 3.1 there's three tasks that need carrying out
Upload the new code
Run any database migrations
Sync the assets
Uploading code and restarting is fairly straightforward, the main problem lies with the other two, but the way round them is the pretty much the same.
Essentially you need to:
Make the code compatible with the migration you need to run
Run the migration, and remove any code written specifically for it
For instance, if you want to remove a column, you’ll need to deploy a patch telling ActiveRecord to ignore it first. Only then you can deploy the migration, and clean up that patch.
In short, you need to consider your database and the code compatability an work around them so that the two can overlap in terms of versioning.
An alternative to this method might be to have two versions of the application running on Heroku at the same time. When you deploy, switch the domain to the other version, do the deploy, and switch it back again. This will help in most instances, but again, database compat is an issue.
Personally, I would say that if your deployments are significant to require this sort of consideration, taking parts of the application offline are probably the safest answer. By breaking up an application into several smaller applications can help mitigate this and is a mechanism that I use regularly.
No - this is currently not possible using Unicorn on Heroku cedar. I've been bugging Heroku about this for weeks.
Here was Heroku Support's reply to my email on March 8, 2012:
Hi, you could enable maintenance mode when doing a deploy, at least your users would see a maintenance page instead of an error, and also request queue wouldn't build up.
We're definitely aware this is a pain and we're working to offer rolling / zero-downtime deploys in the future. We have no ETA to announce, though.

Is it correct that there are particular times when you may need to restart Webrick to see your changes?

I heard Kevin Skoglund (lynda.com) say that it is good practice to get in the habit of restarting Webrick frequently during development. Although generally you do not need to restart Webrick to see your changes, he implied that there are particular times when this may be needed? Does anyone know what those circumstances might be? This made wonder if Webrick is kind of flaky.
If you are working through the Lynda.com tutorials, then you are working with a much earlier version of Rails then the most recent release (2.3.2).
The short answer is, large amounts of restarts are no longer necessary when working in the development environment. I think Kevin has you restart the server every time you change a Model object, but that isn't the case anymore.
The general rule of thumb is: restart every time you change something in the config or lib folder . . . any other code changes shouldn't necessitate a restart. It is also a good idea to restart when you change your routes.rb file as well, although when working with it today I noticed it is not a hard and fast rule.
The reason for all of the server restarts isn't necessarily because your webserver (webrick, mongrel, phusion passenger) is flaky, but because when your Rails app has started up, there are certain things loaded into memory, load paths, initializers, environment data. When you make a change to one of these files, you want to restart your application so that the changes take place (as opposed to the old stuff that is still running in memory)
You'll need to restart if you change your database schema, or if you add/change a constant.
I think Rails uses Mongrel by default for development now, but those still apply.

Thin + Nginx Production ready combination for RubyOnRails Application

I have recently installed Nginx + Thin on my deployment server, but i am not sure how this will perform in last requests & responses situation. lets say 1000/req per sec.
so the speed on thin is good with 10-100 req /per sec
I wanted to know on higher volumes of data being processed on the request/response cluster.
Guide me on this :-)
Multiple thin processes and nginx are capable of providing lots of speed, depending on what your application is doing. So, the problem will be your application code, the speed of your application server, and your database server.
Scaling Rails has been recently covered in depth by the Scaling Rails Screencasts. I recommend you start there. My 5 step program to scaling Rails would be:
First step is to have the tools to look at what is slow in your application. Do not spend time optimizing everything in your application when you don't know what the problem is.
The easiest way to be able to handle lots of requests/second is with page caching.
If you can't do that, cache everything possible (fragment caching, use memcached to cache data, etc), to speed up your application.
After that, optimize your application as best as possible, make SQL queries fast, index everything, etc.
If you still need more speed, throw more hardware at the problem. Get a big, powerful database server, a bunch of app servers, and proxy your requests across them. You can start here, too, but it will only delay the optimization process.
If you have a single server I think that the main key is, apart from everything already mentioned, is don't skimp on the specs of it. Trying to get too much to run on too little is just a recipe for disaster.
It is also a good idea to get monit or God monitoring your thin instances, I started out with God, but it leaked memory pretty bad on Ruby 1.8.6 so I stop using it in favour of monit. Monit is written in C I believe and has a tiny memory footprint so I'd recommend that one.
If all that seems like a bit much to keep nginx and thin playing nicely you may want to look into an all in one solution like Passenger or LiteSpeed. I have very little experience with these so can offer no substancial advice for them.

Mongrel hangs after several hours

I'm running into a problem in a Rails application.
After some hours, the application seems to start hanging, and I wasn't able to find where the problem was. There was nothing relevant in the log files, but when I tried to get the url from a browser nothing happened (like mongrel accept the request but wasn't able to respond).
What do you think I can test to understand where the problem is?
I might get voted down for dodging the question, but I recently moved from nginx + mongrel to mod_rails and have been really impressed. Moving to a much simpler setup will undoubtedly save me headaches in the future.
It was a really easy transition, I'd highly recommend it.
Are you sure the problem is caused by Mongrel? Have you tried running your application under WEBrick?
There are a few things you can check, but since you say there's nothing in the logs to indicate error, it sounds like you might be running into a bug when using the log rotation feature of the Logger class. It causes mongrel to lock up. Instead of relying on Logger to rotate your logs, consider using logrotate or some other external log rotation service.
Does this happen at a set number of hours/days every time? How much RAM do you have?
I had this same problem. The couple options I had narrowed it down to were MySQL adapter related. I was running on Red Hat Enterprise Linux 4 (or 5) and the app would hang after a given amount of idle time.
One suggested solution was to compile native MySQL bindings, I had been using the pure Ruby one.
The other was to set the timeout on the MySQL adapter higher than what the connection would idle out on. (I don't have the specific configuration recorded, but as I recall it was in environment.rb and it was some class variable in the mysql adapter.)
I don't recall if either of those solutions fixed it, we moved to Ubuntu shortly after that and hadn't had a problem since.
Check the Mongrel FAQ:
http://mongrel.rubyforge.org/wiki/FAQ
From my experience, mongrel hangs when:
the log file got too big (hundreds of megabytes in size). you have to setup log rotation.
the MySQL driver times out
you have to change the timeout settings of your MySQL driver by adding this to your environment.rb:
ActiveRecord::Base.verification_timeout = 14400
(this is further explained in the deployment section of the FAQ)
Unfortunately, Rails (and thereby Mongrel) using up too much memory over time and crashing is a known problem (50K+ Google entries for "Ruby, rails, crashing, memory"). The current ruby interpreter has the property that it sometimes simply fails entirely to give memory back to the system - it may reuse the memory it has but it won't give it up.
There are numerous schemes for monitoring, killing and restarting Mongrel instances in a production environment - for example: (choosing at random) rails monitor . Until the problem is fixed more decisively, one of these may be your best bet.
We have experienced this same issue. First off, install the mongrel_proctitle gem
http://github.com/rtomayko/mongrel_proctitle/tree/master
This gem/plugin will allow you to view the mongrel processes via "ps" and you can see if a Mongrel is hung. An issue we have seen with Mongrel is that it will happily accept connections and enqueue them, then wedge itself. This plugin will help you see when a Mongrel has been wedged but then you must use another monitoring app to actually restart a a wedged Mongrel, something like Monit or God
You might also want to consider putting a more balanced reverse proxy in front of your Mongrels, something HAproxy, instead of nginx, Apache or Lighttpd. With a setting of "maxconn 1" in HAproxy you can assure that the queue is being maintained by HAproxy versus Mongrel. The other reverse proxies (nginx, Apache, Lighttpd) only do round-robin which means that they can load up your Mongrel queue, inadvertently.
My personal choice is God as it is much more flexible.
tl;dr Install this gem plugin and keep an eye on your Mongrels. Try Apache+Phusion Passenger.

Are you using AWSDBProxy? Is there a performance hit when scaling out?

It seems that the only tutorials out there talking about using Amazon's SimpleDB in a rails site are using AWSDBProxy... Personally, I find this counter-intuitive to scaling out, considering the server layout of a typical Rails site below (using AWSDBProxy):
Plugin here: http://agilewebdevelopment.com/plugins/aws_sdb_proxy
Image here: http://www.freeimagehosting.net/uploads/91be4e0617.png
As you can see, even if we add more mongrels, we have two problems.
We have a single point of failure far less stable than our load balancer
We have to force all our information through this one WEBrick server
The solution is, of course, to add more AWSDBProxies... but why not then just use the following code in say, a class, skipping the proxy all together?
service = AwsSdb::Service.new(Logger.new(nil),
CONFIG['aws_access_key_id'],
CONFIG['aws_secret_access_key'])
service.query(domain, query)
So what I'm getting at, is if you are using AWSDBProxy, what are you justifications for it? And if you are indeed using it, what is your performance like? If you have hard numbers, this would be even more appreciated!
I'm not using it, nor have I ever heard of it, but this is what I would think are reasonable reasons.
You're running your main app server on EC2, so the chance of Internet FAIL doesn't really affect you more than once.
You run one proxy on each of your app servers. So it's connection going down is no worse than it's connection(s) to the database going down.
Because it can be done. This is as good a reason as any in an open source project. Sometimes it takes building a thing before you know whether said thing is a good/bad idea.
You don't have the traffic levels to need a load balancer. Then your diagram squashes down to a line, if not a single machine.

Resources