Increase reliability of capistrano deploys

Increase reliability of capistrano deploys - ruby-on-rails

I've used capistrano for a long time, but always for sites that weren't critical. If something went wrong, a few minutes of downtime weren't a big problem.
Now I'm working on a more critical service and need to cover my edge cases. One of which is if my local connection to a server becomes interrupted in the middle of a deployment.
One solution I can think of is to do deployments directly from the server, inside of a screen session. This seems like a reasonable and obvious solution, but I'm surprised I've never read about it elsewhere or even seen it recommended in the capistrano documentation.
Any pointers / tips are welcome. Thanks!

There is only a very small time window during a typical Capistrano deploy where a dropped connection would cause trouble. This window is when the current release is linked to the new version and the server is told to restart. If your connection drops before or after that, you're fine.
If you positively need to be safe from disconnects 100%, you can log onto the server, open a screen session and do a cap deploy from the latest release folder.

Related

Could performance issues imerge when using ActionCable in Production?

I'm planning to have a Rails App that has a very content rich interactive page where many users will connect to.
Development has went well and small time testing on the Dev servers went without a hitch either.
Problems started when we started alpha testing with selected groups of people. the sever would grind to a halt suddenly. Nginx would stop because of queue being full. I was at a lose for a while, but after looking around, came to the conclusion that the live actioncable was completely eating up my memory.This especially gets bad when the user reloads the page multiple times that subscribes to actioncable, causing additional process to become active, completely stopping the server, only being cured by a nginx reboot.
I currently run a 2core 1GB memory SSD run VPS server for alpha testing, perhaps at tops 20 concurrent users.Should I be running into performance problems with such load? or should tuning the code or redis, passenger fix this?
I know its hard to say any definitive things without more specifics, but could a ballpark estimate be done with the information?

After some gogoling and testing Nginx settings, adding this directive to the nginx settings for passenger has seemed to dramatically improve the performance issue.
location /special_websocket_endpoint {
passenger_app_group_name foo_websocket;
passenger_force_max_concurrent_requests_per_process 0;
}
more info here
https://www.phusionpassenger.com/library/config/nginx/tuning_sse_and_websockets/

20 concurrent users plus multiple tabs per user is still less than about 100 concurrent websocket connections, it is not that lot.
First thing I'd look for is leaks - when for some reason websocket connection or other resources (open files etc.) does not get freed when actual user disconnects. Make sure you're running fresh versions of rails/passenger, as there was a bug in rails causing similar behaviour (see https://blog.phusion.nl/2016/07/07/actioncable-under-stress-p1/ for details)
Also while actioncable+passenger inside nginx allows you to run everything inside single process, it is not a good idea when you expect some load.
When running a clean nginx and separate rails servers for regular requests and cable - at least other parts of the app will continue some kind of working in such conditions.

After hitting "Power Cycle" on DigitalOcean, the app doesn't work

I needed to create an image of the current droplet on DigitalOcean, so I powered off the droplet from dashboard, created the new image of the current droplet and then wanted to turn on the droplet again. So I hit "Power Cycle".
The problem is that after 10 minutes, the app is still not up. What happened and, how can I fix it?
Thanks

It sounds like the application may not be configured to start on boot. The power cycle feature at DO is basically just a gentle reboot (it tries to wait for things to shut down instead of deploying more harsh methods to stop the processes). Does the application start up if you SSH in and start it manually? If so, next check on what starts it automatically at boot. It might have encountered an error (environment errors are common) and should have reported somewhere (syslog, application log, etc.).
Just to help you out moving forward, DO supports live snapshots now, so you should be able to snapshot your droplet without powering it off. :)

Zero downtime on Heroku

Is it possible to do something like the Github zero downtime deploy on Heroku using Unicorn on the Cedar stack?
I'm not entirely sure how the restart works on Heroku and what control we have over restarting processes, but I like the possibility of zero downtime deploys and up until now, from what I've read, it's not possible
There are a few things that would be required for this to work.
First off, we'd need backwards compatible migrations. I leave that up to our team to figure out.
Secondly, we'd want to migrate the db right after a push, but before the restart (assuming our migrations are fully backwards compatible, this should not affect anything)
Thirdly, we'd want to instruct Unicorn to launch a new master process and fork some workers, then swap the PIDs and gracefully shut down the old process/workers
I've scoured the docs but I can't find anything that would indicate this is possible on Heroku. Any thoughts?

I can't address migrations, but the part about restarting processes and avoiding wait time:
There is an beta feature for heroku called preboot. After a deploy, it boots your new dynos first and waits a while before switching traffic and killing the old ones:
https://devcenter.heroku.com/articles/labs-preboot/
I also wrote a blog post that has some measurements on my app's performance improvements using this feature:
http://ylan.segal-family.com/blog/2012/08/27/deploy-to-heroku-with-near-zero-downtime/

You might be interested in their feature called preboot.
Taken from their documentation:
This feature provides seamless deploys by booting web dynos with new code before killing existing web dynos.
Some apps take a long time to boot up, and this can cause unacceptable delays in serving HTTP requests during deployment.
There are a few caveats:
You must have at least two web dynos to use this feature. If you have your web process type scaled to 1 or 0, preboot will be disabled.
Whoever is doing the deployment will have to wait a few minutes before the new code starts serving user requests; this happens later than it would without preboot (but in the meanwhile, user requests are still served promptly by old dynos).
There will be a short period (a minute or two) where heroku ps shows the status of the new code, but user requests are still being served by old code.
There is much more information about it, so refer to their documentation.

It is possible, but requires a fair amount of forward planning. As of Rails 3.1 there's three tasks that need carrying out
Upload the new code
Run any database migrations
Sync the assets
Uploading code and restarting is fairly straightforward, the main problem lies with the other two, but the way round them is the pretty much the same.
Essentially you need to:
Make the code compatible with the migration you need to run
Run the migration, and remove any code written specifically for it
For instance, if you want to remove a column, you’ll need to deploy a patch telling ActiveRecord to ignore it first. Only then you can deploy the migration, and clean up that patch.
In short, you need to consider your database and the code compatability an work around them so that the two can overlap in terms of versioning.
An alternative to this method might be to have two versions of the application running on Heroku at the same time. When you deploy, switch the domain to the other version, do the deploy, and switch it back again. This will help in most instances, but again, database compat is an issue.
Personally, I would say that if your deployments are significant to require this sort of consideration, taking parts of the application offline are probably the safest answer. By breaking up an application into several smaller applications can help mitigate this and is a mechanism that I use regularly.

No - this is currently not possible using Unicorn on Heroku cedar. I've been bugging Heroku about this for weeks.
Here was Heroku Support's reply to my email on March 8, 2012:
Hi, you could enable maintenance mode when doing a deploy, at least your users would see a maintenance page instead of an error, and also request queue wouldn't build up.
We're definitely aware this is a pain and we're working to offer rolling / zero-downtime deploys in the future. We have no ETA to announce, though.

Mongrel hangs after several hours

I'm running into a problem in a Rails application.
After some hours, the application seems to start hanging, and I wasn't able to find where the problem was. There was nothing relevant in the log files, but when I tried to get the url from a browser nothing happened (like mongrel accept the request but wasn't able to respond).
What do you think I can test to understand where the problem is?

I might get voted down for dodging the question, but I recently moved from nginx + mongrel to mod_rails and have been really impressed. Moving to a much simpler setup will undoubtedly save me headaches in the future.
It was a really easy transition, I'd highly recommend it.

Are you sure the problem is caused by Mongrel? Have you tried running your application under WEBrick?

There are a few things you can check, but since you say there's nothing in the logs to indicate error, it sounds like you might be running into a bug when using the log rotation feature of the Logger class. It causes mongrel to lock up. Instead of relying on Logger to rotate your logs, consider using logrotate or some other external log rotation service.

Does this happen at a set number of hours/days every time? How much RAM do you have?

I had this same problem. The couple options I had narrowed it down to were MySQL adapter related. I was running on Red Hat Enterprise Linux 4 (or 5) and the app would hang after a given amount of idle time.
One suggested solution was to compile native MySQL bindings, I had been using the pure Ruby one.
The other was to set the timeout on the MySQL adapter higher than what the connection would idle out on. (I don't have the specific configuration recorded, but as I recall it was in environment.rb and it was some class variable in the mysql adapter.)
I don't recall if either of those solutions fixed it, we moved to Ubuntu shortly after that and hadn't had a problem since.

Check the Mongrel FAQ:
http://mongrel.rubyforge.org/wiki/FAQ
From my experience, mongrel hangs when:
the log file got too big (hundreds of megabytes in size). you have to setup log rotation.
the MySQL driver times out
you have to change the timeout settings of your MySQL driver by adding this to your environment.rb:
ActiveRecord::Base.verification_timeout = 14400
(this is further explained in the deployment section of the FAQ)

Unfortunately, Rails (and thereby Mongrel) using up too much memory over time and crashing is a known problem (50K+ Google entries for "Ruby, rails, crashing, memory"). The current ruby interpreter has the property that it sometimes simply fails entirely to give memory back to the system - it may reuse the memory it has but it won't give it up.
There are numerous schemes for monitoring, killing and restarting Mongrel instances in a production environment - for example: (choosing at random) rails monitor . Until the problem is fixed more decisively, one of these may be your best bet.

We have experienced this same issue. First off, install the mongrel_proctitle gem
http://github.com/rtomayko/mongrel_proctitle/tree/master
This gem/plugin will allow you to view the mongrel processes via "ps" and you can see if a Mongrel is hung. An issue we have seen with Mongrel is that it will happily accept connections and enqueue them, then wedge itself. This plugin will help you see when a Mongrel has been wedged but then you must use another monitoring app to actually restart a a wedged Mongrel, something like Monit or God
You might also want to consider putting a more balanced reverse proxy in front of your Mongrels, something HAproxy, instead of nginx, Apache or Lighttpd. With a setting of "maxconn 1" in HAproxy you can assure that the queue is being maintained by HAproxy versus Mongrel. The other reverse proxies (nginx, Apache, Lighttpd) only do round-robin which means that they can load up your Mongrel queue, inadvertently.
My personal choice is God as it is much more flexible.
tl;dr Install this gem plugin and keep an eye on your Mongrels. Try Apache+Phusion Passenger.

How can I find out why my app is slow?

I have a simple Rails app deployed on a 500 MB Slicehost VPN. I'm the only one who uses the app. When I run it on my laptop, it's fast enough. But the deployed version is insanely slow. It take 6 to 10 seconds to load the login screen.
I would like to find out why it's so slow. Is it my code? (Don't think so because it's much faster locally, but maybe.) Is it Slicehost's server being overloaded? Is it the Internet?
Can someone suggest a technique or set of steps I can take to help narrow down the cause of this problem?
Update:
Sorry forgot to mention. I'm running it under CentOS 5 using Phusion Passenger (AKA mod_rails or mod_rack).

If it is just slow on the first time you load it is probably because of passenger killing the process due to inactivity. I don't remember all the details but I do recall reading people who used cron jobs to keep at least one process alive to avoid this lag that can occur with passenger needed to reload the environment.
Edit: more details here
Specifically - pool idle time defaults to 2 minutes which means after two minutes of idling passenger would have to reload the environment to serve the next request.

First, find out if there's a particularly slow response from the server. Use Firefox and the Firebug plugin to see how long each component (including JavaScript and graphics) takes to download. Assuming the main page itself is what is taking all the time, you can start profiling the application. You'll need to find a good profiler, and as I don't actually work in Ruby on Rails, I can't suggest any: google "profile ruby on rails" for some options.
As YenTheFirst points out, the server software and config you're using may contribute to a slowdown, but A) slicehost doesn't choose that, you do, as Slicehost just provides very raw server "slices" that you can treat as dedicated machines. B) you're unlikely to see a script that runs instantly suddenly take 6 seconds just because it's running as CGI. Something else must be going on. Check how much RAM you're using: have you gone into swap? Is the login slow only the first time it's hit indicating some startup issue, or is it always that slow? Is static content served slow? That'd tend to mean some network issue (either on the Slicehost side, or your local network) is slowing things down, assuming you're not in swap.
When you say "fast enough" you're being vague: does the laptop version take 1 second to the Slicehost 6? That wouldn't be entirely surprising, if the laptop is decent: after all, the reason slices are cheap is because they're a fraction of a full server. You're using probably 1/32 of an 8 core machine at Slicehost, as opposed to both cores of a modern laptop. The Slicehost cores are quick, but your laptop could be a screamer compared to 1/4 of core. :)

Try to pint point where the slowness lies
1/ application is slow, or infrastructure (network + web server)
put a static file on your web server, and access it through your browser
2/ If it is fast, it is probable a problem with application + server configuration.
database access is slow
try a page with a simpel loop: is it slow?
3/ If it slow, it is probably your infrastructure. You can check:
bad network connection: do a packet capture (with Wireshark for example) and look for retransmissions, duplicate packets, etc.
DNS resolution is slow?
server is misconfigured?
etc.

What is Slicehost using to serve it?
Fast options are things like: Mongrel, or apache's mod_rails (also called passenger phusion or
something like that)
These are dedicated servers (or plugins to servers) which run an instance of your rails app.
If your host isn't using that, then it's probably defaulting to CGI. Rails comes with a simple CGI script that will serve the page, but it reloads the app for every page.
(edit: I suspect that this is the most likely case, that your app is running off of the CGI in /webapp_directory/public/dispatch.cgi, which would explain the slowness. This tends to be a default deployment on many hosts, since it doesn't require extra configuration on their part, but it doesn't give good performance)
If your host supports "Fast CGI", rails supports that too. Fast CGI will open a CGI session, and keep it open for multiple pages, so you get much better performance, but it's not nearly as good as Mongrel or mod_rails.
Secondly, is it in 'production' or 'development' mode? The easy way to tell is to go to a page in your app that gives an error. If it shows you a stack trace, it's in development mode, which is slower than production mode. Mongrel and mod_rails have startup options to determine whether to run the app in production or development mode.
Finally, if your database is slow for whatever reason, that will be a big bottleneck as well. If you do have a good deployment (Mongrel/mod_rails/etc.) in production mode, try looking into that.

Do you have a lot of data in your DB? I would double check that you have indexed all the appropriate columns- because this can make a huge difference. On your local dev system, you probably have a lot more memory than on your 500 mb slice, which would result in the DB running a lot slower if you have big, un indexed tables. You can also run the slow queries logger in MySql to pinpoint columns without indexes.
Other than that, yes- passenger will need to spool up a process for you if you have not been using the site recently. If this is the case, you should see a significant speed increase on second, and especially third and later page loads.

You might want to run a local virtual machine with 500 MB. Are you doing a lot of client-server interaction? Delays over the WAN are significant

You might want to check out RPM (there's a free "lite" version too) and/or New Relic's Tune Up.

Your CPU time is guaranteed by Slicehost using the Xen virtualization system, so it's not that. Don't have the other answers for you, sorry! Might try 'top' on a console while you're trying to access the page.

If you are using FireFox and doing localhost testing (or maybe even on LAN) you may want to try editing the network.dns.disableIPv6 setting.
Type about:config in the address bar and filter for network.dns.disableIPv6 and double-click to set to true.
This bug has been reported mainly from Vista OS's, but some others as well.

You could try running 'top' when you SSH in to see which process is heavy. If you also have problems logging you, perhaps you may try getting Statistics in the Slicehost manager.
If you discover it is MySQL's fault, consider decreasing the number of servers it can spawn.
512 seems decent for Rails application, you might have to check if you misconfigured too.

Categories

HOME

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart