Capistrano not restarting Mongrel clusters properly - ruby-on-rails

I have a cluster of three mongrels running under nginx, and I deploy the app using Capistrano 2.4.3. When I run "cap deploy" against a running system, the behavior is:
The app is deployed. The code is successfully updated.
In the cap deploy output, there is this:
executing "sudo -p 'sudo password: '
mongrel_rails cluster::restart -C
/var/www/rails/myapp/current/config/mongrel_cluster.yml"
servers: ["myip"]
[myip] executing command
** [out :: myip] stopping port 9096
** [out :: myip] stopping port 9097
** [out :: myip] stopping port 9098
** [out :: myip] already started port 9096
** [out :: myip] already started port 9097
** [out :: myip] already started port 9098
I check immediately on the server and find that Mongrel is still running, and the PID files are still present for the previous three instances.
A short time later (less than one minute), I find that Mongrel is no longer running, the PID files are gone, and it has failed to restart.
If I start mongrel on the server by hand, the app starts up just fine.
It seems like 'mongrel_rails cluster::restart' isn't properly waiting for a full stop
before attempting a restart of the cluster. How do I diagnose and fix this issue?
EDIT: Here's the answer:
mongrel_cluster, in the "restart" task, simply does this:
def run
  stop
  start
end
It doesn't do any waiting or checking to see that the process exited before invoking "start". This is a known bug with an outstanding patch submitted. I applied the patch to Mongrel Cluster and the problem disappeared.
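For illustration, the fix conceptually just waits for the old processes to go away before starting new ones. This is only a sketch of the idea, not the actual patch, and the PID file location is an assumption (match it to your mongrel_cluster.yml):

def run
  stop
  # Poll the old PID files (assumed location) for up to 30 seconds before starting.
  pid_files = Dir.glob("/var/www/rails/myapp/shared/pids/mongrel.*.pid")
  30.times do
    break if pid_files.none? { |f| File.exist?(f) }
    sleep 1
  end
  start
end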

You can explicitly tell the mongrel_cluster recipes to remove the pid files before a start by adding the following in your capistrano recipes:
# helps keep mongrel pid files clean
set :mongrel_clean, true
This causes it to pass the --clean option to mongrel_cluster_ctl.
I went back and looked at one of my deployment recipes and noticed that I had also changed the way my restart task worked. Take a look at the following message in the mongrel users group:
mongrel users discussion of restart
The following is my deploy:restart task. I admit it's a bit of a hack.
namespace :deploy do
  desc "Restart the Mongrel processes on the app server."
  task :restart, :roles => :app do
    mongrel.cluster.stop
    sleep 2.5
    mongrel.cluster.start
  end
end

First, narrow the scope of what you're testing by only calling cap deploy:restart. You might want to pass the --debug option to prompt before remote execution, or the --dry-run option just to see what's going on as you tweak your settings.
At first glance, this sounds like a permissions issue on the pid files or mongrel processes, but it's difficult to know for sure. A couple things that catch my eye are:
the :runner variable is explicitly set to nil -- was there a specific reason for this?
Capistrano 2.4 introduced a new behavior for the :admin_runner variable. Without seeing the entire recipe, is this possibly related to your problem?
:runner vs. :admin_runner (from capistrano 2.4 release)
Some cappers have noted that having deploy:setup and deploy:cleanup run as the :runner user messed up their carefully crafted permissions. I agreed that this was a problem. With this release, deploy:start, deploy:stop, and deploy:restart all continue to use the :runner user when sudoing, but deploy:setup and deploy:cleanup will use the :admin_runner user. The :admin_runner variable is unset, by default, meaning those tasks will sudo as root, but if you want them to run as :runner, just do “set :admin_runner, runner”.
My recommendation for what to do next: manually stop the mongrels and clean up the PIDs, then start the mongrels by hand. Next, continue to run cap deploy:restart while debugging the problem. Repeat as necessary.

Either way, my mongrels are starting before the previous stop command has finished shutting 'em all down.
sleep 2.5 is not a good solution if it takes longer than 2.5 seconds to halt all running mongrels.
There seems to be a need for:
stop && start
vs.
stop; start
(this is how bash works, && waits for the first command to finish w/o error, while ";" simply runs the next command).
I wonder if there is a:
wait cluster_stop
then cluster_start
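One way to approximate that "stop && wait && start" behavior from Capistrano is to poll the PID files instead of sleeping for a fixed interval. A rough sketch, assuming the PID files live under shared/pids (adjust the path to match your mongrel_cluster.yml):

namespace :deploy do
  desc "Stop the cluster, wait for the old PID files to disappear, then start."
  task :restart, :roles => :app do
    mongrel.cluster.stop
    # Wait up to ~30 seconds for the old mongrels to finish shutting down.
    run "i=0; while ls /var/www/rails/myapp/shared/pids/mongrel.*.pid >/dev/null 2>&1; do sleep 1; i=$((i+1)); [ $i -lt 30 ] || break; done"
    mongrel.cluster.start
  end
end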

I hate to be so basic, but it sounds like the pid files are still hanging around when it is trying to start. Make sure that mongrel is stopped by hand. Clean up the pid files by hand. Then do a cap deploy.

Good discussion: http://www.ruby-forum.com/topic/139734#745030

Related

What's the best way to run daemons when the Rails server is running?

I have some gems in my Rails App, such as resque and sunspot. I run the following commands manually when the machine boots:
rake sunspot:solr:start
/usr/local/bin/redis-server /usr/local/etc/redis.conf
rake resque:work QUEUE='*'
Is there a better practice for running these daemons in the background? And are there any side effects to running these tasks in the background?
My solution to that is to use a mix of god, capistrano and whenever. A specific problem I have is that I want all app processes to run as a regular user, so initd scripts are not an option (it could be done, but it's quite a pain in terms of user switching and environment loading).
God
The basic idea is to use god to start / restart / monitor processes. God may be difficult to get started with, but it is very powerful (a minimal config sketch follows this list):
running god alone will start all your processes (webserver, bg jobs, whatever)
it can detect that a process has crashed and restart it
you can group processes and batch restart them (staging, production, background, devops, etc.)
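Here is that minimal config sketch. Everything in it (app name, paths, the resque worker command) is a placeholder, and it assumes a reasonably recent god that supports keepalive:

# config/app.god -- illustrative only; names and paths are placeholders.
2.times do |i|
  God.watch do |w|
    w.name  = "myapp-resque-#{i}"
    w.group = "production"                    # lets you run: god restart production
    w.dir   = "/home/deploy/myapp/current"
    w.start = "bundle exec rake resque:work QUEUE='*'"
    w.log   = "/home/deploy/myapp/shared/log/resque-#{i}.log"
    w.keepalive                               # restart the process if it dies
  end
end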
Whenever
You still have to start god when the server restarts. A good means of doing so is the user crontab. Most cron implementations have a special instruction called #reboot, which allows you to run a specific command on server restart:
#reboot /bin/bash -l -c 'cd /home/my_app && SERVER=true god -c production/current/config/app.god'
Whenever is a gem that allows easy management of the crontab, including generating the reboot command. While it's not absolutely necessary for achieving what I describe, it's really useful for its capistrano integration.
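For example, the #reboot line above can be generated from config/schedule.rb with something along these lines (the command just mirrors the crontab entry; paths are examples):

# config/schedule.rb -- sketch; whenever wraps commands in bash -l -c by default.
every :reboot do
  command "cd /home/my_app && SERVER=true god -c production/current/config/app.god"
end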
Capistrano
You not only want to start your processes on server restart, you also want to restart them on deploy. If your background-job code is not up to date, problems will arise.
Capistrano lets you handle that easily: just ask god to restart the whole group (e.g. god restart production) in a post-deploy capistrano task, and it will be handled seamlessly.
Whenever's capistrano integration also ensures your crontab is always up to date, updating it if you change your config/schedule.rb file.
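The post-deploy hook can be as small as this (the group name "production" is an example, matching the god config sketch above):

# deploy.rb -- sketch of restarting the god group after each deploy.
namespace :deploy do
  desc "Restart all god-managed processes for this app."
  task :restart_processes, :roles => :app do
    run "god restart production"
  end
end
after "deploy", "deploy:restart_processes"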
You can use something like foreman to manage these processes. You can define process types and other things in a Procfile and you can start and do whatever with them.

Interrupt Rails Server Output w/o Stopping the Server during a Capistrano Deployment

I'm doing a Capistrano deployment of a Rails app. It's been a lot of fun for the most part.
After the deployment is complete (in deploy:restart), I would like to start the Rails server, watch the output for a while, and then hit Ctrl-C to send an interrupt, thereby stopping the output and proceeding to the deploy:cleanup task. At one point it seemed like this was working, except that it appeared to be considering the interrupt to be an exception and so was saying it "Cannot Start the Rails Server" even though it was actually started and running. I wanted to rescue the interrupt, and so wrote the following, based in part on another thread here:
namespace :deploy do
  task :restart, :roles => :app, :except => { :no_release => true } do
    begin
      logger.info 'Attempting to Start the Rails Server'
      run "cd #{release_path} && script/rails s"
    rescue SystemExit, Interrupt
      logger.info %q[Au revoir! And don't worry. The server will continue running just fine without you hanging around looking over its shoulder.]
    rescue Exception => error
      logger.important 'Cannot Start the Rails Server. This may be a problem.'
      logger.info "#{error}"
    end
  end
end
However, this doesn't work. Before I hit Ctrl-C, while the server is still running, as I would expect, I'm getting this sort of thing:
** [out :: server.example.com] Started GET "/assets/bootstrap.js?body=1" for 178.120.25.53 at 2012-07-09 19:10:53 +0000
** [out :: server.example.com] Served asset /bootstrap.js - 200 OK (11ms)
And then after I send the interrupt, I'm getting this:
** Au revoir! And don't worry. The server will continue running just fine without you hanging around looking over its shoulder.
triggering after callbacks for `deploy:restart'
* executing `deploy:cleanup'
* executing "ls -xt /srv/www/my_project/releases"
servers: ["server.example.com"]
[server.example.com] executing command
command finished in 774ms
** keeping 1 of 2 deployed releases
* executing "rm -rf /srv/www/my_project/releases/20120709190209"
servers: ["server.example.com"]
[server.example.com] executing command
command finished in 811ms
Which looks right...but as it turns out, Rails is not, in fact, still running, as a grep of the processes before and after reveals.
Before Ctrl-C, I see both the Capistrano command (19358), and the Rails server it started (19507):
user@server.example.com:~$ ps ax | grep rails | grep -v grep
19358 pts/1 Ss+ 0:01 bash -c cd /srv/www/my_project/releases/20120709190521 && script/rails s
19507 pts/1 Sl+ 0:41 ruby script/rails s
After Ctrl-C, the Rails server is still there, or it appears to be:
user@server.example.com:~$ ps ax | grep rails | grep -v grep
19507 ? Sl 0:41 ruby script/rails s
But after I attempt to hit the site in a web browser, it disappears! Weird eh?
user@server.example.com:~$ ps ax | grep rails | grep -v grep
user#server.example.com:~$ [no output; returned to prompt]
So, my question is: How do I do this thing? How do I sever the communication between the running Rails process and Capistrano, allow Capistrano to move on to its remaining tasks, and then give me back my terminal prompt, without stopping the Rails server? Any help would be appreciated.
I've now realized that this was a PEBCAC error. The Begin-Rescue-End block in my Capistrano script was not catching (rescuing) my inbound Ctrl-C. It was merely passing it along to the running Rails server process, which was obediently exiting with a SystemExit, which was passed back up the line to the Capistrano script, which then caught the outbound exception. At that point it was a done deal. No amount of catching and handling the outbound exceptions from within the context of the Capistrano script was ever going to prevent the Rails server from stopping. So, I understand now why it wasn't working. But I am still curious if there's a way to do what I was trying to do. It would mean catching my inbound interrupt in Capistrano somewhere and handling it before it could be passed on to the server.
You can use Signal.trap in Ruby to catch Ctrl-C.
But I'm not sure how you can do what you need to with Capistrano – you want to spawn a grandchild process that won't be terminated when the Capistrano process is.
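For what it's worth, here is a rough sketch of both ideas combined: start the server detached so it survives after Capistrano disconnects, then tail the log in a separate task where Ctrl-C only stops the tail. This is an illustration under those assumptions, not a tested recipe:

namespace :deploy do
  task :restart, :roles => :app, :except => { :no_release => true } do
    # nohup + & detaches the server from the SSH session Capistrano opened.
    run "cd #{release_path} && nohup script/rails s -e production >> log/server.log 2>&1 & sleep 1"
  end

  desc "Watch the server log; Ctrl-C detaches without touching the server."
  task :watch_log, :roles => :app do
    trap("INT") { puts "Detaching from the log."; exit 0 }
    stream "tail -f #{release_path}/log/server.log"
  end
end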

Unicorn completely ignores USR2 signal

I'm experiencing a rather strange problem with unicorn on my production server.
Although the config file states preload_app true, sending USR2 to the master process does not generate any response, and it seems like unicorn is ignoring the signal altogether.
On another server, sending USR2 changes the master process to an (old) state and starts a new master process successfully.
The problematic server is using RVM & bundler, so I'm assuming it's somehow related (the other one is vanilla ruby).
Sending signals other than USR2 (QUIT, HUP) works just fine.
Is there a way to trace what's going on behind the scenes here? Unicorn's log file is completely empty.
I suspect your issue might be that your Gemfile has changed, but you haven't started your unicorn in a way that allows USR2 to use the new Gemfile. It's therefore crashing when you try to restart the app.
Check your /log/unicorn.log for details of what might be failing.
If you're using Capistrano, specify the BUNDLE_GEMFILE as the symlink, e.g.:
run "cd #{current_path} && BUNDLE_GEMFILE=#{current_path}/Gemfile bundle exec unicorn -c #{config_path} -E #{unicorn_env} -D"
Here's a PR that demonstrates this.
I experienced a similar problem, but my logs clearly identified the issue: sending USR2 would initially work on deployments, but as deployments got cleaned up, the release that the Unicorn master was initially started on would get deleted, so attempts at sending a USR2 signal would appear to do nothing / fail, with the error log stating:
forked child re-executing... 53
/var/www/application/releases/153565b36021c0b8c9cbab1cc373a9c5199073db/vendor/bundle/ruby/1.9.1/gems/unicorn-4.3.1/lib/unicorn/http_server.rb:439:in
`exec': No such file or directory -
/var/www/application/releases/153565b36021c0b8c9cbab1cc373a9c5199073db/vendor/bundle/ruby/1.9.1/bin/unicorn
(Errno::ENOENT)
The Unicorn documents mention this potential problem at http://unicorn.bogomips.org/Sandbox.html: "cleaning up old revisions will cause revision-specific installations of unicorn to go missing and upgrades to fail", which in my case meant USR2 appeared to 'do nothing'.
I'm using Chef's application recipe to deploy applications, which creates a symlinked vendor_bundle directory that is shared across deployments, but calling bundle exec unicorn still resulted in the original Unicorn master holding a path reference that included a specific release directory.
To fix it I had to call bundle exec /var/www/application/shared/vendor_bundle/ruby/1.9.1/bin/unicorn to ensure the Unicorn master had a path to a binary that would be valid from one deployment to the next. Once that was done I could deploy to my heart's content, and kill -USR2 PID would work as advertised.
The Unicorn docs mention you can manually change the binary path reference by setting the following in the Unicorn config file and sending HUP to reload Unicorn before sending a USR2 to fork a new master:
Unicorn::HttpServer::START_CTX[0] = "/some/path/to/bin/unicorn"
Perhaps this is useful to some people in similar situations, but I didn't implement this as it appears specifying an absolute path to the shared unicorn binary was enough.
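Put together, the relevant bit of the Unicorn config might look like this (paths are examples; send HUP once so the running master re-reads the config before the next USR2):

# config/unicorn.rb -- sketch of pinning the re-exec binary to a shared path
# that survives release cleanup, per the Sandbox notes above.
working_directory "/var/www/application/current"
Unicorn::HttpServer::START_CTX[0] =
  "/var/www/application/shared/vendor_bundle/ruby/1.9.1/bin/unicorn"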
I've encountered a similar problem on my VDS. Strace'ing revealed the cause:
write(2, "E, [2011-07-23T04:40:27.240227 #19450] ERROR -- : Cannot allocate memory - fork(2) (Errno::ENOMEM) <...>
Try increasing the memory size, raising the XEN memory-on-demand limits (they were too tight in my case), or maybe turning on overcommit, though the latter may have some serious unwanted side effects, so do it carefully.

Using God to monitor Unicorn - Start exited with non-zero code = 1

I am working on a God script to monitor my Unicorns. I started with GitHub's examples script and have been modifying it to match my server configuration. Once God is running, commands such as god stop unicorn and god restart unicorn work just fine.
However, god start unicorn results in WARN: unicorn start command exited with non-zero code = 1. The weird part is that if I copy the start script directly from the config file, it starts right up like a brand new mustang.
This is my start command:
/usr/local/bin/unicorn_rails -c /home/my-linux-user/my-rails-app/config/unicorn.rb -E production -D
I have declared all paths as absolute in the config file. Any ideas what might be preventing this script from working?
I haven't used unicorn as an app server, but I've used god for monitoring before.
If I remember rightly, when you start god and give it your config file, it automatically starts whatever you've told it to watch. Unicorn is probably already running, which is why it's throwing the error.
Check this by running god status once you've started god. If that's not the case, you can check the command's exit status on the command line:
/usr/local/bin/unicorn_rails -c /home/my-linux-user/my-rails-app/config/unicorn.rb -E production -D;
echo $?;
that echo will print the exit status of the last command. If it's zero, the last command reported no errors. Try starting unicorn twice in a row, I expect the second time it'll return 1, because it's already running.
EDIT:
including the actual solution from comments, as this seems to be a popular response:
You can set an explicit user and group if your process requires to be run as a specific user.
God.watch do |w|
  w.uid = 'root'
  w.gid = 'root'
  # remainder of config
end
My problem was that I never bundled as root. Here is what I did:
sudo bash
cd RAILS_ROOT
bundle
You get a warning telling you to never do this:
Don't run Bundler as root. Bundler can ask for sudo if it is needed,
and installing your bundle as root will break this application for all
non-root users on this machine.
But it was the only way I could get resque or unicorn to run with god. This was on an ec2 instance if that helps anyone.
Adding the log option has helped me greatly in debugging.
God.watch do |w|
  w.log = "#{RAILS_ROOT}/log/god.log"
  # remainder of config
end
In the end, my bug turned out to be that the start_script in God was being executed in the development environment. I fixed this by adding RAILS_ENV to the start script.
start_script = "RAILS_ENV=#{ENV['RACK_ENV']} bundle exec sidekiq -P #{pid_file} -C #{config_file} -L #{log_file} -d"

How to execute a command on the server with Capistrano?

I have a very simple task called update_feeds:
desc "Update feeds"
task :update_feeds do
  run "cd #{release_path}"
  run "script/console production"
  run "FeedEntry.update_all"
end
Whenever I try to run this task, I get the following message:
[out :: mysite.com] sh: script/console: No such file or directory
I figured it's because I am not in the right directory, but trying
run "cd ~/user/mysite.com/current"
instead of
run "cd #{release_path}"
also fails. Running the exact same commands manually (over ssh) works perfectly.
Why can't capistrano properly cd (change directory) into the site directory to run the command?
Thanks!
Update: Picked an answer, and thank you so much to all who replied.
The best answer may actually be the one on server fault, though the gist of both (the one on server fault and the one on stack overflow) is the same.
You want to use script/runner. It starts an instance of the app to execute the method you want to call. It's slow, though, as it has to load your entire Rails app.
~/user/mysite.com/current/script/runner -e production FeedEntry.update_all 2>&1
You can run that from the capistrano task.
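For example, the original task could collapse to a single run call, roughly like this (keeping the task name from the question):

desc "Update feeds"
task :update_feeds, :roles => :app do
  # Single remote command: runner loads the production app and calls the method.
  run "#{current_path}/script/runner -e production 'FeedEntry.update_all' 2>&1"
end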
I cannot imagine that you would be able to remotely log into rails console from capistrano. I suggest you call your model method from a rake task.
How do I run a rake task from Capistrano?
As for the latter part of your question, are you logging into the server with the same user account as capistrano?
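If you go the rake-task route, the Capistrano side is usually just another run call, for example (the feeds:update task name is hypothetical):

desc "Update feeds via a rake task"
task :update_feeds, :roles => :app do
  run "cd #{current_path} && RAILS_ENV=production rake feeds:update"
end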
