Why keep a copy of an app on the DB host? - ruby-on-rails

A lot of Capistrano example recipes include a :db role. By default the deploy task exports the app code to all hosts in all roles. So that suggests that it's typical for people to keep a copy of their app on the DB host. Also, in Capistrano's distributed deploy.rb recipe, :deploy:migrate looks like this:
task :migrate, :roles => :db, :only => { :primary => true } do
  # ...
end
My question is, why is it done like that? Wouldn't it be cleaner to keep app code off the DB host (which might not even have Ruby installed) and run migrations from the production box?

The db server runs migrations because it is the one 'responsible' for the database(s).
One could also imagine security policies that only allow tables to be created, dropped, or changed from the database server itself.
There might even be slight performance gains if there is data being loaded during a migration, although that is a terrible idea to begin with.
If you need to reference your database host but do not need a copy of the code on it, you can use something like this:
role :db, 'dbhost', :no_release => true
Sample code to run migrations on an application server:
role :app, 'apphost', :runs_migrations => true
task :migrate, :roles => :app, :only => { :runs_migrations => true } do
  # ...
end
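To make the effect of options such as :no_release and :only more concrete, here is a toy model in plain Ruby of how role membership and server attributes could narrow a host list. It is only a sketch: `Server`, `SERVERS`, `select_servers`, and the host names are hypothetical, and this is not Capistrano's actual implementation.

```ruby
# Toy model (not Capistrano's code): each server has roles and free-form attributes.
Server = Struct.new(:host, :roles, :attributes)

SERVERS = [
  Server.new("apphost", [:app], { :runs_migrations => true }),
  Server.new("dbhost",  [:db],  { :no_release => true })
]

# Returns the hosts that have one of the given roles (all servers when roles is nil),
# match every :only attribute, and match none of the :except attributes.
def select_servers(servers, roles: nil, only: {}, except: {})
  servers.select do |s|
    in_role = roles.nil? || (Array(roles) & s.roles).any?
    in_role &&
      only.all? { |key, value| s.attributes[key] == value } &&
      except.none? { |key, value| s.attributes[key] == value }
  end.map(&:host)
end

# Hosts that would receive a copy of the code: everything not flagged :no_release.
release_hosts = select_servers(SERVERS, except: { :no_release => true })

# Hosts that would run migrations: app servers flagged :runs_migrations.
migrate_hosts = select_servers(SERVERS, roles: :app, only: { :runs_migrations => true })

puts "release: #{release_hosts.inspect}"
puts "migrate: #{migrate_hosts.inspect}"
```

In this model the DB host stays addressable through its role but is filtered out of anything that ships code.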

Related

capistrano-resque: Multistage with Different :workers

I have two production AWS instances which will be running resque but listening for different queues. Here is an example of my current configuration:
config/deploy/production/prod_resque_1.rb
server "<ip>", :web, :app, :resque_worker, :db, primary: true
set :resque_log_file, "log/resque.log"
set :resque_environment_task, true
set :workers, {
  "queue1" => 5,
  "*" => 2
}
after "deploy:restart", "resque:restart"
config/deploy/production/prod_resque_2.rb
server "<ip>", :web, :app, :resque_worker, :db, primary: true
set :resque_log_file, "log/resque.log"
set :resque_environment_task, true
set :workers, {
  "queue2,queue3,queue4" => 5
}
after "deploy:restart", "resque:restart"
Then, I have a "global" recipe:
load 'config/deploy/production/common'
load 'config/deploy/production/prod_resque_1'
load 'config/deploy/production/prod_resque_2'
The obvious problem is that when I call cap prod_resque resque:start, the :workers definition in prod_resque_1 is overwritten by the load of prod_resque_2, so both prod_resque_1 and prod_resque_2 end up with workers listening only to queue2, queue3, and queue4.
My workaround has been to run cap prod_resque_1 resque:start and then cap prod_resque_2 resque:start, but this rather defeats the purpose of Capistrano.
Any suggestions for a cleaner solution allowing me to run cap prod_resque resque:start and have the "first" server running 7 workers, 5 listening to queue1 and 2 listening to all queues, and the "second" server running 5 workers, only listening to queue2, queue3, and queue4?
An example of this is given in the capistrano-resque docs: if you assign a different role to each server (or groups of servers), then you can define workers on a per role basis.
In your case you would do something like
role :queue_one_workers, [ip_from_prod_resque_1]
role :other_queue_workers, [ip_from_prod_resque_2]
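set :workers, {
  :queue_one_workers => {"queue1" => 5, "*" => 2},
  # One comma-separated key gives 5 workers that each listen to all three queues,
  # matching the question; separate keys would start 5 workers per queue (15 total).
  :other_queue_workers => {"queue2,queue3,queue4" => 5}
}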
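As a rough illustration of how a per-role workers map resolves to a concrete worker count on each host, here is a plain-Ruby sketch. The constants `WORKERS_BY_ROLE` and `ROLES_BY_HOST`, the helper names, and the IP addresses are all hypothetical placeholders; this does not call capistrano-resque. The second role uses a single comma-separated key, as in the question's own config, so that host gets five workers in total.

```ruby
# Hypothetical per-role worker map, in the same shape as a per-role :workers setting.
WORKERS_BY_ROLE = {
  :queue_one_workers   => { "queue1" => 5, "*" => 2 },
  :other_queue_workers => { "queue2,queue3,queue4" => 5 }
}

# Hypothetical host-to-role assignment; the addresses are placeholders.
ROLES_BY_HOST = {
  "10.0.0.1" => [:queue_one_workers],
  "10.0.0.2" => [:other_queue_workers]
}

# Merge the queue => count maps of every role a host belongs to.
def workers_for(host)
  ROLES_BY_HOST.fetch(host, []).reduce({}) do |merged, role|
    merged.merge(WORKERS_BY_ROLE.fetch(role, {}))
  end
end

# Total number of worker processes a host would start.
def total_workers(host)
  workers_for(host).values.sum
end

ROLES_BY_HOST.each_key do |host|
  puts "#{host}: #{workers_for(host).inspect} (#{total_workers(host)} workers)"
end
```

Because the map is keyed by role rather than set globally, loading both stage files no longer lets one definition overwrite the other.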

Multiple delayed_jobs roles with Capistrano?

I have a question that I am not finding much useful information for. I'm wondering if this is possible and, if so, how to best implement it.
We are building an app in Rails which has heavy data-processing in the background via DelayedJob (…it is working well for us.)
The app runs in AWS and we have a few different environments configured in Capistrano.
When we have heavy processing loads, our DelayedJob queues can back up--which is mostly fine. I do have one or two queues that I'd like to have a separate node tend to. Since it would be ignoring the 'clogged' queues, it would keep tending its one or two queues and they would stay current. For example, some individual jobs can take over an hour and I wouldn't want a forgotten-password-email delivery to be held up for 90 minutes until the next worker completes a task and checks for a priority job.
What I want is to have a separate EC2 instance that has one worker launched that tends to two different, explicit queues.
I can do this manually on my dev machine by launching one or two workers with the '--QUEUES' option.
Here is my question: how can I define a new role in Capistrano and tell that role's nodes to start a different number of workers and tend to specific queues? Again, my normal delayed_jobs role is set to 3 workers and runs all queues.
Is this possible? Is there a better way?
Presently on Rails 3.2.13 with PostgreSQL 9.2 and the delayed_job gem.
Try this code - place it in deploy.rb after requiring default delayed_job recipes.
# This overrides default delayed_job tasks to support args per role
# If you want to use command line options, for example to start multiple workers,
# define a Capistrano variable delayed_job_args_per_role:
#
# set :delayed_job_args_per_role, {:worker_heavy => "-n 4",:worker_light => "-n 1" }
#
# Target server roles are taken from delayed_job_args_per_role keys.
namespace :delayed_job do
  def args_per_host(host)
    roles.each do |role|
      find_servers(:roles => role).each do |server|
        return args[role] if server.host == host
      end
    end
  end

  def args
    fetch(:delayed_job_args_per_role, {:app => ""})
  end

  def roles
    args.keys
  end

  desc "Start the delayed_job process"
  task :start, :roles => lambda { roles } do
    find_servers_for_task(current_task).each do |server|
      run "cd #{current_path};#{rails_env} script/delayed_job start #{args_per_host server.host}", :hosts => server.host
    end
  end

  desc "Restart the delayed_job process"
  task :restart, :roles => lambda { roles } do
    find_servers_for_task(current_task).each do |server|
      run "cd #{current_path};#{rails_env} script/delayed_job restart #{args_per_host server.host}", :hosts => server.host
    end
  end
end
P.S. I've tested it only with a single role in the hash, but multiple roles should work fine too.
In Capistrano3, using the official capistrano3-delayed-job gem, you can do this without modifying the Capistrano methods:
# If you have several servers handling Delayed Jobs and you want to configure
# different pools per server, you can define delayed_job_pools_per_server:
#
# set :delayed_job_pools_per_server, {
#   'server11-prod' => {
#     'default,emails' => 3,
#     'loud_notifications' => 1,
#     'silent_notifications' => 1,
#   },
#   'server12-prod' => {
#     'default' => 2
#   }
# }
# Server names (server11-prod, server12-prod) in :delayed_job_pools_per_server
# must match the hostnames on Delayed Job servers. You can verify it by running
# `hostname` on your servers.
# If you use :delayed_job_pools_per_server, :delayed_job_pools will be ignored.
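To show what a per-server pool map amounts to, the sketch below turns one host's entry into command-line flags. It assumes delayed_job's `--pool=queues:count` option; `POOLS_PER_SERVER` and `pool_args` are hypothetical names used for illustration, not the gem's internals.

```ruby
# Hypothetical configuration in the same shape as :delayed_job_pools_per_server.
POOLS_PER_SERVER = {
  "server11-prod" => {
    "default,emails"       => 3,
    "loud_notifications"   => 1,
    "silent_notifications" => 1
  },
  "server12-prod" => { "default" => 2 }
}

# Build the pool flags for one host; hosts missing from the map get no flags.
def pool_args(hostname, pools_per_server = POOLS_PER_SERVER)
  pools = pools_per_server.fetch(hostname, {})
  pools.map { |queues, count| "--pool=#{queues}:#{count}" }.join(" ")
end

puts pool_args("server11-prod")
puts pool_args("server12-prod")
```

The lookup is by hostname, which is why the keys have to match what `hostname` reports on each server.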

Rails - Nginx needs to be restarted after deploying with Capistrano?

I am using Capistrano to deploy my Rails application. Whenever I deploy, the changes are not reflected in the browser, and I still need to restart nginx to update the site (by running sudo /etc/init.d/nginx restart). I'm not sure why. Isn't the site supposed to update after restarting the application (using touch /app/tmp/restart.txt)?
Here's my deploy.rb
require "rvm/capistrano"
set :rvm_ruby_string, 'ruby-1.9.3-p194#app_name'
set :rvm_type, :user
require "bundler/capistrano"
set :application, "app_name"
set :user, "me"
set :deploy_to, "/home/#{user}/#{application}"
set :deploy_via, :copy
set :use_sudo, false
set :scm, :git
set :repository, "~/Sites/#{application}/.git"
set :branch, "master"
role :web, '1.2.3.4'
role :app, '1.2.3.4'
role :db, '1.2.3.4', :primary => true
role :db, '1.2.3.4'
namespace :deploy do
  task :start do ; end
  task :stop do ; end
  task :restart, :roles => :app, :except => { :no_release => true } do
    run "#{try_sudo} touch #{File.join(current_path,'tmp','restart.txt')}"
  end
end
You shouldn't have to restart or reload nginx. Just touching tmp/restart.txt should be enough to tell passenger to reload the app.
If you're using a recent version of Capistrano, you can even drop the entire 'namespace :deploy' part. Capistrano already touches tmp/restart.txt after a successful deploy.
I realized that the deployment setup matches
http://coding.smashingmagazine.com/2011/06/28/setup-a-ubuntu-vps-for-hosting-ruby-on-rails-applications-2/
When I followed this tutorial (about a year ago), I installed slightly newer versions of nginx and Passenger. From what I remember, these newer versions prompted me to use nginx as a service whenever I ran any init.d command (Ubuntu 10.04).
Anyways I would switch out the code
run "#{try_sudo} touch #{File.join(current_path,'tmp','restart.txt')}"
to
run "#{sudo} service nginx restart"
And see if that works.
Maybe the problem is in how exactly you started Passenger. Capistrano points the symlink 'current' to the latest release. The task
run "#{try_sudo} touch #{File.join(current_path,'tmp','restart.txt')}"
is using that 'current' to place the restart.txt. But according to http://code.google.com/p/phusion-passenger/issues/detail?id=547 , Passenger is "pinned" to the 'current' it was started in, while the task writes 'restart.txt' to the current 'current', so to speak. So Passenger doesn't "see" that it's supposed to restart.
If you cd'ed to the then 'current' and started Passenger from there, it gets pinned to the directory the 'current' symlink pointed to at that time and doesn't follow later changes of the symlink. So you might need to get rid of the 'cd ... && passenger start...' and provide the path to Passenger directly. I also extended the deploy:start and deploy:stop tasks you have in your recipe to say
task :start, :roles => :app, :except => { :no_release => true } do
  run "passenger start #{current_path} -a 127.0.0.1 -p 3000 -e production -d"
end
task :stop, :roles => :app, :except => { :no_release => true } do
  run "passenger stop #{current_path} -p 3000"
end
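The "pinning" idea can be reproduced with nothing but the filesystem. The sketch below builds a throwaway releases/ layout with a `current` symlink, resolves the symlink once (as a process that cd'ed into it at startup effectively does), then repoints the symlink the way a deploy would. It only illustrates symlink resolution in general; whether Passenger behaves this way is the claim of the linked issue, not something this code verifies. The helper name `pinned_vs_current` is hypothetical.

```ruby
require "tmpdir"
require "fileutils"

# Returns [release_resolved_at_startup, release_resolved_after_deploy] for a
# temporary directory layout that mimics releases/ plus a `current` symlink.
def pinned_vs_current
  Dir.mktmpdir do |root|
    root = File.realpath(root)
    old_release = File.join(root, "releases", "1")
    new_release = File.join(root, "releases", "2")
    FileUtils.mkdir_p([old_release, new_release])

    current = File.join(root, "current")
    File.symlink(old_release, current)

    # A process that resolves the symlink once at startup keeps the old path.
    pinned = File.realpath(current)

    # A deploy repoints `current` at the new release.
    File.unlink(current)
    File.symlink(new_release, current)

    [File.basename(pinned), File.basename(File.realpath(current))]
  end
end

pinned, fresh = pinned_vs_current
puts "resolved at startup:  releases/#{pinned}"
puts "resolved after deploy: releases/#{fresh}"
```

A restart.txt touched under the new release is invisible to anything still watching the path it resolved at startup.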

What exactly is a "role" in Capistrano?

What is the purpose and function of "roles" in a Capistrano recipe? When I look at sample recipes, I often see something like this:
role :app, 'somedomain.com'
role :web, 'somedomain.com'
role :db, 'somedomain.com', :primary => true
So it looks like a role is basically a server where Capistrano executes commands. If that's the case, then why would it be called a "role" rather than a "host" or "server"?
In the above example, what is the difference between the :app and :web roles?
What does the :primary => true option do?
Roles allow you to write Capistrano tasks that only apply to certain servers. This really only matters for multi-server deployments. The default roles of "app", "web", and "db" are also used internally, so their presence is not optional (AFAIK).
In the sample you provided, there is no functional difference.
The ":primary => true" is an attribute that allows for further granularity in specifying servers in custom tasks.
Here is an example of role specification in a task definition:
task :migrate, :roles => :db, :only => { :primary => true } do
  # ...
end
See the Capistrano wiki at https://github.com/capistrano/capistrano/wiki/2.x-DSL-Configuration-Roles-Role for a more extensive explanation.
The ":primary => true" option indicates that the database server is the primary server. This matters when you use replication with MySQL, for example, where a mirrored database server can be used for automatic failover. It is also used to decide on which database server the migrations should run (those changes are then replicated to the failover servers). This link clarifies it a bit more: https://github.com/capistrano/capistrano/wiki/2.x-from-the-beginning#back-to-configuration

Primary servers in capistrano

I have a task in Capistrano in which I want a single line to run only if the server is marked as primary. Is there a variable or method that I can reference inside a task? 'primary?' or 'primary' doesn't seem to work.
I've also tried something akin to the following:
after "deploy", "task1"
after "deploy", "task2"
after "deploy", "task3"
task :task1, :roles => :app do
  # code
end
task :task2, :roles => :app, :only => {:primary => true} do
  # code for just primary server
end
task :task3, :roles => :app do
  # more code
end
But even this doesn't seem to work (all three tasks get run on every server).
I've been working on this on and off for a few days and I'm having no luck with my searches. Thoughts?
I've solved the issue, but it wasn't pretty. What I found is that you need to use 'primary => true' on a per-task basis. Looking at the code, it appears that Capistrano builds the list of servers a task will run on before the task itself runs.
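The per-task behaviour described here can be pictured with a small plain-Ruby model in which each task resolves its target hosts first and only then runs its body on them. Everything in it (`Host`, `HOSTS`, `run_task`, `deploy_hooks`, the host names) is hypothetical, and the model assumes that `app1` carries the primary flag for the :app role; it is a sketch of the idea, not Capistrano's code.

```ruby
# Toy model (not Capistrano's code) of tasks whose server list is fixed up front.
Host = Struct.new(:name, :roles, :primary)

HOSTS = [
  Host.new("app1", [:app], true),
  Host.new("app2", [:app], false)
]

# Resolve the hosts for a task first, then record one run of the body per host.
def run_task(name, log, roles:, only_primary: false)
  targets = HOSTS.select do |h|
    h.roles.include?(roles) && (!only_primary || h.primary)
  end
  targets.each { |h| log << [name, h.name] }
end

# Three hooks in sequence; only the middle one is restricted to the primary host.
def deploy_hooks
  log = []
  run_task(:task1, log, roles: :app)
  run_task(:task2, log, roles: :app, only_primary: true)
  run_task(:task3, log, roles: :app)
  log
end

deploy_hooks.each { |task, host| puts "#{task} on #{host}" }
```

Since the filter is applied to the whole task before its body starts, a single line cannot opt out from inside the body; it has to live in its own task with its own restriction.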
