Rolling Deployments with Capistrano

Our Rails application runs on several servers, each running several mongrels, and we use Capistrano to handle our deployments.

Typically, for deployments that involve changes to database structures (migrations), we put up our maintenance page, perform the deployment, and remove the maintenance page. That is all handled in one Capistrano command:

cap production deploy:migrations

For most other changes, we deploy code and restart our servers without posting the maintenance page:

cap production deploy

The code is deployed to all of the servers simultaneously, and all mongrels on those servers are then restarted simultaneously. Someone accessing our site during this period could see an extended delay (tens of seconds), as the mongrels that are restarting are unable to service requests.

A better experience for our customers would be to deploy the code simultaneously, but then restart the mongrels one-by-one. In this way, our overall raw capacity is diminished, but requests are being serviced continuously through the deployment interval.
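To make the capacity trade-off concrete, here is a rough sketch. It assumes every mongrel on a server handles an equal share of traffic, which is a simplification; the method name and the figure of four mongrels are illustrative, not taken from our configuration.

```ruby
# Fraction of a server's raw capacity still serving requests while
# `restarting` of its `mongrel_count` mongrels are down.
# Assumes traffic is spread evenly across mongrels (a simplification).
def capacity_during_restart(mongrel_count, restarting = 1)
  raise ArgumentError, "mongrel_count must be positive" unless mongrel_count > 0
  unless (0..mongrel_count).cover?(restarting)
    raise ArgumentError, "restarting must be between 0 and mongrel_count"
  end

  (mongrel_count - restarting).to_f / mongrel_count
end

puts capacity_during_restart(4)     # rolling restart: one of four down
puts capacity_during_restart(4, 4)  # simultaneous restart: all four down
```

With one mongrel down at a time, the more mongrels each server runs, the smaller the dip in capacity.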

We could spend a lot of time “shaving the yak” on this one, but keeping it simple works for most cases. Here’s what our restart task looks like:

desc <<-DESC
  Restart the Mongrel processes on the app server by starting and stopping the cluster. This uses the :use_sudo
  variable to determine whether to use sudo or not. By default, :use_sudo is set to true.
  If :roll is set (either via the roll task or by setting the variable), sleeps between mongrel restarts.
DESC
task :restart, :roles => :app do
  # Daemons are always restarted as a group.
  sudo "/usr/bin/monit restart all -g #{daemon_group}"

  if exists?(:roll)
    # Rolling restart: one mongrel at a time, identified by port number.
    mongrel_count = app_servers.first.attrib?('mongrel_servers').to_i
    starting_port = app_servers.first.attrib?('mongrel_port_number').to_i
    mongrel_count.times do |i|
      sleep rolling_delay # Give the daemons and the previous mongrel time to recover (10 seconds for us).
      sudo "/usr/bin/monit restart mongrel_#{starting_port + i}"
    end
  else
    # Default: restart every mongrel at once.
    sudo "/usr/bin/monit restart all -g #{mongrel_group}"
  end
end

There’s nothing fancy here: we loop through all of the mongrels and restart them one by one using monit, waiting rolling_delay seconds (10 in our case) before each restart.
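To see the order of operations without touching a server, the loop can be dry-run outside Capistrano. This is only a sketch: `run` and `pause` are hypothetical stand-ins for Capistrano's `sudo` and for `sleep`, and the starting port of 8000 is made up for the example.

```ruby
# Standalone sketch of the rolling loop. `run` and `pause` are hypothetical
# stand-ins for Capistrano's sudo and Kernel#sleep, injected so the loop
# can be exercised without restarting anything.
def rolling_restart(starting_port, mongrel_count, delay, run:, pause:)
  mongrel_count.times do |i|
    pause.call(delay) # wait first, so the previous restart can settle
    run.call("/usr/bin/monit restart mongrel_#{starting_port + i}")
  end
end

log = []
rolling_restart(8000, 3, 10,
                run:   ->(cmd)  { log << cmd },
                pause: ->(secs) { log << "sleep #{secs}" })
puts log
```

Since each restart is preceded by one delay, the whole restart phase takes roughly the number of mongrels times the delay, plus however long the monit commands themselves take.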

This type of deployment must be used carefully, since different mongrels will be running different code during the restart, and any mongrel could service any request at any time. Generally, any time controller actions, validations, or state-encoding methods (e.g. session variables) are changed, rolling deploys can’t be used. But to fix an embarrassing misspelling during prime time with no downtime, it can’t be beat.
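One way to apply that rule of thumb is a deliberately conservative check on the list of changed files before choosing a rolling deploy. This is a hypothetical heuristic, not part of our recipes: it assumes a standard Rails directory layout, and the method name and the list of "safe" prefixes are our own invention for illustration. It cannot detect every risky change (a view that depends on a new controller variable, for instance), so it is a first filter rather than a guarantee.

```ruby
# Hypothetical heuristic: treat a change set as rolling-safe only if every
# changed file is a view template or a static asset. Anything else
# (controllers, models, migrations, config, lib) falls back to a normal deploy.
ROLLING_SAFE_PREFIXES = ["app/views/", "public/"].freeze

def rolling_safe?(changed_paths)
  changed_paths.all? do |path|
    ROLLING_SAFE_PREFIXES.any? { |prefix| path.start_with?(prefix) }
  end
end

# A spelling fix in a template qualifies.
puts rolling_safe?(["app/views/home/index.html.erb"])

# A controller change alongside it does not.
puts rolling_safe?(["app/views/home/index.html.erb",
                    "app/controllers/home_controller.rb"])
```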