Rolling Deployments with Capistrano
Our Rails application runs on several servers, each running several mongrel processes, and we use Capistrano to handle our deployments.
Typically, for deployments that change database structures (migrations), we put up our maintenance page, perform the deployment, and take the maintenance page down again. All of that is handled by a single Capistrano command:
cap production deploy:migrations
For most other changes, we deploy code and restart our servers without posting the maintenance page:
cap production deploy
All of the servers are deployed simultaneously, and all mongrels on all servers are then restarted simultaneously. Anyone accessing the site during this window may wait tens of seconds for a response, because mongrels that are restarting cannot service requests.
A better experience for our customers would be to deploy the code simultaneously but restart the mongrels one by one. Our raw capacity is reduced while each mongrel restarts, but requests are serviced continuously throughout the deployment.
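The capacity trade-off is simple arithmetic. The helper below is a hypothetical illustration, not part of our deployment scripts: it assumes every server runs the same number of mongrels and that the same number go down on each server at once, so the number of servers cancels out.

```ruby
# Hypothetical helper: fraction of mongrel capacity still serving requests
# while `down_per_server` mongrels on each server are restarting.
# Assumes every server runs the same number of mongrels.
def available_capacity(mongrels_per_server, down_per_server)
  raise ArgumentError, "need at least one mongrel" if mongrels_per_server < 1
  up = [mongrels_per_server - down_per_server, 0].max
  up.fdiv(mongrels_per_server)
end

# Simultaneous restart: every mongrel is down at once.
puts available_capacity(4, 4) # => 0.0
# Rolling restart: one mongrel per server is down at a time.
puts available_capacity(4, 1) # => 0.75
```

With four mongrels per server (an assumed figure), a rolling restart keeps roughly three quarters of capacity available at every step instead of dropping to zero.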
We could spend a lot of time “shaving the yak” on this one, but keeping it simple works for most cases. Here’s what our restart task looks like:
    desc <<-DESC
      Restart the Mongrel processes on the app server by starting and
      stopping the cluster. This uses the :use_sudo variable to determine
      whether to use sudo or not. By default, :use_sudo is set to true.
      If roll is true (either invoked via roll task or roll is set)
      will sleep between restarts of mongrels.
    DESC
    task :restart, :roles => :app do
      sudo "/usr/bin/monit restart all -g #{daemon_group}"
      if exists?(:roll)
        rolling_restart_message
        mongrel_count = app_servers.first.attrib?('mongrel_servers').to_i
        starting_port = app_servers.first.attrib?('mongrel_port_number').to_i
        mongrel_count.times do |i|
          sleep rolling_delay # Give daemons and the other mongrel 10 seconds to recover.
          sudo "/usr/bin/monit restart mongrel_#{starting_port + i}"
        end
      else
        sudo "/usr/bin/monit restart all -g #{mongrel_group}"
      end
    end
There’s nothing fancy here: after restarting the monit group named by daemon_group, we loop through the mongrels and restart them one by one using monit, sleeping for rolling_delay (10 seconds in our case) before each restart.
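The sequencing of the rolling branch can be exercised outside Capistrano. The sketch below is a hypothetical plain-Ruby stand-in: `rolling_restart` and its `restart`/`pause` lambdas are illustrative names that replace `sudo "/usr/bin/monit restart ..."` and `sleep rolling_delay`, and the starting port 8000 is an assumed value. One thing the stand-in cannot show: inside a Capistrano task, each `sudo` call runs on every server in the :app role in parallel, so each step restarts the same port on all app servers at once, taking down one mongrel per server.

```ruby
# Hypothetical stand-in for the rolling branch of the :restart task.
# `restart` and `pause` are injected so the order of operations can be
# checked without monit or real sleeps.
def rolling_restart(starting_port:, mongrel_count:, delay:,
                    restart: ->(name) { system("sudo", "/usr/bin/monit", "restart", name) },
                    pause: ->(seconds) { sleep(seconds) })
  mongrel_count.times.map do |i|
    pause.call(delay)                    # let the previously restarted mongrel recover
    name = "mongrel_#{starting_port + i}"
    restart.call(name)                   # restart exactly one mongrel
    name
  end
end

# Record the calls instead of executing them.
log = []
names = rolling_restart(
  starting_port: 8000, mongrel_count: 3, delay: 10,
  restart: ->(name) { log << [:restart, name] },
  pause: ->(seconds) { log << [:sleep, seconds] }
)
p names # => ["mongrel_8000", "mongrel_8001", "mongrel_8002"]
```

As in the task above, the delay comes before each restart, so the first mongrel also waits one rolling_delay, which gives the daemons restarted just before the loop time to come back.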
This type of deployment must be used carefully: during the rollout, different mongrels run different code, and any mongrel could service any request at any time. Generally, whenever controller actions, validations, or state-encoding methods (e.g. session variables) change, rolling deploys can’t be used. But for fixing an embarrassing misspelling during prime time with no downtime, it can’t be beat.
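One way to make that rule harder to forget is a pre-deploy check on the changed files. The sketch below is hypothetical and not something our deploy scripts do: `rolling_safe?` and `UNSAFE_PATTERNS` are illustrative names, and the path patterns are assumptions based on a conventional Rails layout. A path check is only a rough guard; a view change that posts to a renamed action, for example, would still slip through.

```ruby
# Hypothetical pre-deploy check: does a rolling restart look safe for this
# set of changed files? Patterns are illustrative assumptions, not a
# complete list of unsafe changes.
UNSAFE_PATTERNS = [
  %r{\Adb/migrate/},                  # schema changes: use deploy:migrations instead
  %r{\Aapp/controllers/},             # controller actions differ between mongrels
  %r{\Aapp/models/},                  # validations usually live in models
  %r{\Aconfig/initializers/session}   # session (state-encoding) configuration
].freeze

def rolling_safe?(changed_paths)
  changed_paths.none? { |path| UNSAFE_PATTERNS.any? { |re| re.match?(path) } }
end

puts rolling_safe?(["app/views/home/index.html.erb"])          # => true
puts rolling_safe?(["app/models/user.rb", "public/robots.txt"]) # => false
```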