Activity Pages Now Available

You’ve always been able to view Workspace activity from the User Home and from notification emails. Now, we’ve made it even easier to see Workspace-specific activity from the Workspace itself, with the new Activity page.

To enable the Activity page, click the Edit Pages gear next to the page titles in a Workspace, tick the Activity checkbox, and click Save.

With the Activity page, you’ll now see what everyone has been up to since your last visit.

You can also use the Personal view to display only the activities that you have generated.

Finding and Fixing a Long-Standing Bug in the Ruby Amazon S3 Library

The AWS::S3 library for Ruby has been around since the release of Amazon S3 in 2006; hundreds, if not thousands, of applications use it. Consequently, it is not usually “the suspect” when looking for the cause of intermittent access errors to S3. However, we recently found and fixed an error that has been present in the signature calculation method since the library was first released.

We use S3 as the backing store for Onehub Workspaces, and we do a lot of S3 operations. During routine log monitoring we noticed a slow but persistent stream of HTTP 403 (Access Denied) errors from S3. These errors were not frequent enough to cause problems for our customers; applications using S3 should be designed to anticipate errors and retry (a sketch of what we mean follows). Still, we felt that further investigation was warranted.
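Here is a minimal sketch of that retry pattern (the helper name, backoff policy, and the key/bucket variables are our placeholders, not Onehub's actual code):

def with_s3_retries(attempts = 3)
  tries = 0
  begin
    yield
  rescue AWS::S3::ResponseError
    tries += 1
    raise if tries >= attempts
    sleep(2 ** tries) # simple backoff before retrying
    retry
  end
end

with_s3_retries { AWS::S3::S3Object.find(key_name, bucket_name) }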

To manage the logs generated by all of the Onehub services, we use Papertrail. Papertrail allows us to run a real-time search against our production logs, showing us requests to S3 like this:


https://s3.amazonaws.com/<bucket>/<object>?AWSAccessKeyId=<ouraccesskey>&Expires=1328127911&Signature=l74ewTX9hh0s2oiLoIY83V%2BlLuM%3D

The components used to calculate the signature are well documented by Amazon. When a signature fails, S3 will provide the components it attempted to use in the XML returned with the error message. We noticed that the signatures in these errors were different from those that should have been calculated for the provided Expires time. We monkey-patched the #encoded_canonical method of AWS::S3::Authentication::Signature to accept a debugging closure.

module AWS
  module S3
    class Authentication
      class Signature
        private

        # Identical to the library's #encoded_canonical, except that when a
        # :debug_proc option is present we hand it the signature and canonical
        # string so they can be logged alongside the request.
        def encoded_canonical
          digest = OpenSSL::Digest::Digest.new('sha1')
          b64_hmac = [OpenSSL::HMAC.digest(digest, secret_access_key, canonical_string)].pack("m").strip
          if options[:debug_proc]
            options[:debug_proc].call(sprintf("AWS::S3::Authentication::Signature - request %s encoded canonical: %s %s  canonical: [%s]", @request.path, b64_hmac, CGI.escape(b64_hmac), canonical_string))
          end
          url_encode? ? CGI.escape(b64_hmac) : b64_hmac
        end
      end
    end
  end
end

This enabled us to pass an option through to AWS::S3::S3Object.url_for containing a closure wrapping our debugging method.

options = options.merge({:debug_proc => lambda{|x| logger.warn(x)}})
the_url = AssetStore.url_for(key_name, options)

We put this through testing, and into production, then waited for the next error to appear.

AWS::S3::Authentication::Signature - request /<bucket>/<keyname> encoded canonical: l74ewTX9hh0s2oiLoIY83V+lLuM= l74ewTX9hh0s2oiLoIY83V%2BlLuM%3D canonical: [GET#012#012#0121328127912#012/<bucket>/<keyname>]

https://s3.amazonaws.com/<bucket>/<object>?AWSAccessKeyId=<OURACCESSKEY>&Expires=1328127911&Signature=l74ewTX9hh0s2oiLoIY83V%2BlLuM%3D

From here we could see the error: the Expires time used to calculate the signature was different from the time provided in the URL. The value 1328127911 is in the URL, while 1328127912 was used to calculate the signature! (The #012 sequences in the logged canonical string are newlines, escaped in octal by syslog.)
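For context, here is how that query-string signature is computed, condensed from Amazon's documented scheme (the bucket, key, and secret key below are placeholders). The Expires value is baked into the string that gets signed, so it must match the Expires= URL parameter exactly:

require 'openssl'
require 'cgi'

secret_access_key = '<oursecretkey>' # placeholder
string_to_sign = "GET\n\n\n1328127912\n/<bucket>/<keyname>"
digest    = OpenSSL::Digest.new('sha1')
signature = [OpenSSL::HMAC.digest(digest, secret_access_key, string_to_sign)].pack('m').strip
CGI.escape(signature) # => the Signature= query parameter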

But why?

It took a bit of digging through the AWS::S3 source, but we found the culprit. When generating these S3 URLs, we pass an expires_in option to AWS::S3::S3Object.url_for. Here is the relevant code from the library:

# Signature is the abstract super class for the Header and QueryString authentication methods. It does the job
# of computing the canonical_string using the CanonicalString class as well as encoding the canonical string. The subclasses
# parameterize these computations and arrange them in a string form appropriate to how they are used, in one case a http request
# header value, and in the other case key/value query string parameter pairs.
class Signature < String #:nodoc:
  attr_reader :request, :access_key_id, :secret_access_key, :options

  def initialize(request, access_key_id, secret_access_key, options = {})
    super()
    @request, @access_key_id, @secret_access_key = request, access_key_id, secret_access_key
    @options = options
  end

  private

    def canonical_string
      options = {}
      options[:expires] = expires if expires?
      CanonicalString.new(request, options)
    end
    memoized :canonical_string

    def encoded_canonical
      digest   = OpenSSL::Digest::Digest.new('sha1')
      b64_hmac = [OpenSSL::HMAC.digest(digest, secret_access_key, canonical_string)].pack("m").strip
      url_encode? ? CGI.escape(b64_hmac) : b64_hmac
    end

    def url_encode?
      !@options[:url_encode].nil?
    end

    def expires?
      is_a? QueryString
    end

    def date
      request['date'].to_s.strip.empty? ? Time.now : Time.parse(request['date'])
    end
end

# Provides query string authentication by computing the three authorization parameters: AWSAccessKeyId, Expires and Signature.
# More details about the various authentication schemes can be found in the docs for its containing module, Authentication.
class QueryString < Signature #:nodoc:
  constant :DEFAULT_EXPIRY, 300 # 5 minutes
  def initialize(*args)
    super
    options[:url_encode] = true
    self << build
  end

  private

    # Will return one of three values, in the following order of precedence:
    #
    #   1) Seconds since the epoch explicitly passed in the +:expires+ option
    #   2) The current time in seconds since the epoch plus the number of seconds passed in
    #      the +:expires_in+ option
    #   3) The current time in seconds since the epoch plus the default number of seconds (60 seconds)
    def expires
      return options[:expires] if options[:expires]
      date.to_i + expires_in
    end

    def expires_in
      options.has_key?(:expires_in) ? Integer(options[:expires_in]) : DEFAULT_EXPIRY
    end

    # Keep in alphabetical order
    def build
      "AWSAccessKeyId=#{access_key_id}&Expires=#{expires}&Signature=#{encoded_canonical}"
    end
end

The #initialize method is the entry point, but most of the work is done by #build. The bug was immediately apparent once we looked at #expires. #build calls #expires directly when interpolating the Expires= parameter, and then #encoded_canonical calls it again via #canonical_string, so the value can change between the two calls. The #date method uses Time.now; if the two calls land in different seconds, they produce different values. The solution is to memoize the time, which could be done in either #expires or #date.

def expires
  return options[:expires] if options[:expires]
  @expires ||= date.to_i + expires_in # compute the expiry once and reuse it
end
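For completeness, memoizing in #date instead would look like this (our sketch of the alternative):

def date
  @date ||= request['date'].to_s.strip.empty? ? Time.now : Time.parse(request['date'])
end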

Interestingly, this error is only possible when an explicit expires option is not passed: both the expires_in path and the library's DEFAULT_EXPIRY fall through to #date, while an explicit expires value avoids the call to #date entirely. We suspect most people pass in an expires option, which is why the bug went unnoticed for so long.

After a bit of testing we put this code into production and have eliminated these errors, resulting in better performance for our customers. We have also submitted a pull request to the library maintainer.

Using GoDaddy SSL Certificates with NGINX

Have you just installed a new GoDaddy certificate on your NGINX web server, only to find that some browsers (notably Safari) don't trust your website?

This manifests as the error message "Safari can't verify the identity of the website 'your.url.here'" and is caused by an incomplete "chain of trust" between your certificate and the root certificates installed in the client browser.

Here’s a quick cure for an NGINX installation:

Download the gd_bundle.crt and gd_intermediate.crt certificates from GoDaddy's certificate repository, then combine them with your own certificate:

cat yourcert.crt gd_intermediate.crt gd_bundle.crt > yourcert_bundle.crt

This concatenates your certificate and the GoDaddy intermediate certificates into one file. Put yourcert_bundle.crt in the place that NGINX is looking for your certs, and point the ssl_certificate directive in nginx.conf at it.
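For reference, a minimal sketch of the relevant nginx.conf directives (the paths here are our assumptions):

server {
    listen      443;
    ssl         on;
    server_name www.yourdomain.com;

    ssl_certificate     /etc/nginx/ssl/yourcert_bundle.crt;  # your cert plus the GoDaddy intermediates
    ssl_certificate_key /etc/nginx/ssl/yourcert.key;         # your private key
}

Then reload your NGINX configuration with: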

kill -HUP <pid of nginx>

You should be ready to go! If you want more information on the entire chain of trust, you can download the GoDaddy root certificate (gd-class2-root.crt) and use the OpenSSL command-line utility:

openssl s_client -CAfile gd-class2-root.crt -connect www.yourdomain.com:443  -verify 10

This pulls the certificate chain from the yourdomain.com server and attempts to verify the chain of trust up to whatever root you've specified (-CAfile gd-class2-root.crt):

verify depth is 10
CONNECTED(00000003)
depth=2 /C=US/O=The Go Daddy Group, Inc./OU=Go Daddy Class 2 Certification Authority
verify return:1
depth=1 /C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certificates.godaddy.com/repository/CN=Go Daddy Secure Certification Authority/serialNumber=07992287
verify return:1
depth=0 /O=*.yourdomain.com/OU=Domain Control Validated/CN=*.yourdomain.com
verify return:1
---
Certificate chain
 0 s:/O=*.yourdomain.com/OU=Domain Control Validated/CN=*.yourdomain.com
   i:/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certificates.godaddy.com/repository/CN=Go Daddy Secure Certification Authority/serialNumber=07992287
 1 s:/O=*.yourdomain.com/OU=Domain Control Validated/CN=*.yourdomain.com
   i:/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certificates.godaddy.com/repository/CN=Go Daddy Secure Certification Authority/serialNumber=07992287
 2 s:/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certificates.godaddy.com/repository/CN=Go Daddy Secure Certification Authority/serialNumber=07992287
   i:/C=US/O=The Go Daddy Group, Inc./OU=Go Daddy Class 2 Certification Authority
---
Server certificate
<Continued Output>

This shows that the certificate obtained from the site was verified all the way to a root certificate (specified by -CAfile).

Adding Columns to Large MySQL Tables Quickly

Suppose that you have a MySQL Database, and in that database you have a non-trivial table with more than a million records. If you’re using that table with your Rails application, you might at some point like to add some additional columns.

It’s tempting to just write the migration like:

class AddThreeColumnsToQuarks < ActiveRecord::Migration
  def self.up
    add_column :quarks, :arbitrary_field1, :integer
    add_column :quarks, :arbitrary_field2, :string
    add_column :quarks, :arbitrary_field3, :integer
  end

  def self.down
    remove_column :quarks, :arbitrary_field1
    remove_column :quarks, :arbitrary_field2
    remove_column :quarks, :arbitrary_field3
  end
end

Should you do that, you will find that although it works, MySQL takes a fantastic amount of time to add the columns when you have a lot of rows. That's because ActiveRecord adds each column individually, with a separate ALTER statement:

ALTER TABLE `quarks` ADD `arbitrary_field1` int(11)
ALTER TABLE `quarks` ADD `arbitrary_field2` varchar(255)
ALTER TABLE `quarks` ADD `arbitrary_field3` int(11)

Each one of those ALTER statements makes a new temporary table, copies every record from your existing table into the new table, and then replaces the old table with the new one. Five thousand records in the database? Adding three columns will copy the table three times. Fifteen thousand rows are copied.

One can make this better by combining the ALTERs into one statement (see the sketch below), but the copy of the data in the table still takes a while. A few million rows? You might be waiting tens of minutes.
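Here's what that looks like as a migration; a minimal sketch using the quarks table from the example above, issuing one combined ALTER so MySQL rebuilds the table only once:

class AddThreeColumnsToQuarksInOnePass < ActiveRecord::Migration
  def self.up
    execute <<-SQL
      ALTER TABLE quarks
        ADD arbitrary_field1 int(11),
        ADD arbitrary_field2 varchar(255),
        ADD arbitrary_field3 int(11)
    SQL
  end

  def self.down
    execute "ALTER TABLE quarks DROP arbitrary_field1, DROP arbitrary_field2, DROP arbitrary_field3"
  end
end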

A FASTER way of adding columns is to create your own new table, then SELECT all of the rows from the existing table into it. You can create the structure from the existing table, modify the structure however you'd like, and then SELECT in the data. MAKE SURE that you SELECT the information into the new table in the same order as the fields are defined. Here's an example:

class AddThreeColumnsToQuarks < ActiveRecord::Migration
  def self.up
    sql = ActiveRecord::Base.connection()
    sql.execute "SET autocommit=0"
    sql.begin_db_transaction
    sql.execute("CREATE TABLE quarks_new LIKE quarks")
    add_column :quarks_new, :arbitrary_field1, :integer
    add_column :quarks_new, :arbitrary_field2, :string
    add_column :quarks_new, :arbitrary_field3, :integer
    sql.execute("INSERT INTO quarks_new SELECT *, NULL, NULL, NULL FROM quarks")
    rename_table :quarks, :quarks_old
    rename_table :quarks_new, :quarks
    sql.commit_db_transaction
    # don't forget to remove quarks_old someday
  end

  def self.down
    drop_table :quarks
    rename_table :quarks_old, :quarks
  end
end

You can change the NULLs into whatever default values you’d like the new columns in the existing rows to have.

How much faster can this be? On one table in one of our databases, a single add_column took approximately 17 minutes. Using this technique, we reduced the time to add a column to approximately 45 seconds. YMMV, but you'll notice a big improvement.

What about the indices? CREATE TABLE ... LIKE preserves column attributes and indices. For more information, see the MySQL online manual.

Rolling Deployments with Capistrano

Our Rails application runs on a number of servers, each with a number of mongrels, and we use Capistrano to handle our deployments.

Typically, for deployments that involve changes to database structures (migrations), we put up our maintenance page, perform the deployment, and remove the maintenance page. That is all handled in one Capistrano command:

cap production deploy:migrations

For most other changes, we deploy code and restart our servers without posting the maintenance page:

cap production deploy

All of the servers are deployed simultaneously, and all mongrels on our servers are then restarted simultaneously. There could be an extended delay (tens of seconds) for someone accessing our site during this period, as the mongrels that are restarting are unable to service requests.

A better experience for our customers would be to deploy the code simultaneously, but then restart the mongrels one by one. Our overall raw capacity is temporarily diminished, but requests continue to be serviced throughout the deployment.

We could spend a lot of time “shaving the yak” on this one, but keeping it simple works for most cases. Here’s what our restart task looks like:

desc <<-DESC
  Restart the Mongrel processes on the app server by starting and stopping the cluster. This uses the :use_sudo
  variable to determine whether to use sudo or not. By default, :use_sudo is set to true.
  If roll is set (either via the roll task or by setting the variable), the task will sleep between mongrel restarts.
DESC

task :restart, :roles => :app do
  sudo "/usr/bin/monit restart all -g #{daemon_group}"

  if exists?(:roll)
    rolling_restart_message
    mongrel_count = app_servers.first.attrib?('mongrel_servers').to_i
    starting_port = app_servers.first.attrib?('mongrel_port_number').to_i
    mongrel_count.times do |i|
      sleep rolling_delay # Give daemons and the other mongrels 10 seconds to recover.
      sudo "/usr/bin/monit restart mongrel_#{starting_port + i}"
    end
  else
    sudo "/usr/bin/monit restart all -g #{mongrel_group}"
  end
end

There’s nothing fancy here – we loop through all of the mongrels, and restart them one by one using monit, waiting 10 seconds between restarts.
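Kicking off a rolling restart is then just a matter of setting the roll variable when invoking Capistrano, for example with the -s flag:

cap -s roll=true production deploy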

This type of deployment must be used carefully, since different mongrels will be running different code—any mongrel could service any request at any time. Generally, any time controller actions, validations, or state-encoding methods (e.g. session variables) are changed, rolling deploys can’t be used. But to fix an embarrassing misspelling during prime time with no downtime, it can’t be beat.

Encrypting your files with Rails – Part II

This is the second post in a two-part series (Part I appears below) on adding encryption to attachment_fu for Rails applications.

What about making the file available for download? AVOIDING THE ISSUE OF SCALABILITY FOR A MOMENT (since send_file is not the right way to serve files from Rails at scale), we want to use a variant of send_file to do the decryption and send the file. Here's a modified version of send_file that uses an extra hash parameter (acme) to decrypt when provided:

module ActionController
  module Streaming

    def send_file_x(path, options = {}) #:doc:
      raise MissingFile, "Cannot read file #{path}" unless File.file?(path) and File.readable?(path)

      options[:length]   ||= File.size(path)
      options[:filename] ||= File.basename(path)
      send_file_headers! options

      @performed_render = false
      logger.warn("Sending file #{path}")
      if options[:stream]
        render :status => options[:status], :text => Proc.new { |response, output|
          logger.info "Streaming file #{path}" unless logger.nil?
          len = options[:buffer_size] || 4096
          if options[:acme]
            c = OpenSSL::Cipher::Cipher.new("aes-256-cbc")
            c.decrypt
            c.key = options[:acme]
            c.iv  = Digest::SHA1.hexdigest("OneFishTwoFish") # fixed IV; must match the one used to encrypt
          end
          File.open(path, 'rb') do |file|
            while buf = file.read(len)
              if options[:acme]
                output.write(c.update(buf))
              else
                output.write(buf)
              end
            end
          end
          output.write(c.final) if options[:acme] # flush the final CBC block, or the plaintext is truncated
        }
      else
        logger.info "Sending file #{path}" unless logger.nil?
        File.open(path, 'rb') { |file| render :status => options[:status], :text => file.read }
      end
    end
  end
end

The code could be made more efficient by not performing the options[:acme] test each time a buffer is written; for example, the test can be hoisted out of the loop, as in this sketch (reusing the locals from the streaming Proc above):
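writer = if options[:acme]
           lambda { |buf| output.write(c.update(buf)) }
         else
           lambda { |buf| output.write(buf) }
         end
File.open(path, 'rb') do |file|
  while buf = file.read(len)
    writer.call(buf)
  end
end
output.write(c.final) if options[:acme] # final CBC block, as above

Our controller action that downloads a file would call it like so: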

send_file_x(@file_item.stored_filename,
  :filename    => @file_item.filename,
  :type        => @file_item.content_type,
  :disposition => 'attachment',
  :stream      => 'true',
  :buffer_size => 4096,
  :acme        => @file_item.acme)

In a production environment, send_file consumes too many server resources: the Rails application, and the process serving it (FastCGI, Mongrel, etc.), are tied up serving the file.

It's more likely that the Rails application will sit behind a reverse proxy like nginx; in that case, a directive is sent to the proxy to serve the file (usually through an HTTP header). For nginx, serving a non-encrypted static file is done by sending a header with the location of the file:

if defined?(NGINX_FOR_DOWNLOAD) && NGINX_FOR_DOWNLOAD
  # code omitted - set up file name and path
  response.headers['X-Accel-Redirect'] = NGINX_PATH_FOR_DOWNLOAD + path
  response.headers['Content-Type'] = file_item.content_type
  render :nothing => true
else
  send_file_x(File.join(RAILS_ROOT, FILE_STORAGE_PATH, path_parts, file_item.filename),
      :type         => file_item.content_type,
      :disposition  => 'attachment',
      :stream       => 'true',
      :buffer_size  => 4096,
      :acme         => nil,
      :encoding     => 'utf8',
      :filename     => URI.encode(file_item.filename))
end

For more information on nginx and rails, learn more about NginxXSendfile.

To perform a similar feat of decrypting and sending a file with nginx, a new nginx module would need to be written that takes an additional header variable, 'X-Accel-Redirect-Key', and uses it as the decryption key, decrypting the file as it is sent.

Encrypting your files with Rails – Part I

This is the first post in a two-part series on adding encryption to attachment_fu for Rails applications.

Let’s say that you’re building an application that will be used by a number of different people, and it involves storing information in files, and providing that information to the right person(s) at the right time.

There are choices about where to store the ‘files’ for users – in a database, in a file system (on disk), or in a specialized store (e.g. Amazon S3). For this example, we’re going to choose storing files ‘on disk’ in a file system – locally, across the network, NAS, SAN – wherever, as long as we can ‘see’ the information as a file.

One traditional way to enforce permissions on files is to have the file system itself enforce whether someone can or can't have access to a file. In the old days, on *nix systems, that meant juggling user and group databases, and working within the 'user-group-other' paradigm. More modern attributed file systems make this easier; however, that might not work with your web application, because you might not want to synchronize file system attribute information with your web identity information.

You might ultimately decide to control access yourself using some sort of User information DB (homegrown, LDAP, etc.), and explicitly control access by protecting the URLs which download specific files.

As a further measure, you might want to consider encrypting information on a file level. When a file is uploaded, you would generate an encryption key, encrypt the file on disk with that key, store the key separately, and use that key when a file is accessed (as close to the point that the file is downloaded as possible).

For prototype or low-volume applications, let’s look at what it would take to modify attachment_fu and a download controller action to accomplish this, and then extrapolate to what it might take in a higher-performance environment.

Attachment_fu is a nice plug-in for quickly enabling file uploads, and plenty of examples are available on how to use it. We're going to modify it to generate an encryption password when a file is uploaded, then encrypt that file as it's stored into the file system.
Here's the original code for save_to_storage in attachment_fu/backends/file_system_backend.rb:

      #Saves the file to the file system
      def save_to_storage
        if save_attachment?
          # TODO: This overwrites the file if it exists, maybe have an allow_overwrite option?
          FileUtils.mkdir_p(File.dirname(full_filename))
          File.cp(temp_path, full_filename)
          File.chmod(attachment_options[:chmod] || 0644, full_filename)
        end
        @old_filename = nil
        true
      end

Borrowing some example code from OpenSSL for AES encryption (you could use a different cipher if you like), we've modified save_to_storage to accept a key, acme, which is used to encrypt the file as it is copied from temporary storage.

      # Saves the file to the file system, encrypting it when a key is given
      def save_to_storage(acme=nil)
        if save_attachment?
          # TODO: This overwrites the file if it exists, maybe have an allow_overwrite option?
          FileUtils.mkdir_p(File.dirname(full_filename))
          if acme.nil?
            File.cp(temp_path, full_filename)
          else
            c = OpenSSL::Cipher::Cipher.new("aes-256-cbc")
            c.encrypt
            c.key = acme
            c.iv  = Digest::SHA1.hexdigest("OneFishTwoFish") # fixed IV; must match the one used to decrypt
            output = File.open(full_filename, 'wb')
            File.open(temp_path, 'rb') do |file|
              while buf = file.read(4096)
                output.write(c.update(buf))
              end
            end
            output.write(c.final) # flush the final CBC block
            output.close
          end
          File.chmod(attachment_options[:chmod] || 0644, full_filename)
        end
        @old_filename = nil
        true
      end

Where do we get acme? file_system_backend is called from attachment_fu.rb; after_process_attachment needs to generate a key, pass it to save_to_storage, and also make the key available in the model attachment_fu is mixed into.

    # Cleans up after processing.  Thumbnails are created, the attachment is stored to the backend, and the temp_paths are cleared.
    def after_process_attachment
      if @saved_attachment
        if respond_to?(:process_attachment_with_processing) && thumbnailable? && !attachment_options[:thumbnails].blank? && parent_id.nil?
          temp_file = temp_path || create_temp_file
          attachment_options[:thumbnails].each { |suffix, size| create_or_update_thumbnail(temp_file, suffix, *size) }
        end

        # Generate a per-file key when encrypted storage is requested.
        # (Clock is our application's clock abstraction; Time.now would work as well.)
        myacme = nil
        if attachment_options[:encrypted_storage]
          myacme = Digest::SHA1.hexdigest(Clock.now.to_i.to_s + rand.to_s)
        end

        save_to_storage(myacme)

        @temp_paths.clear
        @saved_attachment = nil
        write_attribute :acme, myacme # persist the key on the model
        callback :after_attachment_saved
      end
    end

Note that we're using a SHA1 digest of Clock.now.to_i.to_s + rand.to_s as the key… for a real-world application, you would likely additionally seed this with a phrase known only to you.

Your model code should have an attribute named acme, of type string. When your model object is saved, acme will also be updated.
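For completeness, a minimal migration sketch (file_items is a hypothetical table name standing in for your model's table):

class AddAcmeToFileItems < ActiveRecord::Migration
  def self.up
    add_column :file_items, :acme, :string # stores the per-file encryption key
  end

  def self.down
    remove_column :file_items, :acme
  end
end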

UPDATE: Read Part II