Fixing a long standing bug in the Ruby Amazon S3 Library
The AWS::S3 library for Ruby has been around since the release of Amazon S3 in 2006; hundreds, if not thousands, of applications use it. Consequently, it is not usually “the suspect” when looking for the cause of intermittent access errors to S3. However, we recently found and fixed an error that has been present in the signature calculation method since the library was first released.
We use S3 as the backing store for Onehub Workspaces, and we do a lot of S3 operations. During routine log monitoring we noticed a slow, but persistent, stream of HTTP 403 (Unauthorized Access) errors from S3. These errors were not frequent enough to cause problems for our customers; applications using S3 should be designed anticipate errors, and retry. Still, we felt that further investigation was warranted.
To manage the logs generated by all of the Onehub services, we use Papertrail. Papertrail allows us to run a real-time search against our production logs, showing us requests to S3 like this:
https://s3.amazonaws.com/<bucket>/<object>?AWSAccessKeyId=<ouraccesskey>&Expires=1328127911&Signature=l74ewTX9hh0s2oiLoIY83V%2BlLuM%3D
The components used to calculate the signature are well documented by Amazon. When a signature fails, S3 will provide the components it attempted to use in the XML returned with the error message. We noticed that the signatures in these errors were different than those that should have been calculated for the provided Expires time. We monkey-patched the #canonical_signature
method of AWS::S3::Authentication::Signature
to handle a closure.
module AWS module S3 class Authentication class Signature private def encoded_canonical digest = OpenSSL::Digest::Digest.new('sha1') b64_hmac = [OpenSSL::HMAC.digest(digest, secret_access_key, canonical_string)].pack("m").strip if options[:debug_proc] options[:debug_proc].call(sprintf("AWS::S3::Authentication::Signature - request %s encoded canonical: %s %s canonical: [%s]", @request.path, b64_hmac, CGI.escape(b64_hmac), canonical_string)) end url_encode? ? CGI.escape(b64_hmac) : b64_hmac end end end end end
This enabled us to pass in an option to AWS::S3#url_for
containing a closure with our debugging method.
options = options.merge({:debug_proc => lambda{|x| logger.warn(x)}}) the_url = AssetStore.url_for(key_name, options)
We put this through testing, and into production, then waited for the next error to appear.
AWS::S3::Authentication::Signature - request /<bucket>/<keyname> encoded canonical: l74ewTX9hh0s2oiLoIY83V+lLuM= l74ewTX9hh0s2oiLoIY83V%2BlLuM%3D canonical: [GET#012#012#0121328127912#012/<bucket>/<keyname>] https://s3.amazonaws.com/<bucket>/<object>?AWSAccessKeyId=<OURACCESSKEY>&Expires=1328127911&Signature=l74ewTX9hh0s2oiLoIY83V%2BlLuM%3D
From here we could see the error. The Expires
time used to calculate the signature was different that the time provided in the URL. The value 1328127911
is in the URL, while 1328127912
was used to calculate the signature!
But why?
It took a bit of digging through the AWS::S3
source, but we found the culprit. When generating these S3 URLs, we pass an expires_in
option to AWS::S3#url_for
.
# Signature is the abstract super class for the Header and QueryString authentication methods. It does the job # of computing the canonical_string using the CanonicalString class as well as encoding the canonical string. The subclasses # parameterize these computations and arrange them in a string form appropriate to how they are used, in one case a http request # header value, and in the other case key/value query string parameter pairs. class Signature < String #:nodoc: attr_reader :request, :access_key_id, :secret_access_key, :options def initialize(request, access_key_id, secret_access_key, options = {}) super() @request, @access_key_id, @secret_access_key = request, access_key_id, secret_access_key @options = options end private def canonical_string options = {} options[:expires] = expires if expires? CanonicalString.new(request, options) end memoized :canonical_string def encoded_canonical digest = OpenSSL::Digest::Digest.new('sha1') b64_hmac = [OpenSSL::HMAC.digest(digest, secret_access_key, canonical_string)].pack("m").strip url_encode? ? CGI.escape(b64_hmac) : b64_hmac end def url_encode? !@options[:url_encode].nil? end def expires? is_a? QueryString end def date request['date'].to_s.strip.empty? ? Time.now : Time.parse(request['date']) end end # Provides query string authentication by computing the three authorization parameters: AWSAccessKeyId, Expires and Signature. # More details about the various authentication schemes can be found in the docs for its containing module, Authentication. class QueryString < Signature #:nodoc: constant :DEFAULT_EXPIRY, 300 # 5 minutes def initialize(*args) super options[:url_encode] = true self << build end private # Will return one of three values, in the following order of precedence: # # 1) Seconds since the epoch explicitly passed in the +:expires+ option # 2) The current time in seconds since the epoch plus the number of seconds passed in # the +:expires_in+ option # 3) The current time in seconds since the epoch plus the default number of seconds (60 seconds) def expires return options[:expires] if options[:expires] date.to_i + expires_in end def expires_in options.has_key?(:expires_in) ? Integer(options[:expires_in]) : DEFAULT_EXPIRY end # Keep in alphabetical order def build "AWSAccessKeyId=#{access_key_id}&Expires=#{expires}&Signature=#{encoded_canonical}" end end
The #initialize
method is the entry point, but most of the work is done by #build
. The bug was immediately apparent once we looked at #expires
. Because #build
calls #expires
, and then #encoded_canonical
calls it later, the date used can change. The #date
method uses Time.now
, if these calls happened on different seconds, they would result in different values. The solution is to memoize the time; it could be done in #expires
or #date
.
def expires return options[:expires] if options[:expires] @expires ||= date.to_i + expires_in end
Interestingly, this error is only possible if the expires_in
option is used. We suspect most people either use the library’s DEFAULT_EXPIRY
or pass in an expires
option, both of which cause #expires
to avoid the call to #date
.
After a bit of testing we put this code into production and have eliminated these errors, resulting in better performance for our customers. We have also submitted a pull request to the library maintainer.