Adding Columns to Large MySQL Tables Quickly

Suppose you have a MySQL database, and in that database a non-trivial table with more than a million records. If you’re using that table with your Rails application, you might at some point want to add some additional columns.

It’s tempting to just write the migration like this:

class AddThreeColumnsToQuarks < ActiveRecord::Migration
  def self.up
    add_column :quarks, :arbitrary_field1, :integer
    add_column :quarks, :arbitrary_field2, :string
    add_column :quarks, :arbitrary_field3, :integer
  end

  def self.down
    remove_column :quarks, :arbitrary_field1
    remove_column :quarks, :arbitrary_field2
    remove_column :quarks, :arbitrary_field3
  end
end

Should you do that, you’ll find that although it works, MySQL takes a fantastic amount of time to add the columns when you have a lot of rows. ActiveRecord adds each column individually, with one ALTER statement apiece:

ALTER TABLE `quarks` ADD `arbitrary_field1` int(11)
ALTER TABLE `quarks` ADD `arbitrary_field2` varchar(255)
ALTER TABLE `quarks` ADD `arbitrary_field3` int(11)

Each one of those ALTER statements makes a new temporary table, copies the records from your existing table into the new table, and then replaces the old table with the new one. Five thousand records in the table? Adding three columns copies the table three times: fifteen thousand rows copied in total.

You can improve on this by combining the ALTERs into a single statement (as long as the ALTER contains a single type of operation, such as ADD COLUMN). The copy of the table’s data still takes a while, though. A few million rows? You might be waiting tens of minutes.
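To make the combining concrete, here is a small sketch (the helper name `combined_alter_sql` is ours, not ActiveRecord’s) that builds the single combined statement you would pass to `connection.execute` in a migration, instead of letting `add_column` issue three separate ALTERs:

```ruby
# Build one combined ALTER statement instead of one ALTER per column.
# MySQL copies the table once for the whole statement rather than once
# per column. The column names and types mirror the migration above.
def combined_alter_sql(table, columns)
  clauses = columns.map { |name, type| "ADD `#{name}` #{type}" }
  "ALTER TABLE `#{table}` #{clauses.join(', ')}"
end

puts combined_alter_sql("quarks", [
  ["arbitrary_field1", "int(11)"],
  ["arbitrary_field2", "varchar(255)"],
  ["arbitrary_field3", "int(11)"],
])
# ALTER TABLE `quarks` ADD `arbitrary_field1` int(11), ADD `arbitrary_field2` varchar(255), ADD `arbitrary_field3` int(11)
```

In a migration you would then call `ActiveRecord::Base.connection.execute` with that string in `self.up`, and write the matching `DROP COLUMN` clauses by hand in `self.down`.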

A FASTER way of adding columns is to create your own new table, then SELECT all of the rows from the existing table into it. You can create the structure from the existing table, then modify the structure however you’d like, then SELECT in the data. MAKE SURE that you SELECT the information into the new table in the same order as the fields are defined. Here’s an example:

class AddThreeColumnsToQuarks < ActiveRecord::Migration
  def self.up
    sql = ActiveRecord::Base.connection()
    sql.execute "SET autocommit=0"
    sql.begin_db_transaction
    sql.execute("CREATE TABLE quarks_new LIKE quarks")
    add_column :quarks_new, :arbitrary_field1, :integer
    add_column :quarks_new, :arbitrary_field2, :string
    add_column :quarks_new, :arbitrary_field3, :integer
    sql.execute("INSERT INTO quarks_new SELECT *, NULL, NULL, NULL FROM quarks")
    rename_table :quarks, :quarks_old
    rename_table :quarks_new, :quarks
    sql.commit_db_transaction
    # don't forget to remove quarks_old someday
  end

  def self.down
    drop_table :quarks
    rename_table :quarks_old, :quarks
  end
end

You can change the NULLs to whatever default values you’d like the new columns to have in the existing rows.
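For instance, to backfill the existing rows with concrete defaults rather than NULL, the INSERT line above could be generated like this (a sketch; `backfill_insert_sql` is a hypothetical helper, and the literals must be listed in the same order as the new columns):

```ruby
# Sketch: build the INSERT ... SELECT statement with per-column default
# literals for the new columns. The defaults array must match the order
# in which the new columns were added to the new table.
def backfill_insert_sql(new_table, old_table, defaults)
  "INSERT INTO #{new_table} SELECT *, #{defaults.join(', ')} FROM #{old_table}"
end

puts backfill_insert_sql("quarks_new", "quarks", ["0", "''", "0"])
# INSERT INTO quarks_new SELECT *, 0, '', 0 FROM quarks
```

Here the two integer columns get 0 and the string column gets an empty string instead of NULL.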

How much faster can this be? On one table in one of our databases, a single add_column took approximately 17 minutes. Using this technique, we reduced the time to add a column to approximately 45 seconds. YMMV, but you’ll notice a big improvement.

What about the indices? CREATE TABLE ... LIKE preserves column attributes and indices. For more information, see the MySQL online manual.

4 thoughts on “Adding Columns to Large MySQL Tables Quickly”

  1. Thanks for your blog post, it was useful.

    A warning, though: using transactions as you do will not update the schema_info table, so you end up with the previous db:version. MySQL does not handle DDL within transactions anyway.

    My implementation:

    class ActiveRecord::Migration
      def self.fast_add_columns(table_name, new_columns_count)
        tmp_table_name = "#{table_name}_tmp"
        execute "DROP TABLE IF EXISTS #{tmp_table_name}"
        execute "CREATE TABLE #{tmp_table_name} LIKE #{table_name}"

        yield tmp_table_name

        execute "INSERT INTO #{tmp_table_name} SELECT *#{', NULL' * new_columns_count} FROM #{table_name}"
        drop_table table_name
        rename_table tmp_table_name, table_name
      end
    end

  2. If the table is huge, say 10-100 million rows, the insert operation (here INSERT INTO quarks_new SELECT * FROM quarks;) still takes a lot of time. Unfortunately MySQL InnoDB tables do not allow disabling indices. Is there a way to increase the performance of the insert?

  3. This is how it worked for me:

    class AddThreeColumnsToQuarks < ActiveRecord::Migration
      def self.up
        sql = ActiveRecord::Base.connection()
        sql.execute "SET autocommit=0"
        sql.transaction do
          sql.execute("CREATE TABLE quarks_new LIKE quarks")
          add_column :quarks_new, :arbitrary_field1, :integer
          add_column :quarks_new, :arbitrary_field2, :string
          add_column :quarks_new, :arbitrary_field3, :integer
          sql.execute("INSERT INTO quarks_new SELECT *, NULL, NULL, NULL FROM quarks")
          rename_table :quarks, :quarks_old
          rename_table :quarks_new, :quarks
        end
        sql.execute "SET autocommit=1"
        # don't forget to remove quarks_old someday
      end

      def self.down
        drop_table :quarks
        rename_table :quarks_old, :quarks
      end
    end
