Monday, December 01, 2008

Speed up your MySQL data load

Perhaps you have some large datasets you would like to load into MySQL, such as a big text file generated from mysqldump that contains many sql INSERT statements. You can obviously populate your empty database by redirecting the content of this file into the mysql command line like such:


$ mysql -u root -p your_database < /tmp/table.dump


Another way to load data into your database or table is to use the sql syntax LOAD DATA INFILE. It requires you to provide a file that is in some delimiter format such as csv or tab-delimited values. One can use the mysqldump command line switches --fields-*-by and --tab to dump the content of the database to such format.

After you started either of these data load, you may want to kick off your shoes and take a nap, cos this will take a while if your MySQL server is not tuned and your dump file is pretty big (in this example, an InnoDB table of 790,000 rows [~500 MB dump] took 45 mins on my MBP using a Macports [mysql5 @5.0.67_1] installation with default settings). If you have to reload your database/tables often, this is unbearably slow.

Fortunately, there are a few MySQL tuning settings that you can specify at mysql server start-up time to tremendously speed up your data load time. Keep in mind, it is advisable to flush your MySQL logs (like the following) prior to tinkering with any log file size settings to avoid any log file corruptions:

(Before you proceed, I recommend you backup your data first)


$ sudo mysqladmin flush-logs
$ ... then shutdown your MySQL server ...
$ sudo rm /path/to/mysql/datadir/ib_logfile*
$ ... then start your MySQL server ...


So, now it's time to put in our magic sauce. On my Macports MySQL installation, the settings were meagerly defaulted to:


+---------------------------------+---------+
| Variable_name | Value |
+---------------------------------+---------+
| innodb_additional_mem_pool_size | 1048576 |
| innodb_buffer_pool_size | 8388608 |
| innodb_log_buffer_size | 1048576 |
| innodb_log_file_size | 5242880 |
| bulk_insert_buffer_size | 8388608 |
+---------------------------------+---------+


Add the following settings in your appropriate my.cnf file. Your settings may vary with the kind of machine you have (mine is a MBP 2GB RAM):


[mysqld]
innodb_additional_mem_pool_size=256M
innodb_buffer_pool_size=512M
innodb_log_buffer_size=256M
innodb_log_file_size=512M
bulk_insert_buffer_size=256M


Our settings will now look like:


+---------------------------------+-----------+
| Variable_name | Value |
+---------------------------------+-----------+
| innodb_additional_mem_pool_size | 268435456 |
| innodb_buffer_pool_size | 536870912 |
| innodb_log_buffer_size | 268435456 |
| innodb_log_file_size | 536870912 |
| bulk_insert_buffer_size | 268435456 |
+---------------------------------+-----------+


Then, as I reload my table, it only takes just 4' 55", a whopping 900% performance boost.

Bonus Tip:

You might notice that you cannot use SELECT COUNT(*) to track your data load progress while LOAD DATA INFILE is being executed (it always returns a count of zero). Instead, you can use the SHOW INNODB STATUS command. Find the line that says "undo log entries" under section TRANSACTIONS, and that number is the number of rows inserted so far. You can also look at how many inserts were performed per second by finding the line "inserts/s" under the ROW OPERATIONS section.