Backing up large MySQL databases
The data backend for our server monitoring application, Server Density, uses MySQL. We are investigating a non-RDBMS for the metrics data but plan to continue using MySQL for less intensive uses such as storing user accounts. In the meantime, we still have a large amount of data that needs backing up every day.
MySQL bundles a number of backup tools that are generally very good. Initially we simply used mysqldump to create a SQL dump of each customer database. This has the advantage of being easy to restore and, with the right options (such as --single-transaction for InnoDB), it does not lock the tables. Avoiding table locks is important for us because we perform at least one insert every 60 seconds (the monitoring data from a single server) and need to read that data in near-real-time to determine whether alerts need to be triggered.
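A minimal sketch of this kind of per-database dump, with hypothetical database names and backup paths, might look like:

```shell
# Illustrative nightly dump loop; database names and paths are hypothetical.
# --single-transaction takes a consistent snapshot of InnoDB tables without
# locking them; --quick streams rows rather than buffering whole tables.
BACKUP_DIR="/backups/mysql/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
for DB in customer_1 customer_2; do
    mysqldump --single-transaction --quick "$DB" | gzip > "$BACKUP_DIR/$DB.sql.gz"
done
```

Each database ends up in its own compressed file, which keeps individual restores simple.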
The locking problem can be resolved by using replication to mirror the data onto another server, taking the replica offline to perform backups, then letting it catch up again afterwards. However, we run a huge number of queries, so aside from the catch-up time, the initial import (and any subsequent re-import if replication failed and had to be restarted) proved so difficult as to be impractical.
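For completeness, the replica-offline approach sketched above looks roughly like the following; the host name is hypothetical, while STOP SLAVE and START SLAVE are the standard MySQL replication statements:

```shell
# Pause replication on the replica, dump while its data is static, then
# resume and let it catch up. Host and output path are illustrative only.
mysql -h replica.example.com -e "STOP SLAVE;"
mysqldump -h replica.example.com --all-databases | gzip > /backups/full.sql.gz
mysql -h replica.example.com -e "START SLAVE;"
```

The catch-up time after START SLAVE is exactly the cost that made this impractical for our query volume.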
This kind of problem arises when tables get large. Our server snapshots feature stores a full list of processes and associated information for every minute of the past month, which amounts to a large quantity of raw data. Backing this up even into a single file (from mysqldump) results in files of many hundreds of MB for just one table. These are difficult to manage and very slow to restore. Copying the raw data files instead requires replication, so that a server can be shut down for the copy (you cannot safely copy the files whilst the server is in use).
The solution we settled on was a tool called mk-parallel-dump and its companion mk-parallel-restore, both part of the Maatkit MySQL toolkit. They still use mysqldump (no locking, portable SQL dumps) but run it in multiple threads, which makes the backups extremely quick. They offer a large number of options, including splitting the dump into many files and, on a per-table basis, only re-dumping data that has changed since the last backup. The backups are also compressed.
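A sketch of how these tools are invoked; the flag names here are our recollection of the Maatkit documentation, so verify them against `mk-parallel-dump --help` for your installed version:

```shell
# Dump all databases using several mysqldump threads into one base directory.
# --basedir and --threads are assumed flag names; check your Maatkit version.
mk-parallel-dump --basedir /backups/mysql --threads 4

# Restore everything from the same directory, again in parallel.
mk-parallel-restore /backups/mysql
```

Because the tools track what changed per table, re-running the dump against the same base directory only re-dumps tables that have been modified.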
During testing, we decided not to back up the server snapshot data, which is not critically important; this allows us to back up all customer databases in around 90 seconds. Restoring every database (about 20GB in total) took about 300 seconds in a test.
The database is a critical part of any application, and scaling and backups are difficult technical problems, particularly with large quantities of data. 20GB is a significant database size and will continue to grow, but we're confident this part of our backup strategy will scale nicely until we switch to a clustered non-RDBMS, planned for the coming months.