In this paper, byNetease cloudRelease.

We refer to the migration of part or all of the Schema and data in the database to another instance as instance migration. The exported instance is called the source instance, and the imported instance is called the target instance.

According to the different types of database migration, it can be divided into migration between the same database, for example, from MySQL to MySQL; And migrations across database types, such as from Oracle to MySQL. This paper will introduce the implementation of RDS instance migration function of netease cloud basic service, and discuss how to efficiently complete the instance migration task.

Usage scenarios

So why migrate MySQL instances? How to migrate in different scenarios? Instance migration scenarios can be summarized as follows: 1. Migration from self-built instances to RDS: In cloud services are not fully popularization, there is a lot of self-built database instance, netease company, for example, in the comprehensive business before the cloud, netease blog, netease mail products such as hundreds of MySQL instance is deployed on the physical server directly, and along with the expansion of business, for instance is necessarily expansion, specifications, and so on. Compared with self-built instances, RDS instances have natural advantages in fault handling, online capacity expansion and upgrade. Therefore, at present, most of the databases of Netease Internet products have migrated instances to RDS using the instance migration function.


2. Migrate from other public cloud platforms to RDS: Since the launch of netease Cloud Basic Service RDS, many users have migrated MySQL instances deployed on other public cloud platforms to RDS. We counted the instance migration function and found that 50% were MySQL instances used to migrate other public clouds.

Instance migration technology implementation

Before designing the instance migration function, we conducted a full survey on the public cloud in the industry. Only two mainstream public cloud platforms provide instance migration function, so why only two, mainly because providing online instance migration function needs to solve a series of problems, which can be summarized as follows: 1. How can I quickly back up consistent data from a source instance? 2. How to handle the impact on source instance services during backup? 3. How can I quickly import a backup to a target instance? 4. How do I synchronize incremental data from the source instance to the target instance? 5. How to ensure efficient instance migration? The following is a step-by-step analysis of how netease cloud basic service RDS solves these problems.

1 Multi-threaded logical backup

The method to solve the first problem is to export the consistency data of source instance by using multi-thread logical backup. There are many logical backup tools for MySQL data, including the classic mysqlPump, mysqlPump, Percona open source tool myDumper; The physical backup is mainly Percona’s Xtrabackup tool. As the saying goes, there is no best, only the best, so which of these backup tools is the best for instance migration? Our answer is mydumper. First of all, we excluded Xtrabackup. Although physical backup has advantages in performance, it cannot remotely backup the source instance. During instance migration, it is impossible to ask users to grant permission to operate the source instance server, especially when migrating RDS instances of other public cloud platforms. In addition, the backup data produced by a physical backup tends to be larger than the data exported from a logical backup, because Xtrabackup copies physical files directly, whereas a logical backup exports SQL statements. The following are the comparative test results of several backup tools for your reference:

After excluding physical backups, there are three options: mysqlDump, mysqlPump, and MyDumper. We ended up with MyDumper because myDumper is multithreaded. Wait a minute! Mysqlpump is multithreaded. Yes, mysqlPump’s multithreading is even more advanced than MyDumper’s (see References 1 and 2), but mysqlPump is table-level concurrency and still immature, whereas MyDumper is record-level concurrency, which makes it easier to take advantage of multithreading for scenarios with a single large table. You may be wondering how MyDumper implements record-level multithreaded consistent backup. Its backup flow chart is as follows:


Mydumper consists of the main thread and multiple worker threads for data consistency backup. The main thread performs FTWRL or Lock Tables tablelist Read block write operations to establish a consistency backup point and record the current BinLog and GTID. The worker thread sets the transaction level of each session to repeatable-read while the main thread still holds the lock and starts the snapshot read. Since each table cannot be written or updated at this time, the data read by the snapshot of the worker thread is the data of the consistency backup point established by the main thread. After all worker threads have started snapshot reading, the main thread releases the read lock if there is no non-transactional table such as MyISAM (see Reference 3 for a detailed analysis of the myDumper principle).

2 Service load monitoring and adjustment

Whether physical backup or logical backup, online database services will be affected to some extent. How to deal with the impact of the backup on the source instance business is the second problem we need to solve. RDS instances of netease cloud Infrastructure services are designed on the principle that online business is always more important than migration tasks. Since the monitoring data at the server layer where the source instance resides is not available, we do a lot of optimizations at the MySQL database layer to reduce the impact. Including the introduction of lock time timeout mechanism, intelligent adjustment of export concurrency based on business load and InnoDB Buffer Pool (BP) pollution control. As mentioned earlier, in order to obtain consistent data, various backup tools, including Xtrabackup and MyDumper, require a short read lock process for the source instance. Normally, this process is short, but there may be exceptions, such as a large MyISAM table in the source instance, which may hold the lock for a longer time. In order to be able to avoid the lock time is too long lead to the business of the write operation is blocked, the use of netease cloud infrastructure services (hives) during the instance migration to RDS, the user can choose if they hold read lock for the longest time, and as shown in the figure below, if more than the threshold value of time, will unlock unconditionally and migration operation fails, the user can choose to retry slack period in the business.

After the consistency snapshot is successfully locked and unlocked, various schemas and table data are exported. Users should choose the number of concurrent threads to export data according to the online service load of the source instance and the SERVER I/O capability of the instance, as shown in the figure above. Business load is not always predictable, but business is always the most important, so when there is a short business peak, we want to return the limited IO capacity of the server to the business rather than use it for migration. Netease Cloud Basic Service (HONEYCOMB) RDS provides a load monitoring threshold option. When the service load exceeds this threshold, the system suspends migration operations until the service load falls below the threshold again. If the user chooses multi-threaded export, the number of threads can be dynamically adjusted according to the service load to ensure that the data export operation can be completed as quickly as possible under the premise of prioritizing the service. The following figure shows an example of adaptive adjustment of export threads based on business load.

In the process of logical export, InnoDB BP parameters are selectively adjusted according to the user’s migration account rights to minimize the pollution of BP hotspot data by the query operation of migration connection. As far as possible, data entering BP due to migration should be kept on the cold data side of BP’s LRU List and replaced out of BP as soon as possible (see Reference 4). Of course, setting BP parameters requires the account to have Super permission, which cannot be optimized for the source instance on the public cloud.

3 Multi-threaded data import

Myloader, a multithreaded recovery tool matching MyDumper, is used to import the backup data to the target RDS instance. The execution process of MyLoader is shown in the figure below.

Because the target instance has no load, you can increase the number of concurrent threads to use up the I/O capacity of the target instance. In addition, during data import, we maximize the import performance by disabling slow log and binary log and setting innodb_flush_LOG_at_trx_COMMIT to 0. After data import is complete, the corresponding parameters are adjusted to their original values. This is how we solve the third problem.

4 Parallel filtering replication

After the data import is complete, the migration is complete for full migration scenarios. If you choose incremental migration, you also need to migrate the incremental data (Update/Delete) generated on the source instance during data export and import to the target instance. We use MySQL replication to synchronize these data. Because the replication of MySQL 5.5, 5.6, and 5.7 versions is quite different, we select the corresponding target instance version based on the source instance version. For MySQL 5.5 and later source instances, select InnoSQL 5.5.30 from netease as the target RDS instance version. For MySQL 5.6 and 5.7, select InnoSQL 5.7.12 as the target instance version. The reasons for the above version pairing are: First, we want users to adopt the latest stable version of MySQL 5.7 as much as possible, because MySQL 5.7 is the best version ever, bringing a lot of great features, including GTID-based replication, SYS tables, etc., while addressing and optimizing a lot of bugs or deficiencies compared to previous versions. Second, replication can be configured more easily. MySQL 5.7 provides gTID-based and binlog-based replication mechanisms to seamlessly adapt to different replication configurations of source instances. When migrating a source instance, users can migrate all or only part of the database on the instance. MySQL 5.7 can use the new “CHANGE REPLICATION FILTER” syntax to FILTER REPLICATION Settings online without restarting mysqld. InnoSQL 5.5.30 is used for the target instance because MySQL 5.5 or later does not meet the instance UUID required to replicate MySQL 5.7. Of course, compared to the community edition MySQL 5.5.30, InnoSQL 5.5.30 implements online filtering and replication. We use parallel replication technology to improve the efficiency of incremental data synchronization and quickly shorten the master-slave replication latency. Because the GTID feature of MySQL 5.6 is not perfect, the database-based parallel replication mode is adopted when the MySQL 5.6 is migrated to MySQL 5.7 to avoid the replication error caused by the failure of the parallel replication caused by the GTID EVENT in LOGICAL_CLOCK. Thus, the fourth problem has been solved.

How can migration be done efficiently

I believe we all agree that instance migration is a heavy operation, no one is idle to do an instance migration of online databases. Now that you’ve decided to migrate instances, you want to be able to do it all at once to avoid going back and forth. How to ensure efficient migration is particularly important. Users need to evaluate and prepare for migration first. Netease Cloud basic service RDS improves the success rate of migration by pre-checking migration and providing error retry.

1. Migration assessment and preparation

Users need to evaluate the migration before the migration, including selecting the migration period of service peak, so as to minimize the impact on services and improve the migration speed. Ensure that a one-time switchover can be performed when services are connected to the database, shortening the switchover time and avoiding data inconsistency caused by connecting some service logic to the source database and other service logic to the target database. Secondly, according to the amount of data to be migrated, the storage space of the target instance is reasonably selected to avoid failure due to insufficient target instance space. Finally, you need to create database accounts that meet migration requirements.

2. Pre-check

We want to identify and correct all the possible causes of migration failure before we start. Migration pre-check is an important means, mainly including user created in the source instance migration MySQL account permission check, MySQL parameter Settings check. The migration permission check is used to check whether the migrated account can be successfully migrated, including the permission to view the Schema of database definition, table definition, view, trigger, stored procedure, and function. Lock Table permissions for tables in the selected database, and Select permissions for data in the Table; If you select incremental migration, you also need to check whether the account has Replication slave and Replication Client rights. Check permissions required for migration by querying system libraries such as MySQL, INFORMATION_SCHEMA, or Performance_SCHEMA for the source instance. If incremental data synchronization is required, set parameters server_id and log_bin correctly for the source instance. If errors are found in the pre-check, a clear message is displayed and users are instructed to adjust parameters before the pre-check.

Error retry

During the migration, the progress display function is provided, as shown below:


A percentage with a progress bar is displayed for each phase of the migration and is refreshed automatically periodically. At the same time, the overall migration progress will be displayed for users to view at any time. If an error occurs during data export or import, an error message is displayed. The main cause of a migration error is that the MyISAM table times out. Based on the error message, you can modify the migration parameters and retry without restarting the migration.

4. End the migration

After verifying that there is no replication delay between the target and source instances, you can end the migration and switch the service IP to the target instance IP. Of course, before THE IP switch, please ensure that the database account required for service access has been created on the RDS instance of netease cloud basic service and given proper permissions.


Understand netease Cloud:

The official website of netease Cloud is www.163yun.com/

New user package: www.163yun.com/gift

Netease Cloud community: sq.163yun.com/