Abstract
In the previous article we introduced the overall process of data migration, but did not cover a concrete scheme. This article focuses on the concrete data migration scheme.
I. Design objectives
To design a data migration solution, the following goals need to be achieved:
- Migration speed: the QPS needs to reach about 1K so that 100 million records can be migrated in 1~2 days (1,000 QPS × 86,400 s ≈ 86 million records per day).
- Controllable QPS: the migration may affect online services, so the QPS must be adjustable dynamically (a minimal sketch follows this list).
- Data integrity: no data may be missed. Although there is a separate data-verification step, the migration scheme itself should cover as much data as possible to avoid loss.
- Controllable progress: the migration can be interrupted and resumed or retried. For example, migrate 1/10 of the data first, then continue with the rest.
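As a rough illustration of the "controllable QPS" goal, the sketch below throttles a migration loop to a target rate that can be changed at runtime. `get_target_qps` and `migrate_one` are placeholder names for this sketch, not part of any existing framework.

```python
import time

def get_target_qps():
    # Placeholder: in practice read this from a config center or a Redis key
    # so operators can lower the rate when the online service is under pressure.
    return 1000

def migrate_one(key):
    # Placeholder: enqueue the key, or copy one record to the new database.
    pass

def rate_limited_migration(keys):
    """Process keys while honoring the dynamically adjustable target QPS."""
    for key in keys:
        start = time.monotonic()
        migrate_one(key)
        # Make each iteration take at least 1/QPS seconds on average.
        min_interval = 1.0 / max(get_target_qps(), 1)
        elapsed = time.monotonic() - start
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)
```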
II. Architecture design
A data migration task consists of three steps, as shown in the following figure: traversing the old database, the task queue, and writing to the new database.
To reach the target migration speed, we break each step down so that every part can be processed asynchronously and concurrently, which speeds the whole process up.
Traversing the data
The old database must be traversed completely. Different databases provide different ways to do this; for MySQL, for example, you can use the existing binlog, which contains the full data.
For other databases, there are usually two approaches:
1. Single-threaded cursor traversal. A one-directional traversal does not require coordinating data between threads, so the implementation is relatively simple. Why not query the full data records directly? Because compared with a cursor, full records are much larger and incur a heavy network overhead, whereas a cursor can pull up to 1,000 keys at a time while using only a small amount of memory. Fetching a single value takes far less than 1 ms, so even a single thread can easily push the traversal QPS above 1K. Because the traversal is sequential, no data is lost. After each batch is successfully written to the task queue, the cursor can be recorded in a store such as Redis; if the traversal is interrupted or the service restarts, it can continue from that key, which makes the migration interruptible (see the sketch after this list).
2. Multithreaded block traversal. The data must be partitioned in advance so that the blocks do not overlap. For example, if all the data is distributed from A to Z, 26 threads could each be responsible for one letter's range. Because the data in each range differs in size, the threads progress at different speeds and each thread's cursor has to be checkpointed separately, which makes this approach more complex.
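A minimal sketch of the single-threaded cursor traversal, assuming the old store is Redis (SCAN returns batches of keys) and that the cursor is checkpointed back to Redis after each batch is enqueued; the checkpoint key name and the `enqueue` helper are illustrative assumptions.

```python
import redis

r = redis.Redis(host="old-db-host", port=6379)
CHECKPOINT_KEY = "migration:scan_cursor"   # illustrative checkpoint key

def enqueue(keys):
    # Placeholder: push this batch of keys into the task queue (e.g. Kafka).
    pass

def traverse():
    # Resume from the last checkpointed cursor, or start from 0.
    cursor = int(r.get(CHECKPOINT_KEY) or 0)
    while True:
        cursor, keys = r.scan(cursor=cursor, count=1000)
        if keys:
            enqueue(keys)
        # Checkpoint only after the batch is safely in the queue,
        # so an interruption or restart never skips data.
        r.set(CHECKPOINT_KEY, cursor)
        if cursor == 0:   # SCAN returns cursor 0 once the keyspace is fully covered
            break
```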
Task queue
The task queue must support high-concurrency writes and long-term storage. Message queues such as Kafka and RocketMQ meet these requirements: their write QPS can reach the tens of thousands, which satisfies the performance needs of this scheme.
Batch commits can also be used to further speed up writing.
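For instance, with the kafka-python client the producer can be configured to batch task messages; the topic name and the settings below are assumptions, not requirements.

```python
import json
from kafka import KafkaProducer

# linger_ms and batch_size let the client group many small task messages into
# one request, which pushes write throughput well past the required level.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    acks="all",                  # do not lose task messages
    linger_ms=20,                # wait up to 20 ms to fill a batch
    batch_size=64 * 1024,        # up to 64 KB per batch
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def enqueue(keys):
    # Keys are assumed to be plain strings here.
    for key in keys:
        producer.send("migration-tasks", {"key": key})
    # Call producer.flush() before shutdown so buffered batches are not dropped.
```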
Writing to the new database
Writes to the new database do not need to be in order, so this step supports horizontal scaling and can be scaled out almost without limit.
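Because ordering is not required, the writer only needs to be idempotent so that several writer instances can run in parallel and retry freely. A sketch assuming a PostgreSQL-style target, with the table name and columns as assumptions:

```python
import psycopg2

conn = psycopg2.connect("host=new-db-host dbname=newdb user=migrator")

def write_batch(records):
    """Idempotent batch upsert: safe to retry, so writers can be scaled out."""
    with conn, conn.cursor() as cur:
        cur.executemany(
            """
            INSERT INTO records (id, payload)
            VALUES (%s, %s)
            ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload
            """,
            [(rec["id"], rec["payload"]) for rec in records],
        )
```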
III. Concrete implementation
It is necessary to select specific implementation solutions based on specific business scenarios and existing infrastructure of the company.
Scenario 1: A small amount of data, fewer than 10 million records
This can be done with a local thread pool, without an additional task queue. Simple and efficient.
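A minimal sketch for this scenario using Python's standard thread pool; `scan_old_keys` and `migrate_one` are placeholders for the traversal and for copying a single record.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def scan_old_keys():
    # Placeholder: cursor-based traversal of the old database (see section II).
    yield from []

def migrate_one(key):
    # Placeholder: read the record from the old database and upsert it into the new one.
    pass

# For under ~10 million records a small in-process pool is enough;
# submit in bounded chunks so millions of futures are not queued at once.
keys = scan_old_keys()
with ThreadPoolExecutor(max_workers=16) as pool:
    while True:
        chunk = list(islice(keys, 1000))
        if not chunk:
            break
        list(pool.map(migrate_one, chunk))   # wait for the chunk before taking the next
```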
Scenario 2: A large amount of data, and the company has an offline data-processing infrastructure
A database-to-Kafka component writes the data into Kafka, and the processing job that writes it to the new database is implemented in Flink.
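If the export component lands each record in Kafka as JSON, the Flink job can be little more than a streaming INSERT. A PyFlink sketch, where the topic, table names, and connection settings are all assumptions (and the Kafka/JDBC connector jars must be available to the job):

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: the Kafka topic that the database-to-Kafka component writes into.
t_env.execute_sql("""
    CREATE TABLE old_records (
        id BIGINT,
        payload STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'migration-records',
        'properties.bootstrap.servers' = 'localhost:9092',
        'properties.group.id' = 'migration-flink',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

# Sink: the new database, written through the JDBC connector (upsert on the primary key).
t_env.execute_sql("""
    CREATE TABLE new_records (
        id BIGINT,
        payload STRING,
        PRIMARY KEY (id) NOT ENFORCED
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:mysql://new-db-host:3306/target',
        'table-name' = 'records',
        'username' = 'migrator',
        'password' = 'secret'
    )
""")

t_env.execute_sql("INSERT INTO new_records SELECT id, payload FROM old_records")
```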
Scenario 3: A large amount of data, and the company has no such infrastructure
Traverse the old database, write the keys to a message queue, then listen for the messages, query the data, and write it to the new database. This is also easy to implement.
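A sketch of that pipeline's consumer side, assuming the traversal pushed keys to a Kafka topic as in section II; `load_from_old_db` and `write_to_new_db` are placeholder helpers.

```python
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "migration-tasks",
    bootstrap_servers="localhost:9092",
    group_id="migration-writers",    # run more copies of this process to scale out
    enable_auto_commit=False,
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

def load_from_old_db(key):
    # Placeholder: fetch the full record from the old database by key.
    return {"id": key, "payload": "..."}

def write_to_new_db(record):
    # Placeholder: idempotent upsert into the new database (see section II).
    pass

for msg in consumer:
    record = load_from_old_db(msg.value["key"])
    write_to_new_db(record)
    consumer.commit()    # advance the offset only after a successful write
```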