Data migration case analysis
Original article: blog.piaoruiqing.com/2019/10/27/…
Preface
During a data migration, the service is usually stopped to ensure data consistency; during that window users get no service, or only partial service. On top of that, verifying that the service and the data are correct after the migration also takes a lot of time. The overall cost is considerable.
This article discusses how to carry out a data migration without stopping the service.
Case
The order system has a set of order tables:
Database: MySQL
Table names: order_{0~19}, suffixed {0~19}, 20 tables in total.
Primary key: order_id, the order ID, generated by the snowflake algorithm; the creation time can be derived from the ID.
Original sharding strategy: order_id % 20
As business volume has grown, each sub-table now holds more than ten million rows. If this continues, serious performance problems will follow, so the data in the original sub-tables has to be migrated.
Requirements:
- Migrate the data from the original 20 sub-tables into the new tables.
- The service must not be stopped at any point during the migration; full external service must be provided throughout.
- Provide a complete rollback plan; data generated during the migration must not be lost or manually patched.
Analysis
Readers with some sharding experience may already have noticed that the sharding strategy in this case is rather unreasonable; let's not dwell on how it came to be (after several rounds of staff turnover, there is no one left to blame anyway).
Looking at the original tables: order data inevitably grows with time and business volume, and with a fixed number of sub-tables, performance degrades as the data volume increases. So after the migration the number of sub-tables can no longer be fixed; even raising it from 20 to 100 would only postpone the bottleneck.
Order data grows over time and becomes cold data once the refund window has passed, so its access frequency drops. Sharding orders by creation time is therefore a good choice. Note that order_id is generated by the snowflake algorithm, so the creation time, and hence the shard key, can be derived directly from order_id.
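To make this concrete, here is a minimal Java sketch of deriving the creation time, and from it a monthly table suffix, out of a snowflake-style order_id. The bit layout (a millisecond timestamp in the bits above a 22-bit shift), the custom epoch, and the monthly table naming are assumptions for illustration; the actual ID generator in the order system may use different constants.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Derives the creation time and target monthly table from a snowflake-style order ID. */
public final class OrderShardRouter {

    // Assumed layout: the bits above a 22-bit shift hold a millisecond offset from a
    // custom epoch. Adjust both constants to match the actual ID generator.
    private static final long TIMESTAMP_SHIFT = 22L;
    private static final long EPOCH_MILLIS = 1420041600000L; // assumed epoch (2015-01-01 UTC)

    private static final DateTimeFormatter MONTH_FMT = DateTimeFormatter.ofPattern("yyyyMM");

    /** Extracts the creation time encoded in the order ID. */
    public static Instant creationTime(long orderId) {
        return Instant.ofEpochMilli((orderId >>> TIMESTAMP_SHIFT) + EPOCH_MILLIS);
    }

    /** Routes the order to a monthly table, e.g. order_201910 (naming is illustrative). */
    public static String targetTable(long orderId) {
        return "order_" + MONTH_FMT.format(creationTime(orderId).atZone(ZoneOffset.UTC));
    }
}
```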
Migration scheme analysis
There are candidate data migration schemes at every level, from the business layer down to the database layer. Let's list a few for comparison:

- Service layer: double-writing hardcoded in the business code. From a chosen point in time, new data is written to both the old and the new tables, and after running for a while the historical data is migrated into the new tables. High cost and heavy coupling with the business code; not considered.
- Connection layer: an upgraded version of scheme 1. SQL is intercepted at the connection layer for double-writing, which decouples the business code, but it shares the same problem as scheme 1: the cycle is long, and the old data can only be migrated once it is guaranteed not to change any more.
- Triggers: newly generated data is synchronized to the new tables by triggers; essentially the same as scheme 2.
- Database log: back up the database at time T and migrate the backup data into the new tables; then read the binlog from time T onwards, replay it against the new tables, and keep replaying continuously. Once the two sides stay in sync, bring the new code online.
- Disguised slave: compared with scheme 4, there is no need to read the log files directly, which solves the problem that the logs cannot be read directly when the database is hosted on the cloud.
By comparison, schemes 4 and 5 are both feasible. Since the database is hosted on the cloud and reading the logs directly is inconvenient, and since scheme 5 has the mature open-source middleware **Canal** available, the author chose scheme 5.
Canal project: github.com/alibaba/can…
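As a reference point, below is a minimal sketch of what the Canal-based synchronization process could look like: a client subscribes to the binlog events of the old order_{0~19} tables and replays every row change into the corresponding new table. The server address, destination name, database name in the subscription filter, and the replay helper are placeholders invented for this sketch; the overall loop follows Canal's standard client usage.

```java
import java.net.InetSocketAddress;
import java.util.List;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;

/** Rough sketch of a Canal consumer replaying old-table row changes into the new tables. */
public class OrderSyncWorker {

    public static void main(String[] args) throws Exception {
        // Server address, destination and credentials are placeholders.
        CanalConnector connector = CanalConnectors.newSingleConnector(
                new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
        connector.connect();
        connector.subscribe("orderdb\\.order_.*"); // only the old order_{0~19} tables
        connector.rollback();                      // re-deliver any batch that was not acked

        while (!Thread.currentThread().isInterrupted()) {
            Message message = connector.getWithoutAck(1024);
            long batchId = message.getId();
            List<CanalEntry.Entry> entries = message.getEntries();
            if (batchId == -1 || entries.isEmpty()) {
                Thread.sleep(500);
                continue;
            }
            for (CanalEntry.Entry entry : entries) {
                if (entry.getEntryType() != CanalEntry.EntryType.ROWDATA) {
                    continue;
                }
                CanalEntry.RowChange rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
                for (CanalEntry.RowData row : rowChange.getRowDatasList()) {
                    applyToNewTable(rowChange.getEventType(), row);
                }
            }
            connector.ack(batchId); // confirm the batch only after it has been applied
        }
    }

    private static void applyToNewTable(CanalEntry.EventType type, CanalEntry.RowData row) {
        // Hypothetical replay logic: for INSERT/UPDATE use row.getAfterColumnsList(),
        // for DELETE use row.getBeforeColumnsList(), and issue an idempotent statement
        // (keyed on order_id) against the corresponding new monthly table.
    }
}
```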
Rollback scheme analysis
No matter how carefully new code is released, you can never be 100% sure it will work. If the migration fails, you must be able to roll back, which means the original data and the new data have to stay in sync.
Therefore, building on scheme 5 from the previous section: after traffic is switched to the new cluster, we stop the old-to-new synchronization and start synchronizing data from the new tables back to the old tables, beginning at the moment of the traffic switch, again using the disguised-slave approach. This keeps the old and new tables in sync, and if a serious problem occurs, traffic can simply be switched back to the old cluster.
Overall scheme design
Backing up Source Data
- Execute `flush logs`: this generates a new binlog, and data recovery will start from it (a small JDBC sketch of this step follows this list).
- Back up the data tables (order_{0~19}): copy the source (old) tables from master database A to backup database B.
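For illustration, the flush and the bookkeeping of the starting binlog position could be done over JDBC as sketched below. `FLUSH LOGS` and `SHOW MASTER STATUS` are standard MySQL statements; the connection details are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/** Rotates the binlog and records the position from which synchronization will start. */
public class FlushLogs {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders; FLUSH LOGS requires the RELOAD privilege.
        try (Connection master = DriverManager.getConnection(
                     "jdbc:mysql://master-a/orderdb", "user", "pass");
             Statement st = master.createStatement()) {
            st.execute("FLUSH LOGS"); // start a fresh binlog file
            try (ResultSet rs = st.executeQuery("SHOW MASTER STATUS")) {
                if (rs.next()) {
                    // Note down where recovery/synchronization should start.
                    System.out.println(rs.getString("File") + " @ " + rs.getLong("Position"));
                }
            }
        }
    }
}
```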
Restore and synchronize data
- Create enough new tables in master database A; the new order tables are split by month.
- Write a script that reads the order tables in backup database B and writes the rows into the new order tables in master database A (a rough sketch follows this list).
- Synchronize the old-table data to the new tables through Canal; call this [synchronization process A].
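Below is a rough sketch of what the backfill script in the second step could look like: it reads each old table in backup database B and upserts the rows into the monthly tables in master database A, reusing the routing helper from the earlier sketch. The connection strings, the column list, and the `ON DUPLICATE KEY UPDATE` upsert (used so that re-runs and overlap with the Canal replay stay harmless) are assumptions, not the author's actual script.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

/** Sketch of the backfill: backup database B -> new monthly order tables in master A. */
public class OrderBackfill {

    public static void main(String[] args) throws Exception {
        // JDBC URLs and credentials are placeholders.
        try (Connection backup = DriverManager.getConnection("jdbc:mysql://backup-b/orderdb", "user", "pass");
             Connection master = DriverManager.getConnection("jdbc:mysql://master-a/orderdb", "user", "pass")) {
            for (int i = 0; i < 20; i++) {
                copyTable(backup, master, "order_" + i);
            }
        }
    }

    private static void copyTable(Connection backup, Connection master, String oldTable) throws Exception {
        // Column list is shortened for the sketch; a real script copies every column,
        // streams the result set, and batches the writes.
        String select = "SELECT order_id, user_id, amount, created_time FROM " + oldTable;
        try (Statement stmt = backup.createStatement();
             ResultSet rs = stmt.executeQuery(select)) {
            while (rs.next()) {
                long orderId = rs.getLong("order_id");
                // Route by the creation time encoded in the snowflake ID (OrderShardRouter above).
                String newTable = OrderShardRouter.targetTable(orderId);
                // Idempotent upsert so re-runs and overlap with the Canal sync stay harmless.
                String upsert = "INSERT INTO " + newTable
                        + " (order_id, user_id, amount, created_time) VALUES (?, ?, ?, ?)"
                        + " ON DUPLICATE KEY UPDATE user_id = VALUES(user_id), amount = VALUES(amount)";
                try (PreparedStatement ps = master.prepareStatement(upsert)) {
                    ps.setLong(1, orderId);
                    ps.setLong(2, rs.getLong("user_id"));
                    ps.setBigDecimal(3, rs.getBigDecimal("amount"));
                    ps.setTimestamp(4, rs.getTimestamp("created_time"));
                    ps.executeUpdate();
                }
            }
        }
    }
}
```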
Going live
- Build the new code, deploy it as a new cluster, and make sure it has fully started.
- Execute `flush logs`: a new binlog is generated, and the synchronization from the new tables back to the old tables will start from it.
- Switch traffic to the new cluster.
- Stop [synchronization process A].
- Start synchronizing data from the new tables back to the old tables.
Rollback
Verify the data promptly, and switch traffic back to the old cluster as soon as a serious problem is found.
Conclusion
- Execute `flush logs` before backing up the source tables, so that even a small gap in time does not affect the final consistency of the data (the binlog is being listened to the whole time).
- Data is priceless; proceed with caution.
If this article is helpful to you, please give a thumbs up (~ ▽ ~)”
This article was first published on PiaoRuiQing's blog (blog.piaoruiqing.com). Non-commercial reprinting is allowed, provided the original author PiaoRuiQing and the link are retained. For licensing or cooperation, contact: [email protected].