In the Canal article, we introduced some methods of incremental data migration, but we did not go very deep into how to migrate the data as a whole. Here we want to walk through a full data migration, and hopefully it will be of help to you.

Background

There is a famous line in Stephen Chow’s A Chinese Odyssey: “Once there was a sincere love placed before me, but I did not cherish it; only when I lost it did I regret it. Nothing in this world is more painful than that. If heaven could give me another chance, I would say three words to that girl: I love you. And if I had to put a time limit on this love, I would want it to be ten thousand years!” In the eyes of us developers, this is exactly how we feel about the data in our databases: we hope it will not change for ten thousand years, but things usually go against our wishes. As the company keeps growing and the business keeps changing, our requirements for the data change as well, typically in the following ways:

  • Splitting databases and tables: the business grows faster and faster, putting more and more pressure on the standalone database and piling up more and more data. The usual solution is to split the database so that the traffic is spread evenly across different machines. Going from a standalone database to a sharded one requires migrating the data; only after the migration succeeds can the new shards be put into use.
  • Changing the storage medium: after the database split described above, the storage medium generally stays the same. For example, we used a standalone MySQL before, and after the split we have MySQL on several machines; the table fields are unchanged, so the migration is relatively simple. Sometimes, though, the current storage no longer fits: if we need a lot of complex queries, MySQL may not be a viable option and we switch to something like ElasticSearch. This kind of migration is slightly more complicated because it involves converting data between different storage media.
  • Switching to a new system: fast-growing companies often build many duplicated projects for the sake of speed. At some point these projects get merged into one platform or middle platform, for example membership systems, e-commerce systems and so on. At that point we face the problem of moving data from the old systems into the new one, which is more complex: not only may the storage medium change, the programming language may be different too, and at a higher level the owning departments may differ as well, so this kind of migration is the most difficult and carries the highest risk.

In actual business development, we design different migration schemes for different situations. Next, let’s discuss how exactly to migrate data.

Data migration

In fact, data migration is never accomplished overnight; every migration takes a long time, perhaps a week, perhaps several months. Generally speaking, the process looks roughly like this: first, we migrate the existing data in the database in batches; then we take care of the newly generated data, writing it to the new storage in real time right after it is written to the original database. Throughout this process we keep verifying the data. Only when the checks show there are basically no problems do we start cutting traffic over, and only when the traffic is fully cut over can we stop the data verification and the incremental migration.

Existing data migration

First, let’s talk about how to migrate the existing data. Searching the open source community, we did not find any particularly handy tool for this. At present, Alibaba Cloud’s DTS provides existing-data migration; it supports migration between both homogeneous and heterogeneous data sources, and basically covers the common industry databases such as MySQL, Oracle and SQL Server. DTS is well suited to the first two scenarios mentioned above: for the database-splitting scenario, if you use Alibaba Cloud’s DRDS, the data can be migrated into DRDS directly through DTS; it also handles the heterogeneous-storage scenario.

So how does DTS migrate the existing data? The steps are actually quite simple:

  1. When the migration task starts, obtain the maximum and minimum IDs of the data to be migrated.
  2. Set a chunk size, for example 10,000, and starting from the minimum ID, query 10,000 rows at a time and hand them to the DTS server for processing. The SQL looks like this:
select * from table_name where id > curId and id <= curId + 10000;

  3. When the current ID is greater than or equal to maxId, the existing-data migration task is complete.

Of course, we may not be using Alibaba Cloud in an actual migration, or, as in the third scenario, we may need to do a lot of field conversion between databases that DTS does not support. In that case we can imitate DTS and migrate the data ourselves by reading it in batches, chunk by chunk. The important point is to control the chunk size and the query frequency while migrating in batches, so that the reads do not affect the normal operation of the online services.
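As an illustration, here is a minimal sketch of such a chunked migration over JDBC. It assumes the source table has an auto-increment primary key id; the table and column names (table_name, table_name_new, col_a, col_b), the chunk size and the sleep-based throttle are placeholders to be adapted to your own schema and load.

```java
import java.sql.*;

// Minimal sketch of a DTS-style batch migration of existing data, assuming both
// stores are reachable over JDBC. Names and the throttle values are illustrative.
public class ExistingDataMigrator {
    private static final int CHUNK_SIZE = 10_000;   // rows migrated per batch
    private static final long PAUSE_MS  = 200;      // pause between batches to protect the online DB

    public static void migrate(Connection source, Connection target) throws Exception {
        long curId = queryLong(source, "SELECT MIN(id) - 1 FROM table_name");
        long maxId = queryLong(source, "SELECT MAX(id) FROM table_name");

        String select = "SELECT id, col_a, col_b FROM table_name WHERE id > ? AND id <= ?";
        String insert = "INSERT INTO table_name_new (id, col_a, col_b) VALUES (?, ?, ?)";

        while (curId < maxId) {
            try (PreparedStatement read = source.prepareStatement(select);
                 PreparedStatement write = target.prepareStatement(insert)) {
                read.setLong(1, curId);
                read.setLong(2, curId + CHUNK_SIZE);
                try (ResultSet rs = read.executeQuery()) {
                    while (rs.next()) {
                        write.setLong(1, rs.getLong("id"));
                        write.setString(2, rs.getString("col_a"));
                        write.setString(3, rs.getString("col_b"));
                        write.addBatch();
                    }
                }
                write.executeBatch();
            }
            curId += CHUNK_SIZE;
            Thread.sleep(PAUSE_MS);   // throttle so the migration does not starve online traffic
        }
    }

    private static long queryLong(Connection conn, String sql) throws SQLException {
        try (Statement st = conn.createStatement(); ResultSet rs = st.executeQuery(sql)) {
            rs.next();
            return rs.getLong(1);
        }
    }
}
```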

Incremental data migration

The options for existing-data migration are limited, but for incremental data migration there are many approaches to choose from. Generally speaking, we have the following options:

  • DTS: Alibaba Cloud’s DTS is a one-stop service that provides both existing-data migration and incremental data migration, but it is billed by usage.
  • Service double-write: better suited to migrations that do not switch systems, that is, only the storage changes while the system stays the same, such as splitting databases and tables or synchronizing data to Redis. The approach is straightforward: in the code, write the data that needs to be migrated to both stores. However, because the two stores are not the same database, the writes cannot share a transaction, so data may be lost during the migration; this is later resolved through data verification (a minimal sketch follows this list).
  • MQ asynchronous write: this can be applied to all scenarios. Whenever data changes, an MQ message is sent, and a consumer updates the new store after receiving it. It is similar to the double write above, but it turns the write to the new store into an asynchronous MQ operation, so the probability of problems is much lower.
  • Listening to the binlog: we can use canal or another open source service such as Databus to listen to the binlog. This is equivalent to the MQ approach, except that we no longer have to send the messages ourselves. In terms of development effort this approach is basically the cheapest.
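To make the double-write option concrete, here is a minimal sketch. The OrderDao interface, the Order type and the logging are illustrative placeholders; the point is that the old store remains authoritative and a failed write to the new store is only recorded, to be repaired later by the verification job.

```java
// Minimal sketch of service double-write under the assumptions above.
public class OrderWriteService {

    private final OrderDao oldOrderDao;
    private final OrderDao newOrderDao;

    public OrderWriteService(OrderDao oldOrderDao, OrderDao newOrderDao) {
        this.oldOrderDao = oldOrderDao;
        this.newOrderDao = newOrderDao;
    }

    public void saveOrder(Order order) {
        // 1. Write the old store first; it is still the source of truth.
        oldOrderDao.insert(order);
        // 2. Best-effort write to the new store, reusing the same primary key
        //    so the two copies stay aligned (see the note on primary key IDs below).
        try {
            newOrderDao.insert(order);
        } catch (Exception e) {
            // The two writes cannot share a transaction, so just log the gap;
            // the data verification job will find and repair it later.
            System.err.println("double-write to new store failed, id=" + order.getId());
        }
    }

    // Placeholder types so the sketch is self-contained.
    interface OrderDao { void insert(Order order); }
    static class Order { private long id; public long getId() { return id; } }
}
```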

Which of these methods should we use? Personally I recommend listening to the binlog. It keeps the development cost low, since we only need to implement the consumer logic, and the data stays consistent: because we consume the binlog, we do not have to worry about the double write not being transactional.
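Below is a minimal consumer sketch modelled on canal’s standard single-connector client example; the canal server address, the destination name "example", the table filter and the writeToNewStore hook are placeholders, and the exact client API may differ slightly between canal versions.

```java
import java.net.InetSocketAddress;
import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;

// Sketch of a binlog consumer for incremental migration, based on canal's client example.
public class BinlogSyncWorker {
    public static void main(String[] args) throws Exception {
        CanalConnector connector = CanalConnectors.newSingleConnector(
                new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
        connector.connect();
        connector.subscribe("mydb\\.table_name");   // only listen to the table being migrated
        connector.rollback();

        while (true) {
            Message message = connector.getWithoutAck(1000);   // fetch up to 1000 entries
            long batchId = message.getId();
            if (batchId == -1 || message.getEntries().isEmpty()) {
                Thread.sleep(1000);
                continue;
            }
            for (CanalEntry.Entry entry : message.getEntries()) {
                if (entry.getEntryType() != CanalEntry.EntryType.ROWDATA) {
                    continue;
                }
                CanalEntry.RowChange rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
                for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
                    // Replay the change into the new store
                    // (for DELETE events the data is in getBeforeColumnsList() instead).
                    writeToNewStore(rowChange.getEventType(), rowData.getAfterColumnsList());
                }
            }
            connector.ack(batchId);   // only ack after the new store has been updated
        }
    }

    private static void writeToNewStore(CanalEntry.EventType type,
                                        java.util.List<CanalEntry.Column> columns) {
        // Placeholder: map the columns to the new schema and upsert/delete accordingly.
    }
}
```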

Data validation

Although many of the schemes above are mature cloud services (DTS) or middleware (canal), all of them may still lose some data. Data loss is generally rare, but very hard to troubleshoot: perhaps DTS or canal hiccupped for a moment, or data was accidentally dropped on the receiving side. Since there is no way to completely prevent data loss during the migration, we have to correct it by other means.

Generally speaking, every data migration includes a data verification step, but different teams may choose different verification schemes:

  • When I was at Meituan, we used double reads: every read also fetched a copy from the new store while still returning the old one, and we then verified that pair of results against each other. If there was a discrepancy, an alarm was raised and the data was repaired manually or automatically. This way the frequently accessed data gets repaired quickly. Of course, a full data check still ran from time to time, but repairs driven by such checks lag behind.
  • Now at Yuanfudao we do not use that approach. Double reads do detect wrong data quickly, but we do not need such real-time verification for this data, and double reads require a noticeable amount of extra code; on the other hand, relying only on periodic full checks would stretch the verification time out far too long. So we took a compromise and borrowed the T+1 idea from financial reconciliation: in the early hours of every day, we fetch the data that was updated in the old database the previous day and compare it row by row with the data in the new database (a sketch follows this list).
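A minimal sketch of such a T+1 check is given below. It assumes the old table has an update_time column; the table and column names and the reportMismatch hook are placeholders.

```java
import java.sql.*;
import java.time.LocalDate;

// Sketch of a T+1 verification job: pull yesterday's updates from the old database
// and compare them one by one with the new store.
public class DailyVerificationJob {

    public void verifyYesterday(Connection oldDb, Connection newDb) throws SQLException {
        LocalDate yesterday = LocalDate.now().minusDays(1);
        String oldSql = "SELECT id, col_a, col_b FROM table_name " +
                        "WHERE update_time >= ? AND update_time < ?";
        String newSql = "SELECT col_a, col_b FROM table_name_new WHERE id = ?";

        try (PreparedStatement ps = oldDb.prepareStatement(oldSql)) {
            ps.setDate(1, Date.valueOf(yesterday));
            ps.setDate(2, Date.valueOf(yesterday.plusDays(1)));
            try (ResultSet oldRow = ps.executeQuery()) {
                while (oldRow.next()) {
                    long id = oldRow.getLong("id");
                    // One point lookup per row keeps the sketch simple; batch it in practice.
                    try (PreparedStatement check = newDb.prepareStatement(newSql)) {
                        check.setLong(1, id);
                        try (ResultSet newRow = check.executeQuery()) {
                            if (!newRow.next()
                                    || !equalsNullSafe(oldRow.getString("col_a"), newRow.getString("col_a"))
                                    || !equalsNullSafe(oldRow.getString("col_b"), newRow.getString("col_b"))) {
                                reportMismatch(id);   // alarm, or trigger an automatic repair
                            }
                        }
                    }
                }
            }
        }
    }

    private boolean equalsNullSafe(String a, String b) {
        return a == null ? b == null : a.equals(b);
    }

    private void reportMismatch(long id) {
        System.err.println("data mismatch for id=" + id);
    }
}
```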

Of course, in the actual development process we also need to pay attention to the following points:

  • How do we ensure the correctness of the verification task itself? The verification task exists to correct other data, but if it has bugs of its own, the verification loses its meaning. At present we can only rely on code review to guarantee the correctness of the verification task.
  • Pay attention to log output in the verification task. Sometimes the root cause makes all of the data wrong, and the verification task then prints a huge number of error logs and fires alarms, which may hang the system or affect other people’s services. The simple fix is to downgrade alarms that need no manual handling to warn level; a fancier one is to wrap a small tool that prints an error at most once within a given time window and stays quiet afterwards (a sketch follows this list).
  • The verification task must not affect the online services. Verification tasks usually issue a lot of queries, which can lead to large table scans; if the code is written carelessly, it can easily bring the database down.
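As a small illustration of the alarm-throttling idea, here is a sketch of such a wrapper; the one-minute window and the plain System.err output are arbitrary example choices.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a throttled alarm: lets at most one error through per time window,
// so a verification task cannot flood the logs or the alert channel.
public class ThrottledAlarm {
    private final long windowMillis;
    private final AtomicLong lastAlarmAt = new AtomicLong(0);

    public ThrottledAlarm(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Logs only if no alarm was emitted within the current window.
    public void error(String message) {
        long now = System.currentTimeMillis();
        long last = lastAlarmAt.get();
        if (now - last >= windowMillis && lastAlarmAt.compareAndSet(last, now)) {
            System.err.println("[verification] " + message);
        }
        // Otherwise drop the message; the next window will report again.
    }
}

// Usage inside the verification task:
//   ThrottledAlarm alarm = new ThrottledAlarm(60_000);   // at most one error per minute
//   alarm.error("mismatch for id=" + id);
```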

Cutting over traffic

When the data verification basically reports no errors, it shows that the migration program is fairly stable. Can we then switch straight to the new data? Of course not. If we flipped the switch and everything went smoothly, that would be great, but if something went wrong it would affect every user.

So the next step is a gray release, cutting the traffic over gradually. The dimension used for cutting traffic differs by business: for user-dimension traffic we usually route by taking userId modulo some number, and for tenant- or merchant-dimension businesses we route by taking the tenant ID modulo some number. The cutover needs a plan: when to release each step and how much traffic to move each time. Pick a low-traffic period for every step, watch the logs carefully after each cut, and fix any problem as soon as it appears. The rollout goes from slow to fast: for example, start with 1% and keep stacking it up, then later jump straight to 10% or 20% to ramp up quickly. Problems are usually exposed at small traffic, and once small traffic looks fine, the rest can be ramped up quickly.
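For example, a user-dimension cutover by userId modulo could look like the sketch below; in practice the percentage would come from a config center so it can be raised without a redeploy, but here it is a plain field for illustration.

```java
// Sketch of user-dimension gray release: route a configurable percentage of users
// to the new store by taking userId modulo 100.
public class TrafficRouter {

    private volatile int newStorePercent = 1;   // start with 1% of users

    public boolean useNewStore(long userId) {
        // Users whose bucket falls below the threshold read/write the new store.
        return (userId % 100) < newStorePercent;
    }

    public void setNewStorePercent(int percent) {
        this.newStorePercent = percent;   // raise gradually, preferably during off-peak hours
    }
}
```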

Note the primary key ID

In the process of data migration, pay special attention to the primary key ID. In the double-write scheme mentioned above, the primary key ID needs to be specified explicitly for both writes, to prevent the two sides from generating IDs in different orders.

If the migration is driven by splitting databases and tables, we must also consider that the primary key can no longer be a per-table auto-increment ID; a distributed ID generator is needed instead. The one I recommend here is Meituan’s open source Leaf. It supports two modes: a snowflake mode, which produces trend-increasing IDs that are all of type Long, suitable for applications that can use a Long as the ID; and a segment (number range) mode, which increments from a base ID that you configure. In both modes the IDs are basically generated in memory, so performance is very fast.
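For reference, here is a simplified sketch of the snowflake layout that the snowflake mode is built on (41 bits of timestamp, 10 bits of worker ID, 12 bits of sequence); the real Leaf additionally manages worker IDs centrally and guards against clock rollback, which this sketch does not.

```java
// Simplified snowflake-style generator: trend-increasing Long IDs generated in memory.
public class SimpleSnowflake {
    private static final long EPOCH = 1609459200000L;   // custom epoch, here 2021-01-01 UTC
    private static final long WORKER_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SimpleSnowflake(long workerId) {
        this.workerId = workerId & ((1L << WORKER_BITS) - 1);
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {                 // sequence exhausted in this millisecond
                while (now <= lastTimestamp) {   // spin until the next millisecond
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```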

Of course, there is also the case of migrating between systems, where the primary key IDs of the old system already exist in the new system, so the IDs have to be mapped. If we already know which systems will be migrated in the future, we can handle this by reserving ranges. For example, suppose system A currently has about 100 million rows of data and system B also has about 100 million, and the two systems need to be merged into a new one. We can reserve a little buffer: say IDs up to 150 million are reserved for system A, so A needs no mapping at all; system B gets 150 million to 300 million, so converting back to the old system B ID just means subtracting 150 million; and the new system’s own IDs start auto-incrementing from 300 million. But what if no range was reserved in advance? That can be handled in the following two ways:

  • Add a new mapping table that records the correspondence between old-system IDs and new-system IDs. The workload is relatively large, because a typical migration involves dozens or even hundreds of tables, so the cost of maintaining these records is high.
  • If the ID is a Long, we can take advantage of the fact that a Long has 64 bits. We can set a rule that the new system’s IDs start from a relatively large number, for example above the Int range, leaving the lower, Int-sized part for the migrated IDs of the old systems. For example, the 150 million mentioned above only needs 28 bits, while an Int has 32 bits, so there are still 4 bits to spare, enough to distinguish 16 migrated systems; if more systems are expected to migrate, the starting ID of the new system can simply be set even higher (a sketch follows this list).
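Here is a small sketch of both mapping tricks; the offsets and the system codes are example values taken from the numbers above and would come from your own migration plan.

```java
// Sketch of two ways to fold old-system IDs into the new ID space.
public class LegacyIdMapper {

    // Variant 1: ranges were reserved in advance. System A keeps its IDs,
    // system B is shifted past A's reserved block, and the new system's own
    // IDs start above both (from 300 million in the example above).
    public static long mapWithOffset(String system, long oldId) {
        switch (system) {
            case "A": return oldId;                   // 1 .. 150,000,000 reserved for A
            case "B": return oldId + 150_000_000L;    // B shifted into 150M .. 300M
            default:  throw new IllegalArgumentException("unknown system " + system);
        }
    }

    // Variant 2: nothing was reserved. New IDs start above Integer.MAX_VALUE, so the
    // whole Int range is free for old IDs. With old IDs below 2^28 (about 268 million),
    // the top 4 bits of the Int range can carry a system code, enough for 16 systems.
    public static long mapWithSystemCode(int systemCode, long oldId) {
        if (systemCode < 0 || systemCode > 15 || oldId >= (1L << 28)) {
            throw new IllegalArgumentException("outside the reserved range");
        }
        return ((long) systemCode << 28) | oldId;
    }
}
```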

Conclusion

Finally, to sum up the routine: four steps plus one caveat. Migrate the existing data, migrate the increment, verify, cut over the traffic, and always pay attention to the primary key IDs. No matter how large the data volume is, a migration that follows this routine basically will not run into big problems. I hope this article can help you in your future data migration work.

If you found this article helpful, your follows and shares are my biggest support.