Migrating 100 million rows of data

Background: MySQL 5.6, sharded databases and tables (sub-database/sub-table) spread across database instances; the migration must run online and end with a cutover.

Overall implementation

Flowchart (flowchart.js syntax):

st=>start: Record the export position (for incremental sync)
op1=>operation: Export the data
op2=>operation: Upload the exported data files to the parsing server
in1=>inputoutput: Parse the files and import them into the database
cond1=>condition: All files imported?
in2=>inputoutput: Import the incrementally synced data
op3=>operation: Rate-limit the interfaces on the migrating side (or let the message queue stack up)
cond2=>condition: Incremental data consistent?
op4=>operation: Cut over
e=>end

st->op1->op2->in1->cond1
cond1(yes)->in2
cond1(no)->op1
in2->op3->cond2
cond2(yes)->op4
cond2(no)->op1
op4->e

MySQL migration issues

Disk space

-- Check the disk space occupied by each database
select
  TABLE_SCHEMA,
  concat(truncate(sum(data_length)/1024/1024,2),' MB') as data_size,
  concat(truncate(sum(index_length)/1024/1024,2),' MB') as index_size
from information_schema.tables
group by TABLE_SCHEMA
order by sum(data_length) desc;
TABLE_SCHEMA data_size index_size
learn 27.62 MB 0.06 MB
mysql 2.23 MB 0.22 MB
information_schema 0.15 MB 0.00 MB
sys 0.01 MB 0.00 MB
performance_schema 0.00 MB 0.00 MB
-- Check the disk space occupied by each table in the database
select
  TABLE_NAME,
  concat(truncate(data_length/1024/1024,2),' MB') as data_size,
  concat(truncate(index_length/1024/1024,2),' MB') as index_size
from information_schema.tables
where TABLE_SCHEMA = 'learn'
order by data_length desc;
TABLE_NAME data_size index_size
test_index 27.56 MB 0.00 MB
pay_deduction_flow 0.01 MB 0.01 MB
nmc_task_prog 0.01 MB 0.03 MB
nmc_task_act 0.01 MB 0.01 MB
nmc_act_mab 0.01 MB 0.00 MB

If your MySQL product has a management console, it is easier to check there directly: GoldenDB, DRDS, and similar products show the disk space water level on the console.

Calculate the space occupied by the table

Make sure the migration target has enough disk space, so it does not run out partway through the migration.

  • Estimation formulas (a sketch of the arithmetic follows this list):
    • Field space = bytes per row × number of rows
    • Index space ≈ (key_length + 4) / 0.67 per row
      • The primary key takes no extra index space; it is already counted in the field space
      • But if a secondary index also contains the primary key field, use (key_length + 4) × 3 / 0.67 per row
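A minimal sketch of this arithmetic in Java, using made-up inputs (a 100-byte row, a 12-byte index key, 100 million rows); the 0.67 factor approximates page fill:

// TableSpaceEstimate.java -- back-of-the-envelope sizing from the
// formulas above; all inputs are hypothetical examples.
public class TableSpaceEstimate {

    // Field space = bytes per row * number of rows
    static long fieldSpace(long bytesPerRow, long rows) {
        return bytesPerRow * rows;
    }

    // Secondary index space ~= (key_length + 4) / 0.67 per row,
    // tripled when the index also carries the primary key field.
    static long indexSpace(long keyLength, long rows, boolean carriesPk) {
        double perRow = (keyLength + 4) / 0.67;
        if (carriesPk) perRow *= 3;
        return (long) (perRow * rows);
    }

    public static void main(String[] args) {
        long rows = 100_000_000L;                  // 100 million rows
        long data = fieldSpace(100, rows);         // assume ~100 bytes per row
        long index = indexSpace(12, rows, false);  // assume a 12-byte key
        System.out.printf("data ~= %d MB, index ~= %d MB%n",
                data >> 20, index >> 20);
    }
}

With these example inputs the data alone comes to roughly 9.3 GB, which is the number to check against the free space on the target instance.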

For the details of calculating MySQL tablespace size you can refer to this blog post, which provides the corresponding CREATE TABLE statements along with tests for building indexes and checking the tablespace size.

How to Migrate Quickly

  • Auto-increment IDs
    • Maintaining an auto-increment ID adds overhead under frequent inserts. When migrating, it is better to carry the original primary key IDs explicitly: drop AUTO_INCREMENT first, then set it back after the migration. Also check whether other tables reference these primary key IDs; every referenced ID must fall inside the migration scope.
  • Unique indexes
    • If you can drop a unique index, drop it. If you cannot, "migrate data, then create the index" and "create the index, then migrate data" take about the same time, so in an unavoidable unique-index scenario just migrate as-is.
  • Data cleaning
    • Clean up the source data before migrating. Usually not all of it is still valid, so reduce the volume to be migrated and shorten the operation wherever possible.
  • Migration approaches
    • select into CSV: the MySQL product I use cannot export CSV directly, so on the Linux server I export everything to a TXT file with a concat statement such as select concat(field, ',', ...)
    • mysqldump: note that it is single-threaded; Ali recommends it for volumes under 20 million rows, and even there the export takes a long time
    • select concat export to a file, then parse the file into the target database with a program; suitable for sources above 20 million rows (a sketch of the parse step follows this list)
      • On text size: 100 million lines of TXT comes to roughly 1 GB; when processing it, read resources sensibly and release them promptly
      • After parsing, send the rows to a message queue and import into the database asynchronously from the queue to speed up the import
        • Watch the number of consumer threads, so the application is not overwhelmed by too many threads and the consumption rate does not fall behind the production rate
      • If the source is sharded (sub-database/sub-table), send the export statement to each shard to produce multiple files, then parse them in parallel to speed up parsing
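A minimal sketch of the parse-then-queue step, using an in-process BlockingQueue as a stand-in for the real message queue; the file name, batch size, and thread count are all hypothetical:

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Stream the exported TXT file line by line (never load it whole),
// batch the rows, and hand batches to a bounded queue consumed by a
// fixed pool -- the bound applies back-pressure so consumers cannot
// fall far behind the producer. Shutdown handling omitted for brevity.
public class ExportFileLoader {
    static final int BATCH = 1000;

    public static void main(String[] args) throws Exception {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(100);
        ExecutorService consumers = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            consumers.submit(() -> {
                try {
                    while (true) {
                        List<String> batch = queue.take(); // blocks when empty
                        // here: split each line on the delimiter and
                        // batch-insert into the target database
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try (BufferedReader reader = Files.newBufferedReader(Path.of("export_0.txt"))) {
            List<String> batch = new ArrayList<>(BATCH);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == BATCH) {
                    queue.put(batch); // blocks when the queue is full
                    batch = new ArrayList<>(BATCH);
                }
            }
            if (!batch.isEmpty()) queue.put(batch);
        }
    }
}

The fixed pool size is the knob from the bullet above: enough threads to keep up with the producer, few enough not to overwhelm the application or the target database.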

Data parsing import

  • A preconfigured template program that reads files and parses files through the preconfigured template by configuring the order, size, spacing, and so on of the file fields in the database or in the configuration file
  • In addition to parsing file templates, you can configure handler templates
    • Such as configuring spring services to fetch beans through Spring to process business logic
    • Obtain the service through Reflection Reflection for processing
    • Configure the message queue content, send it to the message queue, and implement a set of processing logic subscribe message queue
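A minimal sketch of the handler-template idea, assuming a hypothetical RowHandler interface; the handler is chosen by a configured name, either as a Spring bean or via reflection:

import org.springframework.context.ApplicationContext;

// Hypothetical contract: one parsed row in, business logic out.
interface RowHandler {
    void handle(String[] fields);
}

// Resolve the handler named in the parse config, either from the
// Spring context or by reflective instantiation; either way the
// parsing code itself stays generic.
class HandlerResolver {
    private final ApplicationContext ctx;

    HandlerResolver(ApplicationContext ctx) {
        this.ctx = ctx;
    }

    RowHandler fromSpring(String beanName) {
        // e.g. beanName = "payFlowHandler", a bean registered elsewhere
        return ctx.getBean(beanName, RowHandler.class);
    }

    RowHandler fromReflection(String className) throws Exception {
        // e.g. className = "com.example.migrate.PayFlowHandler"
        return (RowHandler) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
    }
}

The message-queue variant from the last bullet works the same way: the configured name selects the queue the rows go to, and the subscriber implements RowHandler.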

Parsing configuration

  • Preset the processing logic and configure field names, order, lengths, delimiters, and so on in JSON form, which can be customized and extended; each new file to parse then only needs its logic name, SQL name, and related fields configured (a sketch follows)
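A minimal sketch of what such a JSON config and its binding could look like (using Jackson); every field name here is an illustrative guess, not the author's actual schema:

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;

// Hypothetical shape of a per-file parse config.
public class FileParseConfig {
    public String logicName;    // which business logic / handler to run
    public String sqlName;      // which insert statement to use
    public String delimiter;    // field separator in the export file
    public List<String> fields; // field names in file order

    public static void main(String[] args) throws Exception {
        String json = """
            {
              "logicName": "payFlowImport",
              "sqlName": "insertPayFlow",
              "delimiter": ",",
              "fields": ["id", "user_id", "amount", "created_at"]
            }
            """;
        FileParseConfig cfg = new ObjectMapper().readValue(json, FileParseConfig.class);
        System.out.println(cfg.logicName + " -> " + cfg.fields);
    }
}

Adding a new file type then means adding a JSON entry, not writing a new parser.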