Migrating data at the 100-million-row level
Background: MySQL 5.6, sharded into sub-databases and sub-tables across separate database instances; an online migration with switchover is required.
Overall implementation
Flowchart:
st=>start: Record the export position (for incremental sync)
op1=>operation: Export the data
op2=>operation: Upload the exported data files to the parsing server
in1=>inputoutput: Parse the files and import them into the database
con1=>condition: All files imported?
in2=>inputoutput: Import the incrementally synchronized data
op3=>operation: Rate-limit the interfaces on the migrated side (or let the message queue back up) and check whether the incremental data is consistent
con2=>condition: Incremental data consistent?
op4=>operation: Switch over
en=>end

st->op1->op2->in1->con1
con1(yes)->in2
con1(no)->op1
in2->op3->con2
con2(yes)->op4
con2(no)->op1
op4->en
MySQL migration issues
Disk space
-- Check the disk space occupied by each database
select
TABLE_SCHEMA,
concat(truncate(sum(data_length)/1024/1024,2),' MB') as data_size,
concat(truncate(sum(index_length)/1024/1024,2),' MB') as index_size
from information_schema.tables
group by TABLE_SCHEMA
ORDER BY data_size desc
| TABLE_SCHEMA | data_size | index_size |
| --- | --- | --- |
| learn | 27.62 MB | 0.06 MB |
| mysql | 2.23 MB | 0.22 MB |
| information_schema | 0.15 MB | 0.00 MB |
| sys | 0.01 MB | 0.00 MB |
| performance_schema | 0.00 MB | 0.00 MB |
-- Check the disk space occupied by each table in the database
select
TABLE_NAME,
concat(truncate(data_length/1024/1024,2),' MB') as data_size,
concat(truncate(index_length/1024/1024,2),' MB') as index_size
from information_schema.tables
where TABLE_SCHEMA = 'learn'
order by data_size desc;
| TABLE_NAME | data_size | index_size |
| --- | --- | --- |
| test_index | 27.56 MB | 0.00 MB |
| pay_deduction_flow | 0.01 MB | 0.01 MB |
| nmc_task_prog | 0.01 MB | 0.03 MB |
| nmc_task_act | 0.01 MB | 0.01 MB |
| nmc_act_mab | 0.01 MB | 0.00 MB |
If you have a MySQL console it is easier to check this directly; GoldenDB, DRDS, and similar products show the disk space watermark on their consoles.
Calculate the space occupied by the table
Make sure the migration target has enough disk space so that the migration does not run out of space.
- Algorithm formula:
  - Field space size = field bytes * number of rows
  - Index space size = (key_length + 4) / 0.67 * number of rows
  - The primary key takes up no extra index space; it is counted in the field space
  - But if an index also covers the primary key field, its size is (key_length + 4) * 3 / 0.67 * number of rows
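For a rough sense of scale (the numbers below are made up for illustration): with an average row of 100 bytes and 100 million rows, the field space is about 100 * 100,000,000 bytes, i.e. roughly 10 GB. A secondary index on a BIGINT column (key_length = 8) needs about (8 + 4) / 0.67 ≈ 18 bytes per row, i.e. roughly 1.8 GB over 100 million rows.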
How to calculate the MySQL tablespace size is covered in this blog post, which provides the corresponding table creation statements as well as tests that build indexes and check the resulting tablespace size.
How to Migrate Quickly
- Auto-increment IDs
  - Auto-increment values are generated on every insert, which adds overhead when inserts are frequent. When migrating, it is better to carry the original primary key IDs: drop AUTO_INCREMENT first and set it back once the migration is done (see the DDL sketch after this list). Also check whether tables are related to each other through primary key IDs during the migration; any such IDs must fall within the migration scope.
- Unique indexes
  - If a unique index can be dropped, drop it. If it cannot, note that "migrate the data first, then create the index" and "create the index first, then migrate the data" end up taking about the same time, so when a unique-index scenario is unavoidable you simply have to migrate as is (see the DDL sketch after this list).
- Data cleaning
- Clean the data at the source before migrating whenever possible. Usually not all of the data is still valid, so reducing the volume to migrate also shortens the operation.
- Migration patterns
  - select export to CSV
    - The MySQL product I use cannot export CSV directly, so on the Linux server the full export is done with a select concat(field,',') statement redirected into a TXT file (see the export sketch after this list)
  - mysqldump
    - Note that it runs single-threaded; Ali recommends it for data volumes under 20 million rows, and even then the export takes a long time
  - select concat
    - Export into a file, then parse the file with a program and load it into the database; suitable for migrating sources with more than 20 million rows
    - Mind the text size: 100 million lines of TXT is roughly 1 GB, so while processing it, read resources sensibly and release them promptly
- After parsing the text, send the records to a message queue and import them into the database asynchronously from the queue, which speeds up the import.
- Watch the number of consumer threads on the message queue, so that too many threads do not overwhelm the application and the consumption rate does not fall behind the production rate.
- If the data source is sharded into sub-databases and sub-tables, send the export statement to each specified sub-database/sub-table so that multiple files are exported per shard, then parse the files in parallel to speed up the parsing.
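As referenced in the points above, here is a minimal sketch of the DDL around the bulk import. It is illustrative only: the table, column, and index names (orders_0, id, order_no, uk_order_no) are made up, and the AUTO_INCREMENT start value has to be chosen above the largest migrated ID.

-- Before the bulk import: drop AUTO_INCREMENT and the unique index so rows can carry their original IDs cheaply
ALTER TABLE orders_0 MODIFY id BIGINT NOT NULL;                -- id stays the primary key, AUTO_INCREMENT is removed
ALTER TABLE orders_0 DROP INDEX uk_order_no;
-- ... bulk import runs here, every row keeps its original primary key ID ...
-- After the import: restore AUTO_INCREMENT and the unique index
ALTER TABLE orders_0 MODIFY id BIGINT NOT NULL AUTO_INCREMENT;
ALTER TABLE orders_0 AUTO_INCREMENT = 100000001;               -- any value larger than the biggest migrated id
ALTER TABLE orders_0 ADD UNIQUE INDEX uk_order_no (order_no);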
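And a minimal sketch of the concat-style export mentioned under "Migration patterns", a variant of the select concat(field,',') approach described above. The table and columns are again made up; the assumption is that the statement is run once per sub-database/sub-table through the mysql command-line client on the Linux server, with the output redirected to a TXT file (for example: mysql -N -e "..." > orders_0.txt).

-- One comma-separated line per row; CONCAT_WS joins the fields with the delimiter
SELECT CONCAT_WS(',', id, user_id, amount, status)
FROM orders_0;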
Data parsing import
- A preconfigured template program: the order, size, and delimiters of the file fields are configured in the database or in a configuration file, and the program reads and parses the files according to that template
- Besides the file-parsing templates, handler templates can also be configured
  - For example, configure a Spring service name and fetch the bean from Spring to execute the business logic
  - Or obtain the handler through reflection
  - Or configure the message queue destination, send the parsed content to the queue, and implement a set of processing logic that subscribes to that queue
- Preset the processing logic and make the field names, order, lengths, delimiters, and so on configurable in JSON form, so the parsing can be customized and extended; whenever a new file needs to be parsed, only the logic name, SQL name, related fields, and so on have to be configured.
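For illustration only (every name below is invented): a template entry could look like {"logicName":"orderImport","sqlName":"insertOrder","delimiter":",","fields":["id","user_id","amount","status"],"handler":"orderImportHandler"}. The parser then reads each line of the exported file, splits it on the delimiter, maps the values to the fields in the configured order, and hands the record to the configured handler, whether that is a Spring bean, a class resolved by reflection, or a message queue topic as described above.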