(image via https://pixabay.com/en/military-stealth-bomber-refueling-602729/)
In the previous article, From SQL Server to MySQL (1): Heterogeneous Database Migration – Log4D, we introduced the basic problems of migrating from SQL Server to MySQL and a full-dump solution. The full-dump solution meets the needs of some scenarios, but it has a serious flaw: the migration requires downtime, and the length of the downtime depends on the amount of data. For a core business, downtime means loss. For example, for a user-center service, a full dump of its data volume would mean several hours of downtime during migration, and once the user center is out of service, almost every system that depends on this central service goes down with it.
Can we do a seamless online migration, one that requires no downtime, or only a very brief pause? As engineers with higher pursuits, we had to find a way to solve this problem.
Principles and processes of online migration
For Oracle to MySQL there is already a mature solution on the market: Alibaba's yugong project. Before solving the online migration from SQL Server to MySQL, let's take a look at how yugong does online migration from Oracle.
The following figure shows yugong’s incremental migration process for Oracle to MySQL:
There are four steps:
- Incremental data collection (create incremental materialized views of the Oracle tables)
- Full data copy
- Incremental replication (data verification can run in parallel)
- Stop writes to the old database and switch over to the new one
Oracle Materialized View is a mechanism provided by Oracle. A materialized view is a copy of the source data at a point in time; it can be understood as a snapshot of that moment. As the source data keeps being updated, the materialized view can be updated in separate batches, called refreshes. The changes between two refreshes correspond to the changes in the underlying data. Materialized views are often used to replicate data from a source database to replicas, and to cache complex queries in data warehouses.
Materialized views can be configured in a variety of ways, but refresh mode and refresh time are important here. There are three refresh modes:
- Complete Refresh: Delete all data records and regenerate the materialized view
- Fast Refresh: Refresh incrementally
- Force Refresh: use Complete Refresh or Fast Refresh depending on conditions
As for refresh timing, there are two modes: Refresh-on-Commit and Refresh-on-Demand.
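To make this concrete, here is a minimal sketch of how a fast-refresh, refresh-on-demand materialized view is set up in Oracle (the table name orders is illustrative; this is not taken from yugong itself):

```sql
-- A materialized view log records row changes on the master table,
-- which is what makes incremental (fast) refresh possible.
CREATE MATERIALIZED VIEW LOG ON orders;

-- A fast-refresh, refresh-on-demand copy of the table.
CREATE MATERIALIZED VIEW orders_mv
  REFRESH FAST ON DEMAND
  AS SELECT * FROM orders;

-- Each on-demand refresh applies the changes accumulated since the
-- previous refresh; 'F' requests a fast (incremental) refresh.
EXEC DBMS_MVIEW.REFRESH('orders_mv', 'F');
```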
Based on materialized views, Oracle can capture incremental data, which is what makes Alibaba's online data migration possible. To generalize the technical question: what capabilities does online incremental migration require? We reached the following conclusions (for the source database):
- Incremental acquisition: supports acquiring the database's incremental changes
- Low latency: the time needed to obtain the change data should be as short as possible
- Idempotent consistency: consuming the change data must be idempotent, i.e. the final result must not depend on the existing state of the data in the target database
Back to the problem at hand: does SQL Server have a mechanism that satisfies these three requirements? The answer is yes: SQL Server officially provides CDC.
How CDC works
What is CDC? CDC stands for Change Data Capture, and it is designed exactly for capturing incremental data. It was introduced in SQL Server 2008; before that, in SQL Server 2005, you could only capture data changes with AFTER INSERT / AFTER DELETE / AFTER UPDATE triggers.
Here’s how the CDC works:
When data in a table changes, the Capture process reads the changes from the transaction log and records them in the corresponding change table. From these tables, users can read the changes through dedicated CDC query functions.
CDC data structure and basic usage
CDC's core data lives in the change tables. Let's look at what a change table actually contains, to build an intuitive understanding.
To enable CDC for a table (here, fruits), use the following system stored procedures:
```sql
-- enable CDC for the database
EXEC sys.sp_cdc_enable_db;

-- enable CDC for the table
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'fruits',
    @role_name     = NULL;
```
CDC is now enabled. To check which databases have CDC enabled, you can run:
```sql
-- list CDC-enabled databases
SELECT name, is_cdc_enabled FROM sys.databases WHERE is_cdc_enabled = 1;
```
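Note that sys.databases answers this at the database level; to list the individual tables being tracked, sys.tables exposes an is_tracked_by_cdc flag:

```sql
-- list CDC-enabled tables in the current database
SELECT name FROM sys.tables WHERE is_tracked_by_cdc = 1;
```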
After CDC is enabled on dbo.fruits, a change table named cdc.dbo_fruits_CT is generated. Its structure looks like this:
```
.schema cdc.dbo_fruits_CT
name           default nullable type         length indexed
-------------- ------- -------- ------------ ------ -------
__$end_lsn     null    YES      binary       10     NO
__$operation   null    NO       int          4      NO
__$seqval      null    NO       binary       10     NO
__$start_lsn   null    NO       binary       10     YES
__$update_mask null    YES      varbinary    128    NO
id             null    YES      int          4      NO
name           null    YES      varchar(255) 255    NO
```
The columns starting with __ are metadata recorded by CDC; id and name are the original columns of fruits. In other words, the change table's structure corresponds one to one to the original table's columns.
Next we perform some business operations that change the data in the table:
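Concretely, the changes were of this shape (a sketch reconstructed from the change-table output shown further below; the values match that output):

```sql
INSERT INTO fruits (id, name) VALUES (1, 'apple');
INSERT INTO fruits (id, name) VALUES (2, 'apple2');
UPDATE fruits SET name = 'apple3' WHERE id = 2;
INSERT INTO fruits (id, name) VALUES (3, 'apple4');
DELETE FROM fruits WHERE id = 3;
```

We then query CDC's change table: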
```sql
-- step 1: declare the variables used below
DECLARE @begin_time datetime, @end_time datetime, @begin_lsn binary(10), @end_lsn binary(10);
-- step 2: set the human-readable time window
SET @begin_time = '2017-09-11 14:03:00.000';
SET @end_time   = '2017-09-11 14:10:00.000';
-- step 3: map the times to Log Sequence Numbers (LSNs)
SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @begin_time);
SELECT @end_lsn   = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);
-- step 4: query the changes in that LSN range
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_fruits(@begin_lsn, @end_lsn, 'all');
```
The steps mean:
- Declare the four variables used in this batch
- @begin_time / @end_time are human-readable times in string format
- @begin_lsn / @end_lsn are Log Sequence Numbers obtained via the CDC mapping function; an LSN uniquely identifies a change operation in the transaction log
- Query the CDC change data in the range @begin_lsn to @end_lsn
The query data is as follows:
```
__$start_lsn         __$end_lsn __$seqval            __$operation __$update_mask id name
-------------------- ---------- -------------------- ------------ -------------- -- ------
0000dede0000019f001a null       0000dede0000019f0018 2            03             1  apple
0000dede000001ad0004 null       0000dede000001ad0003 2            03             2  apple2
0000dede000001ba0003 null       0000dede000001ba0002 3            02             2  apple2
0000dede000001ba0003 null       0000dede000001ba0002 4            02             2  apple3
0000dede000001c10003 null       0000dede000001c10002 2            03             3  apple4
0000dede000001cc0005 null       0000dede000001cc0002 1            03             3  apple4
```
The __$operation column indicates the type of change:
- 1 = > delete
- 2 = > insert
- 3 => Data before update
- 4 => Updated data
Based on this data, we can reconstruct the operations performed on the database during this period:
- Inserted two rows, with id 1 and id 2
- Updated the row with id 2
- Inserted a row with id 3
- Deleted the row with id 3
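As an aside, the LSN endpoints do not have to be mapped from wall-clock times; they can also be taken directly from the capture instance (a common variant of the query above):

```sql
DECLARE @from_lsn binary(10), @to_lsn binary(10);
-- the lowest LSN still retained for the capture instance dbo_fruits
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_fruits');
-- the highest LSN processed by the capture job so far
SET @to_lsn = sys.fn_cdc_get_max_lsn();
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_fruits(@from_lsn, @to_lsn, 'all');
```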
CDC tuning
Having CDC meant we were on the right track, and we could breathe a little easier. But besides understanding its principles and usage, we also needed a deep understanding of CDC's working mechanism, to stress-test and tune it, and to know its limits and boundaries. Otherwise, an uncontrollable situation in production could cause huge losses to the business.
Let's start with the CDC workflow to see which core parameters can be adjusted. The following figure shows how a CDC job works:
- maxscans: the maximum number of log scan cycles per run (the blue area)
- maxtrans: the maximum number of transactions processed per scan cycle (also the blue area)
- pollinginterval: the interval between scan cycles, in seconds (the light blue area)
These three parameters balance CDC's server resource consumption, throughput, and latency, and can be tuned for specific scenarios such as large fields, wide tables, and BLOB-heavy tables. Their default values are listed below, and a sketch of how to change them follows.
- maxscans: default 10
- maxtrans: default 500
- pollinginterval: default 5 seconds
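These knobs are adjusted through sys.sp_cdc_change_job, and the new values take effect after the capture job is restarted. A minimal sketch (the values shown are illustrative, not recommendations):

```sql
-- update the capture job's scan parameters
EXEC sys.sp_cdc_change_job
    @job_type        = N'capture',
    @maxscans        = 20,
    @maxtrans        = 1000,
    @pollinginterval = 2;

-- restart the capture job so the new values take effect
EXEC sys.sp_cdc_stop_job  @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';
```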
CDC stress testing
With the tunable core parameters identified, we set out to run multiple forms of tests against CDC. Before stress testing, we also needed to pick the key health indicators (most can be sampled from a system DMV, as sketched after this list):
- Memory: buffer cache hit ratio / page life expectancy / page splits
- Throughput: batch requests / SQL compilations / SQL re-compilations / transaction count
- Resource contention: user connections / processes blocked / lock waits / checkpoint pages
- Operating system: CPU usage and disk I/O
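On the SQL Server side, most of these indicators can be sampled from the sys.dm_os_performance_counters DMV; a minimal sketch (exact counter names can vary slightly across versions):

```sql
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (
    'Buffer cache hit ratio',   -- memory
    'Page life expectancy',     -- memory
    'Page Splits/sec',          -- memory / access methods
    'Batch Requests/sec',       -- throughput
    'SQL Compilations/sec',     -- throughput
    'SQL Re-Compilations/sec',  -- throughput
    'User Connections',         -- resource contention
    'Processes blocked'         -- resource contention
);
```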
For space reasons we cannot include all the test results here; as an example, here is the test that inserts one million rows of random data at a concurrency of 30:
The tests led to the following conclusions, under the default CDC parameters:
- Enabling or disabling CDC briefly blocks a few processes; doing so under heavy traffic (15k TPS) blocks around 20 processes. During this window, server I/O and CPU show no significant fluctuation.
- SQL compilation metrics fluctuate sharply when CDC is enabled or disabled.
- With CDC enabled, QPS and page I/O show no significant fluctuation under heavy traffic, and neither do server I/O and CPU.
- With CDC enabled, the system works normally at 16k TPS.
If performance does not meet your needs, there are some simple tuning guidelines (several map directly onto parameters of sys.sp_cdc_enable_table, as sketched after this list):
- Adjust maxscans / maxtrans / pollinginterval
- Avoid updates that immediately follow inserts
- Avoid large batch writes
- Limit the columns that are captured
- Disable net changes when possible
- Run the cleanup job when the system is under low load
- Monitor log file size and I/O pressure to make sure the disk does not fill up
- Set @filegroup_name to place the change tables on a separate filegroup before calling sys.sp_cdc_enable_table
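A minimal sketch of how those parameters look on sys.sp_cdc_enable_table (the column list and filegroup name are illustrative):

```sql
EXEC sys.sp_cdc_enable_table
    @source_schema        = N'dbo',
    @source_name          = N'fruits',
    @role_name            = NULL,
    @supports_net_changes = 0,            -- disable net changes
    @captured_column_list = N'id, name',  -- capture only the columns you need
    @filegroup_name       = N'CDC_FG';    -- keep change tables on a separate filegroup
```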
Online migration mechanism of yugong
OK, back to where we were: we now have CDC, but it only provides the possibility. We still need another tool that consumes the CDC data and feeds it into MySQL.
Fortunately there is yugong. yugong officially provides the Oracle-to-MySQL encapsulation and abstracts the Source / Target / SQL Template interfaces; we only need to implement the relevant interfaces to complete the consumption of data from SQL Server into MySQL.
We won't expand on this here; I will cover how to build on yugong in a dedicated article. As a preview: we have open-sourced our SQL Server version of yugong.
How to roll back
For a project like a database migration, we must not only guarantee the one-way data flow from SQL Server to MySQL, but also be able to flow data back from MySQL to SQL Server in case we need to roll back.
This reverse flow must satisfy the same three incremental requirements: incremental acquisition, low latency, and idempotent consistency.
MySQL's binlog can satisfy all three. Note that the binlog has three formats: statement-based, row-based, and mixed; only row-based logging meets the requirement of idempotent consistency.
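You can verify and switch the format as follows (a sketch; in production you would normally set this in my.cnf rather than at runtime):

```sql
-- check the current binlog format
SHOW VARIABLES LIKE 'binlog_format';

-- switch new sessions to row-based logging (requires the SUPER privilege);
-- the persistent equivalent is binlog_format = ROW in my.cnf
SET GLOBAL binlog_format = 'ROW';
```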
Having confirmed it is theoretically feasible, we still needed a tool to read the binlog, convert it into a format SQL Server can consume, and write it into SQL Server.
We turned to Canal, another Alibaba project. Canal is a binlog incremental subscription & consumption component provided by Alibaba's middleware team. We call it a component because it consists of the canal-server application and the canal client library. canal-server simulates a MySQL replica: it connects to the master as a slave and reads the binlog in real time. What to do with the binlog after reading it is entirely up to the user.
We designed a simple data flow based on Canal and added two capabilities to yugong:
- The ability to write to SQL Server
- The ability to consume Canal data sources
The binlog inside canal-server can only be consumed once; internally it is implemented as a queue. To make the data repeatedly consumable, we added an extra stage that pushes Canal's output into a queue, from which it can be re-consumed at any later time. We chose Redis as the queue; the data flow is as follows.
Best practices
In a database migration, the go-live step is the one that least tolerates mistakes. Applications are stateless and can be rolled back quickly when problems occur, but a database migration needs thorough consideration: resource preparation, a release process, and failure-handling plans.
Since many business units need to go through this process, our project team codified every step into a best practice. Our migration steps are as follows, for your reference:
| Stage | Phase | Task | Done | Owner | Duration | Start time | Finish time | Notes |
|---|---|---|---|---|---|---|---|---|
| Daytime | Full data phase | Create the MySQL database and prepare account resources | | DBA | | | | |
| | | Enable CDC | | DBA | | | | |
| | | Dump a snapshot from the slave SQL Server to the backup SQL Server | | DBA | | | | |
| | | ETL the data from the backup SQL Server to MySQL | | DBA | | | | |
| | Incremental data phase | Verify that the ETL data has been consumed and check total row counts | | DBA | | | | |
| | | Consume CDC data from the slave SQL Server and keep writing to MySQL | | DBA | | | | |
| | | Use yugong to verify data consistency for the whole day | | DBA | | | | |
| | | Check inconsistent data; re-check manually after 10 minutes to allow for CDC delay | | DBA | | | | |
| | | Check total row counts | | DBA | | | | |
| | | Use yugong to fully verify sampled tables | | DBA | | | | |
| Early morning | Application release phase | Stop the applications on SQL Server | | Technical manager | | | | |
| | | Confirm no connections to SQL Server remain | | DBA | | | | |
| | | Use yugong to verify data consistency for the whole day | | DBA | | | | |
| | | Check total row counts | | DBA | | | | |
| | | Start the MySQL-based applications | | Operations | | | | |
| | Testing phase | Test whether the application works properly; regression-test all functions | | QA | | | | |
| | | (Added on the spot) Test application access to the read-only DB | | QA | | | | |
| | Completion phase | Let traffic in | | Operations | | | | |
| | (Optional) Rollback phase | If a problem is found, switch the application directly back to SQL Server | | Operations | | | | |
| | | Audit the data afterwards and compensate for new data | | DBA | | | | |
| | | (Optional) During rollback, use Canal to read the binlog and a Canal client to restore data to SQL Server | | DBA | | | | |
References
- Materialized View Concepts and Architecture
- Tuning the Performance of Change Data Capture in SQL Server 2008 | Microsoft Docs
- alibaba/yugong: Alibaba's Oracle data migration/synchronization tool (full + incremental, supports MySQL/DRDS as targets)
- alibaba/canal: Alibaba's incremental subscription & consumption component for MySQL binlog. Alibaba Cloud DRDS (https://www.aliyun.com/product/drds) and Alibaba TDDL secondary index / small table replication are powered by canal.
Original link: https://blog.alswl.com/2018/05/sql-server-migration-2/
Welcome to follow my WeChat official account: Peep Leopard