From SQL Server to MySQL (III): The Power of Open Source

The previous two chapters, From SQL Server to MySQL (I): Heterogeneous Database Migration and From SQL Server to MySQL (II): Online Migration, Switching Engines in the Air, introduced the problems we encountered and our solutions. Yugong is the core ETL tool for both the offline full migration and the online seamless migration.

Yugong is a mature tool that played an important role in Alibaba’s de-IOE campaign. Like Otter and Canal, it was produced by Alibaba’s middleware team. Yugong targets heterogeneous database migration; Canal handles MySQL binlog subscription and consumption; Otter, built on top of Canal, handles near-real-time database synchronization. Otter ships with more robust management and distributed coordination tools than Yugong so that it can run steadily for long periods, whereas Yugong is designed for one-off migration work and is more job-oriented. That said, Yugong itself is of good quality and has no problem running long term. A colleague on one of our product lines uses our modified Yugong to synchronize data from the management platform to the user-facing front end, and it has been running stably for more than half a year.

Yugong system structure

I will not repeat how to use Yugong here. If you need that, please refer to the official documentation.

Let me go straight to the key part: dissecting Yugong’s core modules. Yugong’s data flow is a standard ETL process, divided into three categories of components: Extractor, Translator, and Applier.

Let’s take a look at the concrete design of these three categories.

Extractor

YuGongLifeCycle: Yugong component lifecycle declaration
AbstractYuGongLifeCycle: partial implementation of the Yugong component lifecycle
RecordExtractor: basic Extractor interface
AbstractRecordExtractor: basic Extractor abstract class with a partial implementation
AbstractOracleRecordExtractor: Oracle Extractor abstract class with the Oracle-related part of the implementation
OracleOnceFullRecordExtractor: one-time Oracle Extractor based on specific SQL
OracleFullRecordExtractor: Oracle full Extractor
OracleRecRecordExtractor: Oracle record Extractor, used to create materialized views
OracleMaterializedIncRecordExtractor: Oracle incremental Extractor based on materialized views
OracleAllRecordExtractor: automated Oracle Extractor that runs Mark first, then Full, then Inc

The Extractor reads data from the Source DB and writes it to memory. Yugong abstracts the common Extractor behavior into the AbstractRecordExtractor class. In addition, Yugong designed YuGongLifeCycle to manage component lifecycles.
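To make the relationship between the lifecycle and the Extractor more concrete, here is a minimal sketch. It is not Yugong’s actual code: LifeCycle, AbstractLifeCycle, and InMemoryExtractor are hypothetical names, and the source is a plain in-memory list standing in for a database cursor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lifecycle contract, playing the role that YuGongLifeCycle plays in Yugong.
interface LifeCycle {
    void start();
    void stop();
    boolean isStart();
}

// Shared start/stop bookkeeping, similar in spirit to AbstractYuGongLifeCycle.
abstract class AbstractLifeCycle implements LifeCycle {
    private volatile boolean running = false;

    @Override
    public void start() {
        if (running) {
            throw new IllegalStateException(getClass().getSimpleName() + " is already started");
        }
        running = true;
    }

    @Override
    public void stop() {
        if (!running) {
            throw new IllegalStateException(getClass().getSimpleName() + " is not started");
        }
        running = false;
    }

    @Override
    public boolean isStart() {
        return running;
    }
}

// Hypothetical extractor: reads the source in batches and hands each batch over in memory.
class InMemoryExtractor extends AbstractLifeCycle {
    private final List<String> sourceRows;
    private final int batchSize;
    private int position = 0;

    InMemoryExtractor(List<String> sourceRows, int batchSize) {
        this.sourceRows = sourceRows;
        this.batchSize = batchSize;
    }

    // Returns the next batch; an empty list means the source is exhausted.
    List<String> extract() {
        if (!isStart()) {
            throw new IllegalStateException("extractor is not started");
        }
        int end = Math.min(position + batchSize, sourceRows.size());
        List<String> batch = new ArrayList<>(sourceRows.subList(position, end));
        position = end;
        return batch;
    }
}
```

A caller would call start(), loop on extract() until it returns an empty batch, and then call stop(). Keeping the state checks in the abstract class means each concrete Extractor only has to implement the reading logic.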

Translator

DataTranslator: Translator base class for row-level data processing
TableTranslator: Translator base class for table-level processing (not used in the official code)
AbstractDataTranslator: DataTranslator abstract class with a partial implementation
EncodeDataTranslator: Translator that converts character encodings
OracleIncreamentDataTranslator: Translator for Oracle incremental data, which adjusts some data states
BackTableDataTranslator: demo Translator that writes data back
BillOutDataTranslator: demo Translator containing some Alibaba business logic
MidBillOutDetailDataTranslator: demo Translator containing some Alibaba business logic

Translators read RowData in memory and then transform it. Most Translators perform stateless operations such as encoding conversion. A small number carry business logic, such as writing data back.
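A stateless encoding Translator can be sketched as follows. This is an illustration under stated assumptions, not Yugong’s EncodeDataTranslator: RowTranslator and EncodeTranslator are hypothetical names, and a row is modeled as a simple column-to-value map rather than Yugong’s record type.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical row-level translator contract: one row in, one row out.
interface RowTranslator {
    Map<String, Object> translate(Map<String, Object> row);
}

// Re-decodes string columns that were read with the wrong charset.
// It holds no per-row state, so one instance can be shared across rows.
class EncodeTranslator implements RowTranslator {
    private final Charset readAs;   // charset the source driver wrongly used
    private final Charset actual;   // charset the bytes were really stored in

    EncodeTranslator(Charset readAs, Charset actual) {
        this.readAs = readAs;
        this.actual = actual;
    }

    @Override
    public Map<String, Object> translate(Map<String, Object> row) {
        // Build a new map so the input row is left untouched.
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> column : row.entrySet()) {
            Object value = column.getValue();
            if (value instanceof String) {
                // Recover the original bytes, then decode them with the right charset.
                byte[] raw = ((String) value).getBytes(readAs);
                out.put(column.getKey(), new String(raw, actual));
            } else {
                out.put(column.getKey(), value);
            }
        }
        return out;
    }
}
```

For example, new EncodeTranslator(StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8) repairs UTF-8 text that was decoded as ISO-8859-1, while non-string columns pass through unchanged.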

Applier

RecordApplier: basic Applier interface
AbstractRecordApplier: basic Applier abstract class with a partial implementation
CheckRecordRecordApplier: Applier that checks data consistency and does not write data
FullRecordRecordApplier: full-data Applier that uses UPSERT to update data
IncreamentRecordApplier: incremental Applier, which uses Oracle materialized views as its data source
AllRecordRecordApplier: automatic Applier that uses the full-data Applier first, then the incremental Applier

The Applier writes the data processed by the Translator to the Target DB. Yugong provides consistency-check, full, and incremental Appliers. In particular, AllRecordRecordApplier automates the whole flow.
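The "full first, then incremental" idea can be sketched with a composite Applier that delegates to one of two inner Appliers. Everything here is hypothetical (Row, ChangeRow, FullApplier, IncrementApplier, AllApplier), the target is a map instead of a database, and dispatching on the record type is an assumption made for this sketch rather than a description of how AllRecordRecordApplier decides.

```java
import java.util.List;
import java.util.Map;

// Hypothetical full-load row: a primary key plus one value column.
class Row {
    final int id;
    final String value;

    Row(int id, String value) {
        this.id = id;
        this.value = value;
    }
}

// Hypothetical change event: a row plus the operation that produced it.
class ChangeRow extends Row {
    final char op; // 'I' insert, 'U' update, 'D' delete

    ChangeRow(int id, String value, char op) {
        super(id, value);
        this.op = op;
    }
}

interface Applier {
    void apply(List<? extends Row> rows);
}

// Full load: every row is upserted into the target.
class FullApplier implements Applier {
    private final Map<Integer, String> target;

    FullApplier(Map<Integer, String> target) {
        this.target = target;
    }

    @Override
    public void apply(List<? extends Row> rows) {
        for (Row row : rows) {
            target.put(row.id, row.value); // put() gives insert-or-update semantics
        }
    }
}

// Incremental load: replays inserts, updates, and deletes in order.
class IncrementApplier implements Applier {
    private final Map<Integer, String> target;

    IncrementApplier(Map<Integer, String> target) {
        this.target = target;
    }

    @Override
    public void apply(List<? extends Row> rows) {
        for (Row row : rows) {
            ChangeRow change = (ChangeRow) row;
            if (change.op == 'D') {
                target.remove(change.id);
            } else {
                target.put(change.id, change.value);
            }
        }
    }
}

// Composite: routes each batch to the full or incremental Applier.
class AllApplier implements Applier {
    private final Applier full;
    private final Applier increment;

    AllApplier(Applier full, Applier increment) {
        this.full = full;
        this.increment = increment;
    }

    @Override
    public void apply(List<? extends Row> rows) {
        if (rows.isEmpty()) {
            return;
        }
        // Assumption of this sketch: a batch is homogeneous, so the first row decides.
        Applier delegate = rows.get(0) instanceof ChangeRow ? increment : full;
        delegate.apply(rows);
    }
}
```

Because the full Applier uses upsert semantics, re-running a full batch is harmless, which is what makes it safe to hand over to the incremental Applier afterwards.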

Others

Besides the ETL components, Yugong also has some other important classes: control classes and utility classes.

SqlTemplate: base class providing SQL templates for operations such as CRUD and UPSERT
OracleSqlTemplate: Oracle SQL template based on SqlTemplate
RecordDiffer: differ used for consistency checks
YugongController: application controller, which controls the application’s data flow
YugongInstance: controls a single migration task instance; one table corresponds to one YugongInstance
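As an illustration of what an UPSERT template produces for a MySQL target, here is a minimal sketch. UpsertSqlSketch and mysqlUpsert are hypothetical names and do not reflect the actual SqlTemplate API; the generated statement uses MySQL’s INSERT ... ON DUPLICATE KEY UPDATE syntax, and identifier quoting is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class UpsertSqlSketch {

    // Builds a parameterized MySQL upsert for one table.
    // pkColumns are only inserted; otherColumns are also updated on key conflict.
    static String mysqlUpsert(String table, List<String> pkColumns, List<String> otherColumns) {
        if (pkColumns.isEmpty() || otherColumns.isEmpty()) {
            throw new IllegalArgumentException("need at least one key column and one non-key column");
        }
        List<String> allColumns = new ArrayList<>(pkColumns);
        allColumns.addAll(otherColumns);

        String columnList = String.join(", ", allColumns);
        // One "?" placeholder per column, to be bound through a PreparedStatement.
        String placeholders = String.join(", ", Collections.nCopies(allColumns.size(), "?"));
        String updates = otherColumns.stream()
                .map(column -> column + " = VALUES(" + column + ")")
                .collect(Collectors.joining(", "));

        return "INSERT INTO " + table + " (" + columnList + ") VALUES (" + placeholders + ")"
                + " ON DUPLICATE KEY UPDATE " + updates;
    }

    public static void main(String[] args) {
        System.out.println(mysqlUpsert("orders", List.of("id"), List.of("status", "amount")));
    }
}
```

Generating the statement from column metadata once per table, and binding values per row, keeps the Applier itself free of hand-written SQL.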

The veteran’s problems

Saying that Yugong has problems is a bit of clickbait; after all, it is a proven veteran. But for our purposes, the open source version of Yugong has a few shortcomings:

Reading from SQL Server is not supported
Writing to SQL Server is not supported (rollback requires writing to SQL Server)
Reading from MySQL is not supported

Beyond database support, Yugong also has some room for improvement on the engineering side. We ended up spending a lot of time on engineering improvements:

Replace the default packaging method (using maven-assembly-plugin to generate an LFS-like tar.gz file) with fat JAR mode, which produces a single executable JAR file
Drop INI configuration files in favor of the YAML format (the old configuration still uses INI files; YAML mainly manages table structure changes)
Change the plugin mechanism to load Java classes via reflection instead of compiling Java source at runtime
Split unit tests from integration tests to reduce refactoring costs
Refactor the Oracle inheritance structure to open up interfaces for SQL Server and MySQL
Support Canal Redis format data as the MySQL online incremental data source
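The reflection-based plugin loading mentioned in the list above can be sketched like this. The names (Translator, UpperCaseTranslator, PluginLoader) are hypothetical and this is not our actual implementation; the point is only that a class name from configuration is resolved against classes already compiled onto the classpath.

```java
import java.util.Locale;

// Hypothetical plugin contract that user-supplied translators implement.
interface Translator {
    String translate(String value);
}

// Example plugin, compiled ahead of time and shipped on the classpath.
class UpperCaseTranslator implements Translator {
    @Override
    public String translate(String value) {
        return value.toUpperCase(Locale.ROOT);
    }
}

class PluginLoader {

    // Loads a translator by class name, as it might appear in a configuration file.
    static Translator load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            if (!Translator.class.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(className + " does not implement Translator");
            }
            // Requires a no-argument constructor on the plugin class.
            return clazz.asSubclass(Translator.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load translator " + className, e);
        }
    }
}
```

Compared with compiling source at runtime, this approach lets plugin classes be checked by the compiler and covered by ordinary tests, at the cost of requiring a rebuild when a plugin changes.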