We use two chapters from SQL Server to MySQL (I) : Heterogeneous database migration/FROM SQL Server to MySQL (II) : Online migration, switching engines in the air to introduce the problems we encountered and solutions. Yugong is the core ETL tool for both offline full migration and online seamless migration.
Yugong is a mature tool that has played an important role in Alibaba’s de-IOE campaign. It, along with Otter/Canal, is produced by Alibaba’s middleware team. Yugong aims at heterogeneous database migration; Canal is designed to solve MySQL binlog subscription and consumption problems; Otter, on top of Canal, solves database synchronization problems with a quasi-real-time standard. Otter is equipped with more robust management tools and distributed coordination tools than YUGONG, so as to run steadily for a long time. Yugong is designed for one-off migration work, which is more job-oriented. Of course, YUGONG itself is of good quality and has no problem in long-term operation. We have a colleague in the production line who uses our modified YuGONG to synchronize data from the management platform to the user front desk. It has been running stably for more than half a year.
Yugong system structure
I will not repeat how to use YuGONG here. If you need it, please go to the official document to check the usage document.
I went straight to the key: dissecting the Yugong core module. Yugong data stream is a standard ETL process, which is divided into three categories: Extractor/Translator/Applier:
Let’s take a look at these three categories of concrete design.
Extractor
YuGongLifeCycle
: Yugong component lifecycle declarationAbstractYuGongLifeCycle
: Yugong component lifecycle some implementationRecordExtractor
: Basic Extractor InterfaceAbstractRecordExtractor
: basic Extractor virtual class, do a part of the implementationAbstractOracleRecordExtractor
: Oracle Extractor virtual class, do part of the Oracle related implementationOracleOnceFullRecordExtractor
: Oracle one-time Extractor based on specific SQLOracleFullRecordExtractor
: Oracle Full ExtractorOracleRecRecordExtractor
: Oracle record Extractor, used to create materialized viewsOracleMaterializedIncRecordExtractor
Oracle incremental Extrator based on materialized viewsOracleAllRecordExtractor
: Oracle Automation Extractor, first Mark then Full, then Inc
Exctractor reads data from the Source DB and writes it to memory. Yugong provides an Extractor that abstracts the AbstractRecordExtractor class. In addition, Yugong designed the YuGongLifeCycle class to achieve component lifecycle management.
Translator
DataTranslator
: Translator base class for Row-level data processingTableTranslator
: Translator base class for Table level processing (not used in official code)AbstractDataTranslator
: Data Translator virtual class, partially implementedEncodeDataTranslator
: Translation code format TranslatorOracleIncreamentDataTranslator
: Prepares the Translator for Oracle incremental data, which adjusts some data statesBackTableDataTranslator
: Demo, which allows the Translator to write back dataBillOutDataTranslator
: Demo, which contains some Ali-business logic translatorsMidBillOutDetailDataTranslator
: Demo, which contains some Ali-business logic translators
Translators read RowData in memory and then transform it. Most translators do stateless operations such as encoding transformations. In addition, a small number of translators do business logic operations, such as data write back.
Applier
RecordApplier
: Basic Applier InterfaceAbstractRecordApplier
The basic Applier virtual class is partially implementedCheckRecordRecordApplier
: Checks data consistency Applier does not write dataFullRecordRecordApplier
: Full data Applier that uses UPSERT to update dataIncreamentRecordApplier
Incremental Applier, which uses Oracle materialized views as data sourcesAllRecordRecordApplier
: Automatic Applier, using full data Applier first, then incremental data Applier
The Applier writes the Translator’s processed data to the Target DB. Yugong provides consistency checking, full, incremental Applier. In particular, AllRecordRecordApplier provides full automation.
Others
Besides ETL, YUGONG also has some important classes: control class and tool class.
SqlTemplate
: provides base CLASS SQL templates for operations such as CRUD and UPSERTOracleSqlTemplate
: Oracle SQL template based on SqlTemplateRecordDiffer
: Consistency check differYugongController
: Application controller: controls the flow of application dataYugongInstance
: Controls a single migration task instance. One table corresponds to one YugongInstance
The old soldier’s problem
Saying yugong has a problem is a bit of a clicker, after all she is a proven veteran. But for us, the open source version of Yugong has a few shortcomings:
- SQL Server reading is not supported
- SQL Server write not supported (Rollback requires write to SQL Server)
- MySQL reading is not supported
In addition to database support, Yugong does have some room for improvement in engineering. We ended up spending a lot of time doing engineering improvements.
- Instead of the default package method (using maven-assembly-plugin to generate lFs-like tar.gz files), use fat JAR mode to generate only single-file executable JAR packages
- Discard ini configuration files in favor of YAML configuration file format (old configuration still uses INI files, YAML mainly manages table structure changes)
- Modify the Plugin pattern to reflect fetch Java classes instead of Java runtime compilation
- Split Unit Test/Integration Test to reduce refactoring costs
- Refactoring the Oracle inheritance structure to open the SQL Server/MySQL interface
- Support Canal Redis format data as MySQL online incremental data source
Reconstructed structure
Extractor
AbstractSqlServerExtractor
: Added abstract SqlServer ExtractorAbstractMysqlExtractor
: Added abstract MySQL ExtractorAbstractFullRecordExtractor
: Added abstract Full mode ExtractorSqlServerCdcExtractor
: Added SQL Server CDC incremental mode ExtractorMysqlCanalExtractor
: Added MySQL Canal format incremental consumption ExtractorMysqlCanalRedisExtractor
: Added MySQL Canal format incremental consumption Extractor, using Redis to do backtrackingMysqlFullExtractor
: Added MySQL full ExtractorSqlServerFullExtractor
: Added full SQL Server Extractor
After abstracting the three abstract classes, the overall logic is clearer and easier to add new database format support in the future.
Translator
Sha1ShardingTranslator
: According to Sha1 Sharding TranslatorModShardingTranslator
: Specifies the Value Mode Sharding TranslatorRangeShardingTranslator
: Sharding Translator according to the scopeUserRouterMapShardingTranslator
Sharding Translator: For specific servicesUserRouterMapMobileShardingTranslator
Sharding Translator: For specific servicesClassLearningNoteInfoShardingTranslator
: Custom translators are used for specific servicesClassLearningIsActiveReverseShardingTranslator
: Custom translators are used for specific servicesColumnFixDataTranslator
: Adjust the table structure TranslatorNameStyleDataTranslator
: Adjust table field name Translator, which automatically translates the entire table based on the styleCompositeIndexesDataTranslator
: Resolve the Translator’s unique PK determination problem under the compound primary key
Added a new set of translators.
Applier
SqlServerIncreamentRecordApplier
: Added the SQL Server incremental consumption Applier
The Applier structure is minor, with the addition of SQL Server support.
Secondary development experience
How to quickly learn about an open source project? Many students’ first reaction is to read the source code. Look at the source is effective, but the price is too low. If the project is not properly designed, it can quickly get lost in the details of the code. My experience has been to read some of the official Slide shares first, and then read the official core documentation. Slide is high in gold, the core of telling the core.
If you really want to read the source code in detail, I recommend using tools such as IntelliJ’s Diagram function to abstract out the core classes. Other plug-ins, such as SequencePluginReload, easily generate calls between functions and are useful for viewing data flow. During this development process, I also found some problems according to the generated class diagram, so I reconstructed the framework inheritance structure before entering Coding. Improve the overall development efficiency
Judging by the style of the code, Yugong was not created by one person. This somewhat leads to inconsistencies in code style and design. I myself have been in the business line for years and can imagine the need to be rough and fast in fast-moving projects. But later generations to accept development, how much will be a little headache. So BEFORE ENTERING development, I introduced standardized CheckStyle, global formatting with Google Style, and a code quality baseline with Sonar scanning. It was also a double-edged sword, formatting projects led to a lot of diff, and I set myself up for endless problems submitting PR upstream at a later stage.
I made a few mistakes along the way. The biggest headache is that contributing to the open source community was not considered early on, making it difficult to merge upstream in the future, and is still struggling to merge code. In addition, due to the tight time of the overall project, I did not do more detailed unit test coverage for the sake of implementation speed. Best practices for open source software are not followed.
Now I modified Yugong version of the open source address is: https://github.com/alswl/yugong. I also submit a Pull Request https://github.com/alibaba/yugong/pull/66, how is working with the official communication to submit this part into the upstream.
From SQL Server to MySQL (3) : Yugongchangshan – Open Source power – Log4D
Welcome to follow my wechat public account: Peep Leopard
3a1ff193cee606bd1e2ea554a16353ee