We use two chapters from SQL Server to MySQL (I) : Heterogeneous database migration/FROM SQL Server to MySQL (II) : Online migration, switching engines in the air to introduce the problems we encountered and solutions. Yugong is the core ETL tool for both offline full migration and online seamless migration.

Yugong is a mature tool that has played an important role in Alibaba’s de-IOE campaign. It, along with Otter/Canal, is produced by Alibaba’s middleware team. Yugong aims at heterogeneous database migration; Canal is designed to solve MySQL binlog subscription and consumption problems; Otter, on top of Canal, solves database synchronization problems with a quasi-real-time standard. Otter is equipped with more robust management tools and distributed coordination tools than YUGONG, so as to run steadily for a long time. Yugong is designed for one-off migration work, which is more job-oriented. Of course, YUGONG itself is of good quality and has no problem in long-term operation. We have a colleague in the production line who uses our modified YuGONG to synchronize data from the management platform to the user front desk. It has been running stably for more than half a year.

Yugong system structure

I will not repeat how to use YuGONG here. If you need it, please go to the official document to check the usage document.

I went straight to the key: dissecting the Yugong core module. Yugong data stream is a standard ETL process, which is divided into three categories: Extractor/Translator/Applier:

Let’s take a look at these three categories of concrete design.

Extractor

  • YuGongLifeCycle: Yugong component lifecycle declaration
  • AbstractYuGongLifeCycle: Yugong component lifecycle some implementation
  • RecordExtractor: Basic Extractor Interface
  • AbstractRecordExtractor: basic Extractor virtual class, do a part of the implementation
  • AbstractOracleRecordExtractor: Oracle Extractor virtual class, do part of the Oracle related implementation
  • OracleOnceFullRecordExtractor: Oracle one-time Extractor based on specific SQL
  • OracleFullRecordExtractor: Oracle Full Extractor
  • OracleRecRecordExtractor: Oracle record Extractor, used to create materialized views
  • OracleMaterializedIncRecordExtractorOracle incremental Extrator based on materialized views
  • OracleAllRecordExtractor: Oracle Automation Extractor, first Mark then Full, then Inc

Exctractor reads data from the Source DB and writes it to memory. Yugong provides an Extractor that abstracts the AbstractRecordExtractor class. In addition, Yugong designed the YuGongLifeCycle class to achieve component lifecycle management.

Translator

  • DataTranslator: Translator base class for Row-level data processing
  • TableTranslator: Translator base class for Table level processing (not used in official code)
  • AbstractDataTranslator: Data Translator virtual class, partially implemented
  • EncodeDataTranslator: Translation code format Translator
  • OracleIncreamentDataTranslator: Prepares the Translator for Oracle incremental data, which adjusts some data states
  • BackTableDataTranslator: Demo, which allows the Translator to write back data
  • BillOutDataTranslator: Demo, which contains some Ali-business logic translators
  • MidBillOutDetailDataTranslator: Demo, which contains some Ali-business logic translators

Translators read RowData in memory and then transform it. Most translators do stateless operations such as encoding transformations. In addition, a small number of translators do business logic operations, such as data write back.

Applier

  • RecordApplier: Basic Applier Interface
  • AbstractRecordApplierThe basic Applier virtual class is partially implemented
  • CheckRecordRecordApplier: Checks data consistency Applier does not write data
  • FullRecordRecordApplier: Full data Applier that uses UPSERT to update data
  • IncreamentRecordApplierIncremental Applier, which uses Oracle materialized views as data sources
  • AllRecordRecordApplier: Automatic Applier, using full data Applier first, then incremental data Applier

The Applier writes the Translator’s processed data to the Target DB. Yugong provides consistency checking, full, incremental Applier. In particular, AllRecordRecordApplier provides full automation.

Others

Besides ETL, YUGONG also has some important classes: control class and tool class.

  • SqlTemplate: provides base CLASS SQL templates for operations such as CRUD and UPSERT
  • OracleSqlTemplate: Oracle SQL template based on SqlTemplate
  • RecordDiffer: Consistency check differ
  • YugongController: Application controller: controls the flow of application data
  • YugongInstance: Controls a single migration task instance. One table corresponds to one YugongInstance

The old soldier’s problem

Saying yugong has a problem is a bit of a clicker, after all she is a proven veteran. But for us, the open source version of Yugong has a few shortcomings:

  • SQL Server reading is not supported
  • SQL Server write not supported (Rollback requires write to SQL Server)
  • MySQL reading is not supported

In addition to database support, Yugong does have some room for improvement in engineering. We ended up spending a lot of time doing engineering improvements.

  • Instead of the default package method (using maven-assembly-plugin to generate lFs-like tar.gz files), use fat JAR mode to generate only single-file executable JAR packages
  • Discard ini configuration files in favor of YAML configuration file format (old configuration still uses INI files, YAML mainly manages table structure changes)
  • Modify the Plugin pattern to reflect fetch Java classes instead of Java runtime compilation
  • Split Unit Test/Integration Test to reduce refactoring costs
  • Refactoring the Oracle inheritance structure to open the SQL Server/MySQL interface
  • Support Canal Redis format data as MySQL online incremental data source

Reconstructed structure

Extractor

  • AbstractSqlServerExtractor: Added abstract SqlServer Extractor
  • AbstractMysqlExtractor: Added abstract MySQL Extractor
  • AbstractFullRecordExtractor: Added abstract Full mode Extractor
  • SqlServerCdcExtractor: Added SQL Server CDC incremental mode Extractor
  • MysqlCanalExtractor: Added MySQL Canal format incremental consumption Extractor
  • MysqlCanalRedisExtractor: Added MySQL Canal format incremental consumption Extractor, using Redis to do backtracking
  • MysqlFullExtractor: Added MySQL full Extractor
  • SqlServerFullExtractor: Added full SQL Server Extractor

After abstracting the three abstract classes, the overall logic is clearer and easier to add new database format support in the future.

Translator

  • Sha1ShardingTranslator: According to Sha1 Sharding Translator
  • ModShardingTranslator: Specifies the Value Mode Sharding Translator
  • RangeShardingTranslator: Sharding Translator according to the scope
  • UserRouterMapShardingTranslatorSharding Translator: For specific services
  • UserRouterMapMobileShardingTranslatorSharding Translator: For specific services
  • ClassLearningNoteInfoShardingTranslator: Custom translators are used for specific services
  • ClassLearningIsActiveReverseShardingTranslator: Custom translators are used for specific services
  • ColumnFixDataTranslator: Adjust the table structure Translator
  • NameStyleDataTranslator: Adjust table field name Translator, which automatically translates the entire table based on the style
  • CompositeIndexesDataTranslator: Resolve the Translator’s unique PK determination problem under the compound primary key

Added a new set of translators.

Applier

  • SqlServerIncreamentRecordApplier: Added the SQL Server incremental consumption Applier

The Applier structure is minor, with the addition of SQL Server support.

Secondary development experience

How to quickly learn about an open source project? Many students’ first reaction is to read the source code. Look at the source is effective, but the price is too low. If the project is not properly designed, it can quickly get lost in the details of the code. My experience has been to read some of the official Slide shares first, and then read the official core documentation. Slide is high in gold, the core of telling the core.

If you really want to read the source code in detail, I recommend using tools such as IntelliJ’s Diagram function to abstract out the core classes. Other plug-ins, such as SequencePluginReload, easily generate calls between functions and are useful for viewing data flow. During this development process, I also found some problems according to the generated class diagram, so I reconstructed the framework inheritance structure before entering Coding. Improve the overall development efficiency

Judging by the style of the code, Yugong was not created by one person. This somewhat leads to inconsistencies in code style and design. I myself have been in the business line for years and can imagine the need to be rough and fast in fast-moving projects. But later generations to accept development, how much will be a little headache. So BEFORE ENTERING development, I introduced standardized CheckStyle, global formatting with Google Style, and a code quality baseline with Sonar scanning. It was also a double-edged sword, formatting projects led to a lot of diff, and I set myself up for endless problems submitting PR upstream at a later stage.

I made a few mistakes along the way. The biggest headache is that contributing to the open source community was not considered early on, making it difficult to merge upstream in the future, and is still struggling to merge code. In addition, due to the tight time of the overall project, I did not do more detailed unit test coverage for the sake of implementation speed. Best practices for open source software are not followed.

Now I modified Yugong version of the open source address is: https://github.com/alswl/yugong. I also submit a Pull Request https://github.com/alibaba/yugong/pull/66, how is working with the official communication to submit this part into the upstream.


From SQL Server to MySQL (3) : Yugongchangshan – Open Source power – Log4D

Welcome to follow my wechat public account: Peep Leopard

3a1ff193cee606bd1e2ea554a16353ee