As the bridge between data systems and business systems, data access and transmission is an indispensable part of a data system and its architecture. The stability and accuracy of the data transmission system directly affect the SLA and quality of the entire data service. Beyond that, improving the system's ease of use, guaranteeing monitoring coverage, reducing maintenance cost, and responding gracefully to failures are also very important.

This article presents the practical experience of the Autohome real-time computing team in building a data transmission SDK and transmission platform with Flink and our Flink real-time computing platform. It covers:

  1. Background and Requirements
  2. Technology Selection and Design — Why Flink?
  3. Architecture of the Data Transmission System
  4. Flink-based Binlog Access SDK
  5. Using the Platform
  6. Summary and Outlook

I. Background and Requirements

As an intelligent, data-driven company, Autohome naturally has a wide range of complex data requirements, and its data system is responsible for supporting them. The data transmission system, as one part of that system, handles all kinds of data import and distribution needs and lets users subscribe to data changes. As the businesses it supports expanded and demand grew, the original access system exposed a number of problems and shortcomings:

  • No effective mechanism for task and information management; task administration, operations, and statistics relied on manual work
  • Access programs wasted resources and lacked elasticity
  • DDL changes could not be handled well and required manual intervention
  • The transmission system depended on many components, such as Zookeeper and Redis
  • Technical debt accumulated in the code, driving maintenance costs up

In view of these problems, we decided to develop a new data transmission and distribution system.

II. Technology Selection and Design — Why Flink?

Before developing the new system, we analyzed three candidate solutions:

  1. Build completely in-house (similar to Otter)
  2. Reuse open-source components (Maxwell/Canal/Debezium) with secondary development and integration
  3. Build on top of Flink

We identified the following main design and usage goals:

  • The architecture should be O&M-friendly, provide high availability and failure-recovery strategies, and support multi-region active-active deployment
  • The architecture should guarantee data accuracy, promising at least at-least-once semantics
  • The architecture should scale both horizontally and vertically, with resources allocated on demand
  • The functionality should offer comprehensive monitoring coverage and a sound alerting mechanism, and support metadata management
  • The functionality should be real-time-computing friendly (1)
  • The functionality should fully protect against DDL changes

In addition, in terms of performance, the latency and throughput of the access system should at least satisfy all normal business scenarios.

(1) The ability to integrate with real-time computing platforms

Solution design and comparison

Based on these design ideas and goals, we drew up a comparison table of the main capabilities of each solution:

(1) Flink provides high availability and failure recovery out of the box, and the real-time computing platform can offer even stronger high-availability services on top of it. (2) Careful coding combined with Flink's mechanisms can achieve exactly-once. (3) The real-time computing platform already provides task deployment and management capabilities.

After discussion, we unanimously decided to build the new transmission platform on Flink:

  1. Flink DataStream’s programming model and API are a natural, straightforward fit for data transfer scenarios
  2. Flink provides consistency guarantees and HA/stability/flow-control measures at the framework level, so we do not have to tackle these hard development problems ourselves and can lean on the framework
  3. Flink naturally scales horizontally and vertically, using compute resources on demand
  4. We can fully reuse the existing components and capabilities of Autohome’s Flink real-time computing platform — complete monitoring and alerting, task lifecycle management, multi-region active-active deployment, and self-service O&M

Our MVP version was developed in less than three weeks, and the POC results met all of our expected performance and functionality requirements.

III. Architecture of the Data Transmission System

Logically, Autohome's real-time data transmission platform can be divided into three parts:

  • The data transfer programs
  • The task and information management module
  • The task execution (runtime) module

In terms of implementation:

  • The data transfer programs consist of fixed Flink jars and SQL tasks generated by the Flink SQL Codegen Service
  • The management module, a microservice, is responsible for communicating with the Flink platform components and handling the necessary task and information management
  • The execution layer relies directly on the Flink platform and its clusters

Component architecture and interaction logic

Components and interactions involved in the transmission system are shown in the figure below:

AutoDTS is the task and information management module of the transmission system. AutoStream Core is the core system of the Flink real-time computing platform. Jar Service stores and manages the Flink-related SDK jars. Metastore is the metadata management system of the Flink platform. Flink Client is a self-encapsulated submit client that supports RESTful submission of jobs to YARN/K8s.

The AutoDTS front end interacts directly with users to edit task information and perform task lifecycle operations. After processing the task information, AutoDTS interacts with the Flink platform; each data transfer task corresponds to exactly one task on the Flink platform. Meanwhile, part of the task information is processed by AutoDTS to create the corresponding stream table directly in the Metastore, which users can then apply for and use when developing SQL tasks.

For different transmission tasks, AutoDTS delegates to the Core System to assemble the task parameters and SQL logic, load the appropriate SDK jars from Jar Service, and submit them to the Flink Client for execution. For transmission tasks based on SQL codegen, the Flink SQL Codegen Service translates the task parameters into an executable Flink SQL job, which lets us directly reuse the platform's SQL SDKs to run SQL jobs.
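As a rough illustration (not the actual codegen output), a generated distribution job can boil down to a single INSERT INTO statement over tables already registered in the Metastore. The table and field names below are hypothetical, and both tables are assumed to be resolvable through the platform catalog (the stream table created by AutoDTS, the sink table by the user):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;

// Sketch of the kind of job the SQL Codegen Service might emit; table and
// field names are hypothetical, and both tables are assumed to be registered
// in the Metastore rather than declared inline.
public class GeneratedDistributionJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // A field-filtered distribution task reduces to a simple projection.
        tEnv.sqlUpdate(
                "INSERT INTO iceberg_sink_user " +
                "SELECT id, name, update_time FROM changelog_user");

        tEnv.execute("generated-distribution-job");
    }
}
```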

As mentioned earlier, we maximize the reuse of existing components and services, greatly reducing development cycles.

Transmission task types and composition

Autohome's data transmission tasks are divided into two types: access tasks and distribution tasks.

  • An access task reads the changelog stream from the data source in real time and writes it to Kafka in a unified format. Each table is accessed by exactly one access program, and the resulting data is a public asset for downstream programs to use and consume
  • A distribution task reads the public Kafka data and writes it to the storage specified by the user; users create distribution tasks according to their own needs and own them

As shown in the figure, there are three main access data sources. In addition to MySQL and SQL Server, we support accessing TiDB's changelog (TiCDC): we developed the related Java client logic and contributed the code to the TiDB community [1]. On the distribution side, SQL codegen parses the user's task configuration and generates Flink SQL code for execution.

IV. Flink-based Binlog Access SDK

Among these access and distribution SDKs, the Binlog access SDK is one of the more difficult. Here we take it as an example to analyze the main design ideas and development process of an access SDK.

Stages and decomposition

Following Flink’s classic Source -> Transformation -> Sink model, the Binlog access task is split into three stages (a skeleton of the resulting job is sketched below):
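As a rough sketch (not the SDK's actual code), the job topology boils down to the skeleton below. BinlogSourceFunction, UnifiedFormatTransform, and KafkaSinkFactory are simplified stand-ins for the SDK classes discussed in the rest of this section (the sink factory is sketched in the Kafka Sink subsection), with constructor arguments and type parameters omitted:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Skeleton of the three-stage Binlog access job; the three operators are
// simplified stand-ins for the SDK classes described below.
public class BinlogAccessJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(500); // short interval, see the Source discussion below

        env.addSource(new BinlogSourceFunction())  // Stage 1: Binlog Source
           .setParallelism(1)                      // Source parallelism must be exactly 1
           .map(new UnifiedFormatTransform())      // Stage 2: convert to the unified format
           .addSink(KafkaSinkFactory.create());    // Stage 3: transactional Kafka sink

        env.execute("binlog-access-job");
    }
}
```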

Binlog Source

The naive way to build a Binlog Source is to create a BinaryLogClient, keep fetching binlog events, and send them downstream after a simple transformation. Against our stated design goals, however, the following questions need careful consideration:

  1. Ensuring processing performance on the Source side
  2. Ensuring the source position is traceable for recovery
  3. Ensuring the integrity of MySQL transactions

For problem 1, given the particular nature of a binlog stream, we require the parallelism of the Source to be exactly 1; in most cases, fetching binlog events from the BinaryLogClient is not the performance bottleneck anyway. We keep the BinaryLogClient and the BinlogSourceFunction within the same lifecycle and link them as producer and consumer through a bounded blocking queue, while the BinlogSourceFunction performs as little logical processing on each BinlogEvent as possible, reducing its burden and improving the performance of the Source stage. A sketch of this producer/consumer structure is shown below.
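The following is a minimal sketch of that structure, based on the mysql-binlog-connector-java client that BinaryLogClient comes from; the host, credentials, and queue capacity are illustrative placeholders, and checkpoint handling is deliberately omitted here (see the transaction discussion below):

```java
import com.github.shyiko.mysql.binlog.BinaryLogClient;
import com.github.shyiko.mysql.binlog.event.Event;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: the BinaryLogClient produces raw binlog events into a
// bounded queue, and the source function consumes them with minimal processing.
public class BinlogSourceFunction extends RichSourceFunction<Event> {

    private volatile boolean running = true;
    private transient BinaryLogClient client;

    @Override
    public void run(SourceContext<Event> ctx) throws Exception {
        // Bounded queue links producer and consumer and gives natural backpressure.
        BlockingQueue<Event> queue = new ArrayBlockingQueue<>(4096);

        client = new BinaryLogClient("db-host", 3306, "user", "password"); // placeholders
        client.registerEventListener(event -> {
            try {
                queue.put(event); // producer side: blocks when the queue is full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // The client's replication loop runs on its own thread,
        // sharing the source function's lifecycle.
        new Thread(() -> {
            try {
                client.connect();
            } catch (Exception e) {
                running = false;
            }
        }, "binlog-client").start();

        while (running) {
            Event event = queue.poll(100, TimeUnit.MILLISECONDS); // consumer side
            if (event != null) {
                ctx.collect(event); // forward with as little processing as possible
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
        try {
            if (client != null) {
                client.disconnect();
            }
        } catch (Exception ignored) {
            // best-effort shutdown
        }
    }
}
```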

Problems 2 and 3 need to be analyzed from the characteristics and format of the binlog. As is well known, every BinlogEvent carries a unique BinlogPosition, and BinlogPositions are totally ordered, so we can record the current BinlogPosition whenever a checkpoint is triggered. But recording that alone is not enough: if we record the position of a data event, should recovery start from that event or from the one after it? On the other hand, we want to send each transaction downstream as a complete unit rather than truncating a send in the middle of a transaction. This is where a specific type of BinlogEvent comes in: the TransactionEnd event.

The BinlogSourceFunction therefore saves only the BinlogPosition of TransactionEnd events as the resume point in its state. Since a TransactionEnd event is not a DML event, it produces no downstream data, so the ambiguity described above does not arise.

Solving problem 3 requires cooperation with Flink's checkpoint mechanism. The Flink version we were using was 1.9.x, in which the Source and the checkpoint trigger coordinate through the checkpoint lock. Although the checkpoint-lock mechanism takes some effort to understand and use, it helps us achieve the goal of problem 3: the Source sends data downstream only while it holds the lock, and releases the lock only after a complete transaction has been sent. This guarantees that the data between any two checkpoints consists of n complete transactions (n ∈ ℕ). On top of that, we shortened the checkpoint interval (200 ms to 500 ms) to reduce the number of transactions between checkpoints and speed up data commits. A minimal sketch of this coordination follows.
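Below is a minimal sketch of how the consumer loop and state handling might look in a Flink 1.9-style source. The two abstract helpers and the string-encoded position are hypothetical placeholders, not the actual SDK code; the real implementation reads from the bounded queue shown earlier:

```java
import com.github.shyiko.mysql.binlog.event.Event;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

import java.util.List;

// Sketch: emit whole transactions under the checkpoint lock, and record the
// resume point only at TransactionEnd events.
public abstract class TransactionalBinlogSource extends RichSourceFunction<Event>
        implements CheckpointedFunction {

    private volatile boolean running = true;
    private transient ListState<String> positionState;
    private String lastTransactionEndPosition; // e.g. "mysql-bin.000042/120"

    // Hypothetical helper: drain events up to and including the next TransactionEnd.
    protected abstract List<Event> takeOneTransaction() throws Exception;

    // Hypothetical helper: extract the BinlogPosition of the TransactionEnd event.
    protected abstract String positionOfTransactionEnd(List<Event> transaction);

    @Override
    public void run(SourceContext<Event> ctx) throws Exception {
        final Object lock = ctx.getCheckpointLock();
        while (running) {
            List<Event> transaction = takeOneTransaction();
            synchronized (lock) { // checkpoints cannot fire mid-transaction
                for (Event e : transaction) {
                    ctx.collect(e);
                }
                lastTransactionEndPosition = positionOfTransactionEnd(transaction);
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        positionState.clear();
        positionState.add(lastTransactionEndPosition); // always a TransactionEnd position
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        positionState = context.getOperatorStateStore().getListState(
                new ListStateDescriptor<>("binlog-position", String.class));
        for (String p : positionState.get()) {
            lastTransactionEndPosition = p; // resume from the last complete transaction
        }
    }
}
```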

UnifiedFormatTransform

As the name suggests, the UnifiedFormatTransform converts data to a uniform data format.
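The unified format itself is an internal contract, but as a rough illustration, a unified changelog record typically carries fields like the following. The class below is purely an assumption for illustration, not the platform's actual schema:

```java
import java.util.Map;

// Illustrative shape of a unified changelog record; field names are assumptions.
public class UnifiedRecord {
    public String database;            // source database name
    public String table;               // source table name
    public String type;                // INSERT / UPDATE / DELETE / DDL
    public long ts;                    // event timestamp from the binlog
    public String position;            // binlog position, for tracing
    public Map<String, Object> before; // row image before the change (UPDATE/DELETE)
    public Map<String, Object> after;  // row image after the change (INSERT/UPDATE)
    public String sql;                 // raw statement for DDL events
}
```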

Compared with the Binlog Source stage, the UnifiedFormatTransform stage does not need to worry much about performance: careful coding plus horizontal and vertical scaling can meet most performance requirements. But there is one important issue to address, namely the functional design goal mentioned earlier: complete protection against DDL problems.

DDL changes have always been a thorny problem in data synchronization and transmission. The resulting issues include, but are not limited to, data parsing failures and errors, program crashes and restarts, and high recovery costs. The core idea behind the solution is actually simple: parse the DDL inside the program and handle the schema change ourselves. Implementing this feature requires the following steps:

  • Embed a parser capable of parsing DDL SQL
  • Parse every DDL statement as it appears, update the built-in schema based on the parsed content, and write the updated schema to Flink state
  • Generate a DDL record and send it downstream

For the implementation, we referred to Maxwell [2]: we embedded an Antlr4 MySQL grammar (.g4) file and customized the listener to perform the schema updates and generate the DDL records. The schema is then saved to state whenever a checkpoint is triggered. A rough sketch of the listener approach follows.
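The sketch below shows the general pattern, assuming classes MySqlLexer, MySqlParser, and MySqlParserBaseListener generated by Antlr4 from a MySQL grammar. The rule names follow the widely used grammars-v4 MySQL grammar and may differ from the grammar actually embedded; Schema and applyAlter are hypothetical placeholders:

```java
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

// Sketch of in-program DDL parsing. MySqlLexer/MySqlParser/MySqlParserBaseListener
// are generated by Antlr4 from a MySQL grammar; rule names here follow the
// grammars-v4 MySQL grammar and may differ from the grammar actually embedded.
public class DdlProcessor {

    // Hypothetical in-memory schema that is kept in Flink state.
    interface Schema {
        void applyAlter(MySqlParser.AlterTableContext ctx);
    }

    private final Schema schema;

    public DdlProcessor(Schema schema) {
        this.schema = schema;
    }

    public void process(String ddlSql) {
        // Depending on the grammar version, the input may need case normalization.
        MySqlLexer lexer = new MySqlLexer(CharStreams.fromString(ddlSql));
        MySqlParser parser = new MySqlParser(new CommonTokenStream(lexer));
        ParseTreeWalker.DEFAULT.walk(new MySqlParserBaseListener() {
            @Override
            public void enterAlterTable(MySqlParser.AlterTableContext ctx) {
                schema.applyAlter(ctx); // update the built-in schema from the parsed DDL
            }
        }, parser.root()); // 'root' is the start rule in the grammars-v4 grammar
    }
}
```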

With in-program DDL parsing in place, the access program handles everything smoothly, whether a simple ALTER TABLE or a complex online DDL, recovering from the breakpoint via state without any schema inconsistency problems.

Kafka Sink

The Kafka Sink stage writes the converted data to Kafka. Flink natively gives the Kafka sink exactly-once capability, and together with the Source design above this provides an end-to-end exactly-once solution out of the box: the Source emits data in complete MySQL transactions, and the Sink writes data to Kafka in complete MySQL transactions. For transaction-sensitive scenarios, transactional consumption can be enabled to process the data with strong transactional semantics instead of eventual consistency.
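For reference, the sketch below shows how an exactly-once Kafka sink can be built with Flink 1.9's universal Kafka connector; the broker address and topic are hypothetical, and the payload is simplified to a String:

```java
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Sketch: building an exactly-once Kafka sink with Flink 1.9's universal connector.
public class KafkaSinkFactory {

    public static FlinkKafkaProducer<String> create() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // hypothetical brokers
        // Must not exceed the broker-side transaction.max.timeout.ms.
        props.setProperty("transaction.timeout.ms", "900000");

        KafkaSerializationSchema<String> schema = (element, timestamp) ->
                new ProducerRecord<>("dts.demo_db.user", // hypothetical topic
                        element.getBytes(StandardCharsets.UTF_8));

        return new FlinkKafkaProducer<>(
                "dts.demo_db.user", // default topic
                schema,
                props,
                FlinkKafkaProducer.Semantic.EXACTLY_ONCE); // two-phase-commit sink
    }
}
```

On the consumer side, transactional consumption corresponds to setting isolation.level=read_committed, so downstream jobs only see data from committed Flink checkpoints.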

Other optimizations

We also made some optimizations:

  • GTID support, enabling one-click primary/replica switchover
  • Periodic backup of program runtime information to external storage
  • Full coverage of monitoring metrics related to binlog synchronization tasks

V. Using the Platform

On the transmission platform, users only need to fill in the necessary configuration to create transmission tasks and start consuming data.

Access tasks

As mentioned earlier, the data produced by access tasks is treated as a public asset. Users therefore only need to check whether the required table has already been connected: if it has, they can apply to use the table directly; if not, the system automatically connects the table once the application is approved.

Distribution tasks

Distribution tasks, by contrast, are created by the users themselves. Take an Iceberg distribution task as an example:

■ Field filtering

Select which fields of the already-connected source table the distribution task should carry

After completing the task's runtime configuration (such as resources and execution environment), the distribution task can be created and run, and it is assigned a unique Flink platform task ID:

In addition, we provide rich monitoring queries, metadata queries, and other functions, making full use of the real-time computing platform's existing components and tightly integrating the transmission system with the real-time computing system.

VI. Summary and Outlook

Practice has proved that building the transmission system on Flink was a wise and correct decision. At minimal development cost, we completely solved the legacy problems in functionality, efficiency, and maintainability, comprehensively improved the efficiency and user experience of Autohome's data access, distribution, and subscription, and strengthened our technical capabilities in data transmission.

Recently we have been investing more effort in the data lake direction, and the transmission system now offers initial support for ingesting data into the data lake. Going forward, we hope to keep improving the relevant functions, substantially expand the data lake's ingestion capacity, support users in bringing their data into the lake, and strengthen the integration of the entire data system.

On the other hand, we see many new features and tools in newer Flink releases. FLIP-27 sources and the OperatorCoordinator, for example, are two new mechanisms we hope to adopt to keep improving our code and expanding its capabilities. For the newly released upsert-kafka connector, we have begun preliminary development and integration on the Flink computing platform, hoping to connect upsert-kafka with the transmission system and continue to expand and enrich real-time computing and transmission business scenarios.

References:

[1] github.com/pingcap/tic…

[2] github.com/zendesk/max…

About the author:

Liu Shouwei graduated from Dalian University of Technology. He is an Apache Flink contributor and a Scala/Akka enthusiast. He joined Autohome in 2019 and is responsible for the development and maintenance of the real-time computing platform and the data transmission platform.