Canal source code analysis series: project structure overview

This is the project structure of the downloaded source code as shown in IDEA. We go through the modules from top to bottom.

The admin module

Canal-admin provides O&M (operations and maintenance) functions for Canal, such as overall configuration management and node operation and maintenance, together with a relatively friendly WebUI so that more users can operate Canal quickly and safely.

The client module

As the name implies, this is the source of Canal's client. Canal uses a client-server model, and we usually consume Canal's functionality as a client. In a Java project, we need to add the following dependency:

<dependency>
    <groupId>com.alibaba.otter</groupId>
    <artifactId>canal.client</artifactId>
    <version>${project.version}</version>
</dependency>

The client module is the source behind this dependency. The client does most of its work through implementations of the CanalConnector interface, including connecting, subscribing, and fetching data.
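
The connect, subscribe, and fetch flow described above can be sketched as follows. This is a minimal, self-contained model rather than Canal's actual API: SketchConnector, InMemoryConnector, and consumeAll are hypothetical names invented for illustration, and the filter string is an assumed example. The real method signatures should be checked against CanalConnector in the client module.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class ClientFlowSketch {

    // Hypothetical stand-in for a connector; NOT Canal's real CanalConnector interface.
    interface SketchConnector {
        void connect();
        void subscribe(String filter);
        List<String> fetch(int batchSize); // up to batchSize pending change records
        void disconnect();
    }

    // In-memory implementation so the sketch runs without a Canal server.
    static class InMemoryConnector implements SketchConnector {
        private final Deque<String> pending = new ArrayDeque<>();
        private boolean connected;
        private boolean subscribed;

        InMemoryConnector(List<String> changes) {
            pending.addAll(changes);
        }

        @Override
        public void connect() {
            connected = true;
        }

        @Override
        public void subscribe(String filter) {
            if (!connected) {
                throw new IllegalStateException("connect() must be called first");
            }
            // The filter is accepted but not applied in this sketch.
            subscribed = true;
        }

        @Override
        public List<String> fetch(int batchSize) {
            if (!subscribed) {
                throw new IllegalStateException("subscribe() must be called first");
            }
            List<String> batch = new ArrayList<>();
            while (batch.size() < batchSize && !pending.isEmpty()) {
                batch.add(pending.poll());
            }
            return batch;
        }

        @Override
        public void disconnect() {
            connected = false;
            subscribed = false;
        }
    }

    // Connects, subscribes, drains the connector batch by batch, then disconnects.
    static List<String> consumeAll(SketchConnector connector, int batchSize) {
        List<String> received = new ArrayList<>();
        connector.connect();
        try {
            connector.subscribe(".*\\..*"); // assumed filter: every schema, every table
            List<String> batch;
            while (!(batch = connector.fetch(batchSize)).isEmpty()) {
                received.addAll(batch);
            }
        } finally {
            connector.disconnect();
        }
        return received;
    }

    public static void main(String[] args) {
        SketchConnector connector =
                new InMemoryConnector(Arrays.asList("insert#1", "update#2", "delete#3"));
        System.out.println(consumeAll(connector, 2));
    }
}
```

A real client would run the fetch loop continuously and handle reconnection; the sketch stops as soon as a batch comes back empty so that it terminates.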

The client-adapter module

This is the client adapter. If you expand the module's directory structure, the names give a rough idea of what it does. Starting with Canal 1.1.1, adaptation and startup functions were added for landing client data in a target system. Currently, the following functions are supported:

  • Client launcher
  • REST interface for managing synchronization
  • Logger adapter, as a demo
  • Data synchronization for relational databases (table-to-table synchronization), ETL function
  • HBase data synchronization (table-to-table synchronization), ETL function
  • ElasticSearch multi-table data synchronization, ETL function

Some people may wonder why there is a client-adapter module in addition to the client module. The adapter is a module developed so that users can get Canal running quickly. If the target system you want to sink data into is already implemented in the adapter, you can land your data quickly without writing your own client.

The common module

The Common module mainly provides some common utility classes and interfaces.

The connector module

The Connector module has several implementations:

  • kafka-connector
  • rabbitmq-connector
  • tcp-connector
  • rocketmq-connector

Dbsync module

Raw binlogs are binary streams that need to be parsed into the corresponding binlog event objects defined in the DBSync module.
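
To make "parsing a binary stream into event objects" concrete, here is a minimal sketch that decodes only the 19-byte common header of a MySQL binlog event (v4 format, little-endian). BinlogHeaderSketch is a hypothetical class written for this article, not one of the dbsync module's classes, and it ignores the event body, checksums, and format negotiation that a real parser has to handle.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical value class for illustration; not part of Canal's dbsync module.
class BinlogHeaderSketch {
    // Length of the common event header in the v4 binlog format.
    static final int HEADER_LENGTH = 19;

    final long timestamp;    // seconds since the epoch (unsigned 32-bit)
    final int eventType;     // event type code, e.g. 2 is QUERY_EVENT in MySQL
    final long serverId;     // id of the server that wrote the event
    final long eventLength;  // total length: header plus body
    final long nextPosition; // offset of the next event in the binlog file
    final int flags;

    BinlogHeaderSketch(long timestamp, int eventType, long serverId,
                       long eventLength, long nextPosition, int flags) {
        this.timestamp = timestamp;
        this.eventType = eventType;
        this.serverId = serverId;
        this.eventLength = eventLength;
        this.nextPosition = nextPosition;
        this.flags = flags;
    }

    // Decodes the first 19 bytes of an event; all integers are little-endian and unsigned.
    static BinlogHeaderSketch parse(byte[] bytes) {
        if (bytes.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("need at least " + HEADER_LENGTH + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long timestamp = buf.getInt() & 0xFFFFFFFFL;
        int eventType = buf.get() & 0xFF;
        long serverId = buf.getInt() & 0xFFFFFFFFL;
        long eventLength = buf.getInt() & 0xFFFFFFFFL;
        long nextPosition = buf.getInt() & 0xFFFFFFFFL;
        int flags = buf.getShort() & 0xFFFF;
        return new BinlogHeaderSketch(timestamp, eventType, serverId,
                eventLength, nextPosition, flags);
    }

    public static void main(String[] args) {
        // Build a fake header instead of reading a real binlog file.
        byte[] raw = ByteBuffer.allocate(HEADER_LENGTH).order(ByteOrder.LITTLE_ENDIAN)
                .putInt(1700000000).put((byte) 2).putInt(1)
                .putInt(100).putInt(254).putShort((short) 0)
                .array();
        BinlogHeaderSketch header = parse(raw);
        System.out.println("type=" + header.eventType + " nextPosition=" + header.nextPosition);
    }
}
```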

The deploy module

Canal can be deployed independently, and the deploy module is used to run the Canal service as a standalone process.

The deploy module is responsible for starting the Canal server and the Canal instances. The deploy directory contains the start/stop scripts for Canal. For example, to start Canal:

sh bin/startup.sh

Docker module

If you need to run Canal in a Docker environment, you can refer to the scripts in this module.

Driver module

The parser connects to MySQL through the driver module to fetch the binlog.

Example module

As the name suggests, this module contains demos that show how to use Canal.

The filter module

The filter module is mainly used to filter table and field data from the binlog. When using Canal, filtering can be configured on the server side or the client side. The filters use the Aviator expression engine for matching, and there are several implementation classes:

  • AviaterELFilter: EL expression match
  • AviaterRegexFilter: regex match
  • AviaterSimpleFilter: simple match
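
The idea behind regex-based table filtering can be sketched with the JDK's own java.util.regex. TableRegexFilterSketch is a hypothetical, simplified class: it assumes the filter is a comma-separated list of patterns matched against "schema.table" names, and it does not reproduce how the Aviator-based classes above are actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical simplified filter; not Canal's AviaterRegexFilter.
class TableRegexFilterSketch {
    private final List<Pattern> patterns = new ArrayList<>();

    // Assumption: patterns are comma-separated regular expressions over "schema.table".
    TableRegexFilterSketch(String commaSeparatedPatterns) {
        for (String raw : commaSeparatedPatterns.split(",")) {
            String trimmed = raw.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(Pattern.compile(trimmed));
            }
        }
    }

    // Returns true when the full "schema.table" name matches at least one pattern.
    boolean accept(String schema, String table) {
        String name = schema + "." + table;
        for (Pattern pattern : patterns) {
            if (pattern.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TableRegexFilterSketch filter =
                new TableRegexFilterSketch("shop\\..*, audit\\.login_log");
        System.out.println(filter.accept("shop", "orders"));
        System.out.println(filter.accept("audit", "other_table"));
    }
}
```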

The instance module

This is one of the core modules. Within a Canal server, data synchronization only happens once an instance is started. Multiple instances can be created within a single Canal server. For example, if we need to synchronize two databases (database and table), we can create two instances.

The instance module has four core components: the parser, sink, store, and meta modules. The core interface is CanalInstance.

The meta module

The core interface is CanalMetaManager, which manages subscription and consumption metadata. It is mainly used to record the position in the MySQL binlog up to which data has been consumed. The CanalMetaManager interface has several implementation classes:

  • FileMixedMetaManager
  • MemoryMetaManager
  • MixedMetaManager
  • PeriodMixedMetaManager
  • ZooKeeperMetaManager

Some of these implementation classes hold references to other implementations and build on them to extend their own functionality (the decorator pattern).
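
The decorator idea mentioned above can be sketched as follows: one implementation wraps others and adds behaviour on top of them. The names here (PositionStore, MemoryPositionStore, MixedPositionStore) are hypothetical and the interface is reduced to two methods; the real CanalMetaManager contract is richer and is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

class MetaDecoratorSketch {

    // Hypothetical, heavily reduced interface; not Canal's CanalMetaManager.
    interface PositionStore {
        void updatePosition(String destination, long binlogOffset);
        Long getPosition(String destination); // null when nothing is recorded
    }

    // Plain in-memory implementation.
    static class MemoryPositionStore implements PositionStore {
        private final Map<String, Long> positions = new HashMap<>();

        @Override
        public void updatePosition(String destination, long binlogOffset) {
            positions.put(destination, binlogOffset);
        }

        @Override
        public Long getPosition(String destination) {
            return positions.get(destination);
        }
    }

    // Decorator: reads hit the fast store first, writes are mirrored to the durable store.
    static class MixedPositionStore implements PositionStore {
        private final PositionStore fast;
        private final PositionStore durable;

        MixedPositionStore(PositionStore fast, PositionStore durable) {
            this.fast = fast;
            this.durable = durable;
        }

        @Override
        public void updatePosition(String destination, long binlogOffset) {
            fast.updatePosition(destination, binlogOffset);
            durable.updatePosition(destination, binlogOffset);
        }

        @Override
        public Long getPosition(String destination) {
            Long cached = fast.getPosition(destination);
            if (cached != null) {
                return cached;
            }
            Long stored = durable.getPosition(destination);
            if (stored != null) {
                fast.updatePosition(destination, stored); // warm the fast store
            }
            return stored;
        }
    }

    public static void main(String[] args) {
        // A second in-memory store stands in for a file or ZooKeeper backend.
        PositionStore durable = new MemoryPositionStore();
        durable.updatePosition("example", 4L);
        PositionStore mixed = new MixedPositionStore(new MemoryPositionStore(), durable);
        System.out.println(mixed.getPosition("example"));
    }
}
```

Because both the wrapper and the wrapped objects share one interface, callers do not need to know which combination they are talking to.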

The parser module

The parser module subscribes to binlog events, which the sink then delivers to the store. The parser module depends on the dbsync and driver modules.

Prometheus module

As the name suggests, it is used to collect monitoring metrics.

The protocol module

The protocol module defines the communication protocol between the client and the server. Canal's data transmission has two parts: first, when the binlog is subscribed to, it is converted into the Message format defined by Canal; second, the data is transmitted over TCP between the client and the server. Both parts use the protobuf format.

The server module

The Canal server corresponds to the entire Canal service instance; a JVM instance has only one server.

The core interface is CanalServer, which has two implementations:

  • CanalServerWithEmbedded
  • CanalServerWithNetty

These two implementations represent two application modes of Canal. CanalServerWithNetty is used when Canal is deployed independently. Developers only need to implement the client, and different applications communicate with the Canal server through the Canal client. Requests from Canal clients are uniformly accepted and processed by CanalServerWithNetty.

With CanalServerWithEmbedded, Canal can be embedded in our own services rather than deployed independently, but this places higher demands on developers.

Sink module

The sink phase filters the binlog data according to certain rules and also does some data distribution. Its core interface is CanalEventSink, whose core method, sink, is used to submit data.

Store module

This module performs the final landing of the data, that is, data storage. The core interface is CanalEventStore.
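
How the sink and store stages fit together can be sketched in a few lines: the sink filters incoming events and submits the survivors to a bounded store, from which consumers later take batches. ChangeEvent, EventStore, and EventSink are hypothetical names for this sketch, and the bounded queue is an assumption made for illustration; none of this mirrors the real CanalEventSink or CanalEventStore implementations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Predicate;

class PipelineSketch {

    // Hypothetical parsed change event.
    static class ChangeEvent {
        final String schema;
        final String table;
        final String payload;

        ChangeEvent(String schema, String table, String payload) {
            this.schema = schema;
            this.table = table;
            this.payload = payload;
        }
    }

    // Store stage: a bounded in-memory buffer (assumed design for this sketch).
    static class EventStore {
        private final BlockingQueue<ChangeEvent> buffer;

        EventStore(int capacity) {
            this.buffer = new ArrayBlockingQueue<>(capacity);
        }

        // Throws IllegalStateException when full; a production store would block instead.
        void put(ChangeEvent event) {
            buffer.add(event);
        }

        // Removes and returns up to max events in arrival order.
        List<ChangeEvent> take(int max) {
            List<ChangeEvent> out = new ArrayList<>();
            buffer.drainTo(out, max);
            return out;
        }
    }

    // Sink stage: filter first, then submit to the store.
    static class EventSink {
        private final Predicate<ChangeEvent> filter;
        private final EventStore store;

        EventSink(Predicate<ChangeEvent> filter, EventStore store) {
            this.filter = filter;
            this.store = store;
        }

        // Returns how many events passed the filter and were stored.
        int sink(List<ChangeEvent> events) {
            int stored = 0;
            for (ChangeEvent event : events) {
                if (filter.test(event)) {
                    store.put(event);
                    stored++;
                }
            }
            return stored;
        }
    }

    public static void main(String[] args) {
        EventStore store = new EventStore(16);
        EventSink sink = new EventSink(e -> e.schema.equals("shop"), store);
        int stored = sink.sink(Arrays.asList(
                new ChangeEvent("shop", "orders", "row1"),
                new ChangeEvent("audit", "log", "row2")));
        System.out.println(stored + " stored, " + store.take(10).size() + " taken");
    }
}
```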

Now that each module has been introduced, the following picture shows Canal's functional architecture:

Reference:

  • Github.com/alibaba/can…