Canal is an open-source project developed in Java by Alibaba. Its main purpose is to provide incremental data subscription and consumption based on parsing a database's incremental logs (binlog). Currently, it mainly supports MySQL.
Project address: Github.com/alibaba/can…
MySQL Master/Slave synchronization:
Before introducing Canal's internal principles, let's first review how MySQL master/slave synchronization works:
- The MySQL master writes data changes to its binary log (as binary log events).
- The MySQL slave (I/O thread) copies the master's binary log events to its own relay log.
- The MySQL slave (SQL thread) replays the events in the relay log, applying the data changes to its own data.
Canal working principle:
- Canal emulates the MySQL slave's interaction protocol: it disguises itself as a MySQL slave and sends the dump protocol to the MySQL master.
- The MySQL master receives the dump request and starts pushing binary log events to the slave (Canal).
- Canal parses the binary log (originally a byte stream) into structured objects.
In short, Canal impersonates a MySQL slave and listens to MySQL's binlog to capture data. Once the MySQL binlog is set to ROW mode, Canal can obtain every INSERT/UPDATE/DELETE executed, together with the data before and after each modification. Based on this capability, Canal can capture MySQL data changes efficiently.
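For reference, the MySQL side needs binary logging enabled in ROW mode before Canal can subscribe; a minimal my.cnf sketch (the server_id value is illustrative):

```
[mysqld]
log-bin=mysql-bin    # enable binary logging
binlog-format=ROW    # ROW mode exposes before/after images of each change
server_id=1          # must be unique among the master and all (pseudo-)slaves, Canal included
```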
Canal architecture:
Description:
Server represents a running Canal instance and corresponds to one JVM.
Instance corresponds to a data queue (one Server can host 1..n Instances).
**EventParser:** data source access; simulates the slave/master interaction and parses the protocol.
**EventSink:** the connector between Parser and Store; mainly performs data filtering, processing, and distribution.
**EventStore:** responsible for storing the data.
**MemoryMetaManager:** manages incremental subscription and consumption information.
Event Parser design:
The whole parser process can be roughly divided into the following steps:
- Connection retrieves the log position at which parsing last succeeded (on first startup, it retrieves the initially specified position or the current binlog position of the database).
- Connection establishes a connection and sends a BINLOG_DUMP request to the MySQL master.
- The MySQL master starts pushing the binary log.
- BinlogParser parses the protocol and supplements some specific information, such as field names, field types, primary key information, and unsigned-type handling.
- The parsed data is passed to the EventSink component for storage (this is a blocking operation until storage succeeds).
- The binlog position is recorded periodically so that incremental subscription can resume after a restart.
If the master fails, Canal can continue synchronizing the binlog from one of the master's other slave nodes, avoiding a single point of failure.
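Putting the steps together, a heavily simplified sketch of the parser loop might look as follows; all class and method names here are illustrative stand-ins, not Canal's actual API:

```java
// Heavily simplified sketch of the EventParser loop described above.
// All names are illustrative stand-ins, not Canal's actual classes.
public class ParserLoopSketch {

    interface MetaManager {                       // steps 1/6: position checkpointing
        String lastBinlogFile();
        long lastBinlogOffset();
        void recordPosition(String file, long offset);
    }

    interface EventSink {                         // step 5: hand-off to the sink
        void sink(Object event) throws InterruptedException; // blocks until stored
    }

    public void run(MetaManager meta, EventSink sink) throws Exception {
        // 1. Resume from the last successfully parsed position.
        String file = meta.lastBinlogFile();
        long offset = meta.lastBinlogOffset();
        // 2. Establish a connection and send BINLOG_DUMP (stubbed out here).
        while (true) {
            byte[] raw = fetchNextRawEvent();     // 3. master pushes raw binlog events
            Object event = parseAndEnrich(raw);   // 4. decode + supplement field info
            sink.sink(event);                     // 5. blocks until storage succeeds
            offset += raw.length;                 // (placeholder position bookkeeping)
            meta.recordPosition(file, offset);    // 6. checkpoint for restart
        }
    }

    private byte[] fetchNextRawEvent() { return new byte[0]; }  // placeholder
    private Object parseAndEnrich(byte[] raw) { return raw; }   // placeholder
}
```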
Event Sink design:
The main functions of EventSink are as follows:
**Data filtering:** supports wildcard filtering, by table name, field content, etc.
**Data routing/distribution:** solves the 1:n case (one Parser corresponds to multiple Stores).
**Data merging:** solves the n:1 case (multiple Parsers correspond to one Store).
**Data processing:** performs additional processing, such as joins, before data enters the Store.
Data 1:n business
To make reasonable use of database resources, common businesses are isolated by schema, and data-source routing is done above MySQL or at the DAO layer to shield development from the physical location of the database; Alibaba mainly solves data-source routing with Cobar/TDDL. As a result, a single database instance typically hosts multiple schemas, and each schema may be of interest to one or more business consumers.
Data n:1 business
Likewise, once a business's data volume reaches a certain scale, horizontal and vertical sharding become inevitable. To process data that has been split this way, a consumer would have to connect to multiple Stores, consumption positions would multiply into many copies, and the ordering of consumed data could no longer be guaranteed. Therefore, in certain business scenarios, the split incremental data needs to be merged back together, for example by sorting and merging on a timestamp or a global ID.
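As a toy illustration of the n:1 merge idea (this is not Canal's actual implementation), events from several shard streams, each already ordered by timestamp, can be merged into one ordered stream with a priority queue:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Toy n:1 merge: k shard streams, each internally ordered by timestamp,
// merged into one globally ordered stream. Illustrative only.
public class ShardMergeSketch {
    record Event(long timestamp, String shard, String payload) {}

    record Head(Event event, Iterator<Event> source) {}

    static List<Event> merge(List<Iterator<Event>> shards) {
        PriorityQueue<Head> heap =
                new PriorityQueue<>(Comparator.comparingLong((Head h) -> h.event().timestamp()));
        for (Iterator<Event> it : shards) {
            if (it.hasNext()) heap.add(new Head(it.next(), it));  // seed with each stream's head
        }
        List<Event> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head h = heap.poll();
            merged.add(h.event());                                // smallest timestamp first
            if (h.source().hasNext()) heap.add(new Head(h.source().next(), h.source()));
        }
        return merged;
    }
}
```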
Event Store design:
Multiple storage modes are supported; in memory mode, for example, messages are kept in memory in a structure modeled on Disruptor's RingBuffer.
RingBuffer design:
Three cursors are defined:
Put: the last position at which the Sink module wrote data into the store (the cursor for synchronizing data writes).
Get: the last position fetched by a data subscription (the cursor for synchronizing data reads).
Ack: the last position at which data was successfully consumed.
Borrowing from Disruptor's RingBuffer implementation, the ring can be straightened out into a line:
Implementation description:
- The Put/Get/Ack cursors are stored as monotonically increasing longs, with the invariant Put >= Get >= Ack.
- Mapping a cursor onto the buffer uses either a modulo or a bitwise AND (cursor & (size - 1); size must be a power of two, which makes the AND efficient).
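A minimal sketch of this cursor arithmetic (illustrative only; Canal's actual memory EventStore is more involved):

```java
// Minimal sketch of the put/get/ack cursor arithmetic described above.
// Illustrative only; Canal's actual memory EventStore is more involved.
public class RingBufferSketch {
    private final Object[] slots;
    private final int mask;         // size - 1, valid because size is a power of two
    private long put, get, ack;     // monotonically increasing: put >= get >= ack

    public RingBufferSketch(int size) {
        if (Integer.bitCount(size) != 1)
            throw new IllegalArgumentException("size must be a power of two");
        slots = new Object[size];
        mask = size - 1;
    }

    public boolean put(Object event) {
        if (put - ack >= slots.length) return false; // full: would overwrite unacked data
        slots[(int) (put & mask)] = event;           // & replaces the slower % operation
        put++;
        return true;
    }

    public Object get() {
        if (get >= put) return null;                 // nothing new to read
        return slots[(int) (get++ & mask)];
    }

    public void ack(long position) {
        ack = Math.max(ack, Math.min(position, get)); // ack can never pass get
    }
}
```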
Instance design:
An Instance represents an actually running data queue, including EventParser, EventSink, EventStore, and so on. CanalInstanceGenerator is abstracted out mainly to accommodate different configuration management modes:
Manager mode: connects to Alibaba's own internal web console/manager system (currently mainly for internal use).
Spring mode: builds the configuration from a Spring XML + properties definition.
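In Spring mode, for example, an instance is typically described by an instance.properties file; the key names below follow Canal's commonly published example configuration, and the values are placeholders:

```
# Sketch of a typical instance.properties in Spring mode (placeholder values)
canal.instance.master.address=127.0.0.1:3306   # MySQL master to dump from
canal.instance.dbUsername=canal                # account with replication privileges
canal.instance.dbPassword=canal
canal.instance.connectionCharset=UTF-8
canal.instance.filter.regex=.*\\..*            # subscribe to all schemas/tables
```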
Server design:
A Server represents a running instance of Canal; it abstracts two implementations, Embedded and Netty, for componentized use.
Incremental subscription/consumption design:
For details about the protocol format, see CanalProtocol.proto.
Data object format: entryProtocol.proto
Entry
    Header
        logfileName  [binlog file name]
        logfileOffset  [binlog position]
        executeTime  [timestamp of the change in the binlog]
        schemaName  [database instance]
        tableName  [table name]
        eventType  [insert/update/delete type]
    entryType  [transaction BEGIN / transaction END / ROWDATA]
    storeValue  [byte data, expandable; the corresponding type is RowChange]

RowChange
    isDdl  [whether this is a DDL change, e.g. CREATE TABLE / DROP TABLE]
    sql  [the specific DDL SQL]
    rowDatas  [the specific insert/update/delete change data; can be multiple, since one binlog event may correspond to multiple changes, e.g. a batch]
        beforeColumns  [array of Column]
        afterColumns  [array of Column]

Column
    index  [column number]
    sqlType  [JDBC type]
    name  [column name]
    isKey  [whether it is part of the primary key]
    updated  [whether the value has been changed]
    isNull  [whether the value is null]
    value  [the specific value]
Supplementary notes to the above:
- Canal can provide the field contents both before and after a database change, and supplements information not present in the binlog itself, such as field names and isKey flags.
- Canal can provide DDL change statements.
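To make this format concrete, here is a minimal consumption sketch based on Canal's commonly published client example (the address, destination, and credentials are placeholders):

```java
import java.net.InetSocketAddress;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;

// Minimal consumption sketch based on Canal's published client example.
// Address, destination, and credentials are placeholders.
public class SimpleCanalClient {
    public static void main(String[] args) throws Exception {
        CanalConnector connector = CanalConnectors.newSingleConnector(
                new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
        connector.connect();
        connector.subscribe(".*\\..*");                       // wildcard filter: all schemas/tables
        while (true) {
            Message message = connector.getWithoutAck(100);   // fetch up to 100 entries
            long batchId = message.getId();
            if (batchId == -1 || message.getEntries().isEmpty()) {
                Thread.sleep(1000);                           // nothing new yet
                continue;
            }
            for (CanalEntry.Entry entry : message.getEntries()) {
                if (entry.getEntryType() != CanalEntry.EntryType.ROWDATA) continue;
                CanalEntry.RowChange rowChange =
                        CanalEntry.RowChange.parseFrom(entry.getStoreValue());
                for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
                    // afterColumns carries the row image after the change
                    for (CanalEntry.Column col : rowData.getAfterColumnsList()) {
                        System.out.println(col.getName() + " = " + col.getValue()
                                + " (updated=" + col.getUpdated() + ")");
                    }
                }
            }
            connector.ack(batchId);                           // confirm successful consumption
        }
    }
}
```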
Canal HA mechanism:
Canal's HA mechanism relies on ZooKeeper and covers HA for both the Canal Server and the Canal Client.
Canal Server: to reduce dump requests against MySQL, at any given time only one instance across the different servers may be running; the others stay in standby.
Canal Client: to preserve ordering, only one Canal Client may perform get/ack/rollback operations against a given instance at a time; otherwise ordering cannot be guaranteed.
Canal Server HA architecture diagram:
General steps:
- When a Canal Server wants to start a Canal Instance, it first attempts to create an EPHEMERAL node for it in ZooKeeper.
- The Canal Server that successfully creates the ZooKeeper node starts the corresponding Canal Instance; the servers that did not succeed remain in standby.
- If the node created by Canal Server A disappears, ZooKeeper immediately notifies the other Canal Servers to perform step 1 again, and a new Canal Server is selected to start the Instance.
- Each time a Canal Client connects, it first asks ZooKeeper which server currently runs the Canal Instance and then establishes a connection to it; if the connection becomes unavailable, it retries.
The Canal Client's approach is similar to the Canal Server's: it is likewise controlled by preempting an EPHEMERAL node in ZooKeeper.
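A minimal sketch of this preemption pattern using the plain ZooKeeper API (the node path and payload are illustrative; Canal's real implementation adds watches, retries, and richer metadata):

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Minimal sketch of EPHEMERAL-node preemption (illustrative path and payload;
// Canal's real implementation adds watches, retries, and richer metadata).
public class RunningNodePreemption {
    public boolean tryBecomeRunning(ZooKeeper zk, String destination, String serverAddress)
            throws KeeperException, InterruptedException {
        String path = "/otter/canal/destinations/" + destination + "/running";
        try {
            // EPHEMERAL: the node disappears if this server's session dies,
            // which is what triggers the other servers to re-run step 1.
            zk.create(path, serverAddress.getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;                      // we won: start the Instance
        } catch (KeeperException.NodeExistsException e) {
            return false;                     // another server is running: stay in standby
        }
    }
}
```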