Canal
Positioning: incremental data subscription and consumption based on parsing the database's incremental logs; currently it mainly supports MySQL.
Principle:
- Canal emulates the interaction protocol of a MySQL slave: it masquerades as a MySQL slave and sends the dump command to the MySQL master
- The MySQL master receives the dump request and starts pushing binary logs to the slave (Canal)
- Canal parses the binary log objects (raw byte streams)
The entire parser process can be broken down into the following steps:
- The Connection retrieves the position where the last parse succeeded (on first startup, the initially specified position or the current binlog position of the database)
- The Connection establishes the link and sends the BINLOG_DUMP command
- MySQL starts pushing binary logs
- The received binary log is parsed by the Binlog Parser to extract the specific event information
- The parsed data is passed to the EventSink module for storage; this is a blocking operation that does not return until storage succeeds
- After storage succeeds, the binary log position is recorded periodically
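A minimal sketch of this loop follows. All class and method names here are hypothetical stand-ins for Canal's internal components, not its actual classes:

```java
// Hypothetical stand-ins for Canal's internal components; a simplified
// sketch of the parser loop, not Canal's actual implementation.
interface MysqlConnection {
    void sendBinlogDump(long position);
    byte[] fetchEvent();                        // blocks until the master pushes an event
}
interface BinlogParser { ParsedEntry parse(byte[] rawEvent); }
interface EventSink { void sink(ParsedEntry entry); }       // blocks until storage succeeds
interface PositionManager { long latest(); void persist(long position); }
record ParsedEntry(long position, String payload) {}

class ParserLoop {
    MysqlConnection connection; BinlogParser parser; EventSink sink; PositionManager positions;
    volatile boolean running = true;

    void run() {
        long position = positions.latest();     // last successful position (or the configured /
                                                // current binlog point on first startup)
        connection.sendBinlogDump(position);    // issue BINLOG_DUMP
        while (running) {
            byte[] raw = connection.fetchEvent();      // MySQL pushes binary log events
            ParsedEntry entry = parser.parse(raw);     // decode the raw byte stream
            sink.sink(entry);                          // blocks until storage succeeds
            positions.persist(entry.position());       // record the position (Canal does this
        }                                              // periodically, not on every event)
    }
}
```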
The EventSink module then handles:
- Data filtering: supports a wildcard filtering mode on table names and field content (the client sketch after this list shows a wildcard subscription)
- Data routing/distribution: solves the 1:n case (one parser corresponds to multiple stores)
- Data merging: solves the n:1 case (multiple parsers correspond to one store)
- Data processing: performs additional processing, such as joins, before data enters the store
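Consuming the parsed data is done through Canal's open-source Java client API. The following minimal sketch subscribes with a wildcard filter and acknowledges each batch; the server address, port, and destination name are placeholder assumptions for a local deployment:

```java
import java.net.InetSocketAddress;
import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.Message;

public class SimpleCanalClient {
    public static void main(String[] args) throws InterruptedException {
        // 127.0.0.1:11111 and the "example" destination are placeholder values
        CanalConnector connector = CanalConnectors.newSingleConnector(
                new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
        try {
            connector.connect();
            connector.subscribe(".*\\..*");                      // wildcard filter: all schemas, all tables
            while (true) {
                Message message = connector.getWithoutAck(100);  // fetch up to 100 entries, unacknowledged
                long batchId = message.getId();
                if (batchId == -1 || message.getEntries().isEmpty()) {
                    Thread.sleep(1000);                          // nothing new yet
                } else {
                    message.getEntries().forEach(entry ->
                            System.out.println(entry.getHeader().getTableName()));
                }
                connector.ack(batchId);                          // confirm; the consumed position advances
            }
        } finally {
            connector.disconnect();
        }
    }
}
```

On a processing failure you would call connector.rollback(batchId) instead of ack, so that the batch is redelivered.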
Maxwell
Canal is divided into a server side and a client side, and has many derivative applications with stable performance and powerful functionality. However, Canal requires you to write your own client to consume the data it parses.
Maxwell's advantage over Canal is ease of use: it outputs data changes directly as JSON strings, so no custom client needs to be written.
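For example, running Maxwell with the stdout producer (the credentials and host below are placeholders) prints one JSON document per row change, roughly of this shape:

```
bin/maxwell --user='maxwell' --password='XXXXXX' --host='127.0.0.1' --producer=stdout

{"database":"test","table":"users","type":"insert","ts":1627884112,"xid":23396,"commit":true,"data":{"id":1,"name":"alice"}}
```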
Databus
Databus is a low-latency change capture system that has become an integral part of LinkedIn's data processing pipeline. Databus addresses the fundamental requirement of reliably capturing, transporting, and processing primary data changes. Databus provides the following features:
- Isolation between sources and consumers
- Guaranteed in-order, at-least-once delivery with high availability
- Consumption from an arbitrary point in time in the change stream, including full bootstrap of the entire data set
- Partitioned consumption
- Source consistency preservation
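A consumer sketch, reconstructed from the example in the open-source Databus README; the source name comes from that example, and the package paths and method signatures are assumptions that may differ between Databus versions:

```java
// Reconstructed from the Databus README example; package paths and
// signatures are assumptions and may differ between Databus versions.
import com.linkedin.databus.client.DatabusHttpClientImpl;
import com.linkedin.databus.client.consumer.AbstractDatabusCombinedConsumer;
import com.linkedin.databus.client.pub.ConsumerCallbackResult;
import com.linkedin.databus.client.pub.DbusEventDecoder;
import com.linkedin.databus.core.DbusEvent;

public class PersonClientMain {
    // example source name from the Databus README
    static final String PERSON_SOURCE = "com.linkedin.events.example.person.Person";

    public static void main(String[] args) throws Exception {
        DatabusHttpClientImpl client =
                DatabusHttpClientImpl.createFromCli(args, new DatabusHttpClientImpl.Config());
        PersonConsumer consumer = new PersonConsumer();
        // the stream listener consumes live changes; the bootstrap listener
        // backs "consumption from an arbitrary point in time"
        client.registerDatabusStreamListener(consumer, null, PERSON_SOURCE);
        client.registerDatabusBootstrapListener(consumer, null, PERSON_SOURCE);
        client.startAndBlock();
    }
}

class PersonConsumer extends AbstractDatabusCombinedConsumer {
    @Override
    public ConsumerCallbackResult onDataEvent(DbusEvent event, DbusEventDecoder decoder) {
        // decode and process one change event here
        return ConsumerCallbackResult.SUCCESS;
    }
}
```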
Alibaba Cloud Data Transmission Service (DTS)
Data Transmission Service (DTS) is a data streaming service provided by Alibaba Cloud that supports data interaction between RDBMS (relational databases), NoSQL, OLAP, and other data sources. DTS provides data migration, real-time data subscription, real-time data synchronization, and other data transmission capabilities. It supports business scenarios such as zero-downtime data migration, remote disaster recovery, geo-distributed active-active deployment (unitization), cross-border data synchronization, real-time data warehousing, query/report traffic offloading, cache updating, and asynchronous message notification, helping you build a highly secure, scalable, and highly available data architecture.
Advantages: DTS supports data transmission between RDBMS, NoSQL, and OLAP data sources, and provides multiple transmission modes such as data migration, real-time data subscription, and real-time data synchronization. Compared with third-party data streaming tools, DTS offers more diverse, higher-performance, and more secure and reliable transmission links, along with many convenient features that greatly simplify the creation and management of transmission links.
For data subscription, DTS acts as a message queue: it pushes its wrapped SQL/change objects to you, and you can build a service to parse these objects, as sketched below.
This eliminates expensive deployment and maintenance costs. DTS is adapted to Alibaba Cloud RDS (online relational database) and DRDS, and solves the high-availability problems of subscription in scenarios such as binlog recovery, primary/standby switchover, and VPC network switchover. It also includes performance optimizations targeted at RDS. It is recommended for its stability, performance, and cost.
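As an illustration, newer DTS data subscriptions expose a Kafka-compatible channel that can be consumed with a standard Kafka client. In this sketch the endpoint, topic, and group ID are placeholder assumptions, and plain string deserialization stands in for the real record format, which comes from the DTS console and its SDK:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DtsSubscriptionClient {
    public static void main(String[] args) {
        Properties props = new Properties();
        // placeholder endpoint/topic/group; real values come from the DTS console
        props.put("bootstrap.servers", "dts-example.aliyuncs.com:18001");
        props.put("group.id", "dts-demo-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("dts_example_topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // each record wraps one change object; parse it in your own service
                    System.out.println(record.value());
                }
            }
        }
    }
}
```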