1. Kingbus profile
1.1 What is Kingbus?
Kingbus is a distributed MySQL binlog storage system built on the Raft strong-consistency protocol. It can act as a MySQL slave, synchronizing binlogs from a real master and storing them in a distributed cluster; it can also act as a MySQL master, serving binlogs from the cluster to other slaves. Kingbus has the following features:
- Compatible with the MySQL replication protocol: Kingbus synchronizes the binlog from the master using GTID, and slaves pull the binlog from Kingbus using GTID.
- Cross-region data replication: Kingbus supports data replication across regions through the Raft protocol. Binlog data written to the cluster is strongly consistent across multiple nodes, and the binlog order is exactly the same as on the master.
- High availability: because Kingbus is built on the Raft strong-consistency protocol, the binlog pull and push services remain available as long as more than half of the nodes in the cluster are alive.
1.2 What problems can Kingbus solve?
- Kingbus reduces network traffic on the master. In a replication topology with one master and multiple slaves, the master must send binlogs to every slave; with too many slaves, the traffic may saturate the master's network adapter. For example, deleting a large table or running an online DDL on the master can generate a large number of binlog events in an instant. With 10 slaves attached, the master's outbound traffic is amplified 10 times; on a gigabit adapter, generating more than 10 MB/s of binlog is likely to fill the link. By placing Kingbus between the master and the slaves, the slaves can be spread across multiple Kingbus machines to balance the traffic.
- Simplifies the master failover process. Only the one slave to be promoted needs to be detached: it becomes the new master, and Kingbus is redirected to it. The other slaves stay connected to Kingbus, so the replication topology remains unchanged.
- Saves binlog storage space on the master. MySQL servers typically use expensive SSDs, and if binlog files take up a lot of space, MySQL has less room for data. By storing all binlogs in Kingbus, the number of binlog files kept on the master can be reduced.
- Supports heterogeneous replication. For example, Alibaba's open-source Canal can connect to Kingbus, and Kingbus continuously pushes binlog events to Canal. Canal receives the binlog and pushes it on to a Kafka message queue, from which it is finally stored in HBase. Business teams can then write SQL directly against Hive for real-time business analysis.
2. Kingbus overall architecture
The overall structure of Kingbus is as follows:
- Storage is responsible for storing raft log entries and metadata. In Kingbus the raft log and the MySQL binlog are combined, so the two kinds of log do not need to be stored separately, saving storage space. Storage also holds metadata such as raft voting information and the contents of some special binlog events (e.g. FORMAT_DESCRIPTION_EVENT).
- Raft implements the Kingbus cluster's leader election, log replication, and more, using the etcd raft library.
- Binlog syncer runs only on the lead node of the raft cluster; there is a single syncer for the whole cluster. The syncer, disguised as a slave, establishes a master/slave replication connection with the master. Based on the executed_gtid_set the syncer sends, the master filters out binlog events the syncer has already received and sends only the events it is missing. This replication protocol is fully compatible with the MySQL master-slave replication mechanism. When the syncer receives a binlog event, it performs some processing according to the event type, then encapsulates the event as a message and proposes it to the raft cluster. Through the raft algorithm, the binlog event is stored on multiple nodes with strong consistency.
- Binlog server is a master that implements the replication protocol. Real slaves connect to the port the binlog server listens on, and the binlog server sends them binlog events. The whole process of sending binlog events follows the MySQL replication protocol. If no binlog event has been sent to a slave for a while, the binlog server periodically sends a heartbeat event to keep the replication connection alive.
- API server, responsible for managing the entire Kingbus cluster, including:
- Raft cluster membership operations: view cluster status, add a node, remove a node, and update node information.
- Binlog syncer operations: start a binlog syncer, stop a binlog syncer, and view binlog syncer status.
- Binlog server operations: start a binlog server, stop a binlog server, and view binlog server status.

Exceptions in the server layer do not affect the raft layer. The servers can be thought of as plug-ins that start and stop on demand. When Kingbus is extended in the future, only a server implementing the related logic needs to be added. For example, if a server implemented the Kafka protocol, messages in Kingbus could be consumed through a Kafka client.
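The GTID-based filtering the syncer relies on can be illustrated with a toy model: the syncer reports the set of transactions it has already executed, and the master sends only what is missing. This is a hedged sketch, not Kingbus code; real GTID sets are per-server-UUID interval sets, and plain integers stand in for them here.

```go
package main

import "fmt"

// filterMissing models the master-side filtering of the replication
// handshake: events already in the syncer's executed set are skipped,
// and only the missing ones are sent.
func filterMissing(masterEvents []int, executed map[int]bool) []int {
	var toSend []int
	for _, txn := range masterEvents {
		if !executed[txn] { // skip events the syncer already has
			toSend = append(toSend, txn)
		}
	}
	return toSend
}

func main() {
	master := []int{1, 2, 3, 4, 5}
	executed := map[int]bool{1: true, 2: true, 3: true}
	fmt.Println(filterMissing(master, executed)) // only the missing events are sent
}
```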
3. Kingbus core implementation
3.1 Core implementation of storage
A storage holds two forms of log: the raft log, generated and used by the raft algorithm, and the user log (MySQL binlog events). In the storage design the two are combined into a single log entry, distinguished only by different headers. Storage consists of data files and index files, as shown in the following figure:
- A segment has a fixed size (1 GB) and is append-only. Its name is first_raft_index-last_raft_index, indicating the raft index range of the segment.
- Only the last segment is writable, and its file name is first_raft_index-inprogress; the other segments are read-only.
- Read-only segments and index files are written and read via mmap.
- The index of the last segment is kept both on disk and in memory; index reads are served from memory only.
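The segment naming scheme above makes locating an entry by raft index a simple search over the segment list. A minimal sketch of that lookup, assuming hypothetical names (this is not Kingbus's actual code):

```go
package main

import (
	"fmt"
	"sort"
)

// segment mirrors the on-disk naming: firstIndex-lastIndex for sealed
// segments; the in-progress segment's last index is not yet fixed.
type segment struct {
	firstIndex uint64
	lastIndex  uint64 // 0 means the writable, in-progress segment
}

// locate returns the position of the segment holding raftIndex, or -1.
// Segments are sorted by firstIndex, so a binary search suffices.
func locate(segs []segment, raftIndex uint64) int {
	// Find the first segment whose firstIndex exceeds raftIndex,
	// then step back one: that segment may contain raftIndex.
	i := sort.Search(len(segs), func(i int) bool {
		return segs[i].firstIndex > raftIndex
	}) - 1
	if i < 0 {
		return -1
	}
	s := segs[i]
	if s.lastIndex == 0 || raftIndex <= s.lastIndex { // in-progress or in range
		return i
	}
	return -1
}

func main() {
	segs := []segment{{1, 1000}, {1001, 2000}, {2001, 0}} // last one is in-progress
	fmt.Println(locate(segs, 1500)) // falls in the second segment
	fmt.Println(locate(segs, 5000)) // in-progress segment covers higher indexes
}
```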
3.2 Use of the etcd raft library
The etcd raft library processes applied logs, committed entries, and so on in a single thread. This processing should take as little time as possible: if it exceeds the raft election timeout, the cluster will hold a new election. This requires special attention.
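One common way to keep the raft processing loop short is to hand committed entries off to a dedicated apply goroutine over a channel, so slow application work never delays election-related processing. A hedged sketch of that pattern (not Kingbus's actual code; etcd raft's Ready/Advance loop is simulated by a plain slice of batches):

```go
package main

import (
	"fmt"
	"sync"
)

// runApplyPipeline models the split: the "raft loop" only enqueues
// committed entries; a separate goroutine does the slow apply work.
func runApplyPipeline(batches [][]string) []string {
	applyCh := make(chan []string, 16) // buffered so the raft loop rarely blocks
	var applied []string
	var wg sync.WaitGroup

	wg.Add(1)
	go func() { // slow apply work happens off the raft loop
		defer wg.Done()
		for batch := range applyCh {
			applied = append(applied, batch...) // stands in for writing to storage
		}
	}()

	for _, batch := range batches { // stands in for the Ready/Advance loop
		applyCh <- batch // a cheap handoff keeps the loop under the election timeout
	}
	close(applyCh)
	wg.Wait()
	return applied
}

func main() {
	out := runApplyPipeline([][]string{{"e1", "e2"}, {"e3"}})
	fmt.Println(out) // entries applied in commit order
}
```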
3.3 Binlog Syncer core implementation
The binlog syncer does three things:
- pulls binlog events;
- parses and processes binlog events;
- proposes binlog events to the raft cluster.

Clearly the overall processing speed can be improved with a pipeline mechanism: Kingbus uses a separate goroutine for each stage and connects the stages through channels. Since the syncer receives binlog events one by one, it cannot by itself guarantee transaction integrity: if the syncer crashes and must reconnect to the master, the last transaction may be incomplete. The binlog syncer therefore needs to detect transaction boundaries, and Kingbus implements transaction-integrity parsing by closely following the MySQL source code.
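The three stages above map naturally onto goroutines connected by channels. A minimal sketch of that pipeline shape (stage names and payloads are illustrative, not Kingbus's actual types):

```go
package main

import "fmt"

// runSyncerPipeline wires pull -> parse -> propose, each stage in its
// own goroutine, connected by channels so the stages overlap in time.
func runSyncerPipeline(raw []string) []string {
	pulled := make(chan string)
	parsed := make(chan string)

	go func() { // stage 1: pull binlog events from the master
		for _, ev := range raw {
			pulled <- ev
		}
		close(pulled)
	}()

	go func() { // stage 2: parse/process each event
		for ev := range pulled {
			parsed <- "parsed:" + ev
		}
		close(parsed)
	}()

	var proposed []string
	for ev := range parsed { // stage 3: propose events to the raft cluster
		proposed = append(proposed, ev)
	}
	return proposed
}

func main() {
	fmt.Println(runSyncerPipeline([]string{"ev1", "ev2"})) // events flow through in order
}
```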
3.4 Core implementation of binlog Server
The binlog server functions as a master. When a slave establishes a replication connection with the binlog server, the slave sends the relevant commands and the binlog server responds to them; after that, binlog events are streamed to the slave. For each slave, the binlog server starts a goroutine that reads the raft log, strips off the header information to recover the binlog event, and sends it to the slave.
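The per-slave send path boils down to peeling the raft-entry header off each stored record to recover the original binlog event bytes. A toy sketch, assuming a hypothetical fixed-size header (the real header layout is Kingbus-internal):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const headerSize = 12 // hypothetical layout: 4-byte entry type + 8-byte raft index

// wrapEvent prepends a raft-entry header to a binlog event payload,
// modeling how the combined log entry is written to storage.
func wrapEvent(typ uint32, raftIndex uint64, event []byte) []byte {
	buf := make([]byte, headerSize+len(event))
	binary.BigEndian.PutUint32(buf[0:4], typ)
	binary.BigEndian.PutUint64(buf[4:12], raftIndex)
	copy(buf[headerSize:], event)
	return buf
}

// unwrapEvent strips the header, recovering the bytes the binlog
// server streams to the slave unchanged.
func unwrapEvent(entry []byte) []byte {
	return entry[headerSize:]
}

func main() {
	entry := wrapEvent(1, 42, []byte("QUERY_EVENT payload"))
	fmt.Printf("%s\n", unwrapEvent(entry)) // the original event bytes
}
```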
4. To summarize
This article briefly introduced the overall architecture, core components, and core processes of Kingbus, and should give readers a more comprehensive understanding of the project.