Be a positive person: the harder you work, the luckier you get!

This article has been included on GitHub: github.com/dingwpmz/Ja…

Kafka 2.8 has been released, officially removing the dependency on Zookeeper. What is the design philosophy behind this change? Is it just about removing one external dependency?

The answer is obviously not that simple, so let me walk through it step by step.

Before answering the why, it is worth reviewing the classic usage scenario of Zookeeper.

1. Classic usage scenario of Zookeeper


Zookeeper rose to prominence along with big data and distributed systems. A very important problem in big data is how to achieve reliable storage on top of many cheap machines.

A so-called cheap machine has a high probability of failure but also a very low cost. Since a single machine cannot be trusted, the distributed approach is to group multiple machines into a cluster and store the data on several of them (replicas). To keep the replicas consistent, a primary node is usually elected from the replica set to handle the clients' reads and writes, while the other nodes copy data from the primary. When the primary node goes down, a re-election must be triggered automatically to achieve high availability.

In the above scenario, one function is crucial: Leader election, i.e. how to elect a primary node and automatically trigger a re-election when that node goes down, so that the switchover from primary to secondary happens automatically and the system stays highly available.

Using the ephemeral sequential nodes and the event notification (watch) mechanism provided by Zookeeper, Leader election can be implemented quite easily.

In the figure above, T1 and T2 can be understood as two members of the same group that provide the same service. To achieve a cold-standby effect (only one member serves requests at any given time), one of them is designated the Leader. When the Leader crashes or stops serving, the other members compete for the Leader role again and continue to provide the service.

As shown in the preceding figure, Zookeeper itself is deployed as a cluster to avoid a single point of failure and to provide strong data consistency within the cluster.

When the members need to compete for the Leader role, each of them creates a child node under a designated data node in Zookeeper (in this example, /app/order-service/leader), and these children are ephemeral sequential nodes.

Each client then checks whether the node it just created has the lowest sequence number under /app/order-service/leader. If so, it becomes the Leader and starts serving requests.

Once the process acting as the Leader crashes, its session with Zookeeper expires and the ephemeral node associated with that session is deleted. When the Leader's node is deleted, the nodes behind it are notified, which triggers another election. A new Leader is elected and continues to serve, keeping the service highly available.

Looking back at this scenario, Zookeeper makes leader election very easy and brings simple high availability to applications, mainly by relying on the following features (a code sketch follows the list):

  • Ephemeral node: an ephemeral node is bound to a session. When the session that created it ends, the node is deleted automatically; no manual cleanup is required.

  • Sequential node: each child created in sequential mode is given a monotonically increasing sequence number, which gives the candidates a natural ordering for the election.

  • Event (watch) mechanism: through watches, Zookeeper notifies the remaining application nodes in time to trigger another election, making automatic primary/secondary switchover very simple.
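To make the recipe above concrete, here is a minimal sketch of this election pattern using the raw Apache ZooKeeper Java client. It assumes an ensemble reachable at localhost:2181 and an already existing parent path /app/order-service/leader; the class and child-node names are illustrative, not taken from any real project.

```java
import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Minimal sketch of leader election with ephemeral sequential nodes and watches.
public class LeaderElectionSketch implements Watcher {

    private static final String PARENT = "/app/order-service/leader";

    private final ZooKeeper zk;
    private String myNode; // name of the child node this member created

    public LeaderElectionSketch() throws Exception {
        // 15s session timeout: if this process stops heartbeating (e.g. a long Full GC),
        // the session expires and the ephemeral node below is removed automatically.
        this.zk = new ZooKeeper("localhost:2181", 15000, this);
    }

    public void volunteer() throws KeeperException, InterruptedException {
        // Create an ephemeral sequential child; ZooKeeper appends a monotonically
        // increasing sequence number, e.g. .../leader/member-0000000003.
        String path = zk.create(PARENT + "/member-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        myNode = path.substring(path.lastIndexOf('/') + 1);
        attemptToLead();
    }

    private void attemptToLead() throws KeeperException, InterruptedException {
        List<String> children = zk.getChildren(PARENT, false);
        Collections.sort(children);
        int myIndex = children.indexOf(myNode);
        if (myIndex == 0) {
            System.out.println(myNode + " has the lowest sequence number -> I am the Leader");
            return;
        }
        // Not the leader: watch only the node immediately ahead of us.
        String predecessor = children.get(myIndex - 1);
        if (zk.exists(PARENT + "/" + predecessor, this) == null) {
            attemptToLead(); // predecessor vanished between the two calls, re-check
        }
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
            try {
                attemptToLead(); // predecessor gone -> re-run the election check
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
```

Watching only the immediate predecessor, rather than the parent node, is the usual refinement of this recipe: when a non-leader candidate disappears, only one other candidate is woken up, which avoids a thundering herd of requests against Zookeeper.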

2. Kafka’s urgent need for Zookeeper


There are many Leader elections in Kafka. If you are familiar with Kafka, you know that a topic can have multiple partitions (data shards), and each partition can be configured with multiple replicas. Keeping the data of a partition consistent across its replicas becomes an urgent requirement.

Kafka's approach is that, among the multiple replicas of a partition, a Leader is elected to handle the clients' read and write requests, the follower replicas copy data from the Leader, and the Leader decides whether a write to the partition has been successfully replicated.
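To see this structure on a running cluster, here is a small sketch that uses Kafka's AdminClient to print the Leader and replica set of every partition of a topic. The broker address localhost:9092 and the topic name order-topic are assumptions made only for this example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

// Prints, for each partition of a topic, which broker is the Leader
// and which brokers hold the replicas (and which are in sync).
public class ShowPartitionLeaders {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            Map<String, TopicDescription> topics =
                    admin.describeTopics(Collections.singleton("order-topic")).all().get();

            for (TopicPartitionInfo p : topics.get("order-topic").partitions()) {
                System.out.printf("partition %d -> leader: %s, replicas: %s, isr: %s%n",
                        p.partition(), p.leader(), p.replicas(), p.isr());
            }
        }
    }
}
```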

Partition distribution diagram of a topic in Kafka:

Leader election is therefore required, and it can be implemented easily on top of Zookeeper. That is how the “honeymoon journey” between Kafka and Zookeeper began.

3. The Achilles’ heel of Zookeeper


Zookeeper is deployed as a cluster, and as long as more than half of the nodes in the cluster are alive it can provide service. For example, a 3-node Zookeeper cluster tolerates the failure of 1 node, and a 5-node cluster tolerates the failure of 2 nodes.
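To make the majority (quorum) rule concrete, here is a trivial sketch of the arithmetic behind it; the class name is mine and not part of Zookeeper.

```java
// An ensemble of n nodes keeps working as long as more than half of the nodes
// are alive, so it can tolerate the failure of floor((n - 1) / 2) nodes.
public class QuorumMath {
    static int toleratedFailures(int ensembleSize) {
        return (ensembleSize - 1) / 2; // integer division = floor
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            System.out.printf("%d-node ensemble tolerates %d failed node(s)%n",
                    n, toleratedFailures(n));
        }
    }
}
```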

However, Zookeeper is designed as a CP system: to guarantee strong data consistency, it has to make sacrifices in terms of availability.

The Leader node is responsible for writes, and both the Leader and the follower nodes can serve read requests. However, while an internal election is in progress, Zookeeper cannot serve requests at all. Under normal circumstances elections finish very quickly, but under abnormal circumstances, such as a Full GC on a Zookeeper node, the impact can be devastating.

If Full GC occurs frequently on a Zookeeper node, its sessions with clients will time out, because during the stop-the-world pauses it cannot respond to the clients' heartbeat requests. The ephemeral nodes associated with those sessions are then deleted. Note that all of those ephemeral nodes are deleted at once, the event notification mechanism the applications rely on stops working, and election across the entire cluster breaks down.

From a high-availability point of view, the availability of a Kafka cluster then depends not only on itself but also on an external component, which is clearly not an elegant solution in the long run.

As technologies in the distributed field kept improving, the idea of decentralization gradually gained ground, and the calls to remove Zookeeper grew louder and louder. In this process an excellent algorithm emerged: the Raft protocol.

The Raft protocol has two important components: Leader election and log replication, with log replication providing strong data consistency across multiple replicas. A notable feature is that Raft is a decentralized design that does not rely on any external component; instead, it is embedded in the application as a protocol library, i.e. integrated with the application itself.
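As a taste of the protocol, here is a much-simplified sketch of Raft's vote-granting rule for leader election, following the description in the Raft paper. The class and field names are illustrative only and have nothing to do with Kafka's actual KRaft code.

```java
// Simplified sketch of how a Raft node decides whether to grant its vote.
public class RaftNode {

    static class VoteRequest {
        long term;          // candidate's term
        String candidateId; // candidate requesting the vote
        long lastLogIndex;  // index of candidate's last log entry
        long lastLogTerm;   // term of candidate's last log entry
    }

    private long currentTerm = 0;
    private String votedFor = null;
    private long lastLogIndex = 0;
    private long lastLogTerm = 0;

    /** Returns true if this node grants its vote to the candidate. */
    synchronized boolean handleRequestVote(VoteRequest req) {
        // Reject candidates from an older term.
        if (req.term < currentTerm) {
            return false;
        }
        // A newer term resets our vote (and, in a full implementation, demotes us to follower).
        if (req.term > currentTerm) {
            currentTerm = req.term;
            votedFor = null;
        }
        // At most one vote per term, and only for a candidate whose log is
        // at least as up-to-date as ours (compare last log term, then last index).
        boolean logUpToDate = req.lastLogTerm > lastLogTerm
                || (req.lastLogTerm == lastLogTerm && req.lastLogIndex >= lastLogIndex);
        if ((votedFor == null || votedFor.equals(req.candidateId)) && logUpToDate) {
            votedFor = req.candidateId;
            return true;
        }
        return false;
    }
}
```

A candidate that collects votes from a majority of the nodes becomes the Leader; no external coordinator such as Zookeeper is involved.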

Using the distribution of a Kafka topic as an example, here is a diagram of what it looks like with the Raft protocol:

This article does not go into the Raft protocol in depth, but Raft offers an alternative that does not depend on any third-party component, so why not use it? In the end, Kafka officially dropped Zookeeper and embraced Raft in version 2.8.

If you are interested in the Raft protocol, I recommend reading my series of articles on it:

  1. Raft Protocol

  2. How Leader election is implemented in the Raft protocol


Well, that is all for this article. A “triple combo” (follow, like, comment) is the biggest encouragement for me. Of course, you can also add the author on WeChat: DINGwPMZ, to be pulled into a technical exchange group where we can improve together.

Finally, I would like to share a RocketMQ e-book of mine, from which you can gain experience in operating and maintaining message flows at the scale of billions of messages.

How to get it: search for “Middleware Interest Circle” on WeChat and reply RMQPDF to receive it.

Middleware Interest Circle

Maintained by the author of RocketMQ Technology Insider, it focuses on systematically analyzing the architecture and design principles of mainstream Java middleware, helping you build a complete Internet distributed architecture knowledge system and break through career bottlenecks.