(Reprinted)
The role of ZooKeeper
ZooKeeper is an open-source distributed coordination service framework. You can also think of it as a small distributed storage system that guarantees consistency. It is especially good at storing common configuration information, cluster metadata, and so on.
It has persistent nodes and ephemeral nodes, which are useful in combination with the Watcher mechanism. When the client that created an ephemeral node disconnects from ZooKeeper, the node disappears, and any client that subscribed to the node's status is notified of the change.
A service in the cluster can therefore be detected when it goes online or offline, so ZooKeeper can be used to implement service discovery as well as failure detection and failover.
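The ephemeral-node-plus-Watcher pattern can be sketched in a few lines. This is a minimal in-memory simulation, not the real ZooKeeper API; all names (`Registry`, `create_ephemeral`, the session id) are illustrative:

```python
# In-memory sketch of ZooKeeper-style ephemeral nodes and watchers.
# Names here are illustrative, not the real ZooKeeper client API.

class Registry:
    def __init__(self):
        self.nodes = {}      # path -> owning session id
        self.watchers = {}   # path -> list of callbacks

    def create_ephemeral(self, path, session_id):
        self.nodes[path] = session_id

    def watch(self, path, callback):
        self.watchers.setdefault(path, []).append(callback)

    def close_session(self, session_id):
        # When a client's session ends, its ephemeral nodes vanish
        # and every watcher on those paths is notified.
        for path in [p for p, s in self.nodes.items() if s == session_id]:
            del self.nodes[path]
            for cb in self.watchers.pop(path, []):
                cb(path)

reg = Registry()
reg.create_ephemeral("/services/broker-1", session_id=42)
events = []
reg.watch("/services/broker-1", lambda path: events.append(("gone", path)))
reg.close_session(42)  # broker-1 disconnects
print(events)          # [('gone', '/services/broker-1')]
```

Subscribers learn that broker-1 went offline without polling, which is exactly the property service discovery relies on.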
Kafka is heavily dependent on ZooKeeper and would not run without it. ZooKeeper provides metadata management for Kafka: broker information, topic data, partition data, and so on.
As each broker starts, it interacts with ZooKeeper so that ZooKeeper stores all the topics, configurations, replicas, and other information in the cluster.
Other mechanisms, such as elections and capacity expansion, also rely on ZooKeeper.
For example, the controller election: each broker, on startup, tries to register an ephemeral /controller node in ZooKeeper, and the first broker to create the /controller node becomes the controller.
Brokers that lose the election use the Watcher mechanism to watch that node. If the controller fails, the ephemeral node disappears and the other brokers compete again, which implements controller failover.
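The "first to create /controller wins, watchers retry on failure" logic above can be sketched like this. This is a simulation of the behavior, not real ZooKeeper code (which would use `create()` with an ephemeral flag and handle a node-exists error):

```python
# Sketch of controller election: first create wins; losers wait and
# re-compete when the controller fails. Illustrative names only.

class ControllerElection:
    def __init__(self):
        self.controller = None
        self.waiters = []            # brokers watching /controller

    def try_become_controller(self, broker_id):
        if self.controller is None:  # first create succeeds
            self.controller = broker_id
            return True
        self.waiters.append(broker_id)  # losers watch the node
        return False

    def controller_failed(self):
        # The ephemeral /controller node disappears; watchers re-compete.
        self.controller = None
        contenders, self.waiters = self.waiters, []
        for b in contenders:
            self.try_become_controller(b)

e = ControllerElection()
e.try_become_controller("broker-0")   # wins
e.try_become_controller("broker-1")   # loses, watches
e.controller_failed()                 # broker-0 dies
print(e.controller)                   # broker-1 takes over
```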
ZooKeeper is important to Kafka.
So why was ZooKeeper abandoned? Software architecture is always evolving, and a change like this usually means a bottleneck was hit.
Let’s take a look at the operational level.
First of all, it feels a little strange for one middleware to have to depend on another middleware.
Depending on a library such as Netty is fine, but Kafka has to run alongside a whole ZooKeeper cluster, which is a bit odd.
If your company adopts Kafka, it has to deploy ZooKeeper too, which passively increases operational complexity. It is like going to a shop to buy a coat and being told the coat is only sold as part of a full outfit. Isn't that too much?
So the operations team has to take care of not only the Kafka cluster but also the ZooKeeper cluster.
Now look at the performance side of things.
One defining feature of ZooKeeper is consistency.
If the data on one ZooKeeper node changes, the other nodes are notified to update as well, and the write does not complete until more than half of them have finished writing, which results in poor write performance.
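The majority-write rule is simple to state in code. A minimal sketch (the function name and node names are made up for illustration):

```python
# Sketch of a quorum (majority) write: the write only succeeds once a
# strict majority of nodes have acknowledged it, so every write pays
# the latency of the slowest node in that majority.

def quorum_write(acks, cluster_size):
    """Return True if the acknowledging nodes form a strict majority."""
    return len(acks) > cluster_size // 2

print(quorum_write({"zk1", "zk2"}, cluster_size=3))  # True: 2 of 3
print(quorum_write({"zk1"}, cluster_size=3))         # False: 1 of 3
print(quorum_write({"zk1", "zk2"}, cluster_size=5))  # False: 2 of 5
```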
Generally speaking, ZooKeeper is only suitable for storing simple configuration or cluster metadata. It is not a real storage system.
If too much data is written, ZooKeeper's performance and stability may deteriorate, which can cause Watch notifications to be delayed or lost.
When a Kafka cluster is large and has many partitions, ZooKeeper stores a large amount of metadata, and its performance suffers.
Also, ZooKeeper itself is distributed and needs to run leader elections, which are not fast, and it cannot serve requests while an election is in progress!
Kafka had already been reworked once because of ZooKeeper-related performance issues.
For example, consumer offsets used to be stored in ZooKeeper, so every offset commit or fetch had to go through ZooKeeper, a load that ZooKeeper could not handle.
So the offsets topic (__consumer_offsets) was introduced: offset commits and fetches are treated as ordinary messages and stored in a log, avoiding the poor performance of frequent ZooKeeper access.
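Conceptually, the offsets topic is an append-only log keyed by (group, topic, partition), where the latest record for a key wins (the real topic is log-compacted). A minimal sketch with made-up record shapes:

```python
# Sketch of the __consumer_offsets idea: commits are ordinary log records
# keyed by (group, topic, partition); the newest record per key wins.

offsets_log = []  # append-only log of (key, offset) records

def commit_offset(group, topic, partition, offset):
    offsets_log.append(((group, topic, partition), offset))

def fetch_offset(group, topic, partition):
    # Replay the log; the last record for the key is the current offset.
    current = None
    for key, value in offsets_log:
        if key == (group, topic, partition):
            current = value
    return current

commit_offset("g1", "orders", 0, 100)
commit_offset("g1", "orders", 0, 250)   # later commit overrides
print(fetch_offset("g1", "orders", 0))  # 250
```

In real Kafka the replay cost is avoided by log compaction and an in-memory cache, but the "latest message per key is the truth" model is the same.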
Some large companies, such as LeByte, may need to support millions of partitions, which is not stable under the current single-cluster Kafka architecture, because the number of partitions a single cluster can carry is limited.
So Kafka needed to get rid of ZooKeeper.
So what is Kafka like without ZooKeeper?
Kafka without ZooKeeper stores metadata internally, reusing its existing log storage mechanism.
Just like the offsets topic mentioned above, there is now a metadata topic, and metadata is stored in a log like ordinary messages.
So metadata is handled the same way as offsets: the existing message storage mechanism is reused with only minor modification. Perfect!
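The key property of a metadata log is that any broker can rebuild the full cluster state by replaying it from the beginning. A sketch with invented record shapes (the real KRaft record schemas differ):

```python
# Sketch: rebuilding cluster metadata by replaying a metadata log,
# the same pattern the offsets topic uses. Record shapes are made up.

metadata_log = [
    {"type": "topic_created", "name": "orders", "partitions": 3},
    {"type": "leader_changed", "topic": "orders", "partition": 1, "leader": 2},
    {"type": "topic_created", "name": "payments", "partitions": 1},
]

def replay(log):
    state = {"topics": {}, "leaders": {}}
    for rec in log:
        if rec["type"] == "topic_created":
            state["topics"][rec["name"]] = rec["partitions"]
        elif rec["type"] == "leader_changed":
            state["leaders"][(rec["topic"], rec["partition"])] = rec["leader"]
    return state

state = replay(metadata_log)
print(state["topics"])  # {'orders': 3, 'payments': 1}
```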
Then KRaft was introduced to implement the Controller Quorum.
The protocol is based on Raft, so I won't expand on the details, but it solves the election of the controller leader and lets all nodes reach consensus.
The previous ZooKeeper-based design with a single controller had a problem: when the number of partitions was too large, failover was too slow.
When a Controller changes, all metadata needs to be reloaded onto the new Controller and synchronized to all brokers in the cluster.
In contrast, a leader switch in the Controller Quorum is very fast, because the metadata is already synchronized within the quorum: every quorum broker already has all the metadata, so there is no need to reload it!
And the other brokers already store metadata in their logs, so only incremental updates are needed, not a full reload.
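The incremental catch-up described above amounts to each broker remembering the last log offset it applied and fetching only the records after it. A minimal sketch (names are illustrative):

```python
# Sketch of incremental metadata catch-up: a broker tracks the last
# applied log offset and fetches only the records after it, instead
# of reloading the full metadata snapshot.

metadata_log = ["rec0", "rec1", "rec2", "rec3", "rec4"]

class Broker:
    def __init__(self):
        self.applied = []
        self.next_offset = 0

    def catch_up(self, log):
        new = log[self.next_offset:]  # incremental, not the full log
        self.applied.extend(new)
        self.next_offset = len(log)
        return len(new)

b = Broker()
print(b.catch_up(metadata_log))  # 5 records on first sync
metadata_log.append("rec5")
print(b.catch_up(metadata_log))  # only 1 new record afterwards
```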
This resolves the problem of having too much metadata and supports more partitions!
Finally
Some of you might read this and ask: why wasn't it done this way in the first place?
Because ZooKeeper is a powerful and proven tool, it was easy to implement these features with it early on rather than building them from scratch.
If ZooKeeper had not become a bottleneck, this transformation probably would never have happened.
That's how software is: there's no need to reinvent the wheel.