Preface

ZooKeeper is no stranger to most developers. But do you really know what ZooKeeper is? If an interviewer asked you to explain what ZooKeeper is, how far could you go?

I used ZooKeeper as the registry for Dubbo, and as the management tool when I set up a Solr cluster. A few days ago, while summarizing my project experience, I suddenly asked myself what ZooKeeper really is, and all I could come up with was: “① ZooKeeper can be used as a registry. ② ZooKeeper is a member of the Hadoop ecosystem. ③ When building a ZooKeeper cluster, use an odd number of servers.” Clearly, my understanding of ZooKeeper was only superficial.

Therefore, through this article, I hope to give you a little more detailed understanding of ZooKeeper. If you haven’t learned ZooKeeper before, this article will be your stepping stone to ZooKeeper. If you’re already familiar with ZooKeeper, this article will take you through some of the basic concepts behind ZooKeeper.

Finally, this article only covers some concepts of ZooKeeper; it does not cover how to use ZooKeeper or how to set up a ZooKeeper cluster. There are plenty of articles online introducing the use of ZooKeeper and the construction of ZooKeeper clusters, which you can consult if needed.

1 What is ZooKeeper

The Origin of ZooKeeper

Excerpted from From Paxos to ZooKeeper, Chapter 4, Section 1.

ZooKeeper was originally developed by a research group at Yahoo! Research. At the time, the researchers found that many large systems inside Yahoo! relied on similar systems for distributed coordination, but those systems often suffered from single points of failure. So Yahoo!'s developers set out to build a general-purpose distributed coordination framework with no single point of failure, allowing developers to focus on business logic.

There’s actually an interesting story behind the name of the ZooKeeper project. In the early days of the project, Yahoo!'s engineers wanted to name it after an animal, since many internal projects had already been named after animals (the famous Pig project, for example). Raghu Ramakrishnan, the institute’s chief scientist at the time, joked: “At this rate, we’ll end up a zoo!” And so the name was born: with all those distributed components named after animals, Yahoo!'s whole distributed system looked like a big zoo, and this project was precisely the one coordinating that distributed environment, so they called it ZooKeeper.

1.1 ZooKeeper Overview

ZooKeeper is an open-source distributed coordination service. The ZooKeeper framework was originally built at Yahoo! to give applications a simple and robust way to coordinate. Later, Apache ZooKeeper became a standard coordination service used by Hadoop, HBase, and other distributed frameworks; Apache HBase, for example, uses ZooKeeper to track the status of distributed data. ZooKeeper aims to encapsulate complex and error-prone distributed consistency services into an efficient and reliable set of primitives, delivered to users through a series of easy-to-use interfaces.

Primitive: a term from operating systems or computer networking, referring to a procedure composed of several instructions that performs a specific function and is indivisible; that is, the execution of a primitive must be continuous and cannot be interrupted.

ZooKeeper is a typical solution for distributed data consistency. Distributed applications can build functions such as data publish/subscribe, load balancing, naming services, distributed coordination/notification, cluster management, Master election, distributed locks, and distributed queues on top of ZooKeeper.

**One of the most common usage scenarios for ZooKeeper is to act as a registry between service providers and service consumers.** Service providers register their services with ZooKeeper; when invoking a service, a consumer first looks the service up in ZooKeeper and, after obtaining the provider’s details, invokes the provider directly. As shown in the figure below, ZooKeeper acts as the registry in the Dubbo architecture.

[Figure: ZooKeeper as the registry in the Dubbo architecture]
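
To make the registry scenario concrete, here is a minimal sketch using ZooKeeper’s Java client. It is not how Dubbo actually registers services; the /services/demo path, the address payload, and the connection settings are illustrative assumptions, and the parent path is assumed to already exist.

```java
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class RegistrySketch {
    public static void main(String[] args) throws Exception {
        // Connect to ZooKeeper (address and timeout are illustrative).
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 15000, event -> {});

        // Provider side: register under an ephemeral sequential node so the
        // entry disappears automatically if the provider's session dies.
        // Assumes the parent path /services/demo already exists.
        String path = zk.create("/services/demo/provider-",
                "192.168.0.10:20880".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL_SEQUENTIAL);
        System.out.println("registered at " + path);

        // Consumer side: list the currently registered providers.
        List<String> providers = zk.getChildren("/services/demo", false);
        System.out.println("available providers: " + providers);
    }
}
```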

1.2 My Own Experience with ZooKeeper

In my own projects, I mainly used ZooKeeper as the Dubbo registry (Dubbo officially recommends the ZooKeeper registry). In addition, when building a Solr cluster, I used ZooKeeper as the management tool for the Solr cluster, where it provided the following functions: 1. cluster management (fault tolerance and load balancing); 2. centralized management of configuration files; 3. the entry point to the cluster.

I personally think it’s better to use the cluster version of ZooKeeper than the standalone version. The architecture diagram on the official website shows a cluster version of ZooKeeper. A ZooKeeper cluster usually consists of three servers.

Why is it best to use an odd number of servers to form a ZooKeeper cluster?

ZooKeeper’s fault tolerance means that when some ZooKeeper servers go down, the cluster remains usable only as long as the surviving servers outnumber the failed ones, that is, as long as the survivors form a strict majority. If the cluster has n servers, the number of surviving servers must be greater than n/2. From this a simple conclusion follows: clusters of 2n and 2n-1 servers have the same fault tolerance, namely n-1; you can verify this yourself, it is simple arithmetic. For example, with 3 ZooKeeper servers the cluster tolerates at most 1 failure; with 4 servers it still tolerates only 1 failure; with 5 servers it tolerates at most 2 failures; and with 6 servers it still tolerates only 2 failures.

So why add an extra ZooKeeper server that buys you no additional fault tolerance?
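
To see the arithmetic at a glance, here is a tiny sketch: with n servers, the survivors must be a strict majority, so the number of tolerated failures is (n-1)/2 with integer division.

```java
public class QuorumMath {
    // Maximum number of servers that may fail while the cluster stays usable:
    // the survivors must still be a strict majority, i.e. survivors > n / 2.
    static int tolerance(int n) {
        return (n - 1) / 2; // integer division
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 6; n++) {
            System.out.printf("%d servers -> tolerates %d failure(s)%n", n, tolerance(n));
        }
        // Output: 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2.
        // Going from 2n-1 to 2n servers adds cost but no extra fault tolerance.
    }
}
```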

2 Important Concepts of ZooKeeper

2.1 Summary of important concepts

  • ZooKeeper is itself a distributed application (ZooKeeper keeps working as long as more than half of its nodes are alive).

  • To ensure high availability, it is best to deploy ZooKeeper in a cluster, so that as long as most of the machines in the cluster are available (and can tolerate certain machine failures), ZooKeeper itself is still available.

  • ZooKeeper stores data in memory, which ensures high throughput and low latency (but the memory limit on how much can be stored is a further reason to keep the amount of data stored in each ZNode small).

  • ZooKeeper is high-performance. It performs especially well in read-heavy workloads, because “writes” cause state to be synchronized across all servers (and “reads” outnumbering “writes” is the typical scenario for a coordination service).

  • ZooKeeper has the concept of ephemeral (temporary) nodes. An ephemeral node exists as long as the client session that created it remains active; when the session ends, the node is deleted. A persistent node, by contrast, remains on ZooKeeper from the moment it is created until it is explicitly removed.

  • In essence, ZooKeeper provides only two functions: ① managing (storing and reading) the data submitted by user programs; ② providing watch (listener) services on data nodes for user programs.

The following summaries of sessions, ZNodes, versions, Watchers, and ACLs are drawn from From Paxos to ZooKeeper, Chapter 4, Section 1 and Chapter 7, Section 8.

2.2 Sessions

A Session is a session between a ZooKeeper server and a client. In ZooKeeper, a client connection is a long-lived TCP connection between the client and a server. When the client starts, it establishes a TCP connection to a server, and the client session’s life cycle begins from that first connection. Through this connection, the client keeps the session alive with heartbeats, sends requests to and receives responses from the ZooKeeper server, and receives Watch event notifications from the server. The session’s sessionTimeout value sets the timeout of a client session: if the connection is broken, for example due to server load or a network failure, the previously created session remains valid as long as the client reconnects to any server in the cluster within the time specified by sessionTimeout.

Before creating a session for a client, the server first assigns the client a sessionID. The sessionID is an important identifier of a ZooKeeper session, and many session-related mechanisms are based on it, so ZooKeeper must ensure that the sessionID assigned to a client by any server is globally unique.
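
As a sketch of how a session is established with the official Java client (the connect string and the 15-second sessionTimeout are illustrative): the constructor returns immediately, so the code below waits for the SyncConnected event before using the session.

```java
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class SessionDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);

        // sessionTimeout (here 15 s) is the window within which a disconnected
        // client may reconnect to any server and keep the same session.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 15000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });

        connected.await(); // block until the session is actually established
        System.out.println("sessionId = 0x" + Long.toHexString(zk.getSessionId()));
        zk.close();
    }
}
```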

2.3 ZNode

When we talk about distributed systems, “node” usually refers to each machine that makes up the cluster. In ZooKeeper, however, “node” has two meanings. The first is the same as above: a machine that belongs to the cluster, which we call a machine node. The second refers to a data unit in ZooKeeper’s data model, which we call a data node, or ZNode.

ZooKeeper stores all data in memory. The data model is a tree of ZNodes (a ZNode Tree), in which each path separated by slashes (/), such as /foo/path1, denotes a ZNode. Each ZNode holds its own data content along with a set of attributes.

**In ZooKeeper, nodes can be divided into persistent nodes and ephemeral (temporary) nodes.** A persistent node remains on ZooKeeper from the moment it is created until it is removed. An ephemeral node, on the other hand, has a life cycle tied to a client session: once the session ends, all ephemeral nodes created by that client are removed. In addition, ZooKeeper allows users to add a special property to each node: SEQUENTIAL. When a node marked with this property is created, ZooKeeper automatically appends to its name a monotonically increasing integer maintained by the parent node.
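
These node types map directly onto the CreateMode argument of the Java client’s create() call. A minimal sketch, assuming the parent paths already exist; the paths themselves are illustrative:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class NodeTypes {
    static void demo(ZooKeeper zk) throws Exception {
        byte[] data = "payload".getBytes();

        // Persistent: survives until explicitly deleted.
        zk.create("/app/config", data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT);

        // Ephemeral: removed automatically when this client's session ends.
        zk.create("/app/alive", data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL);

        // SEQUENTIAL variants: the parent appends an increasing counter,
        // so the actual path comes back as e.g. "/app/lock-0000000007".
        String seq = zk.create("/app/lock-", data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL_SEQUENTIAL);
        System.out.println("created " + seq);
    }
}
```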

2.4 Version

As mentioned above, every ZNode in ZooKeeper stores data. For each ZNode, ZooKeeper maintains a data structure called Stat, which records three versions for the ZNode: version (the data version of the current ZNode), cversion (the child-node version of the current ZNode), and aversion (the ACL version of the current ZNode).
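
These three versions can be read from the Stat structure via the Java client; a small sketch (the path is illustrative):

```java
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class VersionDemo {
    static void printVersions(ZooKeeper zk, String path) throws Exception {
        Stat stat = zk.exists(path, false); // null if the node does not exist
        if (stat != null) {
            System.out.println("version  = " + stat.getVersion());  // data version
            System.out.println("cversion = " + stat.getCversion()); // child-list version
            System.out.println("aversion = " + stat.getAversion()); // ACL version
        }
    }
}
```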

2.5 Watcher

A Watcher (event listener) is an important feature of ZooKeeper. ZooKeeper allows users to register Watchers on specified nodes, and when certain events are triggered, the ZooKeeper server notifies the interested clients of the event. This mechanism is a key part of how ZooKeeper implements its distributed coordination service.
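
A minimal sketch of registering a Watcher with the Java client. Note that ZooKeeper watches are one-shot: once triggered, a watch must be registered again if further notifications are wanted.

```java
import org.apache.zookeeper.ZooKeeper;

public class WatchDemo {
    static void watchNode(ZooKeeper zk, String path) throws Exception {
        // exists() both checks the node and registers the watcher; the watcher
        // fires once (on create, delete, or data change) and must then be re-set.
        zk.exists(path, event ->
                System.out.println("event " + event.getType() + " on " + event.getPath()));
    }
}
```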

2.6 ACL

ZooKeeper uses ACLs (Access Control Lists) to control permissions, similar to the permission model of UNIX file systems. ZooKeeper defines the following five permissions: CREATE (create child nodes), READ (read node data and list child nodes), WRITE (update node data), DELETE (delete child nodes), and ADMIN (set the node’s ACL).

In particular, the CREATE and DELETE permissions are permissions on child nodes.
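
As an illustration, here is a sketch that attaches an ACL at node-creation time using the Java client; the digest user alice:secret and the /secure path are made-up examples:

```java
import java.util.Collections;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;
import org.apache.zookeeper.server.auth.DigestAuthenticationProvider;

public class AclDemo {
    static void createProtected(ZooKeeper zk) throws Exception {
        // Grant READ and WRITE (but not CREATE/DELETE/ADMIN) to one digest user.
        Id user = new Id("digest",
                DigestAuthenticationProvider.generateDigest("alice:secret"));
        ACL acl = new ACL(ZooDefs.Perms.READ | ZooDefs.Perms.WRITE, user);

        zk.create("/secure", "data".getBytes(),
                Collections.singletonList(acl), CreateMode.PERSISTENT);
    }
}
```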

3 Characteristics of ZooKeeper

  • Sequential consistency: transaction requests from the same client are applied to ZooKeeper in strictly the order in which they were issued.

  • Atomicity: The processing results of all transaction requests are applied consistently across all machines in the cluster, that is, a transaction is either successfully applied across all machines in the cluster or not applied at all.

  • Single system image: The client sees the same server-side data model no matter which ZooKeeper server it is connected to.

  • Reliability: Once a change request is applied, the result of the change is persisted until overwritten by the next change.

4 ZooKeeper Design Goals

4.1 Simple data model

ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace, similar to a standard file system. The namespace consists of data registers in ZooKeeper, called ZNodes, which are similar to files and directories. Unlike typical file systems designed for storage, ZooKeeper keeps its data in memory, which means it can achieve high throughput and low latency.

4.2 Cluster Construction

To ensure high availability, it is best to deploy ZooKeeper in a cluster, so that as long as most of the machines in the cluster are available (and can tolerate certain machine failures), ZooKeeper itself is still available. When using ZooKeeper, the client needs to know the list of cluster machines and establish a TCP connection with a machine in the cluster to use the service. The client uses the TCP connection to send requests, obtain results, obtain listening events, and send heartbeat packets. If this connection is abnormally disconnected, the client can connect to another machine.

Architecture diagram provided by ZooKeeper:

Each Server in the preceding figure is a machine on which the ZooKeeper service is deployed. The servers that make up the ZooKeeper service maintain the current server state in memory and communicate with each other. Data consistency within the cluster is maintained using the ZooKeeper Atomic Broadcast (ZAB) protocol.

4.3 Sequential Access

ZooKeeper assigns a globally unique, increasing number to every update request from clients, and this number reflects the order of all transactions. Applications can use this ordering to implement higher-level synchronization primitives. The number is also called the timestamp, or zxid (ZooKeeper Transaction Id).
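
A node’s creation and last-modification zxids are exposed on its Stat structure; a small sketch (the path is illustrative):

```java
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ZxidDemo {
    static void printZxids(ZooKeeper zk, String path) throws Exception {
        Stat stat = zk.exists(path, false);
        if (stat != null) {
            // czxid: zxid of the transaction that created the node;
            // mzxid: zxid of the transaction that last modified it.
            // Comparing zxids gives the global order of the two updates.
            System.out.println("czxid = 0x" + Long.toHexString(stat.getCzxid()));
            System.out.println("mzxid = 0x" + Long.toHexString(stat.getMzxid()));
        }
    }
}
```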

4.4 High Performance

ZooKeeper is high-performance. It performs especially well in read-heavy workloads, because “writes” cause state to be synchronized across all servers (and “reads” outnumbering “writes” is the typical scenario for a coordination service).

5 Roles of the ZooKeeper cluster

A typical cluster mode is Master/Slave. In this mode, the Master server provides write services, while the Slave servers replicate the latest data from the Master asynchronously and provide read services.

However, ZooKeeper does not use the traditional Master/Slave concept; instead it introduces the roles of Leader, Follower, and Observer, as shown in the figure below.

All machines in a ZooKeeper cluster elect a Leader through a Leader election process. The Leader provides both write and read services for clients. Besides the Leader, Followers and Observers provide only read services. The sole difference between them is that Observer machines do not participate in Leader election or in the “half-write success” policy for write operations; Observers can therefore improve the cluster’s read performance without affecting write performance.

When the Leader server suffers a network interruption, crashes, exits, or is restarted, the ZAB protocol enters recovery mode and elects a new Leader. The process goes roughly like this (see the sketch after this list):

  1. Leader election: all nodes start in the election phase. As soon as one node obtains more than half of the votes, it becomes the prospective Leader.

  2. Discovery: in this phase, the Followers communicate with the prospective Leader to synchronize the transaction proposals they have recently received.

  3. Synchronization: the replicas in the cluster are synchronized using the latest proposal history the prospective Leader gathered in the previous phase. Only after synchronization completes does the prospective Leader become the actual Leader.

  4. Broadcast: in this phase, the ZooKeeper cluster can officially provide transaction services, and the Leader broadcasts messages. If a new node joins, it must first synchronize with the Leader.
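
As an illustration of how the four phases above chain together, here is a toy state machine; it sketches the protocol’s structure only and is not ZooKeeper’s actual implementation:

```java
public class ZabPhases {
    enum Phase { ELECTION, DISCOVERY, SYNCHRONIZATION, BROADCAST }

    // Toy transition function mirroring the list above: each phase completes
    // and hands off to the next; a Leader failure drops back to ELECTION.
    static Phase next(Phase p, boolean leaderFailed) {
        if (leaderFailed) return Phase.ELECTION;
        switch (p) {
            case ELECTION:        return Phase.DISCOVERY;       // quorum of votes obtained
            case DISCOVERY:       return Phase.SYNCHRONIZATION; // proposal history gathered
            case SYNCHRONIZATION: return Phase.BROADCAST;       // replicas synced
            default:              return Phase.BROADCAST;       // steady state
        }
    }
}
```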

6 ZooKeeper, ZAB, and Paxos

6.1 ZAB Protocol & Paxos Algorithm

The Paxos algorithm is arguably the soul of ZooKeeper. However, ZooKeeper does not adopt Paxos wholesale; instead it uses the ZAB protocol as its core algorithm for ensuring data consistency. Moreover, the official ZooKeeper documentation points out that ZAB is not a general-purpose distributed consensus algorithm like Paxos: it is an atomic message broadcast protocol with crash recovery, designed specifically for ZooKeeper.

6.2 ZAB Protocol Introduction

ZAB (ZooKeeper Atomic Broadcast) is an atomic broadcast protocol with crash-recovery support designed specifically for the distributed coordination service ZooKeeper. ZooKeeper relies on ZAB to implement distributed data consistency; based on this protocol, ZooKeeper implements an active/standby architecture to keep data consistent among the replicas in the cluster.

6.3 ZAB supports two basic modes: crash recovery and message broadcast

The ZAB protocol includes two basic modes: crash recovery and message broadcast. When the whole service framework is starting up, or when the Leader server suffers a network interruption, crashes, exits, or restarts, ZAB enters recovery mode and elects a new Leader. Once a new Leader has been elected and more than half of the machines in the cluster have completed state synchronization with it, ZAB exits recovery mode. State synchronization here means data synchronization, which ensures that more than half of the machines in the cluster hold the same data state as the Leader.

When more than half of the Follower servers in the cluster have completed state synchronization with the Leader, the whole service framework enters message-broadcast mode. If a server that also speaks ZAB starts up and joins the cluster while a Leader is already broadcasting messages, the newcomer automatically enters data-recovery mode: it finds the Leader, synchronizes data with it, and then joins the message-broadcast process. As mentioned above, ZooKeeper is designed so that only a single Leader server handles transaction requests. When the Leader receives a transaction request from a client, it generates the corresponding transaction proposal and initiates a round of the broadcast protocol; if any other machine in the cluster receives a transaction request from a client, that non-Leader server first forwards the request to the Leader.
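
A heavily simplified sketch of this flow, with placeholder methods standing in for the real network I/O (illustrative pseudocode in Java form, not ZooKeeper’s real classes): non-Leader servers forward writes to the Leader, which commits a proposal once a strict majority of the ensemble has acknowledged it.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class BroadcastSketch {
    final int ensembleSize;   // total servers in the cluster
    final boolean isLeader;

    BroadcastSketch(int ensembleSize, boolean isLeader) {
        this.ensembleSize = ensembleSize;
        this.isLeader = isLeader;
    }

    void handleWrite(byte[] request) {
        if (!isLeader) {
            forwardToLeader(request); // non-Leader servers never process writes themselves
            return;
        }
        // The Leader turns the request into a proposal and broadcasts it;
        // it counts itself as the first acknowledgement.
        AtomicInteger acks = new AtomicInteger(1);
        int majority = ensembleSize / 2 + 1;
        broadcastProposal(request, () -> {
            if (acks.incrementAndGet() == majority) {
                commit(request); // commit exactly once, on reaching a strict majority
            }
        });
    }

    // Placeholders standing in for real network I/O in this sketch.
    void forwardToLeader(byte[] request) { }
    void broadcastProposal(byte[] request, Runnable onAck) { }
    void commit(byte[] request) { }
}
```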