1. ZooKeeper Overview
ZooKeeper is an open source distributed coordination service maintained by the Apache Software Foundation. It implements functions that distributed systems commonly need, such as publish/subscribe, load balancing, naming service, distributed coordination/notification, cluster management, Master election, distributed locks, and distributed queues. It has the following features:
- Sequential consistency: Transaction requests from the same client are applied to ZooKeeper strictly in the order in which they were sent.
- Atomicity: The result of each transaction request is the same on every machine in the cluster. A transaction is either applied on all machines or on none; it is never applied on only part of the cluster.
- Single view: A client sees the same server-side data model regardless of which server it connects to.
- Reliability: Once a server has successfully applied a transaction, the resulting changes persist until another transaction changes them.
- Real-time (timeliness): Once a transaction has been successfully applied, ZooKeeper guarantees that clients can read the latest state within a bounded period of time.
2. Design Objectives of ZooKeeper
ZooKeeper aims to provide a high-performance, highly available distributed coordination service with strictly ordered access for large, high-throughput distributed systems. It has the following four objectives:
2.1 Objective 1: Simple data model
ZooKeeper stores data in a tree structure made up of data nodes called ZNodes, similar to a common file system. Unlike a common file system, however, ZooKeeper keeps all data in memory to achieve high throughput and low access latency.
2.2 Objective 2: Build a cluster
A ZooKeeper cluster is formed by a group of ZooKeeper servers. Each machine in the cluster maintains the server state in its own memory, and the machines communicate with one another. As long as more than half of the machines in the cluster are working properly, the cluster as a whole can serve requests normally.
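The availability rule for a cluster is a strict-majority check: more than half of the servers must be up. The following minimal Python sketch makes that arithmetic explicit. The function names `has_quorum` and `max_tolerated_failures` are hypothetical and are not part of any ZooKeeper API.

```python
def has_quorum(alive: int, total: int) -> bool:
    """Return True if strictly more than half of `total` servers are alive."""
    if total <= 0 or not 0 <= alive <= total:
        raise ValueError("expected total > 0 and 0 <= alive <= total")
    return alive * 2 > total


def max_tolerated_failures(total: int) -> int:
    """Largest number of servers that can fail while a strict majority survives."""
    if total <= 0:
        raise ValueError("expected total > 0")
    return (total - 1) // 2
```

Under this rule a 5-server cluster and a 6-server cluster both tolerate two failures, which is one reason clusters are usually deployed with an odd number of servers.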
2.3 Objective 3: Sequential access
For each update request from a client, ZooKeeper assigns a globally unique, monotonically increasing ID that reflects the order of all transaction requests.
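In ZooKeeper this ID is the 64-bit ZXID, whose high 32 bits hold the Leader epoch and whose low 32 bits hold a counter that increases with each transaction. The sketch below shows why comparing two such IDs as plain integers gives the global order. The helper names `make_zxid` and `split_zxid` are made up for illustration.

```python
from typing import Tuple

COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch (high 32 bits) and a counter (low 32 bits) into one ID."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    if not 0 <= counter <= COUNTER_MASK:
        raise ValueError("counter must fit in 32 bits")
    return (epoch << COUNTER_BITS) | counter


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Unpack an ID into (epoch, counter)."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK
```

Because the epoch occupies the high bits, every ID issued under a newer Leader is larger than any ID issued under an older one, whatever the counter values are.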
2.4 Objective 4: High performance and high availability
ZooKeeper keeps all data in memory to maintain high performance and achieves high availability by running as a cluster. Because every update and delete is processed as a transaction that must be coordinated across the cluster, while reads are served from memory, ZooKeeper performs best in workloads with many reads and few writes.
3. Core Concepts
3.1 Cluster Roles
Machines in a ZooKeeper cluster take one of the following roles:
- Leader: Provides read and write services for clients and maintains the cluster state. The Leader is elected by the cluster.
- Follower: Provides read and write services for clients (write requests are forwarded to the Leader) and periodically reports its node status to the Leader. It also takes part in the "more than half of writes succeed" policy and in Leader election.
- Observer: Provides read and write services for clients (write requests are forwarded to the Leader) and periodically reports its node status to the Leader. However, it takes part in neither the "more than half of writes succeed" policy nor Leader election. Observers can therefore improve the read performance of the cluster without affecting write performance.
3.2 Session
A ZooKeeper client connects to the cluster over a long-lived TCP connection. A Session is established when the client first connects, and a heartbeat mechanism then keeps the Session valid. Over this connection, the client sends requests and receives responses, and also receives notifications of Watch events.
Another core concept for sessions is the session timeout (sessionTimeout). When the connection is broken, whether by a network fault or because the client disconnected, the previously created session remains valid as long as the client reconnects within the timeout period.
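The timeout rule can be illustrated with a toy model in which time is passed in explicitly, so the behavior is deterministic. The `Session` class below is a hypothetical sketch, not the real client API: a session survives a gap in contact only if the next heartbeat arrives within the timeout.

```python
class Session:
    """Toy model of a client session with a timeout (not the real client API)."""

    def __init__(self, timeout: float, now: float) -> None:
        self.timeout = timeout
        self.last_heard = now  # time of the last heartbeat or request
        self.expired = False

    def check(self, now: float) -> None:
        """Expire the session once the silence exceeds the timeout."""
        if now - self.last_heard > self.timeout:
            self.expired = True

    def heartbeat(self, now: float) -> bool:
        """Record activity; return False if the session has already expired."""
        self.check(now)
        if self.expired:
            return False
        self.last_heard = now
        return True
```

For example, with a timeout of 10, a client that reconnects 9 time units after its last heartbeat keeps its session, while one that reconnects after 16 finds it expired.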
3.3 Data Nodes
The ZooKeeper data model is a tree of basic data units (ZNodes) whose root node is /. Each node stores its own data and node information. ZooKeeper nodes are classified into two types:
- Persistent node: Once created, the node persists until it is explicitly deleted.
- Temporary node (called an ephemeral node in the ZooKeeper documentation): The node is bound to the session of the client that created it. When that session expires, all temporary nodes created by the client are deleted.
Both temporary and persistent nodes can carry a special attribute, SEQUENTIAL, which marks the node as sequential. If this attribute is specified when the node is created, ZooKeeper automatically appends an increasing number, maintained by the parent node, to the node name.
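The node types can be sketched with a small in-memory model. `ToyZNodeStore` and its methods are hypothetical names for illustration only. The sketch assumes one counter per parent and the 10-digit zero-padded suffix that ZooKeeper appends to sequential node names, and it ignores details such as node data and the rule that temporary nodes cannot have children.

```python
from typing import Dict, Optional


class ToyZNodeStore:
    """In-memory toy model of persistent, temporary, and SEQUENTIAL nodes."""

    def __init__(self) -> None:
        # path -> owning session id (None means the node is persistent)
        self.nodes: Dict[str, Optional[int]] = {"/": None}
        # parent path -> next sequence number handed out under that parent
        self.next_seq: Dict[str, int] = {}

    def create(self, path: str, owner: Optional[int] = None,
               sequential: bool = False) -> str:
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if sequential:
            seq = self.next_seq.get(parent, 0)
            self.next_seq[parent] = seq + 1
            path = f"{path}{seq:010d}"  # append the zero-padded counter
        if path in self.nodes:
            raise FileExistsError(path)
        self.nodes[path] = owner
        return path

    def expire_session(self, session_id: int) -> None:
        """Delete every temporary node owned by the expired session."""
        for path in [p for p, o in self.nodes.items() if o == session_id]:
            del self.nodes[path]
```

Creating `/app/task-` twice with `sequential=True` yields `/app/task-0000000000` and `/app/task-0000000001`, and a node created with `owner=42` disappears when `expire_session(42)` is called.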
3.4 Node Information
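While storing data, each ZNode also maintains a data structure called Stat, which holds all state information about the node. This includes the transaction IDs of the node's creation and last modification (czxid, mzxid), the corresponding timestamps (ctime, mtime), the version numbers of its data, children, and ACL (version, cversion, aversion), the owner session of a temporary node (ephemeralOwner), the data length (dataLength), and the number of children (numChildren).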
3.5 Watcher
Watcher is a widely used feature of ZooKeeper. It allows clients to register listeners on specified nodes for events of interest. When such an event occurs, the listener is triggered and the server pushes the event information to the client. This mechanism is central to how ZooKeeper implements distributed coordination.
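A toy model helps show how a listener behaves. The sketch below assumes the standard one-time trigger semantics of ZooKeeper watches: a registered watch fires once and must be registered again to see later changes, and the notification carries only the event type and path, not the new data. `WatchedStore` is a hypothetical class, not a client API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Optional

Watcher = Callable[[str, str], None]  # called with (event_type, path)


class WatchedStore:
    """Toy key-value store with one-shot data watches."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.watchers: DefaultDict[str, List[Watcher]] = defaultdict(list)

    def get(self, path: str, watch: Optional[Watcher] = None) -> bytes:
        """Read a node and optionally register a watch on it."""
        if watch is not None:
            self.watchers[path].append(watch)
        return self.data[path]

    def set(self, path: str, value: bytes) -> None:
        """Write a node and notify the watches registered on it."""
        self.data[path] = value
        # pop() removes the registrations, so each watch fires at most once
        for watcher in self.watchers.pop(path, []):
            watcher("NodeDataChanged", path)
```

After one `get` with a watch, the first `set` on the path triggers the callback and a second `set` does not, because the watch was consumed.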
3.6 ACL
ZooKeeper uses Access Control Lists (ACLs) to control permissions, similar to permission control in UNIX file systems. It defines the following five permissions:
- CREATE: Allows child nodes to be created.
- READ: Allows reading a node's data and listing its children.
- WRITE: Allows setting a node's data.
- DELETE: Allows child nodes to be deleted.
- ADMIN: Allows setting a node's permissions.
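The five permissions combine naturally as bit flags. The sketch below models them with Python's `IntFlag`. The bit values follow the constants in the Java client's `ZooDefs.Perms` as far as I know, and should be treated as illustrative. The `is_allowed` helper is hypothetical.

```python
from enum import IntFlag


class Perms(IntFlag):
    """Permission bits, one per ACL permission."""
    READ = 1 << 0
    WRITE = 1 << 1
    CREATE = 1 << 2
    DELETE = 1 << 3
    ADMIN = 1 << 4
    ALL = READ | WRITE | CREATE | DELETE | ADMIN


def is_allowed(granted: Perms, requested: Perms) -> bool:
    """Return True if every requested permission bit is in the granted set."""
    return (granted & requested) == requested
```

A client granted `Perms.READ | Perms.WRITE` passes a check for `Perms.READ` but fails one for `Perms.DELETE`.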
4. ZAB Protocol
4.1 ZAB protocol and data consistency
The ZAB protocol is an atomic broadcast protocol with crash-recovery support, designed specifically for ZooKeeper. Based on this protocol, ZooKeeper uses a master-slave architecture to keep the replicas in a cluster consistent.
ZooKeeper uses a single master process to receive and process all transaction requests from clients, and uses the atomic broadcast protocol to broadcast data state changes to all replica processes in the form of transaction Proposals.
The specific process is as follows:
All transaction requests must be processed by a single Leader server, which converts each request into a transaction Proposal and distributes the Proposal to all Follower servers in the cluster. Once more than half of the Follower servers have responded correctly, the Leader sends a Commit message to all Followers, asking them to commit the preceding Proposal.
4.2 Content of the ZAB Protocol
The ZAB protocol has two basic modes: crash recovery and message broadcast.
1. Crash recovery
When the service is starting up, or when the Leader server fails, ZAB enters recovery mode. A new Leader is elected by a vote that requires more than half of the machines, and the other machines then synchronize their state from the new Leader. Once more than half of the machines have completed state synchronization, ZAB leaves recovery mode and enters message broadcast mode.
2. Message broadcasting
Message broadcast in the ZAB protocol is an atomic broadcast. The Leader server generates a Proposal for each transaction request, assigns it a globally unique, increasing transaction ID (ZXID), and then broadcasts it. The specific process is as follows:
The Leader assigns a separate queue to each Follower server, places transaction Proposals in the queue in order, and sends them according to a FIFO (first-in, first-out) policy. After receiving a Proposal, a Follower writes it to its local disk as a transaction log entry and, once the write succeeds, sends an Ack response to the Leader. After receiving Ack responses from more than half of the Followers, the Leader broadcasts a Commit message to all Followers telling them to commit the transaction, and then completes the transaction itself. Each Follower completes the transaction after receiving the Commit message.
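The Proposal, Ack, and Commit steps can be traced in a single-process simulation. The sketch below is a toy model under stated assumptions, not the real ZAB implementation. The `Leader` and `Follower` classes are hypothetical, the "disk" is a Python list, there is no networking, and the Leader counts its own vote toward a majority of the whole ensemble.

```python
from collections import deque
from typing import Deque, Dict, List


class Follower:
    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.log: List[int] = []        # proposals written to the "transaction log"
        self.committed: List[int] = []  # transactions applied after a Commit

    def on_proposal(self, zxid: int) -> bool:
        """Log the proposal and return True as the Ack (False if unreachable)."""
        if not self.healthy:
            return False
        self.log.append(zxid)
        return True

    def on_commit(self, zxid: int) -> None:
        """Apply a transaction that was previously logged."""
        if self.healthy and zxid in self.log:
            self.committed.append(zxid)


class Leader:
    def __init__(self, followers: List[Follower], epoch: int = 1) -> None:
        self.followers = followers
        self.epoch = epoch
        self.counter = 0
        self.committed: List[int] = []
        # one FIFO queue per Follower, as described above
        self.queues: Dict[str, Deque[int]] = {f.name: deque() for f in followers}

    def propose(self) -> bool:
        """Broadcast one transaction; return True if it was committed."""
        self.counter += 1
        zxid = (self.epoch << 32) | self.counter
        for queue in self.queues.values():
            queue.append(zxid)
        acks = 1  # the Leader's own vote (modeling assumption)
        for follower in self.followers:
            pending = self.queues[follower.name].popleft()  # oldest proposal first
            if follower.on_proposal(pending):
                acks += 1
        ensemble = len(self.followers) + 1
        if acks * 2 <= ensemble:
            return False  # no strict majority, so nothing is committed
        for follower in self.followers:
            follower.on_commit(zxid)
        self.committed.append(zxid)
        return True
```

With one healthy and one unreachable Follower, `propose()` still commits because two of three servers acknowledged. With both Followers unreachable, it returns `False` and nothing is committed.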
5. Typical Application Scenarios of ZooKeeper
5.1 Publish/subscribe data
A data publish/subscribe system is often used as a configuration center. A distributed system may have tens of thousands of service nodes. With that many nodes, changing the configuration of every service by hand, one node at a time, is impractical, so a unified configuration center should be part of the design. The publisher then only needs to send the new configuration to the configuration center, and all service nodes automatically fetch and apply it. In this way, configuration is managed centrally and updated dynamically.
ZooKeeper implements data publish/subscribe with the Watcher mechanism. All service nodes in a distributed system register a listener on a ZNode. When the new configuration is written to that ZNode, every service node receives the event.
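The configuration-center pattern can be sketched as follows. `ConfigNode` and `ServiceNode` are hypothetical classes, and the sketch assumes one-shot listeners, so each service node registers its listener again every time it reads the configuration.

```python
from typing import Callable, List


class ConfigNode:
    """Toy stand-in for a ZNode that holds configuration, with one-shot listeners."""

    def __init__(self, value: str) -> None:
        self.value = value
        self._watchers: List[Callable[[], None]] = []

    def read(self, watch: Callable[[], None]) -> str:
        """Return the current value and register a listener for the next change."""
        self._watchers.append(watch)
        return self.value

    def publish(self, value: str) -> None:
        """Store a new value and notify every registered listener once."""
        self.value = value
        # swap the list first so re-registrations land in the fresh list
        watchers, self._watchers = self._watchers, []
        for notify in watchers:
            notify()


class ServiceNode:
    def __init__(self, name: str, config_node: ConfigNode) -> None:
        self.name = name
        self.config_node = config_node
        self.config = ""
        self.reload()

    def reload(self) -> None:
        # Reading re-registers the listener, so later updates are seen too.
        self.config = self.config_node.read(watch=self.reload)
```

The publisher writes once with `publish`, and every `ServiceNode` picks up the new value through its own listener without being contacted individually.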
5.2 Naming Service
Distributed systems often need globally unique names, for example to generate a globally unique order number. Using sequential nodes, ZooKeeper can generate globally unique IDs and thus provide a naming service for distributed systems.
5.3 Master Election
Master/Slaves is an important pattern in distributed systems, and ZooKeeper can be used for Master election in this pattern. All service nodes compete to create the same ZNode. Because ZooKeeper does not allow two ZNodes with the same path, only one service node succeeds in creating it, and that service node becomes the Master.
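The "compete to create the same node" idea can be demonstrated with threads racing against an in-memory store. This is a toy sketch. `ToyStore`, `run_election`, and the `/master` path are hypothetical, and a `threading.Lock` stands in for the serialization of writes that ZooKeeper performs on the server side.

```python
import threading
from typing import Dict, List


class ToyStore:
    """In-memory store whose create() succeeds for exactly one caller per path."""

    def __init__(self) -> None:
        self._nodes: Dict[str, str] = {}
        self._mutex = threading.Lock()  # stands in for serialized writes

    def create(self, path: str, data: str) -> bool:
        with self._mutex:
            if path in self._nodes:
                return False  # the path already exists, so creation fails
            self._nodes[path] = data
            return True

    def get(self, path: str) -> str:
        with self._mutex:
            return self._nodes[path]


def run_election(store: ToyStore, candidates: List[str],
                 path: str = "/master") -> List[str]:
    """Let every candidate race to create `path`; return those that succeeded."""
    winners: List[str] = []

    def campaign(name: str) -> None:
        if store.create(path, name):
            winners.append(name)

    threads = [threading.Thread(target=campaign, args=(c,)) for c in candidates]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return winners
```

Which candidate wins depends on thread scheduling, but the returned list always contains exactly one name, and that name is stored at the election path.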
5.4 Distributed Lock
Distributed locks can be implemented with ZooKeeper's temporary nodes and Watcher mechanism. An exclusive lock is used as an example here:
All service nodes in a distributed system compete to create the same temporary ZNode. Because ZooKeeper does not allow two ZNodes with the same path, only one service node succeeds, and that node is considered to hold the lock. The service nodes that failed to acquire the lock register listeners on the ZNode so that they can compete for the lock again when it is released. The lock can be released in the following two ways:
- After the business logic has finished, the client deletes the temporary ZNode, which releases the lock.
- If the client holding the lock goes down, the temporary ZNode is deleted automatically, and the lock is considered released.
When the lock is released, the other service nodes compete to create the ZNode again. Only one service node can hold the lock at any time, which is why it is called an exclusive lock.
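The lock life cycle described above, including both release paths, can be modeled in a few lines. `LockService` is a hypothetical toy class. It represents the temporary ZNode by a single `holder` field and assumes one-shot listeners, so a waiter that loses the retry has to register again.

```python
from typing import Callable, List, Optional


class LockService:
    """Toy model of an exclusive lock built on one temporary node."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None             # session that owns the node
        self._waiters: List[Callable[[], None]] = []  # one-shot "node deleted" listeners

    def try_acquire(self, session: str) -> bool:
        """Try to create the node; only one session can succeed."""
        if self.holder is None:
            self.holder = session
            return True
        return False

    def watch_release(self, callback: Callable[[], None]) -> None:
        """Register a listener that fires when the node is deleted."""
        self._waiters.append(callback)

    def release(self, session: str) -> None:
        """Release path 1: the holder deletes the node itself."""
        if self.holder != session:
            raise PermissionError(f"{session} does not hold the lock")
        self._delete_node()

    def expire_session(self, session: str) -> None:
        """Release path 2: the holder's session expires, deleting the node."""
        if self.holder == session:
            self._delete_node()

    def _delete_node(self) -> None:
        self.holder = None
        waiters, self._waiters = self._waiters, []
        for notify in waiters:
            notify()
```

If session A holds the lock and B and C are waiting, expiring A's session notifies both waiters, and whichever retries first becomes the new holder.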
5.5 Cluster Management
ZooKeeper also solves cluster management problems that most distributed systems share:
- Temporary nodes can serve as a heartbeat detection mechanism. If a service node of the distributed system goes down, the session it holds times out, its temporary node is deleted, and the corresponding listener event fires.
- Each service node can also write its own state to its temporary node to report its status or the progress of its work.
- Through data publish/subscribe, ZooKeeper can also decouple modules and schedule tasks in a distributed system.
- Through the listener mechanism, service nodes can be brought online and taken offline dynamically, which allows services to scale dynamically.
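Membership tracking and status reporting through temporary nodes can be sketched as a small registry. `ClusterRegistry` is a hypothetical toy class. For brevity its listeners stay registered across events, which simplifies the one-shot listener behavior.

```python
from typing import Callable, Dict, List

Listener = Callable[[str, str], None]  # called with (event, node name)


class ClusterRegistry:
    """Toy model of cluster membership tracked through temporary nodes."""

    def __init__(self) -> None:
        self.members: Dict[str, str] = {}  # node name -> last reported status
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def join(self, name: str, status: str = "starting") -> None:
        """A service node comes online and creates its temporary node."""
        self.members[name] = status
        self._notify("joined", name)

    def report(self, name: str, status: str) -> None:
        """A live service node writes its status or progress to its node."""
        if name not in self.members:
            raise KeyError(name)
        self.members[name] = status

    def session_expired(self, name: str) -> None:
        """The node's session timed out, so its temporary node is removed."""
        if self.members.pop(name, None) is not None:
            self._notify("left", name)

    def _notify(self, event: str, name: str) -> None:
        for listener in self._listeners:
            listener(event, name)
```

A monitoring component that subscribes to the registry sees a `joined` event when a worker comes online and a `left` event when its session expires, without polling the workers directly.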
Original link: https://www.cnblogs.com/danrenying/p/11073414.html