ZooKeeper is a distributed coordination service.
I. Zookeeper application scenarios
ZooKeeper suits workloads that read far more often than they write:

- Unified naming service
- Unified configuration management
- Distributed cluster management (service registry)
- Distributed locks
- Load balancing
II. Internal structure of Zookeeper
The ZooKeeper namespace is similar to a Unix file system. Each entry in the tree, identified by its path, is called a znode. As in a file system, we can create and delete znodes freely and nest child znodes under a znode; the key difference is that a znode can also store data.
Each znode in the ZooKeeper data model maintains a stat structure, which provides metadata consisting of version numbers, an access control list (ACL), timestamps, and the data length.
- Version number: incremented whenever the data associated with the znode changes
- Access control list (ACL): essentially the authentication mechanism for accessing a znode; it governs all read and write operations on the znode
- Timestamps: record when the znode was created and last modified; ZooKeeper identifies each change to a znode by its transaction ID (zxid)
- Data length: a znode can store at most 1 MB of data
Znode types:

- Persistent sequential
- Persistent (non-sequential)
- Ephemeral sequential
- Ephemeral (non-sequential)
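In the Java client, these four types correspond to the CreateMode constant passed to create(); a quick mapping, assuming the standard org.apache.zookeeper client:

```java
import org.apache.zookeeper.CreateMode;

public class ZnodeTypes {
    // The four znode types map onto ZooKeeper's CreateMode constants.
    static final CreateMode[] MODES = {
        CreateMode.PERSISTENT,            // survives the session, fixed name
        CreateMode.PERSISTENT_SEQUENTIAL, // survives the session, sequence suffix appended
        CreateMode.EPHEMERAL,             // deleted when the creating session ends
        CreateMode.EPHEMERAL_SEQUENTIAL   // ephemeral, plus an auto-appended sequence suffix
    };
}
```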
III. Primitives provided by Zookeeper
- create /path data: creates a znode named /path holding data (fails if the znode already exists)
- delete /path: deletes the znode named /path
- exists /path: checks whether a znode named /path exists
- setData /path data: sets the data of the znode named /path to data
- getData /path: returns the data of the znode named /path
- getChildren /path: returns the list of children of the znode /path
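A minimal Java sketch exercising these primitives through the official client; the connection string, session timeout, and /demo path are illustrative assumptions:

```java
import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;
import java.util.List;

public class ZkPrimitivesDemo {
    public static void main(String[] args) throws Exception {
        // Connection string and session timeout are assumptions for this sketch.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});

        // create: fails with NodeExistsException if /demo already exists
        zk.create("/demo", "v1".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT);

        // exists: returns the stat structure, or null if the znode is absent
        Stat stat = zk.exists("/demo", false);

        // getData / setData: the expected version guards against lost updates
        byte[] data = zk.getData("/demo", false, stat);
        zk.setData("/demo", "v2".getBytes(), stat.getVersion());

        // getChildren: lists direct children only (not the whole subtree)
        List<String> children = zk.getChildren("/demo", false);
        System.out.println(new String(data) + " children=" + children);

        // delete: also version-checked; -1 skips the check
        zk.delete("/demo", -1);
        zk.close();
    }
}
```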
IV. Zookeeper distributed lock
- Client client_1 calls create() to make an ephemeral sequential node node_1 under the /locker node, attempting to acquire the lock
- The client calls getChildren("/locker") to list all child nodes under /locker and registers a Watcher on /locker for child-change notifications
- If the node created in step 1 has the lowest sequence number among all children, the client is considered to hold the lock
- If in step 3 the client finds it is not the smallest child, it has not obtained the lock; it waits for the next child-change notification, fetches the children again, and re-checks whether it now holds the lock
- To release the lock, the client simply deletes the child node it created
The scheme above has a problem: with many clients contending for the lock, every time the smallest node is deleted, all the waiting clients are notified, even though only one of them can actually acquire the lock. This flood of useless notifications is the herd effect. The fix is for each node to watch only the node whose sequence number immediately precedes its own.
The improved distributed lock:

- Client client_1 calls create() to make an ephemeral sequential node node_1 under the /locker node, attempting to acquire the lock
- The client calls getChildren("/locker") to list all child nodes under /locker and registers a Watcher only on the node immediately preceding its own
- If the node created in step 1 has the lowest sequence number among all children, the client is considered to hold the lock
- If in step 3 the client finds it is not the smallest child, it has not obtained the lock; it waits for its watched predecessor to be deleted, then fetches the children again and re-checks
- client_1 holds the lock, client_2 watches node_1, and client_3 watches node_2, forming a wait queue, somewhat like Java's AQS
- Releasing the lock means deleting the node you created; only the immediate successor is notified, and it checks whether it is now the smallest and so acquires the lock (see the sketch below)
A distributed lock is easy to implement on ZooKeeper, but creating and deleting znodes on every acquisition and release makes it relatively slow.
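A minimal Java sketch of the improved lock, assuming a pre-existing /locker parent node and omitting the session-expiry and connection-loss handling a production implementation (such as Apache Curator's InterProcessMutex) would need:

```java
import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class ZkLock {
    private final ZooKeeper zk;
    private String myNode; // e.g. /locker/lock-0000000003

    public ZkLock(ZooKeeper zk) { this.zk = zk; }

    public void lock() throws Exception {
        // Ephemeral: the node vanishes if this client crashes; sequential: queue order.
        myNode = zk.create("/locker/lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        while (true) {
            List<String> children = zk.getChildren("/locker", false);
            Collections.sort(children);
            int idx = children.indexOf(myNode.substring("/locker/".length()));
            if (idx == 0) return; // lowest sequence number: we hold the lock
            // Watch only the immediate predecessor to avoid the herd effect.
            String prev = "/locker/" + children.get(idx - 1);
            CountDownLatch deleted = new CountDownLatch(1);
            Stat stat = zk.exists(prev, event -> deleted.countDown());
            if (stat != null) deleted.await(); // block until the predecessor goes away
        }
    }

    public void unlock() throws Exception {
        zk.delete(myNode, -1); // releasing the lock = deleting our own node
    }
}
```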
V. Zookeeper distributed cluster
1. Roles of Zookeeper:
- Leader: responsible for initiating and resolving votes and for updating system state
- Learners: the followers and observers. A Follower receives client requests, returns results to the client, and votes during leader election. An Observer accepts client connections and forwards write requests to the Leader but does not participate in voting; observers exist to improve read throughput
2. Running states of a Zookeeper server

- LOOKING: no leader is known; an election is in progress
- FOLLOWING: a leader has been elected, and the current server synchronizes with it
- LEADING: the state of the leader node
3. How Zookeeper works

ZooKeeper's core is atomic broadcast, implemented by the ZAB protocol. ZAB operates in two modes: message broadcast and crash recovery.
Message broadcast:

- The client sends a write request to a Follower
- The Follower forwards the request to the Leader
- The Leader issues a proposal and notifies the Followers to vote on it. Note: Observers do not participate
- The Followers send the results of the vote to the Leader
- The Leader tallies the results and, once a majority agrees, broadcasts a commit telling all Followers to write the data
- Each Follower sends an ACK to the Leader after its write completes; the write succeeds once the Leader receives ACKs from more than half of the servers. (This can make reads inconsistent: more than half of the servers hold the new data while the rest still hold the old. A client that needs the latest data can call sync() before reading, which commits and broadcasts the pending state; see the sketch after this list. Note: in CAP terms, ZooKeeper thus provides availability in A, partition tolerance in P, and strong write consistency in C, while giving up read consistency in C)
- The Follower returns the result of the request to the client
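A hedged sketch of that sync-before-read pattern using the Java client's asynchronous sync() call; readLatest is an illustrative helper, not part of the ZooKeeper API:

```java
import org.apache.zookeeper.ZooKeeper;
import java.util.concurrent.CountDownLatch;

public class SyncRead {
    // Ask the server this client is connected to to catch up with the
    // Leader before reading, trading a little latency for fresher data.
    static byte[] readLatest(ZooKeeper zk, String path) throws Exception {
        CountDownLatch synced = new CountDownLatch(1);
        zk.sync(path, (rc, p, ctx) -> synced.countDown(), null);
        synced.await();
        return zk.getData(path, false, null);
    }
}
```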
To keep transactions ordered, ZooKeeper identifies each proposal with a monotonically increasing zxid. The zxid is a 64-bit number: the high 32 bits are the epoch, which marks the Leader's reign and is regenerated each time a new Leader is elected, while the low 32 bits are a counter incremented for each proposal.
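The packing is plain bit arithmetic, along the lines of the server's internal ZxidUtils helpers:

```java
public class Zxids {
    // A zxid packs the Leader's epoch into the high 32 bits and a
    // per-epoch transaction counter into the low 32 bits.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xFFFFFFFFL);
    }

    static long epochOf(long zxid)   { return zxid >>> 32; }
    static long counterOf(long zxid) { return zxid & 0xFFFFFFFFL; }
}
```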
Zookeeper leader election:

The first election after startup works as follows. Suppose the cluster has five servers. Server1 starts and votes for itself to become Leader, but one vote is not a majority. Server2 starts; the two servers exchange votes, and since all zxids are equal on a fresh cluster, both end up voting for Server2, which is still short of a majority. Server3 starts and collects three of the five votes, a majority, so Server3 is elected Leader. When Server4 and Server5 then connect, a Leader already exists, so they will not be elected even if their zxids are the newest.
The Zookeeper election algorithm has two modes:

- Basic Paxos
- Fast Paxos (the default)
Election process:

- The election thread is started by the server initiating the election; its main job is to tally the voting results and pick the recommended server
- During the election phase, all nodes in the cluster are in the LOOKING state and send their votes to the other nodes; a vote contains (server id, zxid)
- Each node also receives the votes of the other nodes; it compares those zxids with its own and re-votes for the candidate with the newest zxid (a comparison sketch follows below)
- The election thread tallies the votes; the node that receives more than half of them is set as Leader, its state changes to LEADING, and the state of the other nodes changes to FOLLOWING
- Suppose the previous Leader did not die but was merely cut off by a network problem, and communication is now restored: two Leaders would coexist, a split brain. This is what the discovery phase prevents. The new Leader collects the zxids of all Followers, selects the maximum, increments the epoch (the high 32 bits) by 1 based on it, and broadcasts the newly generated zxid to all Followers. Each Follower acknowledges the new epoch to the Leader together with its largest zxid and its transaction history; the Leader selects the largest zxid and updates its own history log. Because the old Leader's epoch is now stale, its proposals are rejected, and split brain is avoided
- Synchronization phase: the Leader syncs the latest transaction log to all Followers, and once more than half of the Followers have synchronized successfully, the Leader begins its reign
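A hedged sketch of the vote-comparison rule, loosely modeled on totalOrderPredicate in the server's FastLeaderElection (the newer zxid wins, and the larger server id breaks ties):

```java
public class VoteCompare {
    // Returns true if the incoming vote (newZxid, newId) should replace
    // the current vote (curZxid, curId): a newer zxid wins outright,
    // and the larger server id breaks a tie between equal zxids.
    static boolean voteWins(long newZxid, long newId, long curZxid, long curId) {
        return newZxid > curZxid || (newZxid == curZxid && newId > curId);
    }
}
```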