The Apache ZooKeeper project develops and maintains an open-source server that enables highly reliable distributed coordination.

Why distributed coordination is hard:

  1. Distributed coordination algorithms are complex and difficult to implement correctly.
  2. Resource contention and deadlock are more likely to occur in distributed systems.
  3. Having every application implement its own coordination makes deployment difficult.

1. Introduction

1.1 Overview

  • Zookeeper is a high-performance, distributed, open source collaboration service.
  • It provides a set of simple primitives on which distributed applications can build higher-level services such as data publication/subscription, load balancing, naming, distributed coordination/notification, cluster management, leader election, distributed locks, and distributed queues.

1.2 Origin

  • Zookeeper was originally developed by a research group at Yahoo Research. The researchers found that many large systems at Yahoo relied on similar ad-hoc systems for distributed coordination, and that these systems often suffered from single points of failure. So Yahoo's developers set out to build a general-purpose distributed coordination framework with no single point of failure, letting application developers focus on business logic.

  • There is an interesting story behind the name. In the project's early days, Yahoo engineers wanted to name it after an animal, since many internal projects were already named after animals (the famous Pig project, for example). Raghu Ramakrishnan, the chief scientist of Yahoo Research at the time, joked: "At this rate, we'll be a zoo!" Someone then observed that with so many distributed components named after animals, Yahoo's distributed systems already looked like a big zoo, and since the new project was meant to coordinate them, the name ZooKeeper was born.

1.3 Features

  • High availability: there is no single point of failure; the service remains available as long as a majority of nodes are up (a cluster of 2n+1 nodes tolerates up to n failures).
  • Sequential consistency: Updates from the client are written to ZooKeeper in the order they are sent.
  • Atomicity: Update operations either succeed or fail, with no intermediate state;
  • Single system image: no matter which ZooKeeper server a client connects to, it sees the same view of the service and never reads state older than what it has already seen.
  • Timeliness: ZooKeeper guarantees that a client will see an update, or learn of a server failure, within a bounded interval. Because of network delay, however, it cannot guarantee that two clients observe a newly written value at the same instant; a client that needs the latest data should call sync() before reading.

1.4 Overall Architecture

The Zookeeper service can be deployed on a single node or as a cluster (a cluster of 2n+1 servers tolerates n failures).

  • Each server keeps a full copy of the data in memory and maintains the current service state; the servers communicate with one another.
  • When the cluster starts, the servers elect a Leader among themselves (using the leader-election part of the ZAB protocol).
  • The Leader provides the write and read services, and the Follower provides the read services.
  • Zookeeper Atomic Broadcast (Zab) ensures data consistency between clusters.
  • An update succeeds if and only if most servers successfully modify data in memory;
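The majority rule above can be sketched as a tiny helper (hypothetical function names, for illustration only):

```python
def quorum(n_servers: int) -> int:
    """Smallest number of servers that forms a majority."""
    return n_servers // 2 + 1

def update_succeeds(acks: int, n_servers: int) -> bool:
    """An update commits only when a majority of servers acknowledged it."""
    return acks >= quorum(n_servers)

# A 5-node ensemble needs 3 acknowledgements and tolerates 2 failed nodes.
print(quorum(5))             # 3
print(update_succeeds(3, 5)) # True
print(update_succeeds(2, 5)) # False
```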

1.5 Roles

Zookeeper defines three roles: Leader, Follower, and Observer. The Leader provides both read and write services; Followers and Observers provide only read services. Observers additionally take no part in elections or write quorums, so they can scale read throughput without slowing down writes.

1.6 Data Model

Zookeeper uses a multi-level directory structure, similar to the data structure of a file system. The Zookeeper structure consists of nodes, and data storage is also based on nodes. This node is called ZNode. ZNode data is kept in memory, which means Zookeeper can achieve high throughput and low latency.

  • Data: Data information stored in a ZNode.
  • ACL: records the access permissions of the ZNode, that is, which users and which IP addresses may access the node.
  • Stat: Contains various metadata for ZNode, such as transaction ID, version number, timestamp, size, and so on.
  • Child: The Child reference of the current node, similar to the left Child and right Child of a binary tree.
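The four parts listed above can be pictured with a toy in-memory model (hypothetical Python classes; the real implementation is in Java):

```python
class ZNode:
    """Toy model of a ZNode: data, ACL, stat metadata, and child references."""
    def __init__(self, data=b""):
        self.data = data                   # Data: content stored in the node
        self.acl = ["world:anyone:cdrwa"]  # ACL: open permissions by default
        self.stat = {"dataVersion": 0}     # Stat: simplified metadata
        self.children = {}                 # Child: name -> ZNode references

def create(root: ZNode, path: str, data: bytes) -> ZNode:
    """Create a node at a slash-separated path such as /app/config."""
    parts = path.strip("/").split("/")
    node = root
    for name in parts[:-1]:
        node = node.children[name]         # parent nodes must already exist
    child = ZNode(data)
    node.children[parts[-1]] = child
    return child

root = ZNode()
create(root, "/app", b"")
create(root, "/app/config", b"timeout=30")
print(root.children["app"].children["config"].data)  # b'timeout=30'
```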

2. Installation

2.1 Preparation

  • In cluster deployment, prepare three servers with IP addresses 192.168.100.101, 192.168.100.102, and 192.168.100.103

  • Map host names to the IP addresses, so that a later IP address change only requires updating the mapping rather than reconfiguring and restarting ZooKeeper.

2.2 Download

Download ZooKeeper from the official Apache site: zookeeper.apache.org/releases.ht…

2.3 Modifying The Configuration

  • Extract the tar.gz archive and copy conf/zoo_sample.cfg to conf/zoo.cfg
  • The configuration description (with notes) is as follows:
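The original configuration listing did not survive; a minimal zoo.cfg for the three servers above might look like the following (the dataDir path is an assumed example):

```properties
# Length of one tick in ms; the basic time unit used by ZooKeeper
tickTime=2000
# Ticks a follower may take to connect and sync with the leader at startup
initLimit=10
# Ticks allowed between a request and its acknowledgement (follower <-> leader)
syncLimit=5
# Directory for snapshots (the myid file also lives here)
dataDir=/data/zookeeper
# Port on which clients connect
clientPort=2181
# Cluster members: server.<myid>=<host>:<peer-port>:<election-port>
server.1=192.168.100.101:2888:3888
server.2=192.168.100.102:2888:3888
server.3=192.168.100.103:2888:3888
```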

2.4 Creating a Data Directory

  • Create a data directory for ZooKeeper

  • In the dataDir directory, create a file named myid and write this machine's numeric server id into it (for example, 1 on the first server); ZooKeeper uses it to identify each cluster member

2.5 Starting the Service

Start the service on each machine with bin/zkServer.sh start. Note: repeat steps 2.1 to 2.5 to install Zookeeper on the other two machines.

2.6 Viewing service Status

Use bin/zkServer.sh status to check the Zookeeper service status and the role (leader or follower) of each node.

Note: 1. Start Zookeeper in the order zoo-1 -> zoo-2 -> zoo-3. 2. When zoo-1 starts, exceptions appear in its logs because the other cluster members are not yet up; these exceptions can be ignored.

3. Usage

3.1 Scripts

Zookeeper ships with administration scripts such as bin/zkServer.sh (start, stop, status) and bin/zkCli.sh (an interactive command-line client).

3.2 Commands

Inside zkCli.sh, common commands include create, ls, get, set, delete, and stat.

3.3 Four-letter Commands

Monitoring commands such as ruok, stat, conf, and mntr can be sent directly to the client port, for example: echo ruok | nc localhost 2181.

3.4 Client API

The client library provides a rich and intuitive API; some commonly used calls are listed below:

  • create(path, data, flags): creates a ZNode, where path is the path and data is the data to store on it; flags selects the node type: PERSISTENT, PERSISTENT_SEQUENTIAL, EPHEMERAL, or EPHEMERAL_SEQUENTIAL
  • delete(path, version): deletes a ZNode; version guards against concurrent modification, and -1 matches any version (unconditional delete)
  • exists(path, watch): checks whether the specified ZNode exists and optionally watches it. The boolean form uses the default watcher specified when the ZooKeeper instance was created; to register a specific watcher, call the overloaded exists(path, watcher). The other calls that take a watch parameter behave the same way
  • getData(path, watch): reads the data of the specified ZNode and optionally watches it
  • setData(path, data, version): updates the data of the specified ZNode; version works as in delete
  • getChildren(path, watch): returns the list of child node names of the specified ZNode and optionally watches it
  • sync(path): waits until all updates pending at the time of the call have propagated, so that a subsequent read observes every update accepted by a majority of servers; the path parameter is currently ignored
  • setAcl(path, acl): sets the ACL of the specified ZNode
  • getAcl(path): gets the ACL of the specified ZNode
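The version semantics of delete and setData can be illustrated with a toy in-memory stand-in (a hypothetical MockZk class; the real client talks to a live ensemble):

```python
class MockZk:
    """Toy in-memory stand-in that mimics the version checks of the API."""
    def __init__(self):
        self.store = {}  # path -> (data, version)

    def create(self, path, data):
        if path in self.store:
            raise ValueError("node exists: " + path)
        self.store[path] = (data, 0)

    def exists(self, path):
        return path in self.store

    def get_data(self, path):
        return self.store[path]             # (data, version)

    def set_data(self, path, data, version):
        _, cur = self.store[path]
        if version not in (-1, cur):        # -1 means "match any version"
            raise ValueError("bad version")
        self.store[path] = (data, cur + 1)

    def delete(self, path, version=-1):
        _, cur = self.store[path]
        if version not in (-1, cur):
            raise ValueError("bad version")
        del self.store[path]

zk = MockZk()
zk.create("/cfg", b"a")
zk.set_data("/cfg", b"b", 0)  # version check passes, bumps version to 1
print(zk.get_data("/cfg"))     # (b'b', 1)
zk.delete("/cfg", -1)          # -1 deletes regardless of version
print(zk.exists("/cfg"))       # False
```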

4. Principles

4.1 Sessions

  • Session refers to the Session between the ZooKeeper server and the client.

  • In ZooKeeper, a client connection is a TCP long connection between a client and a server.

  • Through this connection, the client keeps the session alive with heartbeats, sends requests and receives responses, and receives Watch event notifications from the server. The sessionTimeout value sets the client session timeout: if the connection drops because of server load, network failure, or a client disconnect, the previously created session remains valid as long as the client reconnects to any server in the cluster within sessionTimeout.

  • Before the session is created, the server assigns the client a sessionID. The sessionID is an important identifier of a Zookeeper session, and many session-related mechanisms build on it, so every server must assign sessionIDs that are globally unique.
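One common way to make sessionIDs globally unique is to embed the assigning server's id in the high bits, modeled loosely on ZooKeeper's own scheme (the exact bit layout here is an assumption):

```python
import time

def init_session_id(server_id: int) -> int:
    """High byte: the id of the server that assigned the session.
    Low 56 bits: taken from a millisecond timestamp.
    Two different servers can therefore never hand out the same id."""
    ts = int(time.time() * 1000) & ((1 << 56) - 1)
    return (server_id << 56) | ts

sid = init_session_id(3)
print(sid >> 56)  # 3: the assigning server is recoverable from the id
```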

4.2 ZNodes

  • Zookeeper stores all data in memory. The data model is a ZNode tree: each slash-separated path (for example /foo/path1) identifies a ZNode. Every ZNode stores its own data content along with a set of attributes.

  • Znodes can be divided into persistent nodes and temporary nodes. A persistent node means that once a ZNode is created, it will remain on Zookeeper until the ZNode is removed. Temporary nodes, on the other hand, have a life cycle that is tied to a client session. Once the client session fails, all temporary nodes created by the client are removed.

  • ZooKeeper also allows users to add a special property to each node: SEQUENTIAL. When a node marked with this property is created, Zookeeper automatically appends to its name a monotonically increasing integer maintained by the parent node.

  • A ZNode's data is limited to 1 MB. ZNodes usually hold configuration and metadata; storing large data is not recommended.
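The SEQUENTIAL naming rule can be shown in one line: ZooKeeper appends a 10-digit, zero-padded counter (maintained by the parent) to the requested name:

```python
def sequential_name(prefix: str, counter: int) -> str:
    """Format a SEQUENTIAL child name: prefix + 10-digit zero-padded counter."""
    return "%s%010d" % (prefix, counter)

print(sequential_name("/locks/lock-", 7))  # /locks/lock-0000000007
```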

ZNode node information:

  • cZxid: the ZXID of the transaction that created this ZNode.
  • mZxid: the ZXID of the transaction that last modified this ZNode.
  • pZxid: the ZXID of the transaction that last added or removed children of this ZNode.
  • ctime: creation time, in milliseconds since 1970-01-01T00:00:00Z.
  • mtime: last modification time, in milliseconds since 1970-01-01T00:00:00Z.
  • dataVersion: the number of changes made to this ZNode's data.
  • cversion: the number of changes made to this ZNode's children.
  • aclVersion: the number of changes made to this ZNode's ACL.
  • ephemeralOwner: the session ID of the owner if this is an ephemeral node; zero otherwise.
  • dataLength: the length of the ZNode's data field.
  • numChildren: the number of children of this ZNode.

4.3 Watchers

Zookeeper allows clients to register Watchers (event listeners) on specified nodes. When a create, delete, or setData call changes a watched ZNode, the registered watcher is triggered and an asynchronous notification is sent to the client. Watches are one-shot: once a watcher fires, it must be registered again to receive further events.
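The one-shot behaviour can be sketched with a toy node (hypothetical classes; real watch events also carry the path and event type over the session connection):

```python
class WatchedNode:
    """Toy model of one-shot watches: a watcher fires once on the next
    change, then must be re-registered."""
    def __init__(self, data=b""):
        self.data = data
        self.watchers = []

    def watch(self, callback):
        self.watchers.append(callback)

    def set_data(self, data):
        self.data = data
        fired, self.watchers = self.watchers, []  # one-shot: clear before firing
        for cb in fired:
            cb("NodeDataChanged")

events = []
node = WatchedNode()
node.watch(events.append)
node.set_data(b"v1")  # watcher fires once
node.set_data(b"v2")  # no watcher registered any more
print(events)          # ['NodeDataChanged']
```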

4.4 ACL Permission Control

Zookeeper uses access control lists (ACLs) for permission control, similar to the permission model of UNIX file systems. Zookeeper defines the following five permissions.

  • CREATE (c): create child nodes
  • DELETE (d): delete child nodes (direct children only)
  • READ (r): read node data and list children
  • WRITE (w): set node data
  • ADMIN (a): set the node's access control list

Permission mode:

  • world: the default scheme; the ID anyone stands for everyone.
  • auth: user/password authentication; run addauth to add user credentials first. All users added this way share the same permissions.
  • digest: the ACL ID has the form username:base64(sha1(username:password)); because the password digest is supplied directly, you do not need to add the user with addauth when setting the ACL.
  • ip: the client's host IP address serves as the ACL ID.

Permissions are set in scheme:id:permissions form, for example:

setAcl /test/child world:anyone:crwa
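The digest scheme's ciphertext can be produced in a few lines; this sketch follows the base64(sha1("user:password")) rule described above:

```python
import base64
import hashlib

def digest_acl_id(username: str, password: str) -> str:
    """Build a digest-scheme ACL id: username:base64(sha1(username:password))."""
    raw = ("%s:%s" % (username, password)).encode("utf-8")
    digest = base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")
    return "%s:%s" % (username, digest)

# Usable as: setAcl /test/child digest:<acl-id>:crwa
print(digest_acl_id("alice", "secret"))
```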

4.5 ZAB Atomic Broadcast

ZAB (ZooKeeper Atomic Broadcast) solves two problems for a ZooKeeper cluster: crash recovery and leader/follower data synchronization.

The ZAB protocol defines three node states:

  • Looking: election in progress; the node is searching for a Leader.
  • Following: the state of a Follower (slave) node.
  • Leading: the state of the Leader (master) node.

Maximum ZXID: a node's maximum ZXID is the id of the latest transaction it has processed.
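A ZXID is a 64-bit number whose high 32 bits hold the leader epoch and whose low 32 bits hold a per-epoch counter, which is what makes "maximum ZXID" a meaningful ordering:

```python
def make_zxid(epoch: int, counter: int) -> int:
    """Pack a ZXID: high 32 bits = leader epoch, low 32 bits = counter."""
    return (epoch << 32) | counter

def epoch_of(zxid: int) -> int:
    return zxid >> 32

z_old = make_zxid(1, 500)  # transaction 500 of epoch 1
z_new = make_zxid(2, 0)    # first transaction of epoch 2
print(z_new > z_old)        # True: a new epoch always orders after the old one
print(epoch_of(z_new))      # 2
```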

4.5.1 Recovery from collapse

If the current primary Zookeeper node fails, the cluster will perform crash recovery. ZAB’s crash recovery is divided into three phases:

  1. Leader election

  2. Discovery

The discovery phase finds the latest ZXID and transaction logs among the Followers. One might ask: since the Leader was just elected as the node with the most up-to-date data in the cluster, why look for the latest transactions on the Followers at all?

This guards against unexpected situations, such as a split-brain scenario in which network problems during the previous stage produced multiple leaders.

Therefore, in this stage the prospective Leader collects the latest epoch values from all Followers, selects the largest, adds 1 to produce the new epoch, and distributes it to every Follower.

Each Follower acknowledges the new epoch with an ACK that carries its maximum ZXID and its transaction history. The Leader selects the largest ZXID among them and brings its own history up to date.
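The epoch negotiation and log selection above boil down to two max operations (a sketch with hypothetical data shapes):

```python
def next_epoch(follower_epochs):
    """The prospective leader takes the largest accepted epoch and adds 1."""
    return max(follower_epochs) + 1

def pick_sync_source(acks):
    """acks: list of (follower_id, max_zxid) pairs from the ACK round.
    The history of the node with the largest ZXID wins."""
    return max(acks, key=lambda a: a[1])[0]

print(next_epoch([4, 5, 5, 3]))                   # 6
print(pick_sync_source([("f1", 10), ("f2", 12)])) # f2
```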

  3. Synchronization

The Leader synchronizes the latest transaction history it collected to all Followers in the cluster. Only after a majority of Followers have synchronized successfully does the prospective Leader become the official Leader; crash recovery is then complete.

4.5.2 Broadcast process

When Zookeeper updates data, the Leader broadcasts it to all followers. The process is as follows:

  1. A client sends a write request to any Follower.
  2. The Follower forwards the write request to the Leader.
  3. The Leader uses a two-phase commit: it first broadcasts a Propose message to the Followers.
  4. Each Follower writes the proposal to its transaction log and returns an ACK to the Leader.
  5. Once the Leader receives ACKs from a majority, it broadcasts a Commit to the Followers and returns success to the client.
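Steps 3-5 form a two-phase commit over a majority quorum; a toy simulation (hypothetical classes, ignoring ZXID ordering and failure handling):

```python
class Follower:
    def __init__(self):
        self.log, self.committed = [], []

    def propose(self, value):
        self.log.append(value)  # write-ahead log, then ACK
        return True

    def commit(self):
        self.committed.append(self.log[-1])

class Leader:
    """Toy two-phase broadcast: Propose -> wait for majority ACK -> Commit."""
    def __init__(self, followers):
        self.followers = followers

    def write(self, value):
        acks = sum(1 for f in self.followers if f.propose(value))  # phase 1
        if acks + 1 > (len(self.followers) + 1) // 2:  # leader counts itself
            for f in self.followers:
                f.commit()                              # phase 2
            return True
        return False

fs = [Follower(), Follower()]
ok = Leader(fs).write("x")
print(ok, fs[0].committed)  # True ['x']
```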

5. Application scenarios

5.1 Name Service

  • Distributed applications usually need a naming scheme that both generates unique names and is easy for people to recognize and remember. A tree-structured (hierarchical) namespace is an ideal choice: it is human-readable and names cannot collide.
  • Name service is a built-in capability of Zookeeper: simply invoke the Zookeeper API and call create to make a directory node for each name.

5.2 Configuration Center

  • Create a node to store configuration information
  • The client sets the watcher to listen on the node
  • The client is notified of configuration changes
  • The client fetches and applies the new configuration.

5.3 Cluster Management

  • Every machine in the cluster creates an ephemeral sequential node under the /GroupMembers directory and watches that directory.
  • The machine whose node has the smallest sequence number acts as Leader.
  • When a machine goes down, its node disappears from /GroupMembers; the surviving machines receive the watch notification, and the node with the smallest remaining number becomes the new Leader.
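The smallest-sequence-number rule is a one-liner over the child names (hypothetical node names):

```python
def elect(members):
    """Each member is a sequential node like member-0000000003;
    the one with the smallest sequence number acts as leader."""
    return min(members, key=lambda name: int(name.rsplit("-", 1)[1]))

members = ["member-0000000011", "member-0000000002", "member-0000000007"]
print(elect(members))                # member-0000000002
members.remove("member-0000000002")  # that machine went down
print(elect(members))                # member-0000000007
```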

5.4 Distributed Lock

A persistent /Locks node already exists; every client creates an ephemeral sequential node under it, just as in the cluster-management election above. The client whose node has the smallest sequence number holds the lock; each other client watches the node immediately preceding its own and re-checks when that node disappears.
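The resulting lock protocol can be sketched as a pure function over the child names (hypothetical names; a real client would create its node, call this check, and watch the returned predecessor):

```python
def holds_lock(my_node, all_nodes):
    """A client holds the lock when its node has the smallest sequence
    number under /Locks; otherwise it should watch its predecessor."""
    ordered = sorted(all_nodes, key=lambda n: int(n.rsplit("-", 1)[1]))
    if ordered[0] == my_node:
        return True, None
    prev = ordered[ordered.index(my_node) - 1]
    return False, prev  # watch this node; retry when it disappears

nodes = ["lock-0000000001", "lock-0000000002", "lock-0000000003"]
print(holds_lock("lock-0000000001", nodes))  # (True, None)
print(holds_lock("lock-0000000003", nodes))  # (False, 'lock-0000000002')
```

Watching only the immediate predecessor (instead of the whole directory) avoids the herd effect: when the lock holder releases, only one waiting client is woken up.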