What is ZooKeeper
ZooKeeper is a top-level Apache project that provides efficient and highly available distributed coordination services for distributed applications, offering infrastructure services such as data publishing/subscription, load balancing, naming services, distributed coordination/notification, and distributed locks. Thanks to its convenient usage, excellent performance, and good stability, ZooKeeper is widely used in large distributed systems such as Hadoop, HBase, Kafka, and Dubbo.
ZooKeeper supports three operating modes: standalone deployment, pseudo-cluster deployment, and cluster deployment.
- Standalone mode: generally suitable for development and test environments, partly because machine resources are limited there and partly because ordinary development and debugging does not require strong availability guarantees.
- Cluster mode: a ZooKeeper cluster usually consists of a group of machines; in general, three or more machines form a usable ZooKeeper cluster. Each machine in the cluster maintains the current server state in memory, and the machines communicate with one another.
- Pseudo-cluster mode: a special cluster mode in which all servers of the cluster are deployed on a single machine. ZooKeeper allows you to start multiple instances of the ZooKeeper service on the same machine by having each instance listen on a different port, so a cluster can be simulated on one host.
Roles in ZooKeeper
- Leader: responsible for initiating and resolving votes and for updating the system state
- Followers: receive client requests, return results to the client, and vote during leader election
- Observers: accept client connections and forward write requests to the Leader, but do not take part in voting; they exist only to scale the cluster and improve read throughput
Data model of ZooKeeper
- Hierarchical directory structure, named according to the general file system specification, similar to Linux
- Each node is called a ZNode in ZooKeeper and has a unique path identifier
- A Znode can contain data and child nodes, but EPHEMERAL nodes cannot have child nodes
- The data in a ZNode can have multiple versions; for example, the data stored at a given path may exist in several versions, in which case the version must be supplied when querying it
- Client applications can set watchers on nodes
- ZNodes do not support partial reads or writes; data is always read and written in full
Node features of ZooKeeper
ZooKeeper nodes have a lifecycle that depends on their type. By lifetime, nodes are either PERSISTENT or EPHEMERAL; by naming, they can be SEQUENTIAL or non-sequential (non-sequential by default).
Once a persistent node is created, it remains in ZooKeeper until it is actively removed; it does not disappear when the session of the client that created it ends. An ephemeral node, by contrast, is deleted automatically when the creating client's session expires, and it cannot have child nodes.
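To make these node types concrete, here is a minimal sketch using the standard ZooKeeper Java client. The connect string and the /demo paths are illustrative assumptions, not fixed conventions:

```java
import org.apache.zookeeper.*;
import org.apache.zookeeper.ZooDefs.Ids;
import java.util.concurrent.CountDownLatch;

public class NodeTypesDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // Connect string is a placeholder; point it at your own ZooKeeper instance.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // PERSISTENT node: survives the client session (assumes /demo does not exist yet).
        zk.create("/demo", "root".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // EPHEMERAL node: removed automatically when this session ends; cannot have children.
        zk.create("/demo/ephemeral", "temp".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // SEQUENTIAL node: ZooKeeper appends a monotonically increasing suffix to the name.
        String seq = zk.create("/demo/seq-", new byte[0], Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT_SEQUENTIAL);
        System.out.println("Created sequential node: " + seq);

        // Watch the children of /demo; the (one-shot) watcher fires on the next change.
        zk.getChildren("/demo", event -> System.out.println("Children changed: " + event));

        zk.close();
    }
}
```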
Application scenarios of ZooKeeper
ZooKeeper is a highly available distributed data management and system coordination framework. Its implementation of the ZAB protocol (a Paxos-like atomic broadcast algorithm) guarantees strong consistency of data in a distributed environment, and it is this property that lets ZooKeeper solve many distributed problems.
It is worth noting that ZooKeeper was not designed specifically for these application scenarios; rather, they are typical usage patterns that developers have built on top of the framework's characteristics and the set of APIs (primitives) ZooKeeper provides.
Data Publishing and subscription (Configuration Center)
The publish/subscribe model, also known as a configuration center, works just as the name implies: publishers write data to ZooKeeper nodes so that subscribers can obtain it dynamically, achieving centralized management and dynamic updating of configuration information. Global configuration information, or the service address list of a service-oriented framework, is a very good fit for this model.
Configuration information used by applications can be centrally managed in ZooKeeper. In this scenario, an application actively fetches the configuration once at startup and registers a Watcher on the node, so every subsequent configuration update is pushed to the subscribing client in real time and the client always holds the latest configuration. A few examples:
- In a distributed search service, the index metadata and the state of the machines in the server cluster are stored in designated ZooKeeper nodes for each client to subscribe to.
- In a distributed log collection system, the core job is to collect logs scattered across different machines. Collection tasks are usually assigned per application, so a node P named after the application is created in ZooKeeper, and every machine running that application registers its IP as a child node of P. When machines are added or removed, the collector is notified in real time and can adjust its task assignments accordingly.
- Some information in a system needs to be obtained dynamically and can also be modified manually. This is usually done by exposing an interface, such as a JMX interface, to read runtime information. With ZooKeeper, you do not need to implement such a mechanism yourself; simply store the information in designated ZooKeeper nodes.
Note: all of the scenarios above share a default premise: the data volume is small, but the data may change quickly.
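As a small illustration of the "fetch at startup, then watch" pattern, the sketch below reads a value once and re-registers a watch on every change. The path /config/app and the surrounding setup are assumptions made for this example, not part of any standard:

```java
import org.apache.zookeeper.*;
import java.nio.charset.StandardCharsets;

public class ConfigWatcher {
    // Path is an illustrative assumption; store your config wherever you like.
    private static final String CONFIG_PATH = "/config/app";
    private final ZooKeeper zk;
    private volatile String currentConfig;

    public ConfigWatcher(ZooKeeper zk) throws Exception {
        this.zk = zk;
        refresh();                       // fetch once at startup
    }

    private void refresh() throws KeeperException, InterruptedException {
        // Re-register the watcher on every read, since ZooKeeper watches are one-shot.
        byte[] data = zk.getData(CONFIG_PATH, event -> {
            if (event.getType() == Watcher.Event.EventType.NodeDataChanged) {
                try {
                    refresh();           // reload on every change notification
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }, null);
        currentConfig = new String(data, StandardCharsets.UTF_8);
        System.out.println("Config updated: " + currentConfig);
    }

    public String get() {
        return currentConfig;
    }
}
```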
Load balancing
Load balancing here refers to soft load balancing. In a distributed environment, to ensure high availability, multiple peer copies of an application or service provider are usually deployed, and a consumer needs to choose one of these peers to execute the relevant business logic. A typical example is producer and consumer load balancing in message middleware.
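One simple way to build soft load balancing on ZooKeeper is to have each provider register itself as a child of a service node and let consumers pick a child at random. The path /services/demo and the idea that the child name encodes host:port are assumptions made for this sketch:

```java
import org.apache.zookeeper.ZooKeeper;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class RandomLoadBalancer {
    // Illustrative service path; each provider registers an ephemeral child here.
    private static final String SERVICE_PATH = "/services/demo";

    // Pick one provider address at random from the currently registered children.
    public static String pickProvider(ZooKeeper zk) throws Exception {
        List<String> providers = zk.getChildren(SERVICE_PATH, false);
        if (providers.isEmpty()) {
            throw new IllegalStateException("No providers registered under " + SERVICE_PATH);
        }
        // The child node name is assumed to encode the provider's host:port.
        return providers.get(ThreadLocalRandom.current().nextInt(providers.size()));
    }
}
```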
Naming Service
Naming services are also a common scenario in distributed systems. With a naming service, a client application can obtain the address, provider, and other information about a resource or service from its name. Named entities can be machines in a cluster, addresses of provided services, remote objects, and so on, all of which we can collectively call names. A common example is the service address list in a distributed service framework. By calling the node-creation API provided by ZooKeeper, it is easy to create a globally unique path that can serve as such a name.
In Dubbo, the open-source distributed service framework from Alibaba Group, ZooKeeper is used as the naming service to maintain the global service address list. In Dubbo's implementation, a service provider writes its URL to the /dubbo/${serviceName}/providers directory at startup to publish the service. When a service consumer starts, it subscribes to the provider URLs under /dubbo/${serviceName}/providers and writes its own URL to the /dubbo/${serviceName}/consumers directory. Note that all addresses registered in ZooKeeper are ephemeral nodes, which ensures that service providers and consumers automatically become aware of resource changes.
In addition, Dubbo monitors at service granularity by subscribing to all provider and consumer information under the /dubbo/${serviceName} directory.
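The sketch below shows roughly what provider registration looks like in this style. The paths mirror the Dubbo convention described above, but the code is illustrative only and is not Dubbo's actual implementation:

```java
import org.apache.zookeeper.*;
import org.apache.zookeeper.ZooDefs.Ids;

public class ProviderRegistry {
    // Register a provider URL as an EPHEMERAL node so it disappears automatically
    // when the provider's session is lost, as described for Dubbo above.
    public static void register(ZooKeeper zk, String serviceName, String encodedUrl)
            throws KeeperException, InterruptedException {
        String base = "/dubbo/" + serviceName + "/providers";
        ensurePath(zk, "/dubbo");
        ensurePath(zk, "/dubbo/" + serviceName);
        ensurePath(zk, base);
        // encodedUrl is assumed to be URL-encoded already, so it contains no '/'.
        zk.create(base + "/" + encodedUrl, new byte[0],
                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    }

    // Create a persistent parent node if it does not exist yet.
    private static void ensurePath(ZooKeeper zk, String path)
            throws KeeperException, InterruptedException {
        if (zk.exists(path, false) == null) {
            try {
                zk.create(path, new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } catch (KeeperException.NodeExistsException ignored) {
                // another client created it concurrently; that's fine
            }
        }
    }
}
```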
Distributed notification/coordination
ZooKeeper has a unique Watcher registration and asynchronous notification mechanism that is well suited to notification and coordination between different systems in a distributed environment, enabling real-time handling of data changes. The usual approach is for the different systems to register watches on the same ZNode in ZooKeeper and monitor its changes (both the ZNode's own data and its children); when one system updates the ZNode, the other systems receive a notification and process it accordingly.
Another use is heartbeat detection: the detecting system and the detected system are not directly associated with each other but are instead linked through a node in ZooKeeper, which greatly reduces coupling between them. Another is system scheduling: a system consists of a console and a push system, and the console's job is to direct the push system's work. Some of the actions an administrator takes on the console actually change the state of certain nodes in ZooKeeper; ZooKeeper notifies the push system, which has registered Watchers on those nodes, and the push system then carries out the corresponding push tasks.
Yet another is work reporting, similar to a task-distribution system: after a sub-task starts, it registers a temporary (ephemeral) node in ZooKeeper and reports its progress regularly (writing the progress back to that node), so the task manager can see the progress of each task in real time.
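A small sketch of the heartbeat idea, assuming each worker registers an ephemeral child under a hypothetical /workers node that already exists, and a monitor watches that node's children:

```java
import org.apache.zookeeper.*;
import org.apache.zookeeper.ZooDefs.Ids;
import java.util.List;

public class HeartbeatMonitor {
    private static final String WORKERS_PATH = "/workers"; // illustrative, assumed to exist

    // Each worker creates an ephemeral node; if its session dies, the node vanishes.
    public static void registerWorker(ZooKeeper zk, String workerId) throws Exception {
        zk.create(WORKERS_PATH + "/" + workerId, new byte[0],
                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    }

    // The monitor watches the children list; any join/leave triggers a notification.
    public static void watchWorkers(ZooKeeper zk) throws Exception {
        List<String> alive = zk.getChildren(WORKERS_PATH, event -> {
            if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged) {
                try {
                    watchWorkers(zk);   // re-register the one-shot watch and re-read
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        System.out.println("Alive workers: " + alive);
    }
}
```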
Distributed locks
Distributed locks mainly benefit from the strong data consistency that ZooKeeper guarantees. Lock services fall into two categories: one maintains exclusivity, the other controls ordering.
Maintaining exclusivity means that, of all the clients attempting to acquire the lock, only one can succeed. A common approach is to treat a ZNode in ZooKeeper as the lock and implement acquisition by creating that ZNode: all clients try to create /distribute_lock, and the client that creates it successfully holds the lock. Controlling ordering means that every client attempting to acquire the lock will eventually be scheduled to run, but in a global order. The approach is similar, except that /distribute_lock already exists and each client creates an ephemeral sequential node under it (specified through the node's CreateMode.EPHEMERAL_SEQUENTIAL). The parent node (/distribute_lock) maintains a sequence counter, which guarantees that the children are created in order and therefore gives each client a position in the global order.
- Child nodes of the same parent cannot share a name, so if a client succeeds in creating a given ZNode, it holds the lock. The other clients register listeners on that ZNode, and when it is deleted they are notified and compete for the lock again.
- Create ephemeral sequential nodes: a node is created under the lock node for each request. Because the names are sequential, the client with the smallest sequence number holds the lock; when the lock is released, the client with the next sequence number is notified and acquires it. A minimal sketch of this scheme follows the list.
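Here is a minimal sketch of the sequential-node scheme, assuming the persistent /distribute_lock node already exists. It is illustrative only; production code would also need to handle session expiry and connection loss:

```java
import org.apache.zookeeper.*;
import org.apache.zookeeper.ZooDefs.Ids;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class SimpleDistributedLock {
    private static final String LOCK_ROOT = "/distribute_lock"; // assumed to already exist
    private final ZooKeeper zk;
    private String myNode;

    public SimpleDistributedLock(ZooKeeper zk) {
        this.zk = zk;
    }

    public void lock() throws Exception {
        // Create an ephemeral sequential node; the suffix gives us a global order.
        myNode = zk.create(LOCK_ROOT + "/lock-", new byte[0],
                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        while (true) {
            List<String> children = zk.getChildren(LOCK_ROOT, false);
            Collections.sort(children);
            String mySeq = myNode.substring(myNode.lastIndexOf('/') + 1);
            if (children.indexOf(mySeq) == 0) {
                return; // smallest sequence number: we hold the lock
            }
            // Wait only for the node immediately ahead of us, then re-check.
            String prev = LOCK_ROOT + "/" + children.get(children.indexOf(mySeq) - 1);
            CountDownLatch latch = new CountDownLatch(1);
            if (zk.exists(prev, event -> latch.countDown()) != null) {
                latch.await();
            }
        }
    }

    public void unlock() throws Exception {
        zk.delete(myNode, -1); // releasing the lock notifies the next waiter
    }
}
```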
Distributed queue
In terms of queues, there are simply two kinds: the conventional first-in-first-out (FIFO) queue, and a queue that waits until all of its members have gathered before executing in order. The first kind follows the same basic principle as the ordering-control scenario of the distributed lock service described above, so it is not repeated here.
The second kind is actually an enhancement of the FIFO queue. Usually a /queue/num node is created under the /queue ZNode and assigned the value n (or n is assigned directly to /queue) to indicate the queue size; every time a member joins the queue, the client checks whether the queue size has been reached and decides whether to start execution. A typical scenario is a distributed environment in which a large task, Task A, can only proceed once many sub-tasks have completed (or certain conditions are met). Whenever one of the sub-tasks completes, it creates its own temporary sequential node under /taskList; once /taskList finds that the number of its child nodes has reached the specified count, processing can move on to the next step in order.
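A rough sketch of the second ("gather, then execute") kind of queue, assuming the required size n is stored as the data of a hypothetical /queue node and members register ephemeral sequential children under it:

```java
import org.apache.zookeeper.*;
import org.apache.zookeeper.ZooDefs.Ids;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;

public class GatherQueue {
    private static final String QUEUE_PATH = "/queue"; // illustrative; its data holds the size n

    // Join the queue, then block until the number of members reaches n.
    public static void joinAndAwait(ZooKeeper zk, String memberId) throws Exception {
        int required = Integer.parseInt(new String(
                zk.getData(QUEUE_PATH, false, null), StandardCharsets.UTF_8));

        zk.create(QUEUE_PATH + "/member-", memberId.getBytes(StandardCharsets.UTF_8),
                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        while (true) {
            CountDownLatch changed = new CountDownLatch(1);
            int count = zk.getChildren(QUEUE_PATH, event -> changed.countDown()).size();
            if (count >= required) {
                return; // queue is full: everyone can start executing
            }
            changed.await(); // wait for the next membership change, then re-check
        }
    }
}
```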
Build the cluster using docker-compose
Now that we have introduced these ZooKeeper application scenarios, let's first learn how to build a ZooKeeper cluster; afterwards we will put the scenarios above into practice.
The directory structure of the file is as follows:
├── docker-compose.yml
Write the docker-compose.yml file
The docker-compose.yml file contains the following contents:
```yaml
version: '3.4'

services:
  zoo1:
    image: zookeeper
    restart: always
    hostname: zoo1
    ports:
      - 2181:2181
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181

  zoo2:
    image: zookeeper
    restart: always
    hostname: zoo2
    ports:
      - 2182:2181
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zoo3:2888:3888;2181

  zoo3:
    image: zookeeper
    restart: always
    hostname: zoo3
    ports:
      - 2183:2181
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=0.0.0.0:2888:3888;2181
```
In this configuration file, Docker runs three ZooKeeper images, binding local ports 2181, 2182, and 2183 respectively to port 2181 of the corresponding container through the ports field.
ZOO_MY_ID and ZOO_SERVERS are the two environment variables required to set up a ZooKeeper cluster. ZOO_MY_ID specifies the server ID; the value is an integer from 1 to 255 and must be unique within the cluster. ZOO_SERVERS is the list of hosts in the cluster.
Next, go to the directory containing docker-compose.yml and run docker-compose up to start the cluster.
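Once the three containers are up, an application can reach the cluster by listing all of the mapped ports in its connect string. The following is a minimal connectivity check using the ZooKeeper Java client; it simply reflects the port mappings above and is not required for the rest of this article:

```java
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import java.util.concurrent.CountDownLatch;

public class ClusterConnectTest {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // All three mapped ports of the docker-compose cluster above.
        ZooKeeper zk = new ZooKeeper(
                "localhost:2181,localhost:2182,localhost:2183", 30000,
                event -> {
                    if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                        connected.countDown();
                    }
                });
        connected.await();
        System.out.println("Connected, session id: 0x" + Long.toHexString(zk.getSessionId()));
        zk.close();
    }
}
```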
Connect to ZooKeeper
After starting up the cluster, we can connect to ZooKeeper for node related operations.
- First we need to download ZooKeeper from the ZooKeeper download page.
- Unzip it.
- Go to its conf directory and copy zoo_sample.cfg to zoo.cfg.
Configuration File Description
```properties
# The number of milliseconds of each tick
# tickTime: the heartbeat interval between client and server (C/S)
# The interval at which ZooKeeper servers, or clients and servers, exchange heartbeats,
# i.e. one heartbeat per tickTime. tickTime is measured in milliseconds.
tickTime=2000

# The number of ticks that the initial
# synchronization phase can take
# initLimit: initial L/F communication time limit
# Maximum number of heartbeats (tickTimes) tolerated during the initial connection
# between a follower server (F) and the leader server (L) in the cluster.
initLimit=5

# The number of ticks that can pass between
# sending a request and getting an acknowledgement
# syncLimit: L/F synchronization time limit
# Maximum number of heartbeats (tickTimes) tolerated for request/response exchanges
# between a follower server and the leader server in the cluster.
syncLimit=2

# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# dataDir: data file directory
# By default, ZooKeeper also stores the transaction log of write operations in this directory.
dataDir=/data/soft/zookeeper-3.4.12/data

# dataLogDir: log file directory
# Directory in which ZooKeeper saves its log files.
dataLogDir=/data/soft/zookeeper-3.4.12/logs

# the port at which the clients will connect
# clientPort: client connection port
# ZooKeeper listens on this port and accepts client access requests.
clientPort=2181

# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

# Server name and address: cluster information (server number, server address,
# LF communication port, election port)
# This configuration item is written in a special format, as follows:
#
# server.N=YYY:A:B
#
# N is the server number, YYY is the server's IP address, A is the LF communication port
# (the port through which this server exchanges information with the leader), and B is the
# election port (the port the servers use to communicate with each other when electing a
# new leader; when the leader fails, the remaining servers communicate to elect a new one).
# In general, port A is the same on every server in the cluster, as is port B. In a pseudo
# cluster, however, the IP addresses are all the same, so each server must use different
# A and B ports.
```
The default zoo.cfg configuration can be used as-is. Next, run ./zkCli.sh -server 127.0.0.1:2181 in the bin directory of the unzipped distribution to connect to ZooKeeper.
```
Welcome to ZooKeeper!
2020-06-01 15:03:52.512 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1025] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2020-06-01 15:03:52.576 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@879] - Socket connection established to localhost/127.0.0.1:2181, initiating session
2020-06-01 15:03:52.599 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x100001140080000, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: 127.0.0.1:2181(CONNECTED) 0]
```
Next we can use commands to view and manipulate nodes.
- Run the ls command to view the contents of ZooKeeper. Command: ls /

```
[zk: 127.0.0.1:2181(CONNECTED) 10] ls /
[zookeeper]
```
- Create a new ZNode "zk" with an associated string. Command: create /zk myData

```
[zk: 127.0.0.1:2181(CONNECTED) 11] create /zk myData
Created /zk
[zk: 127.0.0.1:2181(CONNECTED)] ls /
[zk, zookeeper]
[zk: 127.0.0.1:2181(CONNECTED) 13]
```
- Get the data of the ZNode /zk. Command: get /zk

```
[zk: 127.0.0.1:2181(CONNECTED) 13] get /zk
myData
cZxid = 0x400000008
ctime = Mon Jun 01 15:07:50 CST 2020
mZxid = 0x400000008
mtime = Mon Jun 01 15:07:50 CST 2020
pZxid = 0x400000008
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 6
numChildren = 0
```
- Delete the ZNode /zk. Command: delete /zk

```
[zk: 127.0.0.1:2181(CONNECTED) 14] delete /zk
[zk: 127.0.0.1:2181(CONNECTED) 15] ls /
[zookeeper]
```
Due to limited space, the next article will implement ZooKeeper application scenarios one by one in code.
ZooKeeper Docker configuration file
You can pull the project directly from the link above, and it only takes two steps to launch the ZooKeeper cluster:
- Pull the project from GitHub
- Run the docker-compose up command in the ZooKeeper folder