From: juejin.cn/post/684490…
Zookeeper

- Zookeeper: a distributed coordination service designed for distributed applications
- Zookeeper design goals
- Zookeeper data model and hierarchical namespace
- Zookeeper nodes and ephemeral nodes
- Zookeeper conditional updates and watches
- Zookeeper guarantees
- Simple API
- Implementation principles of Zookeeper
- Zookeeper uses
- Zookeeper performance
- Zookeeper reliability
- About the ZooKeeper project
Zookeeper: A distributed coordination service designed for distributed applications
ZooKeeper is an open source distributed coordination service for distributed applications. It exposes a simple set of primitives on which distributed applications can implement synchronization services, configuration maintenance, naming services, and so on. It is designed to be easy to program against: ZooKeeper uses a data model styled after the familiar directory tree of a file system, runs in Java, and has bindings for both Java and C.
Coordination services are notoriously hard to implement correctly. They are especially prone to errors such as race conditions and deadlocks. The motivation behind ZooKeeper is to relieve distributed applications of the burden of implementing coordination services from scratch.
Design goals
Zookeeper is very simple.
ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace organized much like a standard file system. The namespace consists of data registers called znodes which, in ZooKeeper parlance, are very similar to files and directories. Unlike a typical file system, which is designed for storage, ZooKeeper keeps its data in memory, which lets it achieve high throughput and low latency.
The ZooKeeper implementation puts a premium on high performance, high availability, and strictly ordered access. The performance aspect means it can be used in large distributed systems; the reliability aspect keeps it from being a single point of failure; and the strict ordering means that sophisticated synchronization primitives can be implemented at the client.
Zookeeper is replicated
Like the distributed processes it coordinates, ZooKeeper itself is replicated over a set of hosts called an ensemble.
The servers that make up the ZooKeeper service must all know about each other. They maintain an in-memory image of the state, along with a transaction log and snapshots in persistent storage. As long as a majority of the servers are available, the ZooKeeper service is available.
A client connects to a single ZooKeeper server. The client maintains a TCP connection through which it sends requests, receives responses, receives watch events, and sends heartbeats. If the TCP connection to the server breaks, the client connects to a different server.
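As a concrete illustration, here is a minimal connection sketch in Java using the standard org.apache.zookeeper client; the host names and the 15-second session timeout are placeholder values chosen for the example, not taken from the article:

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ConnectDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        CountDownLatch connected = new CountDownLatch(1);
        // The connect string lists several ensemble members; the client picks one,
        // holds a single TCP connection to it, and fails over to another member
        // if that connection breaks.
        ZooKeeper zk = new ZooKeeper(
                "host1:2181,host2:2181,host3:2181", 15000,
                (WatchedEvent event) -> {
                    if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                        connected.countDown();  // session established with some server
                    }
                });
        connected.await();
        System.out.println("session id: 0x" + Long.toHexString(zk.getSessionId()));
        zk.close();
    }
}
```

The later snippets in this article build on the connected `zk` handle from this sketch.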
ZooKeeper is ordered
ZooKeeper stamps each update with a number that reflects the order of all ZooKeeper transactions. Subsequent operations can use this ordering to implement higher-level abstractions, such as synchronization primitives.
ZooKeeper is fast
It is especially fast for read-dominated workloads. ZooKeeper applications run on thousands of machines, and it performs best when reads outnumber writes, at a ratio of around 10:1.
Zookeeper data model and hierarchical namespace
ZooKeeper provides a namespace very much like a standard file system. A path is a sequence of elements separated by slashes (/), and every node in the ZooKeeper namespace is identified by its path.
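For instance, a client can build a small tree and list it by path; a minimal sketch, continuing with the `zk` handle from the connection example, where /app1 and /app1/p_1 are illustrative paths:

```java
// uses org.apache.zookeeper.CreateMode and org.apache.zookeeper.ZooDefs
zk.create("/app1", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
zk.create("/app1/p_1", "v1".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
for (String child : zk.getChildren("/app1", false)) {
    System.out.println("/app1/" + child);   // every znode is addressed by its full path
}
```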
Zookeeper nodes and ephemeral nodes
Unlike a standard file system, each node in the ZooKeeper namespace can have data associated with it as well as children; it is as if each file in a file system could also be a directory. ZooKeeper is designed to store coordination data, such as status information, configuration, and location information, so the data stored at each node is usually small, in the byte-to-kilobyte range. We use the term znode to make it clear that we are talking about ZooKeeper data nodes.
Each znode maintains a stat structure that includes version numbers for data changes and ACL (access control list) changes, plus timestamps; it can be used for cache validation and coordinated updates. Each time a znode's data changes, its version number increases, and whenever a client retrieves data, it also receives the data's version.
The data stored at each node in the namespace is read and written atomically: reading a znode returns all of its data, and a write replaces all of its data.
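A sketch of a version-checked read-modify-write on the illustrative znode from above, again assuming the connected `zk` handle:

```java
// uses org.apache.zookeeper.KeeperException and org.apache.zookeeper.data.Stat
Stat stat = new Stat();
byte[] value = zk.getData("/app1/p_1", false, stat);   // returns ALL the data, atomically
try {
    // conditional update: applied only if the znode's version still matches,
    // i.e. nobody else has written to it since our read
    zk.setData("/app1/p_1", "v2".getBytes(), stat.getVersion());
} catch (KeeperException.BadVersionException e) {
    // another client changed the znode first; re-read and retry
}
```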
ZooKeeper also has the notion of ephemeral nodes. An ephemeral node exists as long as the session that created it remains active; when the session ends, the node is deleted.
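For example (a sketch with an illustrative /workers/worker-1 path, assuming /workers already exists and the same `zk` handle):

```java
// an ephemeral znode lives exactly as long as the session that created it
zk.create("/workers/worker-1", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
// when this client closes its session (or the session times out),
// the server deletes /workers/worker-1 automatically
```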
Zookeeper conditional updates and watches
ZooKeeper supports the concept of watches. A client can set a watch on a znode. The watch is triggered and removed when the znode changes; when that happens, the client receives a packet describing the change. And if the connection between the client and the ZooKeeper server breaks, the client receives a local notification.
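A minimal watch sketch under the same assumptions; note that a watch is one-shot and must be re-registered after it fires:

```java
// register a watch while checking that the znode exists
zk.exists("/app1/p_1", event -> {
    // fires once, on the next create/delete/data change of the path;
    // the event packet names the path and the kind of change
    System.out.println(event.getType() + " on " + event.getPath());
});
```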
Zookeeper guarantees
Zookeeper is very fast and very simple. However, since it is intended as a basis for building more complex services such as “synchronization,” it provides a set of guarantees:
- Sequential consistency – Change requests from clients will be applied in the order in which they were sent.
- Atomicity – Changes either succeed or fail. There is no partial success or partial failure.
- Single system image – Clients see the same view of the ZooKeeper service regardless of which server they are connected to.
- Reliability – Once a change request is applied, the result of the change is persisted until overwritten by the next change.
- Timeliness – The system view seen by the client is always up to date within a certain time frame.
Simple API
One of ZooKeeper's design goals is to provide a very simple programming interface. It therefore supports only the following operations (a combined usage sketch follows the list):
- create: creates a node at a location in the tree.
- delete: deletes a node.
- exists: tests whether a node exists at a location.
- get data: reads the data from a node.
- set data: writes data to a node.
- get children: retrieves the list of children of a node.
- sync: waits for data to be propagated.
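Putting the whole surface together, a hedged end-to-end sketch with the same `zk` handle; /demo is an illustrative path, and note that sync is exposed through the asynchronous callback API:

```java
// uses java.util.List plus the imports from the earlier snippets
String path = zk.create("/demo", "a".getBytes(),
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);  // create
Stat stat = zk.exists(path, false);                           // exists (non-null here)
byte[] data = zk.getData(path, false, stat);                  // get data
zk.setData(path, "b".getBytes(), stat.getVersion());          // set data, version-checked
List<String> children = zk.getChildren("/", false);           // get children ("demo" is listed)
zk.sync(path, (rc, p, ctx) -> {
    // sync: our server's replica is now caught up with the leader for this path
}, null);
zk.delete(path, -1);                                          // delete (-1 skips the version check)
```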
Implementation principles of Zookeeper
The figure “ZooKeeper Components” shows the high-level components of the ZooKeeper service. With the exception of the request processor, each of the servers that make up the ZooKeeper service replicates its own copy of each component.
The replicated database is an in-memory database containing the entire data tree. Writes are serialized to disk before they are applied to the in-memory database.
Every ZooKeeper server services clients, and a client connects to exactly one server to submit requests. Read requests are served from the local replica of that server's database. Requests that change the state of the service, that is, write requests, are processed by an agreement protocol.
As part of this protocol, all write requests from clients are forwarded to a single server, called the leader. The remaining servers, called followers, receive message proposals from the leader and agree on message delivery. The messaging layer takes care of replacing the leader when it fails and synchronizing followers with the leader.
ZooKeeper uses a custom atomic messaging protocol. Because the messaging layer is atomic, ZooKeeper can guarantee that local replicas never diverge. When the leader receives a write request, it calculates what the state of the system will be when the write is applied and transforms the request into a transaction that captures that new state.
Uses
ZooKeeper's programming interface is deliberately simple. With these basic operations, however, higher-order primitives such as synchronization, group membership, leader election, and so on can be implemented.
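As one illustration of how such primitives fall out of the basic operations, here is a deliberately crude mutex built only from create, exists-with-a-watch, and delete. The class name and lock path are invented for this sketch, and production recipes use ephemeral sequential znodes instead, which avoid waking every waiter on each release:

```java
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public final class CrudeLock {
    private final ZooKeeper zk;
    private final String path;   // e.g. "/locks/my-lock" (illustrative)

    public CrudeLock(ZooKeeper zk, String path) {
        this.zk = zk;
        this.path = path;
    }

    /** Blocks until this client owns the lock znode. */
    public void acquire() throws KeeperException, InterruptedException {
        while (true) {
            try {
                // ephemeral: the lock is released automatically if the holder's session dies
                zk.create(path, new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                return;   // success: we hold the lock
            } catch (KeeperException.NodeExistsException e) {
                CountDownLatch released = new CountDownLatch(1);
                // watch the holder's znode; the one-shot watch fires on its deletion
                Stat stat = zk.exists(path, event -> released.countDown());
                if (stat != null) {
                    released.await();   // wait for release (or for the holder to crash)
                }   // if stat == null the znode vanished between calls; just retry
            }
        }
    }

    public void release() throws KeeperException, InterruptedException {
        zk.delete(path, -1);   // -1: delete regardless of version
    }
}
```

Because the lock znode is ephemeral, a crashed holder releases the lock as soon as its session expires, which is exactly the failure behavior a coordination service needs.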
Performance
ZooKeeper is designed for high performance. But is it? Research by the ZooKeeper team at Yahoo!'s research center suggests that it is (see the figure below, “ZooKeeper throughput as the read/write ratio varies”). Performance is especially high in read-dominant applications, because writes force state to be synchronized across all servers, and read-dominant workloads are the typical case for a coordination service.
Figure: ZooKeeper throughput as the read/write ratio varies
The figure shows the results for ZooKeeper 3.2 running on servers with dual 2 GHz Xeon processors and two 15K RPM SATA drives. One drive is used as a dedicated ZooKeeper log device; snapshots are written to the OS drive. Each write request writes 1K of data and each read request reads 1K of data. “Servers” indicates the size of the ZooKeeper ensemble, that is, the number of servers that make up the service. Approximately 30 other servers are used to simulate clients. The ensemble is configured so that the leader does not accept client connections.
Note: read/write performance in version 3.2 is roughly twice that of the previous 3.1 release.
Benchmarks also demonstrate ZooKeeper's reliability. The figure “Reliability in the event of an error” shows how ZooKeeper responds to various failures. The events marked in the figure are:
1. Failure and recovery of a follower
2. Failure and recovery of a different follower
3. Failure of the leader
4. Failure and recovery of two followers
5. Failure of another leader
Reliability
Several important results can be drawn from this graph. First, if followers fail and recover quickly, ZooKeeper sustains high throughput despite the failures. Second, and perhaps more importantly, the leader election algorithm lets the system recover fast enough to avoid a substantial drop in overall throughput; in these observations, ZooKeeper took less than 200 milliseconds to elect a new leader. Third, once the recovered followers begin processing requests again, ZooKeeper's throughput climbs back to its earlier level.
About the ZooKeeper project
ZooKeeper has been used successfully in many industrial applications. At Yahoo!, it serves as the coordination and failure-recovery service for Yahoo!'s messaging middleware, a highly scalable publish-subscribe system that manages replication and data delivery for thousands of topics. It is also used by the crawl service of the Yahoo! crawler to manage failure recovery, and a number of Yahoo! advertising systems rely on ZooKeeper to implement reliable services.