ZooKeeper

Distributed information sharing center

Based on the CP model of the CAP theorem

I. Application scenarios

  • Master election

    • The client that successfully creates the ephemeral (temporary) node becomes the Master. The other clients watch that node once it exists

  • Distributed lock

    • Exclusive lock: If the ephemeral node is created successfully, the lock is acquired. Otherwise, watch the ephemeral node

    • Shared lock:

      • Use ephemeral sequential nodes

      • If the request is a read request and all nodes with smaller sequence numbers are also read requests, the shared lock is acquired

      • If the request is a write request and its sequence number is not the smallest, wait

        To avoid the herd effect, each node should watch only the node whose sequence number immediately precedes its own

  • Publish/subscribe common configuration

    • Machine list

    • Switch configuration

    • Database Configuration

      Suited to data that is small, changes dynamically, and must be identical on all machines

  • Cluster management

    • Count the machines in a cluster

    • Detect machines going online and offline

    • Monitor machine status

  • Naming service

    • Generate a unique ID
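
The shared-lock rules above can be sketched in plain Python without a ZooKeeper client. This is only a minimal sketch: the `(sequence, kind)` tuples and the functions `can_acquire` and `node_to_watch` are hypothetical stand-ins for the ephemeral sequential children of a lock node, not a real client API.

```python
from typing import List, Optional, Tuple

# Each entry models one ephemeral sequential child of the lock node:
# (sequence number, request kind). These names are illustrative only.
Request = Tuple[int, str]  # kind is "read" or "write"


def can_acquire(requests: List[Request], seq: int) -> bool:
    """Return True if the request with sequence number `seq` holds the lock."""
    kinds = dict(requests)
    earlier = [kind for s, kind in requests if s < seq]
    if kinds[seq] == "read":
        # A read succeeds when every earlier request is also a read.
        return all(kind == "read" for kind in earlier)
    # A write succeeds only when it has the smallest sequence number.
    return not earlier


def node_to_watch(requests: List[Request], seq: int) -> Optional[int]:
    """Return the sequence number immediately below `seq`, or None if first."""
    smaller = [s for s, _ in requests if s < seq]
    return max(smaller) if smaller else None


queue = [(1, "read"), (2, "read"), (3, "write"), (4, "read")]
for seq, kind in queue:
    print(seq, kind, can_acquire(queue, seq), node_to_watch(queue, seq))
```

In this example the two leading reads hold the lock together, the write at sequence 3 waits, and the read at sequence 4 waits behind it. Each waiter watches exactly one predecessor, which is what avoids the herd effect.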

II. Roles

  • Leader
    • Provides read and write services
    • The sole scheduler and processor of transaction requests, which guarantees the order of transaction processing
  • Follower
    • Provides read service
    • Forwards transaction requests to the Leader
    • Participates in Proposal voting
    • Participates in Leader election
  • Observer
    • Provides read service
    • Does not participate in Proposal voting or Leader election

III. Session

  • A long-lived TCP connection between a client and a server; heartbeat detection keeps the session valid

  • The client can send requests to the server and receive responses

  • The client can receive notifications of Watch events from the server

IV. Data node (Znode)

"Node" has two meanings: 1. a machine in the cluster 2. a data unit (Znode)

  • The smallest data unit in ZooKeeper. A Znode can be attached under another Znode to form a Znode tree. Paths in the Znode tree are separated by a slash (/)

  • Stores its own data content and attribute information

  • Classification

    • Persistent node (Persistent)
    • Ephemeral, or temporary, node (Ephemeral)
    • Sequential node (Sequential)
  • Specific types

    • Persistent node
      • Exists until manually deleted
    • Persistent sequential node
      • A numeric suffix is appended to the node name to indicate order
    • Ephemeral node
      • Deleted automatically
      • Bound to a client session. When the session ends, the node is deleted
      • Cannot have child nodes
    • Ephemeral sequential node
      • An ephemeral node with a numeric suffix appended to indicate order

  • Attribute information

    • cZxid: transaction ID of the operation that created the node

    • ctime: time when the node was created

    • mZxid: transaction ID of the last modification to the node

    • mtime: time when the node was last modified

    • pZxid: transaction ID of the last change to the node's child list

    • cversion: version number of the child node list

    • dataVersion: version number of the data content

    • aclVersion: version number of the ACL

    • ephemeralOwner: session ID that owns the ephemeral (temporary) node; 0 if the node is persistent

    • dataLength: length of the data content

    • numChildren: number of direct child nodes

      Grandchild nodes are not counted

  • Version

    • version: data version of the current Znode

    • cversion: version of the current Znode's child nodes

    • aversion: ACL version of the current Znode
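
The four node types described above can be illustrated with a toy in-memory model. This is a sketch, not a real client: the class `ZnodeTree` and its methods are hypothetical, and the 10-digit zero-padded suffix follows what ZooKeeper servers normally generate for sequential nodes.

```python
from typing import Dict


class ZnodeTree:
    """Toy model of persistent, ephemeral, and sequential nodes."""

    def __init__(self) -> None:
        # path -> owning session ID (0 means the node is persistent)
        self.owner: Dict[str, int] = {"/": 0}
        # parent path -> next sequence number to hand out
        self.counter: Dict[str, int] = {}

    def create(self, path: str, session: int = 0, sequential: bool = False) -> str:
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.owner:
            raise KeyError(f"parent {parent} does not exist")
        if self.owner[parent] != 0:
            raise ValueError("ephemeral nodes cannot have child nodes")
        if sequential:
            n = self.counter.get(parent, 0)
            self.counter[parent] = n + 1
            path = f"{path}{n:010d}"  # numeric suffix indicates order
        if path in self.owner:
            raise ValueError(f"{path} already exists")
        self.owner[path] = session
        return path

    def close_session(self, session: int) -> None:
        """Delete every ephemeral node bound to the closed session."""
        if session == 0:
            raise ValueError("0 is reserved for persistent nodes")
        for path in [p for p, s in self.owner.items() if s == session]:
            del self.owner[path]


tree = ZnodeTree()
tree.create("/locks")  # persistent node
a = tree.create("/locks/lock-", session=7, sequential=True)
b = tree.create("/locks/lock-", session=8, sequential=True)
print(a, b)
tree.close_session(7)  # the node owned by session 7 disappears
print(sorted(tree.owner))
```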

V. Event listener (Watcher)

○ Push-pull mode

○ The server sends the Watcher event notification to the client. After receiving the notification, the client proactively obtains the latest data from the server

  • Used to implement the publish/subscribe function

  • A client can register a Watcher on a specified Znode. When a particular event is triggered (the Znode content or its child node list changes), the server notifies the interested clients of the event

  • The registration process

    1. The client sends a registration request to ZooKeeper
    2. The client stores the Watcher object in its own WatcherManager
  • The notification process

    1. ZooKeeper triggers the Watcher event and sends a notification to the client

    2. The client thread retrieves the corresponding Watcher object from the WatcherManager and runs its callback logic
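
The push-pull flow above can be sketched as a small simulation. The classes `Server` and `Client` and their methods are hypothetical and run in one process; they only mirror the registration and notification steps. The watcher is removed once it fires, matching the one-time trigger behavior of standard ZooKeeper watches.

```python
from typing import Callable, Dict, List, Optional


class Server:
    """Toy server: stores data and the clients watching each path."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.watchers: Dict[str, List["Client"]] = {}

    def get(self, path: str, watcher: Optional["Client"] = None) -> bytes:
        if watcher is not None:
            self.watchers.setdefault(path, []).append(watcher)
        return self.data[path]

    def set(self, path: str, value: bytes) -> None:
        self.data[path] = value
        # Push: only the event (the path) is sent, not the new data.
        for client in self.watchers.pop(path, []):
            client.on_event(path)


class Client:
    def __init__(self, server: Server) -> None:
        self.server = server
        # Local WatcherManager: path -> callback
        self.watcher_manager: Dict[str, Callable[[bytes], None]] = {}

    def watch(self, path: str, callback: Callable[[bytes], None]) -> bytes:
        # Registration: store the callback locally, register with the server.
        self.watcher_manager[path] = callback
        return self.server.get(path, watcher=self)

    def on_event(self, path: str) -> None:
        callback = self.watcher_manager.pop(path)
        # Pull: the client fetches the latest data itself.
        callback(self.server.get(path))


server = Server()
server.data["/config"] = b"v1"
client = Client(server)
seen: List[bytes] = []
client.watch("/config", seen.append)
server.set("/config", b"v2")  # triggers the watcher
server.set("/config", b"v3")  # no watcher is registered any more
print(seen)
```

To keep receiving updates in this model, the callback would have to call `watch` again after each notification.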

VI. Permission control policy (ACL)

Access Control Lists

1. Permission schemes

  • IP
    • Permissions are controlled at IP-address granularity
    • Authorization object: IP address or IP address segment
  • Digest
    • Configure the permission identifier in the format “username:password”
    • Authorization object: user-defined, usually username:BASE64(SHA-1(username:password)). For example, zm:sdfndsllndlksfn7c=
  • World
    • The most open scheme; it provides almost no access control
    • Authorization object: only one, anyone
  • Super
    • Super user, can operate on any Znode
    • Authorization object: super user
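
The Digest authorization object can be computed directly from the formula above. A minimal sketch using only the Python standard library; the function name `digest_id` and the sample password are made up for illustration.

```python
import base64
import hashlib


def digest_id(username: str, password: str) -> str:
    """Build username:BASE64(SHA-1(username:password))."""
    raw = f"{username}:{password}".encode("utf-8")
    sha1 = hashlib.sha1(raw).digest()  # 20 raw bytes
    return f"{username}:{base64.b64encode(sha1).decode('ascii')}"


example = digest_id("zm", "secret")  # hypothetical credentials
print(example)
```

Note that the hash covers the whole `username:password` string, so the same password gives a different digest for a different user name.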

2. Permissions

  • CREATE: permission to create child nodes
  • READ: permission to read node data and the child node list
  • WRITE: permission to update node data
  • DELETE: permission to delete child nodes
  • ADMIN: permission to set the ACL of a node
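
The five permissions combine as bit flags. The sketch below models them with Python's `IntFlag`. The numeric values are the ones I understand the Java client's `ZooDefs.Perms` to use (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16); treat them as an assumption here. The class `Perm` and the helper `allowed` are hypothetical names.

```python
from enum import IntFlag


class Perm(IntFlag):
    READ = 1
    WRITE = 2
    CREATE = 4
    DELETE = 8
    ADMIN = 16


def allowed(granted: Perm, needed: Perm) -> bool:
    """True if every bit in `needed` is present in `granted`."""
    return (granted & needed) == needed


acl = Perm.READ | Perm.CREATE  # may read the node and create child nodes
print(allowed(acl, Perm.READ))
print(allowed(acl, Perm.DELETE))
```

CREATE and DELETE apply to child nodes, so a client holding only `Perm.CREATE` on a node can add children under it but cannot change the node's own data.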

ZAB protocol

○ Unlike Paxos, ZAB has only one Proposer (the Leader), so there is no phase in which multiple Proposers put forward competing proposals. There is no need to choose between proposals; the Leader only needs to determine whether the Followers are alive and can execute the transaction

○ The Leader sends a Commit message to the Followers to execute the transaction, which is similar to two-phase commit (2PC)

1. Transactions

  • Operations that change the ZooKeeper server state: creating, deleting, and updating a Znode
  • Each transaction request is assigned a transaction ID (ZXID), a 64-bit number that identifies the global order of the requests
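
A short sketch of how a 64-bit ZXID yields a global order. It relies on a detail not stated above: in ZooKeeper the high 32 bits hold the Leader epoch and the low 32 bits hold a counter within that epoch. The helpers `make_zxid` and `split_zxid` are hypothetical.

```python
from typing import Tuple

COUNTER_MASK = 0xFFFFFFFF  # low 32 bits


def make_zxid(epoch: int, counter: int) -> int:
    """Pack the epoch into the high 32 bits and the counter into the low 32."""
    return (epoch << 32) | (counter & COUNTER_MASK)


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter) for a ZXID."""
    return zxid >> 32, zxid & COUNTER_MASK


old = make_zxid(epoch=1, counter=COUNTER_MASK)  # last ID of epoch 1
new = make_zxid(epoch=2, counter=1)             # early ID of epoch 2
print(hex(old), hex(new), new > old)
print(split_zxid(new))
```

Because the epoch sits in the high bits, plain integer comparison orders any transaction from a newer epoch after every transaction from an older one.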

2. Message broadcast process

  1. The Leader receives the client request, generates a Proposal for the transaction, and sends it to all Followers
  2. Each Follower receives the Proposal, writes it to disk as a transaction log entry, and sends an ACK to the Leader
  3. After receiving ACKs from more than half of the Followers, the Leader broadcasts a Commit message to all Followers
  4. Each Follower receives the Commit message and commits the transaction
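
The four broadcast steps can be simulated in a single process. This is a simplified sketch with no networking or disk I/O: the class `Follower` and the function `broadcast` are hypothetical, and the quorum is counted over Followers only, as in the steps above.

```python
from typing import List


class Follower:
    def __init__(self, name: str, alive: bool = True) -> None:
        self.name = name
        self.alive = alive
        self.log: List[int] = []        # Proposals written to the transaction log
        self.committed: List[int] = []  # transactions that were committed

    def on_proposal(self, zxid: int) -> bool:
        """Step 2: log the Proposal and ACK (a crashed Follower stays silent)."""
        if not self.alive:
            return False
        self.log.append(zxid)
        return True

    def on_commit(self, zxid: int) -> None:
        """Step 4: commit a transaction that was logged earlier."""
        if self.alive and zxid in self.log:
            self.committed.append(zxid)


def broadcast(followers: List[Follower], zxid: int) -> bool:
    """Steps 1 and 3: send the Proposal, commit only on a majority of ACKs."""
    acks = sum(f.on_proposal(zxid) for f in followers)
    if acks * 2 <= len(followers):
        return False  # not more than half, so no Commit is sent
    for f in followers:
        f.on_commit(zxid)
    return True


group = [Follower("a"), Follower("b"), Follower("c", alive=False)]
print(broadcast(group, 1))
print([f.committed for f in group])
```

With two of three Followers alive, the transaction commits on the live ones. If two were down, `broadcast` would return False and nothing would be committed.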

Crash recovery

1. Leader election

Ensure that the newly elected Leader has the largest ZXID. This guarantees two things:

  1. Transactions that have been committed on the Leader server are eventually committed by all servers

  2. Transactions that were only proposed (not committed) on the Leader server are discarded
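
The "largest ZXID wins" rule can be sketched as a single comparison. This covers only the selection rule, not the vote exchange between servers. Breaking ties by the larger server ID is an assumption made for this example, and the function `elect_leader` is hypothetical.

```python
from typing import Dict


def elect_leader(last_zxid: Dict[int, int]) -> int:
    """Pick the server whose last ZXID is largest.

    `last_zxid` maps server ID -> highest ZXID in that server's log.
    Ties are broken by the larger server ID (assumption for this sketch).
    """
    return max(last_zxid, key=lambda sid: (last_zxid[sid], sid))


servers = {1: 0x100000005, 2: 0x100000007, 3: 0x100000007}
print(elect_leader(servers))
```

Server 1 lags behind, so it cannot win. Choosing a server with the most recent ZXID means the new Leader already holds every transaction that was committed before the crash.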

2. Data synchronization process

  1. The Leader prepares a queue for each Follower

  2. The Leader sends every transaction that a Follower has not yet synchronized to that Follower as a Proposal message

  3. The Leader sends a Commit message immediately after each Proposal message

  4. After a Follower has synchronized all missing transactions from the Leader, the Leader adds it to the list of available Followers
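
The synchronization steps above can be sketched with a per-Follower queue. This is a single-process illustration: the message tuples and the functions `build_sync_queue` and `sync_follower` are hypothetical names, and ZXIDs are shown as small integers for readability.

```python
from collections import deque
from typing import Deque, List, Tuple

Message = Tuple[str, int]  # ("PROPOSAL" or "COMMIT", zxid)


def build_sync_queue(leader_log: List[int], follower_last: int) -> Deque[Message]:
    """Steps 1-3: queue a Proposal plus an immediate Commit per missing ZXID."""
    queue: Deque[Message] = deque()
    for zxid in leader_log:
        if zxid > follower_last:
            queue.append(("PROPOSAL", zxid))
            queue.append(("COMMIT", zxid))
    return queue


def sync_follower(queue: Deque[Message], applied: List[int]) -> None:
    """Follower side: apply each transaction once its Commit arrives."""
    pending = None
    while queue:
        kind, zxid = queue.popleft()
        if kind == "PROPOSAL":
            pending = zxid
        elif kind == "COMMIT" and pending == zxid:
            applied.append(zxid)
            pending = None


leader_log = [1, 2, 3, 4]
follower_applied = [1, 2]  # this Follower is missing 3 and 4
queue = build_sync_queue(leader_log, follower_applied[-1])
print(list(queue))
sync_follower(queue, follower_applied)

available: List[str] = []
if follower_applied == leader_log:  # step 4: fully synchronized
    available.append("follower-1")
print(follower_applied, available)
```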