This article has also been collected in my GitHub repository; everyone is welcome to star it.

Preface

In the previous article, we learned that Canal provides incremental data subscription and consumption by subscribing to binlog logs, which makes real-time database backup and real-time index construction possible.

Let’s look at how it works in detail

As shown in the figure, each Canal Server starts multiple instances, and each instance subscribes to the binlog of different tables. An instance is mainly responsible for parsing the binlog into structured data that programs can read (including the primary key of the changed record, the values of the changed fields, etc.). The Canal Client receives this structured data and synchronizes it to other DBs, MQs, and so on. At any moment, only one instance may subscribe to the binlog of a given table, so as to guarantee the ordering of binlog processing.

Now the question is: what do we do if the Canal Server fails? In other words, how do we achieve High Availability (HA) for the Server?

The most obvious approach is to prepare a few backup Canal Servers (standby machines for short), so that when the primary Canal Server (the working server) goes down, a standby machine can take over. This raises two questions:

  1. If three machines are started (one to serve as the host and two as standbys), which Canal Server becomes the host?
  2. If the host fails, how do the standby machines discover this in time and take over its work?

Obviously, both problems involve communication between multiple processes, so it is better to introduce an intermediate layer to handle them. We need to design a distributed coordination system that fulfills these two requirements. Let's give this system a name: ZooKeeper, or ZK for short.

Now let’s look at how ZK needs to be designed to meet these two requirements

Designing a distributed lock to settle the primary and standby roles

Let’s start with problem number one

If three or more machines are started, which Canal Server is the host?

Obviously, a distributed lock meets this requirement: after startup, the three machines each try to acquire the distributed lock; the one that succeeds becomes the host, and the others become standbys.

Therefore, ZK must provide a distributed lock. Some might say: just introduce Redis or MySQL, since both can implement distributed locks. However, that would introduce an additional component into the system, increasing its complexity and forcing us to ensure the high availability of the newly introduced component as well, which is undesirable. So we will take a different approach and design the distributed lock directly into ZK, starting with ZK's data structure.

We use a tree structure like a Linux file system as the data structure for this distributed system.

The root node is /, and the names of the child nodes under any given node are unique (which means all nodes directly under the root have unique names). After the three Canal Servers start, each first tries to create a node with the same name under the root (say /lock1). Since node names are unique, only one /lock1 node can be created; the other creation attempts fail. Therefore, the server that creates the node first becomes the primary, and the rest become standbys.
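To make this concrete, here is a minimal sketch of that election step using the raw Apache ZooKeeper Java client. The connection string, session timeout, and node data are made-up illustration values; the real Canal Server does this internally.

import org.apache.zookeeper.*;

public class SimpleLeaderElection {
    public static void main(String[] args) throws Exception {
        // connect to ZK; the session is kept alive by heartbeats
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 5000, event -> {});
        try {
            // EPHEMERAL: the node is deleted automatically when this
            // server's session with ZK dies, releasing the "lock"
            zk.create("/lock1", "serverA:11111".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("created /lock1 first, acting as primary");
        } catch (KeeperException.NodeExistsException e) {
            // someone else created /lock1 first, act as a standby
            System.out.println("/lock1 already exists, acting as standby");
        }
    }
}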

How do the standby machines discover that the host is down

Let’s look at the second question

How does the standby server know that the host is down?

As mentioned above, the host creates the node /lock1 in ZK. Before creating the node, the host first establishes a TCP connection with the distributed coordination system, and this connection is kept alive for the lifetime of the session. The host periodically sends heartbeats so that ZK knows it is still alive. If ZK does not receive a heartbeat from the host within a specified period (say, 2s), it considers the host down and sends a notification to the standby machines.

ZK serves not only Canal; many other services may also need to create nodes on it, and different services create different nodes, such as /lock1 for Canal and /lock2 for an inventory service. It would make no sense to notify the inventory service's standbys when the Canal primary goes down; we should notify only the machines that failed to create /lock1 at that time.

So we need to maintain a mapping from /lock1 to its corresponding machines, as follows

Each standby machine that failed to create the /lock1 node is placed in the list corresponding to /lock1

This process is called registration, and it lets us find the machines corresponding to each node. If the host that created /lock1 goes down, ZK simply looks up the mapping list and notifies the standby machines registered under /lock1 that the distributed lock has been released. This mechanism, by which the standby machines can sense that a node has been deleted, is called the Watch mechanism, and it works as follows:

1. Registration: the client creates the node /lock1 and registers a watcher for it.

2. Store: the watcher is saved in the watcherManager on the client. We can roughly think of a watcher as having the following format
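Since the original figure is not reproduced here, a rough Java sketch of this simplified structure might look like the following (the class and field names are illustrative):

import org.apache.zookeeper.Watcher;

class SimplifiedWatcher {
    String path;                           // node being watched, e.g. "/lock1"
    Watcher.Event.KeeperState keeperState; // connection state: SyncConnected, Disconnected, ...
    Watcher.Event.EventType eventType;     // node event: NodeCreated, NodeDeleted, ...
}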

Note: to simplify the explanation, we have added a path field to the watcher; the real Watcher does not have one, but the principle is the same.

path is the node we are listening to, /lock1 in this example; keeperState is the connection state, such as connected or disconnected; and eventType is the event on the node, such as creation, deletion, or a change to its children.

3. Notification: ZK notifies the client of the node and the deletion/creation event that occurred on it. The client uses this information to find the corresponding watcher in watcherManager and handle the event accordingly. For example, when the node is deleted, the client senses that the host is down; it can then try to create /lock1 again, and if the creation succeeds, the standby becomes the host.

With the Watch mechanism described above, it is not hard to design our high-availability scheme, as follows:

  1. Suppose there are two Canal Servers, A and B. To become the primary, each first asks ZK to create the /lock1 node.
  2. A creates the node successfully, becomes the primary, and starts working. B's creation fails, so it acts as the standby; B then uses the Watch mechanism to listen on the /lock1 node (machine B is registered into ZK's machine list for /lock1, so after /lock1 is deleted, B is notified through that list and can sense that A is down).
  3. Once A becomes unavailable, ZK deletes /lock1. After receiving the notification, B finds the watcher in its local watcherManager, learns that the event is the deletion of /lock1, and tries to create /lock1 again. Once the creation succeeds, B starts working as the primary. The Canal Client likewise watches the creation event of /lock1, and the /lock1 node can store the address of the current Canal Server; when the Canal Client is notified, it obtains B's address and establishes a connection to that machine (a sketch of this loop follows the list).
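Here is a hedged sketch of B's watch-and-retry loop under the scheme above, again using the raw ZooKeeper Java client; the stored address is an illustrative value.

import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

public class StandbyFailover implements Watcher {
    private final ZooKeeper zk;

    public StandbyFailover(ZooKeeper zk) { this.zk = zk; }

    // try to become primary; if /lock1 already exists, watch it instead
    void tryToBecomePrimary() throws KeeperException, InterruptedException {
        try {
            zk.create("/lock1", "serverB:11111".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("became primary");
        } catch (KeeperException.NodeExistsException e) {
            // steps 1 and 2: register this watcher for /lock1
            Stat stat = zk.exists("/lock1", this);
            if (stat == null) {
                tryToBecomePrimary(); // deleted in between: retry immediately
            }
        }
    }

    // step 3: ZK calls back here when /lock1 changes
    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDeleted
                && "/lock1".equals(event.getPath())) {
            try {
                tryToBecomePrimary(); // the primary died, compete for the lock again
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}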

The thundering herd effect and its solution

With the design above, Canal's high-availability requirement is met, but the distributed lock implemented by the ZK system we have designed so far has two problems:

  1. Suppose there are dozens of standby machines. When the host goes down, all of them try to create the /lock1 node, but only one succeeds and becomes the host; the dozens of others fail and immediately go back to waiting. This is what we call the thundering herd effect (also known as herd behaviour), and it clearly wastes resources. If, after the host goes down, only one standby node were notified to respond, without startling the whole herd, a great deal of resources could be saved.
  2. The current distributed lock is an unfair lock, which can lead to starvation: some standby nodes may never get the chance to acquire the lock. How do we make it a fair lock, so that every standby node gets a chance to acquire the lock and thus a chance to become the host?

The solution is as follows:

Each machine creates a child node under /lock1. The child nodes are numbered in increasing order of application, and the node with the smallest number means its corresponding machine holds the distributed lock. Every other machine watches only the node one position below its own, so that when a machine goes down, only the single machine watching its node is notified, avoiding the thundering herd effect.

As shown, the working mechanism is as follows:

  1. Initially, multiple machines create sub-xxx child nodes under /lock1 in ascending order. Suppose one machine creates sub-000001; since its sequence number is the smallest, it holds the distributed lock. The other machines then create sub-000002, sub-000003, and so on, in ascending order.
  2. The machine corresponding to each node watches only the node whose number is one smaller than its own.
  3. This way, when the distributed lock is released, only the node one position above is notified. For example, when the host corresponding to sub-000001 goes down, only machine B is notified (ZK tells machine B that the ephemeral node sub-000001 has been deleted); B's node now has the smallest number, so B holds the distributed lock.

Notice also that the machines acquire the lock in the same order in which they created their nodes, which gives us a fair lock.
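A minimal sketch of this fair-lock scheme, assuming the parent node /lock1 already exists as a persistent node (the class name and the wait/notify wiring are illustrative):

import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;
import java.util.Collections;
import java.util.List;

public class FairLock {
    private final ZooKeeper zk;
    private String myNode; // e.g. "sub-000002"

    public FairLock(ZooKeeper zk) { this.zk = zk; }

    public void acquire() throws KeeperException, InterruptedException {
        // EPHEMERAL_SEQUENTIAL: ZK appends an increasing sequence number,
        // producing /lock1/sub-000001, /lock1/sub-000002, ...
        String path = zk.create("/lock1/sub-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        myNode = path.substring("/lock1/".length());

        while (true) {
            List<String> children = zk.getChildren("/lock1", false);
            Collections.sort(children);
            int idx = children.indexOf(myNode);
            if (idx == 0) {
                return; // smallest sequence number: we hold the lock
            }
            // watch only the node one position below ours, so only one
            // machine is woken when the lock is released (no herd effect)
            final Object signal = new Object();
            Stat stat = zk.exists("/lock1/" + children.get(idx - 1),
                    event -> { synchronized (signal) { signal.notifyAll(); } });
            if (stat != null) {
                synchronized (signal) { signal.wait(); }
            }
            // if stat == null the predecessor is already gone: loop and recheck
        }
    }
}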

An introduction to ZK

The above is a simple case of using ZK to implement a distributed lock. Of course, distributed locking is just one of its many functions: as a distributed coordination service, it also provides configuration management, naming services, distributed synchronization, cluster management, and more. Let's take a quick look at some core concepts of ZK.

Znodes

We already know that ZK uses a tree structure similar to a Linux file system. Each node in the tree is called a Znode, and Znodes fall into the following four categories:

  1. Ephemeral (temporary) nodes: nodes such as the /lock1 node mentioned earlier, which are deleted when the host that created them disconnects from ZK.
  2. Ephemeral sequential nodes: the nodes we created earlier to implement the fair lock, numbered in increasing order of request.
  3. Persistent (permanent) nodes: nodes that remain after the client disconnects from ZK, and even after ZK restarts.
  4. Persistent sequential nodes: persistent nodes whose names carry sequence numbers that increase in order of creation.
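In the Apache ZooKeeper Java client, these four categories correspond directly to the CreateMode constants passed to create():

CreateMode.EPHEMERAL             // ephemeral (temporary) node
CreateMode.EPHEMERAL_SEQUENTIAL  // ephemeral sequential node
CreateMode.PERSISTENT            // persistent (permanent) node
CreateMode.PERSISTENT_SEQUENTIAL // persistent sequential node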

Each Znode can store data, up to 1 MB by default. This allows Znodes to act as a configuration store for important data.

Watcher: listening for events

As mentioned above, the standby machines can use the watcher mechanism to listen for the existence of nodes and respond to these events in time. Besides node existence, ZK provides the following event types:

public enum EventType {
    None(-1),               // received when the client's connection state changes
    NodeCreated(1),         // a node was created
    NodeDeleted(2),         // a node was deleted
    NodeDataChanged(3),     // a node's data changed
    NodeChildrenChanged(4); // a child node was created or deleted
}

ZK uses these Watcher events to implement distributed coordination services. Let’s look at two applications of ZK in production

Applications of ZK

As the Dubbo registry

Suppose you now have two services, a user service and an order service. For high availability, the order service is deployed on multiple machines. Each machine of the order service registers itself in ZK, and each registration node is an ephemeral node, as follows

The caller (the user service) gets all the child nodes under /Orders, i.e. the list of all machines, and also listens for the NodeChildrenChanged event (a child node created or deleted) on the /Orders node; it then selects one machine to connect to through some load-balancing algorithm. If one of the machines, say 192.168.11.1, goes down, its ephemeral node is deleted. The user service receives the watch event and re-fetches the children of /Orders; since the crashed machine's node has been deleted, only 192.168.11.2 and 192.168.11.3 are returned, so connecting to the unavailable machine 192.168.11.1 is avoided.
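A rough sketch of this service-discovery pattern under the assumptions above (the /Orders path comes from the example; the wiring is illustrative, not Dubbo's actual code). Note that a ZK watch fires only once, so the watch is re-registered on every refresh:

import org.apache.zookeeper.*;
import java.util.List;

public class OrderServiceDiscovery implements Watcher {
    private final ZooKeeper zk;
    private volatile List<String> providers; // e.g. ["192.168.11.2", "192.168.11.3"]

    public OrderServiceDiscovery(ZooKeeper zk) throws Exception {
        this.zk = zk;
        refresh();
    }

    // fetch the machine list and re-arm the (one-shot) watch in one call
    private void refresh() throws KeeperException, InterruptedException {
        providers = zk.getChildren("/Orders", this); // watches NodeChildrenChanged
        System.out.println("live order-service machines: " + providers);
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeChildrenChanged) {
            try {
                refresh(); // a machine joined or crashed, re-read the children
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}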

As the configuration center

In production we often have configuration that changes frequently and must take effect immediately. For example, when a feature goes live, we may want to gray-release it to some users and gradually expand the gray-release percentage, so that percentage needs to be configured on every machine. ZK can serve as the configuration center: every machine online listens for the NodeDataChanged event on the configuration node, so whenever the node's data changes, all machines are notified immediately and the change takes effect. QConf, the distributed configuration management system open-sourced by Qihoo 360 that our company uses, is built on top of ZK.
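A small sketch of the config-center pattern, assuming the gray-release percentage lives in a made-up node /config/gray-percent as a plain integer string:

import org.apache.zookeeper.*;

public class GrayConfigListener implements Watcher {
    private final ZooKeeper zk;
    private volatile int grayPercent; // percentage of users in the gray release

    public GrayConfigListener(ZooKeeper zk) throws Exception {
        this.zk = zk;
        reload();
    }

    // read the config and re-arm the one-shot watch
    private void reload() throws KeeperException, InterruptedException {
        byte[] data = zk.getData("/config/gray-percent", this, null);
        grayPercent = Integer.parseInt(new String(data));
        System.out.println("gray percentage is now " + grayPercent + "%");
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
            try {
                reload(); // config changed, take effect immediately
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}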

Conclusion

After reading this article, the working mechanism of ZK should not be hard to understand. The key points are its tree structure, its node types, and the Watch mechanism; with these you can understand how ZK works as a configuration center, a distributed lock, and a naming service. Of course, if you want to understand ZK more deeply, this alone is not enough. For example, a single ZK instance is a single point of failure, so in practice a ZK cluster must be deployed; and once you have a cluster, you need to understand how data consistency is guaranteed, which means learning the ZAB protocol, the election mechanism, and so on. I recommend reading the book "ZooKeeper: Distributed Process Coordination" for a deeper understanding of ZK.

Finally, you are welcome to follow my WeChat public account "code sea" so we can make progress together.