Zhang Dapang’s company has grown rapidly in recent years: business is surging and headcount is expanding fast. Almost without noticing, Zhang Dapang has become a “senior” employee. More importantly, after years of hard work, he has finally ascended to the throne of architect.
However, Dapang soon found that being an architect was not easy. Technology selection, architecture design, and above all the technical problems that nobody else could crack all landed on his own shoulders. Communication, persuasion, compromise, and even arguments became routine; it was far harder than when he was just writing code.
The company’s IT systems long ago shifted from standalone to distributed, and distributed systems pose serious challenges. First thing Monday morning, Dapang’s inbox was already full of urgent emails.
The email from Xiao Liang described an RPC problem. The company’s architecture group had built an RPC framework for all the teams to use, but every team complained that the framework did not support dynamic service registration and discovery.
To support high concurrency, OrderService is deployed as four instances. Each client keeps a list of the service providers, but the list is static (hard-coded in a configuration file). If the providers change, say a machine goes down or a new OrderService instance is added, the clients have no way of knowing; they may foolishly keep calling an instance that is already dead.
To get the latest list of provider URLs, someone has to update the configuration file by hand. Thoroughly inconvenient.
Dapang immediately saw the root cause: tight coupling between the clients and the service providers.
To decouple them, add an intermediate layer!
First, give each service a name (say, orderService), then have every OrderService instance register itself under that name in a registry. A client simply asks the registry for orderService and gets back a usable URL, with no more fear of providers being added or removed dynamically.
Almost without thinking about it, Dapang sketched the registry’s data structure as a tree:
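```
/orderService
├── node1   (URL of instance 1)
├── node2   (URL of instance 2)
├── node3   (URL of instance 3)
└── node4   (URL of instance 4)
```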
/orderService represents the service itself, and each node below it represents one instance of that service; /orderService/node2, for example, is the second instance of OrderService. Each node records the URL of its instance so that clients can look it up.
Of course, the registry must stay in touch with every service instance; when an instance dies, its node must be removed from the tree so that clients can no longer discover it.
Well, the registry can hold a session with each service instance and have every instance send heartbeats periodically. If no heartbeat arrives within a set time, the instance is presumed dead, its session expires, and its node is removed from the tree.
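In Java, the core of this design might look like the toy sketch below. The class name, paths, and the 10-second timeout are illustrative assumptions, not any real product:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// A toy version of the registry idea: the tree flattened into path -> URL,
// plus a last-heartbeat timestamp per instance for session expiry.
public class NaiveRegistry {
    private final Map<String, String> urls = new ConcurrentHashMap<>();
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();
    private static final long SESSION_TIMEOUT_MS = 10_000;

    // A service instance registers itself,
    // e.g. register("/orderService/node2", "rpc://10.0.0.2:8080").
    public void register(String path, String url) {
        urls.put(path, url);
        lastHeartbeat.put(path, System.currentTimeMillis());
    }

    // Instances call this periodically to keep their session alive.
    public void heartbeat(String path) {
        lastHeartbeat.computeIfPresent(path, (p, t) -> System.currentTimeMillis());
    }

    // Run periodically by the registry: expire dead sessions and prune the tree.
    public void expireDeadInstances() {
        long now = System.currentTimeMillis();
        for (String path : lastHeartbeat.keySet()) {
            Long last = lastHeartbeat.get(path);
            if (last != null && now - last > SESSION_TIMEOUT_MS) {
                lastHeartbeat.remove(path);
                urls.remove(path); // clients can no longer discover it
            }
        }
    }

    // Client side: "give the name, get back live URLs".
    public List<String> discover(String serviceName) {
        String prefix = "/" + serviceName + "/";
        return urls.entrySet().stream()
                .filter(e -> e.getKey().startsWith(prefix))
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());
    }
}
```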
Dapang mailed the idea back to Xiao Liang, then turned to Xiao Wang’s message.
What Xiao Wang’s email described was the coordination of three Batch Jobs deployed on three machines: only one of them may run at any time, and if the running one unfortunately goes down, the remaining two must elect a successor to pick up the unfinished work and carry on.
This is really a master election problem; Dapang saw the essence at a glance.
But just to elect a master, the three Batch Jobs have to coordinate with one another. What a hassle!
How about using a database table? Have all three Batch Jobs try to insert the same record into a table with a unique constraint; whoever inserts it first becomes the master.
But if the Batch Job holding the master role crashes, nobody can ever take over! The record already exists, so the other Batch Jobs can never insert it.
Then make the master keep refreshing a timestamp on the record; if it goes stale for too long, the master is presumed dead and the others may take over… But what a bother!
Better to use a registry instead: have all three go to it and shout “I’m the master!” Whoever gets there first wins.
Each of the three Batch Jobs tries to create the same node in the registry tree (say, /master). Only one can succeed, and that one becomes the master. The other two watch the node; if it is deleted, they race once more to create /master.
When would the node be deleted? When the current master’s machine goes down. So, clearly, the registry must also keep in touch with each machine to know whether it is alive.
But what if machine 1 is not dead, merely cut off from the registry for a long time? The registry would decide its session had timed out and delete the /master node machine 1 created, letting machines 2 and 3 fight over it. If machine 3 won and started running the Batch Job, machine 1, unaware that it had been deposed, would still be running its own Batch Job, and the two would conflict.
So machine 1 must be able to sense that it has lost its connection to the registry and stop its Batch Job; when it reconnects, it will learn that it is no longer the master.
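Jumping ahead a little: written against the ZooKeeper client API that Dapang discovers at the end of this story, the whole election recipe fits in a page. A minimal sketch, with the connection string, the 10-second timeout, and the error handling all simplified for illustration:

```java
import org.apache.zookeeper.*;

// Master election via "whoever creates /master wins".
public class MasterElection implements Watcher {
    private final ZooKeeper zk;
    private volatile boolean isMaster = false;

    public MasterElection(String connectString) throws Exception {
        // The session: ZooKeeper expires it if our heartbeats stop.
        zk = new ZooKeeper(connectString, 10_000, this);
    }

    public void runForMaster() throws Exception {
        try {
            // EPHEMERAL: the node is deleted automatically when our session
            // dies, which is exactly the "master machine is down" signal.
            zk.create("/master", new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            isMaster = true;  // we created it first: start the Batch Job
        } catch (KeeperException.NodeExistsException e) {
            isMaster = false; // someone else is master
            // Watch /master so we are told when it is deleted
            // and can race to create it again.
            zk.exists("/master", this);
        }
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDeleted) {
            try { runForMaster(); } catch (Exception ignored) {}
        } else if (event.getState() == Event.KeeperState.Disconnected) {
            // Machine 1's lesson: on disconnect, assume we lost mastership
            // and stop the Batch Job until we reconnect and re-check.
            isMaster = false;
        }
    }
}
```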
Whichever scheme he chose, though, implementing it, and the registry behind it, would be a pain. Damn these distributed systems!
He dashed off a reply to Xiao Wang, then read Xiao Cai’s email.
The problem in Xiao Cai’s email was even trickier: several different systems (on different machines, naturally!) need to operate on the same shared resource.
On a single machine this could be handled with a language-level lock such as Java’s synchronized, but these programs run in different processes on different machines, so synchronized is useless.
This is a distributed lock problem.
Could this be handled like the master election, by letting everyone race? Whoever creates the /distribute_lock node in the registry first holds the lock, and deletes the node when it is done to release it.
But that isn’t fair: one system might grab the lock again and again while the others wait forever.
What if each system instead created a numbered child node under /distribute_lock in the registry, so that every system gets a sequence number? It would look like this:
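```
/distribute_lock
├── process_01   (created by system 1)
├── process_02   (created by system 2)
└── process_03   (created by system 3)
```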
Each system then checks its own number, and the one holding the smallest number is considered to own the lock; here that is system 1, which created process_01.
When system 1 finishes, it deletes process_01 and creates a new node, process_04:
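```
/distribute_lock
├── process_02   (system 2, now the smallest)
├── process_03   (system 3)
└── process_04   (system 1, back at the end of the queue)
```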
Now process_02 is the smallest, so system 2 holds the lock and may operate on the shared resource. When it finishes, it deletes process_02 and creates a new node in turn; process_03 becomes the smallest, and system 3 takes the lock.
And so the cycle continues: a distributed lock!
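A minimal sketch of this lock against the ZooKeeper client API (the paths mirror the story; error handling is omitted; watching only the immediate predecessor node is a standard refinement of the story’s simpler scheme, added here as an assumption):

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

// The sequential-node lock from the story: smallest number holds the lock.
// Assumes the persistent parent node /distribute_lock already exists.
public class DistributedLock {
    private final ZooKeeper zk;
    private String myNode; // e.g. "/distribute_lock/process_0000000002"

    public DistributedLock(ZooKeeper zk) { this.zk = zk; }

    public void lock() throws Exception {
        // EPHEMERAL_SEQUENTIAL: ZooKeeper appends an increasing number,
        // and the node vanishes if we crash while holding the lock.
        myNode = zk.create("/distribute_lock/process_", new byte[0],
                           ZooDefs.Ids.OPEN_ACL_UNSAFE,
                           CreateMode.EPHEMERAL_SEQUENTIAL);
        while (true) {
            List<String> children = zk.getChildren("/distribute_lock", false);
            Collections.sort(children);
            String myName = myNode.substring(myNode.lastIndexOf('/') + 1);
            int i = children.indexOf(myName);
            if (i == 0) return; // smallest number: we hold the lock
            // Wait for the node just ahead of us to go away, then re-check.
            // (Watching only the predecessor avoids waking every waiter.)
            String previous = "/distribute_lock/" + children.get(i - 1);
            CountDownLatch latch = new CountDownLatch(1);
            Stat stat = zk.exists(previous, event -> latch.countDown());
            if (stat != null) latch.await();
        }
    }

    public void unlock() throws Exception {
        zk.delete(myNode, -1); // release: the next smallest takes over
    }
}
```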
Look at that: this centralized tree structure he had designed was excellent, able to solve all kinds of problems! Dapang could not help feeling rather pleased with himself.
OK, he would send Xiao Cai the idea first, and they would all meet that afternoon to discuss the details.
Just as he was about to reply to Xiao Cai, Dapang suddenly realized he had overlooked something important: the high availability of the registry itself. If the registry ran on a single machine, the whole system would die the moment that machine failed.
So the registry also needs multiple machines for high availability, and the tree structure he was so proud of would have to be kept in sync across all of them. What if one of them dies? What if communication times out? How can the tree’s data stay consistent across machines?
The original problems from Xiao Liang, Xiao Wang, and Xiao Cai were still unsolved, and the registry alone was already overwhelming. Building such a registry on their own technical strength would be Mission Impossible!
Dapang hurried to search the Internet for an existing solution, and felt extremely lucky: sure enough, there was one, called ZooKeeper!
The tree structure ZooKeeper uses was very similar to what he had envisioned. More importantly, the tree’s data can be reliably replicated across multiple machines so that it stays consistent, and even if some of those machines go down or become unreachable because of the network, the system as a whole keeps working.
ZooKeeper’s key concepts and APIs
1. Session: the connection between a client system (a Batch Job, for example) and ZooKeeper. After connecting, the client sends heartbeats periodically; if ZooKeeper receives no heartbeat within the specified time, the client is presumed dead and the session ends.
2. znode: every node in the tree structure is called a znode. By type there are persistent znodes (which stay until actively deleted), ephemeral znodes (deleted when the session ends), and sequential znodes (the process_01, process_02, … of Xiao Cai’s distributed lock).
3. Watch: a client system (such as a Batch Job) can watch a znode and be notified when it changes (for example, when its data is modified or it is deleted), so the client can react, say by trying to create the node itself; see the sketch below.
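Putting the three concepts together for Xiao Liang’s RPC problem, a minimal sketch (the host, port, node name, and URL are made-up examples):

```java
import java.util.List;
import org.apache.zookeeper.*;

public class OrderServiceRegistrar {
    public static void main(String[] args) throws Exception {
        // Session: created on connect and kept alive by client heartbeats.
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 10_000, event -> {});

        // A persistent znode for the service name (create once, then reuse).
        try {
            zk.create("/orderService", new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } catch (KeeperException.NodeExistsException ignored) {}

        // An ephemeral znode per instance: it disappears automatically when
        // the session ends, so dead instances drop out of the tree themselves.
        zk.create("/orderService/node1", "rpc://10.0.0.1:8080".getBytes(),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Watch: a client lists the instances and is notified of changes.
        List<String> instances = zk.getChildren("/orderService",
                event -> System.out.println("service list changed: " + event.getType()));
        System.out.println("live instances: " + instances);
    }
}
```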
Well, these concepts and APIs should cover all of the requirements. That settled it: call a meeting this afternoon and start learning ZooKeeper.
Postscript: this article describes what ZooKeeper does from a user’s perspective; how it works internally is a big topic for another day.