The background,
I am a small virus, other viruses call me small B, I look like the picture below.
I’m now 100 nm in size, and I have a lot of tentacles, which humans call crowns, so I got my academic name: coronavirus. For this academic name, I have not been satisfied, how can use appearance to name it, this is to take poison by appearance.
I was born in a bat body, every night, this animal everywhere foraging, its favorite is in the forest foraging, but recently the forest area sharply reduced, it had to come to the human living city foraging, looking at the colorful lights, I was infatuated.
This bat carries more than 100 viruses, like Ebola and MERS, which I play with every day. SARS virus is my close relative, in 2002, it also caused a not small infectious disease epidemic, sensational the whole virus community, the human called it SARS.
Second, the accident
That night, clouds overcast, the air was cloudy, the bat flew to the depths of the forest, was suddenly hit by an electric current, suddenly fell to the ground, a large net captured the bat, followed by a black coat of human put the bat into a cage.
For the next few days, the bats were kept in cages until they were taken to an area where wild animals were everywhere, but they were either tied up with thick ropes or kept in cages. A fat man walked into the bat, gave the man in black a pile of money he had saved, and took the bat away.
The bat probably got too close to the fat man, and I was passed to the man’s hand, and then to his mouth, and then to his body, through the food he touched.
Seed node
Once inside the human body, I tricked the human immune system into entering the human cells, releasing my RNA and using the RNA polymerase of normal cells to replicate my RNA, thus making more copies of the virus and attacking the human lungs.
A few days later, the human developed a fever and a cough, and my companion sneezed into other people and began copying himself in large numbers.
Thousands of people in this prefecture have been infected by the virus. I was called the seed node, and the human was called Patient Zero.
Gossip protocol
Normal cells in my body are very interested in my infectivity.
Normal cell: “Coronal brother, how do you spread so fast?”
Me: “I actually took advantage of the Gossip protocol.”
Me: Gossip has three main functions: direct mail, anti-entropy and rumor spreading.
Just like Gossip, this protocol spreads information to the entire network in a random and contagious way, and makes all nodes in the system consistent within a certain period of time. This is the protocol that achieves the final consistency.
4.1 Direct Mail
Data that needs to be updated is directly sent to other nodes. If the data fails to be sent, the data is cached and retransmitted. As shown in the figure below, human A directly transmitted the virus to C and E, while B and D were not infected and needed to be re-infected.
Advantages: Easy implementation and timely data synchronization.
Disadvantages: Data may be lost because the cache queue of retries is full. Final consistency cannot be achieved.
So how do you achieve final consistency? That’s where the second function comes in: anti-entropy.
4.2 the entropy
What about the term anti-entropy?
Entropy refers to the degree of confusion, and anti-entropy is to eliminate the data differences between different nodes and improve the similarity of data between nodes.
Anti-entropy process:
- (1) Nodes in the cluster are randomly selected at regular intervals.
- (2) Exchange all their data to eliminate the differences between them.
- (3) Achieve the final consistency of data.
Let’s take an example of viral transmission to illustrate anti-entropy.
First, human A was infected with two viruses, namely virus T and virus R, as shown in the figure below:
Human E is infected with three viruses, namely virus T, virus S and virus Y, as shown in the figure below:
Human A transmitted the virus T and R to human E, who was already carrying the virus T, so she eventually became infected with four viruses: T, S, Y and R. In other words, the missing virus R in human E was repaired by anti-entropy. As shown below:
In fact, there are three main ways to reverse entropy: push, pull, push and pull.
2 push
Push: Push your copy data to an object, repairing entropy in the other object’s copy.
As shown below, human A transmits virus R to human E, which contains all of A’s viruses.
4.2.2 pull
Pull: pull all copies of the other party’s data, repair the entropy in your own copy.
As shown in the figure below, human A only has viruses T and R. After an active pull, viruses S and Y from human E are synchronized to human A. Finally, human A carries four viruses: T, R, S and Y.
Holdings of push-pull
Push and pull: Fixes entropy in both own and object copies.
As you can see below, human A and HUMAN E both end up being infected by the same four viruses: T, R, S, and Y.
4.2.4 Disadvantages of anti-entropy
(1) By pushing, pulling and pulling above, we can mainly realize that anti-entropy requires nodes to exchange and compare all their data in pairs. In this case, the communication cost is very high, so it is not recommended to frequently implement anti-entropy in practical scenarios.
So is there a way to reduce the number of anti-entropy cases?
The answer is yes, we can reduce the amount of data and communications that need to be compared by introducing mechanisms such as checksums.
(2) When anti-entropy is implemented, all relevant nodes are known, and the number of nodes should not be too large. If the nodes change dynamically or there are too many nodes, anti-entropy is not appropriate.
Is there a way to solve for dynamic, multi-node ultimate consistency?
The answer is yes, which brings us to the Gossip protocol’s third spreading function, rumor spreading or epidemic spreading.
4.3 Epidemic transmission
This process
The Gossip protocol’s third transmission function, epidemic transmission, is the widespread spread of the virus.
A infects B and E, B infects C and D, and D infects F and G. Finally ABCDEFG was infected.
In a distributed system, when a node has new data, the node becomes active and periodically contacts other nodes to send new data to it until all nodes have stored the data, which can be understood as the way of pushing in anti-entropy mentioned before. As shown below:
4.3.2 shortcomings
The mode of epidemic transmission has the following shortcomings:
- Time randomness: the probability that all nodes reach consistency is random. The anti-entropy process can be improved by using closed-loop repair.
- Message redundancy: The same node will receive the same message for many times, increasing the pressure of message processing. Each communication will load the network bandwidth and CPU resources, thus affecting the time to reach the final consistency.
- Byzantine problem: If a malicious node appears, then other nodes also have problems. Therefore, you need to repair the faulty node first.
4.3.3 advantages
- Support dynamic, multi-node: allow to dynamically add or subtract nodes, support very many nodes.
- Most nodes: Final consistency can be achieved without most nodes functioning properly
- Fault tolerance: Any node restart or downtime will not affect the operation of Gossip protocol, natural distributed system fault tolerance.
- Decentralization: Nodes are all peers, no special nodes. The failure of any node does not prevent other nodes from continuing to perform anti-entropy.
- Fast: Because each node can propagate, the speed is exponential, just like the Novel Coronavirus now.
Because the Gossip protocol is a fault-tolerant algorithm with redundancy, it is an algorithm to ensure final consistency. Although the time point at which all nodes reach the same is not clear, it can also be predicted by improving the execution process of anti-entropy, such as closed-loop anti-entropy (not discussed in this paper).
Five, the summary
This article explains the Gossip protocol through a story of a bat infected with the coronavirus.
-
The Gossip protocol is an asynchronous repair protocol that implements final consistency. Anti-entropy is preferred.
-
Antientropy is used more often in memory components. For example, Cassandra and InfluxDB.
-
Rumour spreading (epidemic spreading) is contagious and nodes infect each other. Suitable for dynamically changing distributed systems. For example, Cassandra dynamically manages cluster node states.
-
In actual scenarios, direct mail must be realized, with the lowest performance loss. Data inconsistencies can be fixed by sending update data or cache retransmission.
-
In storage components where nodes are known, inconsistencies in data copies are fixed by anti-entropy.
-
When the cluster node changes or there are too many nodes, the rumor propagation method is adopted to synchronize and update the data of multiple nodes to achieve the final consistency.
-
Gossip’s three functions are actually anti-entropy. The first uses message queuing, the second uses push and pull messages, and the third uses spreading rumors.
-
If a node is faulty, rectify the fault first.
About the author: 8 years of Internet experience, good at architecture design, distributed, micro services. Public account: Wukong chat architecture, with stories to explain distributed, micro services. Author of 7 Experiments to Master JVM Performance Tuning. I wrote a SpringCloud practical tutorial and independently developed PMP and Java brush applets. Concern public account: Wu Kong chat structure, reply PDF to receive.