Hi, I’m crooked.
New Year’s day when I saw a particularly outrageous rumor ah, what is the specific content I will not say, I am afraid of dirty everyone’s eyes.
However, I saw a group of that called a vivid, we discussed the wind of water, as if we were on the scene.
I would have laughed it off. But after a while I suddenly slapped my thigh: this is material.
I could talk to you about a consensus algorithm.
When it comes to consensus algorithms, the first things that come to mind are Raft, Paxos, And Zab algorithms that are very consistent and difficult to understand.
But there is another weak consensus algorithm that is easier to understand, the Gossip protocol.
Gossip girl: Gossip girl, Gossip girl, Gossip girl.
Here’s a quick look at what this “gossip” is all about.
Gossip protocols
The Gossip protocol was first proposed in a 1987 paper published in Epidemic Algorithms for Database Maintenance.
Bitsavers.trailing-edge.com/pdf/xerox/p…
When I first saw the title of this paper, I was stunned: there is no word for Gossip.
Mainly, Epidemic Algorithms, which I happen to know.
Uh, Algorithms. Nothing to say.
Epidemic is what?
Sticking to current events:
So Epidemic Algorithms translates as Epidemic Algorithms.
Therefore, the scientific name of Gossip is also called “epidemic Algorithm”, but people prefer to call it Gossip. After all, don’t you like to hear “gossip”?
Before you talk about your thesis, set a brief tone.
What do you think is the most basic, core, and important action of a conformance protocol?
Is it data update?
In order to ensure the data consistency of each node, it is necessary to involve the data update operation.
Therefore, the introduction of the paper describes three methods to update data:
- Direct mail
- Anti-entropy
- Rumor mongering
Direct Mail
Without talking nonsense, let’s go straight to the last picture:
What does this picture mean?
It’s a total of eight dots, assuming each represents a server, and they’re all on equal terms. There’s no central node, no master, no slave.
The red node at the top indicates that the node has changed its data and notifies the remaining nodes of the changed data directly.
The same goes for data changes on other nodes.
It can be simply understood as a loop traversal. Every time a data change occurs, a loop traversal is required to maintain the consistency of the data.
The advantages of this scheme are clear: simple, blunt and straightforward.
But the disadvantages are as obvious as the advantages. Let’s look at what the paper says:
Look at the but section:
First, it is not completely reliable, because this requires that each site must know that all sites exist. But the reality is that some sites don’t always know all the other sites.
Then, messages (mail) sometimes get lost, and once lost, even the final consistency is not guaranteed, the whole thing is cool.
As a matter of fact, Direct mail is not the main scheme discussed in the paper. It is written in the first place to serve as a primer.
We’ll focus on anti-entropy and Rumor Mongering.
Set the tone:
- The anti-entropy propagates all the data on the node
- Rumour -Mongering is a new arrival or change in data at a transmission node
To put it bluntly, one is full and one is incremental.
Anti-entropy
Some students may be confused by the word “anti-entropy”, but in fact, they do not know what “entropy” is.
To put it bluntly, entropy is colloquially known as chaos.
Your room, for example, will become more and more entropy if you don’t organize it. And the process of tidying your room is called anti-entropy.
This thing you can simply understand first, I will not be clear for you in a moment, this thing to talk about to rise to the height of the universe and philosophy.
My main fear was that you wouldn’t understand.
In the paper, the anti-entropy mode is described as follows:
Each server regularly randomly selects the other, and the two smooth out any differences between them by exchanging their content. This scheme is very reliable.
But you need to check the full contents of each server, which means that there is a bit more data, so you can’t use it too often.
Experiments show that anti-entropy, while reliable, propagates updates much more slowly than direct mail.
If they are not synchronized, then the data between the two are more and more different, which is more and more entropy.
The goal of synchronization is to narrow differences and achieve final consistency, which is anti-entropy.
The definition is that definition.
Rumor mongering
Rather than anti-entropy, rumour-spreading is literally easy to understand.
For example, AS a college student, I don’t know the whole school. But the students in the school are inextricably linked with each other.
Suppose that one day, I happened to meet the campus beauty walking alone on the road, so I went up to her and discussed with her the consensus algorithm and other related issues in the computer field. We had an in-depth discussion on these issues and exchanged our understanding and views.
From our side, the whole process became more and more heated, and I didn’t know how to walk to the lover’s slope.
I guess every college has a place called Lover’s Slope.
And then the other girls saw him. She said to her bestie: do you know crooked? Yeah, you know, the freshman, the hottie. I’d see him hanging out on Lover’s Slope with the school sweetheart.
And then it goes on and on and on. The news was known to the whole school.
“Crooked and schoolgirl strolling in lover’s slope” this message through gossip mode, to achieve the final consistency.
The difference between “rumor spreading” and “anti-entropy” is that only new information or changed information is transmitted, rather than the full amount of information.
In the example above, all you need to do is synchronize the latest news about “Crooked and schoolgirl strolling on lover’s Slope.”
You don’t need to synchronize information about who the crooked is, who the campus beauty is, where the lovers are, and so on.
When it comes to “rumour-spreading” and “anti-entropy”, the paper also has this definition:
Because there are simple epidemics
Now, in this pattern, there are two states: infective, which means contagious, and susceptible to being susceptible.
A node in the infective state has updates and needs to share them with other nodes.
A node in the susceptible state indicates that it has not received updates from other nodes (it is not infected).
So when I say “infection” later on, you should know that I’m seeing it here, not making it up.
As for “spreading rumors” and “anti-entropy”, I would like to borrow the serious description in Teacher Zhou Zhiming’s Phoenix Structure:
Icyfenix. Cn/distributio…
The time it takes to achieve consistency is opposed to the redundancy of messages in network propagation, where improving one worsens the other.
As a result, Gossip has devised two possible patterns for spreading news: anti-entropy and rumour-mongering, both literary names.
Entropy is a rare concept in life but a common one in science. It represents how chaotic things are.
Anti-entropy means anti-chaos. The goal is to improve the similarity between nodes in the network. Therefore, in the anti-entropy mode, all data of nodes will be synchronized to eliminate the differences between nodes, and the goal is complete consistency of nodes in the whole network.
However, under the premise that the nodes themselves will change, the number of messages in the whole network will be very large, which will bring huge transmission overhead to the network.
However, the rumour mode is aimed at spreading messages and only sends the data of newly arrived nodes, that is, only sends the change information externally. In this way, the amount of message data will be significantly reduced and the network overhead is relatively small.
A web site
It’s a showdown. I actually saw this website and decided to write this article.
Because this site has a very realistic animation simulating the synchronization process of the Gossip protocol, a GIF is worth a thousand words.
The address is put here, you can visit to play:
Flopezluis. Making. IO/gossip – simu…
To give you a glimpse of how it works:
If you don’t get it, at least it looks good.
Here’s how it works:
First let’s look at Nodes and Fanout here.
Nodes is the number of Nodes. The 40 here represents the number of small circles below. For example, if I am 18 years old, I change it to 18 and it looks like this:
Basically what is this Fanout?
In the rotation image at the head of the page, the first image looks like this:
The answer lies in Learn More.
Managementfromscratch.wordpress.com/2016/04/01/…
What is a Fanout? It also briefly introduces the basic working principle of the Gossip protocol.
It says that the Gossip protocol is very simple in concept and very simple in coding. The basic idea behind them is this:
A node wants to share some information with other nodes in the network. It then periodically selects a random node from the set of nodes and exchanges information, and the nodes that receive the information do the same. This information is periodically sent to N targets, known as fanouts.
So, the previous Fanout=4 means that a node synchronizes the information it wants to share with the other four nodes in the cluster each time.
It should look like this in the simulator:
In the diagram above you can see that there are many lines, but they all start with a red node.
This red node is you click at will with the mouse small circle in one or many can, the mouse one click will become red, it is over calf, red code, said “be infected”.
How do you get that line up there?
Once you have a small red circle, click “Show Paths” above to see the Paths:
But what about Fanout=4? Why are there so many paths?
Because, although this node knows all these other nodes, it will only choose four of them to infect.
The image above is still a bit complicated, so I’ll turn down all the parameters to make it look cleaner:
There is a node in the cluster whose information is updated. The node is aware of the existence of the other five nodes, but it will only push the information to two of them. Clicking the Send Message button will look something like this:
You can see that there are already three red nodes in the diagram above, and two paths are thicker, meaning propagated from this path.
The entire cluster will eventually complete the “infection” to reach the final consistency:
Also, the Gossip protocol is fault-tolerant:
As indicated on the page, it is possible to Delete a portion of a path using the “Delete” button, such as the following:
Deleting two paths means that the nodes are unreachable, but eventually the cluster will all be infected.
After the path is deleted, the node will no longer communicate with the corresponding node, but the whole cluster will still converge:
You can also go to the website to play, and there is a tip like this:
Click the Play button and you can pause it at any time, making it easier to watch the whole propagation process.
Finally, there’s one key thing that’s not said about this diagram, and that’s this formula:
In Learn More also have mentioned this formula, it is the complexity of the gossip protocols, O (logN) :
For example, if Fanout=4 is set each time, the relationship between the number of nodes and the estimated propagation rounds looks like this:
- 40 nodes, 2.66 rounds
- 80 nodes, 3.16 rounds
- 160 nodes, 3.66 rounds
- 320 nodes, 4.16 rounds
- 640 nodes, 4.66 rounds
- .
It can be seen that as the number of nodes doubles, the number of propagation rounds does not increase significantly.
This is the word Scalable mentioned earlier in the Learn More screenshots
This is a level 4 word, will test, remember, is “scalable” meaning.
Scalable clusters using the Gossip protocol are very nice.
Other Points for attention
On this site, the most important thing is the GIF simulation function, but don’t neglect the description of other parts of the site.
For example, this paragraph, I think is very important.
There are two questions mentioned in this paragraph, and I will say them one by one.
First, it says that during the site simulation, all nodes appear to send messages synchronously, as if there were a global loop.
We do this in the simulation because it looks more intuitive.
However, in a true gossip protocol, each node has its own cycle and there is no or no synchronization between them at all.
What does it mean?
To be blunt, each node synchronizes messages out on its own cycle, say every 10 seconds. It doesn’t matter when other nodes trigger the synchronization operation, just take care of yourself.
The second question I think is very important:
How do the nodes know about each other?
How do nodes know the existence of other nodes?
One way is that when a node joins a cluster, it must know information about a node in the cluster. As we know from the previous GIF, if a node is known by another node, it will also be infected eventually.
This raises the question: how does a new node know about a node in the cluster when it joins?
Very simple. One solution I know of is manual assignment.
The Redis cluster uses the Gossip protocol to exchange information. When a new node is added to the cluster, a meet command is used.
www.redis.cn/commands/cl…
This thing is manual assignment.
One other thing to note is this:
Here is another example of a mock website:
www.serf.io/docs/intern…
It can control these parameters to measure the time for the cluster to reach consensus.
The figure above shows the situation where the information exchange frequency (GOSSIP INTERVAL) is 0.2s, the number of Fanout nodes is 3, the total number of nodes is 30, and the packet loss rate and node failure rate are 0.
In this case, the corresponding time graph to reach final consistency looks like this:
It’s basically done in a second.
You can also modify the parameters to see how the time map changes.
For example, if I just changed the number of nodes from 30 to 3000, the time graph would look like this:
Convergence is completed at about 1.75s.
The nodes are expanded 100 times, but the time is increased by less than 1s, which is really excellent.
It’s all well and good, but here’s an exciting one to get a sense of the scale of the horror:
As can be seen from the GIF, the first one or two transmissions were ok, at least the eyes could see roughly. However, in the later rounds, most nodes were infected, but they continued to spread information.
The amount of information is simply mind-numbing.
Six degrees of separation
Finally, there’s something interesting called the “six degrees of separation theory” :
In 1967, Stanley Milgram, a professor of psychology at Harvard University, wanted to describe a network of connections between people and communities. I did a chain letter experiment and discovered the “six degrees of separation” phenomenon. Simply put: “There are no more than six people between you and any stranger, which means that you can’t know any stranger with more than six people.
Six degree division theory, also known as small world theory. This has a lot to do with the Gossip protocol.
I saw a related video on the small broken station. I think the explanation is quite clear. If you are interested, you can go to see it:
www.bilibili.com/video/BV1b7…
In the video, there is an image like this:
Boy, isn’t this a copy of our previous website? It looks so friendly.
When the theory was first put forward, it was “six people at most and you can meet any stranger”.
But with the explosion of social networking in recent years, the earth has been reduced to a village.
So the number is gradually decreasing:
And if you narrow it down to, say, a programmer, it’s even smaller.
Sometimes you get a business group, you go in and see guys and ex-colleagues, and you tell me how big that group can get.
This article has been included in the personal blog, welcome to play.
www.whywhy.vip/