The original address: www.inlighting.org/archives/un…

The FLP paper plays an important role in the distributed-systems world, and it is notoriously hard to understand. It is the first distributed-systems paper I read word by word, and it was genuinely difficult, so I am writing this record, kept as simple as possible, in the hope of helping newcomers to distributed computing understand the FLP paper more easily. Of course, no matter how simple the explanation, the mathematical notation cannot be avoided, but do not be afraid to read it through word by word.

The original paper is titled Impossibility of Distributed Consensus with One Faulty Process, commonly abbreviated as FLP Impossibility.

If something is wrong in this article, please comment and correct it.

Preface

The FLP theorem nails the coffin shut on distributed consensus, proving that a true consensus algorithm is impossible to implement.

Of course, before we start, let us spell out what a true consensus algorithm means:

Validity: if every node's input value is either 0 or 1, the final decision must be one of those values. The algorithm cannot somehow agree on a value such as -1 that no node proposed.

Agreement: all nodes must agree on the same value.

Termination: all nodes that are running normally are eventually able to reach a decision.

Note that the consensus algorithms you know, such as Paxos or Raft, are not true consensus algorithms in this sense, because they do not satisfy all three of the above conditions.
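As a toy illustration of the three properties (the function and names here are our own, not the paper's), the outcome of one run can be checked like this:

```python
def check_outcome(inputs, decisions):
    """inputs: the 0/1 input-register value of each process.
    decisions: the value each nonfaulty process decided, or None if it
    never decided."""
    decided = [d for d in decisions if d is not None]
    return {
        # Validity: every decided value was some process's input
        # (the cluster cannot invent a value such as -1).
        "validity": all(d in inputs for d in decided),
        # Agreement: all processes that decided chose the same value.
        "agreement": len(set(decided)) <= 1,
        # Termination: every nonfaulty process eventually decided.
        "termination": all(d is not None for d in decisions),
    }

# Two processes decided 1, one never decided: validity and agreement
# hold, but termination fails.
print(check_outcome(inputs=[0, 1, 1], decisions=[1, 1, None]))
```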

Terminology

The terminology below applies to this paper only; feel free to skip this section and come back when you encounter the terms later.

Consensus algorithm: algorithms such as Raft and Paxos.

Byzantine Generals problem: refers to data being tampered with during transmission. Consensus algorithms generally assume there is no Byzantine Generals problem, i.e. data will not be tampered with in transit (a node will not receive forged data). In general, network protocols such as TCP ensure that the data you receive is very likely correct, and consensus algorithms usually run on an intranet, which is relatively safe.

Process: in this paper you can think of a process as a node in the distributed system. Each process p has an input register x_p, which can be understood as its input value; x_p ∈ {0, 1}. Process p also has an output register y_p, whose value is one of b, 0, and 1; y_p ∈ {b, 0, 1}.

Message System: Think of this as a network that connects all the processes in the cluster. All processes in the whole system share this message system, through which processes communicate with each other, that is, send messages to each other.

Asynchronous system: an asynchronous system requires that (1) you do not know the relative processing speed of each process, i.e. you cannot tell which process will finish first; (2) messages in the message system may be delayed (and you do not know for how long, only that they will eventually arrive); (3) there is no synchronized clock in the system, which means you cannot design the algorithm around time; (4) you have no way to detect that a process has died (crashed), and a process does not notify the others before it dies.

Nonfaulty: describes a process on which no fault occurs.

Internal state: a process's input register x_p, output register y_p, internal storage space, and so on together constitute the internal state of the process.

Configuration: A Configuration contains the internal states of all processes in the cluster plus the Message System.

Step: the move from one configuration to the next is called a step. For example, a process receives a piece of data from the message system, updates its internal state (possibly its output register), and sends messages to the message system; the whole system thereby forms a new configuration.

Atomic step: a step is atomic: receiving data from the message system, processing it locally, and sending data to other processes via the message system happen as one indivisible unit.

Decision state: the state in which a decision has been reached, i.e. the process has fixed its output register to 0 or 1.

Initial state: everything about a process is fixed except its input register, and the output register starts as y_p = b.

Initial Configuration: All processes are Initial state and message System is empty.

Let us distinguish initial state, initial configuration, and configuration. An initial configuration does not only mean the configuration when the consensus algorithm first starts; it means the initial configuration at the start of each round. For example, when we begin the first round of decision-making, that is an initial configuration. During the round, the configuration changes step by step until the processes agree, and a new initial configuration is formed. In the second round the algorithm decides again, with the result of the first round serving as the second round's initial configuration, and so on.

This is why the initial state allows processes' input registers to differ: each process's input register is inherited from the previous round of decision-making.

Run: A Run contains multiple steps.

Deciding run: a run in which some processes reach the decision state. (Note that the paper only requires some of the processes.)

Admissible run: a run in which at most one process dies, and every other (nonfaulty) process eventually receives all of the messages addressed to it from the message system.

Event: e = (p, m) indicates that process p receives message m and processes it.

Schedule: a sequence of events forms a schedule, written σ. We use σ(C) to denote the result of applying schedule σ to configuration C; σ(C) is said to be reachable from C. If a configuration is reachable from some initial configuration, it is called accessible. (It is the same idea with a different word, used only to distinguish "reachable from an initial configuration" from "reachable from an arbitrary configuration".)

Accessible Configuration: See the instructions in schedule.

Decision value: if in a configuration C some processes are already in the decision state, with output register y_p = v, then we say the decision value of the configuration is v. Note that not all processes need to be in the decision state, because the paper relaxes the conditions: as long as some of the processes can decide, the configuration counts as having a decision value.

Note that the decision state is a notion about a single process, while the decision value is a notion about the entire configuration.

Partially correct: a protocol is partially correct if it satisfies both of the following conditions. (1) No accessible configuration contains more than one decision value. In other words, every accessible configuration has at most one decision value: across the whole cluster, either a process has not reached the decision state yet, or every process that is in the decision state holds the same decision value. (2) For each decision value v ∈ {0, 1}, some accessible configuration has decision value v. Note: only some, because some accessible configurations may not have a decision value at all.

Totally correct: a protocol is totally correct in spite of one fault if it is partially correct and, even with one process dead, every admissible run is a deciding run.

Compare partially correct with totally correct. Partially correct is quite loose: as long as the protocol is capable of reaching a decision, and perhaps actually reaches one a few times out of ten, it counts as partially correct. For example, a group of nodes starts a first round of negotiation, fails, shrugs, and moves on to the second round; the second round, with luck, produces a decision. That is already enough for partial correctness.

Totally correct, on the other hand, demands that every round of negotiation produce a decision.

The paper proves that for every consensus algorithm there exists an admissible run that is not a deciding run. In plain English, no algorithm can guarantee that every negotiation leads to a decision. (Think about it: if the algorithm cannot guarantee consensus every time, and the schedule on which consensus fails is chosen every time, then consensus is never reached at all.)

Bivalent: if both decision values are still reachable from a configuration (some continuations lead to decision value 0 and others to decision value 1), the configuration is called bivalent.

Univalent: if only one decision value is reachable from a configuration, it is called univalent.

0-valent: univalent, whose decision value is 0.

1-valent: univalent, whose decision value is 1.

Model

Here is a detailed description of the model established by the paper.

The distributed environment assumed in this paper is as follows:

  1. Byzantine failures are not considered.
  2. Assume that the message system is reliable: it does not lose data and delivers every message exactly once. Note, however, that messages may be delivered out of order (there is no guarantee on the order of transmission), and a receive sometimes returns nothing (∅).
  3. Suppose a single process may die, possibly at the most inconvenient time.

Under the above three loose conditions, the paper proves that there is no consensus algorithm that can guarantee consistency.

  1. For simplicity, the paper assumes that the input register of every process satisfies x_p ∈ {0, 1}, and that every nonfaulty process enters the decision state with output register y_p ∈ {0, 1}. Normally, since this is a consensus algorithm, all nonfaulty processes should choose the same value in the decision state, say 0 or 1. But the paper relaxes the conditions again: it only requires that some of the nonfaulty processes make a decision and reach the decision state.
  2. As mentioned earlier, the message system may return ∅, but the paper assumes that if a process keeps calling receive, it will eventually get every message that was sent to it. This models network latency: a message can be delayed, but it must arrive. The message system simulates the delay by returning ∅.

Consensus protocol P

Assume a consensus protocol P running in an asynchronous system with N ≥ 2 processes. Each process's input register can only take one of the values {0, 1}: x_p ∈ {0, 1}. The output register takes one of the values {b, 0, 1}: y_p ∈ {b, 0, 1}. When a process is in the initial state, its output register is y_p = b; the transition function may later change y_p to 0 or 1. Once the process reaches the decision state, the transition function can no longer change the output register: the output register is write-once.

The transition function is the rule by which a process, upon receiving a message, updates its internal state.
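A minimal sketch of the write-once output register (the class and method names are our own, for illustration only):

```python
B = "b"  # the paper's undecided marker b

class Process:
    def __init__(self, x_p):
        assert x_p in (0, 1)
        self.x_p = x_p   # input register, fixed at start
        self.y_p = B     # output register, starts undecided
        self.state = {}  # arbitrary internal storage

    def decide(self, v):
        # The transition function may set y_p to 0 or 1 exactly once;
        # after that the process is in the decision state for good.
        assert v in (0, 1)
        if self.y_p != B:
            raise RuntimeError("output register is write-once")
        self.y_p = v

p = Process(x_p=1)
p.decide(0)      # first write succeeds: p is now in the decision state
# p.decide(1)    # a second write would raise RuntimeError
```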

Communication between processes goes through the message system. A message has the form (p, m), where p is the target process and m is the message content. The entire message system can be thought of as a single buffer (with no guarantee of message order).

Message System supports two operations:

  • send(p, m): stores (p, m) in the message system's buffer.

  • receive(p): fetches a message addressed to process p from the message system. On success, the message system removes that message from the buffer. It may also do nothing and return ∅.

Through these two operations, the message system simulates the network uncertainty of the entire distributed system (while at least guaranteeing that data is not lost, which is better than real life).
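The buffer described above can be sketched as follows (a simplified model of our own: a 50% chance of returning None stands in for arbitrary finite delay, i.e. the paper's ∅):

```python
import random

class MessageSystem:
    """Unordered buffer: send(p, m) stores a message; receive(p) either
    delivers some pending message for p or returns None (the paper's ∅)."""

    def __init__(self, seed=0):
        self.buffer = []               # pending (p, m) pairs, no ordering
        self.rng = random.Random(seed)

    def send(self, p, m):
        self.buffer.append((p, m))

    def receive(self, p):
        pending = [pm for pm in self.buffer if pm[0] == p]
        # Simulate delay: sometimes return ∅ even if messages are pending.
        if not pending or self.rng.random() < 0.5:
            return None
        pm = self.rng.choice(pending)  # no delivery-order guarantee
        self.buffer.remove(pm)         # each message is delivered exactly once
        return pm[1]

ms = MessageSystem(seed=42)
ms.send("p1", "vote=0")
msg = None
while msg is None:         # a patient process eventually gets its message
    msg = ms.receive("p1")
print(msg)
```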

Assume the configuration of the entire system is C. Then e(C) denotes the result of event e occurring on a cluster whose configuration is C.

Lemma 1: the “commutativity” property of schedules

Suppose that from some configuration C, the schedules σ1 and σ2 lead to configurations C1 and C2, respectively. If the sets of processes taking steps in σ1 and σ2 are disjoint, then σ2 can be applied to C1 and σ1 can be applied to C2, and both lead to the same configuration C3.

For intuition, assume σ1 and σ2 have no processes in common: σ1 alters only the states of p1 and p2, while σ2 alters only the state of p3. Then no matter which schedule is executed first, the result is the same. This is the commutative law of schedules.
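This commutativity is easy to check on a toy model (our own simplification: a configuration is a dict of per-process values, and a schedule is a list of (process, new value) steps):

```python
def apply_schedule(config, schedule):
    """Apply a schedule (a list of (process, new_value) steps) to a copy
    of the configuration and return the resulting configuration."""
    config = dict(config)
    for p, v in schedule:
        config[p] = v
    return config

C = {"p1": 0, "p2": 0, "p3": 0}
sigma1 = [("p1", 1), ("p2", 1)]  # σ1 touches only p1 and p2
sigma2 = [("p3", 1)]             # σ2 touches only p3: disjoint from σ1

# σ2(σ1(C)) and σ1(σ2(C)) reach the same configuration C3.
assert apply_schedule(apply_schedule(C, sigma1), sigma2) == \
       apply_schedule(apply_schedule(C, sigma2), sigma1)
```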

Main Result

First the conclusion:

No consensus protocol is totally correct in spite of one fault.

That is, as long as even one node may fail, no consensus algorithm can be totally correct.

The paper proves this by contradiction, assuming some consensus algorithm P is totally correct. The argument has two steps: (1) P has a bivalent initial configuration. (2) Starting from a bivalent initial configuration, there is always an admissible run that prevents the system from ever making a decision.

Lemma 2

P has a bivalent initial configuration.

Again we use contradiction: assume P has no bivalent initial configuration, so every initial configuration of P is either 0-valent or 1-valent. Both kinds must exist: you cannot say there is only one of 0-valent and 1-valent, for if so, what would be the point of the consensus algorithm? Everyone would already agree on the same value and nothing would need to be decided.

As shown in the figure above, if two initial configurations differ only in the input value of a single process, they are called adjacent initial configurations. Also suppose our consensus algorithm's rule is majority rule: if value i is held by the largest number of processes, the configuration is i-valent.

On the left is the process by which the initial configuration changes when no node fails.

On the right, p2 dies at the wrong time. Notice that, with p2 removed, C0 on the right is identical to C1. So are they 0-valent or 1-valent? C0 was originally 0-valent (when p2 operates normally), but it is now the same as C1, which was originally 1-valent; and the same goes for C1. So C0 and C1 can now end up either 0-valent or 1-valent, and no decision can be forced, which means they are bivalent. But we assumed that every initial configuration was univalent, a contradiction, so Lemma 2 holds.
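The adjacency argument can be made concrete with the majority-rule example from above (the valuation function is our toy stand-in; the paper assumes no such rule): walking from the all-0 initial configuration to the all-1 one, flipping one process's input at a time, the valence must change between some adjacent pair.

```python
def valence(inputs):
    # Toy rule: the configuration is i-valent for the strict-majority value i.
    return 1 if 2 * sum(inputs) > len(inputs) else 0

n = 5
# A chain of initial configurations from 00000 to 11111, each adjacent to
# the next (they differ in exactly one process's input register).
chain = [[1] * k + [0] * (n - k) for k in range(n + 1)]
flips = [(a, b) for a, b in zip(chain, chain[1:]) if valence(a) != valence(b)]
assert len(flips) == 1   # some adjacent pair is 0-valent next to 1-valent
```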

Lemma 3

Let C be a bivalent configuration of P, and let e = (p, m) be an event that is applicable to C. Let 𝒞 be the set of configurations reachable from C without applying e, and let 𝒟 = e(𝒞) = { e(E) | E ∈ 𝒞 and e is applicable to E }. Then 𝒟 contains a bivalent configuration.

Again we use contradiction: assume 𝒟 contains no bivalent configuration, so every configuration D ∈ 𝒟 is univalent.

Take i ∈ {0, 1}. Since C is bivalent, for each i there exists an i-valent configuration E_i reachable from C. We claim there is a corresponding F_i ∈ 𝒟 (which, by our assumption, must be univalent). There are two cases: (1) the event e occurs after reaching E_i, i.e. E_i ∈ 𝒞; then take F_i = e(E_i) ∈ 𝒟. (2) the event e occurs before reaching E_i; then there exists F_i ∈ 𝒟 from which E_i is reachable, and since E_i is i-valent, F_i must be i-valent too. The figure below may help.

Neighbors: two configurations are neighbors if one can be reached from the other in a single step.

Now suppose there are two neighboring configurations C0, C1 ∈ 𝒞 with C1 = e'(C0), and let D_i = e(C_i) ∈ 𝒟. Note that each D_i is univalent (by our earlier assumption); without loss of generality, say D0 is 0-valent and D1 is 1-valent. Write e = (p, m) and e' = (p', m').

Case 1: if p' ≠ p, then by Lemma 1's commutative law, D1 = e'(D0), since e and e' involve different processes and therefore have no intersection. But this violates the assumption that each D_i is univalent: it would take us from the 0-valent D0 to the 1-valent D1, which is impossible if D0 is truly univalent.

Note that once a configuration becomes i-valent, the eventual decision is fixed and cannot be changed.

Case 2: if p' = p, there is an intersection, so the commutative law cannot be used directly. Instead, consider a deciding run from C0 in which we assume process p is dead.

So consider a deciding run σ (with process p dead), and let A = σ(C0).

Since p takes no steps in σ, σ has no process in common with e or e', and the commutative law can be used.

Because σ is a deciding run, A should be univalent. Yet from A we can reach either the 0-valent E0 (by applying e) or the 1-valent E1 (by applying e' and then e), which is just a disguised way of saying that A is bivalent. Contradiction again, so Lemma 3 holds.

Putting the lemmas together

We can now combine Lemma 2 with Lemma 3. By Lemma 2, there is a bivalent initial configuration C0. By Lemma 3, from any bivalent configuration there is always a way to schedule the next event e so that the resulting configuration C1 is still bivalent. Now think about it: if we are unlucky enough that exactly these decision-avoiding events are delivered every time, the configuration stays bivalent forever and consensus is never reached; the algorithm has no termination. Therefore no protocol meets the requirements of a true consensus algorithm.
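Schematically, the adversarial argument looks like the loop below (the lemmas are stubbed out as plain functions; this is a caricature of the proof, not its actual construction):

```python
def bivalent_initial_configuration():
    # Lemma 2: some initial configuration with both outcomes reachable exists.
    return "bivalent"

def deliver_event_keeping_bivalence(config):
    # Lemma 3: from any bivalent configuration, some applicable event
    # leads (via some schedule) to another bivalent configuration.
    assert config == "bivalent"
    return "bivalent"

config = bivalent_initial_configuration()
for _ in range(10_000):               # the run can be extended forever...
    config = deliver_event_keeping_bivalence(config)
assert config == "bivalent"           # ...so a decision is never forced
```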

Conclusion

So a perfect consensus algorithm does not exist, but people have found many ways to make consensus work in practice. For example, Raft and Paxos introduce randomness (such as Raft's randomized election timeouts) to largely avoid repeatedly hitting an event e that cannot lead to a decision, so that the algorithm satisfies termination in practice. FLP Impossibility only proves that a worst case exists; the probability of actually hitting it is not high. Others work around the asynchronous-system assumptions themselves, for example by installing atomic clocks in the machine room to give the system a synchronized clock. Understanding the FLP theorem tells you that there is no eternal panacea in distributed algorithms; there is always a trade-off. It also lays the groundwork for the later CAP theorem.
