Burrows, the designer and developer of Google’s coarse-grained locking service Chubby, once said that “all consistency protocols are essentially Either Paxos or a variant of it.”

In the process of learning ZooKeeper, you can often see this sentence. Since this algorithm is so magical, of course, it is necessary to study it well. There are many articles on the web explaining the Paxos algorithm, but the quality is uneven. Therefore, after reading relevant papers, encyclopedias, blogs and other materials, the author tries to clearly describe the Paxos algorithm through simple language.

What is Paxos

Paxos algorithm is a highly fault-tolerant consistency algorithm based on message passing. It is recognized as one of the most effective algorithms to solve distributed consistency problems.

Paxos algorithm is a distributed consistency algorithm based on message passing proposed by Master Lamport, which won the 2013 Turing Award.

Since the advent of Paxos has continued to monopolize distributed consistency algorithms, Paxos is almost synonymous with distributed consistency. Many of Google’s large distributed systems, such as Chubby, Megastore, and Spanner, use the Paxos algorithm to solve the distributed consistency problem. The open-source ZooKeeper and MySQL Group Replication, introduced in MySQL 5.7 to replace the traditional master-slave Replication, have adopted the Paxos algorithm to solve the distributed consistency problem. However, it also has two obvious disadvantages: 1. It is difficult to understand, 2.

The background of the problem

In a common distributed system, there are always problems such as machine downtime or network anomalies (including messages delayed, lost, repeated, out of order, and network partitions). The problem that Paxos algorithm needs to solve is how to quickly and correctly reach agreement on the value of a certain data within the cluster in a distributed system where the above exceptions may occur, and ensure that no matter the occurrence of any of the above exceptions, the consistency of the whole system will not be destroyed.

Note: the value of a piece of data is not just a number in the narrow sense. It can be a log or a command. The value of a certain data has different meanings according to application scenarios.

Ii. Related concepts

In the Paxos algorithm, there are three roles:

  • Proposer
  • Acceptor (NPC deputy)
  • Learners (广大群众)

It is important to note that in the implementation of a specific algorithm, not one process can only play one role, it may play multiple roles at the same time. For example, a process is either Proposer or Acceptor or Learner.

There is also a very important concept called a Proposal. The final value to be agreed upon is in the proposal.

Note:

  • What does the proposal involve? Does it only include an informational value? Let’s continue to read. At present, we think it is only an ordinary value.

For the first time meet

The Paxos algorithm process is very similar to the legislative process in China (four stages are proposed, deliberated, voted and published), and the so-called proposal is the newly enacted law.

A Proposer can propose a Proposer. Accoptor accepts proposals; If a proposal is chosen, then the value in the proposal is chosen.

If a Proposer, Acceptor, or Learner agrees on a value for a data numbered with a request, they each agree that the same value is chosen. When does a Proposer, Acceptor, or Learner consider a value selected?

  • Proposer: if a Proposer sends a proposal that is accepted by acceptors (if it requires a majority of acceptors to accept it), then it considers the value of the proposal to have been selected.
  • An Acceptor considers a value to be selected if an Acceptor accepts a proposal.
  • Learner: As a Learner, an Acceptor tells a Learner which value is selected, and the Learner assumes that value is selected.

3. Problem description

Suppose there is a set of processes (proposer’s team) that can propose a value, a consistency algorithm needs to ensure that only one value of the same value is chosen among so many values proposed. That is to say, either no value is proposed, as long as a value is proposed and selected, the value that everyone eventually learns must be consistent. For the consistency algorithm, the security (SAFaty) requirements are as follows:

  • Only the proposed value can be selected.
  • Only one value is selected.
  • If a process thinks a value is selected, then the value must be the selected value.

The goal of Paxos is to ensure that eventually a value is selected, and that when a value is selected, the process eventually retrieves the selected value.

As the saying goes, where there is demand, there are bad problems. If you assume that different roles can communicate by sending messages, then:

  • Each role communicates with execution at an arbitrary speed, during which execution can be stopped or restarted for various reasons. When a value is selected, roles that recover due to a fault are unable to determine the selected value because they have lost some important information.
  • Messages may be delayed for any length of time during delivery, may be repeated, and may be lost. But messages cannot be corrupted, that is, message content cannot be tampered with (Byzantine general problem).

The above are all possible problems, how to solve??

4. Derivation process

The simplest solution is to have only one Acceptor

If there is only one Acceptor (it can have more than one Proposer), the proposal is selected and the value in the proposal is the value selected if the Acceptor accepts the first proposal it receives. This ensures that only one value will be selected.

However, if this unique Acceptor fails, the entire system cannot work!

Therefore, one Acceptor is not feasible, multiple acceptors are required!

Multiple Acceptor

If there are more than one acceptors, how to ensure that a value is selected if there are more than one acceptors

You can think about it for yourself.

First, our ultimate goal is that no matter how many proposers present proposals, one and only one value is selected.

So, we can define a constraint:

P1: An Acceptor must accept the first proposal it receives.

If each Proposer presents a different proposal with a different value and sends a proposal to a different Acceptor. According to the P1 constraint, each Acceptor accepts the first proposal it receives, and different values are selected, resulting in inconsistencies.

The value of a proposal is selected as long as it is accepted by an Acceptor. Therefore, we need to add a rule:

Rule: a proposal selected must be accepted by more than half of acceptors

If a proposal is accepted by a majority of acceptors, “an Acceptor must be able to accept more than one proposal!” Otherwise, no value may be selected. Like the picture above. V1, V2, and V3 were not selected because they were accepted by only one Acceptor and not by more than half of acceptors.

If a Proposer sends multiple proposals to an Acceptor, it uses a numbered number to identify the order in which the proposals are submitted. Now [proposal = Proposal number +value].

Although multiple proposals are allowed to be selected, you must ensure that all selected proposals have the same value. Otherwise there will be inconsistency again.

P2: If a proposal with value v is selected, then the value of each selected proposal with a higher number must also be V.

A proposal can only be selected if an Acceptor accepts it, so we can rewrite the P2 constraint P2a to the proposal accepted by an Acceptor.

P2a: If a proposal with value v is selected, then the value of each proposal with a higher number that is accepted by acceptors must also be V.

If you satisfy P2a, you satisfy P2.

However, consider the following situation: In the background of the legislative process, suppose that there are a total of five (acceptors) deputies to the National People’s Congress. The people’s Court (Proposer2) put forward [M1,V1] proposal, NPC deputies 2-5 (more than half) accepted the proposal, so for NPC deputies 2-5 and the people’s court, they all think that V1 proposal is selected. At this time, NPC Deputy 1 also participated in it after finishing other affairs (before, NPC Deputy 1 had not received any proposal). At this time, The Supreme People’s Procuratorate (another proposer Proposer1) sent the proposal of [M2,V2] to NPC Deputy 1 (V2≠V1 and M2>M1). For NPC Deputy 1, This is the first proposal it has received. According to P1 (an Acceptor must accept the first proposal it receives. , NPC deputy 1 must accept the proposal! At the same time, NPC deputy 1 thinks THAT V2 has been selected. This raises two questions:

  1. NPC representative 1 thinks V2 is selected, NPC Representative 2-5 and the People’s Court think V1 is selected. There is an inconsistency.
  2. V1 is selected, but the proposal [M2,V2] with a higher number accepted by NPC Representative 1 has a value of V2, and V2≠V1. This contradicts P2a (if a proposal with value v is selected, then any proposal with a higher number that is accepted by acceptors must also have value V).

So, we need to strengthen the P2a constraint!

P2a is a constraint to Acceptor accept proposal, but the proposal was put forward by Proposer, all we can to constraint the proposal put forward by the Proposer. Get P2b:

P2b: If a proposal with value v is selected, then any subsequent proposals with a higher numbered numbered must also have value V.

If a Proposer with a value of V is selected, how do you ensure that the proposals with a higher number are numbered with a value of V?

As long as it satisfies P2c:

P2c: For any N and V, if the proposal [N, V] is proposed, there exists a set S of more than half of acceptors that satisfies either of the following two conditions:

  • No Acceptor in S has accepted a proposal numbered less than N.
  • The value of the proposal with the largest number accepted by an Acceptor in S is V.

A Proposer generates a proposal

To satisfy P2b, there is an important idea: if a Proposer generates a proposal, it should “learn” a value that has been selected or is likely to be selected, and then use that value as the value of its proposal. If no value is selected, the Proposer can determine the value itself. That’s how we reach an agreement. This learning phase is achieved through a “Prepare request” **.

Thus, we obtained the following proposal generation algorithm:

  1. A Proposer selects a new proposal number N, and then sends a request to a set of acceptors (more than half) asking each of those acceptors to make the following response.

    (a) A promise to a Proposer that it will not accept any more proposals numbered less than N.

    (b) If an Acceptor has accepted a proposal, it responds to a proposal with a maximum number less than N that it has accepted.

    We call this a Prepare request numbered N.

  2. If the Proposer receives a response from more than half of the acceptors, it generates a proposal numbered N and Value V. Here V is the Value of the proposal with the largest number of all the responses. If there is no proposal in any of the responses, then V can then be selected by the Proposer itself (usually the current proposal).

    When a Proposer generates a proposal, it sends it to more than half of the acceptors it expects to accept it. We call this request an Accept request. (Note: The Acceptor set that accepts the Accept request is not necessarily the same set of acceptors that responded to the Prepare request.)

Acceptor accepts the proposal

Acceptors can ignore any requests, including Prepare and Accept, without compromising the security of the algorithm. Therefore, we will discuss when acceptors can respond to a request.

We impose the following restrictions on Acceptor acceptance of proposals:

P1a: An Acceptor may accept a Prepare request numbered N as long as it has not yet responded to any Prepare request numbered greater than N.

If an Acceptor receives a Prepare request with number N, it has previously responded to a Prepare request with number greater than N. According to P1a, this Acceptor cannot accept a proposal numbered N. Therefore, the Acceptor can ignore the Prepare request with number N. Of course, it can also respond with an error to let a Proposer know as early as possible that its proposal will not be accepted.

Therefore, an Acceptor only needs to remember: 1. The maximum number of proposals accepted 2. The maximum number of requests that have been responded to.

Paxos algorithm description

After the above derivation, we summarize the process of Paxos algorithm.

The Paxos algorithm is divided into two stages. Details are as follows:

  • Phase 1:

    (a) a Proposer selects a proposal number N and sends a Prepare request numbered N to more than half of the acceptors.

    (b) If an Acceptor receives a Prepare request numbered N that is greater than the number of Prepare requests it has responded to, it sends the highest-numbered proposal (if any) it has accepted as a response. The Acceptor also promises not to accept any proposal numbered less than N.

  • Stage 2:

    (a) If the Proposer receives a response to a Prepare request numbered N from more than half of its acceptors, then it sends an Accept request to the proposal numbered N to more than half of the acceptors (not necessarily the same as the previous acceptors). Note: V is the value of the highest-numbered proposal in the response received. If the response does not contain any proposals, then V is decided by the Proposer itself.

    (b) If an Acceptor receives an Accept request for a proposal numbered N, it accepts the proposal as long as the Acceptor has not responded to a Prepare request numbered greater than N.

5, Learner is learning the selected value

There are three ways to learn the selected value:

Solution a:

Acceptor receives a proposal and sends it to all Learners.

  • Advantages: Learner can quickly obtain the selected value

  • Disadvantages: The number of communication is M*N (M is the number of proposals, N is the number of Learner)

Scheme 2:

When an Acceptor accepts a proposal, it sends the proposal to the primary Learner, who in turn notifies other learners

  • Advantages: Reduced communication times (M+ n-1) (M is the number of proposals, N is the number of Learner, M proposals are sent to the main Learner, and then the main Learner notifies N-1 Learner)

  • Disadvantages: single point of failure (main Learner may fail)

Solution 3:

When an Acceptor accepts a proposal, it sends the proposal to a Learner group, which notifies other learners

  • Advantages: It solves the problem of single point failure of scheme two, and has good reliability

  • Disadvantages: complex implementation, high complexity of network communication

Vi. How to ensure the activity of Paxos algorithm

A primary Proposer is selected to ensure the activity of the Paxos algorithm. A Proposer selects a primary Proposer and states that only the primary Proposer can submit a proposal. If a principal Proposer reports a numbered proposal with more than a majority of acceptors, that proposal will be approved if the principal Proposer does so, and the whole Paxos algorithm will remain active by selecting a principal Proposer. So far, we have obtained a distributed consistency algorithm that can guarantee both security and activity — Paxos algorithm.

Seven,

Here, we have made a detailed elaboration on what Paxos algorithm is, its characteristics and the specific derivation process of the algorithm. Paxos algorithm is the variation of many consistency algorithms, which is worth learning. Interested friends can focus on like, thank you.

The resources

  • Encyclopedia of the Paxos

  • Paxos Made Simple

  • Principle and derivation of Paxos algorithm