Distributed theory: Consistency algorithm Paxos

What problem does Paxos solve?

In a common distributed system, there are always situations such as machine downtime or network exceptions. The problem that Paxos algorithm needs to solve is how to do in a distributed system where the above exceptions may occur:

Quickly and correctly agree on the value of a piece of data within the cluster, and ensure that any of the above exceptions do not break the consistency of the entire system

Here’s an example:

Distributed systems use multiple copies to store data. If the execution sequence of multiple copies is not controlled, multiple copies perform update operations. Data in each copy is inconsistent due to faults such as network delay timeout. We want the execution sequence of each copy to be [OP1 op2 op3…. opn] unchanged, the same. Paxos determines the value of immutable variable OPI once. After determining OPI each time, each copy performs OPI operation and so on.

Problems are introduced, and everything starts with a small thing

In A clustered environment, the state on all machines is required to be consistent, and two machines want to change the state. Machine A wants to change the state to A, and machine B wants to change the state to B.

Isn’t it simple? No, it’s like 2PC, 3PC bringing in a facilitator, who gets there first, who listens

So what if the coordinator jumps?

So you need to make backups of the coordinator as well as the cluster. At this point, the question arises, so many coordinators, who to listen to?

To solve this problem, we need to use the PaxOS algorithm

Paxos related concepts

What is the Paxos algorithm

Paxos was first disclosed by Lamport in his paper The Part-time Parliament in 1998. The initial description used Paxos, an island in Greece, as a metaphor to describe The process of passing resolutions in Paxos, and named this algorithm after it, but few people could understand it. And he refused to use mathematics to prove his algorithm. Then Microsoft’s Butlet Lampson offered to re-examine the paper. Later, in 2001, Lamport Made a concession to Paxos Made Simple, but still didn’t use an algorithm to prove his algorithm

Basic Concept – Proposal

The final value to be agreed upon is in the proposal

The Proposal information includes the Proposal ID and Value.

Basic Concepts -4 Roles

  • Client: indicates the Client

    • The client issues a request to the distributed system and waits for a response. For example, a write request to a file in a distributed file server.
  • Proposer: the Proposer that initiates a proposal

    • The proposer advocates client requests, tries to persuade acceptors to agree on them, and acts as a mediator to move the agreement forward in the event of a conflict
  • Acceptor: a decision maker who can approve a proposal

    • Acceptors accept proposals. If a proposal is chosen, then the value in the proposal is chosen
  • Learners of Final decision making

    • Learners act as replicators of this protocol

If there are any instances of a Proposer, Acceptor, and Learners, a process may perform more than one role

A Proposer makes a Proposal, an Accepter receives the Proposal, and then a final Proposal is selected among accepters

Problem description

Assuming that there is a set of processes that can propose a proposal, the following points need to be guaranteed for a consistency algorithm:

  • Of these proposed proposals, only one will be chosen
  • If no proposal is put forward, there should be no proposal chosen.
  • Once a proposal is selected, all processes should be able to learn to the selected value

Derivation process

The simplest solution is to have only one Acceptor

If there is only one Acceptor (it can have more than one Proposer),If an Acceptor accepts the first proposal it receives, the proposal is selectedThe value in the proposal is the selected value. This ensures that only one value will be selected. However, if this only AcceptoroutageThen the whole system won’t work! Therefore, multiple acceptors are required!

Multiple proposers and multiple acceptors

How do we guarantee that in the case of multiple proposers and acceptors, we select a value? First we hope that even if only one Proposer presents a value, that value is eventually selected.

P1: An Acceptor must accept the first proposal that it receives.

However, this raises another issue: if each Proposer submits a different value to a different Acceptor. According to P1, acceptors separately accept the first proposal they receive, resulting in different values being selected. There is an inconsistency. The diagram below:Because of this inconsistency, we’re going to add a rule

Rule: a proposal selected must be accepted by more than half of acceptors

This rule implies that “an Acceptor must be able to accept more than one proposal!” Otherwise, no value may end up being selected. Take the picture above. V1, V2, and V3 have not been selected because they are accepted by only one Acceptor.

In this case, we use a global number to identify each Acceptor’s approved proposal. If a proposal with a value is approved by more than half of acceptors, the value is considered selected.

According to the above, we now allow multiple proposals to be selected, but we must ensure that all selected proposals have the same value. Otherwise there will be inconsistency again. Hence the following constraints:

P2: If a proposal with value v is selected, then the value of each selected proposal with a higher number must also be V. 【If a proposal with value v is chosen, then every higher-numbered proposal that is chosen has value v.】

A proposal can only be selected if an Acceptor accepts it, so we can rewrite the P2 constraint P2a to the proposal accepted by an Acceptor.

P2a: If a proposal of value V is selected, If a proposal with value v is chosen, the value of each proposal with value v is chosen. Then every higher-numbered proposal accepted by any acceptor has value V. 】

If you satisfy P2a, you satisfy P2.

However, consider the following situation: suppose there are a total of five acceptors.

  1. Proposer2 puts forward [M1,V1] proposals,
  2. Acceptor2~5 (more than half) accepted the proposal
  3. Thus, for acceptor2-5 and Proposer2, V1 is considered selected.
  4. Acceptor1 has just recovered from an outage (Acceptor1 has not received any proposals before)
  5. At this point Proposer1 sends an Acceptor1 proposal for [M2,V2] (V2≠V1 and M2>M1)
  6. This is the first proposal received by Acceptor1. According to P1 (an Acceptor must accept the first proposal it receives

Case). Acceptor1 must accept this proposal! Acceptor1 also considers V2 to be selected. This raises two questions:

(1) Acceptor1 considers V2 to be selected, and acceptor2-5 and Proposer2 consider V1 to be selected. There is an inconsistency.

(2) V1 is selected, but the higher number of the Acceptor1 proposal [M2,V2] has value V2, and V2≠V1. This contradicts P2a (if a proposal with value v is selected, then any proposal with a higher number that is accepted by acceptors must also have value V).

So we need to strengthen the P2a constraint!

P2a is a constraint to Acceptor accept proposal, but the proposal was put forward by Proposer, all we can to constraint the proposal put forward by the Proposer. Get P2b:

P2b: If a proposal with value v is selected, then any subsequent proposals with a higher numbered numbered must also have value V. 【If a proposal with value v is chosen, then every higher-numbered proposal issued by any proposer has value.】

From P2b you can derive P2a and then P2.

If a Proposer with a value of V is selected, how do you ensure that the proposals with a higher number are numbered with a value of V?

As long as it satisfies P2c:

P2c: For any Mn and Vn, if the proposal [Mn,Vn] is proposed, then there must be a set S consisting of more than half of acceptors that satisfies either of the following two conditions:

  • Either no Acceptor in S has accepted a proposal numbered less than Mn.
  • The value of the proposal with the largest number approved by all acceptors in S is Vn

P2c:For any v and n, if a proposal with value v and number n is issued,

then there is a set S consisting of a majority of acceptors such that either (a) no acceptor in S has accepted any proposal numbered less than n, or (b) v is the value of the highest-numbered proposal among all proposals numbered less than n accepted by the acceptors in S.

From the above content, it can be seen that the process from P1 to P2c is actually a series of conditions gradually enhanced. If it is necessary to prove that these conditions can guarantee uniformity, then it can be reversed: P2c =>P2b=>P2a=>P2, and then P2 and P1 are used to ensure consistency

A Proposer generates a proposal

Next, how to generate proposals on the basis of P2c

An important idea here is that before a Proposer generates a proposal, it should “learn” a value that has been selected or is likely to be selected, and then use that value as the value of its proposal. If no value is selected, the Proposer can determine the value itself. In this way, we can achieve consistency. This learning phase is achieved through a “Prepare request.”

Thus, we obtained the following proposal generation algorithm:

  1. A Proposer selects a new proposal number N and then sends a request to a set of acceptors (more than half) requesting each of the acceptors in that set

The Acceptor makes the following response

(a) An Acceptor undertakes to a Proposer not to accept any more proposals numbered less than N. (b) If acceptors have accepted proposals, they report to the Proposer the number of proposals numbered less than N that have been accepted.Copy the code

We call this a Prepare request numbered N.

  1. If the Proposer receives a response from more than half of the acceptors, it generates a proposal numbered N and Value V. here

V is the Value of the proposal with the largest number of all responses. If there are no proposals in any of the responses, then V can then be selected by the Proposer itself.

When a Proposer generates a proposal, it sends it to more than half of the acceptors it expects to accept it. We call this request an Accept request.

Acceptor accepts the proposal

If an Acceptor accepts a proposal (Proposer) with a Paxos algorithm, it does not respond to a request for proposals

If an Acceptor receives a Prepare request and an Accept request from a Proposer, the conditions for responding to those requests are described below

  • Prepare request: Acceptors can respond to a Prepare request at any time
  • Accept requests: You can respond to any Accept request without violating Accept’s existing commitments

Therefore, the following constraints apply to Acceptor acceptances:

P1a: An Acceptor may accept a Prepare request numbered N as long as it has not yet responded to any Prepare request numbered greater than N.

Algorithm to optimize

A Proposer numbered a Proposer numbered with a unique number (Proposer number) and a proposal numbered with an Acceptor number (Proposer number) is numbered with a single number (Proposer number). Ignore Prepare requests as much as possible

If an Acceptor receives a Prepare request with number N, it has previously responded to a Prepare request with number greater than N. According to P1a, this Acceptor cannot accept a proposal numbered N. Therefore, the Acceptor can ignore the Prepare request with number N.

With this optimization, each Acceptor only needs to remember the maximum number of proposals it has approved and the maximum number of proposals it has prepared to respond to, so that it is guaranteed a P2c invariability in the event of a failure or restart of a node. As long as it can guarantee that no proposal with the same number will be produced, it can discard any proposal and all of its runtime state information

Paxos algorithm description

If Paxos processes proposals with acceptors (Proposer), then the algorithm performs a phase 2 submission (PHASE 2). If Paxos processes proposals with acceptors (Proposer), the algorithm performs a phase 2 submission (PHASE 2)

The Paxos algorithm is divided into two stages. Details are as follows:

  • Phase 1:

    • (a) a Proposer selects a proposal number N and sends a Prepare request numbered N to more than half of the acceptors.

    • (b) If an Acceptor receives a Prepare request numbered N and N is greater than all Prepare requests the Acceptor has responded to

    The Acceptor responds to the Proposer with the largest numbered proposal (if any) it has accepted, and promises not to accept any proposal numbered less than N.

  • Stage 2:

    • (a) If the Proposer receives a response from more than half of its acceptors to a Prepare request numbered N, it sends a needle

    Accept requests to [N,V] proposals are received by more than half of acceptors. Note: V is the value of the highest-numbered proposal in the response received. If the response does not contain any proposals, then V is decided by the Proposer itself.

    • (b) If an Acceptor receives an Accept request for a proposal numbered N, it accepts the proposal as long as the Acceptor has not responded to a Prepare request numbered greater than N.

Of course, each Proposer may produce more than one proposal in practice, but if each Proposer follows the algorithm described above, it is a guarantee that the algorithm is executed correctly

Learner learns the selected value

Solution a:

Learner receives a selected proposal only if the proposal has been approved by more than half of acceptors. Therefore, it is easiest for an Acceptor to send a proposal to all acceptors once a proposal has been approved. This allows Learner to receive the selected proposal as quickly as possible, but requires each Acceptor to communicate with all learners one by one, and the number of communication times is at least the product of both

Scheme 2:

Another possible solution is to have all acceptors send their acceptances to a specific Learner (called the primary Learner). Each Learner can communicate with each other through messages to perceive the proposal selection. Based on this premise, When the primary Learner is notified that a proposal has been selected, it notifies other learners. In this scheme, Acceptor sends the approved proposal to the primary Learner, which then synchronizes it with the other Learner. Therefore, compared with scheme 1, scheme 2 requires one more step to notify all the learner of the proposal, but the number of communication is greatly reduced, usually just the sum of Acceptor and Learner, but at the same time, the scheme introduces a new unstable factor: the main learner may fail at any time

Solution 3:

In the explanation of scheme two, we mentioned that the biggest problem of scheme two is that the main Learner has a single point of problem, that is, the main Learner may appear at any time, therefore, the improvement of scheme two, can expand the scope of the main Learner, That is, acceptors send approved proposals to a specific set of learners, in which each Learner notifies the others when a proposal is selected. The more there are in the Learner set, the better the reliability is, but at the same time the complexity of network communication is higher

How to ensure the activity of Paxos algorithm

According to the previous explanation, we have basically understood the core logic of Paxos algorithm. Now let’s take a look at some details of Paxos algorithm in the actual process: What will happen eventually: Value must be selected eventually

Suppose there is an extreme case where two proposers present a sequence of proposals with increasing numbers, resulting in an infinite loop with no value selected, and the process is as follows:

Here’s a detailed description of the scenario:

  • Sponsor 1 issuedNumbers for 1After receiving more than half of the Prepare request, complete stage 1 process –> Decision maker cluster guaranteeNo longer accept numbers less than 1The proposal of 】
  • Sponsor 2 issuedNumbers for 2After receiving more than half of the Prepare request, complete stage 1 process –> Decision maker cluster guaranteeNo longer accept numbers less than 2The proposal of 】
  • When proposer 1 enters phase 2, the Accept request sent by proposer 1 is ignored by acceptors
  • Sponsor 1 issuedNumber 3After receiving more than half of the Prepare request, complete stage 1 process –> Decision maker cluster guaranteeNo longer accept numbers less than 3The proposal of 】
  • When proposer 2 enters phase 2, the Accept request sent by the Acceptor is ignored
  • . It goes into an endless loop

(a) if a Proposer selects a master Proposer and states that only a master Proposer can submit a proposal. If a principal Proposer reports a numbered proposal with more than a majority of acceptors, that proposal will be approved if the principal Proposer does so, and the whole Paxos algorithm will remain active by selecting a principal Proposer

Distributed theory: Consistency algorithm Raft

What is Raft algorithm

First, what is Raft algorithm: Raft is a consistency algorithm for managing replication logs. Raft provides the same functionality and performance as the Paxos algorithm, but its algorithm structure is different. Raft algorithms are easier to understand and easier to build real-world systems.

Raft breaks down the consistency algorithm into 3 modules

  1. Leadership election
  2. Log copy
  3. security

Raft algorithm is divided into two phases, first the election process and then normal operations such as log copying, led by an elected leader.

Leader election

Approach: Raft achieves consistency by electing a leader and then giving him full responsibility for managing the replication log. In Raft, at any time a server can play one of the following roles:

  • Leader: Handles client interactions, log replication, and other actions. Generally, there is only one leader at a time
  • Candidate: A candidate is an entity that nominates itself during the election process and, once elected, becomes the leader
  • Followers: Similar to voters, completely passive roles, such servers wait to be notified to vote

And it was the election that influenced their change of identity.

Raft uses a heartbeat mechanism to trigger an election.

  • When the server is started, the initial state is follower.
  • Each server has an election timeout timer (typically 150-300ms).
    • If you receive any message from the leader or candidate within the timeout period, restart the timer
    • If it reaches the timeout and hasn’t received any messages from other leaders, it will assume that there is no leader and start an election, and it will start sending messages to other servers asking them to vote for it.

Thesecretlivesofdata.com/raft/ demo

Leadership selection process

This process is illustrated below:

In the initial state, all nodes in the cluster are in the follower state.

At some point, one of the followers initiates an election timeout because it has not received the leader’s heartbeat.

As long as more than half of the nodes in the cluster accept the vote, the candidate node becomes the switch-leader state.

After becoming the leader node, the leader periodically synchronizes logs to the follower node and sends the heartbeat.

nodes

The state of each node in the cluster can change at any time. In terms of classification of actual changes, node anomalies can be roughly divided into four types:

  • The leader is unavailable.
  • Follower unavailable;
  • Multiple candidates or leaders;
  • A new node is added to the cluster.
Leader is not available

Here’s how the RAFT cluster copes when the leader node in the cluster becomes unavailable.

Trained: Normally, the leader node sends the heartbeat periodically to the follower node.

: : Failure of the leader to send the heartbeat due to some exception or failure of the follower to receive the heartbeat.

Primary election of a follower.

Such a node becomes the new leader if more than half of the followers voteThe number of steps increases by 1And start to log the same step to the follower.

If, after a period of time, the previous leader rejoins the cluster,Then the two leaders compare the number of steps of each other, the leader with a low step number will switch its state to follower.

Such that an inconsistent log in an earlier leader is cleaned up and is consistent with the log in the existing leader.

The follower node is unavailable

The follower node failure is relatively easy to resolve. Because the log content in the cluster is always synchronized from the leader node, as long as the leader node

You only need to copy the logs from the leader node when you join the cluster again.

Trained: a follower node in a cluster fails to synchronize logs and receive heartbeat

Preparation: The original follower node rejoins the cluster after a period of time. He was confused. What was going on? Who am I? Where am I?

The log of this node will be synchronized from the current leader. Just take the current monarch as king, nothing else

Multiple candidates or leaders

Multiple candidates or leaders in a cluster are usually the result of poor data transfer. It is relatively rare to have multiple leaders, but multiple candidates are more likely to appear in the “chaotic” period when the leader has not been selected at the initial stage of cluster node startup.

Initial condition: All nodes in a cluster are in follower state. 【 Slash-and-burn, live and work 】

Primary election of two candidates. At the end of the Eastern Han Dynasty, the armies were divided.

: : Both candidates received only a small number of votes from followers. One master and one servant walk the world.

Candidate continued to ask other followers. Every man is responsible for the rise and fall of the world.

Prepared: some followers had already voted, so they rejected. 【 My mind is made up, master need not say more 】

A candidate may be asked to vote for a candidate. Cao Cao and Liu Bei met at a banquet

A candidate who has the same number of steps will refuse to accept a request from another candidate. It is better to die when life is a disgrace.

Candidate: If the leader is not elected the first time, the candidate will randomly choose a waiting interval (150ms ~ 300ms) to vote again. Make a comeback

Such candidate becomes the leader if accepted by more than half of the followers in the cluster. 【 Near water tower first get the month, sunny flowers and trees easy for spring 】

Prepared to vote again. Every man is responsible for the rise and fall of the world.

Candidate receives a rejected vote because the leader has been elected in the cluster. 【 My mind is made up, master need not say more 】

Such candidate node terminates the voting request, switches to followers, and synchronizes logs from the leader node after being rejected by the majority of nodes. Life in sometimes must have, life in no time don’t force 】

Log replication (for data consistency)

Log replication process

After the Leader is selected, the client requests are received. The Leader adds the request to its Log as Log entries and then initiates a parallel AppendEntries RPC copy of the Log entry to the other server. When the log is copied to most servers, the Leader applies the log to its state machine and returns the execution results to the client.

The following figure shows the entire process when a client sends a request to the leader and the leader copies it to the follower.

  • Each request from the client contains instructions to be executed by the replicated state machine.
  • The leader adds this instruction to the log as a new log entry, and then initiates the RPC in parallel to the other servers to replicate it

Pieces of information.

  • The follower responds to the ACK, and if the follower crashes or runs slowly or loses packets, the leader retries until all the followers are finished

All log entries are copied.

  • All followers are told to submit the log, and the leader submits the log to his state machine and returns it to the client.

You can see

The whole thing doesn’t happen until step four. If any step fails, log consistency is not affected

Refer to the reference

  • Question of the Byzantine general
  • Paxos – simple papers
  • Raft paper
  • Raft Animation understands the basic concepts
  • Raft Animation understands elections
  • infoq raft-consensus-algorithm
  • Microsoft’s hall of fame