Introduction to Paxos and Raft protocols

Article source: Tao Teacher operation notes – wechat official account

1. Background

1.1 Distributed Transactions

Distributed transaction refers to the operation of the transaction is located on different distributed system nodes, so the AICD feature of the transaction must be guaranteed. The difficulty with distributed transaction processing is that there must be a way to know everything a transaction does anywhere, and the decision to commit or roll back a transaction must produce uniform results (all commit or all rollback)

Solve this distributed consistency problem: Two Phase Commitment Protocol, Tree Phase Commitment Protocol and Paxos algorithm are well known.

1.2 Phase 2 Submission

In a distributed system, although each node can know the success or failure of its own operation, it cannot know the success or failure of other nodes’ operation. When a transaction across multiple nodes, in order to keep the ACID characteristic of transaction, need to introduce a unity as coordinator component to control all the nodes (referred to as participants) operating results and eventually indicating whether the node should submit the operating results are real (for example, the updated data to disk, etc.).

Therefore, the algorithm idea of two-stage submission can be summarized as follows: participants will inform the coordinator of the success or failure of the operation, and then the coordinator will decide whether to submit the operation or stop the operation according to the feedback information of all participants. 2PC is divided into two stages as its name implies, and its implementation ideas can be summarized as follows:

    1. Voting phase: participants inform the coordinator of the results of their actions;
    1. Commit Phase: After receiving the notification from the participant, the coordinator sends the notification to the participant and decides whether to commit or roll back based on the feedback.

2. Paxos agreement

PAXOS can be used to solve the problem of setting a value in a distributed environment. Paxos is based on the communication model of message passing. It assumes that the communication between processes in a distributed system will be lost, delayed, repeated and so on, but will not be mistransmitted. The Paxos algorithm is designed to ensure that a value is agreed between processes based on message passing in such a system.

2.1 the role

Paxos is the first proven consensus algorithm based on two-phase commit and extended. Nodes are divided into three types in the algorithm:

  • Proposer: A proposer that submits a proposal and waits for approval to close the case, usually with the client. Proposal information includes the proposal number and value.
  • Acceptor: The acceptor is responsible for voting on proposals, usually the server. The proposal is chosen if it is accepted by a majority of Acceptors.
  • Learner: Is told the result of the proposal and can only “learn” the approved proposal. Do not participate in the voting process. Both client and server can be used.

Each node can play multiple roles in the protocol.

2.2 Paxos algorithm

  • Segmentfault.com/a/119000000…
  • www.cnblogs.com/hzmark/p/pa…

Paxos features:

  • One or more nodes can make proposals
  • The system must agree on one of the proposals
  • At best, agreement can be reached on a firm proposal
  • As long as more than half of the nodes are alive and can communicate with each other, the entire system must reach a consistent state.

2.2.1 Paxos vote

  • First the proposers sends the value to the Acceptors.
  • Acceptors accept values. Acceptors can respond to acceptances or rejections.
  • Once a majority of responses from the nodes are accepted, a consensus is reached and the accepted value becomes a formal resolution (called a “approval” resolution).

The whole process (a transaction or a Round) is divided into two phases:

  • A)Proposer sends a prepare message (send number) to more than half of n/2+1 acceptors if the prepare matches the protocol rules

  • A) If more than half of acceptors respond to a promise, the Proposer sends an Accept message (containing a true value at that time) to an Acceptor to check whether the accept message matches the rule. If the message matches, the accept request is approved

A proposer terminates a proposal if it detects a proposal with a larger number. This means that presenting a proposal with a larger number terminates the previous proposal process. It is possible to trap live locks, violating Progress requirements. The solution in this case is to elect a leader and only allow the leader to make proposals. Note that a learner may also be a proposer.

2.2.2 Issue of resolution

The obvious way to do this is to send the message to all of the learner when acceptors approve a value. But this approach can result in too many messages. Accept messages can be sent to a subset of learners, who then inform all learners.

However, due to the uncertainty of message transmission, there may be no message that is approved by the resolution. When the learners need to know how a decision has been passed, they can ask a proposer to redo the proposal.

2.3 Paxos and 2 PCS

In the Paxos algorithm, what if we specify that there can only be one leader in a cluster at a time, and all nodes are required to vote? Yeah, we get 2PC. 2PC is a special case of Paxos

The two phases are prepare and commit. The preparation phase deals with which proposal to vote on, and the submission phase deals with confirming the final value.

3. The Raft

  • raft.github.io/raft.pdf
  • thesecretlivesofdata.com/raft/
  • www.cnblogs.com/MaggieLXC/p…

3.1 Raft role

Raft algorithm is a simplified implementation of Paxos algorithm. There are three roles: leader, candidate, and follower.

  • Follow: All nodes start as followers. If they do not receive the leader message, they become candidate.
  • Candidate: The candidate will solicit votes from other nodes and become the leader if he gets the majority of votes. This process is called leader election.
  • Leader: All changes to the system go through the leader first.

3.2 Raft algorithm

Raft has two basic processes:

  • Leader election: Each candidate will propose an election plan at random after a certain period of time, and the candidate with the most votes in the latest stage is elected as the Leader.
  • Synchronize log: The leader finds the most recent record in the system log (occurrence record of various events) and forces all follows to refresh to this record.

Raft algorithm process: there are three states: leader, candidate and follower.

  1. They all start off in the follower state.

  2. The followers become candidate if they do not hear leader

  3. candidate then requests votes from other nodes

  4. nodes will reply with their vote

  5. Candidate becomes the leader if it gets votes from a majority of nodes

  6. This process is called Leader Election.

  7. All changes to the system now go through the leader.

  8. Each change is added as an entry in the node’s log.

  9. This log entry is currently uncommitted so it won’t update the node’s value.

  10. To commit the entry the node first replicates it to the follower nodes…

  11. then the leader waits until a majority of nodes have written the entry.

  12. The entry is now committed on the leader node and the node state is “5”.

  13. The leader then notifies the followers that the entry is committed.

  14. The cluster has now come to consensus about the system state.

  15. This process is called Log Replication.

3.2.1 Leader Election

3.2.2 the Log Replication

Reference:

  • Raft demo http://thesecretlivesofdata.com/raft/
  • Segmentfault.com/a/119000000…
  • www.cnblogs.com/hzmark/p/pa…
  • Mp.weixin.qq.com/s/CmaxRM_U9…
  • Mp.weixin.qq.com/s/AHVMykAci…
  • Blog.csdn.net/westbrookli…
  • Juejin. Cn/post / 684490…
  • www.jianshu.com/p/a75dec696…
  • www.cnblogs.com/MaggieLXC/p…