The original link: https://thesquareplanet.com/blog/students-guide-to-raft/

Students’ Guide to Raft


This is a guide for the Lab 2 assignment in MIT 6.824 Distributed Systems. The original article was published on March 16, 2016. The translation is divided into three parts; this final part covers the remainder of the original article.

Applications on Top of Raft

When building a service on top of Raft (such as the key/value store in the second 6.824 Raft lab), the interaction between the service and the Raft log can be tricky to get right. This section details some aspects of the development process that you may find useful when building your application.

Applying Client Operations

You may be confused about how you would even implement an application in terms of a replicated log. You might start by having your service, whenever it receives a client request, send that request to the leader, wait for Raft to apply it, perform the operation the client asked for, and then return to the client. While this would be fine in a single-client system, it does not work for concurrent clients.

Instead, the service should be constructed as a state machine in which client operations transition the machine from one state to another. You should have a loop that takes client operations one at a time (in the same order on all servers; this is where Raft comes in) and applies each one to the state machine in turn. This loop should be the only place in your code that touches the application state (the key/value mapping in 6.824). This means that your client-facing RPC methods should simply submit the client's operation to Raft and then wait for that operation to be applied by this "applier" loop. Only when the client's command comes up should it be executed, and its return value read. Note that this includes read requests.
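
To make this concrete, here is one way the applier loop and its surrounding state might look. This is only a sketch: the names (kvServer, Op, ApplyMsg, applyCh) are assumptions for illustration, not the lab's actual API.

package kvstore

import "sync"

// Op is a client operation as stored in the Raft log (illustrative).
type Op struct {
    Kind       string // "Get", "Put", or "Append"
    Key, Value string
}

// ApplyMsg is what Raft delivers when a log entry is committed (illustrative).
type ApplyMsg struct {
    Index   int
    Command interface{}
}

type kvServer struct {
    mu      sync.Mutex
    data    map[string]string // the replicated key/value state
    applyCh chan ApplyMsg     // committed entries, in log order
}

// applierLoop is the only code that touches kv.data. It applies committed
// operations one at a time, in the order Raft committed them.
func (kv *kvServer) applierLoop() {
    for msg := range kv.applyCh {
        op := msg.Command.(Op)
        kv.mu.Lock()
        switch op.Kind {
        case "Put":
            kv.data[op.Key] = op.Value
        case "Append":
            kv.data[op.Key] += op.Value
        case "Get":
            // nothing to mutate; the waiting RPC handler reads the value here
        }
        // ...then poke the RPC handler waiting on msg.Index (see below)
        kv.mu.Unlock()
    }
}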

This brings up another question: how do you know when a client operation has completed? In the absence of failures this is simple: you just wait for the thing you put into the log to come back out (that is, be passed to apply()). When it does, you return the result to the client. But what happens if there are failures? For example, you may have been the leader when the client first contacted you, but someone else has since been elected, and the client request you put in the log has been discarded. Clearly you need to have the client try again, but how do you know when to tell them about the error?

A simple way to solve this problem is to record where in the Raft log the client's operation appears when you insert it. Once the operation at that index is sent to apply(), you can tell whether or not the client's request succeeded based on whether the operation that came up at that index is in fact the one you put there. If it isn't, a failure has occurred and an error should be returned to the client.
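
Continuing the sketch above, the RPC handler might register a channel under the index returned by Start() and then check that the operation applied at that index is really its own. The rf and waiting fields here are additional assumptions for illustration.

// submit hands op to Raft and waits for the entry at the returned index to
// be applied. Illustrative only: it assumes kvServer also has
//     rf      interface{ Start(cmd interface{}) int } // returns the chosen log index
//     waiting map[int]chan Op                         // index -> waiter
func (kv *kvServer) submit(op Op) bool {
    index := kv.rf.Start(op)

    kv.mu.Lock()
    ch := make(chan Op, 1)
    kv.waiting[index] = ch
    kv.mu.Unlock()

    // The applier loop, after applying the entry at some index i, does:
    //     if ch, ok := kv.waiting[i]; ok { ch <- appliedOp; delete(kv.waiting, i) }
    applied := <-ch

    // If a different command was committed at our index, we lost leadership
    // somewhere along the way; the caller should return an error so the
    // client retries against another server.
    return applied == op
}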

Duplicate Detection

Once you have clients retry operations in the face of errors, you need some kind of duplicate detection scheme: if a client sends an APPEND to your server, doesn't hear back, and re-sends it to the next server, your apply() function needs to ensure that the APPEND isn't executed twice. To do so, you need some kind of unique identifier for each client request, so that you can recognize whether you have seen, and more importantly applied, a particular operation in the past. Furthermore, this state needs to be part of your state machine so that all your Raft servers eliminate the same duplicates. There are many ways of assigning such identifiers. One simple and fairly efficient one is to give each client a unique identifier, and then have them tag each request with a monotonically increasing sequence number. If a client re-sends a request, it re-uses the same sequence number. Your server keeps track of the latest sequence number it has seen for each client, and simply ignores any operation it has already seen (and processed).
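
Sketching this on top of the earlier example, with hypothetical ClientID, Seq, and lastSeq names:

// applyOp is called from the applier loop with kv.mu held. It assumes Op has
// been extended with two illustrative fields:
//     ClientID int64 // unique per client
//     Seq      int64 // monotonically increasing per client; reused on retry
// and that kvServer has lastSeq map[int64]int64. Because lastSeq is updated
// here, inside the state machine, every replica drops the same duplicates.
func (kv *kvServer) applyOp(op Op) {
    if op.Seq <= kv.lastSeq[op.ClientID] {
        return // a retry of an operation we have already applied
    }
    kv.lastSeq[op.ClientID] = op.Seq
    switch op.Kind {
    case "Put":
        kv.data[op.Key] = op.Value
    case "Append":
        kv.data[op.Key] += op.Value
    }
}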

Hairy corner-cases

If your implementation follows the general outline given above, there are at least two subtle issues you are likely to run into that may be hard to identify without some serious debugging. To save you some time, here they are:

Re-appearing indices: Say your Raft library has some method Start() that takes a command and returns the index at which that command was placed in the log (so that you know when to return to the client, as discussed above). You might assume that you will never see Start() return the same index twice, or, at the very least, that if you see the same index again, the command that first returned that index must have failed. It turns out that neither of these things is true, even if no servers crash.

Consider the following scenario with five servers, S1 through S5. Initially, S1 is the leader, and its log is empty.

  1. Two client operations (C1 and C2) arrive at S1

  2. Start() returns 1 for C1 and 2 for C2

  3. S1 sends out an AppendEntries to S2 containing C1 and C2, but all of its other messages are lost

  4. S3 steps forward as a candidate

  5. S1 and S2 will not vote for S3, but S3, S4 and S5 will all vote, so S3 becomes the leader

  6. Another client makes a request, and C3 goes to S3

  7. S3 calls Start() (this method returns 1)

  8. S3 sends AppendEntries to S1, which discards C1 and C2 from the log and adds C3

  9. S3 fails before sending AppendEntries to any other servers

  10. S1 steps forward as a candidate, and because its log is up to date, it is elected leader

  11. Another client initiates a request, C4, to S1

  12. S1 calls Start(), which returns 2 (which was also returned for Start(C2))

  13. All of S1's AppendEntries are dropped, and S2 steps forward as a candidate

  14. S1 and S3 will not vote for S2, but S2, S4 and S5 will all vote for S2, so S2 becomes the leader

  15. Another client request, C5, comes in to S2

  16. S2 calls Start(), which returns 3

  17. S2 successfully sends AppendEntries to all the servers, and reports this back to them by including an updated leaderCommit = 3 in the next heartbeat (telling them that entries up through index 3 have been committed)

Since S2's log is [C1 C2 C5], this means that the entry committed (and applied at all servers, including S1) at index 2 is C2, despite the fact that C4 was the last client operation to have returned index 2 at S1.

The four-way deadlock: All credit for finding this goes to Steven Allen, another 6.824 TA. He found the following nasty four-way deadlock that is easy to run into when building applications on top of Raft.

Your Raft code, however it is structured, likely has a Start()-like function that allows the application to add new commands to the Raft log. It also likely has a loop that, when commitIndex is updated, calls apply() on the application for all elements in the log between lastApplied and commitIndex. These routines probably both take some lock (call it A). In your Raft-based application, you probably call Raft's Start() function somewhere in your RPC handlers, and you have some code somewhere else that is notified whenever Raft applies a new log entry. Since these two places need to communicate (that is, the RPC method needs to know when the operation it put into the log is completed), they probably both take some lock (call it B).

In Go, the four code snippets might look like this:

func (a *App) RPC(args interface{}, reply interface{}) {
    // ...
    a.mutex.Lock()
    i := a.raft.Start(args)
    // update some data structure so that apply knows to poke us later
    a.mutex.Unlock()
    // wait for apply to poke us
    return
}

func (r *Raft) Start(cmd interface{}) int {
    r.mutex.Lock()
    // do things to start agreement on this new command
    // store index in the log where cmd was placed
    r.mutex.Unlock()
    return index
}

func (a *App) apply(index int, cmd interface{}) {
    a.mutex.Lock()
    switch cmd := cmd.(type) {
    case GetArgs:
        // do the get
        // see who was listening for this index
        // poke them all with the result of the operation
        // ...
    }
    a.mutex.Unlock()
}

func (r *Raft) AppendEntries(...) {
    // ...
    r.mutex.Lock()
    // ...
    for r.lastApplied < r.commitIndex {
        r.lastApplied++
        r.app.apply(r.lastApplied, r.log[r.lastApplied])
    }
    // ...
    r.mutex.Unlock()
}


Consider that the system is in the following state:

  • App.RPC has just taken a.mutex and called Raft.Start
  • Raft.Start is waiting for r.mutex
  • Raft.AppendEntries is holding r.mutex, and has just called App.apply

We now have a deadlock because:

  • Raft.AppendEntries will not release its lock until App.apply returns
  • App.apply cannot return until it takes a.mutex
  • a.mutex will not be released until App.RPC returns
  • App.RPC will not return until Raft.Start returns
  • Raft.Start cannot return until it takes r.mutex
  • Raft.Start has to wait for Raft.AppendEntries

There are several ways you can solve this problem. One of the simplest is to take a.mutex after calling a.raft.Start in App.RPC. However, this means that App.apply may be called for the operation that App.RPC just called Raft.Start on before App.RPC has had a chance to record the fact that it wants to be notified. Another scheme that may yield a cleaner design is to have a single, dedicated thread calling r.app.apply from Raft. This thread could be notified every time commitIndex is updated, and would then not need to hold a lock in order to apply, breaking the deadlock.
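
For the second approach, a sketch of such a dedicated applier thread might look like the following. The applyCond field is an assumption for illustration; the important part is that r.mutex is not held across the call into the application.

// applier runs in its own goroutine for the lifetime of the Raft peer. It
// assumes an extra field applyCond *sync.Cond created with sync.NewCond(&r.mutex);
// any code that advances r.commitIndex calls r.applyCond.Signal() instead of
// applying entries inline.
func (r *Raft) applier() {
    r.mutex.Lock()
    for {
        for r.lastApplied >= r.commitIndex {
            r.applyCond.Wait() // releases r.mutex while blocked
        }
        r.lastApplied++
        index, cmd := r.lastApplied, r.log[r.lastApplied]

        // Call into the application without holding r.mutex, so App.apply is
        // free to take a.mutex; this removes one edge of the cycle above.
        r.mutex.Unlock()
        r.app.apply(index, cmd)
        r.mutex.Lock()
    }
}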
