Book to wen Raft Part B | MIT 6.824 Lab2B Log Replication.

The experiment to prepare

  1. Experiment code:git://g.csail.mit.edu/6.824-golabs-2021/src/raft
  2. How to test:go test -run 2C -race
  3. Related papers: Raft Extended
  4. 6.824 Lab 2: Raft (mit.edu)

The experimental goal

  1. completepersist()andreadPersist()Function, coded according to comments.
  2. To optimize thenextIndex[]Rollback mode. Otherwise, all tests fail.

Some hints

  1. The test involves uncertain events such as server failure and RPC failure. Run the test several times to ensure that it passes.
  2. The parts that need persistence includecurrentTerm,votedFor,log.
  3. The relevantnextIndex[]You can view the rollback optimizationStudents’ Guide to Raft.
  4. Errors that tests fail to detect in Lab2A and Lab2B may be exposed in Lab2C.

persistence

This part is actually quite simple, the comments in the code are already clear, of course you should be aware of the data race issue.

func (rf *Raft) persist(a) {
	w := new(bytes.Buffer)
	e := labgob.NewEncoder(w)
	e.Encode(rf.currentTerm)
	e.Encode(rf.votedFor)
	e.Encode(rf.log)
	rf.persister.SaveRaftState(w.Bytes())
}

func (rf *Raft) readPersist(data []byte) {
	if data == nil || len(data) < 1 {
		return
	}
	r := bytes.NewBuffer(data)
	d := labgob.NewDecoder(r)
	d.Decode(&rf.currentTerm)
        d.Decode(&rf.votedFor)
        d.Decode(&rf.log)
}
Copy the code

NextIndex optimization

In Part B, let nextIndex decrement for a failed AppendEntries request, which is slow.

Optimization point 1

If follower.log does not exist prevLog, have the Leader synchronize the log from the end of follower.log the next time.

Optimization point 2

Prevlog. Term is a conflictTerm if the prevlog. Term is not a match.

  1. ifleader.logTerm cannot be foundconflictTermLog, the next time fromfollower.logIn theconflictTermThe first log position of the start synchronization log.
  2. ifleader.logTerm is foundconflictTermLog, the next time fromleader.logIn theconflictTermThe next position of the last log starts the synchronization log.

The correct location of nextIndex may still take multiple RPCS to find, and the improved process only speeds up finding the correct nextIndex.

AppendEntries has the following logic.

reply.Term = rf.currentTerm
reply.Success = false

if len(rf.log) <= args.PrevLogIndex {
    reply.ConflictIndex = len(rf.log)
    reply.ConflictTerm = - 1
    return
}

ifrf.log[args.PrevLogIndex].Term ! = args.PrevLogTerm { reply.ConflictTerm = rf.log[args.PrevLogIndex].Termfor i := 1; i <= args.PrevLogIndex; i++ {
            if rf.log[i].Term == reply.ConflictTerm {
                    reply.ConflictIndex = i
                    return}}}Copy the code

The logic in Heartbeat is as follows.

if! reply.Success {if reply.ConflictTerm == - 1 {
        rf.nextIndex[id] = reply.ConflictIndex
    } else {
        conflictIndex := - 1
        for i := args.PrevLogIndex; i > 0; i-- {
                if rf.log[i].Term == reply.ConflictTerm {
                        conflictIndex = i
                        break}}ifconflictIndex ! =- 1 {
                rf.nextIndex[id] = conflictIndex + 1
        } else {
                rf.nextIndex[id] = reply.ConflictIndex
        }
    }
}
Copy the code

The experimental conclusion

Part C is not the core Part of Raft algorithm, and the optimization of nextIndex is based on the method in Students’ Guide.

If you’ve done both persistence and fallback tuning and still can’t pass all the tests, you might want to double-check parts A and B for missing details.

Finally, to prove that I am not scribbling, I enclose my test results.