Book to wen Raft Part B | MIT 6.824 Lab2B Log Replication.
The experiment to prepare
- Experiment code:
git://g.csail.mit.edu/6.824-golabs-2021/src/raft
- How to test:
go test -run 2C -race
- Related papers: Raft Extended
- 6.824 Lab 2: Raft (mit.edu)
The experimental goal
- complete
persist()
andreadPersist()
Function, coded according to comments. - To optimize the
nextIndex[]
Rollback mode. Otherwise, all tests fail.
Some hints
- The test involves uncertain events such as server failure and RPC failure. Run the test several times to ensure that it passes.
- The parts that need persistence include
currentTerm
,votedFor
,log
. - The relevant
nextIndex[]
You can view the rollback optimizationStudents’ Guide to Raft. - Errors that tests fail to detect in Lab2A and Lab2B may be exposed in Lab2C.
persistence
This part is actually quite simple, the comments in the code are already clear, of course you should be aware of the data race issue.
func (rf *Raft) persist(a) {
w := new(bytes.Buffer)
e := labgob.NewEncoder(w)
e.Encode(rf.currentTerm)
e.Encode(rf.votedFor)
e.Encode(rf.log)
rf.persister.SaveRaftState(w.Bytes())
}
func (rf *Raft) readPersist(data []byte) {
if data == nil || len(data) < 1 {
return
}
r := bytes.NewBuffer(data)
d := labgob.NewDecoder(r)
d.Decode(&rf.currentTerm)
d.Decode(&rf.votedFor)
d.Decode(&rf.log)
}
Copy the code
NextIndex optimization
In Part B, let nextIndex decrement for a failed AppendEntries request, which is slow.
Optimization point 1
If follower.log does not exist prevLog, have the Leader synchronize the log from the end of follower.log the next time.
Optimization point 2
Prevlog. Term is a conflictTerm if the prevlog. Term is not a match.
- if
leader.log
Term cannot be foundconflictTerm
Log, the next time fromfollower.log
In theconflictTerm
The first log position of the start synchronization log. - if
leader.log
Term is foundconflictTerm
Log, the next time fromleader.log
In theconflictTerm
The next position of the last log starts the synchronization log.
The correct location of nextIndex may still take multiple RPCS to find, and the improved process only speeds up finding the correct nextIndex.
AppendEntries has the following logic.
reply.Term = rf.currentTerm
reply.Success = false
if len(rf.log) <= args.PrevLogIndex {
reply.ConflictIndex = len(rf.log)
reply.ConflictTerm = - 1
return
}
ifrf.log[args.PrevLogIndex].Term ! = args.PrevLogTerm { reply.ConflictTerm = rf.log[args.PrevLogIndex].Termfor i := 1; i <= args.PrevLogIndex; i++ {
if rf.log[i].Term == reply.ConflictTerm {
reply.ConflictIndex = i
return}}}Copy the code
The logic in Heartbeat is as follows.
if! reply.Success {if reply.ConflictTerm == - 1 {
rf.nextIndex[id] = reply.ConflictIndex
} else {
conflictIndex := - 1
for i := args.PrevLogIndex; i > 0; i-- {
if rf.log[i].Term == reply.ConflictTerm {
conflictIndex = i
break}}ifconflictIndex ! =- 1 {
rf.nextIndex[id] = conflictIndex + 1
} else {
rf.nextIndex[id] = reply.ConflictIndex
}
}
}
Copy the code
The experimental conclusion
Part C is not the core Part of Raft algorithm, and the optimization of nextIndex is based on the method in Students’ Guide.
If you’ve done both persistence and fallback tuning and still can’t pass all the tests, you might want to double-check parts A and B for missing details.
Finally, to prove that I am not scribbling, I enclose my test results.