This paper links: blog.openacid.com/algo/paxosk…
preface
After writing the intuitive explanation of Paxos, netizens all said that the curative effect is very good, but they also raised questions about some links in this tutorial (if there is doubt, it really understands 🤔), such as how to apply paxos which can only determine a value to the actual scene.
Since Talk is cheap, Show me the code, this time we will implement the content described in the tutorial directly into the code, hoping to cover every detail covered in the tutorial. To help you understand how PaxOS works.
This is a simple implementation of the KV storage system based on PaxOS, 200 lines of code, as part of the code examples in this tutorial as a visual explanation of PaxOS. The principle of Paxos is not introduced in this article. The data structures mentioned in this article are defined using Protobuf, and the network part is defined using GRPC. Another 200 lines of GO code implement PAxOS storage.
The code in this article may be simplified; the complete code is implemented in the Paxoskv project (naive branch).
Operation and use
🚀
Run:
git clone https://github.com/openacid/paxoskv.git
cd paxoskv
go test -v ./...
Copy the code
In addition to paxOS implementation, this project describes three examples of paxOS operation with three test cases.
- TestCase1SingleProposer: runs without conflicts.
- TestCase2DoubleProposer: runs with a conflict.
- Example_setAndGetByKeyVer: used as key-val.
The test code describes the behavior of several examples of PaxOS running, and running the test verifies that the implementation of PaxOS is as expected.
The protobuf data structure in this article is defined as follows:
service PaxosKV {
rpc Prepare (Proposer) returns (Acceptor) {}
rpc Accept (Proposer) returns (Acceptor) {}
}
message BallotNum {
int64 N = 1;
int64 ProposerId = 2;
}
message Value {
int64 Vi64 = 1;
}
message PaxosInstanceId {
string Key = 1;
int64 Ver = 2;
}
message Acceptor {
BallotNum LastBal = 1;
Value Val = 2;
BallotNum VBal = 3;
}
message Proposer {
PaxosInstanceId Id = 1;
BallotNum Bal = 2;
Value Val = 3;
}
Copy the code
And the main function implementation:
// struct KVServer
Storage : map[string]Versions
func Accept(c context.Context, r *Proposer) (*Acceptor, error)
func Prepare(c context.Context, r *Proposer) (*Acceptor, error)
func getLockedVersion(id *PaxosInstanceId) *Version
// struct Proposer
func Phase1(acceptorIds []int64, quorum int) (*Value, *BallotNum, error)
func Phase2(acceptorIds []int64, quorum int) (*BallotNum, error)
func RunPaxos(acceptorIds []int64, val *Value) (*Value)
func rpcToAll(acceptorIds []int64, action string) ([]*Acceptor)
func ServeAcceptors(acceptorIds []int64) ([]*grpc.Server)
Copy the code
Implement Paxoskv from scratch
Paxos-related data structures
In this example our data structure and service framework is implemented using Protobuf and GRPC, starting with the lowest level paxOS data structure:
The Proposer and Acceptor
In Slide-27 we describe one of the Acceptor fields required:
On the storage side (Acceptor) there are also several concepts:
- The last_rnd is the last Proposer that acceptors remember to read before writing to determine who can actually write a value to the store later.
- V is the last value to be written.
- VRND is paired with V, and it records the Round in which V was written.
In the original article, these nouns refer to the names of Paxos Made Simple, but they were changed in Leslie Lamport’s subsequent papers. For subsequent convenience, they were also replaced in the code implementation of Paxoskv:
BallotNum VRND ==> VBal // Which Ballot in which v is accepted by acceptors (commented) last_rnd ==> LastBalCopy the code
The Proposer field is also simple, and it records:
- Current ballot Number:
Bal
. - And the value it chooses to run at Phase E2:
Val
(slide-29).
Protobuf is used to define the data structures for the two roles, as declared in paxoskv.proto:
message Acceptor {
BallotNum LastBal = 1;
Value Val = 2;
BallotNum VBal = 3;
}
message Proposer {
PaxosInstanceId Id = 1;
BallotNum Bal = 2;
Value Val = 3;
}
Copy the code
The Proposer also requires a PaxosInstanceId to identify which version of the key for which the current paxOS instance is deciding. The paxOS made Simple algorithm describes only one paxos instance algorithm (for each change to a key). To implement multiple changes, we need to add this field to distinguish between different paxOS instances:
message PaxosInstanceId {
string Key = 1;
int64 Ver = 2;
}
Copy the code
Paxoskv. proto also defines a BallotNum, because to ensure that BallotNum is ordered and not repeated throughout the system, it is common practice to combine a locally monotonically increasing integer with a globally unique ID:
message BallotNum {
int64 N = 1;
int64 ProposerId = 2;
}
Copy the code
Define the RPC message structure
The RPC message defines the communication between Proposer and Acceptor.
In a PaxOS system, there must be at least four messages:
- Phase1 Prepare – request, Prepare to reply,
- And Phase2’s accept-request, accept-reply,
As described in Slide-28 (RND is used in the original article, Bal is used here, the same concept):
Phase-1(Prepare):
request: Bal: int reply: LastBal: int Val: string VBal: int Copy the code
Phase-2(Accept):
request: Bal: int Val: string reply: LastBal: int Copy the code
In prepare-request or accept-request, a portion or all of the Proposer fields are sent, so we put in the code:
- The Proposer structure is used directly as the request structure.
- Also use the Acceptor structure as the reply structure.
Use only a few of these fields when you use them. The corresponding RPC service PaxosKV is defined as follows:
service PaxosKV {
rpc Prepare (Proposer) returns (Acceptor) {}
rpc Accept (Proposer) returns (Acceptor) {}
}
Copy the code
Generate the service framework using Protobuf and GRPC
🚀
– Protobuf can generate go code directly from Paxoskv.proto (the code base already contains the generated code: paxoskv.pb.go, only after modifying paxoskv.proto will need to be regenerated)
-
The protobuf compiler, protoc, can be installed by following the steps in install-protoc, usually with a simple one-line command:
- Linux:
apt install -y protobuf-compiler
- Mac:
brew install protobuf
X: libProtoc 3.13.0
- Linux:
-
Install protoc-gen-go plugin for Protoc:
go get -u github.com/golang/protobuf/protoc-gen-go
-
For protokv. Proto files: make gen or:
protoc \ --proto_path=proto \ --go_out=plugins=grpc:paxoskv \ paxoskv.proto Copy the code
As you can see from the generated paxoskv.pb.go code, the main data structures such as the Acceptor definition are:
type Acceptor struct{ LastBal *BallotNum ... Val *Value ... VBal *BallotNum ... . }Copy the code
And the code of client and server of KV service. The client side is well implemented, and the server side has only one interface. We need to complete its implementation later:
type paxosKVClient struct {
cc *grpc.ClientConn
}
type PaxosKVClient interface{ Prepare( ctx context.Context, in *Proposer, opts ... grpc.CallOption ) (*Acceptor, error) Accept( ctx context.Context, in *Proposer, opts ... grpc.CallOption ) (*Acceptor, error) }type PaxosKVServer interface {
Prepare(context.Context,
*Proposer) (*Acceptor, error)
Accept(context.Context,
*Proposer) (*Acceptor, error)
}
Copy the code
Implement the server side of storage
Impl. Go is all the implementation parts, we define a KVServer structure, used to implement GRPC service interface PaxosKVServer; Where an in-memory map structure is used to simulate data storage:
type Version struct {
mu sync.Mutex
acceptor Acceptor
}
type Versions map[int64]*Version
type KVServer struct {
mu sync.Mutex
Storage map[string]Versions
}
Copy the code
Where Version corresponds to a change in a key, that is, to a PAxOS instance. Versions Indicates a series of changes to a key. Storage is all changes to all keys.
Implement Acceptor’s GRPC service handler
An Acceptor, a server in that system, listens on a port, waits for a Proposer to request it, processes it, and then gives a response.
According to Paxos, the logic of Acceptor is simple: described in Slide-28:
Handle Prepare-request for KVServer
func (s *KVServer) Prepare( c context.Context, r *Proposer) (*Acceptor, error) {
v := s.getLockedVersion(r.Id)
defer v.mu.Unlock()
reply := v.acceptor
if r.Bal.GE(v.acceptor.LastBal) {
v.acceptor.LastBal = r.Bal
}
return &reply, nil
}
Copy the code
This code has 3 steps:
- Get the paxOS instance,
- Generate a reply: Acceptors always return
LastBal
.Val
.VBal
These three fields, so assign Acceptor directly to reply. - Then, as described by the PaxOS algorithm, if the request has a larger ballot number, it records that it does not accept a Proposer with a smaller ballot number.
GetLockedVersion () obtains an Acceptor instance from kvserver.storage based on the PaxosInstanceId field key and ver sent by request:
func (s *KVServer) getLockedVersion( id *PaxosInstanceId) *Version {
s.mu.Lock()
defer s.mu.Unlock()
key := id.Key
ver := id.Ver
rec, found := s.Storage[key]
if! found { rec = Versions{} s.Storage[key] = rec } v, found := rec[ver]if! found {// initialize an empty paxos instance
rec[ver] = &Version{
acceptor: Acceptor{
LastBal: &BallotNum{},
VBal: &BallotNum{},
},
}
v = rec[ver]
}
v.mu.Lock()
return v
}
Copy the code
Handle accept-request is handled similarly in theslide-31In the description:
Accept() records three values,
LastBal
: The largest ballot number seen by Acceptor;Val
: Proposer selects a value,- As well as
VBal
: Proposerçš„ballot number:
func (s *KVServer) Accept( c context.Context, r *Proposer) (*Acceptor, error) {
v := s.getLockedVersion(r.Id)
defer v.mu.Unlock()
reply := Acceptor{
LastBal: &*v.acceptor.LastBal,
}
if r.Bal.GE(v.acceptor.LastBal) {
v.acceptor.LastBal = r.Bal
v.acceptor.Val = r.Val
v.acceptor.VBal = r.Bal
}
return &reply, nil
}
Copy the code
(a) If the Acceptor logic is complete, look at the Proposer
Implement the Proposer logic
Proposer runs in two phases, phase E1 and phase E2, corresponding to Prepare and Accept.
Phase1
In an implementation of impl.go, the proposer.phase1 () function takes care of Phase1’s logic:
func (p *Proposer) Phase1(
acceptorIds []int64,
quorum int) (*Value, *BallotNum, error) {
replies := p.rpcToAll(acceptorIds, "Prepare")
ok := 0
higherBal := *p.Bal
maxVoted := &Acceptor{VBal: &BallotNum{}}
for _, r := range replies {
if! p.Bal.GE(r.LastBal) { higherBal = *r.LastBalcontinue
}
if r.VBal.GE(maxVoted.VBal) {
maxVoted = r
}
ok += 1
if ok == quorum {
return maxVoted.Val, nil.nil}}return nil, &higherBal, NotEnoughQuorum
}
Copy the code
This code first sends a prepare-Request request to all acceptors via rpcToAll() and then finds all successful replies:
- If a larger ballot number is found, a Prepare has failed: there is an updated Proposer;
- Otherwise, it is a successful response. Check to see if it returns a value that was accepted by acceptors.
Finally, if a successful response has reached quorum, Phase1 is considered complete, and returns the last checked value, which voted with the maximum VBal. Let the upper caller continue Phase2;
If quorum is not reached, then there may be a conflict with a number of proposers running concurrently, with a larger ballot number, then the Proposer returns the largest ballot number it sees, and raises the ballot number at a higher level and tries again.
The client connects to the server
(a) A Proposer to a client (Proposer) with an Acceptor (if any)
func (p *Proposer) rpcToAll(
acceptorIds []int64,
action string)[] *Acceptor {
replies := []*Acceptor{}
for _, aid := range acceptorIds {
var err error
address := fmt.Sprintf("127.0.0.1: % d",
AcceptorBasePort+int64(aid))
conn, err := grpc.Dial(
address, grpc.WithInsecure())
iferr ! =nil {
log.Fatalf("did not connect: %v", err)
}
defer conn.Close()
c := NewPaxosKVClient(conn)
ctx, cancel := context.WithTimeout(
context.Background(), time.Second)
defer cancel()
var reply *Acceptor
if action == "Prepare" {
reply, err = c.Prepare(ctx, p)
} else if action == "Accept" {
reply, err = c.Accept(ctx, p)
}
iferr ! =nil {
continue
}
replies = append(replies, reply)
}
return replies
}
Copy the code
Phase2
Phase E2, which the Proposer runs, is described in Slide-30 and is simpler than Phase E1:
In phase 2, phase 2, the Proposer X writes its selected value to the Acceptor, either a value it wants to write itself or a v(fix) that it reads from an Acceptor.
func (p *Proposer) Phase2(
acceptorIds []int64,
quorum int) (*BallotNum, error) {
replies := p.rpcToAll(acceptorIds, "Accept")
ok := 0
higherBal := *p.Bal
for _, r := range replies {
if! p.Bal.GE(r.LastBal) { higherBal = *r.LastBalcontinue
}
ok += 1
if ok == quorum {
return nil.nil}}return &higherBal, NotEnoughQuorum
}
Copy the code
As we see, it only needs to confirm that the number of work acknowledgments as phase E2 has reached quorum. It also has the responsibility to return the larger ballot number seen if Phase2 fails, because there may be other proposers between Phase1 and Phase2 that interrupt the current Proposer with a larger ballot number, As described in the slide-33 conflict resolution example. Speak later.
Complete PaxOS logic
The full PAXOS is covered by a Proposer that selects a value so that consistency is guaranteed. As described in Slide-29:
A Proposer X receives a quorum response and considers it ready to proceed. If more than half of the acceptors are not contacted, the system is stuck, which is paxos’s claim that less than half of the nodes will fail. A Proposer then faces two scenarios:
None of the replies contain any non-empty v, indicating that the system was previously clean and that no value has been written by any other Paxos client (because a majority read must see the result of a majority write). The Proposer X then proceeds to actually write the values it is writing to in phase 2 to more than half of the acceptors.
If a Proposer X receives a response containing a V and VRND that were written to it, then it must assume that a Proposer is running, even if it does not know if the Proposer has successfully terminated, but that no value that has been written can be modified! , so X must remain the same. The v corresponding to the maximum VRND that X will see is then the value that X’s Phase-2 will write.
X is actually considered to have performed a fix to the other client (Proposer) if it has interrupted.
A Phase numbered with Acceptor and a Phase numbered with a Proposer (Proposer), then a complete PAxOS is implemented in our code RunPaxos:
func (p *Proposer) RunPaxos(
acceptorIds []int64,
val *Value) *Value {
quorum := len(acceptorIds)/2 + 1
for {
p.Val = val
maxVotedVal, higherBal, err := p.Phase1(
acceptorIds, quorum)
iferr ! =nil {
p.Bal.N = higherBal.N + 1
continue
}
ifmaxVotedVal ! =nil {
p.Val = maxVotedVal
}
// val == nil is a read,
// Fail to read the commented value without Phase2
if p.Val == nil {
return nil
}
higherBal, err = p.Phase2(
acceptorIds, quorum)
iferr ! =nil {
p.Bal.N = higherBal.N + 1
continue
}
return p.Val
}
}
Copy the code
This code does a few things: Run Phase1, select it if you have a “cute” value, select val if you don’t, and then run Phase2.
As described in Phase1 Phase2, at any stage, if quorum is not reached, the larger ballot number encountered needs to be promoted and retry to resolve the ballot number conflicts encountered.
This function takes two arguments:
- A list of all acceptors (an integer ID representing an Acceptor),
- And the value to submit.
If Paxos sees another accepted value after Phase1 is complete, vote for the accepted value and discard val. In this case, in our system, for example, if we want to write key=foo, ver=3 is bar, and if bar is not selected, we need to select the next version key=foo, ver=4 and try again.
In this continuous retry loop, writes eventually succeed in writing a value (a version of a key).
Implement read operations
In our NB(Naive and Bsice) system, both reading and writing are done through the PAXOS algorithm at once. This is because the write process is a paxOS execution, and PaxOS only guarantees that a certain value will be written to a quorum, not to all nodes. Therefore, a read operation must perform at least one majority read if it wants to read the last value written.
But majority reading is not enough: it may read an unfinished PAxOS write, such as the dirty read problem described in Slide-11, and the maximum VBal value read may not be the determined value (write to majority).
For example:
Val=foo Val=bar ?
VBal=3 VBal=2 ?
------- ------- --
A0 A1 A2
Copy the code
If a Proposer attempts to read a Phase1 (a Phase1) to reach two acceptors A0, A1, then which of the values foo and bar is determined depends on the state of A2. So a Phase2 with the maximum VBal is then run through phase e2 to determine the maximum value before it is sent back to the upper layer. (if not, a phase er with a phase b matches A1 and A2 with Val=bar as the determined value.)
Of course if the Proposer succeeds in reading Phase1 of the process and doesn’t see any of the already voted values (such as foo or bar), then it doesn’t have to run Phase2.
So in this version of the implementation, the read is also a call to the RunPaxos function, except that it does not choose any new values. To support the read, add a check to the code above before Phase2.
if p.Val == nil {
return nil
}
Copy the code
The Example_setAndGetByKeyVer test case shows how to implement a KV storage using PaxOS, which reads and writes something like this:
prop := Proposer{
Id: &PaxosInstanceId{
Key: "foo",
Ver: 0,
},
Bal: &BallotNum{N: 0, ProposerId: 2}},/ / write:
v := prop.RunPaxos(acceptorIds, &Value{Vi64: 5})
/ / read:
v := prop.RunPaxos(acceptorIds, nil)
Copy the code
So far, all the functionality covered in this article has been implemented, and the full implementation is in Imp.go.
We then use the test case to implement the two examples listed in the intuitive explanation of PaxOS below, which look at poxOS in action from the code:
In this example
The first example is paxOS running Slide-32 without conflict:
TestCase1SingleProposer (proposer) :
func TestCase1SingleProposer(t *testing.T) {
ta := require.New(t)
acceptorIds := []int64{0.1.2}
quorum := 2
// Start the services of three acceptors
servers := ServeAcceptors(acceptorIds)
defer func(a) {
for _, s := range servers {
s.Stop()
}
}()
// Define the PAxOS instance ID with the key and version to be updated
paxosId := &PaxosInstanceId{
Key: "i",
Ver: 0,}var val int64 = 10
// define a Proposer with an arbitrary id of Proposer 10.
var pidx int64 = 10
px := Proposer{
Id: paxosId,
Bal: &BallotNum{N: 0, ProposerId: pidx},
}
// Run Phase1 with the two acceptors on the left,
// Success, no other ballot number is seen
latestVal, higherBal, err := px.Phase1(
[]int64{0.1}, quorum)
ta.Nil(err, "constitued a quorum")
ta.Nil(higherBal, "no other proposer is seen")
ta.Nil(latestVal, "no voted value")
// After Phase1 succeeds, I can't see any other "cute" values,
// Proposer selects its own value for phase E2
px.Val = &Value{Vi64: val}
// Phase 2
higherBal, err = px.Phase2(
[]int64{0.1}, quorum)
ta.Nil(err, "constitued a quorum")
ta.Nil(higherBal, "no other proposer is seen")}Copy the code
The second example corresponds to two instances where a Proposer encounters a conflict and resolves that conflict. The code is not posted in the article and can be seen in TestCase2DoubleProposer
The next step
We implemented a storage system that specifies a key, VER, but there are still a few things missing from actually producing usable KV storage:
-
In write operations, users do not need to specify a VER. Therefore, you need to implement the function of searching for the maximum VER for the specified key. None of this is relevant to PaxOS, so the logic is now omitted from this implementation. More on that later. 🤔
-
In order to avoid the need to specify ver for read operations, a snapshot function is also required, that is, to save a map of key-values. This map only needs to record the latest value of each key (and ver, etc.). Once you have this map, the version of the confirmed value can be deleted. That is, the Versions structure only exists as a change log for each key and stores the paxOS instance corresponding to each change.
-
The snapshot feature also introduces another requirement, the learn behavior in Paxos Made Simple, corresponding to Phase3. In the store described in this article, only the Proposer knows that a key-ver has reached a majority, Acceptor does not know yet, so paxos must be read again. When an Acceptor accepts a value, it notifys other acceptors of the value. We can set each Acceptor to a user as well: If an Acceptor votes a value to a Proposer, it also broadcasts that event to other acceptors so that each Acceptor knows which value has reached quorum (safe) and can be read directly.
In practice, however, this approach produces an n-² number of messages. So if a Proposer receives a Phase2 response for a quorum, then it broadcasts a message to all acceptors that the paxos instance is safe. That message is called Commit on most systems.
The next version of the implementation will use the classic log plus snapshot method to store data.
Friends interested in what aspects, welcome to urge more 🤔…
The code used in this article is on the Naive branch of the Paxoskv project: github.com/openacid/pa…
If there is anything missing in this article, or you have any good ideas, please feel free to discuss.
Questions related to this article can be raised on the Paxoskv project hub Issue.
This paper links: blog.openacid.com/algo/paxosk…