After finishing 6.824, my code was a mess, so I wanted to rewrite it and organize my thoughts. Then I found TinyKV, which covers more ground than 6.824, so I decided to work through TinyKV as well.

Translated project document

This project implements a standalone key-value storage service with Column Family support. A Column Family (CF) acts as a namespace for keys, so different Column Families can contain the same key.

The service provides four basic operations: Put, Delete, Get, and Scan.

This project can be divided into two steps:

  1. Implement a standalone (single-machine) storage engine.
  2. Implement the raw key-value service handlers.

tinykvpb.proto and kvrpcpb.proto define the RPC interfaces and the request and response messages. The RPC service is registered in kv/main.go.

The Go code for these definitions is generated from the proto files by Protocol Buffers, and the proto files do not need to be modified.

The Server layer is backed by the Storage interface, so StandAloneStorage must implement every method of Storage.

type Storage interface {
    // Other stuffs
    Write(ctx *kvrpcpb.Context, batch []Modify) error
    Reader(ctx *kvrpcpb.Context) (StorageReader, error)
}

The underlying storage is provided by Badger, a storage engine similar to LevelDB or RocksDB; StandAloneStorage is a thin wrapper around Badger.

The meaning of kvrpcpb.Context can be ignored for now.

A few hints.

  1. Use badger.Txn, which provides Badger's transaction support, to implement the Reader() function.
  2. Badger does not support Column Families. engine_util provides a set of functions that emulate Column Families with key prefixes; use them to implement the Write() function.
  3. Use Connor1996/badger instead of dgraph-io/badger.
  4. Call Discard() to close a badger.Txn, and close all iterators before doing so.

Finally, implement RawGet, RawScan, RawPut, and RawDelete on top of Storage, and verify the result by running make project1.

StandAloneStorage

engine_util wraps Badger's interface, and StandAloneStorage adds another layer on top of engine_util, so the StandAloneStorage struct is very simple.

type StandAloneStorage struct {
   engine *engine_util.Engines
}

StandAloneStorage is a concrete implementation of the Storage interface, so it must implement the Write and Reader methods. Their signatures are as follows.

func (s *StandAloneStorage) Write(ctx *kvrpcpb.Context, batch []storage.Modify) error
func (s *StandAloneStorage) Reader(ctx *kvrpcpb.Context) (storage.StorageReader, error)

In Project1, kvrpcpb.Context is not used. storage.Modify represents a Put or Delete write. storage.StorageReader is also an interface.

Reader

type StorageReader interface {
   GetCF(cf string, key []byte) ([]byte, error)
   IterCF(cf string) engine_util.DBIterator
   Close()
}

The StorageReader interface exposes two read methods, GetCF and IterCF, plus Close; it hides the underlying transaction and simplifies the interface. Implementing StorageReader requires the GetCFFromTxn and NewCFIterator functions provided by engine_util.

func GetCFFromTxn(txn *badger.Txn, cf string, key []byte) (val []byte, err error)
func NewCFIterator(cf string, txn *badger.Txn) *BadgerIterator

engine_util does not provide a function for obtaining a badger.Txn, so badger.DB.NewTransaction must be called directly.

func (db *DB) NewTransaction(update bool) *Txn

The update parameter is true for a transaction that performs Put/Delete writes and false for a read-only transaction that serves Get/Scan.
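To make the reader lifecycle concrete, here is a minimal sketch of a reader that owns one transaction and discards it on Close. It does not import Badger: fakeDB, fakeTxn, and standAloneReader are hypothetical stand-ins invented for this illustration, IterCF is left out, and a missing key simply yields nil, which may differ from how the real engine_util helpers report it.

```go
package main

import "fmt"

// fakeTxn is a hypothetical stand-in for *badger.Txn: a read-only snapshot.
type fakeTxn struct {
	snapshot  map[string][]byte
	discarded bool
}

// Discard plays the role of the transaction's Discard method.
func (t *fakeTxn) Discard() { t.discarded = true }

// fakeDB is a hypothetical stand-in for *badger.DB.
type fakeDB struct{ data map[string][]byte }

// NewTransaction copies the data so later writes do not affect the snapshot.
// The update flag is accepted only to mirror the signature shown above.
func (db *fakeDB) NewTransaction(update bool) *fakeTxn {
	snap := make(map[string][]byte, len(db.data))
	for k, v := range db.data {
		snap[k] = v
	}
	return &fakeTxn{snapshot: snap}
}

// standAloneReader keeps one transaction open for its whole lifetime.
type standAloneReader struct{ txn *fakeTxn }

// GetCF looks the key up under the CF prefix; a missing key yields nil here.
func (r *standAloneReader) GetCF(cf string, key []byte) ([]byte, error) {
	return r.txn.snapshot[cf+"_"+string(key)], nil
}

// Close releases the transaction held by the reader.
func (r *standAloneReader) Close() { r.txn.Discard() }

func main() {
	db := &fakeDB{data: map[string][]byte{"default_apple": []byte("red")}}
	r := &standAloneReader{txn: db.NewTransaction(false)} // false: read-only
	defer r.Close()
	val, _ := r.GetCF("default", []byte("apple"))
	fmt.Printf("%s\n", val)
}
```

The point of the design is that the caller never sees the transaction: it asks for a reader, reads, and calls Close, which is where the transaction is discarded.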

Write

type Modify struct {
   Data interface{}
}

type Put struct {
   Key   []byte
   Value []byte
   Cf    string
}

type Delete struct {
   Key []byte
   Cf  string
}

A Modify holds either a Put or a Delete. In Write, a type assertion determines which one it is, and the PutCF or DeleteCF function provided by engine_util is called accordingly.

func PutCF(engine *badger.DB, cf string, key []byte, val []byte) error
func DeleteCF(engine *badger.DB, cf string, key []byte) error

Both functions use transactions internally, so unlike Reader, Write does not require us to manage the transaction ourselves; that detail is already hidden.
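The dispatch described above can be sketched with a type switch. This is only an illustration under stated assumptions: memEngine, putCF, deleteCF, and write are hypothetical in-memory stand-ins for *badger.DB, engine_util.PutCF, engine_util.DeleteCF, and StandAloneStorage.Write, and the sketch assumes Put and Delete are stored in Modify.Data as values rather than pointers.

```go
package main

import "fmt"

type Modify struct {
	Data interface{}
}

type Put struct {
	Key   []byte
	Value []byte
	Cf    string
}

type Delete struct {
	Key []byte
	Cf  string
}

// memEngine is a hypothetical in-memory stand-in for *badger.DB.
type memEngine map[string][]byte

// putCF and deleteCF play the roles of engine_util.PutCF and DeleteCF.
func putCF(e memEngine, cf string, key, val []byte) error {
	e[cf+"_"+string(key)] = val
	return nil
}

func deleteCF(e memEngine, cf string, key []byte) error {
	delete(e, cf+"_"+string(key))
	return nil
}

// write applies each modification in order and stops at the first error.
func write(e memEngine, batch []Modify) error {
	for _, m := range batch {
		switch data := m.Data.(type) {
		case Put:
			if err := putCF(e, data.Cf, data.Key, data.Value); err != nil {
				return err
			}
		case Delete:
			if err := deleteCF(e, data.Cf, data.Key); err != nil {
				return err
			}
		default:
			return fmt.Errorf("unsupported modification %T", m.Data)
		}
	}
	return nil
}

func main() {
	e := memEngine{}
	err := write(e, []Modify{
		{Data: Put{Key: []byte("apple"), Value: []byte("red"), Cf: "default"}},
		{Data: Put{Key: []byte("pear"), Value: []byte("green"), Cf: "default"}},
		{Data: Delete{Key: []byte("apple"), Cf: "default"}},
	})
	fmt.Println(err, len(e))
}
```

A type switch is used instead of two separate assertions so that an unexpected payload is reported as an error rather than silently ignored.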

Server

StandAloneStorage is an implementation of Storage, and Server is built on top of the Storage interface; RaftStorage comes later. This way the storage engine underneath Server can be swapped, corresponding to the standalone and distributed storage services respectively.

type Server struct {
   storage storage.Storage
}

Project1 requires us to implement the raw service handlers for Server. This part lives in raw_api.go, and the signatures of the functions to implement are listed below.

func (server *Server) RawGet(_ context.Context, req *kvrpcpb.RawGetRequest) (*kvrpcpb.RawGetResponse, error)
func (server *Server) RawPut(_ context.Context, req *kvrpcpb.RawPutRequest) (*kvrpcpb.RawPutResponse, error)
func (server *Server) RawDelete(_ context.Context, req *kvrpcpb.RawDeleteRequest) (*kvrpcpb.RawDeleteResponse, error)
func (server *Server) RawScan(_ context.Context, req *kvrpcpb.RawScanRequest) (*kvrpcpb.RawScanResponse, error)

These four handlers are implemented by calling Storage.Write, reader.GetCF, and reader.IterCF. Scan may be the least obvious of them: because Badger stores keys in sorted order, a scan reads up to N consecutive key-value pairs beginning at a starting key, where N is RawScanRequest.Limit.

engine_util wraps the iterator as DBIterator, which provides Item, Valid, Next, and Seek; these are what RawScan is built from.
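The scan loop can be sketched as seek, then read until the limit is reached or the iterator runs out. The iterator below is a hypothetical in-memory stand-in over a sorted slice, not engine_util.DBIterator itself: it borrows the method names Seek, Valid, Next, and Item, but its Item returns a plain key-value pair, which is a simplification made for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

type kvPair struct{ Key, Value []byte }

// sliceIter is a hypothetical iterator over pairs already sorted by key.
type sliceIter struct {
	pairs []kvPair
	pos   int
}

// Seek moves to the first pair whose key is >= the given key.
func (it *sliceIter) Seek(key []byte) {
	it.pos = sort.Search(len(it.pairs), func(i int) bool {
		return bytes.Compare(it.pairs[i].Key, key) >= 0
	})
}

func (it *sliceIter) Valid() bool  { return it.pos < len(it.pairs) }
func (it *sliceIter) Next()        { it.pos++ }
func (it *sliceIter) Item() kvPair { return it.pairs[it.pos] }

// scan returns at most limit pairs, starting at the first key >= startKey.
func scan(it *sliceIter, startKey []byte, limit uint32) []kvPair {
	var out []kvPair
	for it.Seek(startKey); it.Valid() && uint32(len(out)) < limit; it.Next() {
		out = append(out, it.Item())
	}
	return out
}

func main() {
	it := &sliceIter{pairs: []kvPair{
		{[]byte("a"), []byte("1")},
		{[]byte("b"), []byte("2")},
		{[]byte("c"), []byte("3")},
		{[]byte("d"), []byte("4")},
	}}
	for _, p := range scan(it, []byte("b"), 2) {
		fmt.Printf("%s=%s\n", p.Key, p.Value)
	}
}
```

Checking the limit in the loop condition, before calling Item, means a limit of zero returns nothing and the iterator is never read past its end.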

Column Family

In case CF is still unclear: it is essentially just a string used as a prefix of the key, and that prefix acts as a namespace.

const (
   CfDefault string = "default"
   CfWrite   string = "write"
   CfLock    string = "lock"
)

func KeyWithCF(cf string, key []byte) []byte {
   return append([]byte(cf+"_"), key...)
}

For example, in the CF named “default”, the key “apple” is actually stored as default_apple.
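As a small illustration of the namespace effect, the sketch below reuses KeyWithCF as shown above and stores the same key under two CFs. The plain Go map stands in for the real store and is an assumption made only for this example.

```go
package main

import "fmt"

// KeyWithCF is copied from the snippet above.
func KeyWithCF(cf string, key []byte) []byte {
	return append([]byte(cf+"_"), key...)
}

func main() {
	// A plain map stands in for the underlying store (illustration only).
	store := map[string]string{}
	store[string(KeyWithCF("default", []byte("apple")))] = "value in default"
	store[string(KeyWithCF("write", []byte("apple")))] = "value in write"

	// The same user key lives under two different stored keys.
	fmt.Println(len(store))
	fmt.Println(store["default_apple"])
	fmt.Println(store["write_apple"])
}
```

Because the prefix is part of the stored key, the two entries never collide even though both use the key apple.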