Bolt

Bolt is a pure Go key/value database inspired by Howard Chu's LMDB project (https://symas.com/lmdb/technical/). The goal of the project is to provide a simple, fast, and reliable database for projects that don't require a full database server such as Postgres or MySQL.

Since Bolt is meant to be used as a low-level piece of functionality, simplicity is key. The API will be small, focusing only on getting and setting values.

Project status

Bolt is stable, the API is fixed, and the file format is fixed. Full unit test coverage and randomized black-box testing are used to ensure database consistency and thread safety. Bolt is currently used in high-load production environments serving databases as large as 1TB. Many companies, such as Shopify and Heroku, use Bolt-backed services every day.

A message from the author

Bolt’s original goal was to provide a simple, pure Go key/value store without bloating the code with extraneous features. To that end, the project has been a success. However, this limited scope also means that the project is complete.

Maintaining an open source database requires a lot of time and effort. Changes to code can have unexpected or even catastrophic effects, so even simple changes require hours of careful testing and verification.

Unfortunately, I no longer have the time or energy to continue the work. Bolt is stable and has run successfully in production for many years. Therefore, I feel that leaving it in its current state is the most prudent course of action.

If you’re interested in a more featureful version of Bolt, I suggest you take a look at the CoreOS fork called bbolt.

Getting Started

Installation

To use Bolt, first install Go, then run the following command:

$ go get github.com/boltdb/bolt/...

This command retrieves the library and installs the bolt executable into your $GOBIN path.

Opening a Bolt database

The top-level object in Bolt is a DB. It is represented as a single file on disk and represents a consistent snapshot of the data.

To open a database, simply use the bolt.Open() function:

package main

import (
    "log"

    "github.com/boltdb/bolt"
)

func main() {
    // Open the my.db data file in your current directory.
    // It will be created if it doesn't exist.
    db, err := bolt.Open("my.db", 0600, nil)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    ...
}

Note: Bolt obtains a file lock on the data file, so multiple processes cannot open the same database at the same time. Opening an already open Bolt database will cause it to hang until the other process closes it. To prevent an indefinite wait, you can pass a timeout option to the Open() function:

db, err := bolt.Open("my.db", 0600, &bolt.Options{Timeout: 1 * time.Second})Copy the code

Transactions

Bolt allows only one read-write transaction at a time, but multiple read-only transactions at a time. Each transaction has a consistent view of the data.

Individual transactions and all objects created from them (such as buckets and keys) are not thread-safe. To process data in multiple goroutines, you must either start a transaction for each goroutine or use locks to ensure that only one goroutine accesses a transaction at a time. Creating a transaction from the DB is thread-safe.

Read-only and read-write transactions should not depend on each other and generally should not be opened at the same time in the same goroutine. This can lead to a deadlock because the read-write transaction needs to periodically remap the data file, but it cannot do so while a read-only transaction is open.
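Since creating transactions from the DB is thread-safe, a common pattern is to give each goroutine its own transaction. Here is a minimal sketch of that pattern, assuming db is an open *bolt.DB as in the examples below and that sync is imported from the standard library:

var wg sync.WaitGroup
for i := 0; i < 4; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        // Each goroutine gets its own read-only transaction;
        // the transaction itself is never shared across goroutines.
        _ = db.View(func(tx *bolt.Tx) error {
            // ... read from the snapshot here ...
            return nil
        })
    }()
}
wg.Wait()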

Read-write transactions

To start a read-write transaction, you can use the db.Update() function:

err := db.Update(func(tx *bolt.Tx) error {
    ...
    return nil
})

Inside the closure, you have a consistent view of the database. You commit the transaction by returning nil. You can also roll back the transaction at any time by returning an error. All database operations are allowed in a read-write transaction.

Always check the returned error, as it will report any disk failure that could cause your transaction to not complete. Any error you return inside the closure will be passed through.
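For example, a minimal sketch using a hypothetical sentinel error (errors is from the standard library):

errInvalid := errors.New("invalid input") // hypothetical sentinel error

err := db.Update(func(tx *bolt.Tx) error {
    // Returning a non-nil error rolls back every change
    // made inside this closure.
    return errInvalid
})
// err == errInvalid here, and nothing was written to the database.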

Read-only transactions

To start a read-only transaction, you can use the db.View() function:

err := db.View(func(tx *bolt.Tx) error {
    ...
    return nil
})

You also get a consistent view of the database in this closure, but no mutating operations are allowed in a read-only transaction. You can only retrieve buckets, retrieve values, and copy the database in a read-only transaction.

Batch read-write transactions

Each db.Update() waits for the disk to commit the write. This overhead can be minimized by combining multiple updates with the db.Batch() function:

err := db.Batch(func(tx *bolt.Tx) error {
    ...
    return nil
})

Concurrent Batch calls are opportunistically combined into larger transactions. Batch is only useful when there are multiple goroutines calling it.
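As a rough sketch of how that looks in practice (assuming db is open, a bucket named "MyBucket" already exists, and sync and fmt are standard-library imports):

var wg sync.WaitGroup
for i := 0; i < 10; i++ {
    i := i // capture the loop variable for the goroutine
    wg.Add(1)
    go func() {
        defer wg.Done()
        // Bolt may coalesce these closures into fewer
        // read-write transactions and disk commits.
        _ = db.Batch(func(tx *bolt.Tx) error {
            b := tx.Bucket([]byte("MyBucket"))
            key := []byte(fmt.Sprintf("key-%04d", i))
            return b.Put(key, []byte("value"))
        })
    }()
}
wg.Wait()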

Batch may call the given function multiple times if parts of the transaction fail. The function must be idempotent, and side effects must only take effect after a successful return from db.Batch().

For example, instead of displaying messages from inside the function, set variables in the enclosing scope:

var id uint64
err := db.Batch(func(tx *bolt.Tx) error {
    // Find last key in bucket, decode as bigendian uint64, increment
    // by one, encode back to []byte, and add new key.
    ...
    id = newValue
    return nil
})
if err != nil {
    return ...
}
fmt.Printf("Allocated ID %d\n", id)

Managing transactions manually

The db.View() and db.Update() functions are wrappers around db.Begin(). These helper functions will start a transaction, execute a function, and then safely close the transaction if an error is returned. This is the recommended way to use Bolt transactions.

However, sometimes you may need to start and close a transaction manually. You can use db.Begin() directly, but be sure to close the transaction.

// Start a writable transaction.
tx, err := db.Begin(true)
if err != nil {
    return err
}
defer tx.Rollback()

// Use the transaction...
_, err = tx.CreateBucket([]byte("MyBucket"))
if err != nil {
    return err
}

// Commit the transaction and check for error.
if err := tx.Commit(); err != nil {
    return err
}

The first argument to db.Begin() is a boolean indicating whether the transaction should be writable.
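For a read-only transaction you would pass false instead. A minimal sketch, noting that read-only transactions must be closed with Rollback() rather than Commit():

// Start a read-only transaction manually.
tx, err := db.Begin(false)
if err != nil {
    return err
}
// Read-only transactions are closed with Rollback(), not Commit().
defer tx.Rollback()

if b := tx.Bucket([]byte("MyBucket")); b != nil {
    v := b.Get([]byte("answer"))
    _ = v // use v while the transaction is still open
}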

Using buckets

A bucket is a collection of key/value pairs within the database. All keys in a bucket must be unique. You can create a bucket using the Tx.CreateBucket() function:

db.Update(func(tx *bolt.Tx) error {
    _, err := tx.CreateBucket([]byte("MyBucket"))
    if err != nil {
        return fmt.Errorf("create bucket: %s", err)
    }
    return nil
})

You can also create a bucket only if it doesn't already exist by using the Tx.CreateBucketIfNotExists() function. It is a common pattern to call this function for all your top-level buckets after the database is opened, so you can guarantee they exist for future transactions.
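A minimal sketch of that startup pattern (the bucket names here are hypothetical):

err := db.Update(func(tx *bolt.Tx) error {
    // Ensure every top-level bucket exists before serving requests.
    for _, name := range []string{"users", "events", "config"} {
        if _, err := tx.CreateBucketIfNotExists([]byte(name)); err != nil {
            return fmt.Errorf("create bucket %s: %s", name, err)
        }
    }
    return nil
})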

To delete a bucket, simply call the Tx.DeleteBucket() function.

Using key/value pairs

To save a key/value pair to a bucket, use the Bucket.Put() function:

db.Update(func(tx *bolt.Tx) error {
    b := tx.Bucket([]byte("MyBucket"))
    err := b.Put([]byte("answer"), []byte("42"))
    return err
})

This sets the value of the "answer" key to "42" in the MyBucket bucket. To retrieve this value, we can use the Bucket.Get() function:

db.View(func(tx *bolt.Tx) error {
    b := tx.Bucket([]byte("MyBucket"))
    v := b.Get([]byte("answer"))
    fmt.Printf("The answer is: %s\n", v)
    return nil
})

The Get() function does not return an error because its operation is guaranteed to work (unless there is some kind of system failure). If the key exists, it returns its byte slice value. If it does not exist, it returns nil. Note that you can set a key to a zero-length value, which is different from the key not existing.
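A small sketch of the distinction, assuming the bucket exists:

db.View(func(tx *bolt.Tx) error {
    b := tx.Bucket([]byte("MyBucket"))

    if v := b.Get([]byte("no-such-key")); v == nil {
        // The key does not exist at all.
    }

    // A key stored with an empty value comes back non-nil
    // but with length zero.
    if v := b.Get([]byte("empty-key")); v != nil && len(v) == 0 {
        // The key exists, and its value is zero-length.
    }
    return nil
})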

Delete a key from a bucket using the Bucket.Delete() function.

Note that the value returned from Get() is only valid while the transaction is open. If you need to use a value outside of the transaction, you must copy it to another byte slice using copy().
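For example, a sketch of copying a value out so it survives the transaction:

var answer []byte

db.View(func(tx *bolt.Tx) error {
    v := tx.Bucket([]byte("MyBucket")).Get([]byte("answer"))
    // v points into the memory-mapped file and is only valid inside
    // this transaction, so copy it into our own slice.
    answer = make([]byte, len(v))
    copy(answer, v)
    return nil
})

// answer is safe to use here, after the transaction has closed.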

Autoincrementing integers for buckets

By using the NextSequence() function, you can let Bolt determine a sequence that can be used as a unique identifier for your key/value pairs. See the example below.

// CreateUser saves u to the store. The new user ID is set on u once the data is persisted.
func (s *Store) CreateUser(u *User) error {
    return s.db.Update(func(tx *bolt.Tx) error {
        // Retrieve the users bucket.
        // This should be created when the DB is first opened.
        b := tx.Bucket([]byte("users"))

        // Generate ID for the user.
        // This returns an error only if the Tx is closed or not writeable.
        // That can't happen in an Update() call so I ignore the error check.
        id, _ := b.NextSequence()
        u.ID = int(id)

        // Marshal user data into bytes.
        buf, err := json.Marshal(u)
        if err != nil {
            return err
        }

        // Persist bytes to users bucket.
        return b.Put(itob(u.ID), buf)
    })
}

// itob returns an 8-byte big endian representation of v.
func itob(v int) []byte {
    b := make([]byte, 8)
    binary.BigEndian.PutUint64(b, uint64(v))
    return b
}

type User struct {
    ID int
    ...
}

Iterating over keys

Bolt stores keys in byte-sorted order within a bucket. This makes sequential iteration over these keys extremely fast. To iterate over keys, we'll use a Cursor:

db.View(func(tx *bolt.Tx) error {
    // Assume bucket exists and has keys
    b := tx.Bucket([]byte("MyBucket"))

    c := b.Cursor()

    for k, v := c.First(); k != nil; k, v = c.Next() {
        fmt.Printf("key=%s, value=%s\n", k, v)
    }

    return nil
})

A Cursor allows you to move to a specific point in a list of keys and move one key forward or backward at a time.

The Cursor has the following functions:

First()  Move to the first key.
Last()   Move to the last key.
Seek()   Move to a specific key.
Next()   Move to the next key.
Prev()   Move to the previous key.

Each of these functions has a return signature of (key []byte, value []byte). When you have iterated to the end of the cursor, Next() will return a nil key. You must seek to a position using First(), Last(), or Seek() before calling Next() or Prev(). If you do not seek to a position, these functions will return a nil key.

During iteration, if the key is non-nil but the value is nil, it means the key refers to a bucket rather than a value. Use Bucket.Bucket() to access the sub-bucket.
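A sketch of detecting sub-buckets while iterating, assuming MyBucket exists:

db.View(func(tx *bolt.Tx) error {
    b := tx.Bucket([]byte("MyBucket"))
    c := b.Cursor()

    for k, v := c.First(); k != nil; k, v = c.Next() {
        if v == nil {
            // A nil value means k names a nested bucket.
            sub := b.Bucket(k)
            _ = sub // descend into the sub-bucket here
        } else {
            fmt.Printf("key=%s, value=%s\n", k, v)
        }
    }
    return nil
})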

Prefix scans

To iterate over a key prefix, combine Seek() with bytes.HasPrefix():

db.View(func(tx *bolt.Tx) error {
    // Assume bucket exists and has keys
    c := tx.Bucket([]byte("MyBucket")).Cursor()

    prefix := []byte("1234")
    for k, v := c.Seek(prefix); k != nil && bytes.HasPrefix(k, prefix); k, v = c.Next() {
        fmt.Printf("key=%s, value=%s\n", k, v)
    }

    return nil
})

Range scans

Another common use case is scanning over a range, such as a time range. If you use a sortable time encoding (such as RFC3339), you can query over a specific date range like this:

db.View(func(tx *bolt.Tx) error {
    // Assume our events bucket exists and has RFC3339 encoded time keys.
    c := tx.Bucket([]byte("Events")).Cursor()

    // Our time range spans the 90's decade.
    min := []byte("1990-01-01T00:00:00Z")
    max := []byte("2000-01-01T00:00:00Z")

    // Iterate over the 90's.
    for k, v := c.Seek(min); k != nil && bytes.Compare(k, max) <= 0; k, v = c.Next() {
        fmt.Printf("%s: %s\n", k, v)
    }

    return nil
})

Note that while RFC3339 is sortable, the Go implementation of RFC3339Nano does not use a fixed number of digits after the decimal point and is therefore not sortable.
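For example, a small sketch of generating keys with the fixed-width time.RFC3339 format (not time.RFC3339Nano) so that byte order matches chronological order:

// For UTC timestamps, time.RFC3339 produces fixed-width keys,
// so byte-sorted order equals chronological order.
key := []byte(time.Now().UTC().Format(time.RFC3339))

err := db.Update(func(tx *bolt.Tx) error {
    b := tx.Bucket([]byte("Events")) // assumes the Events bucket exists
    return b.Put(key, []byte("...")) // payload elided
})
if err != nil {
    // handle the write failure
}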

ForEach()

You can also use ForEach() if you know you’re iterating over all keys in your bucket:

db.View(func(tx *bolt.Tx) error {
    // Assume bucket exists and has keys
    b := tx.Bucket([]byte("MyBucket"))

    b.ForEach(func(k, v []byte) error {
        fmt.Printf("key=%s, value=%s\n", k, v)
        return nil
    })
    return nil
})

Note that the keys and values in ForEach() are valid only when the transaction is open. If you need to use a key or value outside of the transaction, you must copy it to another byte slice using copy().

Nested buckets

You can also store a bucket in a key to create nested buckets. The API is the same as the bucket management API on the DB object:

func (*Bucket) CreateBucket(key []byte) (*Bucket, error)
func (*Bucket) CreateBucketIfNotExists(key []byte) (*Bucket, error)
func (*Bucket) DeleteBucket(key []byte) error

Suppose you had a multi-tenant application where the root-level bucket is the accounts bucket. Inside this bucket is a sequence of accounts, which are themselves buckets. Inside each account bucket you can have many buckets pertaining to the account itself (Users, Notes, and so on), isolating the information into logical groupings.

// createUser creates a new user in the given account.
func createUser(accountID int, u *User) error {
    // Start the transaction.
    tx, err := db.Begin(true)
    if err != nil {
        return err
    }
    defer tx.Rollback()

    // Retrieve the root bucket for the account.
    // Assume this has already been created when the account was set up.
    root := tx.Bucket([]byte(strconv.FormatUint(uint64(accountID), 10)))

    // Setup the users bucket.
    bkt, err := root.CreateBucketIfNotExists([]byte("USERS"))
    if err != nil {
        return err
    }

    // Generate an ID for the new user.
    userID, err := bkt.NextSequence()
    if err != nil {
        return err
    }
    u.ID = userID

    // Marshal and save the encoded user.
    if buf, err := json.Marshal(u); err != nil {
        return err
    } else if err := bkt.Put([]byte(strconv.FormatUint(u.ID, 10)), buf); err != nil {
        return err
    }

    // Commit the transaction.
    if err := tx.Commit(); err != nil {
        return err
    }

    return nil
}

Database Backup

Bolt is a single file, so it's easy to back up. You can use the Tx.WriteTo() function to write a consistent view of the database to a writer. If you call this from a read-only transaction, it performs a hot backup and does not block other database reads and writes.

By default, it uses a regular file handle to take advantage of the operating system's page cache. See the Tx documentation for information on optimizing for larger-than-RAM datasets.

A common use case is to take a backup over HTTP, so you can use a tool like cURL to take a database backup:

func BackupHandleFunc(w http.ResponseWriter, req *http.Request) {
    err := db.View(func(tx *bolt.Tx) error {
        w.Header().Set("Content-Type", "application/octet-stream")
        w.Header().Set("Content-Disposition", `attachment; filename="my.db"`)
        w.Header().Set("Content-Length", strconv.Itoa(int(tx.Size())))
        _, err := tx.WriteTo(w)
        return err
    })
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
    }
}

Then you can back up with this command:

$ curl http://localhost/backup > my.db

Or you can open your browser to http://localhost/backup and it will download automatically. If you want to back up to another file, you can use the Tx.CopyFile() helper function.
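A minimal sketch of that, writing the backup to a hypothetical path:

err := db.View(func(tx *bolt.Tx) error {
    // Copies a consistent snapshot of the database to the given path.
    return tx.CopyFile("/tmp/my-backup.db", 0600)
})
if err != nil {
    // handle the backup failure
}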

Comparison with other databases

Postgres, MySQL, & other relational databases

Relational databases structure data into rows and are only accessible through SQL. This approach provides flexibility in how data is stored and queried, but also incurs overhead in parsing and planning SQL statements. Bolt accesses all data through a byte slice key. This makes Bolt reads and writes fast, but does not provide built-in support for joining values together.

Most relational databases (with the exception of SQLite) are standalone servers that run separately from your application. This gives your system the flexibility to connect multiple application servers to a single database server, but also adds overhead in serializing and transporting data over the network. Bolt runs as a library included in your application, so all data access must go through your application's process. This brings the data closer to your application, but limits multi-process access to the data.

LevelDB, RocksDB

LevelDB and its derivatives (RocksDB, HyperLevelDB) are similar to Bolt in that they are libraries bundled into the application, but their underlying structure is a log-structured merge-tree (LSM tree). An LSM tree optimizes random writes by using a write-ahead log and multi-tiered, sorted files called SSTables. Bolt uses a B+tree internally and has only a single file. There are trade-offs in both approaches.

LevelDB might be a good choice if you need high random write throughput (>10,000 writes/sec) or if you need to use spinning disks. Bolt might be a good choice if your application is read-heavy or does a lot of range scans.

Another important consideration is that LevelDB does not have transactions. It supports batch writing of key/value pairs and it supports read snapshots, but it does not give you the ability to do a compare-and-swap operation safely. Bolt supports fully serializable ACID transactions.

LMDB

Bolt was originally a port of LMDB, so it is architecturally similar. Both use a B+tree, have ACID semantics with fully serializable transactions, and support lock-free MVCC using a single writer and multiple readers.

The two projects have diverged somewhat. While LMDB focuses heavily on raw performance, Bolt focuses on simplicity and ease of use. For example, LMDB allows several unsafe actions, such as direct writes, for the sake of performance. Bolt opts to disallow actions that can leave the database in a corrupted state. The only exception to this in Bolt is DB.NoSync.

There are also a few differences in API. LMDB requires a maximum mmap size when opening an mdb_env, whereas Bolt handles incremental mmap resizing automatically. LMDB overloads the getter and setter functions with multiple flags, whereas Bolt splits these special cases into their own functions.

Caveats and limitations

Choosing the right tool is important, and Bolt is no exception. When evaluating and using Bolt, note the following:

  • Bolt is good for read-intensive workloads. Sequential write performance is also fast, but random writes can be slow. You can use db.Batch() or add a write-ahead log to help mitigate this issue.

  • Bolt uses a B+tree internally, so there can be a lot of random page access. SSDs provide a significant performance boost over spinning disks.

  • Try to avoid long-running read transactions. Bolt uses copy-on-write, so old pages cannot be reclaimed while an old transaction is still using them.

  • Byte slices returned from Bolt are only valid for the duration of the transaction. Once a transaction is committed or rolled back, the memory they point to can be reused by a new page, or it can be unmapped from virtual memory and you'll see an unexpected fault address panic when accessing it.

  • Bolt uses an exclusive write lock on the database file, so it cannot be shared by multiple processes.

  • Be careful when using Bucket.FillPercent. Setting a high fill percentage for buckets with random inserts will cause your database to have very poor page utilization (see the sketch after this list).

  • In general, use larger buckets. Smaller buckets cause poor page utilization once they become larger than the page size (typically 4KB).

  • Bulk loading a lot of random writes into a new bucket can be slow, as the pages will not split until the transaction is committed. Randomly inserting more than 100,000 key/value pairs into a single new bucket in a single transaction is not advised.

  • Bolt uses memory-mapped files for the underlying operating system to handle caching of data. Typically, the operating system caches as many files as possible and frees memory to other processes as needed. This means Bolt can show very high memory usage when processing large databases. However, it is expected that the operating system will free up memory as needed. Bolt can handle databases that are much larger than the available physical RAM, as long as its memory mapping fits the process’s virtual address space. This can be problematic on 32-bit systems.

  • The data structures in Bolt's database are memory-mapped, so the data file will be endian-specific. This means you cannot copy a Bolt file from a little-endian machine to a big-endian machine and have it work. This is not a concern for most users, since most modern CPUs are little-endian.

  • Bolt cannot truncate the data file and return free pages back to the disk because of the way pages are laid out on disk. Instead, Bolt maintains a free list of unused pages within its data file. These free pages can be reused by later transactions. Since databases usually grow, this works well for many use cases. However, it is important to note that deleting large chunks of data does not allow you to reclaim that space on disk.
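Regarding the Bucket.FillPercent note above, here is a minimal sketch: for strictly sequential (append-style) keys, a high fill percentage packs pages densely, but it should not be used with random inserts. The bucket name and key here are hypothetical.

db.Update(func(tx *bolt.Tx) error {
    b := tx.Bucket([]byte("events")) // hypothetical bucket with sequential keys
    // A fill percentage of 1.0 is only safe when keys are
    // inserted in sorted (append-only) order.
    b.FillPercent = 1.0
    return b.Put([]byte("2020-01-01T00:00:00Z"), []byte("...")) // payload elided
})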


This article was translated by Ran Xiaolong of the Copernicus team.