Introduction

In programming, we often need to create and destroy objects of the same class frequently, and these operations can hurt performance. A common optimization in this situation is an object pool. When we need an object, we first look in the pool: if a free object is available, it is taken out of the pool and returned to the caller. A new object is actually created only when the pool has no free objects. Conversely, objects are not destroyed after use; instead, they are put back into the pool for later reuse. When objects are created and destroyed frequently, an object pool can improve performance significantly. At the same time, to keep pooled objects from occupying too much memory, object pools typically also have some cleanup strategy. One example is sync.Pool in the Go standard library, whose objects are cleaned up by the garbage collector.
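The pattern can be sketched directly with sync.Pool. The following is a minimal illustration (the pool variable and the greet helper are made up for this example), not code from any particular library:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool reuses *bytes.Buffer objects; New runs only when no free object exists.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// greet borrows a buffer from the pool, uses it, and puts it back for reuse.
func greet() string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(buf)
	buf.Reset() // clear any leftover content from a previous use
	buf.WriteString("hello")
	return buf.String()
}

func main() {
	fmt.Println(greet())
}
```

Note that objects retained only by the pool may be reclaimed during garbage collection, so the pool must never be relied on to hold state.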

A special class of objects is byte buffers (the underlying storage is typically a byte slice). When concatenating strings, we usually store intermediate results in a byte buffer so the concatenation is efficient, then generate the result string from the buffer once concatenation is complete. When sending and receiving network packets, incomplete packets also need to be staged temporarily in byte buffers.
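For instance, string concatenation through the standard library's bytes.Buffer can be sketched like this (a plain illustration with a made-up join helper, independent of bytebufferpool):

```go
package main

import (
	"bytes"
	"fmt"
)

// join concatenates parts through a byte buffer and extracts the final string once,
// instead of allocating a new string for every intermediate result.
func join(parts []string) string {
	var buf bytes.Buffer // the zero value is ready to use
	for _, p := range parts {
		buf.WriteString(p)
	}
	return buf.String()
}

func main() {
	fmt.Println(join([]string{"hello", ", ", "world!"}))
}
```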

The bytes.Buffer type in the Go standard library wraps a byte slice and provides some convenient methods. We know that a slice's capacity is limited and it must grow when capacity runs out; frequent growth can cause performance jitter. bytebufferpool implements its own Buffer type and uses a simple algorithm to reduce the performance penalty of growth. bytebufferpool is used in the well-known web framework fasthttp and the flexible templating library quicktemplate. In fact, all three libraries share the same author: valyala 😀.

Quick start

The code in this article uses Go Modules.

Create directory and initialize:

$ mkdir bytebufferpool && cd bytebufferpool
$ go mod init github.com/darjun/go-daily-lib/bytebufferpool

Install the bytebufferpool library:

$ go get -u github.com/valyala/bytebufferpool

The typical usage is to obtain a bytebufferpool.ByteBuffer object via the Get() function provided by bytebufferpool, call the object's methods to write data, and then return the object to the pool with bytebufferpool.Put(). For example:

package main

import (
  "fmt"

  "github.com/valyala/bytebufferpool"
)

func main() {
  b := bytebufferpool.Get()
  b.WriteString("hello")
  b.WriteByte(',')
  b.WriteString(" world!")

  fmt.Println(b.String())

  bytebufferpool.Put(b)
}

Calling the package-level Get() and Put() functions of bytebufferpool operates on the default pool defined in the package:

// bytebufferpool/pool.go
var defaultPool Pool

func Get() *ByteBuffer { return defaultPool.Get() }
func Put(b *ByteBuffer) { defaultPool.Put(b) }

Of course, we can also create new object pools as needed, grouping objects with the same purpose together (for example, one pool to assist with receiving network packets and another to assist with string concatenation):

func main() {
  joinPool := new(bytebufferpool.Pool)
  b := joinPool.Get()
  b.WriteString("hello")
  b.WriteByte(',')
  b.WriteString(" world!")

  fmt.Println(b.String())

  joinPool.Put(b)
}

bytebufferpool does not provide a dedicated constructor; a Pool can simply be created with new, since its zero value is ready to use.

Optimization details

When an object is put back into the pool, it is handled according to the size of its current slice. bytebufferpool divides sizes into 20 ranges:

| < 2^6 | 2^6 ~ 2^7-1 | 2^7 ~ 2^8-1 | … | ≥ 2^25 |

If the size is less than 2^6, it falls in the first range; if it is between 2^6 and 2^7-1, it falls in the second range; and so on. After a sufficient number of put-backs, bytebufferpool recalibrates, calculating which range holds the most objects, and sets defaultSize to the upper bound of that range: 2^6 for the first range, 2^7 for the second, and 2^25 for the last. When Get() is called and there are no free objects in the pool, the newly created object's capacity is set to defaultSize. In this way, slice growth can be avoided during use, which improves performance. The following code makes this concrete:

// bytebufferpool/pool.go
const (
  minBitSize = 6 // 2**6=64 is a CPU cache line size
  steps      = 20

  minSize = 1 << minBitSize
  maxSize = 1 << (minBitSize + steps - 1)

  calibrateCallsThreshold = 42000
  maxPercentile           = 0.95
)

type Pool struct {
  calls       [steps]uint64
  calibrating uint64

  defaultSize uint64
  maxSize     uint64

  pool sync.Pool
}

As you can see, bytebufferpool internally uses sync.Pool from the standard library.

There are 20 ranges (steps) in total. The calls array records how many times the size of a returned object fell in each range.

When we call Pool.Put() to put an object back, we first compute the range its slice length falls in and increment the corresponding element of the calls array:

// bytebufferpool/pool.go
func (p *Pool) Put(b *ByteBuffer) {
  idx := index(len(b.B))

  if atomic.AddUint64(&p.calls[idx], 1) > calibrateCallsThreshold {
    p.calibrate()
  }

  maxSize := int(atomic.LoadUint64(&p.maxSize))
  if maxSize == 0 || cap(b.B) <= maxSize {
    b.Reset()
    p.pool.Put(b)
  }
}
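The index function used above maps a length to its range. In pool.go it looks roughly like this (reproduced for illustration with a small demo around it; details may differ between versions):

```go
package main

import "fmt"

const (
	minBitSize = 6  // 2^6 = 64, the smallest range boundary
	steps      = 20 // number of ranges
)

// index maps a buffer length n to one of the 20 ranges described above:
// sizes up to 2^6 fall in range 0, up to 2^7 in range 1, and so on,
// with everything beyond the last boundary clamped into range 19.
func index(n int) int {
	n--
	n >>= minBitSize
	idx := 0
	for n > 0 {
		n >>= 1
		idx++
	}
	if idx >= steps {
		idx = steps - 1
	}
	return idx
}

func main() {
	fmt.Println(index(64), index(65), index(1<<30)) // prints "0 1 19"
}
```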

If any element of the calls array exceeds calibrateCallsThreshold = 42000, Pool.calibrate() is called to recalibrate:

// bytebufferpool/pool.go
func (p *Pool) calibrate() {
  // Avoid concurrent put-backs triggering calibrate at the same time
  if !atomic.CompareAndSwapUint64(&p.calibrating, 0, 1) {
    return
  }

  // step 1
  a := make(callSizes, 0, steps)
  var callsSum uint64
  for i := uint64(0); i < steps; i++ {
    calls := atomic.SwapUint64(&p.calls[i], 0)
    callsSum += calls
    a = append(a, callSize{
      calls: calls,
      size:  minSize << i,
    })
  }
  sort.Sort(a)

  // step 2. Calculate defaultSize and maxSize
  defaultSize := a[0].size
  maxSize := defaultSize

  maxSum := uint64(float64(callsSum) * maxPercentile)
  callsSum = 0
  for i := 0; i < steps; i++ {
    if callsSum > maxSum {
      break
    }
    callsSum += a[i].calls
    size := a[i].size
    if size > maxSize {
      maxSize = size
    }
  }

  // step 3. Save the corresponding values
  atomic.StoreUint64(&p.defaultSize, defaultSize)
  atomic.StoreUint64(&p.maxSize, maxSize)

  atomic.StoreUint64(&p.calibrating, 0)
}

Step 1. Collect statistics and sort

The calls array records how many times objects were put back with sizes in each range; step 1 sorts the entries by this count in descending order. Note: minSize << i is the upper bound of range i.

Step 2. Calculate defaultSize and maxSize

defaultSize is easy to understand: it is simply the first size after sorting. maxSize records the largest capacity among the ranges that together account for 95% of all put-backs; its purpose is to prevent rarely used large objects from being put back into the pool and occupying too much memory. This is the logic in the second half of Pool.Put():

// If the size of the object to be put back is greater than maxSize, it is not put back
maxSize := int(atomic.LoadUint64(&p.maxSize))
if maxSize == 0 || cap(b.B) <= maxSize {
  b.Reset()
  p.pool.Put(b)
}

Step 3. Save the corresponding values

When Pool.Get() is called and there are no free objects in the pool, the newly created object's default capacity is defaultSize. This capacity suffices in most cases, avoiding slice growth during use.

// bytebufferpool/pool.go
func (p *Pool) Get() *ByteBuffer {
  v := p.pool.Get()
  if v != nil {
    return v.(*ByteBuffer)
  }
  return &ByteBuffer{
    B: make([]byte, 0, atomic.LoadUint64(&p.defaultSize)),
  }
}

Some other details:

  • The minimum size is 2^6 = 64 because that is the CPU cache line size on 64-bit machines: data of this size can be loaded into a cache line in one go, and anything smaller gains nothing.
  • The code uses atomic operations in many places to avoid the performance cost of locking.
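As an illustration of that atomic pattern (a generic sketch with a made-up count helper, not code from the library):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// count launches n goroutines that each atomically increment a shared counter,
// getting a correct total without taking a mutex.
func count(n int) uint64 {
	var counter uint64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddUint64(&counter, 1) // lock-free increment
		}()
	}
	wg.Wait()
	return atomic.LoadUint64(&counter)
}

func main() {
	fmt.Println(count(100)) // prints "100"
}
```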

Of course, this library's drawback is also obvious: since most buffers use less capacity than defaultSize, some memory is wasted.

Conclusion

Excluding comments and blank lines, bytebufferpool implements a high-performance Buffer object pool in about 150 lines of code. The details are worth savoring. Reading high-quality code improves your own coding skills and teaches the finer details of the craft. I strongly recommend taking the time to read it!

If you find a fun or useful Go library, feel free to submit an issue on GitHub 😄

References

  1. bytebufferpool GitHub: github.com/valyala/byt…
  2. go-daily-lib GitHub: github.com/darjun/go-d…

About me

My blog: darjun.github.io

Welcome to follow my WeChat public account [GoUpUp]; let's learn and improve together~