Cache scenarios in Go development typically use either a plain map or a caching framework; for thread safety, that means sync.Map or a thread-safe caching framework.

In cache scenarios where the amount of data reaches the million level, special consideration should be given to the impact of the data types on GC (note that a string header is a pointer plus a length, so string counts as a pointer-containing type); this concern disappears if both the cache key and value are pointer-free types. In practical application scenarios, however, keys and values are usually pointer-containing data, so special attention should be paid to the GC impact of the caching framework. From this perspective, caching frameworks can be roughly divided into two categories:

  • Zero GC overhead: e.g. Freecache or BigCache, which are based on a ringbuf and minimize the number of pointers;
  • GC overhead: caching frameworks implemented directly on top of a map.

For a map, the GC must scan all key/value pairs that contain pointers; if both key and value are pointer-free basic types, the GC skips scanning them entirely.
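A minimal sketch to observe this difference (the map types and entry count are illustrative):

package main

import (
   "fmt"
   "runtime"
   "strconv"
   "time"
)

// Compares the GC pause contribution of a pointer-free map against
// a map whose keys/values contain pointers (a string header holds a
// pointer, so map[string]string buckets must be scanned).
func main() {
   const N = 5000000 // illustrative size, in the millions

   intMap := make(map[int64]int64, N)
   for i := int64(0); i < N; i++ {
      intMap[i] = i
   }
   start := time.Now()
   runtime.GC()
   fmt.Printf("map[int64]int64, GC took: %s\n", time.Since(start))

   strMap := make(map[string]string, N)
   for i := 0; i < N; i++ {
      s := strconv.Itoa(i)
      strMap[s] = s
   }
   start = time.Now()
   runtime.GC()
   fmt.Printf("map[string]string, GC took: %s\n", time.Since(start))

   runtime.KeepAlive(intMap)
   runtime.KeepAlive(strMap)
}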

The following uses Freecache as an example to analyze its implementation. The example code is as follows (N is given an assumed value, since the original snippet leaves it undefined):

package main

import (
   "fmt"
   "runtime"
   "strconv"
   "time"

   "github.com/coocood/freecache"
)

const N = 1000000 // assumed entry count

func main() {
   cacheSize := 100 * 1024 * 1024 // 100 MB
   cache := freecache.NewCache(cacheSize)

   for i := 0; i < N; i++ {
      str := strconv.Itoa(i)
      _ = cache.Set([]byte(str), []byte(str), 1)
   }

   now := time.Now()
   runtime.GC()
   fmt.Printf("freecache, GC took: %s\n", time.Since(now))
   _, _ = cache.Get([]byte("aa"))

   now = time.Now()
   for i := 0; i < N; i++ {
      str := strconv.Itoa(i)
      _, _ = cache.Get([]byte(str))
   }
   fmt.Printf("freecache, Get took: %s\n\n", time.Since(now))
}

1 Initialization

Freecache.NewCache initializes the local cache, with size specifying the total storage space. Freecache creates 256 segments, each with its own ringbuf whose initial size is size/256. Freecache's claim of zero GC overhead rests on the fact that it holds only 512 fixed pointers: two per segment (rb and slotsData).

type segment struct {
   rb            RingBuf // ring buffer that stores data
   segId         int
   _             uint32  // placeholder
   missCount     int64
   hitCount      int64
   entryCount    int64
   totalCount    int64      // number of entries in ring buffer, including deleted entries.
   totalTime     int64      // used to calculate least recent used entry.
   timer         Timer      // Timer giving current time
   totalEvacuate int64      // used for debug
   totalExpired  int64      // used for debug
   overwrites    int64      // used for debug
   touched       int64      // used for debug
   vacuumLen     int64      // up to vacuumLen, new data can be written without overwriting old data.
   slotLens      [256]int32 // The actual length for every slot.
   slotCap       int32      // max number of entry pointers a slot can hold.
   slotsData     []entryPtr // Index pointer
}

func NewCacheCustomTimer(size int, timer Timer) (cache *Cache) {
    cache = new(Cache)
    for i := 0; i < segmentCount; i++ {
        cache.segments[i] = newSegment(size/segmentCount, i, timer)
    }
    return
}
func newSegment(bufSize int, segId int, timer Timer) (seg segment) {
    seg.rb = NewRingBuf(bufSize, 0)
    seg.segId = segId
    seg.timer = timer
    seg.vacuumLen = int64(bufSize)
    seg.slotCap = 1
    seg.slotsData = make([]entryPtr, 256*seg.slotCap) // each segment's slotsData starts with 256 units
    return
}

2 Read/Write Process

Freecache’s keys and values are both []byte slices, so callers must handle serialization and deserialization themselves. If you cache complex objects, the cost of serialization and deserialization cannot be ignored.

_ = cache.Set([]byte(str), []byte(str), 1)
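For example, caching a struct could look like the following sketch (the User type, the JSON encoding, and the 60-second TTL are illustrative choices):

import (
   "encoding/json"
   "strconv"

   "github.com/coocood/freecache"
)

// User is an illustrative type; any struct works the same way.
type User struct {
   ID   int64  `json:"id"`
   Name string `json:"name"`
}

func setUser(cache *freecache.Cache, u User) error {
   data, err := json.Marshal(u) // serialize before Set
   if err != nil {
      return err
   }
   return cache.Set([]byte(strconv.FormatInt(u.ID, 10)), data, 60)
}

func getUser(cache *freecache.Cache, id int64) (User, error) {
   var u User
   data, err := cache.Get([]byte(strconv.FormatInt(id, 10)))
   if err != nil {
      return u, err
   }
   err = json.Unmarshal(data, &u) // deserialize after Get
   return u, err
}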

The Set process first hashes the key into a uint64 hashVal. The low 8 bits (segID) select the segment from the segment array, bits 8 to 15 give the slotId, and the next 16 bits (hash16) are used to search the slot's []entryPtr index. Note that each slot's []entryPtr array has size slotCap (initially 1), and slotCap doubles on each expansion.

Each segment has its own lock (sync.Mutex), so it can sustain high concurrency without contending on one global lock, as a single mutex-protected map would.

func (cache *Cache) Set(key, value []byte, expireSeconds int) (err error) {
   hashVal := hashFunc(key)
   segID := hashVal & segmentAndOpVal // low 8 bits select the segment
   cache.locks[segID].Lock()          // per-segment lock
   err = cache.segments[segID].set(key, value, hashVal, expireSeconds)
   cache.locks[segID].Unlock()
   return
}

func (seg *segment) set(key, value []byte, hashVal uint64, expireSeconds int) (err error) {
   slotId := uint8(hashVal >> 8)
   hash16 := uint16(hashVal >> 16)
   slot := seg.getSlot(slotId)
   idx, match := seg.lookup(slot, hash16, key)

   now := seg.timer.Now() // current time from the segment's timer
   expireAt := uint32(0)
   if expireSeconds > 0 {
      expireAt = now + uint32(expireSeconds)
   }

   var hdrBuf [ENTRY_HDR_SIZE]byte
   hdr := (*entryHdr)(unsafe.Pointer(&hdrBuf[0]))
   if match { // existing entry: update it
      matchedPtr := &slot[idx]
      seg.rb.ReadAt(hdrBuf[:], matchedPtr.offset)
      hdr.slotId = slotId
      hdr.hash16 = hash16
      hdr.keyLen = uint16(len(key))
      originAccessTime := hdr.accessTime
      hdr.accessTime = now
      hdr.expireAt = expireAt
      hdr.valLen = uint32(len(value))
      if hdr.valCap >= hdr.valLen {
         // the existing value slot is large enough: overwrite in place
         atomic.AddInt64(&seg.totalTime, int64(hdr.accessTime)-int64(originAccessTime))
         seg.rb.WriteAt(hdrBuf[:], matchedPtr.offset)
         seg.rb.WriteAt(value, matchedPtr.offset+ENTRY_HDR_SIZE+int64(hdr.keyLen))
         atomic.AddInt64(&seg.overwrites, 1)
         return
      }
      // Delete the corresponding entryPtr (a slotsData memory copy);
      // the ringbuf entry is only marked as deleted.
      seg.delEntryPtr(slotId, slot, idx)
      match = false
      // increase capacity and limit entry len.
      for hdr.valCap < hdr.valLen {
         hdr.valCap *= 2
      }
      if hdr.valCap > uint32(maxKeyValLen-len(key)) {
         hdr.valCap = uint32(maxKeyValLen - len(key))
      }
   } else { // no existing entry: fill in a fresh header
      hdr.slotId = slotId
      hdr.hash16 = hash16
      hdr.keyLen = uint16(len(key))
      hdr.accessTime = now
      hdr.expireAt = expireAt
      hdr.valLen = uint32(len(value))
      hdr.valCap = uint32(len(value))
      if hdr.valCap == 0 { // avoid infinite loop when increasing capacity.
         hdr.valCap = 1
      }
   }

   // The actual entry length is ENTRY_HDR_SIZE (24 bytes) + len(key) + valCap.
   entryLen := ENTRY_HDR_SIZE + int64(len(key)) + int64(hdr.valCap)
   slotModified := seg.evacuate(entryLen, slotId, now)
   if slotModified {
      // the slot has been modified during evacuation, so 'idx' must be
      // looked up again; otherwise there would be an index-out-of-bound error.
      slot = seg.getSlot(slotId)
      idx, match = seg.lookup(slot, hash16, key)
      // assert(match == false)
   }
   newOff := seg.rb.End()
   seg.insertEntryPtr(slotId, hash16, newOff, idx, hdr.keyLen)
   seg.rb.Write(hdrBuf[:])
   seg.rb.Write(key)
   seg.rb.Write(value)
   seg.rb.Skip(int64(hdr.valCap - hdr.valLen))
   atomic.AddInt64(&seg.totalTime, int64(now))
   atomic.AddInt64(&seg.totalCount, 1)
   seg.vacuumLen -= entryLen
   return
}

seg.evacuate checks whether the ringbuf has enough free space (vacuumLen) for the new key/value. While it does not, it examines the oldest entry, located at oldOff := seg.rb.End() + seg.vacuumLen - seg.rb.Size(). If that entry has already been logically deleted, its block of memory is reclaimed directly. Otherwise, the entry is reclaimed if it has expired, if it qualifies as least recently used (its accessTime is at or below the segment average), or if more than five entries in a row have already been relocated; failing all of that, the entry is moved from the ring head to the ring tail (preserving recently accessed data) and its index is updated.

If the number of elements in a slot's []entryPtr exceeds seg.slotCap (initially 1), the slot array must be expanded; see the seg.expand method, sketched below.
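Roughly, the idea is to double slotCap and copy each slot's entries into the wider array; a sketch of that idea (not necessarily the exact library code):

// Sketch: double slotCap and copy each slot's entries into its new,
// wider region of slotsData. The segment still has 256 slots, but
// each slot can now hold twice as many entry pointers.
func (seg *segment) expand() {
   newSlotData := make([]entryPtr, seg.slotCap*2*256)
   for i := 0; i < 256; i++ {
      off := int32(i) * seg.slotCap
      copy(newSlotData[off*2:], seg.slotsData[off:off+seg.slotLens[i]])
   }
   seg.slotCap *= 2
   seg.slotsData = newSlotData
}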

Writes to the ringbuf go through seg.rb.Write.

func (seg *segment) evacuate(entryLen int64, slotId uint8, now uint32) (slotModified bool) {
   var oldHdrBuf [ENTRY_HDR_SIZE]byte
   consecutiveEvacuate := 0
   for seg.vacuumLen < entryLen {
      oldOff := seg.rb.End() + seg.vacuumLen - seg.rb.Size()
      seg.rb.ReadAt(oldHdrBuf[:], oldOff)
      oldHdr := (*entryHdr)(unsafe.Pointer(&oldHdrBuf[0]))
      oldEntryLen := ENTRY_HDR_SIZE + int64(oldHdr.keyLen) + int64(oldHdr.valCap)
      if oldHdr.deleted { // already marked deleted: reclaim directly
         consecutiveEvacuate = 0
         atomic.AddInt64(&seg.totalTime, -int64(oldHdr.accessTime))
         atomic.AddInt64(&seg.totalCount, -1)
         seg.vacuumLen += oldEntryLen
         continue
      }
      expired := oldHdr.expireAt != 0 && oldHdr.expireAt < now
      leastRecentUsed := int64(oldHdr.accessTime)*atomic.LoadInt64(&seg.totalCount) <= atomic.LoadInt64(&seg.totalTime)
      if expired || leastRecentUsed || consecutiveEvacuate > 5 {
         // the entry can be reclaimed
         seg.delEntryPtrByOffset(oldHdr.slotId, oldHdr.hash16, oldOff)
         if oldHdr.slotId == slotId {
            slotModified = true
         }
         consecutiveEvacuate = 0
         atomic.AddInt64(&seg.totalTime, -int64(oldHdr.accessTime))
         atomic.AddInt64(&seg.totalCount, -1)
         seg.vacuumLen += oldEntryLen
         if expired {
            atomic.AddInt64(&seg.totalExpired, 1)
         } else {
            atomic.AddInt64(&seg.totalEvacuate, 1)
         }
      } else {
         // evacuate an old entry that has been accessed recently for better cache hit rate.
         newOff := seg.rb.Evacuate(oldOff, int(oldEntryLen))
         seg.updateEntryPtr(oldHdr.slotId, oldHdr.hash16, oldOff, newOff)
         consecutiveEvacuate++
         atomic.AddInt64(&seg.totalEvacuate, 1)
      }
   }
   return
}

The freecache Get process is relatively simple: hash to find the segment, slotId to find the slot, then binary search plus traversal to find the entry. If the entry is not found, ErrNotFound is returned; otherwise the time-related metrics are updated. The Get process also updates the cache hit-ratio metrics.

func (cache *Cache) Get(key []byte) (value []byte, err error) {
   hashVal := hashFunc(key)
   segID := hashVal & segmentAndOpVal
   cache.locks[segID].Lock()
   value, _, err = cache.segments[segID].get(key, nil, hashVal, false)
   cache.locks[segID].Unlock()
   return
}
func (seg *segment) get(key, buf []byte, hashVal uint64, peek bool) (value []byte, expireAt uint32, err error) {
   hdr, ptr, err := seg.locate(key, hashVal, peek) // hash + locate the entry
   if err != nil {
      return
   }
   expireAt = hdr.expireAt
   if cap(buf) >= int(hdr.valLen) {
      value = buf[:hdr.valLen] // reuse the caller-supplied buffer
   } else {
      value = make([]byte, hdr.valLen)
   }

   seg.rb.ReadAt(value, ptr.offset+ENTRY_HDR_SIZE+int64(hdr.keyLen))
   return
}

After locating the entry, Get reads from the ringbuf. The returned value is normally a newly allocated slice, so every Get copies the []byte data out of the ring buffer.
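If that per-Get allocation matters, the buf parameter visible in seg.get above is exposed through Cache.GetWithBuf, which lets the caller reuse a buffer across lookups. A sketch (the buffer capacity and the process consumer are illustrative):

buf := make([]byte, 0, 1024) // illustrative capacity; reused across lookups
for i := 0; i < N; i++ {
   key := []byte(strconv.Itoa(i))
   // value aliases buf when it fits, so it is only valid until
   // the next GetWithBuf call that reuses the same buffer.
   value, err := cache.GetWithBuf(key, buf)
   if err != nil {
      continue
   }
   process(value) // hypothetical consumer; must not retain value
}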

3 Summary

Benchmarks of several common cache frameworks show that Set performance varies noticeably, though not by an order of magnitude, while Get performance varies little. For most scenarios, therefore, Set/Get performance does not deserve too much attention; the focus should be on whether the features meet business needs and on the GC impact. For a benchmark comparison, see: Golang2.eddycjy.com/posts/ch5/0… .

There is one special caching scenario: all data is held in memory, and updates replace the data set in full. In this scenario, if the cache size is set improperly, some entries may be evicted during the full rewrite, which is not what you want. To avoid this problem with Freecache, set the size "large enough", but be mindful of the memory footprint. A rough way to estimate that size follows.
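One way to reason about "large enough" is to work backwards from the entry layout described earlier. A rough sketch (estimateCacheSize is a hypothetical helper, and the 2x headroom is an arbitrary safety margin, not something freecache prescribes):

const entryHdrSize = 24 // ENTRY_HDR_SIZE in the analysis above

// estimateCacheSize is a hypothetical helper: each entry costs
// entryHdrSize + len(key) + valCap in the ring buffer, and the
// buffer is split evenly across 256 segments, so hashing skew and
// valCap rounding both argue for generous headroom.
func estimateCacheSize(numEntries, avgKeyLen, avgValLen int) int {
   perEntry := entryHdrSize + avgKeyLen + avgValLen
   return numEntries * perEntry * 2 // 2x headroom: arbitrary safety margin
}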