Manual memory management is notoriously error-prone (C and C++ being the classic examples). Fortunately, we now have powerful automated systems that manage memory allocation and object lifetimes for us, freeing our hands.

However, if you have ever tuned JVM garbage collector parameters or optimized the memory allocation patterns of Go code, you know that automated memory management is not a cure-all. It helps us avoid most of these errors, but that's only half the story: we have to structure our software efficiently so that the garbage collection system can do its work.

Having built Centrifuge, a high-performance Go service that processes hundreds of events per second, we can share the memory-related lessons we learned along the way. Centrifuge is a key part of Segment's infrastructure, where consistent, predictable behavior is a must. Clean, efficient, and precise use of memory is a large part of achieving that consistency.

In this article we look at common patterns that lead to inefficiency and production incidents related to memory allocation, along with practical ways to eliminate them. We focus on the core mechanics of the allocator to give developers a handle on their programs' memory usage.

Tooling

The first thing we recommend is to avoid premature optimization. Go provides excellent profiling tools that point directly at the memory-intensive parts of a codebase. There is no need to reinvent the wheel: the official Go article on profiling Go programs provides a solid demo of CPU and allocation profiling with pprof. pprof is the tool we use at Segment to find bottlenecks in production Go code, and learning to use it is a basic requirement.

In addition, use data to drive your optimizations.
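
For a long-running service, one common approach is to expose the net/http/pprof endpoints and pull profiles from the live process. Here is a minimal sketch under our own assumptions (the port and program layout are illustrative, not from the original article):

package main

import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof handlers on the default mux
)

func main() {
        // Expose profiling endpoints; in production, bind to localhost only.
        go func() {
                log.Println(http.ListenAndServe("localhost:6060", nil))
        }()

        // ... the rest of the service runs here ...
        select {}
}

A heap profile can then be fetched from the running process with go tool pprof http://localhost:6060/debug/pprof/heap.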

Escape analysis

Go automatically manages memory allocation. This prevents a whole class of potential bugs, but it does not free us from reasoning about the mechanics of allocation.

The first thing to remember is that stack allocation is cheap and heap allocation is expensive. Let’s see what that means.

Go allocates memory in two places: a global heap for dynamic allocations, and a local stack for each goroutine. Go prefers allocation on the stack — most of the allocations in a given Go program will be on the stack. Stack allocation is cheap because it requires only two CPU instructions: one to push the data onto the stack, and another to release it.

Unfortunately, not all data can be allocated on the stack. Stack allocation requires that the lifetime and memory footprint of a variable be determined at compile time. Otherwise, a dynamic allocation on the heap occurs at runtime: malloc must search for a chunk of free memory large enough to hold the new value, and the garbage collector later scans the heap for objects that are no longer referenced. It is no stretch to say this is far more expensive than the two instructions of stack allocation.

The compiler uses escape analysis to choose between the heap and the stack. The basic idea is to do the work of garbage collection at compile time. The compiler tracks the scope of variables across regions of code, and uses this data to determine which variables have a fully knowable lifetime. If a variable passes these checks, it can be allocated on the stack. If it doesn't pass — it is said to escape — it must be allocated on the heap.

The rules of escape analysis are not formally specified in the Go language. For Go programmers, the most direct way to learn the rules is to experiment. You can see the results of escape analysis by building with go build -gcflags '-m'. Let's look at an example.

package main

import "fmt"

func main() {
        x := 42
        fmt.Println(x)
}

$ go build -gcflags '-m' ./main.go
# command-line-arguments
./main.go:7: x escapes to heap
./main.go:7: main ... argument does not escape

Here we see that the variable x "escapes to the heap" because it is dynamically allocated on the heap at runtime. This example may be a little confusing: to the naked eye, it is obvious that x does not escape from main(). The compiler output doesn't explain why it thinks the variable escapes. For more detail, pass the -m option multiple times for more verbose output:

$ go build -gcflags '-m -m' ./main.go
# command-line-arguments
./main.go:5: cannot inline main: non-leaf function
./main.go:7: x escapes to heap
./main.go:7:         from ... argument (arg to ...) at ./main.go:7
./main.go:7:         from *(... argument) (indirection) at ./main.go:7
./main.go:7:         from ... argument (passed to call[argument content escapes]) at ./main.go:7
./main.go:7: main ... argument does not escape

So, x escapes because it is passed to a function argument that itself escapes. We'll see more of this pattern later.
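
For contrast, here is a small sketch of our own (not from the original article) in which nothing escapes: x is passed by value to a function with a concrete parameter type, so the compiler can keep everything on the stack.

package main

func add(a, b int) int {
        return a + b
}

func main() {
        x := 42
        _ = add(x, 1) // concrete int parameter: no interface conversion, no escape
}

Building this with -gcflags '-m' reports no escapes. The difference from the earlier example is that fmt.Println takes interface{} arguments, and the conversion to an interface is what forces x onto the heap.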

The rules may seem arbitrary at first, but after some trial and error with the tooling, patterns begin to emerge. Here are some typical scenarios that cause variables to escape (a short sketch illustrating two of them follows the list):

  • Sending pointers, or values containing pointers, to a channel. There is no way to know at compile time which goroutine will receive data from the channel, so the compiler cannot determine when the data will no longer be referenced.
  • Storing pointers, or values containing pointers, in a slice. An example of this is []*string. This always causes the contents of the slice to escape. Even though the backing array of the slice may stay on the stack, the referenced data escapes to the heap.
  • Backing arrays of slices that get reallocated because an append would exceed their capacity. If the initial size of the slice is known at compile time, it may begin its life on the stack. If the slice's underlying storage must be expanded at runtime, the data is reallocated on the heap.
  • Calling methods on an interface type. Method calls on interface types are dynamic dispatch — the concrete implementation can only be determined at runtime. Consider a variable r with interface type io.Reader: a call to r.Read(b) causes both the value of r and the backing array of the byte slice b to escape, and therefore to be allocated on the heap.

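Here is a minimal sketch of our own (not from the original article) showing the second and third cases; the names are illustrative:

package main

import "fmt"

func main() {
        s := "hello"
        ptrs := []*string{&s} // s escapes: the slice stores a pointer to it

        xs := make([]int, 0, 4) // capacity known at compile time
        for i := 0; i < 100; i++ {
                xs = append(xs, i) // growing past cap(xs) reallocates the backing array on the heap
        }
        fmt.Println(len(ptrs), len(xs))
}
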
In our experience, these four cases are the most common sources of dynamic allocation in Go programs. Fortunately, there are solutions. Next, we'll go deeper into some concrete examples of how we addressed memory inefficiency in production software.

Pointers

The rule of thumb is: pointers point to data allocated on the heap. Consequently, reducing the number of pointers in a program reduces the number of heap allocations. This is not an axiom, but we've found it to be a common theme in real-world Go programs.

A common intuition goes something like this: "Copying values is expensive, so I'll use a pointer instead." However, in many cases copying a value is much cheaper than the overhead of using a pointer. Why, you might ask?

  • The compiler generates checks when dereferencing a pointer. The purpose is to call panic() if the pointer is nil, to avoid memory corruption. This extra code must run at runtime. Data passed by value can never be nil.
  • Pointers often have poor locality of reference. All of the values used within a function are collocated in memory on the stack. Locality of reference is an important aspect of code efficiency: it dramatically increases the chance that a variable is warm in the CPU caches, and reduces the risk of a miss when prefetching.
  • Copying objects within a cache line is roughly as cheap as copying a single pointer. CPUs move memory between caching layers and main memory in cache lines of constant size; on x86 a cache line is 64 bytes. Further, Go uses a technique called Duff's device to make common memory operations like copying very efficient.

Pointers should primarily be used to reflect ownership semantics and mutability. In practice, using pointers to avoid copies should be rare. Don't fall into the trap of premature optimization: it's a good habit to pass data by value, and to reach for pointers only when necessary. As a bonus, passing by value increases safety by eliminating nil.
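
To make the rule concrete, here is a small sketch of our own (the type and function names are illustrative):

package main

type point struct{ x, y int }

// Returned by value: the struct is copied to the caller and can stay on the stack.
func makeValue() point {
        return point{1, 2}
}

// Returned by pointer: the value outlives the stack frame, so escape
// analysis moves it to the heap ("&p escapes to heap").
func makePointer() *point {
        p := point{1, 2}
        return &p
}

func main() {
        v := makeValue()
        p := makePointer()
        _ = v
        _ = p
}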

Reducing the number of pointers in a program has another useful consequence: the garbage collector skips regions of memory that it can prove contain no pointers. For example, regions of the heap backing slices of type []byte are not scanned at all. The same is true for arrays of struct types whose fields contain no pointers.

Reducing pointers not only reduces work for the garbage collector, it produces more cache-friendly code. Reading memory moves data from main memory into the CPU caches. Cache space is finite, so something else must be evicted to make room. The evicted data may well belong to another part of the program, and the resulting cache thrashing can cause unexpected and sudden shifts in the behavior of production services.

Going deeper on pointers

Reducing pointer usage often means digging into the source code of the types used to build the program. Our service, Centrifuge, keeps a queue of failed operations to retry as a circular buffer, built from a set of data structures that looks something like this:

type retryQueue struct {
    buckets       [][]retryItem // each bucket represents a 1 second interval
    currentTime   time.Time
    currentOffset int
}

type retryItem struct {
    id   ksuid.KSUID // ID of the item to retry
    time time.Time   // exact time at which the item has to be retried
}

The outer size of the buckets array is constant, but the number of items contained in each []retryItem changes at runtime. The more retries, the larger these slices grow.

Digging into the details of retryItem, we learn that KSUID is a type alias for [20]byte, which contains no pointers and is therefore ruled out by the escape rules. currentOffset is an int, a fixed-size primitive, and can also be ruled out. Next, the (simplified) implementation of time.Time:

type Time struct {
    sec  int64
    nsec int32
    loc  *Location // pointer to the time zone structure
}

The time.Time struct contains a pointer, loc. Using it inside retryItem means that every time one of these items moves through the heap area, the GC must mark and chase the pointer inside the struct.

We found this to be a typical case of cascading effects under unexpected circumstances. Normally, failed operations are rare, and only a small amount of memory holds retries. When failed operations suddenly proliferate, the retry queue can grow to thousands of items per second, bringing a significantly higher workload for the garbage collector.

For this particular use case, the timezone information in time.Time is actually unnecessary: these timestamps live in memory and are never serialized. The data structures can therefore be refactored to avoid the time.Time type altogether:

type retryItem struct {
    id   ksuid.KSUID
    nsec uint32
    sec  int64
}

func (item *retryItem) time() time.Time {
    return time.Unix(item.sec, int64(item.nsec))
}

func makeRetryItem(id ksuid.KSUID, time time.Time) retryItem {
    return retryItem{
        id:   id,
        nsec: uint32(time.Nanosecond()),
        sec:  time.Unix(),
    }
}

Now retryItem contains no pointers. This dramatically reduces the garbage collector's workload, since the compiler knows the entire footprint of retryItem at compile time.

Slices

Slices are fertile ground for inefficient allocation in hot code paths. Unless the compiler knows the size of a slice at compile time, the backing arrays for slices (and maps!) are allocated on the heap. Let's look at some ways to keep slices on the stack and avoid heap allocation.

Centrifuge uses MySQL intensively, so the efficiency of the whole program depends heavily on the efficiency of the MySQL driver. After analyzing allocation behavior with pprof, we found that the code in the Go MySQL driver that serializes time.Time values was particularly expensive.

The profiler showed that most of the heap allocations came from code that serializes a time.Time value.

The expensive code was calling the Format() method of time.Time, which returns a string. Wait, aren't we talking about slices? Well, according to the official Go documentation, a string is just a read-only slice of bytes, with a little extra language-level support. Most of the allocation rules apply!

The profiling data told us that a full 12.38% of allocations happened while running this Format method. What does Format actually do?

It turns out there is a much more efficient way to accomplish the same thing. While the Format() method is convenient and easy, code using AppendFormat() is easier on the allocator. Peeking at the source, we noticed that all internal uses are AppendFormat() rather than Format() — an important hint that AppendFormat() offers better performance.

In fact, the Format method simply wraps the AppendFormat method:

func (t Time) Format(layout string) string {
          const bufSize = 64
          var b []byte
          max := len(layout) + 10
          if max < bufSize {
                  var buf [bufSize]byte
                  b = buf[:0]
          } else {
                  b = make([]byte, 0, max)
          }
          b = t.AppendFormat(b, layout)
          return string(b)
}

Most importantly, AppendFormat() gives the programmer more control over allocation: it takes the slice to append to, instead of allocating one internally the way Format() does. Using AppendFormat() rather than Format() allows the same formatting to be done with a fixed-size allocation that qualifies for stack space.

Take a look at the PR we proposed for the Go MySQL Driver

First, notice that var a [64]byte is a fixed-size array. Its size is known at compile time and its scope is limited to this method, so we know it will be allocated on the stack.

However, this type cannot be passed to AppendFormat(), which only accepts []byte. The a[:0] notation converts the fixed-size array into a slice b backed by that array. This passes the compiler's checks, and the memory stays allocated on the stack.

More critically, AppendFormat() itself passes the compiler's stack allocation checks. In the previous version using Format(), the compiler could not determine how much memory would need to be allocated, so the allocation did not qualify for the stack.
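
Here is a self-contained sketch of the same pattern (ours, not the driver's actual code; the layout string is arbitrary):

package main

import (
        "fmt"
        "time"
)

func main() {
        t := time.Now()

        // Fixed-size array: size known at compile time, scoped to this
        // function, so it is eligible for stack allocation.
        var a [64]byte

        // a[:0] is a zero-length slice backed by the array; AppendFormat
        // writes into it without allocating.
        b := t.AppendFormat(a[:0], time.RFC3339)

        // Print only the length so the buffer itself is not forced to escape.
        fmt.Println(len(b))
}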

This small change greatly reduced the heap allocations in this portion of the code! We use a similar "append pattern" elsewhere than the MySQL driver: in a PR to our KSUID library, hot-path code was changed to use an Append() method operating on fixed-size buffers instead of the String() method, saving a similar amount of dynamic heap allocation. It is also worth noting that the strconv package follows the same append pattern for converting numeric values to their string representations.
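
For instance, a sketch of the strconv flavor of the pattern (our example, with arbitrary values):

package main

import (
        "fmt"
        "strconv"
)

func main() {
        // Fixed-size buffer, eligible for the stack; 20 bytes covers any int64.
        var buf [20]byte

        // AppendInt writes the decimal text of 12345 into the zero-length
        // slice backed by buf, without allocating a new backing array.
        b := strconv.AppendInt(buf[:0], 12345, 10)

        fmt.Println(len(b)) // printing only the length avoids forcing buf to escape
}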

Interface types

It is well known that method calls on interface types are much more expensive than on struct types. Interface method calls are executed via dynamic dispatch, which severely limits the compiler's ability to determine how the code will execute at runtime. So far we have largely discussed code with fixed types, so the compiler can best understand its behavior at compile time. Interface types throw all of that away!

Unfortunately, interface types are a very useful abstraction — they let us write more flexible code. A relevant example of interfaces in hot-path code is the hash package in the standard library, which defines a set of generic interfaces and provides several concrete implementations. Let's look at an example.

package main

import (
        "fmt"
        "hash/fnv"
)

func hashIt(in string) uint64 {
        h := fnv.New64a()
        h.Write([]byte(in))
        out := h.Sum64()
        return out
}

func main() {
        s := "hello"
        fmt.Printf("The FNV64a hash of '%v' is '%v'\n", s, hashIt(s))
}

Build it and inspect the escape analysis output:

./foo1.go:9:17: inlining call to fnv.New64a
./foo1.go:10:16: ([]byte)(in) escapes to heap
./foo1.go:9:17: hash.Hash64(&fnv.s·2) escapes to heap
./foo1.go:9:17: &fnv.s·2 escapes to heap
./foo1.go:9:17: moved to heap: fnv.s·2
./foo1.go:8:24: hashIt in does not escape
./foo1.go:17:13: s escapes to heap
./foo1.go:17:59: hashIt(s) escapes to heap
./foo1.go:17:12: main ... argument does not escape

In other words: the hash object, the input string, and the []byte representing the input all escape to the heap. To the human eye these values obviously don't need to escape, but the interface type ties the compiler's hands. And there is no way to safely use the concrete implementations without going through the hash package's interfaces. So what is an efficiency-minded developer to do?

We ran into this problem while building Centrifuge, which performs non-cryptographic hashing of small strings in a hot code path. So we built the fasthash library. It was straightforward to build — the hard work is still done by the standard library. fasthash simply repackages the standard library algorithms without using heap allocation.

Take a look at the fasthash version of the code:

package main

import (
        "fmt"
        "github.com/segmentio/fasthash/fnv1a"
)

func hashIt(in string) uint64 {
        out := fnv1a.HashString64(in)
        return out
}

func main() {
        s := "hello"
        fmt.Printf("The FNV64a hash of '%v' is '%v'\n", s, hashIt(s))
}

Take a look at the escape analysis output

./foo2.go:9:24: hashIt in does not escape
./foo2.go:16:13: s escapes to heap
./foo2.go:16:59: hashIt(s) escapes to heap
./foo2.go:16:12: main ... argument does not escape

The only remaining escapes are due to the dynamic nature of fmt.Printf(). While we would ordinarily prefer to stick with the standard library, in some situations it is a worthwhile trade-off to improve allocation efficiency.
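
To back a trade-off like this with data, an allocation-reporting benchmark works well. Here is a sketch under our own assumptions (place it in a _test.go file; the names are illustrative):

package main

import (
        "testing"

        "github.com/segmentio/fasthash/fnv1a"
)

// Run with: go test -bench=. -benchmem
func BenchmarkFasthash(b *testing.B) {
        b.ReportAllocs() // report allocations per operation alongside timings
        for i := 0; i < b.N; i++ {
                _ = fnv1a.HashString64("hello")
        }
}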

A little trick

The last thing we'll cover isn't so much practical as it is interesting: it helps illuminate the compiler's escape analysis mechanism. While reviewing the standard library for the optimizations covered above, we came across a rather strange piece of code.

// noescape hides a pointer from escape analysis.  noescape is
// the identity function but escape analysis doesn't think the
// output depends on the input. noescape is inlined and currently
// compiles down to zero instructions.
// USE CAREFULLY!
//go:nosplit
func noescape(p unsafe.Pointer) unsafe.Pointer {
    x := uintptr(p)
    return unsafe.Pointer(x ^ 0)
}

This function hides the passed pointer from the compiler's escape analysis. What does that mean in practice? Let's set up an experiment and see.

package main

import (
        "unsafe"
)

type Foo struct {
        S *string
}

func (f *Foo) String() string {
        return *f.S
}

type FooTrick struct {
        S unsafe.Pointer
}

func (f *FooTrick) String() string {
        return *(*string)(f.S)
}

func NewFoo(s string) Foo {
        return Foo{S: &s}
}

func NewFooTrick(s string) FooTrick {
        return FooTrick{S: noescape(unsafe.Pointer(&s))}
}

func noescape(p unsafe.Pointer) unsafe.Pointer {
        x := uintptr(p)
        return unsafe.Pointer(x ^ 0)
}

func main() {
        s := "hello"
        f1 := NewFoo(s)
        f2 := NewFooTrick(s)
        _ = f1.String() // assign to blank: Go rejects unused variables
        _ = f2.String()
}

This code contains two implementations of the same task: each holds a string and returns it using a String() method. However, the compiler's escape analysis output shows that the FooTrick version does not escape at all:

./foo3.go:24:16: &s escapes to heap
./foo3.go:23:23: moved to heap: s
./foo3.go:27:28: NewFooTrick s does not escape
./foo3.go:28:45: NewFooTrick &s does not escape
./foo3.go:31:33: noescape p does not escape
./foo3.go:38:14: main &s does not escape
./foo3.go:39:19: main &s does not escape
./foo3.go:40:17: main f1 does not escape
./foo3.go:41:17: main f2 does not escape

These two lines are the most relevant

./foo3.go:24:16: &s escapes to heap
./foo3.go:23:23: moved to heap: s

This is because the compiler recognizes that NewFoo() takes a reference to the string and stores it in the struct, causing an escape. NewFooTrick() produces no such output. If the call to noescape() is removed, escape analysis once again moves the data referenced by the FooTrick struct to the heap. What's happening here?

func noescape(p unsafe.Pointer) unsafe.Pointer {
    x := uintptr(p)
    return unsafe.Pointer(x ^ 0)
}

The noescape() function obscures the dependency between the input argument and the return value. The compiler does not think p escapes via x, because uintptr() produces a reference that is opaque to the compiler. The name of the built-in uintptr type might lead one to believe it is a true pointer type, but from the compiler's point of view it is just an integer that happens to be large enough to hold a pointer. The final line of code constructs and returns an unsafe.Pointer value from what appears to be an arbitrary integer.

To be clear: we do not recommend using this technique. There is a reason the package it relies on is named unsafe, and a reason its comment says USE CAREFULLY!

Conclusion

Let’s summarize the key points:

  1. Don't optimize prematurely! Use data to drive your optimization work.
  2. Stack allocation is cheap; heap allocation is expensive.
  3. Understanding the rules of escape analysis lets us write more efficient code.
  4. Pointers mostly rule out stack allocation; data that is pointed to usually ends up on the heap.
  5. Look for APIs that provide allocation control in performance-critical sections of code.
  6. Use interface types sparingly in hot code paths.
Original link: Segment.com/blog/alloca…