Preallocate slice memory
When the backing array of a slice runs out of capacity, append allocates a larger one: roughly doubling while the capacity is below 1024, and growing by about 1.25x beyond that (the exact thresholds and factors are runtime implementation details and have changed across Go versions). The short sketch after this paragraph shows how to observe the growth; the main example then compares appending with and without a preallocated capacity:
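A minimal way to watch the reallocations happen, printing the capacity each time it changes (this helper is my own illustration, not part of the original example):

package main

import "fmt"

func main() {
	var s []int
	oldCap := cap(s)
	for i := 0; i < 2000; i++ {
		s = append(s, i)
		if cap(s) != oldCap {
			// A change in cap means append allocated a new, larger backing array
			// and copied the existing elements into it.
			fmt.Printf("len=%d, cap %d -> %d\n", len(s), oldCap, cap(s))
			oldCap = cap(s)
		}
	}
}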
func appendOne(num int) []int {
	var res []int
	for i := 0; i < num; i++ {
		res = append(res, i)
	}
	return res
}

func appendMany(num int) []int {
	res := make([]int, 0, num)
	for i := 0; i < num; i++ {
		res = append(res, i)
	}
	return res
}
appendOne does not specify an initial capacity; appendMany does. Here is a benchmark:
func BenchmarkAppendOne(b *testing.B) {
	num := 10000
	for i := 0; i < b.N; i++ {
		_ = appendOne(num)
	}
}

func BenchmarkAppendMany(b *testing.B) {
	num := 10000
	for i := 0; i < b.N; i++ {
		_ = appendMany(num)
	}
}
Run the test
$ go test -bench=. -benchmem
goos: darwin
goarch: amd64
pkg: com.learn/gormLearn/gc_gc
BenchmarkAppendOne-4 23163 50675 ns/op 386296 B/op 20 allocs/op
BenchmarkAppendMany-4 96781 12241 ns/op 81920 B/op 1 allocs/op
PASS
appendMany performs a single allocation of 81920 B per operation and takes 12241 ns/op, versus 20 allocations, 386296 B, and 50675 ns/op for appendOne. With the capacity preallocated, the slice never has to grow its backing array, so there are no intermediate arrays to copy and later garbage-collect, which noticeably reduces GC pressure.
Similarly, you can specify the size of a new map.
func makeMap(num int) {
	m := make(map[int]int, num)
	for i := 0; i < num; i++ {
		m[i] = i
	}
}
Providing the size hint lets the runtime allocate enough buckets up front, avoiding the incremental growth and rehashing that would otherwise happen as the map fills.
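To make the difference measurable, a benchmark in the same style as the slice example could compare the two; the no-hint variant makeMapNoCap below is my own addition, not from the original:

func makeMapNoCap(num int) {
	m := make(map[int]int) // no size hint: buckets are grown and entries rehashed as the map fills
	for i := 0; i < num; i++ {
		m[i] = i
	}
}

func BenchmarkMakeMapNoCap(b *testing.B) {
	for i := 0; i < b.N; i++ {
		makeMapNoCap(10000)
	}
}

func BenchmarkMakeMap(b *testing.B) {
	for i := 0; i < b.N; i++ {
		makeMap(10000)
	}
}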
Store values instead of pointers in maps, and shard large maps
The following example compares maps holding pointers with maps holding values, with and without sharding.
func timeGC() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}

func mapPointer(num int) {
	m := make(map[int]*int, num)
	for i := 0; i < num; i++ {
		m[i] = &i
	}
	runtime.GC()
	fmt.Printf("With %T, GC took %s\n", m, timeGC())
	_ = m[0]
}

func mapValue(num int) {
	m := make(map[int]int, num)
	for i := 0; i < num; i++ {
		m[i] = i
	}
	runtime.GC()
	fmt.Printf("With %T, GC took %s\n", m, timeGC())
	_ = m[0]
}

func mapPointerShard(num int) {
	shards := make([]map[int]*int, 100)
	for i := range shards {
		shards[i] = make(map[int]*int)
	}
	for i := 0; i < num; i++ {
		shards[i%100][i] = &i
	}
	runtime.GC()
	fmt.Printf("With map shards (%T), GC took %s\n", shards, timeGC())
	_ = shards[0][0]
}

func mapValueShard(num int) {
	shards := make([]map[int]int, 100)
	for i := range shards {
		shards[i] = make(map[int]int)
	}
	for i := 0; i < num; i++ {
		shards[i%100][i] = i
	}
	runtime.GC()
	fmt.Printf("With map shards (%T), GC took %s\n", shards, timeGC())
	_ = shards[0][0]
}

const N = 5e7 // 50 million

func BenchmarkMapPointer(b *testing.B) {
	mapPointer(N)
}

func BenchmarkMapValue(b *testing.B) {
	mapValue(N)
}

func BenchmarkMapPointerShard(b *testing.B) {
	mapPointerShard(N)
}

func BenchmarkMapValueShard(b *testing.B) {
	mapValueShard(N)
}
Run the benchmarks:
$ go test -bench=^BenchmarkMapPointer$ -benchmem
With map[int]*int, GC took 545.139836ms
goos: darwin
goarch: amd64
pkg: com.learn/gormLearn/gc_gc
BenchmarkMapPointer-4        1    9532798100 ns/op    1387850488 B/op    724960 allocs/op

$ go test -bench=^BenchmarkMapPointerShard$ -benchmem
With map shards ([]map[int]*int), GC took 688.39764ms
goos: darwin
goarch: amd64
pkg: com.learn/gormLearn/gc_gc
BenchmarkMapPointerShard-4   1   20670458639 ns/op    4286763416 B/op   1901279 allocs/op

$ go test -bench=^BenchmarkMapValueShard$ -benchmem
With map shards ([]map[int]int), GC took 1.965519ms
goos: darwin
goarch: amd64
pkg: com.learn/gormLearn/gc_gc
BenchmarkMapValueShard-4     1   16190847776 ns/op    4385268936 B/op   1918445 allocs/op

$ go test -bench=^BenchmarkMapValue$ -benchmem
With map[int]int, GC took 22.993926ms
goos: darwin
goarch: amd64
pkg: com.learn/gormLearn/gc_gc
BenchmarkMapValue-4          1    8253025035 ns/op    1444338752 B/op    724512 allocs/op
As you can see, the sharded map holding values has the lowest GC time. Add GODEBUG=gctrace=1 to print a GC trace:
$ GODEBUG=gctrace=1 go test -bench=^BenchmarkMapPointer$ -benchmem
...
gc 3 @0.130s 19%: 0.006+424+0.013 ms clock, 0.027+0.18/424/848+0.055 ms cpu, 1224->1224->1224 MB, 1225 MB goal, 4 P
gc 4 @9.41s 2%: 0.005+543+0.002 ms clock, 0.022+0/543/1628+0.011 ms cpu, 1325->1325->1323 MB, 2448 MB goal, 4 P (forced)
gc ... : 0.003+547+0.003 ms clock, 0.013+0/547/1631+0.013 ms cpu, 1323->1323->1323 MB, 2647 MB goal, 4 P (forced)
With map[int]*int, GC took 550.40821ms
To read this log we need to understand the gctrace format. Taking the 0.013+0/547/1631+0.013 ms cpu field as an example, a GC cycle has three phases:
- Mark Prepare (STW): 0.013 is the stop-the-world pause before marking starts.
- Marking (concurrent): 0/547/1631, where 0 is the time spent in mutator assists, 547 is the time spent by background GC workers, and 1631 is the time spent marking during idle time.
- Mark Termination (STW): 0.013 is the stop-the-world pause at the end of marking.
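For reference, the overall shape of a gctrace line, as documented for GODEBUG=gctrace=1 in the runtime package (newer Go versions add a few extra fields), is roughly:

gc # @#s #%: #+#+# ms clock, #+#/#/#+# ms cpu, #->#-># MB, # MB goal, # P

The three clock/cpu groups correspond to the phases above (STW sweep termination, concurrent mark and scan, STW mark termination), the mark-and-scan CPU time is split into assist/background/idle, and #->#-># MB is the heap size at GC start, at GC end, and the live heap afterwards.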
$ GODEBUG=gctrace=1 go test -bench=^BenchmarkMapValue$ -benchmem
...
gc 3 @0.018s 0%: ... 1224->1224->1224 MB, 1225 MB goal, 4 P
gc 4 @8.334s 0%: 0.006+21+0.003 ms clock, 0.027+0/6.4/21+0.013 ms cpu, 1379->1379->1334 MB, 2448 MB goal, 4 P
gc ... : 0.003+19+0.003 ms clock, 0.014+0/5.0/20+0.015 ms cpu, 1334->1334->1334 MB, 2668 MB goal, 4 P (forced)
As you can see, the map holding values spends far less time in GC than the map holding pointers, and the difference comes almost entirely from the marking phase: when neither the keys nor the values contain pointers, the collector does not have to scan every entry in the map.
Converting between string and []byte
In Go, strings are immutable by design, so converting between string and []byte normally copies the underlying data.
func Example() {
	s := "Hello,world"
	b := []byte(s) // allocates a new backing array and copies the string's bytes
	_ = b
}
If you are sure the converted string/[]byte will never be modified, you can convert directly so that no copy is made; the new value shares the original's underlying data pointer.
func String2Bytes(s string) []byte {
	stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
	bh := reflect.SliceHeader{
		Data: stringHeader.Data,
		Len:  stringHeader.Len,
		Cap:  stringHeader.Len,
	}
	return *(*[]byte)(unsafe.Pointer(&bh))
}

func Bytes2String(b []byte) string {
	sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	sh := reflect.StringHeader{
		Data: sliceHeader.Data,
		Len:  sliceHeader.Len,
	}
	return *(*string)(unsafe.Pointer(&sh))
}
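As a side note not in the original: since Go 1.20, reflect.StringHeader and reflect.SliceHeader are deprecated, and the same zero-copy conversions can be sketched with the unsafe package helpers instead (the same caveat applies: the result must never be modified):

func String2BytesUnsafe(s string) []byte {
	// unsafe.StringData returns a pointer to the string's bytes (Go 1.20+).
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func Bytes2StringUnsafe(b []byte) string {
	// unsafe.SliceData returns a pointer to the slice's backing array (Go 1.20+).
	return unsafe.String(unsafe.SliceData(b), len(b))
}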
Return values from functions instead of pointers
For small objects that are allocated frequently, returning a pointer makes the object escape to the heap, so memory that could have lived on the stack is heap-allocated instead. Copying a small object on the stack is cheap, much cheaper than a heap allocation plus the GC work it causes. In the following example, one function returns a value and the other a pointer.
type S struct {
	a, b, c int64
	d, e, f string
	g, h, i float64
}

func byCopy() S {
	return S{
		a: 1, b: 1, c: 1,
		e: "lyp", f: "lyp",
		g: 1.0, h: 1.0, i: 1.0,
	}
}

func byPointer() *S {
	return &S{
		a: 1, b: 1, c: 1,
		e: "lyp", f: "lyp",
		g: 1.0, h: 1.0, i: 1.0,
	}
}
Benchmark functions
func BenchmarkMemoryStack(b *testing.B) {
	var s S

	f, err := os.Create("stack.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	err = trace.Start(f)
	if err != nil {
		panic(err)
	}

	for i := 0; i < b.N; i++ {
		s = byCopy()
	}

	trace.Stop()
	b.StopTimer()

	_ = fmt.Sprintf("%v", s.a)
}

func BenchmarkMemoryHeap(b *testing.B) {
	var s *S

	f, err := os.Create("heap.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	err = trace.Start(f)
	if err != nil {
		panic(err)
	}

	for i := 0; i < b.N; i++ {
		s = byPointer()
	}

	trace.Stop()
	b.StopTimer()

	_ = fmt.Sprintf("%v", s.a)
}
Run the benchmarks:
$ go test ./... -bench=BenchmarkMemoryHeap -benchmem -run=^$ -count=10
goos: darwin
goarch: amd64
pkg: com.learn/gormLearn/gc_gc
BenchmarkMemoryHeap-4     19625536    53.0 ns/op    96 B/op    1 allocs/op

$ go test ./... -bench=BenchmarkMemoryStack -benchmem -run=^$ -count=10
goos: darwin
goarch: amd64
pkg: com.learn/gormLearn/gc_gc
BenchmarkMemoryStack-4   163253341    7.22 ns/op     0 B/op    0 allocs/op
As you can see, stack allocation (returning a value) takes 7.22 ns/op with zero heap allocations, while heap allocation (returning a pointer) takes 53.0 ns/op and allocates 96 B per call.
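To confirm where each version allocates, you can ask the compiler for its escape-analysis report; this is a sketch of the command only, and the exact wording of the diagnostics varies between Go versions (expect a message like "&S literal escapes to heap" for byPointer and no such message for byCopy):

$ go build -gcflags="-m" .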
Optimization using struct{}
Go has no built-in set type. To implement a set on top of a map, use struct{} rather than bool as the value type.
func assign(num int) {
	m := make(map[int]bool, num)
	for i := 0; i < num; i++ {
		m[i] = true
	}
}

func assignStruct(num int) {
	m := make(map[int]struct{}, num)
	for i := 0; i < num; i++ {
		m[i] = struct{}{}
	}
}
struct{} occupies zero bytes: all zero-size values share the same address (runtime.zerobase), so the struct{} version stores nothing per entry beyond the key.
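A minimal check of the size difference (my own illustration, not from the original):

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	fmt.Println(unsafe.Sizeof(struct{}{})) // 0
	fmt.Println(unsafe.Sizeof(true))       // 1
}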
Tools for GC analysis
- go tool pprof
- go tool trace
- go build -gcflags="-m"
- GODEBUG=gctrace=1
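Typical invocations, using the trace files written by the benchmarks above (the profile file names here are arbitrary):

$ go test -bench=. -benchmem -cpuprofile=cpu.out -memprofile=mem.out
$ go tool pprof mem.out
$ go tool trace stack.out
$ go build -gcflags="-m" .
$ GODEBUG=gctrace=1 go test -bench=.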