Golang GC notes

Program startup

When a Go program starts, the runtime first initializes a batch of resources. To see how startup works, you have to look at the runtime's bootstrap (assembly) code:

// The bootstrap sequence is:
//
//	call osinit
//	call schedinit
//	make & queue new G
//	call runtime·mstart
//
// The new G calls runtime·main.
func schedinit() {
	// raceinit must be the first call to race detector.
	// In particular, it must be done before mallocinit below calls racemapshadow.
	// ...
	gcinit()
	// ...
}

The bootstrap code under golang.org/src/runtime… runs first when a Go program starts. The startup logic is roughly what the comment above describes: osinit runs, then schedinit, then the main goroutine is created and queued, and mstart is called.

GC initialization

func gcinit() {
	if unsafe.Sizeof(workbuf{}) != _WorkbufSize {
		throw("size of Workbuf is suboptimal")
	}

	// No sweep on the first cycle.
	mheap_.sweepdone = 1

	// Set a reasonable initial GC trigger. This is the core of it: the ratio
	// that triggers a collection, which defaults to 7/8 = 0.875.
	memstats.triggerRatio = 7 / 8.0

	// Fake a heap_marked value so it looks like a trigger at
	// heapminimum is the appropriate growth from heap_marked.
	// This will go into computing the initial GC goal.
	memstats.heap_marked = uint64(float64(heapminimum) / (1 + memstats.triggerRatio))

	// Set gcpercent from the environment. This will also compute
	// and set the GC trigger and goal.
	_ = setGCPercent(readgogc())

	work.startSema = 1
	work.markDoneSema = 1
}

1. triggerRatio

"Set a reasonable initial GC trigger": this ratio decides when a collection starts. The default is 7/8 = 0.875, meaning a GC is triggered once the heap has grown to about (1 + 7/8) times the live heap left after the previous collection.

For example, if the live heap after the last GC is 100 MB and the heap then grows to 200 MB, a collection is triggered, because (200MB - 100MB) / 100MB > 7/8.

The ratio is derived from GOGC (roughly GOGC/100), and GOGC=off disables garbage collection entirely; tune it according to your workload.
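
As an aside (my own sketch, not from the original post): GOGC can also be adjusted at runtime through runtime/debug.SetGCPercent, which behaves like setting the environment variable:

package main

import (
	"fmt"
	"runtime/debug"
)

// A minimal sketch: SetGCPercent(50) behaves like GOGC=50, and a negative
// value behaves like GOGC=off. The call returns the previous setting.
func main() {
	old := debug.SetGCPercent(50) // trigger a GC once the heap grows 50% past the live heap
	fmt.Println("previous GOGC value:", old)

	debug.SetGCPercent(-1)  // disable the GC entirely, like GOGC=off
	debug.SetGCPercent(old) // restore the original setting
}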

2. heapminimum

heapminimum is the minimum heap size that can trigger a GC.

During initialization it is set to 4MB * GOGC / 100, so with the default GOGC=100 the first collection kicks in around a 4 MB heap.

GC trigger types

A GC cycle can be triggered in roughly the following ways:

  • gcTriggerHeap
  • gcTriggerTime
  • gcTriggerCycle

func (t gcTrigger) test() bool {
	if !memstats.enablegc || panicking != 0 || gcphase != _GCoff {
		return false
	}
	switch t.kind {
	case gcTriggerHeap:
		// Non-atomic access to heap_live for performance. If
		// we are going to trigger on this, this thread just
		// atomically wrote heap_live anyway and we'll see our
		// own write.
		return memstats.heap_live >= memstats.gc_trigger // the live heap exceeds the trigger threshold set at the end of the last GC
	case gcTriggerTime:
		if gcpercent < 0 {
			return false
		}
		lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
		return lastgc != 0 && t.now-lastgc > forcegcperiod // more than 2 minutes since the last GC
	case gcTriggerCycle:
		// t.n > work.cycles, but accounting for wraparound.
		return int32(t.n-work.cycles) > 0
	}
	return true
}

1. Periodic GC

func forcegchelper() {
	forcegc.g = getg()
	for {
		lock(&forcegc.lock)
		if forcegc.idle != 0 {
			throw("forcegc: phase error")
		}
		atomic.Store(&forcegc.idle, 1)
		goparkunlock(&forcegc.lock, waitReasonForceGGIdle, traceEvGoBlock, 1)
		// this goroutine is explicitly resumed by sysmon
		if debug.gctrace > 0 {
			println("GC forced")
		}
		// Time-triggered, fully concurrent.
		gcStart(gcTrigger{kind: gcTriggerTime, now: nanotime()})
	}
}

You can see that forcegchelper takes the lock, marks itself idle, and parks until someone wakes it up to run a GC. That someone is sysmon; forcegcperiod normally doesn't change:

// forcegcperiod is the maximum time in nanoseconds between garbage
// collections. If we go this long without a garbage collection, one
// is forced to run.
//
// This is a variable for testing purposes. It normally doesn't change.
var forcegcperiod int64 = 2 * 60 * 1e9

// Always runs without a P, so write barriers are not allowed.
//
//go:nowritebarrierrec
func sysmon() {
	// ...
	// check if we need to force a GC
	// Two conditions: the time-based trigger test passes, and forcegc.idle is non-zero.
	if t := (gcTrigger{kind: gcTriggerTime, now: now}); t.test() && atomic.Load(&forcegc.idle) != 0 {
		lock(&forcegc.lock)
		forcegc.idle = 0
		var list gList
		list.push(forcegc.g)
		injectglist(&list)
		unlock(&forcegc.lock)
	}
	// ...
}

2. GC triggered from mallocgc

// Allocate an object of size bytes.
// Small objects are allocated from the per-P cache's free lists.
// Large objects (> 32 kB) are allocated straight from the heap.
func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	// ...
	// whether a GC check is needed after this allocation
	shouldhelpgc := false
	if size <= maxSmallSize { // up to 32 KB
		if noscan && size < maxTinySize { // tiny objects, under 16 bytes
			// ...
		} else {
			// ...
		}
	} else {
		var s *mspan
		shouldhelpgc = true
		systemstack(func() {
			s = largeAlloc(size, needzero, noscan)
		})
		s.freeindex = 1
		s.allocCount = 1
		x = unsafe.Pointer(s.base())
		size = s.elemsize
	}
	// ...
	if shouldhelpgc {
		if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {
			gcStart(t)
		}
	}

	return x
}

Here you can see mallocgc deciding whether to check the GC trigger: in this excerpt, allocating an object larger than 32 KB always ends with a gcTriggerHeap check.
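
As a rough illustration (my own sketch, not runtime code), you can exercise this path by allocating a batch of objects larger than 32 KB and then checking the GC counters; run it with GODEBUG=gctrace=1 to watch cycles start as the heap grows:

package main

import (
	"fmt"
	"runtime"
)

// Each element is 64 KB, so every allocation takes mallocgc's large-object
// path and therefore ends with a gcTriggerHeap check.
func main() {
	var keep [][]byte
	for i := 0; i < 256; i++ {
		keep = append(keep, make([]byte, 64<<10)) // ~16 MB in total
	}

	var stats runtime.MemStats
	runtime.ReadMemStats(&stats)
	fmt.Printf("kept %d buffers, NumGC=%d, NextGC=%d bytes\n", len(keep), stats.NumGC, stats.NextGC)
}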

Force GC

Calling this yourself is generally not recommended: first, it blocks the caller and may block the entire program; second, Go's GC does not by itself hand the freed physical memory back to the OS right away, so forcing a collection does not necessarily shrink the process footprint.
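
If you really do want to force both a collection and a return of memory to the OS, here is a minimal sketch (the 64 MB allocation is just an example): runtime.GC blocks until a cycle finishes, and runtime/debug.FreeOSMemory additionally tries to hand freed memory back to the operating system.

package main

import (
	"runtime"
	"runtime/debug"
)

func main() {
	garbage := make([]byte, 64<<20) // 64 MB that will become garbage
	_ = garbage
	garbage = nil

	runtime.GC()         // blocks until a full GC cycle completes
	debug.FreeOSMemory() // forces a GC and then tries to return unused memory to the OS
}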

// GC runs a garbage collection and blocks the caller until the
// garbage collection is complete. It may also block the entire
// program.
func GC() {
	// We consider a cycle to be: sweep termination, mark, mark
	// termination, and sweep. This function shouldn't return
	// until a full cycle has been completed, from beginning to
	// end. Hence, we always want to finish up the current cycle
	// and start a new one. That means:
	//
	// 1. In sweep termination, mark, or mark termination of cycle
	// N, wait until mark termination N completes and transitions
	// to sweep N.
	//
	// 2. In sweep N, help with sweep N.
	//
	// At this point we can begin a full cycle N+1.
	//
	// 3. Trigger cycle N+1 by starting sweep termination N+1.
	//
	// 4. Wait for mark termination N+1 to complete.
	//
	// 5. Help with sweep N+1 until it's done.
	//
	// This all has to be written to deal with the fact that the
	// GC may move ahead on its own. For example, when we block
	// until mark termination N, we may wake up in cycle N+2.

	// Wait until the current sweep termination, mark, and mark
	// termination complete.
	n := atomic.Load(&work.cycles)
	gcWaitOnMark(n)

	// We're now in sweep N or later. Trigger GC cycle N+1, which
	// will first finish sweep N if necessary and then enter sweep
	// termination N+1.
	gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1})

	// Wait for mark termination N+1 to complete.
	gcWaitOnMark(n + 1)

	// ...
}

Testing the GC

1. Heap-growth GC test

Here is a test of the heap-growth (mallocgc) trigger:

package main

import (
	"fmt"
	"time"
)

var (
	appender = make([][]byte, 0, 100)
)

// GODEBUG=gctrace=1
func main() {
	ticker := time.NewTicker(time.Second * 1)
	count := 0
	alloc := 4 << 20
	for {
		<-ticker.C
		appender = append(appender, make([]byte, alloc))
		count++
		fmt.Printf("Allocation %d: %dm\n", count, alloc>>20)
	}
}

GOGC=100 GODEBUG=gctrace=1

➜ gc git:(master) GOGC=100 GODEBUG=gctrace=1 bin/app
gc 1 @...s 0%: ... , 4->4->4 MB, 5 MB goal, 12 P
Allocation 2: 4m
gc 2 @2.003s 0%: 0.003+0.082+0.024 ms clock, 0.040+0.060/0.019/0.059+0.29 ms cpu, 8->8->8 MB, 9 MB goal, 12 P
...
gc 3 @...s 0%: 0.021+0.20+0.009 ms clock, 0.25+0.10/0.099/0.12+0.11 ms cpu, 16->16->16 MB, 17 MB goal, 12 P
Allocation 5: 4m
Allocation 6: 4m
...
gc 4 @...s 0%: 0.005+0.19+0.008 ms clock, 0.061+0.10/0.074/0.086+0.10 ms cpu, 32->32->32 MB, 33 MB goal, 12 P
Allocation 9: 4m
Allocation 10: 4m
Allocation 11: 4m
Allocation 12: 4m
Allocation 13: 4m
Allocation 14: 4m
Allocation 15: 4m
Allocation 16: 4m
gc 5 @...s 0%: 0.015+0.21+0.011 ms clock, 0.18+0.14/0.062/0.15+0.13 ms cpu, 64->64->64 MB, 65 MB goal, 12 P

The first GC happens when the heap reaches 4 MB, which is exactly the default heapminimum. Each subsequent GC triggers when the heap has doubled relative to the live heap left by the previous one (GOGC=100): 8 MB, then 16 MB, and so on. Now run the same program with GOGC=50:

➜ gc git:(master) GOGC=50 GODEBUG=gctrace=1 bin/app
gc 1 @...s 0%: 0.011+0.18+0.016 ms clock, 0.13+0.10/0.024/0.16+0.19 ms cpu, 4->4->4 MB, 5 MB goal, 12 P
Allocation 2: 4m
gc 2 @2.002s 0%: 0.006+0.27+0.010 ms clock, 0.078+0.12/0.12/0.12+0.12 ms cpu, 8->8->8 MB, 9 MB goal, 12 P
...
gc 3 @...s 0%: 0.006+0.21+0.010 ms clock, 0.083+0.10/0.12/0.11+0.12 ms cpu, 12->12->12 MB, 13 MB goal, 12 P
...
gc 4 @5.005s 0%: 0.006+0.24+0.012 ms clock, 0.080+0.10/0.055/0.22+0.14 ms cpu, 20->20->20 MB, 21 MB goal, 12 P

You can see that the third GC triggers at 12 MB: the live heap after the previous GC was 8 MB, and with GOGC=50 the next trigger is 8MB * 1.5 = 12MB, which matches exactly.
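
The arithmetic can be sketched as follows (a simplification: the real runtime also feeds a controller-adjusted trigger ratio into this, so treat it as an estimate):

package main

import "fmt"

// nextTrigger estimates the heap size at which the next GC starts:
// roughly liveHeap * (1 + GOGC/100).
func nextTrigger(liveHeapBytes uint64, gogc uint64) uint64 {
	return liveHeapBytes + liveHeapBytes*gogc/100
}

func main() {
	fmt.Println(nextTrigger(8<<20, 50) >> 20)  // 12 (MB), matching the trace above
	fmt.Println(nextTrigger(8<<20, 100) >> 20) // 16 (MB) with the default GOGC=100
}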

3. With garbage collection turned off

➜ gc git:(master) GOGC=off GODEBUG=gctrace=1 bin/app
Allocation 1: 4m
Allocation 2: 4m
Allocation 3: 4m
Allocation 4: 4m
Allocation 5: 4m
...

Reading the gctrace log

gctrace: setting gctrace=1 causes the garbage collector to emit a single line to standard
error at each collection, summarizing the amount of memory collected and the
length of the pause. The format of this line is subject to change.
Currently, it is:
  gc # @#s #%: #+#+# ms clock, #+#/#/#+# ms cpu, #->#-># MB, # MB goal, # P
where the fields are as follows:
  gc #        the GC number, incremented at each GC
  @#s         time in seconds since program start
  #%          percentage of time spent in GC since program start
  #+...+#     wall-clock/CPU times for the phases of the GC
  #->#-># MB  heap size at GC start, at GC end, and live heap
  # MB goal   goal heap size
  # P         number of processors used

2. Forced GC

package main

import (
	"runtime"
	"time"
)

// GOGC=100 GODEBUG=gctrace=1
func main() {
	_ = make([]byte, 0, 3<<20)
	runtime.GC()
	time.Sleep(time.Second * 100)
}

Run it:

➜ gc git:(master) GOGC=100 GODEBUG=gctrace=1 bin/app
gc 1 @0.000s 2%: 0.003+0.11+0.006 ms clock, 0.037+0/0.089/0.10+0.080 ms cpu, 3->3->0 MB, 4 MB goal, 12 P (forced)
^C

The live heap after the GC is 0 MB, so the 3 MB slice we declared has been reclaimed; note that this forced collection by itself does not return the memory to the operating system.

3. Periodic GC

This one is harder to test, because the forcegcperiod value cannot be configured.

package main

import (
	"time"
)

// GOGC=100 GODEBUG=gctrace=1
func main() {
	_ = make([]byte, 0, 3<<20)
	time.Sleep(time.Second * 130)
}

Wait past the 120-second forcegcperiod... and in this run no periodic GC line showed up in the trace.

Getting Runtime Information

The Prometheus procfs package is used to read process information; it relies on /proc, so it does not work on macOS.

1. Mem information

package main

import (
	"fmt"
	"os"
	"runtime"

	"github.com/prometheus/procfs"
)

// GOGC=100 GODEBUG=gctrace=1
func main() {
	memInfo()
	_ = make([]byte, 0, 3<<20)
	memInfo()
	runtime.GC()
	memInfo()
}

func memInfo() {
	stats := runtime.MemStats{}
	runtime.ReadMemStats(&stats)
	fmt.Printf("%+v\n", stats)
	procMem()
}

func procMem() {
	p, err := procfs.NewProc(os.Getpid())
	if err != nil {
		panic(err)
	}
	procStat, err := p.Stat()
	if err != nil {
		panic(err)
	}
	procStat.ResidentMemory() // the RES occupied by the process
	procStat.VirtualMemory()  // the VIRT occupied by the process
	fmt.Printf("res: %dM, virt: %dM\n", procStat.ResidentMemory()>>20, procStat.VirtualMemory()>>20)
}

Running the program produces the following output:

➜ gc git:(master) GOGC=100 GODEBUG=gctrace=1 bin/app
{Alloc:158760 TotalAlloc:158760 Sys:69928960 Lookups:0 Mallocs:173 Frees:3 HeapAlloc:158760 HeapSys:66879488 HeapIdle:66535424 HeapInuse:344064 HeapReleased:66469888 HeapObjects:170 StackInuse:229376 StackSys:229376 MSpanInuse:5168 MSpanSys:16384 MCacheInuse:20832 MCacheSys:32768 BuckHashSys:2203 GCSys:2240512 OtherSys:528229 NextGC:4473924 LastGC:0 PauseTotalNs:0 PauseNs:[0 0 ...] PauseEnd:[0 0 ...] NumGC:0 NumForcedGC:0 GCCPUFraction:0 EnableGC:true DebugGC:false BySize:[{Size:0 Mallocs:0 Frees:0} {Size:8 Mallocs:5 Frees:0} {Size:16 Mallocs:42 Frees:0} ...]}
res: 3M, virt: 211M

{Alloc:3328688 TotalAlloc:3328688 Sys:69928960 Lookups:0 Mallocs:406 Frees:111 HeapAlloc:3328688 HeapSys:66879488 HeapIdle:63250432 HeapInuse:3629056 HeapReleased:63250432 HeapObjects:295 StackInuse:229376 StackSys:229376 MSpanInuse:6528 MSpanSys:16384 MCacheInuse:20832 MCacheSys:32768 BuckHashSys:2203 GCSys:2240512 OtherSys:528229 NextGC:4473924 LastGC:0 PauseTotalNs:0 PauseNs:[0 0 ...] PauseEnd:[0 0 ...] NumGC:0 NumForcedGC:0 GCCPUFraction:0 EnableGC:true DebugGC:false BySize:[{Size:0 Mallocs:0 Frees:0} {Size:8 Mallocs:6 Frees:0} {Size:16 Mallocs:151 Frees:0} ...]}
res: 3M, virt: 211M

gc 1 @0.002s 0%: 0.006+0.17+0.004 ms clock, 0.075+0/0.098/0.20+0.052 ms cpu, 3->3->0 MB, 4 MB goal, 12 P (forced)

{Alloc:158248 TotalAlloc:3345520 Sys:70256640 Lookups:0 Mallocs:656 Frees:484 HeapAlloc:158248 HeapSys:66781184 HeapIdle:66396160 HeapInuse:385024 HeapReleased:62996480 HeapObjects:172 StackInuse:327680 StackSys:327680 MSpanInuse:7072 MSpanSys:16384 MCacheInuse:20832 MCacheSys:32768 BuckHashSys:2203 GCSys:2312192 OtherSys:784229 NextGC:4194304 LastGC:1617196841921944000 PauseTotalNs:10675 PauseNs:[10675 0 0 ...] PauseEnd:[1617196841921944000 0 0 0 ...] NumGC:1 NumForcedGC:1 GCCPUFraction:0.007580984200182396 EnableGC:true DebugGC:false BySize:[{Size:0 Mallocs:0 Frees:0} {Size:8 Mallocs:6 Frees:1} ...]}
res: 3M, virt: 283M

You can see that the memory is 3m before gc and 0M after GC

  • Alloc: 158,760 -> 3,328,688 -> 158,248 (same as HeapAlloc)
  • TotalAlloc: 158,760 -> 3,328,688 -> 3,345,520 (cumulative allocated memory; never shrinks)
  • Sys: 69,928,960 -> 69,928,960 -> 70,256,640 (memory obtained from the operating system, i.e. the process's total footprint against the OS)
  • HeapAlloc: 158,760 -> 3,328,688 -> 158,248 (heap memory allocated and still in use)
  • HeapSys: 66,879,488 -> 66,879,488 -> 66,781,184
  • HeapIdle: 66,535,424 -> 63,250,432 -> 66,396,160
  • HeapInuse: 344,064 -> 3,629,056 -> 385,024 (bytes in in-use spans)
  • HeapReleased: 66,469,888 -> 63,250,432 -> 62,996,480 (memory returned to the operating system: idle spans released back to the OS and not yet reacquired)
  • PauseNs: GC pause durations; PauseNs[NumGC%256] holds the most recent one, so the last 256 GCs are recorded
  • NextGC: 4,473,924 -> 4,473,924 -> 4,194,304 (the heap size that triggers the next GC)
  • GCSys: 2,240,512 -> 2,240,512 -> 2,312,192

For other indicators, see golang.org/src/runtime…

2. Monitor process indicators

This uses the github.com/prometheus/procfs package, which wraps the /proc parsing nicely.

package main

import (
	"os"

	"github.com/prometheus/procfs"
)

func main() {
	p, err := procfs.NewProc(os.Getpid())
	if err != nil {
		panic(err)
	}

	procStat, err := p.Stat()
	if err != nil {
		panic(err)
	}
	procStat.ResidentMemory() // the RES occupied by the process
	procStat.VirtualMemory()  // the VIRT occupied by the process
}

3. Monitor using Prometheus

Here are the metrics of an online service.
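
For context, these Go and process metrics are exported by the default collectors of the Prometheus Go client; a minimal sketch of wiring that up (the :2112 port is just an example):

package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// The default registry already includes the Go runtime and process collectors.
	http.Handle("/metrics", promhttp.Handler())
	_ = http.ListenAndServe(":2112", nil)
}

A scrape of such an endpoint then returns output like the following.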

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 5.6827e-05
go_gc_duration_seconds{quantile="0.25"} 8.1842e-05
go_gc_duration_seconds{quantile="0.5"} 9.8818e-05
go_gc_duration_seconds{quantile="0.75"} 0.000125499
go_gc_duration_seconds{quantile="1"} 0.000555719
go_gc_duration_seconds_sum 0.247680951
go_gc_duration_seconds_count 2366
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 50
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.13.5"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 8.338104e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 1.3874634688e+10
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.922436e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 8.9915565e+07
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 5.2633836319412915e-06
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 2.398208e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 8.338104e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 5.1625984e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.0829824e+07
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 42405
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 4.9709056e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.2455808e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.6172457774344466e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 8.995797e+07
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 83328
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 98304
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 142528
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 196608
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 1.0362992e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 5.542772e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 4.653056e+06
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 4.653056e+06
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.7267192e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 48
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 3875.24
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 29
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 7.5575296e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.61709350436e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 2.018103296e+09
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes -1
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 25373
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

Core focus indicators:

7.5575296e+07 means 7.5575296 × 10^7 bytes. For a quick conversion to MB, divide by 10^6, i.e. subtract 6 from the exponent: 7.5575296e+07 bytes is roughly 75 MB.

  • process_resident_memory_bytes (RES): physical memory occupied by the process (75 MB)
  • process_virtual_memory_bytes (VIRT): virtual memory occupied by the process (Go virtual memory tends to be large; 2 GB here)
  • go_memstats_heap_alloc_bytes (HeapAlloc): heap memory actually in use by the program (8 MB)
  • go_memstats_next_gc_bytes (NextGC): the threshold for the next GC trigger (10 MB)
  • go_memstats_heap_idle_bytes (HeapIdle): free memory in the heap (59 MB)
  • go_memstats_heap_inuse_bytes (HeapInuse): heap memory in use (10 MB, possibly including fragmentation; this is the figure to read as "actually used", since idle memory may be released, unassigned, or not even backed by physical pages)
  • process_open_fds: open file descriptors (29)
  • go_goroutines: number of goroutines (50)
  • go_memstats_buck_hash_sys_bytes: memory used by the profiling bucket hash table (8M)

Below is our company's monitoring of a Go service.

The service runs in a 4-core/4 GB container (the host is 64-core/128 GB). Over 24 hours its memory is quite stable: the heap stays around 10-15 MB, and GC pause durations are basically within 100µs.

4. Using net/http/pprof

Just import _ "net/http/pprof"

and then add:

	go func() {
		// pick an otherwise unused port
		http.ListenAndServe(":8080", nil)
	}()
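
Putting it together, a minimal complete program might look like this (the port and the sleep are placeholders; in a real service the rest of main does the actual work):

package main

import (
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
	"time"
)

func main() {
	go func() {
		// serve the pprof endpoints on an otherwise unused port
		_ = http.ListenAndServe(":8080", nil)
	}()

	time.Sleep(time.Hour) // stand-in for the real workload
}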

1. mem

Key fields to watch: HeapAlloc, HeapInuse, NextGC, PauseNs, NumGC.

➜  ~ curl http://localhost:8080/debug/pprof/allocs\?debug\=1 -v

# runtime.MemStats
# Alloc = 457531928
# TotalAlloc = 672404416
# Sys = 556939512
# Lookups = 0
# Mallocs = 24449
# Frees = 23362
# HeapAlloc = 457531928
# HeapSys = 536248320
# HeapIdle = 77873152
# HeapInuse = 458375168
# HeapReleased = 44040192
# HeapObjects = 1087
# Stack = 622592 / 622592
# MSpan = 29512 / 32768
# MCache = 20832 / 32768
# BuckHashSys = 1443701
# GCSys = 17530880
# OtherSys = 1028483
# NextGC = 915023280
# LastGC = 1617268469532778000
# PauseNs = [35105 36373 34839 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# PauseEnd = [1617268337687730000 1617268338545459000 1617268469532778000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# NumGC = 3
# NumForcedGC = 0
# GCCPUFraction = 1.2248226722456959e-06
# DebugGC = false
* Connection #0 to host localhost left intact
* Closing connection 0

2. Goroutine/thread

➜ ~ curl http://localhost:8080/debug/pprof/goroutine\?debug\=1 -v
goroutine profile: total 6

➜ ~ curl http://localhost:8080/debug/pprof/threadcreate\?debug\=1 -v
threadcreate profile: total 14

3. Working with the go tool pprof command

The seconds parameter controls how long the profile is collected; the default is 30s. The example below collects for 10s.

➜  ~ go tool pprof -http ":8888" http://localhost:62316/debug/pprof/allocs\?seconds\=10
Fetching profile over HTTP from http://localhost:62316/debug/pprof/allocs?seconds=10
Saved profile in /Users/fanhaodong/pprof/pprof.alloc_objects.alloc_space.inuse_objects.inuse_space011..pb.gz
Serving web UI on http://localhost:8888

This collects for 10 seconds and then shows the result in the web UI.

1. Example: investigating a high goroutine count

First, query the total number of goroutines:

➜  ~ curl http://localhost:62316/debug/pprof/goroutine\?debug\=1
goroutine profile: total 100

Why are there so many?

➜  ~ go tool pprof -http ":8888" http://localhost:62316/debug/pprof/goroutine\?seconds\=10
Fetching profile over HTTP from http://localhost:62316/debug/pprof/goroutine?seconds=10
Saved profile in /Users/fanhaodong/pprof/pprof.goroutine001..pb.gz
Serving web UI on http://localhost:8888

You can then browse the call graph in the web UI, but only if you are running locally; on a remote server that obviously isn't possible, so there you run:

➜ ~ go tool pprof http://localhost:62316/debug/pprof/goroutine\?seconds\=

and then inspect the profile from the interactive prompt.

For example, the call tree:

➜ ~ go tool pprof http://localhost:62316/debug/pprof/goroutine\?seconds\=5
Fetching profile over HTTP from http://localhost:62316/debug/pprof/goroutine?seconds=5
(pprof) tree
Showing nodes accounting for 102, 98.08% of 104 total
Showing top 80 nodes out of 129
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                                53 51.96% |   runtime.selectgo
                                                26 25.49% |   runtime.goparkunlock
                                                23 22.55% |   runtime.netpollblock
       102 98.08% 98.08%        102 98.08%                | runtime.gopark
----------------------------------------------------------+-------------
         0     0% 98.08%         54 51.92%                | github.com/apache/rocketmq-client-go/v2/primitive.WithRecover
                                                 9 16.67% |   github.com/apache/rocketmq-client-go/v2/internal/remote.(*RemotingClient).Connect.func1
                                                 5  9.26% |   github.com/apache/rocketmq-client-go/v2/consumer.(*PushConsumer).PullMessage.func1
                                                 5  9.26% |   github.com/apache/rocketmq-client-go/v2/consumer.(*StatsItemSet).init.func1
                                                 5  9.26% |   github.com/apache/rocketmq-client-go/v2/consumer.(*StatsItemSet).init.func2
                                                 5  9.26% |   github.com/apache/rocketmq-client-go/v2/consumer.(*StatsItemSet).init.func3
                                                 5  9.26% |   github.com/apache/rocketmq-client-go/v2/consumer.(*StatsItemSet).init.func4
                                                 5  9.26% |   github.com/apache/rocketmq-client-go/v2/consumer.(*StatsItemSet).init.func5
                                                 5  9.26% |   github.com/apache/rocketmq-client-go/v2/consumer.(*StatsItemSet).init.func6
                                                 2  3.70% |   github.com/apache/rocketmq-client-go/v2/internal.(*RmqClient).Start.func1.2
                                                 2  3.70% |   github.com/apache/rocketmq-client-go/v2/internal.(*RmqClient).Start.func1.3
                                                 2  3.70% |   github.com/apache/rocketmq-client-go/v2/internal.(*RmqClient).Start.func1.4
                                                 2  3.70% |   github.com/apache/rocketmq-client-go/v2/internal.(*RmqClient).Start.func1.5
                                                 2  3.70% |   github.com/apache/rocketmq-client-go/v2/internal.(*traceDispatcher).Start.func1
----------------------------------------------------------+-------------
                                                 5  9.43% |   github.com/apache/rocketmq-client-go/v2/consumer.(*StatsItemSet).init.func1
                                                 5  9.43% |   github.com/apache/rocketmq-client-go/v2/consumer.(*StatsItemSet).init.func2
                                                 5  9.43% |   github.com/apache/rocketmq-client-go/v2/consumer.(*StatsItemSet).init.func3
                                                 5  9.43% |   github.com/apache/rocketmq-client-go/v2/consumer.(*StatsItemSet).init.func4
                                                 5  9.43% |   github.com/apache/rocketmq-client-go/v2/internal/remote.(*ResponseFuture).waitResponse
                                                 5  9.43% |   net/http.(*persistConn).writeLoop
                                                 2  3.77% |   database/sql.(*DB).connectionOpener
                                                 2  3.77% |   database/sql.(*DB).connectionResetter
                                                 2  3.77% |   github.com/apache/rocketmq-client-go/v2/internal.(*RmqClient).Start.func1.2
                                                 2  3.77% |   github.com/apache/rocketmq-client-go/v2/internal.(*RmqClient).Start.func1.3
                                                 2  3.77% |   github.com/apache/rocketmq-client-go/v2/internal.(*RmqClient).Start.func1.4
                                                 2  3.77% |   github.com/apache/rocketmq-client-go/v2/internal.(*RmqClient).Start.func1.5
                                                 2  3.77% |   github.com/apache/rocketmq-client-go/v2/internal.(*traceDispatcher).process
                                                 1  1.89% |   github.com/go-sql-driver/mysql.(*MysqlConn).StartWatcher.func1
                                                 1  1.89% |   github.com/SkyAPM/go2sky/reporter/grpc/management.(*ManagementServiceClient).KeepAlive
                                                 1  1.89% |   github.com/apache/rocketmq-client-go/v2/consumer.(*PushConsumer).Start.func1.1
                                                 1  1.89% |   github.com/nacos-group/nacos-sdk-go/common/http_agent.post
         0     0% 98.08%         53 50.96%                | runtime.selectgo
                                                53   100% |   runtime.gopark
----------------------------------------------------------+-------------

Conclusion

To sum up: the heap is the largest memory region in a Go program, and from the program's point of view it is effectively unbounded, since the runtime reserves virtual memory and only periodically releases physical pages back to the OS. Go prefers to reuse heap space: if I allocate 3 MB and it is then collected, the next allocation reuses that space. Heap memory obtained from the OS is not eagerly handed back, so if the program grows the heap a lot at some point, that footprint tends to stay with the process.

Go's GC time mainly depends on the size of the heap; if the heap is small, say under 100 MB, a collection does not take very long.

One practical optimization: memory that only lives for a single request becomes garbage as soon as the request finishes, so if you repeatedly need large temporary buffers, reuse them with sync.Pool.
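
A minimal sync.Pool sketch for that case (my own example; the buffer size and usage are illustrative):

package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable 1 MB-capacity byte slices so that each request
// does not allocate (and later garbage-collect) a fresh buffer.
var bufPool = sync.Pool{
	New: func() interface{} {
		return make([]byte, 0, 1<<20)
	},
}

func handle(payload string) {
	buf := bufPool.Get().([]byte)
	buf = append(buf[:0], payload...) // reuse the backing array
	fmt.Println("processed", len(buf), "bytes")
	bufPool.Put(buf) // return the buffer for the next request
}

func main() {
	handle("hello")
	handle("world")
}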

Another way to reduce GC frequency is to allocate, at program initialization, a buffer roughly the size of the program's steady-state heap; this lowers the number of GC cycles, although each individual collection then takes longer.

Go's GC speed is also bound by the machine's CPU: the more CPU available, the faster a collection completes.

Notes

MemStats fields (translated comments)

// A MemStats records statistics about the memory allocator.
type MemStats struct {
	// General statistics.

	// Alloc is the amount of memory (bytes) used by the allocated heap memory object.
	//
	// This value is basically the same as HeapAlloc (see below).
	Alloc uint64

	// TotalAlloc is the amount of memory allocated by the accumulated heap memory object (bytes).
	//
	// TotalAlloc grows slowly with heap memory object allocation, but unlike Alloc and HeapAlloc,
	// This value does not shrink as the object is released.
	TotalAlloc uint64

	// Sys is the amount of memory obtained from the OS (bytes).
	//
	// Sys is a composite of the XSys fields listed below. Sys maintains the address of the virtual memory space reserved for the Go runtime,
	// It contains: heap, stack, and other internal data structures.
	Sys uint64

	// Lookups is the number of pointer queries executed by runtime.
	//
	// This is useful for debugging within the Runtime.
	Lookups uint64

	// Mallocs is the cumulative number of heap memory objects allocated.
	// The number of live heap memory objects is mallocs-frees.
	Mallocs uint64

	// Frees is the cumulative number of heap memory objects that are freed.
	Frees uint64

	// Heap memory statistics.
	//
	// Understanding heap memory statistics requires some knowledge of how Go manages memory. Go spans heap memory virtual memory space on a scale of "SPANS".
	// SPANS are 8K (or more) of contiguous memory. A SPAN may be in one of three states:
	//
	// An "idle" span contains no objects or other data.
	// Span can be freed back to OS (but not virtual memory),
	// It can also be converted to the "in use" or "stack" state.
	//
	// An "in use" span contains at least one heap object and may have room to allocate more.
	//
	// A "stack" span is memory used for goroutine stacks.
	// Span of stack state is not considered part of heap memory. A SPAN can switch between heap and stack memory;
	// But it is impossible to be both.

	// HeapAlloc is the amount of memory (bytes) used by the allocated heap memory object.
	//
	// The "allocated" heap memory objects include all reachable objects, as well as all unreachable objects known to the garbage collector but not yet collected.
	// Specifically, HeapAlloc grows as heap memory objects are allocated and shrinks as memory is cleaned up and unreachable objects are freed.
	// Cleanup occurs incrementally as the GC cycle progresses, so both growth and shrinkage occur simultaneously,
	// The HeapAlloc trend as a result is smooth (in contrast to the sawtooth trend of the traditional stop-the-world garbage collector).
	HeapAlloc uint64

	// HeapSys is the total amount of heap memory obtained from the OS (bytes).
	//
	// HeapSys maintains the virtual memory space reserved for heap memory. This includes the virtual memory space that is reserved but not yet used,
	// This part does not occupy the actual physical memory, but tends to shrink,
	// The same as virtual memory that occupies physical memory but is later released back to the OS because it is no longer used. (See HeapReleased for proofreading)
	//
	// HeapSys is used to estimate the maximum size that heap memory has ever reached.
	HeapSys uint64

	// HeapIdle is the number of bytes in "idle" (unused) spans.
	//
	// Idle SPANS have no internal objects. Spans can (and probably have been) released back to OS,
	// They can be reused in heap allocation, or they can be reused as stack memory.
	//
	// HeapIdle minus HeapReleased estimates the amount of memory that can be released back to the OS,
	// But since the memory is already occupied by the Runtime, the heap can be reused.
	// There is no need to request more memory from OS. If this difference is significantly greater than the heap memory size, it means that there is a short-term peak in the number of viable heap memory objects.
	HeapIdle uint64

	// HeapInuse is the number of bytes in "in use" spans.
	//
	// In-use spans have at least one object in them. They can only be used for
	// other objects of roughly the same size.
	//
	// HeapInuse minus HeapAlloc estimates the amount of memory that has been
	// dedicated to particular size classes but is not currently in use. This is
	// an upper bound on memory fragmentation, but in general this memory is reused efficiently.
	HeapInuse uint64

	// HeapReleased is the total amount of physical memory released back to the OS (bytes).
	//
	// This value counts as the SPANS heap memory space that has been freed back into the OS's idle state and hasn't been reallocated by the heap.
	HeapReleased uint64

	// HeapObjects is the total number of objects in the heap memory.
	//
	// As with HeapAlloc, this value increases as objects are allocated and decreases as heap memory clears unreachable objects.
	HeapObjects uint64

	// Stack memory statistics.
	//
	// Stack memory is not considered part of heap memory, but the Runtime uses span in a heap memory as stack memory, and vice versa.

	// StackInuse is the total spans of stack memory used (bytes).
	//
	// In-use stack spans have at least one stack in them. They can only be used for other stacks of the same size.
	//
	// There is no StackIdle, because SPANS of unused stack memory are freed back to the heap (and therefore HeapIdle).
	StackInuse uint64

	// StackSys is the amount of memory the stack gets from the OS (bytes).
	//
	// StackSys is StackInuse plus some memory (which should be small) fetched directly from the OS for OS thread stacks.
	StackSys uint64

	// Off-heap memory statistics.
	//
	// The following statistics describe runtime-internal structures that are not allocated from heap memory (usually because they are part of the heap memory implementation).
	// Unlike heap or stack memory, memory allocation for any of these structures only serves those structures.
	//
	// These statistics are useful for the debugging Runtime memory overhead.

	// MSpanInuse is the amount of memory allocated by the MSPAN structure (bytes).
	MSpanInuse uint64

	// MSpanSys is the amount of memory requested from the OS for the MSPAN structure (bytes).
	MSpanSys uint64

	// MCacheInuse is the amount of memory allocated by the McAche structure (bytes).
	MCacheInuse uint64

	// MCacheSys is the amount of memory (bytes) requested from the OS for the McAche structure.
	MCacheSys uint64

	// BuckHashSys is the amount of memory (bytes) used for profiling bucket hash tables.
	BuckHashSys uint64

	// GCSys is the amount of memory (bytes) of metadata used in garbage collection.
	GCSys uint64

	// OtherSys is the amount of miscellaneous off-heap memory allocated by the runtime (bytes).
	OtherSys uint64

	// Garbage collection statistics

	// NextGC is the target heap size for the NextGC cycle.
	//
	// The garbage collector's goal is to keep HeapAlloc ≤ NextGC.
	// At the end of each GC cycle, the target value for the next cycle is calculated based on the current amount of reachable object data and the value of the GOGC.
	NextGC uint64

	// LastGC is the time the last garbage collection finished, as nanoseconds since 1970 (the UNIX epoch).
	LastGC uint64

	// PauseTotalNs is the cumulative duration of pauses in GC stop-the-world since the program started, in nanoseconds.
	//
	// During a stop-the-world pause, all goroutines are suspended and only the garbage collector is running.
	PauseTotalNs uint64

	// PauseNs is a circular buffer of recent GC stop-the-world pause durations, in nanoseconds.
	//
	// PauseNs[(NumGC+255)%256] was last paused.
	// In general, PauseNs[N%256] records the pause time for the latest N%256th GC cycle.
	// There may be multiple pauses in each GC loop; This is the sum of all pause times in a loop.
	PauseNs [256]uint64

	// PauseEnd is a circular buffer of recent GC pause end times, as nanoseconds since 1970 (the UNIX epoch).
	//
	// This buffer is filled in the same way as PauseNs.
	// There may be multiple pauses per GC loop; This buffer records the end time of the last pause in each loop.
	PauseEnd [256]uint64

	// NumGC is the number of GC cycles that have been completed.
	NumGC uint32

	// NumForcedGC is the number of GC cycles the application forced by calling the GC function.
	NumForcedGC uint32

	// GCCPUFraction is the fraction of the application's available CPU time consumed by the GC since the program started.
	//
	// GCCPUFraction is a number between 0 and 1, with 0 indicating that the GC is not consuming any CPU for the application.
	// The available CPU time of an application is defined as the GOMAXPROCS integral since the application started.
	// For example, if GOMAXPROCS is 2 and the application has been running for 10 seconds, the "AVAILABLE CPU time" is 20 seconds.
	// GCCPUFraction does not include the CPU time consumed by write barriers.
	//
	// This value is consistent with the CPU duration reported by GODEBUG=gctrace=1.
	GCCPUFraction float64

	// EnableGC Displays whether GC is enabled. This value is always true, even if GOGC=off is enabled.
	EnableGC bool

	// The DebugGC is not currently in use.
	DebugGC bool

	// BySize reports span-level memory allocation statistics BySize.
	//
	// BySize[N] reports allocations of objects of size S, where
	// BySize[N-1].Size < S ≤ BySize[N].Size.
	//
	// Data in this structure does not report memory allocations larger than BySize[60].Size.
	BySize [61]struct {
		// Size is the byte Size of the largest object that the current Size level can hold.
		Size uint32

		// Mallocs is the cumulative number of heap memory objects allocated to this size level.
		// Cumulative allocated memory capacity (bytes) The value can be Size*Mallocs.
		// The number of living objects in the current size level can be calculated using mallocs-frees.
		Mallocs uint64

		// Frees is the cumulative number of heap memory objects released at the current size level.
		Frees uint64
	}
}

Reducing GC cycles with pre-allocated memory: test code

As you can see, with 240 MB allocated at initialization the heap has to grow to roughly 2 * 240 MB before a GC triggers, so the subsequent allocations cause almost no collections.

package main

import (
	"fmt"
	"time"
)

var (
	buffer = makeArr(240 << 20) // 240m: the heap must grow to roughly 2*240m before a GC triggers
)

var (
	appender = make([][]byte, 0, 100)
)

// GODEBUG=gctrace=1
func main() {
	count := 0
	alloc := 4 << 20
	for x := 0; x < 100; x++ {
		appender = append(appender, makeArr(alloc))
		count++
		time.Sleep(time.Millisecond * 10)

		// Drop the history at iteration 50; the next GC still only triggers once the heap
		// reaches about 480MB, so memory usage generally stays around 480MB when this program ends.
		if x == 50 {
			for index := range appender {
				appender[index] = nil
			}
			println(len(appender))
		}
	}
}

func makeArr(len int) []byte {
	fmt.Printf("Allocating heap memory: %dM\n", len>>20)
	bytes := make([]byte, len)
	for index := range bytes {
		bytes[index] = 'x'
	}
	return bytes
}

Compare the total GC time with and without the buffer.

With the buffer allocated, there are two GCs:

gc 1 @0.009s 0%: ... , ... MB goal, 4 P
gc 2 @1.251s 0%: 0.030+8.8+0.025 ms clock, 0.12+0/0.13/8.6+0.10 ms cpu, 468->468->264 MB, 480 MB goal, 4 P

# total GC pause time
283700 + 302700 = 586400 ns

Without the buffer, there are nine GCs:

# nine GCs
122800 + 81400 + 55000 + 78900 + 103200 + 39400 + 88400 + 66800 + 33900 = 669800 ns

# last GC
gc 9 @1.830s 0%: 0.016+2.2+0.017 ms clock, 0.066+0/2.2+0.069 ms cpu, 188->188->188 MB, 192 MB goal, 4 P
res: 263M, virt: 355M

So the total GC time ends up roughly the same in both cases, but reducing the number of GC cycles is still a reasonable option.

Reference

Xenojoshua.com/2019/03/gol…