Preface

Main text

1. A one-sentence introduction

Go ships with pprof, an out-of-the-box tool for performance monitoring and analysis.

(There is no need to memorize everything here. Read it through once and profiling will feel different.)

2. How to use it

2.1 runtime/pprof

Manually call APIs such as pprof.StartCPUProfile and pprof.StopCPUProfile (from the runtime/pprof package) to collect data.

Advantages: high flexibility, on-demand collection.

Use case: tool-style applications (for example, a custom analysis tool, or integration into a company's monitoring system).

2.2 net/http/pprof

Profile sample files are obtained through an HTTP service. Enable it with: import _ "net/http/pprof"

Advantages: easy to use.

Use case: online services (long-running programs).

(net/http/pprof simply wraps the runtime/pprof package and exposes it over an HTTP port.)

2.3 go test

Run go test -bench . -cpuprofile cpu.prof to collect data.

Advantages: highly targeted, down to the level of a single function.

Use case: performance-testing a specific function.
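As a rough illustration of what go test would profile, a benchmark could look like the sketch below. The function Fib and the file name fib_test.go are hypothetical, chosen only for this example. The main function is an assumption added so the sketch also runs with go run; under go test -bench . it is simply not called.

```go
// Suggested (hypothetical) file name: fib_test.go
package main

import (
	"fmt"
	"testing"
)

// Fib is a deliberately slow recursive Fibonacci, used only as a workload.
func Fib(n int) int {
	if n < 2 {
		return n
	}
	return Fib(n-1) + Fib(n-2)
}

// BenchmarkFib is picked up by "go test -bench ." because its name
// starts with Benchmark and it takes a *testing.B.
func BenchmarkFib(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Fib(20)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	fmt.Println(testing.Benchmark(BenchmarkFib))
}
```

With the file saved as a _test.go file, go test -bench . -cpuprofile cpu.prof should write cpu.prof next to it.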

3. How to analyze

3.1 Data Collection

The basis of any analysis is obtaining the corresponding profile file. runtime/pprof and go test collect it from the command line (CPU profiling is used as the example here), while net/http/pprof exposes the data through HTTP endpoints.

1. go test: run go test -bench . -cpuprofile cpu.prof to generate the profile.

2. runtime/pprof: the simplest possible code.

package main

import (
	"fmt"
	"os"
	"runtime/pprof"
	"time"
)

// A piece of problematic code
func do() {
	var c chan int
	for {
		select {
		case v := <-c:
			fmt.Printf("I am the line with the problem because I cannot receive value: %v", v)
		default:
		}
	}
}

func main() {
	// Create the profile output file
	file, err := os.Create("./cpu.prof")
	if err != nil {
		fmt.Printf("Failed to create collection file, err:%v\n", err)
		return
	}
	pprof.StartCPUProfile(file)
	defer pprof.StopCPUProfile()

	// Run the problematic code
	for i := 0; i < 4; i++ {
		go do()
	}
	time.Sleep(10 * time.Second)
}

Execute command:

go run pprof.go

This produces the profile file cpu.prof. (It is used again later.)

3. net/http/pprof: here is the code.

package main

import (
	"fmt"
	"net/http"
	_ "net/http/pprof" // Step one
)

// A piece of problematic code
func do() {
	var c chan int
	for {
		select {
		case v := <-c:
			fmt.Printf("I am the line with the problem because I cannot receive value: %v", v)
		default:
		}
	}
}

func main() {
	// Run the problematic code
	for i := 0; i < 4; i++ {
		go do()
	}
	http.ListenAndServe("0.0.0.0:6061", nil)
}

With the two key steps in the code (the blank import and the HTTP listener), the corresponding data can be viewed at http://127.0.0.1:6061/debug/pprof/.

3.2 Data Content

Whichever way the data is collected, it can be analyzed. Using the HTTP method as the example, here is everything that can be viewed.

Type          Description
allocs        Sampling of memory allocations
block         Sampling of blocking operations
cmdline       The command used to start the program, with its arguments
goroutine     Stack traces of all current goroutines
heap          Sampling of memory allocations on the heap
mutex         Sampling of lock contention
profile       Sampling of CPU usage; clicking it downloads the file
threadcreate  Sampling of OS thread creation
trace         Execution trace of the program

The key word in each description makes it easy to see what kind of data can be analyzed.

(The command line and the graphical pages are discussed below using CPU sampling data; the other types are put to use in the hands-on section.)
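To see which of these types come from runtime/pprof itself, the package can be asked for its registered profiles. The sketch below assumes nothing beyond the standard library; the helper name profileNames is hypothetical. Note that cmdline, profile, and trace are served by dedicated HTTP handlers rather than being named profiles, so they are not expected in this list.

```go
package main

import (
	"fmt"
	"runtime/pprof"
)

// profileNames returns the names of all profiles known to runtime/pprof.
func profileNames() []string {
	var names []string
	for _, p := range pprof.Profiles() {
		names = append(names, p.Name())
	}
	return names
}

func main() {
	// Print each named profile with the number of stacks it currently holds.
	for _, p := range pprof.Profiles() {
		fmt.Printf("%-12s count=%d\n", p.Name(), p.Count())
	}
}
```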

3.3 Data Analysis

3.3.1 Command line

go tool pprof [binary] [source]

binary: the path to the program's binary file.

source: the profile data to analyze. It can be a local file (such as the cpu.prof generated earlier) or an HTTP address (for example: go tool pprof http://127.0.0.1:6060/debug/pprof/profile).

Note that profile data is most meaningful under high load (produced either by deliberately writing faulty code or by simulating traffic); data collected while the program is idle may tell you little.

Start analyzing the cpu.prof generated above:

go tool pprof cpu.prof

This opens the interactive prompt. Enter:

top

The top output deserves an explanation. Rather than discuss it in the abstract, it is easier to look at a concrete graph together, so here are two ways to generate one:

1. Enter web at the interactive prompt above. This generates a pprof001.svg page.

2. Run go tool pprof -pdf cpu.prof to generate a profile001.pdf file. (The output format can be text, pdf, or svg.)

Either way, you get the following image:

Column       Description                                                  Example
flat         CPU time spent in the function itself                        selectnbrecv: 12.29s
flat%        flat as a percentage of total CPU time                       selectnbrecv: 12.29s of 29.14s total, 12.29/29.14 = 42.18%
sum%         Running total of flat% for this row and all rows above it    chanrecv: 42.18% + 30.47% = 72.65%
cum          CPU time spent in the function plus the functions it calls   chanrecv: 8.88s + 0.54s = 9.42s
cum%         cum as a percentage of total CPU time                        chanrecv: 9.42/29.14 = 32.33%
last column  Name of the function

In the graph, the thicker a call line is, the more time it consumes and the more likely it is to be a problem. Here the do function stands out. Run list funcName to find the exact location.

Memory analysis is similar to CPU analysis and is not covered here.
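For completeness, collecting a heap profile with runtime/pprof looks much like the CPU case. The sketch below is a minimal example under stated assumptions: the helper name heapProfile and the output path ./mem.prof are hypothetical choices for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// heapProfile returns a snapshot of the heap profile in pprof's binary format.
func heapProfile() ([]byte, error) {
	runtime.GC() // run a collection first so the statistics are up to date
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := heapProfile()
	if err != nil {
		fmt.Println("failed to collect heap profile:", err)
		return
	}
	// Write the snapshot to a file that go tool pprof can open.
	if err := os.WriteFile("./mem.prof", data, 0o644); err != nil {
		fmt.Println("failed to write mem.prof:", err)
		return
	}
	fmt.Printf("wrote %d bytes to mem.prof\n", len(data))
}
```

The resulting file can then be opened with go tool pprof mem.prof and explored with the same commands.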

To summarize, remember at least these three analysis steps: top -> list FuncName -> web

3.3.2 Visual page

There are two ways to open the analysis in a browser:

1. Run go tool pprof -http=:6060 cpu.prof

  • Top (same as the top command at the gdb-style interactive prompt above)

  • Graph (same as the web command at the interactive prompt above)

  • Flame Graph

Each block represents a function, ordered by call sequence from top to bottom; the larger the block, the more CPU time it uses. Clicking a block drills down into it.

  • Peek (details shown as a tree structure)

  • Source (same as the list FuncName command at the interactive prompt above)
  • Disassemble

4. Hands-on practice as a game

In Journey to the West, the master and his disciples travel west and obtain the scriptures only after eighty-one trials.

I have prepared a small program for readers to practice with: click me

Run ./xiyouji directly and you can watch the master and his disciples eating and drinking along the way.

4.1 Trial 1 – CPU usage too high

First, look at the CPU profile to see whether anything is abnormal:

go tool pprof http://localhost:6060/debug/pprof/profile

Let's take a look. Run the command: list Drink

It turns out the problem is right here: the code runs an empty loop 100 million times, so no wonder CPU usage is so high.
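The xiyouji source is not shown here, so the following is only a hypothetical sketch of the kind of code that produces this symptom: a loop that does no useful work. The function name spin is invented for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// spin runs n iterations of a loop that does no useful work
// and returns how many iterations it performed.
func spin(n int) int {
	count := 0
	for i := 0; i < n; i++ {
		count++
	}
	return count
}

func main() {
	start := time.Now()
	spin(100_000_000) // 100 million empty iterations, as in the text
	fmt.Println("empty loop took", time.Since(start))
}
```

Calling something like this on every tick keeps a core busy, which is what makes the function stand out in top and list.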

Now look at the big picture: web

Fix the problem. (Commenting out the code is enough; the same applies below.)

4.2 Trial 2 – High memory usage

After recompiling, move on. Now let’s see if memory is ok.

go tool pprof http://localhost:6060/debug/pprof/heap 

It looks like Sand Monk is eating rather a lot?

Take a closer look at why: list Eat

It turns out the code keeps appending to memory until it reaches a capacity limit.
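Again as a hypothetical sketch rather than the actual xiyouji code, the pattern described above could look like this: a buffer that is appended to until a limit is reached and then kept alive. The name hoard, the 1 MiB chunk size, and the 64 MiB limit are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// hoard appends 1 MiB chunks to a buffer until it holds at least limit bytes,
// then returns the buffer so the memory stays reachable.
func hoard(limit int) [][]byte {
	const chunk = 1 << 20 // 1 MiB
	var buf [][]byte
	for len(buf)*chunk < limit {
		buf = append(buf, make([]byte, chunk))
	}
	return buf
}

func main() {
	held := hoard(64 << 20) // hold on to 64 MiB

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("holding %d chunks, HeapAlloc = %d MiB\n", len(held), m.HeapAlloc>>20)
}
```

Because the buffer is never released, the memory shows up in the heap profile as in-use space attributed to the appending function.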

Then look at the graph to confirm: web

Fix the code.

4.3 Trial 3 – Frequent garbage collection

As we all know, frequent GC leads to repeated stop-the-world (STW) pauses, which a high-performance service cannot tolerate.

We need an environment variable to turn on GC tracing:

GODEBUG=gctrace=1 ./xiyouji 2>&1|grep gc

What this output shows:

GC is triggered about every 3 s, each time going from 16M->0, which shows that memory is being continually allocated and released.

The memory allocation profile shows whether GC behavior is abnormal: if one function consistently accounts for 100%, or a very large share, of allocations, something is wrong.
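The same allocate-and-release pattern can be observed from inside a program with runtime.ReadMemStats. The sketch below is a hypothetical reproduction, not the xiyouji code: churn repeatedly allocates a 16 MiB buffer and drops it, then reports how many GC cycles completed in the meantime.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink is package-level so the compiler cannot optimize the allocation away.
var sink []byte

// churn allocates and drops a 16 MiB buffer n times and returns
// the number of GC cycles that completed while doing so.
func churn(n int) uint32 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		sink = make([]byte, 16<<20)
	}
	sink = nil
	runtime.GC() // force a final cycle so at least one collection is counted
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	fmt.Println("GC cycles during churn:", churn(20))
}
```

Running it with GODEBUG=gctrace=1 should print one trace line per cycle, in the same format as the output discussed above.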

Execute command:

go tool pprof http://localhost:6060/debug/pprof/allocs

Next, see what Wukong is up to: list Shit

See the big picture: web

As before, comment out the code and move on.

Why 16 MB? Simply put, an allocation of this size escapes from the stack to the heap during escape analysis, which is what makes it subject to GC. (For more details, see "Golang's Escape Analysis" in this series, which I have not written yet.)
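One way to see the difference between a heap allocation and a stack-only function is testing.AllocsPerRun, which reports the average number of heap allocations per call. The sketch below is an illustration under assumptions: allocBig and sumSmall are hypothetical functions, not taken from xiyouji.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the large buffer reachable, which forces it onto the heap.
var sink []byte

// allocBig allocates a 16 MiB buffer that outlives the call.
func allocBig() {
	sink = make([]byte, 16<<20)
}

// sumSmall works only with a small local array that never leaves the function.
func sumSmall() int {
	var a [8]int
	for i := range a {
		a[i] = i
	}
	s := 0
	for _, v := range a {
		s += v
	}
	return s
}

func main() {
	big := testing.AllocsPerRun(10, allocBig)
	small := testing.AllocsPerRun(10, func() { _ = sumSmall() })
	fmt.Println("heap allocations per call, allocBig:", big)
	fmt.Println("heap allocations per call, sumSmall:", small)
}
```

Building with go build -gcflags=-m prints the compiler's escape analysis decisions, which is a more direct way to check where a given value ends up.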

4.4 Trial 4 – Goroutine leak

The goroutine count looks a bit high. Is there a goroutine leak? Read on.

Check out goroutine:

go tool pprof http://localhost:6060/debug/pprof/goroutine

You can see one function spawning a large number of goroutines, but the problem in the hand-written code is not yet visible.

Keep digging with the big picture: web

It turns out the problem occurs when Tang Monk sleeps.

Track down the problematic function: list Sleep

Remove the problem, then check http://127.0.0.1:6060/debug/pprof/ again; everything is back to normal.

4.5 Trial 5 – Lock contention

Something is still off: the goroutine count is down to 4, so why is there a lock contention problem?

go tool pprof localhost:6060/debug/pprof/mutex

As you can see, at line 126 the main goroutine takes the lock and then immediately tries to lock again, but the unlock happens in another goroutine, which first sleeps for a full second. This causes blocking and lock contention.

Just fix it (comment it out).

4.6 Trial 6 – Blocking operations

After fixing the problems above, we find:

Besides locks, there is other logic that blocks, so read on.

go tool pprof localhost:6060/debug/pprof/block

As you can see, the blocking comes from a channel receive that waits 1 second for a value, so the program blocks for 1 second.

4.7 Trial 7 – A misunderstanding

go tool pprof localhost:6060/debug/pprof/block

It turned out to be the HTTP listener, which is expected.

4.8 Trial 8 – Obtaining the scriptures

After the steps above, you should be familiar with the troubleshooting process and with what to look for.

Here is a summary of the troubleshooting routine.

Step 1: Enter the interactive prompt for the profile you want to investigate.

go tool pprof http://localhost:6060/debug/pprof/{fill in what you want to see}

Content keywords:

Type          Description
allocs        Sampling of memory allocations
block         Sampling of blocking operations
cmdline       The command used to start the program, with its arguments
goroutine     Stack traces of all current goroutines
heap          Sampling of memory allocations on the heap
mutex         Sampling of lock contention
profile       Sampling of CPU usage; clicking it downloads the file
threadcreate  Sampling of OS thread creation
trace         Execution trace of the program

Step 2: The three-move combo: top -> list FuncName -> web

Use the usage percentages to find the suspect, check the exact lines of code, and confirm with the graph.

Step 3: Solve the problem.

(Attentive readers may notice that trace has not been analyzed. Look out for "Golang's Trace at a Glance".)

5. Working with the test command

Pass -cpuprofile=cpu.prof or -memprofile=mem.prof to go test to obtain the corresponding profile file. From there, you already know what to do.

Run go test -bench . -cpuprofile cpu.prof
