Hello everyone! It's scorching outside today and I don't want to go out, so I'm sitting in an air-conditioned room digging into iced watermelon with a little spoon πŸ₯„, which is really pleasant πŸ˜„. While eating watermelon and watching the Olympic Games, it occurred to me that summer isn't just about cooling myself down: our programs need "cooling" too. The key to cooling a program down is finding its "hot spots", and Golang ships a very useful tool for locating many of the problems in your application: pprof.

What is a profile?

Pprof provides profiling of a running program. By analogy: on the Internet, every app builds a user profile containing characteristics such as age, gender, and content preferences, making it easier to recommend content the user may be interested in. In the computer world, a profile describes the runtime characteristics of a process, usually including CPU, memory, locks, and so on, making it much easier for us to optimize a program's performance.

Golang is a very performance-oriented language (GC notwithstanding πŸ˜‚), so it has a built-in pprof tool to help you understand your application's profiling data, plus tooling to visualize it. Golang provides two ways to use pprof.

  1. runtime/pprof

This suits script/tool-style processes that run for a while and then exit rather than running continuously. For these, collecting performance data directly with the runtime/pprof package is the most convenient approach: either write pprof data continuously while the process runs, or write all the collected performance data to a file when it finishes.

  2. net/http/pprof

net/http/pprof exposes an HTTP port through which we can collect profiles continuously, dynamically, and in real time via various endpoints. Of course, net/http/pprof is just a convenient external interface; under the hood it calls the same interfaces provided by runtime/pprof.

Basic usage

The Go code and pprof tooling used below are based on Go 1.16.

runtime/pprof

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
	"time"
)

func main() {
	fi, err := os.Create("cpu.pprof")
	if err != nil {
		panic(err)
	}
	pprof.StartCPUProfile(fi)

	go infLoop()
	time.Sleep(time.Second)

	pprof.StopCPUProfile()
}

func infLoop() {
	for {
		fmt.Println("abcd")
	}
}
```

Running the program above collects a CPU profile and writes it to the cpu.pprof file.

We then run go tool pprof cpu.pprof to enter pprof's interactive terminal.

The first few lines of output tell us:

  • Type: the type of profile in the file, here CPU; we will see more profile types later
  • Time: when collection of the pprof file started, accurate to the minute
  • Duration: how long profiling ran, followed by Total samples, the amount of sample data collected in that time

Run the top command to see the hottest functions in descending order, as follows.

You can see that there are six columns of information

| Column | Meaning |
| --- | --- |
| flat | time spent in the current function itself (a time value here since we are profiling CPU; a space value when profiling memory) |
| flat% | flat as a percentage of total CPU time |
| sum% | the running total of flat% from top to bottom: row 1's sum% equals its flat%, row 2's sum% is row 1's flat% plus row 2's flat%, and so on |
| cum | cumulative: CPU time spent in the current function and the functions it calls |
| cum% | cum as a percentage of total CPU time |
| (last column) | the function's path, in the form {Package}.{Function} |

In addition to top, run help to see all the interactive commands pprof supports. The current version (Go 1.16) supports the following, with the more commonly used ones highlighted in gray.

| Command | Description |
| --- | --- |
| callgrind | output the graph in callgrind format (callgrind is a profiling tool from the Linux valgrind suite) |
| comments | output all profile comments |
| disasm | takes a regex; outputs annotated assembly for matching functions |
| dot | output the graph in DOT format (a graph-description language) |
| eog | display the graph with eog (requires graphviz) |
| evince | display the graph with evince (requires graphviz) |
| gif | output the graph as a GIF (requires graphviz) |
| kcachegrind | display the report visually in KCachegrind (a profiling visualization tool) |
| list | takes a regex; outputs the source code of matching functions |
| pdf | output the graph as a PDF |
| peek | takes a regex; outputs the callers/callees of matching functions |
| png | output the graph as a PNG |
| proto | output the profile in compressed protobuf format |
| raw | output the raw profile as text |
| svg | output the graph as an SVG (requires graphviz) |
| tags | output all tags in the profile |
| text | same as top |
| top | output the top n functions in text form |
| topproto | output the top nodes in protobuf format |
| traces | output all profile samples in text form |
| tree | output the function call tree |
| web | view the pprof-generated graph in a browser (requires graphviz) |
| weblist | view functions and their annotated source in a browser |
| o/options | list all options and their current values |
| q/quit/exit/^d | exit pprof's interactive mode |

net/http/pprof

```go
package main

import (
	"net/http"
	_ "net/http/pprof"
)

func main() {
	http.HandleFunc("/", func(resp http.ResponseWriter, req *http.Request) {
		resp.Write([]byte("hello net pprof"))
	})
	err := http.ListenAndServe("127.0.0.1:8000", http.DefaultServeMux)
	if err != nil {
		panic(err)
	}
}
```

If we use the http package's DefaultServeMux, the pprof package's init function automatically registers several routes on it when imported. If we are not using http.DefaultServeMux, we have to register those handlers on our own mux ourselves.

```go
// src/net/http/pprof/pprof.go
package pprof

// ...

func init() {
	http.HandleFunc("/debug/pprof/", Index)
	http.HandleFunc("/debug/pprof/cmdline", Cmdline)
	http.HandleFunc("/debug/pprof/profile", Profile)
	http.HandleFunc("/debug/pprof/symbol", Symbol)
	http.HandleFunc("/debug/pprof/trace", Trace)
}

// ...
```

Note that in a real online service, pprof is usually served on a separate port (a debug port) to keep it isolated from the port serving external traffic.

After compiling and running the program, visit http://localhost:8000 to see our root handler.

Let’s take a look at some of the routes registered by PProf.

  1. /debug/pprof/cmdline

Simply prints the command-line arguments of the current process, as follows.

  2. /debug/pprof/symbol

Looks up function names from the program-counter (PC) addresses passed in. With HTTP POST the addresses are read from the request body; with HTTP GET they are read from the URL query. To pass multiple addresses, join them with plus signs, as shown below.

  3. /debug/pprof/trace

Collects tracing information for events during the program's execution, such as system calls, GC, and goroutine scheduling. A seconds parameter can be appended to set the sampling duration (in seconds); when sampling finishes, a trace file is downloaded automatically, as follows:

Here we sample for 5s, and the trace file downloads automatically when sampling completes.

Run go tool trace tracefile to view the sampled information:

There is a lot more to see here, but trace is outside the scope of this article, so it is worth learning about separately.

  4. /debug/pprof/profile

Collects a CPU profile. As with trace, you can use seconds to specify the profiling duration; when profiling completes, a file is downloaded automatically, as shown below.

You can then run go tool pprof on the downloaded profile to enter interactive mode; the rest of the operation is the same as described in the runtime/pprof section.

Under the hood, /debug/pprof/profile does the same thing as our runtime/pprof demo: it calls StartCPUProfile and StopCPUProfile internally.

  5. /debug/pprof

Lists the profiles pprof supports. The current version (Go 1.16) supports the following.

Including the profile endpoint we just introduced, the profiles and their meanings are:

| Profile | Description |
| --- | --- |
| allocs | all memory allocations made since the program started |
| block | stack traces of goroutines that blocked; disabled by default, enable with runtime.SetBlockProfileRate |
| goroutine | stack traces of all current goroutines; append ?debug=2 for more detail (such as goroutine state) |
| heap | memory allocations of live objects on the heap |
| mutex | stack traces of mutex contention; disabled by default, enable with runtime.SetMutexProfileFraction |
| profile | CPU profile; analyze the sample with go tool pprof |
| threadcreate | stack traces that led to the creation of OS threads |
| trace | execution trace of the program |

Graphical views

In the runtime/pprof section you may have noticed that many commands (mostly the graph-generating ones) depend on one component: graphviz. Let's take a look at graphviz and the various graphs it generates.

Graphviz

Graphviz is open-source graph-visualization software, well suited to drawing structured diagrams and networks. Graphviz describes graphs in a language called DOT.

Golang pprof can emit DOT data, which graphviz then renders into all kinds of nice graphics. You can install graphviz directly from www.graphviz.org/.

Flame Graph πŸ”₯

Flame graphs are a visualization of hierarchical data, created to visualize stack traces of profiled software so that the most frequent code paths can be identified quickly and accurately.

Image from www.brendangregg.com/FlameGraphs… Visit the source site to experience the detailed interactions of a flame graph. Here is a brief guide to reading one (using a CPU flame graph as the example).

A flame graph is a two-dimensional SVG graphic.

The y-axis represents the function call stack, one function per layer. The deeper the call stack, the taller the flame, with the currently executing function at the top and its callers below it.

The x-axis represents the number of samples: the wider a function's span, the more samples it appeared in, meaning it accounts for more execution time. Note that the x-axis does not represent call count or call order. It does not mean that frames further left were called first; the frames on each level are simply sorted alphabetically by function name.

Reading a flame graph mostly comes down to spotting which function occupies the greatest width. Any "flat top" suggests that function may have a performance problem.

The wider the "flat top", the more of the sampling period that function spent executing.

Color carries no special meaning; since a flame graph shows how busy the CPU is, warm colors are the usual choice (πŸ”₯).

Graph

Graph, which I usually call a call graph (to distinguish it from the flame graph; I'm not sure that's official πŸ˜€). Each box represents a function, and each directed edge is a call, pointing from caller to callee. The thicker an edge, the more expensive that call path; the larger a node, the more expensive the current function.

Pprof in practice

What you learn on paper always feels shallow; to truly understand, you must practice it yourself.

Now that we've seen the basics, let's reinforce pprof with a demo.

Demo source code: github.com/wolfogre/go… Thanks to wolfogre for the demo code πŸ™.

Clone the repository and run go build (the source repository does not use go modules, so we turn modules off here too).

```shell
export GO111MODULE=off && go build
```

Let’s start with a quick look at the main.go file.

The program sets GOMAXPROCS to 1 to limit the number of Ps (and thus usable CPU cores), enables mutex and block profiling, and serves pprof at localhost:6060.

Note that the demo code constructs its performance problems directly, so it is relatively simple and many of the issues are visible to the naked eye. But instead of reading the code to find them, let's use pprof to locate the problems in the application.

Once the process is started, we can see that the terminal is constantly producing output.

Let’s take a look at the various aspects of the bomb program.

profile

I'm on macOS, so I'll check the process directly in Activity Monitor, and you can see our demo program's CPU usage is very high 😱.

Next let’s sample the CPU information for 30 seconds using pprof.

```shell
go tool pprof "http://127.0.0.1:6060/debug/pprof/profile?seconds=30"
```

After sampling, you can directly use the command line tool described in the previous section to view the top call, here we use a new way, we use the graphical form of observation.

```shell
go tool pprof --http=:8080 xxxx/pprof.samples.cpu.001.pb.gz
```

This starts a server on port 8080 that lets us browse the CPU sample visually.

  1. flame graph

  2. top

  3. graph

The top view, graph, and flame graph all show the "flat top" at *Tiger.Eat (of the three, the graph is arguably the most visually striking).

Now that we've pinned the problem on Tiger.Eat, let's switch to the Source tab and look at the code in detail.

As you can see, each call to Eat loops 1e10 times. Let's comment out the loop and run again.

As you can see, CPU consumption has dropped right down πŸŽ‰. With the CPU problem solved, let's look at the second problem.

heap

Using Activity Monitor again, you can see that our process is currently eating close to 2GB of memory.

As in the CPU chapter, we collect first, then analyze.

```shell
go tool pprof --http=:8080 127.0.0.1:6060/debug/pprof/heap
```

From the heap call graph, we can see that πŸ€ stole our memory 😁: most of it goes to Mouse.Steal. Let's look at Source again.

We find a loop that appends to a slice, 1MB per append, until it reaches 1GB, with no release afterwards.

Here's a little exercise: why does the application only append 1GB of memory, yet Activity Monitor shows close to 2GB? A hint: it is related to how slice append grows the backing array.
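To see the hint's mechanism in action, here is a small sketch (the helper name is mine; exact growth factors vary by Go version): when append outgrows the current capacity, Go allocates a larger backing array and copies the old one over, so for a moment both arrays are live.

```go
package main

import "fmt"

// appendChunks appends n chunks of 1MB to a slice and reports the final
// length and capacity of its backing array.
func appendChunks(n int) (length, capacity int) {
	var buf []byte
	chunk := make([]byte, 1<<20) // 1MB per append, as in the demo
	for i := 0; i < n; i++ {
		// When cap(buf) is exceeded, append allocates a bigger array and
		// copies buf into it; the old array lingers until the GC frees it.
		buf = append(buf, chunk...)
	}
	return len(buf), cap(buf)
}

func main() {
	l, c := appendChunks(16)
	fmt.Printf("len=%dMB cap=%dMB\n", l>>20, c>>20)
}
```

Because capacity usually overshoots length, and old backing arrays may not have been collected yet, resident memory can exceed the slice's nominal size.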

We comment out the problem code, rebuild and rerun, and you can see that the memory has dropped. Another problem solved πŸŽ‰! But have the memory-related issues been completely resolved πŸ€”?

allocs

Most memory problems can be diagnosed with the heap profile, but it misses one important case: memory that is allocated and then freed. Golang is a garbage-collected language, and frequent allocation, along with the GC work it triggers, can noticeably degrade a program's performance.
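To illustrate the point (this is my own sketch, not the demo's code): a scratch buffer allocated on every call is invisible to the heap profile once freed, yet keeps the GC busy; reusing memory, for example via sync.Pool, removes that pressure.

```go
package main

import "sync"

// runAlloc allocates a throwaway 16MB scratch buffer on every call,
// creating GC pressure even though nothing is retained.
func runAlloc() int {
	buf := make([]byte, 16<<20)
	return len(buf)
}

// pool hands out reusable 16MB buffers instead of allocating fresh ones.
var pool = sync.Pool{
	New: func() interface{} {
		buf := make([]byte, 16<<20)
		return &buf // store a pointer to avoid an extra allocation on Put
	},
}

// runPooled does the same work but reuses buffers from the pool.
func runPooled() int {
	bufp := pool.Get().(*[]byte)
	defer pool.Put(bufp)
	return len(*bufp)
}

func main() {
	_ = runAlloc()
	_ = runPooled()
}
```

With the pooled version, the allocs profile stops accumulating new 16MB allocations after warm-up.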

Use go tool pprof --http=:8080 127.0.0.1:6060/debug/pprof/allocs to see the history of memory allocations.

You can see that the "flat top" appears in Dog.Run, so let's check Source again.

Here Dog.Run has been inlined, so its code doesn't show up in Source. Let's use pprof a different way (this is to demonstrate the list command; when actually working it's easier to just read the source code).

Use go tool pprof 127.0.0.1:6060/debug/pprof/allocs to enter pprof's interactive terminal, then run list Run to see where the problem is: Dog.Run allocates 16MB on every call and never uses it, so it is immediately garbage-collected.

Here is a second little exercise: the memory allocated here is never referenced anywhere else, so why is it still allocated on the heap rather than on the stack?
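As a hint toward this exercise, here is a minimal illustration of escape analysis (function names are mine); you can inspect the compiler's decisions with go build -gcflags=-m.

```go
package main

import "fmt"

// escapes returns the address of a local, so x cannot live on the stack:
// `go build -gcflags=-m` reports "moved to heap: x".
func escapes() *int {
	x := 42
	return &x
}

// staysLocal returns a copy, so x can stay on the stack. Note that very
// large values (like a 16MB buffer) may be heap-allocated regardless,
// simply because they don't fit in a stack frame.
func staysLocal() int {
	x := 42
	return x
}

func main() {
	fmt.Println(*escapes(), staysLocal()) // prints "42 42"
}
```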

Let the program run for a while, then check alloc_space again: no new allocations show up πŸŽ‰.

goroutine

Golang supports concurrency at the language level, and goroutines make writing concurrent programs very easy, but goroutines can also cause problems if used improperly. If you've ever used a language without GC (such as C++), you've probably experienced memory leaks. Golang is a garbage-collected language, so you don't normally get memory leaks, but if you don't use goroutines properly, you can get goroutine leaks, which can cause CPU and memory spikes.

Open the http://127.0.0.1:6060/debug/pprof/.

You can see that there are more than 100 goroutines in the program, which is certainly not normal for a small program like ours.

Use go tool pprof --http=:8080 127.0.0.1:6060/debug/pprof/goroutine to view the goroutine distribution in our program.

As you can see, 101 goroutines were spawned at Wolf.Drink; let's look at Source again.

We can see that each call to Wolf.Drink spawns 10 goroutines, each of which sleeps for 30 seconds. Let's comment this out and rerun the program.

As you can see, there are only 5 goroutines left πŸŽ‰.

mutex

All of the problems we've fixed so far were resource problems, but synchronization (such as mutex locking) can also leave a program performing poorly even when resource utilization is low.

It’s the same old story, sampling first.

```shell
go tool pprof --http=:8080 127.0.0.1:6060/debug/pprof/mutex
```

All right, the lock contention shows up at Wolf.Howl, and then it's the Source routine we're all familiar with.

Note that the way the mutex is used here is contrived for demonstration and is not a recommended pattern (πŸ“š); we comment out the problem code and run again.

You can see that the wait time from lock contention is now negligible πŸŽ‰.

block

Golang supports concurrent operations, but it is fundamentally a synchronous language (in contrast to Node.js-style event callbacks), so some blocking is inevitable, and excessive blocking is a big performance problem for our program.

Use go tool pprof --http=:8080 127.0.0.1:6060/debug/pprof/block.

You can see that the blocking happens in chanrecv, which traces back to our business code at Cat.Pee.

In Source, we can see that Cat.Pee reads from a channel on every call, and the channel only becomes readable after 1 second. Just remove this πŸŽ‰.

Conclusion

First, we introduced the basic definition of a profile. Second, we covered the basic use of Golang pprof: its exposed APIs are runtime/pprof and net/http/pprof, where net/http/pprof wraps runtime/pprof behind HTTP endpoints, and the common interactive commands are list, top, and web. Then, graphical pprof helped us visualize the various profiles of a Go process. Finally, we worked through fixing a program's CPU, memory, lock, and other performance problems in practice. All in all, pprof is a very powerful tool, with many more uses and features to explore. Hopefully, pprof will help you keep your Golang app cool for the summer.

Reference documentation

  1. pprof – Stefno – cnblogs.com
  2. Golang pprof in practice | Wolfogre's Blog