CSDN: Performance analysis using flame graphs under Linux
GitHub: LDD-LinuxDeviceDrivers/study/debug/tools/perf/flame_graph


This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please credit the source when reproducing it. Thank you for your cooperation.

My technical skill and knowledge are limited, so if you find a mistake or something that needs correcting, please point it out. Suggestions of other good debugging tools to include are also welcome. Thank you in advance.


Software performance analysis often comes down to looking at where CPU time goes in order to find the bottleneck.

The flame graph is a useful tool for this kind of analysis.

1 Introduction to flame graphs


Many people who catch a cold or run a fever imitate Shennong tasting a hundred herbs: they try an antiviral, then an antibiotic, working through whatever medicine is in the house, Chinese or Western, hoping that a blind cat will stumble on a dead mouse. This is not a sound approach. The right thing to do is go to the hospital, get a blood test, and take the remedy that fits the diagnosis.

Think back to how we usually debug programs: we often rely on guesswork without data instead of finding out what actually caused the problem.

It goes without saying that performance tuning also calls for a remedy that fits the diagnosis. The good news is that Brendan D. Gregg invented the flame graph.

Figure 1.1 Flame graph


Common types of flame graphs include on-CPU, off-CPU, Memory, Hot/Cold, and Differential.

For a detailed introduction, see Blazing Performance with Flame Graphs. In short: the whole graph looks like flickering flames, hence the name. The tips of the flames show what the CPU is executing. Note that the colors are random and carry no special meaning. The vertical axis shows the depth of the call stack, and the horizontal axis shows the share of samples (and therefore of CPU time), not the passage of time. Because call stacks are sorted alphabetically along the horizontal axis and identical stacks are merged, the wider a frame is, the more likely it is to be a bottleneck. To sum up, focus on the wide flames, especially the flat-topped ones that look like mesas.

To generate flame graphs you need a handy tracing tool. On Linux the choice is usually perf or SystemTap. perf is more commonly used because it is the performance tuning tool that ships with the Linux kernel and is available on most Linux systems. Interested readers may refer to Linux Profiling at Netflix; its description of how to deal with broken stacks in particular is worth reading several times. SystemTap is more powerful, but the downside is that you need to learn its own scripting language first.

If you develop or tune Nginx, I highly recommend nginx-systemtap-toolkit. The name suggests that the toolkit is Nginx-specific, but in fact many of its tools work for any program written in C/C++:

Program | Function
sample-bt | Samples the data used to generate on-CPU flame graphs (DEMO)
sample-bt-off-cpu | Samples the data used to generate off-CPU flame graphs (DEMO)

1.2 On-CPU and off-CPU flame graphs


So when should you use an on-CPU flame graph, and when an off-CPU one?

It depends on what the current bottleneck is. If it is the CPU, use an on-CPU flame graph; if it is I/O or locks, use an off-CPU flame graph. If you are not sure, confirm with a load-testing tool: see whether it can saturate the CPU. If it can, use an on-CPU flame graph. If CPU usage never climbs no matter how much load you apply, the program is most likely blocked on I/O or locks, so use an off-CPU flame graph.

If you still cannot tell, generate both the on-CPU and the off-CPU flame graph. Normally the two differ considerably. If they look similar, the usual conclusion is that the CPU was being preempted by other processes.

When sampling, keep the program under load with a load-testing tool so that enough samples are collected. If you choose ab, be sure to turn on the -k option so that the system does not run out of available ports. I also recommend trying a more modern load-testing tool such as wrk.

1.3 Flame graph generation tools

Brendan D. Gregg’s Flame Graph project implements a set of scripts to generate a Flame Graph.

The Flame Graph project is on GitHub

Github.com/brendangreg…

Clone it with Git

git clone https://github.com/brendangregg/FlameGraph.git

Generating a flame graph takes the following steps:

Step | Description | Script
Capture stacks | Use a tool such as perf, SystemTap, or DTrace to capture the program's running stacks | perf/systemtap/dtrace
Fold stacks | The trace tool captures the stacks of the system and the program at each sampling moment. These must be analyzed and combined, accumulating duplicate stacks, to show the load and the critical path | the stackcollapse programs in FlameGraph
Generate the flame graph | Analyze the stackcollapse output to generate the flame graph | flamegraph.pl
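As a rough illustration of the folding step, the sketch below merges identical stacks and counts them. It is not the stackcollapse code itself: `fold_stacks` is a hypothetical helper, and the input is assumed to already hold one complete, semicolon-separated stack per line.

```shell
# Hypothetical helper: merge identical stacks and append how often each occurred.
# Assumes each input line is one complete stack, frames separated by ';'.
fold_stacks() {
    awk '{ seen[$0]++ } END { for (s in seen) print s, seen[s] }' | sort
}

# Three samples, two of which share the same stack.
printf '%s\n' 'main;parse;read' 'main;render' 'main;parse;read' | fold_stacks
```

Each output line has the form `stack count`, which is the folded format that flamegraph.pl reads.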

Different trace tools capture different information, so FlameGraph provides a range of stackcollapse tools.

Script | Description
stackcollapse.pl | for DTrace stacks
stackcollapse-perf.pl | for Linux perf_events "perf script" output
stackcollapse-pmc.pl | for FreeBSD pmcstat -G stacks
stackcollapse-stap.pl | for SystemTap stacks
stackcollapse-instruments.pl | for Xcode Instruments
stackcollapse-vtune.pl | for Intel VTune profiles
stackcollapse-ljp.awk | for Lightweight Java Profiler
stackcollapse-jstack.pl | for Java jstack(1) output
stackcollapse-gdb.pl | for gdb(1) stacks
stackcollapse-go.pl | for Golang pprof stacks
stackcollapse-vsprof.pl | for Microsoft Visual Studio profiles

2 Generating a flame graph with perf


2.1 Collecting data with perf


Let's start with the perf command (short for performance), a performance analysis tool native to Linux. It reports the name of the function the CPU is executing together with its call stack.

sudo perf record -F 99 -p 3887 -g -- sleep 30

[Figure: perf_record_chrome.png]

perf record collects system events. If -e is not used to specify an event, the default is cycles (CPU clock cycles). -F 99 samples 99 times per second, -p 3887 selects the process with PID 3887, -g records the call stack, and sleep 30 keeps the collection running for 30 seconds.

-F sets the sampling frequency, here 99 Hz (99 times per second). If all 99 samples in a second return the same function name, the CPU spent that whole second in the same function, which may point to a performance problem.

The collected data is large. If a server has 16 CPUs and every CPU is sampled 99 times per second for 30 seconds, you get 47,520 call stacks, which run to hundreds of thousands or even millions of lines when dumped as text.

For ease of reading, the perf report command computes the percentage of samples for each call stack and sorts the stacks from highest to lowest.
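The sample count quoted above is plain multiplication. A quick check in the shell (the variable names are only for illustration):

```shell
cpus=16       # CPUs being sampled
hz=99         # samples per second on each CPU
seconds=30    # length of the capture
stacks=$(( cpus * hz * seconds ))
echo "$stacks"
```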

sudo perf report -n --stdio

2.2 Generating the flame graph


First, parse perf.data with perf script:

# Dump the (unfolded) call stacks as text
perf script -i perf.data > perf.unfold

Save the parsed output; it is the input for the flame graph.

Next, fold the stacks in perf.unfold with stackcollapse-perf.pl:

# Fold the call stacks
./stackcollapse-perf.pl perf.unfold > perf.folded

Finally, generate the SVG image:

./flamegraph.pl perf.folded > perf.svg

We can use pipes to reduce the above process to a single command

perf script | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > process.svg

3 Reading the flame graph


Finally, open the flame graph in a browser to analyze it.

3.1 Meaning of the flame graph


A flame graph is an SVG image generated from stack information that shows the CPU's call stacks.

The Y-axis represents the call stack, and each layer is a function. The deeper the call stack, the higher the flame, with the executing function at the top and its parent functions below.

The X-axis represents the number of samples. The wider a function is along the X-axis, the more often it was sampled, that is, the longer it ran. Note that the X-axis does not represent the passage of time: all call stacks are merged and laid out in alphabetical order.

Reading a flame graph means looking for the functions at the top that take up the most width. Any plateau (flat top) indicates that the function may have a performance problem.

Colors have no special meaning. Because a flame graph shows how busy the CPU is, warm colors are generally used.
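Plateaus can also be spotted from the folded file without opening the SVG. The sketch below sums the samples of the function at the top of each stack. `top_of_stack` is a hypothetical helper, and it assumes folded input of the form `frame;frame;frame count` with no spaces inside frame names.

```shell
# Hypothetical helper: total samples per top-of-stack function, widest first.
top_of_stack() {
    awk '{
        count = $NF                     # last field is the sample count
        stack = $0
        sub(/ [0-9]+$/, "", stack)      # drop the trailing count
        n = split(stack, frames, ";")   # frames[n] is the executing function
        self[frames[n]] += count
        total += count
    }
    END {
        for (f in self)
            printf "%s %d %.1f%%\n", f, self[f], 100 * self[f] / total
    }' | sort -k2,2nr
}

printf '%s\n' 'main;parse;read 60' 'main;render 30' 'main;parse 10' | top_of_stack
```

The first line of the output names the function that would appear as the widest flat top.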

3.2 Interaction


A flame graph is an SVG image, so the user can interact with it.

  • Hover

Each layer of the flame is labeled with a function name. Hovering the mouse over it shows the full function name, the number of samples, and the percentage of all samples.

  • Click to enlarge

Click on a layer and the flame will zoom in horizontally. The layer will take up all the width and display the details.

“Reset Zoom” will also appear in the upper left corner. Click the link and the image will be restored to its original state.

  • Search

Pressing Ctrl + F displays a search box where the user can enter a keyword or regular expression, and all matching function names are highlighted.

3.3 Limitations


In the following two cases a flame graph cannot be drawn correctly, and the system's behavior needs to be corrected first.

  • The call stack is incomplete

When the call stack is too deep, some systems return only the first part (such as the first 10 layers).

  • Missing function name

Some functions do not have names and the compiler only uses memory addresses to represent them (such as anonymous functions).
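To judge how badly a profile is affected by missing names, one option is to count the samples whose folded stack contains unresolved frames. This is a minimal sketch, assuming unresolved frames show up as `[unknown]` or as bare hexadecimal addresses in the folded file; `unresolved_share` is a hypothetical helper.

```shell
# Hypothetical helper: how many samples contain at least one unresolved frame.
unresolved_share() {
    awk '{
        total += $NF
        if ($0 ~ /\[unknown\]/ || $0 ~ /^0x[0-9a-fA-F]+/ || $0 ~ /;0x[0-9a-fA-F]+/)
            bad += $NF
    }
    END { if (total > 0) printf "%d of %d samples\n", bad, total }'
}

printf '%s\n' 'main;[unknown] 5' 'main;read 15' | unresolved_share
```

A large share suggests fixing symbol resolution before trusting the shape of the graph.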

3.4 Browser Flame Chart


Chrome can generate flame charts of page scripts for CPU analysis.

Open the developer tools and switch to the Performance panel. Then click the "Record" button to start recording. Do whatever you want to measure on the page, then stop recording.

The developer tools now display a timeline, with the flame chart below it.

The browser flame chart differs from the standard flame graph in two ways: it is inverted (the function at the top of the stack is at the bottom), and the X-axis is a time axis rather than a sample count.

4 Red/blue differential flame graphs


Refer to www.brendangregg.com/blog/2014-1…

Thanks to flame graphs, CPU usage problems are generally easy to locate. But to track down a performance regression you have to keep switching back and forth between the flame graphs from before and after a change, or from different periods and scenarios, which feels like searching for Pluto in the solar system. This method does work, but I think there should be a better way.

Red/Blue differential Flame Graphs

4.1 Example of a red/blue differential flame graph


Above is an interactive SVG image. Two colors are used to represent change: red for an increase and blue for a decrease.

The shape and size of each flame are the same as in the CPU flame graph of the second profile (the Y-axis represents stack depth, the X-axis represents the total number of samples, and the width of a stack frame represents the share of that function in the profile). The top layer is the running function, and the stack that called it is directly below.

The example shows an increase in CPU usage for a workload after a system upgrade. Below is the corresponding CPU flame graph (in SVG format).

In a standard flame graph, the colors of the stack frames and towers are chosen randomly. In a red/blue differential flame graph, the colors show where the two profiles differ.

The deflate_slow() function and the functions it calls ran more often in the second profile than in the first, so their stack frames are marked red in the figure above. You can see that the cause is ZFS compression being enabled, which was turned off before the system upgrade.

This example is so simple that I can analyze it without even using a differential flame diagram. But imagine if you were analyzing a small performance drop, say less than 5%, and the code was more complex.

4.2 How red/blue differential flame graphs work


I’ve been talking about this for years, and I’ve finally written an implementation myself that I personally think is valuable. Here’s how it works:

  1. Capture the stack profile before the change (profile1).

  2. Capture the stack profile after the change (profile2).

  3. Generate the flame graph from profile2 (so that the stack frame widths are based on profile2).

  4. Recolor the flame graph using the "2 - 1" difference. A stack frame that appears more often in profile2 is marked red; otherwise it is marked blue. The color intensity follows the size of the difference between the two profiles.

The purpose is to compare profiles taken before and after a change, which is useful for regression testing or for evaluating the performance impact of a code change. The new flame graph is generated from the modified profile (so the stack frame widths still show the current CPU consumption). By comparing colors, you can see why system performance differs.

The color of a frame reflects only the difference that the function itself is directly responsible for (for example, being the running function), not the difference in the child functions it calls.

4.3 Generating a red/blue differential flame graph


The author's FlameGraph repository on GitHub provides a script, difffolded.pl, that generates red/blue differential flame graphs. To show how the tool works, the steps below use Linux perf_events. You can also use other profilers/tracers.

  • Capture profile 1 before the change:
perf record -F 99 -a -g -- sleep 30
perf script > out.stacks1
# fold the stacks
./stackcollapse-perf.pl ../out.stacks1 > out.folded1
  • Some time later (or after changing the program code), capture profile 2:
perf record -F 99 -a -g -- sleep 30
perf script > out.stacks2
# fold the stacks
./stackcollapse-perf.pl ../out.stacks2 > out.folded2

Generate the red/blue differential flame graph:

./difffolded.pl out.folded1 out.folded2 | ./flamegraph.pl > diff2.svg

difffolded.pl only works on "folded" stack profiles, which the stackcollapse scripts produced in the previous steps. It outputs three columns: the folded call stack, followed by the sample counts from the profiles before and after the change.

func_a;func_b;func_c 31 33
[...]

In the example above, "func_a()->func_b()->func_c()" is the call stack, which appears 31 times in profile1 and 33 times in profile2. When flamegraph.pl is given this three-column data, it automatically generates a red/blue differential flame graph.
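To make the three-column format concrete, the sketch below computes the per-stack difference that drives the coloring (positive means more samples in profile2, negative means fewer). `stack_deltas` is a hypothetical helper, and the second sample line is invented for illustration; this is not how flamegraph.pl is implemented internally.

```shell
# Hypothetical helper: turn "stack count1 count2" into "delta stack".
stack_deltas() {
    awk '{
        after = $NF                         # count in profile2
        before = $(NF - 1)                  # count in profile1
        stack = $0
        sub(/ [0-9]+ [0-9]+$/, "", stack)   # drop both counts
        printf "%+d %s\n", after - before, stack
    }'
}

printf '%s\n' 'func_a;func_b;func_c 31 33' 'func_a;func_d 20 12' | stack_deltas
```

Stacks with a positive delta correspond to the red frames, and stacks with a negative delta to the blue ones.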

Some more useful options:

Option | Description
difffolded.pl -n | Normalizes the sample counts of the two profiles so that they match. Without it, every stack shows a difference, because the profiles were captured at different times and under different CPU loads, and the graph may look all red (increased load) or all blue (decreased load). The -n option balances the first profile against the second so that you get the full red/blue spectrum
difffolded.pl -x | Strips hexadecimal addresses. Profilers often fail to convert addresses to symbols, so hexadecimal addresses appear in the stacks. If the address differs between the two profiles, the two stacks are treated as different stacks when they are in fact the same. If you run into this problem, use the -x option
flamegraph.pl --negate | Reverses the red/blue color scheme. This option is used in the next section
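The ideas behind -n and -x can be sketched in a few lines of shell. This is only an illustration of the two transformations described in the table, under the assumption that the input is the three-column `stack count1 count2` format; `normalize_first` and `strip_hex` are hypothetical helpers, and the real script may differ in its details.

```shell
# Hypothetical helper: scale profile1 counts so both profiles have the same total.
normalize_first() {
    awk '{
        c1[NR] = $(NF - 1); c2[NR] = $NF; line[NR] = $0
        t1 += $(NF - 1); t2 += $NF
    }
    END {
        for (i = 1; i <= NR; i++) {
            stack = line[i]
            sub(/ [0-9]+ [0-9]+$/, "", stack)
            scaled = (t1 > 0) ? int(c1[i] * t2 / t1) : c1[i]
            print stack, scaled, c2[i]
        }
    }'
}

# Hypothetical helper: replace raw addresses with a fixed placeholder
# so that stacks differing only by address compare as equal.
strip_hex() {
    sed 's/0x[0-9a-fA-F][0-9a-fA-F]*/0x.../g'
}

# profile2 was captured under twice the load, but the mix of stacks is unchanged.
printf '%s\n' 'a;b 50 100' 'a;c 50 100' | normalize_first
```

After normalization both columns are equal in this example, so nothing would be colored, which matches the intent: a heavier overall load alone should not paint the whole graph red.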

4.4 Shortcomings


Although the red/blue differential flame graph is useful, it has a problem: if a code path disappears completely, there is nowhere in the flame graph to highlight it in blue. You see only the current CPU usage, not how it came to be that way.

One remedy is to reverse the comparison order and draw the differential flame graph the other way around. For example:

The flame graph above is based on the profile before the change, and the colors show what is about to happen. The part highlighted in blue on the right shows that much less time will be spent in the CPU idle path after the change (in practice, cpu_idle is usually filtered out of the folded files with grep -v cpu_idle).

This graph also shows the vanishing-code problem (or rather, fails to show it): because compression was not enabled before the change, it does not appear in the "before" profile, so there is nothing to color red.

Here is the corresponding command line:

./difffolded.pl out.folded2 out.folded1 | ./flamegraph.pl --negate > diff1.svg

Together with the diff2.svg generated earlier, we now have:

Flame graph | Description
diff1.svg | The width is based on the profile before the change, and the color shows what is about to happen
diff2.svg | The width is based on the profile after the change, and the color shows what has happened

If I were doing regression testing, I would generate both graphs.

4.5 CPI flame graphs


These scripts were originally written for CPI flame graphs. Instead of comparing profiles from before and after a change, a CPI flame graph shows the difference between CPU cycles and stall cycles, which highlights what the CPUs were actually doing.

4.6 Other differential flame graphs


Others have done similar work. Robert Mustacchi tried this a while back, using a color scheme similar to code review: only the differences are shown, with red for new (increased) code paths and blue for deleted (decreased) code paths. A key difference is that the stack frame width reflects only the number of samples in the difference. There is an example on the right. It is a good idea, but in practice it feels a little strange, because without the context of the full profile it is hard to interpret.

Cor-Paul Bezemer also built a differential display, called flamegraphdiff. It puts three flame graphs in one image: the standard flame graphs from before and after the change, and below them a differential flame graph whose stack frame widths again reflect the difference in sample counts. The picture above is an example. Hover the mouse over a stack frame in the differential graph and the same stack frame is highlighted in all three graphs. Because the two standard flame graphs accompany the differential one, this approach solves the context problem.

All three of us have our own differential flame graphs, and they can be combined: for the top two graphs in Cor-Paul's layout, I could use diff1.svg and diff2.svg, and the graph below could be Robert's. For consistency, the graph below could be colored the way I do it: blue -> white -> red.

Flame graphs are spreading widely and are now used by many companies. I would not be surprised to learn of other ways to implement differential flame graphs. (Let me know in the comments.)

4.7 Summary


If you are facing a performance regression, red/blue differential flame graphs are the fastest way to find the root cause. The method takes two ordinary flame graphs, compares them, and colors the differences: red for an increase, blue for a decrease. The differential flame graph is based on the current ("after") profile and keeps its shape and size, so the colors show you where the profiles differ and why.

Differential flame graphs can be applied to a project's daily builds so that performance regressions are detected and corrected promptly.

Via: www.brendangregg.com/blog/2014-1…

5 References


Using the Linux perf tool to generate flame graphs for Java programs

Generating flame graphs with perf

Brendan Gregg's site


  • This work/blog (AderStep- Purple Night Appendix – Qingling Lane Grass Copyright ©2013-2017), by Cheng Jian (Gatieme).

  • It is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You are welcome to reprint, use, and republish the article, but you must keep the byline Cheng Jian (gatieme) (including the link: blog.csdn.net/gatieme), and the article may not be used for commercial purposes.

  • Any work modified based on this article must be distributed under the same license. If you have any questions, please contact me.