Author: Zhao Kun | Backend development engineer, Tencent Lord Studio
During project development, problems such as long program startup times and high CPU usage are common. Performance analysis tools are then needed to locate where the time goes. This article introduces entry-level usage of three commonly used tools, along with ways to visualize their results, for your reference.
The three tools are Perf, Gprof, and Valgrind. Because of space constraints, this article does not cover each tool's parameters or result analysis in detail; it only gives introductory instructions. Please search online for more detailed documentation.
Each tool is covered in three parts: introduction, usage, and graphical method.
The results for each tool are based on this code:
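    #include <unistd.h>   // needed for usleep(); the header name was lost in the original listing
    using namespace std;

    #define NUM 500000

    // Fill the array with the values 0..NUM-1.
    void init(int* int_array) {
        for (int i = 0; i < NUM; i++) {
            int_array[i] = i;
        }
    }

    // Sum the array, sleeping 3 microseconds after each element.
    void accu(int* int_array, long& sum) {
        for (int i = 0; i < NUM; i++) {
            sum += int_array[i];
            usleep(3);
        }
    }

    int main() {
        int int_array[NUM];
        init(int_array);
        long sum = 0;
        accu(int_array, sum);
    }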
This code ran for 31 seconds on an ordinary PC, with CPU utilization peaking at 8.3%.
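A long run time combined with single-digit CPU utilization suggests that the process spends most of its life asleep inside usleep rather than computing: 31 seconds over 500,000 iterations is roughly 62 microseconds per iteration, far more than the 3 microseconds requested. The following is a minimal sketch for checking this kind of split on a POSIX system. The names `Timing` and `measure_sleep_loop` are hypothetical and not part of the sample program.

```cpp
#include <chrono>
#include <ctime>
#include <unistd.h>

// Wall-clock and CPU time consumed by a loop, both in seconds.
struct Timing {
    double wall_seconds;
    double cpu_seconds;
};

// Hypothetical helper: call usleep(usec) `iterations` times and report how
// much wall-clock time passed versus how much CPU time the process used.
Timing measure_sleep_loop(int iterations, unsigned int usec) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    for (int i = 0; i < iterations; i++) {
        usleep(usec);  // the process is descheduled here, so little CPU is used
    }

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing t;
    t.wall_seconds =
        std::chrono::duration<double>(wall_end - wall_start).count();
    t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return t;
}
```

Calling `measure_sleep_loop(500000, 3)` from `main` and printing both fields should show wall-clock time far above CPU time, which is the pattern the profilers in the following sections help to explain.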
1 Perf
1.1 Introduction
Perf is a profiling tool that ships with the Linux kernel source tree. It is based on event sampling and performance events, and is commonly used to find performance bottlenecks and locate hot code.
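To make the sampling idea concrete: instead of timing every call, a sampling profiler periodically records which function is running and estimates each function's share of the run from the sample counts. The sketch below is a toy simulation of that idea only. It is not how perf is implemented (perf collects its samples in the kernel from performance events), and the `Span` timeline with its durations is made-up input.

```cpp
#include <map>
#include <string>
#include <vector>

// One stretch of execution: `function` runs for `duration_us` microseconds.
struct Span {
    std::string function;
    long duration_us;
};

// Walk a recorded timeline and take one sample every `period_us`
// microseconds, counting which function was running at each sample point.
std::map<std::string, long> sample_timeline(const std::vector<Span>& timeline,
                                            long period_us) {
    std::map<std::string, long> hits;
    if (period_us <= 0) {
        return hits;  // no valid sampling period, so no samples
    }
    long span_start = 0;   // start time of the current span
    long next_sample = 0;  // time of the next sample point
    for (const Span& span : timeline) {
        const long span_end = span_start + span.duration_us;
        while (next_sample < span_end) {
            hits[span.function]++;
            next_sample += period_us;
        }
        span_start = span_end;
    }
    return hits;
}
```

With a hypothetical timeline of `init` running for 1000 microseconds followed by `accu` for 9000 microseconds, sampling every 100 microseconds yields 10 hits for `init` and 90 for `accu`, which matches their 10% and 90% shares of the run.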
1.2 Usage
Perf can be used in two ways:
- Start the program directly under perf
- Attach to a process that is already running
The first method does not require root permission; the second does.
Keeping to entry-level usage, the basic commands are:
    perf record -e cpu-clock -g ./run
or
    perf record -e cpu-clock -g -p 4522
Press Ctrl+C to stop perf, or wait for the program to finish; either way a perf.data file is produced. Then run
    perf report
to display the analysis results.
1.3 Graphical method
Perf results can be rendered as a flame graph using the FlameGraph scripts.
The Flame Graph project is located on GitHub:
https://github.com/brendangregg/FlameGraph
Clone the repository or download the zip archive directly to the server. For example, if the archive is named FlameGraph-master.zip, assume it is decompressed into /data.
Based on perf.data generated in 1.2, the next steps are as follows:
    1. perf script -i perf.data &> perf.unfold
    2. /data/stackcollapse-perf.pl perf.unfold &> perf.folded
    3. Generate the SVG: /data/flamegraph
The resulting flame graph is as follows:
Many articles online explain how to read and analyze flame graphs.
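The intermediate "folded" file is what makes a flame graph possible: each line is one call stack, with frames joined by semicolons from the outermost to the innermost, followed by a space and the number of samples that had exactly that stack. The sketch below illustrates this format only. It is not the implementation of stackcollapse-perf.pl, and the names `Stack`, `fold_stacks`, and `to_folded_text` are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A sampled call stack, ordered from the outermost frame (e.g. main)
// to the innermost frame that was executing when the sample was taken.
using Stack = std::vector<std::string>;

// Join each stack into "frame1;frame2;...;frameN" and count how many
// samples share that exact stack.
std::map<std::string, int> fold_stacks(const std::vector<Stack>& samples) {
    std::map<std::string, int> folded;
    for (const Stack& stack : samples) {
        if (stack.empty()) {
            continue;  // nothing to record for an empty sample
        }
        std::string key;
        for (std::size_t i = 0; i < stack.size(); i++) {
            if (i > 0) {
                key += ';';
            }
            key += stack[i];
        }
        folded[key]++;
    }
    return folded;
}

// Render the counts as text, one "stack count" pair per line.
std::string to_folded_text(const std::map<std::string, int>& folded) {
    std::string out;
    for (const auto& entry : folded) {
        out += entry.first + " " + std::to_string(entry.second) + "\n";
    }
    return out;
}
```

In the rendered graph, the width of each box is proportional to the sample count of the stacks that pass through it, so a wide `usleep` box stacked on top of `accu` would point straight at the sleep in the sample program.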
2 Gprof
2.1 Introduction
Gprof measures the execution time of each function in a program and the number of times it is called, which helps identify the most time-consuming functions. After the program exits normally, a gmon.out file is generated, which can be parsed into a readable report.
2.2 Usage
To use gprof, add the -pg option when compiling and linking.
In addition, gmon.out is generated only when the program exits normally; killing the process does not produce it. For services whose threads run continuously, the code must be modified so that the program stops at some point.
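One common way to give a long-running service a normal exit path is to catch the termination signal, set a flag, and let the main loop return on its own. The following is a minimal sketch of that approach, assuming the service has a single main loop. All names here (`install_stop_handlers`, `stop_requested`, `run_until_stopped`) are hypothetical, and a real multithreaded service would also need to stop and join its worker threads.

```cpp
#include <csignal>

namespace {
// Set from the signal handler; sig_atomic_t is safe to write there.
volatile std::sig_atomic_t g_stop_requested = 0;

void handle_stop_signal(int) {
    g_stop_requested = 1;
}
}  // namespace

// Install handlers so that SIGINT (Ctrl+C) and SIGTERM (a plain `kill`)
// request a clean shutdown instead of terminating the process at once.
void install_stop_handlers() {
    std::signal(SIGINT, handle_stop_signal);
    std::signal(SIGTERM, handle_stop_signal);
}

bool stop_requested() {
    return g_stop_requested != 0;
}

// Hypothetical service loop: keeps calling `do_work` until a stop signal
// has been received, then returns so that main() can return normally.
template <typename Work>
void run_until_stopped(Work do_work) {
    while (!stop_requested()) {
        do_work();
    }
}
```

In `main`, call `install_stop_handlers()` first, then `run_until_stopped(...)` with the service's work function, and finally return 0. Returning from `main` is a normal exit, so the profiling data is written out. SIGKILL (`kill -9`) cannot be caught, so it still terminates the process without producing gmon.out.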
After recompiling, start the program as usual. The gmon.out file is generated once the program finishes running.
Generate the report file with the following command (where run is the binary name):
    gprof -b run gmon.out >> report.txt
Opening report.txt shows the following:
2.3 Graphical method
The result files of gprof need to be displayed with gprof2dot.py and Graphviz.
Generate the dot file using gprof2dot.py:
    python gprof2dot.py report.txt > report.dot
Note that the server must have Python installed, and gprof2dot.py must match the installed version of Python. Whether the two match is partly a matter of luck, and mismatches are tedious to resolve. The Python installed on my server is 2.6.6. The gprof2dot-2017.9.19 that I first downloaded did not match that Python version and failed to run. The version I use now is compatible with 2.6.6; please contact me if you need it.
To open the dot file, you need Graphviz. I installed Graphviz on Windows, and it is easy to download. Use gvedit.exe to open the report.dot file generated in the previous step.
This graph is rather small because our program is very simple. For a typical business service, the graph would be far more complex.
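For readers who have not seen a dot file before: it is a plain-text description of nodes and edges that Graphviz lays out as a picture. The sketch below builds such a description for a small call graph. It only illustrates the file format and is not what gprof2dot.py actually emits; `CallEdge` and `to_dot` are hypothetical names.

```cpp
#include <string>
#include <vector>

// One caller -> callee edge with the number of calls observed.
struct CallEdge {
    std::string caller;
    std::string callee;
    long calls;
};

// Build a Graphviz DOT description of a call graph, labelling each edge
// with its call count.
std::string to_dot(const std::vector<CallEdge>& edges) {
    std::string out = "digraph callgraph {\n";
    for (const CallEdge& e : edges) {
        out += "  \"" + e.caller + "\" -> \"" + e.callee + "\" [label=\"" +
               std::to_string(e.calls) + "x\"];\n";
    }
    out += "}\n";
    return out;
}
```

For the sample program, the edges would be `main` to `init` and `main` to `accu`, each called once. Writing the returned string to a file gives something that gvedit or the `dot` command can open.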
3 Valgrind
3.1 Introduction
Valgrind does not ship with Linux by default and must be installed. Valgrind itself contains several tools:
- Memcheck: checks for memory leaks
- Callgrind: used for performance analysis; collects program runtime data and call relationships
- Cachegrind, Helgrind, and others
Here we mainly use the Callgrind tool.
3.2 Usage
First you need to install Valgrind:
http://valgrind.org/downloads/valgrind-3.12.0.tar.bz2
After decompressing the package, run ./configure, make, and make install in sequence.
To analyze performance with Valgrind, the program must be started under Valgrind:
    valgrind --tool=callgrind --separate-threads=yes ./run
The --separate-threads=yes option makes Callgrind write a separate output file for each thread; without it, the data for all threads goes into a single file.
After the program finishes, a file named in the form callgrind.out.4263-01 is generated. This file is difficult to analyze directly and needs to be viewed graphically.
3.3 Graphical method
Visualizing Valgrind results requires kcachegrind.exe, which can be downloaded and run on Windows. Opening callgrind.out.4263-01 gives the following result:
4 Tool Comparison
All three tools can meet our need to locate the functions that take the longest to execute and consume the most CPU. There are still differences among them, however:
4.1 Startup Mode
Perf can attach to a running process, but this requires root privileges. With ordinary permissions, both Perf and Valgrind must launch the program themselves (as a command prefix), which affects the program's performance to some extent. In stress testing, we found that the total number of online users supported when the program was started under Valgrind was far lower than when the program was run directly.
4.2 Program Intrusion
Neither Perf nor Valgrind requires changes to the Makefile or the program. Gprof requires recompilation and, for services whose threads run continuously, code changes so that the program exits normally, which is somewhat intrusive. In terms of performance impact, however, Gprof preserves the original program's performance to the greatest extent.
4.3 Result Display
Gprof's result is an inverted tree that shows the time consumption of every node from root to leaf. Perf's result is pyramid-shaped and similar to Gprof's. Valgrind's result is a single path that shows the time spent along one particular call path rather than a global view.
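When reading any of these tree-shaped views, it helps to keep two numbers apart: the time spent in a function's own code (self time) and the time spent in the function plus everything it calls (inclusive time). The sketch below computes inclusive time over a small call tree. `CallNode` and `inclusive_ms` are hypothetical names, and any timings fed into it are illustrative rather than measurements from the sample program.

```cpp
#include <string>
#include <vector>

// A node in a call tree: time spent in the function's own code,
// plus the functions it called.
struct CallNode {
    std::string name;
    double self_ms;
    std::vector<CallNode> children;
};

// Inclusive time = the node's own time plus the inclusive time of
// every function beneath it in the tree.
double inclusive_ms(const CallNode& node) {
    double total = node.self_ms;
    for (const CallNode& child : node.children) {
        total += inclusive_ms(child);
    }
    return total;
}
```

A root such as `main` usually has a tiny self time but an inclusive time equal to the whole run. The nodes worth optimizing are those where self time is large, or where inclusive time drops sharply between a parent and its children.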
4.4 Monitoring Principles
This is a highly specialized topic. I have not yet studied the monitoring principles of the three tools thoroughly, so this section is left empty for now. Interested readers are welcome to look into it themselves.
This article was published by the Tencent Cloud+ Community with the author's authorization. Please credit the source if you reproduce it.
Original link: https://cloud.tencent.com/developer/article/1063652