I recently gave a short tech talk at my company on analyzing and troubleshooting performance problems. This first session focused on the CPU, and I have turned the slides into this article.

This article covers the basics of CPU performance and, through simple examples, introduces common tools and methods for troubleshooting CPU performance problems.

(All of the examples have source code; I will upload it to GitHub when I find the time.)

Know before you use – CPU performance

When comparing CPUs, many people think first of the core count and assume that more cores means better performance. Others also pay attention to the clock frequency.

So which parameters actually determine CPU performance? Let’s take a look at Intel’s official specifications for the i7-8700.

Many parameters affect CPU performance. I have covered them in previous articles and will not repeat them here.

The number of cores and threads is the most commonly cited one, so let’s go through a series of examples to see how CPU performance problems can be analyzed in various scenarios.

Are you busy or not? – CPU resource usage

We often run into situations, such as load-testing a service or middleware, or tracking down a sudden latency spike in production, where we need to gauge the pressure on a server or container. We usually look at the CPU first, but how do we tell whether the CPU is busy?

It has reached 100%. Is the CPU busy?

Let’s look at the first example. We run the code and see the thread information from top:

CPU usage is 100%. Is it busy?

That depends on how many threads the CPU can run concurrently, which you can read from /proc/cpuinfo (it is a file, not a command; each logical CPU has its own processor entry).
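
As a minimal sketch of that check, the snippet below counts the processor entries in the text of /proc/cpuinfo. The function name count_logical_cpus is made up for illustration, and the fallback to os.cpu_count() is only there so the script still runs where /proc is missing.

```python
import os


def count_logical_cpus(cpuinfo_text):
    """Count the 'processor' entries in the text of /proc/cpuinfo."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":", 1)[0].strip() == "processor"
    )


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("logical CPUs:", count_logical_cpus(f.read()))
    except OSError:
        # /proc is not available (not Linux): ask the standard library instead.
        print("logical CPUs:", os.cpu_count())
```

On the 12-thread machine used in this article, the count would be expected to be 12, which is why top can report up to 1200% for one process.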

A faster way is to press the number key 1 in the top interface, which shows the usage of every core (physical and logical).

As you can see, only 1 core is running full in user mode. The remaining 11 cores are idle, so the CPU is not busy.
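
The per-core view can also be approximated by hand from two snapshots of /proc/stat. The sketch below is not how top is implemented; it only assumes the documented column order of the cpuN lines (user, nice, system, idle, iowait, irq, softirq, steal), and the helper names and sample numbers are hypothetical.

```python
def parse_proc_stat(text):
    """Map each 'cpuN' line of /proc/stat to (busy_ticks, total_ticks)."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate 'cpu' line and non-CPU lines such as 'intr'.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(v) for v in fields[1:9]]  # user .. steal
        not_busy = ticks[3] + ticks[4]         # idle + iowait
        cores[fields[0]] = (sum(ticks) - not_busy, sum(ticks))
    return cores


def per_core_busy(before, after):
    """Return {core: busy percent} between two /proc/stat snapshots."""
    old, new = parse_proc_stat(before), parse_proc_stat(after)
    usage = {}
    for core, (busy1, total1) in new.items():
        if core not in old:
            continue  # core appeared between the two snapshots
        busy0, total0 = old[core]
        elapsed = total1 - total0
        usage[core] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    return usage


if __name__ == "__main__":
    # Hand-made snapshots: cpu0 only gains user ticks, cpu1 only gains idle ticks.
    before = "cpu0 100 0 0 100 0 0 0 0\ncpu1 0 0 0 200 0 0 0 0\n"
    after = "cpu0 200 0 0 100 0 0 0 0\ncpu1 0 0 0 300 0 0 0 0\n"
    for core, pct in sorted(per_core_busy(before, after).items()):
        print(f"{core}: {pct:.1f}% busy")
```

To use it on a live system, read /proc/stat twice about a second apart and pass both texts to per_core_busy.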

1200%! The CPU is busy this time

Let’s look at the second example.

CPU usage is now up to 1200%. Looking at the usage of each core:

Almost every core is fully used, so this time the CPU really is busy.

So the question is: why is the CPU so busy, and why does this example saturate every core when the first one did not?

Using all of the cores requires multiple threads. How can we verify that?

top -p {pid}    // show information about a single process
H               // inside top, press H to show all threads of that process

That makes the reason clear.

The process creates 12 threads. Each one is scheduled on its own core and keeps that core fully busy, so the entire CPU is busy.

Full load or overload?

The two scenarios above are simple and easy to judge, but we often need to know how heavily loaded a server is, and CPU load is one of the indicators to measure. So how do we determine whether the CPU is overloaded?

In this example CPU usage is again at 1200%, but is it any different from the second example? Is the CPU fully loaded or overloaded?

To judge the CPU load, look at the load average shown in the top interface.

The load average (upper right) consists of three numbers that describe the CPU load over the last 1 minute, 5 minutes, and 15 minutes.

To understand what these numbers mean, consider an analogy:

Cars driving across a bridge.

A single-core CPU (without hyper-threading) can execute only one thread at any moment, just as a one-lane bridge can carry only one car at a time. While one car is crossing, the load average of the bridge is 1.

If another car arrives at the same time, it must wait for the car in front to finish crossing. The load is now 2: one car is crossing and one is waiting, so the bridge is overloaded.

Back to the example: the environment has 12 cores in total, while the load average over the last minute has reached 24. On average, each core has one thread waiting to run in addition to the one it is executing, so the CPU is overloaded.

(By comparison, the load average in the second example is around 12, which means the CPU is fully loaded rather than overloaded.)
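
The comparison between load average and core count can be written down as a small rule of thumb. This is only a sketch: describe_load is a made-up helper, and the 10% tolerance band around one runnable thread per core is an arbitrary assumption, not a standard threshold.

```python
import os


def describe_load(load_average, logical_cpus, tolerance=0.1):
    """Classify a load average relative to the number of logical CPUs.

    The tolerance band around 1.0 per core is an arbitrary choice for this sketch.
    """
    per_core = load_average / logical_cpus
    if per_core > 1.0 + tolerance:
        return "overloaded"
    if per_core >= 1.0 - tolerance:
        return "fully loaded"
    return "has headroom"


if __name__ == "__main__":
    # The two situations from the article: 12 logical CPUs, load 24 and load 12.
    print(24.0, describe_load(24.0, 12))
    print(12.0, describe_load(12.0, 12))
    # Live value, where the platform provides it.
    try:
        one_minute = os.getloadavg()[0]
        print(one_minute, describe_load(one_minute, os.cpu_count() or 1))
    except (AttributeError, OSError):
        pass
```

Keep in mind that on Linux the load average also counts tasks in uninterruptible sleep (usually waiting on I/O), so a high value is a hint to investigate rather than proof of CPU contention.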

What is it busy with? – CPU behavior analysis

The previous examples help us determine whether the CPU is busy, but you are probably wondering what the CPU is busy doing.

What is a fully loaded CPU doing?

Let’s go back to the example where the CPU is fully loaded.

The whole CPU is saturated: we are running 12 threads in total, each keeping one core fully busy, and the load average tells us that the CPU is fully loaded.

So what exactly is keeping the CPU this busy?

To find out what the CPU is doing, we need to find out what each thread is doing. The most common tools for this are pstack and jstack.

You can see a snapshot of the current execution stack for each thread, and all threads are executing calc().

Besides pstack/jstack, you can also attach to the process with the gdb debugger and use info threads to view the execution state of all of its threads.

The result matches that of pstack: every thread is executing calc().

In some relatively simple scenarios, you can use these two methods to quickly see what the CPU is doing and see the results at a glance.

Equally busy, but what is different?

Let’s take a new example.

The thread’s CPU usage is 100%, and only one core (CPU1) is fully busy.

How is this different from the first example in the article?

In this example, of CPU1’s 100% usage, 16.5% is in us and 83.5% is in sy, whereas in the first example all 100% is in us.

To understand this, you first need to understand the fields that top uses to describe the CPU state.

Field  Description
us     User time: percentage of CPU time spent running in user space
sy     System time: percentage of CPU time spent running in kernel space
ni     Nice time: percentage of CPU time spent running user processes with a lowered priority
id     Idle time: percentage of time the CPU is idle
wa     I/O wait time: percentage of time the CPU is idle while waiting for I/O to complete
hi     Hardware interrupt time: percentage of CPU time spent servicing hardware interrupts
si     Software interrupt time: percentage of CPU time spent servicing software interrupts
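
These fields correspond to the counters on each cpu line of /proc/stat, so the split can be recomputed from two samples of one line. The sketch below assumes the documented column order (user, nice, system, idle, iowait, irq, softirq, steal); the helper name and the sample tick counts are hypothetical, chosen to reproduce the 16.5% / 83.5% split of this example. The extra st column is steal time, which the table above does not list.

```python
# /proc/stat column order, labelled with the abbreviations that top uses.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def mode_percentages(before_line, after_line):
    """Split the CPU time between two samples of one /proc/stat line by mode."""
    old = [int(v) for v in before_line.split()[1:9]]
    new = [int(v) for v in after_line.split()[1:9]]
    deltas = [n - o for n, o in zip(new, old)]
    total = sum(deltas)
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}


if __name__ == "__main__":
    # Hypothetical samples for cpu1: 165 user ticks and 835 system ticks elapsed.
    before = "cpu1 1000 0 2000 5000 0 0 0 0"
    after = "cpu1 1165 0 2835 5000 0 0 0 0"
    for name, pct in mode_percentages(before, after).items():
        print(f"{name}: {pct:.1f}%")
```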

This example has a much higher share in sy, which means the CPU spends most of its time in kernel space, where system calls and the management of low-level system resources are handled.

So how do you find out what the CPU is doing in kernel space?

As mentioned earlier, pstack/jstack or gdb can show the call stacks, but the call stack of a real-world thread is usually deep, which makes the system call hard to spot. In this case, you can use strace to trace the system calls directly.

The thread is calling write to write two bytes ("a\0") to a file. File I/O is performed in kernel space, which is why sy is high.

To find the root cause, you often need to identify the file being written so that you can locate the code responsible. How can the information above help?

Look at the arguments of the write call. The last two describe the data being written (the buffer and its length), and the first is the file descriptor (fd). The file behind that fd can be found with lsof or under /proc/{pid}/fd.
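
The lookup from a traced write call to the file behind it can be sketched as follows. The sample line, the fd value 3, and the PID 1234 are hypothetical, and the regular expression only handles the common single-line form of strace output.

```python
import re

# Matches lines such as: write(3, "a\0", 2) = 2
WRITE_CALL = re.compile(r'write\((\d+),\s*(".*")(?:\.\.\.)?,\s*(\d+)\)\s*=\s*(-?\d+)')


def parse_write_call(line):
    """Extract fd, quoted data, byte count and return value from a strace line."""
    match = WRITE_CALL.search(line)
    if match is None:
        return None
    return {
        "fd": int(match.group(1)),
        "data": match.group(2),
        "count": int(match.group(3)),
        "result": int(match.group(4)),
    }


def fd_link_path(pid, fd):
    """Path of the symbolic link that points at the file opened as fd."""
    return f"/proc/{pid}/fd/{fd}"


if __name__ == "__main__":
    call = parse_write_call('write(3, "a\\0", 2)                     = 2')
    print(call)
    print(fd_link_path(1234, call["fd"]))  # 1234 is a placeholder PID
```

On a Linux machine, os.readlink(fd_link_path(pid, fd)) then returns the path of the file being written, which is the same information lsof reports.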

A snapshot is not the whole picture – CPU behavior statistics

The methods above show what the CPU is doing at a given moment, but in real problem analysis you need to observe CPU behavior over a period of time and analyze it statistically to find the root cause.

What is the CPU doing most of the time?

Here is another example in which the CPU is fully loaded.

If you look at what the CPU is doing with pstack/jstack or gdb’s info threads, you see something like this:

Different threads are doing different things, and a snapshot of CPU behavior at one moment cannot tell you which function uses the most resources.

To find the root cause of a saturated CPU, you need statistics on CPU behavior over a period of time. This is where perf comes in: it is a comprehensive Linux performance analysis tool. perf record samples the stacks executing on the CPU, and perf report analyzes the samples statistically.

perf record -F 90 -g -a -p {pid}
perf report

The report shows that calc3 uses the most CPU at 59%, followed by calc2 at 26%, so you can see exactly which functions consume too many resources.
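
The idea behind these percentages is plain sample counting: take many stack snapshots and see which function is on top most often. The sketch below illustrates that idea only; it is not how perf is implemented, and the sample stacks (including the worker and calc1 frames) are invented.

```python
from collections import Counter


def hottest_functions(samples):
    """Return [(function, percent)] ranked by how often it is the leaf frame.

    Each sample is one call stack, listed from the outermost caller inward.
    """
    counts = Counter(stack[-1] for stack in samples if stack)
    total = sum(counts.values())
    return [(func, 100.0 * n / total) for func, n in counts.most_common()]


if __name__ == "__main__":
    # Ten invented samples: calc3 is running in six of them.
    samples = (
        [["main", "worker", "calc3"]] * 6
        + [["main", "worker", "calc2"]] * 3
        + [["main", "worker", "calc1"]]
    )
    for func, pct in hottest_functions(samples):
        print(f"{func}: {pct:.0f}%")
```

A single snapshot is just one of these samples, which is why it can point at the wrong function.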

For more comprehensive statistics on CPU behavior, you can use FlameGraph to generate an on-CPU flame graph.

A flame graph shows not only each function’s share of CPU time but also the call relationships between functions. (The FlameGraph project on GitHub describes in detail how to generate on-CPU flame graphs; how to read one is left for you to look up.)
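
Flame graphs are drawn from "folded" stacks: one line per distinct call stack, with frames joined by semicolons and followed by a sample count. The sketch below produces that format from a list of sampled stacks; the function name fold_stacks and the sample data are hypothetical, and in practice the stackcollapse scripts shipped with FlameGraph do this step.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled call stacks into 'frame;frame;frame count' lines."""
    folded = Counter(";".join(stack) for stack in samples if stack)
    return [f"{stack} {count}" for stack, count in sorted(folded.items())]


if __name__ == "__main__":
    samples = [
        ["main", "worker", "calc3"],
        ["main", "worker", "calc3"],
        ["main", "worker", "calc2"],
    ]
    print("\n".join(fold_stacks(samples)))
```

Because whole stacks are kept rather than only the innermost frame, the width of each box in the graph reflects both a function’s own samples and those of everything it calls.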

Why is the CPU so idle?

Let’s take another new example.

We created a lot of threads, but the CPU usage was low and only a few threads were running.

So the question is, why is the CPU so idle when I’ve created so many threads to execute?

If the threads are not running while the CPU is mostly idle (91.3%), the threads themselves are not runnable: their state is not R, because they are sleeping, blocked, or waiting for a resource. The question is then not what the CPU is doing but why the threads are not being scheduled. What are they waiting for?
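
Thread states can be checked directly under /proc/{pid}/task, where each thread has a stat file whose third field is the state letter (R for running or runnable, S for interruptible sleep, D for uninterruptible sleep). A minimal sketch, with made-up helper names, that tallies the states:

```python
import glob
from collections import Counter


def thread_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/task/<tid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' before reading the state.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_states(stat_lines):
    """Tally thread states, for example Counter({'S': 11, 'R': 1})."""
    return Counter(thread_state(line) for line in stat_lines)


if __name__ == "__main__":
    lines = []
    # Inspect this script's own threads; replace 'self' with a PID of interest.
    for path in glob.glob("/proc/self/task/*/stat"):
        with open(path) as f:
            lines.append(f.read())
    print(dict(count_states(lines)))
```

If most threads are in S or D while the CPU is idle, the next step is to find out what they are blocked on.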

In this case, we need to focus on off-CPU time. We can sample it with the offcputime tool provided by BCC and then render an off-CPU flame graph with FlameGraph.

// Sample off-CPU stacks, user space only, for 7 seconds, for example:
offcputime -U -f -p {pid} 7 > output.folded
// Generate the flame graph
FlameGraph/flamegraph.pl output.folded > output.svg

As you can see, nearly half of the off-CPU time is spent in nanosleep and the other half in lock_wait, waiting for a resource. This matches the source code:

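#include <pthread.h>
#include <unistd.h>

pthread_mutex_t mute = PTHREAD_MUTEX_INITIALIZER;

/* Each thread computes while holding the lock, then sleeps for one second. */
void *thread_func(void *arg) {
    (void)arg;
    while (1) {
        pthread_mutex_lock(&mute);      /* other threads block here (lock_wait) */
        int loop = 10000;
        int x, y;
        long long sum = 0;              /* 64-bit so the running total cannot overflow */
        for (x = 0; x < loop; ++x) {
            for (y = 0; y < loop; ++y) {
                sum += y;
            }
        }
        pthread_mutex_unlock(&mute);
        sleep(1);                       /* appears as nanosleep in the off-CPU graph */
    }
    return NULL;
}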

Conclusion

  • CPU performance

Many parameters affect CPU performance; the most important are core count and frequency. Before troubleshooting performance problems, you need to understand the CPU’s basic hardware parameters and check its current state with a few basic commands.

  • CPU Resource Usage

For CPU resource usage, look at both the overall usage and the usage of each core, and judge the current CPU load from the load average.

  • CPU behavior Analysis

When analyzing CPU behavior, we looked at two cases, busy in user mode and busy in kernel mode, and at the tools for finding the cause of each. In practice there are more cases, such as high soft interrupt usage (for example during a SYN flood, which can be simulated with hping3), high hard interrupt usage, and high steal time (I have run into this once, in a container, where it was caused by overselling host resources). Different cases may need other commands; common ones include lsof and dmesg.

  • CPU Behavior Statistics

Both the busy case and the idle case can be analyzed with flame graphs: on-CPU for the former and off-CPU for the latter. When the CPU is idle, the question is less about what the CPU is waiting for and more about why the threads are blocked or suspended.