Brief introduction: Yang Yong, an engineer in the Alibaba Cloud system group, presents a systematic approach to analyzing high load on Linux, drawn from troubleshooting a variety of problems in production.

The topic of how to troubleshoot high Linux load is a platitude, but most articles focus on only a few specific points and lack an introduction to the overall troubleshooting approach. Teaching a man to fish is better than giving him a fish. This article attempts to establish a method and routine to help readers gain a more comprehensive understanding of troubleshooting high load.

Start by clearing up misunderstandings

Loads without baselines are unreliable loads

From their first day of Unix/Linux system administration, many people are exposed to the monitoring metric System Load Average. However, not everyone knows what this metric really means. The following misconceptions are common:

  • A high Load indicates a high CPU load…… Traditional Unix and Linux differ in design here. On Unix, a high Load means there are many runnable processes, but on Linux that is not the whole story. On Linux, the Load can rise in two ways:

  • the number of processes in the R (runnable) state increases;

  • the number of processes in the D (uninterruptible sleep) state increases.

  • If the Loadavg value is greater than some fixed number, there must be a problem…… Loadavg values are relative: they are influenced by the number of CPUs and I/O devices, and even by certain software-defined virtual resources. Whether a Load is high should be judged against a historical baseline, and loads should not be blindly compared across systems (see the sketch after this list).

  • A high Load means the system is busy…… A high Load does not necessarily mean a busy system. When the Load is high because of CPU, the CPU is indeed busy. But when the Load is high because of I/O, the disks may be busy while the CPUs are idle, which shows up as high iowait; note that iowait is essentially a special kind of CPU idle state. The Load can even be high while both the CPUs and the disk peripherals are idle, for example under lock contention in the kernel; in that case neither CPU utilization nor iowait is high, but idle is.
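For instance, the raw Load numbers are easy to read; it is the baseline that requires history. A minimal sketch, using only standard tools:

```bash
# 1-, 5- and 15-minute load averages, plus runnable/total tasks and last PID
cat /proc/loadavg

# The same three numbers, as most people first meet them
uptime

# Number of online CPUs -- a common, CPU-only reference point for comparison
nproc
```

On a purely CPU-bound workload, a Load persistently above the nproc value suggests saturation; once D-state tasks are involved, only a historical baseline of the same system is meaningful.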

Brendan Gregg, in his recent blog post [Linux Load Averages: Solving the Mystery](www.brendangregg.com/blog/2017-0…), discusses the differences between the Unix and Linux Load Average, and goes back 24 years into Linux community discussions to find out why Linux changed the Unix definition of Load Average back then. According to the article, it is precisely the D-state thread accounting introduced by Linux that makes a high Load ambiguous: tasks enter the D state for far too many reasons for it to be just I/O load or lock contention. Because of this ambiguity, Load values are difficult to compare across systems and across application types, and every Load judgment should be based on a historical baseline. This WeChat public account has also published a related article on Linux Load Average.

How to troubleshoot the problem of high Load

As mentioned earlier, because Load is a vaguely defined metric on Linux, troubleshooting a loadavg increase is a complex process. The basic idea is to branch into different troubleshooting flows depending on whether the Load change is driven by an increase in R-state tasks or by an increase in D-state tasks.

Here is the general routine for investigating a Load increase, for reference only:

On Linux, the number of R-state processes can be obtained by reading /proc/stat (the procs_running field); for the number of D-state tasks, the ps command is probably the most direct and convenient way. In addition, the procs_blocked field in /proc/stat gives the number of processes currently waiting for disk I/O:
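A minimal sketch of both checks (field names as found in /proc/stat):

```bash
# procs_running: tasks in R state; procs_blocked: tasks waiting for disk I/O
grep -E '^procs_(running|blocked)' /proc/stat

# List the tasks currently in D state (uninterruptible sleep)
ps -e -o pid,state,wchan:32,comm | awk 'NR == 1 || $2 == "D"'

# Watch the trend: a sustained rise in either counter mirrors a rising Load
watch -d -n 1 "grep -E '^procs_(running|blocked)' /proc/stat"
```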

By simply distinguishing between an increase in R-state tasks and an increase in D-state tasks, we can branch into the different troubleshooting flows. Below, we briefly walk through each branch of this big picture.

R-state tasks increase

This is what is commonly called a high CPU load. The main idea in troubleshooting and locating such problems is to analyze where the system, the containers, and the processes spend their run time, and to find the hot paths on the CPU, that is, to determine which sections of code the CPU is spending its time in.

The split of CPU time between user and sys often helps to quickly determine whether the problem is related to user-mode processes or to the kernel. In addition, the length of the CPU run queues and the scheduling wait time, as well as the number of involuntary context switches, can give an overview of the problem scenario.
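A quick first pass might look like this (a sketch; mpstat and pidstat come from the sysstat package):

```bash
# Per-CPU split of user/sys/iowait/idle time
mpstat -P ALL 1

# "r" column: run queue length; "cs": context switches per second
vmstat 1

# Per-process voluntary (cswch/s) and involuntary (nvcswch/s) context
# switches; a high involuntary rate hints at CPU contention
pidstat -w 1
```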

Therefore, dynamic tracing tools such as perf, SystemTap, and ftrace are often used to correlate the problem scenario with the related code.
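For example, perf alone can sample the whole system and report the hot paths (a minimal sketch):

```bash
# Sample all CPUs with call graphs for 10 seconds, then browse the hot paths
perf record -a -g -- sleep 10
perf report

# Or watch the hottest functions live
perf top -g
```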

Once the run-time analysis has been correlated with code paths, invalid run time in the code, such as spinning on user-mode or kernel-mode spin locks, is also a primary concern of the analysis.

Of course, if the CPU is running meaningful and efficient code, the only thing left to consider is whether the workload is really too heavy for the machine.

D-state tasks increase

By the design of the Linux kernel, a D-state task is essentially a task in an active TASK_UNINTERRUPTIBLE sleep, so the possible causes are numerous. However, because the Linux kernel gives CPU idle time a special definition when the sleep was caused by the I/O stack, namely iowait, iowait has become an important reference for deciding whether a high Load in the D-state category is caused by I/O.

Of course, as mentioned earlier, the trend of procs_blocked in /proc/stat can also serve as a good reference for judging whether a high Load is caused by iowait.

High CPU iowait

A common misconception is that a high CPU iowait means the CPU is busy doing I/O. On the contrary, when iowait is high the CPU is actually idle, with no runnable task to execute; the idle time is accounted as iowait rather than idle only because disk I/O has been issued and is still outstanding.

In fact, using the perf probe command, we can clearly see that a CPU in the iowait state is actually running the idle thread, whose PID is 0:
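A sketch of such a probe; the probe point account_idle_time is my assumption here, and whether it can be probed depends on the kernel version and on the function not being inlined:

```bash
# Place a kprobe on the (assumed) idle-time accounting function
perf probe --add account_idle_time

# Record the probe firing on all CPUs for one second
perf record -e probe:account_idle_time -aR sleep 1

# The samples are attributed to "swapper", PID 0 -- the per-CPU idle thread
perf script

# Clean up the probe
perf probe --del account_idle_time
```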

How the related idle accounting decides whether to count a CPU tick as iowait or as idle is shown in the following code:
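A reconstruction from the upstream kernel source of that era (kernel/sched/cputime.c; details vary across kernel versions):

```c
/*
 * kernel/sched/cputime.c (reconstructed; version-dependent).
 * An idle tick is charged to iowait instead of idle whenever this
 * CPU's runqueue still has tasks sleeping inside io_schedule().
 */
void account_idle_time(cputime_t cputime)
{
	u64 *cpustat = kcpustat_this_cpu->cpustat;
	struct rq *rq = this_rq();

	if (atomic_read(&rq->nr_iowait) > 0)
		cpustat[CPUTIME_IOWAIT] += (__force u64) cputime;
	else
		cpustat[CPUTIME_IDLE] += (__force u64) cputime;
}
```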

The Linux I/O stack and file system code call io_schedule to wait for disk I/O to complete. Before sleeping, io_schedule increments the atomic variable rq->nr_iowait, which is the key counter that makes idle CPU time be accounted as iowait. Note that callers of io_schedule usually set the task state to TASK_UNINTERRUPTIBLE explicitly before calling it:
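Again reconstructed from the upstream kernel source (kernel/sched/core.c; the exact form varies by version):

```c
/*
 * kernel/sched/core.c (reconstructed; version-dependent).
 * The caller has already set the task state, typically
 * TASK_UNINTERRUPTIBLE, before calling io_schedule().
 */
void __sched io_schedule(void)
{
	struct rq *rq = raw_rq();

	delayacct_blkio_start();
	atomic_inc(&rq->nr_iowait);	/* idle time now counts as iowait */
	blk_flush_plug(current);
	current->in_iowait = 1;
	schedule();			/* sleep until the I/O completes */
	current->in_iowait = 0;
	atomic_dec(&rq->nr_iowait);
	delayacct_blkio_end();
}
```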

High CPU idle

As mentioned earlier, a significant number of kernel blocking points, that is, TASK_UNINTERRUPTIBLE sleeps, are actually unrelated to waiting for disk I/O: for example, lock contention in the kernel, sleeping inside direct memory page reclaim, or kernel code paths that actively block waiting for some resource.
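One quick way to see where such tasks are blocked (a sketch; requires root, and /proc/&lt;pid&gt;/stack may be restricted on some kernels):

```bash
# Print the kernel stack of every task currently in D state
for pid in $(ps -e -o pid= -o state= | awk '$2 == "D" {print $1}'); do
    echo "== PID $pid =="
    cat /proc/$pid/stack 2>/dev/null
done

# Alternatively, dump all blocked-task stacks to the kernel log via sysrq
echo w > /proc/sysrq-trigger
dmesg | tail -n 50
```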

In the blog post mentioned above, [Linux Load Averages: Solving the Mystery](www.brendangregg.com/blog/2017-0…), Brendan Gregg uses the perf command to generate a flame graph of TASK_UNINTERRUPTIBLE sleeps, which nicely illustrates how varied the causes of high CPU idle can be. I will not repeat it in this article.

Therefore, analyzing high CPU idle essentially means analyzing which kernel code paths are causing the blocking, and why. In general, we can use perf inject to process the context-switch events collected by perf record, correlate the kernel code paths on which a process is switched out and switched back in, and generate a so-called Off-CPU flame graph.
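A sketch of that recipe, following Brendan Gregg's published off-CPU examples (the sched tracepoints, sched_stat_sleep in particular, must be enabled in your kernel, and recording them system-wide can be heavyweight):

```bash
# Record scheduler switch and sleep-statistics events with call graphs
perf record -e sched:sched_stat_sleep -e sched:sched_switch \
            -e sched:sched_process_exit -a -g -o perf.data.raw -- sleep 10

# Merge the sleep-stat events into the switch events (-s = --sched-stat)
perf inject -v -s -i perf.data.raw -o perf.data

# The resulting off-CPU stacks can then be folded into a flame graph
# with the FlameGraph tools (stackcollapse-perf.pl | flamegraph.pl)
perf script | head
```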

Of course, for relatively simple problems such as lock contention, the Off-CPU flame graph is often enough to locate the problem in one step. But for more complex latency problems involving D-state blocking, the Off-CPU flame graph may only give us a starting point for the investigation.

For example, if we see in the Off-CPU flame graph that the main sleep time is spent waiting in epoll_wait, then we should continue by examining network stack latency, namely the Net Delay part of the big picture in this article.

At this point, you may have realized that performance analysis of high CPU iowait and idle is essentially latency analysis. In the big picture, following the general lines along which the kernel manages resources, it is broken down into six kinds of latency analysis:

  • CPU latency
  • Memory latency
  • File system latency
  • I/O stack latency
  • Network stack latency
  • Lock and synchronization primitive contention

A TASK_UNINTERRUPTIBLE sleep caused by any of these code paths is a target of the analysis!

End with a question

It is hard for this article to go into all the details, because by now you may have realized that analyzing a high Load is really a full load analysis of the system. No wonder it is called System Load. This is also why Load analysis is difficult to cover in a single article.

This article also serves as the opening chapter of our series on Linux performance analysis. Stay tuned for articles on each of the six kinds of latency mentioned above…

About the author

Oliver Yang (Yang Yong) is a Linux kernel engineer in the Alibaba Cloud system group. He previously worked at EMC and at Sun's China Engineering Research Institute on storage systems and Solaris kernel development.


This article is original content from Alibaba Cloud and may not be reproduced without permission.