
One, Foreword

In this article, we will work through a few case studies to deepen our understanding of the load average on Linux systems.

Two, Preparation

1. Test environment

  • Operating system: CentOS 7.2, dual-core
  • Monitoring tools: iotop, htop, top, uptime, and sysstat
  • Stress testing tool: stress
# Get the number of logical CPUs (including hyperthreaded logical CPUs)
[zzw@7dgroup2 ~]$ lscpu -p | egrep -v '^#' | wc -l
2
# Get the number of physical CPU cores
[zzw@7dgroup2 ~]$ lscpu -p | egrep -v '^#' | sort -u -t, -k 2,4 | wc -l

2. Tool introduction

  • iotop is a top-like tool for monitoring per-process disk I/O usage.
  • htop is an interactive process viewer and manager for Linux, often used as a replacement for top. Unlike top, which focuses on the processes consuming the most resources, htop lists all processes and color-codes CPU, swap, and memory status.
  • stress is a workload generator for POSIX systems that can impose CPU, memory, I/O, and disk load.
  • sysstat is a collection of common Linux performance tools (including mpstat and pidstat) for monitoring and analyzing system performance.

3. Tool installation

# install stress
sudo yum install -y epel-release
sudo yum install -y stress

# install iotop
sudo yum install -y iotop

# install htop
sudo yum install -y htop

# install sysstat
sudo yum install -y sysstat
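
As a quick sanity check (my addition, assuming the package names used in the yum commands above), you can confirm that the tools are installed:

# confirm the packages are present
rpm -q stress iotop htop sysstat
# sysstat provides mpstat and pidstat, which are used in the case studies below
mpstat -V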

4. Other preparation

We open five terminals, all logged in to the same Linux machine, and give each one a role (the corresponding commands are sketched after the list).

  • Terminal 1: stress, to generate the load for each scenario
  • Terminal 2: top, to monitor process status and CPU usage
  • Terminal 3: iotop, to monitor per-process I/O usage
  • Terminal 4: htop, to monitor process details with color-coded load
  • Terminal 5: mpstat, to monitor per-CPU statistics such as iowait
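
A minimal sketch of the monitoring commands behind terminals 2 to 5 (the flags and the 5-second interval are assumptions; the exact invocations are not listed in the original):

top                 # terminal 2: per-process CPU usage and load average
sudo iotop -o       # terminal 3: only processes that are actually doing I/O
htop                # terminal 4: color-coded per-CPU and per-process view
mpstat -P ALL 5     # terminal 5: per-CPU statistics, including %iowait, every 5 seconds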

Three, Case analysis

With all of the above preparation in place, let's first use the uptime command to look at the current load average:

[zzw@7dgroup2 ~]$ uptime
 20:12:34 up 148 days,  3:09,  7 users,  load average: 0.06, 0.10, 0.13
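
The three values are the average load over the last 1, 5, and 15 minutes. To watch them change while the tests run, one option (not part of the original terminal setup) is:

# highlight changes in the uptime output every 2 seconds
watch -d uptime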

Scenario 1: CPU-intensive processes

First, we run the stress command on terminal 1 to simulate a 100% CPU utilization scenario.

# Simulate a scenario with 100% CPU utilization
[zzw@7dgroup2 ~]$ stress --cpu 1 --timeout 600
stress: info: [2395] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd

Viewing the current CPU usage and load average on terminal 2, we can see that the 1-minute load average slowly climbs to 1.11 and that user CPU usage on CPU1 reaches 72%.

On terminal 5, we find that system iowait is almost 0, which shows that the rise in the load average is caused by increased user CPU usage. The culprit is the stress process with PID 9717.
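
The checks on terminals 5 and 2 roughly correspond to the commands below (a sketch; the intervals are assumptions):

# per-CPU breakdown: %usr rises while %iowait stays near 0
mpstat -P ALL 5 1
# per-process CPU usage, to identify which process drives the user time
pidstat -u 5 1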

On terminal 4, htop gives an intuitive view of the same load: the CPU bar is green and high, and green in htop's default color scheme represents user CPU time.

Scenario 2: I/O-intensive process

Next, we simulate I/O pressure by running the stress command on terminal 1, which executes sync continuously:

# Simulate I/O pressure and sync continuously
[zzw@7dgroup2 ~]$ stress -i 1 --timeout 600
stress: info: [11041] dispatching hogs: 0 cpu, 1 io, 0 vm, 0 hdd

Viewing the current CPU usage and load average on terminal 2, we can see that the 1-minute load average slowly climbs to 1.25, with system CPU usage on CPU0 at 68%.

On terminal 5, we see iowait on both CPUs, which shows that the rise in the load average is driven by the increase in iowait.

So which process is responsible? On terminal 3, iotop shows two processes performing a large number of I/O writes. Combined with the S column (process state) in top, where R means runnable and D means uninterruptible sleep (usually waiting for I/O), we can identify the stress process with PID 19241 as the cause.
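
The I/O-side checks roughly correspond to the commands below (flags and intervals are assumptions):

# show only processes that are currently doing I/O, with per-process write rates
sudo iotop -o
# per-process disk I/O statistics from sysstat
pidstat -d 5 1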

For comparison, here is what happens when we stop the stress process.

On terminal 4, htop again gives an intuitive view of the load: this time the CPU bar is red and high, and red in htop's default color scheme represents kernel (system) CPU time.

Scenario 3: A large number of processes

When the number of runnable processes exceeds what the CPUs can handle, processes start having to wait for CPU time. Here we again use stress, this time spawning four CPU-bound processes.

[zzw@7dgroup2 ~]$ stress -c 4 --timeout 600
stress: info: [13473] dispatching hogs: 4 cpu, 0 io, 0 vm, 0 hdd

Since the system has only two CPUs, far fewer than the four runnable processes, it becomes heavily overloaded, with the load average reaching 4.30.

Looking further at the run queue length (the number of processes waiting to run), we can see the stress processes competing for the two CPUs, which keeps the run queue long. The demand exceeds the CPUs' computing capacity and eventually the system is overloaded.

The run queue can be observed with the vmstat command, as sketched below.
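
A minimal sketch of reading the run queue with vmstat (the 5-second interval is an assumption):

# report CPU and memory statistics every 5 seconds; the "r" column is the
# run queue length, and with four CPU-bound workers on two CPUs it stays
# well above the number of CPUs
vmstat 5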

On terminal 4, htop again gives an intuitive view: the CPU bars are green and high, meaning user CPU time dominates.

Four, Summary

The load average provides a quick way to gauge overall system load, but it does not correspond directly to CPU usage. For example:

  • CPU-intensive processes: heavy CPU use raises the load average, and in this case the two move together.
  • I/O-intensive processes: waiting for I/O also raises the load average, but CPU usage is not necessarily high.
  • A large number of processes waiting for the CPU results in both a high load average and high CPU utilization.

In addition, htop color-codes different types of load (the color scheme can be customized with F2). In the CPU-intensive scenarios the CPU bars are mostly green (user time), while in the I/O scenario they are mostly red (kernel time spent handling the sync calls).

Finally, here is a classic line of thinking for performance analysis:

Operating system (CPU/IO/Mem/Net) -> process -> thread -> stack -> code. If both CPU and I/O are high, look at I/O first.