Introduction

CPU context switching is a core function that keeps a Linux system running normally. Depending on the scenario, it can be divided into process context switches, thread context switches, and interrupt context switches.

When a system performs too many context switches, the vmstat and pidstat tools, together with the /proc/interrupts file, can help you identify the root cause of the performance problem.

The vmstat and pidstat tools

vmstat is a commonly used tool for analyzing system memory usage, as well as CPU context switches and interrupts.

# Output one set of data every 5 seconds
vmstat 5

Meaning of each column:

  • cs (context switch): the number of context switches per second.
  • in (interrupt): the number of interrupts per second.
  • r (Running or Runnable): the length of the ready queue, that is, the number of processes that are running or waiting for a CPU.
  • b (Blocked): the number of processes in uninterruptible sleep.
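
To see roughly where these numbers come from, here is a minimal sketch. It assumes a Linux system whose /proc/stat exposes the cumulative `ctxt` and `intr` counters, plus the instantaneous `procs_running` and `procs_blocked` values. Sampling the file twice and dividing the difference by the elapsed time gives per-second figures comparable to vmstat's cs and in columns. The helper names (`parse_proc_stat`, `vmstat_like`, `sample`) are hypothetical, this is not vmstat's own code, and the values will only approximately match vmstat's.

```python
import time

# Counters of interest in /proc/stat (Linux):
#   ctxt           total context switches since boot, across all CPUs
#   intr           the first field is the total number of interrupts since boot
#   procs_running  tasks currently runnable (roughly vmstat's r column)
#   procs_blocked  tasks blocked waiting for I/O (roughly vmstat's b column)
KEYS = ("ctxt", "intr", "procs_running", "procs_blocked")


def parse_proc_stat(text):
    """Extract the counters listed in KEYS from the text of /proc/stat."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in KEYS:
            # For "intr", keep only the leading total; per-IRQ counts are ignored.
            counters[fields[0]] = int(fields[1])
    return counters


def vmstat_like(before, after, elapsed):
    """Turn two snapshots into per-second cs/in rates plus the current r and b."""
    return {
        "cs": (after["ctxt"] - before["ctxt"]) / elapsed,
        "in": (after["intr"] - before["intr"]) / elapsed,
        "r": after["procs_running"],
        "b": after["procs_blocked"],
    }


def sample(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, about `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return vmstat_like(before, after, time.monotonic() - start)
```

On a Linux machine, `print(sample(1.0))` prints one vmstat-style sample. The parsing is split from the file access so it can be checked against any captured /proc/stat text.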

vmstat only shows context switches for the system as a whole. To see the details for each process, use pidstat with the -w option, which reports context switches per process:

# Output one set of data every 5 seconds
pidstat -w 5

Meaning of each column:

  • PID: process ID
  • cswch/s: the number of voluntary context switches per second
  • nvcswch/s: the number of non-voluntary (involuntary) context switches per second

Voluntary context switch: a context switch that occurs because a process cannot obtain a resource it needs and gives up the CPU. For example, voluntary context switches occur when a process waits for I/O or when system resources such as memory are insufficient.

Involuntary context switch: a context switch that occurs when the system forcibly reschedules a process, for example because its time slice has expired. Involuntary context switches become frequent when a large number of processes compete for the CPU.
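
The two kinds of switches can also be read per process straight from procfs. This is a minimal sketch, assuming a Linux kernel that exposes the cumulative `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches` counters in /proc/<pid>/status. It samples them twice and converts the difference into per-second values that mirror pidstat's cswch/s and nvcswch/s columns. The helper names are hypothetical and this is not pidstat's actual code.

```python
import time

FIELDS = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")


def parse_status(text):
    """Pull the cumulative switch counters out of /proc/<pid>/status text."""
    counts = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name in FIELDS:
            counts[name] = int(value)  # int() tolerates the leading tab
    return counts


def per_second(before, after, elapsed):
    """Convert two counter snapshots into pidstat-style per-second rates."""
    return {
        "cswch/s": (after[FIELDS[0]] - before[FIELDS[0]]) / elapsed,
        "nvcswch/s": (after[FIELDS[1]] - before[FIELDS[1]]) / elapsed,
    }


def sample_pid(pid, interval=1.0):
    """Sample one process twice, about `interval` seconds apart (Linux only)."""
    path = f"/proc/{pid}/status"
    with open(path) as f:
        before = parse_status(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        after = parse_status(f.read())
    return per_second(before, after, time.monotonic() - start)
```

For example, `sample_pid(os.getpid())` (after `import os`) samples the current Python process. A process that mostly sleeps or waits on I/O tends to accumulate voluntary switches, while a CPU-bound process on a busy machine accumulates involuntary ones.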

Case study

Preparation

sysbench is a multithreaded benchmarking tool, generally used to evaluate database load under different system parameters. In this case study, we use it as the abnormal process to simulate excessive context switching.

# Install sysbench and sysstat
apt install sysbench sysstat
# or
yum install sysbench sysstat

Scenario 1: Simulate a multithreaded scheduling bottleneck

Checking context switches

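  1. In the first terminal, run sysbench with 10 threads for 5 minutes (300 seconds): sysbench --threads=10 --max-time=300 threads run (newer sysbench releases deprecate --max-time in favor of --time).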

  2. In a second terminal, run vmstat 1 to observe context switches. It outputs one set of data every second; press Ctrl+C to stop it.

    The cs column shows that the number of context switches jumped sharply, to about 1.42 million per second. Meanwhile, the other indicators show:

    r column: the ready queue length has reached 6 to 8, far more than the system's 2 CPUs, so there is bound to be heavy competition for the CPU.

    us (user) and sy (system) columns: their combined CPU usage rose to nearly 100%, with system CPU usage (the sy column) alone reaching 70%. This indicates that the CPU is mainly being consumed by the kernel.

    in column: the number of interrupts also rose to around 60,000, indicating that interrupt handling is a potential problem as well.

    Taken together, these indicators show that the system's ready queue is too long. Too many processes are running or waiting for the CPU, which causes a large number of context switches, which in turn drives up system CPU usage.

  3. In a third terminal, run pidstat -w -u 1 to find out which process is causing these problems. The -w option outputs per-process context-switch metrics, the -u option outputs CPU usage metrics, and 1 means one set of data is output every second.

    The pidstat output shows that the rise in CPU usage is indeed caused by sysbench, whose CPU usage reaches 100%. The context switches, however, appear to come from other processes. pidstat itself has the highest involuntary context-switch rate (nvcswch/s), while the kworker kernel threads and sshd have the highest voluntary context-switch rates.

  4. Run pidstat -wt 1 to output thread-level context-switch metrics. Adding -t to -w reports context switches for each thread. The per-process switch counts obtained in step 3 are far smaller than the 1.42 million reported by vmstat. This is because the basic unit of scheduling in Linux is the thread, not the process, and what sysbench simulates is thread scheduling. Running man pidstat confirms that pidstat displays process-level metrics by default and outputs per-thread metrics only when -t is added.
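
Step 4 can also be checked by hand. To our understanding, the counters in /proc/<pid>/status describe only the process's main thread, whose thread ID equals the PID. Each thread's own counters live in /proc/<pid>/task/<tid>/status. The sketch below sums the cumulative counters over all threads of one process so the two views can be compared. The helper names are hypothetical, and the numbers are totals since each thread started, not per-second rates.

```python
import os


def counters_from_status(text):
    """Return (voluntary, nonvoluntary) switch counts from a status file's text."""
    voluntary = nonvoluntary = 0
    for line in text.splitlines():
        if line.startswith("voluntary_ctxt_switches:"):
            voluntary = int(line.split(":", 1)[1])
        elif line.startswith("nonvoluntary_ctxt_switches:"):
            nonvoluntary = int(line.split(":", 1)[1])
    return voluntary, nonvoluntary


def thread_switch_totals(pid):
    """Map each thread ID of `pid` to its cumulative counters (Linux only)."""
    task_dir = f"/proc/{pid}/task"
    per_thread = {}
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "status")) as f:
                per_thread[int(tid)] = counters_from_status(f.read())
        except FileNotFoundError:
            continue  # the thread exited between listdir() and open()
    return per_thread


def summarize(per_thread, main_tid):
    """Compare the main thread's counters with the sum over all threads."""
    return {
        "main_thread": per_thread.get(main_tid, (0, 0)),
        "all_threads": (
            sum(v for v, _ in per_thread.values()),
            sum(n for _, n in per_thread.values()),
        ),
    }
```

While sysbench is running, `summarize(thread_switch_totals(pid), pid)` (with `pid` taken from pidstat's PID column) would be expected to show the all-thread totals far exceeding the main thread's own counts.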

Check the number and type of interrupts

How do you know what type of interrupt occurred?

Read it from the /proc/interrupts file. /proc is a virtual file system that Linux uses for communication between kernel space and user space. /proc/interrupts is part of this mechanism and provides read-only statistics on interrupt usage.

Observe interrupts by running watch -d cat /proc/interrupts (the -d option highlights the areas that change between refreshes).

After observing for a while, you will find that the rescheduling interrupt (RES) count changes the fastest.

Rescheduling interrupt (RES): wakes up an idle CPU to schedule a new task to run. The scheduler uses this mechanism on symmetric multiprocessing (SMP) systems to distribute tasks across CPUs, and it is commonly called an inter-processor interrupt (IPI).
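
watch -d only highlights what changed on screen. To rank interrupt sources instead, here is a minimal sketch. It assumes the usual /proc/interrupts layout: a header row of CPU columns, then one row per interrupt source with a label, one count per CPU, and a free-text description. The hypothetical helpers total each row across CPUs and sort the sources by how much they grew between two snapshots.

```python
def parse_interrupts(text):
    """Total each /proc/interrupts row across all CPU columns."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        count = 0
        # Only the first `ncpu` tokens can be per-CPU counts; rows such as
        # ERR carry a single number, so stop at the first non-numeric token.
        for token in rest.split()[:ncpu]:
            if not token.isdigit():
                break
            count += int(token)
        totals[label.strip()] = count
    return totals


def fastest_growing(before, after, top=3):
    """Rank interrupt sources by how much their totals grew between snapshots."""
    deltas = {name: total - before.get(name, 0) for name, total in after.items()}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:top]


def snapshot(path="/proc/interrupts"):
    """Read and parse the current counters (Linux only)."""
    with open(path) as f:
        return parse_interrupts(f.read())
```

On Linux, `before = snapshot(); time.sleep(1); after = snapshot(); print(fastest_growing(before, after))` (after `import time`) prints the busiest sources over one second. In the sysbench scenario above, RES would be expected near the top of the list.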

How many context switches per second are normal?

There is no fixed number; it depends on the CPU performance of the system itself. As a rule of thumb, when the number of context switches exceeds 10,000 per second, or increases by an order of magnitude, a performance problem has likely occurred.

Beyond the raw count, analyze the problem according to the type of context switch. For example:

  • If voluntary context switches increase, processes are waiting for resources, which may point to other problems such as I/O.
  • If involuntary context switches increase, processes are being forcibly rescheduled. In other words, they are competing for the CPU, which indicates that the CPU really is the bottleneck.
  • If the number of interrupts increases, the CPU is being occupied by interrupt handlers. Check the /proc/interrupts file to analyze the specific interrupt types.
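
These rules of thumb can be written down as a small triage helper. This is a sketch only: the 10,000-per-second threshold and the order-of-magnitude jump come from the text above. Deciding which switch type "dominates" by comparing cswch/s with nvcswch/s, and treating a tenfold rise over a known baseline as an increase in interrupts, are our own simplifying assumptions. All names are hypothetical.

```python
# Hypothetical triage helpers encoding the rules of thumb above.
VOLUNTARY = "voluntary switches dominate: processes are waiting for resources such as I/O"
INVOLUNTARY = "involuntary switches dominate: processes are competing for the CPU"
INTERRUPTS = "interrupts increased: inspect /proc/interrupts for the interrupt type"


def looks_abnormal(cs_per_sec, baseline_cs_per_sec):
    """More than 10,000 switches/s, or a tenfold jump over a known baseline."""
    if cs_per_sec > 10_000:
        return True
    return baseline_cs_per_sec > 0 and cs_per_sec >= 10 * baseline_cs_per_sec


def likely_causes(cswch, nvcswch, intr, baseline_intr):
    """Map cswch/s, nvcswch/s and an interrupt rate to the hints in the text.

    Assumption: the larger of the two switch rates is the one that matters,
    and interrupts count as "increased" at ten times their baseline.
    """
    hints = []
    if cswch > nvcswch:
        hints.append(VOLUNTARY)
    elif nvcswch > cswch:
        hints.append(INVOLUNTARY)
    if baseline_intr > 0 and intr >= 10 * baseline_intr:
        hints.append(INTERRUPTS)
    return hints
```

The inputs could come from pidstat -w and vmstat output. The thresholds are starting points for investigation, not proof of a problem.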
