One afternoon a few years ago, the company's developers were quietly tapping away at their code when several phones started beeping at once. My first thought was that payday had arrived, and I was delighted. It was an alarm message instead.
Incident review
The alerts read "Thread count exceeds threshold" and "CPU idle rate too low." Opening the monitoring system showed that all 20 nodes of the order service were down and the service was not responding.
The thread count on each SpringBoot node had hit its maximum, yet JVM heap memory and GC showed nothing unusual. The CPU idle rate was almost 0%, but CPU usage was not high, and IO wait was very high. Below is a screenshot of the CPU status from the top command:
From the screenshot above, we can see:
- The CPU idle rate is 0% (id, in the red box)
- CPU usage is 22% (us 13% + sy 9%, where us is user-space CPU usage and sy is kernel/system CPU usage)
- The percentage of CPU time spent waiting for disk IO is 76.6% (wa, in the red box)
By this point it was clear the problem lay in IO wait. Using the monitoring system and the jstack command, we eventually traced it to file writes: heavy disk reads and writes had exhausted the JVM's thread resources (note that this does not mean the system ran out of CPU). As a result, the order service could not respond to requests from upstream services.
IO: things you may not know
Since IO has such an impact on system performance and stability, let’s delve into it.
An IO (Input/Output) operation is simply the transfer of data in and out. Programmers pay the most attention to disk IO and network IO, because these two are the most directly and closely related to applications.
Disk IO: Disk input/output, such as data transfer between disk and memory.
Network IO: Data transfer across a network between different systems, such as remote interface calls between two systems.
The following diagram shows a specific scenario in which IO occurs in an application:
As the figure shows, handling a single request may involve many IO operations:
- Network IO occurs when the page sends a request to the server
- Network IO occurs for remote calls between services
- Network IO occurs when an application accesses a database
- Disk IO occurs when the database queries or writes data
Relationship between IO and CPU
Many engineers interpret a 0% CPU idle rate as a sign that the CPU is fully occupied and has no capacity for other tasks. Is that true?
Let's first look at how a computer manages disk IO. In the early days of computing, data transfer between disk and memory was controlled by the CPU: data read from disk into memory had to be relayed through the CPU, which stayed occupied for the entire transfer. Since disk read/write speeds are nowhere near CPU speeds, data transfer consumed a large amount of CPU resources and wasted them badly.
Later, an IO controller was introduced to manage disk IO. Before data is transferred between disk and memory, the CPU sends an instruction to the IO controller, and the controller notifies the CPU once the transfer is complete. The CPU is thus no longer involved in moving data from disk to memory and is free to do other work, greatly improving CPU utilization. This mechanism is called DMA (Direct Memory Access), and most computers today use it for data transfer.
From the above we know that IO data transfer does not occupy the CPU. When an application process or thread is waiting on IO, the CPU promptly releases its time slice and allocates it to other processes or threads, so CPU resources are fully utilized. So even when the CPU idle rate (id) is 0% because most CPU time is spent in IO wait (wa), the CPU is not actually exhausted: if a new task arrives, the CPU can still execute it. See the diagram below:
Because IO in DMA mode takes no CPU, the CPU IO wait (wa in the figure above) is effectively part of the CPU's idle capacity. When reading top, we should therefore watch not only the CPU idle rate (id) but also CPU usage (us, sy) and IO wait (wa). Note that wa reflects only disk IO wait, not network IO wait.
The relationship between thread state and IO in Java
When we inspect Java thread states with jstack, we see a variety of them. When an IO wait occurs, such as during a remote call, what state is the thread in: Blocked or Waiting?
The answer, perhaps surprisingly, is Runnable! In fact, Java's Runnable state covers not only the operating-system Running state, but also the Ready and IO Wait states.
As shown in the figure above, the Javadoc on the Runnable state makes this explicit: a thread that is executing at the JVM level may be waiting for other resources at the operating-system level. If the awaited resource is the CPU, the thread is in the Ready state at the OS level, waiting to be scheduled. If the awaited resource is IO, such as a disk or a network card, the thread is in the IO Wait state at the OS level.
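This is easy to verify. Below is a minimal sketch (the class name and timing are my own illustration): a thread blocked in a socket read() is reported as RUNNABLE by the JVM even though, at the operating-system level, it is waiting on network IO.

```java
import java.net.ServerSocket;
import java.net.Socket;

public class RunnableStateDemo {
    public static void main(String[] args) throws Exception {
        // A server socket that never accepts or sends data, so read() below blocks forever
        ServerSocket server = new ServerSocket(0);
        Thread reader = new Thread(() -> {
            try (Socket socket = new Socket("localhost", server.getLocalPort())) {
                socket.getInputStream().read(); // blocks waiting for network IO
            } catch (Exception ignored) {
            }
        });
        reader.setDaemon(true); // let the JVM exit even though the thread stays blocked
        reader.start();
        Thread.sleep(500); // give the thread time to block inside read()
        System.out.println(reader.getState()); // prints RUNNABLE, not WAITING or BLOCKED
    }
}
```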
One might ask why Java threads don’t have a dedicated Running state.
Today's mainstream operating systems schedule tasks with time slices. A time slice is very short, typically tens of milliseconds, which means a thread can run on the CPU for only tens of milliseconds at a time before being scheduled out into the Ready state, waiting to run again. Threads therefore switch between Ready and Running very rapidly. JVM thread states exist primarily for monitoring, for humans to read; by the time you see a thread reported as Running, its state has already switched many times. A dedicated Running state would convey little useful information.
In-depth understanding of the network IO model
The five Linux network IO models are: synchronous blocking IO, synchronous non-blocking IO, multiplexed IO, signal-driven IO, and asynchronous IO.
Before we begin
To better understand the network IO model, let’s look at a few basic concepts.
Socket: A socket can be understood as a communication endpoint between two applications during network communication. One application writes data to its socket, and that data travels through the network adapter and across the network to the socket of the other application. Remote communication over protocols such as HTTP and TCP is, at the bottom layer, implemented on top of sockets, and all five network IO models achieve network communication through sockets.
Blocking and non-blocking: Blocking means a call does not return until all of its processing is complete. Non-blocking is the opposite: the call returns an answer immediately, before the processing has finished.
Kernel space vs. user space: In Linux, applications are far less stable than operating-system code. To protect the stability of the operating system, Linux divides memory into kernel space and user space: kernel space runs the operating system and drivers, while user space runs applications. This isolation prevents applications from compromising the operating system itself and is an important reason for Linux's stability. All operations on system resources take place in kernel space, such as reading and writing disk files, allocating and reclaiming memory, and accessing network interfaces. Consequently, during a network IO read, data is not read directly from the network adapter into the application buffer in user space; it is first copied from the network adapter into a kernel-space buffer, and then from the kernel into the application buffer in user space. Writing is the reverse: data is copied from the application buffer in user space into the kernel buffer, and then sent out from the kernel buffer through the network adapter.
Synchronous blocking IO
Let's start with traditional blocking IO. In Linux, all sockets are blocking by default. When a user thread calls the system function read(), the kernel begins preparing the data (receiving it from the network). Once the data is ready, the kernel copies it into the application buffer in user space, and only then does the call return. The thread is blocked for the entire process, from issuing the read request until the kernel finishes copying the data into the application. To improve throughput, you can assign one thread per connection; with many connections, however, this requires a huge number of threads, at significant cost. That is the biggest drawback of traditional blocking IO.
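To make the thread-per-connection pattern concrete, here is a minimal sketch of a blocking echo server; the port and buffer size are arbitrary illustrative choices.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class BlockingIoServer {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                Socket socket = server.accept();          // blocks until a client connects
                new Thread(() -> handle(socket)).start(); // one thread per connection
            }
        }
    }

    private static void handle(Socket socket) {
        try (Socket s = socket;
             InputStream in = s.getInputStream();
             OutputStream out = s.getOutputStream()) {
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) { // read() blocks until data arrives
                out.write(buffer, 0, n);          // echo the data back
            }
        } catch (Exception ignored) {
        }
    }
}
```

With thousands of connections, this design needs thousands of mostly idle threads, which is exactly the cost described above.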
Synchronous non-blocking IO
The user thread returns immediately after issuing the read request, without waiting for the kernel to prepare the data. If the read request returns no data, the user thread polls, repeatedly issuing read requests until the data arrives (that is, until the kernel has the data ready). The non-blocking IO model avoids the heavy thread consumption caused by blocking, but the constant polling greatly increases the number of requests and the CPU cost. This model is rarely used in practice.
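For comparison, here is a minimal sketch of busy polling with Java NIO; it assumes, purely for illustration, that some server is already listening on localhost:8080.

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public class PollingReadDemo {
    public static void main(String[] args) throws Exception {
        SocketChannel channel = SocketChannel.open(new InetSocketAddress("localhost", 8080));
        channel.configureBlocking(false); // read() now returns immediately instead of blocking
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        while (true) {
            int n = channel.read(buffer); // 0 = no data yet, -1 = connection closed
            if (n > 0) {
                buffer.flip();
                // ... process the data ...
                buffer.clear();
            } else if (n == -1) {
                break;
            }
            // This loop keeps spinning even when no data arrives -- the CPU cost described above
        }
        channel.close();
    }
}
```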
Multiplexed IO model
The multiplexed IO model is built on the multiplexing event-separation functions select, poll, and epoll. Before issuing a read request, the program updates the select function's socket watch list and waits for select to return (this wait blocks, which is why multiplexed IO is still a blocking IO model). When data arrives on any watched socket, select returns, and the user thread then formally issues a read request to read and process the data. This way a dedicated monitoring thread can watch multiple sockets and hand any socket with arriving data over to a worker thread. Since waiting for socket data is the time-consuming part, this approach solves the blocking model's problem of one thread per socket connection, without the CPU cost of the non-blocking model's busy polling. The multiplexed IO model is widely used in practice, for example in Java NIO, Redis, and Netty, the communication framework used by Dubbo.
The following figure shows the detailed process of Socket programming based on the select function.
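In Java, this model is exposed through NIO's Selector, which is backed by select/poll/epoll depending on the platform. Below is a minimal sketch of the same flow; the port and buffer size are illustrative.

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class MultiplexedIoServer {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select(); // blocks until at least one registered channel is ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ); // watch this socket too
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buffer = ByteBuffer.allocate(1024);
                    if (client.read(buffer) == -1) {
                        client.close(); // peer closed the connection
                    }
                    // One monitoring thread watches many sockets; a real server would
                    // hand the data to worker threads here
                }
            }
        }
    }
}
```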
Signal-driven IO model
In the signal-driven IO model, the application process issues a sigaction call and the kernel returns immediately, so the application is not blocked while the kernel prepares the data. When the data is ready, the kernel sends a SIGIO signal to the application process; upon receiving the signal, the process reads the data, copying it from the kernel into its own buffer.
CPU utilization is high in this model. However, with a large volume of IO operations, the signal queue can overflow and signals can be lost, with disastrous consequences.
Asynchronous I/O model
The basic mechanism of the asynchronous IO model is that the application process tells the kernel to start an operation, and the kernel notifies the application when the entire operation is complete. In the multiplexed model, the application is notified when a socket event arrives and must then read and process the data itself. In the asynchronous model, by the time the application is notified, the kernel has already read the data and placed it into the application's buffer, so the application can use it directly.
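Java exposes this model (since JDK 7) through the AsynchronousChannel family. The sketch below is illustrative; the port and the empty failure handlers are my own simplifications.

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;

public class AsyncIoServer {
    public static void main(String[] args) throws Exception {
        AsynchronousServerSocketChannel server =
                AsynchronousServerSocketChannel.open().bind(new InetSocketAddress(8080));
        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override
            public void completed(AsynchronousSocketChannel client, Void att) {
                server.accept(null, this); // go back to accepting the next connection
                ByteBuffer buffer = ByteBuffer.allocate(1024);
                client.read(buffer, buffer, new CompletionHandler<Integer, ByteBuffer>() {
                    @Override
                    public void completed(Integer bytesRead, ByteBuffer buf) {
                        // By the time this callback runs, the data is already in buf;
                        // no further read call is needed
                    }
                    @Override
                    public void failed(Throwable exc, ByteBuffer buf) { }
                });
            }
            @Override
            public void failed(Throwable exc, Void att) { }
        });
        Thread.currentThread().join(); // keep the JVM alive for the callbacks
    }
}
```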
Obviously, the asynchronous IO model performs very well. Even so, neither the asynchronous nor the signal-driven IO model is widely used yet; the traditional blocking and multiplexed IO models remain the mainstream. Asynchronous IO was only introduced in Linux 2.6, and support for it is still immature on many systems, so many scenarios use multiplexed IO in place of asynchronous IO.
How to avoid system failures caused by IO problems
For disk file access, use a dedicated thread pool with an upper bound on thread count, so that slow file IO cannot drag down the whole JVM and exhaust its thread and CPU resources. A sketch follows below.
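For example, the file writes that triggered our incident could have been confined to a small, bounded pool. A minimal sketch, where the pool sizes and queue length are assumptions rather than recommendations:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class FileIoPool {
    // A dedicated, bounded pool for disk IO: a slow disk can only block these threads
    private static final ExecutorService FILE_IO_POOL = new ThreadPoolExecutor(
            4, 8,                                   // core and maximum IO threads
            60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(100),          // bounded queue provides back-pressure
            new ThreadPoolExecutor.AbortPolicy());  // reject new work instead of piling it up

    public static void main(String[] args) {
        FILE_IO_POOL.submit(() -> {
            // perform the file write here; request-serving threads are unaffected
        });
        FILE_IO_POOL.shutdown();
    }
}
```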
For remote calls across the network, set reasonable timeout values (see the sketch below) to prevent a slow downstream service from causing a failure across the whole call chain. In high-concurrency scenarios, a circuit-breaker mechanism can be used. Within a single JVM, apply thread isolation: divide threads into groups, with different groups serving different classes and methods, so that the failure of one small feature cannot affect every thread in the JVM.
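As a sketch of the timeout advice, here is how explicit connect and request timeouts look with Java 11's HttpClient; the URL and durations are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class TimeoutDemo {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(1)) // fail fast if the peer is unreachable
                .build();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://order-service/api/orders")) // hypothetical endpoint
                .timeout(Duration.ofSeconds(2)) // cap the total wait for a response
                .build();
        // A slow downstream now produces a timely HttpTimeoutException
        // instead of pinning this thread indefinitely
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```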
In addition, good operations monitoring (disk IO, network IO) and APM (full-link performance monitoring) are also very important: they give early warnings so faults can be prevented, and they help us locate faults quickly when they do occur.
Author: Feng Tao, from the public account "Architecture Advanced Road".