I/O is a core part of the Java platform and something developers use constantly. This article is not about the structure of Java's I/O class library; it is about the I/O models behind it: BIO, NIO, and AIO.
What is I/O
For a long time while learning Java, I thought of I/O only as a way to read and write files.
I/O stands for input/output. Broadly, any access by the processor to data that lives outside its own registers and caches can be treated as an I/O operation. For example, a file operation is disk I/O, a network request is network I/O, and reading from or writing to a database is also I/O.
Because I/O operations are everywhere in an application, I/O performance is particularly important. Unfortunately, I/O is the performance bottleneck in many systems today. Common ways to improve it include:
- Add a caching layer to the system architecture
- Do more of the computation in memory (as Spark does)
- Use faster hardware (solid-state drives)
- Optimize the I/O model (for example, NIO in Java)
This article discusses the last one: the I/O model in Java.
Understanding I/O
To understand how the I/O models differ, first look at how an I/O operation works.
System calls
Applications run on top of the operating system, whose job is to manage hardware resources and give application developers a convenient environment. Because every resource in a computer is limited, the operating system must also make sure that each process can run safely. To protect access to resources, the processor has two modes: user mode and kernel mode. Security-sensitive operations, such as I/O, are allowed only in kernel mode.
Applications run in user mode. When an application needs an operation that is only allowed in kernel mode (such as reading a file from disk), it makes a system call to the operating system. The system call switches the processor from user mode to kernel mode, and the kernel carries out the request. When the system call finishes, the operating system switches the processor back to user mode and the user program continues.
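To make the round trip concrete, here is a minimal sketch of an ordinary file read. The class name `FileReadDemo` and the temporary file are made up for illustration. On a typical JVM, each `read()` call below ends up as a system call, so the thread crosses into kernel mode and back once per buffer.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileReadDemo {
    // Reads a whole file through a FileInputStream, one buffer at a time.
    static String readAll(Path file) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];               // user-space buffer
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            int n;
            // Each read() hands control to the kernel, which fetches the bytes
            // and copies them into `buffer` before the call returns.
            while ((n = in.read(buffer)) != -1) {
                collected.write(buffer, 0, n);
            }
        }
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("io-demo", ".txt");   // hypothetical scratch file
        try {
            Files.write(tmp, "hello from disk".getBytes(StandardCharsets.UTF_8));
            System.out.println(readAll(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The application code never switches modes itself; it only calls `read()`, and the runtime and operating system handle the transition.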
BIO
BIO stands for Blocking I/O. It is the most common I/O model and is generally the default. In terms of the steps in the diagram above, this model means that all the steps run sequentially.
After the processor switches from user mode to kernel mode, the user thread blocks until the data is returned (or an error occurs). Even once the kernel has the data ready, the thread must still wait for it to be copied into the user-space buffer before it can continue.
The benefit of this model is that it is simple to understand and implement. The drawback is that threads spend much of their time blocked, which wastes resources.
The BIO model does not suit workloads with many concurrent requests. Each client connection has a corresponding socket on the server for sending and receiving data, and the server dedicates one thread to each client request. As long as the client keeps the connection open, the socket and its thread on the server stay alive, so under high concurrency the server quickly runs out of resources.
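A minimal sketch of the thread-per-connection pattern might look like the following. The class `BlockingEchoServer`, its `request` helper, and the `echo: ` reply format are illustrative assumptions, not code from any framework.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class BlockingEchoServer {
    // Accept loop: every accepted connection gets its own thread.
    static void serve(ServerSocket server) {
        try {
            while (true) {
                Socket client = server.accept();            // blocks until a client connects
                new Thread(() -> handle(client)).start();   // one thread per connection
            }
        } catch (IOException e) {
            // accept() fails once the server socket is closed, which ends the loop
        }
    }

    // Echoes lines back until the client disconnects; between lines the thread sits in readLine().
    static void handle(Socket client) {
        try (Socket s = client;
             BufferedReader in = reader(s);
             PrintWriter out = writer(s)) {
            String line;
            while ((line = in.readLine()) != null) {        // blocks until a full line arrives
                out.println("echo: " + line);
            }
        } catch (IOException e) {
            // connection dropped; nothing more to do for this client
        }
    }

    // Small client used by the demo: sends one line and waits for the reply.
    static String request(int port, String message) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             BufferedReader in = reader(s);
             PrintWriter out = writer(s)) {
            out.println(message);
            return in.readLine();
        }
    }

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    public static void main(String[] args) throws IOException {
        // Port 0 lets the OS pick a free port; the demo binds to loopback only.
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            new Thread(() -> serve(server)).start();
            System.out.println(request(server.getLocalPort(), "hello"));
        }
    }
}
```

Every idle client still holds a thread parked in `readLine()`, which is exactly the cost that grows with the number of open connections.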
NIO
BIO is inefficient because threads block for long periods and server resources are easily exhausted under a large number of requests. To address this, Java introduced a new I/O model, NIO, in JDK 1.4.
NIO is usually read as Non-blocking I/O (it is also known as New I/O). The biggest difference between NIO and BIO lies in what the thread does while the kernel prepares the data.
In BIO, the thread blocks while the kernel prepares the data, until the data is returned or an error occurs. In NIO, the system call returns immediately and the thread can do something else; the blocking wait of BIO is replaced by polling.
The application issues a request through a system call, and the call returns immediately without waiting for the data to be ready. The thread is then free to do other work, and it polls at intervals to check whether the kernel has the data ready. Once the data is ready, the thread processes it. This way the thread does not sit idle while the kernel prepares the data.
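The polling behaviour can be sketched with a `SocketChannel` switched to non-blocking mode. The class `NonBlockingPollDemo`, the 10 ms sleep, and the loopback sender are assumptions made for the demo; the sleep stands in for whatever useful work the thread would do between checks.

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

class NonBlockingPollDemo {
    // Polls a non-blocking channel until `expected` bytes have arrived.
    static String pollUntil(SocketChannel channel, int expected)
            throws IOException, InterruptedException {
        ByteBuffer buf = ByteBuffer.allocate(expected);
        while (buf.hasRemaining()) {
            int n = channel.read(buf);          // returns at once; 0 means nothing is ready yet
            if (n == -1) throw new EOFException("peer closed before sending everything");
            if (n == 0) Thread.sleep(10);       // stand-in for "do other work", then check again
        }
        buf.flip();
        return StandardCharsets.UTF_8.decode(buf).toString();
    }

    static String demo() throws IOException, InterruptedException {
        InetSocketAddress loopback = new InetSocketAddress(InetAddress.getLoopbackAddress(), 0);
        try (ServerSocketChannel server = ServerSocketChannel.open().bind(loopback);
             SocketChannel sender = SocketChannel.open(server.getLocalAddress());
             SocketChannel receiver = server.accept()) {
            receiver.configureBlocking(false);                      // switch to non-blocking mode
            int first = receiver.read(ByteBuffer.allocate(16));     // nothing has been sent yet
            byte[] message = "data ready".getBytes(StandardCharsets.UTF_8);
            sender.write(ByteBuffer.wrap(message));
            return "first read: " + first + ", after polling: " + pollUntil(receiver, message.length);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());   // prints "first read: 0, after polling: data ready"
    }
}
```

The first `read()` returns 0 instead of blocking, which is the whole difference from BIO. The cost is that the application has to keep asking.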
NIO also introduces an I/O multiplexing model to improve efficiency. Its advantage is that a request is not given a processing thread as soon as it arrives; instead it is registered with a Selector and handled only once it is ready. The basic model is as follows:
Each client connection gets its own channel, and each channel is registered with a Selector. A channel can be in several states, such as readable or writable. The Selector polls the channels registered with it, and when a channel's state changes, the channel is handed to a thread pool for processing. As a result, one thread can manage thousands of connections: when a client request comes in, nothing blocks, and the channel is simply registered with the Selector.
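Here is a minimal sketch of a Selector-based echo server. The class `SelectorEchoServer` and its `request` and `stop` helpers are illustrative assumptions. To keep it short, ready channels are handled directly on the event-loop thread rather than handed to a thread pool as described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

class SelectorEchoServer implements Runnable {
    private final Selector selector = Selector.open();
    private final ServerSocketChannel server = ServerSocketChannel.open();
    private volatile boolean running = true;

    SelectorEchoServer() throws IOException {
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0)); // port 0: any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);   // ask to be told about new connections
    }

    SocketAddress address() throws IOException { return server.getLocalAddress(); }

    void stop() { running = false; selector.wakeup(); }      // wakeup() makes a blocked select() return

    @Override
    public void run() {                                      // the single event-loop thread
        ByteBuffer buf = ByteBuffer.allocate(1024);
        try {
            while (running) {
                selector.select();                           // sleeps until some registered channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();                           // handled keys must be removed by hand
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        echo((SocketChannel) key.channel(), buf);
                    }
                }
            }
            for (SelectionKey key : selector.keys()) key.channel().close(); // server and any open clients
            selector.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void echo(SocketChannel client, ByteBuffer buf) throws IOException {
        buf.clear();
        if (client.read(buf) == -1) { client.close(); return; }   // peer hung up
        buf.flip();
        while (buf.hasRemaining()) client.write(buf);   // sketch only: spins if the send buffer is full
    }

    // Blocking client used by the demo: sends a message and reads back the same number of bytes.
    static String request(SocketAddress address, String message) throws IOException {
        try (SocketChannel channel = SocketChannel.open(address)) {
            byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
            channel.write(ByteBuffer.wrap(bytes));
            ByteBuffer reply = ByteBuffer.allocate(bytes.length);
            while (reply.hasRemaining() && channel.read(reply) != -1) { /* keep reading */ }
            return new String(reply.array(), 0, reply.position(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        SelectorEchoServer echoServer = new SelectorEchoServer();
        Thread loop = new Thread(echoServer);
        loop.start();
        System.out.println(request(echoServer.address(), "hello"));
        echoServer.stop();
        loop.join();
    }
}
```

However many clients connect, the server side uses one thread, and that thread only wakes up when a channel actually has something to do.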
AIO
AIO stands for Asynchronous I/O. The difference between AIO and NIO is how the application learns that the kernel has the data ready.
In NIO, the application periodically checks whether the data is ready. AIO uses a callback mechanism instead: when the data is ready, the operating system calls back into the application to say so, and the application continues with the rest of its processing. So NIO is synchronous and AIO is asynchronous.
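The callback style can be sketched with `AsynchronousFileChannel` and a `CompletionHandler`. The class `AsyncReadDemo` and the temporary file are made up for illustration, and the `CompletableFuture` is only there so the demo can collect the result. In this sketch the handler runs on a thread from the channel's own pool, not on the thread that started the read.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

class AsyncReadDemo {
    // Starts an asynchronous read and returns at once; the future is completed from the callback.
    // Sketch only: it reads at most the first 1024 bytes of the file.
    static CompletableFuture<String> readAsync(Path file) throws IOException {
        AsynchronousFileChannel channel = AsynchronousFileChannel.open(file, StandardOpenOption.READ);
        ByteBuffer buf = ByteBuffer.allocate(1024);
        CompletableFuture<String> result = new CompletableFuture<>();
        channel.read(buf, 0, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                // Called once the data is in `buf`; nobody had to poll for it.
                closeQuietly(channel);
                buf.flip();
                result.complete(StandardCharsets.UTF_8.decode(buf).toString());
            }

            @Override
            public void failed(Throwable error, Void attachment) {
                closeQuietly(channel);
                result.completeExceptionally(error);
            }
        });
        return result;
    }

    private static void closeQuietly(AsynchronousFileChannel channel) {
        try {
            channel.close();
        } catch (IOException ignored) {
            // nothing useful to do in a demo
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("aio-demo", ".txt");   // hypothetical scratch file
        try {
            Files.write(tmp, "data is ready".getBytes(StandardCharsets.UTF_8));
            CompletableFuture<String> pending = readAsync(tmp);
            System.out.println("read submitted, main thread keeps going");
            // get() waits here only so the demo can print the result before exiting.
            System.out.println("callback delivered: " + pending.get());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Compared with the polling loop in NIO, the application never asks whether the data is ready; it is told.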