Java IO: System I/O
Basic concepts
- Virtual File System (VFS): The VFS uses standard Unix system calls to read and write file systems on different physical media, providing a unified operating interface and application programming interface (API) across them. It is the glue layer that lets system calls like open(), read(), and write() work regardless of the underlying storage medium and file system type.
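From Java the VFS is invisible by design: the same `java.nio.file` calls work whether the path is backed by ext4, tmpfs, or NFS, because the VFS dispatches to the right driver underneath. A minimal sketch (class name and temp-file prefix are illustrative):

```java
import java.nio.file.*;

public class VfsDemo {
    public static void main(String[] args) throws Exception {
        // The same open/read/write API works no matter which file system
        // backs the path; the VFS layer routes the system calls for us.
        Path p = Files.createTempFile("vfs-demo", ".txt");
        Files.writeString(p, "hello vfs");   // write() via the VFS
        String back = Files.readString(p);   // read() via the VFS
        System.out.println(back);
        Files.delete(p);
    }
}
```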
- FD (File Descriptor): The kernel accesses files through file descriptors, which are non-negative integers. When an existing file is opened or a new file is created, the kernel returns a file descriptor, and subsequent reads and writes use that descriptor to identify the file.
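Java wraps the kernel's integer descriptor in a `java.io.FileDescriptor` object, reachable from an open stream. A small sketch (class name is illustrative):

```java
import java.io.*;

public class FdDemo {
    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("fd-demo", ".txt");
        try (FileOutputStream out = new FileOutputStream(f)) {
            // The kernel returned a descriptor when the file was opened;
            // Java exposes it through a FileDescriptor object.
            FileDescriptor fd = out.getFD();
            System.out.println("fd valid: " + fd.valid()); // true while open
            out.write("hi".getBytes());
            fd.sync(); // flush buffered data to the device via this descriptor
        }
        f.delete();
    }
}
```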
- PageCache: A page is usually 4 KB. The page cache improves performance by caching disk data in memory, reducing disk I/O operations. The kernel must also synchronize changes made in the page cache back to disk, a process called page writeback. Each inode corresponds to one page cache object, and a page cache object contains multiple physical pages.
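From Java, an ordinary write only dirties pages in the page cache; `FileChannel.force()` (analogous to fsync) asks the kernel to perform the writeback described above. A sketch, with illustrative names:

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class WritebackDemo {
    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("writeback", ".txt");
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap("dirty page".getBytes()));
            // write() only dirtied the page cache; force(false) asks the
            // kernel to write the dirty pages back to disk (page writeback).
            ch.force(false);
        }
        System.out.println(Files.readString(p));
        Files.delete(p);
    }
}
```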
- BufferCache: A block buffer, usually 1 KB, corresponding to one disk block and likewise used to reduce disk I/O. A PageCache usually spans multiple BufferCaches.
Read/write path: Disk <-> BufferCache <-> PageCache <-> Application
System I/O workflow
Network IO
TCP
Socket: a connection is uniquely identified by the 4-tuple (client IP, client port, server IP, server port). The kernel allocates resources for the connection at kernel level even before the application calls accept().
Interview question: does the server need to assign a random port number to each client connection? No. The client's own kernel picks an ephemeral port, both sides keep the 4-tuple, and every packet carries it, so the kernel can find the matching connection to respond.
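This is easy to observe from Java: the client's local port is chosen by the client's kernel, and the server sees exactly that port in the 4-tuple of the accepted connection. A loopback sketch (class name is illustrative):

```java
import java.io.IOException;
import java.net.*;

public class QuadDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) { // OS picks server port
            try (Socket client = new Socket("127.0.0.1", server.getLocalPort());
                 Socket accepted = server.accept()) {
                // The client's port is an ephemeral port chosen by the
                // client's kernel, not assigned by the server.
                System.out.println("client port > 0: " + (client.getLocalPort() > 0));
                System.out.println("server sees same port: "
                        + (accepted.getPort() == client.getLocalPort()));
            }
        }
    }
}
```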
Common IO models
The I/O model largely determines a framework's performance: it defines how data is sent and received over channels, whether that is BIO, NIO, or AIO.
Blocking I/O (traditional IO, BIO)
Summary: when a process calls recvfrom, the call blocks until ① the datagram arrives and is copied into the application's buffer, or ② an error (such as a signal interrupt) causes it to return.
Therefore, blocking IO is characterized by blocking in both phases of I/O execution: waiting for data, and copying the data. Its features are as follows:
- Each request requires a separate thread to complete the operation of data Read, business processing, and data Write.
- When the number of concurrent connections is large, a large number of threads need to be created to process connections, occupying large system resources.
- After a connection is established, if the current thread has no data to Read temporarily, the thread blocks on the Read operation, resulting in a waste of thread resources.
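The thread-per-connection pattern above is exactly what classic `java.net` sockets give you: each accepted socket gets a thread that blocks on read. A minimal loopback sketch (class name is illustrative; a real server would loop on accept and use a thread pool):

```java
import java.io.*;
import java.net.*;

public class BioEchoDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);
        // Thread-per-connection: this thread blocks on accept(), then on
        // readLine(), tying up a whole thread for one connection.
        Thread handler = new Thread(() -> {
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()));
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                out.println(in.readLine()); // blocks until a full line arrives
            } catch (IOException ignored) { }
        });
        handler.start();

        try (Socket c = new Socket("127.0.0.1", server.getLocalPort());
             PrintWriter out = new PrintWriter(c.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream()))) {
            out.println("hello bio");
            System.out.println(in.readLine());
        }
        handler.join();
        server.close();
    }
}
```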
Non-blocking I/O
Note: when a read is performed on a non-blocking socket and the kernel's data is not ready, the call does not block the user process but immediately returns an EWOULDBLOCK error. If the kernel does have data ready, it is copied to user memory at once and the call returns success.
Since non-blocking I/O returns immediately when there is no data, the user process typically calls recvfrom in a loop, actively and repeatedly asking the kernel whether the data is ready.
Therefore, the characteristic of non-blocking IO is that the thread does not block during the first phase of I/O execution, but blocks during the second phase.
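In Java NIO, a `SocketChannel` put into non-blocking mode shows the same behavior: where a C caller would see EWOULDBLOCK, `read()` simply returns 0, and the caller polls. A loopback sketch (class name is illustrative):

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;

public class NonBlockingDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel peer = server.accept()) {
                client.configureBlocking(false);
                ByteBuffer buf = ByteBuffer.allocate(16);
                // No data yet: the non-blocking read returns 0 immediately
                // (the kernel would return EWOULDBLOCK to a C caller).
                System.out.println("bytes before write: " + client.read(buf));
                peer.write(ByteBuffer.wrap("ok".getBytes()));
                int n;
                while ((n = client.read(buf)) == 0) {
                    // polling loop: keep asking the kernel until data is ready
                }
                System.out.println("bytes after write: " + n);
            }
        }
    }
}
```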
I/O multiplexing model
I/O multiplexing is event-driven: a single thread monitors multiple sockets at once, and the kernel polls all registered sockets via select or poll, notifying the user process when any of them becomes ready.
Summary: the process blocks on the select call, waiting for any socket to become readable; when one is, recvfrom is called to copy the datagram from the kernel into the user process buffer, and the process blocks in this second phase of I/O execution.
The user process is actually blocked all the time, but the advantage of IO reuse is that you can wait for multiple descriptors to be ready.
So, IO reuse is characterized by two system calls, with the process blocking first on select/poll and then on the second phase of the read operation.
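Java's `Selector` is the standard wrapper over this model (backed by epoll on Linux): one thread blocks in `select()` over many registered channels, then performs the copy-phase reads itself. A loopback sketch (class name is illustrative):

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;

public class SelectorDemo {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open(); // epoll-backed on Linux
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        SocketChannel client = SocketChannel.open(server.getLocalAddress());
        client.write(ByteBuffer.wrap("ping".getBytes()));

        String received = null;
        while (received == null) {
            selector.select(); // phase 1: block until some registered fd is ready
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel ch = server.accept();
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    // phase 2: copy data from kernel to user space
                    ByteBuffer buf = ByteBuffer.allocate(16);
                    int n = ((SocketChannel) key.channel()).read(buf);
                    received = new String(buf.array(), 0, n);
                }
            }
            selector.selectedKeys().clear();
        }
        System.out.println(received);
        client.close(); server.close(); selector.close();
    }
}
```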
Signal driven IO model
Signal-driven IO allows the kernel to send a SIGIO signal to notify the user process when a descriptor becomes ready.
Summary: first enable signal-driven IO on the socket and register a SIGIO signal handler via the sigaction system call, which returns immediately. When the data is ready, the kernel raises a SIGIO signal for the process, which can then read the data by calling recvfrom from the signal handler.
Therefore, signal-driven IO is characterized by the process not blocking while waiting for data ready, and then blocking and copying data when it receives signal notification.
Asynchronous IO model
Asynchronous IO is rarely used, first appearing in the Linux 2.5 kernel and becoming a standard feature in the 2.6 kernel.
Summary: after the user process initiates aio_read, the system call returns immediately; the kernel then waits for the data to become ready and automatically copies it into user memory. When the copy is complete, the kernel sends a signal to notify the user process that the IO operation has finished.
The main difference between asynchronous IO and signal-driven IO is that in signal-driven IO, the kernel tells us when to start an IO operation, while in asynchronous IO, the kernel tells us when the IO operation is complete.
Therefore, asynchronous IO is characterized by the fact that both phases of IO execution are performed by the kernel without user process intervention or blocking.
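Java exposes this model through `AsynchronousSocketChannel` (AIO, since Java 7): the read call returns at once, and the kernel/JVM completes the copy in the background, delivering the result via a `Future` or callback. A loopback sketch using the `Future` style (class name is illustrative):

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.concurrent.Future;

public class AioDemo {
    public static void main(String[] args) throws Exception {
        AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel
                .open().bind(new InetSocketAddress("127.0.0.1", 0));
        Future<AsynchronousSocketChannel> acceptF = server.accept(); // returns at once

        AsynchronousSocketChannel client = AsynchronousSocketChannel.open();
        client.connect(server.getLocalAddress()).get();
        client.write(ByteBuffer.wrap("async".getBytes())).get();

        AsynchronousSocketChannel peer = acceptF.get();
        ByteBuffer buf = ByteBuffer.allocate(16);
        // read() returns a Future immediately; waiting for data and copying
        // it into buf both happen in the background, then the Future completes.
        int n = peer.read(buf).get();
        System.out.println(new String(buf.array(), 0, n));
        peer.close(); client.close(); server.close();
    }
}
```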
Comparing the multiplexing implementations: select, poll, and epoll
select and poll work essentially the same way; the main practical difference is that select is limited to 1024 file descriptors, while poll has no such fixed limit.
epoll vs. poll: epoll maintains the set of ready fds in kernel space and hands back only those, which the user process can act on directly. poll must scan every registered fd on each call to find the ready ones, which wastes CPU and hurts efficiency as the number of fds grows.