Blog rewrite plan – Update log, 2021-04-23: 1. Added a description of system calls, kernel space, and user process space; 2. Introduced the I/O multiplexing functions select and epoll, reviewed through their actual use in Redis; 3. Fixed some semantic and logical incoherence. PS: Looking back at the blog I wrote a year ago is unbearable, so I had to improve it. The revision is based mainly on Unix Network Programming and Redis Design and Implementation.

TODO_LIST:

  • [Analysis of Java NIO application instance -Tomcat connection pool]
  • [Updates to the Java NIO module]

An introduction to IO:

Java IO

  1. By data type, I/O can be divided into: (1) byte-oriented interfaces: InputStream and OutputStream; (2) character-oriented interfaces: Reader and Writer.
  2. By transport target, I/O can be divided into: (1) disk-oriented interfaces: File; (2) network-oriented interfaces: Socket.
  3. Therefore, the main I/O operations can be summarized as: what type of data is sent where, and how.

Unix IO:

A process uses system calls such as recvfrom, select, epoll, and aio_read to perform IO operations, so we will first explain the concepts of user mode, kernel mode, and system calls, in order to understand why every I/O model involves copying data from the kernel to user space.

User mode, kernel mode, and system call

Memory is divided into privilege levels according to access rights (represented by the DPL field of a segment descriptor): the memory region used by the operating system (the kernel) has descriptors with DPL = 0, while the memory used by user processes has descriptors with DPL = 3, and the hardware forbids less-privileged code from accessing a more-privileged region. As a result, a user process cannot directly access kernel space from its own process space.

To solve this problem, the operating system implements a software interrupt – the system call (although the interrupt handler lives in the kernel, the DPL of its gate descriptor is set to 3 so that user processes may invoke it), and dispatches to different handlers through the interrupt descriptor table. For example, the write() system call corresponds to call number 4 on 32-bit Linux, which can be understood as an implementation of the strategy pattern – the call number selects which behavior the system call performs.

On this basis, the kernel provides an interface – the system call – for user-mode processes to access kernel functionality.

Five models of IO in Unix

A macro picture of I/O, taking network IO as an example:

When a client sends a network packet, it is forwarded through routers and switches to the network adapter (NIC) of the corresponding server. The adapter then transfers the received data into the kernel via DMA, the kernel copies it from kernel space into the user process buffer, and finally the user process can use the data in its buffer.

The five models for doing this are outlined in Unix Network Programming; let's start with the concepts of blocking vs. non-blocking and synchronous vs. asynchronous:

Blocking vs. non-blocking, synchronous vs. asynchronous

These two pairs of terms are just two different ways of describing the same situation:

(1) Blocking and non-blocking are mainly described from the CPU's point of view. Blocking means the CPU stops and waits for a slow operation to complete before moving on to anything else. Non-blocking means the CPU does other work while the slow operation executes, and continues with the follow-up processing once the slow operation finishes.

(2) Synchronous and asynchronous are mainly described from the program's point of view. Synchronous means the program issues a function call and does not return until the result is available. Asynchronous means the program issues the call and returns immediately; the result is later delivered to the program through a callback mechanism.
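The contrast above can be sketched in a few lines of Java (the class name `SyncVsAsync` and the 100 ms `slowOp` stand-in are mine; `CompletableFuture` plays the role of the callback mechanism):

```java
import java.util.concurrent.CompletableFuture;

public class SyncVsAsync {
    // A stand-in for a slow IO operation.
    static int slowOp() {
        try { Thread.sleep(100); } catch (InterruptedException ignored) {}
        return 42;
    }

    public static void main(String[] args) {
        // Synchronous: the call does not return until the result is ready.
        int r = slowOp();
        System.out.println(r);

        // Asynchronous: the call returns at once; a callback consumes the
        // result later. join() is only here so the demo can exit cleanly.
        CompletableFuture.supplyAsync(SyncVsAsync::slowOp)
                         .thenAccept(System.out::println)
                         .join();
    }
}
```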

Blocking IO model (BIO):

PS: At this point the two communicating parties have already established a connection through the three-way handshake and can exchange data through the socket file.

In this model, the application process blocks after the recvfrom system call, waiting for the sender's data to travel through the network adapter -> kernel -> user process buffer in turn, and only then does the call return.

The biggest problem with this model is the classic mismatch between CPU speed and peripheral speed: the network adapter is extremely slow relative to the CPU, yet the CPU keeps blocking.
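A minimal sketch of that blocking behavior (the class name is mine, and `java.nio.channels.Pipe` stands in for a socket whose peer is slow): the reader thread sits blocked in `read()` for as long as the "peripheral" takes to deliver data.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class BlockingReadDemo {
    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open(); // NIO channels block by default
        // A "slow peripheral": data only arrives after 200 ms.
        new Thread(() -> {
            try {
                Thread.sleep(200);
                pipe.sink().write(ByteBuffer.wrap("ok".getBytes()));
            } catch (Exception ignored) {
            }
        }).start();

        long t0 = System.nanoTime();
        ByteBuffer buf = ByteBuffer.allocate(16);
        int n = pipe.source().read(buf);      // blocks here until data arrives
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
        System.out.println(n);                // 2 bytes read
        System.out.println(elapsedMs >= 150); // true: the reader sat blocked
    }
}
```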

Non-blocking IO model:

In the non-blocking IO model, when a user process issues a recvfrom system call and the socket file in the kernel is not ready, recvfrom returns an error immediately.

At this point the CPU can be switched to other processes, so the processor itself is no longer blocked, while the process keeps obtaining CPU time slices to poll whether the socket file is ready.

Therefore, although this mode is non-blocking, process switching is very frequent, so the extra CPU time and the cost of context switches still have to be considered. Moreover, when the data is ready and the process gets a time slice and calls recvfrom again, it still has to wait for the data to be copied from the kernel to the user process buffer.
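The same pattern can be reproduced in Java (class name mine; `Pipe` again stands in for a socket): with `configureBlocking(false)`, a read with no data available returns 0 immediately – the analogue of recvfrom returning EWOULDBLOCK – and the caller must poll.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class NonBlockingPoll {
    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false); // non-blocking read end
        ByteBuffer buf = ByteBuffer.allocate(16);

        // No data yet: a non-blocking read returns 0 immediately.
        System.out.println(pipe.source().read(buf));

        pipe.sink().write(ByteBuffer.wrap("hi".getBytes()));
        // Poll until the data is ready -- exactly the cost this model pays.
        int n;
        while ((n = pipe.source().read(buf)) == 0) {
            Thread.sleep(1);
        }
        System.out.println(n); // 2
    }
}
```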

Multiplexing IO model (the principle behind Java NIO):

This model uses the select system call, which blocks until an IO event occurs (i.e., a socket file in the kernel becomes readable or writable) and then returns. When recvfrom is subsequently called, the application process only needs to wait for the data to be copied from the kernel to the user process buffer.

In addition, select can register and listen for read and write events on multiple socket files at once. Mapped to Java NIO, multiple channels can register their events with the same Selector, thus achieving the effect of multiplexing.
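A minimal Selector sketch (class name mine; a `Pipe` source channel plays the role of a registered socket): `selectNow()` finds nothing before data exists, and `select()` blocks until the registered channel becomes readable.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class SelectorDemo {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open();
        // Channels registered with a Selector must be non-blocking.
        pipe.source().configureBlocking(false);
        pipe.source().register(selector, SelectionKey.OP_READ);

        // Nothing written yet: no channel is ready.
        System.out.println(selector.selectNow());

        pipe.sink().write(ByteBuffer.wrap("x".getBytes()));
        // select() blocks until the source channel becomes readable.
        System.out.println(selector.select());
    }
}
```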

(TODO: Acceptor model)

The select function

The function is defined as follows:

```c
int select(int maxfdp1, fd_set *readset, fd_set *writeset,
           fd_set *exceptset, const struct timeval *timeout);
```

Each socket file corresponds to a file descriptor (fd) in the file system, and the select function maintains sets of read and write file descriptors (fd_set), usually implemented as arrays of integers in which each bit corresponds to one file descriptor. For example, with 32-bit integers, the first integer in the array represents the files whose descriptors are 0–31, and a set bit indicates that select cares about the read or write events on the corresponding file.

UNIX provides four macros – FD_ZERO, FD_SET, FD_CLR, and FD_ISSET – for manipulating a descriptor set. For example, FD_SET(4, &readset) sets bit 4 of the first integer in the read set, producing 0000100..., which indicates that the current select call cares about read events on the socket whose fd is 4. After *readset and *writeset are set up, a process calling select blocks until a read/write event occurs on a socket it cares about, and the ready file descriptor information is returned.
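As a toy illustration of what those macros do (class name mine; `java.util.BitSet` stands in for an fd_set bitmask):

```java
import java.util.BitSet;

public class FdSetDemo {
    public static void main(String[] args) {
        BitSet readset = new BitSet(1024);  // FD_ZERO: all bits cleared
        readset.set(4);                     // FD_SET(4, &readset)
        System.out.println(readset.get(4)); // FD_ISSET(4, &readset) -> true
        readset.clear(4);                   // FD_CLR(4, &readset)
        System.out.println(readset.get(4)); // false
    }
}
```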

Note: the ready descriptor information is not returned through the int return value. Instead, select zeroes the bits of descriptors that produced no events in the read/write sets, so the application learns which sockets are ready by traversing the sets (this traversal is very inefficient – one of the points epoll optimizes). It also forces us to reset the sets of read/write descriptors we care about before every call to select (another point epoll optimizes).

PS: We keep talking about read events, write events, that is, the socket file can be read and write, so when can the socket file be read and write?

A read event occurs when a socket is readable, that is, the number of data bytes in the socket receive buffer is greater than or equal to the current size of the socket receive buffer low watermark mark;

A write event occurs when a socket is writable, that is, the available space in the socket send buffer is greater than or equal to the current low-water mark of the send buffer.

Finally, let's see how Redis uses select:

```c
#include <sys/select.h>
#include <string.h>

typedef struct aeApiState {
    fd_set rfds, wfds;
    /* We need to have a copy of the fd sets as it's not safe to reuse
     * FD sets after select(). */
    fd_set _rfds, _wfds;
} aeApiState;

/* Initialization: zero the read/write fd_sets with the FD_ZERO macro. */
static int aeApiCreate(aeEventLoop *eventLoop) {
    aeApiState *state = zmalloc(sizeof(aeApiState));
    FD_ZERO(&state->rfds);
    FD_ZERO(&state->wfds);
    eventLoop->apidata = state;
    return 0;
}

/* Register an event: set the fd's bit in the corresponding fd_set. */
static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
    aeApiState *state = eventLoop->apidata;
    if (mask & AE_READABLE) FD_SET(fd, &state->rfds);
    if (mask & AE_WRITABLE) FD_SET(fd, &state->wfds);
    return 0;
}

/* Cancel an event registration: clear the fd's bit. */
static void aeApiDelEvent(aeEventLoop *eventLoop, int fd, int mask) {
    aeApiState *state = eventLoop->apidata;
    if (mask & AE_READABLE) FD_CLR(fd, &state->rfds);
    if (mask & AE_WRITABLE) FD_CLR(fd, &state->wfds);
}

/* Polling function: call select() and collect the ready descriptors. */
static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    int retval, j, numevents = 0;

    /* We need to have a copy of the fd sets as it's not safe to reuse
     * FD sets after select(). */
    memcpy(&state->_rfds, &state->rfds, sizeof(fd_set));
    memcpy(&state->_wfds, &state->wfds, sizeof(fd_set));

    retval = select(eventLoop->maxfd+1,
                    &state->_rfds, &state->_wfds, NULL, tvp);
    if (retval > 0) {
        for (j = 0; j <= eventLoop->maxfd; j++) {
            int mask = 0;
            aeFileEvent *fe = &eventLoop->events[j];

            if (fe->mask == AE_NONE) continue;
            /* The descriptor's bit is still set in the returned read set. */
            if (fe->mask & AE_READABLE && FD_ISSET(j, &state->_rfds))
                mask |= AE_READABLE;
            /* The descriptor's bit is still set in the returned write set. */
            if (fe->mask & AE_WRITABLE && FD_ISSET(j, &state->_wfds))
                mask |= AE_WRITABLE;
            eventLoop->fired[numevents].fd = j;
            eventLoop->fired[numevents].mask = mask;
            numevents++;
        }
    }
    return numevents;
}
```

The epoll function

The epoll function optimizes select in the following ways:

  • The select function needs to reset the sets of read and write descriptors it cares about on every call, which adds up to a significant cost over many calls. Because the set of socket descriptors we care about does not change very often, we can use an additional kernel data structure to represent the set of sockets epoll is currently interested in, instead of a value-result argument as select does. We therefore need a set of add/delete/modify/query operations on this data structure, and epoll relies on it.

  • The select function needs to traverse the whole descriptor set to learn which socket files are ready. We can avoid this traversal by maintaining separate ready lists for read and write events.

This is the main idea and direction of epoll's optimization; the details can be explored in the referenced article and, limited by length, will not be repeated here.
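Those two ideas – a persistent interest set plus a ready list – can be sketched as a toy model (everything here is mine and deliberately simplified; real epoll lives in the kernel): `ctlAdd` persists across waits like EPOLL_CTL_ADD, the "kernel" appends ready fds directly, and `await` hands back only the ready list instead of scanning every descriptor.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToyEpoll {
    private final Map<Integer, Integer> interest = new HashMap<>(); // fd -> mask
    private final Deque<Integer> ready = new ArrayDeque<>();        // ready list

    void ctlAdd(int fd, int mask) { interest.put(fd, mask); } // EPOLL_CTL_ADD
    void ctlDel(int fd)           { interest.remove(fd); }    // EPOLL_CTL_DEL

    // The "kernel" marks an fd ready the moment its event fires.
    void kernelEvent(int fd) {
        if (interest.containsKey(fd)) ready.add(fd);
    }

    // Like epoll_wait: cost is O(ready events), not O(all descriptors).
    List<Integer> await() {
        List<Integer> out = new ArrayList<>(ready);
        ready.clear();
        return out;
    }

    public static void main(String[] args) {
        ToyEpoll ep = new ToyEpoll();
        ep.ctlAdd(4, 1); // interest set persists -- no re-registration per wait
        ep.ctlAdd(7, 1);
        ep.kernelEvent(7);              // only fd 7 becomes ready
        System.out.println(ep.await()); // [7]
        System.out.println(ep.await()); // []
    }
}
```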

If this article doesn’t explain the nature of epoll, then come and strangle me!

Asynchronous IO (AIO):

This model uses the asynchronous IO call aio_read provided by the operating system: the application returns directly after calling it, and does not need to wait for the data to be copied into its buffer as in the previous models.

But the underlying implementation is quite complex (on some platforms it is even simulated with blocking IO running in threads), so I won't describe it here, because it doesn't seem to be of much practical use to application programmers.

Signal-driven IO:

Speaking more generally, AIO and multiplexed IO can also be seen as forms of signal-driven IO: the application does not need to block while the network adapter (NIC) prepares the data, and is notified by a signal once the data is ready. The signalling itself may be implemented with select or with lower-level mechanisms, but it is essentially similar. Note, however, that signal-driven IO still requires the thread to wait while the data is copied to user space.

Java BIO:

2.1 Introduction:

Note: As described in Computer Systems: A Programmer's Perspective, Linux abstracts all peripherals as files, and communication with a peripheral is abstracted as reads and writes of files; the network is just one kind of peripheral. The client and server each obtain a socket file descriptor when establishing a connection, and then exchange data by reading and writing the socket files behind those two descriptors.


A Socket in Java is an abstraction of one endpoint of the communication; it encapsulates a series of low-level TCP/IP operations. The code is as follows:

  1. Client:

```java
// Create a Socket connected to IP:PORT, which determines the server's address and port
Socket socket = new Socket("127.0.0.1", 8089);
OutputStream outputStream = socket.getOutputStream();
PrintWriter printWriter = new PrintWriter(outputStream, true);
printWriter.println("GET /index.html HTTP/1.1");
printWriter.println("Host: localhost:8080");
printWriter.println("Connection: Close");
printWriter.println();
```

  2. Server:

```java
// ServerSocket listens for connection events on this address
ServerSocket serverSocket = new ServerSocket(8089, 1, InetAddress.getByName("127.0.0.1"));
// accept() blocks until a client calls connect(), then returns a new Socket object
Socket socket = serverSocket.accept();
InputStream inputStream = socket.getInputStream();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
int i = 0;
while (i != -1) {
    i = bufferedReader.read();
    System.out.println((char) i);
}
socket.close();
```

Java BIO is essentially a wrapper over the network I/O interface of Unix systems;

2.2 Problems with Java BIO:

We usually use the Acceptor model to build a server: a ServerSocket creates a listening socket to listen for connection requests from clients, and after a connection is established, a worker thread is obtained from a thread pool for the logical processing.

And that logic creates a series of problems:

  1. Response time: the Acceptor is single-threaded, i.e., all connection requests are processed sequentially, and ServerSocket's backlog parameter caps the number of requests that can be queued before the server starts rejecting connections. Such a model predestines BIO's performance limits (queued connections may block for a while) and processing limits;
  2. Resource consumption: blocking IO naturally binds one thread to one connection, so the resource requirements are relatively high;
  3. Special application scenarios: for example, when multiple connections need to share state, which is awkward in the BIO model, where each connection is handled by its own thread.
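The Acceptor-plus-thread-pool logic described above can be sketched as follows (the class name, port 0 for an ephemeral port, and the single accept/echo round-trip are mine, kept minimal so the demo terminates):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AcceptorDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket serverSocket = new ServerSocket(0); // port 0: pick a free port
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // Single-threaded acceptor: accept() a connection, then hand the
        // socket to a pooled worker thread (one thread per connection).
        new Thread(() -> {
            try {
                Socket s = serverSocket.accept();
                pool.submit(() -> {
                    try (BufferedReader in = new BufferedReader(
                                 new InputStreamReader(s.getInputStream()));
                         PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                        out.println("echo: " + in.readLine());
                    } catch (IOException ignored) {
                    }
                });
            } catch (IOException ignored) {
            }
        }).start();

        // A client connects and gets its line echoed back.
        try (Socket client = new Socket("127.0.0.1", serverSocket.getLocalPort());
             PrintWriter out = new PrintWriter(client.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream()))) {
            out.println("hi");
            System.out.println(in.readLine()); // echo: hi
        }
        pool.shutdown();
    }
}
```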

Java NIO:

3.1 Compared with BIO, what has changed and why?

The difference with BIO is:

  1. Java NIO implements the multiplexed IO model, managing multiple connections through a single Selector thread. This solves BIO's deadliest problem – one thread per connection.

  2. Both In/OutputStream and Java NIO's Channels are essentially abstractions over network I/O files, but unlike streams, a Channel is bidirectional: it can be both read and written.

So, following the I/O multiplexing model, when the data in a channel is ready, a readable event is returned and dispatched through the Selector, and the corresponding socket is scheduled to read the data. That is the data-readable event; in total the Selector can listen for four kinds of events:

```java
SelectionKey.OP_CONNECT // connect-finished event
SelectionKey.OP_ACCEPT  // accept event
SelectionKey.OP_READ    // read-ready event
SelectionKey.OP_WRITE   // write-ready event
```
  1. Why introduce buffers? In BIO we usually read and write directly through methods like socket.getOutputStream().write(), while in NIO the data written to a channel must come from a Buffer, and a channel can only write its data into a Buffer, which makes these operations closer to how the operating system actually performs I/O. More specifically, data passed to OutputStream's write() ends up in the RecvQ queue of the receiver's socket, and if a write() is larger than the length limit of each data object in the queue it has to be split – a process we cannot control, and one that involves address translation between user space and kernel space. With a Buffer, however, we control the buffer's length, and whether and how it grows.
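The Buffer's read/write bookkeeping mentioned above comes down to the position/limit/capacity pointers (the class name and the tiny "hi" payload are mine):

```java
import java.nio.ByteBuffer;

public class BufferDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8); // we choose the capacity
        buf.put((byte) 'h').put((byte) 'i');     // write mode: position advances
        buf.flip();                              // limit = position, position = 0
        System.out.println(buf.remaining());     // 2 bytes ready to read
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        System.out.println(new String(out));     // hi
        buf.clear();                             // back to write mode
        System.out.println(buf.position());      // 0
    }
}
```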

Reference article: www.ibm.com/developerwo…

3.2 Let's look at example code (server):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Set;

/**
 * @CreatedBy: CVNot
 * @Date: 2020/2/21 15:30
 * @Description: NIO server
 */
public class NIOServer {
    public static void main(String[] args) {
        try {
            // Create the multiplexer
            Selector selector = Selector.open();
            // Create a ServerSocket channel bound to port 8080
            ServerSocketChannel serverSocketChannel =
                    ServerSocketChannel.open().bind(new InetSocketAddress(8080));
            // Set it to non-blocking mode
            serverSocketChannel.configureBlocking(false);
            // Listen for accept events
            serverSocketChannel.register(selector, SelectionKey.OP_ACCEPT);
            while (true) {
                selector.select();
                Set<SelectionKey> keys = selector.selectedKeys();
                Iterator<SelectionKey> iterator = keys.iterator();
                while (iterator.hasNext()) {
                    SelectionKey selectionKey = iterator.next();
                    iterator.remove();
                    // Only the ACCEPT event was registered, so this branch
                    // runs when a connection comes in
                    if (selectionKey.isAcceptable()) {
                        // The channel that produced the event
                        ServerSocketChannel serverChannel =
                                (ServerSocketChannel) selectionKey.channel();
                        SocketChannel clientChannel = serverChannel.accept();
                        clientChannel.configureBlocking(false);
                        // Register a read event for the client channel
                        clientChannel.register(selectionKey.selector(), SelectionKey.OP_READ);
                    } else if (selectionKey.isReadable()) {
                        SocketChannel clientChannel = (SocketChannel) selectionKey.channel();
                        ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
                        long bytesRead = clientChannel.read(byteBuffer);
                        while (bytesRead > 0) {
                            byteBuffer.flip();
                            System.out.println("data from client: "
                                    + new String(byteBuffer.array()));
                            byteBuffer.clear();
                            bytesRead = clientChannel.read(byteBuffer);
                        }
                        byteBuffer.clear();
                        byteBuffer.put("Hello client".getBytes("UTF-8"));
                        byteBuffer.flip();
                        clientChannel.write(byteBuffer);
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

Client:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

/**
 * @CreatedBy: CVNot
 * @Date: 2020/2/21 16:06
 * @Description: NIO client
 */
public class NIOClient {
    public static void main(String[] args) {
        try {
            Selector selector = Selector.open();
            SocketChannel clientChannel = SocketChannel.open();
            clientChannel.configureBlocking(false);
            clientChannel.connect(new InetSocketAddress(8080));
            clientChannel.register(selector, SelectionKey.OP_CONNECT);
            while (true) {
                // Block until an event arrives
                selector.select();
                Iterator<SelectionKey> iterator = selector.selectedKeys().iterator();
                while (iterator.hasNext()) {
                    SelectionKey key = iterator.next();
                    iterator.remove();
                    if (key.isConnectable()) {
                        /*
                         * Connected to the server successfully;
                         * then register the OP_READ event for clientChannel.
                         */
                        clientChannel = (SocketChannel) key.channel();
                        if (clientChannel.isConnectionPending()) {
                            clientChannel.finishConnect();
                        }
                        clientChannel.configureBlocking(false);
                        ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
                        byteBuffer.clear();
                        byteBuffer.put("Hello server, I'm a client".getBytes("UTF-8"));
                        byteBuffer.flip();
                        clientChannel.write(byteBuffer);
                        clientChannel.register(key.selector(), SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        clientChannel = (SocketChannel) key.channel();
                        ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
                        long bytesRead = clientChannel.read(byteBuffer);
                        while (bytesRead > 0) {
                            byteBuffer.flip();
                            System.out.println("server data: "
                                    + new String(byteBuffer.array()));
                            byteBuffer.clear();
                            bytesRead = clientChannel.read(byteBuffer);
                        }
                    } else if (key.isWritable() && key.isValid()) {
                        // The channel is ready for writing
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

3.3 Summarize the process with a picture: