The network IO process spans many layers: computer organization, network communication, the operating system, application APIs, and so on.

This discussion covers only the basics of network IO at the operating-system level.

View network IO from the operating system level

socket

A socket connection is identified by five elements: the communication protocol, client IP, client port, server IP, and server port. It can be understood as a layer of abstraction between the application layer and the transport layer. The operating system provides many socket-related system calls that applications use for network communication.

You can view network connections with the netstat command:

netstat -natp  # view network connections

If you have run curl www.baidu.com, you will see the five elements of the socket, as shown in the figure below:
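The five elements can also be inspected from Java. The loopback sketch below needs no external network: a server socket is bound to an ephemeral port (the `0` argument lets the OS pick one), a client connects to it, and the program prints the client/server IPs and ports of the resulting connection. The class and method names here are illustrative, not from the article.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class FiveTupleDemo {
    public static String describe() throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {            // server IP + server port (ephemeral)
            try (Socket client = new Socket("127.0.0.1", server.getLocalPort());
                 Socket accepted = server.accept()) {
                // Protocol: TCP (implied by Socket); the remaining four elements:
                return String.format("client=%s:%d server=%s:%d",
                        client.getLocalAddress().getHostAddress(), client.getLocalPort(),
                        client.getInetAddress().getHostAddress(), client.getPort());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describe());
    }
}
```

Running this prints something like `client=127.0.0.1:51324 server=127.0.0.1:51323`; the client port is ephemeral, chosen by the OS.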

File descriptor

The operating system maps socket connections to file descriptors (fds), converting reads and writes on sockets into reads and writes on fds for input and output processing. Creating a socket in Linux and reading and writing it through an fd:

# Establish a socket connection to Baidu and attach it to file descriptor 8;
# 8 is the file descriptor, <> opens it for both input and output, and the kernel establishes the TCP connection
exec 8<> /dev/tcp/www.baidu.com/80

# Send TCP data from the application layer to the transport layer; the socket is the abstraction between the two
echo -e "GET / HTTP/1.1\n" 1>&8

# Take input from file descriptor 8
cat 0<&8

After the third command executes, you will get the content of the Baidu home page.

Server and client

In the figure, C1, C2, and C3 are three clients and Server is the server. The process of establishing connections and reading and writing data is as follows:

  1. The server starts, creates a socket, binds the address, and gets a server file descriptor (s-fd)
  2. A client connects to the server's socket address via the TCP three-way handshake; on success, a file descriptor representing the socket connection is created on the client (c1-fd, c2-fd, c3-fd on the client side in the figure)
  3. On success, a file descriptor representing the client's socket connection is also created on the server (c1-fd, c2-fd, c3-fd on the server side in the figure)
  4. The client and server send and receive data through the socket connection (by reading and writing c1-fd, c2-fd, c3-fd)
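The four steps above can be sketched as a loopback round trip in Java: the server creates and binds a socket (step 1), the client connects (step 2), accept() yields the server-side connection (step 3), and both sides read and write through their sockets (step 4). The names and the "ping" payload are illustrative.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;

public class ConnectionFlow {
    public static String roundTrip() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {                            // step 1: the server's s-fd
            try (Socket clientSide = new Socket("127.0.0.1", server.getLocalPort()); // step 2: handshake, c-fd on the client
                 Socket serverSide = server.accept()) {                              // step 3: c-fd on the server
                clientSide.getOutputStream().write("ping\n".getBytes());             // step 4: the client writes...
                clientSide.getOutputStream().flush();
                return new BufferedReader(
                        new InputStreamReader(serverSide.getInputStream())).readLine(); // ...and the server reads
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip()); // prints: ping
    }
}
```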

Network IO phase

From the perspective of CPU work, the network IO reading process is roughly divided into two stages

  1. Data is read from the NIC into the kernel buffer (the application initiates the IO request and waits for the data to be ready)
  2. Data is copied from the kernel buffer into user space

Similarly, writing data may also have to wait for readiness, because the send buffer (highlighted in red in the figure) may be full.

To sum up, network IO differs from local file IO in that reads and writes may require waiting. Keeping this in mind helps in understanding the concept of blocking when writing network IO programs.
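The waiting in phase 1 can be observed directly from Java: a blocking read() does not return until the peer actually sends data. In the loopback sketch below, a writer thread delays 200 ms before sending one byte (the delay is an arbitrary value chosen for illustration), so the reader measurably blocks for roughly that long.

```java
import java.net.ServerSocket;
import java.net.Socket;

public class BlockingReadDemo {
    public static long blockedMillis() throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket accepted = server.accept()) {
            Thread writer = new Thread(() -> {
                try {
                    Thread.sleep(200);                    // nothing to read yet: the data is "not ready"
                    client.getOutputStream().write('x');  // now the kernel buffer receives data
                    client.getOutputStream().flush();
                } catch (Exception ignored) { }
            });
            writer.start();
            long start = System.nanoTime();
            accepted.getInputStream().read();             // phase 1: wait for data; phase 2: copy to user space
            writer.join();
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read() blocked for ~" + blockedMillis() + " ms");
    }
}
```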

From Java network IO programs to system calls

The operating system provides a series of system calls that applications use to establish socket connections and read and write data. They include the following:

socket        # create a socket
bind          # server: bind an address
listen        # server: listen for connections
accept        # server: accept a client connection
recvfrom/read # read data

Below, we use several Java programs, together with the system calls they make at run time, to understand the network IO process.

BIO

BIO is short for Blocking IO. Here we use a Java program to understand how a BIO server is implemented at the operating-system level.

The program

// BIOServer.java
ServerSocket serverSocket = new ServerSocket(8081);
while (true) {
  // Accept a connection; blocks until a client connects
  final Socket socket = serverSocket.accept();
  // new Thread(() -> {
    try {
      InputStream inputStream = socket.getInputStream();
      while (true) {
        byte[] bytes = new byte[1024];
        // Read data from the socket connection; blocks until there is data to read
        if (inputStream.read(bytes) > 0) {
          System.out.println(String.format("got message: %s", new String(bytes)));
        }
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
  // }).start();
}

Startup

Above is a simple Java BIO program. Run it under the strace command (on Linux) to see the system calls it makes while running:

strace -ff -o out java BIOServer  # trace the system calls each thread of the process makes to the kernel

After running the command above, we get several files prefixed with out, one per thread created while the program runs, each recording that thread's system calls. In the main thread's file we can find the key system calls mentioned above (many calls are printed; only the key ones are listed):

socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 7  # create a socket; returns file descriptor 7
bind(7, {sa_family=AF_INET, sin_port=htons(8081), sin_addr=inet_addr("0.0.0.0")}, 16) = 0  # bind the address; note the file descriptor 7 returned by socket, the port 8081 from our program, and the address 0.0.0.0, meaning this server listens on all interfaces
listen(7, 50) = 0  # listen so clients can connect; the second argument is the backlog, the length of the queue of clients waiting to complete the TCP three-way handshake
poll([{fd=7, events=POLLIN|POLLERR}], 1, -1  # corresponds to our accept() call; the program blocks here because no client has connected yet

These key system calls are what the Linux operating system provides, at the system level, for applications to build a server. The Linux man pages document each system call in detail. For example, the meaning of the second parameter of the listen system call is described in the official documentation:

int listen(int sockfd, int backlog);

The backlog argument defines the maximum length to which the queue of
pending connections for sockfd may grow.  If a connection request
arrives when the queue is full, the client may receive an error with
an indication of ECONNREFUSED or, if the underlying protocol supports
retransmission, the request may be ignored so that a later reattempt
at connection succeeds.

The connection

As mentioned above, the program is currently blocked at the accept method. Let's try connecting with telnet:

telnet localhost 8081 

The blocked poll system call returns and execution continues. Here are the key system calls to analyze:

poll([{fd=7, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=7, revents=POLLIN}])  # the blocking call returns 1: one descriptor is ready
accept(7, {sa_family=AF_INET, sin_port=htons(58972), sin_addr=inet_addr("127.0.0.1")}, [16]) = 8  # accept returns file descriptor 8, representing the client connection on the server
recvfrom(8,  # read data from file descriptor 8 (the client connection); blocks

Since we only connected and did not send any data, the Java program's main thread blocks at the read method, the recvfrom system call blocks, and the system call trace stops.

Read and write

Next, send the data aaaa over the client connection and look at the system calls:

recvfrom(8, "aaaa\r\n", 1024, 0, NULL, NULL) = 6

As you can see, recvfrom returns from its block and successfully reads the data we sent from the client; the return value is the number of bytes read. After this step, the operating system has copied the data into the Java application's memory, that is, into the byte array object in Java.

The problem

A BIO program is relatively simple to write and readily implements server functions such as accepting connections and reading data. However, as the analysis above shows, the program blocks threads at both the application level and the operating-system level, in two places: accepting connections and reading data. If our program has only one main thread, it can handle only one client connection, because the server does not know when a client will send data and blocks at the read method (the recvfrom system call) even when there is no data.

To let the server accept and handle multiple client connections, one solution is to create a thread for each client connection, which sidesteps the blocking problem. However, today's server applications generally need to handle high client concurrency. Creating one thread per client connection means creating many threads, and threads are expensive resources: this style of programming wastes thread resources and still cannot meet the concurrency requirements.
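The thread-per-connection idea can be sketched as follows. The acceptor thread only accepts, and each accepted socket is handed to its own worker, so a blocking read on one slow client no longer stalls the others. The echo-style handler, the ephemeral port, and the daemon worker threads are assumptions of this sketch, not from the original program.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadPerConnectionServer {
    // Start the server on an ephemeral port and return that port.
    public static int start() throws IOException {
        ServerSocket serverSocket = new ServerSocket(0);
        ExecutorService workers = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // workers should not keep the JVM alive
            return t;
        });
        Thread acceptor = new Thread(() -> {
            while (true) {
                try {
                    Socket socket = serverSocket.accept();  // this thread only accepts
                    workers.submit(() -> handle(socket));   // each connection gets its own thread
                } catch (IOException e) {
                    return;
                }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return serverSocket.getLocalPort();
    }

    private static void handle(Socket socket) {
        // The blocking read is confined to this connection's worker thread
        try (BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println("got message: " + line);
            }
        } catch (IOException ignored) { }
    }

    public static void main(String[] args) throws Exception {
        int port = start();
        try (Socket c = new Socket("127.0.0.1", port)) {
            new PrintWriter(c.getOutputStream(), true).println("hello");
            System.out.println(new BufferedReader(new InputStreamReader(c.getInputStream())).readLine());
        }
    }
}
```

This removes the blocking problem per connection, but as the paragraph above notes, it does not scale: thousands of concurrent clients would mean thousands of threads.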

The root problem is blocking; if blocking is solved, a small number of threads can handle a large number of concurrent connections.

Both the operating system and the Java API already support non-blocking IO. Below is an analysis of a NIO program.

NIO

NIO stands for New IO in Java, the newer IO API (based on channels and buffers); at the operating-system level it refers to non-blocking IO. Java's new IO also provides a way to program non-blocking IO. Following the same process as above, let's analyze how a NIO server program works.

The program

Below is a simple non-blocking NIO server program. It is somewhat harder to write than the BIO server, but comparing the two, the necessary steps (opening the server, binding the address, accepting connections, and reading data) still correspond one to one; the changes are only at the API level.

What differs from the BIO server is that, both when opening the server channel and when accepting a client connection, the program calls configureBlocking(false) to make them non-blocking. After each accept and each read, it must check whether a connection was actually accepted or data was actually read. This is the essential difference between non-blocking and blocking.

// NIOSimpleServer.java
// Save the client connections
List<SocketChannel> socketChannelList = new ArrayList<>();
// Open the server channel
ServerSocketChannel serverSocketChannel = ServerSocketChannel.open();
serverSocketChannel.bind(new InetSocketAddress("localhost", 8080));
// Set it to non-blocking
serverSocketChannel.configureBlocking(false);

// In an infinite loop, first try to accept a connection and save it, then process the existing connections on each pass
while (true) {
  TimeUnit.MILLISECONDS.sleep(1000);
  // Since the channel is non-blocking, this returns immediately whether or not a client
  // has connected, so the next step must check for null
  SocketChannel socketChannel = serverSocketChannel.accept();
  if (socketChannel != null) {
    System.out.println("connect success");
    // Set the client channel to non-blocking
    socketChannel.configureBlocking(false);
    socketChannelList.add(socketChannel);
  }
  for (SocketChannel socketChannel1 : socketChannelList) {
    ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
    // Read from the client connection; since it is non-blocking, this returns immediately
    // whether or not there is data, so the next step must check the return value
    int read = socketChannel1.read(byteBuffer);
    if (read > 0) {
      byteBuffer.flip();
      System.out.println(String.format("got message: %s", new String(byteBuffer.array())));
    }
  }
}

Startup

Similarly, start the NIOSimpleServer program under strace and observe the system calls made while it runs:

strace -ff -o out java NIOSimpleServer

Again, the key system calls mentioned above can be found in the main thread's trace:

socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 4  # create a socket; returns file descriptor 4
bind(4, {sa_family=AF_INET6, sin6_port=htons(8080), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_scope_id=0}, 28) = 0  # bind the address
listen(4, 50) = 0  # listen; the second argument is again the backlog queue length for the TCP three-way handshake
fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0  # set the socket to non-blocking
accept(4, 0x7f56140f5420, [28]) = -1 EAGAIN (Resource temporarily unavailable)
accept(4, 0x7f56140f5420, [28]) = -1 EAGAIN (Resource temporarily unavailable)
accept(4, 0x7f56140f5420, [28]) = -1 EAGAIN (Resource temporarily unavailable)
...
# with no client connected, the loop keeps calling accept, which returns -1 each time, indicating no connection

As you can see, the system calls involved in starting a non-blocking IO program are basically the same as for a blocking one, except that no blocking poll is made. Instead, accept is called directly; if there is no connection it returns -1, and the loop keeps issuing the accept system call.

The connection

Also, use Telnet to try to connect:

telnet localhost 8080

In the main thread's system call trace, accept now returns a file descriptor instead of -1:

accept(4, {sa_family=AF_INET6, sin6_port=htons(40896), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_scope_id=0}, [28]) = 5  # accept returns file descriptor 5, representing the client connection on the server
fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0  # set the client channel to non-blocking
accept(4, 0x7f56140df710, [28]) = -1 EAGAIN (Resource temporarily unavailable)
read(5, 0x7f56140f6440, 1024) = -1 EAGAIN (Resource temporarily unavailable)
accept(4, 0x7f56140df710, [28]) = -1 EAGAIN (Resource temporarily unavailable)
read(5, 0x7f56140f6440, 1024) = -1 EAGAIN (Resource temporarily unavailable)
...

Since we only connected and did not send data, every read returns -1: there is no data to read. But because the channels are non-blocking, having no data does not block and stop the thread; instead, the thread keeps repeating accept (receiving new connections) and read (reading data from existing connections).

Read and write

Next, send the data aaa over the client connection and look at the system calls:

read(5, "aaa\r\n", 1024) = 5  # one read returns the data sent from the client
accept(4, 0x7f5614129d80, [28]) = -1 EAGAIN (Resource temporarily unavailable)
read(5, 0x7f56140f6440, 1024) = -1 EAGAIN (Resource temporarily unavailable)
accept(4, 0x7f5614129d80, [28]) = -1 EAGAIN (Resource temporarily unavailable)
...

One of the reads returns the sent data aaa along with the number of bytes read. Meanwhile, as soon as the data is read, the accept and read system calls continue to repeat in the infinite loop, which shows that the program never blocks.

Here you can see that the underlying support from the operating system, the system call functions, is essentially the same for blocking and non-blocking IO. The difference is whether the fcntl system call marks the socket as non-blocking at the time the socket connection is created and its file descriptor generated. If it is set to non-blocking, the operating system does not block when the application calls accept or read: with nothing available it returns -1, meaning the resource is temporarily unavailable; otherwise it returns a non-negative number, the new file descriptor (for accept) or the number of bytes read (for read). Writing data is not demonstrated here, but by analogy, a write system call writes data to the socket and returns the number of bytes written.

The problem

With this set of changes, and at the cost of slightly more complex code, we have solved part of the BIO server's problem: a single thread can now both keep accepting connections and handle reads and writes from multiple client connections.

However, this program still has problems. The first is that, because everything is non-blocking, the thread keeps the CPU busy almost constantly: regardless of whether there is a new connection or data to process, every pass of the loop polls all the socket connections. We no longer waste thread resources, but we waste a lot of CPU.
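A small experiment illustrates this first problem: with nothing to read, a non-blocking read() returns immediately (0 in Java's channel API), so a tight loop performs thousands of fruitless read attempts in a very short time. The loopback setup and the 10,000-iteration count are arbitrary choices for this sketch.

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class BusyPollDemo {
    public static int emptyReads() throws Exception {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                accepted.configureBlocking(false);
                ByteBuffer buffer = ByteBuffer.allocate(1024);
                int empty = 0;
                for (int i = 0; i < 10_000; i++) {
                    if (accepted.read(buffer) == 0) { // returns immediately: no data ready
                        empty++;
                    }
                }
                return empty; // all attempts found nothing to read, yet each consumed CPU
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(emptyReads() + " of 10000 reads returned no data");
    }
}
```

Every one of those empty reads still costs work; in the real NIO server each would additionally be a read system call with its user/kernel mode switch.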

The second problem is that connected clients are stored in a collection that grows as more clients connect. Each pass of the loop iterates over the whole collection and calls each socketChannel's read to try to read data. Every such read attempt is a read system call, and every system call involves a CPU switch between user mode and kernel mode, which has a performance cost.

To solve these two problems, IO multiplexing can be used.

IO multiplexing

IO multiplexing lets one system call ask the operating system kernel to check the status of many socket connections, and allows the thread to block until an event of interest occurs on one of them. This solves both problems of the previous NIO program.

The operating system provides three system calls to implement IO multiplexing: select, poll, and epoll.

select

The previous non-blocking Java program stored client connections in a collection, traversed it at the application level trying to read data, and issued a system call on every attempt. The select system call moves the step of traversing the descriptors and checking for readable/writable events into the operating system kernel, and it can block (up to a timeout) until an event of interest occurs, which improves performance.

// nfds: the highest-numbered file descriptor plus 1
// readfds, writefds, exceptfds: the sets of file descriptors to watch for read, write, and exception events
// timeout: how long to block; the return value is the number of file descriptors with events,
// and the specific events are found in the corresponding fd_set
int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

Select uses IO multiplexing to solve, to some extent, the two problems of the non-blocking IO program, but it has problems of its own:

  • Each call passes in the full fd_set of descriptors to listen on, and the set size is limited by FD_SETSIZE, which defaults to 1024 on Linux;
  • After the call returns, the descriptor sets must be traversed to find the ready ones, so efficiency drops as the number of monitored file descriptors grows;

poll

The poll system call works much like select: the application passes in the file descriptors and events of interest, and the kernel polls them, reporting whether events occurred by setting fields in the parameters passed in.

Poll removes select's limit on the number of monitored file descriptors because it uses an array of structures instead of fd_set.

But as the figure above illustrates, poll must still pass in all the file descriptors it cares about on every call, and the kernel still traverses them to check for events.

// fds: an array of pollfd structures; nfds: its length; timeout: milliseconds to block
int poll(struct pollfd *fds, nfds_t nfds, int timeout);

struct pollfd {
    int fd;        /* file descriptor to listen on */
    short events;  /* requested events to watch; if negative, the fd is not checked */
    short revents; /* returned events witnessed, i.e. the events that occurred */
};

epoll

Epoll is also an IO multiplexing facility provided by the operating system. It solves the same problems as select and poll but is more powerful than either. Here is how epoll works, based on the figure above and the definitions of its three functions:

// create an epoll instance; returns a file descriptor used to manage the fds to listen on (size is only a hint)
int epoll_create(int size);
// register a fd in epoll's red-black tree; once it is ready, the kernel uses a callback
// to mark it, and the next epoll_wait call reports it
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
// fetch a batch of ready events
int epoll_wait(int epfd, struct epoll_event *events,
               int maxevents, int timeout);

Epoll consists of the epoll_create, epoll_ctl, and epoll_wait functions:

  1. First call epoll_create to create an epoll instance; it returns a file descriptor epfd that is used to manage the file descriptors we want to monitor
  2. Use epoll_ctl to register with epfd the file descriptors (socket connections) to monitor; the epoll_event argument carries the file descriptor, the events of interest, configuration flags, and so on. After this step, epoll holds the monitored file descriptor in a red-black tree, and when an event becomes ready the kernel moves the file descriptor into a separate ready set
  3. The application calls epoll_wait, which directly returns the ready set from step 2
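On Linux, Java NIO's Selector is typically backed by epoll, so the three steps above map roughly onto the Selector API: open() corresponds to epoll_create, register() to epoll_ctl, and select() to epoll_wait. The loopback sketch below shows the correspondence; the class name and the 1-second timeout are illustrative choices.

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class EpollMappingDemo {
    public static int readyAfterConnect() throws Exception {
        Selector selector = Selector.open();                       // step 1: roughly epoll_create
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);     // step 2: roughly epoll_ctl
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                return selector.select(1000);                      // step 3: roughly epoll_wait
            }
        } finally {
            selector.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("ready channels: " + readyAfterConnect());
    }
}
```

Because a client connects before select() is called, the wait returns 1: one registered channel (the server channel, now acceptable) is in the ready set.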

Epoll has the following characteristics:

  • The number of fds that can be monitored is essentially unlimited; the upper bound is the process's maximum number of open files
  • IO efficiency does not decrease as the number of monitored fds grows, because there is no traversal; each fd uses callback-based notification
  • epoll_wait does not need to copy the fd set on each call, because the fds were already registered through epoll_ctl

In addition, epoll supports configuration of how event readiness is reported; it has two notification modes:

  • LT (level-triggered) mode: as long as the ready event has not been handled, every wait keeps reporting it; this matches the semantics of select, poll, and Java NIO

  • ET (edge-triggered) mode: the event is reported only once; only epoll supports this mode. In this mode, if, say, 2 MB of data is ready but the application reads only 1 MB, the next epoll_wait will not report the file descriptor again; the application must keep reading the rest itself (until a marker indicating the end is read). This is the mode nginx uses

Multiplexing program

Java NIO supports multiplexing mainly through the Selector class, combined with the classes used in the non-blocking IO program. Below is the main code of a multiplexing program:

// NIOServer.java
Selector selector = Selector.open();
// ... same setup as in the non-blocking IO program (open the channel, bind, configureBlocking(false)) ...
serverSocketChannel.register(selector, SelectionKey.OP_ACCEPT);
// Query events; blocks until at least one registered channel is ready
selector.select();
Set<SelectionKey> selectionKeys = selector.selectedKeys();
for (SelectionKey selectionKey : selectionKeys) {
  // Handle client connection events
  if (selectionKey.isAcceptable()) {
    SocketChannel channel = ((ServerSocketChannel) selectionKey.channel()).accept();
    channel.configureBlocking(false);
    SelectionKey readKey = channel.register(selector, SelectionKey.OP_READ);
  } else if (selectionKey.isReadable()) {
    // Handle readable-data events
    ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
    SocketChannel channel = (SocketChannel) selectionKey.channel();
    channel.read(byteBuffer);
    byteBuffer.flip();
    System.out.println(String.format("got message: %s", new String(byteBuffer.array())));
  }
}

As in the previous analyses, start the program under strace and connect to it; the key system calls are as follows:

socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 7  # create a socket
bind(7, {sa_family=AF_INET6, sin6_port=htons(8080), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_scope_id=0}, 28) = 0  # bind the address
listen(7, 50) = 0
epoll_ctl(6, EPOLL_CTL_ADD, 7, {EPOLLIN, {u32=7, u64=362164703394267143}}) = 0  # register the server's fd 7 with the epoll instance (fd 6)
epoll_wait(6, [{EPOLLIN, {u32=7, u64=362164703394267143}}], 8192, -1) = 1  # block until an event is ready; returns when a client connects
accept(7, {sa_family=AF_INET6, sin6_port=htons(37078), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_scope_id=0}, [28]) = 8  # accept the client connection as fd 8
epoll_ctl(6, EPOLL_CTL_ADD, 8, {EPOLLIN, {u32=8, u64=352448473758433288}}) = 0  # register the client's fd 8 as well
read(8, "aaaa\r\n", 1024) = 6  # read the data
epoll_wait(6,  # block again on the next pass of the loop

You can see that Java's Selector uses the epoll IO multiplexing facility provided by the operating system.

Conclusion

  • First, this article introduced what network IO looks like at the operating-system level, using an example to understand the relationship between socket connections and file descriptors, and described the characteristics of network IO by walking through its phases.
  • Then it analyzed the main operating-system-level flow of a blocking IO (BIO) Java program, and, starting from the problems with BIO, analyzed the flow of non-blocking IO (NIO).
  • Finally, it introduced the IO multiplexing facilities provided by the operating system and analyzed how a Java multiplexing program uses epoll.