
Author | xiangyong.ding    Source | Alibaba Tech official account

Preface

In this article we look at IO from the perspective of the operating system's actual system calls (using CentOS Linux release 7.5 as an example), tracing things back to the root to see what happens at each step of an IO operation.

As for how to observe system calls: on Linux, strace can be used to view the system calls made by any program (a good way to analyze and learn): strace -ff -o ./out java TestJava

One BIO

/**
 * Alipay.com Inc. Copyright (c) 2004-2020 All Rights Reserved.
 */
package io;

import java.io.*;
import java.net.ServerSocket;
import java.net.Socket;

/**
 * @author xiangyong.ding
 * @version $Id: TestSocket.java, v 0.1 2020-08-02 20:56 xiangyong.ding Exp $
 */
public class BIOSocket {

    public static void main(String[] args) throws IOException {
        ServerSocket serverSocket = new ServerSocket(8090);
        System.out.println("step1: new ServerSocket ");
        while (true) {
            Socket client = serverSocket.accept();
            System.out.println("step2: client\t" + client.getPort());

            new Thread(() -> {
                try {
                    InputStream in = client.getInputStream();
                    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
                    while (true) {
                        System.out.println(reader.readLine());
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }).start();
        }
    }
}

1 The system calls that occur

At the start

socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 5
bind(5, {sa_family=AF_INET, sin_port=htons(8090), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(5, 50)                           = 0
poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])

The poll call blocks until an event occurs on one of the polled fds (here, the listening socket).

After a client connects

accept(5, {sa_family=AF_INET, sin_port=htons(10253), sin_addr=inet_addr("42.120.74.252")}, [16]) = 6
clone(child_stack=0x7f013e5c4fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f013e5c59d0, tls=0x7f013e5c5700, child_tidptr=0x7f013e5c59d0) = 13168
poll([{fd=5, events=POLLIN|POLLERR}], 1, -1

After the thread has been spawned (the new Thread() in our code), poll blocks again, waiting for the next connection.

In the cloned thread

recvfrom(6, "hello,bio\n", 8192, 0, NULL, NULL) =

Regarding recvfrom: the fourth argument (the flags) is 0, meaning no non-blocking flag is set, so this is a blocking call.

After the client sends data

recvfrom(6, "hello,bio\n", 8192, 0, NULL, NULL) = 10

2 Advantages and disadvantages

Advantages

Simple code, clear logic.

Disadvantages

  • Because reading from the stream blocks, each connection needs its own thread, so the server cannot handle large numbers of connections (the C10K problem).
  • Note that poll itself is a fairly efficient multiplexing call (non-blocking, able to check events on multiple sockets at once), but here it is constrained by the JDK's old stream API, which cannot support non-blocking reads.

Two NIO (non-blocking)

Improvement: use the NIO API to turn blocking operations into non-blocking ones, so that a large number of threads is no longer required.

/**
 * Alipay.com Inc. Copyright (c) 2004-2020 All Rights Reserved.
 */
package io;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.LinkedList;

/**
 * @author xiangyong.ding
 * @version $Id: NioSocket.java, v 0.1 2020-08-09 11:25 xiangyong.ding Exp $
 */
public class NIOSocket {

    private static LinkedList<SocketChannel> clients = new LinkedList<>();

    private static void startClientChannelHandleThread() {
        new Thread(() -> {
            while (true) {
                ByteBuffer buffer = ByteBuffer.allocateDirect(4096);
                // Scan every connected client for readable data
                for (SocketChannel c : clients) {
                    int num = 0;
                    try {
                        num = c.read(buffer);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                    if (num > 0) {
                        buffer.flip();
                        byte[] clientBytes = new byte[buffer.limit()];
                        // Read from the buffer into memory
                        buffer.get(clientBytes);
                        System.out.println(c.socket().getPort() + ":" + new String(clientBytes));
                        // Clear the buffer
                        buffer.clear();
                    }
                }
            }
        }).start();
    }

    public static void main(String[] args) throws IOException {
        // New socket, start listening
        ServerSocketChannel socketChannel = ServerSocketChannel.open();
        socketChannel.bind(new InetSocketAddress(9090));
        // Set non-blocking, so accept() returns immediately (matches the strace output below)
        socketChannel.configureBlocking(false);

        // Client handling thread
        startClientChannelHandleThread();

        while (true) {
            // Accept a client connection; non-blocking, returns null if there is no client (the OS returns -1)
            SocketChannel client = socketChannel.accept();
            if (client == null) {
                //System.out.println("no client");
            } else {
                // Set reads to non-blocking
                client.configureBlocking(false);
                int port = client.socket().getPort();
                System.out.println("client port :" + port);
                clients.add(client);
            }
        }
    }
}

1 The system calls that occur

The main thread

socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
bind(4, {sa_family=AF_INET, sin_port=htons(9090), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(4, 50)                           = 0
fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
accept(4, 0x7fe26414e680, 0x7fe26c376710) = -1 EAGAIN (Resource temporarily unavailable)

After the connection, the child thread

read(6, 0x7f3f415b1c50, 4096)           = -1 EAGAIN (Resource temporarily unavailable)
read(6, 0x7f3f415b1c50, 4096)           = -1 EAGAIN (Resource temporarily unavailable)
...

Resource usage: (figure not reproduced here)

2 Advantages and disadvantages

Advantages

The number of threads is greatly reduced.

Disadvantages

The program itself has to scan every connection for readable data, which costs O(n) system calls per sweep (even though perhaps only one connection actually has data), and these calls are issued at high frequency (each one causing a switch between user and kernel mode). CPU consumption is therefore high.

Three Multiplexers (select, poll, epoll)

Improvement: instead of requiring user code to scan all connections, the kernel reports which connections have data, and the application reads only from those connections.

1 epoll

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Set;

/**
 * Multiplexing socket
 *
 * @author xiangyong.ding
 * @version $Id: MultiplexingSocket.java, v 0.1 2020-08-09 xiangyong.ding Exp $
 */
public class MultiplexingSocket {

    static ByteBuffer buffer = ByteBuffer.allocateDirect(4096);

    public static void main(String[] args) throws Exception {
        LinkedList<SocketChannel> clients = new LinkedList<>();
        ServerSocketChannel socketChannel = ServerSocketChannel.open();
        socketChannel.bind(new InetSocketAddress(9090));
        // Set non-blocking, accept clients
        socketChannel.configureBlocking(false);

        // The multiplexer (select/poll/epoll/kqueue)
        Selector selector = Selector.open(); // Java picks the provider automatically; epoll by default on Linux
        //Selector selector = PollSelectorProvider.provider().openSelector();

        // Register the server socket with the multiplexer, listening for accept events
        socketChannel.register(selector, SelectionKey.OP_ACCEPT);

        while (selector.select() > 0) {
            System.out.println();
            Set<SelectionKey> selectionKeys = selector.selectedKeys();
            Iterator<SelectionKey> iter = selectionKeys.iterator();
            while (iter.hasNext()) {
                SelectionKey key = iter.next();
                iter.remove();
                if (key.isAcceptable()) {
                    // Accept the client connection; non-blocking, returns null if there is no client (the OS returns -1)
                    SocketChannel client = socketChannel.accept();
                    // Set reads to non-blocking
                    client.configureBlocking(false);
                    // Register the client with the multiplexer, listening for read events
                    client.register(selector, SelectionKey.OP_READ);
                    System.out.println("new client : " + client.getRemoteAddress());
                } else if (key.isReadable()) {
                    readDataFromSocket(key);
                }
            }
        }
    }

    protected static void readDataFromSocket(SelectionKey key) throws Exception {
        SocketChannel socketChannel = (SocketChannel) key.channel();
        // Non-blocking read: returns immediately whether or not there is data
        int num = socketChannel.read(buffer);
        if (num > 0) {
            buffer.flip();
            byte[] clientBytes = new byte[buffer.limit()];
            // Read from the buffer into memory
            buffer.get(clientBytes);
            System.out.println(socketChannel.socket().getPort() + ":" + new String(clientBytes));
            // Clear the buffer
            buffer.clear();
        }
    }
}

The system calls that occur

At startup

socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
bind(4, {sa_family=AF_INET, sin_port=htons(9090), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(4, 50)                           = 0
fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
epoll_create(256)                       = 7
epoll_ctl(7, EPOLL_CTL_ADD, 5, {EPOLLIN, {u32=5, u64=4324783852322029573}}) = 0
epoll_ctl(7, EPOLL_CTL_ADD, 4, {EPOLLIN, {u32=4, u64=158913789956}}) = 0
epoll_wait(7,

About epoll_create (corresponding to Java's Selector.open()): it essentially creates an epoll data structure in a region of memory reserved by the operating system. When a client later connects, its fd is added to this epoll area so the kernel can monitor it.

When a connection arrives

epoll_wait(7, [{EPOLLIN, {u32=4, u64=158913789956}}], 8192, -1) = 1
accept(4, {sa_family=AF_INET, sin_port=htons(29597), sin_addr=inet_addr("42.120.74.252")}, [16]) = 8
fcntl(8, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
epoll_ctl(7, EPOLL_CTL_ADD, 8, {EPOLLIN, {u32=8, u64=3212844375897800712}}) = 0

About epoll_ctl (corresponding to Java's client.register(selector, SelectionKey.OP_READ)): EPOLLIN corresponds exactly to Java's SelectionKey.OP_READ, i.e. listening for the "data has arrived and is readable" event.

The client sends data

epoll_wait(7,[{EPOLLIN, {u32=8, u64=3212844375897800712}}], 8192, -1) = 1
read(8, "hello,multiplex\n", 4096)      = 16
epoll_wait(7,

Note: the fourth parameter of epoll_wait here is -1, which means block indefinitely.
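That timeout is driven by which select variant the Java code calls; below is a minimal sketch of the correspondence for the Linux epoll-backed Selector (the exact mapping is an implementation detail of the JDK, so treat it as approximate):

import java.nio.channels.Selector;

public class SelectTimeouts {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        selector.selectNow();    // never blocks: roughly epoll_wait(..., timeout = 0)
        selector.select(1000);   // blocks for at most 1000 ms: roughly epoll_wait(..., timeout = 1000)
        selector.wakeup();       // makes the next select() return immediately, so this demo terminates
        selector.select();       // normally blocks until an event arrives: epoll_wait(..., timeout = -1)
        selector.close();
    }
}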

2 Poll compared with epoll

Comparing the poll call seen in the BIO example above with the epoll calls here:

Both poll and epoll are essentially synchronous IO. Their difference from BIO is that multiplexing substantially reduces the number of system calls, and epoll further reduces the time complexity of those calls (the cost of finding ready fds no longer grows with the number of fds being watched).
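If you want to reproduce the poll side of the comparison with the same Java program, the JDK selector can be forced away from epoll. This is only a sketch: sun.nio.ch.PollSelectorProvider is an internal, JDK-version-dependent class (the commented-out line in MultiplexingSocket above does the same thing programmatically):

import java.nio.channels.Selector;

public class ForcePollSelector {
    public static void main(String[] args) throws Exception {
        // Must be set before the first Selector is created; otherwise the default (epoll) provider is already cached.
        System.setProperty("java.nio.channels.spi.SelectorProvider",
                "sun.nio.ch.PollSelectorProvider");
        Selector selector = Selector.open();
        // Prints the provider actually in use
        System.out.println(selector.provider().getClass().getName());
        selector.close();
    }
}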

3 Advantages and disadvantages

Advantages

  • The number of threads is small; the acceptor can even be the same thread as the worker.
  • Low time complexity: Java's Selector (backed by the epoll function on Linux) can fetch the events of many client channels in a single call, keeping the cost per call at O(1).
  • Low CPU usage: thanks to the selector, we no longer have to manually check every client channel for events as in the NIO example above, which greatly reduces CPU usage.

Disadvantages

  • Data handling is cumbersome: socketChannel.read currently works entirely in terms of raw bytes. If we wanted to act as an HTTP gateway, we would have to parse the HTTP protocol ourselves, which is a huge, messy, error-prone job.
  • Performance: every socket read (socketChannel.read(buffer)) goes through the single shared buffer, and reading is single-threaded, so performance becomes a problem once there are more connections. What if we allocated a new buffer for every read? That kind of allocation churn would be a disaster for the GC. Balancing buffer reuse against thread safety, so that we get both performance and correctness, is a real challenge; see the sketch after this list for one common compromise.
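Purely as an illustration (this is not from the original code), one common compromise is to keep a reusable direct buffer per reader thread, so buffers are recycled without ever being shared across threads:

import java.nio.ByteBuffer;

public class ReadBuffers {
    // One 4 KB direct buffer per thread: reused across reads (no per-read allocation)
    // and never shared between threads (no locking required).
    private static final ThreadLocal<ByteBuffer> BUFFER =
            ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(4096));

    public static ByteBuffer acquire() {
        ByteBuffer buffer = BUFFER.get();
        buffer.clear(); // reset position/limit before the next read
        return buffer;
    }
}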

Four Netty

1 The code under investigation (an introductory example shipped with Netty)

TelnetServer

package telnet;

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.logging.LogLevel;
import io.netty.handler.logging.LoggingHandler;
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.util.SelfSignedCertificate;

/**
 * Simplistic telnet server.
 */
public final class TelnetServer {

    static final boolean SSL = System.getProperty("ssl") != null;
    static final int PORT = Integer.parseInt(System.getProperty("port", SSL ? "8992" : "8023"));

    public static void main(String[] args) throws Exception {
        // Configure SSL.
        final SslContext sslCtx;
        if (SSL) {
            SelfSignedCertificate ssc = new SelfSignedCertificate();
            sslCtx = SslContextBuilder.forServer(ssc.certificate(), ssc.privateKey()).build();
        } else {
            sslCtx = null;
        }

        EventLoopGroup bossGroup = new NioEventLoopGroup(1);
        EventLoopGroup workerGroup = new NioEventLoopGroup();
        try {
            ServerBootstrap b = new ServerBootstrap();
            b.group(bossGroup, workerGroup)
             .channel(NioServerSocketChannel.class)
             .handler(new LoggingHandler(LogLevel.INFO))
             .childHandler(new TelnetServerInitializer(sslCtx));

            b.bind(PORT).sync().channel().closeFuture().sync();
        } finally {
            bossGroup.shutdownGracefully();
            workerGroup.shutdownGracefully();
        }
    }
}

TelnetServerHandler

package telnet;

import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandler.Sharable;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;

import java.net.InetAddress;
import java.util.Date;

/**
 * Handles a server-side channel.
 */
@Sharable
public class TelnetServerHandler extends SimpleChannelInboundHandler<String> {

    @Override
    public void channelActive(ChannelHandlerContext ctx) throws Exception {
        // Send greeting for a new connection.
        ctx.write("Welcome to " + InetAddress.getLocalHost().getHostName() + "!\r\n");
        ctx.write("It is " + new Date() + " now.\r\n");
        ctx.flush();
    }

    @Override
    public void channelRead0(ChannelHandlerContext ctx, String request) throws Exception {
        // Generate and write a response.
        String response;
        boolean close = false;
        if (request.isEmpty()) {
            response = "Please type something.\r\n";
        } else if ("bye".equals(request.toLowerCase())) {
            response = "Have a good day!\r\n";
            close = true;
        } else {
            response = "Did you say '" + request + "'?\r\n";
        }

        // We do not need to write a ChannelBuffer here.
        // We know the encoder inserted at TelnetPipelineFactory will do the conversion.
        ChannelFuture future = ctx.write(response);

        // Close the connection after sending 'Have a good day!'
        // if the client has sent 'bye'.
        if (close) {
            future.addListener(ChannelFutureListener.CLOSE);
        }
    }

    @Override
    public void channelReadComplete(ChannelHandlerContext ctx) {
        ctx.flush();
    }

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        cause.printStackTrace();
        ctx.close();
    }
}

TelnetServerInitializer

package telnet;

import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelPipeline;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.DelimiterBasedFrameDecoder;
import io.netty.handler.codec.Delimiters;
import io.netty.handler.codec.string.StringDecoder;
import io.netty.handler.codec.string.StringEncoder;
import io.netty.handler.ssl.SslContext;

/**
 * Creates a newly configured {@link ChannelPipeline} for a new channel.
 */
public class TelnetServerInitializer extends ChannelInitializer<SocketChannel> {

    private static final StringDecoder DECODER = new StringDecoder();
    private static final StringEncoder ENCODER = new StringEncoder();

    private static final TelnetServerHandler SERVER_HANDLER = new TelnetServerHandler();

    private final SslContext sslCtx;

    public TelnetServerInitializer(SslContext sslCtx) {
        this.sslCtx = sslCtx;
    }

    @Override
    public void initChannel(SocketChannel ch) throws Exception {
        ChannelPipeline pipeline = ch.pipeline();

        if (sslCtx != null) {
            pipeline.addLast(sslCtx.newHandler(ch.alloc()));
        }

        // Add the text line codec combination first,
        pipeline.addLast(new DelimiterBasedFrameDecoder(8192, Delimiters.lineDelimiter()));
        // the encoder and decoder are static as these are sharable
        pipeline.addLast(DECODER);
        pipeline.addLast(ENCODER);

        // and then business logic.
        pipeline.addLast(SERVER_HANDLER);
    }
}

2 System calls after startup

The main thread (23109).

## The argument 256 has no actual effect; it is only a legacy size hint
epoll_create(256)                       = 7
epoll_ctl(7, EPOLL_CTL_ADD, 5, {EPOLLIN, {u32=5, u64=5477705356928876549}}) = 0
epoll_create(256)                       = 10
epoll_ctl(10, EPOLL_CTL_ADD, 8, {EPOLLIN, {u32=8, u64=17041805914081853448}}) = 0
epoll_create(256)                       = 13
epoll_ctl(13, EPOLL_CTL_ADD, 11, {EPOLLIN, {u32=11, u64=17042151607409573899}}) = 0
epoll_create(256)                       = 16
epoll_ctl(16, EPOLL_CTL_ADD, 14, {EPOLLIN, {u32=14, u64=17042497300737294350}}) = 0
epoll_create(256)                       = 19
epoll_ctl(19, EPOLL_CTL_ADD, 17, {EPOLLIN, {u32=17, u64=17042561450368827409}}) = 0
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 20
clone(child_stack=0x7fc3c509afb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fc3c509b9d0, tls=0x7fc3c509b700, child_tidptr=0x7fc3c509b9d0) = 23130

Summarized as:

  • Creates a socket in the OS and clones the boss thread 23130.
  • Creates an epoll for the boss (see the boss thread below for how it is used), and each worker also gets its own epoll data structure (essentially data structures in kernel memory used for subsequent listening).
  • Creates the socket that the boss thread will listen on (again essentially a data structure in the kernel).

Boss (23130).

bind(20, {sa_family=AF_INET, sin_port=htons(8023), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(20, 128)                         = 0
getsockname(20, {sa_family=AF_INET, sin_port=htons(8023), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
getsockname(20, {sa_family=AF_INET, sin_port=htons(8023), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
## Register the listening socket fd 20 with epoll fd 7: first EPOLL_CTL_ADD, then EPOLL_CTL_MOD
epoll_ctl(7, EPOLL_CTL_ADD, 20, {EPOLLIN, {u32=20, u64=14198059139132817428}}) = 0
epoll_ctl(7, EPOLL_CTL_MOD, 20, {EPOLLIN, {u32=20, u64=20}}) = 0
epoll_wait(7, [{EPOLLIN, {u32=6, u64=17295150779149058053}}], 8192, 1000) = 1
epoll_wait(7, [], 8192, 1000)           = 0

Summarized as:

  • Binds fd 20 (created by the main thread in the previous step) to port 8023 and starts listening (the NIC receives connections and data, and the kernel routes them to the right process; see: about socket, bind and listen, TODO).
  • Registers the listening socket (fd 20) with the epoll data structure (fd 7); both are just memory structures inside the kernel.
  • Starts waiting on epoll, blocking for 1 s at a time, for any connection or data to arrive.

3 Client connection

boss (23130)

accept(20, {sa_family=AF_INET, sin_port=htons(11144), sin_addr=inet_addr("42.120.74.122")}, [16]) = 24
getsockname(24, {sa_family=AF_INET, sin_port=htons(8023), sin_addr=inet_addr("192.168.0.120")}, [16]) = 0
getsockname(24, {sa_family=AF_INET, sin_port=htons(8023), sin_addr=inet_addr("192.168.0.120")}, [16]) = 0
setsockopt(24, SOL_TCP, TCP_NODELAY, [1], 4) = 0
getsockopt(24, SOL_SOCKET, SO_SNDBUF, [87040], [4]) = 0
getsockopt(24, SOL_SOCKET, SO_SNDBUF, [87040], [4]) = 0
clone(child_stack=0x7fc3c4c98fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fc3c4c999d0, tls=0x7fc3c4c99700, child_tidptr=0x7fc3c4c999d0) = 2301

worker (2301)

writev(24, [{"Welcome to iZbp14e1g9ztpshfrla9m"..., 37}, {"It is Sun Aug 23 15:44:14 CST 20"..., 41}], 2) = 78 epoll_ctl(13, EPOLL_CTL_ADD, 24, {EPOLLIN, {u32=24, u64=24}}) = 0 epoll_ctl(13, EPOLL_CTL_MOD, 24, {EPOLLIN, {u32=24, u64=14180008216221450264}}) = 0 epoll_wait(13, [{EPOLLIN, {u32=11, u64=17042151607409573899}}], 8192, Epoll_wait (13, [], 8192, 1000) = 0 epoll_wait(13, [{EPOLLIN, {u32=24, u64=24}}], 8192, 1000) = 1Copy the code

Summary:

  • The boss, waiting in epoll_wait, is woken by the connection and accepts it, obtaining the client socket's fd (24).
  • Immediately after the connection is established, the boss spawns a thread (the clone call).
  • The worker (the new thread) writes some data (the greeting; business logic in this case).
  • The worker registers the client's fd with epoll number 13.
  • The worker then keeps polling epoll 13.

4 The client proactively sends data

The worker (2301).

read(24, "i am daojian\r\n", 1024) = 14 write(24, "Did you say 'i am daojian'? Epoll_wait (13, [], 8192, 1000) = 0Copy the code

Summarized as:

  • The data that epoll_wait reported is read immediately into user-space memory (a read of up to 1024 bytes into a user-space buffer).
  • The response is written back (business logic; not important here).
  • The worker keeps polling epoll 13.

5 The client sends a BYE message, and the server disconnects the TCP connection

The worker (2301).

read(24, "bye\r\n", 1024) = 5 write(24, "Have a good day! \r\n", 18) = 18 getsockopt(24, SOL_SOCKET, SO_LINGER, {onoff=0, linger=0}, [8]) = 0 dup2(25, Epoll_ctl (13, EPOLL_CTL_DEL, 24, 24) = 24 # delete socket with fd = 24 from epoll data structure (OS) 0x7F702DD531e0) = -1 ENOENT ## Close 24 Socket close(24) = 0 ## Continue to wait 13 epoll_wait(13, [], 8192, 1000) = 0Copy the code

Disconnecting the client is summarized as follows:

  • Remove the fd corresponding to this client from epoll.
  • Close the client's fd 24.
  • Keep polling epoll.

6 Five clients connect at the same time

Boss Thread (23130)

accept(20, {sa_family=AF_INET, sin_port=htons(1846), sin_addr=inet_addr("42.120.74.122")}, [16]) = 24
clone(child_stack=0x7f702cc51fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f702cc529d0, tls=0x7f702cc52700, child_tidptr=0x7f702cc529d0) = 10035
accept(20, {sa_family=AF_INET, sin_port=htons(42067), sin_addr=inet_addr("42.120.74.122")}, [16]) = 26
clone(child_stack=0x7f702cb50fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f702cb519d0, tls=0x7f702cb51700, child_tidptr=0x7f702cb519d0) = 10067
...

Worker thread (10035, first connection)

epoll_ctl(13, EPOLL_CTL_ADD, 24, {EPOLLIN, {u32=24, u64=24}}) = 0
epoll_ctl(13, EPOLL_CTL_MOD, 24, {EPOLLIN, {u32=24, u64=3226004877247250456}}) = 0
epoll_wait(13, [{EPOLLIN, {u32=11, u64=17042151607409573899}}], 8192, 1000) = 1
epoll_wait(13, [], 8192, 1000)          = 0

Worker thread (10067, second connection)

epoll_ctl(16, EPOLL_CTL_ADD, 26, {EPOLLIN, {u32=26, u64=26}}) = 0
epoll_ctl(16, EPOLL_CTL_MOD, 26, {EPOLLIN, {u32=26, u64=3221483685433835546}}) = 0
epoll_wait(16, [{EPOLLIN, {u32=14, u64=17042497300737294350}}], 8192, 1000) = 1
epoll_wait(16, [], 8192, 1000)          = 0
epoll_wait(16, [], 8192, 1000)          = 0

Worker thread (third connection)

epoll_ctl(19, EPOLL_CTL_ADD, 27, {EPOLLIN, {u32=27, u64=27}}) = 0
epoll_ctl(19, EPOLL_CTL_MOD, 27, {EPOLLIN, {u32=27, u64=3216966479350071323}}) = 0

Worker thread (8055, fourth connection)

epoll_ctl(10, EPOLL_CTL_ADD, 28, {EPOLLIN, {u32=28, u64=28}}) = 0
epoll_ctl(10, EPOLL_CTL_MOD, 28, {EPOLLIN, {u32=28, u64=3302604828697427996}}) = 0

Worker thread (10035, fifth connection; no new thread is cloned, the worker that owns the first epoll is reused)

epoll_ctl(13, EPOLL_CTL_ADD, 29, {EPOLLIN, {u32=29, u64=29}}) = 0
epoll_ctl(13, EPOLL_CTL_MOD, 29, {EPOLLIN, {u32=29, u64=29}}) = 0

Summarized as:

  • Relationship between epoll, boss and workers: there are 4 workers, one per epoll instance; the boss and each worker own their own epoll.
  • The boss distributes new connections evenly across the workers' epolls, based on the number of epolls.

7 Summary

Based on the system calls investigated above, the following figure shows how Netty interacts with the kernel:

After initialization, five epolls are created. Number 7 is used by the boss to handle connections from clients; the other four are used by the workers to handle data exchange with clients.

The number of worker threads depends on how many epolls were created during initialization; reusing a worker is essentially reusing its epoll.
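In other words, the worker parallelism is fixed when the event loop groups are built. A sketch of how that is typically configured (the thread counts here are examples, not values taken from the traced run):

import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;

public class EventLoopGroups {
    public static void main(String[] args) {
        // 1 boss event loop -> 1 epoll instance, used only to accept connections
        EventLoopGroup bossGroup = new NioEventLoopGroup(1);
        // 4 worker event loops -> 4 epoll instances; each accepted channel is bound
        // to exactly one of them for its whole lifetime
        EventLoopGroup workerGroup = new NioEventLoopGroup(4);

        // ... pass both groups to a ServerBootstrap exactly as TelnetServer does ...

        bossGroup.shutdownGracefully();
        workerGroup.shutdownGracefully();
    }
}

With the no-argument constructor used in TelnetServer, the worker count defaults to a value derived from the number of CPU cores, which is why the trace above shows four worker epolls on this machine.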

Why does each worker use its own epoll? Why not share one?

  • To avoid contention between workers when handling connections, Netty simply isolates them physically. Each worker is responsible only for the connections assigned to it, and all subsequent reads and writes for a given client are handled entirely by that one thread, which naturally avoids resource contention and locking.
  • Workers are single-threaded for performance reasons: a worker not only calls epoll_wait but also runs the read and write logic. If one worker handled too many connections, it would consume too much CPU time and be unable to keep up, degrading performance.

8 Advantages and disadvantages

Advantages

  • Data handling: Netty ships a large number of mature codec components (encoders and decoders) for protocols such as HTTP and POP3, ready to use.
  • Coding complexity and maintainability: Netty fully decouples business logic from networking; only a small amount of bootstrap configuration is needed, leaving the focus on the business logic.
  • Performance: Netty provides ByteBuf (built on Java's native ByteBuffer), and in particular pooled ByteBufs, giving both good read performance and cheap buffer allocation (more on this in a later article); see the sketch after this list.
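For instance, a pooled direct buffer can be borrowed and returned like this (a minimal sketch; pooling defaults and behaviour depend on the Netty version):

import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public class PooledBuffers {
    public static void main(String[] args) {
        // Borrow a direct buffer from the pooled allocator instead of allocating a fresh one
        ByteBuf buf = PooledByteBufAllocator.DEFAULT.directBuffer(4096);
        try {
            buf.writeBytes("hello".getBytes());
            System.out.println(buf.readableBytes());
        } finally {
            buf.release(); // reference-counted: returns the buffer to the pool
        }
    }
}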

Disadvantages

  • Getting started is difficult.

Five AIO
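The traces in this section come from a plain JDK AIO (NIO.2) server. The article does not show its source, but a minimal sketch of such a server (class name, buffer size and output format are illustrative assumptions, not the exact code that was traced) looks like this:

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;

public class AIOSocket {
    public static void main(String[] args) throws Exception {
        AsynchronousServerSocketChannel server =
                AsynchronousServerSocketChannel.open().bind(new InetSocketAddress(9090));

        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override
            public void completed(AsynchronousSocketChannel client, Void attachment) {
                // Keep accepting further connections
                server.accept(null, this);
                ByteBuffer buffer = ByteBuffer.allocate(1024);
                client.read(buffer, buffer, new CompletionHandler<Integer, ByteBuffer>() {
                    @Override
                    public void completed(Integer bytes, ByteBuffer buf) {
                        buf.flip();
                        // Completion handlers run on the group's thread pool, not on the boss thread
                        System.out.println(Thread.currentThread().getName()
                                + " received : " + new String(buf.array(), 0, buf.limit()));
                    }

                    @Override
                    public void failed(Throwable exc, ByteBuffer buf) {
                        exc.printStackTrace();
                    }
                });
            }

            @Override
            public void failed(Throwable exc, Void attachment) {
                exc.printStackTrace();
            }
        });

        // Block the main thread on stdin so the process stays alive
        // (visible as the trailing read(0, ... in the trace below)
        System.in.read();
    }
}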

1 Startup

The main thread

epoll_create(256)                       = 5
epoll_ctl(5, EPOLL_CTL_ADD, 6, {EPOLLIN, {u32=6, ...}}) = 0
## Create the boss thread (Proactor)
clone(child_stack=0x7f340ac06fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f340ac079d0, tls=0x7f340ac07700, child_tidptr=0x7f340ac079d0) = 22704
socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 8
setsockopt(8, SOL_IPV6, IPV6_V6ONLY, [0], 4) = 0
setsockopt(8, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(8, {sa_family=AF_INET6, sin6_port=htons(9090), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
listen(8, 50)                           = 0
accept(8, 0x7f67d01b3120, 0x7f67d9246690) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(5, EPOLL_CTL_MOD, 8, {EPOLLIN|EPOLLONESHOT, {u32=8, u64=15380749440025362440}}) = -1 ENOENT (No such file or directory)
epoll_ctl(5, EPOLL_CTL_ADD, 8, {EPOLLIN|EPOLLONESHOT, {u32=8, u64=15380749440025362440}}) = 0
read(0,

22704 (BOSS thread (Proactor))

epoll_wait(5, <unfinished ...>

2 A connection request arrives

epoll_wait(5, [{EPOLLIN, {u32=9, u64=4294967305}}], 64, -1) = 1
accept(8, {sa_family=AF_INET6, sin6_port=htons(55320), inet_pton(AF_INET6, "::ffff:36.24.32.140", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 9
clone(child_stack=0x7ff35c99ffb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7ff35c9a09d0, tls=0x7ff35c9a0700, child_tidptr=0x7ff35c9a09d0) = 26241
epoll_wait(5, <unfinished ...>

Thread 26241

## Add the connected client's fd to the boss epoll so the boss thread monitors its network events
epoll_ctl(5, EPOLL_CTL_MOD, 9, {EPOLLIN|EPOLLONESHOT, {u32=9, u64=4398046511113}}) = -1 ENOENT (No such file or directory)
epoll_ctl(5, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLONESHOT, {u32=9, u64=4398046511113}}) = 0
accept(8, 0x7ff3440008c0, 0x7ff35c99f4d0) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(5, EPOLL_CTL_MOD, 8, {EPOLLIN|EPOLLONESHOT, {u32=8, u64=8}}) = 0

3 The client sends data

22704 (the boss thread, i.e. the Proactor) handles the event

epoll_wait(5, [{EPOLLIN, {u32=9, u64=4294967305}}], 512, -1) = 1
read(9, ..., 1024)                      = 12
## The data is processed by another thread: clone a thread
clone(child_stack=0x7ff35c99ffb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7ff35c9a09d0, tls=0x7ff35c9a0700, child_tidptr=0x7ff35c9a09d0) = 26532

The cloned thread (26532) does the processing

write(1, "pool-1-thread-2-10received : dao"... , 41) = 41 write(1, "\n", 1) accept(8, 0x7f11c400b5f0, 0x7f11f42fd4d0) = -1 EAGAIN (Resource temporarily unavailable) epoll_ctl(5, EPOLL_CTL_MOD, 8, {EPOLLIN|EPOLLONESHOT, {u32=8, u64=8}}) = 0Copy the code

4 Summary

  • From the system-call point of view, Java AIO on Linux actually implements asynchronous event dispatch on top of synchronous, multiplexed IO (epoll).
  • The boss thread handles connections and distributes events.
  • Worker threads only execute the tasks handed to them by the boss; they do not monitor any network events themselves.

5 Advantages and disadvantages

Advantages

Compared with the BIO and NIO approaches above, AIO encapsulates the task scheduling; the application only needs to care about processing the tasks.

Disadvantages

  • Event processing is done entirely by a thread pool, so multiple events on the same channel can be handled concurrently, which can raise thread-safety issues (see the sketch after this list for how the pool is supplied).
  • Compared with Netty, the buffer API is unfriendly and error-prone, and encoding/decoding work is cumbersome.
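For reference, the thread pool that runs the completion handlers can be supplied explicitly instead of relying on the default group; a minimal sketch (pool size and port are illustrative):

import java.net.InetSocketAddress;
import java.nio.channels.AsynchronousChannelGroup;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.util.concurrent.Executors;

public class AIOWithGroup {
    public static void main(String[] args) throws Exception {
        // Completion handlers of channels bound to this group run on this fixed pool
        AsynchronousChannelGroup group = AsynchronousChannelGroup.withFixedThreadPool(
                4, Executors.defaultThreadFactory());
        AsynchronousServerSocketChannel server =
                AsynchronousServerSocketChannel.open(group).bind(new InetSocketAddress(9090));
        // ... accept and read with CompletionHandlers as in the sketch above ...
        System.in.read();
        group.shutdownNow();
    }
}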

The original link

This article is the original content of Aliyun and shall not be reproduced without permission.