
1 Simulating the Single-Machine Connection Bottleneck

As we know, a server normally binds a single port when it starts, for example port 8000. The ports a client can use for outgoing connections are limited: port numbers top out at 65535, and ports 1024 and below are reserved, which leaves the range 1025–65535; after deducting some commonly used ports, roughly 60,000 client ports are actually available. So how can a single machine reach a million connections? If the server listens on the 100 ports [8000, 8100), then 100 × 60,000 gives roughly 6 million possible connections. This follows from basic TCP knowledge: a connection is identified by the four-tuple of source IP address, source port, destination IP address, and destination port. Even when the source IP address and source port are the same, a different destination port makes the operating system treat it as a different TCP connection. This is the groundwork for the single-machine million-connection experiment, as shown below.
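One practical caveat: the client-side (ephemeral) port range is governed by the kernel, and on many Linux distributions the default range is narrower than 1025–65535, so it is worth checking and, if necessary, widening it before the test. A minimal sketch, using standard Linux paths (exact defaults vary by distribution):

# Check the ephemeral (client-side) port range currently allowed by the kernel
cat /proc/sys/net/ipv4/ip_local_port_range

# Temporarily widen it so a single client host can really use ~60,000 outgoing ports
sudo sysctl -w net.ipv4.ip_local_port_range="1025 65535"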

Ports 1024 and below on a machine are reserved for the ROOT user, so the client port range is 1025–65535. The following code implements the single-machine million-connection simulation. First look at the server class, which opens the listening ports 8000 to 8100 in a loop and waits for clients to connect.


package com.tom.netty.connection;

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;

/**
 * @author Tom
 */
public final class Server {
    public static final int BEGIN_PORT = 8000;
    public static final int N_PORT = 8100;

    public static void main(String[] args) {
        new Server().start(Server.BEGIN_PORT, Server.N_PORT);
    }

    public void start(int beginPort, int nPort) {
        System.out.println("Server starting...");

        EventLoopGroup bossGroup = new NioEventLoopGroup();
        EventLoopGroup workerGroup = new NioEventLoopGroup();

        ServerBootstrap bootstrap = new ServerBootstrap();
        bootstrap.group(bossGroup, workerGroup);
        bootstrap.channel(NioServerSocketChannel.class);
        bootstrap.childOption(ChannelOption.SO_REUSEADDR, true);

        bootstrap.childHandler(new ConnectionCountHandler());


        for (int i = 0; i <= (nPort - beginPort); i++) {
            final int port = beginPort + i;

            bootstrap.bind(port).addListener(new ChannelFutureListener() {
                public void operationComplete(ChannelFuture channelFuture) throws Exception {
                    System.out.println("Listener port bound successfully:"+ port); }}); } System.out.println("Server started!"); }}Copy the code

The ConnectionCountHandler class counts the current number of connections: the counter is incremented each time a connection becomes active, decremented when it goes inactive, and the current value is printed periodically.


package com.tom.netty.connection;


import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Created by Tom.
 */
@ChannelHandler.Sharable
public class ConnectionCountHandler extends ChannelInboundHandlerAdapter {

    private AtomicInteger nConnection = new AtomicInteger();

    public ConnectionCountHandler() {
        // Print the current connection count every 2 seconds
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(new Runnable() {
            public void run() {
                System.out.println("Current client connections: " + nConnection.get());
            }
        }, 0, 2, TimeUnit.SECONDS);
    }

    @Override
    public void channelActive(ChannelHandlerContext ctx) {
        nConnection.incrementAndGet();
    }

    @Override
    public void channelInactive(ChannelHandlerContext ctx) {
        nConnection.decrementAndGet();
    }
}

Now look at the client class. Its main function loops over the 100 ports opened by the server, creating connections until the server stops responding and the thread blocks.


package com.tom.netty.connection;


import io.netty.bootstrap.Bootstrap;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;

/**
 * Created by Tom.
 */
public class Client {

    private static final String SERVER_HOST = "127.0.0.1";

    public static void main(String[] args) {
        new Client().start(Server.BEGIN_PORT, Server.N_PORT);
    }

    public void start(final int beginPort, int nPort) {
        System.out.println("Client started...");
        EventLoopGroup eventLoopGroup = new NioEventLoopGroup();
        final Bootstrap bootstrap = new Bootstrap();
        bootstrap.group(eventLoopGroup);
        bootstrap.channel(NioSocketChannel.class);
        bootstrap.option(ChannelOption.SO_REUSEADDR, true);
        bootstrap.handler(new ChannelInitializer<SocketChannel>() {
            @Override
            protected void initChannel(SocketChannel ch) {
            }
        });

        int index = 0;
        int port;
        while (!Thread.interrupted()) {
            port = beginPort + index;
            try {
                ChannelFuture channelFuture = bootstrap.connect(SERVER_HOST, port);
                channelFuture.addListener(new ChannelFutureListener() {
                    public void operationComplete(ChannelFuture future) throws Exception {
                        if (!future.isSuccess()) {
                            System.out.println("Connection failed, program closed!");
                            System.exit(0);
                        }
                    }
                });
                channelFuture.get();
            } catch (Exception e) {
            }

            if (port == nPort) {
                index = 0;
            } else {
                index++;
            }
        }
    }
}

Finally, package the server application and deploy it to one Linux server, package the client application and deploy it to another Linux server, and then start the server and client programs. After running for a while, the connection count reported by the server stops at a fixed value, as shown below.

Current client connections: 870
Current client connections: 870
Current client connections: 870
Current client connections: 870
Current client connections: 870
...

The following exception is also thrown.


Exception in thread "nioEventLoopGroup-2-1" java.lang.InternalError: java.io.FileNotFoundException: /usr/java/jdk18.. 0 _121/jre/lib/ext/cldrdata.jar (Too many open files)
        at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:1040)
        at sun.misc.URLClassPath.getResource(URLClassPath.java:239)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:503)
        at java.util.ResourceBundle$Control.newBundle(ResourceBundle.java:2640)
        at java.util.ResourceBundle.loadBundle(ResourceBundle.java:1501)
        at java.util.ResourceBundle.findBundle(ResourceBundle.java:1465)
        at java.util.ResourceBundle.findBundle(ResourceBundle.java:1419)
        at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1361)
        at java.util.ResourceBundle.getBundle(ResourceBundle.java:845)
        at java.util.logging.Level.computeLocalizedLevelName(Level.java:265)
        at java.util.logging.Level.getLocalizedLevelName(Level.java:324)
        at java.util.logging.SimpleFormatter.format(SimpleFormatter.java:165)
        at java.util.logging.StreamHandler.publish(StreamHandler.java:211)
        at java.util.logging.ConsoleHandler.publish(ConsoleHandler.java:116)
        at java.util.logging.Logger.log(Logger.java:738)
        at io.netty.util.internal.logging.JdkLogger.log(JdkLogger.java:606)
        at io.netty.util.internal.logging.JdkLogger.warn(JdkLogger.java:482)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:876)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:745)

At this point the server has reached the maximum number of client connections it can accept: 870. The next task is to break through this bottleneck so that a single server can support a million connections, which is where it gets exciting.

2 Single-Machine Million-Connection Tuning Solutions

2.1 Breaking the Local File Handle Limit

Start by running the following command on the server to see the maximum number of file handles a single process can open.


ulimit -n


After you enter the command, the number 1024 is displayed, which is the maximum number of files a single process can open in Linux. This matters because every TCP connection in Linux is backed by a file, so the file limit also caps the number of connections. Why, then, did the server connection count shown earlier settle at 870, which is less than 1024? Because, in addition to the connections themselves, the class files and jars opened by the JVM also count as open files for the process, so 1024 minus the number of files the JVM keeps open is the number of TCP connections the process can actually support. To break through this limit, first open the /etc/security/limits.conf file on the server with the following command.


sudo vi /etc/security/limits.conf


Then add the following two lines to the end of the file.


* hard nofile 1000000
* soft nofile 1000000


Here * applies the rule to every user, hard sets the hard limit, and soft sets the warning (soft) limit; nofile is the item being limited, the maximum number of open files. The value 1000000 means any user may open up to one million files, which is the maximum supported by the operating system, as shown in the following figure.
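The soft and hard values can also be checked separately with the commands below (a quick verification sketch, assuming a Bash-compatible shell); note that changes to limits.conf normally take effect only for new login sessions.

# Soft limit (the warning threshold) on open files for the current session
ulimit -Sn

# Hard limit (the ceiling) on open files for the current session
ulimit -Hn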

Next, type the following command.


ulimit -n


At this point we find that the value is still 1024, unchanged, so restart the server machine for the new limits to take effect. Then re-run the server and client programs and watch the connection count climb until it stops at 137920 and an exception is thrown, as shown below.

Current client connections: 137920
Current client connections: 137920
Current client connections: 137920
Current client connections: 137920
Current client connections: 137920
Exception in thread "nioEventLoopGroup-2-1" java.lang.InternalError: java.io.FileNotFoundException: /usr/java/jdk1.8.0_121/jre/lib/ext/cldrdata.jar (Too many open files)
...

Why is that? There must be another cap on the number of connections; to break it, the global file handle limit has to be raised as well.

2.2 Breaking the global file handle limit

Run the following command on the Linux command line to view the maximum number of files that can be opened by all user processes in Linux combined.


cat /proc/sys/fs/file-max


The command above shows the global limit, which here is 10000. As you might expect, the per-process (local) file handle limit cannot exceed the global one, so to go beyond this point the global file handle limit must be raised as well. Switch to the ROOT user first, otherwise you will not have permission.


sudo -s
echo 20000 > /proc/sys/fs/file-max
exit


Let’s set it to 20,000 first and test. Start the server and client programs again and you will find that the connection count now passes the previous ceiling and runs up against the new 20,000 limit. However, a value written to /proc/sys/fs/file-max with echo is not persistent: after the machine reboots it falls back to 10,000. So instead, make the change permanent with the vi command.


sudo vi /etc/sysctl.conf


Add the following to the end of the /etc/sysctl.conf file.


fs.file-max=1000000


The result is shown below.
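As a side note, the value in /etc/sysctl.conf can usually also be loaded without a full reboot; a minimal sketch, run as the ROOT user:

# Reload kernel parameters from /etc/sysctl.conf
sysctl -p

# Verify the new global limit
cat /proc/sys/fs/file-max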

Next, restart the Linux server and start the server and client programs.

Current client connections: 9812451
Current client connections: 9812462
Current client connections: 9812489
Current client connections: 9812501
Current client connections: 9812503
...

The final connection count is about 980,000; at this point the limit is mainly the performance of the machine itself. Running htop shows the CPU close to 100%, as shown in the figure below.

The above is about the tuning and performance improvement at the operating system level. The following is about the tuning at the Netty application level.

3 Netty application-level performance tuning

3.1 Reproducing the Performance Bottleneck at the Netty Application Level

To start with the application scenario, here is a piece of standard server-side application code.



package com.tom.netty.thread;


import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.FixedLengthFrameDecoder;

/**
 * Created by Tom.
 */
public class Server {

    private static final int port = 8000;

    public static void main(String[] args) {

        EventLoopGroup bossGroup = new NioEventLoopGroup();
        EventLoopGroup workerGroup = new NioEventLoopGroup();
        final EventLoopGroup businessGroup = new NioEventLoopGroup(1000);

        ServerBootstrap bootstrap = new ServerBootstrap();
        bootstrap.group(bossGroup, workerGroup)
                .channel(NioServerSocketChannel.class)
                .childOption(ChannelOption.SO_REUSEADDR, true);


        bootstrap.childHandler(new ChannelInitializer<SocketChannel>() {
            @Override
            protected void initChannel(SocketChannel ch) {
                // Fixed-length frame decoder: the client sends one long (8 bytes) at a time,
                // each message carrying the current system timestamp
                ch.pipeline().addLast(new FixedLengthFrameDecoder(Long.BYTES));
                ch.pipeline().addLast(businessGroup, ServerHandler.INSTANCE);
            }
        });

        ChannelFuture channelFuture = bootstrap.bind(port).addListener(new ChannelFutureListener() {
            public void operationComplete(ChannelFuture channelFuture) throws Exception {
                System.out.println("The server started successfully, bound port: " + port);
            }
        });
    }
}

We will focus on the ServerHandler class for logical processing on the server side.


package com.tom.netty.thread;


import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;

import java.util.concurrent.ThreadLocalRandom;

/**
 * Created by Tom.
 */
@ChannelHandler.Sharable
public class ServerHandler extends SimpleChannelInboundHandler<ByteBuf> {
    public static final ChannelHandler INSTANCE = new ServerHandler();


    // channelRead0 runs in the main thread here
    @Override
    protected void channelRead0(ChannelHandlerContext ctx, ByteBuf msg) {
        ByteBuf data = Unpooled.directBuffer();
        // Read a timestamp from the client
        data.writeBytes(msg);
        // Simulate a business process, either a database operation or a logical process
        Object result = getResult(data);
        // write it back to the client
        ctx.channel().writeAndFlush(result);
    }

    // get a result from the database
    protected Object getResult(ByteBuf data) {

        int level = ThreadLocalRandom.current().nextInt(1, 1000);

        // Calculate the time required for each response and use it as reference data for QPS

        // 90.0% of requests take 1ms (100 out of 1000 exceed 1ms)
        int time;
        if (level <= 900) {
            time = 1;
        // 95.0% take no more than 10ms (50 out of 1000 exceed 10ms)
        } else if (level <= 950) {
            time = 10;
        // 99.0% take no more than 100ms (10 out of 1000 exceed 100ms)
        } else if (level <= 990) {
            time = 100;
        // 99.9% take no more than 1000ms (1 out of 1000 exceeds 1000ms)
        } else {
            time = 1000;
        }

        try {
            Thread.sleep(time);
        } catch (InterruptedException e) {
        }

        return data;
    }
}

The code above contains a getResult() method, which can be thought of as a database query whose result is returned to the client. To simulate query performance, getResult() simply takes the timestamp passed in by the client and eventually returns it unchanged; before returning, it sleeps for a random amount of time to imitate real business processing. The following table shows the performance parameters of the simulated scenario.

Proportion of data-processing requests    Processing time
90%                                       1ms
95%                                       10ms
99%                                       100ms
99.9%                                     1000ms
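As a rough back-of-envelope estimate, the expected processing time of a single request under this distribution is about

0.90 × 1ms + 0.05 × 10ms + 0.04 × 100ms + 0.01 × 1000ms ≈ 15.4ms

which is worth keeping in mind when interpreting the tuning results later.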

Let’s look at the client side, which is also a standard piece of code.


package com.tom.netty.thread;


import io.netty.bootstrap.Bootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;
import io.netty.handler.codec.FixedLengthFrameDecoder;

/**
 * Created by Tom.
 */
public class Client {

    private static final String SERVER_HOST = "127.0.0.1";

    public static void main(String[] args) throws Exception {
        new Client().start(8000);
    }

    public void start(int port) throws Exception {
        EventLoopGroup eventLoopGroup = new NioEventLoopGroup();
        final Bootstrap bootstrap = new Bootstrap();
        bootstrap.group(eventLoopGroup)
                .channel(NioSocketChannel.class)
                .option(ChannelOption.SO_REUSEADDR, true)
                .handler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        ch.pipeline().addLast(new FixedLengthFrameDecoder(Long.BYTES));
                        ch.pipeline().addLast(ClientHandler.INSTANCE);
                    }
                });

        // Open 1,000 connections to the server; each connection sends one request per second,
        // so the overall request rate is about 1,000 QPS
        for (int i = 0; i < 1000; i++) {
            bootstrap.connect(SERVER_HOST, port).get();
        }
    }
}

As you can see from the code above, the client opens 1,000 connections to the server. Now focus on the client-side logic in the ClientHandler class.


package com.tom.netty.thread;


import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Created by Tom.
 */
@ChannelHandler.Sharable
public class ClientHandler extends SimpleChannelInboundHandler<ByteBuf> {
    public static final ChannelHandler INSTANCE = new ClientHandler();

    private static AtomicLong beginTime = new AtomicLong(0);
    // Total response time
    private static AtomicLong totalResponseTime = new AtomicLong(0);
    // Total requests
    private static AtomicInteger totalRequest = new AtomicInteger(0);

    public static final Thread THREAD = new Thread(){
        @Override
        public void run() {
            try {
                while (true) {
                    long duration = System.currentTimeMillis() - beginTime.get();
                    if (duration != 0) {
                        // Print QPS and average response time every 2 seconds
                        System.out.println("QPS: " + 1000 * totalRequest.get() / duration + ", "
                                + "Average response time: " + ((float) totalResponseTime.get()) / totalRequest.get() + "ms.");
                        Thread.sleep(2000);
                    }
                }
            } catch (InterruptedException ignored) {
            }
        }
    };

    @Override
    public void channelActive(final ChannelHandlerContext ctx) {
        ctx.executor().scheduleAtFixedRate(new Runnable() {
            public void run() {
                ByteBuf byteBuf = ctx.alloc().ioBuffer();
                // Send the current system time to the server once per second
                byteBuf.writeLong(System.currentTimeMillis());
                ctx.channel().writeAndFlush(byteBuf);
            }
        }, 0, 1, TimeUnit.SECONDS);
    }

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, ByteBuf msg) {
        // Get a response time difference, the response time of this request
        totalResponseTime.addAndGet(System.currentTimeMillis() - msg.readLong());
        // Increments each time
        totalRequest.incrementAndGet();

        // Start the statistics thread on the first response
        if (beginTime.compareAndSet(0, System.currentTimeMillis())) {
            THREAD.start();
        }
    }
}

The code above simulates the business processing time of a real Netty environment: the QPS is about 1,000, and statistics are printed every 2 seconds. Next, start the server and client and watch the console logs. First run the server; its console log is shown below.

Then run the client; its console log is shown in the figure below. After a while, you can see that the QPS stays within 1,000 while the average response time grows longer and longer.

Going back to the getResult() method of the server-side ServerHandler, the thread sleep inside getResult() blocks the calling thread; as the text above shows, this ends up blocking the main thread, so all requests are squeezed onto one thread. If the following code is handed off to a thread pool instead, the effect is completely different.


Object result = getResult(data);
ctx.channel().writeAndFlush(result);


These two lines of code are put into a business thread pool and run continuously in the background, returning results as soon as they are finished.

3.2 Netty Application-level Performance Tuning Scheme

Let’s modify the code by creating a new ServerThreadPoolHandler class on the server side.


package com.tom.netty.thread;


import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Created by Tom.
 */
@ChannelHandler.Sharable
public class ServerThreadPoolHandler extends ServerHandler {
    public static final ChannelHandler INSTANCE = new ServerThreadPoolHandler();
    private static ExecutorService threadPool = Executors.newFixedThreadPool(1000);


    @Override
    protected void channelRead0(final ChannelHandlerContext ctx, ByteBuf msg) {
        final ByteBuf data = Unpooled.directBuffer();
        data.writeBytes(msg);
        threadPool.submit(new Runnable() {
            public void run() {
                // Run the blocking business logic in the thread pool and
                // write the result back once it is ready
                Object result = getResult(data);
                ctx.channel().writeAndFlush(result);
            }
        });
    }
}

The server-side handler is then registered as ServerThreadPoolHandler in place of the original ServerHandler, as follows.


ch.pipeline().addLast(ServerThreadPoolHandler.INSTANCE);


Then, start the server and client programs to view the console logs, as shown in the figure below.

The average response time stabilizes at about 15ms and the QPS exceeds 1,000. This may still not be optimal, so keep adjusting: change the thread pool size in ServerThreadPoolHandler to 20, as shown below.


    public static final ChannelHandler INSTANCE = new ServerThreadPoolHandler();
    private static ExecutorService threadPool = Executors.newFixedThreadPool(20);

Then start the program and find that the average response time is not much different, as shown in the figure below.

The conclusion is that the appropriate number of threads has to be determined by repeated adjustment and testing in the real environment; the purpose of this chapter is to demonstrate the optimization method, not a particular result.
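As a rough sanity check of that conclusion: with about 1,000 requests per second and an expected processing time of roughly 15.4ms per request (from the distribution table above), Little's law suggests on the order of 1,000 × 0.0154 ≈ 16 concurrently busy threads, which helps explain why a pool of 20 threads performs about as well as a pool of 1,000 in this scenario.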

This article is an original work of “Tom Plays Architecture”; please credit the source when reproducing it. Technology is meant to be shared, and I share it gladly! If this article helped you, you are welcome to follow and like it; if you have suggestions, leave a comment or send a private message. Your support is my motivation to keep creating. Follow “Tom Plays Architecture” for more technical content!
