During a load test, the concurrent-connection limit of the Alibaba Cloud SLB was exhausted, causing severe latency in HTTP calls between services. The SLB's concurrent connection count at the time is shown in the figure below.

After logging in to a container terminal, I found that a single container of a front-end Node.js service had more than 20,000 connections in the ESTABLISHED state. With dozens of containers, the SLB's connection quota was quickly used up.

A tcpdump packet capture revealed the following:

  • The HTTP requests carry the Connection: keep-alive header
  • None of the connections are reused. After the three-way handshake, each connection sits idle until Nginx times it out after 65 seconds and sends a FIN packet

So the connections are keep-alive: they are not closed after an HTTP request completes, but they are never reused either. As soon as load-test traffic comes in, the connection count climbs steadily and quickly hits the SLB bottleneck.

Node.js uses http.Agent to manage reusable connections. An http.Agent instance is created as follows:

var agent = new http.Agent();

To have requests carry the Connection: keep-alive header, pass keepAlive: true when creating the http.Agent, as shown below.

var agent = new http.Agent({keepAlive: true})

Let’s write the simplest example, as shown below.

let http = require('http');

const agent = new http.Agent({keepAlive: true});

function sendHttp() {
    http.get({
        hostname: 'ya.test.me',
        port: 80,
        path: '/',
        agent: agent
    }, (res) => {
        console.log('STATUS: ' + res.statusCode);
        res.on('data', chunk => {
            console.log('BODY: ' + chunk);
        });
    });
}

sendHttp();

setTimeout(function () {
    console.log("start sleep");
    sendHttp();
}, 10 * 1000);

// Keep the process alive long enough to observe the idle connection
setTimeout(function () {}, 100000);

After running the Node.js code above, the packet capture result is as follows.

As the capture shows, the two HTTP requests issued 10 seconds apart reused the same TCP connection, which Nginx disconnected after it had been idle for about 65 seconds.

Here the question becomes: Node.js clearly has the ability to reuse connections, so why wasn't it working in production? Under the hood, the Node.js Agent maintains data structures such as requests, sockets, and freeSockets, as shown below.

function Agent(options) {
    this.requests = {}; // Queued requests waiting for a socket
    this.sockets = {};  // Sockets currently in use
    /**
     * Idle sockets, keyed by host:port, e.g.
     *   test.ye.me:80 -> [socket1, socket2, socket3...]
     *   xxx.com:8080  -> [socket1, socket2, socket3...]
     */
    this.freeSockets = {};
    this.keepAliveMsecs = this.options.keepAliveMsecs || 1000;
    this.keepAlive = this.options.keepAlive || false;
    // Maximum number of connections allowed for a single host:port
    this.maxSockets = this.options.maxSockets || Agent.defaultMaxSockets;
    // Maximum number of idle connections allowed for a single host:port
    this.maxFreeSockets = this.options.maxFreeSockets || 256;
}

When a connection becomes idle, the free event fires. The core of its handler is annotated below.

this.on('free', (socket, options) => {
    // name is a concatenated string such as "test.ya.me:80:"
    var name = this.getName(options);

    var freeSockets = this.freeSockets[name];
    var freeLen = freeSockets ? freeSockets.length : 0;
    var count = freeLen;

    // If the connection count exceeds the configured maxSockets
    // or maxFreeSockets value, close the current socket
    if (count > this.maxSockets || freeLen >= this.maxFreeSockets) {
        socket.destroy();
    } else if (this.keepSocketAlive(socket)) {
        // A keep-alive socket: add it to the freeSockets array
        freeSockets = freeSockets || [];
        this.freeSockets[name] = freeSockets;
        socket[async_id_symbol] = -1;
        socket._httpMessage = null;
        this.removeSocket(socket, options);
        freeSockets.push(socket);
    } else {
        // Implementation doesn't want to keep socket alive
        socket.destroy();
    }
});

Through debugging, I found that the online service's code did reach the free event callback, but never called the reuseSocket method. The reason: each HTTP request created a brand-new http.Agent object, which is equivalent to creating a new connection pool for every call. After each request, that pool holds exactly one idle connection that is never used again. Let's reproduce this with an experiment. The code is as follows.

let express = require("express");
let app = express();
let http = require('http');

app.get("/", function (req, res) {
    http.get({
        hostname: 'ya.test.me',
        port: 80,
        path: '/',
        agent: new http.Agent({keepAlive: true,})
    }, (result) => {
        console.log('STATUS: ' + result.statusCode);
        result.on('data', chunk => {
            console.log('BODY: ' + chunk);
        });
        res.send("hello");
    });
});
app.listen(3000);

Start the Node.js service and begin capturing packets on port 80. Use the ab tool (or any other tool that can issue HTTP calls in bulk) to hit the Node service.

ab -n 5000 -c 10  'http://10.211.55.10:3000/'

Within a short period, thousands of connections were established, none of them reused, which is exactly the behaviour seen in production.

netstat -lnpa | grep :80 | grep -v 8080  | awk '{print $6}' | sort | uniq -c | sort -rn

   2038 ESTABLISHED
      1 LISTEN

Following one of these connections, the packet exchange looks like this.

The connection is held for 65 seconds before Nginx times it out and disconnects it. It occupies a slot without ever being reused, which is even more harmful than a short-lived connection.

Next, we make the http.Agent object a global shared instance and repeat the experiment. The netstat result is shown below.

netstat -lnpa | grep :80 | grep -v 8080  | awk '{print $6}' | sort | uniq -c | sort -rn

     10 ESTABLISHED
      1 LISTEN

As you can see, at most 10 connections were created, orders of magnitude fewer than the previous 2000+. Tracing one of them in Wireshark gives the following.

You can see that the connection is finally being reused.

Summary

This problem was relatively simple in hindsight, but because I was unfamiliar with Node.js and the code was wrapped in many layers, it took some time to track down. I made the same mistake early on in Java: when initiating requests with OkHttp, it is a disaster if the OkHttpClient instance is not a singleton and is created anew on every call, because OkHttpClient also maintains a connection pool internally.

public class OkHttpClient implements Cloneable, Call.Factory, WebSocket.Factory {
  final ConnectionPool connectionPool;
}

The lesson learned is that connection pools only work when they are shared: create them once, carefully, and reuse them everywhere.

If you have any questions, you can scan the QR code below to follow my official account and contact me.