Many details of the three-way handshake were not covered in the previous article. This article uses the backlog parameter to take a closer look at how a connection is established. Here is what you can learn from it:

  • What the backlog, the half-connection queue, and the full connection queue are
  • How the Linux kernel calculates the sizes of the half-connection queue and the full connection queue
  • Why changing the system's somaxconn and tcp_max_syn_backlog alone does not affect the final queue size
  • How to use a SystemTap probe to obtain the half-connection and full connection queue information of the current system
  • How the ss tool in the iproute2 package works
  • How to quickly simulate half-connection queue overflow and full connection queue overflow

Note: The code and tests in this article were run on kernel version 3.10.0-514.16.1.el7.x86_64.

Basic concepts of the half-connection queue and the full connection queue

To understand the backlog, we need to understand what happens behind the listen and accept functions. The backlog argument belongs to the listen function, which is declared as follows:

int listen(int sockfd, int backlog);

When the server calls listen, the socket's TCP state changes from CLOSED to LISTEN, and the kernel creates two queues:

  • The incomplete connection queue, also known as the SYN queue (half-connection queue)
  • The completed connection queue, also known as the accept queue (full connection queue)

As shown in the figure below.

Let’s go into more detail about the two queues.

Half-connection queue (SYN Queue)

When a client sends a SYN to the server, the server replies with an ACK plus its own SYN. The server's TCP state then changes from LISTEN to SYN_RCVD (SYN Received), and the connection information is put into the half-connection queue, also called the SYN queue, which stores inbound SYN packets.

After replying with SYN+ACK, the server waits for the client's ACK and starts a timer. If no ACK arrives before the timeout, the server retransmits the SYN+ACK. The number of retransmissions is determined by tcp_synack_retries, which is 5 on CentOS.

Once the server receives the ACK from the client, it tries to move the connection to the other queue, the accept queue.

Calculating the size of the half-connection queue

Here SystemTap is used to insert a kernel probe that prints the current length and the total size of the SYN queue whenever a SYN packet is received.

A TCP socket in the LISTEN state handles an incoming SYN packet along the following call path:

tcp_v4_rcv
  ->tcp_v4_do_rcv
    -> tcp_v4_conn_request

Attach a probe to the tcp_v4_conn_request function, as shown below.

probe kernel.function("tcp_v4_conn_request") {
    tcphdr = __get_skb_tcphdr($skb);
    dport = __tcp_skb_dport(tcphdr);
    if (dport == 9090)
    {
        printf("reach here\n");
        // Current size of the SYN queue
        syn_qlen = @cast($sk, "struct inet_connection_sock")->icsk_accept_queue->listen_opt->qlen;
        // log2 of the total length of the SYN queue
        max_syn_qlen_log = @cast($sk, "struct inet_connection_sock")->icsk_accept_queue->listen_opt->max_qlen_log;
        // Total length of the SYN queue, 2^n
        max_syn_qlen = (1 << max_syn_qlen_log);
        printf("syn queue: syn_qlen=%d, max_syn_qlen_log=%d, max_syn_qlen=%d\n",
         syn_qlen, max_syn_qlen_log, max_syn_qlen);
        // max_acc_qlen = $sk->sk_max_ack_backlog;
        // printf("accept queue length limit: %d\n", max_acc_qlen)
        print_backtrace();
    }
}

Run the script above with stap:

sudo stap -g syn_backlog.c

This way, each time a SYN packet is received, the current number of connections in the SYN queue and the queue's total size are printed.

Again using the earlier echo program as an example, the backlog passed to listen is set to 10, as shown below.

int server_fd = //...

listen(server_fd, 10 /* backlog */);

Start the echo server listening on port 9090, then connect to it from another machine with the nc command.

nc 10.211.55.10 9090

The output shows that the current SYN queue size is 0 and the maximum queue length is 2^4 = 16.

So the actual SYN queue size is not equal to net.ipv4.tcp_max_syn_backlog, which defaults to 128. Instead, the 10 passed in by the user is rounded up to the next power of 2, which is 16.

The size of the half-connection queue depends on three values:

  • The backlog passed to listen by the user-space program
  • The system variable net.ipv4.tcp_max_syn_backlog, which defaults to 128
  • The system variable net.core.somaxconn, which defaults to 128

The calculation is shown in the source code below. A call to listen first enters the following code.

SYSCALL_DEFINE2(listen, int, fd, int, backlog)
{
	// sysctl_somaxconn is the system variable net.core.somaxconn
	int somaxconn = sysctl_somaxconn;
	if ((unsigned int)backlog > somaxconn)
		backlog = somaxconn;
	sock->ops->listen(sock, backlog);
}

The SYSCALL_DEFINE2 code tells us that if the user passes a backlog greater than the system variable net.core.somaxconn, the user's value does not take effect. The system variable, which defaults to 128, is used instead.

This backlog value is then passed down through inet_listen() -> inet_csk_listen_start() -> reqsk_queue_alloc(). The final calculation is done in reqsk_queue_alloc. The simplified code looks like this.

int reqsk_queue_alloc(struct request_sock_queue *queue,
		      unsigned int nr_table_entries)
{
    nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
    nr_table_entries = max_t(u32, nr_table_entries, 8);
    nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
    	
    for (lopt->max_qlen_log = 3;
         (1 << lopt->max_qlen_log) < nr_table_entries;
         lopt->max_qlen_log++);
}

In this code, nr_table_entries is the backlog value calculated earlier, and sysctl_max_syn_backlog is the value of net.ipv4.tcp_max_syn_backlog. The calculation logic is as follows:

  • Take the smaller of nr_table_entries and sysctl_max_syn_backlog and assign it to nr_table_entries
  • Take the larger of nr_table_entries and 8 and assign it to nr_table_entries
  • Round nr_table_entries + 1 up to the nearest power of 2
  • Use the for loop to find max_qlen_log, the base-2 logarithm of nr_table_entries (starting from 3)

As a practical example, take listen(50): SYSCALL_DEFINE2 computes the backlog as min(50, somaxconn), which is 50, and then passes it on to reqsk_queue_alloc.

// min(50, 128) = 50
nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
// max(50, 8) = 50
nr_table_entries = max_t(u32, nr_table_entries, 8);
// roundup_pow_of_two(51) = 64
nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);

// max_qlen_log starts at 3, so the minimum queue length is 2^3 = 8
for (lopt->max_qlen_log = 3;
     (1 << lopt->max_qlen_log) < nr_table_entries;
     lopt->max_qlen_log++);
// After the for loop, max_qlen_log = 6, so the queue length is 2^6 = 64

The table below lists the final half-connection queue size for several combinations of somaxconn, max_syn_backlog, and listen backlog.

somaxconn   max_syn_backlog   listen backlog   Half-connection queue size
128         128               5                16
128         128               10               16
128         128               50               64
128         128               128              256
128         128               1000             256
128         128               5000             256
1024        128               128              256
1024        1024              128              256
4096        4096              128              256
4096        4096              4096             8192

You can see:

  • If the system parameters are not modified, blindly increasing the listen backlog beyond somaxconn and max_syn_backlog has no effect on the final half-connection queue size.
  • Blindly increasing somaxconn and max_syn_backlog has no effect on the final half-connection queue size either, as long as the backlog passed to listen stays the same.
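
The calculation above can be condensed into a small user-space sketch. It is not kernel code: the function names syn_queue_size and roundup_pow2 are made up for illustration, and it assumes the 3.10-era logic quoted in this article (clamp to somaxconn in listen, then the min, max, and round-up steps in reqsk_queue_alloc).

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= n, in the spirit of the kernel's roundup_pow_of_two().
 * Hypothetical helper for illustration only. */
uint32_t roundup_pow2(uint32_t n)
{
    uint32_t p = 1;

    /* Guard against overflow of the shift below. */
    assert(n >= 1 && n <= (UINT32_C(1) << 31));
    while (p < n)
        p <<= 1;
    return p;
}

/* Half-connection queue size for a given listen backlog and the two sysctls,
 * following the steps described in the text. */
uint32_t syn_queue_size(uint32_t backlog, uint32_t somaxconn,
                        uint32_t max_syn_backlog)
{
    uint32_t n = backlog;

    if (n > somaxconn)          /* listen(): clamp to net.core.somaxconn */
        n = somaxconn;
    if (n > max_syn_backlog)    /* min_t(u32, n, sysctl_max_syn_backlog) */
        n = max_syn_backlog;
    if (n < 8)                  /* max_t(u32, n, 8) */
        n = 8;
    return roundup_pow2(n + 1); /* roundup_pow_of_two(n + 1) */
}
```

For example, syn_queue_size(50, 128, 128) returns 64 and syn_queue_size(4096, 4096, 4096) returns 8192, matching the rows in the table.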

Simulating a full half-connection queue

In this example, somaxconn=128, tcp_max_syn_backlog=128, and the listen backlog is 50. The idea is that, in the second step of the three-way handshake, the client uses iptables to drop the SYN+ACK it receives from the server, so the client never sends the final ACK. In this experiment, the server is 10.211.55.10 and the client is 10.211.55.20. Add a rule with iptables on the client, as shown below.

sudo iptables --append INPUT --match tcp --protocol tcp --src 10.211.55.10 --sport 9090 --tcp-flags SYN SYN --jump DROP

This rule drops packets with the SYN flag set (here, the server's SYN+ACK) that come from IP address 10.211.55.10 and source port 9090, as shown in the following figure.

Now initiate the connections in your favorite language. Go is used here, and the code is as follows:

package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	for i := 0; i < 2000; i++ {
		go connect()
	}
	time.Sleep(time.Minute * 10)
}

func connect() {
	_, err := net.Dial("tcp4", "10.211.55.10:9090")
	if err != nil {
		fmt.Println(err)
	}
}

Run the go program and use netstat on the server to check the current connection status of port 9090, as shown below.

netstat -lnpa | grep :9090  | awk '{print $6}' | sort | uniq -c | sort -rn
     64 SYN_RECV
      1 LISTEN

You can see that the number of connections in the SYN_RECV state starts at 0 and then stops at 64, which is the size of the half-connection queue.

Let's look at the full connection queue next.

Full connection queue (Accept Queue)

The full connection queue holds all connections on the server that have completed the three-way handshake but have not yet been taken away by the application calling accept. These sockets are in the ESTABLISHED state. Each call to accept() removes the connection at the head of the queue. If the queue is empty, accept() usually blocks. The full connection queue is also known as the accept queue.

You can think of this process as a producer-consumer model. The kernel is the producer: it performs the three-way handshake and puts each completed connection into the queue. Our application is the consumer: it takes connections from the queue for further processing. As in any producer-consumer model, when production is too fast and consumption is too slow, a backlog builds up.

The second parameter of listen, backlog, sets the size of the full connection queue, but the queue size is not always equal to the backlog value. It is also limited by somaxconn, which is explained in more detail later.

int listen(int sockfd, int backlog);

If the full connection queue is full, the kernel drops the ACK sent by the client (the application layer considers the connection not yet fully established).

Let's simulate a full connection queue. Since only accept removes connections from the full connection queue, if we just listen and never call accept, the queue fills up quickly.

To stay close to the underlying system calls, this is implemented in C. Create a new main.c file:

#include <stdio.h>
#include <sys/socket.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(void) {
    struct sockaddr_in serv_addr;
    int listen_fd = 0;
    if ((listen_fd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        exit(1);
    }
    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    // Port 9090, the port used by the Go client and the netstat commands in this article
    serv_addr.sin_port = htons(9090);

    if (bind(listen_fd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) == -1) {
        exit(1);
    }

    // Set backlog to 50
    if (listen(listen_fd, 50) == -1) {
        exit(1);
    }
    // Never call accept(), so completed connections pile up in the accept queue
    sleep(100000000);
    return 0;
}

Compile and run it with gcc main.c; ./a.out, initiate the connections with the earlier Go program, and check the TCP connection states with netstat on the server:

netstat -lnpa | grep :9090  | awk '{print $6}' | sort | uniq -c | sort -rn
     51 ESTABLISHED
     31 SYN_RECV
      1 LISTEN

Although many requests were sent, only 51 of them were actually in the ESTABLISHED state, and a large number were in the SYN_RECV state.

Also note that the backlog is equal to 50, but there are actually 51 connections in the ESTABLISHED state, as discussed later.

On the client, netstat shows these connections in the ESTABLISHED state:

Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.211.55.20:37732      10.211.55.10:9090       ESTABLISHED 23618/./connect
tcp        0      0 10.211.55.20:37824      10.211.55.10:9090       ESTABLISHED 23618/./connect
tcp        0      0 10.211.55.20:37740      10.211.55.10:9090       ESTABLISHED 23618/./connect
...

SystemTap lets you observe the current full connection queue in real time, using the probe code below.

probe kernel.function("tcp_v4_conn_request") {
    tcphdr = __get_skb_tcphdr($skb);
    dport = __tcp_skb_dport(tcphdr);
    if (dport == 9090)
    {
        printf("reach here\n");
        // Current size of the SYN queue
        syn_qlen = @cast($sk, "struct inet_connection_sock")->icsk_accept_queue->listen_opt->qlen;
        // log2 of the total length of the SYN queue
        max_syn_qlen_log = @cast($sk, "struct inet_connection_sock")->icsk_accept_queue->listen_opt->max_qlen_log;
        // Total length of the SYN queue, 2^n
        max_syn_qlen = (1 << max_syn_qlen_log);
        printf("syn queue: syn_qlen=%d, max_syn_qlen_log=%d, max_syn_qlen=%d\n",
         syn_qlen, max_syn_qlen_log, max_syn_qlen);
        // Current and maximum length of the accept queue
        ack_backlog = $sk->sk_ack_backlog;
        max_ack_backlog = $sk->sk_max_ack_backlog;
        printf("accept queue length, max: %d, current: %d\n", max_ack_backlog, ack_backlog);
    }
}

Run the probe with stap and repeat the test above to see the kernel probe's output.

...
syn queue: syn_qlen=45, max_syn_qlen_log=6, max_syn_qlen=64
accept queue length, max: 50, current: 14
...
syn queue: syn_qlen=2, max_syn_qlen_log=6, max_syn_qlen=64
accept queue length, max: 50, current: 51

Here we can also see the size of the full connection queue changing, which confirms what was said earlier.

Capturing packets on the server side gives the following result:

Here client 10.211.55.20 is A and server 10.211.55.10 is B.

  • 1: Client A sends a SYN to port 9090 of server B to start the three-way handshake
  • 2: Server B immediately replies with SYN+ACK, and B's socket enters the SYN_RCVD state
  • 3: Client A receives the SYN+ACK from server B and sends the ACK for the last step of the three-way handshake. Client A is now in the ESTABLISHED state. However, because the full connection queue of server B is full, B drops this ACK, so on B's side the connection is not established
  • 4: Because server B has not received the ACK, it assumes that the SYN+ACK from step 2 was lost. It therefore retransmits the SYN+ACK, expecting the client to reply with an ACK again
  • 5: Client A replies with an ACK immediately after receiving the SYN+ACK from server B
  • 6-13: This ACK is also dropped by server B. Server B still believes it has not received the ACK and keeps retransmitting with exponential backoff (1s, 2s, 4s, 8s, 16s). After five SYN+ACK retransmissions over 31s in total, server B gives up on the connection, and after a period of time the system reclaims it.

The number of SYN+ACK retries is determined by the /proc/sys/net/ipv4/tcp_synack_retries file in the operating system. You can view the file using cat

cat /proc/sys/net/ipv4/tcp_synack_retries
5
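
The retransmission schedule in steps 6-13 can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name synack_last_retransmit_at is hypothetical, and it assumes an initial timeout of 1 second that doubles after every retransmission, as in the 1s, 2s, 4s, 8s, 16s sequence above.

```c
#include <assert.h>

/* Seconds between the original SYN+ACK and the last retransmission,
 * assuming the timeout starts at initial_timeout_s and doubles each time. */
unsigned int synack_last_retransmit_at(unsigned int retries,
                                       unsigned int initial_timeout_s)
{
    unsigned int elapsed = 0;
    unsigned int timeout = initial_timeout_s;
    unsigned int i;

    /* Keep the doubling well inside the range of unsigned int. */
    assert(retries <= 16);
    for (i = 0; i < retries; i++) {
        elapsed += timeout; /* wait for the current timeout, then retransmit */
        timeout *= 2;       /* exponential backoff */
    }
    return elapsed;
}
```

With retries = 5 (the tcp_synack_retries value shown above) and an initial timeout of 1 second, the function returns 1 + 2 + 4 + 8 + 16 = 31, the total quoted in the packet trace.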

The whole process is shown below:

Size of the full connection queue

The size of the full connection queue is the smaller of the backlog passed to listen and somaxconn.

The function that decides whether the full connection queue is full is sk_acceptq_is_full in /include/net/sock.h.

static inline bool sk_acceptq_is_full(const struct sock *sk)
{
	return sk->sk_ack_backlog > sk->sk_max_ack_backlog;
}

There is nothing inherently wrong with this comparison, but because sk_ack_backlog counts from 0 and the check uses > rather than >=, the real capacity of the full connection queue is backlog + 1. When you specify a backlog of 1, the queue can hold 2 connections. Unix Network Programming, Volume 1, Section 4.5 (p. 87) has a detailed comparison of the backlog value and the actual maximum number of queued connections on different operating systems.
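
The off-by-one can be reproduced in user space with a small simulation. This is a sketch, not kernel code: acceptq_is_full copies the comparison from sk_acceptq_is_full, while accept_queue_capacity is a hypothetical helper that keeps queuing connections until that check reports the queue as full.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Same comparison as sk_acceptq_is_full() quoted above. */
bool acceptq_is_full(unsigned int ack_backlog, unsigned int max_ack_backlog)
{
    return ack_backlog > max_ack_backlog;
}

/* Number of connections that get queued before the check reports "full". */
unsigned int accept_queue_capacity(unsigned int backlog, unsigned int somaxconn)
{
    /* The limit is the smaller of the listen backlog and somaxconn. */
    unsigned int max_ack_backlog = backlog < somaxconn ? backlog : somaxconn;
    unsigned int queued = 0;

    /* Otherwise "queued > max" could never become true. */
    assert(max_ack_backlog < UINT_MAX);
    while (!acceptq_is_full(queued, max_ack_backlog))
        queued++; /* one more completed handshake enters the queue */
    return queued;
}
```

For the test server above, accept_queue_capacity(50, 128) returns 51, which matches the 51 ESTABLISHED connections seen in the netstat output.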

The ss command

You can use the ss command to view the size of the full connection queue and the number of connections waiting to be accepted. Run ss -lnt; the output below, for example, shows the accept queue when it is full.

ss -lnt | grep :9090
State      Recv-Q Send-Q Local Address:Port               Peer Address:Port
LISTEN     51     50           *:9090                     *:*

For sockets in the LISTEN state, Recv-Q is the number of connections currently in the accept queue, and Send-Q is the total size of the full connection queue (that is, the accept queue).

Let's look at how the ss command is implemented underneath. Its source code is in the iproute2 project. It cleverly uses netlink to communicate with the tcp_diag module of the TCP stack to obtain socket details. tcp_diag is a statistics and analysis module that exposes a lot of useful kernel information. The Recv-Q and Send-Q values in the ss output come from tcp_diag: they are the idiag_rqueue and idiag_wqueue fields of the inet_diag_msg structure. The relevant part of the tcp_diag source code is shown below.

static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
			      void *_info)
{
	struct tcp_info *info = _info;

	if (inet_sk_state_load(sk) == TCP_LISTEN) {
		// Recv-Q
		r->idiag_rqueue = READ_ONCE(sk->sk_ack_backlog);
		// Send-Q
		r->idiag_wqueue = READ_ONCE(sk->sk_max_ack_backlog);
	} else if (sk->sk_type == SOCK_STREAM) {
		const struct tcp_sock *tp = tcp_sk(sk);

		r->idiag_rqueue = max_t(int, READ_ONCE(tp->rcv_nxt) -
					     READ_ONCE(tp->copied_seq), 0);
		r->idiag_wqueue = READ_ONCE(tp->write_seq) - tp->snd_una;
	}
}

From the source above we can see:

  • For sockets in the LISTEN state, Recv-Q corresponds to sk_ack_backlog, the number of connections that have completed the three-way handshake and are waiting for the user process to accept them. Send-Q corresponds to sk_max_ack_backlog, the maximum number of connections the socket's full connection queue can hold
  • For non-LISTEN sockets, Recv-Q is the number of bytes in the receive queue, and Send-Q is the number of bytes in the send queue
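
As a rough illustration of how to read these two columns for a LISTEN socket, the sketch below parses one line of ss -lnt output and applies the same "greater than" check the kernel uses. The struct and function names are hypothetical, and the parser assumes the column layout shown in the sample output earlier (State, Recv-Q, Send-Q, ...).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Queue counters taken from one LISTEN line of `ss -lnt`. */
struct listen_queue {
    unsigned int recv_q; /* connections waiting in the accept queue */
    unsigned int send_q; /* configured accept queue limit */
};

/* Parses "LISTEN <Recv-Q> <Send-Q> ..." and returns false for any other line. */
bool parse_ss_listen_line(const char *line, struct listen_queue *out)
{
    char state[32];

    assert(line != NULL && out != NULL);
    if (sscanf(line, "%31s %u %u", state, &out->recv_q, &out->send_q) != 3)
        return false;
    return strcmp(state, "LISTEN") == 0;
}

/* True when the queue has reached the point where further ACKs are dropped. */
bool accept_queue_is_full(const struct listen_queue *q)
{
    return q->recv_q > q->send_q;
}
```

Fed the sample line with Recv-Q 51 and Send-Q 50, the parser yields those two numbers and accept_queue_is_full reports true.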

Other topics

What backlog size is appropriate

After all of the above, how large a backlog is reasonable for an application?

The answer is "it depends": you need to adjust it for your business scenario.

  • If your service needs to accept connections at a very high rate, or you are running stress tests, it is necessary to increase this value
  • If the service itself performs poorly and accept is slow to take connections off the queue, making the backlog too large does not help and only increases the likelihood of failed connections

For reference, the default backlog is 511 in Nginx and Redis, 128 in Linux, and 50 in Java.

The tcp_abort_on_overflow parameter

By default, when the full connection queue is full, the server ignores the client's ACK and later retransmits the SYN+ACK. This behavior can be changed through /proc/sys/net/ipv4/tcp_abort_on_overflow.

  • tcp_abort_on_overflow = 0: if the full connection queue is full at the last step of the three-way handshake, the server drops the client's ACK and retransmits the SYN+ACK.
  • tcp_abort_on_overflow = 1: if the full connection queue is full, the server sends an RST directly to the client.

However, replying with an RST causes another problem: the client cannot tell whether the RST means "no process is listening on this port" or "the port is being listened on but its queue is full".
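
A program can check this setting at run time by reading the proc file. The sketch below is a minimal illustration under stated assumptions: read_sysctl_int and overflow_action_for are hypothetical helper names, and the mapping simply restates the two cases listed above.

```c
#include <assert.h>
#include <stdio.h>

enum overflow_action {
    OVERFLOW_DROP_ACK, /* 0: drop the ACK, retransmit SYN+ACK later */
    OVERFLOW_SEND_RST  /* 1: answer the client with an RST */
};

/* Reads one integer from a /proc/sys style file; returns -1 if it cannot be read. */
int read_sysctl_int(const char *path)
{
    FILE *f;
    int value;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Maps the tcp_abort_on_overflow value to the behavior described above. */
enum overflow_action overflow_action_for(int tcp_abort_on_overflow)
{
    return tcp_abort_on_overflow ? OVERFLOW_SEND_RST : OVERFLOW_DROP_ACK;
}
```

On a Linux host, the intended use is to call read_sysctl_int("/proc/sys/net/ipv4/tcp_abort_on_overflow") and pass a non-negative result to overflow_action_for.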

Summary

This article looked at the half-connection queue and the full connection queue from the perspective of the backlog parameter. A quick review:

  • Half-connection queue: after the server receives a SYN from the client and replies with SYN+ACK, but before it receives the client's ACK, it keeps the connection information in the half-connection queue. The half-connection queue is also known as the SYN queue.
  • Full connection queue: holds connections for which the server has completed the three-way handshake but which have not yet been taken by accept. The full connection queue is also known as the accept queue.
  • The size of the half-connection queue depends on the backlog passed to listen, net.core.somaxconn, and net.ipv4.tcp_max_syn_backlog
  • The size of the full connection queue is the smaller of the backlog passed to listen and net.core.somaxconn

Not all of the conclusions above are necessarily correct. This has always been my view: the conclusion is not what matters, the research process is. I am mostly trying to show you how to investigate, along with the tools and techniques involved, and if you can learn from them, that's great.

Don’t trust conclusions drawn from online articles, including this one. The proof is in the experiment. Do it yourself.
