Since I last studied TCP's congestion control algorithm, I have wanted to gain a deeper understanding of some of TCP/IP's underlying principles. I searched a lot of material online and came across Tao Hui's column on high-performance network programming, from which I benefited greatly. Today I'll summarize what I learned and add some thinking of my own.

I know the Java language well, but my experience with Java network programming is limited to using the Netty framework. Netty source contributor Norman Maurer has a piece of advice for Netty network development: "Never block the event loop, reduce context-switching." That is, try not to block the IO threads, and try to avoid switching threads. Today we will focus only on the first half of that sentence. Those interested in the second half can have a look at the Ant communication framework practice.

Why shouldn't we block the IO thread that reads from the network? To answer that, let's start with the classic C10K problem and understand how a server can support 10,000 concurrent requests. The root of C10K lies in the network IO model. Traditionally, Linux network IO is synchronous and blocking, with each request assigned its own process or thread. To support 10,000 concurrent requests, would we need 10,000 threads? Scheduling, context switching, and even the memory those threads consume would all become bottlenecks. The common solution to C10K is I/O multiplexing, which is what Netty uses.

Netty has a thread group (mainReactor) responsible for listening on the server socket and establishing connections, a thread group (subReactor) responsible for reading and writing on established connections, and a worker thread group (ThreadPool) that handles business logic.

All three are independent, which brings many advantages. First, a dedicated thread group listens for and handles connection establishment, which helps prevent the TCP half-connection queue (the SYN queue) and the full connection queue (the accept queue) from filling up. Second, separating the IO thread group from the worker threads lets network I/O and business logic run in parallel, which keeps the IO threads from blocking and prevents the TCP receive queues from filling up. Of course, if the business logic is lightweight, that is, the workload is IO-intensive rather than compute-intensive, you can run the business logic on the I/O thread to avoid thread switching, which is the second half of Norman Maurer's advice.

Why does TCP/IP have so many queues? Today we will take a closer look at several of them: the SYN queue and accept queue used to establish connections, and the receive, out_of_order, prequeue, and backlog queues used to receive packets.

The queue for establishing a connection

As shown in the figure above, there are two queues: the SYN queue (half-connection queue) and the accept queue (full-connection queue). During the three-way handshake, after receiving a SYN packet from the client, the server puts the related information into the half-connection queue and replies with a SYN+ACK packet. In the third step, the server receives the client's ACK. If the full connection queue is not full at this point, the server moves the related information from the half-connection queue into the full connection queue; otherwise it acts according to the value of tcp_abort_on_overflow, either discarding the packet directly (so the client retries after a while) or resetting the connection.

Queue for receiving packets

Compared with establishing a connection, TCP's logic for receiving packets is more complex, involving more queues and configuration parameters.

The application reading TCP packets and the operating system receiving TCP packets on the application's behalf are two independent processes. Both manipulate the same socket instance, using lock contention to determine who is in control at any given moment, which gives rise to many different scenarios. For example, what should happen when the operating system is receiving packets from the NIC at the very moment the application is reading? What happens if the application never calls read or recv to consume the packets the operating system has received?

Next, we will introduce three scenarios of TCP receiving packets in three diagrams, and introduce four queues related to receiving packets.

Receiving packets Scenario 1

The figure above is a schematic diagram of TCP packet-receiving scenario 1: the operating system first receives packets and stores them in the socket's receive queue, and then the user process calls recv to read them.

1) When the NIC receives a packet and determines that it carries TCP, the kernel, after layer upon layer of calls, eventually invokes the tcp_v4_rcv method. Because S1 is the next packet TCP expects, tcp_v4_rcv adds it directly to the receive queue. The receive queue holds received TCP packets, with their TCP headers removed, in sequence order, so the user process can read them sequentially. Because the socket is not in the context of a user process (that is, no user process is reading the socket), and the expected sequence number is S1, the arriving S1 packet goes straight into the receive queue.

2) The S3 packet arrives. Since the sequence number of the next packet TCP expects is S2, S3 is added to the out_of_order queue, where all out-of-order packets are stored.

3) The S2 packet that TCP expects arrives and goes directly into the receive queue. Since the out_of_order queue is not empty at this point, it needs to be checked.

4) The out_of_order queue is checked every time a packet is inserted into the receive queue. Because the expected sequence number becomes S3 after the S2 packet is received, the S3 packet in the out_of_order queue is moved to the receive queue.

5) The user process starts reading the socket: it allocates a chunk of memory and then calls read or recv. Sockets have a set of configuration properties with default values; for example, a socket is blocking by default, and its SO_RCVLOWAT property defaults to 1. Methods like recv also take a flags parameter, which can be set to MSG_WAITALL, MSG_PEEK, MSG_TRUNC, and so on; here we assume the most common value, 0. The process calls the recv method.

6) recv calls into the kernel's tcp_recvmsg method.

7) The tcp_recvmsg method locks the socket first. A socket can be used by multiple threads, and the operating system uses it too, so concurrency must be handled: to manipulate the socket, you must acquire its lock.

8) The flags passed in step 5 do not include MSG_PEEK, so the first packet is removed from the receive queue and its kernel memory is released after copying. Conversely, the MSG_PEEK flag would leave the packet in the receive queue; MSG_PEEK is mainly used when multiple processes read the same socket.

9) Copy the second packet. Before copying, the kernel checks whether the remaining user-mode buffer space is large enough for the current packet; if not, it returns the number of bytes copied so far.

10) Copy the third packet.

11) The receive queue is now empty, so the SO_RCVLOWAT threshold is checked. If the number of bytes copied is smaller than this threshold, the process sleeps to wait for more packets. SO_RCVLOWAT defaults to 1, which means recv can return as soon as one packet has been read.

12) Check the backlog queue. The backlog queue is where the kernel puts packets that arrive while a user process holds the socket lock (for example, while it is copying data). If there is data in the backlog queue at this point, it is processed along the way. Here the backlog queue has no data, so the lock is released in preparation for returning to user mode.

13) The user process's code resumes executing, and methods such as recv return the number of bytes copied from the kernel.

Receiving packets Scenario 2

The second figure shows the second scenario, which involves the prequeue. The user process calls recv when there are no packets in the socket's queues; the socket is blocking, so the process falls asleep. The operating system then receives packets, and this is when the prequeue takes effect. In this scenario, tcp_low_latency has its default value 0, the socket's SO_RCVLOWAT has its default value 1, and the socket is blocking, as shown in the figure below.

Steps 1, 2, and 3 are handled as before, so let's go straight to step 4.

4) Because the receive, prequeue, and backlog queues are all empty, no bytes are copied to user memory. The socket configuration requires at least SO_RCVLOWAT (here, 1) bytes to be copied, so the blocking socket enters its waiting flow; the maximum waiting time is specified by SO_RCVTIMEO. Before sleeping, the process releases the socket lock, so the packet arriving in step 5 does not have to go into the backlog queue.

5) The S1 packet arrives and is added to the prequeue.

6) After the packet is inserted into the prequeue, the process sleeping on the socket is woken up.

7) After the user process wakes up, it reacquires the socket lock; any packets received from then on can only enter the backlog queue.

8) The process first checks the receive queue, which of course is still empty; it then checks the prequeue and finds packet S1, whose sequence number it is waiting for. The packet is copied from the prequeue to user memory, and its kernel memory is released.

9) One byte of the packet has now been copied to user memory. Check whether the copied length reaches the minimum threshold, that is, the smaller of len and SO_RCVLOWAT.

10) Because SO_RCVLOWAT has its default value 1, the number of bytes copied exceeds the minimum threshold, and we are about to return to user mode. Before doing so, check whether there is any data in the backlog queue; there is none, so release the socket lock.

11) Return the number of bytes copied to the user.

Receiving packets Scenario 3

In the third scenario, the system parameter tcp_low_latency is 1 and SO_RCVLOWAT is explicitly set on the socket. The server receives packet S1, but its length is smaller than SO_RCVLOWAT. The user process calls recv to read; although it reads part of the data, the minimum threshold is not reached, so the process falls asleep. Meanwhile, the out-of-order S3 packet that arrived before the process slept is added directly to the backlog queue. Then packet S2 arrives; since the prequeue is not used (because tcp_low_latency is set) and S2's starting sequence number is the next one to be copied, it is copied directly to user memory. The total number of bytes copied now satisfies SO_RCVLOWAT! Finally, the S3 packet is also copied to user memory before recv returns.

1) The received packet S1 carries the sequence number TCP expects next, so it is added directly to the in-order receive queue.

2) tcp_low_latency is set to 1, indicating that the server wants programs to receive TCP packets promptly. The user calls recv on a blocking socket whose SO_RCVLOWAT is larger than the size of the first packet, and allocates a buffer of length len that is large enough.

3) recv calls the tcp_recvmsg method to do the receiving work, which first locks the socket.

4) Prepare to process the packets in each of the kernel's receive queues.

5) The packet in the receive queue is smaller than len, so it can be copied directly to user memory.

6) While step 5 is in progress, the kernel receives packet S3. The socket is locked at this time, so the packet goes directly into the backlog queue. Note that this packet is out of order.

7) In step 5, packet S1 was copied to user memory, but its size is smaller than SO_RCVLOWAT. Because the socket is blocking, the user process goes to sleep, processing the backlog queue first: the out-of-order S3 packet is moved into the out_of_order queue.

8) The process sleeps until it times out or the receive queue becomes non-empty.

9) The kernel receives packet S2. Note that because tcp_low_latency is enabled, the packet will not go through the prequeue.

10) Since S2 is the packet being waited for and a user process is asleep waiting for it, S2 is copied directly to user memory.

11) Whenever an in-order packet is processed, whether it goes into the receive queue or directly to user memory, the out_of_order queue is checked to see whether any packet can now be processed. S3 is copied to user memory.

12) The user process is woken up.

13) Check whether the number of copied bytes is greater than SO_RCVLOWAT and whether the backlog queue is empty. Both conditions are satisfied, so recv returns.

To summarize, the four queues play the following roles.

  • The receive queue is the real receive queue: after the operating system checks and processes received TCP packets, it saves the in-order ones to this queue.

  • Backlog is the "standby queue". When the socket is in a user process's context (that is, the user is making a system call such as recv on the socket), the operating system stores newly received packets in the backlog queue and returns immediately.

  • The prequeue is the "pre-receive queue". When the socket is not being used by a user process, that is, a user process has made a read or recv system call but is asleep, the operating system stores received packets in the prequeue and returns.

  • Out_of_order is the out-of-order queue; it stores out-of-order packets. If a packet the operating system receives is not the next one TCP expects, it is placed in the out_of_order queue for later processing.

Afterword.

If you found this article helpful, please give it a thumbs-up. Also, feel free to subscribe to my WeChat official account.

reference

  • http://www.voidcn.com/article/p-gzmjmmna-dn.html

  • https://blog.csdn.net/russell_tao/article/details/9950615

  • https://ylgrgyq.github.io/2017/08/01/linux-receive-packet-3/