1 Overview of the Linux network sending process

This section briefly traces, step by step, how data passed to send reaches the network adapter.

User data is first copied into kernel space and processed by the protocol stack, after which it is placed into the RingBuffer. The NIC driver then actually sends the data. When the send completes, the NIC notifies the CPU with a hard interrupt, and the kernel cleans up the RingBuffer and frees memory such as the buffered queue entries.

2 NIC startup preparations

Network cards in today's servers generally support multiple queues. Each queue is represented by a RingBuffer, so once multiple queues are enabled a network card has multiple RingBuffers.

Note: The RingBuffer lives in memory, and the kernel and the network card share it through an mmap-like mapping.

Later, at send time, the entries at the same position in the RingBuffer's two circular arrays (one used by the kernel, one by the NIC hardware) both point to the same SKB. In this way, both the kernel and the hardware can access the same data: the kernel writes data into the SKB, and the NIC hardware sends it.
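The shared-pointer idea can be illustrated with a small user-space sketch in Python. It is only a toy model: the names TxRing, kernel_ring, and hw_ring are hypothetical, and a real driver stores DMA descriptors in the hardware-facing array rather than object references.

```python
class TxRing:
    """Toy model of one send queue: two circular arrays of equal size."""

    def __init__(self, size):
        self.size = size
        self.kernel_ring = [None] * size  # kernel-side bookkeeping entries
        self.hw_ring = [None] * size      # entries the NIC would read
        self.next_to_use = 0

    def attach(self, skb):
        """Make slot i of both arrays refer to the same SKB object."""
        i = self.next_to_use
        self.kernel_ring[i] = skb
        self.hw_ring[i] = skb
        self.next_to_use = (i + 1) % self.size
        return i


ring = TxRing(size=4)
skb = bytearray(b"payload")
slot = ring.attach(skb)
skb[0:1] = b"P"  # the kernel writes; the hardware-side entry sees the change
```

Because both arrays hold a reference to the same object, nothing has to be copied between the kernel's view and the hardware's view.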

3 accept creates a new socket

Before sending data, we often need a socket that has already established a connection.

Take accept as an example. When accept returns, the kernel has created a new socket and placed it in the current process's list of open files, where it is used to communicate with the corresponding client.

Suppose the server process has established two connections with clients via accept. Let's take a quick look at the relationship between the two connections and the process.
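The two-connection scenario can be reproduced from user space. The following is a minimal sketch using Python's standard socket module over loopback; the variable names are illustrative only, and it shows the user-visible effect (one new descriptor per accept), not the kernel objects themselves.

```python
import socket

# Listening socket on an ephemeral loopback port (port 0 lets the OS choose).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(2)
addr = listener.getsockname()

# Two clients connect to the same listening socket.
clients = [socket.create_connection(addr) for _ in range(2)]

# Each accept() returns a brand-new connected socket with its own descriptor.
conn_a, peer_a = listener.accept()
conn_b, peer_b = listener.accept()
fds = {listener.fileno(), conn_a.fileno(), conn_b.fileno()}

for s in (conn_a, conn_b, *clients, listener):
    s.close()
```

The listening socket and the two accepted sockets occupy three separate entries in the process's open-file list.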

A more specific structure of the socket kernel object that represents a connection is shown below.

In it you can find the receive queue mentioned in the previous article on how Linux receives packets.

4 The actual send begins

4.1 Implementation of the send system call

The source code for the send system call is in net/socket.c. Internally, send is implemented through the sendto system call. The whole call chain, while not short, really does just two simple things.

The first is to find the actual socket in the kernel, the object that records the function addresses of the various protocol stacks. The second is to construct a struct msghdr object that holds everything the user passed in, such as the buffer address and the data length. The rest is left to the next level, the inet_sendmsg function in the protocol stack, whose address is found through the ops member of the socket kernel object. The general flow is shown in the figure.

The send and sendto functions we use in user mode are both implemented by the sendto system call. send is just a convenience wrapper that is easier to call.

In the sendto system call, the actual socket kernel object is first looked up from the socket handle number passed in by the user. Next, the buff, len, flags and other parameters supplied by the user are all packed into a struct msghdr object.
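The two steps described above can be modelled in a few lines of Python. This is a sketch of the control flow only, not kernel code: MsgHdr, Socket, fd_table, sys_send, and sys_sendto are hypothetical stand-ins, and the toy inet_sendmsg merely returns the length it was given.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class MsgHdr:
    """Stand-in for struct msghdr: bundles the caller's arguments."""
    buf: bytes
    length: int
    flags: int


@dataclass
class Socket:
    """Stand-in for the socket kernel object; sendmsg plays the role of ops."""
    sendmsg: Callable[..., int]


def inet_sendmsg(sock, msg):
    # The protocol stack would take over here; this toy reports the length.
    return msg.length


# Hypothetical per-process table mapping handle numbers to socket objects.
fd_table: Dict[int, Socket] = {3: Socket(sendmsg=inet_sendmsg)}


def sys_sendto(fd, buf, length, flags, addr=None, addr_len=0):
    sock = fd_table[fd]                        # step 1: find the socket
    msg = MsgHdr(buf[:length], length, flags)  # step 2: pack the arguments
    return sock.sendmsg(sock, msg)             # dispatch through the ops slot


def sys_send(fd, buf, length, flags):
    # send is sendto without a destination address.
    return sys_sendto(fd, buf, length, flags, None, 0)
```

Calling sys_send(3, b"hello", 5, 0) walks the same path as sys_sendto with an empty address, which mirrors the relationship between the two calls.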

4.2 Transport layer processing

1) Transport layer copy

After entering inet_sendmsg in the protocol stack, the kernel finds the protocol-specific send function on the socket. For TCP, this is tcp_sendmsg (also found through the socket kernel object).

In this function, the kernel allocates a kernel-mode SKB and copies the user's data into it. Note that sending does not necessarily start at this point; if the send conditions are not met, the call will most likely return directly.

Note: the data to be sent is copied from user-mode memory into the kernel-mode SKB, which involves the overhead of one or several memory copies.

At this point the user thread can return; when the data is actually sent is left for the kernel to decide.
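The copy-then-return behaviour can be sketched as follows. The function name tcp_sendmsg_copy and the fixed 1460-byte buffer capacity are assumptions made for illustration; the real tcp_sendmsg sizes and fills its SKBs with more elaborate logic.

```python
from collections import deque

send_queue = deque()  # stand-in for the socket's send queue of SKBs


def tcp_sendmsg_copy(user_buf, skb_capacity=1460):
    """Copy user data into freshly allocated kernel-side buffers.

    Returns the number of bytes queued. Nothing is transmitted here.
    """
    queued = 0
    for start in range(0, len(user_buf), skb_capacity):
        skb = bytes(user_buf[start:start + skb_capacity])  # the memory copy
        send_queue.append(skb)
        queued += len(skb)
    return queued


user_buf = bytearray(b"x" * 3000)
n = tcp_sendmsg_copy(user_buf)
user_buf[:] = b"y" * 3000  # the caller reuses its buffer after the call returns
```

Because the data was copied, overwriting the user buffer afterwards does not affect what is waiting in the send queue.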

2) Transport layer transmission

Assuming that the kernel send condition is now met, the kernel calls tcp_write_xmit.

So let's start directly with tcp_write_xmit, which handles the transport layer's congestion control and sliding window work. When the window allows, it sets the TCP header and passes the SKB down to the network layer for processing.
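A rough sketch of the window check, under simplifying assumptions: both windows are expressed in bytes, and the function below only borrows the name tcp_write_xmit for readability. The parameters in_flight, cwnd, rwnd, and the transmit callback are illustrative and do not match the kernel function's signature.

```python
def tcp_write_xmit(send_queue, in_flight, cwnd, rwnd, transmit):
    """Send queued segments while the smaller of the two windows allows.

    send_queue: list of bytes segments waiting to go out (mutated in place)
    in_flight:  bytes already sent but not yet acknowledged
    cwnd/rwnd:  congestion window and receiver window, in bytes
    transmit:   callback that hands one segment to the network layer
    Returns the updated in_flight count.
    """
    window = min(cwnd, rwnd)
    while send_queue and in_flight + len(send_queue[0]) <= window:
        segment = send_queue.pop(0)
        transmit(segment)
        in_flight += len(segment)
    return in_flight


sent = []
queue = [b"a" * 1000, b"b" * 1000, b"c" * 1000]
in_flight = tcp_write_xmit(queue, 0, cwnd=2500, rwnd=4000, transmit=sent.append)
```

With a 2500-byte congestion window, only the first two 1000-byte segments fit; the third stays queued until ACKs free up window space.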

Here are some details to note:

A new SKB is cloned, because the SKB handed down is released once the network layer has been called and the network card has finished sending. TCP supports retransmission of lost data, so the SKB cannot be deleted before an ACK is received from the peer. So every time the kernel hands data to the network card, what it actually passes is a copy of the SKB; the original is deleted only after the ACK arrives.
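The keep-the-original, send-a-clone pattern might look like this in a toy model. The Skb class, retransmit_queue, wire, and on_ack are hypothetical names; the clone here shares the payload object with the original, in the same spirit as the kernel's skb_clone, which duplicates the descriptor and shares the data buffer.

```python
class Skb:
    """Toy SKB: a small descriptor that refers to a payload buffer."""

    def __init__(self, seq, payload):
        self.seq = seq
        self.payload = payload

    def clone(self):
        # New descriptor, same payload object (no payload bytes are copied).
        return Skb(self.seq, self.payload)


retransmit_queue = {}  # seq -> original SKB kept until it is acknowledged
wire = []              # what the lower layers were handed


def transmit(skb):
    retransmit_queue[skb.seq] = skb
    wire.append(skb.clone())  # lower layers consume and free only the clone


def on_ack(seq):
    retransmit_queue.pop(seq, None)  # now the original can be released


original = Skb(seq=1, payload=b"data")
transmit(original)
wire.pop()                        # the driver finished with the clone
still_held = 1 in retransmit_queue
transmit(retransmit_queue[1])     # timeout: resend from the retained original
on_ack(1)
```

The original survives the first transmission, is available for the resend, and is dropped only when the ACK arrives.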

The structure of the SKB: the SKB actually has room for all the headers of the network protocols. Setting the TCP header just means pointing a pointer at the appropriate location in the SKB. When the IP header is set later, the pointer is simply moved again. This avoids frequent memory allocation and copying, so it is efficient.

Fields such as the ports are then written into the header, and the SKB is passed down to the network layer for processing.
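The pointer-moving trick for headers can be shown with a toy buffer that reserves headroom in front of the payload. The Skb class, its 64-byte headroom, and the placeholder header bytes are assumptions for illustration; only the idea of moving a data pointer backwards instead of copying the payload comes from the text.

```python
class Skb:
    """Toy SKB: one buffer with headroom, plus a data offset."""

    def __init__(self, payload, headroom=64):
        self.buf = bytearray(headroom) + bytearray(payload)
        self.data = headroom  # offset where the packet currently starts

    def push(self, header):
        """Prepend a header by moving data back; the payload is not moved."""
        if len(header) > self.data:
            raise ValueError("not enough headroom")
        self.data -= len(header)
        self.buf[self.data:self.data + len(header)] = header
        return self.data

    def packet(self):
        return bytes(self.buf[self.data:])


skb = Skb(b"hello")
skb.push(b"T" * 20)  # transport layer: 20-byte placeholder TCP header
skb.push(b"I" * 20)  # network layer: 20-byte placeholder IP header
```

Each layer writes its header into space that was reserved up front, so the buffer never has to be reallocated as the packet moves down the stack.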

4.3 Sending processing at the network layer

At the network layer, the work consists of route lookup, IP header setting, netfilter filtering, and SKB fragmentation (if the packet is larger than the MTU). Once these are done, the packet is handed to the neighbor subsystem at the next level down.

If the data is larger than the MTU, fragmentation is performed.
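As an illustration of the fragmentation step, here is a minimal sketch that splits an IP payload for a given MTU. The function name ip_fragment and its tuple layout are hypothetical, and a 20-byte IPv4 header without options is assumed.

```python
def ip_fragment(payload, mtu=1500, ip_header_len=20):
    """Split an IP payload into (offset, more_fragments, data) tuples.

    Every fragment except the last carries a multiple of 8 bytes, because
    the IPv4 fragment-offset field counts in 8-byte units.
    """
    max_data = (mtu - ip_header_len) // 8 * 8
    fragments = []
    for offset in range(0, len(payload), max_data):
        chunk = payload[offset:offset + max_data]
        more = offset + len(chunk) < len(payload)
        fragments.append((offset, more, chunk))
    return fragments


frags = ip_fragment(bytes(4000))
sizes = [len(data) for _, _, data in frags]
```

For a 4000-byte payload and a 1500-byte MTU this yields three fragments, each of which needs its own buffer, which is where the extra copying at this layer comes from.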

Tips:

1. Maximum Transmission Unit (MTU), usually 1500 bytes

2. If fragmentation occurs at the IP layer, all fragments must be retransmitted even if only one fragment is lost, because the IP layer itself has no timeout and retransmission mechanism; the higher layer (such as TCP) is responsible for timeout and retransmission. If one fragment of a TCP segment is lost, TCP resends the entire TCP segment after a timeout. The segment corresponds to one IP datagram (not one fragment), so a single fragment of the datagram cannot be retransmitted on its own.

3. UDP easily causes IP fragmentation, whereas TCP normally does not: TCP splits the data into segments itself so that each segment is no larger than the MSS. The MSS here is 1460 = 1500 - 40 (a 20-byte IP header plus a 20-byte TCP header).
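The MSS arithmetic from the tips can be checked with a short calculation. The helper names mss and tcp_segments are illustrative, and both headers are assumed to carry no options.

```python
import math

MTU = 1500
IP_HEADER = 20   # IPv4 header without options
TCP_HEADER = 20  # TCP header without options


def mss(mtu=MTU):
    """Largest TCP payload that fits in one unfragmented IP packet."""
    return mtu - IP_HEADER - TCP_HEADER


def tcp_segments(data_len, mtu=MTU):
    """Number of TCP segments needed so that no IP fragmentation occurs."""
    return math.ceil(data_len / mss(mtu))
```

Sending 4000 bytes over TCP therefore takes three segments of at most 1460 bytes each, and each segment fits in a single IP packet.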

4.4 Neighbor Subsystem

The neighbor subsystem sits between the network layer and the data link layer. Its function is to provide an encapsulation for the network layer, so that the network layer does not have to care about the address information of the lower layer, and the lower layer decides which MAC address to send to.

In the neighbor subsystem, the main work is to find or create the neighbor entry. Creating a neighbor entry may send an actual ARP request. The MAC header is then encapsulated and the send is passed down to the network device subsystem. The general flow is shown in the figure.
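The find-or-create step followed by MAC encapsulation can be sketched like this. Everything here is hypothetical scaffolding: neigh_table, arp_resolve (which fabricates a MAC address instead of sending a real ARP request), and neigh_output are illustrative names, not kernel interfaces.

```python
neigh_table = {}   # next-hop IP -> MAC address (hypothetical neighbor cache)
arp_requests = []  # next-hop IPs for which an ARP request was issued


def arp_resolve(ip):
    """Hypothetical resolver: records the request and fabricates a MAC."""
    arp_requests.append(ip)
    last_octet = int(ip.split(".")[-1])
    return bytes([0x02, 0, 0, 0, 0, last_octet])


def neigh_output(next_hop_ip, src_mac, ip_packet):
    """Find or create the neighbor entry, then prepend the MAC header."""
    dst_mac = neigh_table.get(next_hop_ip)
    if dst_mac is None:
        dst_mac = arp_resolve(next_hop_ip)  # entry creation may trigger ARP
        neigh_table[next_hop_ip] = dst_mac
    ethertype_ipv4 = b"\x08\x00"
    return dst_mac + src_mac + ethertype_ipv4 + ip_packet


src = bytes([0x02, 0, 0, 0, 0, 1])
frame1 = neigh_output("192.168.0.7", src, b"ip-packet")
frame2 = neigh_output("192.168.0.7", src, b"ip-packet")
```

Only the first packet to a given next hop pays for address resolution; later packets reuse the cached entry.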

4.5 Network Device Subsystem

Network cards have multiple send queues (especially today's network cards). The netdev_pick_tx function is called to select the queue to send on.
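One plausible way to picture queue selection is hashing a flow's 4-tuple onto a queue index. This is an assumption for illustration, not the selection logic of netdev_pick_tx itself, which the article does not detail; pick_tx_queue is a hypothetical name and CRC32 merely stands in for a flow hash.

```python
import zlib


def pick_tx_queue(src_ip, src_port, dst_ip, dst_port, num_queues):
    """Map a flow to a queue index by hashing its 4-tuple.

    Assumption: a plain CRC32 stands in for the kernel's flow hash.
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_queues


q1 = pick_tx_queue("10.0.0.1", 40000, "10.0.0.2", 80, num_queues=8)
q2 = pick_tx_queue("10.0.0.1", 40000, "10.0.0.2", 80, num_queues=8)
```

The useful property is that packets of the same flow always land on the same queue, which keeps them in order.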

4.6 Soft Interrupt Scheduling

4.7 Sending in the igb NIC driver

Here an element is taken from the RingBuffer of the network card's send queue and the SKB is attached to it.

4.8 Send-completion hard interrupt

The SKB is cleaned up, the DMA mappings are removed, and so on.
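Sections 4.7 and 4.8 together describe the life cycle of a RingBuffer element: the driver attaches an SKB, and the completion path later releases it. A toy model follows, assuming hypothetical names (TxRing, xmit, clean) and ignoring DMA entirely.

```python
class TxRing:
    """Toy send queue with a producer index and a cleanup index."""

    def __init__(self, size):
        self.slots = [None] * size
        self.next_to_use = 0    # where the driver attaches the next SKB
        self.next_to_clean = 0  # oldest slot not yet cleaned up

    def xmit(self, skb):
        """Driver path: take the next free element and attach the SKB."""
        if self.slots[self.next_to_use] is not None:
            return False  # ring full; the caller must retry later
        self.slots[self.next_to_use] = skb
        self.next_to_use = (self.next_to_use + 1) % len(self.slots)
        return True

    def clean(self, completed):
        """Completion path: release the given number of transmitted SKBs."""
        freed = []
        for _ in range(completed):
            freed.append(self.slots[self.next_to_clean])
            self.slots[self.next_to_clean] = None  # stands in for unmap + free
            self.next_to_clean = (self.next_to_clean + 1) % len(self.slots)
        return freed


ring = TxRing(size=2)
accepted = [ring.xmit(b"p1"), ring.xmit(b"p2"), ring.xmit(b"p3")]
freed = ring.clean(completed=1)
retry = ring.xmit(b"p3")
```

The third packet is refused while the ring is full and is accepted only after the completion path has freed a slot, which shows why cleanup after the hard interrupt matters for throughput.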

Summary

3. What memory copy operations are involved when sending network data?

By memory copy, we mean only the memory copy of the data to be sent.

The first copy happens after the kernel has allocated the SKB: the contents of the buffer passed in by the user are copied into the SKB. This copy can be expensive if the amount of data to send is large.

The second copy happens when data passes from the transport layer to the network layer: each SKB is cloned. The network layer and the components below it (driver, soft interrupt, and so on) delete this copy once the send is done. The transport layer keeps the original SKB and can resend it if no ACK arrives from the peer, which is how TCP achieves reliable transmission. Note that the kernel's skb_clone duplicates only the SKB descriptor; the payload data it points to is shared rather than copied.

The third copy is not always needed; it happens only when the SKB at the IP layer is larger than the MTU. In that case additional SKBs are allocated and the original SKB is copied into several smaller ones.

Reference

Mp.weixin.qq.com/s/wThfD9th9…