TCP/IP reference model and protocol stack

Before discussing the TCP/IP reference model, note that the OSI (Open Systems Interconnection) reference model, originally defined by ISO, divides computer networks into seven layers. The TCP/IP reference model is a practical model based on OSI, simplifying its seven layers to five. Interestingly, a model is usually established first and protocols are formulated afterwards, but TCP/IP went the other way: the protocols came first, and the TCP/IP model was then established around the existing protocols with reference to OSI. Here is how the layers correspond:

Why are TCP and IP usually referred to together as TCP/IP, and rarely mentioned separately?

At the lowest layer, the Ethernet protocol defines the basic packet (frame) format, but Ethernet alone cannot enable communication across multiple LANs. The IP protocol solves this by defining its own addressing scheme (IP addresses) and providing routing: packets are forwarded toward their destination address, enabling communication between LANs. However, IP is only an addressing protocol; it is not responsible for the reliability or integrity of packets. If packets are lost or arrive out of order during transmission, IP cannot detect or recover from it. That is where TCP comes in. So TCP and IP are two separate protocols, but they are used together in most scenarios to ensure reliable transmission of packets.

TCP packet

The maximum Ethernet frame size is 1522 bytes, of which 1500 bytes are the payload (which can be understood as the actual information to be transmitted) and the remaining 22 bytes are header and trailer information.

IP packets: inside the Ethernet payload, an IP packet carries a minimum of 20 bytes of header information, leaving a maximum of 1480 bytes of payload.

TCP packets: inside the IP payload, a TCP packet likewise has a minimum 20-byte header, leaving a maximum of 1460 bytes of payload. That 1460-byte TCP payload may be further divided into application-layer header information and user data, depending on the application-layer protocol.

So in HTTP/1.1, a 1500-byte message requires two TCP packets, because it exceeds the 1460-byte maximum TCP payload.
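The nesting above can be checked with a little arithmetic. A minimal sketch (the 1460-byte figure follows directly from 1500 - 20 - 20):

```python
import math

ETH_PAYLOAD = 1500   # maximum Ethernet payload
IP_HEADER = 20       # minimum IP header
TCP_HEADER = 20      # minimum TCP header
MSS = ETH_PAYLOAD - IP_HEADER - TCP_HEADER  # max TCP payload per segment

def segments_needed(message_bytes: int) -> int:
    """Number of TCP segments needed to carry a message of this size."""
    return math.ceil(message_bytes / MSS)

print(MSS)                    # 1460
print(segments_needed(1500))  # 2, since 1500 exceeds one MSS
```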

The perennial topic: three-way handshake and four-way wave

Three-way handshake

The three-way handshake ensures that the connection is duplex; that is, both the client and the server must be able to send and receive messages. The detailed process of the three-way handshake is as follows:

First handshake:

The client sends a SYN segment to the server, which the server receives successfully. The SYN contains the initial sequence number of the client.

Client perspective: its ability to send messages is OK.

Server perspective: its own ability to receive messages and the client’s ability to send messages are OK.

Second handshake:

The server returns a SYN+ACK segment to the client, which the client receives successfully. The SYN sent by the server contains the server's initial sequence number, and the ACK acknowledges the client's initial sequence number + 1.

Client perspective: since the server received and responded to the client's message, the server's ability to receive and send messages is OK, and the client's own ability to send and receive messages is OK. At this point, the client can confirm that the sending and receiving capabilities of both parties are normal.

Server perspective: its ability to send messages is OK.

Third handshake:

The client sends an ACK segment to the server, acknowledging the server's initial sequence number + 1.

Server perspective: since the client responded to the server's message, the client's ability to receive messages is OK. At this point, the server can also confirm that the sending and receiving capabilities of both parties are normal.
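In application code the handshake is invisible: the kernel performs all three steps inside connect() and accept(). A minimal loopback sketch (port 0 and the thread setup are incidental details, not part of the handshake):

```python
import socket
import threading

# Listening side: the kernel answers the SYN with SYN+ACK on our behalf.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

def accept_one():
    conn, _ = server.accept()   # returns once the handshake has completed
    conn.sendall(b"hello")
    conn.close()

t = threading.Thread(target=accept_one)
t.start()

# Connecting side: connect() sends the SYN and blocks until the final ACK.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))  # the three-way handshake happens here
data = client.recv(5)
print(data)                          # b'hello'
client.close()
t.join()
server.close()
```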

Four-way wave

The four-way wave ensures that both sides of the connection are closed completely. The specific process of the four waves is as follows:

First wave:

The client sends a FIN segment to the server. The FIN carries a sequence number (call it K) along with an ACK of the data received so far, and the client enters the FIN-WAIT-1 state. The server receives the segment.

Second wave:

The server responds with an ACK segment whose acknowledgment number is K + 1, indicating that it has received the client's FIN, and enters the CLOSE-WAIT state. After receiving this ACK, the client enters the FIN-WAIT-2 state, half-closed and waiting for the server.

Third wave:

The server sends its own FIN segment to the client. The FIN carries a sequence number (call it L) along with the latest ACK, indicating that it is the follow-up close to the client's earlier FIN. The server then enters the LAST-ACK state. The client receives the FIN.

Fourth wave:

After receiving the FIN from the server, the client sends an ACK segment with acknowledgment number L + 1, informing the server that it may also shut down. The client then waits 2MSL before actually closing; the server closes as soon as it receives this final ACK from the client.

In the four-way wave, why must the client wait 2MSL before closing after sending the last ACK segment?

MSL (Maximum Segment Lifetime): RFC 793 defines the MSL as 2 minutes. In practice, the MSL is commonly set to 30 seconds, 1 minute, or 2 minutes.

After sending the last ACK segment, the client enters the TIME-WAIT state and waits for 2MSL, for two reasons:

  1. Ensure that TCP full-duplex connections can be closed reliably.

    The client sends the last ACK segment. If it is lost, the server retransmits its FIN. If the client closed immediately after sending the last ACK, it could not receive that retransmitted FIN, and the server would remain stuck in the LAST-ACK state, unable to close.

  2. Ensure that stale duplicate segments of this connection disappear from the network.

    If the client closed immediately after sending the last ACK and a new TCP connection were then established on the same address and port, a lost last ACK would cause the server to retransmit its FIN, and the new connection could receive that stale retransmission, corrupting the new session.
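A practical consequence of TIME-WAIT is that rebinding the same address/port right after an active close can fail with EADDRINUSE. The usual workaround is SO_REUSEADDR, sketched here in Python (this only demonstrates setting the option, not the wave itself):

```python
import socket

# An actively-closing endpoint lingers in TIME-WAIT for 2MSL, so binding
# the same address/port again can fail. SO_REUSEADDR lets the bind
# proceed even while an old connection is still in TIME-WAIT.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("127.0.0.1", 0))
reuse = sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
print(reuse != 0)   # True: the option is set on this socket
sock.close()
```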

Why does a large number of CLOSE-WAIT connections appear when the `netstat -nat` command is used to check the server's TCP connection states?

Typically this means the number of concurrent client requests exceeds what the server can handle: the server has received the clients' FIN segments, but its application is too busy to close the sockets and send its own FIN in time, so the connections linger in the CLOSE-WAIT state.
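A common way to spot this is to count connections per state. Since live `netstat` output varies by machine, the sketch below runs the same counting pipeline over a hypothetical sample (the addresses and lines are made up for illustration):

```shell
# Hypothetical lines in the format of `netstat -nat` (the state is column 6):
sample='tcp 0 0 10.0.0.1:80 10.0.0.2:50001 CLOSE_WAIT
tcp 0 0 10.0.0.1:80 10.0.0.3:50002 CLOSE_WAIT
tcp 0 0 10.0.0.1:80 10.0.0.4:50003 ESTABLISHED'

# Count connections per TCP state; on a real server, replace the printf
# with: netstat -nat | tail -n +3
printf '%s\n' "$sample" | awk '{print $6}' | sort | uniq -c | sort -rn
```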

TCP flow control

What is flow control? Why is it needed?

The two communicating parties differ in network speed and processing capability. If the faster party sends messages too quickly, the slower party cannot respond in time; in severe cases, the receiver's service may crash. A buffer is therefore needed to control the flow: messages are placed into the buffer first, and when the buffer is full, the receiver tells the sender to stop sending.

The implementation principle of TCP traffic control

As the figure above shows, while the receive buffer has free window space, the two sides communicate normally. With every reply, the receiver also returns the remaining size of its buffer window to the sender; the sender uses this remaining window to decide whether to keep sending, while the receiving application drains messages from the buffer. When the sender learns that the remaining window is 0, it stops sending and periodically sends probes to check whether buffer space has freed up. Once free window space is detected, the two parties resume normal communication.
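The window loop described above can be sketched as a toy simulation (pure Python, no real sockets; the class and names are invented for illustration):

```python
from collections import deque

BUFFER_CAPACITY = 4   # receiver buffer holds at most 4 messages

class Receiver:
    def __init__(self):
        self.buffer = deque()

    def window(self):
        """Remaining window advertised back to the sender."""
        return BUFFER_CAPACITY - len(self.buffer)

    def accept(self, msg):
        assert self.window() > 0, "sender violated flow control"
        self.buffer.append(msg)

    def consume(self):
        """Receiving application drains one message from the buffer."""
        if self.buffer:
            self.buffer.popleft()

def send_all(messages, receiver):
    """Send respecting the advertised window; probe when it reaches 0."""
    pending = deque(messages)
    delivered = 0
    while pending:
        if receiver.window() > 0:   # window permits sending
            receiver.accept(pending.popleft())
            delivered += 1
        else:                       # window is 0: wait; meanwhile the
            receiver.consume()      # application drains the buffer
    return delivered

print(send_all(list(range(10)), Receiver()))   # 10: all delivered, no overflow
```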

TCP congestion control

What is congestion control? Why do we need congestion control?

Congestion control is different from flow control. Flow control resolves the mismatch between the rates of the sender and the receiver, whereas congestion control adjusts the network load to prevent network resources from being exhausted. When the receiver's network resources are busy, it fails to acknowledge the sender in time, the sender retransmits data, and the network becomes even more congested, producing an avalanche effect.

The implementation principle of TCP congestion control

Congestion control maintains a state variable called the congestion window (cwnd), which is the maximum number of segments a source data flow may send within one RTT.

To probe the current level of network congestion, the sender starts with a cwnd of 1 segment and grows cwnd exponentially through the slow-start algorithm (in the figure, slow start raises cwnd to 16 segments). To keep cwnd from growing too fast, a slow-start threshold (ssthresh, 16 in the figure) is also set. When cwnd reaches ssthresh, the congestion-avoidance algorithm takes over and cwnd grows by 1 per RTT, until the critical value at which the network congests (20 in the figure). This critical value depends on current network conditions and is dynamic. When it is reached, cwnd is reset to the initial value 1 and the whole congestion-control cycle restarts; a new ssthresh is calculated at this point, generally half of the cwnd at which congestion occurred in the previous cycle.

The above is the basic flow of congestion control, but the problem is obvious: when a cycle reaches the critical congestion value, resetting cwnd to 1 makes the transmission rate fall off a cliff, and such a large fluctuation is clearly unreasonable. Therefore, since the TCP Reno version, congestion control has been optimized with the fast-retransmit + fast-recovery algorithms.

At the beginning of a cycle, slow start and congestion avoidance grow cwnd as before, until congestion occurs and the sender receives three duplicate ACK acknowledgments. At this point, the sender does not wait for the retransmission timer to expire but executes the fast-retransmit algorithm: the new ssthresh is set to half of the current cwnd, cwnd is set to the new ssthresh + 3 (one for each duplicate ACK), the missing data is retransmitted immediately, and the connection enters the fast-recovery state.

If the sender does not receive three duplicate ACKs (for example, because congestion is so severe that packets are dropped outright), it still resets cwnd to the initial value 1 after the retransmission timer expires and re-enters slow start; fast retransmit + fast recovery are not triggered.

TCP Packet sticking/unpacking

The following is a simple TCP connection data flow model.

As shown in the figure, the client application layer writes a request packet (occupying one or more data blocks, depending on its size) into the TCP send buffer, which transmits the data over the TCP connection to the TCP receive buffer; the receive buffer then delivers the data to the server application layer.

Sticky packets

The client application layer first sends request packet A, which occupies data block 15 in the send buffer. It then sends request packet B, which occupies data blocks 16 and 17. Depending on the current buffer size and certain rules, the send buffer may glue packet A (block 15) and packet B (blocks 16 and 17) together; the merged data forms a new packet that is transmitted under a single TCP header.

Generally speaking, sticky packets occur when two different TCP request packets are merged into one new TCP packet.

Unpacking

If the client application layer sends request packet C, occupying four data blocks 9, 10, 11, and 12, it may be divided into two new packets (say, blocks 9-10 and blocks 11-12) for transmission over the TCP connection.

Generally speaking, unpacking splits one TCP request packet into two TCP packets for transmission.

As can be seen from the above process, what really controls the sending and receiving of TCP data is not the application layer that developers write, but the TCP buffers, which are managed by the Linux kernel.

Why does sticking/unpacking occur

  1. Unpacking occurs when the application writes more data than the socket buffer can hold; sticky packets may occur when the application writes less data than the socket buffer size.
  2. If the packet written by the application is longer than the MSS (maximum segment size), TCP splits it into MSS-sized segments.
  3. If the receiver does not read data from its buffer in a timely manner, sticky packets may occur.

How to handle sticking/unpacking

Because the transport layer is unaware of application semantics, sticking/unpacking can only be handled by protocols agreed between the sending and receiving ends at the business layer; data is then assembled and disassembled according to the rules those protocols specify. Common approaches are:

  1. Use a protocol with a message header that records the length of the request packet. The receiver first reads the length field from the header, then reads that many bytes of packet content.

  2. Transmit fixed-length messages. Each packet is sent and read with a fixed length; when a packet is shorter than the fixed length, the remainder is padded with a specific placeholder.

  3. Separate packets with a specific delimiter. A specific delimiter is added to the end of each complete request packet.
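Approach 1 (a length field in the message header) can be sketched with Python's struct module. The 4-byte big-endian length prefix here is an assumption for illustration, not a standard:

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length (4-byte big-endian header)."""
    return struct.pack(">I", len(payload)) + payload

def unframe(stream: bytes):
    """Split a received byte stream back into the original messages."""
    messages = []
    while len(stream) >= 4:
        (length,) = struct.unpack(">I", stream[:4])
        if len(stream) < 4 + length:
            break                       # incomplete frame: wait for more bytes
        messages.append(stream[4:4 + length])
        stream = stream[4 + length:]
    return messages

# Two requests glued into one TCP payload (a "sticky packet")...
sticky = frame(b"GET /a") + frame(b"GET /b")
# ...are still recovered as two distinct messages by the receiver:
print(unframe(sticky))   # [b'GET /a', b'GET /b']
```

The same loop also handles unpacking: a half-received frame simply stays in the stream until the rest of its bytes arrive.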