What TCP does to ensure reliable transmission

This article has been added to CS-Wiki (a Gitee recommended project with 0.8k stars); stars are welcome ~

0. Foreword

This section is a bit long. Apart from the three-way handshake and the four-way wave, reliable transmission is the TCP topic that comes up most often, so it is worth sorting out clearly ~

1. Overview of TCP Reliable transmission

First, what is reliable transmission? Reliable transmission guarantees that the byte stream the receiver gets is exactly the same as the byte stream the sender sent.

The network layer (IP) has no reliable transmission mechanism; it only delivers on a best-effort basis. At the transport layer, TCP provides reliable transmission through the following mechanisms:

1) Checksum
2) Sequence numbers and acknowledgements (important)
3) Retransmission (important)
4) Flow control (sliding window protocol) (very important)
5) Congestion control (important)

The checksum only needs a basic understanding. All the other mechanisms are important, so be sure to keep them firmly in mind.

2. The checksum

TCP checksum: the sender computes a checksum over each TCP segment it is about to send, and the receiver verifies the checksum of every segment it receives. The purpose is to detect whether the TCP header or data changed on the way from sender to receiver. If the receiver finds a checksum error, it discards the segment.

The details of calculating and verifying the checksum do not come up often, so this article does not explain them in depth; interested readers can look them up.
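For readers who want a concrete picture anyway, here is a minimal Python sketch of the standard Internet checksum used by TCP: the 16-bit one's complement of the one's complement sum of the data's 16-bit words, as described in RFC 1071. The function names and the 4-byte payload are hypothetical. For simplicity, the checksum is appended to the end of the data here, whereas real TCP stores it in a header field.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:                             # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in (end-around carry)
    return ~total & 0xFFFF


def verify(data_with_checksum: bytes) -> bool:
    """Receiver side: summing the data together with its checksum gives 0 if intact."""
    return internet_checksum(data_with_checksum) == 0


payload = b"TCP!"                                 # hypothetical 4-byte payload
segment = payload + internet_checksum(payload).to_bytes(2, "big")
print(verify(segment))                            # True
corrupted = bytes([segment[0] ^ 0x01]) + segment[1:]   # flip one bit "in transit"
print(verify(corrupted))                          # False
```

The receiver does not recompute and compare. It sums everything, including the checksum field, and any single flipped bit makes the result non-zero.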

TCP computes the checksum over a 12-byte pseudo header in addition to the segment itself.

UDP also has a checksum. In UDP over IPv4 it is optional, whereas the TCP checksum is mandatory. Both TCP and UDP prepend a 12-byte pseudo header when computing the checksum.

The pseudo header's data is taken from the IP header. It is 12 bytes long and contains:
- the source IP address (4 bytes)
- the destination IP address (4 bytes)
- a reserved byte set to 0
- the transport-layer protocol number (1 byte; 6 for TCP)
- the TCP length (header + data, 2 bytes)

The pseudo header strengthens the error detection of the TCP checksum. For example, the destination IP address lets the receiver check that the segment really was sent to it. The protocol number lets it check that the segment was handed to the right transport-layer protocol. The pseudo header is used only for checksum verification and is never transmitted.
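To tie the pseudo header back to the checksum, here is a hedged Python sketch for IPv4. It does three things:
- packs the five pseudo-header fields listed above with `struct`;
- builds a hypothetical 20-byte SYN header whose checksum field is zeroed while computing;
- shows that a receiver that uses the wrong destination address gets a non-zero verification result.

The addresses (taken from the documentation ranges), the ports and the helper names are illustrative assumptions.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, tcp_segment: bytes) -> int:
    # 12-byte IPv4 pseudo header: source (4) + destination (4)
    # + reserved zero (1) + protocol number (1) + TCP length (2)
    pseudo = struct.pack("!4s4sBBH",
                         socket.inet_aton(src_ip),
                         socket.inet_aton(dst_ip),
                         0,                   # reserved byte
                         6,                   # protocol number: TCP
                         len(tcp_segment))    # TCP header + data
    return internet_checksum(pseudo + tcp_segment)


SRC, DST = "192.0.2.1", "198.51.100.7"       # documentation addresses
# Hypothetical 20-byte SYN header: ports, seq, ack, data offset, flags,
# window, checksum (0 while computing), urgent pointer.
header = struct.pack("!HHIIBBHHH", 12345, 80, 1000, 0, 5 << 4, 0x02, 65535, 0, 0)
cs = tcp_checksum(SRC, DST, header)
header = header[:16] + cs.to_bytes(2, "big") + header[18:]   # checksum field is at offset 16

print(tcp_checksum(SRC, DST, header) == 0)              # True: addressed to us
print(tcp_checksum(SRC, "198.51.100.8", header) == 0)   # False: wrong destination
```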

3. Sequence numbers and acknowledgements

The TCP header contains a sequence number field, which holds the sequence number of the first data byte in the segment.

After receiving a TCP segment, the receiver sends back an acknowledgement (ACK). Its acknowledgement number is the sequence number of the next byte the receiver expects.
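The relationship between sequence numbers and acknowledgement numbers can be shown with a short Python sketch. For readability it assumes an initial sequence number of 1 and 100-byte segments; real connections start from a random initial sequence number. The function names are hypothetical.

```python
def segment_stream(data: bytes, isn: int, mss: int):
    """Split a byte stream into (seq, payload) pairs; seq numbers the first byte."""
    return [(isn + off, data[off:off + mss]) for off in range(0, len(data), mss)]


def ack_for(seq: int, payload: bytes) -> int:
    """Acknowledgement number = sequence number of the next byte expected."""
    return seq + len(payload)


# Hypothetical values: 250 bytes of data, initial sequence number 1, 100-byte segments.
for seq, payload in segment_stream(b"x" * 250, isn=1, mss=100):
    print(seq, len(payload), ack_for(seq, payload))
# 1 100 101
# 101 100 201
# 201 50 251
```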

The acknowledgement mechanism is closely tied to the retransmission mechanism, so let’s take a closer look at retransmission.

4. Retransmission mechanism

On a real, complex network, transmission does not always go as smoothly as in the figure above, and packets may be lost. Loss has many possible causes, including application failures, overloaded routers, and temporary service outages. Packets travel at very high rates and loss is usually temporary, so it is essential for TCP to detect lost packets and recover from them.

The retransmission mechanism is the most basic error recovery function of TCP. Common retransmission mechanisms are as follows:

Timeout retransmission
Fast retransmission

① Timeout retransmission

Probably when it comes to retransmission, the first thing that comes to mind is timeout retransmission. Timeout retransmission means that the TCP sender sets a timer when sending packets. If the sender does not receive an ACK packet from the receiver within the specified period of time, the sender retransmits the sent packet.

Timeout retransmission happens when the sender fails to receive the receiver's ACK in time. This occurs in two cases:

Case 1: the data segment is lost
Case 2: the ACK from the receiver is lost

The timeout used for retransmission is usually called the RTO (Retransmission Timeout). What RTO is most appropriate? In other words, how long should the sender wait before retransmitting?

Round-Trip Time (RTT) is the time from sending a segment until its acknowledgement comes back. It measures one full round trip across the network.

Clearly, the RTO should be slightly larger than the RTT.

Consider what would happen if the RTO were much larger or much smaller than the RTT:

RTO much larger than RTT: after a loss, the sender stays idle too long before retransmitting, which lowers transmission efficiency
RTO smaller than RTT: segments are retransmitted before their ACK could possibly arrive, causing unnecessary retransmissions and extra network load
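The article does not say how TCP actually picks an RTO "slightly larger than the RTT". One standard answer comes from RFC 6298, not from the original text: keep a smoothed RTT (SRTT) and an RTT variance estimate (RTTVAR), and set RTO = SRTT + 4 * RTTVAR, with a lower bound of 1 second. The sketch below follows that scheme. The class name is hypothetical, and the RFC's clock-granularity term is left out.

```python
class RtoEstimator:
    """RTO from smoothed RTT and RTT variance, following RFC 6298."""

    ALPHA, BETA, K = 1 / 8, 1 / 4, 4
    MIN_RTO = 1.0                     # seconds, the RFC's recommended lower bound

    def __init__(self):
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0                # initial RTO before any RTT sample

    def on_rtt_sample(self, r: float) -> float:
        if self.srtt is None:         # first measurement
            self.srtt, self.rttvar = r, r / 2
        else:                         # RTTVAR is updated using the old SRTT first
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        # The RFC's clock-granularity term G is omitted in this sketch.
        self.rto = max(self.MIN_RTO, self.srtt + self.K * self.rttvar)
        return self.rto


est = RtoEstimator()
for sample in (0.5, 0.5, 0.5):        # hypothetical steady 500 ms RTT
    print(est.on_rtt_sample(sample))  # 1.5, then 1.25, then 1.0625
```

With a steady RTT the variance term shrinks. The RTO therefore moves down toward the RTT, but never below the 1 s floor.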

What if the retransmitted data times out again? TCP’s policy is to double the timeout interval for retransmission.

That is, for each timeout retransmission, the timeout interval for the next retransmission is set to twice the previous value.
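A minimal sketch of the doubling rule above: each expiry doubles the timeout used for the next retransmission. The 60-second cap is an assumption (RFC 6298 allows an upper bound as long as it is at least 60 seconds), and the function name is hypothetical.

```python
def backoff_schedule(initial_rto: float, retries: int, max_rto: float = 60.0):
    """Timeouts used for the original send and each retransmission."""
    schedule = []
    rto = initial_rto
    for _ in range(retries + 1):
        schedule.append(rto)
        rto = min(rto * 2, max_rto)   # double the timeout after every expiry, up to the cap
    return schedule


print(backoff_schedule(1.0, 4))       # [1.0, 2.0, 4.0, 8.0, 16.0]
```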

The problem with timeout-driven retransmission is that the timeout can be relatively long. Is there a way to shorten the wait before retransmitting? This need led to the “fast retransmission” mechanism.

② Fast retransmission

The Fast Retransmit mechanism is data-driven rather than time-driven.

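The principle of fast retransmission is as follows. When the receiver gets an out-of-order segment whose sequence number is larger than the one it expects, it immediately sends a duplicate (redundant) ACK. That ACK still carries the sequence number of the next byte the receiver expects. When the sender receives three duplicate ACKs, it retransmits the missing segment without waiting for the timer to expire.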

For example, suppose the sender has sent segments 1, 2, 3, 4 and 5, and segment 2 is lost:

The receiver receives segment 1 and returns an ACK for it (the acknowledgement number is the first byte of segment 2).
After receiving segment 3, the receiver again returns the ACK for segment 1 (the acknowledgement number is still the first byte of segment 2).
After receiving segment 4, the receiver again returns the ACK for segment 1 (the acknowledgement number is still the first byte of segment 2).
After receiving segment 5, the receiver again returns the ACK for segment 1 (the acknowledgement number is still the first byte of segment 2).
After receiving three duplicate ACKs for segment 1, the sender concludes that segment 2 has been lost and retransmits it.
Finally, the receiver receives segment 2. Segments 3, 4 and 5 have already arrived, so it returns ACK 6 (the acknowledgement number is the first byte of segment 6).
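The walk-through above can be replayed in a few lines of Python. This sketch simplifies byte-level acknowledgement numbers to segment numbers. An ACK value n means "I expect segment n next", so the article's "ACK of segment 1" appears as 2. The function names are hypothetical, and only duplicate-ACK counting is modeled, not timers.

```python
def receiver_acks(arrivals):
    """Cumulative ACK returned after each arriving segment number."""
    expected = 1
    buffered = set()
    acks = []
    for seg in arrivals:
        buffered.add(seg)
        while expected in buffered:     # consume everything that is now contiguous
            buffered.discard(expected)
            expected += 1
        acks.append(expected)           # always names the next in-order segment
    return acks


def fast_retransmit_trigger(acks, dup_threshold=3):
    """Segment to retransmit once dup_threshold duplicate ACKs arrive, else None."""
    last, dups = None, 0
    for ack in acks:
        if ack == last:
            dups += 1
            if dups == dup_threshold:
                return ack              # the segment the receiver keeps asking for
        else:
            last, dups = ack, 0
    return None


acks = receiver_acks([1, 3, 4, 5])      # segment 2 is lost
print(acks)                             # [2, 2, 2, 2]
print(fast_retransmit_trigger(acks))    # 2
print(receiver_acks([1, 3, 4, 5, 2]))   # [2, 2, 2, 2, 6]
```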


4. Sliding window protocol

Not knowing the sliding window protocol means not knowing TCP. This topic carries a lot of weight, so be sure to master it.

① Cumulative acknowledgement

In the fast retransmission example above, when the receiver finally received segment 2, it returned ACK 6 because segments 3, 4 and 5 had already arrived.

Suppose instead that TCP acknowledged every segment individually, and could send the next segment only after the previous one had been acknowledged. In that mode, the receiver would first have to return an acknowledgement for segment 3, then one for each following segment, rather than a single ACK 6.

In practice, sending the next segment only after the previous one is acknowledged is inefficient. The longer each segment's round-trip time, the lower the network throughput and the worse the communication efficiency.
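Some rough arithmetic shows why waiting for each acknowledgement hurts. If at most one segment can be unacknowledged, throughput is capped at one segment per RTT. The segment size (1460 bytes), 100 ms RTT and ten-segment window below are hypothetical values, and header overhead and loss are ignored.

```python
def max_throughput_bps(window_bytes: int, rtt_ms: int) -> float:
    """Upper bound on throughput when at most window_bytes can be unacknowledged."""
    return window_bytes * 8 * 1000 / rtt_ms


MSS = 1460      # hypothetical segment size, bytes
RTT_MS = 100    # hypothetical round-trip time, milliseconds

one_at_a_time = max_throughput_bps(MSS, RTT_MS)        # stop-and-wait: 1 segment per RTT
ten_in_flight = max_throughput_bps(10 * MSS, RTT_MS)   # window of 10 segments
print(one_at_a_time, ten_in_flight)                    # 116800.0 1168000.0
```

Doubling `RTT_MS` halves both numbers. This matches the point that a longer round trip means lower throughput when each segment must wait for its acknowledgement.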

An analogy: suppose that after every sentence you say, you had to wait for my reply before saying the next one. Whenever I was busy with something else and slow to answer, you would be stuck waiting, which is obviously impractical.

To solve this, TCP introduces the concept of a window. The window size is the maximum amount of data that can be sent without waiting for an acknowledgement.

The window is implemented as a buffer allocated by the operating system. The sender must keep sent data in this buffer until the acknowledgement for it comes back. Once the acknowledgement arrives in time, the data can be purged from the buffer.

If the window size is three TCP segments, the sender can send three segments back to back. Even if one of the ACKs is lost along the way, the sender can rely on the next acknowledgement to confirm that data.

For example, if ACK 300 is lost, the data does not need to be retransmitted, because the next acknowledgement covers it. As soon as the sender receives ACK 400, it knows that the receiver has received all data before byte 400. This is called cumulative acknowledgement (or cumulative response).
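The buffer and cumulative-ACK behaviour can be sketched as follows. The byte ranges (three 100-byte segments starting at 200, 300 and 400) are assumptions chosen to match the ACK 300 / ACK 400 example, and the class name is hypothetical.

```python
class SendBuffer:
    """Sender-side retransmission buffer with cumulative acknowledgement."""

    def __init__(self):
        self.unacked = {}                    # seq of first byte -> segment length

    def send(self, seq: int, length: int):
        self.unacked[seq] = length           # keep a copy until it is acknowledged

    def on_ack(self, ack: int):
        # A cumulative ACK n covers every byte below n, so every segment
        # that ends at or before n is released at once.
        for seq in [s for s, n in self.unacked.items() if s + n <= ack]:
            del self.unacked[seq]


buf = SendBuffer()
for seq in (200, 300, 400):                  # three 100-byte segments
    buf.send(seq, 100)
# ACK 300 is lost in transit; ACK 400 arrives instead.
buf.on_ack(400)
print(sorted(buf.unacked))                   # [400]
buf.on_ack(500)
print(sorted(buf.unacked))                   # []
```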

② The sender’s sliding window

The pictures in this section are from the official account xiaolin Coding

Let’s start with the sender’s window. The figure below shows the data cached by the sender, divided into four parts according to its processing status:

Data that has been sent and acknowledged (ACK received)
Data that has been sent but not yet acknowledged
Data not yet sent, but within the range the receiver can handle
Data not yet sent and beyond the range the receiver can handle

Once the sender has sent all the data in the window, the usable window is 0. It is exhausted, and no more data can be sent until an ACK arrives.

Suppose an ACK then acknowledges 5 bytes of the outstanding data, and the size of the send window does not change. The sliding window moves 5 bytes to the right, so bytes 52 to 56 become usable and those five bytes can now be sent.
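The four categories and the sliding step can be modeled with three numbers. They are named after the SND.UNA, SND.NXT and SND.WND variables of the TCP specification. The starting values are assumptions, not taken from the figure: the window covers bytes 32 to 51, and bytes 32 to 45 are already in flight. They were picked so that the final step reproduces the "bytes 52 to 56" example above.

```python
from dataclasses import dataclass


@dataclass
class SenderWindow:
    snd_una: int   # oldest unacknowledged byte
    snd_nxt: int   # next byte to send
    snd_wnd: int   # send window size in bytes

    def usable(self) -> int:
        # Bytes that are not yet sent but still inside the window.
        return self.snd_una + self.snd_wnd - self.snd_nxt

    def send(self, nbytes: int) -> range:
        nbytes = min(nbytes, self.usable())
        sent = range(self.snd_nxt, self.snd_nxt + nbytes)
        self.snd_nxt += nbytes
        return sent

    def on_ack(self, ack: int):
        # Cumulative ACK: everything below `ack` is confirmed, so the window slides right.
        if self.snd_una < ack <= self.snd_nxt:
            self.snd_una = ack


w = SenderWindow(snd_una=32, snd_nxt=46, snd_wnd=20)   # hypothetical values
print(w.usable())                  # 6  (bytes 46..51)
w.send(6)
print(w.usable())                  # 0  (window exhausted)
w.on_ack(37)                       # bytes 32..36 acknowledged (5 bytes)
sent = w.send(5)
print(sent.start, sent.stop - 1)   # 52 56
```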

③ The sliding window of the receiver

The receiver’s sliding window can be divided into three parts:

Data that has been successfully received and acknowledged
Data not yet received but within the receive window (can be received)
Data not yet received and beyond the receive window (cannot be received)

Similarly, the receiver’s sliding window moves right once data has been successfully received and acknowledged.
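Here is a similar sketch for the receiver, using RCV.NXT-style names borrowed from the TCP specification. The class name and numbers are hypothetical. It also assumes the application reads data as soon as it arrives, so the window keeps its size while it slides.

```python
class ReceiverWindow:
    """rcv_nxt: next in-order byte expected; rcv_wnd: bytes acceptable from rcv_nxt on."""

    def __init__(self, rcv_nxt: int, rcv_wnd: int):
        self.rcv_nxt = rcv_nxt
        self.rcv_wnd = rcv_wnd

    def classify(self, byte: int) -> str:
        if byte < self.rcv_nxt:
            return "received and acknowledged"
        if byte < self.rcv_nxt + self.rcv_wnd:
            return "not received, can be received"
        return "not received, cannot be received"

    def on_in_order_data(self, nbytes: int):
        # Assumes the application reads the data immediately, so the window
        # keeps its size and simply slides right by the bytes acknowledged.
        self.rcv_nxt += nbytes


rw = ReceiverWindow(rcv_nxt=32, rcv_wnd=20)   # hypothetical values
print(rw.classify(31), "|", rw.classify(40), "|", rw.classify(52))
rw.on_in_order_data(5)                        # bytes 32..36 arrive and are acknowledged
print(rw.classify(52))                        # not received, can be received
```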

5. Flow control

Imagine this scenario: host A keeps sending data to host B regardless of host B’s ability to receive it. Host B’s receive buffer fills up and can no longer accept data, so a large number of packets are dropped, which triggers retransmission. If host B’s buffer situation does not improve meanwhile, a lot of time is wasted on retransmissions and data transfer efficiency drops.

Therefore, TCP introduces a flow control mechanism: host B controls how much data host A sends by telling host A the size of its receive buffer. In short, flow control limits the sender’s sending rate so that the receiver can keep up.

TCP implements flow control mainly through the sliding window protocol.

We mentioned the sliding window size above but did not say where it is set. It comes from the TCP header, which contains a 16-bit Window size field.

The value of this field is the remaining space in the receiver’s buffer. This lets the sender match its sending to the receiver’s processing capacity without overwhelming it.

Therefore, the window size is usually determined by the receiver.

Each time the receiver sends an ACK, it fills in its current receive window (rwnd), so the window size travels with the ACK. The sender adjusts its sending rate according to the window size in the ACKs it receives. If the advertised window is 0, the sender stops sending data. It then periodically sends a window probe segment to the receiver, prompting the receiver to report its current window size.
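The following Python sketch plays out the zero-window case. The 300-byte buffer, the 1000 bytes waiting to be sent and the helper names are hypothetical. A real stack also uses a persist timer to schedule probes and may send window updates on its own; this sketch leaves both out.

```python
class Receiver:
    def __init__(self, capacity: int):
        self.capacity = capacity   # receive buffer size in bytes
        self.buffered = 0          # bytes waiting for the application

    def rwnd(self) -> int:
        return self.capacity - self.buffered   # free space, advertised in every ACK

    def receive(self, nbytes: int) -> int:
        assert nbytes <= self.rwnd(), "sender overran the advertised window"
        self.buffered += nbytes
        return self.rwnd()                     # window value carried back in the ACK

    def app_read(self, nbytes: int):
        self.buffered -= min(nbytes, self.buffered)


def send_step(pending: int, rwnd: int, rx: Receiver):
    """One sender step: send up to rwnd bytes, or a window probe if rwnd is 0."""
    if rwnd == 0:
        return pending, rx.rwnd(), "probe"     # the reply reports the current window
    n = min(pending, rwnd)
    return pending - n, rx.receive(n), f"sent {n}"


rx = Receiver(capacity=300)                    # hypothetical buffer size
pending, rwnd, log = 1000, rx.rwnd(), []
for app_reads in (0, 0, 200, 0):               # bytes the application reads before each step
    rx.app_read(app_reads)
    pending, rwnd, event = send_step(pending, rwnd, rx)
    log.append((event, rwnd))
print(log)
# [('sent 300', 0), ('probe', 0), ('probe', 200), ('sent 200', 0)]
```

The sender never exceeds the advertised window. While the window is 0 it only probes, and it resumes sending as soon as a probe reply shows free space.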


6. Congestion control

The pictures in this section are from the official account xiaolin Coding

Congestion means that, over some period of time, the demand for a resource in the network exceeds the available amount of that resource (demand exceeds supply), and network performance deteriorates.

If the network is congested, many TCP segments may be lost. This triggers retransmissions on a large scale, which makes the congestion even worse and seriously affects transmission.

TCP therefore treats loss as a sign of congestion. If the sender does not receive an ACK within the expected time, that is, if retransmission is triggered, the network is assumed to be congested.

Therefore, when congestion occurs, the rate of the sender should be controlled. This is similar to flow control, but from a different point of view.

Flow control ensures that the receiver can take in data in time. Congestion control reduces congestion across the whole network by preventing too much data from being injected into it.

To regulate how much data the sender sends, TCP defines the congestion window (cwnd). The congestion window is a state variable maintained by the sender that changes dynamically with the degree of congestion in the network:

Whenever the network is congested, cwnd decreases
When the network is not congested, cwnd increases

Without a congestion window, the send window is basically equal to the receive window advertised by the receiver. With a congestion window, the send window is the minimum of the congestion window and the receive window, that is, swnd = min(cwnd, rwnd).

TCP congestion control adopts four algorithms:

Slow start
Congestion avoidance
Fast retransmission
Fast recovery

The four algorithms are described in detail below.

① Slow start

The idea behind slow start is this: if TCP injected a large amount of data into the network right after the connection is established, it could easily cause congestion. A better approach is to probe first and increase the amount sent bit by bit, growing the congestion window gradually from small to large. In this classic description, cwnd starts at 1 (one segment) and grows by 1 for every ACK received. As a result it doubles every round trip (exponential growth).

Of course, slow start cannot go on forever. TCP keeps another state variable, the slow start threshold ssthresh:

When cwnd < ssthresh, the slow start algorithm is used
When cwnd >= ssthresh, the congestion avoidance algorithm is used

② Congestion avoidance

Congestion avoidance makes cwnd grow slowly: cwnd increases by 1 per round-trip time (linear growth) instead of doubling.

Whenever congestion occurs, ssthresh and cwnd are changed (multiplicative decrease). This applies in both slow start and congestion avoidance; here, congestion means it was detected by a retransmission timeout:

ssthresh is set to cwnd/2
cwnd is reset to 1

Since the congestion window size is reset to 1, the slow start algorithm is restarted.
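The slow start and congestion avoidance rules, together with the timeout reaction above, can be traced round by round. This minimal sketch counts whole segments per RTT and assumes an initial ssthresh of 16 and a single timeout. Two modeling choices are not stated in the article: the slow-start doubling is capped at ssthresh, and ssthresh has a floor of 2 segments (taken from RFC 5681).

```python
def simulate(rounds: int, ssthresh: int = 16, loss_rounds=frozenset()) -> list:
    """cwnd (in segments) at the start of each RTT round."""
    cwnd, trace = 1, []                        # classic initial window of 1 segment
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:                   # a retransmission timeout in this round
            ssthresh = max(cwnd // 2, 2)       # multiplicative decrease (RFC 5681 floor of 2)
            cwnd = 1                           # back to slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)     # slow start: double per RTT, capped at ssthresh
        else:
            cwnd += 1                          # congestion avoidance: +1 per RTT
    return trace


print(simulate(12, ssthresh=16, loss_rounds={6}))
# [1, 2, 4, 8, 16, 17, 18, 1, 2, 4, 8, 9]
```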

③ Fast retransmission and fast recovery

Fast retransmission and fast recovery algorithms are usually used together.

Fast retransmission is triggered when the sender receives three duplicate ACKs, and TCP then runs the fast retransmit algorithm. TCP treats this case differently from a timeout and considers it less serious, because most of the data is still arriving and only a small part was lost. Fast retransmit does the following:

cwnd = cwnd/2
ssthresh = cwnd
Re-enter the congestion avoidance phase

The “fast recovery” algorithm was later added on top of fast retransmission. With it, when three duplicate ACKs are received, TCP ends up in the fast recovery phase rather than going straight to congestion avoidance.

Fast recovery rests on the principle of “packet conservation”: the number of packets in the network stays constant, and a “new” packet may enter the network only when an “old” one has left. Through TCP's ACK mechanism, each duplicate ACK tells the sender that one packet has left the network, so cwnd is increased by 1. If this principle were strictly followed, the network would rarely become congested. In fact, the purpose of congestion control is to correct violations of this principle.

Specifically, the main steps of fast recovery are:

Set cwnd to ssthresh + 3, then retransmit the missing segment. The 3 is added because three duplicate ACKs were received, meaning three “old” packets have left the network.
For each further duplicate ACK received, increase cwnd by 1.
When an ACK for new data arrives, set cwnd to the ssthresh value from step 1. This ACK confirms new data, which means everything covered by the duplicate ACKs has been received. Recovery is over, and TCP returns to its pre-recovery state, congestion avoidance.
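The three steps can be written as a small event handler. This is a minimal sketch of the textbook Reno behaviour described above, counted in segments. The class and method names are hypothetical, and the 2-segment floor on ssthresh follows RFC 5681. Normal window growth on new ACKs outside recovery is omitted.

```python
class RenoSender:
    """Textbook Reno reactions to duplicate and new ACKs (units: segments)."""

    def __init__(self, cwnd: int, ssthresh: int):
        self.cwnd, self.ssthresh = cwnd, ssthresh
        self.dup_acks = 0
        self.in_fast_recovery = False

    def on_dup_ack(self) -> bool:
        """Returns True when the missing segment should be retransmitted."""
        if self.in_fast_recovery:
            self.cwnd += 1                     # step 2: one more packet left the network
            return False
        self.dup_acks += 1
        if self.dup_acks == 3:                 # fast retransmit
            self.ssthresh = max(self.cwnd // 2, 2)
            self.cwnd = self.ssthresh + 3      # step 1: three packets left the network
            self.in_fast_recovery = True
            return True
        return False

    def on_new_ack(self):
        if self.in_fast_recovery:
            self.cwnd = self.ssthresh          # step 3: back to congestion avoidance
            self.in_fast_recovery = False
        self.dup_acks = 0                      # normal window growth omitted here


s = RenoSender(cwnd=16, ssthresh=64)           # hypothetical starting state
print([s.on_dup_ack() for _ in range(3)], s.cwnd, s.ssthresh)   # [False, False, True] 11 8
for _ in range(2):
    s.on_dup_ack()
print(s.cwnd)                                  # 13
s.on_new_ack()
print(s.cwnd, s.in_fast_recovery)              # 8 False
```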
