Why is retransmission needed?
TCP provides a reliable service by ensuring that data reaches the recipient. If a period of time (the timeout) passes and the sender has not received an acknowledgement, TCP retransmits the data to ensure that all of it is received
When a timeout occurs, TCP does not have to retransmit the identical segment. It may repacketize the data and send a larger segment, as long as it does not exceed the MSS announced by the receiver
Why calculate the timeout dynamically?
Network traffic and routes may change while a connection is in use, so the Round Trip Time (RTT) changes as well. If the timeout stays fixed while the RTT grows, segments may be retransmitted while their ACKs are still on the way, resulting in unnecessary waste
How to dynamically calculate timeout retransmission time?
The classical TCP algorithm estimates the RTT as R ← αR + (1 − α)M, and the retransmission time is RTO = Rβ
Where M is the measured round-trip time, that is, the time between sending a byte with a particular sequence number and receiving an acknowledgement that covers that sequence number
α is a smoothing factor with a recommended value of 0.9. Writing g = 1 − α, the formula becomes R ← (1 − g)R + gM, which can be rearranged as R ← R + g(M − R). If R is taken as the prediction of the next measurement, then M − R is the error between the prediction and the actual measurement, so the formula says that the new estimate is the old estimate plus a fraction of the prediction error. The error has two parts: 1 is noise, which is random and is written Er; 2 is the error caused by a poor estimate R, which is written Ee. The update is then R ← R + g·Er + g·Ee. The Ee term pushes the estimate in the correct direction, while the Er term pushes it in a random direction, and over many samples the random pushes tend to cancel out. We would therefore like a large g to make the most of Ee and a small g to limit the damage from Er. Because the Ee term moves R toward the true average whatever the value of g, a gain that is too small is safer than one that is too large. A good value for g is therefore between 0.1 and 0.2, that is, α between 0.9 and 0.8
R is the smoothed estimate of the RTT
RTO stands for Retransmission Timeout: if no ACK has been received after this time, the segment is resent
β is the RTT variation factor. When the transmission time is negligible, the gap between the maximum delay and the average delay is at its largest, because all of the delay can be attributed to processing. In that case the maximum is twice the average, so β is recommended to be 2. [Assuming that the maximum round-trip time is R and ignoring the transmission delay, the average delay is 0.5R, that is, the maximum is twice the average] See chapter 5 and the Jacobson algorithm at https://tools.ietf.org/html/rfc813 and http://www.cs.binghamton.edu/~nael/cs428-528/deeper/jacobson-congestion.pdf
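The classical estimator above can be sketched in a few lines. This is only an illustration of R ← αR + (1 − α)M and RTO = Rβ; the function name `classic_rto` and the choice to seed R with the first measurement are assumptions made for the example, not part of the original algorithm description.

```python
def classic_rto(samples, alpha=0.9, beta=2.0):
    """Fold RTT measurements into the smoothed estimate R and return (R, RTO).

    Assumption for this sketch: R is seeded with the first measurement.
    """
    r = samples[0]
    for m in samples[1:]:
        # R <- alpha * R + (1 - alpha) * M
        r = alpha * r + (1 - alpha) * m
    # RTO = R * beta
    return r, beta * r


if __name__ == "__main__":
    r, rto = classic_rto([1.0, 2.0])
    print(f"R = {r:.3f} s, RTO = {rto:.3f} s")
```

With α = 0.9 a single measurement of 2.0 s only moves R from 1.0 s to about 1.1 s, which shows how slowly this estimator reacts to a sudden change in RTT.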
The problem with this estimator is that when the RTT varies widely, the classical RTO cannot keep up with the change, which causes unnecessary retransmissions. At that point the network is already saturated, and retransmission only adds to the load
According to Jacobson’s analysis, β = 2 can only adapt to loads of at most 30%, which is far from enough for real networks
Another problem is retransmission ambiguity: a packet is sent, a timeout occurs, the packet is retransmitted with a longer RTO, and then an acknowledgement is received. Is that ACK for the first transmission or the second? The solution to this scenario is Karn’s algorithm. The main idea is that after a timeout and retransmission, the RTT estimate must not be updated when the acknowledgement for the retransmitted data finally arrives, and the backed-off RTO is kept for the next transmission
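A minimal sketch of Karn's rule, layered on the classical estimator. The class name `KarnTimer`, the doubling of the RTO on each timeout, the 64-second cap, and the 3-second initial RTO are all assumptions chosen for illustration.

```python
class KarnTimer:
    """RTT estimator that ignores samples from retransmitted segments."""

    def __init__(self, rto=3.0, alpha=0.9, beta=2.0, max_rto=64.0):
        self.r = None          # smoothed RTT, unknown until the first clean sample
        self.rto = rto         # current retransmission timeout (assumed initial value)
        self.alpha = alpha
        self.beta = beta
        self.max_rto = max_rto  # assumed upper bound for the backoff

    def on_timeout(self):
        # Retransmit with a longer RTO (doubling is an assumption of this sketch).
        self.rto = min(self.rto * 2, self.max_rto)

    def on_ack(self, measured_rtt, retransmitted):
        if retransmitted:
            # Ambiguous sample: the ACK may belong to either transmission,
            # so neither R nor the backed-off RTO is touched.
            return
        if self.r is None:
            self.r = measured_rtt
        else:
            self.r = self.alpha * self.r + (1 - self.alpha) * measured_rtt
        self.rto = self.beta * self.r


if __name__ == "__main__":
    timer = KarnTimer()
    timer.on_timeout()
    timer.on_ack(0.5, retransmitted=True)   # ignored
    timer.on_ack(1.0, retransmitted=False)  # used
    print(timer.r, timer.rto)
```

The RTO is only recomputed from R once an ACK arrives for a segment that was sent exactly once.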
What is the current TCP implementation method for estimating timeout times?
With Jacobson’s algorithm, the RTO is computed from the smoothed RTT and the smoothed mean deviation, rather than as a constant multiple of the mean
Implementation code: https://elixir.bootlin.com/linux/v2.6.32/ident/tcp_rtt_estimator
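As a rough floating-point sketch of the mean-deviation estimator (not the Linux code linked above, which uses scaled integer arithmetic): Err = M − A, A ← A + g·Err, D ← D + h(|Err| − D), RTO = A + 4D, with g = 1/8 and h = 1/4. The function name `jacobson_rto` and the seeding of A and D from the first sample are assumptions for the example.

```python
def jacobson_rto(samples, g=0.125, h=0.25):
    """Return (A, D, RTO): smoothed RTT, smoothed mean deviation, timeout.

    Assumption for this sketch: A starts at the first sample and D at half of it.
    """
    a = samples[0]
    d = samples[0] / 2
    for m in samples[1:]:
        err = m - a                 # prediction error for this measurement
        a += g * err                # A <- A + g * Err
        d += h * (abs(err) - d)     # D <- D + h * (|Err| - D)
    return a, d, a + 4 * d          # RTO = A + 4D


if __name__ == "__main__":
    a, d, rto = jacobson_rto([1.0, 2.0])
    print(f"A = {a}, D = {d}, RTO = {rto}")
```

Because the deviation term D grows as soon as measurements start to scatter, the RTO widens quickly when the RTT fluctuates, which is what a fixed multiple of the mean cannot do.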
How do I prevent packets from being dropped?
Use the congestion avoidance algorithm, which assumes that packet loss is caused by network congestion. The sender keeps two variables for congestion control: the congestion window CWND and the slow start threshold SSTHRESH. When CWND is less than or equal to SSTHRESH, slow start is used; otherwise, congestion avoidance is used. The rules are as follows:
- The sender may send at most the minimum of CWND and the window advertised by the receiver
- When a timeout occurs, that is, no ACK has been received when the retransmission timer expires, SSTHRESH is set to half of the current window size, and CWND is set to 1 segment
- When a new ACK is received, if CWND <= SSTHRESH, slow start is performed and CWND is increased by 1; otherwise congestion avoidance is performed and CWND is increased by 1/CWND
Increasing CWND by 1 for every ACK makes the window grow exponentially. For example, starting at 1, the first ACK raises CWND to 2 and two segments are sent; when their two ACKs arrive, CWND grows to 4, and so on
Increasing CWND by 1/CWND for every ACK is an additive increase: CWND grows by about 1 segment for every CWND ACKs received, that is, roughly once per round trip
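The rules above can be sketched as follows, with CWND and SSTHRESH counted in segments. The function names, the floor of 2 segments for SSTHRESH, and the simplification of one ACK per segment in flight are assumptions made for this illustration.

```python
def on_new_ack(cwnd, ssthresh):
    """Grow CWND for one new ACK (values are in segments)."""
    if cwnd <= ssthresh:
        return cwnd + 1.0          # slow start: +1 per ACK
    return cwnd + 1.0 / cwnd       # congestion avoidance: +1/CWND per ACK


def on_timeout(cwnd, rwnd):
    """Return (new_cwnd, new_ssthresh) after a retransmission timeout."""
    window = min(cwnd, rwnd)               # current window actually in use
    ssthresh = max(window / 2, 2.0)        # assumed floor of 2 segments
    return 1.0, ssthresh


def simulate_round_trips(rounds, ssthresh, cwnd=1.0):
    """Record CWND after each round trip, assuming one ACK per segment sent."""
    history = [cwnd]
    for _ in range(rounds):
        for _ in range(int(cwnd)):         # ACK count fixed at start of the round
            cwnd = on_new_ack(cwnd, ssthresh)
        history.append(cwnd)
    return history


if __name__ == "__main__":
    print(simulate_round_trips(3, ssthresh=16.0))
    print(on_timeout(16.0, 32.0))
```

While CWND stays at or below SSTHRESH the recorded values double each round trip; once it passes SSTHRESH the growth per round trip drops to roughly one segment.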
What should I do after receiving duplicate ACKs?
After receiving a duplicate ACK, the sender cannot tell whether a segment was lost or segments were merely reordered, so it waits for a small number of duplicate ACKs. If three or more duplicate ACKs arrive in a row, the sender concludes that a segment has probably been lost and retransmits it immediately, without waiting for the retransmission timer to expire. This is called the fast retransmit algorithm. The fast recovery algorithm is then executed. The combined process is as follows:
- After receiving the third duplicate ACK, set SSTHRESH to half of the current congestion window. Retransmit the lost segment and set CWND to SSTHRESH plus 3 times the segment size.
- Each time another duplicate ACK arrives, increase CWND by one segment size, and send a packet if the new CWND allows it
- When an ACK that acknowledges new data arrives, set CWND to the SSTHRESH value set in step 1.
This new ACK should acknowledge the segment retransmitted in step 1 together with all the segments sent between the lost segment and the receipt of the first duplicate ACK.
Slow start (that is, setting CWND to 1) is not performed after duplicate ACKs are received, because duplicate ACKs indicate more than a lost packet. The receiver generates a duplicate ACK only when it receives another segment, which means that segment has left the network and is in the receiver's buffer. Data is therefore still flowing between the sender and the receiver, and there is no need to cut the flow abruptly with slow start. See tools.ietf.org/html/rfc200… Chapter 4
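The three steps of fast retransmit and fast recovery can be sketched as a small state machine, with windows counted in segments. The class name `FastRecovery` and its fields are hypothetical and exist only for this example; actually sending segments is left out.

```python
class FastRecovery:
    """Track CWND and SSTHRESH (in segments) across duplicate and new ACKs."""

    def __init__(self, cwnd, ssthresh):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.dup_acks = 0
        self.in_recovery = False

    def on_duplicate_ack(self):
        """Return True when the lost segment should be retransmitted now."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            # Step 1: halve the threshold, retransmit, inflate the window by 3.
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3
            self.in_recovery = True
            return True
        if self.in_recovery:
            # Step 2: each further duplicate ACK means one more segment left the network.
            self.cwnd += 1
        return False

    def on_new_ack(self):
        if self.in_recovery:
            # Step 3: deflate the window back to the threshold from step 1.
            self.cwnd = self.ssthresh
            self.in_recovery = False
        self.dup_acks = 0


if __name__ == "__main__":
    conn = FastRecovery(cwnd=10.0, ssthresh=20.0)
    print([conn.on_duplicate_ack() for _ in range(3)], conn.cwnd, conn.ssthresh)
```

Note that CWND never drops to 1 in this path, which matches the reasoning that data is still flowing between the two ends.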
The whole process is illustrated below
How do I initialize the metrics involved in the above process?
Newer TCP implementations keep these metrics in the routing table entry, including the smoothed RTT, the smoothed mean deviation, and the slow start threshold. When a TCP connection is closed, the metrics are saved if enough data has been sent (16 windows of data is considered sufficient) and the routing table entry for the destination is not a default route. When a new connection is set up, whether active or passive, it is initialized from the routing table entry if the entry holds values
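The save and restore logic can be pictured with a toy per-destination cache. Everything here is hypothetical (the `Metrics` type, the `route_cache` dictionary, and the function names); a real stack stores these values inside its routing table entries.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    srtt: float       # smoothed RTT
    mdev: float       # smoothed mean deviation
    ssthresh: float   # slow start threshold


# Hypothetical stand-in for per-route storage, keyed by destination address.
route_cache = {}


def save_on_close(dest, metrics, windows_sent, is_default_route):
    """Store metrics only if 16 windows were sent and the route is not a default route."""
    if windows_sent >= 16 and not is_default_route:
        route_cache[dest] = metrics
        return True
    return False


def init_connection(dest, defaults):
    """Start a new connection from cached metrics when they exist."""
    return route_cache.get(dest, defaults)


if __name__ == "__main__":
    save_on_close("192.0.2.1", Metrics(0.2, 0.05, 8.0), windows_sent=20, is_default_route=False)
    print(init_connection("192.0.2.1", Metrics(3.0, 1.5, 64.0)))
```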
How does TCP handle ICMP errors returned by a given connection?
Common ICMP errors seen by TCP include source quench, host unreachable, and network unreachable
- Host unreachable and network unreachable are effectively ignored, because they are considered transient (for example, a route change may take several minutes to settle). TCP does not close the connection and keeps retransmitting the data that caused the error
- Source quench causes CWND to be set to one segment, which initiates a slow start. However, the slow start threshold SSTHRESH does not change
A source quench is generated when a router or host receives data faster than it can process it
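The behaviour described above amounts to a small dispatch on the ICMP error type. The sketch below uses hypothetical string labels for the error kinds and a hypothetical function name; it only models the effect on CWND and SSTHRESH (in segments).

```python
def handle_icmp_error(kind, cwnd, ssthresh):
    """Return (cwnd, ssthresh) after an ICMP error on an established connection."""
    if kind == "source_quench":
        # Restart slow start from one segment, leaving SSTHRESH unchanged.
        return 1.0, ssthresh
    if kind in ("host_unreachable", "network_unreachable"):
        # Treated as transient: the connection stays open and nothing changes.
        return cwnd, ssthresh
    raise ValueError(f"unhandled ICMP error kind: {kind}")


if __name__ == "__main__":
    print(handle_icmp_error("source_quench", 12.0, 8.0))
    print(handle_icmp_error("host_unreachable", 12.0, 8.0))
```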
Appendix
Read the book (Chapter 21 of TCP/IP)