This article introduces several classical TCP congestion control algorithms, covering their principles and applicable scenarios.


Preface

TCP congestion control works by maintaining a congestion window. The principle is simple: as long as no congestion appears in the network, the congestion window can grow so that more packets are sent; as soon as congestion occurs, the congestion window should shrink to reduce the number of packets injected into the network.

During the development of TCP congestion control algorithms, several different ideas emerged:

  • Loss-based congestion control: treats packet loss as the congestion signal. The congestion window is increased gradually by slow probing and reduced when packet loss occurs, e.g. Reno and Cubic.

  • Delay-based congestion control: treats an increase in delay as the congestion signal. The congestion window is reduced when delay rises and increased when delay falls, e.g. Vegas and FastTCP.

  • Congestion control based on link capacity: measures network bandwidth and delay in real time and considers congestion to occur when the total number of packets in flight exceeds the bandwidth-delay product, e.g. BBR.

  • Learning-based congestion control: has no fixed congestion signal; instead, a control strategy is learned from training data with machine-learning methods guided by an evaluation function, e.g. Remy.

The core of a congestion control algorithm is choosing an effective strategy to adjust the congestion window. Several classical algorithms are introduced below.

Vegas

Vegas[1] treats an increase in RTT as a signal of network congestion: when RTT increases, the congestion window shrinks; when RTT decreases, it grows. Specifically, Vegas adjusts the congestion window by comparing actual throughput with expected throughput:

Expected throughput: Expected = CWND / BaseRTT

Actual throughput: Actual = CWND / RTT

Diff = (Expected − Actual) × BaseRTT

BaseRTT is the minimum round-trip time among all observations, typically the RTT of the first packet sent after the connection is established, and CWND is the current congestion window. Vegas defines two thresholds a and b: when Diff > b, the congestion window is decreased; when a ≤ Diff ≤ b, it is left unchanged; when Diff < a, it is increased.
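This adjustment rule can be sketched as follows. The threshold values a = 2 and b = 4 packets and the one-packet window step are illustrative choices, not taken from the paper:

```python
def vegas_adjust(cwnd, base_rtt, rtt, a=2, b=4):
    """One round of Vegas-style window adjustment (windows in packets).

    base_rtt -- minimum RTT observed on the connection (seconds)
    rtt      -- RTT measured in the current round (seconds)
    a, b     -- lower/upper thresholds on the estimated backlog (packets)
    """
    expected = cwnd / base_rtt             # throughput with no queuing
    actual = cwnd / rtt                    # throughput actually achieved
    diff = (expected - actual) * base_rtt  # estimated packets queued in network

    if diff < a:
        return cwnd + 1                    # little queuing: room to grow
    if diff > b:
        return cwnd - 1                    # too much queuing: back off
    return cwnd                            # backlog in target band: hold
```

When RTT equals BaseRTT, no packets are queued, so Diff is zero and the window grows; a rising RTT pushes Diff above b and the window shrinks.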

Vegas uses the change in RTT to estimate the available bandwidth of the network, and it can do so accurately and efficiently. However, when Vegas coexists with other algorithms in the same network, the loss-based algorithms keep filling the network buffers, which raises the RTT that Vegas measures; Vegas then shrinks its congestion window and slows down. For this reason, Vegas has never been widely adopted on the Internet.

Applicable scenarios:

Applicable when Vegas is the only congestion control algorithm in the network, so that competition is fair.

Reno

Reno[2] divides congestion control into four stages: slow start, congestion avoidance, fast retransmission, and fast recovery. These stages form the basis of many existing congestion control algorithms and are described in detail below.

In the slow start stage, as long as no packet loss occurs, the congestion window grows by one MSS (maximum segment size) for every ACK received, so the window doubles each round and grows exponentially. When the window reaches the slow start threshold, the sender enters the congestion avoidance stage, where the window grows by one MSS per round, i.e. linearly. When three duplicate ACKs for the same packet arrive, the sender concludes that the packet following it has been lost and enters the fast retransmission stage: the lost packet is retransmitted immediately instead of waiting for a retransmission timeout. After fast retransmission completes, the sender enters the fast recovery stage: the slow start threshold is set to half of the current congestion window, the congestion window is set to the new threshold, and the sender returns to congestion avoidance, repeating the above process. (On a retransmission timeout, the window instead falls back to one MSS and slow start begins again.) The Reno congestion control process is shown in Figure 1.

Figure 1. TCP Reno congestion control process
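The per-ACK and per-loss logic described above can be sketched as follows. This is a simplified illustration rather than real TCP code; the class name, the initial ssthresh, and the floor of 2 MSS are arbitrary choices:

```python
class RenoSketch:
    """Sketch of Reno's window logic (windows in MSS units)."""

    def __init__(self, ssthresh=64):
        self.cwnd = 1
        self.ssthresh = ssthresh

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1              # slow start: +1 per ACK, doubles per round
        else:
            self.cwnd += 1 / self.cwnd  # congestion avoidance: +1 per round

    def on_triple_dup_ack(self):
        # fast retransmit + fast recovery: halve ssthresh, resume from there
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = self.ssthresh

    def on_timeout(self):
        # retransmission timeout: back to slow start from one segment
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = 1
```

Adding 1/cwnd per ACK during congestion avoidance yields roughly one MSS of growth per round, since about cwnd ACKs arrive each RTT.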

The Reno algorithm uses received ACKs as the trigger for window growth, which worked well in early low-bandwidth, low-delay networks. As network bandwidth and delay grew, however, Reno's drawbacks became apparent: the sender must wait one RTT between sending a packet and receiving its ACK, and in a high bandwidth-delay product (BDP) network that RTT is large, so the congestion window grows slowly and the transmission rate takes a long time to reach the full bandwidth, resulting in low bandwidth utilization.

Applicable scenarios:

Suitable for low latency and low bandwidth networks.

Cubic

Cubic[3] has been the default TCP congestion control algorithm in the Linux kernel since version 2.6. It uses a cubic function

W(t) = C·(t − K)³ + Wmax, where K = ∛(Wmax·β/C)

as the congestion window growth function, where C is a scaling factor, t is the time elapsed since the last window reduction, Wmax is the window size when the last congestion event occurred, and β is the multiplicative decrease factor. The function shows that window growth no longer depends on RTT; it depends only on the maximum window at the last congestion event and the time elapsed since then.
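A minimal sketch of this growth function, using C = 0.4 and β = 0.2 (the constants suggested in the paper); the function and parameter names are illustrative:

```python
def cubic_window(t, w_max, c=0.4, beta=0.2):
    """CUBIC growth function W(t) = C*(t - K)^3 + Wmax.

    t     -- seconds elapsed since the last window reduction
    w_max -- congestion window when the last loss occurred
    c     -- scaling factor
    beta  -- multiplicative decrease factor: the window is cut to
             (1 - beta) * w_max on loss
    """
    # K is the time the cubic function takes to climb back to w_max
    k = (w_max * beta / c) ** (1 / 3)
    return c * (t - k) ** 3 + w_max
```

At t = 0 the function starts from (1 − β)·Wmax, the window right after the loss; at t = K it is back at exactly Wmax, and beyond K it accelerates again to probe for more bandwidth.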

The growth curve of the Cubic congestion window has two parts: the concave part is the stable growth stage, and the convex part is the maximum-bandwidth probing stage. As shown in Figure 2, the congestion window grows rapidly at first, but as it approaches Wmax the growth flattens to avoid the packet loss that a sudden traffic increase would cause. Near Wmax the window barely grows at all, slowly probing for the network's maximum throughput to maintain stability (congestion occurs easily near Wmax). Once the window is well past Wmax, the growth rate increases again to keep bandwidth utilization high.

Figure 2. Congestion window growth function of TCP Cubic

When packet loss occurs, the congestion window is reduced multiplicatively and the growth process above starts again. This keeps the congestion window close to Wmax and maintains bandwidth utilization. The Cubic congestion control process is shown in Figure 3:

Figure 3. Congestion control process of TCP Cubic

The advantage of Cubic is that, as long as no packet loss occurs, it never voluntarily reduces its own sending rate, so it can make maximum use of the network bandwidth and improve throughput. It performs well in networks with high bandwidth and a low packet loss rate.

However, like earlier congestion control algorithms, Cubic cannot distinguish congestion loss from transmission-error loss: whenever a packet is lost, it shrinks the congestion window and slows down, even though a transmission error does not necessarily mean the network is congested. Since the probability of transmission-error loss is very low, the impact on Cubic's performance is small.

Another shortcoming of Cubic is that it is too aggressive: in the absence of packet loss it keeps enlarging the congestion window and injecting traffic, filling the buffers of network devices and causing bufferbloat. Because the buffers stay close to saturation for long periods, packets entering the network queue up in them, adding unnecessary queuing delay; the larger the buffer, the higher the delay. Moreover, while achieving high bandwidth utilization, Cubic's ever-growing window indirectly raises the packet loss rate and aggravates network jitter.

Applicable scenarios:

It is applicable to networks with high bandwidth and low packet loss rate and can effectively utilize bandwidth.

BBR

BBR[4] is a congestion control algorithm proposed by Google in 2016. It has been deployed on YouTube servers and on Google's wide-area network across data centers. According to YouTube's published figures, after BBR was deployed, worldwide access latency dropped by 53%, and in developing countries with high latency it dropped by 80%. BBR has been merged into Linux kernel 4.9 and later.

The BBR algorithm treats neither packet loss nor rising delay as the congestion signal; instead, it considers congestion to occur when the total number of packets in flight exceeds the product of the bottleneck link bandwidth and the delay. For this reason, BBR is also called congestion-based congestion control. BBR periodically probes the network capacity, alternately measuring the maximum bandwidth and the minimum delay over a period of time, and uses their product as the congestion window. (The two are measured alternately because they cannot be obtained at the same time: when the bandwidth is fully used and queues build up, the delay is necessarily large; when the delay is at its minimum, packets are forwarded without queuing, so the measured bandwidth is necessarily small.) This keeps the congestion window matched to the network capacity.
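The core idea of alternately tracking the maximum bandwidth and the minimum RTT and sizing the window by their product can be sketched as follows. The windowing scheme is heavily simplified (a fixed-length sample window instead of BBR's timed filters), and the class and method names are invented for illustration:

```python
from collections import deque

class BBRSketch:
    """Sketch of BBR's core idea: cwnd tracks max_bandwidth * min_rtt (the BDP)."""

    def __init__(self, bw_window=10):
        self.bw_samples = deque(maxlen=bw_window)  # recent delivery rates (pkts/s)
        self.min_rtt = float("inf")                # minimum RTT seen (seconds)

    def on_measurement(self, delivery_rate, rtt):
        # bandwidth uses a windowed maximum, RTT a running minimum
        self.bw_samples.append(delivery_rate)
        self.min_rtt = min(self.min_rtt, rtt)

    def bdp(self):
        # bottleneck bandwidth estimate * propagation delay estimate, in packets
        return max(self.bw_samples) * self.min_rtt
```

Because queuing inflates RTT exactly when bandwidth is saturated, real BBR drains the pipe periodically to refresh min_rtt, which this sketch omits.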

Because BBR's congestion window is based on accurate measurement, it does not grow without bound and does not fill the buffers of network devices, which avoids bufferbloat and greatly reduces delay. As shown in Figure 4, once the network buffer is filled, the delay rises to 250 ms; Cubic keeps enlarging the congestion window, pushing the delay to 500 ms and causing packet loss, and stays at high delay throughout, while BBR stays at low delay because it never fills the buffer.

Figure 4. RTT comparison of Cubic and BBR

Because BBR does not treat packet loss as the congestion signal, it still achieves very high throughput in networks with a high packet loss rate. As shown in Figure 5, at a 1% loss rate the throughput of Cubic drops by more than 90%, while BBR is almost unaffected; BBR's throughput only drops significantly once the loss rate exceeds 15%.

Figure 5. Transmission rate vs. packet loss rate for Cubic and BBR

BBR is feedback-driven and has its own adjustment mechanism, independent of the TCP congestion control state machine: it adjusts the congestion window by measuring the network capacity and controls its own sending rate (pacing rate). Traditional congestion control algorithms, by contrast, only compute the congestion window and leave the sending rate to TCP itself, in which case a burst in the sending rate near the bottleneck bandwidth can cause packet queuing or loss.

In tests under high delay and a high packet loss rate, BBR improves transmission speed greatly compared with Cubic. The results are shown in Table 1:

Table 1. Comparison of Cubic and BBR transmission speed under 200 ms delay

The disadvantage of BBR is that when device queue buffers are large, it may fail to compete with more aggressive algorithms such as Cubic, because BBR does not actively occupy the queue buffer. If Cubic traffic occupies the buffer for a long time, the minimum RTT measured by BBR rises over multiple cycles, and BBR's share of the bandwidth shrinks.

Applicable scenarios:

Suitable for long fat networks with high bandwidth, high delay, and a certain packet loss rate; it effectively reduces transmission delay while maintaining high throughput.

Remy

Remy[5], also known as a computer-generated congestion control algorithm, uses machine learning to generate a congestion control model. Given input parameters (such as the bottleneck link rate, the delay, and the number of senders sharing the bottleneck) and an objective function that evaluates the algorithm quantitatively, the generation process repeatedly modifies how the congestion window is adjusted in each network state until the objective function is optimized, finally producing a mapping table from network states to adjustment actions. In a real network, the window adjustment is then looked up directly in this table according to the observed network conditions.
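The mapping-table idea can be illustrated with a toy sketch. The state buckets, actions, and rule values below are entirely hypothetical, invented for illustration; they are not Remy's actual rules:

```python
# hypothetical discretized network-state -> action table, in the spirit of
# Remy's offline-generated rules: each observed state maps to a precomputed
# (window multiplier, window increment) pair
RULE_TABLE = {
    # (ack_rate_bucket, rtt_ratio_bucket): (multiplier, increment)
    ("low", "low"):   (1.0, 2),   # idle path: grow quickly
    ("low", "high"):  (0.5, 0),   # queue building: back off hard
    ("high", "low"):  (1.0, 1),   # healthy: grow gently
    ("high", "high"): (0.8, 0),   # saturated: shrink slightly
}

def remy_step(cwnd, ack_rate_bucket, rtt_ratio_bucket):
    """Apply the precomputed rule for the observed network state."""
    mult, inc = RULE_TABLE[(ack_rate_bucket, rtt_ratio_bucket)]
    return int(cwnd * mult + inc)
```

The expensive optimization happens entirely offline; at run time the sender only performs this cheap table lookup.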

Remy tries to abstract away differences in the underlying network and handle different environments with one general model. This approach depends on the network model behind the training set (the history): if the training set comprehensively covers all possible network environments and congestion behaviors, Remy performs very well when applied to real networks; but if the real network differs from the training networks, Remy's performance is poor.

Applicable scenarios: complex heterogeneous networks where the computer is expected to select an appropriate congestion control strategy automatically for each scenario, provided the trained model covers all scenarios that can occur.

Conclusion

Each congestion control algorithm was born in a particular network environment and suits specific scenarios; no single algorithm fits them all. As the network environment grows more complex, congestion control keeps evolving. The goal of this article is not to pick a best algorithm but to briefly introduce the design ideas, strengths and weaknesses, and applicable scenarios of several typical algorithms, hopefully offering some inspiration.

References

[1] L. S. Brakmo, S. W. O'Malley, and L. L. Peterson. "TCP Vegas: New Techniques for Congestion Detection and Avoidance." In Proceedings of ACM SIGCOMM, 1994.

[2] V. Jacobson. "Congestion Avoidance and Control." ACM SIGCOMM Computer Communication Review, vol. 18, 1988, pp. 314–329.

[3] S. Ha, I. Rhee, and L. Xu. "CUBIC: A New TCP-Friendly High-Speed TCP Variant." ACM SIGOPS Operating Systems Review, July 2008.

[4] N. Cardwell, Y. Cheng, C. S. Gunn, S. H. Yeganeh, and V. Jacobson. "BBR: Congestion-Based Congestion Control." ACM Queue, 14(5):20–53, 2016.

[5] K. Winstein and H. Balakrishnan. "TCP ex Machina: Computer-Generated Congestion Control." In Proceedings of ACM SIGCOMM, 2013.