TCP connections
Three-way handshake
What happens during the TCP three-way handshake?
- First handshake: the client sends a packet with SYN = 1 and sequence number seq = x
- Second handshake: the server receives the packet and, seeing SYN = 1, knows the client wants to establish a TCP connection with it. The server replies with a packet with SYN = 1, sequence number seq = y, ACK = 1, and acknowledgment number ack = x + 1 (indicating that it expects to receive x + 1 from the client next)
- Third handshake: the client receives the packet and, seeing SYN = 1, knows the server also wants to establish the TCP connection. It checks that ACK = 1 and that the acknowledgment number is x + 1, then sends a packet with ACK = 1, acknowledgment number ack = y + 1 (indicating that it expects to receive y + 1 from the server next), and seq = x + 1
- The server receives the packet and checks that ACK = 1, that the acknowledgment number is y + 1, and that seq = x + 1. The TCP connection is now established
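The seq/ack arithmetic of the three handshakes can be sketched in a few lines. This is a toy model for illustration only: real stacks pick random initial sequence numbers, and the full state machine is omitted.

```python
# A minimal sketch of the sequence/acknowledgment arithmetic of the
# three-way handshake. The initial sequence numbers here are illustrative;
# real TCP stacks choose them randomly.

def three_way_handshake(client_isn, server_isn):
    """Return the three segments exchanged, as (flags, seq, ack) tuples."""
    syn = ("SYN", client_isn, None)                    # client -> server
    syn_ack = ("SYN+ACK", server_isn, client_isn + 1)  # server -> client
    ack = ("ACK", client_isn + 1, server_isn + 1)      # client -> server
    return [syn, syn_ack, ack]

segments = three_way_handshake(client_isn=100, server_isn=300)
for flags, seq, ack in segments:
    print(flags, "seq =", seq, "ack =", ack)
```

Note how each side's acknowledgment number is the other side's sequence number plus one, exactly as in the steps above.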
Why three handshakes, not two or four?
To confirm that both ends can send and receive normally. ==Three handshakes are the minimum number needed for the client and server to each confirm that both sides' sending and receiving capabilities are normal==:
- First handshake: the server confirms that the client's sending capability and its own receiving capability are normal
- Second handshake: the client confirms that the server's sending capability and its own receiving capability are normal. It can also confirm that its own sending capability and the server's receiving capability are normal, because if either were faulty, the server could not have received its first packet and responded
- Third handshake: the server confirms that its own sending capability and the client's receiving capability are normal, because if either were faulty, the client could not have received its earlier response and replied to it
Therefore, with only two handshakes, the server could never confirm its own sending capability and the client's receiving capability. Four handshakes would also work, but they are unnecessary, because three are enough to guarantee that both parties can send and receive normally.
Four-way wave
What happens during the TCP four-way wave?
At the beginning, both sides are in the ESTABLISHED state
- First wave: the client sends a connection-release packet (FIN = 1, seq = u), indicating that it has stopped sending data and wants to close the TCP connection. It then enters the FIN_WAIT_1 state.
- Second wave: the server responds with an acknowledgment packet (ACK = 1, ack = u + 1, seq = v), indicating that it has received the client's packet, and enters the CLOSE_WAIT state. On receiving this, the client enters the FIN_WAIT_2 state.
- Third wave: the second wave only told the client that its packet was received; the server may not release the connection right away, because it may not have finished sending its own data. After some time, the server sends a connection-release packet (FIN = 1, seq = w, ACK = 1, ack = u + 1), indicating that it too has stopped sending data and wants to close its side of the TCP connection, and then enters the LAST_ACK state.
- Fourth wave: the client responds with an acknowledgment packet (ACK = 1, ack = w + 1, seq = u + 1), indicating that it has received the server's packet. It then enters the TIME_WAIT state and waits 2MSL before entering CLOSED, at which point the TCP connection is fully closed. The server enters CLOSED directly upon receiving the ACK and closes the connection.
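The state transitions described above can be sketched as a lookup table. This is a simplified model: timers (including the 2MSL wait) and less common paths such as simultaneous close are omitted.

```python
# A sketch of the state transitions during the four-way wave, modeled as a
# simple lookup table for each side. Events follow the steps described above.

CLIENT_TRANSITIONS = {
    ("ESTABLISHED", "send FIN"):     "FIN_WAIT_1",
    ("FIN_WAIT_1",  "recv ACK"):     "FIN_WAIT_2",
    ("FIN_WAIT_2",  "recv FIN"):     "TIME_WAIT",   # sends the final ACK
    ("TIME_WAIT",   "2MSL elapsed"): "CLOSED",
}

SERVER_TRANSITIONS = {
    ("ESTABLISHED", "recv FIN"):     "CLOSE_WAIT",  # replies with an ACK
    ("CLOSE_WAIT",  "send FIN"):     "LAST_ACK",
    ("LAST_ACK",    "recv ACK"):     "CLOSED",
}

def run(transitions, events):
    state = "ESTABLISHED"
    for event in events:
        state = transitions[(state, event)]
    return state

print(run(CLIENT_TRANSITIONS, ["send FIN", "recv ACK", "recv FIN", "2MSL elapsed"]))  # CLOSED
print(run(SERVER_TRANSITIONS, ["recv FIN", "send FIN", "recv ACK"]))                  # CLOSED
```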
Why does it take three handshakes and four waves?
In essence, this is because of TCP's half-close mechanism: when the client actively closes the connection, it closes only its own direction, meaning only that it will no longer send data to the server. Until the server closes its side as well, the server can still send data for the client to receive.
The handshake needs only three steps because the server, in its first response, sends SYN and ACK together: it acknowledges the client's SYN, and at the same time sends its own SYN to say that it too wants to establish the TCP connection. ==Note that these two things can be done in a single response and do not need to be separated==, which saves one handshake. The wave needs four steps because, when the client unilaterally wants to disconnect, the server may not have finished sending its data. All it can do is first send an ACK acknowledging the client's FIN, and only after all of its data has been sent, send its own FIN to disconnect. ==Note that these two things cannot be done in a single response; depending on the data still being transferred, they have to be separated==, so an extra wave is unavoidable.
Why must the client wait a while after sending the final ACK before closing? And why is the wait 2MSL?
In short: ==the client must be prepared to handle a possible timeout retransmission of the server's FIN==.
It is important to realize one thing: the client's final ACK can be lost. Suppose it is:
- If the client closed immediately after sending the final ACK, then since the ACK is lost, the server never receives it, assumes something went wrong with its FIN, and retransmits the FIN after a timeout. But the client is already CLOSED and cannot receive the retransmitted FIN, so it cannot send back another ACK, and the server, never receiving an ACK, can never close the connection.
- If instead the client waits for a while after sending the final ACK before closing, then when the server fails to receive the ACK and retransmits the FIN on timeout, the FIN has a chance to reach the client. The client resends the ACK, and the server can close the connection once it receives it.
At this point we know the client must wait some time after sending the ACK before closing. But why is that time 2MSL?
MSL (Maximum Segment Lifetime) is the longest time a segment can survive in the network, i.e. the maximum time it can take to get from one end to the other.
There are two reasons:
- The client's ACK takes at most 1 MSL to reach the server, and a FIN the server retransmits on timeout takes at most 1 MSL to reach the client. So if the client waits 2 MSL and no retransmitted FIN arrives, the server must have received the ACK; the client need do nothing more and simply enters CLOSED when the timer expires. If a retransmitted FIN does arrive, the client's ACK must have gone wrong, so it resends the ACK and restarts the 2MSL timer. Either way, the client must wait 2 MSL to be ready for a possible FIN timeout retransmission; if none comes, the time just passes and it closes on its own.
- 2 MSL also guarantees that all packets generated during this connection have used up their lifetime, so none of them can stray into the next TCP connection and interfere with it.
TCP Fast Open
TCP Fast Open (TFO) works by having the client and server exchange a cookie during the first three-way handshake, so that data can be sent during the handshake of subsequent connections.
The first three-way handshake:
- The client sends a SYN packet containing the Fast Open option with an empty cookie, indicating that it is requesting a TFO cookie
- The server responds with a SYN + ACK packet containing the Fast Open option and a newly generated TFO cookie
- The client receives the TFO Cookie and caches it
Subsequent three-way handshakes:
- The client sends a SYN carrying the TFO cookie plus the data it wants to send
- The server validates the cookie; after confirming that it is valid and not expired, it sends a SYN + ACK and may already respond with data
- The client sends an ACK for confirmation
Note that in a normal TCP connection, data exchange can begin only after the three-way handshake completes, whereas TFO lets the client send data before the handshake has fully completed, and lets the server respond with data.
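The cookie exchange above can be modeled in a few lines. This is purely an illustrative toy: the `make_cookie` helper, the HMAC construction, and the server key are assumptions made for the sketch; real TFO cookies are generated by the kernel from the client's IP address.

```python
# A toy model of the TFO cookie exchange, to illustrate the protocol flow.
# Real TFO cookies are produced by the kernel; an HMAC over the client
# address merely stands in for that here.
import hmac, hashlib

SERVER_SECRET = b"server-secret"  # hypothetical per-server key

def make_cookie(client_ip: str) -> bytes:
    return hmac.new(SERVER_SECRET, client_ip.encode(), hashlib.sha256).digest()[:8]

def first_connection(client_ip):
    """First handshake round: the client asks for a cookie, the server issues one."""
    return make_cookie(client_ip)          # carried back in the SYN+ACK

def later_connection(client_ip, cached_cookie, data):
    """Later rounds: the SYN carries the cookie plus data; the server validates."""
    if hmac.compare_digest(cached_cookie, make_cookie(client_ip)):
        return data                        # server accepts data before the handshake completes
    return None                            # invalid cookie: fall back to a normal handshake

cookie = first_connection("192.0.2.1")
print(later_connection("192.0.2.1", cookie, b"GET /"))   # accepted
print(later_connection("192.0.2.99", cookie, b"GET /"))  # rejected: cookie is for another client
```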
TCP keepalive
HTTP has a keep-alive mechanism whose purpose is to make TCP connections persistent, so that the same TCP connection can be reused for multiple rounds of request-response instead of a new connection being built for every round. TCP also has a Keepalive mechanism (note that there is no hyphen), which is a different thing from HTTP keep-alive.
Simply put, HTTP keep-alive is a connection keep-alive (reuse) mechanism, while TCP Keepalive is a liveness-probing mechanism.
A TCP connection does not transmit data all the time; there may be long periods with no data exchange. During such periods, neither side knows the other's condition: the peer may have crashed or rebooted, or the intermediate network may have failed unexpectedly, yet both sides still hold the TCP connection, wasting resources. We therefore need a mechanism to probe the peer and release the TCP connection in time: this is TCP Keepalive.
Specifically, after receiving an ACK the sender starts a keepalive timer and waits for a period of time (the keepalive time, tcp_keepalive_time). If no data is exchanged within that time, the sender sends a keepalive probe, which amounts to asking the receiver: "I haven't heard from you in a long time. Is something wrong?" There are then two cases:
- If the receiver replies with an ACK, the peer and the intermediate network are fine and nothing has gone wrong; the peer simply had no data to send. The keepalive timer is then reset.
- If the receiver does not respond, the peer or the intermediate network may have failed. The sender keeps sending keepalive probes at a fixed interval (the probe interval, tcp_keepalive_intvl) until it gets a response. If the maximum number of probes (tcp_keepalive_probes) is reached with no response, the peer or the intermediate network has indeed failed: the sender considers the peer unreachable, sees no point in keeping the TCP connection, and releases it.
The purpose of HTTP keep-alive thus differs from that of TCP Keepalive: the former keeps the TCP connection alive so it can be reused, while the latter probes the condition of the peer and the intermediate network to decide whether to release the connection.
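On Linux, TCP Keepalive and its three knobs can be configured per socket. A minimal sketch, assuming a Linux host: the `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` option names are Linux-specific (other platforms use different names), and the values are illustrative.

```python
# Enabling TCP Keepalive on a socket. The three per-connection options map to
# the kernel parameters discussed above; the values chosen here are arbitrary.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)  # turn keepalive on

# Linux-specific per-connection knobs; guard for portability.
if hasattr(socket, "TCP_KEEPIDLE"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # tcp_keepalive_time (s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # tcp_keepalive_intvl (s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # tcp_keepalive_probes

enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print(enabled)  # non-zero: keepalive is on
sock.close()
```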
The retransmission mechanism
To ensure reliable data transmission, TCP uses retransmission mechanisms to resend lost packets.
Timeout retransmission
Whenever the receiver receives a packet from the sender, it should return an ACK indicating which packet it expects to receive next. Timeout retransmission is time-driven: after sending each packet, the sender starts a timer for it and waits one RTO (Retransmission TimeOut, a value that changes dynamically). If no ACK for the packet arrives within the RTO, the packet is retransmitted. Note that there are two possible cases:
- The packet from the sender is lost in transit and does not reach the receiver, so the receiver does not return an ACK
- The packet from the sender reaches the receiver, who returns an ACK, but the ACK is lost in transit
But whichever actually happened, from the sender's point of view no ACK arrived, so it retransmits the packet.
There are two obvious problems with timeout retransmission:
- Because retransmission is time-driven, the sender must wait a full RTO before retransmitting, which may be long, so a lost packet may take a long time to finally reach the receiver
- The receiver only ACKs the largest consecutive run of packets; it never ACKs past a gap. For example, if the sender sends 1-6 and 3 is lost, so that 1, 2, 4, 5, 6 reach the receiver, the receiver will only send ACK = 3, never ACK = 7. The problem is that the receiver clearly holds 4, 5, 6 but gives no corresponding ACK, so the sender may wrongly conclude that 4, 5, 6 were lost too and retransmit them
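The "dynamically changing" RTO is commonly computed from smoothed RTT samples. A sketch following the standard estimator of RFC 6298: the smoothing constants, the 4x variance term, and the 1-second floor come from that RFC, while the sample values here are purely illustrative.

```python
# A sketch of the standard RTO estimator (RFC 6298): keep a smoothed RTT
# (SRTT) and an RTT variance (RTTVAR), and derive the RTO from both.

def rto_estimator(rtt_samples, alpha=1/8, beta=1/4):
    """Feed RTT samples (seconds) through the estimator; return the final RTO."""
    srtt = None
    for r in rtt_samples:
        if srtt is None:
            srtt, rttvar = r, r / 2                 # first measurement
        else:
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
        rto = srtt + max(4 * rttvar, 0.01)          # clamp the granularity term
    return max(rto, 1.0)                            # RFC 6298 floors the RTO at 1 s

print(rto_estimator([0.10, 0.12, 0.11]))  # 1.0 (the 1-second floor dominates small RTTs)
```

In practice, each new RTT sample nudges SRTT and RTTVAR, so the RTO tracks network conditions instead of being a fixed constant.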
Fast retransmission
Fast retransmission is data-driven: the sender does not need to wait for the timer to expire before retransmitting a packet. Suppose the sender sends packets 1 through 5 and packet 2 is lost.
- Receiver: for packet 1 it sends a normal ACK = 2, indicating that it expects 2 next; for each of the later arrivals 3, 4, 5 it sends a duplicate ACK = 2.
- Sender: after the normal ACK = 2 it receives three duplicate ACK = 2s, concludes that 2 was lost, and retransmits 2.
Fast retransmission solves the problem of waiting out the timeout, but like timeout retransmission it cannot retransmit just the lost packet on its own: it retransmits the lost packet together with the packets after it, because the sender does not know exactly how many packets were lost and may assume the following packets were lost as well.
Selective retransmission: SACK
To address this problem, selective retransmission was introduced. The packets the receiver sends back to the sender can carry SACK information telling the sender exactly which packets arrived and which were lost. The sender can then retransmit only the lost packets.
Selective retransmission: D-SACK
D-SACK (Duplicate SACK) is an upgraded SACK. It lets the sender better understand what happened on the network: which data was duplicated, and what exactly caused a retransmission.
1) The sender can tell whether a timeout retransmission happened because its own packet was lost or because the receiver's ACK was lost:
If the receiver's ACK is lost, the sender never receives it and timeout retransmission kicks in. The receiver therefore gets two copies of the same packet and replies with a D-SACK telling the sender, in effect, "I received two duplicate packets with contents XXX." The sender can then tell that the timeout retransmission was caused by the loss of the receiver's ACK, and that the fault was not its own.
2) The sender can tell whether a fast retransmission happened because its packet was lost, because the receiver's ACK was lost, or because of network delay:
If the sender's packet fails to reach the receiver in time merely because of network delay, the receiver still replies with the same ACK three times and the sender fast-retransmits. Some time later the delayed packet finally arrives, so the receiver now holds two duplicates and replies with a D-SACK: "I received two duplicate packets with contents XXX." The sender can conclude that the fast retransmission was due neither to its packet being lost (otherwise the receiver could not have received duplicates) nor to the receiver's ACK being lost, but to network delay.
Flow control
Why flow control?
How fast and how much to send is not decided by the sender but by the receiver: no matter how fast or how much the sender transmits, it is useless if it exceeds what the receiver can handle. TCP therefore provides a mechanism, called flow control, by which the sender dynamically adjusts its sending capability according to the receiver's actual receiving capability. Concretely, the sender controls its sending capability through its send window, and the receiver controls its receiving capability through its receive window; the receiver tells the sender the size of its receive window, and the sender adjusts its send window accordingly to control the traffic.
Sliding window mechanism
Flow control is implemented with the sliding window mechanism. Taking the sender's side as an example, its data is divided into three parts:
- The first part is data that has already been sent and acknowledged by the receiver
- The second part is the window, which holds data that has been sent but not yet acknowledged, plus data that has not been sent but may be sent
- The third part is data that has not been sent and may not be sent yet
During data transfer, the window slides to the right. For example, if the sender receives acknowledgments for 27 and 28, the window slides two units right: 27 and 28 move into the first part, and room opens up for 41 and 42 to enter the window.
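The sender-side bookkeeping can be sketched in a few lines. The `SendWindow` class is a hypothetical illustration that counts in whole segments rather than bytes; real TCP tracks byte ranges.

```python
# A minimal sketch of the sender-side sliding window. `base` is the oldest
# unacknowledged sequence number; only data inside [base, base + size)
# may be sent.

class SendWindow:
    def __init__(self, size):
        self.base = 0       # first unacknowledged sequence number
        self.next_seq = 0   # next sequence number to send
        self.size = size

    def can_send(self):
        return self.next_seq < self.base + self.size

    def send_one(self):
        assert self.can_send()
        self.next_seq += 1

    def ack(self, ack_no):
        # Cumulative ACK: everything below ack_no is confirmed,
        # so the window slides right.
        self.base = max(self.base, ack_no)

w = SendWindow(size=4)
while w.can_send():
    w.send_one()        # sends 0, 1, 2, 3; the window is now full
print(w.can_send())     # False: no room left
w.ack(2)                # 0 and 1 acknowledged, the window slides right
print(w.can_send())     # True: two slots opened up
```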
A worked example of the flow-control process
Consider the following scenario:
- At first, the receiver tells the sender that its receive window is 400 bytes, so the sender sets its send window to 400 bytes
- The sender sends 1-100 and 101-200; these now sit in the "sent but not acknowledged" part of the window, so 200 bytes of the window are in use and 200 remain
- The sender sends 201-300, but it is lost in transit; it still occupies window space, so only 100 bytes remain
- The receiver sends a cumulative acknowledgment. Since only 1-100 and 101-200 have arrived, it sends ack = 201, meaning it expects 201-300 next, and at the same time tells the sender to shrink the send window to 300 bytes
- The sender receives this: 1-100 and 101-200 are acknowledged, so the window slides right, freeing 200 bytes. But because the receiver shrank the window, only 300 bytes are available in total, 100 of which are in use (the lost 201-300)
- The sender sends 301-400 and 401-500; the send window now has no free space
- After one RTO the sender still has no acknowledgment for 201-300 (no ACK = 301), so timeout retransmission resends 201-300; the retransmitted 201-300 occupies the same window space as the original
- The receiver, having now received all of 1-500, sends the cumulative acknowledgment ACK = 501, and shrinks the window again, to 100 bytes
- The sender receives this: all three segments are acknowledged, so the window slides 300 bytes right, but only 100 bytes are usable because the receiver asked to shrink the window
- The sender sends 501-600; the send window again has no free space
- The receiver receives and acknowledges the data, and shrinks the window a third time, to 0, telling the sender to send no new data
- The sender's window slides 100 bytes right, but nothing more can be sent because the window size is now 0
How is the deadlock resolved?
When the receiver shrinks the send window to 0, the current round of data transfer is effectively over. If a new round is to start later, the receiver sends a non-zero-window notification telling the sender to enlarge its window and resume sending.
However, if that non-zero-window notification is lost, a deadlock arises: the sender waits for the receiver's notification to change the window size, while the receiver waits for the sender's data. To break the deadlock, a polling-like mechanism can be used:
- After receiving a zero-window packet, the sender starts a timer. If the timer expires without a non-zero-window notification, the sender sends a probe packet to ask the receiver
- The receiver replies with its current window value, which amounts to retransmitting the window notification
- Of course, the reported window may still be 0; if so, the sender restarts the timer and repeats the process until it finally gets a non-zero window value
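The probe loop can be sketched as follows. A toy model: `window_replies` stands in for the window values the receiver would report to successive probes, and the timer itself is abstracted away.

```python
# A sketch of the zero-window probe loop that breaks the deadlock: the
# sender keeps probing until the receiver reports a non-zero window.

def probe_until_open(window_replies):
    """window_replies: window sizes the receiver reports to each probe.
    Returns (number of probes sent, final window value)."""
    probes = 0
    for window in window_replies:
        probes += 1                 # the timer fires, so send one probe
        if window > 0:
            return probes, window   # deadlock broken: resume sending
    return probes, 0                # still stuck; keep probing in real TCP

print(probe_until_open([0, 0, 500]))  # (3, 500)
```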
Congestion control
Flow control vs. congestion control
Why is congestion control needed on top of flow control? They are two different things:
- Flow control is end-to-end: the sender dynamically adjusts its sending capability to the receiver's receiving capability, so the receiver is not overwhelmed by heavy traffic during data transfer
- Congestion control concerns the whole network path: the sender dynamically adjusts its sending capability to the network's congestion level, so that it does not inject too much data at once and make congestion worse
With flow control alone, we can only guarantee that the two endpoints' sending and receiving capabilities match; we cannot guarantee that the network in between is clear. If the network is congested, packets will still be lost or delayed in transit no matter how capable the receiver is. Congestion control exists to solve exactly this problem, which makes it just as essential.
Congestion control algorithm
The sender expresses its sending capability through the size of its send window. From the discussion above it is clear that the send window should be bounded not only by the receive window but also by the congestion window. In fact, send window = min(receive window, congestion window).
So how big should the congestion window itself be? Its size should be dynamically adjusted according to congestion control algorithms, depending on network conditions.
Slow start + Congestion Avoidance:
PS: The size of the receiving window is not considered here, so the size of the congestion window directly determines the size of the sending window, that is, the sending capability of the sender.
- Slow start begins with a very small congestion window (1 here) and doubles it every transmission round (one round being one round trip), so the sender ramps up how much it sends in the early phase of transmission
- Slow start is exponential. Left unchecked, the congestion window keeps growing, ever more data is sent, and the network quickly becomes congested. A slow-start threshold, ssthresh, is therefore defined: once the congestion window reaches it, TCP switches to the congestion avoidance algorithm. Instead of doubling each round, the window now grows by just one, so the network does not congest as quickly
- Congestion avoidance only keeps the network from congesting quickly; at some point it inevitably becomes congested anyway, and when that happens the congestion window drops straight back to the slow-start value of 1
- The exponential slow-start algorithm then runs again, but this time with a smaller threshold: half the congestion window's value at the moment congestion occurred. This is a process of gradual adaptation: the first threshold may have been too high, so a lower one makes the network less likely to congest again so soon
- TCP then switches to congestion avoidance again and continues growing linearly
- …and the process repeats
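The growth pattern of slow start followed by congestion avoidance can be sketched as below. Units are segments, and loss handling is omitted; this only shows how the window grows round by round relative to the threshold.

```python
# A sketch of congestion-window growth: exponential (doubling) below the
# slow-start threshold, then linear (+1 per round) above it.

def cwnd_growth(rounds, ssthresh):
    """Return the congestion-window value after each transmission round."""
    cwnd = 1                     # slow start begins with a tiny window
    history = [cwnd]
    for _ in range(rounds):
        if cwnd < ssthresh:
            cwnd *= 2            # slow start: double every round trip
        else:
            cwnd += 1            # congestion avoidance: grow by one
        history.append(cwnd)
    return history

print(cwnd_growth(rounds=6, ssthresh=8))  # [1, 2, 4, 8, 9, 10, 11]
```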
Fast retransmission + Fast Recovery:
In fact, this algorithm is similar to the one above: it too goes through slow start and congestion avoidance at the beginning. The differences are:
- When the network becomes congested at some point, the fast retransmission algorithm is used. When the network is congested, the probability of packet loss is high, so rather than relying on the slower timeout retransmission, the faster fast retransmission is used to "rescue" the lost packets
- Afterwards, the congestion window does not drop straight back to the slow-start value of 1; it drops only to half of its value at the moment congestion occurred. This effectively skips the exponential-growth step and starts linear growth directly from the new threshold. That is what "fast recovery" means: jump straight to the threshold without redoing exponential growth.
PS: the early TCP Tahoe version did include the exponential-growth step. Its problem was that every time the network congested and packets were lost, the congestion window dropped all the way to 1, forcing the whole adaptation process to start over, which was bad for stable data transfer. It was therefore replaced by the TCP Reno version.
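The different reactions to congestion can be sketched as below. A simplified model: the halving rule and the floor of 2 segments follow common descriptions of the Tahoe/Reno schemes, and everything else (timers, actual retransmission) is omitted.

```python
# A sketch contrasting how Tahoe and Reno react to loss: both halve the
# threshold, but Tahoe restarts the congestion window from 1 while Reno
# (fast recovery) continues directly from the new threshold.

def on_loss(cwnd, variant):
    """Return (new cwnd, new ssthresh) after a loss event."""
    ssthresh = max(cwnd // 2, 2)   # new threshold: half the window at loss
    if variant == "tahoe":
        return 1, ssthresh         # back to slow start from 1
    if variant == "reno":
        return ssthresh, ssthresh  # fast recovery: skip the exponential phase
    raise ValueError(variant)

print(on_loss(16, "tahoe"))  # (1, 8)
print(on_loss(16, "reno"))   # (8, 8)
```

The Reno behavior is exactly the "jump straight to the threshold" described above, which is why it recovers stable throughput faster after a loss.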