TCP is a connection-oriented unicast protocol. TCP has no multicast or broadcast, because every TCP connection runs between exactly one sender and one receiver, each identified by an IP address and a port number.

Before sending data, the two communicating parties (sender and receiver) establish a connection. After the data has been sent, they tear the connection down. These two steps are the establishment and termination of a TCP connection.

Establishing and terminating a TCP connection

If you read my previous article on the network layer, you know that a TCP connection has four basic elements: the IP address of the sender, the port number of the sender, the IP address of the receiver, and the port number of the receiver. The IP address plus port number of each party can be regarded as a socket, and a pair of sockets uniquely identifies a connection. The socket is like a door through which data is transferred.
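As a rough illustration of why these four elements are enough to tell connections apart, here is a minimal Python sketch. The names `ConnectionKey`, `connections`, and `lookup` are made up for this example and do not come from any real TCP stack; the addresses are placeholders.

```python
from typing import NamedTuple


class ConnectionKey(NamedTuple):
    """The four elements that identify one TCP connection (hypothetical name)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# A toy demultiplexing table: one entry per connection.
connections = {}

# Two clients on the same host talk to the same server port.
# Only the source port differs, yet the connections stay distinct.
a = ConnectionKey("192.0.2.10", 50001, "198.51.100.5", 80)
b = ConnectionKey("192.0.2.10", 50002, "198.51.100.5", 80)
connections[a] = "connection A"
connections[b] = "connection B"


def lookup(src_ip, src_port, dst_ip, dst_port):
    """Find the connection an incoming segment belongs to, or None."""
    return connections.get(ConnectionKey(src_ip, src_port, dst_ip, dst_port))
```

A segment whose four values match no entry, for example one from source port 50003, finds nothing in the table.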

The life of a TCP connection, from establishment to termination, is divided into three phases: connection setup, data transfer, and connection teardown.

The following discussion focuses on these three phases.

The following figure shows a typical TCP connection setup and closure process, excluding data transfer.

TCP establishes a connection – three-way handshake

  1. The server process gets ready to accept TCP connections from the outside, usually by calling the socket, bind, and listen functions. Opening a connection this way is called a passive open. The server process is then in the LISTEN state, waiting for a client connection request.
  2. The client initiates an active open by calling connect, which sends a connection request to the server with SYN = 1 and an initial sequence number (seq = x). The SYN segment cannot carry data, but it consumes one sequence number. At this point, the client enters the SYN-SENT state.
  3. After receiving the connection request from the client, the server must acknowledge the client’s segment. It sets both the SYN and ACK bits to 1 in the acknowledgment segment, with acknowledgment number ack = x + 1 and its own initial sequence number seq = y. This segment cannot carry data either, but it also consumes one sequence number. At this point, the TCP server enters the SYN-RECEIVED state.
  4. After receiving the response from the server, the client must acknowledge it in turn. In this acknowledgment segment the ACK bit is set to 1, the sequence number is seq = x + 1, and the acknowledgment number is ack = y + 1. According to TCP, this segment may or may not carry data. If it carries no data, the sequence number of the next data segment is still seq = x + 1. At this point, the client enters the ESTABLISHED state.
  5. After receiving the client’s acknowledgment, the server also enters the ESTABLISHED state.

This is a typical three-way handshake. A TCP connection can be established with the three segments above. The purpose of the three-way handshake is not only to let both parties know that a connection is being established, but also to exchange special information through the option fields in the segments and to exchange initial sequence numbers.

The first party to send a SYN is considered to actively open the connection and is usually called the client. The recipient of the SYN is usually called the server; it receives this SYN and sends its own SYN in reply, so it is said to open passively.
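To make the numbering concrete, the handshake steps can be sketched in a few lines of Python. This is only an illustration of the seq/ack arithmetic, not a real TCP implementation; the function name `three_way_handshake` and the dictionary layout are invented for this example.

```python
def three_way_handshake(x, y):
    """Return the three handshake segments as dictionaries.

    x is the client's initial sequence number, y is the server's.
    """
    mod = 2 ** 32  # sequence numbers are 32-bit and wrap around
    syn = {"from": "client", "flags": "SYN", "seq": x, "ack": None}
    syn_ack = {"from": "server", "flags": "SYN+ACK", "seq": y, "ack": (x + 1) % mod}
    ack = {"from": "client", "flags": "ACK", "seq": (x + 1) % mod, "ack": (y + 1) % mod}
    return [syn, syn_ack, ack]


for segment in three_way_handshake(x=1000, y=5000):
    print(segment)
```

Each SYN consumes one sequence number, which is why both acknowledgment numbers are the peer’s initial sequence number plus one.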

TCP requires three packet segments to establish a connection, but four to release a connection.

TCP disconnects – Four waves

After the data transfer is complete, either party can release the connection. At that point both the client and server hosts are still in the ESTABLISHED state, and the teardown begins from there.

The procedure for TCP disconnection is as follows

  1. The client application asks to release the connection, stops sending data, and actively closes the TCP connection. The client host sends a connection-release segment with the FIN bit in the header set to 1. The segment carries no data, and its sequence number is seq = u. The client host then enters the FIN-WAIT-1 state.

  2. After receiving the segment from the client, the server host sends an acknowledgment segment with ACK = 1, sequence number seq = v, and acknowledgment number ack = u + 1. The server host then enters the CLOSE-WAIT state.

  3. After receiving the acknowledgment from the server host, the client host enters the FIN-WAIT-2 state and waits for the server to send its own connection-release segment.

  4. The server host then sends its connection-release segment with FIN = 1, ACK = 1, sequence number seq = v (assuming it has sent no data since step 2), and acknowledgment number ack = u + 1. After sending it, the server host enters the LAST-ACK state.

  5. After receiving the connection-release segment from the server, the client must acknowledge it. It sends a segment with ACK = 1, sequence number seq = u + 1 (the FIN consumed one sequence number and the client has sent no data since), and acknowledgment number ack = v + 1, and then enters the TIME-WAIT state. Note that the TCP connection has not been released yet. The client enters the CLOSED state only after waiting 2MSL, twice the Maximum Segment Lifetime.

  6. The server enters the CLOSED state as soon as it receives the final acknowledgment from the client. The server therefore finishes with the TCP connection earlier than the client does. Four segments are sent during the whole process, which is why it is also called the four-way wave.
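The numbering in the steps above can be written out as a small Python sketch. It assumes, as the steps do, that the server sends no data between its ACK and its FIN; the function name `four_way_close` and the tuple layout are made up for illustration.

```python
def four_way_close(u, v):
    """Return (sender, flags, seq, ack, sender_state_after) for each segment.

    u: sequence number of the client's FIN.
    v: the server's current sequence number.
    Assumes the server sends no data between its ACK and its FIN.
    """
    return [
        ("client", "FIN", u, None, "FIN-WAIT-1"),
        ("server", "ACK", v, u + 1, "CLOSE-WAIT"),
        ("server", "FIN+ACK", v, u + 1, "LAST-ACK"),
        ("client", "ACK", u + 1, v + 1, "TIME-WAIT"),
    ]


for step in four_way_close(u=300, v=700):
    print(step)
```

Like a SYN, each FIN consumes one sequence number, so the final ACK acknowledges v + 1.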

Either side of a TCP connection can initiate the closing operation, although it is usually the client that initiates the closing operation. However, some servers, such as Web servers, may initiate a connection closure after responding to the request. TCP sends a FIN packet to initiate the closure.

To sum up, it takes three segments to establish a TCP connection and four segments to close a TCP connection. TCP also supports a half-open state, although this is rare.

TCP half open

A TCP connection becomes half-open when one side closes or aborts the connection without notifying the other. It is like two people chatting on WeChat: you go offline without telling me, and I am still gossiping away at you. This typically happens when one host crashes; my computer just died, so how was I supposed to tell you? As long as the side that is still up does not transmit any data, it cannot detect that the other host has gone away.

Another cause of half-open connections is a party cutting the host’s power instead of shutting it down properly. When that happens, the server is left with many half-open TCP connections.

TCP half closed

Since TCP supports half-open connections, you might guess that it also supports half-closed ones. Likewise, half-close is not common. A TCP half-close shuts down only one direction of the data flow; two half-closes together close the entire connection. Normally, the applications send FIN segments to each other to end the connection. With a half-close, however, an application is saying: “I have finished sending data and sent a FIN segment to the receiver, but I still want to receive data from the receiver until it sends a FIN segment to me.” The following is a schematic of TCP half-close.

Here is the process:

The client host and the server host start out exchanging data. After a while, the client sends a FIN segment to close its sending direction, and the server replies with an ACK. Because this is a half-close, the client still wants to receive data from the server, so the server keeps sending. Some time later, the server sends its own FIN. The client replies with an ACK, and the connection is fully closed.

In a TCP half-close, one direction of the connection is closed while the other continues to carry data until it too is closed. Very few applications use this feature, though.
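For readers who want to try a half-close, here is a minimal Python sketch over a loopback connection. It uses the standard `socket` module, where `shutdown(socket.SHUT_WR)` closes only the sending direction. The function names `half_close_demo` and `read_until_eof` and the message contents are made up for this example, and running both ends in one thread is a simplification that relies on the listen backlog.

```python
import socket


def read_until_eof(sock):
    """Read until the peer closes its sending direction (recv returns b"")."""
    data = b""
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            return data
        data += chunk


def half_close_demo():
    """Run a half-close on loopback and return what each side read."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()

    client.sendall(b"request")
    client.shutdown(socket.SHUT_WR)  # "I have finished sending"

    # The server sees end-of-stream once the client's data and FIN arrive.
    received = read_until_eof(server)

    # The server-to-client direction is still open, so the server can reply.
    server.sendall(b"response after your FIN")
    server.close()  # closes the remaining direction

    reply = read_until_eof(client)
    client.close()
    listener.close()
    return received, reply


print(half_close_demo())
```

The client can still read the reply after calling `shutdown`, which is the “I still want to receive data” half of the quote above.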

Open and close at the same time

A more unconventional operation is when both applications actively open the connection at the same time. While this may seem unlikely, it is possible under certain arrangements. We’re going to focus on this process.

Each communication party sends a SYN before receiving a SYN from the other party. In this scenario, both parties must know the IP address and port number of the other party.

Here is an example of simultaneous opening

As shown in the preceding figure, both sides send a SYN packet before receiving the packet from the other side and reply an ACK packet after receiving the packet from the other side.

A simultaneous open requires the exchange of four segments, one more than the ordinary three-way handshake. Since neither side is clearly the client or the server in a simultaneous open, I simply call them the two communicating parties here.

Similar to simultaneous opening, simultaneous closing means that the communication parties send a FIN packet at the same time. The following figure shows the process of simultaneous closing.

A simultaneous close exchanges the same number of segments as a normal close. The difference is that the segments are not sent one after another as in the four-way wave; instead they cross each other in flight.

About the initial sequence number

My drawings and descriptions above may not have made this clear, so let me introduce the proper term. The initial sequence number is called the Initial Sequence Number (ISN); the seq = x and seq = y in the handshake above are the ISNs of the two sides.

Before sending its SYN, each party selects an initial sequence number. The initial sequence number is randomly generated, and each TCP connection has a different one. The RFC describes the initial sequence number generator as a 32-bit counter that increases by 1 every 4 µs (microseconds). Because each TCP connection is a separate instance, the purpose of this arrangement is to keep the sequence numbers of different connections from overlapping.
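The clock-driven part of this scheme is easy to sketch in Python. The snippet below models only the 32-bit counter that ticks every 4 microseconds; it leaves out the random component mentioned above. The names `clock_isn`, `ISN_TICK_US`, and `ISN_MODULUS` are invented for this example.

```python
ISN_TICK_US = 4          # the counter advances once every 4 microseconds
ISN_MODULUS = 2 ** 32    # 32-bit counter


def clock_isn(elapsed_us):
    """ISN from a 4-microsecond clock, given elapsed time in microseconds."""
    return (elapsed_us // ISN_TICK_US) % ISN_MODULUS


# Time for the counter to wrap around once, in hours.
wrap_hours = ISN_MODULUS * ISN_TICK_US / 1_000_000 / 3600
print(round(wrap_hours, 2))  # about 4.77
```

With a 4 µs tick, the counter takes a little under five hours to cycle through all 32-bit values.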

On an established TCP connection, the peer accepts only segments with the correct TCP four-tuple and a valid sequence number. This also shows how TCP segments can be forged: if I forge the same four-tuple and a matching sequence number, I can inject segments into the TCP connection and break a normal connection. Therefore, one way to defend against this attack is to make the initial sequence number hard to guess, and another is to encrypt the sequence number.

TCP state transition

We have talked about the three-way handshake and the four-way wave, and touched on the state transitions of a TCP connection, so let me start from the beginning and walk you through these state transitions.

At the beginning, both the server and the client are in the CLOSED state. From there, it matters whether a host opens actively or passively. With an active open, the client sends a SYN to the server and enters the SYN_SENT state, in which it has sent a connection request and is waiting for a matching one. With a passive open, the server enters the LISTEN state and listens for SYN segments. If the client calls close or nothing happens for a period of time, it goes back to CLOSED.

A host in LISTEN that sends a SYN changes to SYN_SENT.

The LISTEN -> SYN_SENT transition exists because the connection may be triggered by the application on the server wanting to send data to the client. The client then passively accepts the connection, and data transfer starts once the connection is established. In other words, a server in LISTEN can send a SYN, but this is very rare.

A host in SYN_SENT that receives a SYN sends a SYN + ACK and moves to SYN_RCVD. Similarly, a host in LISTEN that receives a SYN sends a SYN + ACK and moves to SYN_RCVD. If a host that reached SYN_RCVD from LISTEN receives an RST, it goes back to LISTEN.

It’s better to look at these two pictures together.

So I need to explain what an RST is.

An RST typically comes up when a host receives a TCP segment whose IP address and port number do not match any connection. If the client host sends a request and the server host, after checking the IP address and port number, finds that the request does not belong to any of its sockets, the server sends a special RST segment to the client.

So when a server sends an RST segment to a client, it is telling the client: there is no matching socket here, please stop sending.

RST (reset the connection) is used to reset a connection that has gone wrong for some reason, and also to reject invalid data and requests. If a segment with the RST bit set is received, some error has usually occurred.

Failing to match an IP address and port, as above, is one condition that causes an RST. An RST may also be caused by a request timeout, the cancellation of an existing connection, and so on.

The server in SYN_RCVD receives an ACK, and the client in SYN_SENT receives a SYN + ACK and sends an ACK. With that, the connection between the client and the server is established.

One more thing to note: I did not show the simultaneous open above. In a simultaneous open, the state transitions look like this.

Why is this so? If both hosts send a SYN, each of them is in the SYN_SENT state after sending and waits for a SYN + ACK. Instead, each receives the other’s SYN, replies with a SYN + ACK, and moves to the SYN_RCVD state. Once the SYN + ACK from the other side arrives, both parties enter the ESTABLISHED state and start data transmission.

Well, so far, I’ve given you an overview of state transitions during TCP connection setup, and now you can make a pot of tea and wait for the data to transfer.

Ok, now that you’ve had enough tea, the data transfer is complete, and the TCP connection can be torn down.

Now let’s turn the clock back to when the server is in the SYN_RCVD state. The server is happy because it has just received a SYN and sent a SYN + ACK. But then the server application closes the connection, so the server sends a FIN and goes from SYN_RCVD -> FIN_WAIT_1.

Then set the clock to now, and the client and server have finished transmitting data. The client sends a FIN packet to disconnect, and the client becomes FIN_WAIT_1. In the case of the server, it receives a FIN packet and replies with an ACK packet. It goes from ESTABLISHED -> CLOSE_WAIT.

The server in CLOSE_WAIT state sends a FIN packet and places itself in the LAST_ACK state. A client in FIN_WAIT_1 state becomes FIN_WAIT_2 when it receives an ACK message.

Next I need to explain the CLOSING state. FIN_WAIT_1 -> CLOSING is a rather special transition.

CLOSING is a special state that should be very rare in practice. Normally, after you send a FIN, you should receive the peer’s ACK first, or the ACK together with its FIN. In the CLOSING state, however, you have sent a FIN and then received the peer’s FIN before receiving its ACK.

When does this happen? If both parties close the connection at the same time, their FIN segments are sent at the same time and cross in flight. In other words, both parties are closing the connection simultaneously.

A client in FIN_WAIT_2 changes to TIME_WAIT after it receives the FIN + ACK from the server host and sends an ACK in response. A host in CLOSE_WAIT sends a FIN and enters LAST_ACK.

Many diagrams and blog posts show LAST_ACK being entered after a FIN + ACK is sent. Here only the FIN is described: a host in CLOSE_WAIT enters LAST_ACK as soon as it sends its FIN.

The FIN_WAIT_1 -> TIME_WAIT transition happens when the client receives a FIN and an ACK together and sends an ACK.

A client in CLOSING moves to TIME_WAIT when it receives an ACK. As you can see, TIME_WAIT is the last state of the active closer before CLOSED, and LAST_ACK is the last state of the passive closer before CLOSED.
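The transitions walked through above fit naturally into a lookup table. The following Python sketch is a simplified model: the event names such as `recv_syn_ack` and `timeout_2msl` are invented labels for this example, and a SYN + ACK that arrives in SYN_RCVD is modeled as `recv_ack`, since its ACK part is what completes the handshake.

```python
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "active_open"): "SYN_SENT",        # sends SYN
    ("LISTEN", "recv_syn"): "SYN_RCVD",           # sends SYN + ACK
    ("LISTEN", "send_syn"): "SYN_SENT",
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",  # sends ACK
    ("SYN_SENT", "recv_syn"): "SYN_RCVD",         # simultaneous open
    ("SYN_SENT", "close"): "CLOSED",
    ("SYN_RCVD", "recv_ack"): "ESTABLISHED",
    ("SYN_RCVD", "recv_rst"): "LISTEN",
    ("SYN_RCVD", "close"): "FIN_WAIT_1",          # sends FIN
    ("ESTABLISHED", "close"): "FIN_WAIT_1",       # sends FIN
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",    # sends ACK
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv_fin"): "CLOSING",        # simultaneous close
    ("FIN_WAIT_1", "recv_fin_ack"): "TIME_WAIT",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",      # sends ACK
    ("CLOSING", "recv_ack"): "TIME_WAIT",
    ("CLOSE_WAIT", "close"): "LAST_ACK",          # sends FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
}


def run(events, state="CLOSED"):
    """Follow a list of events through the table and return the final state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {state} on {event}")
        state = TRANSITIONS[key]
    return state


# A client that opens actively and later closes first.
print(run(["active_open", "recv_syn_ack", "close", "recv_ack", "recv_fin"]))
```

Feeding it `["close", "recv_fin", "recv_ack"]` with `state="ESTABLISHED"` traces the simultaneous close through CLOSING.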

A couple of the states above are special, so let’s take a closer look at them.

TIME_WAIT state

After a TCP connection is established, the party that actively closes the connection enters the TIME_WAIT state. The TIME_WAIT state is also called the 2MSL wait state. In this state, TCP waits twice the Maximum Segment Lifetime (MSL).

MSL needs to be explained here

MSL is the maximum expected lifetime of a TCP segment, that is, the maximum time it can exist on the network. The TTL field (IPv4) or hop limit field (IPv6) in the IP header bounds the lifetime of the IP datagram that carries it. The MSL is generally taken to be 2 minutes, but this value can be changed, and different operating systems use different values.

Based on this, let’s explore the state of TIME_WAIT.

When TCP performs an active close and sends the final ACK, the connection must stay in TIME_WAIT for 2 * MSL so that TCP can resend the final ACK if it is lost. The final ACK is resent not because TCP retransmits ACKs, but because the other side retransmits its FIN: the passive closer keeps resending the FIN because it needs the ACK before it can close the connection. If a FIN arrives after the 2MSL wait is over, the side that was in TIME_WAIT answers with an RST, and the other side sees an error.

TCP timeout and retransmission

There is no such thing as infallible communication: no matter how good the external conditions are, errors are always possible. So during normal TCP communication errors will occur, caused by packet loss, packet duplication, or even packet reordering.

During TCP communication, the TCP receiver sends a series of confirmation messages to determine whether an error occurs. Once packet loss occurs, the TCP retransmits unconfirmed data.

TCP retransmission is performed in two ways, one based on time and the other based on confirmation information. Generally, confirmation information is more efficient than time.

As you can see, both TCP acknowledgment and retransmission hinge on whether a packet has been acknowledged.

TCP sets a timer when sending data. If no confirmation message is received within the specified time, a timeout or timer based retransmission operation is triggered. Timer timeout is usually called retransmission timeout (RTO).

But there is another way that does not cause delay, which is fast retransmission.

TCP doubles the retransmission timeout after each retransmission. This doubling of the interval is called binary exponential backoff. After about 15.5 minutes of this, the client gives up and displays

Connection closed by foreign host.

TCP has two thresholds that determine how persistently it retransmits a segment. They are defined in RFC 1122. The first threshold, R1, is the point at which TCP should report the delivery problem to the IP layer, and the second, R2, is the point at which TCP should abandon the connection. Both may be expressed as a time or as a count of retransmissions. R1 should correspond to at least three retransmissions, and R2 to at least 100 seconds.

Note that for the SYN packet, the R2 value should be set to at least 3 minutes. However, the values of R1 and R2 are set differently in different systems.

On Linux, the values of R1 and R2 can be set by an application or by modifying the values of net.ipv4.tcp_retries1 and net.ipv4.tcp_retries2. The variable value is the number of retransmissions.

The default value of tcp_retries2 is 15. Going through that many retries takes roughly 13 to 30 minutes; this is only a ballpark figure, and the actual time depends on the RTO, the retransmission timeout. The default value of tcp_retries1 is 3.

For SYN segments, net.ipv4.tcp_syn_retries and net.ipv4.tcp_synack_retries limit the number of SYN and SYN + ACK retransmissions respectively. The default value cited here is 5, which amounts to about 180 seconds; the defaults vary between kernel versions.
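To see how the retry count and the RTO combine into a total time, here is a small Python sketch of binary exponential backoff. The starting RTO of 0.2 seconds and the cap of 120 seconds are assumptions chosen for this example (they correspond to commonly cited Linux minimum and maximum RTO values); on a real connection the RTO is derived from measured round-trip times. The function name `retransmission_timeouts` is made up.

```python
def retransmission_timeouts(retries, initial_rto=0.2, max_rto=120.0):
    """Yield the wait before each timeout, doubling the RTO up to max_rto.

    With `retries` retransmissions there are retries + 1 waits: one after
    the original send and one after each retransmission.
    """
    rto = initial_rto
    for _ in range(retries + 1):
        yield rto
        rto = min(rto * 2, max_rto)


total = sum(retransmission_timeouts(15))
print(round(total, 1))  # 924.6 seconds, a little over 15 minutes
```

Under these assumptions, 15 retries add up to about 15 minutes, which sits inside the ballpark range given above; a larger starting RTO pushes the total higher.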

Windows also has R1 and R2 variables, and their values are defined in the registry below

HKLM\System\CurrentControlSet\Services\Tcpip\Parameters
HKLM\System\CurrentControlSet\Services\Tcpip6\Parameters

One of the important variables is TcpMaxDataRetransmissions, which corresponds to the tcp_retries2 variable on Linux; its default value is 5. This value is the number of times TCP retransmits an unacknowledged data segment on an existing connection.

Fast retransmission

We mentioned fast retransmission above. The fast retransmission mechanism is triggered by feedback from the receiver rather than by the retransmission timer. Compared with timeout-based retransmission, it therefore repairs packet loss more quickly. When an out-of-order segment arrives at the receiver during a TCP connection (for example, segments arrive in the order 2, 4, 3), TCP generates an acknowledgment immediately. This acknowledgment is called a duplicate ACK.

When an out-of-order segment arrives, the duplicate ACK must be returned immediately, without delay. Its purpose is to tell the sender that a segment arrived out of order and which sequence number the receiver expects next.

A duplicate ACK also goes back to the sender in another case: segments that follow the current one reach the receiver, while the current one has been lost or delayed. In both cases the receiver is missing a segment, but nobody can yet tell whether that segment is lost or merely late. So the TCP sender waits until it has received a certain number of duplicate ACKs before deciding that data is lost and triggering a fast retransmission. That number is usually 3. This might be a little confusing, so let’s take an example.

As shown in the preceding figure, segment 1 is received successfully and acknowledged with ACK 2, so the receiver now expects sequence number 2. Segment 2 is then lost, and segment 3 arrives. Segment 3 is out of order and does not match what the receiver expects, so the receiver sends ACK 2 again, and keeps doing so for each further out-of-order segment.

Once the sender has received three duplicate ACKs before the retransmission timer expires, it knows which segment is missing and retransmits it. The sender does not have to wait for the retransmission timer to expire, which greatly improves efficiency.
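The example above can be replayed with a short Python sketch. It numbers whole segments rather than bytes, as the example does, and the function names `receiver_acks` and `fast_retransmit_trigger` are invented for illustration.

```python
def receiver_acks(arrivals):
    """Cumulative ACK sent after each arriving segment number."""
    received = set()
    expected = 1  # next in-order segment the receiver is waiting for
    acks = []
    for seg in arrivals:
        received.add(seg)
        while expected in received:
            expected += 1
        acks.append(expected)
    return acks


def fast_retransmit_trigger(acks, dupthresh=3):
    """Return the segment to retransmit once dupthresh duplicate ACKs
    have been seen, or None if the threshold is never reached."""
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == dupthresh:
                return ack
        else:
            last_ack = ack
            duplicates = 0
    return None


acks = receiver_acks([1, 3, 4, 5])    # segment 2 is lost
print(acks)                           # [2, 2, 2, 2]
print(fast_retransmit_trigger(acks))  # 2
```

The first ACK 2 is the normal acknowledgment of segment 1; the next three are the duplicates that trigger the retransmission of segment 2. With mild reordering such as `[1, 3, 2, 4]`, only one duplicate is produced and nothing is retransmitted.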

SACK

In the standard TCP acknowledgment mechanism, suppose the sender sends bytes 0 to 10000, but the receiver gets only bytes 0 to 1000 and 3000 to 10000, and bytes 1000 to 3000 never arrive. The sender will then retransmit everything from 1000 to 10000, which is unnecessary because the data after 3000 has already been received. The sender, however, has no way of knowing that.

How to avoid or solve this problem?

SACK, the Selective Acknowledgment option, solves this. It lets the receiver tell the sender, in effect: “I have received everything up to 1000, and I have also received 3000-10000. Please send me 1000-3000.”

The two parties include the SACK-Permitted option in the SYN segment or the SYN + ACK segment to tell the peer host whether they support SACK. If both do, SACK information can then be carried in the segments that follow.

A note of caution: the SACK-Permitted option can only appear in SYN segments.
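The 0-10000 example can be worked through in Python. The sketch below is not the wire format of the SACK option; it only shows how received ranges split into a cumulative ACK plus SACK blocks, and how the sender can work out the holes. The function names `sack_report` and `missing_ranges` are made up, ranges are (start, end) pairs with the end excluded, and the stream is assumed to start at byte 0.

```python
def sack_report(received_ranges):
    """Split received byte ranges into a cumulative ACK and SACK blocks."""
    merged = []
    for start, end in sorted(received_ranges):
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent ranges collapse into one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # Data contiguous from byte 0 is covered by the ordinary ACK number.
    if merged and merged[0][0] == 0:
        return merged[0][1], merged[1:]
    return 0, merged


def missing_ranges(cumulative_ack, sack_blocks):
    """Holes the sender still needs to retransmit."""
    holes = []
    edge = cumulative_ack
    for start, end in sack_blocks:
        if start > edge:
            holes.append((edge, start))
        edge = max(edge, end)
    return holes


ack, blocks = sack_report([(0, 1000), (3000, 10000)])
print(ack, blocks)                  # 1000 [(3000, 10000)]
print(missing_ranges(ack, blocks))  # [(1000, 3000)]
```

With this information the sender retransmits only bytes 1000 to 3000 instead of everything after 1000.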

Spurious timeout and retransmission

In some cases, a segment is retransmitted even though nothing was lost. This is called spurious retransmission. It is unnecessary, and it may be caused by a spurious timeout, that is, a premature decision that a timeout has occurred. Spurious timeouts can be caused by many factors, such as out-of-order arrival of segments, duplicated segments, and lost ACKs.

There are many methods to detect and deal with spurious timeouts; they are divided into detection algorithms and response algorithms. A detection algorithm determines whether a timeout or a timer-based retransmission was spurious. If it was, a response algorithm is run to undo or mitigate its effects. The following algorithms are not covered in detail in this article:

  • Duplicate SACK extension (DSACK)
  • Eifel detection algorithm
  • Forward RTO recovery (F-RTO)
  • Eifel response algorithm

Packet reordering and packet duplication

We have discussed how TCP handles packet loss. Now let’s discuss packet reordering and packet duplication.

Packet reordering

Out-of-order arrival of packets can easily happen on the Internet. The IP layer does not guarantee the order of packets, and each packet may take whichever path is fastest at the moment, so three packets sent as A -> B -> C may well arrive at the receiver as C -> A -> B, or B -> C -> A, and so on. This is packet reordering.

Packet transmission involves two paths: the forward path, which carries data, and the reverse path, which carries ACKs.

If reordering occurs on the forward path, TCP cannot tell for sure whether packets have been lost. Both loss and reordering make the receiver see out-of-order packets, which leaves gaps in the data. If the gap is small, this does not matter much; if the gap is large, it may lead to spurious retransmission.

If reordering occurs on the reverse path, the TCP window jumps forward, and then duplicate ACKs arrive that simply have to be discarded. This causes unnecessary bursts of traffic at the sender and affects the available network bandwidth.

Going back to the fast retransmission we discussed above: because fast retransmission infers packet loss from duplicate ACKs, it does not have to wait for the retransmission timer to expire. The TCP receiver returns an ACK immediately after receiving an out-of-order packet, so any reordering on the network may produce duplicate ACKs. If fast retransmission kicked in as soon as a single duplicate ACK arrived, a growing number of duplicate ACKs would cause a large number of unnecessary retransmissions. That is why fast retransmission triggers only when the number of duplicate ACKs reaches the dupthresh threshold. Severe reordering is not common on the Internet, so dupthresh can be set fairly low; a value of 3 handles most cases.

Packet duplication

Packet duplication is also rare on the Internet. It means that a packet is delivered more than once as it crosses the network. When duplication triggers retransmissions, TCP may be confused.

Packet duplication can cause the receiver to generate a series of duplicate ACKs, which can be resolved using SACK negotiation.

TCP data flow and window management

We learned in the article “40 Figures to understand TCP and UDP” that sliding windows can be used for flow control: the client and server exchange information about the data flow with each other, mainly the sequence number of the segment, the ACK number, and the window size.

The two arrows in the figure indicate the direction of data flow, which is the direction in which TCP segments travel. As you can see, each TCP segment contains a sequence number, an ACK number, and window information, and possibly user data. The window size in a TCP segment indicates how much buffer space, in bytes, the receiving end still has available. This window size is dynamic, because segments are constantly being received and consumed. This dynamically adjusted window is called the sliding window. Let’s look at the sliding window in detail.

Sliding window

Either end of a TCP connection can send data, but not without limit. Both ends of the TCP connection maintain a send window structure and a receive window structure, and these set the limits on data transmission.

Sender window

Below is an example of a sender window.

This picture involves four concepts of the sliding window:

  • Segments sent and acknowledged: after a segment is sent to the receiver, the receiver replies with an ACK for it. The segments marked green in the figure have been acknowledged by the receiver.
  • Segments sent but not yet acknowledged: the light blue area in the figure marks segments that have been sent but not yet acknowledged by the receiver.
  • Segments waiting to be sent: the dark blue area in the figure marks segments waiting to be sent. They are part of the send window; in other words, the send window consists of the segments that are sent but unacknowledged plus the segments waiting to be sent.
  • Segments that can be sent only after the window slides: once the segments in the range [4,9] in the figure have been sent and acknowledged, the whole sliding window moves to the right. The orange area in the figure marks segments that can be sent only after the window has moved to the right.

A sliding window also has boundaries: the left edge and the right edge.

When the left edge moves to the right while the right edge stays put, the window is closing. This happens as sent data is acknowledged and the window becomes smaller.

When the right edge moves to the right, the window is opening, allowing more data to be sent. This happens when the receiving process reads data from the buffer, so that the buffer can take in more data.

In addition, the right edge may move to the left, which shrinks the window; this is something we do not want to see. A related problem is silly window syndrome, in which the data segments exchanged between the communicating parties become very small while the fixed cost per segment stays the same. The proportion of useful data in each segment is small relative to the header, resulting in low transmission efficiency.

It’s like being perfectly capable of writing a complex page in a day, and instead spending the whole day fixing a typo in a title.

Each TCP segment contains an ACK number and a window advertisement, so a TCP endpoint adjusts its send window structure based on these two values every time it receives a segment.

The left edge of the TCP sliding window can never move to the left, because segments that have been sent and acknowledged can never be un-acknowledged, just as there is no undoing regret in this world. This edge is controlled by the ACK number sent by the other end. When the ACK number advances the window to the right but the window size does not change, the window is said to slide forward.

If the ACK number increases but the advertised window becomes smaller as further ACKs arrive, the left edge approaches the right edge. When the left edge and right edge coincide, the sender stops transmitting data; this is called a zero window. At this point the TCP sender starts window probing and waits for an appropriate time to send data.
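The edge movements described above can be modeled with a few lines of Python. This is a simplified sketch: the class name `SendWindow` and its fields `una`, `nxt`, and `wnd` are chosen for this example, and sequence number wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class SendWindow:
    una: int  # left edge: oldest byte sent but not yet acknowledged
    nxt: int  # next byte to send
    wnd: int  # window size most recently advertised by the receiver

    @property
    def right_edge(self):
        return self.una + self.wnd

    @property
    def usable(self):
        """Bytes that may still be sent right now."""
        return max(0, self.right_edge - self.nxt)

    def on_ack(self, ack, advertised_window):
        """Apply an incoming ACK number and window advertisement."""
        if ack > self.una:  # the left edge never moves to the left
            self.una = ack
        self.wnd = advertised_window


w = SendWindow(una=4, nxt=7, wnd=6)
print(w.usable)   # 3: bytes 7, 8 and 9 may still be sent
w.on_ack(7, 6)    # same window size, higher ACK: the window slides forward
print(w.usable)   # 6
w.nxt = 13        # the sender uses up the whole window
w.on_ack(13, 0)   # the receiver advertises a zero window
print(w.usable)   # 0
```

When `usable` reaches 0 because the advertised window is 0, the left and right edges coincide, which is the zero window case.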

Receiver window

The receiver also maintains a window structure, which is much simpler than the sender’s. This window records the data that has been received and acknowledged, as well as the maximum sequence number it can receive. The receiver’s window structure does not store duplicate segments and ACKs, nor does it record segments and ACKs that it should not receive. The following is the window structure for the TCP receiver.

Like the sender window, the receiver window structure maintains a Left edge and a Right edge. Segments to the left of the Left edge have been received and acknowledged, and segments to the right of the Right edge have not been received and cannot yet be accepted.

For the receiver, data arriving with a sequence number lower than the Left edge is considered a duplicate and is discarded. Data beyond the Right edge is considered out of range and is also discarded. Data that falls inside the window is kept, but the window slides forward only when an arriving segment starts exactly at the Left edge.

The receiver’s window can also drop to zero. If the application consumes data slowly while the TCP sender transmits a large amount of data, the receiver’s TCP buffer fills up and the receiver tells the sender to stop sending. If the application then drains the buffer very slowly (say, 1 byte at a time), the sender is told it may send only one byte, and the exchange continues at this pace, resulting in high network overhead and low efficiency.

We mentioned above that a window with Left edge = Right edge is called a zero window. Now we will look at the zero window in detail.
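
The receiver-side rules can be sketched as a small classification function. This is a simplified illustration: the function name classify_segment and the returned labels are made up for this example, and only the starting sequence number of a segment is examined. The "hold" case (data inside the window that leaves a gap) is an assumption added to make the function total.

```python
def classify_segment(seq, left, right):
    """Classify an arriving segment by its starting sequence number.

    left  -- Left edge: next sequence number the receiver expects
    right -- Right edge: Left edge + receive window size
    """
    if seq < left:
        return "duplicate"       # already received and acknowledged
    if seq >= right:
        return "out_of_window"   # beyond what the receiver can accept
    if seq == left:
        return "advance"         # fills the Left edge, window can slide
    return "hold"                # inside the window, but leaves a gap


for seq in (900, 1000, 2000, 5000):
    print(seq, classify_segment(seq, left=1000, right=5000))
```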

Zero window

TCP implements flow control through the window advertisement sent by the receiver. The advertised window tells the sender how much data the receiver can accept. When the receiver’s window drops to 0, the sender is effectively prevented from sending more data. When the receiver regains free space, it sends a window update to tell the sender that it is ready to receive data again. Window updates are generally pure ACKs, that is, they carry no data. However, a pure ACK is not guaranteed to reach the sender, because ACKs are not themselves acknowledged or retransmitted, so TCP needs a way to handle the loss of a window update.

If a pure ACK carrying the window update is lost, both sides wait forever: the sender waits for the receiver to allow it to send, while the receiver wonders why the sender has not sent anything yet. To prevent this deadlock, the sender uses a persist timer to query the receiver periodically and find out whether its window has grown. When the persist timer expires, it triggers a window probe, which forces the receiver to return an ACK containing its current window.

A window probe contains one byte of data and relies on TCP’s normal loss-and-retransmission handling. Window probes are sent whenever the TCP persist timer expires. Whether the receiver can accept that one byte depends on how much free space its buffer has.
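
The following toy simulation shows why the persist timer breaks the deadlock. It assumes the window update was lost, so the sender only learns about free space through probes. The function name, the tick-based clock, and the fixed probe interval are all simplifications chosen for this sketch (real stacks typically back off the interval between probes), and the one byte carried by the probe is not modeled.

```python
def first_successful_probe(buffer_size, buffered, drain_per_tick,
                           probe_interval, max_ticks=1000):
    """Return (tick, window) for the first probe that sees a nonzero window.

    buffer_size    -- receiver buffer capacity in bytes
    buffered       -- bytes waiting in the buffer when the window hit zero
    drain_per_tick -- bytes the application reads per tick
    probe_interval -- ticks between persist timer expirations
    Returns None if no probe succeeds within max_ticks.
    """
    for tick in range(1, max_ticks + 1):
        # The application consumes some buffered data.
        buffered = max(0, buffered - drain_per_tick)
        # The persist timer fires: send a probe, receiver answers with its window.
        if tick % probe_interval == 0:
            window = buffer_size - buffered
            if window > 0:
                return tick, window
    return None


print(first_successful_probe(4096, 4096, drain_per_tick=512, probe_interval=5))
```

If the application never reads (drain_per_tick=0), every probe sees a zero window and the function returns None, which corresponds to the sender continuing to probe.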

Congestion control

With TCP window control, two hosts no longer exchange one data segment at a time; the sender can transmit many segments in a row. However, sending large amounts of data brings other problems, such as heavy network load and network congestion. To prevent these problems, TCP uses a congestion control mechanism, which throttles the sender’s transmission when the network is congested.

There are two main methods of congestion control

  • End-to-end congestion control: The network layer gives the transport layer no explicit support for congestion control, so even when the network is congested, the end systems have to infer it by observing network behavior. TCP uses end-to-end congestion control, because the IP layer does not give end systems feedback about network congestion. So how does TCP infer congestion? It treats a timeout or three duplicate ACKs as a sign of congestion and reduces its window size in response. Growing round-trip delay can also be read as a sign of increasing congestion.
  • Network-assisted congestion control: Routers provide feedback to the sender about the congestion state of the network. This feedback can be as simple as a single bit indicating congestion on a link.

The following diagram depicts both congestion control methods

TCP congestion control

If you have read this far, I will assume that you understand the basis of TCP reliability, which is the use of sequence numbers and acknowledgment numbers. In addition, another fundamental part of TCP is congestion control.

The approach taken by TCP is to have each sender limit the rate at which it sends segments according to the network congestion it perceives. If the sender perceives no congestion, it increases its sending rate. If it perceives congestion along the path, it slows down.

But this approach raises three questions

  1. How does a TCP sender limit the rate at which it sends segments into its connection?
  2. How does a TCP sender sense network congestion?
  3. When the sender senses end-to-end congestion, what algorithm is used to change its transmission rate?

Let’s start with the first question: how does a TCP sender limit the rate at which it sends segments into its connection?

We know that each end of a TCP connection consists of a receive buffer, a send buffer, and several variables (LastByteRead, rwnd, etc.). The sender’s congestion control mechanism keeps track of an additional variable called the congestion window, written cwnd, which limits the amount of data TCP can send into the network before receiving an ACK. The receive window (rwnd) tells the sender how much data the receiver can accept.

In general, the amount of unacknowledged data at the sender must not exceed the minimum of cwnd and rwnd, i.e.

LastByteSent – LastByteAcked <= min(cwnd,rwnd)

Assume the receiver has enough buffer space that we can ignore rwnd and focus only on cwnd. At the start of each round-trip time (RTT) the sender may send cwnd bytes, and by the end of the RTT it receives the ACKs for them. The sender’s rate is therefore roughly cwnd/RTT bytes/second. By adjusting cwnd, the sender can adjust the rate at which it sends data into the connection.
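
The inequality and the rate estimate above translate directly into two small helpers. The function names are hypothetical, and all quantities are in bytes and seconds.

```python
def usable_window(last_byte_sent, last_byte_acked, cwnd, rwnd):
    """Bytes the sender may still put on the wire right now."""
    in_flight = last_byte_sent - last_byte_acked
    # LastByteSent - LastByteAcked must stay <= min(cwnd, rwnd).
    return max(0, min(cwnd, rwnd) - in_flight)


def approx_rate(cwnd, rtt_seconds):
    """Rough sending rate in bytes/second when rwnd is not the limit."""
    return cwnd / rtt_seconds


# 3000 bytes in flight, congestion window is the tighter limit: 1000 bytes left.
print(usable_window(5000, 2000, cwnd=4000, rwnd=8000))
# Receive window is the tighter limit and already exceeded: nothing may be sent.
print(usable_window(5000, 2000, cwnd=4000, rwnd=2000))
print(approx_rate(cwnd=1000, rtt_seconds=0.2))
```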

How does a TCP sender sense network congestion?

As discussed above, TCP infers congestion from a timeout or from three duplicate ACKs.

When the sender senses end-to-end congestion, what algorithm is used to change its transmission rate?

This is a complex question. As I will explain, TCP generally follows these guidelines

  • A lost segment implies that the network is congested, so the TCP sender’s rate should be lowered when a segment is lost.
  • An arriving ACK for a previously unacknowledged segment means that the sender’s segments are reaching the receiver, so the sender’s rate can be increased. Why? Because the segment arrived successfully, the network is not congested. The sender therefore enlarges its congestion window, and the sending rate goes up.
  • Bandwidth probing: TCP keeps raising its transmission rate as long as ACKs arrive and lowers it when a loss occurs. In other words, to find the rate at which congestion begins, the sender increases its rate until a loss occurs, backs off, and then starts probing again to see whether the congestion onset rate has changed.

Now that we know the principles of TCP congestion control, let’s talk about the TCP congestion control algorithm. It consists of three main parts: slow start, congestion avoidance, and fast recovery

Slow start

When a TCP connection is established, cwnd is initialized to a small value of 1 MSS. This gives an initial sending rate of about MSS/RTT. For example, with an MSS of 1000 bytes and an RTT of 200 ms, the initial sending rate is only about 5 KB/s (40 kbps). In practice the available bandwidth is usually much larger than MSS/RTT, so TCP uses slow start to find a good transmission rate quickly. In slow start, cwnd begins at 1 MSS and grows by one MSS every time a transmitted segment is first acknowledged. After the first segment is acknowledged, cwnd becomes 2 MSS; after those two segments are acknowledged, each adds one MSS and cwnd becomes 4 MSS, and so on. The value of cwnd therefore doubles every RTT. As shown in the figure below

The sending rate cannot keep growing forever. Growth has to end at some point, so when? Slow start ends its exponential increase in one of the following ways.
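
The doubling can be reproduced with a few lines of Python. This sketch assumes no loss and one ACK per segment; the function name slow_start_rounds is made up for this example.

```python
def slow_start_rounds(rounds, mss=1):
    """Return the congestion window at the start of each round trip.

    With mss=1 the values are in units of MSS; otherwise in bytes.
    """
    cwnd = mss
    history = [cwnd]
    for _ in range(rounds):
        segments_in_flight = cwnd // mss
        # Every acknowledged segment adds one MSS to the window.
        for _ack in range(segments_in_flight):
            cwnd += mss
        history.append(cwnd)
    return history


print(slow_start_rounds(4))  # [1, 2, 4, 8, 16]
```

Although each ACK adds only one MSS, the number of ACKs per round equals the window size, which is why the window doubles every RTT.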

  • If packet loss (indicated by a timeout) occurs during slow start, TCP sets the sender’s cwnd to 1 MSS and restarts slow start. At this point a second variable is introduced, ssthresh (slow start threshold), which is set to cwnd / 2 at the moment of loss, i.e. half the window value when congestion was detected.
  • When cwnd reaches ssthresh, continuing to double it would likely cause packet loss again. So once cwnd equals ssthresh, TCP ends slow start and switches to congestion avoidance mode.
  • Finally, slow start also ends when three duplicate ACKs are detected: TCP performs a fast retransmit and enters the fast recovery state.

Congestion avoidance

When TCP enters the congestion avoidance state, cwnd is roughly half of the value it had when congestion was last encountered, i.e. the ssthresh value. TCP therefore can no longer double cwnd every RTT. Instead it takes a more conservative approach and increases cwnd by only one MSS per RTT. For example, if 10 segments are sent in one RTT, cwnd grows by a single MSS only after all 10 have been acknowledged. If a timeout occurs, cwnd is set to 1 MSS and ssthresh is set to half the value cwnd had at the time of the loss. The linear growth can also be stopped by the arrival of three duplicate ACKs. In that case TCP halves cwnd, records ssthresh as half the value cwnd had when the duplicate ACKs arrived, and enters the fast recovery state.
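
One common way to get roughly one MSS of growth per RTT is to add MSS * MSS / cwnd bytes for each new ACK, as described in RFC 5681. The sketch below applies that rule to the 10-segment example; the function name is hypothetical and loss events are left out.

```python
def congestion_avoidance(cwnd, mss, acks):
    """Apply the per-ACK additive increase for `acks` new ACKs (all in bytes)."""
    for _ in range(acks):
        cwnd += mss * mss / cwnd
    return cwnd


mss = 1000
start = 10 * mss                       # a window of 10 segments
after = congestion_avoidance(start, mss, acks=10)
# Growth in units of MSS after one full window of ACKs.
print(round((after - start) / mss, 2))
```

Because cwnd grows slightly with each ACK, the per-ACK increment shrinks, so the total after ten ACKs comes out a little under one full MSS.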

Fast recovery

In fast recovery, cwnd is increased by one MSS for every duplicate ACK received for the missing segment that put TCP into fast recovery. When an ACK for the missing segment finally arrives, TCP deflates cwnd and enters congestion avoidance. If a timeout occurs during fast recovery, TCP moves to the slow start state: cwnd is set to 1 MSS and ssthresh is set to half the value cwnd had when the timeout occurred.
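
Putting the three states together gives a small event-driven state machine. The sketch below is a simplified, Reno-style illustration rather than a faithful implementation: the class name CongestionSketch is made up, cwnd and ssthresh are measured in units of MSS, retransmission itself is not modeled, and setting cwnd to ssthresh + 3 on entering fast recovery is an assumption taken from the usual textbook description.

```python
class CongestionSketch:
    """Toy model of slow start, congestion avoidance and fast recovery."""

    def __init__(self, ssthresh=64.0):
        self.cwnd = 1.0              # congestion window, in MSS
        self.ssthresh = ssthresh     # slow start threshold, in MSS
        self.state = "slow_start"
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.state == "fast_recovery":
            # The missing segment was acknowledged: deflate the window.
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += 1.0                 # +1 MSS per ACK (doubles per RTT)
            if self.cwnd >= self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += 1.0 / self.cwnd     # about +1 MSS per RTT

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += 1.0                 # inflate for each extra duplicate
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2.0
            self.cwnd = self.ssthresh + 3.0  # assumption: textbook Reno behaviour
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = 1.0
        self.dup_acks = 0
        self.state = "slow_start"


tcp = CongestionSketch(ssthresh=4.0)
for _ in range(3):
    tcp.on_new_ack()
print(tcp.state, tcp.cwnd)
```

After three new ACKs the window reaches the threshold of 4 MSS and the sketch leaves slow start. Feeding it duplicate ACKs or a timeout from there exercises the other transitions described above.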
