Preface

Welcome to star our GitHub repository: github.com/bin39232820… The best time to plant a tree was ten years ago; the second best time is now.

Previous articles in this series:

  • Relearn the Web Series (The Past and Present of HTTP)
  • Relearn the Network Series (My Name Is IP)
  • Relearn the Network Series (Ping and Gateway)
  • Relearn the Network Series (Building an HTTP Experimental Environment)

Today we are going to learn about TCP.

A hundred years may pass like a few days; endure through time. Flowers in a mirror, the moon in the water.

TCP packet header format

Let’s see what fields the header contains.

  • First, the source and destination port numbers are indispensable: without them, you would not know which application the data should be delivered to.
  • Next comes the sequence number of the packet. Why number the packets? To solve the out-of-order problem, of course: without numbers, how could we tell which packet should come first and which should come last? As an "old driver" (a seasoned veteran) of society, I should do things steadily, one by one; no matter how complicated the situation gets, I will not end up in a mess.
  • There is also an acknowledgment number. Packets must be acknowledged; otherwise, how would I know whether they had been received? If a packet was not received, it should be resent until it is delivered. This solves the problem of lost packets. As an old driver, I do things reliably: whatever I promise, I follow up on, and even if I can't get it done right away, I still owe you a reply.
  • Then there are several status bits (flags). For example, SYN initiates a connection, ACK acknowledges, RST resets the connection, and FIN terminates a connection. TCP is connection-oriented, so both parties need to maintain the state of the connection, and sending packets with these flags changes the state on both sides.
  • Another important field is the window size. TCP performs flow control: each side advertises a window that indicates how much it can currently handle, as if to say "don't send too fast and overwhelm me, but don't send too slowly and starve me either."

From this analysis of the TCP header, we can see that mastering TCP means focusing on the following problems:

  • Ordering: steady, never disorderly;

  • Packet loss: keep every promise reliably;

  • Connection maintenance: a proper start and a proper finish;

  • Flow control: know the right pace;

  • Congestion control: know when to advance and when to retreat.

  • Sequence number (seq): 4 bytes, used to mark the order of data segments. TCP numbers every data byte sent over the connection, and the starting number is generated randomly by the local host. Once the bytes are numbered, each segment is labeled with a number: the sequence number seq is the number of the first data byte in that segment.

  • Acknowledgment number (ack): 4 bytes, the number of the first data byte of the next segment the receiver expects. While the sequence number identifies the first byte of data carried in a segment, the acknowledgment number identifies the next byte the receiver expects. Therefore, when a segment arrives in order, the acknowledgment number equals the number of its last byte plus 1.

  • ACK: 1 bit. The acknowledgment number field is valid only when ACK=1; when ACK=0, the acknowledgment number is ignored.

  • SYN: synchronizes sequence numbers when a connection is established. SYN=1 with ACK=0 indicates a connection request segment; if the connection is accepted, the reply segment sets SYN=1 and ACK=1. So SYN=1 means the segment is either a connection request or a connection acceptance. The SYN bit is set to 1 only while TCP is establishing a connection; after the handshake completes, it is 0.

  • FIN (finish): used to release a connection. FIN=1 indicates that the sender of the segment has finished sending data and wants to release the connection.

  • PS: the uppercase ACK, SYN, and FIN are flag bits whose value is either 1 or 0. The lowercase ack and seq are the acknowledgment number and the sequence number.
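To make the field layout above concrete, here is a minimal Python sketch that decodes the fixed 20-byte part of a TCP header with the standard `struct` module. It assumes the standard header layout (ports, seq, ack, data offset, flags, window, checksum, urgent pointer) and ignores any options; `parse_tcp_header` is a hypothetical helper written only for illustration.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (options are ignored)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    # Network byte order: 2+2 bytes of ports, 4+4 bytes of seq/ack,
    # 1 byte data offset/reserved, 1 byte flags, then window, checksum, urgent pointer.
    (src_port, dst_port, seq, ack,
     offset_reserved, flags, window,
     checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,                                 # number of the first data byte
        "ack": ack,                                 # next byte expected (valid if ACK=1)
        "header_len": (offset_reserved >> 4) * 4,   # data offset counts 32-bit words
        "FIN": bool(flags & 0x01),
        "SYN": bool(flags & 0x02),
        "RST": bool(flags & 0x04),
        "ACK": bool(flags & 0x10),
        "window": window,                           # advertised receive window
    }
```

For instance, a header packed with SYN set and seq=1000 decodes with `SYN` true and `ACK` false, matching the first step of the handshake described below.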

The three-way handshake that never changes

I believe you are all familiar with the picture above, ha ha. Still, let's go through it together.

All problems start with a connection, so let’s look at connection maintenance.

Establishing a TCP connection is commonly called the three-way handshake.

  • A: Hello, this is A.
  • B: Hello A, this is B.
  • A: Hello, B.

This is often described as one round of "request -> reply -> reply to the reply". It looks simple, but there is a lot of knowledge and many details behind it. The process is explained in detail below.

  • First handshake: to establish a connection, the client sends a SYN packet (SYN=1, seq=x) to the server and enters the SYN_SENT state, waiting for the server's confirmation. SYN stands for Synchronize Sequence Numbers.
  • Second handshake: after receiving the SYN packet, the server must acknowledge the client's SYN (ACK=1, ack=x+1) and also send its own SYN (SYN=1, seq=y), both in a single SYN+ACK packet. The server then enters the SYN_RECV state.
  • Third handshake: after receiving the SYN+ACK packet from the server, the client sends an ACK packet (ACK=1, seq=x+1, ack=y+1) to the server and enters the ESTABLISHED state; when the server receives this ACK, it also enters ESTABLISHED. The TCP connection is now established and the three-way handshake is complete.
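The three steps can be traced with a small Python sketch that records each segment's flags and numbers together with both sides' states. It is a simulation under simple assumptions, not a real TCP stack: the initial sequence numbers x and y are passed in explicitly (real TCP picks them randomly), the helper name `three_way_handshake` is hypothetical, and the state names follow the article.

```python
MOD = 2 ** 32  # sequence numbers wrap around modulo 2^32


def three_way_handshake(x, y):
    """Trace (direction, segment, client_state, server_state) for each step.

    x is the client's initial sequence number, y the server's.
    """
    client, server = "CLOSED", "LISTEN"
    trace = []

    # 1st handshake: client sends SYN carrying its initial sequence number x.
    client = "SYN_SENT"
    trace.append(("C->S", {"SYN": 1, "seq": x}, client, server))

    # 2nd handshake: server ACKs x (the SYN consumed one number) and sends its own SYN.
    server = "SYN_RECV"
    trace.append(("S->C", {"SYN": 1, "ACK": 1, "seq": y, "ack": (x + 1) % MOD},
                  client, server))

    # 3rd handshake: client ACKs y and considers the connection established.
    client = "ESTABLISHED"
    trace.append(("C->S", {"ACK": 1, "seq": (x + 1) % MOD, "ack": (y + 1) % MOD},
                  client, server))

    # The server becomes ESTABLISHED only once that final ACK arrives.
    server = "ESTABLISHED"
    return trace, client, server
```

With x=100 and y=300, the SYN+ACK carries ack=101 and the final ACK carries seq=101, ack=301, while the server is still in SYN_RECV until that ACK arrives.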

Why not two handshakes?

The three-way handshake serves two important purposes: it makes sure both parties are ready to send data (and each knows the other is ready), and it lets both parties agree on their initial sequence numbers, which are sent and acknowledged during the handshake.

If the three-way handshake were reduced to two handshakes, a deadlock could occur. Consider communication between computers S and C. Suppose C sends a connection request packet to S, and S receives it and sends back an acknowledgment. Under a two-way handshake, S considers the connection established and starts sending data packets. However, if S's reply is lost in transit, C does not know whether S is ready or which sequence number S has chosen, and C may even doubt whether S received its connection request at all. C therefore considers the connection not yet established, ignores any data sent by S, and keeps waiting for the connection acknowledgment. Meanwhile S, seeing its data time out, retransmits the same packets again and again. The result is a deadlock.

The clichéd four-way wave

  • The client process sends a connection release packet and stops sending data. In this packet's header, FIN=1 and the sequence number is seq=u (the number of the last byte of previously sent data plus 1). The client then enters the FIN_WAIT_1 state. By TCP's rules, a FIN packet consumes one sequence number even if it carries no data.

  • After receiving the connection release packet, the server sends an acknowledgment with ACK=1, ack=u+1, and its own sequence number seq=v, then enters the CLOSE_WAIT state. The server's TCP notifies the application above it that the client has released the client-to-server direction. The connection is now half-closed: the client has no more data to send, but if the server sends data, the client still accepts it. This state lasts for a while, namely for the duration of CLOSE_WAIT.

  • After receiving the server's acknowledgment, the client enters the FIN_WAIT_2 state and waits for the server to send its connection release packet (until then, it still receives whatever final data the server sends).

  • After sending its last data, the server sends a connection release packet with FIN=1, ACK=1, and ack=u+1 to the client. Because the server may have sent more data while half-closed, its sequence number has moved on to some value, say seq=w. The server then enters the LAST_ACK state and waits for the client's confirmation.

  • After receiving the server's connection release packet, the client sends an acknowledgment with ACK=1, ack=w+1, and its own sequence number seq=u+1, then enters the TIME_WAIT state. Note that the TCP connection has not been released yet: the client enters CLOSED only after 2MSL (MSL = Maximum Segment Lifetime) has elapsed and the corresponding TCB (Transmission Control Block) has been released.

  • The server enters the CLOSED state as soon as it receives the client's acknowledgment, and likewise releases its TCB to end the TCP connection. As you can see, the server finishes closing the connection earlier than the client.
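The four steps above can be replayed in a short Python sketch that tracks the seq/ack values and each side's state after every segment. It is an illustrative simulation only: `four_way_close` is a hypothetical helper, u and v are supplied by the caller, and `server_bytes_after_ack` stands for whatever data the server still sends while half-closed, which is what moves its sequence number from v to w.

```python
MOD = 2 ** 32  # sequence numbers wrap around modulo 2^32


def four_way_close(u, v, server_bytes_after_ack=0):
    """Trace (event, segment, client_state, server_state) for a client-initiated close."""
    trace = []
    client, server = "ESTABLISHED", "ESTABLISHED"

    # 1. Client sends FIN with seq=u and stops sending data.
    client = "FIN_WAIT_1"
    trace.append(("C->S FIN", {"FIN": 1, "seq": u}, client, server))

    # 2. Server ACKs the FIN; the FIN consumed one sequence number, hence u+1.
    server = "CLOSE_WAIT"
    client = "FIN_WAIT_2"
    trace.append(("S->C ACK", {"ACK": 1, "seq": v, "ack": (u + 1) % MOD}, client, server))

    # 3. The pure ACK carried no data, so after the remaining data the server's FIN has seq=w.
    w = (v + server_bytes_after_ack) % MOD
    server = "LAST_ACK"
    trace.append(("S->C FIN", {"FIN": 1, "ACK": 1, "seq": w, "ack": (u + 1) % MOD},
                  client, server))

    # 4. Client ACKs the server's FIN and waits in TIME_WAIT; the server closes on receipt.
    client = "TIME_WAIT"
    server = "CLOSED"
    trace.append(("C->S ACK", {"ACK": 1, "seq": (u + 1) % MOD, "ack": (w + 1) % MOD},
                  client, server))
    return trace
```

With u=1000, v=5000, and 200 bytes of trailing server data, the server's FIN carries seq=5200 and the client's final ACK carries seq=1001, ack=5201, leaving the client in TIME_WAIT while the server is already CLOSED.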

Summary of the client and server states during connection termination

The client

  • FIN_WAIT_1
  • FIN_WAIT_2
  • TIME_WAIT
  • CLOSED

The server

  • CLOSE_WAIT
  • LAST_ACK
  • CLOSED

Try going over it in your mind without looking at the article; recite it a few more times and it will become clear.

Why must the TIME_WAIT state last 2MSL before moving to CLOSED?

Logically, once all four packets have been sent, the client could go straight to CLOSED. But we must assume the network is unreliable and that the last ACK can be lost, so TIME_WAIT exists to resend an ACK that may have been lost. The client sends the final ACK, but this ACK may be lost; if the server does not receive it, the server keeps retransmitting its FIN. Therefore the client cannot shut down immediately: it must make sure the server has received the ACK. After sending the ACK, the client enters TIME_WAIT and starts a timer for 2MSL. If it receives the FIN again within that time, it resends the ACK and waits another 2MSL. 2MSL means twice the MSL (Maximum Segment Lifetime), the longest time a segment can survive in the network; 2MSL is therefore the maximum time needed for one segment to go out and a reply to come back. If the client receives no new FIN within 2MSL, it concludes that its ACK was received and ends the TCP connection.
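The TIME_WAIT rule can be sketched as a timer driven by a simulated clock. This is a minimal illustration under stated assumptions: the class name `TimeWaitTimer` is hypothetical, time is a plain number of seconds passed in by the caller, and the MSL value in the example is arbitrary.

```python
class TimeWaitTimer:
    """Client-side TIME_WAIT timer using a caller-supplied clock (seconds)."""

    def __init__(self, msl):
        self.duration = 2 * msl  # wait twice the Maximum Segment Lifetime
        self.deadline = None
        self.acks_sent = 0

    def enter(self, now):
        # Called right after the client sends its final ACK.
        self.acks_sent = 1
        self.deadline = now + self.duration

    def on_fin(self, now):
        # The server's FIN arrived again, so our ACK was lost:
        # resend the ACK and restart the 2MSL wait.
        self.acks_sent += 1
        self.deadline = now + self.duration

    def is_closed(self, now):
        # The connection may move to CLOSED once 2MSL passes with no new FIN.
        return now >= self.deadline
```

With `msl=30`, a client that enters TIME_WAIT at time 0 may close at time 60; if a retransmitted FIN arrives at time 10, it resends the ACK and must now wait until time 70.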

How do you implement a reliable protocol?

TCP follows a simple pattern. To keep things in order, every packet has an ID. The starting ID is agreed upon when the connection is established, and packets are then sent one by one. To make sure no packet is lost, every packet must be acknowledged, but not necessarily one at a time: acknowledging a given ID indicates that all packets up to that ID have been received. This is called cumulative acknowledgment.
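Cumulative acknowledgment can be sketched in a few lines of Python: the receiver acknowledges the highest ID below which nothing is missing. This follows the article's packet-ID convention, where acknowledging n means 1..n have arrived; the real TCP ack field instead carries the next byte expected (n+1 in this numbering). The function name is hypothetical.

```python
def cumulative_ack(received, start=1):
    """Return the highest ID n such that every ID from start to n has arrived.

    Returns start - 1 if even the first packet is still missing.
    """
    ack = start - 1
    # Walk forward while the next ID is present; stop at the first gap.
    while ack + 1 in received:
        ack += 1
    return ack
```

Using the receiver state from the example later in this article (1 to 5 received, 6 and 7 missing, 8 and 9 buffered), `cumulative_ack({1, 2, 3, 4, 5, 8, 9})` stays at 5 until 6 and 7 arrive.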

To keep track of all packets sent and received, TCP needs buffers on both the sender and the receiver to hold these records. The sender's buffer is arranged by packet ID and divided into four parts according to how far each packet has been processed, much like allocating work on a project.

  • Part one: sent and acknowledged. This is work you assigned to your subordinates, and they have finished it, so you can cross it off.
  • Part two: sent but not yet acknowledged. This is work you have assigned but that is not finished yet; you must wait for the report that it is done before crossing it off.
  • Part three: not sent yet, but ready to be sent. This is work you haven't handed out yet but will soon.
  • Part four: not sent, and won't be sent for a while. This is work you haven't handed out and won't for some time.

Why distinguish between part three and part four? Why not hand out everything at once?

As a project manager, you should first estimate how much work a person can handle in a day, based on past experience, the employee's ability to cope with pressure, and so on. Assign too little and the employee is underused; assign too much and he will never finish it. Push too hard, and people might quit.

So how much can one employee take on at a time? In TCP, the receiver announces a window size to the sender, called the advertised window (AdvertisedWindow). This window equals part two plus part three above: the work already handed out plus the work about to be handed out. Anything beyond the window is more than the receiver can handle, so it must not be sent.

The sender therefore maintains the following data structure:

  • LastByteAcked: the boundary between part one and part two
  • LastByteSent: the boundary between part two and part three
  • LastByteAcked + AdvertisedWindow: the boundary between part three and part four

For the receiver:

  • MaxRcvBuffer: the maximum buffer size;
  • LastByteRead: everything after it has been received but not yet read by the application layer;
  • NextByteExpected: the boundary between part one and part two on the receiving side.
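The boundaries listed above can be turned into a small Python sketch. The field names mirror the article's variables in snake_case; the two formulas (the sender's usable window, and the receiver's free buffer space as the window it advertises) are the commonly taught textbook relations, used here as a simplified model that counts in packets like the article's example rather than in bytes.

```python
from dataclasses import dataclass


@dataclass
class SenderWindow:
    last_byte_acked: int    # boundary between part one and part two
    last_byte_sent: int     # boundary between part two and part three
    advertised_window: int  # window announced by the receiver

    def right_edge(self) -> int:
        # Boundary between part three and part four.
        return self.last_byte_acked + self.advertised_window

    def effective_window(self) -> int:
        # Size of part three: what may still be sent right now.
        in_flight = self.last_byte_sent - self.last_byte_acked
        return max(0, self.advertised_window - in_flight)


@dataclass
class ReceiverWindow:
    max_rcv_buffer: int      # total buffer size
    last_byte_read: int      # last item the application has read
    next_byte_expected: int  # first item not yet received in order

    def advertised_window(self) -> int:
        # Free space = buffer size minus data received in order but not yet read.
        unread = (self.next_byte_expected - 1) - self.last_byte_read
        return self.max_rcv_buffer - unread
```

For the sender in the example that follows (1 to 3 acknowledged, up to 9 sent, window 9), `right_edge()` is 12 and `effective_window()` is 3, i.e. packets 10, 11, and 12.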

Ordering and packet loss

Let's look at those data structures again.

  • The sender

  • The receiver

On the sending side, 1, 2, and 3 have been sent and acknowledged; 4, 5, 6, 7, 8, and 9 have been sent but not yet acknowledged; 10, 11, and 12 have not been sent yet; and 13, 14, and 15 cannot be sent because the receiver has no room for them.

On the receiving side, 1, 2, 3, 4, and 5 have been received and ACKed but not yet read by the application; 6 and 7 are still awaited; 8 and 9 have been received but not yet ACKed.

The current status of the sender and receiver is as follows:

  • 1, 2, 3: no problem; both sides agree.
  • 4, 5: the receiver has sent ACKs, but the sender has not received them yet; they may be lost or still on the way.
  • 6, 7, 8, 9: all have been sent, but only 8 and 9 have arrived, while 6 and 7 have not. The data is out of order, so 8 and 9 are buffered but cannot be ACKed yet.

Suppose the ACK for 4 arrives, but unfortunately the ACK for 5 is lost, and packets 6 and 7 are lost as well. What should we do?

One method is timeout retransmission: a timer is set for every packet that has been sent but not yet ACKed, and if no ACK arrives within a certain time, the packet is sent again. But how should this timeout be chosen? It must be longer than the RTT (round-trip time); otherwise, unnecessary retransmissions will occur. It should not be too long either, because the longer the timeout, the slower the recovery from a loss.

To estimate the round-trip time, TCP samples the RTT and computes a weighted average of the samples, producing a value that keeps changing as network conditions change. Besides the RTT itself, TCP also samples how much the RTT fluctuates, and combines both to compute the retransmission timeout. Because the timeout keeps adapting, this is called the Adaptive Retransmission Algorithm.
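The standard form of this calculation is specified in RFC 6298: a smoothed RTT (SRTT) and an RTT variation (RTTVAR) are updated with gains 1/8 and 1/4, and the timeout is SRTT plus four times RTTVAR. The sketch below follows that recipe with times in milliseconds; the class name is hypothetical, and the clock granularity and the 1-second lower bound are exposed as parameters so the arithmetic is easy to check.

```python
class RtoEstimator:
    """Sketch of the RFC 6298 retransmission-timeout estimator (times in ms)."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation
    K = 4          # weight applied to the variation

    def __init__(self, granularity_ms=0.0, min_rto_ms=1000.0):
        self.granularity = granularity_ms
        self.min_rto = min_rto_ms
        self.srtt = None
        self.rttvar = None
        self.rto = 1000.0  # initial RTO of 1 second before any RTT sample

    def on_rtt_sample(self, r):
        if self.srtt is None:
            # First measurement initializes both estimates.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Update the variation first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = max(self.min_rto,
                       self.srtt + max(self.granularity, self.K * self.rttvar))
        return self.rto
```

With the lower bound disabled (`min_rto_ms=0`), a first 100 ms sample gives an RTO of 300 ms, and a second identical sample shrinks it to 250 ms because the measured variation drops; with the default bound, the RTO stays at 1 second.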

If, after some time, 5, 6, and 7 all time out, they are resent. The receiver sees that it already has 5 and discards the duplicate. It receives 6 and sends an ACK asking for 7 next, but 7 is unfortunately lost again. When 7 times out again and must be retransmitted, TCP's policy is to double the timeout interval: every time a timeout retransmission occurs, the next timeout is set to twice the previous value. Repeated timeouts indicate that the network is in poor shape, so the sender should not keep retransmitting frequently.
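The doubling rule can be sketched as a schedule of successive timeouts. RFC 6298 allows an upper bound on the RTO as long as it is at least 60 seconds; the 60-second cap and the function name `backoff_schedule` here are assumptions chosen for illustration.

```python
def backoff_schedule(initial_rto, retries, max_rto=60.0):
    """Timeouts (seconds) used for successive retransmissions of one packet."""
    schedule = []
    rto = initial_rto
    for _ in range(retries):
        schedule.append(rto)
        # Each timeout doubles the next interval, up to the cap.
        rto = min(rto * 2, max_rto)
    return schedule
```

Starting from 1 second, five retries wait 1, 2, 4, 8, and 16 seconds; with more retries the interval stops growing once it reaches the cap.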

The problem with timeout-triggered retransmission is that the timeout can be relatively long. Is there a faster way?

Yes: fast retransmission. When the receiver gets a segment whose sequence number is larger than the one it expects next, it has detected a gap in the data stream, and it responds by sending duplicate ACKs for the last in-order segment. When the sender receives three duplicate ACKs, it retransmits the missing segment without waiting for its timer to expire.

For example, if the receiver has received 6, 8, and 9 but 7 has not arrived, 7 has probably been lost. The receiver therefore keeps sending ACKs for 6, asking for 7 next. Once the sender receives three such duplicate ACKs, it realizes that 7 has been lost again and retransmits it right away.
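On the sender side, fast retransmission comes down to counting duplicate ACKs. This minimal sketch uses the article's packet-ID convention (an ACK of 6 means 6 and everything before it arrived) and the threshold from RFC 5681: three duplicates, i.e. the fourth ACK carrying the same number, trigger one retransmission of the next packet. The class and attribute names are hypothetical.

```python
DUP_ACK_THRESHOLD = 3  # duplicate ACKs needed before retransmitting


class FastRetransmitSender:
    """Counts duplicate ACKs and records which packets get fast-retransmitted."""

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0
        self.retransmitted = []

    def on_ack(self, ack):
        if ack == self.last_ack:
            # Same ACK again: the receiver is still missing packet ack + 1.
            self.dup_count += 1
            if self.dup_count == DUP_ACK_THRESHOLD:
                # Retransmit once, without waiting for the timer.
                self.retransmitted.append(ack + 1)
        else:
            # New data acknowledged: reset the duplicate counter.
            self.last_ack = ack
            self.dup_count = 0
```

Feeding it ACKs 5, 6, 6, 6 retransmits nothing yet; one more ACK of 6 makes three duplicates and triggers a retransmission of packet 7, and further duplicates do not trigger it again.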

Flow control

Now let’s look at flow control. Each acknowledgment also carries a window size.

Let’s assume the window stays the same, always 9. When the acknowledgment for 4 arrives, the window slides one position to the right, and packet 13 can now be sent as well.

Now suppose the sender is aggressive and sends everything in part three, namely 10, 11, 12, and 13, and then stops; the number of packets that may be sent but have not been is now 0.

When the acknowledgment for packet 5 arrives, the sender's window slides one more position, so one more packet, 14, can be sent.
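The sliding described above can be replayed with a toy Python sender that counts in packet IDs, as the example does. It is an illustrative model, not real TCP: `SlidingWindowSender` is a hypothetical class, and each ACK is assumed to carry the receiver's current window.

```python
class SlidingWindowSender:
    """Toy sender window that counts in packet IDs, as in the article's example."""

    def __init__(self, last_acked, last_sent, window):
        self.last_acked = last_acked  # everything up to here is acknowledged
        self.last_sent = last_sent    # everything up to here has been sent
        self.window = window          # window advertised by the receiver

    def right_edge(self):
        # Highest packet ID the receiver is currently willing to accept.
        return self.last_acked + self.window

    def send_allowed(self):
        # Send every packet between last_sent and the right edge, then stop.
        batch = list(range(self.last_sent + 1, self.right_edge() + 1))
        if batch:
            self.last_sent = batch[-1]
        return batch

    def on_ack(self, ack, window):
        # An ACK slides the left edge and carries a (possibly new) window size.
        self.last_acked = max(self.last_acked, ack)
        self.window = window
```

Starting from `SlidingWindowSender(last_acked=3, last_sent=9, window=9)`, an ACK for 4 with window 9 lets `send_allowed()` release 10, 11, 12, and 13, and an ACK for 5 then releases only 14. If the window shrinks to 8 when 6 is acknowledged, the right edge stays at 14 and nothing new can be sent.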

If the receiver is so slow that its buffer has no free space, it can shrink the window in its acknowledgments, even down to 0, and the sender will temporarily stop sending.

Let’s take an extreme case where the application on the receiving side never reads the data in its buffer. When packet 6 is acknowledged, the window can no longer be 9; it shrinks by one to 8.

If the receiver still doesn't process the data, the window gets smaller and smaller with each acknowledged packet until it reaches 0.

When that happens, the sender periodically sends window probe packets to check whether the window has opened up again. To avoid silly window syndrome when the receiver is slow, the receiver should not tell the sender as soon as a single byte frees up (only for it to be filled again immediately). Instead, while the window is too small it can hold off on updating it, until the free space reaches a certain size or the buffer is half empty.
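The receiver-side rule just described can be sketched as a single decision function, in the spirit of Clark's algorithm as RFC 1122 describes it: keep advertising a zero window until the free space reaches at least the smaller of one MSS (maximum segment size) and half the buffer. The function name and the choice to report exactly 0 until then are simplifying assumptions.

```python
def window_to_advertise(free_space, mss, buffer_size):
    """Simplified receiver-side silly window avoidance.

    Report a window only when it is worth using: at least one full segment,
    or at least half of the receive buffer, whichever is smaller.
    """
    threshold = min(mss, buffer_size // 2)
    if free_space >= threshold:
        return free_space
    # Too small to be useful: keep the window closed for now.
    return 0
```

With a 65535-byte buffer and an MSS of 1460, a single freed byte is not advertised, while 1460 free bytes are; with a tiny 1800-byte buffer, half the buffer (900 bytes) is already enough.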

This is what we call flow control.

Final words

Summary: TCP's ordering and packet-loss handling, connection management, flow control, and so on involve a lot of material, and it takes some time to remember it all.


Daily request for likes

OK everyone, that’s all for this article. If you've read this far, you're a true fan.

Creating content is not easy; your support and recognition are my biggest motivation. See you in the next article.

Search WeChat for “six pulse Excalibur program life” and reply 888, and I'll share a lot of resources with you.