Preface
From application-layer network programming on a small scale to the networks that pervade everyday life on a large scale, we all enjoy the convenience of networking, yet few people ask what foundations it is built on. One of those cornerstones is the TCP protocol. If you only know the “three-way handshake” and the “four-way wave”, you might think that is all there is to TCP, but it is not: those only solve connection setup and teardown. Data transfer is the more important, more difficult, and more complex part of TCP. Looking back at the principles of TCP, you will see how many network complications it has to handle to guarantee reliable data transmission for the upper layers. Here are some simple examples:
- How is data made reliable? By acknowledging everything: connection setup, connection close, and received data are all acknowledged.
- What if the peer cannot receive the data because of network or other problems? Retransmit after a timeout.
- Network conditions keep changing, so how is the timeout chosen? It is computed dynamically from the RTT.
- What if repeated retransmissions themselves congest the network? Slow start, congestion avoidance, fast retransmit, and fast recovery.
- What if the sending rate and the receiving rate do not match? The sliding window.
- What if, during sliding-window transfer, the receiver keeps reporting that it cannot accept any more data, so the sender is not allowed to send? Zero Window Probe (ZWP).
- What if the receiver processes data slowly, so the sender can send only a little data at a time and network utilization drops? The Nagle algorithm.
Each of these small steps condenses numerous algorithms. We may not be able to understand the implementation of every algorithm, but we do need to understand the overall process that TCP implementers follow.
After working through all of this content, you should be able to answer:
What mechanisms does TCP provide to ensure data transmission reliability?
How do the “three-way handshake” and the “four-way wave” of a TCP connection proceed?
How does the state change while a TCP connection is established and closed?
What are the fields in the TCP header and what are they used for?
What is TCP’s sliding window protocol?
What is the mechanism for timeout retransmission?
How is congestion avoided during transmission?
I. Overview
1. Features of TCP connections
- Provides connection-oriented, reliable byte stream service
- It serves the upper application layer and doesn’t care what is being transmitted, whether it is a binary stream or ASCII characters.
2. How to ensure TCP reliability
- Segmentation: TCP splits application data into segments of the size it considers most suitable (unlike UDP, where the length of the datagram produced by the application stays the same)
- Waiting for acknowledgment: after sending a segment, the sender starts a timer and waits for the receiver's acknowledgment. If no acknowledgment arrives in time, the sender retransmits the segment
- Acknowledgment: the receiver sends an acknowledgment when it receives data (not immediately, usually a fraction of a second later)
- Checksum: TCP keeps a checksum over its header and data to detect whether the data changed in transit
- Reordering: segments may arrive out of order, so the receiver reorders them and passes the data to the application in the correct order
- Duplicate discard: the receiver discards duplicate data
- Flow control: each end has a fixed-size buffer, and the sliding window prevents a fast sender from overrunning a slow receiver and losing data
3. TCP header format
3.1 Macro Position
- As data passes down from the application layer through the transport layer and the network layer to the link layer, each layer adds its own header to the packet. Refer to the previous article on the HTTP protocol
- TCP data is encapsulated in IP datagrams
3.2 Header Format
- The TCP header is normally 20 bytes long (not including options)
- Bytes 1-2: source port number
- Bytes 3-4: destination port number
The source port number plus the source IP address, together with the destination port number plus the destination IP address (the IP addresses come from the IP header), uniquely identify a TCP connection. At the programming level this corresponds to a socket.
- Bytes 5-8: 32-bit sequence number. TCP provides a full-duplex service, and each end keeps its own sequence numbers. The sequence number solves the problem of out-of-order packets
How the initial sequence number is generated: it cannot be a fixed value, otherwise sequence numbers from an old connection could be confused with those of a new one when a connection is dropped and re-established. TCP derives the initial sequence number from a clock that increments by one roughly every 4 microseconds and wraps back to 0 after reaching 2^32-1
- Bytes 9-12: 32-bit acknowledgment number. Its value is the sequence number of the last successfully received byte plus 1, and it is valid only when the ACK flag is 1. The acknowledgment number solves the packet loss problem
- Byte 13, first 4 bits: header length, counted in 32-bit words. It is needed because the options field has variable length
- Next 6 bits: reserved
- Next 6 bits: flag bits, which control the various states
- Bytes 15-16: window size, the number of bytes the receiver is willing to accept. It solves the flow control problem
- Bytes 17-18: checksum, computed and stored by the sender and verified by the receiver. It solves the data integrity problem
- Bytes 19-20: urgent pointer
3.3 Description of the flag bits
- URG: when 1, the urgent pointer is valid
- ACK: acknowledgment flag. When 1, the acknowledgment number is valid. It is always 1 once the connection has been established
- PSH: the receiver should pass the data to the application layer as soon as possible
- RST: reset flag, used to reset the connection
- SYN: synchronizes sequence numbers. This bit is 1 when a new connection is being established
- FIN: the sender has finished sending data and wants to close its direction of the connection
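The header layout and flag bits described above can be decoded with a few lines of Python. This is only a minimal sketch: it assumes the classic 20-byte fixed header with the six flag bits listed here, ignores options, and uses a hand-built SYN segment with made-up ports and a placeholder checksum of 0. The names `TcpHeader` and `parse_tcp_header` are invented for this illustration.

```python
import struct
from typing import NamedTuple

# Flag bit masks in the low 6 bits of the 16-bit field that also holds the header length.
FLAGS = {"URG": 0x20, "ACK": 0x10, "PSH": 0x08, "RST": 0x04, "SYN": 0x02, "FIN": 0x01}


class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    header_len: int  # in bytes
    flags: tuple     # names of the flags that are set
    window: int
    checksum: int
    urgent: int


def parse_tcp_header(data: bytes) -> TcpHeader:
    """Decode the fixed 20-byte part of a TCP header (options are ignored)."""
    src, dst, seq, ack, off_flags, window, checksum, urgent = struct.unpack(
        "!HHIIHHHH", data[:20]
    )
    header_len = (off_flags >> 12) * 4  # top 4 bits count 32-bit words
    flags = tuple(name for name, bit in FLAGS.items() if off_flags & bit)
    return TcpHeader(src, dst, seq, ack, header_len, flags, window, checksum, urgent)


# Hypothetical SYN segment: port 50000 -> 80, sequence number 1000, no options.
# The checksum is left at 0 here; a real stack would compute it.
syn = struct.pack(
    "!HHIIHHHH", 50000, 80, 1000, 0, (5 << 12) | FLAGS["SYN"], 65535, 0, 0
)
header = parse_tcp_header(syn)
print(header)
```

The header length field stores 5 here, meaning five 32-bit words, which is the 20-byte header without options.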
3.4 TCP Option Format
- Each option starts with a 1-byte kind field that indicates the option type
- Options with kind 0 and kind 1 occupy only one byte
- Every other kind is followed by a 1-byte len field giving the total length of the option (including the kind and len bytes)
- Kind values of 11,12,13 indicate TCP transactions
3.5 Maximum segment size (MSS)
- The most common option
- The MSS option may appear only in SYN segments (the first and second handshakes)
- It specifies the maximum segment length that the local end can receive
- When a connection is established, each side announces its MSS
- If no MSS is announced, the default value is 536 bytes
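The kind/len option layout and the MSS default can be sketched as a small parser. This is an illustrative sketch, not a full implementation: it assumes MSS is option kind 2 with a 2-byte value, and the function names `parse_tcp_options` and `mss_from_options` are made up for this example.

```python
import struct


def parse_tcp_options(options: bytes) -> list:
    """Split raw option bytes into (kind, value_bytes) pairs."""
    parsed = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:  # end of option list: a single byte, stop parsing
            parsed.append((0, b""))
            break
        if kind == 1:  # no-operation (padding): a single byte
            parsed.append((1, b""))
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("truncated option")
        length = options[i + 1]  # total length, including the kind and len bytes
        if length < 2 or i + length > len(options):
            raise ValueError("bad option length")
        parsed.append((kind, options[i + 2:i + length]))
        i += length
    return parsed


def mss_from_options(options: bytes, default: int = 536) -> int:
    """Return the announced MSS, or the default when no MSS option is present."""
    for kind, value in parse_tcp_options(options):
        if kind == 2 and len(value) == 2:  # assumption: kind 2 carries the MSS
            return struct.unpack("!H", value)[0]
    return default


# Example bytes: MSS = 1460, two padding bytes, end of list.
opts = bytes([2, 4, 0x05, 0xB4, 1, 1, 0])
print(parse_tcp_options(opts))
print(mss_from_options(opts))
```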
II. Connection establishment and release
1. “Three-way handshake” for connection establishment
1.1 Three-way handshake Process
- The client sends a SYN to tell the server that it wants to establish a connection. The segment carries the client's initial sequence number (ISN)
- The server replies with an ACK whose acknowledgment number is the client's ISN + 1. In the same segment it sends its own SYN, which carries the server's own ISN
- The client sends an ACK whose acknowledgment number is the server's ISN + 1, confirming that it received the server's SYN
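The sequence and acknowledgment numbers in the three steps can be traced with a small function. This is a sketch only: the ISN values are arbitrary example inputs, the dictionary layout is invented for illustration, and no real packets are sent.

```python
MOD = 2 ** 32  # sequence numbers are 32 bits wide and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments of the handshake with their seq/ack numbers."""
    return [
        # 1. Client announces its ISN.
        {"from": "client", "flags": "SYN", "seq": client_isn, "ack": None},
        # 2. Server acknowledges client ISN + 1 and announces its own ISN.
        {"from": "server", "flags": "SYN+ACK", "seq": server_isn,
         "ack": (client_isn + 1) % MOD},
        # 3. Client acknowledges server ISN + 1.
        {"from": "client", "flags": "ACK", "seq": (client_isn + 1) % MOD,
         "ack": (server_isn + 1) % MOD},
    ]


for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```

The SYN consumes one sequence number even though it carries no data, which is why each acknowledgment number is the peer's ISN plus 1.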
1.2 Why three handshakes
- TCP connections are full-duplex: data can flow in both directions at the same time
- So both parties must confirm that each of them can both send and receive
- First handshake: shows the server that the client can send
- Second handshake: the ACK shows that the server can receive, and the SYN shows that the server can send
- Third handshake: shows the server that the client can receive
- It is really a four-step exchange of information, but the middle two steps are merged into one handshake
- Four handshakes would be wasteful, and two handshakes cannot guarantee that both parties can send and receive
2. The “four-way wave” for closing a connection
2.1 Why four waves
- Because TCP connections are full-duplex, data can flow in both directions at the same time
- TCP also supports half-close: an end that has finished sending can still receive data
- Therefore each direction must be closed separately, and each close notification must be acknowledged
2.2 Why half-close is supported
- The client needs a way to tell the server that it has finished sending its data
- while still being able to receive data from the server
- Using one half-closed connection is more efficient than using two TCP connections
2.3 Four-way wave process
- The side that closes first (the client in this example) sends a FIN to shut down data transfer in its direction
- On receiving the FIN, the server replies with an ACK whose acknowledgment number is the received sequence number + 1
- When the server has finished sending its own data, it sends a FIN to shut down data transfer in its direction
- The client replies with an ACK to confirm
3. States during connection and close
3.1 State description
- The server is in the LISTEN state while waiting for a client to connect
- The client performs an active open and sends a SYN, entering the SYN_SENT state
- The client receives the SYN and ACK, replies with an ACK, and enters the ESTABLISHED state, ready to transfer data
- After receiving the ACK, the server enters the ESTABLISHED state, ready to transfer data
- The client enters the FIN_WAIT_1 state after sending a FIN
- The server enters the CLOSE_WAIT state when it receives the FIN and sends an ACK
- The client enters the FIN_WAIT_2 state after receiving the ACK
- The server enters the LAST_ACK state after sending its FIN
- After receiving the FIN, the client sends an ACK and enters the TIME_WAIT state
- The server enters the CLOSED state after receiving the ACK
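The state changes listed above can be written as a transition table and replayed for each side. This is a simplified sketch: the event names are invented for this example, the SYN_RCVD state (which the server passes through between LISTEN and ESTABLISHED in the standard TCP state diagram) is added to make the table complete, and special cases such as simultaneous open or close are left out.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "send_syn"): "SYN_SENT",
    ("LISTEN", "recv_syn_send_synack"): "SYN_RCVD",
    ("SYN_SENT", "recv_synack_send_ack"): "ESTABLISHED",
    ("SYN_RCVD", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "send_fin"): "FIN_WAIT_1",
    ("ESTABLISHED", "recv_fin_send_ack"): "CLOSE_WAIT",
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("CLOSE_WAIT", "send_fin"): "LAST_ACK",
    ("FIN_WAIT_2", "recv_fin_send_ack"): "TIME_WAIT",
    ("LAST_ACK", "recv_ack"): "CLOSED",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
}


def walk(start: str, events: list) -> list:
    """Return every state visited while applying the events in order."""
    states = [start]
    for event in events:
        states.append(TRANSITIONS[(states[-1], event)])
    return states


client = walk("CLOSED", ["send_syn", "recv_synack_send_ack", "send_fin",
                         "recv_ack", "recv_fin_send_ack", "timeout_2msl"])
server = walk("CLOSED", ["passive_open", "recv_syn_send_synack", "recv_ack",
                         "recv_fin_send_ack", "send_fin", "recv_ack"])
print(" -> ".join(client))
print(" -> ".join(server))
```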
3.2 The TIME_WAIT state
- MSL = Maximum Segment Lifetime. Its value depends on the TCP implementation; common values are 30 seconds, 1 minute, and 2 minutes. On Linux it is 30 seconds
- TIME_WAIT is the state the actively closing side enters after sending the final ACK
- This state must last for 2MSL
3.2.1 Why is this necessary?
- Imagine that the final ACK is lost and the other side never receives it
- In that case, the other side retransmits its FIN
- The wait gives the actively closing side the chance to receive that FIN and resend the ACK
- Summary: the wait leaves enough time for the final ACK to reach the other side. It also ensures that delayed segments from this connection cannot interfere with later connections (some routers buffer packets)
3.2.2 What are the consequences?
- During the 2MSL wait, the socket pair that defines the connection (IP addresses + ports) cannot be reused
- When Linux reports "too many open files", check for code that creates a large number of socket connections, because they are not released immediately after being closed
- A client usually does not specify its local port when connecting to a server. If it did, closing the client and restarting it immediately would in theory fail because the port is still in use. Likewise, a server that actively closes and is restarted within 2MSL reports that the port is in use
- With many concurrent short-lived connections, a large number of TIME_WAIT states accumulate. The tcp_tw_reuse and tcp_tw_recycle parameters can relieve this, but they deviate from the TCP specification and are risky
- On the server side, you can use keep-alive so that the client is the one that actively closes the connection
4. Reset segments
In general, TCP sends a reset whenever a segment arrives that does not appear correct for the connection it refers to. The RST bit in the header is used for this “reset”. Typical cases include the following
- A connection request arrives for a port that nothing is listening on
- Abortive release: a connection is aborted by sending an RST instead of a FIN
5. Simultaneous open
- Both applications perform an active open at the same time, which is called “simultaneous open”
- This rarely happens
- Both ends send a SYN and enter the SYN_SENT state
- Only one connection is established, not two
- Four segments are exchanged, one more than in the normal three-way handshake
6. Simultaneous close
- Both sides perform an active close at the same time
- Four segments are exchanged
- The state transitions differ from a normal close: both ends pass through FIN_WAIT_1, CLOSING, and TIME_WAIT
7. How the server handles concurrent connection requests
- Each listening endpoint has a fixed-length queue of waiting connections (the length is called the “backlog”, and in most cases it is 5)
- The connections in this queue have completed the three-way handshake but have not yet been accepted by the application layer (the application layer learns about a connection only after the final ACK has been received)
- When the application layer accepts a connection, it is removed from the queue
- When a new connection request arrives, TCP checks the queue to decide whether to accept it
- Backlog value: the maximum number of connections that TCP has accepted for one listening endpoint and that are waiting for the application layer to accept them. It has nothing to do with the maximum number of connections the system allows or the number of concurrent requests the server can handle
III. Data transmission
1. Classification of data transmitted by TCP
- Bulk data transfer: large volume, and the segments are usually full-size
- Interactive data transfer: small volume, and the segments are small. Large numbers of small segments add to congestion on a WAN
- TCP handles both types of data, which have different characteristics and need different transmission techniques
2. Transmission technology of interactive data
2.1 Delayed acknowledgment
- Concept: when TCP receives data, it does not send the ACK immediately but slightly later
- Purpose: the ACK can be sent together with data travelling in the same direction, which reduces overhead
- Feature: the receiver does not have to acknowledge every segment. ACKs are cumulative: an ACK means that the receiver has correctly received all bytes up to the acknowledgment number minus 1
- Delay: most implementations use 200 ms. It must not exceed 500 ms
2.2 Nagle algorithm
- Problem solved: congestion on a WAN caused by tiny segments
- Core idea: reduce the number of small segments sent across the WAN
- Principle: a TCP connection may have at most one small segment outstanding that has not yet been acknowledged. No other small segment may be sent until that acknowledgment arrives. Meanwhile TCP collects the small pieces of data and sends them as a single segment when the acknowledgment arrives
- Advantage: it is self-adapting. The faster acknowledgments arrive, the faster data is sent; the slower they arrive, the fewer segments are sent
- Usage note: the algorithm is rarely needed on a LAN, and it should be disabled in some special scenarios
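The Nagle rule can be illustrated with a toy sender. This is a deliberately simplified sketch rather than how a real TCP stack is written: the class `NagleSender` is invented for this example, each ACK is assumed to cover everything sent so far, and timers and windows are ignored.

```python
class NagleSender:
    """Toy model: full-size segments go out at once, small ones wait for an ACK."""

    def __init__(self, mss: int = 1460):
        self.mss = mss
        self.buffer = b""     # data written by the application but not yet sent
        self.unacked = False  # is any sent data still waiting for its ACK?
        self.sent = []        # segments actually put on the wire, in order

    def write(self, data: bytes) -> None:
        self.buffer += data
        self._try_send()

    def on_ack(self) -> None:
        # Simplification: one ACK acknowledges everything sent so far.
        self.unacked = False
        self._try_send()

    def _try_send(self) -> None:
        # Full-size segments may always be sent.
        while len(self.buffer) >= self.mss:
            self._send(self.buffer[:self.mss])
            self.buffer = self.buffer[self.mss:]
        # A small segment is sent only when nothing is awaiting an ACK.
        if self.buffer and not self.unacked:
            self._send(self.buffer)
            self.buffer = b""

    def _send(self, segment: bytes) -> None:
        self.sent.append(segment)
        self.unacked = True


sender = NagleSender()
for ch in (b"h", b"e", b"l", b"l", b"o"):
    sender.write(ch)   # only the first byte goes out immediately
sender.on_ack()        # the ACK releases the four bytes collected meanwhile
print(sender.sent)
```

Five one-byte writes produce two segments instead of five, which is the reduction in small segments that the algorithm is after.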
3. Transmission of block data
- It mainly uses the sliding window protocol
IV. Sliding window protocol
1. Overview
- Problem solved: keeping transmission reliable, and handling out-of-order packets, when the sender's and receiver's rates do not match
- Mechanism: the receiver tells the sender how much data it can accept, based on its current free buffer space. The sender sends data according to the receiver's processing capacity. This coordination prevents the receiver from being overwhelmed
- Window size: the value that the receiver reports to the sender is called the window size
2. Data structure of the TCP buffer
- The receiver:
- LastByteRead: the position up to which the application has read the buffer
- NextByteExpected: the end of the contiguous data received so far (the next byte the receiver expects)
- LastByteRcvd: the position of the last byte received
- Gaps in between: data that has not arrived yet
- The sender:
- LastByteAcked: the position up to which the receiver has acknowledged, meaning everything before it was delivered successfully
- LastByteSent: the position up to which data has been sent but not yet acknowledged
- LastByteWritten: the position up to which the upper-layer application has written
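These pointers are enough to compute how much the receiver can advertise and how much the sender may still send. The formulas below follow a common textbook formulation and are an assumption here, since the article does not spell them out; `max_rcv_buffer` and the function names are invented for this sketch.

```python
def advertised_window(max_rcv_buffer: int, next_byte_expected: int,
                      last_byte_read: int) -> int:
    """Free receive-buffer space: total size minus bytes received but not yet read."""
    buffered = (next_byte_expected - 1) - last_byte_read
    return max_rcv_buffer - buffered


def effective_window(advertised: int, last_byte_sent: int,
                     last_byte_acked: int) -> int:
    """How many more bytes the sender may send: the advertised window minus data in flight."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, advertised - in_flight)


# Receiver: 4096-byte buffer, bytes up to 1000 received in order, application has read up to 500.
adv = advertised_window(max_rcv_buffer=4096, next_byte_expected=1001, last_byte_read=500)
# Sender: bytes up to 3000 sent, bytes up to 2000 acknowledged.
eff = effective_window(adv, last_byte_sent=3000, last_byte_acked=2000)
print(adv, eff)
```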
3. Schematic diagram of the sliding window
3.1 Initial schematic diagram
- The black box represents the sliding window
- #1: data that has been sent and acknowledged
- #2: data that has been sent but not yet acknowledged
- #3: data inside the window that has not been sent yet (the receiver has room for it)
- #4: data outside the window (the receiver has no room for it)
3.2 Schematic diagram of the sliding process
- After the ACK for byte 36 is received, the window slides forward and bytes 46-51 are sent
4. Congestion window
- Problem solved: the sender sends too fast and congests the intermediate routers
- Mechanism: the sender maintains a congestion window (cwnd). Each time it receives an ACK, the window grows by 1 segment. When sending, the sender uses the smaller of the congestion window and the window advertised by the receiver
- It acts as flow control imposed by the sender
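The interaction between the congestion window and the receiver's window can be sketched round by round. This is a rough model under stated assumptions: windows are counted in whole segments, every segment sent in a round is acknowledged individually with no loss, and the slow start threshold and congestion avoidance phase are ignored. The function name is invented for this example.

```python
def slow_start_rounds(rwnd_segments: int, rounds: int, initial_cwnd: int = 1) -> list:
    """Return how many segments are sent in each round trip."""
    cwnd = initial_cwnd
    history = []
    for _ in range(rounds):
        # The sender may use only the smaller of the two windows.
        in_flight = min(cwnd, rwnd_segments)
        history.append(in_flight)
        # Each segment sent is acknowledged, and each ACK adds 1 to cwnd.
        cwnd += in_flight
    return history


print(slow_start_rounds(rwnd_segments=20, rounds=6))
```

With one ACK per segment the congestion window doubles every round trip, until the receiver's advertised window becomes the limiting factor.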
5. Problems caused by the sliding window
5.1 Zero window
- How it occurs: the receiver processes data slowly while the sender sends quickly, so the window size gradually shrinks to 0
- How to solve it: Zero Window Probe (ZWP). The sender sends a ZWP packet to the receiver, and the receiver replies with an ACK carrying its current window size
5.2 Silly window syndrome
- How it happens: the receiver is too busy to drain its buffer, so the window available to the sender gets smaller and smaller. In the end the sender is allowed to send only a few bytes at a time
- Disadvantage: the data is much smaller than the TCP and IP headers, so network utilization is very low
- How to fix it: avoid reacting to small window sizes
- Sender: the Nagle algorithm described earlier
- Receiver: if the window is smaller than a certain value, advertise a window of 0 in the ACK to stop the sender, and wait until the window has grown
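The receiver-side rule can be sketched as a single function. The threshold used here, the smaller of one MSS and half the receive buffer, is an assumption taken from the usual description of this technique; the article only says "a certain value". The function name is invented for this example.

```python
def window_to_advertise(free_space: int, buffer_size: int, mss: int) -> int:
    """Advertise 0 instead of a tiny window; otherwise advertise the real free space."""
    threshold = min(mss, buffer_size // 2)  # assumed threshold, see lead-in
    return free_space if free_space >= threshold else 0


print(window_to_advertise(free_space=100, buffer_size=8192, mss=1460))   # too small
print(window_to_advertise(free_space=1460, buffer_size=8192, mss=1460))  # large enough
```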
V. Timeout and retransmission
1. Overview
- TCP provides a reliable transport layer by means of acknowledgments
- But both data and acknowledgments can be lost
- TCP solves this by setting a timer when it sends data
- If the timer expires before the acknowledgment arrives, the data is retransmitted
2. Timers managed by TCP
- Retransmission timer: waits for the acknowledgment
- Persist timer: keeps window size information flowing
- Keepalive timer: detects whether the other end of an idle connection has crashed or rebooted
- 2MSL timer: measures the time a connection spends in the TIME_WAIT state
3. Timeout retransmission mechanism
3.1 Background
- An ACK from the receiver acknowledges only the last contiguous packet
- For example, the sender sends 1, 2, 3, 4, 5. The receiver gets 1 and 2 and replies with ack 3, and then gets 4 (3 has not arrived yet). TCP cannot skip 3 and acknowledge 4 directly, otherwise the sender would think that 3 had been received as well. What would you do at this point, and what does TCP do?
3.1 Passive waiting: timeout retransmission
- The intuitive approach: the receiver does nothing, and the sender retransmits after its timer expires
- Disadvantage: the sender does not know whether to resend only 3 or all of 3, 4, 5
- If the sender resends only 3: it saves bandwidth but is slow
- If the sender resends 3, 4, 5: it is fast but wastes bandwidth
- Either way, the sender is passively waiting for a timeout, which can be very long. So TCP does not rely on this approach alone
3.2 Active fast retransmission mechanism
3.2.1 Overview
- The algorithm is called Fast Retransmit
- Retransmission is driven by data rather than by time
3.2.2 Implementation principle
- If packets arrive out of order, the receiver keeps acknowledging up to the last in-order packet, pointing at the packet that may have been lost
- If the sender receives three consecutive duplicate ACKs, it retransmits immediately instead of waiting for the timeout
- In the diagram, data 1, 2, 3, 4, 5 is sent
- Data 1 arrives and the receiver sends ack 2
- Data 2 does not arrive for some reason
- When 3 arrives, the receiver neither sends ack 4 nor waits. It sends ack 2 again
- When 4 and 5 arrive, it again sends ack 2 each time
- The sender receives three duplicate ack 2 and retransmits 2
- When 2 arrives, the receiver already holds 3, 4, 5, so it sends ack 6 directly
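The walk-through above can be replayed with a small simulation. This is a sketch under simplifying assumptions: packets are numbered 1, 2, 3, ... instead of using byte sequence numbers, and a duplicate is counted as a repeat of the previously seen ACK, so the retransmission fires on the third repeat. Both function names are invented for this example.

```python
def receiver_acks(arrivals: list) -> list:
    """Return the cumulative ACK (next packet expected) sent after each arrival."""
    received = set()
    next_expected = 1
    acks = []
    for packet in arrivals:
        received.add(packet)
        while next_expected in received:  # advance over everything now contiguous
            next_expected += 1
        acks.append(next_expected)
    return acks


def fast_retransmits(acks: list, dup_threshold: int = 3) -> list:
    """Return the packets the sender retransmits without waiting for a timeout."""
    retransmitted = []
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                retransmitted.append(ack)  # the ACK number names the missing packet
        else:
            last_ack, dup_count = ack, 0
    return retransmitted


acks = receiver_acks([1, 3, 4, 5])      # packet 2 was lost on the way
print(acks)                             # the receiver keeps asking for 2
print(fast_retransmits(acks))           # the sender resends 2
print(receiver_acks([1, 3, 4, 5, 2]))   # once 2 arrives, the ACK jumps to 6
```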
3.2.3 Advantages and disadvantages of fast retransmit
- It solves the problem of passively waiting for a timeout
- It does not solve the earlier problem of whether to retransmit only one packet or all of them
- In the example above, should the sender retransmit only 2, or 2, 3, 4, 5? It cannot tell, because it does not know which packets triggered the duplicate ack 2s
3.3 The SACK method
3.3.1 Overview
- To address the shortcoming of fast retransmit, a better retransmission strategy called SACK (Selective Acknowledgment) was proposed
- It builds on fast retransmit and adds a SACK option to the TCP header
- Problem solved: which packets the sender should retransmit
3.3.2 Implementation principle
- SACK reports ranges of sequence numbers that indicate which data has been received
- On Linux the feature is controlled by the tcp_sack parameter and has been enabled by default since Linux 2.4
- SACK is only advisory, and the sender cannot rely on it completely. It still depends on ACKs and timeouts
3.3.3 Duplicate SACK (D-SACK)
- The SACK ranges can also be used to tell the sender which data has been received more than once
- This lets the sender know whether the data packet was lost or the returning ACK was lost
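How SACK ranges tell the sender exactly what is missing can be sketched as follows. This is an illustration only: packets are numbered 1, 2, 3, ... rather than by byte sequence number, each block is written as (left edge, right edge) with the right edge being one past the last packet received, and the function names are invented for this example.

```python
def sack_blocks(received: set, cumulative_ack: int) -> list:
    """Group the packets received beyond the cumulative ACK into contiguous ranges."""
    out_of_order = sorted(p for p in received if p > cumulative_ack)
    blocks = []
    for packet in out_of_order:
        if blocks and packet == blocks[-1][1]:
            blocks[-1][1] = packet + 1          # extend the current range
        else:
            blocks.append([packet, packet + 1])  # start a new range
    return [tuple(block) for block in blocks]


def missing_packets(cumulative_ack: int, blocks: list) -> list:
    """What the sender can infer it must retransmit: the holes between the ranges."""
    missing = []
    cursor = cumulative_ack
    for left, right in blocks:
        missing.extend(range(cursor, left))
        cursor = right
    return missing


# The receiver holds 1, 3, 4, 6; it still expects 2, so the cumulative ACK is 2.
blocks = sack_blocks({1, 3, 4, 6}, cumulative_ack=2)
print(blocks)
print(missing_packets(2, blocks))
```

With this information the sender can resend only packets 2 and 5, instead of guessing between one packet and everything after it.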
4. Determine the timeout period
4.1 Background
- Both routes and network traffic change over time
- Therefore the timeout cannot be a fixed value
- Timeout too long: retransmission is slow, so efficiency and performance suffer
- Timeout too short: data is retransmitted although it was not lost, which congests the network and leads to more timeouts and more retransmissions
- TCP tracks these changes and adjusts the retransmission timeout (RTO) dynamically
4.2 How the timeout changes dynamically
- Each retransmission interval is twice the previous one, up to a maximum of 64 s. This is called “exponential backoff”
- Typically about 9 minutes pass between the first transmission and the point where TCP gives up retransmitting
- The timeout itself is computed dynamically from past round-trip time (RTT) measurements
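The doubling rule with its 64 s ceiling is easy to reproduce. In this sketch the first interval of 1.5 s and the number of retries are arbitrary example values, not figures from the article, and the function name is invented for illustration.

```python
def backoff_intervals(first_interval: float, retries: int, cap: float = 64.0) -> list:
    """Return the waiting time before each retransmission, doubling up to the cap."""
    intervals = []
    interval = first_interval
    for _ in range(retries):
        intervals.append(min(interval, cap))
        interval *= 2
    return intervals


print(backoff_intervals(first_interval=1.5, retries=8))
```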
4.3 Calculation of round trip time (RTT)
- It is not simply the difference between the time the ACK arrives and the time the data was sent, because retransmissions, congestion, and other factors vary
- Instead, TCP takes a number of samples and estimates from them
- TCP uses the following:
- A smoothed RTT estimator
- A smoothed mean deviation estimator
4.4 Calculating the retransmission timeout
- Measure the round-trip time (RTT) and save the measurement
- Maintain a smoothed RTT estimator and a smoothed mean deviation estimator from the measurements
- Calculate the next retransmission timeout from these two estimators
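The three steps can be sketched as a small estimator class. The constants and formulas below (gains of 1/8 and 1/4, and a timeout of the smoothed RTT plus four times the deviation) follow the widely used Jacobson-style calculation and are an assumption here, because the article does not give them. Lower bounds on the timeout and clock granularity are left out, and the class name is invented for this example.

```python
class RtoEstimator:
    """Keeps a smoothed RTT and a smoothed mean deviation, and derives the RTO."""

    ALPHA = 1 / 8  # assumed gain for the smoothed RTT
    BETA = 1 / 4   # assumed gain for the smoothed mean deviation

    def __init__(self):
        self.srtt = None    # smoothed RTT estimator
        self.rttvar = None  # smoothed mean deviation estimator

    def update(self, rtt: float) -> float:
        """Feed one RTT measurement (seconds) and return the new timeout."""
        if self.srtt is None:
            # First measurement initialises both estimators.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Update the deviation first, using the previous smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return self.srtt + 4 * self.rttvar


estimator = RtoEstimator()
for sample in (1.0, 1.0, 2.0):
    print(round(estimator.update(sample), 4))
```

A stable RTT makes the deviation term shrink and the timeout tighten, while a sudden jump in the measured RTT widens it again.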
5. Congestion caused by timeout retransmission
5.1 Why retransmission causes congestion
- When network delay suddenly increases, TCP retransmits data
- But excessive retransmission adds to the load on the network, causing more delay and more packet loss: a vicious cycle
- This is the congestion problem that TCP has to handle
5.2 Congestion control algorithms
- Slow start: limits the rate at which new packets enter the network at the start of a transfer
- Congestion avoidance: a way of dealing with lost packets
- Fast retransmit
- Fast recovery
VI. Other timers
1. Persist timer
1.1 Why the persist timer exists
- After advertising a window of 0, the receiver announces a reopened window with an ACK that carries no data
- But what if this ACK is lost? Both sides could end up waiting for each other forever
- With the persist timer, the sender periodically asks the receiver whether the window has grown. The segments it sends are called window probes
1.2 When the persist timer starts
- When the sender is told that the receiver's window size is 0
1.3 Similarities to and differences from timeout retransmission
- Same: the retransmission intervals are the same
- Different: window probes never give up. They continue until the window opens or the process is closed, whereas timeout retransmission gives up after a certain time
2. Keepalive timer
2.1 Purpose of the keepalive timer
- When no data is flowing over a TCP connection, how can the server detect whether the client is still alive? The keepalive timer provides this check
References
- TCP/IP Volume 1: Protocols
- Coolshell. Cn/articles / 11…
- Coolshell. Cn/articles / 11…