Article directory

    • Preface
    • Chapter 1 Overview
      • Layering
      • TCP/IP layer
      • The domain name system
      • Encapsulation
      • Demultiplexing
      • Client-server model
      • The port number
    • IP: Internet Protocol
      • Introduction
      • The IP header
    • The Ping program
      • Introduction
    • UDP: User Datagram Protocol
      • Introduction
      • UDP checksum
      • IP fragmentation
      • Maximum UDP datagram length
      • UDP server design
    • TCP: Transmission Control Protocol
      • TCP service
        • TCP provides reliability in the following ways:
      • The TCP header
    • Establishing and terminating a TCP connection
      • Introduction
      • Connection establishment and termination
        • Three-way handshake
        • Four-way handshake
        • Connection establishment timeout
      • TCP half-close
      • TCP state transition diagram
      • 2MSL wait state
      • TCP Server Design
      • Summary
    • TCP timeout and retransmission
      • Congestion avoidance algorithm
      • Fast retransmit and fast recovery algorithms
      • Repacketization
    • TCP keepalive timer

Preface

Before I knew it, I was a junior; before you know it, you're looking for a summer internship. Review the old and you'll learn the new. (After reviewing data structures for two days, I still prefer this subject.) So here we are.

The reference for this article is the work of W. Richard Stevens, my hero.

Stevens died of illness in 1999 (he was 48, born in 1951) and left us seven books, in reverse chronological order:

UNIX Network Programming, Volume 2, Second Edition: Interprocess Communications, Prentice Hall, 1999.

UNIX Network Programming, Volume 1, Second Edition: Networking APIs: Sockets and XTI, Prentice Hall, 1998.
 
TCP/IP Illustrated, Volume 3: TCP for Transactions, HTTP, NNTP,and the UNIX Domain Protocols, Addison-Wesley, 1996.
 
TCP/IP Illustrated, Volume 2: The Implementation, Addison-Wesley, 1995.
 
TCP/IP Illustrated, Volume 1: The Protocols, Addison-Wesley, 1994.
 
Advanced Programming in the UNIX Environment, Addison-Wesley, 1992.
 
UNIX Network Programming, Prentice Hall, 1990.

By their familiar short names:

UNP (UNIX Network Programming, Prentice Hall, 1990.)
UNP Volume 1 (UNIX Network Programming, Volume 1, Second Edition: Networking APIs: Sockets and XTI, Prentice Hall, 1998.)
UNP Volume 2 (UNIX Network Programming, Volume 2, Second Edition: Interprocess Communications, Prentice Hall, 1999.)
TCP/IP Illustrated, Volume 1: The Protocols, Addison-Wesley, 1994.
TCP/IP Illustrated, Volume 2: The Implementation, Addison-Wesley, 1995.
TCP/IP Illustrated, Volume 3: TCP for Transactions, HTTP, NNTP, and the UNIX Domain Protocols, Addison-Wesley, 1996.

APUE (Advanced Programming in the UNIX Environment, Addison-Wesley, 1992.)

Chapter 1 Overview

Many different manufacturers make various models of computers that run entirely different operating systems, but the TCP/IP family of protocols allows them to communicate with each other. This is surprising, because it has gone far beyond what was originally envisaged.

Layering

Network protocols are usually developed in layers, with each layer responsible for a different facet of the communication. A protocol family such as TCP/IP is a combination of protocols at different layers. TCP/IP is normally considered a four-layer system, as shown in Figure 1-1.

TCP provides highly reliable data communication between two hosts. It does this by dividing the data handed to it by the application into chunks of an appropriate size for the network layer below, acknowledging the packets it receives, and setting timeouts to make sure the other end acknowledges the packets it sends. Because the transport layer provides this reliable end-to-end communication, the application layer can ignore all these details. UDP, on the other hand, provides a much simpler service to the application layer: it sends packets called datagrams from one host to the other, with no guarantee that a datagram reaches the other end. Any required reliability must be added by the application layer. The two transport-layer protocols suit different applications, as we will see later.

In the TCP/IP protocol family, the network-layer protocol IP provides an unreliable service: it moves a packet from source to destination as quickly as possible, but provides no guarantee of delivery. TCP, in contrast, provides a reliable transport layer on top of the unreliable IP layer. To provide this service, TCP uses mechanisms such as timeout retransmission and end-to-end acknowledgements. The transport layer and the network layer are thus responsible for different functions.


TCP/IP layer

TCP and UDP are two of the most well-known transport-layer protocols, and both use IP as the network layer protocol. But unlike TCP, UDP is unreliable and does not guarantee that datagrams will reach their final destination safely and unerringly.

The domain name system

Although an IP address identifies a network interface on a host and can be used to reach it, people prefer host names. In the TCP/IP world, the Domain Name System (DNS) is a distributed database that provides mappings between IP addresses and host names. For now it is enough to know that any application can call a standard library function to look up the IP address of a host with a given name. The system also provides the inverse function: given an IP address, look up the corresponding host name. Most applications that accept a host name as a parameter also accept an IP address.
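Where the sockets API is available, these are the standard library calls the paragraph alludes to. A minimal Python sketch, using localhost so it needs no network access (the functions are Python's wrappers for the C gethostbyname/gethostbyaddr):

```python
import socket

# Forward lookup: host name -> IP address.
addr = socket.gethostbyname("localhost")
print(addr)  # typically 127.0.0.1

# Inverse lookup: IP address -> host name. Raises socket.herror when the
# address has no reverse (PTR) mapping.
name, aliases, addresses = socket.gethostbyaddr(addr)
print(name)
```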

Encapsulation

When an application sends data with TCP, the data is pushed down the protocol stack, passing through each layer in turn, until it is sent onto the network as a stream of bits. Each layer adds header information (and sometimes trailer information) to the data it receives, as shown in Figure 1-7.

Demultiplexing

When the destination host receives an Ethernet frame, the data starts climbing the protocol stack from the bottom, and the header added by each protocol layer is removed along the way. The protocol at each layer examines the protocol identifier in its header to determine which upper-layer protocol should receive the data. This process is called demultiplexing, and Figure 1-8 shows how it occurs.

The positioning of these protocol boxes in the layered model is not perfect in every case.

Client-server model

Most network applications are written assuming a client on one side and a server on the other, with the server providing some defined service to its clients. Such services fall into two types: iterative and concurrent. An iterative server interacts with clients through the following steps:

I1. Wait for a client request to arrive.
I2. Process the client request.
I3. Send the response to the client that sent the request.
I4. Return to step I1.

The problem with an iterative server is step I2: while it is handling one request, it cannot serve any other client.

A concurrent server, by contrast, takes the following steps:

C1. Wait for a client request to arrive.
C2. Start a new server to handle this client's request. This may be a new process, task, or thread, depending on what the underlying operating system supports. The new server handles the client's entire request; when processing is complete, the new server terminates.
C3. Return to step C1.

The advantage of a concurrent server is that it simply spawns other servers to handle the client requests, so each client effectively has its own server. Provided the operating system allows multitasking, multiple clients can be served simultaneously.

In general, TCP servers are concurrent and UDP servers are iterative, but there are a few exceptions.
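The C1-C3 loop above can be sketched with threads, one of the options the text names. This is a hedged illustration rather than the book's code; the echo service and port choice are my own:

```python
import socket
import threading

def handle(conn):
    # Step C2's "new server": serve exactly one client, then terminate.
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:              # client closed its end
                break
            conn.sendall(data)        # echo the request back as the reply

def accept_loop(srv):
    while True:                       # step C1: wait for a client request
        conn, _addr = srv.accept()
        # Step C2/C3: hand the client to a new thread, then loop right back.
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))            # port 0: let the OS pick a free port
srv.listen()
threading.Thread(target=accept_loop, args=(srv,), daemon=True).start()
```

Because accept() and each client handler run in different threads, a slow client inside handle() never blocks the accept loop; that is exactly the weakness of the iterative server that step I2 exposes.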

The port number

TCP and UDP identify applications with 16-bit port numbers. How are these port numbers chosen? Servers are normally known by their well-known port numbers. For example, in every TCP/IP implementation the FTP server's TCP port is 21, every Telnet server's TCP port is 23, and every TFTP (Trivial File Transfer Protocol) server's UDP port is 69. Services that any TCP/IP implementation may provide use well-known port numbers between 1 and 1023. These well-known port numbers are managed by the Internet Assigned Numbers Authority (IANA).

Until 1992, the well-known port numbers were in the range 1 through 255. Ports 256 through 1023 were typically used by Unix systems for Unix-specific services, that is, services found on Unix systems but probably not on other operating systems. IANA now manages all port numbers between 1 and 1023.

A client usually does not care what port number it uses, as long as the number is unique on its host. Client port numbers are called ephemeral (that is, short-lived) ports, because a client's port typically exists only while the user is running that client, whereas a server keeps providing its service as long as its host is up.

Most TCP/IP implementations assign ephemeral ports between 1024 and 5000. Port numbers above 5000 are intended for other servers (services that are not well known across the Internet). We will see many examples of such ephemeral ports later.
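A quick way to watch the ephemeral assignment happen, sketched in Python. Note the exact range is OS-dependent; the 1024-5000 range described here reflects older BSD stacks, and modern systems usually pick higher numbers (e.g. 32768-60999 on Linux):

```python
import socket

# A client does not normally bind() a port itself; the OS assigns an
# ephemeral one during connect(), and getsockname() reveals which.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))      # a throwaway local server for the demo
srv.listen()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())
host, port = cli.getsockname()  # the ephemeral port the OS chose
print(port)
```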


IP: Internet Protocol

Introduction

IP is the core protocol of the TCP/IP protocol family. All TCP, UDP, ICMP, and IGMP data is transmitted in IP datagrams. Many newcomers to TCP/IP are surprised to learn that IP provides an unreliable, connectionless datagram delivery service.

Unreliable means that IP does not guarantee an IP datagram will reach its destination; IP provides only a best-effort delivery service. If something goes wrong, such as a router temporarily running out of buffers, IP has a simple error-handling policy: discard the datagram and send an ICMP message back to the source. Any required reliability must be provided by an upper layer (such as TCP).

The term connectionless means that IP maintains no state information about successive datagrams; each datagram is handled independently of the others. This also means IP datagrams can arrive out of order: if a source sends two consecutive datagrams (first A, then B) to the same destination, each is routed independently and may take a different path, so B may arrive before A.


The IP header

Figure 3-1 shows the format of an IP datagram. A normal IP header is 20 bytes, unless an option field is included.

Consider the header in Figure 3-1. The most significant bit is on the left and is numbered 0; the least significant bit is on the right and is numbered 31. The four bytes of each 32-bit value are transmitted in this order: bits 0 to 7 first, then bits 8 to 15, then bits 16 to 23, and finally bits 24 to 31. This transmission order is called big-endian byte order. Since all binary integers in TCP/IP headers must be in this order when transmitted over the network, it is also called network byte order. Machines that store binary integers in other formats, such as little-endian, must convert header values to network byte order before transmitting the data.
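Python's struct module makes the big-endian rule easy to check; its "!" format prefix means network byte order. A small sketch:

```python
import socket
import struct

# Network byte order is big endian: the most significant byte travels first.
packed = struct.pack("!I", 0x01020304)
print(packed)  # b'\x01\x02\x03\x04' -- the 0x01 byte is transmitted first

# htons() converts a 16-bit value from host to network byte order; packing
# its result in *native* order ("=") yields the same bytes as packing the
# original value in network order ("!"), on either kind of machine.
assert struct.pack("=H", socket.htons(80)) == struct.pack("!H", 80)
```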

The current protocol version is 4, so IP is sometimes referred to as IPv4.

Although an IP datagram can in principle be up to 65535 bytes long, most link layers will fragment it. Furthermore, a host is not required to accept a datagram larger than 576 bytes. Because TCP splits user data into pieces itself, this limit normally does not affect TCP. In fact, most current implementations (especially those supporting the Network File System, NFS) allow IP datagrams larger than 8192 bytes.


The Ping program

Introduction

The name "ping" comes from sonar ranging. The ping program was written by Mike Muuss to test whether another host is reachable. The program sends an ICMP echo request to the host and waits for an ICMP echo reply.

Generally speaking, if you cannot ping a host, you cannot Telnet or FTP to that host. Conversely, if you can’t Telnet to a host, you can usually use the ping program to determine what the problem is. The ping program also measures the round-trip time to the host, indicating “how far away” it is.


UDP: User Datagram Protocol

Introduction

UDP is a simple transport-layer protocol for datagrams: each output operation of a process produces exactly one UDP datagram, which is assembled into an IP datagram to be sent. Unlike stream-oriented protocols such as TCP, the total data generated by an application may have little to do with the individual IP datagrams that are actually sent.

Although independent of each other, TCP and UDP usually use the same port number when both protocols provide a given well-known service. This is purely for convenience and is not a requirement of the protocols. The UDP length field is the length of the UDP header plus the UDP data, in bytes. The minimum value of this field is 8 (sending a 0-byte UDP datagram is legal). This UDP length is redundant: the IP datagram length is the full length of the datagram, so the UDP datagram length is simply the full length minus the length of the IP header (given by the header-length field).
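The "each output operation produces exactly one UDP datagram" behavior is easy to see with two UDP sockets on the loopback interface. A sketch (loopback is used so the datagrams are not actually at risk of loss):

```python
import socket

# Each sendto() produces exactly one UDP datagram, and each recvfrom()
# returns exactly one, so message boundaries are preserved -- unlike a TCP
# byte stream, which could deliver b"firstsecond" in a single read.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
dest = rx.getsockname()

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"first", dest)
tx.sendto(b"second", dest)

msg1 = rx.recvfrom(4096)[0]
msg2 = rx.recvfrom(4096)[0]
print(msg1, msg2)  # b'first' b'second'
```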

UDP checksum

The UDP checksum covers the UDP header and the UDP data. Recall that the IP header checksum covers only the IP header, not any of the data in the IP datagram. Both UDP and TCP carry checksums in their headers that cover their header and their data. The UDP checksum is optional, while the TCP checksum is mandatory.

If the sender computed a checksum and the receiver detects an error in it, the UDP datagram is silently discarded. No error message is generated (the same happens when the IP layer detects an IP header checksum error). The UDP checksum is an end-to-end checksum: it is computed by the sender and verified by the receiver, in order to detect any change to the UDP header or data anywhere between the two.

Although the UDP checksum is optional, it should always be enabled. In the 1980s some computer vendors turned the UDP checksum off by default to speed up the Network File System (NFS), which uses UDP. This might be acceptable on a single LAN, where most errors are caught by the cyclic redundancy check on link-layer frames (such as Ethernet or token ring frames), but once datagrams pass through routers, disabling the checksum invites undetected corruption. Believe it or not, there have been software and hardware bugs in routers that modified the data in passing datagrams; with the end-to-end UDP checksum disabled, such errors in UDP datagrams go undetected. Additionally, some data-link protocols (such as SLIP) have no form of link-layer checksum at all. The Host Requirements RFC states that the UDP checksum must be enabled by default. It also states that if the sender computed a checksum, the receiver must verify it (whenever the received checksum is nonzero). Many systems do not comply, however, and verify a received checksum only when outgoing checksums are enabled.
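The checksum both protocols use is the one's-complement sum of 16-bit words defined in RFC 1071. A sketch of the computation (real UDP additionally sums a pseudo-header of IP fields, which is omitted here):

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:                 # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

msg = b"end-to-end"
cksum = internet_checksum(msg)
# The receiver's check: summing the data together with the transmitted
# checksum must come out to zero.
print(internet_checksum(msg + cksum.to_bytes(2, "big")))  # 0
```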

IP fragmentation

The physical network layer generally limits the maximum size of each data frame that can be sent. Whenever the IP layer receives an IP datagram to send, it determines which local interface the datagram is to be sent on (routing) and queries that interface for its MTU. IP compares the MTU with the datagram size and performs fragmentation if necessary. Fragmentation can happen on the original sending host or on an intermediate router. A fragmented IP datagram is not reassembled until it reaches its final destination (unlike some other network protocols, which reassemble at the next hop rather than at the final destination). Reassembly is done by the IP layer at the destination, which makes fragmentation and reassembly transparent to the transport layer (TCP and UDP), apart from a possible performance penalty. A datagram that has already been fragmented may be fragmented again (possibly more than once).

Although IP fragmentation looks transparent, one feature makes it undesirable: if even one fragment is lost, the entire datagram must be retransmitted. Why? Because the IP layer itself has no timeout-and-retransmission mechanism; the higher layers are responsible for that (TCP performs timeout and retransmission, UDP does not, though some UDP applications implement their own). When a TCP segment is lost, TCP retransmits the whole segment after a timeout, and that segment corresponds to a whole IP datagram; there is no way to retransmit only one fragment of a datagram. Moreover, if the fragmentation was done by an intermediate router rather than the originating system, the originating system has no way of knowing how the datagram was fragmented. For these reasons fragmentation is often avoided.

Using UDP can easily lead to IP fragmentation (as we will see later, TCP tries to avoid fragmentation, and it is nearly impossible for an application to force TCP to send a segment long enough to require it).
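The mechanics of fragmentation can be sketched with a little arithmetic. The function below illustrates the rules (offsets counted in 8-byte units, a "more fragments" flag on every piece but the last); it is my illustration, not any real stack's code. The classic case is a UDP datagram with 1473 bytes of user data, whose 1481-byte IP payload will not fit Ethernet's 1500-byte MTU:

```python
def fragment(payload_len: int, mtu: int, ip_hdr: int = 20):
    """Split an IP payload to fit an MTU. Returns one tuple per fragment:
    (offset in 8-byte units, fragment size, more-fragments flag). Every
    fragment except the last must carry a multiple of 8 bytes."""
    per_frag = (mtu - ip_hdr) // 8 * 8      # usable bytes, rounded down to 8
    frags, offset = [], 0
    while offset < payload_len:
        size = min(per_frag, payload_len - offset)
        more = offset + size < payload_len  # the MF ("more fragments") flag
        frags.append((offset // 8, size, more))
        offset += size
    return frags

# 1473 bytes of UDP data + the 8-byte UDP header = 1481 bytes of IP payload:
print(fragment(1481, 1500))  # [(0, 1480, True), (185, 1, False)]
```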

Maximum UDP datagram length

Theoretically, the maximum length of an IP datagram is 65535 bytes, limited by the 16-bit total-length field in the IP header. Subtracting the 20-byte IP header and the 8-byte UDP header, the maximum amount of user data in a UDP datagram is 65507 bytes. Most implementations, however, provide less than this maximum.
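The 65507 figure falls straight out of the header sizes:

```python
IP_TOTAL_MAX = 65535   # limit of the 16-bit total-length field in the IP header
IP_HEADER = 20         # ordinary IP header, no options
UDP_HEADER = 8

max_udp_data = IP_TOTAL_MAX - IP_HEADER - UDP_HEADER
print(max_udp_data)    # 65507
```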

We will encounter two limits. First, an application may be limited by its programming interface. The sockets API provides a function that applications can call to set the sizes of the receive and send buffers; for a UDP socket, this size is directly related to the largest UDP datagram the application can read or write. Most systems today allow UDP datagrams of more than 8192 bytes to be read or written by default (this default is used because 8192 is the default amount of user data in NFS reads and writes).

The second limitation comes from the kernel implementation of TCP/IP. There may be implementation features (or bugs) that make the IP datagram less than 65535 bytes long.

UDP server design

Consider the UDP datagrams that arrive from clients. The IP header contains the source and destination IP addresses, and the UDP header contains the source and destination UDP port numbers. When an application receives a UDP message, the operating system must tell it who sent the message, that is, the source IP address and port number. This is what lets a single iterative UDP server handle multiple clients, sending a reply back to each client that sent a request.

Some applications need to know to whom the datagram is being sent, the destination IP address. For example, the Host Requirements RFC specifies that the TFTP server must ignore received datagrams destined for broadcast addresses. This requires the operating system to pass the destination IP address to the application from the received UDP datagram. Unfortunately, not all implementations provide this functionality. The Socket API provides this functionality with the IP_RECVDSTADDR socket option.

Most UDP servers are iterative: a single server process handles all client requests on a single UDP port (the server's well-known port). Normally each UDP port a program uses is associated with an input queue of limited size, so requests arriving from different clients at about the same time are queued automatically by UDP. The received datagrams are passed to the application in the order they arrived (whenever the application asks for the next one).

It is possible, however, for this queue to overflow, causing the kernel's UDP module to discard incoming datagrams.
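An iterative UDP server, in other words, is just one socket and one loop; recvfrom() supplies the client address that the text says the OS must provide. A minimal Python sketch (the echo behavior is mine, for illustration):

```python
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind(("127.0.0.1", 0))   # a real server would bind its well-known port

def serve_one():
    # recvfrom() returns the datagram *and* the sender's (IP, port) pair --
    # exactly what a single iterative server needs to answer many different
    # clients. Overlapping requests simply queue inside UDP.
    data, client = srv.recvfrom(4096)
    srv.sendto(data, client)            # reply to whoever asked
```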


TCP: Transmission Control Protocol

TCP service

Although TCP and UDP use the same network layer (IP), TCP provides a completely different service to the application layer: a connection-oriented, reliable byte-stream service. Connection-oriented means that two applications using TCP (normally a client and a server) must establish a TCP connection with each other before they can exchange data. The process is much like placing a telephone call: you dial, the phone rings, you wait for the other person to answer and say "hello", and then you say who is calling. In a TCP connection only the two endpoints communicate with each other; broadcasting and multicasting are not applicable to TCP.

TCP provides reliability in the following ways:

• Application data is broken into chunks that TCP considers best for sending. This is quite different from UDP, where each datagram generated by the application keeps its length. The unit of information passed from TCP to IP is called a segment. When TCP sends a segment, it starts a timer and waits for the other end to acknowledge it; if an acknowledgement is not received in time, the segment is retransmitted.
• When TCP receives data from the other end of the connection, it sends an acknowledgement. This acknowledgement is not sent immediately; it is usually delayed a fraction of a second.
• TCP maintains a checksum of its header and data. This is an end-to-end checksum whose purpose is to detect any change to the data in transit. If a segment arrives with a checksum error, TCP discards it and sends no acknowledgement (expecting the sender to time out and retransmit).
• TCP segments are transmitted as IP datagrams, and since IP datagrams can arrive out of order, TCP segments can too. If necessary, TCP re-sequences the received data, passing it to the application layer in the correct order.
• Since IP datagrams can be duplicated, a TCP receiver must discard duplicate data.
• TCP also provides flow control. Each side of a TCP connection has a fixed amount of buffer space; a TCP receiver allows the other end to send only as much data as the receiver's buffers can hold. This prevents a fast host from overrunning the buffers of a slower one.

The two applications exchange an 8-bit byte stream over a TCP connection. TCP does not insert record identifiers in the byte stream. We call this a Byte stream service. If one side’s application sends 10 bytes, then 20 bytes, then 50 bytes, the other side of the connection has no way of knowing how many bytes the sender sent each time. The receiver can receive the 80 bytes in four 20-byte increments. One end places the byte stream on the TCP connection, and the same byte stream appears on the other end of the TCP connection. In addition, TCP does not interpret the contents of the byte stream. TCP does not know whether the byte stream transmitted is binary data, ASCII characters, EBCDIC characters, or other types of data. Byte streams are interpreted by the application layer on both sides of the TCP connection.

The TCP header

TCP data is encapsulated in an IP datagram.

Each TCP segment contains the source and destination port numbers, which identify the sending and receiving application processes. These two values, together with the source and destination IP addresses in the IP header, uniquely identify a TCP connection.


Establishing and terminating a TCP connection

Introduction

TCP is a connection-oriented protocol. Before either party can send data to the other, a connection must be established between the two parties.

Connection establishment and termination

Three-way handshake

To establish a TCP connection:

1. The requester (usually called the client) sends a SYN segment giving the port number of the server it wants to connect to and the client's initial sequence number (ISN, 1415531521 in this example). This is segment 1.
2. The server responds with its own SYN segment containing the server's initial sequence number (segment 2). It also acknowledges the client's SYN by setting the acknowledgement number to the client's ISN plus 1. A SYN consumes one sequence number.
3. The client acknowledges the server's SYN by setting the acknowledgement number to the server's ISN plus 1 (segment 3).

These three segments complete the establishment of the connection. This process is also called a three-way handshake.

The end that sends the first SYN performs an active open. The end that receives this SYN and sends the next SYN performs a passive open.

When one end sends its SYN to establish a connection, it selects an initial sequence number for the connection. The ISN changes over time, so each connection will have a different ISN.

Four-way handshake

It takes three segments to establish a connection and four to terminate one. This is a consequence of TCP's half-close. Since a TCP connection is full-duplex (data can flow in each direction independently), each direction must be shut down on its own. The rule is that either end can send a FIN when it has finished sending data. When one end receives a FIN, it must notify its application that the other end has terminated the flow of data in that direction. Sending a FIN is normally the result of the application closing. Receiving a FIN only means there will be no more data arriving in that direction; the end that received the FIN can still send data. Half-closed applications take advantage of this, although in practice only a few TCP applications do.



Figure 18-3 shows a termination initiated by the Telnet client closing the connection. This causes the client TCP to send a FIN, which shuts down the flow of data from client to server. When the server receives the FIN, it sends back an ACK of the received sequence number plus one (segment 5). Like a SYN, a FIN consumes a sequence number. The server TCP also delivers an end-of-file indication to its application (the discard server). The server application then closes its connection, causing its TCP to send a FIN (segment 6), and the client must respond with an ACK of the received sequence number plus one (segment 7).

Figure 18-4 shows the typical sequence of segments when a connection is terminated. Sequence numbers are omitted. In this figure the FINs are sent because the applications close their connections, while the ACKs of those FINs are generated automatically by the TCP software.

Connection establishment timeout

There are many situations where a connection cannot be established. In one case, the server host is not in a normal state.

TCP half-closed

TCP gives one end of a connection the ability to keep receiving data from the other end after it has finished sending. This is the half-close. Few programs use it, but it once tripped me up.

To use this feature, the programming interface must give the application a way to say: "I am done sending data, so send an end-of-file (FIN) to the other end, but I still want to receive data from the other end, until it sends me an end-of-file (FIN)." The sockets API supports the half-close through the shutdown function with a second argument of 1, instead of close. Most applications, however, terminate both directions of the connection by calling close.
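A minimal sketch of the half-close in Python. socketpair() stands in for a real TCP connection so the example is self-contained, and socket.SHUT_WR is the constant behind the "second argument with a value of 1" mentioned above:

```python
import socket

a, b = socket.socketpair()        # stand-in for an established connection

a.send(b"last request")
a.shutdown(socket.SHUT_WR)        # "I am done sending" -- this is the FIN

r1 = b.recv(4096)                 # b'last request'
r2 = b.recv(4096)                 # b'' -- the end-of-file from a's FIN
b.send(b"reply")                  # b may still send the other way,
r3 = a.recv(4096)                 # b'reply' -- and a may still receive
print(r1, r2, r3)
```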

Now I know why I got burned: I must call close() manually!!

Although I was quick to patch it with close(), I did not know the reason until today.



Figure 18-10 shows a typical example of a half-close. The client on the left initiates the half-close, though either side could. The first two segments are the same as in Figure 18-4: the FIN from the initiating end, followed by the other end's ACK of that FIN. But unlike Figure 18-4, the side that received the FIN can still send data. We show only one data segment and one ACK, but many data segments might be sent. When the receiving end finishes sending its data, it sends a FIN that closes its direction of the connection and delivers an end-of-file to the application that initiated the half-close. When this second FIN is acknowledged, the connection is completely closed.

TCP state transition diagram

2MSL wait state

The TIME_WAIT state is also called the 2MSL wait state. Every TCP implementation must choose a value for the maximum segment lifetime (MSL), the maximum time any segment can exist in the network before being discarded. We know this time is bounded, because TCP segments travel through the network as IP datagrams, and an IP datagram has a TTL field that limits its lifetime. RFC 793 [Postel 1981c] specifies an MSL of 2 minutes, but common implementation values are 30 seconds, 1 minute, or 2 minutes. In practice, the TTL limit on an IP datagram is based on hop count, not a timer.

Given an implementation's MSL, the rule is this: when TCP performs an active close and sends the final ACK, the connection must stay in TIME_WAIT for twice the MSL. This lets TCP resend that final ACK if it is lost (the other end will time out and retransmit its final FIN).

Another consequence of the 2MSL wait is that the socket pair defining the TCP connection (client IP address and port number, server IP address and port number) cannot be reused while the wait lasts; that pair can be used again only after the 2MSL wait ends.

Unfortunately, most implementations (the Berkeley-derived ones, for example) impose a stricter restriction: by default, a local port used by a socket in the 2MSL wait cannot be reused at all. Some implementations and APIs offer a way around this; with the sockets API it is the SO_REUSEADDR option, which lets the caller bind a local port that is in the 2MSL wait. Even so, we will see that TCP still, on principle, avoids reusing a port that belongs to a connection in the 2MSL wait.
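With the sockets API the workaround looks like this. A hedged sketch in Python (port 0 keeps the demo conflict-free, whereas a real server would set the option and then bind its actual well-known port):

```python
import socket

# A restarting server sets SO_REUSEADDR *before* bind() so that the bind
# succeeds even while an old connection on its port sits in the 2MSL wait
# (TIME_WAIT) described above.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 0))   # a real server would bind its well-known port
srv.listen()
```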

While a connection is in the 2MSL wait, any late-arriving segments for it are discarded. Because the socket pair in the 2MSL wait cannot define a new connection during this time, a delayed segment from an earlier incarnation of the connection cannot be misinterpreted as part of a new connection using the same socket pair (a connection is defined by its socket pair, and a new instance of a connection is called an incarnation of it). As we said, the client in Figure 18-13 performs the active close and enters TIME_WAIT; the server normally performs the passive close and does not go through the TIME_WAIT state. This implies that if we terminate a client and restart it immediately, the new client cannot reuse the same local port. That is not a problem, since clients use ephemeral ports and do not care what the port number is. For a server, however, the situation is different, because servers use well-known ports. If we terminate a server that has connections established and immediately try to restart it, the server cannot bind its well-known port to its endpoint, since that port is part of a connection still in the 2MSL wait. It can take from 1 to 4 minutes before the server can be restarted.

TCP Server Design

Come to me to get started.

Socket / epoll / pthread / design patterns: you will need them all.

Summary

Before two processes can exchange data using TCP, a connection must be established between them. When you’re done, close the connection. This chapter has detailed how to establish a connection using a three-way handshake and close a connection using four message segments.

The key to understanding TCP operations is its state transition diagram.

A TCP connection is uniquely identified by a 4-tuple: the local IP address, local port number, remote IP address, and remote port number. Whenever a connection is closed, one end must remember the connection for a while, and we saw that the TIME_WAIT state handles this. The rule is that the end performing the active close stays in this state for twice the MSL used by the implementation.


TCP timeout and retransmission

TCP provides a reliable transport layer. One of the techniques it uses is to acknowledge the data received from the other end. But both data and acknowledgements can be lost. TCP handles this by setting a timer when it sends data; if no acknowledgement has arrived when the timer expires, it retransmits the data. Critical to any implementation are the timeout and retransmission strategies: how the timeout interval is determined and how frequently retransmissions occur.

TCP manages four different timers for each connection:

  1. A retransmission timer is used when waiting for an acknowledgement from the other end.
  2. A persist timer keeps window-size information flowing even if the other end closes its receive window.
  3. A keepalive timer detects when the other end of an otherwise idle connection crashes or reboots.
  4. A 2MSL timer measures the time a connection has been in the TIME_WAIT state.

Congestion avoidance algorithm

Slow start is the way to initiate a data flow across a connection, but sooner or later we can reach the capacity limit of an intermediate router, where packets are discarded. Congestion avoidance is a way to deal with lost packets. The algorithm assumes that loss caused by packet damage is very rare (much less than 1%), so the loss of a packet signals congestion somewhere in the network between the source and destination. There are two indications of packet loss: a timeout occurring and the receipt of duplicate ACKs.

Congestion avoidance and slow start are independent algorithms with different objectives. But when congestion occurs we want to lower the rate at which packets are injected into the network, and slow start can be invoked to do this. In practice the two algorithms are implemented together.

The congestion avoidance and slow start algorithms require that two variables be maintained for each connection: a congestion window, cwnd, and a slow start threshold, ssthresh. The combined algorithm operates as follows:

  1. For a given connection, cwnd is initialized to 1 segment and ssthresh to 65535 bytes.
  2. The TCP output routine never sends more than the minimum of cwnd and the receiver's advertised window. Congestion avoidance is flow control imposed by the sender, while the advertised window is flow control imposed by the receiver. The former is the sender's estimate of the congestion in the network; the latter is related to the amount of buffer space available at the receiver for this connection.
  3. When congestion occurs (a timeout or the receipt of duplicate ACKs), ssthresh is set to one-half the current window size (the minimum of cwnd and the receiver's advertised window, but at least 2 segments). Additionally, if the congestion is indicated by a timeout, cwnd is set to 1 segment (that is, slow start).
  4. When new data is acknowledged by the other end, cwnd is increased, but the way it increases depends on whether we are performing slow start or congestion avoidance. If cwnd is less than or equal to ssthresh, we are doing slow start; otherwise we are doing congestion avoidance. Slow start continues until we are halfway to where we were when congestion occurred, and then congestion avoidance takes over. Slow start begins with cwnd set to 1 segment and increments it by 1 segment for every ACK received.

This opens the window exponentially: send one segment, then two, then four, and so on. Congestion avoidance instead dictates that cwnd be incremented by 1/cwnd each time an ACK is received. This is an additive increase, compared to slow start's exponential increase. We want to increase cwnd by at most one segment per round-trip time (regardless of how many ACKs arrive in that RTT), whereas slow start increments cwnd by the number of ACKs received in a round-trip time.

Fast retransmission and fast recovery algorithm

Since TCP does not know whether a duplicate ACK is caused by a lost segment or just a reordering of segments, it waits for a small number of duplicate ACKs to arrive. If there is just a reordering of segments, only one or two duplicate ACKs are likely to be generated before the reordered segment is processed and a new ACK is produced. If three or more duplicate ACKs are received in a row, it is a strong indication that a segment has been lost. TCP then retransmits the missing segment without waiting for the retransmission timer to expire. This is the fast retransmit algorithm.

After fast retransmit, congestion avoidance, rather than slow start, is performed. This is the fast recovery algorithm.

This algorithm is usually implemented as follows:

  1. When the third duplicate ACK is received, set ssthresh to one-half the current congestion window, cwnd. Retransmit the missing segment. Set cwnd to ssthresh plus 3 times the segment size.
  2. Each time another duplicate ACK arrives, increment cwnd by the segment size and transmit a packet (if allowed by the new value of cwnd).
  3. When the next ACK arrives that acknowledges new data, set cwnd to ssthresh (the value set in step 1). This ACK should be the acknowledgement of the retransmission from step 1, one round-trip time after the retransmission. Additionally, it should acknowledge all the intermediate segments sent between the lost segment and the receipt of the first duplicate ACK. This step is congestion avoidance, since we slow down to half the rate we were at when the packet was lost.

Repacketization

When TCP times out and retransmits, it does not have to retransmit the identical segment. Instead, TCP is allowed to repacketize, sending a bigger segment, which can improve performance (naturally, the bigger segment cannot exceed the MSS announced by the receiver). This is allowed because TCP identifies and acknowledges the data being sent by its byte number, not its segment number.


TCP keepalive timer

Many TCP/IP beginners will be surprised to learn that no data can flow over an idle TCP connection. That is, if neither party to a TCP connection is sending data to the other, no information is exchanged between the two TCP modules. For example, there is no polling that can be found in other network protocols. This means that we can start a client and establish a connection with the server, and then go away for hours, days, weeks, or months with the connection remaining. Intermediate routers can crash and restart, telephone lines can be hung up and reconnected, but as long as the hosts at both ends are not restarted, the connection remains established.

This assumes that neither application — the client process or the server process — uses an application-level timer to detect inactivity and terminate the connection. Nevertheless, a server often wants to know whether the client's host has crashed and is down, or has crashed and rebooted. The keepalive timer, provided by many implementations, offers this capability.

Keepalive is not part of the TCP specification. The Host Requirements RFC provides three reasons for not using a keepalive timer:

  1. They can cause a perfectly good connection to be dropped during a transient failure;
  2. They consume unnecessary bandwidth;
  3. They cost money on internets that charge per packet.

However, many implementations provide keepalive timers.

The keepalive timer is a controversial feature. Many feel that if this capability is needed, it should not be provided in TCP but should be handled by the application. This is one of those topics that is argued passionately, given the amount of debate it has provoked.

If the network connecting the two end systems fails temporarily, the keepalive option can cause an otherwise good connection to be dropped. For example, if the probes are sent while an intermediate router has crashed and is rebooting, TCP will conclude that the client's host has crashed, which is not what happened. The keepalive feature is intended primarily for server applications. A server wants to know whether the client's host has crashed, so that it can release the resources it is holding on the client's behalf. Many versions of the Rlogin and Telnet servers enable this option by default.

A common example showing the need for keepalive today is a PC user who uses TCP/IP to log in to a host with Telnet. If, at the end of the day, the user just powers off the PC without logging out, a half-open connection is left behind. If the client has disappeared, leaving the half-open connection on the server, and the server is blocked waiting for data from the client, it will wait forever. The keepalive feature is intended to detect this half-open connection from the server side.

In this description we call one end the server and the other end the client. Nothing prevents a client from setting this option, but it is normally set by servers. Both ends can use it if it is especially important for each to know when the other disappears.

If there is no activity on a given connection for 2 hours, the server sends a probe segment to the client. The client host must be in one of the following four states:

  1. The client host is still up and running and reachable from the server. The client's TCP responds normally and the server knows that the other end is still up. The server's TCP resets the keepalive timer for 2 hours in the future. If there is application traffic across the connection before the 2-hour timer expires, the timer is reset for 2 hours beyond that exchange of data.
  2. The client host has crashed and is either down or in the process of rebooting. In either case its TCP is not responding. The server will not receive a response to its probe, and it times out after 75 seconds. The server sends a total of 10 probes like this, 75 seconds apart, and if it receives no response, it assumes the client host is down and terminates the connection.
  3. The client host has crashed and rebooted. The server will receive a response to its probe, but the response is a reset, causing the server to terminate the connection.
  4. The client host is up and running but unreachable from the server. From TCP's point of view this is the same as state 2, since TCP cannot distinguish state 4 from state 2; all it can tell is that no replies to its probes are received.

The server does not have to worry about the client host being shut down and then rebooted (this refers to an operator shutdown, not a crash). When the system is shut down by an operator, all application processes are terminated (including the client process), which causes the client's TCP to send a FIN on the connection. Receiving the FIN causes the server's TCP to report an end-of-file to the server process, allowing the server to detect this scenario.

In the first case, the server's application has no idea that the probes are taking place; everything is handled at the TCP layer. The process is transparent to the application until case 2, 3, or 4 occurs. In those three cases an error is returned to the server's application by its TCP (normally the server has issued a read, waiting for data from the client; if the keepalive feature returns an error, it is delivered to the server as the return value of that read).

In the second case, the error is something like "connection timed out," and in the third case, "connection reset by peer." The fourth case looks like a connection timeout, or may cause another error to be returned, depending on whether an ICMP error related to the connection was received.

One question that people keep asking about the keepalive option is whether the 2-hour idle period can be changed. They usually want it to be much smaller, on the order of minutes. The value can usually be changed, but the keepalive interval is typically a system-wide variable, so changing it affects every user of the feature.


That's all for now!