An overview of the TCP/IP protocol stack

Instant Messaging (IM) is a real-time communication system that allows two or more people to exchange text messages, files, voice, and video over the Internet. With the widespread adoption of the Internet, instant messaging software such as QQ and WeChat has become part of every aspect of our life and work, and much of the popular software on the market also integrates IM features to let users communicate instantly.

This is the first in a series of articles that systematically organizes and introduces the theory and implementation of the instant messaging domain, starting with the underlying communication protocols.

TCP/IP protocol stack

The TCP/IP protocol stack is the collection of network protocols that forms the core framework of network communication and the cornerstone of the modern Internet. It shields the differences between hardware and operating systems and defines the constraints and specifications for network communication; devices and software can interoperate only if they conform to the corresponding standards.

TCP/IP adopts a layered structure consisting of four layers: the application layer, transport layer, network layer, and link layer, each containing multiple protocols. As data travels from the sender's application layer down the stack, each layer encapsulates it with the corresponding protocol header; on the receiver's side, the headers are removed in reverse order from the bottom layer up. A protocol header carries the metadata that the protocol at that layer needs, and embodies the details of that protocol's specification.

For example, when a request is made over HTTP, the protocols of the application layer, transport layer, network layer, and link layer in turn wrap the request and attach their corresponding headers, until the link layer finally produces an Ethernet frame. The frame is transmitted over the physical medium to the peer host; the receiver then unpacks it layer by layer with the corresponding protocols, and finally hands the application-layer data to the application for processing.

If network communication is compared to package delivery, the layers of packaging around the goods correspond to the successive encapsulation performed by each protocol layer, carrying information such as the item description, delivery address, recipient, and contact details.

In addition, getting a package into the user's hands requires delivery vehicles, distribution stations, and couriers working together. The delivery vehicle is the physical medium, the distribution station is the gateway, the courier is the router, the delivery address is the IP address, and the contact details are the MAC address. The vehicle transports the goods from the warehouse to a nearby distribution station, which decides from the province or city in the address whether the package must be forwarded on to another station. When the package arrives at the target station, a courier locates the recipient through the contact details and finally delivers the goods to the user.

The link layer

The link layer is the lowest layer of the four-layer protocol and interacts directly with the physical media.

The physical medium is the physical means of connecting computers; common examples are optical fiber, twisted pair, and radio waves. It carries the electrical signals (0s and 1s) and determines their transmission bandwidth, speed, transmission distance, interference resistance, and so on. TCP/IP supports a number of different link layer protocols depending on the hardware used in the network, such as Ethernet, Token Ring, FDDI (Fiber Distributed Data Interface), and RS-232 serial lines.

In the TCP/IP protocol family, the link layer provides the upper layers with the ability to send and receive datagrams. Regardless of the specific link layer protocol, three common problems must be solved: framing, transparent transmission, and error detection.

Framing

Network communication transmits meaningful data over the physical medium; a raw stream of 0s and 1s means nothing by itself. Meaningful data requires grouping the bits into bytes, marking the boundaries of each group of signals, and sending the groups in order. Framing adds a header and a trailer before and after a block of data to form a frame. After receiving the bit stream from the physical layer, the receiver can identify the beginning and end of each frame from the markers in the header and trailer.

Take Ethernet II as an example. The entire data frame consists of a header, data, and a trailer. The header is fixed at 14 bytes and contains the destination MAC address, source MAC address, and type. The data is at least 46 bytes and at most 1500 bytes long; if the data to be transmitted is longer, it must be split across multiple frames (handled by the upper-layer protocols). The trailer is fixed at 4 bytes and holds the frame check sequence, used to determine whether the frame was damaged in transit.

  • Preamble: 7 octets
  • Start frame delimiter: 1 octet
  • MAC destination: 6 octets
  • MAC source: 6 octets
  • 802.1Q tag (optional): 4 octets
  • EtherType: 2 octets
  • Payload: 46–1500 octets
  • Frame check sequence (32-bit CRC): 4 octets
  • Interpacket gap: 12 octets
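As a concrete illustration of this layout, the sketch below unpacks the 14-byte Ethernet II header from a raw frame with Python's struct module. The frame bytes themselves are fabricated for the example:

```python
import struct

def parse_ethernet_header(frame: bytes):
    """Parse the 14-byte Ethernet II header: destination MAC, source MAC, EtherType."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    fmt = lambda mac: ":".join(f"{b:02x}" for b in mac)
    # everything after the header is the payload handed to the upper layer
    return fmt(dst), fmt(src), ethertype, frame[14:]

# A hypothetical frame: broadcast destination, EtherType 0x0806 (ARP),
# padded to the 46-byte minimum payload
frame = bytes.fromhex("ffffffffffff" "001122334455" "0806") + b"\x00" * 46
dst, src, ethertype, payload = parse_ethernet_header(frame)
```

The EtherType value recovered here (0x0806) is what later tells the receiver which upper-layer protocol should process the payload.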

Ethernet uses MAC addresses to uniquely identify network devices and to implement communication between devices on the LAN. A MAC address is also called a physical address; most NIC manufacturers burn it into the NIC's ROM. The sender uses the receiver's MAC address as the destination address. After encapsulation, the Ethernet frame is converted into a bit stream by the physical layer and transmitted over the physical medium.

When receiving a data frame, the host checks the destination MAC address in the frame header. If it is neither the local MAC address nor a subscribed multicast or broadcast address, the host discards the frame. If it is the local MAC address, the host accepts the frame and compares the frame check sequence (FCS) field with a locally computed value to determine whether the frame survived transmission intact. If the check passes, the header and trailer are stripped and the data is handed to the upper-layer protocol indicated by the Type field in the frame header.

Transparent transmission

Since the beginning and end of a frame are marked with specially designated control characters, no 8-bit pattern in the transmitted data may coincide with the bit encoding of those control characters, or the frame boundaries will be misread. The point of transparent transmission is that the data handed down by the upper-layer protocol must not be affected by framing.

A common solution to this problem is byte stuffing: the sending data link layer inserts an escape character before any control character that appears in the data, and the receiving data link layer removes the inserted escape characters before passing the data up to the network layer.
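The byte-stuffing idea can be sketched in a few lines. The flag and escape byte values below are HDLC-style assumptions chosen for illustration, not mandated by any particular link layer discussed here:

```python
FLAG = b"\x7e"   # assumed frame delimiter (HDLC-style)
ESC = b"\x7d"    # assumed escape character

def stuff(data: bytes) -> bytes:
    """Sender side: insert an escape before any byte that collides with a control character."""
    out = bytearray()
    for b in data:
        if bytes([b]) in (FLAG, ESC):
            out += ESC
        out.append(b)
    return bytes(out)

def unstuff(data: bytes) -> bytes:
    """Receiver side: drop the inserted escapes to recover the original payload."""
    out = bytearray()
    escaped = False
    for b in data:
        if not escaped and bytes([b]) == ESC:
            escaped = True      # skip the escape itself, keep the next byte verbatim
            continue
        out.append(b)
        escaped = False
    return bytes(out)

payload = b"\x01\x7e\x02\x7d\x03"   # contains both control bytes
stuffed = stuff(payload)
```

Whatever bytes the upper layer hands down, the receiver recovers them exactly: the framing layer stays transparent.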

When Ethernet transmits frames, there must be a gap between them, so once the receiver finds the start frame delimiter, all subsequent consecutive bits belong to the same MAC frame. Ethernet therefore needs neither an end-of-frame delimiter nor byte stuffing to ensure transparent transmission.

Error control

Real communication links are not ideal: a bit can be corrupted in transmission, a 1 becoming a 0 or a 0 becoming a 1. This is called a bit error. The data link layer widely uses the Cyclic Redundancy Check (CRC) to detect such errors. Note that "bit-error-free" is not the same as "transmission-error-free": CRC at the data link layer can achieve a near-zero bit error rate, but that is not reliable transmission. Problems such as frame loss, frame duplication, and frames arriving out of order must be handled by higher-level protocols such as TCP. On wired links with good transmission quality, the data link layer omits the acknowledgement and retransmission mechanism and leaves such handling to the upper layers, which improves the network's communication efficiency to a certain extent.
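As a rough sketch of how a trailer CRC catches bit errors, the example below appends a 32-bit CRC to a payload (as the Ethernet FCS does) and verifies it on receipt, using Python's zlib.crc32:

```python
import zlib

def make_frame(payload: bytes) -> bytes:
    """Append a 32-bit CRC trailer, as the Ethernet frame check sequence does."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check_frame(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the trailer."""
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == fcs

frame = make_frame(b"hello link layer")
# flipping a single bit anywhere in the frame makes the check fail
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
```

The check detects the bit error, but nothing here retransmits the frame: that distinction is exactly why CRC alone gives a low bit error rate, not reliable transmission.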

MTU Maximum transmission unit

The Maximum Transmission Unit (MTU) is the upper limit on the data link layer's payload, that is, the maximum length of an IP packet. The MTU of each link may differ, and the MTU of an end-to-end path is determined by the link with the smallest MTU along it. If the data handed down by the upper layer is longer than this value, it must be fragmented by the upper layer; otherwise it will be discarded by the link layer protocol.

If the length of an IP packet exceeds the MTU of the current link (typically 1500), the packet is fragmented so that no fragment exceeds the MTU. Fragments do not necessarily arrive in order, but the information in the IP header allows them to be reassembled correctly.

Q&A

How is the MAC address of the data frame set? Will it change?

Data sent by a host is encapsulated with source and destination IP addresses at the network layer, and with source and destination MAC addresses at the link layer. When the network layer determines the next-hop IP address from the destination IP address, it resolves that next-hop address to a MAC address via ARP and uses it as the destination MAC address of the data frame. This means that as a frame is forwarded across devices working at the network layer and above, the source and destination MAC addresses change hop by hop.

How to determine the upper layer protocol for Ethernet data frames?

Ethernet frames contain a Type field that indicates which upper-layer protocol the data in the frame should be sent to for processing. For example, the Type value of the IP protocol is 0x0800, and that of the ARP protocol is 0x0806.

How can I learn the maximum MTU of a path to avoid upper-layer fragmentation?

Fragmenting IP datagrams adds processing and space overhead, so how can fragmentation be avoided? The source sets the datagram's DF (Don't Fragment) flag and gradually increases the size of the datagrams it sends. Any device along the path that would need to fragment such a packet discards it instead and returns an ICMP "datagram too big" message to the source. In this way, the source host "learns" the largest transmission unit that can traverse the path without fragmentation.

Unfortunately, ICMP may be blocked by firewalls or router access control lists. If you can manage and configure these devices, simply allow ICMP (Type=3, Code=4) messages through; otherwise you have to turn off path MTU discovery, because a path that fragments can at least communicate, whereas one that forbids fragmentation may not communicate at all.
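The probing procedure can be simulated without touching the network. In this sketch, every probe implicitly carries the DF flag: a link whose MTU is too small drops the probe and reports its own MTU back, standing in for the ICMP Type=3, Code=4 "datagram too big" message:

```python
def send_probe(size: int, link_mtus: list[int]):
    """One DF-flagged probe along the path. Returns None if delivered,
    otherwise the MTU of the first link that had to drop it
    (the value an ICMP 'datagram too big' message would report)."""
    for mtu in link_mtus:
        if size > mtu:
            return mtu
    return None

def discover_path_mtu(link_mtus: list[int], start: int = 1500) -> int:
    """Shrink the probe to each reported next-hop MTU until one gets through."""
    size = start
    while (reported := send_probe(size, link_mtus)) is not None:
        size = reported
    return size

# Hypothetical per-hop MTUs, e.g. a PPPoE link (1492) in the middle of the path
path_mtu = discover_path_mtu([1500, 1492, 1500])
```

The loop converges because each ICMP report gives the exact next constraint, so real implementations rarely need more than a few probes per path.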

The difference between a hub, a switch, and a router?

Hub
  • Use: connects multiple twisted-pair or fiber Ethernet segments to the same physical medium; regenerates, reshapes, and amplifies received signals to extend the network's transmission distance, concentrating all nodes on a central node.
  • Characteristics: works at the physical layer with shared bandwidth; cannot isolate collision domains; half-duplex only.

Switch
  • Use: connects devices on a LAN; maintains a table mapping MAC addresses to ports and forwards each frame to the port of the matching MAC address. Switches also offer more advanced features than hubs, repeaters, and bridges, such as virtual LANs (VLANs) and higher performance, and are often used to build large LANs.
  • Characteristics: generally works at the link layer in full-duplex mode, with dedicated bandwidth and isolated collision domains.

Router
  • Use: connects multiple networks or network segments and "translates" data between different networks, segments, or VLANs so that they can "read" each other's data, forming a larger network.
  • Characteristics: works at the network layer; although IP addresses are used to route between networks, they must be translated into MAC addresses for link-layer forwarding.

That said, the commercial switches used today in large LANs generally also work at the network layer (Layer 3 switches).

The network layer

The link layer provides communication within a LAN, while the network layer provides host-to-host communication across multiple networks. The network layer forwards the data handed down by the upper layer as packets; each packet (an IP datagram) is sent independently and is not numbered. After all the packets reach the destination host, they are merged and delivered to the upper layer.

However, the network layer offers the layers above it only a simple, flexible, connectionless, best-effort datagram service, with no quality-of-service guarantees. Packets in transit may be corrupted, lost, duplicated, or arrive out of order (i.e., not reach the destination in sequence), and there is of course no guarantee on delivery time.

The main fields in the protocol format are:

  • Identification: uniquely marks a datagram. When a datagram is fragmented, every fragment carries the same identification, which is used for reassembly
  • Flags: the lowest bit MF=1 indicates that more fragments follow; the middle bit DF=1 indicates that the datagram must not be fragmented
  • Fragment offset: the position of a fragment's data relative to the start of the original datagram (in units of 8 bytes)
  • Time to live: the maximum number of routers (hops) a datagram may pass through on the Internet
  • Protocol: which protocol the datagram carries, so that the IP layer of the destination host knows which process to hand the data portion to
  • Source address and destination address: the IP addresses of the sender and receiver
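A minimal sketch of these fields, unpacking the 20-byte fixed IPv4 header with Python's struct module; the sample header bytes are fabricated for the example:

```python
import socket
import struct

def parse_ipv4_header(pkt: bytes) -> dict:
    """Unpack the 20-byte fixed IPv4 header into the fields discussed above."""
    (ver_ihl, tos, total_len, ident, flags_frag, ttl, proto,
     checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", pkt[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl": ver_ihl & 0x0F,                          # header length in 32-bit words
        "identification": ident,
        "df": bool(flags_frag & 0x4000),                # Don't Fragment flag
        "mf": bool(flags_frag & 0x2000),                # More Fragments flag
        "fragment_offset": (flags_frag & 0x1FFF) * 8,   # stored in 8-byte units
        "ttl": ttl,
        "protocol": proto,                              # 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# A made-up header: version 4, IHL 5, DF set, TTL 64, protocol TCP
hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0x1C46, 0x4000, 64, 6, 0,
                  socket.inet_aton("192.168.0.1"), socket.inet_aton("10.0.0.2"))
fields = parse_ipv4_header(hdr)
```

The DF/MF bits and the fragment offset recovered here are exactly what the fragmentation and reassembly process described later relies on.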

The IP address

The IP address is the central concept of the network layer: the unique identifier for communication between devices at that layer. Because the structure of IP addresses must make network addressing convenient, IP addressing has gone through three stages: classful IP addresses, subnetting, and classless addressing (CIDR).

Classful IP addresses

Classful addressing divides IP addresses into several fixed classes. Each class consists of two fixed-length fields. The first is the net-id, which identifies the network to which the host (or router) is attached; a network number must be unique throughout the Internet. The second is the host-id, which identifies the host (or router); a host number must be unique within the network indicated by the network number that precedes it.

Subnetting

Because classful IP addresses tie network size to the address class, allocation is inflexible and utilization is low. Subnetting divides one network into multiple subnets while still appearing as a single network to the outside world. With subnetting, the IP address becomes a three-level structure; subnetting only subdivides the host-number portion of the address and does not change its original network number.

Note that the header of an IP datagram gives no indication of whether the network to which the source or destination host is attached has been subnetted, because neither the 32-bit IP address itself nor the datagram header contains any information about subnets. Something extra is needed: the subnet mask.

When a router exchanges routing information with its neighbor routers, it must tell the neighbor routers the subnet mask of its own network (or subnet). Each item in the router’s routing table must give the subnet mask of the network in addition to the destination network address.

Classless addressing (CIDR)

CIDR eliminates the traditional concepts of Class A, B, and C addresses and of subnets, allowing a more efficient allocation of the IPv4 address space. CIDR divides the 32-bit IP address into two parts: the first is the "network prefix" (or simply prefix), which identifies the network, and the second identifies the host.

CIDR combines contiguous IP addresses that share a network prefix into a "CIDR address block". From any address in the block we can derive the block's starting (minimum) address, its maximum address, and the number of addresses it contains.
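Python's ipaddress module makes this concrete: given any one address plus a prefix length, it derives the whole block (the address 128.14.35.7/20 here is just an illustrative pick):

```python
import ipaddress

# strict=False lets us pass an arbitrary address inside the block,
# not just the block's own network address
block = ipaddress.ip_network("128.14.35.7/20", strict=False)

print(block)                    # the block itself: 128.14.32.0/20
print(block.network_address)    # minimum address: 128.14.32.0
print(block.broadcast_address)  # maximum address: 128.14.47.255
print(block.num_addresses)      # 2^(32-20) = 4096 addresses
```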

An organization assigned a CIDR address block can still subnet internally as needed. These subnets likewise have just one network prefix and one host-number field, but a subnet's prefix is longer than the prefix of the organization's whole block. CIDR uses a 32-bit address mask to facilitate routing: the mask is a string of 1s followed by 0s, and the number of 1s is the length of the network prefix.

Because there are many addresses in a CIDR address block, CIDR address blocks are used in routing tables to find the destination network. This aggregation of addresses, often called route aggregation, enables a single entry in the routing table to represent many (for example, thousands) of routes from traditionally classified addresses. Route aggregation is also called supernetting.

With CIDR, an IP address consists of the network prefix and the host number, so routing table entries change accordingly: each entry consists of a network prefix and a next-hop address. But a lookup may now yield more than one match, which raises the question of which route to choose. The answer is to choose the route with the longest network prefix. This is called longest-prefix matching: the longer the prefix, the smaller the address block, and thus the more specific the route.
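A toy longest-prefix lookup over a two-entry routing table; the prefixes and next-hop names below are made up for the example:

```python
import ipaddress

# A hypothetical routing table: (destination network, next hop).
# Note the /25 block lies entirely inside the /22 block.
routes = [
    (ipaddress.ip_network("206.0.68.0/22"), "R1"),
    (ipaddress.ip_network("206.0.71.128/25"), "R2"),
]

def lookup(dst: str) -> str:
    """Return the next hop for dst, preferring the most specific matching prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in routes if addr in net]
    # several entries may match; pick the longest (most specific) prefix
    return max(matches, key=lambda m: m[0].prefixlen)[1]
```

206.0.71.130 matches both entries, so the /25 wins; 206.0.71.10 matches only the /22.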

Public IP and Intranet IP

Because IP addresses are scarce, the number an organization can obtain is often far smaller than the number of hosts it has. Moreover, since the Internet is not entirely secure, not every host in an organization needs to connect to the external Internet: in many real scenarios, such as large shopping malls or hotels, the computers used for business and management need no external connectivity at all.

The IP addresses of computers used only within the organization can be assigned by the organization itself. Such addresses are valid only inside the organization (called local/intranet IP addresses), so there is no need to apply to the Internet authorities for globally unique addresses (called global/public IP addresses). This conserves the valuable global address space.

Arbitrary selection of IP addresses as local addresses for internal use within your organization may cause trouble in some cases. For example, when a host within an organization needs to be connected to the Internet, a local address for internal use may overlap with an IP address on the Internet, causing address ambiguity.

To address this problem, RFC 1918 designates certain private addresses. These can be used only for internal communication within an organization, not for communication with hosts on the Internet; in other words, a private address can serve only as a local address, never as a global one. Routers on the Internet do not forward datagrams whose destination is a private address. The following three blocks are private (intranet) IP addresses:

  1. 10.0.0.0 to 10.255.255.255 (or 10.0.0.0/8, also known as the 24-bit block)
  2. 172.16.0.0 to 172.31.255.255 (or 172.16.0.0/12, also known as the 20-bit block)
  3. 192.168.0.0 to 192.168.255.255 (or 192.168.0.0/16, also known as 16-bit block)
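Python's ipaddress module can check these ranges directly. Note that is_private also covers a few non-RFC-1918 ranges (loopback, link-local, and so on), so it is a superset of the three blocks above:

```python
import ipaddress

def is_intranet(addr: str) -> bool:
    """True if addr falls in a private range as defined by the IANA registries
    (including the three RFC 1918 blocks listed above)."""
    return ipaddress.ip_address(addr).is_private

samples = ["10.1.2.3", "172.16.0.9", "192.168.1.1", "8.8.8.8"]
results = {addr: is_intranet(addr) for addr in samples}
```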

Fragmentation and reassembly

When an IP datagram travels over the network, it may traverse multiple physical networks before reaching its destination. Different networks have different link-layer and physical-medium characteristics, so the maximum frame length is limited in transit; this limit is the MTU. The IP protocol requires that all hosts and routers on the Internet be able to accept datagrams of up to 576 bytes.

If the packet handed to the IP layer exceeds the MTU, the IP layer fragments it. Fragmentation may occur not only at the sender but also in intermediate networks, whereas reassembly is performed only at the receiver.

When the fragments arrive at the destination host, it reassembles them to recover the original datagram. Reassembly relies on the following IP header fields:

  1. The Identification field determines which IP datagram a fragment belongs to.
  2. The MF ("more fragments") bit of the Flags field determines whether a fragment is the last one.
  3. The fragment offset field determines the fragment's position within the original datagram.
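A simplified reassembly sketch driven by exactly these three fields. For readability the offsets below are plain byte offsets, whereas the real header stores them in 8-byte units:

```python
def reassemble(fragments: list[dict]) -> bytes:
    """Reassemble the fragments of one datagram using (identification, MF, offset).
    Each fragment is a dict: {"id", "mf", "offset", "data"}."""
    frags = sorted(fragments, key=lambda f: f["offset"])
    # all pieces must belong to the same original datagram
    assert len({f["id"] for f in frags}) == 1
    # the fragment with MF = 0 must be the last one in offset order
    assert frags[-1]["mf"] is False
    return b"".join(f["data"] for f in frags)

# Fragments may arrive out of order; the offsets restore their positions
frags = [
    {"id": 777, "mf": False, "offset": 16, "data": b"world"},
    {"id": 777, "mf": True,  "offset": 0,  "data": b"hello, neighbor "},
]
datagram = reassemble(frags)
```

A real IP stack also handles overlapping and missing fragments with a reassembly timer; this sketch assumes every fragment arrives exactly once.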

Routing

We know that the source host and the destination host may span multiple networks. How do we find a path to the destination host? If there are multiple paths, how do we find the optimal one? This is the routing protocol problem.

Routing protocols fall into two categories: interior gateway protocols (intra-domain routing) and exterior gateway protocols (inter-domain routing). Common interior protocols are RIP and OSPF; the common exterior protocol is BGP.

The general principle of a routing protocol is that a router constantly exchanges current link-state or route information with its neighbors and updates its own routing information accordingly, so that routes across the network are synchronized quickly.

Q&A

Relationship between IP address and MAC address

IP addresses are used at the network layer and above, while MAC addresses are used at the link layer and below. An IP address uniquely identifies a host or router on the network; a MAC address identifies a link-layer device within a LAN. To forward an IP packet at the link layer, the next hop's IP address must be translated into a MAC address, which is handled by the ARP protocol at the network layer. The upper layers care only about the logical IP address, not the physical MAC address.

Will the source and destination IP addresses of an IP datagram change?

The source and destination IP addresses of IP packets may change.

To communicate with the external network, a public IP address is required. Because public address resources are limited, the common solution is NAT (network address translation): intranet hosts send their traffic through a NAT gateway, which rewrites the source IP address of outgoing packets to a public IP address.

In NAT-mode load balancing, the destination IP address of a packet is rewritten to the address of a real server; the load balancer must then act as the real server's NAT gateway so that the IP addresses seen in the request and the response match.

Do I need to manually configure the IP address?

IP addresses come in two kinds: static and dynamic. Static configuration requires knowledge of the current network situation, which is unfriendly to ordinary users and is typically used for network management systems. Dynamic configuration is the common case: the host talks to a DHCP server via DHCP to obtain an address. The obtained address usually comes with a time limit, called a lease, and remains usable only for the lease's duration.

The transport layer

The IP layer provides host-to-host communication, but the real senders and receivers are the processes (applications) running on those hosts. The transport layer therefore provides communication between application processes, identifying the process at each end by a port number. Because the IP layer's forwarding service is unreliable, the transport layer offers two choices: provide an ordered, reliable transmission service, or deliver on a best-effort basis like the IP layer and leave any further handling to the application layer.

UDP protocol

UDP, like the IP layer, delivers on a best-effort basis; reliable delivery is not guaranteed. The sender's UDP adds a header to each application message and hands it down to the IP layer. UDP neither merges nor splits the messages it receives from the application layer, preserving message boundaries.

The application must therefore choose messages of a suitable size. If a message is too long, the IP layer may have to fragment the packet after UDP hands it down, reducing the IP layer's efficiency. If it is too short, the IP header becomes disproportionately large relative to the payload, which also reduces efficiency. The UDP header itself is very simple: eight bytes comprising the source port, destination port, length, and checksum fields.
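The entire UDP header fits in a single struct call. In this sketch the checksum, which in reality is computed over a pseudo-header, is simply left as zero (a value the protocol interprets as "no checksum" for IPv4):

```python
import struct

def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Pack the 8-byte UDP header (source port, destination port, length, checksum)
    in front of the payload. Checksum computation is omitted for brevity."""
    length = 8 + len(payload)   # header plus payload, in bytes
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    return header + payload

# Hypothetical ports: an ephemeral source port sending to DNS (port 53)
dgram = build_udp_datagram(53001, 53, b"example query")
src, dst, length, checksum = struct.unpack("!HHHH", dgram[:8])
```

Because the length field covers the header and the payload together, the receiver can recover the exact message boundary, which is how UDP preserves messages as the text describes.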

TCP protocol

TCP is a connection-oriented transport layer protocol. It provides a reliable transport service. Data transmitted over the TCP connection is error-free, not lost, not repeated, and arrives in sequence.

In addition, TCP is byte stream oriented. A “stream” in TCP refers to a sequence of bytes flowing into or out of a process. Byte stream oriented means that while an application interacts with TCP one data block at a time (of varying sizes), TCP treats the data handed over by an application as just a series of unstructured byte streams.

TCP does not guarantee any size correspondence between the data blocks handed over by the sending application and the data blocks the receiving application reads. For example, the sending application may hand TCP ten blocks of data, while the receiver's TCP may deliver the same byte stream to its application in only four blocks. What is guaranteed is that the byte stream the receiving application reads is exactly the byte stream the sending application wrote. The receiving application must, of course, be able to interpret that byte stream and restore it to meaningful application-layer data.

Protocol format

The meanings of the fields in the fixed part of the header are as follows:

  • Source port and destination port: Identifies the application process on the sender and receiver
  • Sequence number: ranges from 0 to 2^32 − 1, wrapping back to 0 afterwards. Every byte in the byte stream transmitted over a TCP connection is numbered sequentially.
  • Acknowledgement number: the sequence number of the first data byte of the next segment expected from the peer. For example, if B correctly receives a segment from A whose sequence number is 501 with 200 bytes of data (sequence numbers 501–700), then B has correctly received A's data up to sequence number 700, so B expects A's next byte to be numbered 701 and sets the acknowledgement number to 701 in the acknowledgement it sends to A.
  • Data offset: the distance from the start of the TCP segment to the start of its data, i.e., the length of the TCP header.
  • Control bits: SYN (marks a connection-request segment, used to synchronize sequence numbers); ACK (acknowledges receipt of a segment); FIN (indicates the sender has finished sending data and wants to release the connection).
  • Window: indicates how much data the peer is currently permitted to send. The window value changes dynamically.
  • Options: maximum segment size (MSS), which negotiates the maximum length of the data field in a TCP segment; the window scale option; the timestamp option.
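The acknowledgement arithmetic from the example above is just modular addition over the 32-bit sequence space:

```python
MOD = 2 ** 32  # TCP sequence numbers wrap around at 2^32

def next_ack(seq: int, data_len: int) -> int:
    """Acknowledgement number = sequence number of the next byte expected,
    i.e. the first byte of the received segment plus its data length, mod 2^32."""
    return (seq + data_len) % MOD

# The example from the text: B receives a 200-byte segment starting at
# sequence number 501 (bytes 501-700), so B acknowledges with 701
ack = next_ack(501, 200)

# Near the wrap-around point the numbering simply restarts from 0
ack_wrapped = next_ack(2**32 - 10, 20)
```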

Connection management

TCP is connection-oriented: before transmitting data, the two parties establish a connection, agree on initial sequence numbers, and negotiate parameters such as the MSS. Once the connection is established, both sides can send and receive in full-duplex mode; both the sender and the receiver maintain buffers. When data transfer is complete, the connection is released.

The process of establishing a connection

Establishing a TCP connection requires three exchanges between the two parties to synchronize state. To ensure reliable data transmission, the initial sequence numbers of both sides must be agreed upon during connection setup so that data can be acknowledged during subsequent transmission.

  • First handshake: initiator A sends its initial sequence number;
  • Second handshake: receiver B acknowledges it and returns B's own initial sequence number;
  • Third handshake: A acknowledges receipt of B's reply.

Establishing a connection thus takes three exchanges, known as the three-way handshake. Here is why the third handshake is needed:

  • Without the third handshake, it is impossible to confirm whether the initiator received the receiver's reply, so the third handshake is required to make connection setup reliable.
  • Because the IP layer is unreliable, TCP relies on acknowledgement and timeout retransmission. Suppose the first handshake segment is delayed in the network, so the sender retransmits it after a timeout and the connection completes normally. If B later receives the stale, delayed first-handshake segment, it will mistake it for a new connection request from A and send back a connection acknowledgement. Because a third handshake is required, A will simply ignore B's acknowledgement and no spurious connection will be established.
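In practice the whole three-way handshake is performed by the kernel inside connect(). The loopback sketch below shows that the handshake completes against the listen queue even before accept() is called, after which full-duplex data transfer is possible:

```python
import socket

# Server side: bind to loopback on port 0 so the OS picks a free port
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)

# Client side: the SYN -> SYN+ACK -> ACK exchange happens inside connect()
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())

conn, _ = srv.accept()          # pick up the already-established connection
cli.sendall(b"ping")            # full-duplex data transfer can now begin

data = b""
while len(data) < 4:            # recv may return fewer bytes than requested
    data += conn.recv(4 - len(data))

for s in (conn, cli, srv):
    s.close()
```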

The process of releasing a connection

Closing a TCP connection requires four exchanges. Because a TCP connection is full-duplex (data can flow in both directions), each direction must be closed separately. When one party finishes its data transmission, it can send a FIN to terminate the connection in that direction. The party receiving the FIN notifies its application layer that the other end has stopped sending in that direction, but it may still continue to send data itself.

  • First exchange: initiator A sends a FIN, indicating that A has finished sending data and terminates transmission in that direction.
  • Second exchange: receiver B sends an ACK, indicating that B has received all the data sent by A.
  • Third exchange: B sends a FIN of its own, indicating that B has also finished sending and terminates transmission in that direction.
  • Fourth exchange: A sends an ACK, confirming that A has received B's FIN.

Why does the TIME_WAIT state take 2MSL to return to CLOSE?

  • To ensure that the last ACK packet segment can reach terminal B.
  • To prevent stale connection-request segments from appearing in a new connection. After the last ACK is sent, waiting 2MSL lets every segment generated during the connection die out in the network, so old segments cannot show up in the next new connection.

Reliable transport

Since the IP layer provides only an unreliable datagram service, how does TCP guarantee reliable transport? TCP achieves reliable communication through sliding windows (one for sending, one for receiving), segment acknowledgements, and timeout retransmission.

After A and B establish a connection, data transfer begins. When host A sends segment M1 to host B, it also starts a timer. If the timer expires before an acknowledgement of M1 arrives from host B, host A retransmits M1. With this mechanism in place, receiver B must discard any duplicate segments it may receive, and the sender likewise discards duplicate acknowledgements.
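The timeout-and-retransmit loop, together with duplicate discarding at the receiver, can be sketched as a toy stop-and-wait simulation (the transmit helper and its "lose the first copy" model are illustrative, not real TCP):

```python
# Minimal stop-and-wait simulation: the first copy of every segment is
# "lost" in transit; the sender retransmits on timeout, and the receiver
# discards any duplicate it might see.
def transmit(segments, lose_first=True):
    delivered, acked = [], set()
    attempts = 0
    for seq, payload in enumerate(segments):
        sent_once = False
        while seq not in acked:
            attempts += 1
            lost = lose_first and not sent_once
            sent_once = True
            if lost:
                continue                  # timer expires; loop retransmits
            if payload not in delivered:  # receiver discards duplicates
                delivered.append(payload)
            acked.add(seq)                # ACK returns to the sender
    return delivered, attempts

data, tries = transmit(["M1", "M2", "M3"])
print(data, tries)   # ['M1', 'M2', 'M3'] 6  — each segment needed two attempts
```

Despite every first transmission being lost, the receiver ends up with exactly one copy of each segment, in order.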

Sender host A maintains a send window and receiver host B maintains a receive window. Both windows are byte-stream oriented: every byte has a sequence number, and bytes are ordered by that number. The size of the receive window is determined by the size of the receive buffer, while the size of the send window is determined by both the send buffer and the receive window. The relationship between the windows and the buffers is shown in the figure below.

The send window covers the bytes that have been sent but not yet acknowledged, plus the bytes that may be sent but have not been sent yet; the receive window covers the bytes that have been received but not yet acknowledged. To improve efficiency, the sender may transmit several segments in a row without waiting for acknowledgements, though never beyond the limit of the send window. The receiver uses cumulative acknowledgement: instead of acknowledging every segment individually, it waits until several segments have arrived and then acknowledges the last segment that arrived in order, which implies that every segment up to that one has been received correctly. The TCP standard does, however, require that an acknowledgement be delayed by no more than 0.5 seconds.

Cumulative acknowledgement has both advantages and disadvantages. On the plus side, it is easy to implement, and a lost acknowledgement does not necessarily force a retransmission. The drawback is that it cannot tell the sender about every segment the receiver has correctly received. Suppose the sender transmits five segments and the third one is lost: the receiver can only acknowledge the first two. The sender learns nothing about the fate of the last three segments and has to retransmit all of them. This is called Go-Back-N: the sender must go back and retransmit the N segments that were already sent.
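The five-segment example with the third segment lost can be traced with a small Go-Back-N simulation (the go_back_n helper and its loss model are illustrative; the window is assumed large enough to cover all outstanding segments):

```python
# Go-Back-N sketch: cumulative ACKs mean a lost segment forces the sender
# to retransmit it and everything after it, even if later copies arrived
# out of order at the receiver (which discards them).
def go_back_n(segments, lost):
    received, base, sends = [], 0, 0
    while base < len(segments):
        # send everything from `base` onward; `lost` drops one copy once
        for i in range(base, len(segments)):
            sends += 1
            if i in lost:
                lost.discard(i)           # only the first copy is lost
            elif i == len(received):      # in order: accept, cumulative ACK
                received.append(segments[i])
            # out-of-order segments are discarded by the receiver
        base = len(received)              # highest cumulative ACK so far
    return received, sends

data, sends = go_back_n(["M1", "M2", "M3", "M4", "M5"], lost={2})
print(data, sends)   # M3 lost once, so M3..M5 are all resent: 5 + 3 = 8 sends
```

Losing one segment cost three retransmissions, the inefficiency the text describes.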

Congestion control

In a computer network, link capacity (bandwidth) and the buffers and processors inside switching nodes are all network resources. If, at some moment, the demand for a resource exceeds what that resource can supply, network performance deteriorates. This condition is called congestion. Congestion control prevents too much data from being injected into the network, so that routers and links are not overloaded.

There are four common congestion control algorithms, which are slow start, congestion avoidance, fast retransmit and fast recovery.

Slow start with congestion avoidance

The sender maintains a state variable called the congestion window, cwnd. Its size depends on how congested the network is and changes dynamically. The sender sets its send window equal to the congestion window (still bounded by the receiver's advertised window).

The slow start algorithm grows the congestion window gradually from a small initial value. When transmission begins, cwnd is typically initialized to at most one MSS. Each time an acknowledgement for a new segment is received, the congestion window is increased by at most one MSS. In this way the sender's cwnd (where a value of 1 represents one MSS) grows step by step.

At the start, the sender sets cwnd = 1 and transmits segment M1; the receiver acknowledges it on arrival. On receiving the acknowledgement of M1, the sender raises cwnd from 1 to 2 and transmits two segments, M2 and M3. The receiver acknowledges each of them on arrival. Since the sender's congestion window grows by 1 for every acknowledgement of a new segment (retransmissions excluded), cwnd rises from 2 to 4 after the two acknowledgements arrive, allowing four segments, M4 through M7, to be sent (see Figure 5-24). Under slow start, then, the congestion window cwnd doubles every transmission round.

To keep cwnd from growing so large that it itself causes congestion, a slow start threshold, the state variable ssthresh, is needed. It is used as follows:

  • When cwnd < ssthresh, use the slow start algorithm described above.
  • When cwnd > ssthresh, stop using slow start and switch to the congestion avoidance algorithm.
  • When cwnd = ssthresh, either slow start or congestion avoidance may be used.

The idea of congestion avoidance is to let cwnd grow slowly: after each transmission round the sender's cwnd is increased by 1 rather than doubled. The congestion window therefore grows linearly, much more slowly than under slow start.

Whenever the sender decides that the network is congested, whether it is in the slow start phase or the congestion avoidance phase, it sets ssthresh to half of the send window at the moment congestion occurred (but no less than 2), resets cwnd to 1, and re-enters slow start. The goal is to quickly cut the number of packets the host pushes into the network, giving the congested routers enough time to drain the backlog in their queues.
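The interplay of slow start, ssthresh, and the timeout reaction can be traced round by round with a toy simulation (cwnd in MSS units; cwnd_trace is an illustrative sketch, not a real TCP implementation):

```python
# cwnd evolution per transmission round: exponential growth below
# ssthresh (slow start), linear growth at or above it (congestion
# avoidance); on a timeout, ssthresh = cwnd // 2 and cwnd restarts at 1.
def cwnd_trace(rounds, ssthresh=16, timeout_at=None):
    cwnd, trace = 1, []
    for r in range(rounds):
        trace.append(cwnd)
        if r == timeout_at:              # congestion detected by timeout
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1                     # back to slow start
        elif cwnd < ssthresh:
            cwnd *= 2                    # slow start: double each round
        else:
            cwnd += 1                    # congestion avoidance: +1 each round
    return trace

print(cwnd_trace(12, ssthresh=16, timeout_at=7))
# [1, 2, 4, 8, 16, 17, 18, 19, 1, 2, 4, 8]
```

The trace shows the doubling phase up to ssthresh, the linear climb, and the collapse to 1 when the timeout fires, after which slow start only runs up to the new, halved threshold.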

Fast retransmission and fast recovery

The fast retransmit algorithm first requires the receiver to send a duplicate acknowledgement immediately when an out-of-order segment arrives, rather than waiting to piggyback the acknowledgement on its own outgoing data, so that the sender learns promptly that a segment failed to reach the other side. Under fast retransmit, as soon as the sender receives three duplicate acknowledgements in a row, it retransmits the missing segment (M3 in the example above) at once, without waiting for M3's retransmission timer to expire.

The fast recovery algorithm works as follows: when the sender receives three duplicate acknowledgements in a row, it halves ssthresh to guard against congestion, but then executes the congestion avoidance algorithm, letting the congestion window grow slowly and linearly. The reasoning is that the network is probably not seriously congested: if it were, segments would not keep arriving at the receiver one after another, and the receiver would not keep sending duplicate acknowledgements.
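The fast retransmit/fast recovery reaction can be sketched as a round-by-round cwnd trace: after three duplicate ACKs, cwnd restarts from the halved ssthresh rather than from 1 (fast_recovery_trace is an illustrative sketch, not real TCP):

```python
# Fast recovery sketch: on three duplicate ACKs, halve ssthresh and set
# cwnd to the new ssthresh (no return to slow start), then grow linearly
# under congestion avoidance. Units are MSS per transmission round.
def fast_recovery_trace(rounds, ssthresh=16, dupacks_at=None):
    cwnd, trace = 1, []
    for r in range(rounds):
        trace.append(cwnd)
        if r == dupacks_at:              # three duplicate ACKs observed
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh              # fast recovery: skip slow start
        elif cwnd < ssthresh:
            cwnd *= 2                    # slow start
        else:
            cwnd += 1                    # congestion avoidance
    return trace

print(fast_recovery_trace(10, ssthresh=16, dupacks_at=6))
# [1, 2, 4, 8, 16, 17, 18, 9, 10, 11]
```

Compared with a timeout, which drops cwnd all the way to 1, the window only falls to the halved threshold and keeps growing linearly, reflecting the milder assessment of congestion.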

Comparison between UDP and TCP

  • UDP: message-oriented, connectionless, delivery not guaranteed. Suited to real-time applications that need low latency and can tolerate packet loss, such as IP telephony, video conferencing, and live streaming. Typical uses: SNMP, QUIC/HTTP3, RTP.
  • TCP: byte-stream oriented, connection-oriented, reliable delivery, with flow control and congestion control. Suited to applications that require reliable transmission. Typical uses: HTTP, FTP, SMTP, Telnet.
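The contrast above can be demonstrated with Python's standard socket module: UDP preserves message boundaries and needs no handshake, while TCP performs the three-way handshake inside connect()/accept() and delivers an undifferentiated byte stream. A minimal loopback sketch:

```python
import socket

# UDP: connectionless and message-oriented; each datagram is delivered
# (or dropped) as a unit, with no handshake beforehand.
udp_srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_srv.bind(("127.0.0.1", 0))
udp_cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_cli.sendto(b"ping", udp_srv.getsockname())
msg, _ = udp_srv.recvfrom(1024)          # one datagram in, one message out

# TCP: connection-oriented and byte-stream; the handshake happens in
# connect()/accept(), and write boundaries are not preserved.
tcp_srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp_srv.bind(("127.0.0.1", 0))
tcp_srv.listen(1)
tcp_cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp_cli.connect(tcp_srv.getsockname())   # three-way handshake completes here
conn, _ = tcp_srv.accept()
tcp_cli.sendall(b"pi")
tcp_cli.sendall(b"ng")                   # two writes...
data = b""
while len(data) < 4:
    data += conn.recv(1024)              # ...read back as one byte stream
print(msg, data)                         # b'ping' b'ping'
for s in (udp_cli, udp_srv, tcp_cli, conn, tcp_srv):
    s.close()
```

The TCP reader loops until it has all four bytes because the stream gives no guarantee about how the two writes are chunked on arrival.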

References

Computer Networks

TCP/IP Illustrated