Preface
These notes are based on Li Bing's Geek Time course “Browser Working Principles and Practice”. I'm checking in every day:
- Day 01 Chrome architecture: Why 4 processes with only 1 page open?
- Day 02 TCP: How to ensure that a page file can be delivered to the browser in its entirety?
What will you learn from reading this article?
Understanding TCP first lays the groundwork for a thorough understanding of HTTP, including what it actually does and where its limits are. That, in turn, makes it easier to see why HTTP/2 was introduced, and why the QUIC protocol, on which the upcoming HTTP/3 is built, came after it.
This is a gradual progression from the basics to deeper topics. I hope to take it one step at a time, learn each protocol thoroughly, and let the rest fall into place naturally.
- Data on the Internet is transmitted in packets, which can easily be lost or corrupted in transit.
- IP is responsible for delivering packets to the destination host.
- UDP is responsible for delivering data packets to specific applications.
- TCP ensures the complete transmission of data. Its connection can be divided into three phases: establishing a connection, transferring data, and disconnecting the connection.
In this article, we’ll focus on how TCP/IP works in the Web world.
The “journey” of a packet
Here I will tell you about the data transfer process from three perspectives: “how the packet gets to the host”, “how the host forwards the packet to the application” and “how the data gets to the application in its entirety”.
The Internet is really a set of ideas and protocols. A protocol is a well-known set of rules and standards; as long as all parties agree to follow it, they can communicate without obstacles.
Data on the Internet is transmitted in packets. A large amount of data is broken up into smaller packets before it is sent. The audio you are listening to right now, for example, arrives as many small packets rather than as one big file.
1. IP: sends the data packet to the destination host
For packets to travel over the Internet, they must follow the Internet Protocol (IP) standard. Every device on the Internet has a unique address. An address is just a number, much like a home delivery address: once you know the address of a home, you can send a package there and the logistics system will deliver it to its destination.
A computer’s address is called an IP address, and visiting any website is really just your computer asking another computer for information.
To send a packet from host A to host B, host A attaches host B's IP address to the packet before transmission so that it can be routed correctly along the way. Host A also attaches its own IP address, so that host B knows where to send its reply. This additional information is stored in a data structure called the IP header, which sits at the beginning of an IP packet and includes fields such as the IP version, the source IP address, the destination IP address, and the time to live (TTL).
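To make the IP header concrete, here is a minimal sketch that packs and parses a 20-byte IPv4 header (no options) with Python's standard `struct` and `socket` modules. The field layout follows the IPv4 specification (RFC 791); the function names are hypothetical, the addresses are reserved documentation addresses, and IPv6 uses a different header layout that this sketch does not cover.

```python
import socket
import struct

# IPv4 header without options: 20 bytes, network (big-endian) byte order.
IPV4_FORMAT = "!BBHHHBBH4s4s"


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by the IPv4 header checksum."""
    if len(header) % 2:
        header += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int,
                      ttl: int = 64, protocol: int = 17) -> bytes:
    """Build an IPv4 header; protocol 17 means UDP, 6 would mean TCP."""
    version_ihl = (4 << 4) | 5          # version 4, header length 5 x 32-bit words
    header = struct.pack(
        IPV4_FORMAT,
        version_ihl, 0, 20 + payload_len,   # version/IHL, type of service, total length
        0, 0,                               # identification, flags + fragment offset
        ttl, protocol, 0,                   # checksum is 0 while we compute it
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]


def parse_ipv4_header(packet: bytes) -> dict:
    """Read back the fields the receiving host cares about."""
    (version_ihl, _tos, total_length, _ident, _frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack(IPV4_FORMAT, packet[:20])
    return {
        "version": version_ihl >> 4,
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


payload = b"geek time"
packet = build_ipv4_header("192.0.2.1", "198.51.100.7", len(payload)) + payload
print(parse_ipv4_header(packet))
```

Both addresses travel with every packet: the destination lets routers forward it, and the source lets host B reply.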
To make this easier to follow, I first divide the network into three layers, as shown below:
Let's follow the journey of a packet from host A to host B:
- The upper layer sends packets containing “geek time” to the network layer;
- The network layer attaches the IP header to the packet to form a new IP packet, which is handed to the bottom layer.
- The bottom layer transmits data packets to host B through the physical network.
- The data packet is transmitted to the network layer of host B, where host B unwraps the IP header of the data packet and delivers the disassembled data to the upper layer.
- Eventually, the packet containing the geek time message reaches the upper layer of host B.
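The five steps above can be mimicked with a toy model in which the "IP header" is just two fields on a dataclass and the "physical network" is a dictionary keyed by destination address. Everything here (class and function names, the dictionary standing in for the wire, the documentation addresses) is a hypothetical illustration, not how a real network stack is organized.

```python
from dataclasses import dataclass


@dataclass
class IPPacket:
    src_ip: str      # host A's own address, so host B can reply
    dst_ip: str      # host B's address, used to deliver the packet
    payload: bytes   # the upper layer's data, carried untouched


def network_send(data: bytes, src_ip: str, dst_ip: str) -> IPPacket:
    # Step 2: host A's network layer attaches the IP header.
    return IPPacket(src_ip, dst_ip, data)


def physical_transmit(packet: IPPacket, wire: dict) -> None:
    # Step 3: the bottom layer carries the packet to the host owning dst_ip.
    wire.setdefault(packet.dst_ip, []).append(packet)


def network_receive(packet: IPPacket, my_ip: str) -> bytes:
    # Step 4: host B's network layer checks and strips the IP header.
    if packet.dst_ip != my_ip:
        raise ValueError("packet is not addressed to this host")
    return packet.payload


# Steps 1 and 5: the upper layers on A and B only ever see the raw bytes.
wire = {}
physical_transmit(network_send(b"geek time", "192.0.2.1", "198.51.100.7"), wire)
packet = wire["198.51.100.7"].pop()
print(network_receive(packet, "198.51.100.7"))  # prints b'geek time'
print(packet.src_ip)                             # prints 192.0.2.1
```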
2. UDP: Sends data packets to the application
IP is a very low-level protocol: it only delivers packets to the other computer, and that computer still cannot tell which program the packets are meant for. Should they go to the browser, or to the game King of Glory? Protocols that can deliver data to applications therefore have to be built on top of IP, and the most common one is the User Datagram Protocol (UDP).
One of the most important pieces of information in UDP is the port number. Every application that wants to access the network is bound to a port number, and UDP uses it to hand each packet to the right program. In other words, IP uses the IP address to get a packet to the right computer, and UDP uses the port number to get it to the right program. Just like the IP address, the port number is stored in a header, the UDP header, which is combined with the original data to form a new UDP packet. The UDP header contains information such as the destination port and the source port.
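A minimal sketch of the UDP header and of port-based delivery: the header is 8 bytes (source port, destination port, length, checksum, per RFC 768), and the receiver looks only at the destination port to pick the program. The checksum is left at 0, which IPv4 treats as "no checksum", to keep the sketch short. The port numbers and the `apps` table are hypothetical stand-ins for two local programs.

```python
import struct
from typing import Callable, Dict


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend the 8-byte UDP header: source port, destination port, length, checksum."""
    length = 8 + len(payload)  # header + data, in bytes
    return struct.pack("!HHHH", src_port, dst_port, length, 0) + payload


def deliver(datagram: bytes, apps: Dict[int, Callable[[bytes], str]]) -> str:
    """Receiver side: read the destination port and hand the data to that program."""
    _src_port, dst_port, length, _checksum = struct.unpack("!HHHH", datagram[:8])
    return apps[dst_port](datagram[8:length])


# Hypothetical port assignments for two programs on the same host.
apps = {
    8080: lambda data: "browser got " + data.decode(),
    9000: lambda data: "game got " + data.decode(),
}
datagram = build_udp_datagram(50000, 8080, b"geek time")
print(deliver(datagram, apps))  # prints browser got geek time
```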
To support UDP, I extend the previous three-layer structure to four layers by adding a transport layer between the network layer and the upper layer, as shown in the figure below:
Now let's see how a packet travels from host A to host B:
- The upper layer sends a packet containing “geek time” to the transport layer;
- The transport layer attaches the UDP header to the front of the packet to form a new UDP packet, and hands the new UDP packet to the network layer;
- The network layer attaches the IP header to the packet to form a new IP packet, which is handed to the bottom layer.
- The bottom layer transmits the packet to host B through the physical network.
- The packet reaches the network layer of host B, where host B strips the IP header and passes the rest of the data to the transport layer.
- At the transport layer, the UDP header is separated from the packet, and the data part is handed to the upper-layer application according to the port number in the UDP header;
- Eventually, the packet containing “geek time” reaches the upper-layer application on host B.
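With a real operating system, the transport, network and bottom layers from the steps above are all handled by the kernel; an application only chooses a port and hands over its data. The sketch below, assuming loopback networking is available, uses Python's standard `socket` module on 127.0.0.1: two sockets stand in for a "browser" and a "game", each on its own OS-assigned port, and a datagram addressed to the browser's port reaches only the browser.

```python
import socket


def udp_loopback_demo(message: bytes) -> dict:
    """Send one datagram through the OS network stack and see who receives it."""
    browser = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    game = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        browser.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        game.bind(("127.0.0.1", 0))
        browser.settimeout(1.0)
        game.settimeout(0.2)

        # The kernel adds the UDP header (ports) and IP header (addresses) for us.
        sender.sendto(message, browser.getsockname())

        payload, addr = browser.recvfrom(2048)
        try:
            game.recvfrom(2048)
            game_received = True
        except socket.timeout:
            game_received = False

        return {
            "payload": payload,
            "src_port": addr[1],                     # read from the UDP header
            "sender_port": sender.getsockname()[1],  # port the OS gave the sender
            "game_received": game_received,
        }
    finally:
        for s in (browser, game, sender):
            s.close()


result = udp_loopback_demo(b"geek time")
print(result["payload"], result["game_received"])
```

The source port seen by the receiver matches the port the OS assigned to the sender, which is how a reply would find its way back.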
When data is sent over UDP, various factors can corrupt packets along the way. UDP can check whether the data it receives is correct, but it provides no retransmission mechanism for a bad packet; it simply discards it. Moreover, once a packet has been sent, the sender has no way of knowing whether it reached its destination.
Although UDP does not guarantee reliability, it is very fast, so it is used where speed matters more than strict data integrity, such as online video and interactive games.
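UDP's fire-and-forget behavior can be illustrated with a small simulation, assuming a network that loses each datagram independently with some probability. The seeded random generator and the `udp_send` helper are hypothetical; the point is only that every datagram is sent once, and nothing on the sender's side notices or repairs a loss.

```python
import random
from typing import List


def udp_send(datagrams: List[bytes], loss_rate: float, seed: int = 0) -> List[bytes]:
    """Deliver each datagram at most once over a simulated lossy network.

    There is no acknowledgement, no timer and no retransmission, so a lost
    datagram is simply gone and the sender never finds out.
    """
    rng = random.Random(seed)  # seeded so a run is repeatable
    return [d for d in datagrams if rng.random() >= loss_rate]


frames = [f"frame {i}".encode() for i in range(10)]
arrived = udp_send(frames, loss_rate=0.2, seed=1)
print(f"sent {len(frames)}, arrived {len(arrived)}")
```

The datagrams that do arrive keep their original order here only because the simulation has no reordering; on a real network they could also arrive out of order.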
3. TCP: Delivers data to the application in its entirety
For browser requests and mail applications that require reliability of data transmission, UDP has two problems:
- Data packets are easily lost during transmission.
- Large files are broken into smaller packets for transmission. These packets take different routes and arrive at the receiver at different times. UDP does not know how to assemble these packets into a complete file.
TCP was introduced to solve these two problems. The Transmission Control Protocol (TCP) is a connection-oriented, reliable, byte-stream-based transport-layer communication protocol. Compared with UDP, TCP has the following characteristics:
- TCP provides a retransmission mechanism for packet loss.
- TCP introduces the packet sorting mechanism to ensure that out-of-order packets are combined into a complete file.
Like the UDP header, the TCP header contains the destination port and the source port, plus a sequence number that lets the receiver put the packets back in order.
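A minimal sketch of reordering by sequence number: the sender numbers each chunk by its byte offset (as TCP's sequence numbers count bytes), the chunks arrive shuffled, and the receiver sorts them before joining. The helper names and the starting value `start_seq` are hypothetical, and the sketch assumes every chunk arrives exactly once (no gaps, no duplicates).

```python
import random
from typing import List, Tuple

Segment = Tuple[int, bytes]  # (sequence number, data)


def split_into_segments(data: bytes, size: int, start_seq: int = 0) -> List[Segment]:
    """Cut data into chunks; each sequence number is start_seq plus the byte offset."""
    return [(start_seq + i, data[i:i + size]) for i in range(0, len(data), size)]


def reassemble(segments: List[Segment]) -> bytes:
    """Receiver side: order chunks by sequence number, then concatenate them."""
    return b"".join(chunk for _, chunk in sorted(segments, key=lambda s: s[0]))


message = b"geek time: browser working principles"
segments = split_into_segments(message, size=8, start_seq=1000)
random.Random(7).shuffle(segments)  # simulate packets arriving out of order
print(reassemble(segments) == message)  # prints True
```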
Let’s take a look at the transmission flow of a single packet under TCP:
The figure above should give you an idea of how a packet is transmitted over TCP. The transmission flow of a single TCP packet is similar to that of UDP. The difference is that the information in the TCP header ensures the integrity of a large piece of data.
Let’s take a look at the complete TCP connection process to see how TCP guarantees retransmission and packet ordering.
As can be seen from the following figure, a complete TCP connection life cycle consists of three phases: “Establish a connection”, “transfer data” and “disconnect”.
- First, the connection establishment phase. The client and server establish a connection through a “three-way handshake”. TCP provides connection-oriented transport, where “connection-oriented” means that both ends do some preparation before any data is exchanged. The three-way handshake means that, to establish a TCP connection, the client and server exchange a total of three packets to confirm the connection.
- Second, the data transfer phase. In this phase, the receiver must acknowledge every packet: after receiving a packet, it sends an acknowledgement back to the sender. If the sender does not receive this acknowledgement within a specified time after sending a packet, it considers the packet lost and triggers retransmission. In addition, a large file is split into many small packets for transmission; when they arrive, the receiver sorts them by the sequence number in the TCP header to reassemble the complete data.
- Finally, the disconnection phase. Once the data has been transferred, the connection is closed through the “four waves” (a four-way handshake), which ensures that both parties are disconnected.
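The three phases above can be traced in a small simulation. This is a sketch under simplifying assumptions, not how a real TCP stack works: data is sent stop-and-wait (one chunk at a time, resent until acknowledged) instead of with a sliding window and cumulative ACKs, the client closes first, and the server has no data of its own to send. The `drop` callback is a hypothetical stand-in for the network that decides which data transmissions are lost; the initial sequence numbers are arbitrary (real stacks randomize them).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Segment:
    sender: str                # "client" or "server"
    flags: str                 # "SYN", "SYN+ACK", "ACK", "DATA" or "FIN"
    seq: int
    ack: Optional[int] = None
    data: bytes = b""


def tcp_lifecycle(chunks: List[bytes], drop: Callable[[int, int], bool],
                  client_isn: int = 100, server_isn: int = 300, max_tries: int = 5):
    """Simulate one connection: handshake, stop-and-wait transfer, close."""
    log: List[Segment] = []

    # Phase 1: three-way handshake. Each SYN consumes one sequence number.
    log.append(Segment("client", "SYN", client_isn))
    log.append(Segment("server", "SYN+ACK", server_isn, ack=client_isn + 1))
    log.append(Segment("client", "ACK", client_isn + 1, ack=server_isn + 1))

    # Phase 2: data transfer. Sequence numbers count bytes.
    seq = client_isn + 1
    received = {}
    for i, chunk in enumerate(chunks):
        for attempt in range(1, max_tries + 1):
            log.append(Segment("client", "DATA", seq, data=chunk))
            if drop(i, attempt):
                continue            # no ACK before the timer expires: resend
            received[seq] = chunk   # receiver files the bytes by sequence number
            log.append(Segment("server", "ACK", server_isn + 1, ack=seq + len(chunk)))
            break
        else:
            raise ConnectionError(f"chunk {i} lost {max_tries} times")
        seq += len(chunk)

    # Phase 3: four-way close. Each FIN also consumes one sequence number.
    log.append(Segment("client", "FIN", seq))
    log.append(Segment("server", "ACK", server_isn + 1, ack=seq + 1))
    log.append(Segment("server", "FIN", server_isn + 1, ack=seq + 1))
    log.append(Segment("client", "ACK", seq + 1, ack=server_isn + 2))
    # A real client would now wait in TIME_WAIT before releasing the connection.

    data = b"".join(received[s] for s in sorted(received))
    return data, log


# Lose the first transmission of the second chunk to force one retransmission.
data, log = tcp_lifecycle([b"geek", b" ", b"time"],
                          drop=lambda i, attempt: i == 1 and attempt == 1)
print(data)  # prints b'geek time'
for seg in log:
    print(seg.sender, seg.flags, seg.seq, seg.ack)
```

In the log, the retransmitted chunk reuses the same sequence number as the lost one, which is what lets the receiver slot it into the right place.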
By now you can see that TCP trades transmission speed for reliable delivery: the “three-way handshake” and per-packet acknowledgement (“packet verification”) add extra packets to every transfer, and acknowledging each data packet alone roughly doubles the number of packets sent.
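A rough count shows where the extra packets come from. Assume $N$ data packets, one ACK per data packet, no retransmissions, and the 3-packet handshake and 4-packet close described above. Real TCP often acknowledges several packets with one cumulative or delayed ACK, so this is an upper-end estimate:

```latex
N_{\mathrm{TCP}} = \underbrace{3}_{\text{handshake}} + \underbrace{N}_{\text{data}} + \underbrace{N}_{\text{ACKs}} + \underbrace{4}_{\text{close}} = 2N + 7,
\qquad N_{\mathrm{UDP}} = N
```

For $N = 100$, that is $2 \cdot 100 + 7 = 207$ packets over TCP versus 100 over UDP, a little more than double.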