What happens between typing a URL and seeing the page is a classic question. This article walks through the process in detail.

The request URL

When we type a web address into the browser, for example

https://www.baidu.com/ 

A URL identifies the resource type (the protocol), the domain name of the host that stores the resource, and the resource's file name. Put another way, it has four main parts: protocol, host, port, and path.

The general syntax format of a URL is:

// Parts in square brackets [] are optional
protocol://hostname[:port]/path/[;parameters][?query]#fragment
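As a rough illustration of the syntax above, the following sketch splits a URL into its parts with Python's standard `urllib.parse` module. The helper name `describe_url` and the example path, query, and fragment are made up for this illustration; the default-port table is an assumption that covers only HTTP and HTTPS.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts named in the general syntax."""
    parts = urlsplit(url)
    default_ports = {"http": 80, "https": 443}  # assumption: only these two schemes
    return {
        "protocol": parts.scheme,
        "hostname": parts.hostname,
        # urlsplit reports None when the URL carries no explicit port.
        "port": parts.port or default_ports.get(parts.scheme),
        "path": parts.path or "/",
        "query": parts.query,
        "fragment": parts.fragment,
    }


# Hypothetical URL; only the host comes from the example in this article.
info = describe_url("https://www.baidu.com/s?wd=dns#top")
print(info)
```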

Generate HTTP request information

After getting the URL, the browser parses it: what kind of resource is being requested (an image, HTML, TXT?) and which domain name is being requested? Once the URL is parsed, the browser knows the web server and the file name, and generates the HTTP request message from this information.

 

HTTP message format
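To make the request message concrete, here is a minimal sketch that builds the text of a GET request from a URL. It is not what a real browser sends (browsers add many more headers); the function name `build_get_request` and the two extra headers are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> bytes:
    """Build a bare-bones HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",   # request line: method, target, version
        f"Host: {parts.netloc}",    # header fields follow, one per line
        "Accept: text/html",
        "Connection: close",
        "",                         # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("https://www.baidu.com/")
print(request.decode("ascii"))
```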

After the HTTP message is generated, the browser still does not know where to send it, so it needs to obtain the IP address of the host named in the URL.

Obtain the IP address through DNS

After the URL is entered, the system resolves the domain name in the URL to obtain an IP address. Although an IP address uniquely identifies a computer on the network, it is a long string of numbers that is neither intuitive nor easy to remember, so a second, text-based addressing scheme was invented: the domain name. Each domain name maps to an IP address, and this mapping is stored on hosts called DNS servers (domain name servers). Users only need to know the easy-to-remember domain name; the translation is left to the domain name server. A DNS server is a server that converts domain names to IP addresses.

In a WAN, communication is based on IP addresses. A user, however, usually visits a website by name, so the IP address corresponding to the site must be obtained first, which requires the domain name system to convert the domain name into an IP address. When you enter a domain name in the browser, the browser obtains the corresponding IP address from the DNS server configured on the local client.

The DNS service is an application-layer program that runs mainly over UDP. A DNS server usually receives domain name resolution requests from clients by listening on port 53.
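For a feel of what travels to port 53, the sketch below encodes a DNS question for an A record as raw bytes, following the standard DNS message layout (a 12-byte header followed by the question). The function name `build_dns_query` and the query ID are arbitrary choices for this illustration, and nothing is actually sent on the network.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Encode a DNS query asking for the A record of hostname."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE = A, QCLASS = IN
    return header + question


query = build_dns_query("www.baidu.com")
print(len(query), query.hex())
```

In a real lookup these bytes would be the payload of a UDP datagram addressed to port 53 of the configured DNS server.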

The DNS lookup process is as follows:

Browser cache -> System cache -> Router cache -> ISP DNS cache -> Recursive search

The recursive search goes from the root DNS server, to the top-level domain DNS server, and then to the authoritative DNS server for the queried domain.
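The lookup order can be sketched as a chain of caches that is consulted in turn, with the recursive search as the last resort. This is only a toy model: the caches are plain dictionaries, `fake_recursive` stands in for the real recursive search, and the returned address is a made-up documentation address.

```python
from typing import Callable, Dict, List, Tuple

Cache = Dict[str, str]


def resolve(
    hostname: str,
    caches: List[Tuple[str, Cache]],
    recursive_lookup: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (ip, where it was found), checking caches in order."""
    for name, cache in caches:
        if hostname in cache:
            return cache[hostname], name
    ip = recursive_lookup(hostname)
    # Simplification: every cache level remembers the answer.
    for _, cache in caches:
        cache[hostname] = ip
    return ip, "recursive search"


caches = [
    ("browser cache", {}),
    ("system cache", {}),
    ("router cache", {}),
    ("ISP DNS cache", {}),
]
fake_recursive = lambda host: "203.0.113.10"  # hypothetical answer

first = resolve("www.baidu.com", caches, fake_recursive)
second = resolve("www.baidu.com", caches, fake_recursive)
print(first, second)
```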

The DNS resolution process is shown in the following figure:

Workflow of domain name resolution

If the site you are visiting runs on a cloud platform with smart DNS and global server load balancing (GSLB) configured, the authoritative DNS service usually sets up a CNAME, an alias such as vip.yourcomany.com, which tells the local DNS server to ask the GSLB to resolve the domain name. The GSLB can then apply its own load-balancing policy during resolution.

The GSLB infers the user's carrier and location from the carrier and address of the local DNS server that sends the request, and returns the IP address of a region near the user to the local DNS server. The local DNS resolver caches the result and returns it.

Mobile apps can bypass the traditional DNS resolution mechanism. If an HTTPDNS service is available, the app can call the HTTPDNS server directly to obtain the public IP addresses of multiple SLBs.

Don’t forget another important player on the Internet, the CDN, which also takes part in DNS resolution. DNS resolution may return the IP address of a CDN server, so you reach the CDN server rather than the target website itself.

Because the CDN caches most of the website’s static resources, such as images and CSS style sheets, some HTTP requests never need to reach the target website: the CDN can respond directly and send you the data.

Pages generated dynamically by back-end services such as PHP and Java are “dynamic resources”. They generally cannot be cached by the CDN and must be fetched from the target website. In that case your HTTP request begins its long journey across the Internet, passing through numerous routers, gateways, and proxies to reach its destination.

Protocol stack

After obtaining the IP address through DNS, the browser can hand the HTTP message to the protocol stack in the operating system.

The protocol stack is divided into several parts, each with a different task. The parts are layered: the upper part delegates work to the lower part, which receives the work and carries it out.

The application (browser) delegates work to the protocol stack by calling the Socket library. The top half of the stack holds two protocols, TCP and UDP, which send and receive data on behalf of the application layer.

The bottom half of the protocol stack uses the IP protocol to control the sending and receiving of network packets. When data is sent over the Internet, it is split into network packets, and IP is responsible for delivering those packets to the other party.

In addition, the IP part includes ICMP and ARP.

  • ICMP reports errors that occur during packet transmission and carries various control messages.
  • ARP is used to query Ethernet MAC addresses based on IP addresses.

The NIC driver below IP controls the NIC hardware, and the NIC at the very bottom performs the actual sending and receiving, that is, it sends and receives signals on the network cable.

Reliable transport — TCP

HTTP runs on top of TCP. For details, see the article “TCP three-way handshake and four-way Wave (Finite State machine)”.

Before HTTP can transmit data, TCP needs to establish a connection, which I won’t go into here.

If the HTTP request message is longer than the MSS length, TCP needs to send the HTTP data piece by piece instead of sending all the data at once.

MTU and MSS

  • MTU: Maximum length of a network packet, usually 1500 bytes in Ethernet.
  • MSS: The maximum length of TCP data that a network packet can hold, excluding IP and TCP headers.

The data is split into MSS-sized pieces, and each piece is placed in its own network packet: a TCP header is added to each piece, which is then handed to the IP module to be sent.

 

Packet segmentation
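The splitting step can be sketched in a few lines. The sketch assumes an Ethernet MTU of 1500 bytes and IP and TCP headers of 20 bytes each (no options), which gives an MSS of 1460 bytes; the 4000-byte payload is an arbitrary example.

```python
from typing import List

MTU = 1500          # typical Ethernet MTU, as stated above
IP_HEADER = 20      # assumption: IPv4 header without options
TCP_HEADER = 20     # assumption: TCP header without options
MSS = MTU - IP_HEADER - TCP_HEADER  # 1460 bytes of TCP data per packet


def segment(data: bytes, mss: int = MSS) -> List[bytes]:
    """Split application data into pieces of at most mss bytes."""
    return [data[i:i + mss] for i in range(0, len(data), mss)]


segments = segment(b"x" * 4000)
sizes = [len(s) for s in segments]
print(sizes)  # [1460, 1460, 1080]
```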

TCP Packet Generation

The TCP header carries two port numbers: the source port used by the browser (usually randomly assigned) and the destination port that the web server listens on (80 by default for HTTP, 443 for HTTPS).

After the two parties establish a connection, the data part of the TCP packet is the HTTP header plus data. Once the TCP packet is assembled, it is handed to the network layer below for processing.
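As a hedged illustration of how the ports sit at the front of the packet, the sketch below packs a 20-byte TCP header with the standard field layout. The port, sequence, and window values are arbitrary, and the checksum is left at zero, so this is not a header a real stack would accept.

```python
import struct

FLAG_PSH = 0x08
FLAG_ACK = 0x10


def build_tcp_header(src_port: int, dst_port: int, seq: int, ack: int, flags: int) -> bytes:
    """Pack a 20-byte TCP header without options (checksum left at 0)."""
    data_offset = 5  # header length in 32-bit words
    offset_and_flags = (data_offset << 12) | flags
    window = 65535   # arbitrary receive window for this sketch
    checksum = 0     # a real stack computes this over a pseudo-header and data
    urgent = 0
    return struct.pack(
        "!HHIIHHHH",
        src_port, dst_port, seq, ack,
        offset_and_flags, window, checksum, urgent,
    )


# Hypothetical client port 54321 talking to the default HTTP port 80.
tcp_header = build_tcp_header(54321, 80, seq=1, ack=1, flags=FLAG_PSH | FLAG_ACK)
print(tcp_header.hex())
```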

So far, the network packet is shown in the following figure.

Remote location — IP

Let’s first look at the format of the IP packet header:

IP header format

In the IP protocol, both the source IP address and the destination IP address are required:

  • The source IP address is the IP address of the client that sends the packet.
  • The destination address is the IP address of the Web server obtained through DNS domain name resolution.

Because HTTP is carried over TCP, the protocol number in the IP header is set to 06 (hexadecimal), which indicates TCP.
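The sketch below packs a minimal IPv4 header to show where the source address, destination address, and protocol number live. It assumes a header without options, a TTL of 64, and zeroed identification and fragment fields; the two addresses are the ones used in the routing example later in this article.

```python
import socket
import struct


def checksum(data: bytes) -> int:
    """Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int,
                      protocol: int = socket.IPPROTO_TCP) -> bytes:
    """Pack a 20-byte IPv4 header (no options) with a valid checksum."""
    fmt = "!BBHHHBBH4s4s"
    fields = [
        (4 << 4) | 5,          # version 4, header length 5 words
        0,                     # type of service
        20 + payload_len,      # total length
        0, 0,                  # identification, flags/fragment offset
        64,                    # TTL (assumed value)
        protocol,              # 6 = TCP
        0,                     # checksum placeholder
        socket.inet_aton(src),
        socket.inet_aton(dst),
    ]
    fields[7] = checksum(struct.pack(fmt, *fields))
    return struct.pack(fmt, *fields)


ip_header = build_ipv4_header("10.10.1.101", "192.168.1.100", payload_len=20)
print(ip_header.hex())
```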

Two point transmission — MAC

Once the IP header is generated, a MAC header must be added in front of it.

The MAC header is the header used by the Ethernet. It contains information such as the MAC address of the receiver and sender.

MAC header format

In the MAC packet header, the MAC address of the sender and the MAC address of the receiver are required for transmission between two points.

In TCP/IP communication, the protocol type field of the MAC header generally takes only two values:

  • 0800: IP protocol
  • 0806: ARP protocol
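A minimal sketch of the MAC header layout: six bytes of receiver address, six bytes of sender address, and the two-byte protocol type. The sender address below is made up; the receiver address reuses the example MAC address that appears later in the switch section.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'AA:BB:..' or 'AA-BB-..' notation to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", "").replace("-", ""))


def build_mac_header(dst_mac: str, src_mac: str, ethertype: int) -> bytes:
    """Receiver MAC, sender MAC, then the 16-bit protocol type."""
    return mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", ethertype)


mac_header = build_mac_header(
    "00-02-B3-1C-9C-F9",   # receiver
    "AA:BB:CC:DD:EE:01",   # hypothetical sender
    ETHERTYPE_IPV4,
)
print(mac_header.hex())
```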

How do I confirm the MAC sender and receiver?

The sender’s MAC address is easy to obtain: it is written into ROM when the NIC is manufactured, so it only needs to be read out and written into the MAC header.

The receiver’s MAC address is a bit more complicated. Ethernet delivers a packet to whatever MAC address we give it, so the field must hold the MAC address of the device that should receive the packet next.

So we first need to work out who to send the packet to, which is done by checking the routing table: find the matching entry and send the packet to the IP address in its Gateway column.

Now that you know who to send it to, how do you get their MAC address?

Don’t know the MAC address? Then shout and ask.

ARP is needed to help us find the MAC address of the router.

ARP broadcast

The ARP protocol broadcasts the following message to all devices on the Ethernet: “Whose IP address is this? Please tell me your MAC address.”

The owner of the address replies: “The IP address is mine, and my MAC address is XXXX.”

If the peer is on the same subnet, this query returns its MAC address. We then write that MAC address into the MAC header, and the MAC header is complete.

Does this mean a broadcast every time? Isn’t that a hassle?

Rest assured, the operating system will store the query results in a memory area called the ARP cache for later use, but only for a few minutes.

In other words, at the time of delivery:

  • If the MAC address of the peer is stored in the ARP cache, no ARP query is sent and the address in the ARP cache is used.
  • If the peer MAC address does not exist in the ARP cache, the device sends an ARP broadcast query message.
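The two cases above can be sketched as a small cache with an expiry time. This is a toy model, not an operating system's implementation: the 120-second lifetime, the injected clock, and the `fake_broadcast` function that stands in for a real ARP broadcast are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class ArpCache:
    """Maps IP addresses to MAC addresses; entries expire after ttl seconds."""

    def __init__(self, ttl_seconds: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: Dict[str, Tuple[str, float]] = {}  # ip -> (mac, expires_at)

    def lookup(self, ip: str) -> Optional[str]:
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[ip]  # stale entry: forget it
            return None
        return mac

    def store(self, ip: str, mac: str) -> None:
        self.entries[ip] = (mac, self.clock() + self.ttl)


def resolve_mac(ip: str, cache: ArpCache, broadcast_query: Callable[[str], str]) -> str:
    """Use the cache if possible; otherwise broadcast and remember the answer."""
    mac = cache.lookup(ip)
    if mac is None:
        mac = broadcast_query(ip)
        cache.store(ip, mac)
    return mac


now = [0.0]                 # fake clock so the example runs instantly
queries: List[str] = []     # records every simulated broadcast


def fake_broadcast(ip: str) -> str:
    queries.append(ip)
    return "00-02-B3-1C-9C-F9"


cache = ArpCache(ttl_seconds=120.0, clock=lambda: now[0])
first = resolve_mac("192.168.1.1", cache, fake_broadcast)   # broadcast
second = resolve_mac("192.168.1.1", cache, fake_broadcast)  # served from cache
now[0] = 121.0                                              # entry has expired
third = resolve_mac("192.168.1.1", cache, fake_broadcast)   # broadcast again
print(len(queries))
```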

So far, the network packet is shown in the following figure.

MAC layer message

The network card

The network packet generated by IP is just a string of binary digits stored in memory and cannot be sent directly to the other party. Therefore, digital information needs to be converted into electrical signals before it can be transmitted over the network cable. In other words, this is the real data transmission process.

This operation is performed by the network card, and controlling the network card in turn relies on the NIC driver.

After the NIC driver obtains the packet from the IP module, it copies it into a buffer in the NIC. It then adds a preamble and a start frame delimiter to the front of the packet, and appends a frame check sequence for error detection to the end.

 

Physical layer packet

  • The start frame delimiter is a marker used to indicate the start position of a packet
  • The trailing FCS (Frame check sequence) checks for damage during packet transmission

Finally, the network adapter converts the packet into an electrical signal and sends it out through the network cable.
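The role of the FCS can be illustrated with a CRC-32 check. Python's `zlib.crc32` uses the same CRC-32 polynomial as Ethernet, but the sketch below is only an approximation of what NIC hardware does: the frame contents are made up, and the byte order used to append the checksum is a convention chosen for this example.

```python
import zlib


def add_fcs(frame: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the frame as its check sequence."""
    fcs = zlib.crc32(frame) & 0xFFFFFFFF
    return frame + fcs.to_bytes(4, "little")


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC over the body and compare it with the trailer."""
    body, trailer = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return (zlib.crc32(body) & 0xFFFFFFFF) == int.from_bytes(trailer, "little")


sent = add_fcs(b"MAC header + IP header + TCP header + HTTP data")
# Flip one bit to simulate damage on the wire.
damaged = bytes([sent[0] ^ 0x01]) + sent[1:]

print(fcs_ok(sent), fcs_ok(damaged))
```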

Switches

Let’s look at how packets get through the switch. A switch is designed to forward network packets unchanged toward their destination. Switches work at the MAC layer and are also known as Layer 2 network devices.

Packet receiving operations on the switch

First, the electrical signals arrive at the network interface and are picked up by modules on the switch, which then convert them into digital signals.

It then checks the FCS at the end of the packet for errors and, if there is no problem, puts the packet into a buffer. This part is basically the same as in a computer’s network card, but from here on the switch behaves differently.

A computer’s NIC has its own MAC address. It compares the receiver MAC address of each incoming packet with its own and discards packets that are not addressed to it. A switch port, by contrast, does not check the receiver MAC address: it accepts every packet and stores it in the buffer. Therefore, unlike a NIC, a switch port does not have a MAC address.

After storing the packet in the buffer, the next step is to check whether the MAC address of the packet’s receiver has been recorded in the MAC address table.

The MAC address table of a switch contains two pieces of information:

  • One is the MAC address of the device,
  • The other is which port on the switch the device is connected to.

For example, if the MAC address of the receiving packet is 00-02-B3-1C-9C-F9, it matches row 3 in the table in the figure. According to the information in the port column, the address is on port 3. Then the packet can be sent to the corresponding port through the switching circuit.

So, the switch looks up the MAC address from the MAC address table and sends the signal to the appropriate port.

What happens when the MAC address table cannot find the specified MAC address?

The specified MAC address cannot be found in the address table. This may be because the device with the address has not yet sent packets to the switch, or the device has been inactive for some time and the address has been removed from the address table.

In this case, the switch cannot determine which port to forward the packet to, but can only forward the packet to all ports except the source port. The device can receive the packet regardless of which port it is connected to.

This poses no problem, because Ethernet was originally designed to send each packet to the whole network, with only the intended recipient accepting it and all other devices ignoring it.

Some people say, “This will send extra packets, won’t it cause network congestion?”

There’s nothing to worry about: the target replies after receiving the packet, and once it does, the switch can write the target’s address into the MAC address table, so the next packet no longer needs to be sent to all ports.

A LAN can transmit thousands of packets per second, and one or two more packets will not matter.

In addition, if the recipient MAC address is a broadcast address, the switch will send the packet to all ports except the source port.

The following two addresses are broadcast addresses:

  • The MAC address FF:FF:FF:FF:FF:FF
  • The IP address 255.255.255.255
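The table lookup, the flooding for unknown addresses, and the learning step can be sketched together as a toy switch. The class name, the four-port size, and the second MAC address are assumptions for illustration; address aging is left out.

```python
from typing import Dict, List

BROADCAST = "FF:FF:FF:FF:FF:FF"


class Switch:
    """Toy learning switch: remembers which port each sender MAC was seen on."""

    def __init__(self, port_count: int) -> None:
        self.ports = list(range(1, port_count + 1))
        self.mac_table: Dict[str, int] = {}

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Return the ports the packet is forwarded to."""
        self.mac_table[src_mac] = in_port          # learn the sender's port
        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Unknown or broadcast: send to every port except the source port.
            return [p for p in self.ports if p != in_port]
        return [out_port]


switch = Switch(port_count=4)
host_a = "00-02-B3-1C-9C-F9"
host_b = "00-02-B3-1C-9C-FA"   # hypothetical second host

flooded = switch.receive(1, host_a, host_b)   # B unknown: flood
reply = switch.receive(3, host_b, host_a)     # A was learned on port 1
direct = switch.receive(1, host_a, host_b)    # B was learned on port 3
print(flooded, reply, direct)
```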

The router

Differences between routers and switches

After passing through the switch, the network packet now arrives at the router, where it is forwarded to the next router or target device.

In this step, the packet forwarding principle is similar to that of a switch. The packet forwarding target is determined by looking up the table.

However, there are differences between routers and switches in the specific operation process.

  • Routers are designed around IP and are commonly known as Layer 3 network devices. Each router port has both a MAC address and an IP address.
  • Switches, commonly known as Layer 2 network devices, are designed based on Ethernet. Ports on switches do not have MAC addresses.

Router Principles

The router’s port has a MAC address, so it can be an Ethernet sender and receiver. It also has an IP address, in the sense that it is the same as a computer’s network card.

When forwarding packets, the router port first receives Ethernet packets destined for itself, then searches the routing table for the forwarding destination, and sends Ethernet packets to the corresponding port as the sender.

The packet receiving operation of the router

First, the electrical signal reaches the interface of the network cable, and the module in the router converts the electrical signal into a digital signal, which is then checked for errors by the FCS at the end of the packet.

If there are no errors, the router checks the receiver’s MAC address in the MAC header to see whether the packet is addressed to it. If it is, the packet is put in the receive buffer; otherwise, it is discarded.

Generally speaking, router ports have MAC addresses and only receive packets that match their own addresses. If the packets do not match, they are discarded directly.

Query the routing table to determine the output port

Once the packet is received, the router removes the MAC header at the beginning of the packet.

The function of the MAC header is to deliver the packet to the router, which is why the receiver MAC address is the MAC address of the router port. When the packet arrives at the router, the MAC header’s job is done and it is discarded.

The router then forwards packets based on the contents in the IP header behind the MAC header.

The forwarding operation is divided into several stages. The first step is to query the routing table to determine the forwarding target.

Router Forwarding Flowchart

For example

Suppose the computer at 10.10.1.101 wants to send a packet to the server at 192.168.1.100, which first arrives at the router in the figure.

The first step in determining the forwarding target is to search the destination column of the routing table for a record that matches the receiver IP address of the packet.

Route matching works as follows. For each entry, a bitwise AND is performed between the entry’s subnet mask and the IP address 192.168.1.100, and the result is compared with the entry’s destination address. If they match, the entry becomes a candidate forwarding target.

For example, a bitwise AND between the subnet mask of the second entry, 255.255.255.0, and the IP address 192.168.1.100 gives 192.168.1.0. This matches the destination address of the second entry, 192.168.1.0, so the second entry is used as the forwarding target.

If no matching route is found, the system selects the default route. A record with a subnet mask of 0.0.0.0 in the routing table indicates the default route.
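The matching rule can be sketched with integer arithmetic on the addresses. The routing table below is hypothetical: only the second entry (192.168.1.0 with mask 255.255.255.0) comes from the example above, and the gateway address and interface names are made up. When several entries match, the sketch prefers the longest mask, so the default route with mask 0.0.0.0 is chosen only when nothing more specific matches.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, str, Optional[str], str]  # destination, mask, gateway, interface

ROUTES: List[Route] = [
    ("10.10.1.0", "255.255.255.0", None, "eth0"),
    ("192.168.1.0", "255.255.255.0", None, "eth1"),
    ("0.0.0.0", "0.0.0.0", "203.0.113.1", "eth2"),   # default route
]


def to_int(addr: str) -> int:
    return int(ipaddress.IPv4Address(addr))


def lookup(dst_ip: str, routes: List[Route] = ROUTES) -> Optional[Route]:
    """Return the matching route with the longest subnet mask, if any."""
    best: Optional[Route] = None
    for route in routes:
        dest, mask = route[0], route[1]
        # Bitwise AND of the destination IP with the entry's subnet mask.
        if (to_int(dst_ip) & to_int(mask)) == to_int(dest):
            if best is None or to_int(mask) > to_int(best[1]):
                best = route
    return best


print(lookup("192.168.1.100"))  # second entry
print(lookup("8.8.8.8"))        # falls through to the default route
```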

Router send operations

The next step is to send the packet.

First, we need to determine the address of the other party based on the gateway column of the routing table.

  • If the gateway is an IP address, that address is the next hop to forward to: the packet has not reached its destination yet and must be forwarded by another router.
  • If the gateway is empty, the receiver IP address in the IP header is the address to forward to: the destination is directly reachable, meaning the packet has arrived at the final network.

Once the IP address of the peer is known, ARP is used to find the MAC address for that IP address, and the result is used as the receiver MAC address. The router also has an ARP cache, so it searches the cache first and sends an ARP query only if no entry is found.

Next is the sender MAC address field, which is filled with the MAC address of the output port. There is also the Ethernet type field, which is set to 0800 (hexadecimal) for the IP protocol.

Once the network packet is complete, it is then converted into an electrical signal and sent over the port. This step works the same way as a computer. The outgoing network packet travels through the switch to the next router. Because the MAC address of the receiver is the address of the next router, the switch forwards the packet to the next router based on this address.

Next, the next router forwards the packet to the next router, and after several layers of forwarding, the network packet reaches its final destination.

Have you noticed that during transmission the source and destination IP addresses never change, while the MAC addresses change at every hop? That is because the MAC address is what carries a packet between two directly connected devices on an Ethernet.
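A small simulation makes this visible. Each hop rewrites only the MAC fields of a packet represented as a dictionary; the IP addresses come from the routing example above, while every MAC address and the number of hops are made up for illustration.

```python
from typing import Dict, List, Tuple


def forward(packet: Dict[str, str], out_port_mac: str, next_hop_mac: str) -> Dict[str, str]:
    """Return a copy of the packet with a fresh MAC header for the next link."""
    rewritten = dict(packet)
    rewritten["src_mac"] = out_port_mac   # MAC of the router's output port
    rewritten["dst_mac"] = next_hop_mac   # MAC found via ARP for the next hop
    return rewritten


packet = {
    "src_ip": "10.10.1.101",
    "dst_ip": "192.168.1.100",
    "src_mac": "aa:aa:aa:aa:aa:01",   # client NIC (hypothetical)
    "dst_mac": "bb:bb:bb:bb:bb:01",   # first router, inbound port (hypothetical)
}

# (output port MAC, next hop MAC) for each router on the path, all hypothetical.
hops: List[Tuple[str, str]] = [
    ("bb:bb:bb:bb:bb:02", "cc:cc:cc:cc:cc:01"),
    ("cc:cc:cc:cc:cc:02", "dd:dd:dd:dd:dd:01"),
]

trace = [packet]
for out_mac, next_mac in hops:
    trace.append(forward(trace[-1], out_mac, next_mac))

for step in trace:
    print(step["src_ip"], step["dst_ip"], step["src_mac"], step["dst_mac"])
```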

The packet has arrived at the server, and the server must be delighted. As the saying goes, is it not a joy to have friends come from afar?

Unpacking – server and client

The server is so happy that it starts peeling the layers off the packet! It’s like receiving a parcel, isn’t it exciting?

When the packet arrives, the server first removes the MAC header and checks whether the receiver MAC address matches its own. If it does, the server accepts the packet.

It then removes the IP header, finds that the IP address matches, and learns from the protocol field in the IP header that the upper layer is TCP.

Next it removes the TCP header, which contains the sequence number. The server checks whether this is the segment it is expecting: if so, it places the data in its buffer and returns an ACK; if not, it discards the segment. The TCP header also contains the port number that the HTTP server is listening on.

From the port number the server knows that the HTTP process wants the data, and hands it over.

The server’s HTTP process sees that the request is for a page, and wraps the page in an HTTP response message.

The HTTP response also has to put on TCP, IP, and MAC headers, but this time the source address is the server’s IP address and the destination address is the client’s IP address.

Once dressed, the response leaves through the network card. The switch forwards it to the outgoing router, which sends it to the next router, and so it hops from router to router.

Finally it reaches the router that serves as the client’s gateway. The router inspects the IP header, finds that the destination is on its own local network, and sends the packet to the local switch, which forwards it to the client.

After receiving the response from the server, the client is happy too: it gets to unpack a parcel!

The client peels the headers off the received packet until only the HTTP response message remains, and hands it to the browser to render the page. And so this special parcel is finally displayed.

Finally, when the client is ready to leave, it performs the four-way TCP wave with the server and the connection is closed.

Other

The difference between gateway and route

Simple version

“Gateway” is a broad concept and does not refer to one class of product: any device that connects two different networks can be called a gateway. A “router” generally refers to a specific class of product that performs route lookup and forwarding. A router can obviously act as a gateway.

Detailed version

A gateway is the “gate” through which one network connects to another. Gateways can be classified in many ways, and there are many kinds. The TCP/IP gateway is the most common, and every “gateway” mentioned here means a gateway under the TCP/IP protocol.

So what exactly is a gateway? A gateway is essentially the IP address that leads from one network to another. Take network A and network B. The IP addresses of network A range from 192.168.1.1 to 192.168.1.254, with subnet mask 255.255.255.0. The IP addresses of network B range from 192.168.2.1 to 192.168.2.254, with subnet mask 255.255.255.0. Without a router, the two networks cannot communicate over TCP/IP. Even if both are connected to the same switch (or hub), TCP/IP concludes from the subnet mask (255.255.255.0) that the hosts are on different networks. Communication between the two networks must go through a gateway. If a host in network A finds that the destination host of a packet is not on the local network, it forwards the packet to its own gateway, which forwards it to the gateway of network B, which in turn forwards it to the host in network B. The same happens in reverse when network B sends packets to network A.
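The decision a host makes here, whether the destination is on the local network or must go through the gateway, can be sketched with the standard `ipaddress` module. The function name `needs_gateway` and the specific host addresses are assumptions; the networks and mask are the ones from the paragraph above.

```python
import ipaddress


def needs_gateway(src_ip: str, dst_ip: str, mask: str) -> bool:
    """True if dst_ip is outside the sender's own network."""
    # strict=False lets us pass a host address together with the mask.
    local_net = ipaddress.ip_network(f"{src_ip}/{mask}", strict=False)
    return ipaddress.ip_address(dst_ip) not in local_net


mask = "255.255.255.0"
same_network = needs_gateway("192.168.1.10", "192.168.1.200", mask)
other_network = needs_gateway("192.168.1.10", "192.168.2.20", mask)
print(same_network, other_network)  # False True
```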

Therefore, TCP/IP can communicate between different networks only when a gateway IP address is set. So which machine does this IP address belong to? The gateway’s IP address is the IP address of a device with routing capability, such as a router, a server with a routing protocol enabled (in effect a router), or a proxy server (also in effect a router).

A router is a network device responsible for path finding: it looks for the network path with the least traffic for users to communicate over. Routers connect multiple logically separate networks. To give users the best communication path, the router uses a routing table to select the path for data transmission. The routing table lists network addresses and the distances between them, and the router uses it to find the correct path for a packet from its current location to the destination address. The router uses a least-time or optimal-path algorithm to adjust the path. If a network path is faulty or congested, the router can choose another path to keep information flowing. Routers can also convert data formats, which makes them a necessary device for interconnecting networks that use different protocols.

Hub

A hub is a device that brings network cables together, a connector for multiple hosts and devices. Its main function is to extend the transmission distance of the network. It is a form of repeater; the difference is that a hub provides multiple ports, so it is also called a multi-port repeater. The hub sits at the physical layer of the OSI reference model. Its basic function is signal distribution: every signal received on one port is distributed to all ports. Some hubs regenerate weak signals before distributing them, and some clean up signal timing to provide synchronous data communication across all ports.

A hub is essentially a multi-port repeater. Hubs generally have 4, 8, 16, 24, 32, or some other number of RJ45 interfaces, through which the hub performs the “relay” function for the corresponding number of computers (an attenuated, incomplete signal is cleaned up and regenerated as a complete signal for onward transmission). Because of its “central” position in the network, this device is called a “hub”.

The hub works in a simple way. Suppose a hub has eight ports connecting eight computers. The hub sits at the “center” of the network and forwards signals so that the eight computers can communicate with each other. The communication process is as follows: if computer 1 wants to send a message to computer 8, computer 1’s network card sends the message over twisted pair to the hub. The hub does not send the message directly to computer 8. Instead, it “broadcasts” the message to all eight ports. When a computer on one of those ports receives the broadcast, it checks the message: if the message is addressed to it, it accepts the message; otherwise, it ignores it. Since the message is sent from computer 1 to computer 8, computer 8 eventually accepts it, while the other computers inspect the message and reject it because it is not for them.

Switch

The switch is an upgraded replacement for the hub and looks much the same from the outside. It is a device that automatically performs the information exchange function in a communication system. Like a hub it is used to connect a network, but it is more capable.

A switch is also called a switching hub. It regenerates the incoming information, processes it internally, and forwards it to the specified port, so it has automatic addressing and switching capabilities. Because the switch sends each packet independently from the source port to the destination port according to the packet’s destination address, it avoids collisions with traffic on other ports. In the broad sense, a switch is any device that performs the information exchange function in a communication system.

In a computer network, the switch addresses the weakness of the shared working mode. A hub is the typical shared-mode device. If we compare a hub to a postman, this postman is a “fool” who cannot read: given a letter, he cannot deliver it to the recipient by reading the address, so he hands it to everyone and lets each person check the address to decide whether the letter is theirs. The switch is a “smart” postman: it has a high-bandwidth backplane bus and an internal switching matrix. All switch ports attach to the backplane bus. When the control circuit receives a packet, the processing port looks up the MAC address table in memory to determine which port the NIC with the destination MAC address (the network adapter’s hardware address) is attached to, and the packet is quickly sent to that port through the internal switching matrix. If the destination MAC address is not in the table, the switch broadcasts to all ports. After receiving the response, the switch “learns” the new address and adds it to its internal address table.

Conclusion

  • Everything that happens before the data reaches the network adapter is preparation for sending: adding the various headers and filling in the MAC and IP addresses.
  • The data then travels across the network through a switch and a series of routers, and the last router delivers it to the destination server.
  • Both servers and clients add and remove headers. Switches forward data by MAC address; routers forward data using both MAC and IP addresses.

This article only explains the request process itself. Real network transmission also involves encryption, caching, load balancing, and more.

Writing this, I realized that fully understanding a single network request still takes a lot of effort.

 

Other articles

  • HTTP overview

  • TCP three-way handshake and four-way Wave (Finite State machine)

  • From the time you type in the url to the time you see the page — explain what happens in between

  • HTTPS (Detailed Version)

  • Rambling on about HTTP connections

  • Rambling on HTTP performance tuning

  • Introduction to HTTP packet format

  • Easy to understand: HTTP/2

  

Reference article

What happens from the time you type in a web address to the time the page is displayed?