This article explores what happens from the moment you enter an address in the browser's address bar until the page is rendered. Along the way, it walks through HTTP, HTTPS, TCP, and front-end performance optimization in detail. The topic spans front-end, back-end, and operations work, and touches on many technical terms and the knowledge behind them.

When we type an address into the browser and press Enter, there are two cases: an HTTP transaction and an HTTPS transaction. Let's start with the HTTP transaction:

1. The browser (client) resolves the address.

2. Perform DNS resolution on the resolved domain name.

3. Find the target (server) address by IP addressing and ARP.

4. Perform the TCP three-way handshake to establish a TCP connection.

5. The browser sends data and waits for a response from the server.

6. The server processes and responds to requests.

7. The browser receives the response from the server and gets the HTML code.

8. Render the page.

Together, these steps make up a complete HTTP transaction. Each step is described in detail below.

1. The browser (client) resolves the address.

When we type an address into the browser and press Enter, the browser gets a string. The browser then resolves the address to obtain the protocol, host, port, and path information.

The general format of a URL is:

protocol://username:password@host:port/path?query#fragment

For example, www.imooc.com/article/dra… lacks several of these parts: the port number, username, password, query, and fragment. None of them are required; even the protocol and path can be omitted, so the simplest form is just imooc.com, and the browser fills in the defaults. It assumes the HTTP protocol, and the default port for HTTP is 80. Some browsers also try prepending www to the domain name, and many sites redirect the bare domain to the www host, so typing imooc.com can end up at www.imooc.com.
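The address-resolution step can be sketched with Python's standard `urllib.parse` module. This is a minimal illustration under assumptions, not how any particular browser implements it: the function name `resolve_address` and the `DEFAULT_PORTS` table are hypothetical, and it only fills in the `http` scheme and ports 80/443. It does not prepend www, since that behavior varies by browser and site.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes this article discusses.
DEFAULT_PORTS = {"http": 80, "https": 443}


def resolve_address(typed: str) -> dict:
    """Split a typed address into its URL parts, filling in defaults."""
    # Without a scheme, urlsplit would treat the host as part of the path,
    # so assume http:// the way a browser's address bar typically does.
    if "://" not in typed:
        typed = "http://" + typed
    parts = urlsplit(typed)
    return {
        "protocol": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/",
        "query": parts.query,
        "fragment": parts.fragment,
    }
```

For instance, `resolve_address("imooc.com")` yields protocol `"http"`, host `"imooc.com"`, port `80`, and path `"/"`.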

2. Perform DNS resolution on the resolved domain name.

The first step gave us the server's domain name. That name must now be translated into the corresponding IP address; this is DNS resolution, which proceeds as follows:

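1. Check whether the browser's DNS cache holds an IP address for the domain name.

2. If not, check whether the operating system's DNS cache has the corresponding IP address (the OS also consults local files such as the hosts file on Windows).

3. If it is still not found, send the query to the local DNS server (usually the one configured for the network, such as the ISP's).

4. If the local DNS server does not have the answer cached, it resolves the name starting from a root server, which refers it to the top-level domain server, which in turn refers it to the domain's authoritative server.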
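The four-step lookup order can be modeled as a chain of resolvers tried in turn. This is a toy sketch: every cache, server name, and IP address below is hypothetical (the IPs come from documentation-only ranges), and in reality steps 3 and 4 are both carried out by the local DNS server on the client's behalf.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical cache contents, for illustration only.
BROWSER_CACHE = {"www.imooc.com": "203.0.113.10"}
OS_CACHE_AND_HOSTS = {"localhost": "127.0.0.1"}
LOCAL_DNS_CACHE = {"m.imooc.com": "203.0.113.20"}

# Simplified delegation chain: root -> TLD server -> authoritative server.
ROOT_ZONE = {"com": "tld-server-com"}
TLD_ZONES = {"tld-server-com": {"imooc.com": "auth-server-imooc"}}
AUTH_ZONES = {"auth-server-imooc": {"git.imooc.com": "198.51.100.7"}}


def ask_root(domain: str) -> Optional[str]:
    """Walk root -> TLD -> authoritative, as a local DNS server would."""
    labels = domain.split(".")
    tld_server = ROOT_ZONE.get(labels[-1])          # root refers us to the TLD server
    if tld_server is None:
        return None
    auth_server = TLD_ZONES[tld_server].get(".".join(labels[-2:]))
    if auth_server is None:
        return None
    return AUTH_ZONES[auth_server].get(domain)      # authoritative answer


# Steps 1-4 in order; each lookup returns an IP address or None.
CHAIN: list[tuple[str, Callable[[str], Optional[str]]]] = [
    ("browser cache", BROWSER_CACHE.get),
    ("OS cache / hosts file", OS_CACHE_AND_HOSTS.get),
    ("local DNS server cache", LOCAL_DNS_CACHE.get),
    ("iterative lookup from root", ask_root),
]


def resolve(domain: str) -> tuple[str, str]:
    """Return (where it was found, IP) from the first step that succeeds."""
    for source, lookup in CHAIN:
        ip = lookup(domain)
        if ip is not None:
            return source, ip
    raise LookupError(f"could not resolve {domain}")
```

The chain stops at the first hit, which is why a cached name never reaches the root servers.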

There are a few points to note:

<1> DNS uses TCP for zone transfers and UDP for ordinary queries (a query also falls back to TCP when the response is too large for a UDP message).

<2> There are only thirteen logical root servers worldwide; for why there are exactly thirteen, see www.zhihu.com/question/22…

As soon as any of the lookup steps above succeeds, the corresponding IP address is returned.
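To make the UDP/TCP distinction concrete, here is a sketch that builds a minimal DNS query message in the RFC 1035 wire format. Over UDP the message is sent as-is (normally to port 53); over TCP, each message is preceded by a two-byte length field. The function names and the fixed query ID are illustrative choices, and the sketch only builds bytes; it does not send them or parse replies.

```python
import struct


def build_dns_query(domain: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 layout)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR counts=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN, Internet).
    return header + qname + struct.pack("!HH", 1, 1)


def frame_for_tcp(message: bytes) -> bytes:
    """Over TCP, every DNS message carries a 2-byte big-endian length prefix."""
    return struct.pack("!H", len(message)) + message
```

The same query bytes work for both transports; only the TCP framing differs.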

3. Find the target (server) address by IP addressing and ARP.

IP sits at the third layer of the four-layer TCP/IP model counting from the top, the internet layer (the network layer in OSI terms), while ARP is associated with the fourth, the network access layer (the data link layer in OSI terms). It helps to understand the OSI seven-layer model as well; it is not covered in detail here, but interested readers can see baike.baidu.com/item/%E5%BC…

Step 2 gave us the IP address. Using that address, packets are routed toward the server, and ARP is used to find the MAC address needed for delivery on the local link: the server's own MAC address if it is on the same network segment, otherwise the MAC address of the default gateway.

Here are a few things to note:

1. IP address (IPv4, 32 bits). An IP address is a uniform address format defined by the IP protocol. It assigns a logical address to every network and every host on the Internet, hiding the differences between physical addresses. Under the original classful scheme, IP addresses fall into five classes: A, B, C, D, and E.

Class A address: a one-byte (8-bit) network address and a three-byte host address. The IP address ranges from 1.0.0.0 to 126.255.255.255.

Class B address: a two-byte network address and a two-byte host address. The IP address ranges from 128.0.0.0 to 191.255.255.255.

Class C address: three bytes of network address and one byte of host address. The IP address ranges from 192.0.0.0 to 223.255.255.255.

Class D address: used for multicast. The first four bits of a Class D address are 1110. The IP address ranges from 224.0.0.0 to 239.255.255.255.

Class E address: the first four bits are 1111, and these addresses are reserved for future use. The range is 240.0.0.0 to 255.255.255.254; 255.255.255.255 is the broadcast address.
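Classful addressing is largely historical today (routing now uses CIDR prefixes), but the class of an address follows directly from its leading bits, which is where the 1110 and 1111 prefixes come from. A small sketch using Python's `ipaddress` module; the function name `ip_class` is hypothetical, and it reports 0.x.x.x and 127.x.x.x as Class A by their bits, even though those ranges are reserved.

```python
import ipaddress


def ip_class(address: str) -> str:
    """Return the classful category (A-E) of an IPv4 address from its leading bits."""
    first_byte = ipaddress.IPv4Address(address).packed[0]  # also validates the address
    if first_byte >> 7 == 0b0:
        return "A"  # 0xxxxxxx: 0-127
    if first_byte >> 6 == 0b10:
        return "B"  # 10xxxxxx: 128-191
    if first_byte >> 5 == 0b110:
        return "C"  # 110xxxxx: 192-223
    if first_byte >> 4 == 0b1110:
        return "D"  # 1110xxxx: 224-239, multicast
    return "E"      # 1111xxxx: 240-255, reserved
```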

Two ranges are left out of the Class A range above. Addresses starting with 0 refer to "this host" on the local network and cannot be used as destinations. Addresses starting with 127 are loopback addresses that refer to the machine's own network interface, and are often used for testing.

Why are IP addresses written as decimal numbers separated by dots? That notation exists purely for human convenience. To the computer, an address is 32 binary bits; each of the four bytes holds 8 bits, whose largest value is 255, which is why each dotted part ranges from 0 to 255.

You may also see addresses such as 10.170.8.61/23, which brings in LANs, reserved addresses, and subnet masks. The /23 means the first 23 bits are the network address, leaving 32 - 23 = 9 bits for hosts, so the network contains 2^(32-23) = 512 addresses (510 usable hosts once the network and broadcast addresses are excluded). I won't go deeper here, since the topic quickly gets large; interested readers can refer to www.zhihu.com/question/56…
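Python's standard `ipaddress` module can check the /23 arithmetic above. A sketch; the helper name `describe_network` is hypothetical, and the usable-host count assumes a prefix of /30 or shorter, where the network and broadcast addresses are excluded.

```python
import ipaddress


def describe_network(cidr: str) -> dict:
    """Summarize the network that an address written as ip/prefix belongs to."""
    net = ipaddress.ip_interface(cidr).network  # host bits are zeroed out here
    return {
        "network": str(net),
        "netmask": str(net.netmask),
        "addresses": net.num_addresses,         # 2 ** (32 - prefix length)
        "usable_hosts": net.num_addresses - 2,  # minus network and broadcast (prefix <= /30)
        "broadcast": str(net.broadcast_address),
    }
```

For `describe_network("10.170.8.61/23")` this gives network `10.170.8.0/23`, netmask `255.255.254.0`, 512 addresses, 510 usable hosts, and broadcast address `10.170.9.255`.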

2. How does IP addressing work?

There are two cases: the destination is on the same network segment, or on a different one. To decide whether two IP addresses are on the same segment, take the bitwise AND of each address with the subnet mask. The result is the network number: if the network numbers are equal, the addresses are in the same subnet; otherwise, they are not.
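The AND-with-the-mask rule can be written directly with integer arithmetic. This sketch uses hypothetical helper names and applies host A's own subnet mask to both addresses, as host A does in the same-segment walkthrough.

```python
import ipaddress


def network_number(ip: str, mask: str) -> ipaddress.IPv4Address:
    """Bitwise AND of an IPv4 address and a subnet mask yields the network number."""
    ip_bits = int(ipaddress.IPv4Address(ip))
    mask_bits = int(ipaddress.IPv4Address(mask))
    return ipaddress.IPv4Address(ip_bits & mask_bits)


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """True if both addresses produce the same network number under `mask`."""
    return network_number(ip_a, mask) == network_number(ip_b, mask)
```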

On the same network segment:

Suppose host A wants to reach host B. Host A first translates B's computer name into an IP address using its local hosts table, WINS, or DNS. It then ANDs its own IP address with its subnet mask to find its network segment, does the same with B's IP address, and finds that they are on the same segment. If B's MAC address is in A's ARP cache, A performs data link layer encapsulation directly and sends the Ethernet frame onto the physical line through its network adapter. If B's MAC address is not in the ARP cache, A broadcasts an ARP request on the local network to ask for it; once A receives the reply, it writes B's MAC address into its ARP cache, encapsulates the data at the data link layer, and sends it.

Different network segments:

Hosts on different data link layer networks must be given IP addresses on different network segments, and those networks are connected by routers. As before, host A compares network numbers, but this time it finds that host B is on a different segment, so it knows it must send the packet to its default gateway, which is the router's interface on the local network. Host A looks up the default gateway's MAC address in its ARP cache. If it finds it, host A encapsulates the data in an Ethernet frame at the data link layer and sends the frame onto the physical line through its network adapter. If the gateway's MAC address is not in the ARP cache, host A first broadcasts an ARP request on the local network to ask for it, writes the reply into its ARP cache, and then encapsulates and sends the frame.

When the frame arrives at the router's interface, the router decapsulates it back into an IP packet, looks up the destination IP address in its routing table to decide which interface to forward it on, re-encapsulates the packet in a frame suited to that interface's data link layer protocol, and sends it to the next-hop router. This repeats hop by hop until the packet reaches the destination network and the destination host. The whole process resembles DNS resolution, with each next-hop router taking the place of a DNS server in moving the packet one step closer to its destination.
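The decision host A makes, and the lookup each router performs, can be sketched as follows. Assumptions: the helper names are hypothetical, and the router side uses longest-prefix match, the standard rule for choosing among overlapping routing-table entries, which the text does not spell out.

```python
from __future__ import annotations

import ipaddress


def next_hop(src_ip: str, netmask: str, dst_ip: str, default_gateway: str) -> str:
    """Return the IP address whose MAC host A must resolve with ARP."""
    # strict=False because src_ip has host bits set; only its network part matters.
    local_net = ipaddress.IPv4Network(f"{src_ip}/{netmask}", strict=False)
    if ipaddress.IPv4Address(dst_ip) in local_net:
        return dst_ip         # same segment: deliver straight to host B
    return default_gateway    # different segment: hand the packet to the router


def route_lookup(routing_table: dict[str, str], dst_ip: str) -> str:
    """Router side: choose the outgoing interface by longest-prefix match."""
    dst = ipaddress.IPv4Address(dst_ip)
    best_net, best_iface = None, None
    for prefix, iface in routing_table.items():
        net = ipaddress.IPv4Network(prefix)
        # Among all matching routes, the most specific (longest prefix) wins.
        if dst in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_iface = net, iface
    if best_iface is None:
        raise LookupError(f"no route to {dst_ip}")
    return best_iface
```

A `0.0.0.0/0` entry acts as the router's own default route, matching anything that no more specific entry covers.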

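3. ARP. ARP (Address Resolution Protocol) translates an IP address into a MAC address. It works a lot like DNS: the host first looks in its ARP cache, and if there is no entry, it broadcasts an ARP request on the local network and waits for the owner of that IP address to reply with its MAC address.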
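A toy ARP table shows the cache-then-broadcast pattern. The `broadcast` callable is a hypothetical stand-in for actually sending an ARP request on the LAN, and real ARP caches also expire entries after a timeout, which this sketch omits.

```python
from __future__ import annotations

from typing import Callable, Optional


class ArpCache:
    """Toy ARP table: look in the cache first, broadcast a request on a miss."""

    def __init__(self, broadcast: Callable[[str], Optional[str]]) -> None:
        self._table: dict[str, str] = {}
        # Hypothetical stand-in for "Who has <ip>? Tell me." on the LAN.
        self._broadcast = broadcast
        self.broadcasts_sent = 0

    def resolve(self, ip: str) -> str:
        mac = self._table.get(ip)
        if mac is None:
            self.broadcasts_sent += 1
            mac = self._broadcast(ip)
            if mac is None:
                raise LookupError(f"no ARP reply for {ip}")
            self._table[ip] = mac  # remember the reply for later frames
        return mac
```

Resolving the same IP twice sends only one broadcast; the second call is answered from the cache.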

4. MAC address. A MAC address is a computer's physical address; the manufacturer burns it into each NIC (network interface card) before the card ships. It is 48 bits (six bytes) long and usually written in hexadecimal. The first three bytes (the high 24 bits) are assigned to manufacturers by the IEEE Registration Authority (RA) and are known as the Organizationally Unique Identifier (OUI). The last three bytes (the low 24 bits) are assigned by each manufacturer to its individual adapters; this part is called the extended identifier and makes the address unique. How can you change a MAC address? One way is to rewrite the address stored on the NIC itself by reflashing it, which is unreliable and error-prone. Another is to change it in the Windows registry: the operating system uses the MAC address configured there rather than reading it directly from the adapter, so this approach is quite straightforward.
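The OUI and extended-identifier split is easy to see in code. A sketch with a hypothetical helper; it accepts colon- or hyphen-separated notation and returns the high and low 24 bits as hex strings.

```python
from __future__ import annotations


def split_mac(mac: str) -> tuple[str, str]:
    """Split a MAC address into (OUI, extended identifier), 24 bits each."""
    octets = mac.replace("-", ":").lower().split(":")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    bytes.fromhex("".join(octets))  # raises ValueError on non-hex digits
    return ":".join(octets[:3]), ":".join(octets[3:])
```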

5. Why do we need a MAC address when we already have an IP address? This is a common question; it is a bit like being asked for your ID card after you have already shown your driver's license. Part of the answer is historical: in the early days, before the Internet, networks used MAC addresses and there were no IP addresses. As networks grew larger, finding hosts by MAC address alone became too difficult and slow, because MAC addresses are flat and say nothing about where a host is located, so IP addresses were introduced to make routing possible. MAC addresses remain useful for delivery within a local network, so the two coexist. For details, see www.zhihu.com/question/21…

4. Perform the TCP three-way handshake to establish a TCP connection.

I covered the TCP three-way handshake in detail in a previous article; in short, the client sends a SYN, the server replies with a SYN-ACK, and the client answers with an ACK. To recap where we are: in step 3 we located the destination IP address and obtained the MAC address needed to reach it. The browser now asks the server for a connection over which to transfer data. TCP is a reliable, full-duplex, connection-oriented protocol, and when a TCP connection is closed, it is closed on both sides. Being connection-oriented does not mean a dedicated line stays open between the two hosts; rather, each side maintains connection state, which makes them behave as if they were continuously connected.
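In application code the handshake is invisible: the operating system performs SYN, SYN-ACK, ACK inside `connect()`, and `accept()` returns a connection whose handshake has already completed. The sketch below, using only Python's standard `socket` and `threading` modules on the loopback interface, opens such a connection and receives one message; the function name and message are illustrative.

```python
import socket
import threading


def handshake_demo() -> bytes:
    """Open a TCP connection on loopback and receive one message over it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve() -> None:
        # accept() hands back a connection whose handshake the kernel
        # has already completed (SYN, SYN-ACK, ACK).
        conn, _ = server.accept()
        with conn:
            conn.sendall(b"hello over TCP")
        # Leaving the with-block closes the socket, which sends a FIN.

    worker = threading.Thread(target=serve)
    worker.start()
    # create_connection() returns only after the three-way handshake succeeds.
    with socket.create_connection(("127.0.0.1", port)) as client:
        received = b""
        while chunk := client.recv(1024):  # b"" means the server closed its side
            received += chunk
    worker.join()
    server.close()
    return received
```

A packet capture tool such as Wireshark on the loopback interface would show the three handshake segments that this code never touches directly.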

5. The browser sends data and waits for a response from the server.

With the connection from step 4 established, it is time to send data. The browser wraps the request in a request message, which has the following format:

Start line: for example, GET / HTTP/1.0 (request method, request URL, protocol version)
Headers: for example, User-Agent, Host
Body

An empty line (a carriage return plus line feed, CRLF) separates the headers from the body. A GET request normally has no body, whereas a POST request does. Some request headers are especially important; see www.imooc.com/article/206…
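A request message is just text laid out in that order. This hypothetical helper assembles one; it uses HTTP/1.1 rather than the HTTP/1.0 in the example above, so it always sends a Host header (required in HTTP/1.1), and it adds Content-Length when there is a body.

```python
from __future__ import annotations

from typing import Optional


def build_request(method: str, path: str, host: str,
                  headers: Optional[dict[str, str]] = None,
                  body: bytes = b"") -> bytes:
    """Assemble an HTTP/1.1 request: start line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    all_headers = dict(headers or {})
    if body:
        # A body needs its length so the server knows where the message ends.
        all_headers["Content-Length"] = str(len(body))
    lines += [f"{name}: {value}" for name, value in all_headers.items()]
    # Every line ends with CRLF, and an empty line separates headers from body.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body
```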

6. The server processes and responds to requests.

After the browser's request message reaches the server, the server routes it to the matching interface, runs the corresponding code, and sends a response back to the client once processing is complete. HTTP is stateless, and in HTTP/1.0 the connection is normally closed after each response, so a single request and response complete the transaction in one go. HTTP/1.0 did, however, allow a Connection: keep-alive header that keeps the connection open for a certain time (sometimes a long time), and HTTP/1.1 makes persistent connections the default. One consequence is that after the server has finished processing a request, the client may not close the connection, so it keeps occupying server resources. The server then has to close the connection itself, and the side that actively closes a TCP connection enters the TIME_WAIT state, which ties up connection resources for a while. A related but separate problem is the SYN Flood attack, in which an attacker sends large numbers of SYN packets without ever completing the handshake, filling the server's queue of half-open connections.
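The default-persistence rule can be expressed as a small function. A simplified sketch with a hypothetical name: it treats a `close` token as closing in any version, HTTP/1.1 as persistent by default, and HTTP/1.0 as persistent only when keep-alive is requested; real implementations handle more cases.

```python
from typing import Optional


def keeps_connection_open(http_version: str, connection_header: Optional[str]) -> bool:
    """Decide whether the connection stays open after this message (simplified)."""
    # The Connection header may hold several comma-separated tokens.
    tokens = {t.strip().lower() for t in (connection_header or "").split(",") if t.strip()}
    if "close" in tokens:
        return False                  # either side asked to close
    if http_version == "HTTP/1.1":
        return True                   # persistent by default in HTTP/1.1
    return "keep-alive" in tokens     # HTTP/1.0 needs an explicit opt-in
```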

There are three ways to deal with these problems: have the client close the connection, have the server close it, or tune the TCP configuration. In the first case, if the server's response carries a definite Content-Length header, or the client can otherwise tell where the response ends, the client closes the connection itself once it has read the whole response. In the second case, the server closes idle TCP connections after a maximum timeout. The third case mainly helps against SYN floods: adjust three Linux kernel TCP parameters. Lower tcp_synack_retries to reduce how many times an unanswered SYN-ACK is retransmitted; raise tcp_max_syn_backlog to allow more half-open (SYN) connections to be queued; and enable tcp_abort_on_overflow so the kernel rejects new connections with a reset when the accept queue is full.
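The first approach relies on message framing: with Content-Length, the client knows exactly when the body is complete and can then close (or reuse) the connection. A sketch with a hypothetical helper that checks whether a buffer holds a full response; chunked transfer encoding and close-delimited bodies are deliberately not handled.

```python
def response_complete(buffer: bytes) -> bool:
    """True once `buffer` holds a full response framed by Content-Length."""
    head, sep, body = buffer.partition(b"\r\n\r\n")
    if not sep:
        return False  # the blank line ending the headers has not arrived yet
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            return len(body) >= int(value.strip())
    raise ValueError("no Content-Length; cannot tell where this response ends")
```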

7. The browser receives the response from the server and gets the HTML code.

You may wonder what there is to say about this step, but there are quite a few things to be aware of here. Start with the request message the browser sends.

The header to focus on is Accept. It lists the types of data the sender (the client) is willing to accept, and the browser adds it to the request automatically. If the Content-Type the server returns matches one of the types in Accept, the browser can parse the response and display it directly in the page. If the server returns some other Content-Type, the browser handles it in one of three ways:

1. Normal display. For example, if the return type is text/javascript, the browser can process and display it directly.

2. Download. For example, if the return type is application/octet-stream, the browser will download files that cannot be handled directly.

3. Error reporting. For example, if the server returns the plain string hello world with a Content-Type of text/xml, the browser cannot parse it as XML and shows an error message in the page.

Browsers can handle and render a wide variety of formats directly, not just the types listed in Accept.
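The three outcomes can be summarized as a decision function. This is only an illustration: the `DISPLAYABLE` set is a hypothetical, abbreviated list, real browsers use far more elaborate rules (including content sniffing), and the XML case uses Python's `xml.etree.ElementTree` to mimic the parse error described in item 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated list of types this toy "browser" renders inline.
DISPLAYABLE = {"text/html", "text/plain", "text/css", "text/javascript",
               "application/json", "image/png", "image/jpeg"}
XML_TYPES = {"text/xml", "application/xml"}


def browser_action(content_type: str, body: bytes) -> str:
    """Return "display", "download", or "error" for a response."""
    mime = content_type.split(";")[0].strip().lower()  # drop parameters like charset
    if mime in XML_TYPES:
        try:
            ET.fromstring(body)
        except ET.ParseError:
            return "error"    # e.g. plain "hello world" labelled as XML
        return "display"
    if mime in DISPLAYABLE:
        return "display"
    return "download"         # e.g. application/octet-stream
```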

Author: maoruibin. Link: www.imooc.com/article/236… This article was originally published on imooc.com (the MOOC website); please credit the source when reprinting. Thank you for your cooperation.

This article is from: www.imooc.com/article/235… (shared purely for learning)