This is the fifth day of my participation in the August More text Challenge. For details, see: August More Text Challenge

Preface

This is day three of the browser series. We have already covered “Multi-process Architecture in Chrome” and “Data Transfer (IP/UDP/TCP)”. With those foundations in place, today we look at the HTTP protocol.

HTTP is an application-layer protocol that runs on top of a TCP connection. It lets the browser obtain resources from a server and is the foundation of the Web. The browser typically issues requests for many kinds of files, such as HTML, CSS, JavaScript, images, and video, which makes HTTP the most widely used protocol in browsers.

HTTP request initiation process

When we visit a web address, the browser sends a resource request to the target server. Along the way it performs a series of steps that keep the request secure and stable. We will walk through them one at a time.

1. Build the request

Before sending the request, the browser builds the request line, which declares the request method, the requested file, and the protocol version. Once the request line is built, the browser is ready to initiate the network request.

// Request line example
GET /index.html HTTP/1.1
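As a minimal sketch, the request line can be assembled from its three parts. The function name `build_request_line` is made up for illustration and is not taken from any browser's code.

```python
def build_request_line(method: str, path: str, version: str = "HTTP/1.1") -> str:
    """Join method, target path, and protocol version with single spaces."""
    # Method names are case-sensitive and conventionally upper-case.
    return f"{method.upper()} {path} {version}"


print(build_request_line("get", "/index.html"))
```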

2. Check the cache

Before initiating the network request, the browser uses the request line to check whether a cached copy of the requested resource exists in the browser cache. If it does, the browser takes the resource directly from the cache, returns it, and ends the request. This has the following benefits:

  1. Relieves server pressure and improves performance.
  2. Caching is an important part of fast resource loading for web sites.
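The cache check can be sketched as a lookup before the network call. This is a deliberately simplified model: `SimpleCache` and `fetch_from_network` are hypothetical names, and a real browser cache also takes expiry and validation rules into account, which are omitted here.

```python
from typing import Callable, Dict


class SimpleCache:
    """Toy cache keyed by the request line (assumption: no expiry handling)."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}
        self.network_calls = 0  # counts how often the network was actually used

    def fetch(self, request_line: str,
              fetch_from_network: Callable[[str], bytes]) -> bytes:
        # Cache hit: return the stored resource and end the request here.
        if request_line in self._store:
            return self._store[request_line]
        # Cache miss: go to the network, then remember the result.
        self.network_calls += 1
        body = fetch_from_network(request_line)
        self._store[request_line] = body
        return body


cache = SimpleCache()
fake_network = lambda request_line: b"<html>hello</html>"  # stand-in for a real fetch
cache.fetch("GET /index.html HTTP/1.1", fake_network)
cache.fetch("GET /index.html HTTP/1.1", fake_network)
print(cache.network_calls)
```

The second call is served from the cache, so the server is contacted only once.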

3. Prepare IP addresses and ports

As we learned earlier, data transmission on the Internet is implemented with TCP and IP. HTTP is the application-layer protocol that encapsulates the text of the request, and that HTTP content is carried during the data-transfer phase of a TCP connection. From this we know:

  1. The first step in an HTTP network request is to make a TCP connection to the server.
  2. The first step in establishing a TCP connection is to prepare the IP address and port number.
  3. We can get IP and port information through the URL address.
  4. IP addresses are strings of numbers that are hard to remember, so domain names are used instead. The Domain Name System (DNS) maps each domain name to its IP address: you type the domain name and DNS translates it to the corresponding IP.

Summary: The first step of a browser request is to ask DNS for the IP address that corresponds to the domain name. The browser also keeps a DNS cache: once a domain name has been resolved, the result is cached for the next lookup, which reduces network requests. With the IP address in hand, the browser still needs the port number. If the URL does not specify a port, HTTP defaults to port 80.
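The sketch below illustrates both ideas in the summary: deriving the host and port from a URL, and caching DNS results. `host_and_port` and `CachingResolver` are hypothetical helpers. The lookup function is injected so the example runs without network access; in practice it could be something like `socket.gethostbyname`. The HTTPS default of 443 is included for completeness even though the text above only discusses port 80.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def host_and_port(url: str) -> Tuple[str, int]:
    """Extract the host name and port from a URL, applying the scheme default."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


class CachingResolver:
    """Toy DNS cache: each host name is looked up at most once."""

    def __init__(self, lookup: Callable[[str], str]) -> None:
        self._lookup = lookup
        self._cache: Dict[str, str] = {}
        self.lookups = 0  # number of real lookups performed

    def resolve(self, hostname: str) -> str:
        if hostname not in self._cache:
            self.lookups += 1
            self._cache[hostname] = self._lookup(hostname)
        return self._cache[hostname]


# Fake lookup using a documentation-only address, so no network is needed.
resolver = CachingResolver(lambda name: "192.0.2.1")
host, port = host_and_port("http://example.com/index.html")
print(host, port, resolver.resolve(host))
```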

4. Wait for the TCP queue

Because of Chrome’s TCP connection mechanism (at most six TCP connections can be open to the same domain name at a time), when there are more than six requests, the excess requests are queued until an ongoing request completes. If fewer than six requests are in flight, the request proceeds directly to the next step, establishing a TCP connection.
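The queuing behaviour can be modelled roughly as follows. `ConnectionScheduler` is a hypothetical class written for this illustration and does not reflect Chrome's actual implementation; it only tracks counts and a FIFO queue per host.

```python
from collections import defaultdict, deque

MAX_CONNECTIONS_PER_HOST = 6  # the limit described above


class ConnectionScheduler:
    """Toy model of a per-host connection limit with a waiting queue."""

    def __init__(self, limit: int = MAX_CONNECTIONS_PER_HOST) -> None:
        self.limit = limit
        self.active = defaultdict(int)     # host -> requests in flight
        self.waiting = defaultdict(deque)  # host -> queued requests

    def submit(self, host: str, request) -> bool:
        """Return True if the request may start now, False if it was queued."""
        if self.active[host] < self.limit:
            self.active[host] += 1
            return True
        self.waiting[host].append(request)
        return False

    def finish(self, host: str):
        """Mark one in-flight request as done.

        Returns the queued request that takes over the freed slot, or None.
        """
        if self.active[host] == 0:
            raise ValueError("no request in flight for this host")
        if self.waiting[host]:
            # The freed slot is handed straight to the oldest waiting request.
            return self.waiting[host].popleft()
        self.active[host] -= 1
        return None


scheduler = ConnectionScheduler()
started = [scheduler.submit("example.com", n) for n in range(8)]
print(started)
```

With eight requests to one host, the first six start immediately and the last two wait until `finish` frees a slot.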

5. Set up a TCP connection

After queuing, the TCP connection is established through the three-way handshake. The article “Data Transmission (IP/UDP/TCP)” describes how a TCP connection is set up in detail.
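From application code, the three-way handshake is not visible as separate steps: the operating system performs it when the client connects. The sketch below assumes a throwaway listener on the loopback address so that it runs without any external server; `open_loopback_connection` is a made-up name.

```python
import socket


def open_loopback_connection() -> bool:
    """Connect a client socket to a local listener and confirm both ends agree."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        host, port = server.getsockname()

        # The OS performs the three-way handshake during this call.
        client = socket.create_connection((host, port), timeout=2)
        try:
            conn, _ = server.accept()
            try:
                # The server's view of the peer matches the client's local address.
                return conn.getpeername() == client.getsockname()
            finally:
                conn.close()
        finally:
            client.close()
    finally:
        server.close()


print(open_loopback_connection())
```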

6. Send an HTTP request

Once the TCP connection is established, the browser can communicate with the server, and the HTTP data is transferred during this communication. Let’s look at what an HTTP request message contains.

First, the browser sends the request line, which includes the request method, the request URI (Uniform Resource Identifier), and the HTTP protocol version. The request line tells the server which resource the browser needs.

The most common request method is GET. For example, typing the Geek Time domain (time.geekbang.org) directly into the browser address bar tells the server that the browser wants to GET its home page. Another common request method is POST, which is used to send data to the server. For example, logging in to a website sends the user's information to the server with the POST method. When POST is used, the browser also has to prepare the data for the server, and that data is sent in the request body.

After the request line, the browser sends further information in the form of request headers, which tell the server some basic facts about the browser: for example, the operating system it runs on, the browser kernel, the domain name of the current request, and the browser's cookies.
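Putting the pieces together, a request message is the request line, then the headers, then a blank line, then an optional body. The sketch below is a simplified illustration; `build_request` is a hypothetical helper and the header values are invented examples.

```python
from typing import Dict, Optional


def build_request(method: str, path: str, host: str,
                  headers: Optional[Dict[str, str]] = None,
                  body: bytes = b"") -> bytes:
    """Serialize an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method.upper()} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # Tell the server how many bytes of request body follow the headers.
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"  # blank line ends the header section
    return head.encode("ascii") + body


request = build_request(
    "post", "/login", "example.com",
    headers={"User-Agent": "demo-browser", "Cookie": "session=abc"},
    body=b"user=alice",
)
print(request.decode("ascii"))
```

A GET request would use the same function with no body, in which case no `Content-Length` header is added.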

How the server processes the HTTP request

1. Return the response

First, the server returns a response line, which includes the protocol version and a status code. Not every request can be handled successfully, so how does the server report a request it cannot process or one that failed? It tells the browser the outcome of its processing through the status code in the response line.

  1. The most common status code is 200, which indicates success; 404 indicates that the page was not found.

  2. Then, just as the browser sends a request header with the request, the server sends a response header with the response to the browser. The response header contains information about the server itself, such as when the server generated the returned data, the type of data returned (JSON, HTML, streaming media, etc.), and the Cookie that the server will store on the client.

  3. After sending the response header, the server can continue sending the data for the response body, which typically contains the actual content of the HTML.
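The three parts of a response (response line, response headers, response body) can be separated with a few string operations. This is a minimal sketch that assumes the whole response is already in memory and ignores details such as chunked transfer encoding; `parse_response` is a hypothetical helper.

```python
from typing import Dict, Tuple


def parse_response(raw: bytes) -> Tuple[str, int, str, Dict[str, str], bytes]:
    """Split a raw HTTP response into version, status, reason, headers, body."""
    # The first blank line separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Response line, e.g. "HTTP/1.1 404 Not Found"
    version, status, *reason = lines[0].split(" ", 2)

    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive

    return version, int(status), (reason[0] if reason else ""), headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Set-Cookie: session=abc\r\n"
       b"\r\n"
       b"<html>hello</html>")
version, status, reason, headers, body = parse_response(raw)
print(status, headers["content-type"], body)
```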

2. Close the connection

In HTTP/1.0, once the server has returned the requested data to the client, it closes the TCP connection by default (HTTP/1.1 keeps the connection open by default). The connection is kept open explicitly if the browser or server adds the following to its header:

Connection:Keep-Alive 

The TCP connection then remains open after the response is sent, allowing the browser to keep sending requests over the same connection. Keeping the connection open saves the time needed to establish a new one for the next request and speeds up resource loading. For example, when the images embedded in a web page all come from the same site, a persistent connection can be reused to request them without setting up a new TCP connection each time.
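Whether a connection stays open can be decided from the protocol version and the `Connection` header of a message. The sketch below follows the usual HTTP/1.x rules: `close` always ends the connection, HTTP/1.1 is persistent unless told otherwise, and HTTP/1.0 is persistent only when `keep-alive` is requested. `wants_keep_alive` is a hypothetical helper for illustration.

```python
from typing import Dict


def wants_keep_alive(version: str, headers: Dict[str, str]) -> bool:
    """Decide from one message whether the connection should stay open."""
    tokens = set()
    for name, value in headers.items():
        if name.lower() == "connection":
            # The header value is a comma-separated, case-insensitive token list.
            tokens.update(token.strip().lower() for token in value.split(","))

    if "close" in tokens:
        return False
    if version == "HTTP/1.1":
        return True  # persistent by default
    return "keep-alive" in tokens  # HTTP/1.0 needs an explicit opt-in


print(wants_keep_alive("HTTP/1.0", {"Connection": "Keep-Alive"}))
print(wants_keep_alive("HTTP/1.0", {}))
```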