What happens before the browser opens the web page?

The purpose of this article is to help you understand what happens when you type a URL into your browser’s address bar and press Enter.

Let’s take this site, juejin.cn, as an example.

Parse the URL

When a user enters a URL and presses the Enter key, the browser first parses the entered URL.

It breaks the URL down into its “protocol”, “domain name”, and “pathname”.

https://juejin.cn/

In the URL above, https is the protocol, juejin.cn is the domain name, and / is the pathname.

If the input has no protocol and is not a valid domain name, the browser passes the text to its default search engine instead.
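The decomposition above can be sketched with Python’s standard library. This is only an illustration, not how a browser implements it; the function name parse_url and the dictionary keys are chosen here for the example.

```python
from urllib.parse import urlsplit


def parse_url(url):
    """Split a URL into the three parts discussed above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,       # e.g. "https"
        "domain": parts.hostname,       # e.g. "juejin.cn"
        "pathname": parts.path or "/",  # an empty path is treated as "/"
    }


print(parse_url("https://juejin.cn/"))
```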

Check the HSTS list

HSTS (HTTP Strict Transport Security) is a mechanism that makes the browser automatically replace HTTP with HTTPS when a user tries to access a website over HTTP.

It is designed to protect users accessing websites over HTTP from man-in-the-middle attacks.
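As a rough sketch of that upgrade step, assume the browser keeps a set of host names known to require HTTPS. The set hsts_hosts and the function upgrade_if_hsts are hypothetical names for this example; real browsers also track expiry times and subdomain rules, which are left out here.

```python
from urllib.parse import urlsplit, urlunsplit


def upgrade_if_hsts(url, hsts_hosts):
    """Rewrite http:// to https:// when the host is in the (assumed) HSTS set."""
    parts = urlsplit(url)
    if parts.scheme != "http" or parts.hostname not in hsts_hosts:
        return url
    # An explicit port 80 is dropped so the HTTPS default applies;
    # any other explicit port is kept as written.
    netloc = parts.hostname if parts.port in (None, 80) else parts.netloc
    return urlunsplit(("https", netloc, parts.path, parts.query, parts.fragment))


print(upgrade_if_hsts("http://juejin.cn/", {"juejin.cn"}))
```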

Obtain the IP address through DNS

The Domain Name System (DNS) maps domain names to IP addresses.

Every device connected to the Internet (PC, phone, server, router, etc.) has a unique number. This number is called an IP address.

When you make a phone call, you specify the number of the person you want to call. Similarly, when you communicate over the Internet, you designate each other by IP address. The IP address is a number separated by dots, for example, 10.11.12.13.

Humans are not very good at remembering strings of numbers like “10.11.12.13”, so we created a system where you can type in a domain name and it will tell you what the corresponding IP address is. This is the DNS system.
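To see this lookup from a program’s point of view, here is a small sketch that asks the operating system to resolve a name. The helper name resolve_ipv4 is made up for this example; socket.getaddrinfo is the standard-library call doing the work.

```python
import socket


def resolve_ipv4(hostname):
    """Return the IPv4 addresses the operating system reports for a host name."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_ipv4("juejin.cn") returns whatever addresses the resolver reports at that moment, so the result differs by time and location.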

Check your browser’s cache

First, the browser looks through its own cache to see if it already knows the site’s IP address. If you have visited the site recently, its IP address may still be in the cache.

If your browser is Google Chrome, you can check the cached DNS information at the following address.

chrome://net-internals/#dns

If the address is still in the cache, the name resolution process is complete.

Check the hosts file

If the browser cache has no entry, the hosts file is checked next.

The hosts file is an operating system configuration file that describes the mapping between IP addresses and host names on the TCP/IP network.

In the early days of the Internet, translating host names into IP addresses was done with a text file called hosts.txt, the prototype of today’s hosts file.

This means that all hosts on the Internet and their IP addresses were listed in a single file, hosts.txt, which each computer had to consult to communicate with other hosts.

On macOS, you can open the hosts file with the following command.

sudo vi /private/etc/hosts

Its contents look like this.

##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
##
127.0.0.1       localhost

Each entry is written in the format [IP address] [host name], like “127.0.0.1 localhost” above.
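The format is simple enough to parse in a few lines. The following is a minimal sketch, not the operating system’s actual parser; parse_hosts is a name chosen for this example.

```python
def parse_hosts(text):
    """Parse hosts-file text into a {host name: IP address} mapping."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # the first entry for a name wins
    return mapping


SAMPLE = """\
##
# Host Database
##
127.0.0.1       localhost
"""

print(parse_hosts(SAMPLE))  # {'localhost': '127.0.0.1'}
```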

When hosts.txt was first used (in the 1970s), there were only a few hundred hosts, so it was possible to include information about every host on the network.

However, as the Internet grew, so did hosts.txt, and by 1983 the number of hosts had grown to the point where resolving names with a single hosts.txt file was no longer practical, so DNS was created to take over name resolution.

The hosts file survives and is still in use today. Before the DNS server is queried for name resolution, the hosts file is checked. If the host name is found in the file, the name resolution process is complete.

Invoke the stub resolver

If the IP address of the destination juejin.cn is not found in the hosts file, the DNS server is queried.

The browser first invokes the stub resolver. The stub resolver is a feature of the operating system on the client’s computer. The stub resolver then asks the caching DNS server, “Do you know the IP address of juejin.cn?”

If the caching DNS server has a cached answer

As the name implies, the caching DNS server stores the result of each query in its cache for a period of time, so that if the same query arrives later, the stored result can be reused and returned.

So if the caching DNS server has previously received the query “Do you know the IP address of juejin.cn?” from a stub resolver, the answer should still be stored in its cache, and it returns that result to the stub resolver.

The stub resolver extracts the IP address from the result it receives and writes it to a memory region specified by the browser. This completes the name resolution process.

If the caching DNS server has no cached answer

If the caching DNS server has no cached entry for juejin.cn, it queries the root name server → the .cn name server → the juejin.cn name server on behalf of the stub resolver. When the IP address is found, it returns a response to the stub resolver saying “IP address found.” The stub resolver then extracts the IP address from the query result and writes it to a memory region specified by the browser.

Now that the browser has the IP address of juejin.cn, the name resolution process is complete.
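The two cases above (cached answer, or a walk from the root name server down) can be sketched as a toy model. Everything here is hypothetical: the server names, the in-memory tables standing in for real name servers, and the address 203.0.113.10, which comes from a range reserved for documentation and is not the real address of juejin.cn.

```python
import time

# Hypothetical delegation data standing in for real name servers.
ROOT = {"cn": "cn-name-server"}
TLD = {"cn-name-server": {"juejin.cn": "juejin-name-server"}}
AUTHORITATIVE = {"juejin-name-server": {"juejin.cn": "203.0.113.10"}}


class CachingResolver:
    """Toy caching DNS server: answer from cache, else walk root -> TLD -> authoritative."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}            # name -> (ip, expires_at)
        self.upstream_queries = 0  # how many times the full walk was needed

    def resolve(self, name):
        entry = self.cache.get(name)
        if entry is not None and entry[1] > self.clock():
            return entry[0]  # still fresh: reuse the stored answer
        ip = self._walk(name)
        if ip is not None:
            self.cache[name] = (ip, self.clock() + self.ttl)
        return ip

    def _walk(self, name):
        self.upstream_queries += 1
        tld = name.rsplit(".", 1)[-1]
        tld_server = ROOT.get(tld)                 # ask the root name server
        if tld_server is None:
            return None
        auth_server = TLD[tld_server].get(name)    # ask the .cn name server
        if auth_server is None:
            return None
        return AUTHORITATIVE[auth_server].get(name)  # ask the juejin.cn name server
```

A second call to resolve("juejin.cn") within the TTL is answered from the cache, so upstream_queries stays at 1.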

If the address cannot be found

If the IP address still cannot be found after all of these lookups, an error is returned, and the user will see an error screen in the browser (´∵).

The port number

A port number is a number in TCP/IP that specifies which of several pieces of software running on the same computer to communicate with.

If you compare an IP address to a phone number, then a port number is like asking for a specific person once the call connects: “I’d like to speak to Miss ____, thank you.”

You specify who you want to speak to by following the domain name of the URL with a colon (:) followed by the port number of your choice.

For example, http://juejin.cn/ is equivalent to http://juejin.cn:80/; the port is 80 because the scheme is HTTP.

Likewise, https://juejin.cn/ is equivalent to https://juejin.cn:443/; the port is 443 because the scheme is HTTPS.

We can access a website without entering the port number in the URL, because each scheme has a default port: 80 for “http:” and 443 for “https:”.
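The default-port rule can be written down in a few lines. This is a sketch covering only the two schemes mentioned above; effective_port and DEFAULT_PORTS are names invented for the example.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the explicit port if the URL has one, else the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


print(effective_port("https://juejin.cn/"))  # 443
```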

Send an HTTP request

Next, we’ll look at HTTP requests sent to the web server.

An HTTP request is a request sent from a browser to a web server. We’ll take a look at how the browser sends HTTP requests.

We already know the domain name and pathname from parsing the URL, and the browser uses them to create an HTTP request.

I had read that an HTTP request consists of a “request line”, “headers”, and a “message body”, but that did not mean much to me, so I checked what is actually sent by running a command.

I used the following command.

curl -v https://juejin.cn/

The output shows the content of the HTTP request.

GET / HTTP/2
Host: juejin.cn
user-agent: curl/7.64.1
Accept: */*

The output actually continues with the response, but we’ll discuss that later in the “Send an HTTP response” section.

Let’s break down the structure of the request. The most important part is the first line, the request line.

GET / HTTP/2 consists of, from left to right, the “HTTP method”, the “target URI”, and the “HTTP version”.

GET is the most common method and represents a request from a browser to a web server to retrieve a page. The second item from the left, “/”, indicates which page is being requested. It is copied directly from the pathname in the URL.

Also, although the format is “request line”, “headers”, and “message body”, there is no body in this request because the HTTP method is GET. In the case of GET, the method and URL are enough to let the web server know what to do, so there is no need to put anything in the message body.
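To make the three-part structure concrete, here is a sketch that assembles a GET request as text. Note the assumption: this uses the plain-text HTTP/1.1 layout, whereas HTTP/2 (as in the curl output) sends the same information in binary frames. The function name build_get_request is made up for this example.

```python
def build_get_request(host, path="/", user_agent="curl/7.64.1"):
    """Assemble a GET request in HTTP/1.1 text form: request line, headers, empty body."""
    lines = [
        f"GET {path} HTTP/1.1",       # request line: method, target URI, version
        f"Host: {host}",              # headers follow, one per line
        f"User-Agent: {user_agent}",
        "Accept: */*",
    ]
    # A blank line ends the headers; a GET request carries no message body.
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_get_request("juejin.cn"))
```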

Load balancer

A load balancer is a device that distributes the load of a web service across multiple servers.

When a web service runs on only one server, the whole service goes down whenever that server fails under concentrated traffic, so it is common to run multiple servers.

A load balancer groups these web servers together and distributes requests among them in a balanced manner.

The load balancer can perform health checks to track the state of each server, and session persistence to ensure that requests from the same client are always routed to the same server.

Health checks

This is a feature that constantly checks whether the web servers behind the load balancer are working properly.

If a server does not respond correctly, it is considered unhealthy, and requests are sent to another healthy server instead.
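One simple way to combine balanced distribution with health checks is round-robin selection that skips unhealthy servers. The sketch below assumes that strategy; the class name, the server names, and the is_healthy callback are all hypothetical, and real load balancers offer other strategies as well.

```python
class LoadBalancer:
    """Round-robin over servers, skipping any that fail the health check."""

    def __init__(self, servers, is_healthy):
        self.servers = list(servers)
        self.is_healthy = is_healthy  # callback: server name -> bool
        self._next = 0

    def pick(self):
        # Try each server at most once per call, starting after the last pick.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if self.is_healthy(server):
                return server
        raise RuntimeError("no healthy servers available")


down = {"web-2"}  # pretend web-2 failed its health check
lb = LoadBalancer(["web-1", "web-2", "web-3"], lambda s: s not in down)
print([lb.pick() for _ in range(4)])  # ['web-1', 'web-3', 'web-1', 'web-3']
```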

Session persistence

This is a function that routes access from the same user to the same server.

Without this feature, after a user logs in, the load balancer may send the next request to a different server. That server does not know the state of the previous communication, so it ends up asking “Who are you?”

Therefore, access from the same user is assigned to the same server, for example by checking the IP address of the sender.

Another approach uses cookies: requests carrying the same cookie are always sent to the same web server.
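The IP-address approach can be sketched by hashing the sender’s address and using the hash to choose a server. This is one possible scheme, assumed here for illustration; pick_by_client_ip and the server names are hypothetical.

```python
import hashlib


def pick_by_client_ip(client_ip, servers):
    """Map a client IP to a server so the same IP always reaches the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["web-1", "web-2", "web-3"]
first = pick_by_client_ip("10.11.12.13", servers)
second = pick_by_client_ip("10.11.12.13", servers)
print(first == second)  # True
```

A cookie-based variant would hash the cookie value instead of the IP address. Note that with this simple modulo scheme, changing the number of servers reshuffles most clients.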

Send an HTTP response

When a request is sent, a response is returned from the web server.

The HTTP response that was omitted in “Send an HTTP request” reads as follows.

HTTP/2 200
etag: "de7-OJQMWJz+xf8wsmQufuQRjAHeH+c"
content-type: text/html; charset=utf-8
accept-ranges: none
vary: Accept-Encoding
x-cloud-trace-context: 0a770f14325a57bd2fca0a614fd11841; o=1
date: Mon, 08 Mar 2021 15:17:31 GMT
server: Google Frontend
content-length: 3559

<!doctype html>
<html>
<body>
</body>
</html>

Unlike the request, the first line of the response is called the status line, where HTTP/2 200 stands for the “HTTP version” and the “status code”, respectively.

A status code is a code that indicates whether the request was successful or whether an error occurred. In this case, it is 200, which means that the web server has successfully processed the request.

The headers also include content-type: text/html; charset=utf-8. This indicates the format of the data in the body. In this case, it means “the content is an HTML document encoded in UTF-8”. The browser looks at this and decides how to handle the data.

The message body contains HTML, the requested resource, as described in the content type.

When the response is returned, the data is retrieved from the message body and displayed in the browser.
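The reading of the status line and the content type described above can be sketched as follows. The helper names are invented for this example, and the parsing is deliberately simplified: it works on the textual form shown by curl, not on HTTP/2’s binary framing.

```python
from email.message import Message


def parse_response_head(head):
    """Split the status line and headers of a textual HTTP response."""
    status_line, *header_lines = head.strip().splitlines()
    version, status_code = status_line.split()[:2]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return version, int(status_code), headers


def content_type_and_charset(headers):
    """Extract the media type and charset from the content-type header."""
    msg = Message()
    msg["content-type"] = headers.get("content-type", "application/octet-stream")
    return msg.get_content_type(), msg.get_content_charset()


HEAD = """\
HTTP/2 200
content-type: text/html; charset=utf-8
date: Mon, 08 Mar 2021 15:17:31 GMT
content-length: 3559
"""

version, status, headers = parse_response_head(HEAD)
print(version, status, content_type_and_charset(headers))
```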

HTML rendering

Browser rendering has four main stages: loading, scripting, rendering, and painting. The first stage is loading. The browser loads the HTML, CSS, JavaScript, images, and other resources needed for drawing. The first resource retrieved is an HTML file.

The browser will load the HTML file sequentially from the top, and if it finds any external resources along the way, such as CSS, JavaScript, or images, it will ask the Web server to retrieve them.

The loaded resources are then converted into the rendering engine’s internal representations.

HTML is converted to a DOM tree and CSS is converted to a CSSOM tree. These structures are then used in the subsequent rendering and painting stages.
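To give a feel for what “converted to a DOM tree” means, here is a much-simplified sketch built on Python’s html.parser. It is not how a rendering engine works internally: the Node and DomBuilder classes are invented for this example, attributes are ignored, and only a handful of void elements are recognized.

```python
from html.parser import HTMLParser

# Elements that never have children or a closing tag (a small subset).
VOID_TAGS = {"br", "img", "input", "link", "meta", "hr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class DomBuilder(HTMLParser):
    """Build a simple tree of Node objects from HTML text."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, then step out of it.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text += data.strip()


def outline(node, depth=0):
    """Return the tree as indented lines, one per element."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


builder = DomBuilder()
builder.feed("<!doctype html><html><body><p>Hello</p></body></html>")
print("\n".join(outline(builder.root)))
```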

Once that’s done, the scripting stage begins.

In the scripting stage, the JavaScript code goes through lexical analysis, parsing, and compilation, and only then is it executed.

In ShareFull, we use Vue as our JavaScript framework, so Vue is invoked when the JavaScript code is executed.

If the Vue code makes an API call, it sends a request to the API server to retrieve the data.

Retrieving JSON data

For example, a ShareFull web page displays the logged-in user’s name and department signature at the top right of the screen. This data is obtained by using the API server.

An API server is a server that provides data using an API mechanism. The browser requests data from the API server by sending a request to the endpoint using HTTP methods and parameters.

The resource you want is specified in the message body. Specify Content-Type: application/json in the header and send the request in JSON format.

Based on the received request, the API server issues an SQL statement to query the database server and retrieve the data. It is then formatted as JSON and returned to the browser. Basically, the structure is as follows.

{ "data": { ... }, "errors": [ ... ] }

“data” holds the result of the query, and any errors are stored in “errors”. The browser takes these results and renders them.
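The request and response handling described above can be sketched without any network access. The helper names and the payload fields (such as "resource") are hypothetical; the source only states that the request is JSON and that the response has “data” and “errors”.

```python
import json


def build_json_request(payload):
    """Return the header and encoded body for a JSON request."""
    headers = {"Content-Type": "application/json"}
    body = json.dumps(payload).encode("utf-8")
    return headers, body


def read_api_response(raw):
    """Return the "data" part of a response, or raise if "errors" is non-empty."""
    parsed = json.loads(raw)
    errors = parsed.get("errors") or []
    if errors:
        raise ValueError(f"API returned errors: {errors}")
    return parsed.get("data")


headers, body = build_json_request({"resource": "currentUser"})  # hypothetical payload
print(headers, body)
print(read_api_response('{"data": {"name": "Alice"}, "errors": []}'))
```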

The end