When interviewers ask this question, most people are secretly delighted: they memorized the standard answer long ago.

But wait, can you answer the following questions:

  1. Why does the browser parse URLs? What character encoding is used for URL parameters? What is the difference between encodeURI and encodeURIComponent?
  2. What are the browser's disk cache and memory cache?
  3. What is the difference between prefetch and preload?
  4. What is the difference between async and defer for JS scripts?
  5. Why does TCP use a three-way handshake to connect and a four-way wave to disconnect?
  6. How does the HTTPS handshake work?

The same question can be asked of a P5 or a P7 candidate, just at different depths, so I have reorganized the whole process. This article is long, so you may want to bookmark it first.

Overview

Before diving in, let's take a quick look at the browser's architecture as a primer. The browser is multi-process, and "from URL input to page rendering" mainly involves three of its processes: the browser process, the network process, and the renderer process.

  1. The browser process handles and responds to user interactions, such as clicking and scrolling;
  2. The network process handles data requests and provides download functionality;
  3. The renderer process turns the received HTML, CSS, and JS into a visible, interactive page;

The whole "URL input to page rendering" process can be divided into two parts, the network request and browser rendering, handled by the network process and the renderer process respectively.

Network request

The network request section does the following:

  1. URL parsing
  2. Checking the resource cache
  3. DNS resolution
  4. Establishing a TCP connection
  5. Negotiating keys with TLS
  6. Sending a request & receiving a response
  7. Closing the TCP connection

I’m going to go through them all.

URL parsing

The browser first determines whether the input is a URL or a search keyword.

If it is a URL, the browser completes it into a full URL. A complete URL consists of: protocol + host + port + path [+ query parameters][+ fragment]. For example, if we type www.baidu.com in the address bar, the browser eventually completes it to https://www.baidu.com/, using port 443 by default.

If it is a search keyword, it is appended as a parameter to the default search engine's URL. Before that, unsafe characters in the input must be escaped (safe characters are digits, English letters, and a few symbols). URL parameters cannot contain Chinese characters, nor certain special characters such as = and ?. Otherwise, if I searched for 1+1=2 without escaping it, the URL would be /search?q=1+1=2&source=chrome, and the = signs in the search term would be confused with the = separator that the URL itself uses.
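The "URL or search keyword?" decision can be sketched in a few lines. This is a minimal sketch under stated assumptions: the looks-like-a-URL heuristic and the search engine address `SEARCH_BASE` are hypothetical, and real browsers apply far more elaborate rules. It uses the standard `URL` class to normalize the completed address.

```javascript
// Hypothetical default search engine; real browsers let the user configure this.
const SEARCH_BASE = "https://www.google.com/search?q=";

function resolveAddressBarInput(input) {
  const text = input.trim();
  // Naive heuristic (an assumption): no whitespace and at least one dot => URL.
  const looksLikeUrl = !/\s/.test(text) && text.includes(".");
  if (looksLikeUrl) {
    // Complete the scheme if it is missing; https implies the default port 443.
    const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(text);
    const url = new URL(hasScheme ? text : "https://" + text);
    return url.href; // normalized: an empty path becomes "/"
  }
  // Otherwise treat the input as a search keyword and escape it for the query string.
  return SEARCH_BASE + encodeURIComponent(text);
}

console.log(resolveAddressBarInput("www.baidu.com")); // https://www.baidu.com/
console.log(resolveAddressBarInput("1+1=2"));         // https://www.google.com/search?q=1%2B1%3D2
```

Note that `new URL("https://www.baidu.com").port` is an empty string: 443 is the default for https, so it is implied rather than written out.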

The encoding used to escape unsafe characters in URLs is called percent-encoding, because each byte is written as a percent sign followed by two hexadecimal digits. Those bytes come from UTF-8, which encodes each Chinese character as three bytes. For example, if I type “中文” (“Chinese”) into the Google address bar, the URL becomes /search?q=%E4%B8%AD%E6%96%87: two characters, six bytes in total.
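The byte-by-byte nature of percent-encoding is easy to see by encoding a string manually. This sketch (the helper `percentEncodeAllBytes` is hypothetical) escapes every UTF-8 byte, including ASCII letters and digits, which `encodeURIComponent` would leave alone, so it only illustrates the format:

```javascript
// Percent-encode every UTF-8 byte of a string (illustration only: unlike
// encodeURIComponent, this also escapes ASCII letters and digits).
function percentEncodeAllBytes(str) {
  const bytes = new TextEncoder().encode(str); // UTF-8 bytes (TextEncoder is global in Node 11+)
  return Array.from(bytes, (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")).join("");
}

const utf8 = new TextEncoder().encode("中文");
console.log(utf8.length);                   // 6: three bytes per Chinese character
console.log(percentEncodeAllBytes("中文")); // %E4%B8%AD%E6%96%87
console.log(encodeURIComponent("中文"));    // %E4%B8%AD%E6%96%87 (same bytes)
```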

encodeURI and encodeURIComponent, which we often use in code, do exactly this. Their rules are basically the same, except for the characters that give a URL its structure, such as = ? & ; /. encodeURI leaves these unencoded, while encodeURIComponent encodes all of them. That is because encodeURI is meant for a whole URL, whereas encodeURIComponent is meant for a single part such as a parameter value, so it has to be stricter.
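The difference shows up clearly on a URL whose query contains the 1+1=2 example from above. The host `example.com` and the `lang` parameter are placeholders for illustration:

```javascript
const url = "https://example.com/search?q=1+1=2&lang=中文";

// encodeURI keeps the characters that structure a URL (: / ? = & +)
// and only escapes characters that are never allowed, such as Chinese text.
console.log(encodeURI(url));
// https://example.com/search?q=1+1=2&lang=%E4%B8%AD%E6%96%87

// encodeURIComponent escapes those structural characters as well,
// so it is only suitable for a single component such as a parameter value.
console.log(encodeURIComponent("1+1=2"));
// 1%2B1%3D2

// Safe pattern: encode each parameter value, then join with literal separators.
const safe =
  "https://example.com/search?q=" + encodeURIComponent("1+1=2") +
  "&lang=" + encodeURIComponent("中文");
console.log(safe);
// https://example.com/search?q=1%2B1%3D2&lang=%E4%B8%AD%E6%96%87
```

Note that `encodeURI(url)` leaves the ambiguous `q=1+1=2` untouched; only encoding the value with `encodeURIComponent` removes the ambiguity.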

Check the cache

The cache must be checked before the actual request is sent; otherwise the caching mechanism would have no effect. If a cached copy of the resource exists, its validity period is checked:

  1. If the cached resource is still within its validity period, it is used directly; this is called the strong cache. A strong-cache hit is served from either the memory cache or the disk cache: the memory cache retrieves resources from memory, and the disk cache retrieves them from disk. Reading from memory is much faster than reading from disk, but whether a resource can be kept in memory depends on the current state of the system. Typically, refreshing a page uses the memory cache, while reopening a page after closing it uses the disk cache.

  2. If the validity period has expired, the browser sends a request to the server carrying the cached resource's identifier, to check whether the cached copy can still be used. If the server says the local copy is still valid, it returns 304 with no body. If the server says the resource has changed, it returns 200 with the updated resource, and the browser caches the new resource and its identifier locally for next time.
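The two branches can be sketched as a small decision model. This is a simplified sketch under explicit assumptions: freshness comes only from a `Cache-Control: max-age` value, the "resource identifier" is an ETag sent back as `If-None-Match`, and the entry shape and function names are hypothetical.

```javascript
// entry: { body, etag, maxAge (seconds), storedAtMs } -- a hypothetical cache record.
function checkCache(entry, nowMs) {
  if (!entry) return { action: "fetch" }; // nothing cached: plain network request
  const ageSeconds = (nowMs - entry.storedAtMs) / 1000;
  if (ageSeconds < entry.maxAge) {
    return { action: "use-cache", body: entry.body }; // strong cache hit, no request sent
  }
  // Expired: ask the server whether the stored copy is still valid.
  return { action: "revalidate", headers: { "If-None-Match": entry.etag } };
}

function applyRevalidation(entry, response, nowMs) {
  if (response.status === 304) {
    // Still valid: no body was sent, so keep the cached body and restart its lifetime.
    return { ...entry, storedAtMs: nowMs };
  }
  // 200: store the new body and its new identifier for next time.
  return { body: response.body, etag: response.etag, maxAge: response.maxAge, storedAtMs: nowMs };
}

const entry = { body: "v1", etag: '"abc"', maxAge: 60, storedAtMs: 0 };
console.log(checkCache(entry, 30000).action); // use-cache
console.log(checkCache(entry, 90000).action); // revalidate
```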

DNS resolution

If the local cache cannot be used, a network request has to be made. The first step is DNS resolution.

The lookup proceeds in this order:

  1. The browser's DNS cache;
  2. The operating system's DNS cache;
  3. The router's DNS cache;
  4. The DNS server of the internet service provider (ISP);
  5. The root DNS servers (13 named root servers worldwide).
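This lookup order amounts to checking a chain of caches from nearest to farthest and falling back to a recursive query only when every cache misses. A minimal sketch: each layer is modeled as a `Map`, the host names and addresses (from the documentation ranges 203.0.113.0/24 and 198.51.100.0/24) are made up, and `queryResolver` is a hypothetical stand-in for the resolver that works its way down from the root servers.

```javascript
// Each cache layer maps hostname -> IP address (hypothetical data).
function lookup(hostname, layers, queryResolver) {
  for (const layer of layers) {
    if (layer.cache.has(hostname)) {
      return { ip: layer.cache.get(hostname), source: layer.name };
    }
  }
  // Every local cache missed: ask the resolver, then remember the answer in each layer.
  const ip = queryResolver(hostname);
  for (const layer of layers) layer.cache.set(hostname, ip);
  return { ip, source: "resolver" };
}

const layers = [
  { name: "browser", cache: new Map() },
  { name: "os", cache: new Map([["www.example.com", "203.0.113.10"]]) },
  { name: "router", cache: new Map() },
];
// Hypothetical resolver that always answers with a documentation-range address.
const queryResolver = () => "198.51.100.20";

console.log(lookup("www.example.com", layers, queryResolver).source);  // os
console.log(lookup("news.example.org", layers, queryResolver).source); // resolver
console.log(lookup("news.example.org", layers, queryResolver).source); // browser
```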

To save time, you can have domain names resolved in advance with a hint in the HTML head:

<link rel="dns-prefetch" href="http://www.baidu.com" />

To keep responses fast, DNS queries are usually sent over UDP.
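To make "DNS over UDP" concrete, here is a sketch that builds the bytes of a standard query for an A record, following the message layout in RFC 1035: a 12-byte header followed by the question section. The transaction ID `0x1234` and the host name are arbitrary example values, and `buildDnsQuery` is a hypothetical helper.

```javascript
function buildDnsQuery(hostname, id = 0x1234) {
  const labels = hostname.split(".").filter(Boolean);
  // QNAME: each label as <length byte><bytes>, terminated by a zero byte.
  const qnameLen = labels.reduce((n, l) => n + 1 + Buffer.byteLength(l, "ascii"), 0) + 1;
  const buf = Buffer.alloc(12 + qnameLen + 4); // zero-filled
  buf.writeUInt16BE(id, 0);     // transaction ID
  buf.writeUInt16BE(0x0100, 2); // flags: standard query, recursion desired
  buf.writeUInt16BE(1, 4);      // QDCOUNT: one question
  // ANCOUNT, NSCOUNT and ARCOUNT stay 0.
  let off = 12;
  for (const label of labels) {
    const len = Buffer.byteLength(label, "ascii");
    buf.writeUInt8(len, off);
    buf.write(label, off + 1, len, "ascii");
    off += 1 + len;
  }
  buf.writeUInt8(0, off);           // root label ends QNAME
  buf.writeUInt16BE(1, off + 1);    // QTYPE: A (IPv4 address)
  buf.writeUInt16BE(1, off + 3);    // QCLASS: IN (internet)
  return buf;
}

const query = buildDnsQuery("www.example.com");
console.log(query.length); // 33
```

To actually send it, the whole buffer would go out as a single UDP datagram to port 53 of a resolver, for example through the `send` method of a `dgram` socket; that part needs network access and is left out here.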

Establishing a TCP Connection

The request we send runs over TCP, so a connection has to be established first. Connection-oriented communication is like a phone call: both parties have to be on the line. Connectionless communication is like a text message: the sender sends regardless of whether the receiver is listening.

Confirming that the receiver is online is done with the TCP three-way handshake:

  1. The client sends a request to establish a connection (SYN).
  2. The server replies with a confirmation (SYN-ACK) and allocates resources for the TCP connection.
  3. The client sends an acknowledgement (ACK), and then allocates its own resources for the TCP connection.

Why does it take three handshakes to complete a connection?

Imagine establishing the connection with only two handshakes, dropping the third step from the handshake's state diagram: in the normal case, everything still seems to work.

However, if the server receives a stale connection request (for example, a delayed request from an earlier attempt that had already timed out), its resources are wasted: the client has no intention of sending data, yet the server keeps memory and other resources reserved for it.

The third handshake therefore confirms that the client is still alive and still wants the connection, so the server does not waste resources on stale, timed-out requests.
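The role of the third step can be sketched as a small model of the segments involved. This is a simplification (no retransmission timers or options, and the initial sequence numbers are arbitrary): the client only completes the handshake if the SYN-ACK acknowledges a connection attempt it actually has pending; otherwise it answers with RST, so the server never treats a stale request as an established connection. All function names are hypothetical.

```javascript
// Step 1: the client opens a connection with its initial sequence number (ISN).
function clientSyn(clientIsn) {
  return { flags: ["SYN"], seq: clientIsn };
}

// Step 2: the server acknowledges the SYN and sends its own ISN.
function serverSynAck(syn, serverIsn) {
  return { flags: ["SYN", "ACK"], seq: serverIsn, ack: syn.seq + 1 };
}

// Step 3: the client checks whether this SYN-ACK matches an attempt it is waiting for.
// pendingIsn is null when the client has no pending connection attempt.
function clientThirdStep(pendingIsn, synAck) {
  if (pendingIsn !== null && synAck.ack === pendingIsn + 1) {
    return { flags: ["ACK"], seq: pendingIsn + 1, ack: synAck.seq + 1 };
  }
  // Stale or unknown: reset, taking the sequence number from the segment's ACK field.
  return { flags: ["RST"], seq: synAck.ack };
}

// The server only considers the connection established after a valid third step.
function serverState(segment, synAck) {
  const valid = segment.flags.includes("ACK") && segment.ack === synAck.seq + 1;
  return valid ? "ESTABLISHED" : "CLOSED";
}

// Normal case.
const synAck = serverSynAck(clientSyn(1000), 5000);
console.log(serverState(clientThirdStep(1000, synAck), synAck)); // ESTABLISHED

// A delayed SYN from an old attempt arrives; the client has nothing pending.
const staleSynAck = serverSynAck(clientSyn(42), 7000);
const reply = clientThirdStep(null, staleSynAck);
console.log(reply.flags[0], serverState(reply, staleSynAck)); // RST CLOSED
```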

Negotiate encryption key — TLS handshake

To secure the communication we use HTTPS, where the S is provided by TLS. TLS combines asymmetric and symmetric encryption.

Symmetric encryption means both sides share the same secret key, which is used both to encrypt and to decrypt. It is fast, but the problem is getting both parties to know the key: data travels over the network, and if the key itself is sent over the network and intercepted, the encryption becomes pointless.

With asymmetric encryption, each party has a public key and a private key. The public key can be known to everyone, while the private key is known only to its owner; data encrypted with the public key can only be decrypted with the private key. This neatly solves the key-distribution problem of symmetric encryption, but its drawback is that it is very slow.

We use asymmetric encryption to negotiate a symmetric key known only to the sender and the receiver. The process is as follows:

  1. The client sends a random value along with the protocol versions and encryption methods it supports;
  2. The server receives the client's random value, chooses the protocol and encryption method from those the client requested, and sends back its own random value together with its digital certificate;
  3. The client receives the server's certificate and verifies it. If the certificate is valid, the client generates a third random value, encrypts it with the public key from the server's certificate, and sends it to the server.
  4. The server decrypts it with its private key to obtain the third random value. Both ends now have all three random values, derive the same key from them using the agreed method, and use this symmetric key to encrypt and decrypt all subsequent communication.

As these steps show, asymmetric encryption is used only during the TLS handshake. Because asymmetric encryption costs much more performance than symmetric encryption, the actual data transfer uses symmetric encryption.
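The key negotiation can be sketched with Node's built-in `crypto` module. The sketch follows the RSA key exchange of TLS 1.2: the third random value is the pre-master secret, and the TLS 1.2 PRF (HMAC-SHA256, RFC 5246 section 5) combines the three random values into a 48-byte master secret. It is a simplification under stated assumptions: certificate verification is skipped, `publicEncrypt` uses OAEP padding by default rather than the PKCS#1 v1.5 padding of TLS 1.2, only a single 32-byte key is derived from the key expansion (real TLS derives separate keys and IVs per direction), and AES-256-GCM is just one possible cipher.

```javascript
const crypto = require("crypto");

// TLS 1.2 PRF: P_SHA256(secret, label + seed), truncated to `length` bytes.
function prf(secret, label, seed, length) {
  const labelSeed = Buffer.concat([Buffer.from(label, "ascii"), seed]);
  const hmac = (data) => crypto.createHmac("sha256", secret).update(data).digest();
  const blocks = [];
  let a = labelSeed; // A(0)
  let produced = 0;
  while (produced < length) {
    a = hmac(a); // A(i) = HMAC(secret, A(i-1))
    const block = hmac(Buffer.concat([a, labelSeed]));
    blocks.push(block);
    produced += block.length;
  }
  return Buffer.concat(blocks).subarray(0, length);
}

// Steps 1-2: each side contributes a public random value; the server owns an RSA key pair.
const clientRandom = crypto.randomBytes(32);
const serverRandom = crypto.randomBytes(32);
const { publicKey, privateKey } = crypto.generateKeyPairSync("rsa", { modulusLength: 2048 });

// Step 3: the client creates the third random value and encrypts it with the server's public key.
const clientPreMaster = crypto.randomBytes(48);
const encryptedPreMaster = crypto.publicEncrypt(publicKey, clientPreMaster);

// Step 4: only the server's private key can recover it.
const serverPreMaster = crypto.privateDecrypt(privateKey, encryptedPreMaster);

// Both sides turn the three random values into the same symmetric key.
function deriveKey(preMaster) {
  const master = prf(preMaster, "master secret", Buffer.concat([clientRandom, serverRandom]), 48);
  return prf(master, "key expansion", Buffer.concat([serverRandom, clientRandom]), 32);
}
const clientKey = deriveKey(clientPreMaster);
const serverKey = deriveKey(serverPreMaster);

// From here on, data is protected with fast symmetric encryption.
const iv = crypto.randomBytes(12);
const cipher = crypto.createCipheriv("aes-256-gcm", clientKey, iv);
const ciphertext = Buffer.concat([cipher.update("GET / HTTP/1.1", "utf8"), cipher.final()]);
const tag = cipher.getAuthTag();

const decipher = crypto.createDecipheriv("aes-256-gcm", serverKey, iv);
decipher.setAuthTag(tag);
const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
console.log(plaintext); // GET / HTTP/1.1
```

Only the 48 encrypted bytes of step 3 ever need the slow RSA operation; everything after that uses AES.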

Send a request & receive a response

The default HTTP port is 80, and the default HTTPS port is 443.

The basic composition of a request is the request line + request header + request body

POST /hello HTTP/1.1
User-Agent: curl/7.16.3 libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
Host: www.example.com
Accept-Language: en, mi
Content-Type: application/x-www-form-urlencoded
Content-Length: 13

name=niannian

The basic composition of a response is the response line + response header + response body

HTTP/1.1 200 OK
Content-Type: application/json
Server: apache

{"password": "123"}
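Because HTTP/1.1 messages are plain text with CRLF line endings and a blank line between the headers and the body, splitting a response into its three parts takes only a few lines. A minimal sketch; `parseHttpResponse` is a hypothetical helper that assumes a complete message without chunked transfer encoding:

```javascript
function parseHttpResponse(raw) {
  // A blank line (CRLF CRLF) separates the head from the body.
  const splitAt = raw.indexOf("\r\n\r\n");
  const head = splitAt === -1 ? raw : raw.slice(0, splitAt);
  const body = splitAt === -1 ? "" : raw.slice(splitAt + 4);
  const [statusLine, ...headerLines] = head.split("\r\n");

  // Response line: HTTP version, status code, reason phrase.
  const [version, code, ...reason] = statusLine.split(" ");

  // Header names are case-insensitive, so normalize them to lower case.
  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(":");
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { version, status: Number(code), reason: reason.join(" "), headers, body };
}

const raw =
  "HTTP/1.1 200 OK\r\n" +
  "Content-Type: application/json\r\n" +
  "Server: apache\r\n" +
  "\r\n" +
  '{"password": "123"}';

const res = parseHttpResponse(raw);
console.log(res.status, res.headers["content-type"]); // 200 application/json
console.log(JSON.parse(res.body).password);           // 123
```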

Closing the TCP connection

When data transfer is complete, the TCP connection is closed. Either the client or the server can initiate the close. Taking the client as the initiator, the whole process takes four steps:

  1. The client requests to release the connection (FIN). This only means the client will send no more data.
  2. The server acknowledges the release (ACK), but it may still have data to process and send.
  3. When the server no longer needs to send data, it also requests to release the connection (FIN);
  4. The client acknowledges the release (ACK);

Why four waves

TCP is full-duplex, so each direction needs its own release request and acknowledgement. The second step (the server's ACK) cannot be merged with the third (the server's FIN), because after the second step the server may still be sending data.
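Listing the segments with their sequence numbers shows why steps 2 and 3 stay separate: between the server's ACK and its FIN, the server may still send data. A simplified sketch; the starting sequence numbers are arbitrary and `closeSequence` is a hypothetical helper.

```javascript
// clientSeq / serverSeq: next sequence number of each side when the client starts closing.
// pendingBytes: data the server still has to send after receiving the client's FIN.
function closeSequence(clientSeq, serverSeq, pendingBytes) {
  const segments = [];
  // 1. Client FIN: no more data from the client. A FIN consumes one sequence number.
  segments.push({ from: "client", flags: ["FIN", "ACK"], seq: clientSeq, ack: serverSeq });
  // 2. Server ACK: acknowledges the FIN, but the server is not done yet.
  segments.push({ from: "server", flags: ["ACK"], seq: serverSeq, ack: clientSeq + 1 });
  let seq = serverSeq;
  if (pendingBytes > 0) {
    // Remaining server data flows between step 2 and step 3.
    segments.push({ from: "server", flags: ["ACK"], seq, ack: clientSeq + 1, data: pendingBytes });
    seq += pendingBytes;
  }
  // 3. Server FIN: only once the server has finished sending.
  segments.push({ from: "server", flags: ["FIN", "ACK"], seq, ack: clientSeq + 1 });
  // 4. Client ACK: acknowledges the server's FIN; the client then waits 2MSL.
  segments.push({ from: "client", flags: ["ACK"], seq: clientSeq + 1, ack: seq + 1 });
  return segments;
}

for (const s of closeSequence(100, 300, 50)) {
  console.log(`${s.from}: ${s.flags.join("+")} seq=${s.seq} ack=${s.ack}`);
}
```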

Why does the active closer wait for 2MSL

After sending the fourth segment (the final ACK), the client waits for 2MSL before closing the connection. MSL (Maximum Segment Lifetime) is the maximum time a segment can survive in the network. The goal is to make sure the server receives this acknowledgement.

Suppose the fourth segment is lost. After the client sends it, the server waits; once one MSL has passed, any acknowledgement still in the network would have expired, so the server concludes that it was lost and retransmits its third-step FIN to the client. That retransmitted packet can take up to another MSL to reach the client.

Therefore, waiting 2MSL after sending the fourth segment is a safety net: if no further segment arrives within 2MSL, the client concludes that the server received the fourth wave, and the connection is officially closed.

Browser rendering

That covers the network request. Now that the browser has the data, the renderer process takes over. Browser rendering mainly consists of the following steps:

  1. Building the DOM tree;
  2. Style calculation;
  3. Layout;
  4. Layering;
  5. Painting;
  6. Display;

Build a DOM tree

Browsers cannot work with raw HTML text directly, so the first step is to turn the HTML tags into a tree structure that JS can use.

In the console, try printing document: that is the parsed DOM tree.
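To show what "turning tags into a tree" means, here is a toy tree builder. It is only a sketch: `buildTree` is hypothetical and handles well-formed start tags without attributes, end tags, and text, a tiny subset of the HTML parsing algorithm (which also covers attributes, implied tags, error recovery, and more).

```javascript
function buildTree(html) {
  const root = { type: "document", children: [] };
  const stack = [root]; // open elements; the last one is the current parent
  // Tokens: an end tag, a start tag, or a run of text.
  const tokenRe = /<\/([a-z0-9]+)>|<([a-z0-9]+)>|([^<]+)/gi;
  let match;
  while ((match = tokenRe.exec(html)) !== null) {
    const parent = stack[stack.length - 1];
    if (match[1]) {
      if (stack.length > 1) stack.pop(); // end tag closes the current element (assumes well-formed input)
    } else if (match[2]) {
      const el = { type: "element", tag: match[2].toLowerCase(), children: [] };
      parent.children.push(el);
      stack.push(el); // later nodes become its children until its end tag
    } else if (match[3].trim()) {
      parent.children.push({ type: "text", text: match[3].trim() });
    }
  }
  return root;
}

const tree = buildTree("<div><span>Year after year</span></div>");
console.log(tree.children[0].children[0].children[0].text); // Year after year
```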

Style calculation

Browsers cannot use CSS text directly either, so CSS is first parsed into stylesheets. All three sources of styles are parsed:

  • External CSS files referenced by <link>
  • Styles inside <style> tags
  • CSS embedded in an element's style attribute

Print document.styleSheets in the console to see the parsed stylesheets.

Using these stylesheets, the browser computes the style of each node in the DOM tree. It is called computation because an element's final style also depends on properties it inherits from its parent element.

<style>
    span {
        color: red
    }
    div {
        font-size: 30px
    }
</style>
<div>
    <span>Year after year</span>
</div>

For example, the text "Year after year" above not only gets the color set on span, but also inherits the font size set on div.

Once the nodes of the DOM tree have their styles computed, the result is called the render tree.
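The inheritance in this example can be sketched as a tree walk: each node starts from its parent's values for inherited properties and then applies the rules that match its own tag. A simplified sketch: only tag selectors are supported, `computeStyles` is hypothetical, and `INHERITED` lists just the two properties used here (both `color` and `font-size` are inherited in CSS).

```javascript
// Properties passed from parent to child (only the two used in the example).
const INHERITED = new Set(["color", "font-size"]);

// rules: tag name -> declarations, e.g. { span: { color: "red" } }
function computeStyles(node, rules, parentStyle = {}) {
  const style = {};
  // Start with the parent's computed values for inherited properties.
  for (const [prop, value] of Object.entries(parentStyle)) {
    if (INHERITED.has(prop)) style[prop] = value;
  }
  // Then apply declarations from the rule matching this element's tag.
  Object.assign(style, rules[node.tag] || {});
  return {
    tag: node.tag,
    style,
    children: (node.children || []).map((child) => computeStyles(child, rules, style)),
  };
}

const rules = { span: { color: "red" }, div: { "font-size": "30px" } };
const dom = { tag: "div", children: [{ tag: "span", children: [] }] };
const styled = computeStyles(dom, rules);
const spanStyle = styled.children[0].style;
console.log(spanStyle.color, spanStyle["font-size"]); // red 30px
```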

Why put CSS in the head and JS at the end of the body

While the HTML is being parsed, the resources it references behave as follows:

  • CSS is downloaded asynchronously; neither downloading nor parsing it blocks construction of the DOM tree: <link href='./style.css' rel='stylesheet'/>
  • JS is downloaded synchronously; both downloading and executing it block construction of the DOM tree: <script src='./index.js'></script>

Because of this feature, it is often recommended to place CSS stylesheets in the head and JS files at the end of the body to allow rendering to begin as early as possible.

Does CSS block HTML parsing

As mentioned above, page rendering is the renderer process's job, and within it the work is split between the GUI rendering thread and the JS thread.

Parsing HTML into the DOM tree, parsing CSS into stylesheets, and then generating the layout tree and the layer tree are all done by the GUI rendering thread. This thread can parse HTML and CSS side by side without conflict, which is another reason CSS is recommended in the head.

However, while the JS thread is executing, the GUI rendering thread cannot parse HTML, because JS can manipulate the DOM and running both at once could cause conflicts. Moreover, if the JS needs to modify styles, it cannot run while CSS is still being parsed: the browser waits for CSS parsing to finish, then executes the JS, and only then continues parsing the HTML.

From this perspective, CSS can block HTML parsing.

What is the preload scanner

The external resources mentioned above, whether synchronously loaded JS or asynchronously loaded CSS, images, and so on, would only start downloading once the parser reaches their tags, which is not very efficient. In fact, since 2008 browsers have gradually adopted the preload scanner: when the browser receives an HTML document, it scans ahead through the whole document and starts downloading CSS, JS, images, and web fonts early.

What is the difference between async and defer for JS scripts

The preload scanner solves the problem of synchronous JS downloads blocking HTML parsing, but not the problem of JS execution blocking it. That is what the async and defer attributes are for.

  • Without defer or async, the browser downloads and executes the script immediately
  • The async attribute means the script is downloaded asynchronously and executed as soon as it has loaded
  • The defer attribute means the script is downloaded asynchronously and executed only after the HTML has been parsed

When several scripts are loaded, async scripts execute in whatever order they finish downloading, while defer scripts execute in the order they appear in the document.
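These ordering rules can be modeled in a few lines. A simplified sketch under stated assumptions: all scripts start downloading at time 0 (as the preload scanner would allow), each `loadMs` is a made-up download time, parsing finishes at `parseDoneMs`, classic blocking scripts are left out, and `scriptExecutionOrder` is hypothetical.

```javascript
// scripts: [{ src, mode: "async" | "defer", loadMs }] in document order.
function scriptExecutionOrder(scripts, parseDoneMs) {
  const events = [];
  // async: runs as soon as its own download finishes, in completion order.
  for (const s of scripts) {
    if (s.mode === "async") events.push({ src: s.src, at: s.loadMs });
  }
  // defer: runs after parsing is done, strictly in document order.
  let t = parseDoneMs;
  for (const s of scripts) {
    if (s.mode === "defer") {
      t = Math.max(t, s.loadMs); // needs both parsing done and its own download done
      events.push({ src: s.src, at: t });
    }
  }
  // Array.prototype.sort is stable, so deferred scripts keep document order on ties.
  return events.sort((a, b) => a.at - b.at).map((e) => e.src);
}

const scripts = [
  { src: "a.js", mode: "async", loadMs: 300 },
  { src: "b.js", mode: "async", loadMs: 100 },
  { src: "c.js", mode: "defer", loadMs: 400 },
  { src: "d.js", mode: "defer", loadMs: 50 },
];
console.log(scriptExecutionOrder(scripts, 200)); // [ 'b.js', 'a.js', 'c.js', 'd.js' ]
```

b.js runs before a.js because it finished downloading first, while d.js waits for c.js even though it loaded much earlier.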

What is the difference between preload and prefetch

The preload scanner mentioned earlier fetches the resources a page needs ahead of time, but it only works for resources referenced by particular tags in the markup, and it gives us no way to raise the priority of the resources we consider important. Hence preload and prefetch.

  1. Preload: loads resources for the current page with a high priority;
  2. Prefetch: Loads future resources for subsequent pages at low priority and only when idle;

Both preload and prefetch only load resources; they do not execute them. If the server sets Cache-Control on a preloaded resource, it goes into the disk cache; otherwise it is only kept in memory.

Specific use is as follows:

<head>
    <!-- Load the files -->
    <link rel="preload" href="main.js" as="script">
    <link rel="prefetch" href="news.js" as="script">
</head>

<body>
    <h1>hello world!</h1>
    <!-- Execute the file -->
    <script src="main.js" defer></script>
</body>

To ensure that resources are correctly preloaded, pay attention to the following:

  1. Preloaded resources should be used right away on the current page. If a preloaded resource is never used (for example, a preloaded script with no matching script tag), the console shows a warning that the preloaded resource was not used on the current page.
  2. The purpose of prefetch is to fetch resources that will be used in the future. Therefore, when the user navigates from page A to page B, preload requests still in progress are cancelled, but prefetch requests are not.
  3. Preload must be combined with the as attribute, which indicates the resource type and thereby its priority: as="style" gets the highest priority, as="script" gets low or medium priority, and other possible values include font, image, audio, and video;
  4. When preloading fonts, add the crossorigin attribute even if the font is not cross-origin; otherwise it will be downloaded twice:
<link rel="preload" href="font.woff" as="font" crossorigin>

Besides HTML tags, both kinds of preloading can also be set up from JS:

var res = document.createElement("link"); 
res.rel = "preload"; 
res.as = "style"; 
res.href = "css/mystyles.css"; 
document.head.appendChild(res); 

Or with an HTTP response header:

Link: </uploads/images/pic.png>; rel=prefetch
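When several resources need hints, the header value can be assembled programmatically. A small sketch of a hypothetical `buildLinkHeader` helper that joins several hints into one `Link` header value (multiple links are comma-separated, as RFC 8288 allows); the paths are examples only:

```javascript
// hints: [{ href, rel, as? }] -> value for a single Link response header
function buildLinkHeader(hints) {
  return hints
    .map(({ href, rel, as }) => {
      const params = [`rel=${rel}`];
      if (as) params.push(`as=${as}`); // preload needs `as` to know the resource type
      return `<${href}>; ${params.join("; ")}`;
    })
    .join(", ");
}

const linkValue = buildLinkHeader([
  { href: "/uploads/images/pic.png", rel: "prefetch" },
  { href: "/css/mystyles.css", rel: "preload", as: "style" },
]);
console.log(linkValue);
// </uploads/images/pic.png>; rel=prefetch, </css/mystyles.css>; rel=preload; as=style
```

In a Node `http` server, the resulting string could be passed to `res.setHeader("Link", linkValue)` before the response is sent.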

Layout

Now the nodes in the render tree have styles, but we don't yet know where to draw them. So another tree, the layout tree, is built to determine the geometry of each element.

The layout tree takes only the visible elements from the render tree, so the head tag and elements with display: none are not added.
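A minimal sketch of that filtering step, assuming each render-tree node carries an optional computed `display` value; the node shape and `buildLayoutTree` are hypothetical, and the geometry calculation that real layout performs is omitted:

```javascript
function buildLayoutTree(node) {
  if (node.tag === "head") return null; // never rendered
  if (node.style && node.style.display === "none") return null; // drops the whole subtree
  const children = (node.children || [])
    .map(buildLayoutTree)
    .filter((child) => child !== null);
  return { tag: node.tag, children };
}

const renderTree = {
  tag: "html",
  children: [
    { tag: "head", children: [{ tag: "title", children: [] }] },
    {
      tag: "body",
      children: [
        { tag: "div", style: { display: "none" }, children: [{ tag: "p", children: [] }] },
        { tag: "p", children: [] },
      ],
    },
  ],
};
const layoutTree = buildLayoutTree(renderTree);
console.log(JSON.stringify(layoutTree));
// {"tag":"html","children":[{"tag":"body","children":[{"tag":"p","children":[]}]}]}
```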

Layering

Now we have the layout tree, but we still can't start drawing. First, the page has to be split into layers, producing a corresponding layer tree. A browser page is actually made up of many layers stacked on top of each other to form the final page.

Pages contain many complex effects, such as 3D transforms, page scrolling, and z-ordering with z-index, and layers make these effects easier to implement.

Not every node in the layout tree gets its own layer. If a node does not have its own layer, it belongs to its parent's layer.

In general, an element that satisfies either of the following two conditions can be promoted to its own layer.

1. Elements with stacking-context properties are promoted to a separate layer: explicitly positioned elements, elements with opacity, elements using CSS filter, and so on all create a stacking context.

2. Areas that need clipping, such as content with overflow, are also created as separate layers.

In Chrome DevTools, open More options > More tools > Layers to see how the page is layered.
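The two conditions above can be written as a predicate plus a tree walk in which nodes without their own layer join their parent's layer. This sketch only encodes the conditions named in this section, assuming a plain object of computed styles per node; browsers such as Chrome apply many more (and different) compositing rules, so `needsOwnLayer` and `assignLayers` are hypothetical illustrations, not browser APIs.

```javascript
function needsOwnLayer(style) {
  // Condition 1 (as described above): explicit positioning, opacity, or a filter.
  const positioned = style.position !== undefined && style.position !== "static";
  const translucent = style.opacity !== undefined && Number(style.opacity) < 1;
  const filtered = style.filter !== undefined && style.filter !== "none";
  // Condition 2: content that has to be clipped.
  const clipped = style.overflow !== undefined && style.overflow !== "visible";
  return positioned || translucent || filtered || clipped;
}

// Nodes without their own layer are painted into their parent's layer.
function assignLayers(node, parentLayer = null, layers = []) {
  let layer = parentLayer;
  if (parentLayer === null || needsOwnLayer(node.style || {})) {
    layer = { id: layers.length, nodes: [] };
    layers.push(layer);
  }
  layer.nodes.push(node.tag);
  for (const child of node.children || []) assignLayers(child, layer, layers);
  return layers;
}

const page = {
  tag: "html",
  children: [
    { tag: "div", style: { opacity: "0.5" }, children: [{ tag: "span" }] },
    { tag: "p" },
  ],
};
console.log(assignLayers(page).map((l) => l.nodes.join(","))); // [ 'html,p', 'div,span' ]
```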

Painting

After building the layer tree, the browser finally paints each layer. Each layer is first broken down into drawing instructions, which are arranged into a paint list. In the Layers panel of DevTools mentioned above, click Profiler in a layer's details to see its paint list.

At this point, the main thread of the renderer process, the GUI rendering thread, has finished all of its tasks, and the rest is handed over to the renderer's compositor thread.

The compositor thread then splits the layers into tiles and rasterizes the tiles into bitmaps.
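Splitting a layer into tiles is simple rectangle arithmetic. A sketch, assuming square 256x256 tiles (an example value; the actual tile size is a browser implementation detail), that also marks which tiles intersect the viewport, since those are the ones worth rasterizing first. `splitIntoTiles` is hypothetical.

```javascript
// viewport: { x, y, width, height } in layer coordinates.
function splitIntoTiles(layerWidth, layerHeight, viewport, tileSize = 256) {
  const tiles = [];
  for (let y = 0; y < layerHeight; y += tileSize) {
    for (let x = 0; x < layerWidth; x += tileSize) {
      const w = Math.min(tileSize, layerWidth - x); // edge tiles may be smaller
      const h = Math.min(tileSize, layerHeight - y);
      const visible =
        x < viewport.x + viewport.width && x + w > viewport.x &&
        y < viewport.y + viewport.height && y + h > viewport.y;
      tiles.push({ x, y, w, h, visible });
    }
  }
  // Rasterize visible tiles first; the stable sort keeps row order within each group.
  return tiles.sort((a, b) => Number(b.visible) - Number(a.visible));
}

const tiles = splitIntoTiles(1000, 2000, { x: 0, y: 0, width: 1000, height: 600 });
console.log(tiles.length, tiles.filter((t) => t.visible).length); // 32 12
```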

At this point, the rendering process is done, and the generated bitmap is returned to the browser process for display on the page.

Performance tuning, what else can be done

This article is not about performance optimization; this section just adds a few common techniques related to the topic.

Pre-resolution, prerendering

In addition to preload and prefetch, you can also use dns-prefetch, preconnect, and prerender:

  1. DNS prefetch: resolve the domain name in advance.
<link rel="dns-prefetch" href="//fonts.googleapis.com">
  2. Preconnect: perform the setup work before an HTTP request is actually sent to the server, including DNS resolution, the TCP handshake, and TLS negotiation.
<link href="https://cdn.domain.com" rel="preconnect" crossorigin>
  3. Prerender: fetch all resources for the next page and render the whole page while the browser is idle.
<link rel="prerender" href="https://www.keycdn.com">

Reduce reflow and repaint

Reflow (also called relayout or rearrangement) is when the browser has to redo style calculation, layout, layering, and painting.

Operations that trigger reflow:

  • Add or remove visible DOM elements
  • The position of the element changes
  • The size of the element changes
  • The browser window size changes

Repaint only redraws pixels; it is triggered when a style change does not affect layout.

Reflow = style calculation + layout + layering + painting; repaint = painting. Reflow therefore has a much larger impact on performance.

Reflow and repaint should be avoided as much as possible. For example, change styles through GPU-accelerated properties such as transform, opacity, and filter: these are not handled on the main thread, so they trigger neither repaint nor reflow.
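This rule of thumb can be summarized as a lookup from the changed property to the pipeline work it causes. A hedged sketch: the property lists are a small illustrative sample based on the categories described in this section, real behavior varies by browser and context, and `costOfChange` is hypothetical.

```javascript
// Illustrative samples only, following the categories described above.
const LAYOUT_PROPS = new Set(["width", "height", "top", "left", "margin", "padding", "font-size", "display"]);
const PAINT_ONLY_PROPS = new Set(["color", "background-color", "visibility", "box-shadow"]);
const COMPOSITE_ONLY_PROPS = new Set(["transform", "opacity", "filter"]);

function costOfChange(prop) {
  if (COMPOSITE_ONLY_PROPS.has(prop)) return "composite only"; // no reflow, no repaint
  if (PAINT_ONLY_PROPS.has(prop)) return "repaint";            // painting only
  if (LAYOUT_PROPS.has(prop)) return "reflow";                 // style + layout + layering + painting
  return "unknown";
}

for (const prop of ["left", "color", "transform"]) {
  console.log(prop, "->", costOfChange(prop));
}
// left -> reflow
// color -> repaint
// transform -> composite only
```

Animating `transform` instead of `left` is the classic application: the visual result is a moved element either way, but only the latter re-runs layout on every frame.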

Conclusion

Having walked through the whole "URL input to render" process, we can now answer the tricky questions from the beginning of the article:

  1. The browser parses the input and assembles it into a complete URL, with the parameters percent-encoded as UTF-8. This is what the encodeURI and encodeURIComponent functions we use in development do: encodeURI encodes a complete URL, while encodeURIComponent encodes a single part such as a URL parameter, and is stricter.
  2. The browser's disk cache and memory cache read resources from disk and from memory respectively. Generally, refreshing a page reads from memory, while reopening a page after its tab was closed reads from disk.
  3. Prefetch loads resources for subsequent pages at low priority when the browser is idle; preload loads resources needed by the current page at high priority.
  4. A script with async is downloaded asynchronously and executed as soon as it has loaded; a script with defer is downloaded asynchronously and executed after HTML parsing is complete.
  5. TCP uses three handshakes to confirm that the client is still alive and to avoid wasting server resources. It uses four waves because TCP is full-duplex, so each direction needs its own release and acknowledgement.
  6. The HTTPS handshake negotiates a symmetric key: the two parties exchange three random numbers and use them to compute a key known only to the two of them. All subsequent communication is encrypted with this key.

If you found this article helpful, please give it a thumbs up. It means a lot to me.
