What happens from entering the URL to rendering the page? What does the browser do?

Preface

Perhaps every front-end engineer wants to understand how a browser works, but either doesn't know where to start, or seems to get it at some point without ever being quite clear. The next few articles unravel how the browser works in terms that are as plain as possible, from point to line, from phenomenon to essence.

We want to know what the browser does in the few seconds between entering a URL in the browser’s address bar and rendering the page. (Go through the stages to see what the browser does.)

1. What happens from entering the URL to rendering the page? (Navigation process stage)
2. What happens from entering the URL to rendering the page? (Find cache phase)
3. What happens from entering the URL to rendering the page? (Network request stage)
4. What happens from entering the URL to rendering the page? (Analytical calculation stage)
5. What happens from entering the URL to rendering the page? (Page rendering stage)

We want to know how the browser is composed, so that we can better understand how its components, processes, and threads work together (from the perspective of the browser itself).
We want to know why the various code optimizations we often hear about actually work. (Once we understand how the browser works, we can see how to get past its limits and bottlenecks, optimize the page, and improve page performance.)

URL request to web page rendering process

Let’s take a look at how the browser goes from entering the address to rendering the page:

1. The browser processes the input (deciding whether it is a search query or a URL to request).
2. The browser constructs the request line.
3. The browser looks for the resource in its cache; if no usable copy is found, it performs the following steps.
4. The browser initiates a real request: it extracts the server's host name from the URL and performs a DNS query to translate the host name into the server's IP address.
5. The browser extracts the port number from the URL. If none is given, the default is 80 for HTTP and 443 for HTTPS.
6. The browser establishes a TCP connection with the server through a three-way handshake (the connection is later closed with a four-way handshake).
7. The browser sends an HTTP request to the server over the TCP connection, requesting data.
8. The server processes the HTTP request and returns a response.
9. The browser checks whether the HTTP response is a redirect (3XX status code), an authentication challenge (401), an error (other 4XX, 5XX), and so on, and handles each case accordingly.
10. After receiving the HTTP response, the browser may either close the TCP connection or keep it open and reuse it for a new request and response.
11. The browser caches the response data if the response headers mark it as cacheable.
12. The browser parses the response and displays the HTML page.
13. The browser sends requests for the resources embedded in the HTML (such as CSS, JS, images, audio, and video).
14. The page is completely rendered.

What happens from entering the URL to rendering the page? (Navigation process stage)

When the user types into the address bar, the UI thread of the browser process captures the input. If the input is a site address, the UI thread starts a network thread, which resolves the domain name through DNS and then connects to the server over TCP/IP to fetch the data. If the input is not a site address but a string of keywords, the browser treats it as a search and queries the default search engine. In either case, the goal is to build the request line and move on to the next stage.

GET /index.html HTTP/1.1
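The decision described in this stage can be sketched roughly as follows. This is a simplified illustration, not how any real browser classifies input: the heuristic in `to_url` and the `SEARCH_URL` endpoint are assumptions made up for the example.

```python
from urllib.parse import urlsplit, quote_plus

# Hypothetical default search engine endpoint, used only for illustration.
SEARCH_URL = "https://search.example/?q="

def to_url(user_input: str) -> str:
    """Return the URL to navigate to for a piece of address-bar input."""
    text = user_input.strip()
    if "://" in text:
        return text
    # Simplified heuristic: no spaces and at least one dot looks like a host name.
    if " " not in text and "." in text:
        return "http://" + text
    # Anything else is treated as keywords for the default search engine.
    return SEARCH_URL + quote_plus(text)

def request_line(url: str) -> str:
    """Build the HTTP/1.1 request line for a GET of the given URL."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return f"GET {target} HTTP/1.1"

print(request_line(to_url("example.com/index.html")))
print(request_line(to_url("how browsers work")))
```

Both kinds of input end up as a request line, which is the point of this stage.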

What happens from entering the URL to rendering the page? (Find cache phase)

After the user enters information, the browser processes the input, constructs the request line, and searches the browser cache for the requested resource before actually making a network request. Browser caching is a technique for saving a copy of a resource locally so that it can be used immediately on the next request. When the browser finds that the requested resource already has a copy in the browser cache, it intercepts the request, returns the copy, and ends the request without going to the origin server for a new download. If the cache lookup fails, the actual network request process begins.

What is the browser caching mechanism? It is a mechanism that controls file caching through the Cache-Control (or Expires) and ETag (or Last-Modified) fields in the HTTP headers. The browser communicates with the server in request-response mode: the browser initiates an HTTP request and the server responds to it. When the browser receives a response from the server for the first time, it decides whether to cache the result based on the cache identifiers in the response headers. If so, the response and its cache identifiers are stored in the browser cache.

The browser cache mechanism is implemented through two cache phases, strong cache and negotiation cache. They can also be understood as two strategies for verifying the validity of a locally cached copy: freshness checking and validation.

Cache type	Where the resource comes from	Status code	Request sent to the server?
Strong cache phase	From cache	200 (from cache)	No, the cache is read directly
Negotiation cache phase	From cache	304 (not modified)	Yes, the server tells the browser whether the cached copy is still valid

So, back to the question: what happens in the find cache phase? After the address is entered and the request line is built, the browser tries to match a cached copy based on the request line before any real network request is made.

1. Strong cache phase (search the local cache and check freshness). Before sending a request, the browser uses the Expires and Cache-Control fields stored with the previous response to decide whether the strong cache applies. If the copy is still fresh, it is used directly and no request is sent.
2. Negotiation cache phase (verify whether the file has changed). If the browser finds that the copy has expired (strong cache miss), it sends a real HTTP request that carries the Last-Modified / ETag values (if any) from the previous response in the If-Modified-Since / If-None-Match request headers. The server uses these fields to determine whether the resource has been updated, and returns 304 (the local copy is still valid) or 200 with the new content.

Of course, not all HTTP requests go through the strong cache and negotiated cache phases, depending on the caching strategy used by the server.
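The two phases can be sketched as a small decision function. This is a minimal model under stated assumptions: `CachedResponse` and its fields are hypothetical names, freshness is reduced to a single `max-age` value, and validation only compares ETags.

```python
from dataclasses import dataclass

@dataclass
class CachedResponse:
    body: str
    etag: str          # ETag value saved from the previous response
    stored_at: float   # time the response was stored, in seconds
    max_age: int       # lifetime in seconds, from Cache-Control: max-age

def is_fresh(entry: CachedResponse, now: float) -> bool:
    """Strong cache check: the copy is fresh while its age is below max-age."""
    return now - entry.stored_at < entry.max_age

def server_validate(if_none_match: str, current_etag: str) -> int:
    """Negotiation cache check on the server: 304 if the ETag still matches."""
    return 304 if if_none_match == current_etag else 200

def load(entry: CachedResponse, now: float, current_etag: str) -> str:
    """Describe how the resource is obtained for this request."""
    if is_fresh(entry, now):
        return "200 (from cache)"            # no request reaches the server
    status = server_validate(entry.etag, current_etag)
    return "304 (not modified)" if status == 304 else "200 (new download)"

entry = CachedResponse(body="...", etag='"v1"', stored_at=0.0, max_age=60)
print(load(entry, now=30.0, current_etag='"v2"'))
print(load(entry, now=120.0, current_etag='"v1"'))
```

Note that while the copy is fresh the browser never learns that the server already has a newer version; that is the trade-off of the strong cache.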

What happens from entering the URL to rendering the page? (Network request stage)

The general process is as follows:

Querying DNS (network)
Waiting in the TCP queue (local)
Establishing a TCP connection (network)
Sending the request (network)
Waiting for a response (network)
Receiving data (network)
Disconnecting (local)

DNS query

In this step, the resource URL is obtained from the constructed request line, and the domain name is extracted from the URL for DNS resolution. First, a word about DNS. The Domain Name System (DNS) associates server names with IP addresses. Devices on the Internet are located by IP address, but what we usually enter is a URL, a form of URI (a string used to identify an Internet resource) that consists of a protocol, a domain name or IP address, and the path of the resource. When the URL contains a domain name, the browser therefore needs to ask a DNS server for the IP address before the message can be sent to the right target.

The DNS resolution process is as follows:

1. Search the browser cache: the browser caches the DNS records of recently visited sites for a limited time (roughly 2 to 30 minutes). Chrome exposes this cache at chrome://net-internals/#dns.
2. Check the operating system: the hosts file, which maps domain names to IP addresses, and the system DNS cache. On Windows, ipconfig /displaydns shows the DNS cache contents and ipconfig /flushdns flushes it.
3. Check the ISP DNS cache (the cache of the local DNS server).
4. If the record is still not found, recursive query: the target domain name is looked up starting from the root DNS server, then the top-level domain (TLD) DNS server, and finally the authoritative DNS server.
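The lookup order can be modelled as a chain of progressively more expensive sources. The sketch below is only a model: the dictionaries stand in for the browser cache and hosts file, and `query_resolver` is a hypothetical callback standing in for the ISP resolver and the recursive query behind it. No real DNS traffic is involved.

```python
from typing import Callable, Dict, Optional, Tuple

def resolve(host: str,
            browser_cache: Dict[str, str],
            hosts_file: Dict[str, str],
            query_resolver: Callable[[str], Optional[str]]) -> Tuple[Optional[str], str]:
    """Return (ip, source), checking the cheapest source first."""
    if host in browser_cache:
        return browser_cache[host], "browser cache"
    if host in hosts_file:
        return hosts_file[host], "hosts file"
    ip = query_resolver(host)          # stands in for ISP cache + recursive query
    if ip is not None:
        browser_cache[host] = ip       # remember the answer for the next lookup
    return ip, "dns resolver"

# Hypothetical data; 192.0.2.1 is an address reserved for documentation.
cache: Dict[str, str] = {}
hosts = {"localhost": "127.0.0.1"}
fake_resolver = lambda h: "192.0.2.1" if h == "example.com" else None

print(resolve("example.com", cache, hosts, fake_resolver))
print(resolve("example.com", cache, hosts, fake_resolver))
```

The second lookup for the same host is answered from the browser cache, which is why repeated visits skip most of the resolution cost.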

Waiting for TCP queue

Web pages today are increasingly polished and feature-rich, and they load more and more resources: a single page often requests dozens or even hundreds of CSS files, JS files, images, and API endpoints. For the client operating system, excessive concurrency costs port numbers and thread-switching overhead. For the server, receiving all requests at the same moment may cause them to be rejected by its concurrency threshold controls. Browsers balance these factors as follows. HTTP/1.1 has keep-alive, which allows an existing connection to be reused; once a response has returned, sending the next request over the same connection is much faster than opening a new one. The browser therefore does not necessarily open a new TCP connection for each resource, and it limits the number of concurrent connections (commonly around six per host for HTTP/1.1). As a result, if a complex page needs many resources during parsing and the browser's concurrency limit is reached, the remaining requests must wait in a queue until a TCP connection is released and can be reused.
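The effect of the connection limit on load time can be illustrated with a small simulation. This is a sketch under simplifying assumptions: every request is known in advance, each occupies one connection for a fixed duration, and `max_connections` is a parameter of the model rather than a value read from any browser.

```python
import heapq
from typing import List

def finish_times(durations: List[float], max_connections: int) -> List[float]:
    """Simulate requests queued behind a fixed number of reusable connections.

    durations[i] is how long request i occupies a connection.
    Returns the time at which each request completes.
    """
    # Min-heap of the times at which each connection becomes free.
    free_at = [0.0] * max_connections
    heapq.heapify(free_at)
    result = []
    for duration in durations:
        start = heapq.heappop(free_at)   # wait for the earliest free connection
        end = start + duration
        result.append(end)
        heapq.heappush(free_at, end)     # the connection is reused afterwards
    return result

# Four one-second requests over two connections: the last two must wait.
print(finish_times([1, 1, 1, 1], max_connections=2))
```

With two connections, the third and fourth requests cannot start until the first two have released their connections, which is the queueing this step refers to.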

A TCP connection is established through a three-way handshake

In the case of HTTPS, the protocol consists of HTTP + SSL/TLS, which adds a layer for handling encryption on top of HTTP. All information transmitted between the server and the client is encrypted through TLS. The TLS handshake takes place after the TCP connection below has been established.

1. First handshake: the client sends a connection request segment with the SYN flag set to 1 and the Sequence Number set to X.

2. Second handshake: the server receives the SYN segment and must acknowledge it by setting the Acknowledgement Number to X + 1 (that is, the client's Sequence Number + 1). At the same time it sends its own SYN to the client, with the SYN flag set to 1 and the Sequence Number set to Y. The server puts both pieces of information into a single segment (the SYN + ACK segment), sends it to the client, and enters the SYN_RECV state.

3. Third handshake: the client receives the SYN + ACK segment from the server, sets the Acknowledgement Number to Y + 1, and sends an ACK segment to the server. Once this segment has been sent and received, both the client and the server are in the ESTABLISHED state and the TCP three-way handshake is complete.
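The sequence and acknowledgement numbers in the three steps can be traced with a few lines of code. This is a toy model of the exchange, not a TCP implementation: the dictionary layout is invented for the example, and 32-bit wraparound of sequence numbers is ignored.

```python
from typing import Dict, List

def three_way_handshake(x: int, y: int) -> List[Dict]:
    """Return the three segments exchanged, given the initial sequence
    numbers x (chosen by the client) and y (chosen by the server)."""
    # 1. Client -> server: SYN with Sequence Number X.
    syn = {"from": "client", "flags": {"SYN"}, "seq": x}
    # 2. Server -> client: SYN + ACK, acknowledging X + 1, with Sequence Number Y.
    syn_ack = {"from": "server", "flags": {"SYN", "ACK"},
               "seq": y, "ack": syn["seq"] + 1}
    # 3. Client -> server: ACK, acknowledging Y + 1.
    ack = {"from": "client", "flags": {"ACK"},
           "seq": x + 1, "ack": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]

for segment in three_way_handshake(x=100, y=300):
    print(segment)
```

Each side acknowledges the other's initial sequence number plus one, which is how both ends confirm that they can send and receive.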

After a period of time, the TCP connection is closed with a four-way handshake

When an HTTP request reaches the server, the server processes it and returns a response to the browser. The browser then checks the Connection field: if the request header or response header contains Connection: Keep-Alive, a persistent connection has been established, so the TCP connection is kept and later reused by requests for resources from the same site. Otherwise the TCP connection is closed and the request-response process ends.
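The keep-or-close decision can be sketched as a small function. It goes slightly beyond the description above by also modelling the protocol defaults (HTTP/1.1 connections are persistent unless Connection: close is sent, HTTP/1.0 connections are not); the function name and the header dictionary format are assumptions for the example.

```python
from typing import Dict

def keep_connection(http_version: str, headers: Dict[str, str]) -> bool:
    """Decide whether the TCP connection stays open after a response."""
    value = headers.get("Connection", "")
    tokens = {token.strip().lower() for token in value.split(",")}
    if "close" in tokens:
        return False
    if "keep-alive" in tokens:
        return True
    # No explicit header: persistent by default in HTTP/1.1, not in HTTP/1.0.
    return http_version == "HTTP/1.1"

print(keep_connection("HTTP/1.0", {"Connection": "Keep-Alive"}))
print(keep_connection("HTTP/1.1", {"Connection": "close"}))
```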

What happens from entering the URL to rendering the page? (Analytical calculation stage)

If the Content-Type value in the response header is text/html, it is time for the browser to parse and render. Specifically, the browser needs to load and parse not only HTML but also CSS, JS, and other media resources such as images and videos. The browser parses the HTML to generate a DOM tree (if JavaScript code is encountered during parsing, it is handed to the JavaScript engine, and DOM construction resumes once it has run), parses the CSS to generate a CSS rule tree, and then combines the DOM tree and the CSS rule tree into a render tree. Unlike the DOM tree, the render tree does not contain nodes that do not need to be displayed, such as head or nodes with display: none. Note that parsing is not strictly serial: while CSS is being parsed, the browser can continue to load and parse HTML. When a JS script is encountered, however, parsing of the HTML that follows stops and blocking occurs. JS blocking raises further questions, which will be covered separately later.

The general process is as follows

1. Parse the HTML to generate the DOM Tree
2. Parse the CSS to generate the CSSOM
3. Combine the DOM and the CSSOM to generate the Render Tree
4. Iterate over the Render Tree to generate a Layout Tree

Parsing HTML to generate a DOM Tree

Definition of the DOM Tree: the browser cannot directly work with an HTML string, so it turns this stream of bytes into a meaningful data structure that is easy to operate on, which is the DOM Tree. The DOM Tree is essentially a multi-way tree with document as its root node.

DOM Tree build process:

1. Conversion: the browser reads the raw bytes of the HTML (the byte data of zeros and ones) from disk or the network and converts them to strings based on the specified encoding of the file (for example, UTF-8).
2. Tokenizing: the strings are converted into tokens, such as <html>. Each token is identified as a “start tag”, “end tag”, or “text”.
3. Lexing: the tokens produced by tokenizing are converted into node objects, which define their properties and rules.
4. DOM construction: the DOM Tree is built from the relationships between the node objects.

Note: the build process does not wait for all tokens to be produced before generating nodes and building the DOM Tree; tokens are consumed as they are generated. This is easy to explain. Token generation may encounter , ,
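The token-by-token construction can be sketched with the tokenizer from Python's standard library. This is a heavily simplified illustration rather than the HTML parsing algorithm browsers implement: `Node` and `TreeBuilder` are names made up for the example, and void elements such as `<br>`, attributes, and error recovery are ignored.

```python
from html.parser import HTMLParser

class Node:
    def __init__(self, name, text=""):
        self.name = name
        self.text = text
        self.children = []

class TreeBuilder(HTMLParser):
    """Consume tokens as the tokenizer emits them and grow the tree incrementally."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.stack = [self.root]   # open elements; the last one is the current parent

    def handle_starttag(self, tag, attrs):   # "start tag" token
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):            # "end tag" token
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):             # "text" token
        if data.strip():
            self.stack[-1].children.append(Node("#text", data.strip()))

builder = TreeBuilder()
# The markup arrives in chunks; the tree grows after each one.
for chunk in ["<html><body><p>Hello</p>", "<p>World</p></body></html>"]:
    builder.feed(chunk)
builder.close()

body = builder.root.children[0].children[0]
print([child.name for child in body.children])
```

After the first chunk the tree already contains the first paragraph, before the rest of the document has arrived.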

Load and parse the CSS to generate the CSSOM

The browser interprets CSS from four sources: the browser's default styles, external style sheets (link), imported style sheets (@import), and inline styles (e.g. style=”display: none”). When a link tag is encountered, the browser immediately sends a request for the style file. We can of course use inline or embedded styles directly to reduce requests, but doing so sacrifices modularity and maintainability.

When computing CSS rules, the browser checks which rules match each element that has been built, and then overrides and adjusts them according to their priorities. The CSSOM is mainly a description of the boxes on the DOM structure and is essentially attached to the DOM tree. CSS computation is the process of applying CSS rules to the DOM tree and adding display properties to the DOM structure. The CSSOM has a rule section, which is constructed before DOM construction starts, and a view section, which is built in step with the DOM.

Note that JS execution is paused while the CSS rule tree is being built, and resumes only once the CSS rule tree is ready.

JavaScript can read and modify not only DOM properties but also CSSOM properties. Therefore, browsers delay JavaScript execution and DOM construction when there are blocking CSS resources.
CSS parsing blocks the rendering process, which means that the browser will not render any of the processed content until the CSSOM is built.
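The priority-based overriding mentioned in this section can be sketched as follows. This is a deliberately reduced model: it handles only simple selectors made of type names, classes, and ids, and it ignores `!important`, inline styles, and style origins. The function names and rule format are assumptions for the example.

```python
import re

def specificity(selector: str) -> tuple:
    """Specificity (ids, classes, types) of a simple selector such as '#main p.note'."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    types = len(re.findall(r"(?:^|\s)[a-zA-Z][\w-]*", selector))
    return (ids, classes, types)

def computed_value(rules, prop):
    """rules: (selector, declarations) pairs already known to match the
    element, in source order. Returns the winning value for prop."""
    best = None
    for order, (selector, declarations) in enumerate(rules):
        if prop in declarations:
            key = (specificity(selector), order)   # later rules win ties
            if best is None or key > best[0]:
                best = (key, declarations[prop])
    return best[1] if best else None

rules = [
    ("p", {"color": "black"}),
    (".note", {"color": "red"}),
    ("p", {"color": "blue", "margin": "0"}),
]
print(computed_value(rules, "color"))
```

The class selector wins here even though a later `p` rule also sets the color, because specificity is compared before source order.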

Build the Render Tree

Nodes in the render tree correspond to DOM elements, but not one to one. During this process, non-visual DOM elements, such as the “head” element, are not inserted into the render tree, and neither are nodes with display: none. At the same time, some elements that do not exist in the original DOM tree are created, such as ::before, ::after, and ::first-letter. This added content is displayed in the UI and can be seen by the user, but it does not change the content of the document, does not appear in the DOM, cannot be copied, and is simply added at the CSS rendering layer.
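A minimal sketch of this filtering and augmenting step is shown below. The node and style formats are hypothetical (plain dictionaries, with styles keyed by tag name and a `before` entry standing in for `::before` content); real engines match full selectors and build dedicated render objects.

```python
def build_render_tree(node, styles):
    """node: {'tag': str, 'children': [...]}.
    styles: tag name -> dict of CSS-like properties (hypothetical format)."""
    style = styles.get(node["tag"], {})
    if node["tag"] == "head" or style.get("display") == "none":
        return None   # non-visual nodes and their subtrees are left out
    children = []
    if "before" in style:
        # Generated content exists only in the render tree, not in the DOM.
        children.append({"tag": "::before", "text": style["before"], "children": []})
    for child in node.get("children", []):
        rendered = build_render_tree(child, styles)
        if rendered is not None:
            children.append(rendered)
    return {"tag": node["tag"], "children": children}

dom = {"tag": "html", "children": [
    {"tag": "head", "children": [{"tag": "title", "children": []}]},
    {"tag": "body", "children": [
        {"tag": "p", "children": []},
        {"tag": "aside", "children": []},
    ]},
]}
styles = {"aside": {"display": "none"}, "p": {"before": ">> "}}

tree = build_render_tree(dom, styles)
print([child["tag"] for child in tree["children"]])
```

The input DOM is left untouched; only the derived tree omits `head` and `aside` and gains the `::before` box.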

Generate a Layout Tree

By traversing the rendering objects in the render tree, the browser calculates the position and size of each one and generates a layout tree.

Layout tree generation generally works as follows:

Iterate over the generated DOM tree nodes (excluding those with display: none) and add them to the layout tree.
Compute the coordinate positions of the nodes in the layout tree.
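The coordinate computation can be illustrated with the simplest possible case: block boxes stacked vertically. This is a sketch under strong assumptions (every box takes the full available width, there are no margins, floats, or inline content), and `content_height` is a hypothetical field giving the height of a leaf box.

```python
def layout(node, x=0, y=0, width=800):
    """Assign x, y, width, and height to each box; children stack vertically."""
    node["x"], node["y"], node["width"] = x, y, width
    cursor = y
    for child in node.get("children", []):
        layout(child, x, cursor, width)
        cursor += child["height"]      # the next sibling starts below this one
    if node.get("children"):
        node["height"] = cursor - y    # a container is as tall as its children
    else:
        node["height"] = node.get("content_height", 0)
    return node

page = {"children": [
    {"content_height": 20},
    {"children": [{"content_height": 30}, {"content_height": 10}]},
    {"content_height": 5},
]}
layout(page)
print(page["height"], page["children"][2]["y"])
```

Because each box's position depends on the sizes of the boxes before it, changing one height shifts everything below it, which is why geometry changes are expensive.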

Layout takes the render tree and computes geometric information, such as the size and position of each node on the page. HTML uses flow layout by default, but CSS and JS can break out of this layout by changing the look of the DOM as well as its size and position. Two important concepts come up here: Repaint and Reflow.

Repaint: part of the screen is redrawn without affecting the overall layout. For example, the background color of an element is changed through CSS, but the geometry and position of elements stay the same.
Reflow: the geometry of an element has changed, so part or all of the render tree must be revalidated and recomputed. This is called Reflow, or Layout.
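The distinction can be expressed as a rule of thumb in code. The property set below is an assumed, deliberately incomplete list chosen for illustration; real engines decide this per property and per situation.

```python
# Assumed, incomplete set of properties that affect an element's geometry.
GEOMETRY_PROPERTIES = {"width", "height", "margin", "padding",
                       "top", "left", "font-size", "display"}

def required_work(changed_properties):
    """Return 'reflow' if any changed property affects geometry, else 'repaint'."""
    if GEOMETRY_PROPERTIES & set(changed_properties):
        return "reflow"    # layout must be recomputed, followed by a repaint
    return "repaint"       # geometry is unchanged; only pixels are redrawn

print(required_work(["background-color"]))
print(required_work(["color", "width"]))
```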

What happens from entering the URL to rendering the page? (Page rendering stage)

Generate the Layer Tree

After the browser builds the Layout Tree, it also assigns specific nodes to layers, creating a Layer Tree. The drawing order must be determined before anything is displayed on the screen. To determine it, the main thread traverses the Layout Tree, divides it into layers, and creates a paint record that lists the drawing order, which produces the Layer Tree.

Rasterization

What is rasterization? The information to be drawn is converted into pixels and displayed on the screen. What is compositing? The page is divided into several layers, which are rasterized separately and then composited into a page in a separate compositor thread. The whole process is as follows:

1. Once the Layer Tree is generated and the drawing order is determined, the main thread passes this information to the compositor thread, which rasterizes each layer.
2. Since a layer can be as large as the whole page, the compositor thread splits it into tiles and sends them to the raster threads.
3. The raster threads rasterize each tile and store it in GPU memory.
4. After rasterization is complete, the compositor thread generates a compositor frame from these tiles and sends it to the browser process through IPC. The browser process hands the compositor frame to the GPU, which then renders it to the screen.
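The splitting step can be sketched as plain rectangle arithmetic. The tile size of 256 pixels is an assumption for the example, not a value taken from any browser, and the function name is hypothetical.

```python
def split_into_tiles(layer_width, layer_height, tile_size=256):
    """Return (x, y, width, height) rectangles covering the layer, row by row."""
    tiles = []
    for y in range(0, layer_height, tile_size):
        for x in range(0, layer_width, tile_size):
            # Tiles on the right and bottom edges may be smaller than tile_size.
            width = min(tile_size, layer_width - x)
            height = min(tile_size, layer_height - y)
            tiles.append((x, y, width, height))
    return tiles

tiles = split_into_tiles(600, 300)
print(len(tiles), tiles[-1])
```

Each rectangle can then be rasterized independently, which is what allows the work to be spread across several raster threads.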

Conclusion

This article begins and ends with a single question. Answering it touches on too much knowledge to discuss everything in detail.

Q: What happens from entering the URL to rendering the page? A: The whole process can be roughly divided into five stages

Navigation process stage
Find cache phase
Network request stage
Analytical calculation stage
Page rendering stage

For reasons of space, there are two more questions that will be discussed next

We want to know how the browser is composed, so that we can better understand how its components, processes, and threads work together.
We want to know why the various code optimizations we often hear about actually work.

Further reading

Still agonizing over optimizing web rendering performance? HTTP cache