Browser principles from user input to page rendering

This article summarizes my notes on how browsers work, together with some hands-on practice. I think it is packed with useful material, and the process is worth understanding, so I am sharing it here (the full text runs to more than 8,500 words).

Below is a schematic diagram of the complete process, from entering a URL to displaying the page:

As you can see, the process requires coordination between several processes, so before walking through it, let’s take a quick look at the main responsibilities of the browser process, the renderer process, and the network process.

The browser process is responsible for user interaction, child process management, and file storage.
The network process handles network downloads on behalf of the renderer processes and the browser process.
The renderer process is mainly responsible for turning the HTML, JavaScript, CSS, images, and other resources downloaded from the network into pages that can be displayed and interacted with. Because everything the renderer process handles comes from the network, some of it may be malicious code that tries to exploit browser vulnerabilities to attack the system, so code running in the renderer process is not trusted. This is why Chrome runs the renderer process in a security sandbox: to keep the system safe.

User input

When a user types into the address bar, the address bar determines whether the input is a search query or a URL.
If the input is a search query, the address bar uses the browser’s default search engine to synthesize a URL that contains the query.
If the input complies with URL rules (for example, baidu.com), the address bar adds a protocol to it according to the rules and synthesizes a complete URL, such as https://www.baidu.com/.
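
The decision described above can be sketched roughly as follows. This is only a minimal illustration, not Chrome's actual logic: the function name `resolve_address_bar_input`, the host-name pattern, and the search URL template are all assumptions made for the example.

```python
import re
from urllib.parse import quote_plus

# Assumed search endpoint; a real browser uses the user's default search engine.
SEARCH_TEMPLATE = "https://www.baidu.com/s?wd={query}"

# Rough "looks like a host name" pattern: dot-separated labels, optional port and path.
HOST_PATTERN = re.compile(r"^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+(:\d+)?(/\S*)?$")


def resolve_address_bar_input(text: str) -> str:
    """Return the URL to navigate to for the given address bar input."""
    text = text.strip()
    # Input that already carries a scheme is used as-is.
    if re.match(r"^https?://\S+$", text):
        return text
    # Input that looks like a host name gets a protocol prepended.
    if HOST_PATTERN.match(text):
        url = "https://" + text
        if "/" not in text:
            url += "/"  # add the root path when none was typed
        return url
    # Everything else is treated as a search query.
    return SEARCH_TEMPLATE.format(query=quote_plus(text))
```

For example, `resolve_address_bar_input("how browsers work")` produces a search URL, while `resolve_address_bar_input("baidu.com")` produces a URL with the protocol added.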

When the user finishes typing and presses Enter, the current page is about to be replaced by a new one. Before the process continues, however, the browser gives the current page a chance to handle a beforeunload event, which lets the page do some cleanup before it exits. The page can also use this event to ask the user whether they really want to leave, for example when a form is unfinished. In other words, the user can cancel the navigation through the beforeunload event, in which case the browser does no further work.

If the current page does not listen for the beforeunload event, or the user agrees to continue, the browser enters the state shown below:

As you can see from the figure, the icon on the tab enters the loading state as soon as the browser starts loading the address. At this point, however, the page still shows the content that was open before; it has not yet been replaced by the Baidu page. The page content is not replaced until the document submission stage.

URL Request process

Next comes the page resource request process. The browser process sends the URL request to the network process through interprocess communication (IPC). After receiving it, the network process starts the actual URL request. So what does that involve?

First, the network process checks whether the local cache holds the requested resource. If it does, the cached resource is returned directly to the browser process. If it does not, the network request process begins. The first step is DNS resolution, which obtains the server IP address for the requested domain name. If the protocol is HTTPS, a TLS connection must also be established on top of the TCP connection.

The next step is to use the IP address to establish a TCP connection with the server. Once the connection is established, the browser constructs the request line, request headers, and so on, attaches data associated with the domain name, such as cookies, to the request headers, and then sends the constructed request to the server.
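
To make the request line and request headers concrete, here is a minimal sketch of how such a message could be assembled as text. It assumes a plain HTTP/1.1 GET request; the function name `build_http_request` and the cookie values are made up for illustration, and a real browser sends many more headers.

```python
from urllib.parse import urlsplit


def build_http_request(url: str, cookies: dict[str, str]) -> str:
    """Build a minimal HTTP/1.1 GET request as text."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",   # request line
        f"Host: {parts.netloc}",  # Host is required in HTTP/1.1
    ]
    if cookies:
        # Cookies stored for this domain are attached as a single header.
        cookie_value = "; ".join(f"{k}={v}" for k, v in cookies.items())
        lines.append(f"Cookie: {cookie_value}")
    # A blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"
```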

After receiving the request, the server generates response data (a response line, response headers, and a response body) and sends it to the network process. Once the network process has received the response line and response headers, it parses them. (For convenience, I refer to the response line and response headers together as the “response header” below.)

(1) Redirect

Upon receiving the response header from the server, the network process begins to parse it. If the status code is 301 or 302, the server wants the browser to redirect to another URL. The network process then reads the target address from the Location field in the response header, initiates a new HTTP or HTTPS request, and starts all over again.

In other words, if the status code in the server’s response line is 301 or 302 during navigation, the browser redirects to the new address and continues navigating from there. If the status code is 200, the browser can continue processing the request.
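
The redirect loop can be sketched as below. To keep the example self-contained, a dictionary of canned responses (`FAKE_RESPONSES`, an assumption of this sketch) stands in for real servers, and the redirect limit is an arbitrary safety value rather than a number taken from the article.

```python
# Canned responses standing in for real servers: URL -> (status, headers).
FAKE_RESPONSES = {
    "http://example.com/": (301, {"Location": "https://example.com/"}),
    "https://example.com/": (302, {"Location": "https://www.example.com/"}),
    "https://www.example.com/": (200, {"Content-Type": "text/html"}),
}

REDIRECT_CODES = {301, 302}


def navigate(url: str, max_redirects: int = 20) -> tuple[str, int]:
    """Follow redirects until a non-redirect response; return (final_url, status)."""
    for _ in range(max_redirects + 1):
        status, headers = FAKE_RESPONSES[url]
        if status in REDIRECT_CODES:
            # Start the request over with the address from the Location field.
            url = headers["Location"]
            continue
        return url, status
    raise RuntimeError("too many redirects")
```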

(2) Response data type processing

With redirects handled, we continue down the navigation flow. The data returned for a URL request is sometimes a download and sometimes an ordinary HTML page, so how does the browser tell them apart?

The answer is Content-Type. Content-Type is a very important field in the HTTP header that tells the browser what type of data the server is returning in the response body. The browser then uses the value of Content-Type to decide how to handle the response body.

Note that if Content-Type is configured incorrectly on the server, for example if a text/html resource is served as application/octet-stream, the browser may misinterpret the content: a page that was meant to be displayed becomes a file download.

The subsequent processing therefore differs considerably between Content-Type values. If the browser decides that the Content-Type value denotes a download, the request is handed to the browser’s download manager and navigation for that URL ends. If the content is HTML, the browser continues with the navigation process. Since Chrome renders pages in the renderer process, the next step is to prepare a renderer process.
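
A simplified version of this dispatch might look like the sketch below. It covers only the two types discussed here; the function name `handle_response` and its return labels are invented for the example, and real browsers consider more inputs than the Content-Type value alone.

```python
def handle_response(content_type: str) -> str:
    """Decide what to do with a response based on its Content-Type header."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "continue-navigation"  # go on to prepare a renderer process
    if mime == "application/octet-stream":
        return "download"             # hand over to the download manager
    return "other"                    # other types are out of scope for this sketch
```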

Prepare the render process

By default, Chrome assigns a renderer process to each page, meaning that a new renderer process is created for each new page opened. There are exceptions, however: in some cases the browser lets multiple pages run in the same renderer process.

When can multiple pages run in the same renderer process at the same time?

To answer this question, we need to understand what “same site” means. Specifically, we define the “same site” as the root domain (for example, geekbang.org) plus the protocol (for example, https:// or http://); it includes all subdomains and all ports under that root domain.

Chrome’s default strategy is one renderer process per tab. However, if a new page is opened from an existing page and belongs to the same site as that page, the new page reuses the parent page’s renderer process. Officially, this default policy is called process-per-site-instance.

To summarize, the rendering process strategy used to open a new page is:

Typically, a separate renderer process is used for each new page.
If page B is opened from page A, and A and B belong to the same site, page B reuses the renderer process of page A.
Otherwise, the browser process creates a new renderer process for B.
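
The same-site check and the resulting process choice can be sketched as follows. Note the loud assumption: the root domain is approximated as the last two labels of the host name, which is wrong for suffixes such as co.uk; real browsers rely on a list of public suffixes. The names `site_of`, `same_site`, and `pick_renderer` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def site_of(url: str) -> tuple[str, str]:
    """Return (protocol, root domain). Naive: root domain = last two host labels."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname excludes the port
    root = ".".join(host.split(".")[-2:])
    return parts.scheme, root


def same_site(url_a: str, url_b: str) -> bool:
    """Same protocol and same root domain; subdomains and ports are ignored."""
    return site_of(url_a) == site_of(url_b)


def pick_renderer(opener_url: Optional[str], new_url: str) -> str:
    """Return 'reuse' if the new page can share its opener's renderer process."""
    if opener_url is not None and same_site(opener_url, new_url):
        return "reuse"
    return "new"
```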

Once the renderer process is ready, it cannot immediately enter the document parsing state because the document data is still in the network process and has not been submitted to the renderer process, so the next step is to submit the document.

Submit the document

Submitting the document means that the browser process hands the HTML data received by the network process over to the renderer process. The process looks like this:

First, when the browser process receives the response header data from the network process, it sends a “submit document” message to the renderer process.
After receiving the “submit document” message, the renderer process establishes a “pipeline” with the network process to transfer data.
After the document data transfer is complete, the renderer process returns a “confirm submission” message to the browser process.
After receiving the “confirm submission” message, the browser process updates the browser interface state, including the security state, the URL in the address bar, the back/forward history state, and the web page itself.

When the browser process confirms the submission, the interface is updated as shown below:

This explains why, when you type an address into your browser’s address bar, the previous page doesn’t disappear immediately, but instead takes a while to load before the page is updated.

At this point, a complete navigation flow has finished, and it is time to enter the rendering phase.

Rendering phase

This stage is very important. Understanding it lets you “see through” how a page works, and with that knowledge you can solve a whole range of related problems. For example, you can use the developer tools skillfully, because you will understand what most of their panels mean; you can fix page jank; you can optimize animations driven by JavaScript; and you can tune style sheets to avoid forced synchronous layout.

Since it’s so powerful, let’s talk about the rendering process.

As you can see, the input on the left is HTML, CSS, and JavaScript data, which is processed by the rendering module in the middle and finally output as pixels on the screen.

The rendering mechanism is complex, so the rendering module is divided into many sub-stages. The input HTML passes through these sub-stages and finally comes out as pixels. We call this process the rendering pipeline, and its general flow is shown in the following figure:

In chronological order, the pipeline can be divided into the following sub-stages: DOM tree building, style calculation, layout, layering, drawing, tiling, rasterization, and composition. As you go through each stage, focus on the following three things:

At the beginning each sub-stage has its input;
Then each sub-stage has its own process;
Eventually each sub-stage generates output.

Understanding these three parts will give you a clearer understanding of each sub-stage.

Build a DOM tree

Why build a DOM tree? This is because HTML cannot be understood and used directly by browsers, so you need to transform the HTML into a structure that browsers can understand — a DOM tree.

As you can see from the figure, the input for building a DOM tree is a very simple HTML file, which is then parsed by an HTML parser and finally outputs the DOM as a tree structure.

The DOM mirrors the HTML content almost exactly, but unlike HTML it is an in-memory tree structure that can be queried and modified with JavaScript.
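
As a rough illustration of turning HTML text into a tree, the sketch below uses the `HTMLParser` class from Python's standard library. It is far simpler than a browser's parser: it assumes well-formed input in which every start tag has a matching end tag, and the `Node`, `TreeBuilder`, `build_dom`, and `outline` names are made up for the example.

```python
from html.parser import HTMLParser


class Node:
    """A minimal tree node: a tag name, a parent, and a list of children."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects while the parser walks the HTML text."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        # A start tag opens a new child and makes it the current node.
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # An end tag closes the current node (assumes matching tags).
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        # Non-blank text becomes a text node under the current element.
        if data.strip():
            self.current.children.append(Node("#text:" + data.strip(), self.current))


def build_dom(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines
```

Calling `outline(build_dom(html))` on a small document gives an indented listing of the tree, which is a convenient way to see how nesting in the markup maps to parent and child nodes.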

We have now generated the DOM tree, but we still don’t know the style of each DOM node. To give every DOM node the correct style, we need style calculation.

Recalculate Style

The purpose of style calculation is to compute the specific style of each node in the DOM tree. It can be roughly divided into three steps.

1. Convert CSS to a structure that browsers can understand

Just as with HTML, browsers cannot directly work with CSS as plain text, so when the rendering engine receives CSS text, it converts the text into a structure that browsers can understand: styleSheets.

2. Transform property values in the stylesheet to standardize them

Now that we’ve converted the existing CSS text into a structure that browsers can understand, it’s time to standardize the property values.

To understand what attribute value normalization is, you can look at CSS text like this:

body { font-size: 2em }
p { color: blue; }
span { display: none }
div { font-weight: bold }
div p { color: green; }
div { color: red; }

The CSS text above contains many property values, such as 2em, blue, and bold. These values are not easy for the rendering engine to work with, so all of them need to be converted into standardized computed values that the engine understands. For example, 2em becomes 32px (assuming the default 16px base font size), blue becomes rgb(0, 0, 255), and bold becomes 700. This process is called property value standardization.
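
A toy version of this standardization step is sketched below. It handles only the three kinds of values mentioned here (named colors, font-weight keywords, and em font sizes), and the lookup tables and the `standardize` function are assumptions of the example rather than anything a rendering engine exposes.

```python
NAMED_COLORS = {
    "blue": "rgb(0, 0, 255)",
    "red": "rgb(255, 0, 0)",
    "green": "rgb(0, 128, 0)",
}
FONT_WEIGHTS = {"normal": "400", "bold": "700"}


def standardize(prop: str, value: str, parent_font_px: float = 16.0) -> str:
    """Convert a declared value into a standardized computed value."""
    value = value.strip().lower()
    if prop == "color" and value in NAMED_COLORS:
        return NAMED_COLORS[value]
    if prop == "font-weight" and value in FONT_WEIGHTS:
        return FONT_WEIGHTS[value]
    if prop == "font-size" and value.endswith("em") and not value.endswith("rem"):
        # em is relative to the parent's font size (16px assumed by default).
        return f"{float(value[:-2]) * parent_font_px:g}px"
    return value  # anything else is passed through unchanged
```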

3. Calculate the specific style of each node in the DOM tree

Now that the style properties have been standardized, you need to calculate the style properties for each node in the DOM tree. How?

This is where CSS inheritance and cascading rules come in.

The first is CSS inheritance: each DOM node takes on the inheritable styles of its parent node. This may seem a bit abstract, so let’s look at how the following styles are applied to DOM nodes in a concrete example.

body { font-size: 20px }
p { color: blue; }
span { display: none }
div { font-weight: bold; color: red }
div p { color: green; }

The final effect of the stylesheet applied to the DOM node is shown below:

As you can see from the figure, all child nodes inherit the style of their parent node. For example, if the font-size property of the body node is 20px, then all nodes below the body node also have a font-size of 20px.

The second rule in style calculation is style cascade. Cascading is a fundamental feature of CSS. It is an algorithm that defines how to combine property values from multiple sources. It is at the heart of CSS, which is highlighted by its full name, cascading style sheets. I won’t go into too much detail here about the specific rules of cascading.

In short, the purpose of the style calculation stage is to compute the specific style of each node in the DOM tree, following the two rules of CSS inheritance and cascading. The final output of this stage is the style of each DOM node, stored in the ComputedStyle structure.
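
The inheritance rule can be illustrated with the small sketch below, which reuses the example style sheet from this section. It assumes the cascade has already picked the winning declarations for each node (for a p inside a div, "div p" is more specific than "p", so green wins), and it treats only three properties as inherited. The `computed_style` helper is hypothetical.

```python
# Properties treated as inherited in this sketch; display is not inherited.
INHERITED = {"color", "font-size", "font-weight"}


def computed_style(own: dict[str, str], parent: dict[str, str]) -> dict[str, str]:
    """Combine a node's own declarations with values inherited from its parent."""
    style = {prop: value for prop, value in parent.items() if prop in INHERITED}
    style.update(own)  # the node's own declarations override inherited values
    return style


body = computed_style({"font-size": "20px"}, {})
div = computed_style({"font-weight": "bold", "color": "red"}, body)
p = computed_style({"color": "green"}, div)  # winning declaration from "div p"
```

Here `p` ends up with the font size from body, the font weight from div, and its own color.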

The layout phase

Now, we have the DOM tree and the styles of the elements in the DOM tree, but that’s not enough to display the page because we don’t yet know the geometry of the DOM elements. The next step is to figure out the geometry of the visible elements in the DOM tree, a process we call layout.

Chrome performs two tasks in the layout phase: creating a layout tree and calculating the layout.

Creating a layout tree

You may have noticed that the DOM tree also contains many invisible elements, such as the head tag and elements with the display: none property. So, before displaying the page, we also build an additional layout tree that contains only visible elements.

As you can see from the figure above, all invisible nodes in the DOM tree are not included in the layout tree. To build the layout tree, the browser basically does the following:

Iterate through all visible nodes in the DOM tree and add them to the layout tree;
Ignore invisible nodes, such as everything under the head tag, or the body.p.span element, which is left out of the layout tree because its style contains display: none.
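
The two rules above amount to a filtered copy of the DOM tree. The sketch below shows one way to express that, under the simplifying assumption that each node already knows its computed display value; the `DomNode` class and `build_layout_tree` function are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DomNode:
    tag: str
    display: str = "block"  # computed display value, assumed to be known already
    children: list["DomNode"] = field(default_factory=list)


def build_layout_tree(node: DomNode):
    """Return a nested (tag, children) tuple with only visible nodes, or None."""
    if node.tag == "head" or node.display == "none":
        return None  # the node and its whole subtree are skipped
    kids = [build_layout_tree(child) for child in node.children]
    return (node.tag, [kid for kid in kids if kid is not None])
```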

Layout calculation

Now we have a complete layout tree. The next step is to calculate the coordinate positions of the nodes in the layout tree. The layout calculation process is very complicated, so we’ll skip it here.

When a layout operation is performed, its result is written back into the layout tree, so the layout tree is both the input and the output. This is a weakness of the layout stage: input and output are not clearly separated. To address it, the Chrome team is refactoring the layout code. The next-generation layout system, called LayoutNG, attempts to separate input and output more clearly, which also makes the newly designed layout algorithms simpler.

Layering

Now that we have the layout tree and the exact location of each element calculated, is it time to start drawing the page?

Again, the answer is no.

Because pages contain many complex effects, such as 3D transforms, page scrolling, and z-axis ordering with z-index, the rendering engine generates dedicated layers for specific nodes and builds a corresponding layer tree (LayerTree) to make these effects easier to implement. If you are familiar with Photoshop, the concept of layers will be easy to understand: the layers are stacked together to form the final page image.

To see what layers look like, open Chrome’s Developer Tools and select the Layers tab to visualize the layers on your page.

The rendering engine divides the page into layers, which are stacked together in a certain order to form the final page.

The relationship between layer and layout tree nodes is shown in the figure:

In general, not every node in the layout tree has its own layer. If a node has no corresponding layer, it belongs to its parent node’s layer. For example, the span tags in the figure above have no layer of their own, so they belong to their parent’s layer. In the end, every node belongs to a layer, directly or indirectly.

So what conditions must be met for the rendering engine to create a new layer for a particular node? Generally, an element that satisfies either of the following two conditions can be promoted to a separate layer.

Elements with stacking context properties are promoted to a separate layer.
Places that need to be clipped are also created as layers.

First, elements with stacking context properties are promoted to separate layers.

A page is a two-dimensional plane, but the stacking context gives HTML elements a three-dimensional aspect: elements are arranged along the z-axis, perpendicular to that plane, according to the priority of their properties. The following image gives you a feel for it:

As you can see from the figure, explicitly positioned elements, elements with opacity defined, elements with CSS filters, and so on all have stacking context properties.

Second, places that need to be clipped are also created as layers.

First, though, you need to understand clipping. Consider the following HTML code:

<style>
      div {
            width: 200px;
            height: 200px;
            overflow: auto;
            background: gray;
        }
</style>
<body>
    <div>
        <p>So elements that have stacking context properties or need to be clipped can be promoted to a separate layer, as you can see below:</p>
        <p>From the figure above, we can see that the Document layer has A and B layers, and the B layer has two more layers. These layers are organized into a tree structure.</p>
        <p>The LayerTree is created based on the layout tree. To find out which elements need to be in which layers, the rendering engine iterates through the layout tree to create the layer tree (Update LayerTree).</p>
    </div>
</body>

In this case, we limit the div to 200 × 200 pixels, but it contains a lot of text that needs more than 200 × 200 pixels to display. This is where clipping occurs. The following is the result at run time:

When this kind of clipping happens, the rendering engine creates a separate layer for the text content, and if a scroll bar appears, the scroll bar is promoted to a separate layer as well. (I have not examined this second layering condition in depth.)
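
The rule that every node belongs to a layer, either its own or its parent's, can be sketched as a tree walk. This is a heavy simplification of what a rendering engine does: the two boolean flags stand in for the two conditions described in this section, the clipping element itself is given the layer, and the `LayoutNode` and `assign_layers` names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LayoutNode:
    name: str
    stacking_context: bool = False  # e.g. explicitly positioned, opacity, CSS filter
    clips_overflow: bool = False    # e.g. content larger than an overflow: auto box
    children: list["LayoutNode"] = field(default_factory=list)


def assign_layers(node: LayoutNode, parent_layer: str = "", out=None) -> dict[str, str]:
    """Map every node name to the name of the layer it belongs to."""
    if out is None:
        out = {}
    # The root always gets a layer; other nodes need one of the two conditions.
    owns_layer = parent_layer == "" or node.stacking_context or node.clips_overflow
    layer = node.name if owns_layer else parent_layer
    out[node.name] = layer
    for child in node.children:
        assign_layers(child, layer, out)
    return out
```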

Layer drawing

After building the layer tree, the rendering engine will render each layer in the tree, so how does the rendering engine render layers?

Imagine you were given a piece of paper and told to color the background blue, then draw a red circle in the middle, and then draw a green triangle on top of that circle. How would you do it? Normally, you would break the drawing into three steps: draw the blue background, draw the red circle in the middle, and draw the green triangle on the circle.

The rendering engine implements layer drawing in a similar way, breaking a layer’s drawing into smaller instructions, which are then sequentially assembled into a list of instructions to draw, as shown below:

As you can see from the figure, the instructions in the draw list are actually very simple. Each one performs a single drawing operation, such as drawing a pink rectangle or a black line. Drawing an element usually takes several instructions, because its background, foreground, and borders each need their own. So the output of the layer drawing stage is these draw lists.

At this point, the draw list only records the drawing order and the drawing instructions; the actual drawing is done by the compositing thread in the rendering engine.
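
A draw list of this kind can be modeled as an ordered list of simple instructions. In the sketch below, the instruction names (`DrawRect`, `DrawBorder`, `DrawText`) and the `Box` class are invented for illustration and are not the engine's real instruction set; the boxes are assumed to arrive already sorted back to front.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    name: str
    background: Optional[str] = None
    border: Optional[str] = None
    text: Optional[str] = None


def build_draw_list(boxes: list[Box]) -> list[str]:
    """Emit one instruction per background, border, and text, in paint order."""
    ops = []
    for box in boxes:  # assumed to be sorted back to front already
        if box.background:
            ops.append(f"DrawRect({box.name}, fill={box.background})")
        if box.border:
            ops.append(f"DrawBorder({box.name}, color={box.border})")
        if box.text:
            ops.append(f"DrawText({box.name}, {box.text!r})")
    return ops
```

Nothing is drawn at this point: the list only records what to draw and in what order.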

Raster operation

A draw list is only a record of the drawing order and drawing instructions; the actual drawing is done by the compositing thread in the rendering engine. The following image shows the relationship between the main rendering thread and the compositing thread:

As shown above, when a layer’s draw list is ready, the main thread commits it to the compositing thread. How does the compositing thread work from there?

Usually a page may be large, but the user can only see part of it. We call the part of the page that the user can see the viewport.

In some cases a layer can be very large. On some pages you have to scroll for a long time to reach the bottom, yet through the viewport the user sees only a small part of the page. Drawing the entire layer in such cases would incur a lot of overhead and is unnecessary.

For this reason, the compositing thread divides each layer into tiles, which are usually 256×256 or 512×512 pixels.

The compositing thread then generates bitmaps for the tiles near the viewport first. The actual bitmap generation is performed by rasterization, which means converting a tile into a bitmap. The tile is the smallest unit of rasterization. The renderer process maintains a rasterization thread pool, and all tile rasterization is performed in this pool, as shown below:

Usually, the GPU is used to accelerate bitmap generation during rasterization. Generating bitmaps on the GPU is called fast rasterization, or GPU rasterization, and the generated bitmaps are stored in GPU memory.

GPU operations run in the GPU process. If rasterization uses the GPU, the final bitmap generation is done in the GPU process, which makes it a cross-process operation. The picture below shows how this works:

As you can see from the figure, the renderer process sends the instructions for generating tiles to the GPU process, which generates the tile bitmaps and keeps them in GPU memory.
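
The tiling and prioritization steps described in this section can be sketched as follows. The sketch only considers vertical scrolling and uses a simple distance measure of its own; how Chrome actually orders tiles is not specified in this article, so treat `tiles_for_layer` and `raster_order` as illustrative assumptions.

```python
TILE = 256  # tile edge in pixels; 512 is the other size mentioned above


def tiles_for_layer(width: int, height: int) -> list[tuple[int, int]]:
    """Return the top-left corner of every tile needed to cover the layer."""
    return [(x, y) for y in range(0, height, TILE) for x in range(0, width, TILE)]


def raster_order(tiles, viewport_top: int, viewport_height: int):
    """Sort tiles so that those overlapping or nearest the viewport come first."""
    viewport_bottom = viewport_top + viewport_height

    def distance(tile):
        _, y = tile
        if y + TILE <= viewport_top:   # tile lies entirely above the viewport
            return viewport_top - (y + TILE)
        if y >= viewport_bottom:       # tile lies entirely below the viewport
            return y - viewport_bottom
        return 0                       # tile overlaps the viewport

    return sorted(tiles, key=distance)
```

With a tall, narrow layer and a viewport scrolled partway down, the tile under the viewport is rasterized first and the tile at the very top of the page last.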

Composition and display

Once all the tiles have been rasterized, the compositing thread generates a command to draw the tiles, DrawQuad, and submits it to the browser process.

The browser process has a component called viz that receives the DrawQuad commands from the compositing thread, draws the page content into memory according to them, and then displays that memory on the screen.

At this point, after this series of stages, the HTML, CSS, JavaScript, and other code you wrote is displayed by the browser as a beautiful page.

Rendering pipeline summary

Now you’ve analyzed the entire rendering process, from HTML to DOM, style calculation, layout, layering, drawing, rasterization, composition, and display. Here’s a diagram to summarize the entire rendering process:

Combined with the above image, a complete rendering process can be summarized as follows:

The renderer process converts the HTML content into a DOM tree structure that the browser can read.
The rendering engine converts the CSS style sheets into styleSheets that the browser can understand and calculates the style of each DOM node.
Create the layout tree and calculate the layout information of the elements.
Layer the layout tree and generate a layer tree.
Generate a draw list for each layer and submit it to the compositing thread.
The compositing thread divides each layer into tiles and converts the tiles into bitmaps in the rasterization thread pool.
The compositing thread sends the DrawQuad command to the browser process.
The browser process generates the page from the DrawQuad message and displays it on the monitor.

Conclusion

When the user enters content in the address bar, the browser judges whether it is a search query or a web address. If it is a search query, the browser combines it with the default search engine to synthesize a new URL. If the input complies with URL rules, the browser adds a protocol to it to synthesize a valid URL.
After entering the content, the user presses Enter. The browser’s tab shows the loading state, but the page still displays the previous content, because the response data for the new page has not been obtained yet.
The browser process builds the request line and sends the URL request to the network process via interprocess communication (IPC).
The network process takes the URL and first checks the local cache for a cached copy. If there is one, it intercepts the request and returns the cached response directly with status 200; otherwise, it enters the network request process.
The network process asks DNS for the IP address corresponding to the domain name. If the DNS cache already holds an entry for the domain, the cached result is returned directly; otherwise, a DNS query resolves the domain name to an IP address. The port number comes from the URL; if none is specified, the default is 80 for HTTP and 443 for HTTPS. If the request uses HTTPS, a TLS connection must also be established.
A TCP three-way handshake establishes the connection. The HTTP request is passed down to the transport layer, which adds a TCP header containing the source port number, the destination port number, and the sequence number used to reassemble the data in order.
The network layer adds an IP header to the packet, containing the source and destination IP addresses, and passes it further down.
The lower layers transmit it to the destination server host over the physical network.
The network layer on the destination server host receives the packet, parses the IP header, extracts the data portion, and passes the unwrapped packet up to the transport layer.
The transport layer on the destination server host receives the packet, parses the TCP header, identifies the port, and passes the unwrapped packet up to the application layer.
At the application layer, the HTTP server parses the request headers and request body. If a redirect is required, it returns a response with status code 301 or 302 and puts the redirect address in the Location field of the response header; the browser then redirects according to the status code and Location. If no redirect is needed, the server uses the If-None-Match value in the request header to determine whether the requested resource has been updated. If it has not, the server returns status code 304, which tells the browser that its cached copy is still usable. Otherwise, the server returns the new data with status code 200, and if it wants the browser to cache the data, it adds a field to the response header, such as Cache-Control: max-age=2000. The response data travels back down through the server’s application, transport, and network layers, and then up through the client’s network, transport, and application layers to the network process.
When data transfer is complete, TCP closes the connection with a four-way handshake. If the browser or server adds Connection: keep-alive to the HTTP header, the TCP connection is kept open instead. Keeping the connection open saves the time needed to establish the next connection and speeds up resource loading.
The network process parses the received data and determines the type of the response from the Content-Type in the response header. If it is a byte stream (application/octet-stream), the request is handed to the download manager and the navigation process ends. If it is text/html, the network process notifies the browser process that a document has arrived and is ready to render.
The browser process receives the notification and checks whether the current page B was opened from page A and belongs to the same site as A (same root domain and protocol). If both conditions hold, it reuses page A’s renderer process; otherwise, it creates a separate renderer process.
The browser process sends a “submit document” message to the renderer process. After receiving it, the renderer process establishes a “pipeline” with the network process to transfer the data. After the document data transfer is complete, the renderer process returns a “confirm submission” message to the browser process.
After the browser process receives the “confirm submission” message, it updates the browser’s page state, including the security state, the URL in the address bar, and the back/forward history state, and updates the web page. At this moment the web page is blank, and the renderer process starts parsing the document and loading sub-resources.
Build the DOM tree. The input is an HTML file, which is parsed by the HTML parser; the output is the DOM, a tree structure.
Style calculation (Recalculate Style). This stage can be roughly divided into three steps.
- Convert CSS into a structure that browsers can understand. When the rendering engine receives CSS text, it converts it into a structure the browser can understand: styleSheets.
- Standardize the property values in the style sheet, such as 2em, bold, and color: blue.
- Calculate the specific style of each node in the DOM tree, following the two rules of CSS inheritance and cascading. The final output of this stage is the style of each DOM node, stored in the ComputedStyle structure.
Layout. The layout stage performs two tasks: creating the layout tree and calculating the layout.
Layering. The layout tree is layered to generate a layer tree. Because pages contain many complex effects, such as 3D transforms, page scrolling, and z-axis ordering with z-index, the rendering engine generates dedicated layers for specific nodes to make these effects easier to implement. Not every node in the layout tree has its own layer; a node without one belongs to its parent node’s layer.
- Elements with stacking context properties are promoted to a separate layer: explicitly positioned elements, elements with opacity defined, elements with CSS filters, and so on all have stacking context properties.
- Places that need clipping are also created as layers. For example, if a div is limited to 200 × 200 pixels but contains a lot of text, the text needs more than 200 × 200 pixels to display, so clipping occurs.
Layer drawing. A draw list is generated for each layer and submitted to the compositing thread.
Rasterization. The compositing thread divides each layer into tiles and converts the tiles into bitmaps in the rasterization thread pool.
Once all the tiles have been rasterized, the compositing thread sends the DrawQuad command to the browser process.
The browser process generates the page from the DrawQuad message and displays it on the monitor.
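
The cache validation step in the summary above (If-None-Match, 304, and Cache-Control) can be sketched from the server's point of view as shown below. The ETag response header is the standard counterpart of If-None-Match, although this article does not name it; the `respond` function and the sample values are assumptions of the sketch.

```python
def respond(request_headers: dict[str, str], current_etag: str, body: str):
    """Return (status, headers, body) for a GET request, honoring If-None-Match."""
    if request_headers.get("If-None-Match") == current_etag:
        # The browser's cached copy is still valid, so no body is sent.
        return 304, {"ETag": current_etag}, ""
    # Send the fresh resource and allow the browser to cache it.
    headers = {"ETag": current_etag, "Cache-Control": "max-age=2000"}
    return 200, headers, body
```

A first request without If-None-Match gets the full 200 response; a later request that echoes the same tag gets a 304 with an empty body.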