Introduction to the browser process
The concept of processes and threads
Process: A process is a running instance of a program. When a program is started, the operating system creates a block of memory for the program that holds the code, running data, and a main thread to perform the task.
Thread: Threads cannot exist in isolation and need to be started and managed by a process.
Process and thread characteristics:
- Failure of any thread in the process causes the entire process to crash.
- Data in a process is shared between threads.
- When a process exits, the operating system reclaims all the memory it used, including memory that a misbehaving thread leaked while the process was running.
- Processes are isolated from each other and can exchange data only through an inter-process communication (IPC) mechanism.
The history of the browser
The era of single-process browsers
All the functional modules of the browser run in the same process, including the network, plug-ins, JS runtime environment, rendering engine, and pages. This causes several problems:
- Instability: An unexpected crash of a plug-in can bring down the entire browser.
- Not smooth: The page rendering modules, the JS execution environment, and the plug-ins all run in the same page thread, so only one module can execute at a time.
- Unsafe: Plug-ins can be written in C/C++ and other languages and can access any operating system resource, so running a plug-in means it can fully operate your computer. Page scripts can also cause security problems by obtaining system permissions through browser vulnerabilities.
The age of multi-process browsers
In the early multi-process architecture, pages ran in separate rendering processes and plug-ins ran in separate plug-in processes, all communicating via IPC. This solves the problems of the single-process design: the processes are isolated from each other, so they neither interfere with nor block one another. The plug-in and rendering processes also run inside a security sandbox, where a program cannot read or write data on the hard disk.
The latest Chrome process architecture diagram:
The latest Chrome consists of one main browser process, one GPU process, one network process, multiple rendering processes, and multiple plug-in processes. Because the rendering and plug-in processes need to run user code, they are sandboxed for safety. Functions of each process:
- Browser process: Mainly responsible for interface display, user interaction, and child process management, and also provides storage and other functions.
- Render process: The core task is to convert HTML, CSS, and JavaScript into web pages that users can interact with. Blink and JavaScript engine V8 both run in this process. By default, Chrome creates a render process for each Tab.
- GPU process: Originally introduced to implement 3D CSS effects. Later, the web page content and the Chrome UI also came to be drawn by the GPU.
- Network process: Mainly responsible for loading the network resources of the page.
- Plug-in process: Mainly responsible for running plug-ins. Because plug-ins are prone to crashing, they are isolated in plug-in processes so that a crash does not affect the browser or the pages.
Disadvantages of the multi-process model:
- Higher resource usage. Each process contains its own copy of the common infrastructure (such as the JavaScript runtime environment), so the browser consumes more memory.
- More complex architecture. Problems such as high coupling between browser modules and poor scalability make it difficult for existing architectures to adapt to new requirements.
The rendering process strategy for the new page
Specifically, connected pages that belong to the same site (same protocol and same root domain name) share one rendering process. Two tabs become connected in either of the following ways:
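- A link (an a tag) in tab A opens tab B:
<a href="./b.html" target="_blank"> B </a>
Tab B can then access the window of tab A through window.opener. Note that current browsers treat target="_blank" as implying rel="noopener" unless the link sets rel="opener".
- Tab B is opened from tab A by the window.open() method.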
These connected tabs are called a browsing context group, and Chrome assigns tabs in the same group that belong to the same site to the same rendering process. But there are exceptions:
- The link sets the rel attribute as follows:
<a target="_blank" rel="noopener noreferrer">
Here noopener sets window.opener in the newly opened tab to null, and noreferrer additionally tells the browser not to send the Referer header to the new page. The newly opened tab then does not belong to the same browsing context group as the current tab.
- If an iframe in the current tab belongs to a different site from the tab, the iframe also runs in a separate rendering process.
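The "same site" test described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the helper names siteOf and isSameSite are made up for this example, and the root domain is approximated as the last two labels of the host name, whereas real browsers consult the Public Suffix List.

```typescript
// Simplified sketch: a "site" is the scheme plus the root domain.
// Assumption: the root domain is the last two host labels. Real browsers use
// the Public Suffix List, so hosts such as "example.co.uk" are handled wrongly here.
function siteOf(url: string): string {
  const { protocol, hostname } = new URL(url);
  const rootDomain = hostname.split(".").slice(-2).join(".");
  return `${protocol}//${rootDomain}`;
}

// Two URLs are same-site when both scheme and root domain match.
function isSameSite(a: string, b: string): boolean {
  return siteOf(a) === siteOf(b);
}

console.log(isSameSite("https://www.example.com/a.html", "https://news.example.com/b.html")); // true
console.log(isSameSite("https://www.example.com", "http://www.example.com"));                 // false
console.log(isSameSite("https://www.example.com", "https://www.example.org"));                // false
```

Different subdomains and ports still count as the same site here, which is why same-site is a looser condition than same-origin.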
Navigation process
The process from the time the user makes the URL request to the time the page begins parsing is called navigation.
The flow from entering a URL to displaying the page:
- User input: After the user types into the address bar, the address bar decides whether the input is a search query or a URL. For a search query, it combines the keywords with the default search engine to form a search URL. For a URL, it completes the address according to the rules, for example by adding the protocol. The browser then starts loading and the tab icon enters the loading state, while the page still shows its previous content.
- Request resource: The browser process sends the URL request to the network process through IPC, and the network process initiates the actual request flow.
- Look up the cache: If a cached copy of the resource exists, it is returned directly to the browser process. Otherwise, the network request flow begins.
- Establish connection: Perform DNS resolution to obtain the IP address, then establish the TCP connection and, if HTTPS is used, the TLS connection.
- Initiate a request: After the connection is established, build the request line and request headers, attach related data such as cookies to the request headers, and send the request to the server.
- Receive response: The network process receives the response from the server and begins to parse it. If the status code is 301 or 302, the redirect address is read from the Location field of the response header and a new request is made.
- Process the response data type: Determine the data type of the response body from the Content-Type header. application/octet-stream indicates a byte stream and is handled as a download: the request is handed to the browser's download manager and the navigation process ends. If the type is text/html, preparation for rendering begins.
- Prepare rendering process: The browser process learns from the network process that the response is HTML and selects or creates a rendering process for the request.
- Submit document: The browser process sends a “submit document” message to the rendering process. On receiving it, the rendering process establishes a “pipeline” with the network process to transfer the data. After the response body has been transferred, the rendering process returns a “confirm submission” message to the browser process, which then updates the interface state: the security status, the URL in the address bar, the forward and backward history, and the page itself.
After confirming the submission, the navigation process ends and the rendering process begins.
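As a rough illustration of the "receive response" and "process the response data type" steps, the decision can be modelled as a pure function. The type NavigationAction and the function classifyResponse are hypothetical names for this sketch, not Chrome APIs, and only the cases mentioned above are handled.

```typescript
// Hypothetical model of what the network process decides after reading the
// status line and headers of a response.
type NavigationAction =
  | { kind: "redirect"; location: string } // 301/302: request the new address
  | { kind: "download" }                    // hand over to the download manager
  | { kind: "render" }                      // prepare a rendering process
  | { kind: "other"; contentType: string }; // anything this sketch does not cover

function classifyResponse(status: number, headers: Record<string, string>): NavigationAction {
  // Header names are case-insensitive, so normalize them first.
  const h: Record<string, string> = {};
  for (const key of Object.keys(headers)) h[key.toLowerCase()] = headers[key];

  if ((status === 301 || status === 302) && h["location"]) {
    return { kind: "redirect", location: h["location"] };
  }
  // Drop parameters such as "; charset=utf-8" before comparing.
  const contentType = (h["content-type"] ?? "").split(";")[0].trim().toLowerCase();
  if (contentType === "application/octet-stream") return { kind: "download" };
  if (contentType === "text/html") return { kind: "render" };
  return { kind: "other", contentType };
}

console.log(classifyResponse(302, { Location: "https://example.com/new" }).kind);        // "redirect"
console.log(classifyResponse(200, { "Content-Type": "text/html; charset=utf-8" }).kind); // "render"
console.log(classifyResponse(200, { "content-type": "application/octet-stream" }).kind); // "download"
```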
Rendering process
The rendering process first creates a blank page, and this period is called the parsing white screen. Page parsing and subresource loading then begin.
Parse HTML – Produces a DOM tree
The browser cannot work with the raw HTML text directly, so the HTML must be parsed into a structure it does understand: a DOM tree. The DOM is a tree structure stored in memory and is the basic data structure for generating the page. It provides a set of interfaces that let JavaScript query and change the structure, style, and content of the document, and it also acts as a line of security defense.
The rendering engine contains an HTML parser module that converts the HTML byte stream into a DOM structure. Parsing is progressive: the HTML parser parses as much data as the network process has loaded. The rendering process receives the byte stream from its end of the data pipeline previously established with the network process and parses it into the DOM as it arrives.
The parsing process
- The tokenizer converts the byte stream into tokens, which are divided into tag tokens (StartTag and EndTag) and text tokens.
- The tokens are turned into DOM nodes, which are added to the DOM tree. The HTML parser maintains a token stack (the parser starts by creating an empty DOM structure whose root is document). For a StartTag token, it pushes the token onto the stack, creates a DOM node, and adds the node to the DOM tree. For a text token, it creates a text node and adds it to the DOM tree. For an EndTag token, it checks whether the top of the stack is the StartTag token of the same tag, and if so, pops it off the stack.
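The two steps above can be sketched as a toy parser. This is a simplified illustration and not Blink's actual implementation: the tokenizer only understands plain <tag>, </tag>, and text (no attributes, comments, or error recovery), the stack holds the open DOM nodes rather than the tokens themselves, and all names (Token, DomNode, tokenize, buildDom) are invented for the example.

```typescript
type Token =
  | { type: "StartTag"; name: string }
  | { type: "EndTag"; name: string }
  | { type: "Text"; text: string };

interface DomNode {
  name: string;       // tag name, "#text", or "#document"
  text?: string;      // only set for text nodes
  children: DomNode[];
}

// Step 1: split the input into StartTag, EndTag, and Text tokens.
function tokenize(html: string): Token[] {
  const tokens: Token[] = [];
  const pattern = /<\/([a-zA-Z0-9]+)>|<([a-zA-Z0-9]+)>|([^<]+)/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(html)) !== null) {
    if (m[1]) tokens.push({ type: "EndTag", name: m[1].toLowerCase() });
    else if (m[2]) tokens.push({ type: "StartTag", name: m[2].toLowerCase() });
    else if (m[3] && m[3].trim() !== "") tokens.push({ type: "Text", text: m[3].trim() });
  }
  return tokens;
}

// Step 2: build the tree with a stack of open elements.
function buildDom(tokens: Token[]): DomNode {
  const root: DomNode = { name: "#document", children: [] };
  const stack: DomNode[] = [root]; // the top of the stack is the current parent
  for (const token of tokens) {
    const parent = stack[stack.length - 1];
    if (token.type === "StartTag") {
      const node: DomNode = { name: token.name, children: [] };
      parent.children.push(node); // add to the tree
      stack.push(node);           // and open the element
    } else if (token.type === "Text") {
      parent.children.push({ name: "#text", text: token.text, children: [] });
    } else if (stack.length > 1 && parent.name === token.name) {
      stack.pop();                // a matching EndTag closes the element on top
    }
  }
  return root;
}

const dom = buildDom(tokenize("<html><body><div>1</div><div>test</div></body></html>"));
console.log(JSON.stringify(dom, null, 2));
```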
The impact of JavaScript and CSS on DOM building
When the HTML parser encounters a script tag, it pauses and the JavaScript engine steps in to execute the script. At that point the script can access only the DOM that has already been built, that is, the elements above the script. Trying to access a later element returns null, and operating on it raises an error. Once the script has finished, the HTML parser resumes parsing.
If the script is not inline but loaded through the src attribute (a script with a src attribute ignores any code inside the tag), the parser stays paused until the file has been downloaded and executed. The good news is that Chrome performs pre-parsing: once the rendering engine receives the byte stream, it starts a pre-parsing thread that scans the HTML for JavaScript, CSS, and other referenced files and downloads them in advance.
Adding the async attribute to a script tag that has a src attribute marks the script as asynchronous. The script is downloaded in the background while the HTML parser keeps working, and as soon as the download finishes, the parser (if it has not finished yet) is interrupted so that the script can execute. The defer attribute also downloads the script asynchronously, but execution waits until the HTML has been fully parsed and happens before the DOMContentLoaded event. Neither attribute has any effect on scripts without a src attribute.
CSS by itself does not block DOM tree construction, because it is not directly involved in building the DOM. However, the rendering engine cannot know in advance whether a script will manipulate the CSSOM. To avoid script errors, when the page has a synchronously executed script, the engine simply assumes that the script depends on the CSSOM. It therefore first downloads the CSS files and parses them into the CSSOM, and only then executes the script. The HTML parser stays paused until the script has finished.
CSS does not block DOM tree construction when the page is unscripted:
CSS blocks DOM tree construction when pages have synchronous scripts:
When the scripts on a page are asynchronous, DOM tree construction continues without pausing. The script waits for the CSS before it executes, and rendering continues only once the CSS is ready, waiting if the CSS is still loading.
- For a defer script, rendering starts as soon as the CSS has loaded, but the DOMContentLoaded and load events are delayed.
- For an async script, DOMContentLoaded fires normally, rendering starts once the CSS has loaded, and the load event waits for the script to execute before firing.
Whether CSS blocks DOM tree construction therefore depends on when the scripts execute.
Style calculation (Recalculate Style) – Produces ComputedStyle
Style calculation works out the concrete style of every node in the DOM tree. This stage can be roughly divided into three steps:
- Build the CSSOM. CSS comes from three sources: external CSS referenced through link tags, style tags, and the style attribute of elements. Since the browser cannot work with plain-text CSS directly either, the rendering engine first converts it into a structure it understands, the CSSOM (accessible via document.styleSheets). Styles from all the different sources end up in styleSheets. The CSSOM gives JavaScript an interface for manipulating style sheets and provides the basic style information for building the layout tree.
- Standardize the style property values: Convert values such as 2em, blue, and bold into standardized computed values that the rendering engine can work with easily, for example 32px (assuming a 16px parent font size), rgb(0, 0, 255), and 700.
- Compute the concrete style of each node in the DOM tree, using two rules:
  - Inheritance: Each DOM node takes over the inheritable properties of its parent, such as the font-size set on body.
  - Cascade: Defines an algorithm that combines property values from multiple sources. The results are stored in a ComputedStyle structure.
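The three steps above can be sketched in a few lines. This is a deliberately small model, not the engine's algorithm: the cascade simply lets later sources win (real cascading also weighs origin, specificity, and !important), em values are always resolved against the parent's font size, and the keyword table and the names Style, cascade, standardize, and computeStyle are assumptions made for this example.

```typescript
type Style = Record<string, string>;

// Assumption: a tiny subset of the inheritable properties and keywords.
const INHERITED = ["color", "font-size", "font-weight"];
const KEYWORDS: Record<string, string> = {
  blue: "rgb(0, 0, 255)",
  red: "rgb(255, 0, 0)",
  bold: "700",
};

// Cascade (simplified): later sources override earlier ones.
function cascade(sources: Style[]): Style {
  return Object.assign({}, ...sources);
}

// Standardization: turn em lengths and keywords into computed values.
function standardize(value: string, parentFontSizePx: number): string {
  const v = value.trim().toLowerCase();
  const em = /^(\d*\.?\d+)em$/.exec(v);
  if (em) return `${parseFloat(em[1]) * parentFontSizePx}px`;
  return KEYWORDS[v] ?? v;
}

// Inheritance plus declared values give the node's computed style.
function computeStyle(declared: Style, parentComputed: Style): Style {
  const parentFontSize = parseFloat(parentComputed["font-size"] ?? "16");
  const computed: Style = {};
  for (const prop of INHERITED) {
    if (parentComputed[prop] !== undefined) computed[prop] = parentComputed[prop];
  }
  for (const prop of Object.keys(declared)) {
    computed[prop] = standardize(declared[prop], parentFontSize);
  }
  return computed;
}

const bodyStyle = computeStyle(cascade([{ "font-size": "16px", color: "blue" }]), {});
const paragraphStyle = computeStyle(cascade([{ "font-size": "2em" }, { "font-weight": "bold" }]), bodyStyle);
console.log(paragraphStyle);
// { color: "rgb(0, 0, 255)", "font-size": "32px", "font-weight": "700" }
```

The paragraph inherits its color from body, while its own 2em and bold declarations are standardized to 32px and 700.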
Layout – Produces a Layout tree
This stage calculates the geometric positions of the visible elements in the DOM tree. There are two steps:
- Create the layout tree: Walk the DOM tree and build a layout tree from all the visible nodes. Nodes that are not visible, such as head or elements with display: none, are filtered out. Style lookups during this step use ComputedStyle.
- Layout calculation: Calculate the coordinates of the nodes in the layout tree and write the results back into the layout tree. The Chrome team has been refactoring the layout code. The next-generation layout system, LayoutNG, aims to separate the inputs from the outputs and simplify the layout algorithm.
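The first step, filtering the DOM tree down to its visible nodes, might be sketched like this. The node shapes (StyledNode, LayoutNode) and the list of non-rendered tags are assumptions for illustration, and geometry calculation is left out entirely.

```typescript
interface StyledNode {
  name: string;
  computedStyle: Record<string, string>;
  children: StyledNode[];
}

interface LayoutNode {
  name: string;
  children: LayoutNode[];
}

// Assumption: a short list of tags that never produce layout nodes.
const NON_RENDERED_TAGS = ["head", "script", "meta", "title"];

// Returns null when the node (and therefore its whole subtree) is not visible.
function buildLayoutTree(node: StyledNode): LayoutNode | null {
  if (NON_RENDERED_TAGS.indexOf(node.name) !== -1) return null;
  if (node.computedStyle["display"] === "none") return null;
  const children: LayoutNode[] = [];
  for (const child of node.children) {
    const laidOut = buildLayoutTree(child);
    if (laidOut !== null) children.push(laidOut);
  }
  return { name: node.name, children };
}

const page: StyledNode = {
  name: "html", computedStyle: {}, children: [
    { name: "head", computedStyle: {}, children: [] },
    { name: "body", computedStyle: {}, children: [
      { name: "div", computedStyle: { display: "block" }, children: [] },
      { name: "span", computedStyle: { display: "none" }, children: [] },
    ] },
  ],
};
console.log(JSON.stringify(buildLayoutTree(page)));
// {"name":"html","children":[{"name":"body","children":[{"name":"div","children":[]}]}]}
```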
Update Layer Tree – Produces a Layer Tree
A page may contain many complex effects, such as 3D transforms, page scrolling, and z-index ordering. To avoid laying out or repainting the whole page on every change, the page is divided into multiple layers and a corresponding LayerTree is generated. Each node in the LayerTree corresponds to a layer, and the layers are stacked in a certain order to form the final page image.
Typically, not every node in the layout tree gets its own layer. A node without a layer belongs to the layer of its parent node, so every node is ultimately contained, directly or indirectly, in some layer. Nodes that share the same z-axis space form one layer (a RenderLayer).
The rendering engine creates a separate layer for a node in the following cases:
- The node meets one of these special conditions:
  - 3D transforms: translate3d, translateZ, etc.
  - video, canvas, iframe, and similar elements
  - opacity or transform animated through element.animate()
  - opacity or transform animated through CSS animations
  - position: fixed
  - The will-change property is set
  - An animation or transition is applied to opacity, transform, filter, or backdrop-filter
- Scrollable overflow: When the content of an element exceeds its container and overflow is set so that the content can scroll, separate layers are created for the container, for the scrollable content, and for the scrollbar. This way, scrolling one part does not force the others to be laid out or painted again.
Implicit layer creation (compositing layers)
If an element overlaps an element that has its own layer and has a larger z-index, that element is also promoted to a separate layer.
Layer explosion and layer compression
If you ignore the implicit layer creation described above, extreme cases can produce a huge number of separate layers. This is a layer explosion: it occupies a large amount of GPU and memory resources and seriously hurts page performance. The browser has a countermeasure: its layer compression mechanism (layer squashing) merges implicitly created layers into a single layer.
The z-index:3 element is a separate layer, and the following three elements with larger Z-index values are compressed into one layer:
However, in many specific cases the browser cannot perform layer compression. For example, if part of the page is animated with transform and animation, implicit layers can be created even when nothing currently overlaps, because the elements might overlap dynamically. Every node on the page with a higher z-index than the animated element is then promoted to a separate layer, and the page ends up with a large number of layers. This can be avoided by raising the z-index of the animated element or by optimizing the structure of the page elements.
Paint – Produces a drawing list for each layer
After building the layer tree, the rendering engine draws each layer in it. It breaks the drawing of a layer into many small drawing commands and puts the commands in order to form a list:
The output of the layer drawing stage is this drawing list. You can view the drawing list of a layer in Chrome DevTools, and drag the progress bar in area 2 to replay how the list is drawn:
Raster – Produces a composited bitmap
When the layer drawing list is ready, the main thread in the render process submits the drawing list to the compositing thread.
The content of a page is often much larger than the screen, so waiting for every layer to be drawn completely before compositing would create unnecessary overhead and delay the image. The compositing thread therefore divides each layer into fixed-size tiles (blocks), usually 256×256 or 512×512. If the layer is very large, tiles near the viewport (the visible area of the page on the screen) are given priority.
The rendering process maintains a rasterization thread pool, in which all tiles are rasterized. Rasterization generates bitmaps by executing the instructions in the drawing list, and the compositing thread then combines these bitmaps into a single image. A tile is the smallest unit of rasterization.
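A minimal sketch of the tiling idea, assuming axis-aligned rectangles and a simple "closest to the viewport first" ordering. The names Rect, splitIntoTiles, distanceToViewport, and prioritizeTiles are invented for this example, and Chrome's real prioritization is more involved.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }

// Cut a layer into tiles of at most tileSize × tileSize pixels.
function splitIntoTiles(layer: Rect, tileSize = 256): Rect[] {
  const tiles: Rect[] = [];
  for (let y = 0; y < layer.height; y += tileSize) {
    for (let x = 0; x < layer.width; x += tileSize) {
      tiles.push({
        x: layer.x + x,
        y: layer.y + y,
        width: Math.min(tileSize, layer.width - x),   // edge tiles may be smaller
        height: Math.min(tileSize, layer.height - y),
      });
    }
  }
  return tiles;
}

// Distance from a tile to the viewport; 0 when they overlap or touch.
function distanceToViewport(tile: Rect, viewport: Rect): number {
  const dx = Math.max(viewport.x - (tile.x + tile.width), tile.x - (viewport.x + viewport.width), 0);
  const dy = Math.max(viewport.y - (tile.y + tile.height), tile.y - (viewport.y + viewport.height), 0);
  return Math.sqrt(dx * dx + dy * dy);
}

// Tiles closest to the viewport are rasterized first.
function prioritizeTiles(layer: Rect, viewport: Rect, tileSize = 256): Rect[] {
  return splitIntoTiles(layer, tileSize)
    .sort((a, b) => distanceToViewport(a, viewport) - distanceToViewport(b, viewport));
}

const layerRect: Rect = { x: 0, y: 0, width: 512, height: 1024 };
const viewportRect: Rect = { x: 0, y: 0, width: 512, height: 256 };
const ordered = prioritizeTiles(layerRect, viewportRect);
console.log(ordered.length);                // 8 tiles (2 columns × 4 rows)
console.log(ordered[ordered.length - 1].y); // 768: the bottom row comes last
```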
Typically, rasterization uses the GPU to speed up bitmap generation. This is called fast rasterization, or GPU rasterization: the rendering process sends instructions to the GPU to generate the bitmaps, and the resulting bitmaps are stored in GPU memory.
Even so, the highest-priority tiles can still take a long time to appear because of one key cost: texture upload. The bitmaps held in shared memory must be uploaded to the GPU as textures before the GPU can composite them, and copying from main memory to GPU memory is slow. Chrome's strategy is to use a low-resolution image the first time a tile is composited. Halving the resolution in each dimension reduces the texture size by three-quarters. When the page content is displayed for the first time, the low-resolution image is shown while the compositor continues to draw the content at normal scale, and then replaces the low-resolution content.
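The three-quarters figure follows from simple arithmetic: halving both width and height leaves one quarter of the pixels. A small check, assuming 4 bytes per pixel (RGBA with 8 bits per channel), which is an assumption of this sketch rather than something stated above:

```typescript
// Assumption: 4 bytes per pixel (RGBA, 8 bits per channel).
const BYTES_PER_PIXEL = 4;

// Size in bytes of a tile texture drawn at the given scale.
function textureBytes(widthPx: number, heightPx: number, scale = 1): number {
  const w = Math.ceil(widthPx * scale);
  const h = Math.ceil(heightPx * scale);
  return w * h * BYTES_PER_PIXEL;
}

const fullBytes = textureBytes(256, 256);      // 262144
const halfBytes = textureBytes(256, 256, 0.5); // 65536
const savedFraction = 1 - halfBytes / fullBytes;
console.log(fullBytes, halfBytes, savedFraction); // 262144 65536 0.75
```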
Since compositing happens on the compositing thread, it does not affect the main thread, which is why CSS animations often keep running even when the main thread is stuck. Animations implemented in JavaScript, by contrast, involve the entire rendering pipeline and are less efficient. Where a CSS effect or animation can be handled by the compositing thread, use will-change to tell the rendering engine ahead of time to prepare a separate layer for the element and enable GPU acceleration.
For example, will-change: transform, opacity; tells the rendering engine that these effects will be applied to the element, and the rendering engine assigns a separate layer to it. When the transform occurs, the rendering engine handles it directly on the compositing thread, which greatly improves rendering efficiency because the main thread is not involved. This is why such CSS animations are more efficient than JavaScript animations.
But this also takes up more memory: from the layer tree onward, every subsequent stage carries one more layer, and each of these requires extra memory.
Display
Once all the tiles have been rasterized, the compositing thread generates a command to draw the tiles, DrawQuad, and submits it to the browser process. The viz component in the browser process receives the command, draws the page content into memory, and finally displays the in-memory page on the screen.
Each monitor has a fixed refresh rate, usually 60Hz, meaning that 60 pictures are displayed every second. The pictures come from the front buffer of the graphics card, which the monitor reads 60 times per second. The GPU in the graphics card composes each new image and saves it to the back buffer. The system then swaps the front and back buffers so that the monitor can read the latest image. Usually the update rate of the graphics card matches the refresh rate of the monitor, but in complex scenes the graphics card may take too long to process a picture, which results in visible lag.
For animations on the page, such as scrolling, the rendering engine generates new images through the rendering pipeline and sends them to the back buffer of the graphics card. For a smooth animation, the rendering engine needs to deliver 60 images per second to the back buffer. Each image is called a frame, and the number of frames updated per second is called the frame rate. If 60 frames are updated per second during scrolling, the frame rate is 60Hz (or 60 FPS). If the rendering engine takes too long to generate some of the frames of an animation, the user perceives stutter.
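The frame rate translates directly into a time budget per frame: at 60Hz each frame has 1000 / 60 ≈ 16.7 ms. A small sketch of that arithmetic, where the function names and the sample frame times are made up for illustration:

```typescript
// Time available to produce one frame at the given refresh rate.
function frameBudgetMs(refreshRateHz: number): number {
  return 1000 / refreshRateHz;
}

// Count the frames whose generation time exceeded the budget.
function countOverBudget(frameTimesMs: number[], refreshRateHz = 60): number {
  const budget = frameBudgetMs(refreshRateHz);
  return frameTimesMs.filter((t) => t > budget).length;
}

console.log(frameBudgetMs(60).toFixed(2));     // "16.67"
console.log(countOverBudget([10, 16, 20, 35])); // 2 (the 20 ms and 35 ms frames)
```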
When the rendering pipeline finishes, the rendering process sends a message to the browser process, which stops the loading animation on the tab icon.
Conclusion
- The rendering process parses the HTML into a DOM tree structure.
- The rendering engine parses the CSS stylesheet into CSSOM and calculates the style of the DOM node.
- Create a layout tree and calculate the layout information for the elements.
- Divide the layout tree into layers and generate a layer tree.
- Generate a draw list for each layer and submit it to the compositing thread.
- The compositing thread divides the layer into blocks and converts the blocks into bitmaps in the rasterization thread pool.
- The compositing thread sends the DrawQuad command to the browser process.
- The browser process generates the page from the DrawQuad message and displays it on the monitor.
A few concepts related to performance tuning:
Reflow (rearrangement)
Changing a geometric property of an element, such as its width or height, triggers a new layout pass and all the stages after it. This process is called reflow.
Reading any of the following DOM APIs can also trigger a reflow, because the browser must make sure the values it returns are up to date:
- offsetTop, offsetLeft, offsetWidth, offsetHeight (the width and height include borders)
- scrollTop, scrollLeft, scrollWidth, scrollHeight
- clientTop, clientLeft, clientWidth, clientHeight
- window.getComputedStyle(elem), or currentStyle in IE
- getBoundingClientRect()
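Why these reads matter can be shown with a toy model. FakePage below is not a real DOM; it is an assumption-laden stand-in in which every write marks the layout dirty and every geometry read performs a layout pass if the layout is dirty. Interleaving reads and writes then forces one layout per element, while batching all reads before all writes needs only one.

```typescript
// Toy model of a page: writes invalidate layout, reads force it when dirty.
class FakePage {
  layoutCount = 0;
  private dirty = true; // the page starts with a layout pending
  private widths: number[] = [];

  constructor(elementCount: number) {
    for (let i = 0; i < elementCount; i++) this.widths.push(100);
  }

  setWidth(index: number, px: number): void {
    this.widths[index] = px;
    this.dirty = true; // a geometry change invalidates the layout
  }

  offsetWidth(index: number): number {
    if (this.dirty) { // a read needs fresh geometry
      this.layoutCount++;
      this.dirty = false;
    }
    return this.widths[index];
  }
}

// Read, write, read, write...: every read finds the layout dirty.
function interleaved(page: FakePage, count: number): void {
  for (let i = 0; i < count; i++) page.setWidth(i, page.offsetWidth(i) + 10);
}

// All reads first, then all writes: only the first read forces a layout.
function batched(page: FakePage, count: number): void {
  const widths: number[] = [];
  for (let i = 0; i < count; i++) widths.push(page.offsetWidth(i));
  for (let i = 0; i < count; i++) page.setWidth(i, widths[i] + 10);
}

const pageA = new FakePage(5);
interleaved(pageA, 5);
const pageB = new FakePage(5);
batched(pageB, 5);
console.log(pageA.layoutCount, pageB.layoutCount); // 5 1
```

In a real page the same idea applies: read the geometry you need first, then apply the style changes together.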
Repaint (redraw)
Changing only a visual property of an element, such as its background color, without changing any geometry, skips layout and triggers painting and the stages after it. This process is called repaint.
Compositing
If you change a property that requires neither layout nor painting, the rendering engine skips both stages and performs only the subsequent compositing. This process is called compositing. For example, animating with CSS transform avoids the reflow and repaint stages and runs the animation on a thread other than the main thread, so it does not consume main-thread resources and is much more efficient.
Therefore, in terms of execution efficiency: compositing > repaint > reflow. When optimizing performance, you can adjust your solution according to this principle.