It has been almost a month since we started working through how browsers work (mainly WebKit-based browsers). This article aims to help front-end engineers understand how browsers work so that we can optimize the performance of our web pages accordingly. My main references are WebKit Technology Insider, MDN, the W3C, and other websites, and some of the pictures below are excerpted from WebKit Technology Insider, for which I am grateful. The article is a bit long, so apologies in advance if it is tiring to read. If you find mistakes, please point them out: accurate information keeps other readers from being misled, and making progress together is the goal.

The browser kernel

The browser kernel is composed of a rendering engine and a JS engine. Different browsers, and even different versions of the same browser, may use different rendering engines and JS engines.

Rendering engine

1) Trident rendering engine -> older IE versions

2) EdgeHTML rendering engine -> Microsoft Edge on Windows 10 (before its move to Chromium)

3) Gecko rendering engine -> Mozilla Firefox

4) Presto rendering engine -> older versions of Opera

5) KHTML rendering engine -> Konqueror; the engine from which WebKit was forked in 2001

6) WebKit rendering engine -> Safari, earlier versions of Chrome, and some browsers in China

7) Blink rendering engine -> newer versions of Chromium, Google's browser project

JavaScript engine

SpiderMonkey engine -> Mozilla Firefox

V8 engine -> Google Chrome

Linear B / Futhark engine -> Opera

Progress history of the browser rendering engine

Here are two sites that are handy for looking things up during development:

The browser's HTML5 support

Can I use

Browser rendering engine and dependency module analysis

The dotted lines in the figure above represent the capabilities provided by the rendering engine.

Here the rendering engine is taken to include the JavaScript engine; in many cases the two are not clearly distinguished.

Most of what follows is based on this diagram. We will explain step by step what happens from the moment the user enters the URL to the moment the page is presented to the user.

Let’s start with the basics of the web.

Web basics

HTML: structure

CSS: style

JavaScript: behavior

A page also needs some static resources: PNG, GIF, WebP, MP4, fonts, SVG, and so on. Together, these parts make up our web page.

Schematic diagram: from entering the URL to displaying the page

Readers should not worry too much about the order shown in the diagram; some steps may happen in a different order in practice.

Browser kernel parts explained

HTML interpreter: interprets HTML text into a DOM tree, which is a representation of the document.

CSS interpreter: the cascading style sheet interpreter. It computes style information for the individual element objects in the DOM and provides the basis for calculating the layout of the final web page.

Layout: after the DOM is created, WebKit combines the element objects in the DOM with their style information, calculates size, position, and other layout information, and builds an internal representation model that can express all of it.

JavaScript engine: JavaScript code can modify the content of a web page as well as its CSS information. The JavaScript engine interprets the JavaScript code and, through the DOM and CSSOM interfaces, modifies the page's content and style information, which changes the rendering result.

Drawing: after layout has been calculated, the graphics library is used to draw the nodes of each web page into an image.

PS: These modules rely on many other base modules, including networking, storage, 2D/3D graphics, audio and video, and picture decoders. The basic modules will not be explained here.

Below, I'll go through the process step by step, leaving out some steps that are outside the scope of this article, such as DNS resolution.

HTML parser

Let’s take a look at how the HTML interpreter works

Bytes > Characters > Tokens > Nodes > DOM tree

The input starts as a byte stream, which is decoded into a character stream. A lexical analyzer then turns the characters into tokens (words), the parser builds nodes from the tokens, and finally the nodes are assembled into a DOM tree.

Lexical analysis: HTMLTokenizer (works like a state machine); its input is a string and its output is a sequence of tokens.

XSSAuditor: filters the token stream. XSS stands for Cross-Site Scripting; the filter exists mainly for security reasons.

Token to node: WebKit builds DOM nodes from tokens. This is done by the HTMLDocumentParser class calling the constructTree function of the HTMLTreeBuilder class.

Node to DOM tree: assembling the tree, creating attribute nodes for element nodes, and so on is done by the HTMLConstructionSite class, which contains an HTMLElementStack that holds the currently open element nodes.

JavaScript execution: WebKit hands the JavaScript code that must run during DOM tree creation to the HTMLScriptRunner class.

DOM event mechanism: WebKit uses the EventTarget class to represent the EventTarget defined in the Events part of the DOM specification. Node inherits from EventTarget, so every Node has the EventTarget methods.
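To make the Characters > Tokens > Nodes > DOM tree steps concrete, here is a minimal sketch in TypeScript. It is not WebKit's code: the names HtmlToken, SketchNode, tokenizeHtml, and buildDomSketch are made up for illustration, and the tokenizer only understands plain open tags, close tags, and text (no attributes, comments, or entities). The stack of open elements plays the role that HTMLElementStack plays in the description above.

```typescript
// Hypothetical token and node shapes for illustration only.
type HtmlToken =
  | { kind: "open"; tag: string }
  | { kind: "close"; tag: string }
  | { kind: "text"; data: string };

interface SketchNode {
  tag: string; // "#document", "#text", or an element name
  text: string; // only meaningful for "#text" nodes
  children: SketchNode[];
}

// Lexical analysis: a character string goes in, a list of tokens comes out.
function tokenizeHtml(source: string): HtmlToken[] {
  const tokens: HtmlToken[] = [];
  const pattern = /<\/([a-z0-9]+)>|<([a-z0-9]+)>|([^<]+)/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const [, closeTag, openTag, textRun] = match;
    if (closeTag !== undefined) {
      tokens.push({ kind: "close", tag: closeTag.toLowerCase() });
    } else if (openTag !== undefined) {
      tokens.push({ kind: "open", tag: openTag.toLowerCase() });
    } else if (textRun !== undefined && textRun.trim() !== "") {
      tokens.push({ kind: "text", data: textRun });
    }
  }
  return tokens;
}

// Tree construction: a stack of open elements decides where each new node goes.
function buildDomSketch(tokens: HtmlToken[]): SketchNode {
  const root: SketchNode = { tag: "#document", text: "", children: [] };
  const openElements: SketchNode[] = [root];
  for (const token of tokens) {
    const current = openElements[openElements.length - 1];
    if (token.kind === "open") {
      const element: SketchNode = { tag: token.tag, text: "", children: [] };
      current.children.push(element);
      openElements.push(element);
    } else if (token.kind === "text") {
      current.children.push({ tag: "#text", text: token.data, children: [] });
    } else if (openElements.length > 1 && current.tag === token.tag) {
      // A matching close tag pops the innermost open element.
      openElements.pop();
    }
  }
  return root;
}
```

Calling buildDomSketch(tokenizeHtml("<div><p>Hello</p></div>")) yields a document node whose only child is the div, which in turn contains the p and its text node. A real parser also has to recover from mismatched tags, which this sketch simply ignores.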

The ShadowRoot interface of the Shadow DOM API is the root node of a DOM subtree that is rendered separately from the document's main DOM tree (see MDN: ShadowRoot).

Shadow DOM

Definition: Shadow DOM provides encapsulation for the DOM and CSS in Web Components. Shadow DOM keeps these separate from the DOM of the main document. The ShadowRoot class inherits from the DocumentFragment class. PS: you can create an empty DocumentFragment with the document.createDocumentFragment() method or with the DocumentFragment constructor.

CSS interpreter and style layout

Let’s look at how CSS works with DOM to present a page.

CSS interpretation and rule matching happen after the DOM tree is created and before the RenderObject tree is created. The results produced by the CSS interpreter are stored, and the RenderObject tree uses them for rule matching and layout calculations.

CSSOM (CSS Object Model)

The CSSOM View Module defines APIs that web developers can use to inspect and programmatically change the visual properties of documents and their content, including layout box positions, viewport width, and element scrolling. You can inspect the StyleSheetList object of the current page (document.styleSheets); each link and style element produces a CSSStyleSheet that becomes an entry in the StyleSheetList.

CSS interpreter and rule matching (this section will give us a better understanding of CSS selector weights)

DocumentStyleSheetCollection class: part of the Document. It holds all of the page's CSS style sheets, including their internal representation, the CSSStyleSheet class, which contains each style sheet's href, type, content, and other information.

CSS interpretation: the process by which the CSS interpreter (the CSSParser class) turns CSS strings into the rendering engine's internal rules. Before interpreting a page's own CSS, the WebKit rendering engine actually applies a default style to every web page, which is the underlying reason why we reset browser styles.

Rule matching: the StyleResolver class matches styles to the element nodes in the DOM. Based on element information such as the tag name and classes, it looks up the best-matching rules among the style rules and saves the resulting style information in a new RenderStyle object. These RenderStyle objects are then managed and used by the RenderObject class.

The matching itself is computed by the ElementRuleCollector class. It obtains the rule set from DocumentRuleSets according to the element's attributes and other information, and then matches the element against the rules by ID, class, tag, and other selector information in turn. WebKit then sorts the matched rules, and for each style property the element needs, it takes the value from the highest-priority rule.

On a loosely related note, the Block Formatting Context (BFC) is part of the visual CSS rendering of a web page. It is the region in which block-level boxes are laid out and in which floats interact with other elements. If you are not familiar with it, see MDN's page on BFC.
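The sorting step above is where selector weight comes from. The following is a minimal sketch, not WebKit's ElementRuleCollector: it assumes rules use only simple compound selectors such as p.note or #main (no combinators, attributes, or pseudo-classes), and the names ElementInfo, StyleRuleSketch, specificityOf, and resolveColor are invented for this example. Specificity is modeled as the triple (ids, classes, tags), and a later rule wins a tie.

```typescript
interface ElementInfo { tag: string; id: string; classes: string[]; }
interface StyleRuleSketch { selector: string; color: string; }

// Split a compound selector like "p.note#main" into its tag, id, and class parts.
function parseCompound(selector: string): { tag: string; ids: string[]; classes: string[] } {
  const tagMatch = selector.match(/^[a-z][\w-]*/i);
  const idParts: string[] = selector.match(/#[\w-]+/g) ?? [];
  const classParts: string[] = selector.match(/\.[\w-]+/g) ?? [];
  return {
    tag: tagMatch ? tagMatch[0].toLowerCase() : "",
    ids: idParts.map((part) => part.slice(1)),
    classes: classParts.map((part) => part.slice(1)),
  };
}

// Specificity as (number of ids, number of classes, number of tags).
function specificityOf(selector: string): number[] {
  const parts = parseCompound(selector);
  return [parts.ids.length, parts.classes.length, parts.tag === "" ? 0 : 1];
}

function matchesElement(selector: string, el: ElementInfo): boolean {
  const parts = parseCompound(selector);
  return (parts.tag === "" || parts.tag === el.tag)
    && parts.ids.every((id) => id === el.id)
    && parts.classes.every((cls) => el.classes.indexOf(cls) !== -1);
}

// Compare two specificity triples from the most significant component down.
function compareSpecificity(a: number[], b: number[]): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Pick the color from the matching rule with the highest specificity;
// on a tie, the rule that appears later in the list wins.
function resolveColor(rules: StyleRuleSketch[], el: ElementInfo): string | undefined {
  let winner: StyleRuleSketch | undefined;
  for (const rule of rules) {
    if (!matchesElement(rule.selector, el)) continue;
    if (winner === undefined
      || compareSpecificity(specificityOf(rule.selector), specificityOf(winner.selector)) >= 0) {
      winner = rule;
    }
  }
  return winner?.color;
}
```

With the rules p, .note, and p.note, a p element with class note takes its color from p.note, because (0, 1, 1) beats both (0, 1, 0) and (0, 0, 1). Real engines also account for origin, !important, and inline styles, which are left out here.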

WebKit layout

When WebKit creates RenderObject objects, each object does not yet know its position, size, and other such information. WebKit calculates them based on the box model; this process is called layout calculation, or typesetting.

Kinds of layout calculation: the first is a calculation over the entire RenderObject tree. The second is a calculation over a subtree of the RenderObject tree, which is common for text elements or overflow: auto blocks.

Layout calculation is a recursive process, because the size of a node usually requires the position, size, and other information of its children to be calculated first.
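The recursion can be sketched in a few lines. This is a deliberately simplified model, not WebKit's layout code: it assumes every box is a block that fills the available width and stacks its children vertically, with no margins, padding, floats, or inline content. BoxSketch, makeBox, and layoutBlock are hypothetical names.

```typescript
interface BoxSketch {
  explicitHeight: number | null; // a height set in CSS, or null for "auto"
  children: BoxSketch[];
  y: number;
  width: number;
  height: number;
}

function makeBox(explicitHeight: number | null, children: BoxSketch[] = []): BoxSketch {
  return { explicitHeight, children, y: 0, width: 0, height: 0 };
}

// Lay out a block box and everything below it; returns the box's final height.
function layoutBlock(box: BoxSketch, availableWidth: number, y: number): number {
  box.width = availableWidth;
  box.y = y;
  let cursor = y;
  for (const child of box.children) {
    // Each child must be laid out first: its height decides where the next one starts.
    cursor += layoutBlock(child, availableWidth, cursor);
  }
  // An auto-height box is exactly as tall as its stacked children.
  box.height = box.explicitHeight ?? cursor - y;
  return box.height;
}
```

A parent with automatic height cannot know its own height until every child has been laid out, which is why a change deep in the tree can force ancestors to be laid out again.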

Extended knowledge

We often talk about reflow and repaint. Reflow is triggered by changes to an element's geometric properties and degrades performance (transform, opacity, and similar properties do not cause reflow).

Why does animating with transform perform better than setting geometric properties directly?

1. WebKit's rendering pipeline is Style -> Layout (reflow happens here) -> Paint (repaint happens here) -> Composite. transform is handled in the Composite stage, whereas width, left, margin, and so on are handled in the Layout stage, so changing them always causes a reflow.

2. Modern browsers enable GPU acceleration for transform and similar properties.

Because Layout comes before Paint in this pipeline, a reflow is always followed by a repaint.

Further reading: an explanation of transform animation performance from the perspective of repaint and reflow.
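One way to picture this is as a lookup from a CSS property to the first pipeline stage it invalidates; every later stage then has to run too. The sketch below is an assumption-laden simplification: the property lists are short examples chosen by hand rather than an engine's real table, unknown properties are conservatively treated as layout-affecting, and transform and opacity are assumed to sit on their own composited layer.

```typescript
type PipelineStage = "layout" | "paint" | "composite";

// Small hand-picked example lists; real engines track far more properties.
const paintOnlyProps = ["color", "background-color", "box-shadow", "visibility"];
const compositeOnlyProps = ["transform", "opacity"];

// The earliest stage that must be redone when the property changes.
function firstStageFor(property: string): PipelineStage {
  if (compositeOnlyProps.indexOf(property) !== -1) return "composite";
  if (paintOnlyProps.indexOf(property) !== -1) return "paint";
  // width, height, left, top, margin, and anything unknown: assume geometry changed.
  return "layout";
}

// Everything from the first invalidated stage to the end of the pipeline.
function stagesTriggeredBy(property: string): PipelineStage[] {
  const pipeline: PipelineStage[] = ["layout", "paint", "composite"];
  return pipeline.slice(pipeline.indexOf(firstStageFor(property)));
}
```

Under these assumptions, stagesTriggeredBy("width") covers layout, paint, and composite, while stagesTriggeredBy("transform") covers composite only, which is the whole argument for animating with transform.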

Some theories of the rendering process

The RenderObject tree, along with other trees such as the RenderLayer tree, forms the primary infrastructure for WebKit rendering.

RenderObject tree (DOM tree -> RenderObject tree)

A RenderObject holds all the information needed to draw DOM nodes, such as style layout information, and after being processed by WebKit, the RenderObject knows how to draw itself.

In the following situations, WebKit creates a RenderObject for a DOM tree node (DOM nodes and RenderObjects do not correspond one-to-one).

1. Document node of DOM tree.

2. Visual nodes of the DOM tree, such as html, body, and div. WebKit does not create RenderObject nodes for non-visual nodes, such as meta and script.

3. In some cases, WebKit needs to create anonymous RenderObject nodes, which do not correspond to any nodes of DOM tree, but are required by WebKit processing. Typical examples are anonymous RenderBlock nodes.
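As a rough illustration of rules 1 and 2, the sketch below walks a DOM-like tree and produces a render tree that skips non-visual nodes. It is not WebKit's implementation: DomSketch, RenderSketch, and buildRenderTree are hypothetical names, the list of non-visual tags goes beyond the meta and script examples above and is my own assumption, and anonymous RenderBlock nodes (rule 3) are not modeled at all.

```typescript
interface DomSketch { tag: string; children: DomSketch[]; }
interface RenderSketch { source: string; children: RenderSketch[]; }

// Assumed list of tags that never produce a visual box.
const nonVisualTags = ["head", "meta", "script", "style", "title", "link"];

// Returns the render node for a DOM node, or null when the node is non-visual.
function buildRenderTree(node: DomSketch): RenderSketch | null {
  if (nonVisualTags.indexOf(node.tag) !== -1) return null;
  const children: RenderSketch[] = [];
  for (const child of node.children) {
    const rendered = buildRenderTree(child);
    if (rendered !== null) children.push(rendered);
  }
  return { source: node.tag, children };
}
```

For a document containing html, head (with meta and script), and body (with a div and another script), the resulting render tree contains only the document, html, body, and div, so it has fewer nodes than the DOM tree.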

As the page structure is built from HTML, WebKit introduces a layered structure to improve web page performance.

Page hierarchy (CSS can also have an important impact on the layering strategy of web pages)

For an HTML file, WebKit creates a new layer for certain elements and their children, so that it can operate on a single layer at a time and improve performance. Typical cases:

1. The video tag: a separate layer lets WebKit efficiently handle the interaction and rendering between the video decoder and the browser.

2. div, p, and other common tags: when 3D transforms are involved.

3. The canvas tag: complex 2D and 3D drawing operations.

RenderLayer tree

WebKit creates a RenderLayer object for each layer of the page. When RenderObject nodes of certain types, or RenderObject nodes with certain CSS styles, appear, WebKit creates RenderLayer objects for them.

The RenderLayer tree is a new tree built on top of the RenderObject tree. RenderLayer nodes and RenderObject nodes are not in a one-to-one relationship but a one-to-many one: a single RenderLayer can cover many RenderObjects.

When does a RenderObject node need to create a new RenderLayer node?

1. The Document node of the DOM tree corresponds to the RenderView node.

2. The child of the Document node in the DOM tree, that is, the RenderBlock node corresponding to the html node.

3. RenderObject nodes with an explicitly specified CSS position.

4. RenderObject nodes with transparent effects.

5. RenderObject nodes with overflow, alpha, or reflection effects.

6. RenderObject nodes using Canvas 2D and 3D (WebGL) technology.

7. RenderObject corresponding to the Video node.
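The seven conditions above read naturally as a predicate. Here is a minimal sketch under stated assumptions: RenderObjectSketch is a made-up summary of a RenderObject, its field names are mine, "explicitly specified position" is interpreted as any position other than static, and "transparent" as an opacity below 1. WebKit's real checks are more detailed.

```typescript
interface RenderObjectSketch {
  isRootOrHtml: boolean; // the RenderView, or the RenderBlock for the html node
  position: "static" | "relative" | "absolute" | "fixed";
  opacity: number; // 1 means fully opaque
  hasOverflowAlphaOrReflection: boolean;
  isCanvas: boolean; // Canvas 2D or WebGL
  isVideo: boolean;
}

const plainObject: RenderObjectSketch = {
  isRootOrHtml: false,
  position: "static",
  opacity: 1,
  hasOverflowAlphaOrReflection: false,
  isCanvas: false,
  isVideo: false,
};

// Build a description by overriding only the fields that differ from a plain box.
function describeObject(overrides: Partial<RenderObjectSketch>): RenderObjectSketch {
  return { ...plainObject, ...overrides };
}

// True when the object should get its own RenderLayer under the rules listed above.
function needsOwnLayer(object: RenderObjectSketch): boolean {
  return object.isRootOrHtml
    || object.position !== "static"
    || object.opacity < 1
    || object.hasOverflowAlphaOrReflection
    || object.isCanvas
    || object.isVideo;
}
```

Any RenderObject for which needsOwnLayer returns false simply belongs to the layer of its nearest ancestor that has one, which is where the one-to-many relationship comes from.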

Rendering

Drawing context (there are two types): the first is used to draw 2D graphics and is called the 2D drawing context; the second is used to draw 3D graphics and is called GraphicsContext3D.

Three ways of rendering web pages:

1. Software rendering (CPU memory)

2. Compositing rendering with software drawing (GPU memory): CSS 3D, WebGL

3. Hardware-accelerated compositing rendering (GPU memory)

Webkit software rendering technology

WebKit uses software rendering to render pages that contain no content requiring hardware acceleration (such content includes, but is not limited to, CSS3 3D transforms, WebGL, and video). Each RenderObject draws itself in three stages. The first stage draws the backgrounds and borders of all blocks in the layer. The second stage draws floating content. The third stage is the foreground, that is, the content itself, outlines, font color and size, and so on (the backgrounds and borders of inline elements are also drawn at this stage).

Hardware acceleration mechanism

Hardware acceleration means using the GPU's hardware capabilities to help render web pages (GPUs are designed for drawing 3D graphics and are particularly good at it).

Chrome hardware acceleration

Canvas development: a large canvas can be split into smaller canvases, so that an update only needs to redraw a small canvas, which reduces overhead.

CSS3 3D transforms: these let the browser animate layers using only the compositor (Composite alone, instead of Style -> Layout (reflow happens here) -> Paint).

WebGL

WebGL is a set of JavaScript interfaces for 3D graphics defined by the Khronos Group. It is based on the canvas element; unlike the 2D canvas context, it gives web developers a 3D graphics interface for drawing all kinds of 3D graphics.

CSS 3D transforms

This includes 3D transforms and animations. WebKit creates a new layer to handle them, which improves performance.

JavaScript engine

JIT (just-in-time) compilation is a powerful tool for speeding up JavaScript. JIT means compiling code into machine code for the target platform while the program is running on it.

How compilation works:

C++: source code -> abstract syntax tree -> native code

Java: source code -> abstract syntax tree -> bytecode (cross-platform) -> JIT -> native code

Some of V8's features (there are too many to cover here; readers can dig deeper on their own)

In JS, the primitive data types are Boolean, Number, String, Null, Undefined, and Symbol (plus BigInt since ES2020); all other data are objects.

Data representation

In V8, the representation of data is divided into two parts. The first part is the actual content of the data, which varies in length and type: strings, objects, and so on. The second part is the handle to the data. A handle has a fixed size and contains a pointer to the data.

Storage of data

Handle: the Handle class manages basic data and objects so that the garbage collector can operate on them.

There are two main kinds. The first is the Local class (which inherits from Handle); it represents data on the local stack and is therefore lightweight.

The second is the Persistent class (which also inherits from Handle); it represents data and objects that are accessed across functions.

For integer data, the Handle itself stores the value for fast access.

All other data is stored on the heap, because a Handle has a fixed size and cannot hold values of varying size and length.
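The idea of a fixed-size handle that either holds a small integer directly or points into the heap can be imitated with a tag bit. The sketch below is only an analogy written in TypeScript, not V8's code: the low-bit tagging scheme, the 31-bit range, and the sketchHeap array standing in for heap memory are all assumptions made for illustration, as are the function names.

```typescript
// Stand-in for the heap: values that do not fit in a handle live here.
const sketchHeap: unknown[] = [];

const SMALL_INT_LIMIT = 0x40000000; // 2^30, so a tagged value still fits in 32 bits

// Encode a value into a fixed-size "handle" (a 32-bit integer).
// Low bit 0: the upper 31 bits are the integer itself.
// Low bit 1: the upper 31 bits are an index into sketchHeap.
function toTagged(value: unknown): number {
  if (typeof value === "number" && Number.isInteger(value)
    && value >= -SMALL_INT_LIMIT && value < SMALL_INT_LIMIT) {
    return value << 1;
  }
  sketchHeap.push(value);
  return ((sketchHeap.length - 1) << 1) | 1;
}

function isSmallInteger(tagged: number): boolean {
  return (tagged & 1) === 0;
}

// Decode a handle: small integers need no heap access at all.
function fromTagged(tagged: number): unknown {
  return isSmallInteger(tagged) ? tagged >> 1 : sketchHeap[tagged >> 1];
}
```

Reading a small integer costs one shift, while reading a string costs a lookup in the heap stand-in. That difference is the "fast access" mentioned above.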

V8's lazy (deferred) compilation: much of a script's JavaScript code is not compiled until it is first called at run time, which reduces the up-front time overhead.
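The same idea, compile on first call and reuse afterwards, can be sketched as a small wrapper. This is an analogy rather than how V8 is implemented: LazyFunctionSketch is a hypothetical class, and the compile callback stands in for the expensive step of turning source code into something executable.

```typescript
type CompiledFn = (x: number) => number;

class LazyFunctionSketch {
  private compiled: CompiledFn | null = null;
  private readonly compile: () => CompiledFn;
  compileCount = 0; // how many times the expensive step actually ran

  constructor(compile: () => CompiledFn) {
    // Nothing is compiled here; only the recipe is stored.
    this.compile = compile;
  }

  call(x: number): number {
    if (this.compiled === null) {
      // First call: pay the compilation cost now and cache the result.
      this.compiled = this.compile();
      this.compileCount++;
    }
    return this.compiled(x);
  }
}
```

A function that is declared but never called never pays the compilation cost, and one that is called many times pays it once.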

Extension

Common language types

Machine language: the only language a computer can execute directly. A computer's machine instructions are sequences of binary digits.

Assembly language: assembly instructions write machine instructions in an easy-to-remember form, but they must be translated into machine language by an assembler before the machine can execute them.

What are the disadvantages of using setTimeout or setInterval compared with requestAnimationFrame?

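What is an appropriate time interval? Does it depend on the screen's refresh rate (different browsers also impose a minimum interval)? Will the callback actually run at the interval you set? Will the animation be displayed smoothly? Should the callback be complex or simple? With setTimeout and setInterval you have to answer all of these questions yourself, whereas requestAnimationFrame lets the browser call your callback before the next repaint. See MDN: window.requestAnimationFrame.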
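A common pattern with requestAnimationFrame is to derive the animation's progress from the timestamp the browser passes in, instead of assuming a fixed step per callback. The sketch below shows that pattern with the scheduler injected as a parameter, so the logic does not depend on a browser; progressAt, animateSketch, and FrameScheduler are hypothetical names. In a page, the scheduler would be a thin wrapper around window.requestAnimationFrame.

```typescript
// Progress in [0, 1] based on elapsed time, not on how many callbacks have run.
function progressAt(startMs: number, nowMs: number, durationMs: number): number {
  const raw = (nowMs - startMs) / durationMs;
  return Math.min(1, Math.max(0, raw));
}

// Anything that calls back once per frame with a timestamp in milliseconds.
type FrameScheduler = (callback: (timestampMs: number) => void) => void;

function animateSketch(
  schedule: FrameScheduler,
  durationMs: number,
  draw: (progress: number) => void,
): void {
  let startMs: number | null = null;
  const step = (timestampMs: number): void => {
    if (startMs === null) startMs = timestampMs; // the first frame defines time zero
    const progress = progressAt(startMs, timestampMs, durationMs);
    draw(progress);
    if (progress < 1) schedule(step); // keep going until the animation is complete
  };
  schedule(step);
}
```

Because progress depends only on elapsed time, a dropped frame makes the animation skip ahead rather than slow down, and the browser decides the interval, which answers most of the questions above.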

Other browser knowledge

Plug-ins and Javascript extensions

In the early days of browsers, their capabilities were very limited, and web front-end developers wanted some mechanism for extending them (a plug-in mechanism, for example the Flash plug-in).

NPAPI stands for Netscape Plugin API

NPAPI was long the most widely used plug-in architecture and was supported by almost every browser, but it carries serious security risks: plug-ins can gain low-level system permissions and launch malicious attacks.

PPAPI is also known as the Pepper Plugin API

In 2010, Google developed the newer PPAPI, which runs plug-ins inside a sandbox. In 2012, the Windows and Mac versions of Chrome were upgraded to the PPAPI Flash Player, with the hope of completely eliminating NPAPI by the end of this year.

Extension mechanisms for JavaScript engines

View the current installed Extensions of Chrome at the following url ://extensions/

Multimedia

WebRTC

WebRTC implements web-based video conferencing under the WHATWG protocol. Its goal is to provide real-time communications (RTC) capabilities in the browser through simple JavaScript. According to MDN's WebRTC pages, the most important pieces are navigator.mediaDevices.getUserMedia(constraints) together with the video and audio elements.

Security mechanism

The first part is web security, including but not limited to secure transmission of web data, cross-domain access, and the security of user data. The second part is the security of the browser itself: even if a web page or its JavaScript code has security problems or vulnerabilities, the browser should still be able to run it safely, without being attacked into leaking data or damaging the system.

Web security model

Basic security model: the Same Origin Policy, which restricts XMLHttpRequest, cookie reading and writing, DOM object operations, and so on.

XSS (Cross-Site Scripting): an attack that runs injected cross-domain JS code in a page. Developers can guard against it by escaping user input into character entities. WebKit also helps filter it through the XSSAuditor object (enabled by default).

CSP (Content Security Policy): set through the Content-Security-Policy HTTP header field. A Content Security Policy is used to detect and mitigate specific types of attacks on web sites, such as XSS and data injection.

CORS (Cross-Origin Resource Sharing): this standard adds a set of HTTP header fields that let servers declare which origins may access which resources. Example server-side settings:

// Response headers set on the server (PHP)
header('Access-Control-Allow-Origin: http://arunranga.com');
header('Access-Control-Allow-Methods: POST, GET, OPTIONS');
header('Access-Control-Allow-Headers: X-PINGARUNER');

The CORS response header fields are Access-Control-Allow-Origin, Access-Control-Allow-Credentials, Access-Control-Allow-Headers, Access-Control-Expose-Headers, Access-Control-Allow-Methods, and Access-Control-Max-Age.
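The server-side decision behind those headers can be sketched as a pure function: given the Origin of the request and a policy, return the CORS response headers to send, or none if the origin is not allowed. This is a minimal illustration, not a complete CORS implementation. CorsPolicySketch and corsHeadersFor are hypothetical names, credentials and preflight handling are left out, and adding Vary: Origin is my own suggestion for responses that differ per origin.

```typescript
interface CorsPolicySketch {
  allowedOrigins: string[];
  allowedMethods: string[];
  allowedHeaders: string[];
  maxAgeSeconds: number;
}

// Returns the CORS response headers for a request, or an empty object
// when the requesting origin is not on the allow list.
function corsHeadersFor(requestOrigin: string, policy: CorsPolicySketch): Record<string, string> {
  if (policy.allowedOrigins.indexOf(requestOrigin) === -1) {
    return {};
  }
  return {
    "Access-Control-Allow-Origin": requestOrigin,
    "Access-Control-Allow-Methods": policy.allowedMethods.join(", "),
    "Access-Control-Allow-Headers": policy.allowedHeaders.join(", "),
    "Access-Control-Max-Age": String(policy.maxAgeSeconds),
    "Vary": "Origin",
  };
}
```

Without an Access-Control-Allow-Origin header in the response, the browser falls back to the same origin policy and refuses to expose the response to the calling script.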

Cross Document Messaging: communication via window.postMessage and message events.

HTTPS: the secure transport protocol.

SPDY (pronounced “SPeeDY”): a TCP-based application layer protocol developed by Google to minimize network latency, improve network speed, and optimize the user's network experience. The core idea of SPDY is multiplexing.

QUIC (Quick UDP Internet Connection): a low-latency Internet transport layer protocol based on UDP, developed by Google.

Differences between CSP and CORS:

CSP defines certain domains and resources that the web page itself can access. CORS defines how a web page can access cross-domain resources that are prohibited by the same origin policy, and defines the protocol and mode of interaction between the two.

The sandbox model

The browser sandbox model uses the security mechanisms provided by the operating system so that a web page cannot modify the operating system or access private data on the system while it runs; when it needs system resources or system calls, the access goes through a proxy mechanism.

Chrome browser tips (in order of usefulness)

Type any of the following URLs directly into the browser's address bar and press Enter.

URL	Purpose
chrome://inspect	Mobile web debugging
chrome://net-internals	A set of tools that help diagnose network request and access problems. It listens for and collects events and data for DNS, sockets, SPDY, caches, and so on, giving developers feedback on the process, status, and potential influencing factors of network requests. For example, chrome://net-internals/#dns shows the DNS host resolver cache
chrome://view-http-cache/	View the contents and details of the internal cache
chrome://downloads/	Download management (shortcut: Ctrl+J)
chrome://extensions/	Extension management
chrome://bookmarks/	Bookmark management
chrome://history	Browsing history management
chrome://restart	Restart Chrome
chrome://apps	Chrome web apps
chrome://flags/	Experimental feature management
chrome://dns	View the hostnames chosen for DNS prefetching (predicted from hyperlinks, etc.)
chrome://quota-internals	View the disk space quota used by the browser
chrome://settings	Browser settings
chrome://sync-internals	Check the Chrome sync status
chrome://about/	View all Chrome commands

I am looking forward to joining a team in Chengdu with a good technical atmosphere.
