Author: Zhen Zi

Performance optimization usually starts from two directions: industry-standard optimization methods and actual performance bottlenecks. Learning from proven models, methods and routines, and designing solutions around real performance data, is already the path to high-quality work. The road, however, is still full of judgments and choices; one wrong turn and you are back in the mud, grinding along with the grab-bag approach of “Take Your Life 3000”.

This “Take Your Life 3000” path has roughly three parts: perception, healing, and preservation. “Perception” asks: can performance problems be detected and understood? “Healing” asks: can the problems, once discovered and understood, actually be fixed? “Preservation” asks: once a problem is fixed, will it degrade again in the future? Each of these parts already has a rich body of experience, articles and theory behind it, so here I want to approach the topic from an overall, systematic point of view, share my humble opinion on “the essence of rendering performance optimization”, and try to lay out a path from first principles, so that when facing complicated rendering performance problems we have a more precise, better-grounded basis for choosing directions and more valuable solutions.

Hardware perspective

Broadly speaking, the essence of performance is finding a balance between experience, processing power, and power consumption. This definition is inspired by hardware chip design, where hardware engineering has to balance three directions: area, performance, and power consumption. When designing and verifying FPGA and CPLD chips, the number of logic gates is constrained by the die size and the production process, which imposes an overall area budget. Within that budget, some combinations of logic gates are turned into dedicated circuits (so-called IP) to improve performance and reduce power consumption (a dedicated circuit consumes less power than software running on a general-purpose circuit); otherwise, the gates can only be spent on general-purpose circuits such as registers and general processing instructions. So, when conditions permit, dedicated circuits, i.e. “specialization”, provide a better ratio of area, performance, and power.

Some readers will ask: what happens when the utilization of these specialized circuits falls below that of the general-purpose circuits?

That is exactly why the M1 helped Apple jump to the top of the industry: Apple has a global plan that runs from the software ecosystem to the operating system, from the underlying system APIs to the hardware drivers, and from the drivers to the hardware circuit design. This global planning ensures that the most frequently called system routines are implemented in hardware, which improves the performance-to-power ratio while also guaranteeing the utilization of the specialized circuits. That is what I call the hardware perspective.

Another reader will object: that is Apple's system; Android cannot do this.

It is true that Android, as an open-source ecosystem, is not as polished, concise, and consistent as Apple's closed system. But if you are willing to dig into the Android open-source ecosystem, you can trace the same path through it: from the software ecosystem to the operating system, from the underlying system APIs to the hardware drivers, and from the drivers to the specialized circuits provided by the hardware. Mapping that path onto your own software engineering lets you view performance optimization problems globally from the hardware perspective and make full use of what the underlying hardware can do.

When I was in charge of the international browser, infrastructure in India and Southeast Asia was lagging: mobile network conditions were very poor and bandwidth was pitiful, while the mobile Internet of the multimedia era was full of pictures and videos. My team and I worked on super resolution (a technique that uses a machine learning model to predict image detail, achieving a 240p-to-720p upscale that traditional interpolation cannot), hoping to bring a better experience to UC Browser users in India and Southeast Asia.

With the algorithm team's help, the model soon made breakthroughs and we eliminated most of the visible prediction artifacts. But the compute power the model required was beyond what mobile devices in India and Southeast Asia could supply. Even after reducing precision and applying model compression, pruning, knowledge distillation and similar techniques, low-end models such as the Redmi (defined here as an ARMv7 processor with 1 GB of memory) still only reached a few frames per second, which made the feature unusable.

So we turned to ARM NEON instruction optimization: an instruction set for accelerating parallel floating-point and vector computation. Using the open-source XNN framework, we rewrote the model's operators with NEON to speed up the forward pass and reduce the pressure on memory and CPU. After nearly a year of effort, we finally achieved 240p-to-720p super resolution at 24 frames per second, deployed in UC Browser to serve users in India and Southeast Asia.

Although I had often touched assembly-level optimization in software engineering before, this end-to-end optimization experience, from software (the algorithm model) to system APIs and from drivers to the hardware circuits, made me truly feel the importance of the hardware perspective. The algorithm engineer who worked late into the night teaching himself Android programming and the NEON instruction set was Zhenqi, who was promoted to P8 senior algorithm expert for this project.

So, you might ask: what does this have to do with the front end? Let me illustrate the similarity with an example. The front end consists of two parts: rendering and computation. The rendering part is defined by HTML and CSS and then rendered by the browser, so the browser blocks most of the connection to the underlying capabilities and the front end has little to hold on to; however, as new APIs such as WebGL and WebGPU are exposed, the front end does gain some grip. The computation part is defined by JavaScript and executed by the script engine, so the script engine likewise blocks most of the connection to the underlying capabilities, and because the script engine runs on a virtual machine that hides the underlying instruction set and hardware differences, there is yet another layer of masking. This is changing, though, as technologies like Node.js and WASM allow some programs to run natively, and as V8 uses special strategies such as Sparkplug to compile some JavaScript directly into local code (v8.dev/blog/sparkp…).

Therefore, in many scenarios the front end now deals with the underlying hardware more directly through the browser/WebView, and the hardware perspective becomes the key to finding a balance among experience, processing power, and power consumption. Next, the core scenarios of rendering performance optimization, rendering and computation, are introduced in turn.

Rendering perspective

HTML and CSS define the content to be rendered, and JavaScript can intervene in both the content and the rendering process. Without JavaScript intervention, rendering as defined by HTML and CSS is the rendering of a static HTML page (special cases such as animation and video are not discussed in this article). After the demise of DHTML and XSLT, dynamic rendering is mostly done by JavaScript. This demonstrates the advantage of decoupling and, at the same time, lifts dynamic rendering from simple API calls to the complex logic control of a programming language, opening up unlimited possibilities. To sum up, I split the rendering perspective into three parts: rendering content, the rendering process, and JavaScript intervention, and explain my understanding of each.

Rendering content

First of all, the emergence of the WWW pushed mankind from isolated islands of information into the era of the interconnected World Wide Web, and the carrier of that information is HTML, the Hypertext Markup Language. At that time the core of HTML rendering was the typesetting engine, and Web standards also revolved around typesetting. As technology evolved, people grew tired of typesetting static content; technologies like DHTML and XHTML (XML + XSLT), advanced APIs like HTTPWebRequest, and plugins like Flash, Silverlight and Java Applets brought dynamic capabilities, and as Web 2.0 decentralization broke the monopoly of the portals, the whole Internet industry saw unprecedented prosperity.

With the development of the industry and the progress of technology, rendering content has gone from the original, simple “document typesetting” of information, to the “rich media” that carries multimedia, and on to today's WebXR that carries complex blends of digital and real information. Each step places different requirements on the rendering engine, the display, and hardware acceleration. At its simplest, every engine separates out an Animation API in order to distinguish this kind of heavy rendering work and apply special optimizations in the framework and the underlying engine.

The ability to render content also differs: most commonly, content with requirements such as resolution or HDR places special demands on the display. Hardware acceleration is easier to understand: according to the load of the rendering work, it first relieves the pressure on the CPU, memory, and disk I/O, and second hands the work to specialized circuits such as the GPU and DSP, achieving a higher performance-to-power ratio.

The purpose of understanding this is to distinguish the choices made when constructing the software. These choices have a decisive impact on rendering performance, and the differences are constrained by the underlying hardware: CPU, GPU, DSP, specialized acceleration circuits, and so on. From content parsing all the way down to hardware acceleration, everything is integrated and closely related. Even choices over which we lack direct control (UI controls and Draw API selection in an app, HTML markup and CSS style selection on the Web and how they affect Paint, and so on) require insight and understanding all the way down to the bottom, so that instead of the “Take Your Life 3000” grab bag we can describe an optimal path closer to the essence of the problem.

Take UI controls and Draw API selection in a native app: the choice of API for a given piece of content is actually very limited. In 2016, while leading the browser team, I read the source code of Servo, a next-generation browser engine developed by Mozilla and Samsung (the engineers behind Servo also created the great Rust programming language, which I love and have studied for a long time, and which I recommend to you). The demo provided by the Servo open-source project uses Android's SurfaceView to guarantee browser rendering performance. The reason for not using View is that a View redraws in response to the system's VSYNC signal with a 16ms refresh interval; if drawing cannot finish within 16ms, the interface stutters, so Servo chose SurfaceView. At a deeper level, it is the complex and dynamic nature of HTML content that makes View inappropriate: the View's constant partial refreshes would make the page flicker, while SurfaceView's double buffering lets the image be processed in memory and then displayed on the screen, which solved Servo's problem of displaying HTML content. Likewise, many games choose double buffering precisely because what they display is a “game”.

An OpenGL GPU has two rendering modes: on-screen and off-screen rendering. Rasterization, masks, rounded corners, and shadows can trigger off-screen rendering, which requires creating a separate buffer, switching context multiple times (from on-screen to off-screen), and finally switching context back from off-screen to the current screen to display the contents of the off-screen buffer. All of these are merely API capabilities, but it is the choice of content that determines whether rasterization, masks, rounded corners, and shadows are triggered and how much performance is dissipated; that is how rendering content reaches down to the underlying layers and the hardware.
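
To make that concrete, here is a minimal, hedged JavaScript sketch (the element, styles, and the fancy/plain split are all illustrative, not taken from any engine documentation) of what a content-level choice looks like: the same node can be built with or without the mask / rounded-corner / shadow combination described above, and the cheaper variant is preferable whenever the design allows it.

    // Content choice, not API choice: these styles are the kind that can push
    // rendering into an off-screen pass on some engines (see the discussion above).
    function styleCard(el, fancy) {
      if (fancy) {
        el.style.borderRadius = '12px';
        el.style.overflow = 'hidden';                                     // clipping
        el.style.boxShadow = '0 4px 12px rgba(0, 0, 0, 0.3)';             // shadow
        el.style.webkitMaskImage = 'linear-gradient(#000, transparent)';  // mask
      } else {
        el.style.border = '1px solid #ddd'; // visually simpler, avoids those triggers
      }
    }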

The front end follows the same principle as a native app; the difference is that the front-end path is longer, and it is harder to see through to the bottom layer and the hardware, because the front end's host environment, the browser / browser kernel / rendering engine, is itself wrapped on top of the system's UI engine and rendering engine, and this layer also hides the different implementations of different browser vendors on different platforms. But as technologies such as WebGL and WebGPU are applied in the front-end field, the difficulty of penetrating down to the underlying hardware is reduced; the front end gains the ability to see through to the hardware and, at the same time, a certain ability to intervene in it, especially in hardware-accelerated content rendering, which creates a more relaxed environment for designing and implementing rendering content.

With control over the rendering content, there is no need to reimplement anything; it comes down to two simple steps, one far and one near. “One far” means pushing the perspective down to the bottom layer and examining, in light of business requirements and product design, what new and interesting hardware capabilities can bring. “One near” means pulling the perspective back and selecting the appropriate UI controls and Draw APIs, HTML tags and CSS, to construct the content to be rendered, leaving the intermediate rendering work to the engine.

Rendering process

From the perspective of how an image is produced, the rendering process includes: CPU computation (the work of the UI engine or browser engine and the rendering engine), the graphics rendering API (OpenGL/Metal), the GPU driver, GPU rendering, the VSYNC signal, and the HSYNC signal. Common rendering problems include stuttering, tearing, and dropped frames, and all three are usually caused by rendering taking too long. Most of that time is spent on CPU computation, and part of it on graphics rendering. A simple experiment makes the point: take a complex page that renders poorly and record the screen on a high-end machine where it renders smoothly; then, on a low-end machine, play that video and open the same page in the browser side by side. The visual complexity is identical, yet the video plays far more smoothly than the page renders, because video decoding and display is much simpler and shorter than what a browser engine has to do; the gap is the time consumed by CPU/GPU computation and graphics rendering (special codec formats and very high bitrate video excepted).

Therefore, from the perspective of the rendering process, the essence of performance optimization is first to reduce the CPU and GPU computation load. Second, when conditions allow (the business side needs to be convinced that the implementation difference matters to the business), influence the rendering process through how the rendering content is constructed, preferring low-level APIs backed by CPU/GPU optimized instructions and dedicated acceleration circuits. For example, in today's world of ubiquitous H.264 hardware acceleration, whether x265 / H.265 should be used is debatable.

When discussing the rendering process, the smoothness metric needs attention first. At a 60Hz refresh rate each frame must be rendered within 16.6ms, so from the time dimension the CPU and GPU processing budget can be defined as 16.6ms x 2 (double buffering) or 16.6ms x 3 (triple buffering), and the rendering process must be compressed into that budget to guarantee smoothness.
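
To make the budget concrete, here is a minimal, hedged sketch in plain JavaScript (the threshold and the logging are illustrative) that uses requestAnimationFrame to flag frames whose interval blows past the 16.6ms budget, one cheap way to notice when the per-frame CPU/GPU work no longer fits:

    // Hypothetical jank probe: log frames that take much longer than one vsync interval.
    const FRAME_BUDGET_MS = 1000 / 60; // ~16.6ms at 60Hz
    let lastFrameTime = performance.now();

    function probeFrame(now) {
      const delta = now - lastFrameTime;
      lastFrameTime = now;
      if (delta > FRAME_BUDGET_MS * 2) {
        // At least one vsync was missed: the pipeline did not finish in time.
        console.warn('Long frame: ' + delta.toFixed(1) + 'ms');
      }
      requestAnimationFrame(probeFrame);
    }

    requestAnimationFrame(probeFrame);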

OOPD (Out of Process Display Compositor): its main purpose is to migrate the Display Compositor from the Browser process to the Viz process (the original GPU process), with the Browser becoming a client of Viz. The Renderer still establishes its connection to Viz through the Browser, but once connected it submits CompositorFrames directly to Viz. The Browser likewise submits its CompositorFrames to Viz, and Viz generates the final CompositorFrame, which is output via the Display.

OOPR (Out of Process Rasterization). The main differences between OOPR and the current GPU rasterization mechanism are as follows:

  1. In the current GPU rasterization mechanism, when a Worker thread performs a rasterization task, it calls Skia to convert the 2D drawing instructions into GPU instructions, and the GPU instructions issued by Skia are transmitted through the Command Buffer to the GPU thread of the Viz process for execution.
  2. In OOPR, when a Worker thread performs a rasterization task, it directly serializes the 2D drawing instructions (DisplayItemList) into the Command Buffer and transmits them to the GPU thread. The part running on the GPU thread calls Skia to generate the corresponding GPU instructions, which are executed directly by the GPU.

In short, Skia's rasterization work moves from the Renderer process to the Viz process. When OOPD, OOPR, and SkiaRenderer are all enabled:

  1. Rasterization and compositing both move into the Viz process;
  2. Rasterization and compositing both use Skia for 2D drawing; in fact, all 2D drawing in Chromium ultimately goes through Skia, which generates the corresponding GPU instructions;
  3. During rasterization and compositing, Skia ultimately emits the GPU instructions on the GPU thread, using the same Skia GrContext (Skia's internal GPU drawing context).

This means that once Skia's support for Vulkan, Metal, DX12 and other 3D APIs matures, Chromium can decide which GPU API Skia uses for rasterization and compositing based on the platform and device. Vulkan, Metal, DX12 and other lower-level APIs bring lower CPU overhead and better performance than the GL APIs.

Throughout the rendering process, the different low-level APIs are affected by the rasterization process, the rasterization process is affected by how the compositor works, and the compositor's work is affected by how Blink processes the rendering content.

If you are interested in the rendering process, take a look at the document Life of a Pixel. Learning and understanding the rendering process lets you see how different choices of rendering content affect performance, lets you accurately locate performance problems when analyzing the stages that influence them, and also reveals the many optimization techniques the rendering process itself uses to fight white screens, checkerboarding, flickering, stuttering, and other performance and user-experience issues.

The content above concerns relatively static rendering content, but today's software engineering faces complex, dynamic scenarios: dynamic data loading and dynamic rendering, conditional rendering, animation, and so on. JavaScript intervention therefore also changes the rendering content and thus affects the rendering process. The problems related to JavaScript intervention are introduced below.

JavaScript intervention

In principle, Blink exposes the DOM API for JavaScript to call. (In fact, a portion of the CSSOM API is also exposed for CSS intervention, which is not covered here because most modern front-end frameworks inline the results of that intervention directly into the HTML.) For example, createElement creates an HTML tag that is appended to document.body, or nodes are reached and modified through firstChild, childNodes[1], and so on. The DOM Tree changes, causing the entire rendering process to change:

This is how virtual-tree techniques improve browser rendering performance: changes are batched and merged into the DOM Tree together, reducing how often the rendering process has to be re-entered.
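
The same idea can be shown without any framework. A hedged sketch in plain DOM APIs (the function names are illustrative): appending nodes one by one changes the DOM Tree on every iteration, while collecting them in a DocumentFragment commits a single change, which is essentially what virtual-tree batching buys you.

    // Naive version: every append is a separate DOM Tree change.
    function appendItemsNaive(list, items) {
      for (const text of items) {
        const li = document.createElement('li');
        li.textContent = text;
        list.appendChild(li); // DOM Tree changes on every iteration
      }
    }

    // Batched version: changes are assembled off-tree and committed once.
    function appendItemsBatched(list, items) {
      const fragment = document.createDocumentFragment();
      for (const text of items) {
        const li = document.createElement('li');
        li.textContent = text;
        fragment.appendChild(li); // no live DOM Tree change yet
      }
      list.appendChild(fragment); // a single DOM Tree change
    }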

To put it simply, from Blink's perspective V8 is an outsider: the browser engine has decoupled V8's intervention in the DOM and limited it to the HTML tags themselves. But JavaScript intervention changes the DOM, and that in turn changes the subsequent rendering process. As a result, merging DOM Tree changes can sometimes produce incorrect rendering results, and without understanding the rendering process, some rendering problems caused by virtual-tree usage can be hard to locate and solve.

Second, in conditional rendering or in the routing logic of an SPA, the selection of and changes to rendering content can also burden the rendering process, possibly exceeding the 16.7ms budget and causing stutter. Optimizing the conditions and the decision logic is likely to relieve some of these rendering performance issues (to say nothing of the JavaScript itself); in a nutshell: make JavaScript execute and return as quickly as possible. The following sections analyze the impact of computational complexity on rendering performance from the perspective of the parser, layout, and compositor.
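
One hedged, framework-agnostic way to follow the “execute and return quickly” rule is to split long-running work into small chunks and yield back to the engine between them. In the sketch below (the chunk size and function names are illustrative), requestIdleCallback is used where available, with a setTimeout fallback:

    // Process a large array without holding the main thread for the whole run.
    function processInChunks(items, handleItem, chunkSize = 200) {
      const schedule = window.requestIdleCallback
        ? (fn) => requestIdleCallback(fn)
        : (fn) => setTimeout(fn, 0);
      let index = 0;
      function runChunk() {
        const end = Math.min(index + chunkSize, items.length);
        while (index < end) {
          handleItem(items[index++]);
        }
        if (index < items.length) {
          schedule(runChunk); // yield so style/layout/paint can run between chunks
        }
      }
      schedule(runChunk);
    }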

Computational perspective

In simple terms, the computational perspective looks at DOM, style, layout, compositing, and paint (including pre-paint), the computation part of the rendering process, because computation time directly affects rendering performance. Engineering has the concept of the CRP (Critical Rendering Path), so let's start with the CRP and look at rendering performance problems and approaches from a computational perspective.

An overview of the CRP

After HTML and CSS are loaded over network I/O or disk I/O (cache), the chain continues: decoding the HTML and CSS files (which were compressed, e.g. with GZip, before text transfer), processing (HTML and CSS parsing), DOM Tree construction, style inlining, layout, compositing, and painting. This involves a great deal of parsing and computation inside the browser engine, so a concept needs to be introduced: the Critical Rendering Path (CRP). (The following part is drawn from material that Teacher Damo and I put together.)

  • First, as soon as the browser gets the response, it starts parsing it. When it encounters a dependency, it tries to download it
  • If it is a style file (CSS file), the browser must fully parse it before rendering the page (this is why CSS is render-blocking)
  • If it is a script file (JavaScript file), the browser must stop parsing, download the script, and run it. Only after that can it continue parsing, because JavaScript scripts can change the content of the page (especially the HTML). (This is why JavaScript blocks parsing)
  • Once all the parsing is done, the browser builds the DOM tree and the CSSOM tree. Put together, they produce the render tree.
  • The penultimate step is to turn the render tree into the layout. This stage is also known as reflow
  • The last step is to paint: literally coloring pixels based on the data the browser calculated in the previous stages

When you put these steps into the rendering engine process, you can see more clearly that CRP goes through the following processes:

In a nutshell, the steps of CRP:

  • Process HTML tags and build DOM trees
  • Process CSS tags and build CSSOM trees
  • Merge the DOM tree and the CSSOM tree to make a render tree
  • Layout according to render tree
  • Draw the individual nodes to the screen

Note: when the DOM or CSSOM changes (JavaScript can manipulate them through the DOM and CSSOM APIs to change the page's appearance or content), the browser needs to perform the above steps again. This is where the virtual-tree rendering optimization described earlier comes from.

Optimizing a page's Critical Rendering Path comes down to three things:

  • Reduce the number of critical resource requests: reduce the use of blocking resources (CSS and JS). Note that not all resources are critical, especially for CSS and JS (for example, CSS behind a media query and asynchronous JS are not critical).
  • Reduce the size of critical resources: use every available method, such as minifying, compressing, and caching critical resources; the smaller the amount of data, the lower the engine's computation complexity
  • Shorten the critical rendering path length

The following general steps can be followed when optimizing the CRP (a small sketch follows the list):

  • Analyze and characterize CRP, and record the number, size, and length of critical resources
  • Minimize the number of critical resources: delete them, delay their download, mark them as asynchronous, and so on
  • Optimize key resource bytes to reduce download time (round-trip times) and CRP length
  • Optimize the loading order of the remaining critical resources: all critical resources need to be downloaded as early as possible to shorten the CRP length
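
As a small, hedged illustration of the first two steps (the paths are made up; the APIs are standard DOM), non-critical JavaScript can be injected asynchronously so it stops blocking parsing, and a genuinely critical resource can be hinted for early download with a preload link:

    // Load non-critical JavaScript without blocking HTML parsing.
    function loadDeferredScript(src) {
      const script = document.createElement('script');
      script.src = src;
      script.async = true; // downloaded in parallel, executed when ready
      document.head.appendChild(script);
    }

    // Hint the browser to fetch a critical stylesheet as early as possible.
    function preloadStyle(href) {
      const link = document.createElement('link');
      link.rel = 'preload';
      link.as = 'style';
      link.href = href;
      document.head.appendChild(link);
    }

    loadDeferredScript('/js/analytics.js');  // illustrative path
    preloadStyle('/css/critical.css');       // illustrative path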

Use Lighthouse or the Navigation Timing API to detect critical request chains

We need tools to measure the important CRP indicators, such as the number of critical resources, their size, and the CRP length. You can use the Lighthouse plugin in Chrome, or the Node version of Lighthouse:

    # install lighthouse
    npm i -g lighthouse
    # run it against a page (desktop preset, throttling disabled)
    lighthouse https://jhs.m.taobao.com/ --locale=zh-CN --preset=desktop --disable-network-throttling=true --disable-storage-reset=true --view

A detailed report is available in which you can see the information related to the key request:

For more details, see the documentation on Lighthouse’s website

In addition to using the Lighthouse instrumentation tool, you can also use the Navigation Timing API to capture and record the true CRP performance of any page.

We can also use the relevant APIs in the performance monitoring specifications to monitor the performance of real user scenarios on the page:
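
A hedged sketch of such field monitoring (the metrics follow the Navigation Timing Level 2 and Paint Timing specs; the logging is illustrative):

    // Read navigation timing once the page has finished loading.
    window.addEventListener('load', () => {
      const [nav] = performance.getEntriesByType('navigation');
      if (nav) {
        console.log('TTFB:', nav.responseStart - nav.requestStart);
        console.log('domInteractive:', nav.domInteractive);
        console.log('domContentLoadedEventEnd:', nav.domContentLoadedEventEnd);
        console.log('loadEventEnd:', nav.loadEventEnd);
      }
    });

    // Observe first-paint / first-contentful-paint as they are reported.
    new PerformanceObserver((list) => {
      for (const entry of list.getEntries()) {
        console.log(entry.name + ':', entry.startTime.toFixed(1), 'ms');
      }
    }).observe({ type: 'paint', buffered: true });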

The problems found through CRP analysis can then be optimized with the appropriate tools or techniques.

CRP optimization strategy

For the complete content, please wait for Teacher Damo's article to be published; here I only list the parts relevant to the computational perspective.

HTML from the computing perspective

  • Writing an effective readable DOM:

    • Write tags in lowercase; do not use any uppercase letters in HTML tags

    • Close self-closing tags

    • Avoid excessive use of comments (it is recommended to use appropriate tools to clear comments)

    • Organize the DOM so that you only create elements that are absolutely necessary

  • Reduce the number of DOM elements (in the Blink kernel, the number of tokens HTMLDocumentParser processes is closely related to the number of DOM elements; reducing the number of tokens speeds up DOM Tree construction and thus the typesetting and rendering of the first frame). Having too many DOM nodes on a page can slow down initial page load, slow down rendering, and possibly use a lot of memory. So monitor the number of DOM elements on your page (a small audit script is sketched at the end of this subsection) and make sure your page:

    • Has no more than 1,500 DOM nodes in total

    • Nests DOM nodes no more than 32 levels deep

    • Has no parent node with more than 60 child nodes

Reference: zhuanlan.zhihu.com/p/48524320… Blink HTML parsing, by “the programmer ape who doesn't wear plaid shirts”
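
A small script along these lines can audit a page against the thresholds above (the limits come from the list; everything else is an illustrative sketch):

    // Rough audit of DOM size against the limits listed above.
    function auditDom(root = document.body) {
      const nodes = root.querySelectorAll('*');
      let maxDepth = 0;
      let maxChildren = 0;
      for (const el of nodes) {
        let depth = 0;
        for (let p = el; p && p !== root; p = p.parentElement) depth++;
        maxDepth = Math.max(maxDepth, depth);
        maxChildren = Math.max(maxChildren, el.childElementCount);
      }
      console.log({
        totalNodes: nodes.length, // aim for no more than 1,500
        maxDepth,                 // aim for no more than 32 levels
        maxChildren               // aim for no more than 60 children per parent
      });
    }

    auditDom();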

CSS from the computing perspective

  • Length of CSS class names: the length of class names can slightly affect HTML and CSS file size (debatable in some scenarios; see CSS selector performance for details)

  • Critical CSS: the CSS on a page is divided into critical CSS and non-critical CSS; the critical CSS is inlined into the page in the <head>

  • Use media queries: only stylesheets whose media query matches will block rendering of the page; the download of the other stylesheets is not blocked, but their priority is lowered.

  • Avoid using @import to import CSS: CSS pulled in by @import can only be discovered after the including CSS has been downloaded and parsed, which adds round trips to the critical path and increases the browser engine's computation load.

  • Analyzing the complexity of stylesheets: analyzing stylesheets helps you find problematic, redundant, and duplicate CSS selectors; removing that code speeds up CSS reading and loading. Tools such as TestMyCSS, Analysis-CSS, Project Wallace, and CSS Stats can help you analyze and correct the CSS code

    // If the element has an id attribute
    if (element.hasID())
      collectMatchingRulesForList(
          matchRequest.ruleSet->idRules(element.idForStyleResolution()),
          cascadeOrder, matchRequest);
    // If the element has class attributes
    if (element.isStyledElement() && element.hasClass()) {
      for (size_t i = 0; i < element.classNames().size(); ++i)
        collectMatchingRulesForList(
            matchRequest.ruleSet->classRules(element.classNames()[i]),
            cascadeOrder, matchRequest);
    }
    // Pseudo-class processing ...
    // Tag selector processing
    collectMatchingRulesForList(
        matchRequest.ruleSet->tagRules(element.localNameForSelectorMatching()),
        cascadeOrder, matchRequest);
    // Universal selector ...

Through this code you can get an intuitive feel for the difference in computation overhead between CSS selectors, which in turn guides computation-performance optimization. Reference: nextfe.com/how-chrome-… by Li Yincheng

Optimizing computation itself

The browser engine itself is software, and once you understand the rendering process you understand the software details of rendering; from a software-engineering point of view there are then plenty of ways to optimize the computation. If a program is understood as algorithms + data structures, a theory everyone is familiar with, then from the performance-optimization perspective algorithms + data structures can be seen as time + space. This leads to a common optimization strategy: trade time for space, or space for time. When space pressure is high (that is, storage pressure), you can trade time for space; a typical case is file compression. When time pressure is high (that is, computation pressure), you can trade space for time; typical cases are buffers and caches, or splitting a long, computation-intensive task into a series of subtasks that are computed concurrently, storing the results (buffer/cache), and then having the GPU output them to the display.
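
A minimal sketch of “space for time” in front-end terms (the cache key and the expensive computation are illustrative): spend memory to keep a computed result around instead of recomputing it inside a hot path.

    // Trade space for time: cache the result of an expensive, deterministic computation.
    const resultCache = new Map();

    function cached(key, compute) {
      if (!resultCache.has(key)) {
        resultCache.set(key, compute()); // computed once and stored (space)
      }
      return resultCache.get(key);       // later calls return immediately (time)
    }

    // Usage: avoid re-running the same heavy calculation on every frame.
    const rowHeight = cached('row-height', () => {
      // ...some heavy, deterministic computation...
      return 48;
    });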

Below, take a look at rendering performance from a computational perspective using Layout and Compositing examples.

Layout instance

Beyond the HTML and CSS optimization ideas described in the CRP section (the DOM and style stages shown below), the rendering pipeline includes:

The parts framed in red in the figure, where the CPU and GPU rendering pipeline spends its time, are the optimization directions for the rendering pipeline.

Core idea: reduce the computation load of the rendering pipeline

Different choices of HTML tags and CSS styles, as well as the layout methods we use with them, can inadvertently impose computation load on the layout engine. Avoiding this load means making the layout engine compute as little as possible: using the space-for-time method described above, replace the layout engine's estimation and calculation with deterministic values. Reference: www.chromium.org/developers/…
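
One concrete instance of keeping the layout engine's work down (a hedged sketch, not taken from the referenced Chromium page): interleaving DOM reads and writes forces layout to be recalculated repeatedly, while batching all reads before all writes lets the engine lay out once. The elements parameter is assumed to be a plain array of DOM nodes.

    // Anti-pattern: each iteration reads a layout value right after a write,
    // forcing the engine to recalculate layout on every pass.
    function resizeNaive(elements) {
      for (const el of elements) {
        el.style.width = (el.parentElement.offsetWidth / 2) + 'px';
      }
    }

    // Better: batch all reads, then all writes, so layout is computed once.
    function resizeBatched(elements) {
      const widths = elements.map((el) => el.parentElement.offsetWidth); // reads
      elements.forEach((el, i) => {
        el.style.width = (widths[i] / 2) + 'px';                         // writes
      });
    }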

Note that space-for-time here is not one specific rendering-performance optimization technique but an idea, so using it in a real project requires extra work, including debugging ability and statistical-analysis ability. The most common approach is to debug the Chromium kernel to find the root of the computation load. For example, when opening a page we can set a breakpoint in Blink, in third_party/blink/renderer/core/dom/document_init.cc at DocumentInit::Type DocumentInit::ComputeDocumentType.

Reference: zhuanlan.zhihu.com/p/260645423… Build, debug by Mark-0xg

“The performance goal of The Blink project is to be able to run web content at 60fps on a mobile phone, which means we have 16ms per frame to handle input, execute scripts, and execute the rendering pipeline for changes done by scripts through style recalculation, render tree building, layout, compositing, painting, and pushing changes to the graphics hardware. So style recalculation can only use a fraction of those 16ms. In order to reach that goal, the brute force solution is not good enough.

At the other end of the scale, you can minimize the number of elements having their style recalculated by storing the set of selectors that will change evaluation for each possible attribute and state change and recalculate computed style for each element that matches at least one of those selectors against the set of the descendants and the sibling forest.

At the time of writing, roughly 50% of the time used to calculate the computed style for an element is used to match selectors, and the other half of the time is used for constructing the RenderStyle (computed style representation) from the matched rules. Matching selectors to figure out exactly which elements need to have the style recalculated and then do a full match is probably too expensive too.

Space for time:

We landed on using what we call descendant invalidation sets to store meta-data about selectors and use those sets in a Process called style invalidation to decide which elements need to have their computed styles recalculated.”

Reference: docs.google.com/document/d/… Invalidation in Blink

Compositing instance

The main idea: find the problems that rendering-engine designers are striving to optimize, and avoid creating them.

To put it simply: when reading browser-kernel design documents or blogs such as Chromium's, I often see design proposals describing the problems the Blink team ran into and their ideas for solving them. If we flip our thinking and work to understand the causes of these rendering performance problems, then by avoiding those situations in our own projects we get good optimizations almost for free.

Article: Multithreaded Rasterization, www.chromium.org/developers/…

Question:

An example provided by Teacher Damo:

    background: url(//www.textures4photoshop.com/tex/thumbs/watercolor-paint-background-free-thumb32.jpg) repeat center center;
    background-size: 20%;
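
In the same avoid-the-problem spirit, another widely applicable example (a hedged sketch of my own, not part of Teacher Damo's material; the property choice is illustrative, so verify it against your target engines): animating transform and opacity can usually be handled by the compositor, while animating layout-affecting properties such as width or top re-enters layout and paint on every frame.

    // Compositor-friendly entrance animation: only transform and opacity change,
    // so layout does not have to be recomputed for every animation frame.
    function slideIn(el) {
      el.style.transform = 'translateX(-100px)';
      el.style.opacity = '0';
      void el.offsetWidth; // force the starting values to be committed
      el.style.transition = 'transform 300ms ease-out, opacity 300ms ease-out';
      el.style.transform = 'translateX(0)';
      el.style.opacity = '1';
    }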

Beyond that, study documents such as Compositor Thread Architecture (www.chromium.org/developers/…) to understand how CPU, GPU, and memory problems (space size, swapping in and out, memory alignment, and so on) slow down the computation process and thus add computation load by lengthening computation time. Under what circumstances or conditions does switching between threads happen? What problems trigger contention for resources? Start with questions like these, such as “are we giving the Compositor the right stuff?”, and use this reverse-thinking approach to find computation bottlenecks and optimize them specifically. At the same time, the root of this kind of thinking is plain software-engineering and programming ability; if these problems do not ring a bell, it may be worth shoring up those fundamentals first.

Consider, for example, books like UNIX Network Programming (book.douban.com/subject/150…).

Conclusion

Achieving high-quality rendering performance optimization with the “Take Your Life 3000” grab bag has an obvious ceiling: you cannot do it better than anyone else. Only by going deeper layer by layer, from decoding HTML and CSS files (GZip compression before text transfer, etc.), processing (HTML and CSS parsing), DOM Tree construction, style inlining, layout, compositing, and painting, down to low-level programming interfaces such as WebGL, Vulkan, and Skia, and finally to how your page is rendered from the hardware perspective, can you go further. The more you know, the deeper you can dig for more valuable problems; combined with constantly honed programming and software-engineering ability, you can then propose solutions of your own and truly do better than others!