(continued)

Now, how do we make it faster? This is what unoptimized style calculation looks like.

Browsers do a lot of work during style calculation, and it doesn't happen only when the page first loads. It repeats over and over as the user interacts with the page: hovering over an element or changing the DOM structure can both trigger style changes.
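
To make that concrete, here is a minimal sketch (in Rust, since that is what Quantum CSS is written in) of what unoptimized style calculation boils down to. The Rule and Element types and the three selector forms are invented for illustration; a real engine handles far more, but the shape of the work is the same: every element tests every rule.

```rust
use std::collections::HashMap;

// Hypothetical, simplified types: real engines support far more selector
// forms, inheritance rules, and cascade origins than this sketch does.
struct Rule {
    selector: String, // only "tag", ".class", or "#id" forms here
    specificity: u32,
    declarations: Vec<(String, String)>,
}

struct Element {
    tag: String,
    id: Option<String>,
    classes: Vec<String>,
    children: Vec<Element>,
}

type ComputedStyle = HashMap<String, String>;

// Naive matching for the three simple selector forms above.
fn matches(rule: &Rule, el: &Element) -> bool {
    if let Some(class) = rule.selector.strip_prefix('.') {
        el.classes.iter().any(|c| c == class)
    } else if let Some(id) = rule.selector.strip_prefix('#') {
        el.id.as_deref() == Some(id)
    } else {
        el.tag == rule.selector
    }
}

// Unoptimized style calculation: every element tests every rule, sorts its
// matches by specificity, cascades the declarations, then recurses so that
// children start from the parent's computed values. Returns the number of
// selector tests performed, to show how the work adds up.
fn style_tree(el: &Element, rules: &[Rule], inherited: &ComputedStyle) -> usize {
    let mut matched: Vec<&Rule> = rules.iter().filter(|rule| matches(rule, el)).collect();
    matched.sort_by_key(|rule| rule.specificity);

    let mut computed = inherited.clone(); // crude stand-in for inheritance
    for rule in &matched {
        for (property, value) in &rule.declarations {
            computed.insert(property.clone(), value.clone());
        }
    }

    let mut selector_tests = rules.len();
    for child in &el.children {
        selector_tests += style_tree(child, rules, &computed);
    }
    selector_tests
}

fn main() {
    let rules = vec![
        Rule { selector: "p".into(), specificity: 1, declarations: vec![("color".into(), "black".into())] },
        Rule { selector: ".warning".into(), specificity: 10, declarations: vec![("color".into(), "red".into())] },
        Rule { selector: "#main".into(), specificity: 100, declarations: vec![("width".into(), "800px".into())] },
    ];
    let page = Element {
        tag: "div".into(),
        id: Some("main".into()),
        classes: Vec::new(),
        children: (0..1_000)
            .map(|_| Element {
                tag: "p".into(),
                id: None,
                classes: vec!["warning".into()],
                children: Vec::new(),
            })
            .collect(),
    };
    let tests = style_tree(&page, &rules, &ComputedStyle::new());
    println!("selector tests performed: {tests}"); // 1,001 elements x 3 rules = 3,003
}
```

Even for this toy page, that is about 3,000 selector tests, and the cost grows with the number of elements times the number of rules.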

This means that CSS style calculation is a prime candidate for optimization, and browsers have been experimenting with different optimization strategies for the past 20 years. Quantum CSS combines strategies from different engines into a super fast new engine, so let's take a look at how they work together.

The first strategy is to run everything in parallel. The Servo project (which is where Quantum CSS comes from) is an experimental browser that tries to render all the different parts of a page in parallel. What does that mean?

A computer is like a brain. There is a part that does the thinking, the arithmetic logic unit (ALU). Near it is some short-term memory, the registers. Together, these make up the CPU. Then there is longer-term memory, the RAM.

Early computers with such CPUs could only handle one thing at a time. But over the last decade, CPUs have gained multiple cores, each with its own ALU and registers. This means the CPU can work on several things at once, in parallel.

Quantum CSS takes advantage of this feature of modern computers by assigning the style calculations for different DOM nodes to different cores. This may seem simple enough: just split off branches of the tree and work on them on different cores. In practice, it turns out to be much harder than that. One reason is that DOM trees are often lopsided, which means one core can end up with much more work to do than another.

To distribute the work more evenly, Quantum CSS uses a technique called work stealing. When a core is working on a DOM node, it takes the node's immediate children and splits them into one or more “units of work,” which are then put into a queue.

Once a core finishes the work in its own queue, it looks for work in the other queues. This means the work gets spread out evenly without having to walk the whole tree ahead of time to figure out how to balance it.
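
Here is a minimal sketch of that idea using only the Rust standard library. The Node type, the per-core queues, and the crude termination check are all invented for illustration; Servo's real scheduler is far more sophisticated, but the shape is the same: each core works from its own queue and steals from the others when it runs dry.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// A stand-in for a DOM node: no styles, just structure.
struct Node {
    children: Vec<Node>,
}

fn main() {
    // A lopsided tree: one child of the root carries 1,000 descendants,
    // the other carries none.
    let heavy_children: Vec<Node> = (0..1_000).map(|_| Node { children: Vec::new() }).collect();
    let root = Node {
        children: vec![
            Node { children: heavy_children }, // lots of work under this branch
            Node { children: Vec::new() },     // almost none under this one
        ],
    };

    const CORES: usize = 4;
    // One unit-of-work queue per simulated core.
    let queues: Arc<Vec<Mutex<VecDeque<Node>>>> =
        Arc::new((0..CORES).map(|_| Mutex::new(VecDeque::new())).collect());
    queues[0].lock().unwrap().push_back(root);
    let styled = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..CORES)
        .map(|core| {
            let queues = Arc::clone(&queues);
            let styled = Arc::clone(&styled);
            thread::spawn(move || {
                let mut idle_spins = 0;
                loop {
                    // Take the newest work from our own queue first; the lock
                    // is released at the end of this statement.
                    let mut node = queues[core].lock().unwrap().pop_back();
                    if node.is_none() {
                        // Our queue is empty: steal the oldest unit of work
                        // from some other core's queue.
                        node = (0..CORES)
                            .filter(|&other| other != core)
                            .find_map(|other| queues[other].lock().unwrap().pop_front());
                    }
                    match node {
                        Some(node) => {
                            idle_spins = 0;
                            // A counter bump stands in for actually computing
                            // the node's style; then its children become new
                            // units of work on this core's queue.
                            styled.fetch_add(1, Ordering::Relaxed);
                            let mut own = queues[core].lock().unwrap();
                            for child in node.children {
                                own.push_back(child);
                            }
                        }
                        None => {
                            // Crude termination for the sketch: give up after
                            // spinning for a while with nothing to do.
                            idle_spins += 1;
                            if idle_spins > 1_000 {
                                break;
                            }
                            thread::yield_now();
                        }
                    }
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    println!("styled {} nodes", styled.load(Ordering::Relaxed)); // 1,003
}
```

The 1,000 leaf nodes all land on the queue of whichever core handled their parent, and the idle cores drain them by stealing, which is the whole point.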

Doing this in most browsers' code would be hard to get right. Parallelism is a notoriously hard problem, and CSS engines are very complex. The CSS engine also sits between the two other most complex parts of the rendering engine, the DOM and layout. So it's easy to introduce a bug, and parallelism leads to a kind of bug called a data race that is very hard to track down. I'll cover these bugs in more detail in another article. When hundreds of engineers are working in the same code base, how do you program in parallel with confidence? That's what Rust is for.
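
As a tiny, self-contained illustration of the kind of guarantee involved (the counter here is hypothetical and has nothing to do with Stylo's actual code): the unsynchronized version of this program is rejected by the compiler, while the version that states how the data is shared and locked compiles and runs.

```rust
// The unsynchronized version below does NOT compile: the borrow checker
// rejects handing a mutable borrow of `count` to another thread.
//
//     let mut count = 0;
//     std::thread::spawn(|| count += 1); // error: closure may outlive the
//     count += 1;                        // current function, and `count` is
//                                        // still borrowed by the closure
//
// The version that compiles has to say how the data is shared (Arc) and
// how it is synchronized (Mutex):
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let count = Arc::new(Mutex::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let count = Arc::clone(&count);
            thread::spawn(move || {
                // Each thread must take the lock before touching the shared
                // value, so an unsynchronized data race can't be written.
                *count.lock().unwrap() += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    println!("count = {}", *count.lock().unwrap()); // always 4
}
```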

With Rust, you can statically verify that there are no data races. This lets you avoid hard-to-debug bugs by making it impossible to write them into the code in the first place: the compiler won't let you. I'll be writing more about this in the future. In the meantime, you can watch this [intro video about parallelism in Rust](https://www.youtube.com/watch?v=cNeIOt8ZdAY) or this [more in-depth talk about work stealing](https://www.youtube.com/watch?v=gof_OEv71Aw).

With this, CSS style computation becomes what's called an embarrassingly parallel problem: there is very little preventing it from running efficiently in parallel. That means we can get close to linear speedups. If your computer has four cores, it will run nearly four times as fast.

For each DOM node, the CSS engine needs to go through all the rules to do selector matching. For most nodes, this matching is unlikely to change very often. For example, when the user hovers over a parent element, the rules that match it may change. But we still need to recompute the styles of its descendant elements to handle property inheritance, even though the rules that match those descendants probably haven't changed at all. It would be nice if we could just record which rules match those descendants, so we don't have to do selector matching for them again. This is known as the rule tree, an idea borrowed from Firefox's previous CSS engine. The CSS engine goes through the process of figuring out which selectors match and sorts them by specificity. From that, it creates a linked list of rules, and that list gets added to the tree.

The CSS engine tries to keep the number of branches in the tree to a minimum. To do this, it reuses existing branches wherever possible. If most of the selectors in the list are the same as those in an existing branch, it follows the same path. Only when the next rule in the list isn't in the current branch of the tree does it add a new branch.

The DOM node gets a pointer to the rule that was inserted last (in this example, the ‘div#warning’ rule), which is the most specific one. On a restyle, the engine does a quick check to see whether the change on the parent element could possibly change the rules that match its children. If not, then for any descendant element the engine can follow that pointer to get to its rule. From there, it can walk the tree back up to the root node to get the complete list of matching rules, from most specific to least specific. This means it can skip selector matching and sorting entirely.
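
Here is a minimal sketch of the rule-tree idea. The types are hypothetical and use plain vector indices in place of real pointers, but they show the two operations described above: inserting a matched-rule list while reusing shared branches, and recovering the full list for a node by walking parent links, with no selector matching involved.

```rust
// A rule wrapped in a tiny newtype, e.g. Rule("div#warning").
#[derive(Debug, Clone, PartialEq)]
struct Rule(&'static str);

struct RuleTreeNode {
    rule: Option<Rule>,    // None for the root
    parent: Option<usize>, // index of the parent node
    children: Vec<usize>,
}

struct RuleTree {
    nodes: Vec<RuleTreeNode>,
}

impl RuleTree {
    fn new() -> Self {
        RuleTree {
            nodes: vec![RuleTreeNode { rule: None, parent: None, children: Vec::new() }],
        }
    }

    // Insert a matched-rule list, least specific first, reusing existing
    // branches where the prefix is the same. Returns the index of the node
    // for the last (most specific) rule; that is what a DOM node would store.
    fn insert(&mut self, rules_least_to_most_specific: &[Rule]) -> usize {
        let mut current = 0; // start at the root
        for rule in rules_least_to_most_specific {
            let existing = self.nodes[current]
                .children
                .iter()
                .copied()
                .find(|&child| self.nodes[child].rule.as_ref() == Some(rule));
            current = match existing {
                Some(child) => child, // reuse the branch
                None => {
                    let idx = self.nodes.len();
                    self.nodes.push(RuleTreeNode {
                        rule: Some(rule.clone()),
                        parent: Some(current),
                        children: Vec::new(),
                    });
                    self.nodes[current].children.push(idx);
                    idx
                }
            };
        }
        current
    }

    // Given the pointer a DOM node stores, walk back to the root to recover
    // the full rule list, most specific first, with no selector matching.
    fn rules_for(&self, start: usize) -> Vec<Rule> {
        let mut out = Vec::new();
        let mut current = start;
        loop {
            let node = &self.nodes[current];
            if let Some(rule) = &node.rule {
                out.push(rule.clone());
            }
            match node.parent {
                Some(parent) => current = parent,
                None => break,
            }
        }
        out
    }
}

fn main() {
    let mut tree = RuleTree::new();
    // Two elements whose matched-rule lists share a prefix share a branch.
    let warning_div = tree.insert(&[Rule("div"), Rule("div#warning")]);
    let inner_p = tree.insert(&[Rule("div"), Rule("div p")]);
    println!("{:?}", tree.rules_for(warning_div)); // [Rule("div#warning"), Rule("div")]
    println!("{:?}", tree.rules_for(inner_p));     // [Rule("div p"), Rule("div")]
}
```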

This can greatly reduce the amount of work during restyles. But it still takes a lot of work to compute styles the first time. If you have 10,000 nodes, you still need to do selector matching 10,000 times. There is another way to speed that up.

Speed up initial rendering (and cascading) with the style sharing cache

Imagine a page with thousands of nodes, many of which match the same rules: for example, a very long Wikipedia page, where the paragraphs in the main content area all end up matching the same rules and having the same computed styles. Without optimization, the CSS engine would have to do selector matching and compute the style for every paragraph individually. But if there were a way to prove that the styles are the same from paragraph to paragraph, the engine could do that work just once and point each paragraph node at the same computed style. This is called the style sharing cache, an idea inspired by Safari and Chrome. When the engine finishes processing a node, it puts the computed style into the cache. Then, before it starts computing the style of the next node, it runs a few checks to see whether it can reuse something from the cache. Those checks are:

  • Do the two nodes have the same IDs, class names, and so on? If so, they would match the same rules.

  • For anything that isn't selector-based, such as inline styles, do the nodes have the same values? If so, the rules above will either not be overridden, or will be overridden in the same way.

  • Do both nodes' parents point to the same computed style object? If so, the values they inherit will also be the same.

Those checks have been in style sharing caches since the early days. But there are still many small cases where the styles won't actually match, even though the checks suggest they do. For example, if a CSS rule uses the ‘:first-child’ selector, two paragraphs might not match even when the checks above say they should. In WebKit and Blink, the style sharing cache simply gives up and isn't used in these cases. As more sites use these modern selectors, that optimization was becoming less and less useful, so the Blink team recently removed it. It turns out there is a way to keep the style sharing cache working in the face of these changes. In Quantum CSS, we gather up all of these unusual selectors and check whether they apply to each DOM node, then store the answers as ones and zeros. If two elements have the same ones and zeros, we know they definitely match.
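
Here is a minimal sketch of a style sharing cache along those lines. The SharingKey fields mirror the checks listed above, plus a small bitfield standing in for the ones and zeros recorded for the tricky selectors; everything here is hypothetical and far simpler than Stylo's real cache.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// The key combines the checks from the list above with a bitfield of
// "tricky selector" answers (e.g. bit 0 = :first-child applies).
#[derive(Hash, PartialEq, Eq)]
struct SharingKey {
    tag: String,
    id: Option<String>,
    classes: Vec<String>,
    inline_style: Option<String>,
    parent_style: usize, // identity of the parent's computed style object
    tricky_selector_bits: u32,
}

type ComputedStyle = Rc<HashMap<String, String>>;

struct StyleSharingCache {
    cache: HashMap<SharingKey, ComputedStyle>,
    hits: usize,
    misses: usize,
}

impl StyleSharingCache {
    fn new() -> Self {
        StyleSharingCache { cache: HashMap::new(), hits: 0, misses: 0 }
    }

    // On a hit, reuse the cached computed style; on a miss, run the expensive
    // selector matching + cascade (the `compute` closure) and remember it.
    fn style(&mut self, key: SharingKey, compute: impl FnOnce() -> ComputedStyle) -> ComputedStyle {
        if let Some(style) = self.cache.get(&key) {
            self.hits += 1;
            return Rc::clone(style);
        }
        self.misses += 1;
        let style = compute();
        self.cache.insert(key, Rc::clone(&style));
        style
    }
}

fn main() {
    let parent_style: ComputedStyle = Rc::new(HashMap::new());
    let parent_style_id = Rc::as_ptr(&parent_style) as usize;
    let mut cache = StyleSharingCache::new();

    // Thousands of identical paragraphs: only the first distinct key pays
    // for the full computation, the rest share the cached result.
    for i in 0..10_000 {
        let key = SharingKey {
            tag: "p".into(),
            id: None,
            classes: Vec::new(),
            inline_style: None,
            parent_style: parent_style_id,
            tricky_selector_bits: if i == 0 { 1 } else { 0 }, // :first-child only on the first
        };
        let _style = cache.style(key, || {
            // stand-in for full selector matching + cascade
            Rc::new(HashMap::from([("color".to_string(), "black".to_string())]))
        });
    }
    println!("full computations: {}, shared from cache: {}", cache.misses, cache.hits);
}
```

Running this styles 10,000 paragraphs with only two full computations: one for the first paragraph, whose :first-child bit differs, and one shared by all the others.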

If a DOM node can share a previously computed style, you can skip almost all of the work. Because pages often have many nodes with the same styles, the style sharing cache can save memory and really speed things up.

Conclusion

This is the first big transfer of Servo technology into Firefox. Along the way, we've learned a lot about how to bring modern, high-performance code written in Rust into the core of Firefox. We're very excited to be able to bring this huge project to users first hand. We'd be happy for you to try it out, and let us know if you run into any problems.

Quantum CSS (Stylo)

Lin Clark (http://code-cartoons.com) is an engineer on the Mozilla Developer Relations team. She focuses on JavaScript, WebAssembly, Rust, and Servo, and also draws comics about coding. [@linclark](http://twitter.com/linclark) | [More articles by Lin Clark…](https://hacks.mozilla.org/author/lclarkmozilla-com/)

> Reprinted from: [zcfy.cc](http://www.zcfy.cc)
> Translator: [Mactavish](http://www.zcfy.cc/@Mactaivsh)
> Reviewer: [huangxiaolu](http://www.zcfy.cc/@huangxiaolu)
> Translation: http://www.zcfy.cc/article/4041
> Original: https://hacks.mozilla.org/2017/08/inside-a-super-fast-css-engine-quantum-css-aka-stylo

