This is a translated version of the BlinkOn9 conference talk
Presentation video/PPT
At the BlinkOn9 conference, Google Blink team developers Philip Rogers and Stefan Zager presented “Blink Rendering – Rebuilding the Engine Mid-flight,” which introduces the basic principles of Blink rendering and the team’s recent improvements to scrolling performance, compositing, and layout.
Part one: What is Rendering?
Simply put, rendering is fundamental to the browser: it parses your HTML and CSS into a DOM tree and turns it into pixels on the screen.
The main stages of the Document life cycle are shown in the figure. The four black boxes in the middle are the rendering pipeline.
I’ve always found that studying Chrome traces helps in understanding the Document life cycle. So here is a Chrome trace for a renderer process, with the main thread highlighted and a small part of the compositor thread at the bottom. Before rendering begins, we might be loading resources, running JavaScript, modifying the DOM tree, and so on, with idle phases in between for general tasks.
Next, VSync (Vertical Synchronization) occurs. VSync is the moment the browser has just pushed a full frame of pixels to the display and starts generating the next one. For the rendering process, this means everything is ready to generate new pixels.
VSync triggers BeginMainFrame, the important method that drives the rendering pipeline. BeginMainFrame first handles input events (scroll, touch, gesture, mouse, etc.) and then runs requestAnimationFrame callbacks.
The next step is to start executing the rendering pipeline, which consists of four steps as shown below:
- Style: convert the DOM tree to a Layout tree, traverse the Layout tree to annotate each node with its computed style, and pass the annotated tree to the next stage.
- Layout: traverse the Layout tree again to annotate each node with its size and position. At this point the Layout tree has been annotated twice; it is then passed on to compositing.
- Compositing Setup: determine how many compositing layers we need to draw, along with their size, position, stacking order, and so on.
- Paint: take the annotations on the Layout tree and the information recorded during compositing setup, and produce a “display list” of raw drawing commands that tell the compositor how to draw pixels.
At the end of the paint phase, work moves from the main thread to the compositor thread (the green area in the trace below), and rasterization is cut into several “tiles” and assigned to several worker threads. Once rasterization is complete, we hand off to the Chrome compositor. The process then repeats, frame after frame.
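The record-then-replay idea behind the display list can be sketched in a few lines. This is a minimal toy model, not Blink’s actual API: the command names (`drawRect`, `drawText`) and the flat tree representation are illustrative assumptions.

```javascript
// Paint does not produce pixels: it records drawing commands into a
// "display list" by walking the annotated layout tree.
function paint(layoutTree) {
  const displayList = [];
  for (const node of layoutTree) {
    displayList.push({ op: 'drawRect', x: node.x, y: node.y, w: node.w, h: node.h });
    if (node.text) {
      displayList.push({ op: 'drawText', x: node.x, y: node.y, text: node.text });
    }
  }
  return displayList;
}

// Rasterization happens later (and on other threads in Blink): it simply
// replays the recorded commands against some pixel-producing backend.
function rasterize(displayList, backend) {
  for (const cmd of displayList) backend[cmd.op](cmd);
}
```

The separation is the point: recording is cheap and happens on the main thread, while replay can be sliced into tiles and run on worker threads.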
So that’s a brief introduction to rendering. It’s worth noting that the main thread is very busy: scripts run there, rendering happens there, and much else besides, so it is very crowded. After years of optimization, we have found that one very effective approach is to split the main thread’s work and hand it to other threads.
Part two: The importance of rendering and current problems
Rendering is very important to the Web platform.
First, the nature of a dynamic web page is to take input generated by a user or a script and turn it into a visual result. Rendering is central to that process, so no matter how cool your page is, if rendering goes wrong, the user won’t have a good experience.
Second, rendering is a major determinant of web performance, both perceived and actual. Rendering cannot be interrupted, and if JavaScript runs for too long the page becomes janky, which users will certainly notice.
Third, modern web pages are dynamic — content is constantly being modified, loaded and animated. To keep up and keep interactions flowing, rendering code must be a first-class citizen.
Let’s start with the challenges we encountered in our rendering code and the improvements we’re working on to address them.
1. Scrolling
As mentioned above, rendering is a main determinant of web performance, and scrolling matters most of all. Users are very sensitive to the scrolling experience, and it shapes their perception of the page’s overall performance. If scrolling feels bad, nothing else can save the page. Blink’s scrolling code is scattered everywhere: across the renderer’s main and compositor threads, and even in the browser process.
Looking back, document scrolling first appeared in the original version of KHTML in 1998. Later, in 2003, divs became scrollable in WebKit; both kinds of scrolling required re-running the rendering pipeline. At first, the two kinds of scrolling had separate code paths, which was no big deal.
A few years later, however, as many features and optimizations were added, the scrolling code became one of the most complex and difficult parts of Blink. We still maintain the two code paths, with everything written twice. Not only that: because scrolling is core code, every new feature has to modify it, so it grows ever more complex and harder to maintain.
Given the state of the scrolling code and the fact that every change had to be written twice, life was hard for all of us as developers. So in 2014, Steve Kobes and Elliott came up with a brilliant idea: solve the problem with Root Layer Scrolling.
Their idea was to do away with document-level scrolling and use overflow scrolling for everything, primarily to reduce code complexity and improve code quality. There are other benefits too. Because the two code paths have been maintained separately for so long, they behave differently: document-level scrolling and div scrolling have completely unrelated bugs, so one path can have a bug the other doesn’t. What a mess.
Implementing root layer scrolling was a long and arduous process: it took 4 years and shipped in M66.
The first thing you need to do when making major changes to rendering code is to pass about 45,000 layout tests. The number of test failures in the chart above starts at 1,500; in fact, when we first started, about 6,000 tests failed. Each failure had to be triaged and fixed individually, and in the process we fixed a lot of long-standing bugs.
Our performance benchmarks showed a significant regression at first, about 40 to 50 percent. As we dug into the performance bugs, we found the new code sat on deeply recursive, CPU-critical paths, so we had to make CPU-related optimizations and code changes across Chromium. It was a very difficult process, and it took many separate fixes to get back to baseline performance.
So I have to reiterate that this code is genuinely hard to work on: any mistake we make is immediately visible to users and affects every page.
Let’s take a look at the improvements we’ve made to paint and compositing.
2. Paint and compositing
Like the scrolling code, the paint and compositing code is quite old — about 16 years — and it’s not easy to develop new features within the current architecture. There is now an opportunity to optimize this code’s performance, reduce its memory footprint, and make it easy to extend. So we started a comprehensive engineering project: Slimming Paint.
It’s worth starting with a technical overview: what painting is, why compositing is so valuable, and where this project fits into the bigger picture. So let’s pick up from how scrolling works, described earlier.
In the past, if we wanted to scroll a div, we had to repaint every frame. This means that if the user keeps dragging the scroll wheel, we have to regenerate all the pixels, and the user has to wait for the entire rendering pipeline to run before anything moves.
One amazing innovation is called composited scrolling, which has two parts. The first, compositing, is much like an idea from video games: paint the entire scrollable area into a graphics buffer once, and then, instead of repainting the moving area every frame, just copy a sub-rectangle of that texture at a different offset. The second innovation is to take scrolling off the main thread. Remember how precious the main thread is; the basic idea here is that the page can keep scrolling even while JavaScript is running. The combination of these two things is a remarkable innovation, and this idea of compositor-thread rendering generalizes to any effect that only needs to modify a texture.
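The first half of that idea — paint once, then move an offset per frame — can be sketched as a toy model. The class and method names below are illustrative assumptions, not Blink’s API:

```javascript
// A "scroll layer": the scrollable contents are painted into a texture
// once; each scroll frame only adjusts the layer's offset, so no repaint
// (and no main-thread rendering-pipeline run) is needed per frame.
class ScrollLayer {
  constructor(paintFn) {
    this.paintFn = paintFn;   // expensive: runs the paint step
    this.paintCount = 0;
    this.texture = null;
    this.offsetY = 0;
  }
  ensurePainted() {
    if (this.texture === null) {   // paint the whole scrollable area once
      this.texture = this.paintFn();
      this.paintCount++;
    }
  }
  scrollBy(dy) {                   // per-frame work is just an offset change
    this.ensurePainted();
    this.offsetY += dy;
  }
}
```

However many scroll frames arrive, `paintFn` runs once; the per-frame cost is a texture copy at a new offset, which is why this work can live on the compositor thread.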
For example, transform, opacity, filter, clip, and so on can all be implemented with this compositor-thread approach. In software, drawing pixels on a CPU is fast; on a GPU, it’s lightning fast.
But there is a problem called layer explosion. As shown below, if we rotate the green box on the compositor thread, it will pass through the blue box. The problem is that we need to ensure the blue box is still drawn on top of the green box, so the blue box must be composited too. This can consume quite a bit of memory: as a front-end engineer, you add transparency to the page and suddenly see memory explode, because the rest of the page gets composited as well.
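A simplified version of the overlap rule that drives layer explosion can be sketched as follows. This is a toy model under the assumption that boxes are axis-aligned rectangles in back-to-front paint order; real compositing-layer assignment in Blink is far more involved:

```javascript
// Axis-aligned rectangle overlap test.
function overlaps(a, b) {
  return a.x < b.x + b.w && b.x < a.x + a.w &&
         a.y < b.y + b.h && b.y < a.y + a.h;
}

// `boxes` is in back-to-front paint order. A box gets its own layer if it
// is animated on the compositor, OR if it overlaps an already-composited
// box below it -- otherwise paint order would be violated. This second
// rule is what makes one animated element drag its neighbours into layers.
function assignLayers(boxes) {
  const promoted = [];
  for (const box of boxes) {
    const mustPromote = box.animated || promoted.some(p => overlaps(p, box));
    if (mustPromote) promoted.push(box);
  }
  return promoted;
}
```

One animated green box promotes every overlapping box painted above it, which is exactly how a single transparency or transform can balloon memory use.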
Next, let’s walk through the current compositor architecture to see how the compositor works and what effect Slimming Paint has.
We have a simple DOM tree, and the div with the emoji smiley face is scrollable. Its life cycle is as described above: in the layout phase we annotate the Layout tree with sizes and positions, and then comes the compositing phase, which we will focus on.
A, B, and D are not scrollable, so all three can be painted together into the same graphics buffer. The emoji smiley face is scrollable, and we don’t want to repaint every frame while scrolling, so we put it in a separate graphics buffer. Now that we have two graphics buffers, it’s time to paint.
During the paint phase, we actually walk the Layout tree, recording drawing commands. Then comes rasterization.
At this point we execute the drawing commands recorded during paint to generate the real pixels.
Finally, we composite them together on the page, so scrolling the emoji up and down won’t trigger a repaint.
Under the current architecture, there are two problems. The first is that compositing is limited to specific subtrees. The Layout tree has a property that determines whether a subtree can be composited, and not all subtrees have it, so we can’t arbitrarily turn any div on the page into a graphics buffer. This leads to a fundamental compositing bug, first discovered in 2014.
When we tried to composite iframes anywhere to improve scrolling, content on the page disappeared instantly, because if you composite an iframe, you must also ensure that anything drawn on top of it is composited. This was a devastating discovery in 2014: we had already built special logic to avoid creating too many graphics buffers, and then, late in the game, found a fundamental defect that tied our hands. And this is not some obscure edge case — Gmail hit this problem when scrolling optimizations failed to kick in. It prevented us from continuing to build on the current architecture.
The second problem with the current compositing architecture is that compositing setup happens before paint. We create graphics buffers very early, and then have to recompute things during the paint step, so the logic is duplicated. It’s hard to convey how complex that logic is, but roughly half of the paint code exists to handle these sizes and effects, such as clips.
Besides happening before paint, compositing setup also runs on the main thread, which means any effect that might change the size of what is drawn has to go back to the main thread. For example, if you have two compositable boxes and one of them is scrollable, in many cases you have to assume the worst: the compositor could be operating anywhere on the page, so you end up creating graphics buffers for many things on the page. That is the layer explosion problem discussed earlier, and it causes real performance problems.
The Slimming Paint project changes both of these things in the overall architecture. It changes the granularity at which we choose what to composite, so that any effect can be turned into a graphics buffer, and it moves compositing setup to after paint. This not only fixes the fundamental compositing bugs, it also removes the duplicated logic.
As a result, the new compositing architecture can composite at any boundary, and moving compositing setup relieves pressure on the main thread. This lets us make precise compositing decisions about overlapping content, and we can do things like resize drawn objects off the main thread.
As a milestone in this project, we shipped Slimming Paint v1.75 in M67. At the end of this year (2018), we will ship v2, which moves compositing setup to after paint.
3. Layout
There are two main problems with layout. The first is a Web platform problem, which we call the combinatorial problem. We have a large number of Web standards and are constantly adding new ones while the old ones remain, and every new CSS feature creates a new set of interactions with every existing one. The ways they combine can be strange, and with that come many edge cases. Let’s take flexbox as an example:
Here are three very simple flex item boxes. Let’s add a few properties and see what happens to the layout.
Setting direction: rtl changes the layout direction to right-to-left.
Adding flex-direction: row-reverse flips the layout back to left-to-right.
Removing the direction property, the items are again arranged right-to-left.
Setting flex-direction: column-reverse arranges the layout in a column.
Setting writing-mode together with flex-direction changes the direction of the text.
Reversing flex-direction still matches expectations.
Changing flex-direction to column: same thing. The fact that all of the above match expectations is only because I spent three weeks fixing various bugs.
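The first few steps above can be captured in a tiny resolver to show why the combinations are confusing. This is a deliberately simplified sketch covering only horizontal main axes (it ignores writing-mode), and the function name is hypothetical:

```javascript
// Resolves the physical ordering of flex items along a horizontal main
// axis, given `direction` (ltr | rtl) and `flex-direction` (row |
// row-reverse). Each property independently flips the result, which is
// why stacking them is so easy to get wrong.
function mainAxisOrder(direction, flexDirection) {
  let reversed = direction === 'rtl';
  if (flexDirection === 'row-reverse') reversed = !reversed;
  return reversed ? 'right-to-left' : 'left-to-right';
}
```

Note how rtl plus row-reverse cancels back to left-to-right, matching the walkthrough above; adding writing-mode would introduce yet another axis of flipping on top of this.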
Other browsers with other engines don’t necessarily behave the same, as shown above. The first image shows the flexbox example above in Chromium; the second browser in the first row is almost identical, while the third and fourth are far off.
I don’t want to diss other browsers — Chromium is probably the worst offender here. I want to emphasize that compatibility issues really do exist, and complex CSS features keep piling up.
The second problem is that the layout-related code in Blink is ancient: a monolith of unencapsulated, non-reentrant, non-thread-safe spaghetti.
Here we have a Layout tree whose nodes are Layout objects. Suppose we change the CSS on an element near the bottom of the tree. That element is now dirty and needs layout again, so the next thing we do is mark its entire ancestor chain. When we perform the Layout phase, we always start at the top of the tree and work our way down. We have a bunch of optimizations, but we can’t skip many steps.
We still do a full tree traversal, which is expensive, every time we run Layout. The bottom node might be in a fixed-size box; it might even use CSS containment, a new feature that works as a sort of contract with the browser: the subtree doesn’t affect anything outside itself, and nothing outside can affect it.
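The dirty-marking scheme just described can be sketched as a toy model. The node shape and function names here are illustrative, not Blink’s actual classes:

```javascript
// A toy version of Blink's dirty-bit scheme: changing a node marks it and
// its whole ancestor chain, and layout always starts from the root and
// walks down, descending only into subtrees with a dirty bit set.
function makeNode(name, parent = null) {
  const node = { name, parent, children: [], needsLayout: false, childNeedsLayout: false };
  if (parent) parent.children.push(node);
  return node;
}

function markNeedsLayout(node) {
  node.needsLayout = true;
  for (let n = node.parent; n; n = n.parent) n.childNeedsLayout = true;
}

function layout(node, visited) {
  visited.push(node.name);   // every layout pass re-walks from the root down
  node.needsLayout = node.childNeedsLayout = false;
  for (const child of node.children) {
    if (child.needsLayout || child.childNeedsLayout) layout(child, visited);
  }
}
```

Even a change to one leaf forces a walk down from the root through the whole ancestor chain, which is the per-layout cost the text is complaining about.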
It would be nice if, when laying out such a subtree, we had all the information we needed, without looking outside the subtree to determine sizes and positions. In reality, though, the layout code constantly reaches outside for other information.
At this node in the diagram, can we jump to another part of the tree for some reason? No — that would be a destructive operation.
As for thread safety: remember the rendering pipeline from earlier? We walk the Layout tree, annotate it, and pass it to the paint phase. When we’re ready to generate the next frame, we start from the layout tree we used last time and update whatever has changed. Nothing here is thread-safe, yet multiple threads can modify it.
There are solutions to both problems. The solution to the combinatorial problem is CSS custom layout, part of Houdini: you set a specific CSS property on an element and define a JavaScript function responsible for laying out that element and its subtree. During normal layout, we pause and call that JavaScript function, passing it the information needed to lay out the element, which the function consumes. I won’t go into detail about Houdini; you can research it yourself.
The solution to the second problem is LayoutNG, essentially a complete rethink of how layout is done. LayoutNG has two key features. First, it uses constraint-based layout: when we hand it a subtree to lay out, we pass in all the information needed to lay out that subtree, and it never looks outside the subtree. This isn’t easy, but by forcing encapsulation we make it much easier for the underlying layout code to implement the CSS custom layout mentioned above. Second, the layout tree is immutable: instead of annotating the input tree, each layout produces a new tree, copying the old one and swapping in new subtrees for the parts that changed, so we end up with a brand-new layout tree every time.
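Both LayoutNG properties — constraint-based input and immutable output — can be sketched together. This is a drastically simplified block-stacking model under assumed names (`layoutNG`, a `constraints` object with only `availableWidth`), nothing like the real LayoutNG API:

```javascript
// Layout as a pure function: (node, constraint space) -> immutable
// fragment. The input tree is never mutated; a child sees only the
// constraints passed in, never anything outside its own subtree.
function layoutNG(node, constraints) {
  let y = 0;
  const children = (node.children || []).map(child => {
    const fragment = layoutNG(child, { availableWidth: constraints.availableWidth });
    const placed = { ...fragment, y };   // position the child fragment
    y += fragment.height;                // simple vertical block stacking
    return placed;
  });
  return Object.freeze({                 // fragments are immutable
    name: node.name,
    width: constraints.availableWidth,
    height: node.height ?? y,            // leaves have a fixed height
    y: 0,
    children,
  });
}
```

Because the output is a frozen value derived only from the inputs, re-layout can safely replace just the subtrees that changed, which is what enables the optimizations the article mentions.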
Implementing these two features will enable a variety of powerful layout optimizations. The project is in its early stages, with phase one expected to ship late this year or early next year.