First, a quick review of the last article: we introduced the concept of content; we saw that the final result we want is pixels on the screen; we saw what the browser is aiming for (transforming HTML/CSS/JS into the correct OpenGL calls that set pixel colors); and we covered Style, Layout, and Paint, where all drawing operations are recorded in a display item list. With that in mind, it's time to start this article.

Material: docs.google.com/presentatio…

Before you read

Because the two slide decks on the browser rendering mechanism that this series is based on do not present things in strict rendering order, these articles are not written in complete rendering order either, so reading them may feel somewhat scattered. In a subsequent revision I will reorganize the material into a single clear and easy-to-understand article.

Raster (rasterization)

Drawing the display item list is performed by a process called rasterization. Rasterization converts a display item list into a bitmap of color values: each cell of the resulting bitmap holds an encoded color value with transparency (for example, FFFFFFFF is a hexadecimal RGBA value).

The rasterization process also decodes the image resources embedded in the page. Drawing operations reference the compressed data (JPEG, PNG, etc.), and rasterization calls the appropriate decoder to decompress it.
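
To make the idea concrete, here is a minimal sketch (not Skia's real API; the item format and function names are invented for illustration) of playing back a display item list into an RGBA bitmap, where each cell holds a 32-bit value such as 0xFFFFFFFF:

```python
# Rasterization sketch: a display item list is played back, item by item,
# into a bitmap of RGBA color values. Each cell holds one 32-bit value
# in 0xRRGGBBAA form, e.g. 0xFFFFFFFF is opaque white.

WIDTH, HEIGHT = 8, 8

def rasterize(display_items):
    """Play back a display item list into a WIDTH x HEIGHT bitmap."""
    bitmap = [[0x00000000] * WIDTH for _ in range(HEIGHT)]  # transparent
    for op, args in display_items:
        if op == "fill_rect":
            x, y, w, h, rgba = args
            for row in range(y, min(y + h, HEIGHT)):
                for col in range(x, min(x + w, WIDTH)):
                    bitmap[row][col] = rgba
    return bitmap

display_list = [
    ("fill_rect", (0, 0, 8, 8, 0xFFFFFFFF)),  # white background
    ("fill_rect", (2, 2, 3, 3, 0xFF0000FF)),  # opaque red square
]
bitmap = rasterize(display_list)
print(f"{bitmap[3][3]:08X}")  # → FF0000FF
```

Real rasterizers work per-pixel with anti-aliasing, blending, and decoded image data, but the shape of the operation is the same: a list of recorded drawing commands in, a grid of color values out.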

  • In the past, the GPU was used only as memory, referenced by OpenGL texture objects. We would rasterize bitmaps into main memory and then upload them to the GPU to relieve memory pressure.
  • Today's GPUs can also execute the commands that generate a bitmap ("accelerated rasterization"), which is hardware acceleration. But whether rasterization is done in hardware or software, the essence is the same: generating a bitmap of pixels in some kind of memory.

At this point only the bitmap is stored in memory; the pixels are not yet displayed on the screen. GPU rasterization does not call the GPU directly, but goes through the Skia graphics library (a 2D graphics library maintained by Google and used in Android, Flutter, and Chromium).

Skia provides a level of abstraction for more complex things such as Bézier curves. Skia is open source and is compiled into the Chrome binary rather than living in a separate code base. When a display item needs to be rasterized, it first calls methods on SkCanvas, which is the entry point to Skia; SkCanvas provides further abstraction inside Skia. With hardware acceleration, Skia builds another drawing buffer and then flushes it: at the end of the rasterization task, the flush produces the real GL instructions, which run in the GPU process.

Skia and the GL instructions can run in different processes or in the same process, resulting in two ways of calling:

  1. In Process Raster
  2. Out of Process Raster

1.In Process Raster

The old architecture worked this way: Skia executed in the renderer process, where it produced GL call instructions, but the GPU lived in a separate GPU process, so Skia could not call the graphics system directly in this mode. When Skia is initialized, it is given a table of function pointers. These point at a GL API, but not the actual OpenGL API: they are Chromium proxies. GpuChannelMsg_FlushCommandBuffers below is a command buffer mechanism that converts calls made through the function pointer table into real OpenGL API calls.
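
The proxy idea can be sketched as follows (class and function names here are invented for illustration, not Chromium's real types): calls into the "GL" function pointer table only record commands into a buffer, and a flush replays them on the GPU side:

```python
# Command-buffer proxy sketch: Skia is handed a table of "GL" function
# pointers that do not touch the driver. Each call is serialized into a
# command buffer; flushing replays the buffer on the GPU side, where the
# real API is finally invoked.

class CommandBuffer:
    def __init__(self):
        self.commands = []

    def record(self, name, *args):
        self.commands.append((name, args))

    def flush(self, gpu):
        """Replay every recorded command on the GPU side, then clear."""
        executed = [gpu.execute(name, args) for name, args in self.commands]
        self.commands.clear()
        return executed

class FakeGpuProcess:
    def execute(self, name, args):
        return f"GL::{name}{args}"  # stands in for the real driver call

buf = CommandBuffer()
# The "function pointer table" given to Skia: proxies that only record.
gl_table = {
    "glClear": lambda *a: buf.record("glClear", *a),
    "glDrawArrays": lambda *a: buf.record("glDrawArrays", *a),
}

gl_table["glClear"](0x4000)
gl_table["glDrawArrays"]("TRIANGLES", 0, 3)
result = buf.flush(FakeGpuProcess())
print(result[0])  # → GL::glClear(16384,)
```

The benefit of this indirection is that recording is cheap and can happen in a sandboxed process, while the replay (and the only code allowed to talk to the driver) stays in the GPU process.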

A separate GPU process helps isolate GL operations and improves stability and security. This mode also acts as a sandbox mechanism (unsafe operations are executed in a separate process).

(Figure GLES2 back-end mapping to desktop OpenGL 2.1)

2.Out of Process Raster

The new version puts all the drawing operations in the GPU process. Running Skia in the GPU process improves performance.

Rasterized drawing operations are wrapped in the GPU command buffer and sent over an IPC (inter-process communication) channel.

The next step is to execute the GL instructions, which are usually provided by an underlying shared library; on Windows they are translated into DirectX (Microsoft's graphics API for hardware acceleration).

The problem

Now we've gone from content to pixels in memory, but rendering a page is not a static process (page scrolling, JS execution, animation, etc.), and re-running the entire pipeline on every change is expensive.

So how can we improve performance?

Compositing Update

After the Layout operation completes, Paint is supposed to follow. However, painting everything directly is very expensive, so we introduce the concept of layer composition acceleration.

So what is layer composition acceleration?

Layer composition acceleration divides the whole page into multiple layers according to certain rules. During rendering, only the layers that changed need to be repainted; the other layers merely participate in composition, which improves rendering efficiency. The compositor thread can also handle input events such as scrolling, but if an event listener is registered in JS, it forwards the input event to the main thread.

The main thread splits the page into layers that can be rasterized independently, and these layers are merged in another thread (the compositor thread).

This allows certain RenderLayers to have their own caches, called compositing layers, for which the kernel creates a GraphicsLayer.

  • A RenderLayer that has its own GraphicsLayer is drawn into its own cache.
  • A RenderLayer without its own GraphicsLayer looks up through its parents until it finds one with a GraphicsLayer (RootRenderLayer always has one), and is then drawn into that ancestor's GraphicsLayer cache.

This forms a GraphicsLayer tree corresponding to the RenderLayer tree. When the content of a layer changes, only the corresponding GraphicsLayer needs to be updated, whereas in a single-cache architecture the entire cache would be updated, which is time-consuming. This improves rendering efficiency. However, too many GraphicsLayers also cost memory: although unnecessary drawing is reduced, overall rendering performance may drop because of memory pressure. Accelerated layer composition is therefore a dynamic balance.
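
The ancestor-lookup rule above can be sketched in a few lines (class and field names are illustrative, not Blink's real ones):

```python
# Sketch of the compositing rule: a RenderLayer with no GraphicsLayer of
# its own walks up its parent chain until it finds a layer that has one
# (the root always does), and paints into that layer's cache.

class RenderLayer:
    def __init__(self, name, parent=None, has_graphics_layer=False):
        self.name = name
        self.parent = parent
        self.has_graphics_layer = has_graphics_layer

    def compositing_container(self):
        """Return the layer whose GraphicsLayer cache this layer paints into."""
        layer = self
        while not layer.has_graphics_layer:
            layer = layer.parent  # safe: the root always has a GraphicsLayer
        return layer

root = RenderLayer("root", has_graphics_layer=True)
video = RenderLayer("video", parent=root, has_graphics_layer=True)
caption = RenderLayer("caption", parent=video)  # no own GraphicsLayer
plain = RenderLayer("plain", parent=root)       # no own GraphicsLayer

print(caption.compositing_container().name)  # → video
print(plain.compositing_container().name)    # → root
```

When "video" animates, only its GraphicsLayer cache (including "caption", which shares it) needs updating; "plain" and the root cache are untouched.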

Layering decisions are currently made by Blink, which generates a layer tree from the DOM tree and records the contents of each layer in a DisplayList.

With compositing acceleration in mind, let's look at the Compositing Update that occurs after the Layout operation. The compositing update is the process of creating GraphicsLayers for specific RenderLayers, as follows:

Prepaint

Property trees: previously, hierarchical properties were described with the layer tree itself. If a parent layer had a matrix transform (translation, scaling, or perspective), clipping, or an effect (a filter, etc.), it had to be applied recursively to the child nodes, which is O(number of layers) and can become a performance problem in extreme cases.

Therefore, to improve performance, the concept of property trees was introduced: the compositor provides a transform tree, a clip tree, an effect tree, and so on. Each layer holds a number of node ids, pointing at the matrix transform node, clip node, and effect node in the different property trees. The time complexity becomes O(nodes to be changed), as follows:

The prepaint process is essentially the process of building the property trees.
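
A minimal sketch of the property tree idea (the node layout and names are illustrative, not the compositor's real types): layers reference transform nodes by id, so scrolling a container touches one node rather than every descendant layer:

```python
# Property tree sketch: instead of storing a transform on every layer and
# recursing through the layer tree, layers reference nodes of a shared
# transform tree by id. A layer's final offset is resolved by walking the
# short chain of nodes from its node up to the root.

# transform tree: node id -> (parent node id, translation offset)
transform_tree = {
    0: (None, (0, 0)),   # root
    1: (0, (10, 0)),     # e.g. a scrollable container
    2: (1, (0, 5)),      # a child element inside the container
}

# each layer records only the id of the transform node it uses
layers = {"header": 1, "paragraph": 2}

def resolve(node_id):
    """Accumulate offsets from this node up to the root."""
    x, y = 0, 0
    while node_id is not None:
        parent, (dx, dy) = transform_tree[node_id]
        x, y = x + dx, y + dy
        node_id = parent
    return (x, y)

print(resolve(layers["paragraph"]))  # → (10, 5)

# Scrolling the container changes ONE node, not every descendant layer:
transform_tree[1] = (0, (10, -20))
print(resolve(layers["paragraph"]))  # → (10, -15)
```

The update cost is proportional to the nodes changed, while resolution remains a cheap walk over a chain that is typically much shorter than the layer tree is wide.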

Commit

As mentioned earlier, we had two more things to do before the Paint phase: split the page into layers, and build the property trees. What does the Paint phase do? The Paint phase records the drawing operations into display items.

Next comes the Commit phase, which updates the copies of the layers and property trees on the compositor thread.

Tiling

When the compositor thread receives the data, it does not immediately start compositing. Instead, it cuts the layers into tiles, using a technique called tile rendering.

Why tile rendering?

  • GPU composition is usually implemented with OpenGL ES textures, where the cache is actually a GL texture. Many GPUs have limits on texture size: for example, the width and height may have to be a power of 2 and must not exceed 2048 or 4096, so caches of arbitrary size are not supported.
  • Tile caching also makes it easier for the browser to manage caches with a unified buffer pool. The buffer pool is shared by all WebViews: when a page is opened it requests cache from the pool, and when a page is closed its cache is reclaimed.

Tiles are also the basic unit of rasterization. Rasterization is prioritized by a tile's distance from the visible viewport: closer tiles are rasterized first, and farther tiles get lower raster priority. These tiles are stitched together to form a layer, like this:
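
The tiling-plus-priority step can be sketched like this (the tile size and the distance metric are illustrative; real implementations use platform-dependent tile sizes and richer priority heuristics):

```python
# Tiling sketch: cut a layer into fixed-size tiles, then raster the tiles
# in order of their distance from the visible viewport, nearest first.

TILE = 256  # a common tile size; real sizes vary by platform

def tiles_for_layer(layer_w, layer_h):
    """Cut a layer into TILE x TILE blocks, returned as (x, y) origins."""
    return [(x, y)
            for y in range(0, layer_h, TILE)
            for x in range(0, layer_w, TILE)]

def raster_order(tiles, viewport):
    """Sort tiles by Manhattan distance from the viewport's top-left."""
    vx, vy = viewport
    return sorted(tiles, key=lambda t: abs(t[0] - vx) + abs(t[1] - vy))

tiles = tiles_for_layer(512, 1024)            # 2 x 4 = 8 tiles
order = raster_order(tiles, viewport=(0, 256))
print(order[0])  # → (0, 256): the tile under the viewport rasters first
```

Tiles far below the viewport end up at the tail of the queue, which is exactly the "downgraded priority" behavior described above: they still get rastered, just later, so scrolling toward them usually finds them ready.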

Activate

After Commit, there is an Activate operation before Draw. Both Raster and Draw operate on the layer tree in the compositor thread, but Raster is asynchronous, so Draw could otherwise run before Raster has completed. This problem needs to be resolved.

To solve it, the LayerTree is divided into:

  • PendingTree: receives the Commit and rasterizes the layers.
  • ActiveTree: the rasterized layers are drawn from here.

The main thread's layer tree is owned by LayerTreeHost, and each layer recursively holds its child layers. The Pending, Active, and Recycle trees are all instances owned by LayerTreeHostImpl. These trees are defined under the cc/trees directory. They are called trees because they were originally implemented as tree structures; today they are implemented as lists.
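
The pending/active scheme can be sketched as follows (the class and method names mirror the concepts above but are simplified, not cc's real implementation):

```python
# Pending/Active/Recycle sketch: commits and raster target the pending
# tree; only when raster is done is the pending tree activated (swapped
# into place), so Draw never sees a half-rastered tree.

class LayerTreeHostImpl:
    def __init__(self):
        self.pending_tree = None
        self.active_tree = None
        self.recycle_tree = None

    def commit(self, layer_names):
        # The commit from the main thread targets the pending tree.
        self.pending_tree = {name: "painted" for name in layer_names}

    def raster(self):
        for name in self.pending_tree:
            self.pending_tree[name] = "rastered"

    def activate(self):
        # Pending becomes active; the old pending tree is kept around as
        # the recycle tree so layer objects need not be recreated.
        self.active_tree = dict(self.pending_tree)
        self.recycle_tree = self.pending_tree
        self.pending_tree = None

host = LayerTreeHostImpl()
host.commit(["background", "scroller"])
host.raster()                       # asynchronous in the real pipeline
host.activate()                     # only now does Draw see the new tree
print(host.active_tree["scroller"])  # → rastered
```

Draw only ever reads the active tree, so however long raster takes, the frame being drawn is always fully rastered.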

Draw

OK, now to the Draw step. After each tile has been rasterized, the compositor thread generates draw quads for each tile. These draw quads are wrapped in a CompositorFrame object, which is the output of the renderer process and is submitted to the GPU process. Each frame of the 60 fps output is actually a CompositorFrame.

Draw is the process of converting rasterized tiles into draw quads.
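
As a rough sketch (the field names below are illustrative, not viz's real structs), each rastered tile becomes a textured rectangle, and the quads are packaged into a frame object for submission:

```python
# Draw sketch: each rastered tile turns into a "draw quad" (a textured
# rectangle), and the quads are bundled into a compositor-frame-like
# object to be submitted to the GPU process.

def draw(rastered_tiles):
    """Turn rastered tiles into quads wrapped in a frame object."""
    quads = [{"rect": (x, y, 256, 256), "texture": tex}
             for (x, y), tex in rastered_tiles.items()]
    return {"render_pass": quads, "metadata": {"device_scale": 1.0}}

tiles = {(0, 0): "tex_a", (256, 0): "tex_b"}
frame = draw(tiles)
print(len(frame["render_pass"]))  # → 2
```

Note that no pixels are produced here: Draw only references the textures raster already filled in, which is why it is cheap compared to raster itself.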

Display Compositor

Display

Once the Draw operation is complete, the generated CompositorFrame is output to the GPU process, which receives CompositorFrames from multiple sources.

Multiple sources:

  • The browser process also has its own compositor, which generates CompositorFrames used to draw the browser UI (navigation bar, windows, etc.).
  • Each time a tab is created, or an iframe is used, a separate renderer process is created.

The Display Compositor runs on the Viz compositor thread. Viz calls OpenGL to render the draw quads in the CompositorFrame, which outputs the pixels to the screen.

Viz is also double-buffered: it draws the draw quads into the back buffer and then executes a swap command to get them onto the screen.

Double buffering mechanism:

During rendering, if a single buffer were used for both reading and writing, the screen would have to wait to read and the GPU would have to wait to write, resulting in poor performance. A natural idea is to separate reading and writing:

  • Front Buffer: The screen reads frame data from the Front Buffer for output display.
  • Back Buffer: The GPU is responsible for writing frame data to the Back Buffer.

The two buffers never copy data between each other (for performance). Instead, once the back buffer has been fully written and the front buffer has been fully read, the pointers are simply exchanged: the front becomes the back, and the back becomes the front. So when should the exchange happen? If the back buffer is ready but the screen has not yet finished with the front buffer, swapping would cause problems, so obviously we have to wait for the screen to finish. After the screen finishes a scan-out, the device returns to the first line to start a new refresh, and between those two moments there is a Vertical Blank Interval in which the swap takes place. This mechanism is known as VSync.
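
The swap-by-pointer idea is small enough to sketch directly (a simplified model; a real swap is driven by the display hardware at VSync):

```python
# Double buffering sketch: the GPU writes only the back buffer, the
# screen scans out only the front buffer, and at VSync the two are
# swapped by pointer, never copied.

class DoubleBuffer:
    def __init__(self):
        self.front = ["frame 0"]  # what the screen is scanning out
        self.back = ["frame 0"]   # what the GPU is writing into

    def render(self, frame):
        self.back[0] = frame      # GPU touches only the back buffer

    def vsync(self):
        # Swap pointers during the vertical blank interval: no data copy.
        self.front, self.back = self.back, self.front

buffers = DoubleBuffer()
buffers.render("frame 1")
print(buffers.front[0])  # → frame 0 (screen still shows the old frame)
buffers.vsync()
print(buffers.front[0])  # → frame 1
```

Because the swap is just an exchange of references, it costs essentially nothing, which is why it can safely be squeezed into the short vertical blank interval.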

Vsync will also be covered later.

Viz

What does Viz do?

In Chromium, the core logic of Viz runs in the GPU process. It is responsible for receiving viz::CompositorFrame objects (CF for short) produced by other processes (such as the renderer processes), compositing them, and rendering the result to the window.

What is CF?

A CF object represents one frame of a rectangular display area. It stores three kinds of data: CompositorFrameMetadata, TransferableResource, and RenderPass/DrawQuad, as shown in the figure below:

  • CompositorFrameMetadata: CF metadata, such as the zoom level, scroll area, etc.
  • TransferableResource: the resources referenced by the CF.
  • RenderPass/DrawQuad: the draw operations contained in the CF. A viz::RenderPass is composed of a series of related viz::DrawQuads.

CF is the core data structure in Viz. It represents one frame of UI in a certain area and uses DrawQuads to store what the UI will display. It represents the data flow at Viz runtime.

CF composition

CF composition means the Viz thread combining the contents of one or more CFs to form a complete picture.

CF rendering

The rendering of a CF is mainly done by viz::DirectRenderer and viz::OutputSurface, which render the composited result to the target selected by the program.

CC

Now let's review the functions and workflow of CC as a whole.

Look at the picture above: Blink has DOM, Style, Layout, Compositing Update, Prepaint, and Paint.

We can see that Paint is the bridge between Blink and CC.

The overall process can actually be understood as:

CC: Layer -> Commit -> (Tiling ->) Raster -> Activate -> Draw -> Submit

After the Paint phase, CC performs a series of operations and finally submits the result (a CF) to Viz in the Draw phase. In other words, Blink is responsible for painting the web content, and CC is responsible for compositing the result and handing it to Viz.

CC architecture design

The design of CC is relatively simple: we can understand it as an asynchronous pipeline with multi-threaded scheduling. The CC running in the browser process is responsible for compositing the non-web parts of the browser UI, and the CC running in the renderer process is responsible for compositing the web page.

At chromium.googlesource.com/chromium/sr… the official How cc Works document has a picture that, I think, reflects the core logic of CC well.

CC runs its phases on different threads: Paint runs on the main thread; Commit, Activate, and Submit run on the compositor thread; and Raster runs on dedicated raster threads.

Next, we begin to analyze the various stages of the CC pipeline.

Paint

The Paint phase produces CC's data source: the cc::Layer tree. A cc::Layer represents the UI of a rectangular area. It has many subclasses that store different types of UI data:

  • cc::PictureLayer: used to implement self-painting UI components such as buttons and labels. External code implements the cc::ContentLayerClient interface to provide a cc::DisplayItemList object, which represents a list of drawing operations (draw a line, a rectangle, a circle, etc.); through the cc::PaintCanvas interface it is easy to create complex self-painted UIs. cc::PictureLayer is the only cc::Layer that needs Raster. It passes through the CC pipeline and is converted into one or more viz::TileDrawQuads stored in the viz::CompositorFrame.
  • cc::TextureLayer: corresponds to viz::TextureDrawQuad. Any UI component that wants to raster with its own logic can use this layer, such as Flash plug-ins, WebGL, etc.
  • cc::SurfaceLayer: corresponds to viz::SurfaceDrawQuad and is used to embed other CompositorFrames. Iframes and the video player in Blink can be implemented with this layer.
  • cc::SolidColorLayer: used for UI components that display a solid color.
  • cc::VideoLayer: formerly used exclusively to display video; it has been replaced by SurfaceLayer.
  • cc::UIResourceLayer/cc::NinePatchLayer: similar to TextureLayer, used for software rendering.

Blink uses the cc::Layers above to describe the UI and connect to CC. Since a cc::Layer can itself hold child cc::Layers, a given layer object actually represents a cc::Layer tree. This is the main-thread layer tree, because it lives in the main thread, and the main thread has one and only one cc::Layer tree.

Commit

The core purpose of the Commit phase is to commit the data stored in cc::Layer into cc::LayerImpl. cc::LayerImpl corresponds to cc::Layer but runs on the compositor thread (also called the impl thread). Tiling tasks are created as needed after the Commit, and these tasks are posted to the raster threads for execution.

Tiling+Raster

The tile tasks (cc::RasterTaskImpl) created in the Commit phase are executed during this phase. The most important job of the Tiling stage is to split a cc::PictureLayerImpl into multiple cc::TileTask tasks with different scales and sizes. The Raster phase executes each TileTask, playing the DisplayItemList back into a viz resource. Because Raster is time-consuming and a performance-sensitive path in rendering, Chromium implements a variety of strategies here to adapt to different situations. These strategies differ in two areas: where Raster results (i.e., resources) are stored, and how Raster playback is performed. They are encapsulated in subclasses of cc::RasterBufferProvider, described below:

  • cc::GpuRasterBufferProvider: rasters with the GPU, and the Raster results are stored directly in a SharedImage. (This is the hardware acceleration mentioned earlier.)
  • cc::OneCopyRasterBufferProvider: rasters with Skia, saving the results to a GpuMemoryBuffer, then copies the data in the GpuMemoryBuffer to the resource's SharedImage via CopySubTexture. GpuMemoryBuffer has different implementations on different platforms, and not all platforms support it: on Linux the underlying implementation is a Native Pixmap (a concept from X11), on Windows it is DXGI, on Android it is AHardwareBuffer, and on Mac it is IOSurface.
  • cc::ZeroCopyRasterBufferProvider: rasters with Skia, saving the results to a GpuMemoryBuffer, then creates a SharedImage directly from the GpuMemoryBuffer.
  • cc::BitmapRasterBufferProvider: rasters with Skia and saves the results to shared memory.

Raster eventually produces resources that are recorded in cc::PictureLayerImpl; they are placed into the CF during the Draw phase.

Activate

There are three cc::LayerImpl trees on the impl side: the Pending, Active, and Recycle trees. The target of the Commit phase is the Pending tree, and Raster results are stored in the Pending tree.

During the Activate phase, all cc::LayerImpls in the Pending tree are copied to the Active tree. To avoid frequently creating cc::LayerImpl objects, the Pending tree is not destroyed but is demoted to the Recycle tree.

Unlike the main-thread cc::Layer tree, the cc::LayerImpl tree does not maintain itself. Instead, a cc::LayerTreeImpl object maintains each cc::LayerImpl tree; the three impl trees correspond to three cc::LayerTreeImpl objects.

Draw

The Draw phase does not perform the actual drawing. Instead, it traverses the cc::LayerImpl objects in the Active tree and calls each one's cc::LayerImpl::AppendQuads method to create suitable viz::DrawQuads in the CompositorFrame's RenderPass. The resources held by the cc::LayerImpls are wrapped as viz::TransferableResources and put into the CompositorFrame's resource list. At this point a viz::CompositorFrame object has been created. Finally, the CompositorFrame is sent through the cc::LayerTreeFrameSink interface to the Viz process (the GPU process) for rendering.

Conclusion

Now the rendering process is almost complete, and the front-end code has become pixels on the screen.