The hardware-rendering RenderThread also listens for Vsync signals. In fact, to render more quickly, the RenderThread registers for Vsync natively and runs a loop similar to the one in ViewRootImpl.

Let's pick up where the last article left off. When the Vsync signal arrives, the following method is eventually executed:

void RenderThread::dispatchFrameCallbacks() {
    ATRACE_CALL();
    mFrameCallbackTaskPending = false;
    std::set<IFrameCallback*> callbacks;
    mFrameCallbacks.swap(callbacks);
    if (callbacks.size()) {
        requestVsync();
        for (std::set<IFrameCallback*>::iterator it = callbacks.begin(); it != callbacks.end(); it++) {
            (*it)->doFrame();
        }
    }
}
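As a side note, the Java-level counterpart of this Vsync hook that application code can use is the Choreographer; the RenderThread's IFrameCallback is the native analogue of the same idea. A minimal sketch using only public SDK calls (the helper class and log tag are made up for illustration):

import android.util.Log;
import android.view.Choreographer;

public final class VsyncLogger {
    // Ask the Choreographer for the next Vsync-aligned frame callback on the
    // main thread and log its timestamp.
    public static void logNextFrame() {
        Choreographer.getInstance().postFrameCallback(frameTimeNanos ->
                Log.d("VsyncLogger", "frame at " + frameTimeNanos + " ns"));
    }
}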

So when was the IFrameCallback object registered? There is one method that I skipped over in the earlier articles on View drawing.

void scheduleTraversals() {
    if (!mTraversalScheduled) {
        mTraversalScheduled = true;
        mTraversalBarrier = mHandler.getLooper().getQueue().postSyncBarrier();
        mChoreographer.postCallback(
                Choreographer.CALLBACK_TRAVERSAL, mTraversalRunnable, null);
        ...
        notifyRendererOfFramePending();
        pokeDrawLockIfNeeded();
    }
}

After mChoreographer posts the CALLBACK_TRAVERSAL callback to the Looper's handler to schedule the next round of the View rendering process, scheduleTraversals also calls the notifyRendererOfFramePending method:

    void notifyRendererOfFramePending() {
        if (mAttachInfo.mThreadedRenderer != null) {
            mAttachInfo.mThreadedRenderer.notifyFramePending();
        }
    }

This method essentially has ThreadedRenderer reach, via RenderProxy, CanvasContext::notifyFramePending, which pushes the CanvasContext onto the RenderThread's frame-callback list:

void CanvasContext::notifyFramePending() {
    mRenderThread.pushBackFrameCallback(this);
}

At this point, the CanvasContext has actually been registered as the callback. In other words, every time the RenderThread receives a Vsync signal, it ends up in CanvasContext::doFrame.

void CanvasContext::doFrame() {
    if (!mRenderPipeline->isSurfaceReady()) return;
    prepareAndDraw(nullptr);
}
void CanvasContext::prepareAndDraw(RenderNode* node) {

    nsecs_t vsync = mRenderThread.timeLord().computeFrameTimeNanos();
    int64_t frameInfo[UI_THREAD_FRAME_INFO_SIZE];
    UiFrameInfoBuilder(frameInfo).addFlag(FrameInfoFlags::RTAnimation).setVsync(vsync, vsync);

    TreeInfo info(TreeInfo::MODE_RT_ONLY, *this);
    prepareTree(info, frameInfo, systemTime(CLOCK_MONOTONIC), node);
    if (info.out.canDrawThisFrame) {
        draw();
    } else {
        waitOnFences();
    }
}

You can see that the process is almost the same as before:

  • 1. prepareTree processes the display hierarchy of the whole View tree, builds the RecordedOp draw operations, and generates the OffscreenBuffer.
  • 2. draw performs the actual rendering.

If a frame is already being drawn and canDrawThisFrame comes back false, waitOnFences is called to wait on the drawing fences instead of drawing.

Conclusion

At this point we have analyzed the main flow and principles of hardware rendering. The Skia rendering pipeline and the Vulkan-related pipeline are not covered in detail here; we will come back to Skia-related topics after we have gone through IMS, PMS and the other major framework components.

As usual, let’s start with the sequence diagram:

We just need to focus on what happens during the ThreadedRenderer drawing process. When the ThreadedRenderer is ready to start drawing, it notifies the RenderProxy, the drawing task is pushed into the RenderThread's queue, and the DrawFrameTask's run method kicks off the actual drawing.

Two main steps are performed in DrawFrameTask:

  • 1. syncFrameState, as its name suggests, synchronizes the state of each frame. The core of this method is calling prepareTree to walk the View hierarchy starting from RootRenderNode.

    • The first step executes the OpenGL callbacks injected via drawGLFunctor2.
    • The second step tries to read the texture cache for the current RenderNode, so the UI thread is not blocked when the cache can be used.
    • The third step computes the total dirty area of the whole view from the dirty areas of the hierarchy.
    • The fourth step records the RenderNodes that need refreshing into the LayerUpdateQueue.
    • The fifth step creates an off-screen render cache for each RenderNode that needs to draw a View.
  • 2. The draw method of CanvasContext, which performs the following steps:

    • The first step copies the computed dirty areas, via computeDirtyRect, into the Frame object that wraps the OpenGL ES EGLSurface.
    • The second step builds the FrameBuilder object and calls its deferLayers method. Inside deferLayers, saveForLayer stores each LayerBuilder into the FrameBuilder's collection of LayerBuilders. deferNodeOps is then called: buildZSortedChildList handles the Z-axis ordering of the child RenderNodes, and defer3dChildren decides whether shadows need to be handled for each child. It then iterates over the chunk collection stored in the DisplayList, retrieves each RecordedOp draw operation from the indices recorded in the chunk, and calls defer(type)Op on each operation to convert the RecordedOp into a BakedOpState object.
    • The third step calls the replayBakedOps method of the FrameBuilder, which uses a BakedOpDispatcher object to find the on(type)Op method matching each draw type and builds a Glop object to pass into the BakedOpRenderer for rendering. If consecutive operations share the same draw type, paint and alpha, the matching onMerged(type)Ops method is used instead: the draw operations are merged and the resulting Glop object is passed into the BakedOpRenderer.
    • In the fourth step the BakedOpRenderer actually starts rendering through RenderState: the contents of the Glop object, assembled via the Builder pattern, are unpacked and the OpenGL ES drawing begins.

It took two big steps and nine small steps to complete the rendering behavior in RenderThread.

Of course, along the way the RecordedOp draw-operation object goes through a couple of fairly significant transformations.

Thinking

Comparing the hardware rendering series with the software rendering series, the biggest difference becomes clear: in software rendering, all the work is queued on the UI main thread's Looper, which handles the render events.

With hardware rendering, by contrast, once performDraw starts drawing, all subsequent work is switched onto the RenderThread's own queue for rendering.

Here’s a schematic:

So a lot of UI rendering optimization advice says to turn on hardware rendering, and that is sound advice: the most expensive part of these processes is drawing. If we move the drawing steps onto the queue of a separate RenderThread, we reduce the pressure on the UI thread.
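As a point of reference, here is a minimal sketch of the knobs involved, using only standard SDK calls (the helper class is made up; hardware acceleration itself is normally switched on via android:hardwareAccelerated="true" in the manifest):

import android.graphics.Canvas;
import android.view.View;

public final class HwRenderingChecks {
    // True when the view is attached to a hardware-accelerated window, i.e. its
    // drawing commands are recorded and replayed on the RenderThread.
    public static boolean isHardwareRendered(View view) {
        return view.isHardwareAccelerated();
    }

    // Promote a view to its own hardware layer (an off-screen texture), useful
    // while animating it; switch back to LAYER_TYPE_NONE afterwards.
    public static void enableHardwareLayer(View view) {
        view.setLayerType(View.LAYER_TYPE_HARDWARE, null);
    }

    // Inside onDraw(Canvas) the canvas itself can also be checked.
    public static boolean isCanvasHardware(Canvas canvas) {
        return canvas.isHardwareAccelerated();
    }
}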

Can we be bold and move performMeasure and performLayout to a separate measurement thread as well, so that the main thread is completely decoupled from the UI logic and becomes a pure business thread?

UI rendering optimization and Litho

In fact, someone did. Facebook open-sourced Litho, a library for asynchronous measurement and drawing. The biggest hurdle with Litho is that it abandons Android's original development pipeline and requires you to accept a new concept, much closer to React's: everything is a Component. Flutter's design borrowed a lot from this idea as well. Litho's layout is much simpler than Android's: there is only one layout model, Flexbox, and since it is written entirely in code, you cannot preview it visually in Android Studio so far.

How does it work? There won’t be much introduction here, but here’s a schematic:

To be able to measure controls correctly across multiple threads, Litho defines several lifecycle stages for each control (a rough spec sketch follows the list):

  • @OnPrepare: the preparation stage, where some initialization is performed.
  • @OnMeasure: responsible for layout calculation.
  • @OnBoundsDefined: performs work after layout calculation completes but before the view is mounted.
  • @OnCreateMountContent: creates the view to be mounted.
  • @OnMount: mounts the view and completes layout-related settings.
  • @OnBind: binds the view, connecting the data to the view.
  • @OnUnbind: unbinds the view, mainly used to reset data-related properties to prevent reuse problems.
  • @OnUnmount: unmounts the view, mainly used to reset layout-related properties to prevent reuse problems.
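To make these stages concrete, here is a rough sketch of a Litho MountSpec that uses a few of them. Treat it as illustrative only: exact parameter lists vary between Litho versions, and the spec name and prop are invented for this example.

import android.content.Context;
import android.graphics.drawable.Drawable;
import android.widget.ImageView;

import com.facebook.litho.ComponentContext;
import com.facebook.litho.annotations.MountSpec;
import com.facebook.litho.annotations.OnCreateMountContent;
import com.facebook.litho.annotations.OnMount;
import com.facebook.litho.annotations.OnUnmount;
import com.facebook.litho.annotations.Prop;

// Hypothetical spec; Litho's annotation processor generates an ImageRow component from it.
@MountSpec
class ImageRowSpec {

    // Create the Android view that will eventually be mounted on screen.
    @OnCreateMountContent
    static ImageView onCreateMountContent(Context c) {
        return new ImageView(c);
    }

    // Mount: push the measured and laid-out result into the real view.
    @OnMount
    static void onMount(ComponentContext c, ImageView view, @Prop Drawable image) {
        view.setImageDrawable(image);
    }

    // Unmount: reset the view so it can be recycled from the mount-content pool.
    @OnUnmount
    static void onUnmount(ComponentContext c, ImageView view) {
        view.setImageDrawable(null);
    }
}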

How does it work? The Litho source is not hard to read, and we can discuss it another time. Its design is similar to Flutter's: Flutter effectively has only one FlutterNativeView in the Activity, and all subsequent drawing is done on that view via Skia. Litho is the same. There is only one LithoView for the whole hierarchy; all drawing starts from the LithoView, and Yoga nodes are continuously added under it through the Yoga library.

So from Android's point of view, onMeasure, onLayout and onDraw only ever see a single layer: the LithoView.
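As a usage sketch (standard Litho API calls; the activity and text content are made up for illustration), a whole screen backed by a single LithoView looks roughly like this:

import android.os.Bundle;

import androidx.appcompat.app.AppCompatActivity;

import com.facebook.litho.ComponentContext;
import com.facebook.litho.LithoView;
import com.facebook.litho.widget.Text;

public class LithoDemoActivity extends AppCompatActivity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        ComponentContext c = new ComponentContext(this);

        // The whole screen is one LithoView; the component tree becomes its root.
        LithoView lithoView = LithoView.create(
                c,
                Text.create(c)
                        .text("Hello, Litho")   // example content
                        .textSizeSp(20)
                        .build());

        setContentView(lithoView);
    }
}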

And you can see that Litho effectively takes over performMeasure, performLayout and performDraw.

Throughout the process, as soon as LithoView needs to start laying out from onMeasure, it completely takes over the traversal of the component tree and performs the steps described above; once the layout has been measured and positioned, the Yoga nodes are drawn. By default LithoView measures its layout synchronously, and asynchronous layout is only used in RecyclerView and similar list components. It also has the following core method:

  private void setRootAndSizeSpecInternal(
      Component root,
      int widthSpec,
      int heightSpec,
      boolean isAsync,
      @Nullable Size output,
      @CalculateLayoutSource int source,
      int externalRootVersion,
      String extraAttribution,
      @Nullable TreeProps treeProps)

There is also an async-capable method used when state updates are needed:

void updateStateInternal(boolean isAsync, String attribution, boolean isCreateLayoutInProgress)

Why doesn't native Android support asynchronous layout? The explanations commonly given online boil down to two reasons:

  • 1. View attributes are mutable, and any attribute change can invalidate the layout and force it to be recalculated, so computing the layout ahead of time is not very meaningful. Litho's props are immutable, which is why it can calculate the layout in advance.
  • 2. Asynchronous layout in advance means creating the View(s) of one or more items ahead of time. A native Android View not only holds all of a view's attributes but is also responsible for drawing it, so calculating layout before drawing would require holding a large number of View instances that are not yet shown, which increases memory usage. Litho, on the other hand, keeps a DefaultMountContentPool underneath to recycle the mounted content objects, and a View is only displayed on screen after it is mounted.

I agree with the second point, but I take issue with the first: we should not dismiss pre-measurement on an asynchronous thread as meaningless work just because some attributes in a large View hierarchy may later change.

But remember that the UI thread does not only draw Views. More often than not, our business code starts out as a small requirement and causes no obvious problems, but once it grows to a certain size, the 16 ms frame budget becomes a real burden (ideally only 1-3 dropped frames are tolerable). Typically we handle this by dispatching heavy tasks such as IO operations to a thread pool. Sometimes, however, it is not IO that pushes a method over the ideal threshold, but the accumulation of many small pieces of business code, which is why method-level instrumentation and monitoring are needed.
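One lightweight way to see where a frame's budget actually goes is to wrap suspicious business code in android.os.Trace sections, which then show up on the UI thread's track in Systrace/Perfetto. A minimal sketch (the class and section names are made up):

import android.os.Trace;

public final class BusinessStep {
    // Run a piece of business code inside a named trace section so its cost is
    // visible next to the frame boundaries in a system trace.
    public static void runTraced(Runnable step) {
        Trace.beginSection("BusinessStep#runTraced");
        try {
            step.run();
        } finally {
            Trace.endSection();
        }
    }
}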

Therefore, as long as the cost of the thread context switches stays low, handling performMeasure and performLayout asynchronously to reduce rendering pressure is perfectly feasible.

Of course, besides this kind of optimization, we can also take some inspiration from Litho, such as constructing View objects in advance: X2C converts XML to code to reduce view instantiation time; complex pixel manipulation can be offloaded to the GPU with RenderScript; and View.animate returns a ViewPropertyAnimator that can drive hardware-rendered animations.
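As a minimal sketch of that last point (standard ViewPropertyAnimator calls; the helper class is made up), withLayer() keeps the view on a hardware layer only for the duration of the animation, so each animated frame just re-composites a cached texture:

import android.view.View;

public final class FadeOutHelper {
    // Fade a view out on a temporary hardware layer, then hide it.
    public static void fadeOut(View view) {
        view.animate()
                .alpha(0f)
                .setDuration(300)
                .withLayer()
                .withEndAction(() -> view.setVisibility(View.GONE))
                .start();
    }
}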

Thoughts from a horizontal comparison with the browser rendering process

In fact, it seems to me that the RenderThread hardware rendering flow is designed very much like a browser's rendering flow. Although I have not read the WebView kernel code in depth, some of the fundamentals are clear, so here is a quick comparison using Chrome as an example.

As you can see, if you turn on hardware rendering on Android, the entire process is very similar to the browser rendering process.

In the browser, after the network request, the DOM tree is built, styles are calculated, and layout generates the corresponding layout tree. performMeasure and performLayout on Android do almost the same thing: they measure the size and position of the layout.

When the browser has generated the layer tree from the layout tree, the compositor thread splits the layers into smaller tiles, raster threads rasterize each tile, and the resulting frame is passed on for display.

This step is quite similar to hardware rendering, although hardware rendering does a bit more: the RenderThread allocates off-screen rendering memory for all the RenderNodes (which correspond to nodes in the DOM tree), generates LayerBuilders and stores them in the FrameBuilder. This is roughly equivalent to the layer tree step in the browser.

In hardware rendering the smallest graphics unit is the RenderNode, and each RenderNode has its own cache, which roughly corresponds to the browser's tiling step. The following steps, however, are different.

When the ThreadedRenderer traverses the View tree, even though an ImageView asks its Drawable to draw into the DisplayListCanvas, the DisplayListCanvas is essentially a recording canvas: nothing is drawn immediately; all drawing operations are recorded and replayed later during a single composition pass. This greatly optimizes the whole hardware rendering process. There is no need to call into OpenGL ES and talk to the GPU all the time; by batching the drawing operations and deferring their execution to one unified pass, the number of GPU round trips is reduced and the overall drawing performance of the system improves considerably.
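The same record-then-replay model is exposed to applications through the public android.graphics.RenderNode API (API 29+). A minimal sketch inside a custom view (the circle is just made-up example content):

import android.content.Context;
import android.graphics.Canvas;
import android.graphics.Paint;
import android.graphics.RecordingCanvas;
import android.graphics.RenderNode;
import android.view.View;

public class CircleView extends View {
    private final RenderNode node = new RenderNode("circle");
    private final Paint paint = new Paint(Paint.ANTI_ALIAS_FLAG);

    public CircleView(Context context) {
        super(context);
    }

    @Override
    protected void onDraw(Canvas canvas) {
        node.setPosition(0, 0, getWidth(), getHeight());

        // Record the drawing commands into the RenderNode's display list;
        // nothing is sent to the GPU yet.
        RecordingCanvas recordingCanvas = node.beginRecording();
        recordingCanvas.drawCircle(getWidth() / 2f, getHeight() / 2f,
                Math.min(getWidth(), getHeight()) / 4f, paint);
        node.endRecording();

        // The display list is replayed later on the RenderThread when the frame
        // is composed; this call requires a hardware-accelerated canvas.
        canvas.drawRenderNode(node);
    }
}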

Finally, each frame that the Android system rasterizes on the GPU into the EGLSurface is sent directly to the SurfaceFlinger (SF) process for further processing.

One more hint: why does every View or RenderNode need its own cache, regardless of hardware or software rendering? The inspiration from Chrome is that the partitioning into tiles is done on a dedicated compositor thread.

Looking back at the Android system, there is in fact a comparable notion of composition: each RenderNode has its own cached texture, and each View has its own cached Bitmap. That leads to the following design:

Since all drawing operations are deferred and only the draw-operation objects are stored, each texture or View with its own cache can be composed in a LayerBuilder, then rasterized by OpenGL ES and sent to the SF process. In this way the View/RenderNode pieces can be stitched together to quickly form a whole new frame. I call this behavior horizontal frame composition.

Of course, another reason hardware rendering is recommended is that it can request a larger texture cache, instead of being limited to 480 * 800 * 4 bytes (roughly 1.5 MB) as in software rendering.

Of course, there is also vertical frame composition: this is the job of SF's HWC (Hardware Composer), which composites the Layer objects corresponding to each client vertically along the Z axis.

Author: yjy239. Link: www.jianshu.com/p/4854d9fcc… Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.