Recruiting

We urgently need engineers for browser rendering engines and the Flutter rendering engine. You are welcome to join us.

Preface

On January 16, the UC Technology Committee, together with the Juejin (Nuggets) and Google Developer communities, held the first Flutter Engine technology salon of 2021. More than 150 people applied to attend. Because on-site capacity was limited, we could only invite 50 of them to the venue, and more than 2,000 others watched the live stream. During the event, five technical experts from Alibaba Group shared several topics:

- their Flutter-based R&D system;
- their experience building, developing, and optimizing with Flutter;
- dynamic solutions;
- the advantages and new features of Hummer, UC's customized and enhanced Flutter engine.

The first session, "Building UC Mobile Technology Center Based on Flutter", was presented by Hui Hong, head of the UC/Quark client and head of the UC Mobile Technology Center.

The second session, "Hummer (Flutter Custom Engine) Optimization and Systematic Construction Exploration", was presented by Lu Long, technical director of the UC Flutter Hummer engine. This article contains about 8,400 words, 38 pictures, and 4 videos, and takes about 45 minutes to read.

Talk content

Good afternoon, everyone. I am Lao Long from the UC kernel team. As Dahui just said, Flutter is not perfect and has quite a few problems. At first, business teams may struggle to work around these problems on their own. Some of them are hard to solve at the business level, though, so an engine team is needed to support this work. Our UC kernel team began optimizing the engine last year and has some preliminary results. Today I will mainly share our progress and results, and I hope they give you some ideas for reference.

Today's presentation has four parts:

1. A brief review of Flutter's technical characteristics and its current status.
2. Hummer's overall technical architecture.
3. A detailed look at some of our core optimizations. This is the main part of today's talk.
4. A summary and some of our future plans.

This is the introduction from the official website. Simply put, Flutter is a UI toolkit, and its biggest feature is that it is cross-platform.

It has several core concepts. The first is beauty, which shows in two ways. The Widgets themselves are very well designed, and cross-platform consistency is very good. That consistency comes mainly from self-drawn rendering, which gives pixel-level control.

The second is smoothness, which has two sources. First, business logic is written in Dart, which can be AOT-compiled, so overall runtime performance is higher than JavaScript. Second, the rendering pipeline is relatively short and efficient. This is especially true compared with technologies like React Native, whose rendering pipelines tend to be longer.

The third is efficiency, mainly in development. As Dahui just mentioned, Flutter supports Hot Reload, which makes development very efficient.

The last point, which I think is also very important, is openness. Flutter is not only open source. Its open-source community is also run very well and responds quickly. If you ask a question, the official team usually replies soon. I think they do very well here.

Cross-platform is not listed separately because it is one of Flutter's fundamental properties. The whole Flutter engine has evolved around the points above.

First, Flutter is a very young engine with a short history. Its first stable release came out only at the end of 2018. Second, its Add-to-App capability (hybrid development) was added at the end of 2019. In later use, we found that the Add-to-App experience was relatively poor. Third, Flutter iterates very quickly. This diagram does not show every version, but a new version ships roughly every 2 to 3 months, which makes engine upgrades challenging for us.

Now let's look at the community. Although Flutter is young, its community has grown very quickly. Official statistics from last October show that both the number of Flutter apps on Google Play and the number of developers have reached a very large scale.

Many companies in China also use Flutter to some degree. UC, Quark, and Xianyu all use it very heavily.

As business scenarios use Flutter more deeply and grow more complex, we find that Flutter is not so perfect after all. It has quite a few problems, such as memory usage, performance, large package size, and a lack of dynamic updates. These problems hinder business development and also hold back wider adoption of Flutter across the business.

To sum up, from our perspective Flutter's biggest pain points are these six:

First, it lacks dynamic capability.

Second, its performance is insufficient in some scenarios, especially complex ones.

Third, its overall memory consumption is relatively high, especially in image-heavy scenarios.

Fourth, package size. An app developed with Flutter is larger than its native equivalent. This can be a real pain point, because many apps target lower-tier markets.

The other two are weak support for hybrid-stack development and an immature overall infrastructure, the latter because of Flutter's short history.

There are roughly three ways to solve these problems. The first is to optimize at the business level, which is usually the initial and painful phase. The second is to wait for official updates. These cannot keep pace with the business, because the Flutter team cannot respond to business needs that quickly. So we chose the third way: customizing and optimizing the engine ourselves. Our engine is called Hummer.

Hummer itself is in the lower left corner of the diagram. It includes the Framework and the Engine, plus code such as third-party libraries. To the right of Hummer is our supporting tool platform, which has three parts:

- an online monitoring platform;
- a build platform;
- development-time tools.

For development-time tools, our main work is on Seagull Lab and on DevTools optimizations.

Above Hummer is the plug-in layer. Besides supporting the official general-purpose plug-ins, we also build custom plug-ins and integrate them with Hummer. One example is Aion, a plug-in that provides dynamic capabilities.

At the top are the businesses that use Hummer. Currently we serve not only UC but also other apps within the group, such as Taohua.

Back to the lower left corner: the blue boxes are our optimizations relative to the official engine. There are quite a few, and they fall into three core areas:

1. Memory optimization and the tool platform built around it.
2. A comprehensive review and optimization of performance.
3. Enhanced engine capabilities, so that Flutter can support more business needs.

Next, I will go through these three areas in turn.

First, let's look at memory optimization. The memory challenges we face fall into four layers:

  1. At the business level, image-heavy scenarios put heavy pressure on memory. Flutter releases images through Dart's GC. Poorly written business code can therefore leak images, and late GC can cause large memory peaks.
  2. At the level of Flutter's overall memory management, there is no central memory manager, and this causes problems in multi-engine scenarios. If you use multiple FlutterViews, each one has its own FlutterEngine. Each engine then manages its own image cache separately, so the caches cannot be managed together, and memory peaks come easily.
  3. Architecturally, Flutter's process model is simple: it runs in a single process. On 32-bit devices it is therefore prone to running out of virtual address space and crashing.
  4. Tooling is currently lacking, which makes problems hard to analyze when they occur.

We approached memory optimization from five areas, covering the full chain from the engine through the business to the tool platform. On the engine side we made many optimizations, in two groups: image memory control and multi-engine scenarios.

On the business side, we mainly added leak monitoring for core objects and fixed memory leaks.

The two platforms in the middle are tools for local development. We have two core goals for them. The first is to give business teams a way to find memory problems quickly. The second is to make analyzing those problems easier and more efficient.

The last area is iTrace, our online monitoring platform. There we mainly improved crash logs and added memory grouping information, so that we can analyze online OOM issues more efficiently.

Now let's look at the results. In the single-engine scenario, we worked with business teams on optimization and reduced the overall crash rate by about 67%. Saving memory is not our main goal, however, because memory and performance trade off against each other. What we want is to control the memory peak, so the app can use as much memory as it needs without an OOM crash.

We did a lot of work on peak memory control, and you can see one result here. The two graphs on the right show the native memory curve. In this image-heavy scenario, the memory peak easily spikes to about 1 GB before optimization. The optimized version stays fairly stable at the desired level.

Next, the results of the multi-engine optimization. We made three optimizations here. The first is central memory control for images.

The second is sharing the GLContext and GrContext, and the third is sharing the GPU and IO threads. The official team has recently launched an optimization plan in this area too, and their approach closely matches ours: they will also share the GLContext, GrContext, and threads. After these optimizations, the effect in image-heavy scenarios is very clear. In the video on the right, each item is one FlutterView, and 9 FlutterViews are reused in rotation as the list scrolls. In this complex image-and-text scenario with multiple engines, our memory footprint is 40% to 70% lower than native.
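The Python sketch below illustrates what sharing contexts and threads across engines means. It models a reference-counted pool: the first engine creates the shared resources, later engines reuse them, and the last engine to shut down frees them. `SharedRenderResources`, `ResourcePool`, and `Engine` are hypothetical names. In Hummer the shared objects are the C++ GLContext, GrContext, and GPU/IO threads, and none of those are modeled here.

```python
import threading


class SharedRenderResources:
    """Hypothetical placeholder for the GL/Gr contexts and GPU/IO threads
    that engines share instead of each creating its own copy."""


class ResourcePool:
    """Reference-counted owner of the shared resources."""

    def __init__(self):
        self._lock = threading.Lock()
        self._resources = None
        self._refs = 0
        self.creations = 0  # how many times the resources were actually built

    def acquire(self):
        with self._lock:
            if self._resources is None:
                # First engine: build the expensive shared state once.
                self._resources = SharedRenderResources()
                self.creations += 1
            self._refs += 1
            return self._resources

    def release(self):
        with self._lock:
            if self._refs == 0:
                raise RuntimeError("release() called without acquire()")
            self._refs -= 1
            if self._refs == 0:
                # Last engine detached: drop the shared state.
                self._resources = None

    @property
    def ref_count(self):
        return self._refs


class Engine:
    """Hypothetical engine that borrows shared resources from the pool."""

    def __init__(self, pool):
        self._pool = pool
        self.resources = pool.acquire()

    def shutdown(self):
        self._pool.release()


pool = ResourcePool()
engines = [Engine(pool) for _ in range(9)]  # e.g. nine FlutterViews
print(pool.creations, pool.ref_count)  # 1 9
```

Nine engines build the shared state only once. The lock keeps the count correct if engines are created or destroyed from different threads.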

Next, let me go into several of these optimizations in more detail, starting with our central image texture cache. As this simple diagram shows, Flutter decodes an image as soon as it is loaded, which causes two problems. First, Layout may be blocked and slowed down. Second, the image texture cache is managed by Dart, so it depends on Dart's GC. The cache also cannot be shared between isolates, so there is no unified management.

With our central image texture cache, image decoding moves to the rasterizer stage. After an image finishes loading, we only decode its header to get its width and height. That is enough for layout, so the whole layout process is faster. Because the full decode now happens in the rasterizer stage, image data management is decoupled from Widgets, and unified memory control can be implemented in C++.
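The following Python sketch shows what "decode only the header" can look like, using PNG as an example. This is our own illustration: the talk does not say which formats Hummer handles or how. For PNG, the width and height sit in the IHDR chunk right after the 8-byte signature, so layout can start without touching pixel data. `png_dimensions` is a hypothetical helper.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header: bytes) -> Tuple[int, int]:
    """Read (width, height) from the first 24 bytes of a PNG file.

    Layout only needs the image size, and for PNG that lives in the IHDR
    chunk right after the 8-byte signature, so no pixel data is decoded.
    """
    if len(header) < 24 or not header.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG header")
    # Bytes 8-11: IHDR chunk length; bytes 12-15: chunk type.
    if header[12:16] != b"IHDR":
        raise ValueError("IHDR chunk must come first")
    # Bytes 16-23: width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```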

This way, memory is managed in one place even when there are multiple isolates, and the memory peak stays well under control.
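Here is a minimal Python sketch of the central-cache idea. It assumes an LRU eviction policy and a single byte budget; both are our assumptions, since the talk does not describe Hummer's eviction policy. Every engine and isolate puts textures into the same cache, so the budget bounds their combined peak rather than each engine's peak separately. `CentralTextureCache` is a hypothetical name, and the real cache lives in C++.

```python
from collections import OrderedDict


class CentralTextureCache:
    """One process-wide LRU cache of decoded textures with a single byte
    budget, shared by every engine and isolate instead of one cache each."""

    def __init__(self, budget_bytes: int):
        self.budget_bytes = budget_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # image key -> size in bytes, LRU first

    def __contains__(self, key) -> bool:
        return key in self._entries

    def touch(self, key) -> bool:
        """Mark a cached texture as recently used; False on a cache miss."""
        if key not in self._entries:
            return False
        self._entries.move_to_end(key)
        return True

    def put(self, key, size_bytes: int) -> None:
        if key in self._entries:
            self.used_bytes -= self._entries.pop(key)
        self._entries[key] = size_bytes
        self.used_bytes += size_bytes
        # Evict least recently used textures until back under the shared
        # budget; the texture just inserted is never evicted here.
        while self.used_bytes > self.budget_bytes and len(self._entries) > 1:
            _, evicted_size = self._entries.popitem(last=False)
            self.used_bytes -= evicted_size
```

Suppose engine 1 loads `a.png` and engine 2 loads `b.png`. If engine 1 then touches `a.png` again, a third image evicts `b.png` first, no matter which engine requested it.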

Now for our DevTools optimizations. We created a panel called DMA that works with two interfaces provided by the Framework, LeakAdd and LeakCheck. Business code uses these interfaces to register objects for monitoring. When a monitored object leaks, its information is reported to DevTools, where you can see the object's name and its reference path. We also analyze the reference paths automatically to find the shortest one, which helps business teams track down memory leaks more quickly.
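The following Python sketch shows the two ideas behind the DMA panel, under our own assumptions. The first is a tracker that registers objects expected to die and later reports the ones still alive. `leak_add` and `leak_check` are hypothetical Python analogues of the Framework's LeakAdd and LeakCheck, whose real signatures the talk does not show. The second is a breadth-first search that finds the shortest reference chain from a GC root to a leaked object in a reference-graph snapshot.

```python
import gc
import weakref
from collections import deque


class LeakTracker:
    """Hypothetical analogue of LeakAdd/LeakCheck: register objects that
    should eventually be freed, then ask which ones are still alive."""

    def __init__(self):
        self._watched = {}  # name -> weak reference (does not keep obj alive)

    def leak_add(self, name, obj):
        self._watched[name] = weakref.ref(obj)

    def leak_check(self):
        gc.collect()  # give the collector a chance to free cyclic garbage
        return [name for name, ref in self._watched.items() if ref() is not None]


def shortest_reference_path(graph, roots, target):
    """BFS over a reference graph (node -> nodes it references). Returns the
    shortest chain from any GC root to `target`, or None if unreachable."""
    parent = {root: None for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the root
                path.append(node)
                node = parent[node]
            return path[::-1]
        for child in graph.get(node, ()):
            if child not in parent:
                parent[child] = node
                queue.append(child)
    return None
```

BFS visits nodes in order of distance from the roots, so the first chain it finds is a shortest one. That chain is usually the most direct place to break the leak.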

Our local automated test platform, Seagull Lab, runs automatic memory tests. It tracks how memory changes over the whole test and displays usage for each memory module. During a test it also runs our memory anomaly checks. Any anomalies it finds appear in the test report, and clicking through the report shows the details of each one for analysis. In short, Seagull Lab automates memory testing and helps analyze memory anomalies.

Online, we mainly added information to the logs, including memory grouping information. We divided Flutter's memory into many fine-grained modules. When an OOM crash occurs, we can then better locate which module caused it and analyze the problem more effectively.

That covers memory. Now let's look at what we've done for performance, which falls into five categories:

First, we comprehensively reviewed and optimized inertial scrolling smoothness. I will focus on this later.

Second, we optimized the first frame and startup, with very good results so far: first-frame time is down by 60%.

Third, we used LLVM as a compiler back end for Dart code generation. This improved overall DartVM code performance by more than 30%.

The last two are optimizations for two special scenarios: GIFs and mixed rendering.

Here are the results for these two scenarios. On the left, the GIF frame rate improves significantly after optimization. On the right is the mixed-rendering optimization: areas like this triangle and these colored spheres also render more smoothly.

Next are our optimizations for inertial scrolling smoothness. First, here is how a scrolling frame is produced:

1. The UI thread receives a user event and calculates the scroll velocity and distance.
2. Once it determines that it needs to scroll, it registers a Vsync callback and waits for the next Vsync signal.
3. When Vsync arrives, the callback runs on the UI thread and performs frame production: Animate, Build, Layout, and Paint. Animate computes the animation data and produces the scroll offset, and Build, Layout, and Paint follow.
4. Paint generates draw commands. These are stored in a Scene object and sent to the Raster thread, which renders the Scene. This is the Draw step.

Frame production and Draw run one after the other here. Together they must take less than about 16 milliseconds to avoid dropping frames.
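This frame-budget arithmetic can be written down directly. The Python sketch assumes a 60 Hz display, which gives a budget of 1000/60 ≈ 16.67 ms. It also follows the serial model above, in which production time and Draw time add up. The function names are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
def frame_budget_ms(refresh_hz: float = 60.0) -> float:
    """Time available per frame at a given display refresh rate."""
    return 1000.0 / refresh_hz


def drops_frame(produce_ms: float, draw_ms: float, refresh_hz: float = 60.0) -> bool:
    """Serial model from the text: frame production (Animate, Build,
    Layout, Paint) and Draw must fit in one Vsync interval together."""
    return produce_ms + draw_ms > frame_budget_ms(refresh_hz)


def count_dropped(frames, refresh_hz: float = 60.0) -> int:
    """frames: iterable of (produce_ms, draw_ms) pairs, e.g. from a trace."""
    return sum(drops_frame(p, d, refresh_hz) for p, d in frames)


print(round(frame_budget_ms(60.0), 2))  # 16.67
print(count_dropped([(8.0, 7.0), (10.0, 8.0), (4.0, 4.0)]))  # 1
```

At 120 Hz the budget halves to about 8.33 ms, which makes the serial model even harder to satisfy.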

The first problem is that Vsync registration is cumbersome. It requires registering a callback through the Platform thread, and if the Platform thread is busy, Vsync scheduling is delayed.

The second problem is that Draw waits synchronously for the preceding frame production. If Build or Layout is complex and slow, frames are easily dropped.

The third problem is that Draw includes both CPU and GPU work, all done on one thread. This lacks concurrency and limits performance.

With the theory covered, let's look at a trace from a real business, captured in the humor business. We found a characteristic pattern in Flutter's inertial scrolling. The average frame rate is not low, but individual frames easily stutter badly during scrolling. This is mainly due to how ListView is implemented. While scrolling, ListView checks whether its buffer area contains nodes. If the buffer is empty, it inserts new nodes, and that insertion easily causes a dropped frame.

So the main challenge for Flutter's inertial scrolling smoothness is reducing these severe single-frame stalls.

We explored optimizations in six areas. Among them, we moved GPU work to a separate thread, optimized text layout, and added Sliver frame-sliced rendering.

Next, I will introduce these three.

First, the independent GPU thread. The principle is fairly simple. Draw used to run entirely on the Raster thread. Now we move the GPU part of Draw to a separate GPU thread. This reduces the load on the Raster thread, lets the two parts run concurrently, and improves performance.
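A toy pipeline model shows why moving the GPU part of Draw onto its own thread helps. In this Python sketch (our simplification, not Hummer's scheduler), Draw's CPU and GPU halves become two stages on two threads. Once the pipeline is full, a frame completes every max(cpu, gpu) ms instead of every cpu + gpu ms. The latency of any single frame is unchanged in this model. The 4 ms / 10 ms split in the usage lines is made up.

```python
from typing import List, Sequence


def pipeline_finish_times(stage_ms: Sequence[float], n_frames: int) -> List[float]:
    """Completion time of each frame when every stage runs on its own thread.

    Stage s can start frame i only after it has finished frame i - 1 and the
    previous stage has finished frame i. All frames are assumed to be ready
    at time 0, so the spacing between results is the steady-state throughput.
    """
    stage_free_at = [0.0] * len(stage_ms)
    done = []
    for _ in range(n_frames):
        ready = 0.0  # when this frame's input to the current stage is ready
        for s, cost in enumerate(stage_ms):
            start = max(stage_free_at[s], ready)
            stage_free_at[s] = start + cost
            ready = stage_free_at[s]
        done.append(ready)
    return done


# Draw = 4 ms of CPU work + 10 ms of GPU work (made-up numbers).
print(pipeline_finish_times([14.0], 3))       # one Raster thread: [14.0, 28.0, 42.0]
print(pipeline_finish_times([4.0, 10.0], 3))  # separate GPU thread: [14.0, 24.0, 34.0]
```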

The changes involved are substantial, and we ran into some problems along the way, but I won't go into them here.

Second, we parallelized text layout. We found that text layout in Flutter takes a long time. In this trace, a TextView with only a few hundred Chinese characters takes more than 100 milliseconds to lay out (in profile mode). Text layout computes two kinds of information. One kind is needed during Layout itself, such as the width and height of the whole paragraph. The other is per-glyph information that Layout does not use. We moved the computation needed only for Paint to another thread, where it runs in parallel. This speeds up Layout and cuts its time by roughly 30% to 40% or more.
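The Python sketch below shows how the work can be split. The calling thread computes what Layout needs (line count and paragraph width), while a worker thread computes per-glyph positions that only Paint needs. It assumes every character has the same advance width, and the function names are hypothetical. Because of the GIL, Python threads will not actually speed up CPU-bound work. The sketch only shows the structure; the engine would do this with native threads in C++.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def layout_metrics(text: str, advance: float, max_width: float) -> dict:
    """What Layout needs: line count and paragraph width. Every character is
    assumed to have the same advance width to keep the example short."""
    per_line = max(1, int(max_width // advance))
    lines = max(1, -(-len(text) // per_line))  # ceiling division
    width = min(len(text), per_line) * advance
    return {"lines": lines, "width": width}


def glyph_positions(text: str, advance: float, max_width: float) -> List[Tuple[float, int]]:
    """What only Paint needs: (x offset, line index) for every glyph."""
    per_line = max(1, int(max_width // advance))
    return [((i % per_line) * advance, i // per_line) for i in range(len(text))]


def layout_paragraph(text: str, advance: float, max_width: float, pool: ThreadPoolExecutor):
    # Hand the Paint-only work to another thread first...
    paint_info = pool.submit(glyph_positions, text, advance, max_width)
    # ...then compute the Layout-critical metrics on the calling thread.
    metrics = layout_metrics(text, advance, max_width)
    return metrics, paint_info  # Paint calls paint_info.result() later
```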

Now Sliver frame-sliced rendering. Scrolling in Flutter is usually implemented with ListView or GridView, which involve a few concepts. The Viewport is the area the user can see. Above and below it are two cache extents, which act as buffers. The nodes themselves are hosted in a SliverList.
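The following Python sketch makes the viewport and cache-extent geometry concrete. It computes which items of a vertical list must be built for a given scroll offset, assuming items of known height laid out top to bottom. This is a simplification of what SliverList does, not Flutter's implementation, and `items_to_build` is a hypothetical name.

```python
from typing import List, Sequence


def items_to_build(heights: Sequence[float], scroll_offset: float,
                   viewport_height: float, cache_extent: float) -> List[int]:
    """Indices of list items that overlap the viewport or either cache extent.

    The live region is [scroll_offset - cache_extent,
    scroll_offset + viewport_height + cache_extent] in list coordinates.
    """
    lo = scroll_offset - cache_extent
    hi = scroll_offset + viewport_height + cache_extent
    result = []
    top = 0.0
    for i, h in enumerate(heights):
        if top >= hi:
            break  # every later item starts even further down
        bottom = top + h
        if bottom > lo:
            result.append(i)
        top = bottom
    return result
```

Take items 100 px tall, a 400 px viewport, and a 100 px cache extent. Scrolling from offset 300 to 350 brings item 8 into the lower cache extent, and that item must then be built and inserted.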

As we scroll, the SliverList moves some nodes out of the cache. When it finds the buffer empty, it immediately adds nodes to fill it. For example, if two nodes are missing, it adds both. If those two nodes are complex, this frame can easily stall badly. The main idea of frame-sliced rendering is to spread one large stall across multiple frames. No single frame then stalls badly, and the frame rate stays smoother.

For example, in this scene, a node must be added after the first node scrolls out. If inserting it turns out to be very time-consuming, we stop inserting further nodes and defer the next one to the next frame. This avoids a single very slow frame.
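Here is a minimal Python sketch of frame-sliced insertion, built on our own assumptions. Pending nodes sit in a queue, and each frame gets a fixed insertion budget. A node's cost is only known after it is built, so a frame stops after the node that uses up the budget. The class name and budget value are hypothetical; the talk does not say how Hummer measures or sets the budget.

```python
import time
from collections import deque
from typing import Callable, Deque, List


class FrameSlicedInserter:
    """Insert pending list items one at a time, stopping for the current frame
    as soon as the frame's time budget is used up; the rest wait for later
    frames. At least one item is built per frame so progress is guaranteed."""

    def __init__(self, budget_ms: float, clock: Callable[[], float] = time.perf_counter):
        self.budget_ms = budget_ms
        self.pending: Deque = deque()
        self._clock = clock  # returns seconds; injectable for testing

    def run_frame(self, build_item: Callable) -> List:
        start = self._clock()
        built = []
        while self.pending:
            built.append(build_item(self.pending.popleft()))
            if (self._clock() - start) * 1000.0 >= self.budget_ms:
                break  # defer the remaining items to the next frame
        return built
```

With a 5 ms budget, two 6 ms nodes land on separate frames instead of one 12 ms frame. Cheap nodes that follow can still share a frame.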

We also made startup-related optimizations for the first screen, with six main points. I have explained them in detail in earlier posts, so I won't repeat them today. After optimization, both startup time and first-frame time improved several-fold. Many of these optimizations have been contributed upstream, so if you use the latest official engine, you already benefit from some of them.

Third, the LLVM-based Dart compiler optimization. First, a brief look at how Dart code is compiled. The pipeline has these steps:

1. The Dart source goes through the common front-end compiler, which produces a Kernel binary.
2. Type flow analysis removes unused methods.
3. The result is lowered to an intermediate language called IL.
4. The assembler turns the IL into machine code, which is linked into the final object code.
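Plain reachability analysis illustrates the "remove unused methods" step. This Python sketch keeps only the methods reachable from the entry points of a call graph. It is a deliberate simplification: type flow analysis also tracks which types reach each call site, so it can resolve dynamic calls more precisely. The method names in the example are made up.

```python
from typing import Dict, Iterable, List, Set


def reachable_methods(call_graph: Dict[str, List[str]], entry_points: Iterable[str]) -> Set[str]:
    """Depth-first walk of the call graph from the entry points."""
    seen: Set[str] = set()
    stack = list(entry_points)
    while stack:
        method = stack.pop()
        if method in seen:
            continue
        seen.add(method)
        stack.extend(call_graph.get(method, ()))
    return seen


def tree_shake(call_graph: Dict[str, List[str]], entry_points: Iterable[str]) -> Dict[str, List[str]]:
    """Drop every method that no entry point can reach."""
    keep = reachable_methods(call_graph, entry_points)
    return {m: callees for m, callees in call_graph.items() if m in keep}
```

For example, a `debugDump` method that only calls `layout`, and is never called from `main`, would be dropped.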

In this optimization, we replaced the back half of that pipeline with an LLVM compiler back end. IL instructions are converted into LLVM IR, the standard input of the LLVM back end, and LLVM's powerful optimizations then further improve the generated code. Dart code already performs very well, and after our optimization it gains 30% or more, approaching C++ performance.

Finally, let's talk about the capability enhancements we've made to the engine:

- We integrated an external network library.
- We support third-party fonts customized by Android manufacturers.
- We built an adaptive DarkMode capability, which lets business developers switch between day and night modes with one click through a Widget.
- We support image formats such as HEVC/HEIF.
- Official support for single-engine reuse is poor, so we added some support there as well.

Here I'll talk about the first two.

First, the external network library. In Flutter, networking is mainly provided by plug-ins, and businesses usually access the network through plug-ins such as dio or http. These plug-ins call HttpClient, and the call passes down through the implementation layers to the underlying network stack. We created a new plug-in called uNet that implements the HttpClient interface and bridges it to our external network library. Businesses can then keep using dio or http, with no code changes, and still get the capabilities of our external network library.
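The uNet approach is an instance of the adapter pattern: implement the interface that existing callers already depend on, and forward the calls to a different backend. This Python sketch shows the shape of it with fake responses. `HttpClient` here is a simplified stand-in, not Dart's `HttpClient` API. `ExternalNetLib`, `UNetClient`, and `load_feed` are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Tuple


class HttpClient(ABC):
    """Stand-in for the client interface that plug-ins such as dio/http use."""

    @abstractmethod
    def get(self, url: str) -> Tuple[int, bytes]:
        ...


class DefaultHttpClient(HttpClient):
    """The stock implementation (network access faked here)."""

    def get(self, url: str) -> Tuple[int, bytes]:
        return 200, b"default stack"


class ExternalNetLib:
    """Hypothetical binding to an external native network library."""

    def fetch(self, method: str, url: str) -> dict:
        return {"status": 200, "body": b"external stack"}


class UNetClient(HttpClient):
    """Adapter: implements the same interface but forwards every request
    to the external library, so callers need no code changes."""

    def __init__(self, lib: ExternalNetLib):
        self._lib = lib

    def get(self, url: str) -> Tuple[int, bytes]:
        response = self._lib.fetch("GET", url)
        return response["status"], response["body"]


def load_feed(client: HttpClient) -> Tuple[int, bytes]:
    """Business code: written against HttpClient only."""
    return client.get("https://example.com/feed")
```

`load_feed` runs unchanged against either client. Only the object passed in decides which network stack handles the request.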

The external network library has several advantages. First, it has better protocol support, such as better HTTP/2 and HTTP/3 support. It also performs better: our online data shows performance gains of more than 30%, and error rates are greatly reduced.

The last feature is third-party font support. As you may know from Android phones, many Chinese manufacturers let users download fonts and replace the system font.

Flutter, however, does not support these third-party fonts. After the user switches the system font, Flutter screens still use the original system font, so the UI looks inconsistent. We added support for this, so Flutter can use the third-party font and stay consistent with the rest of the system.

I've covered a lot of ground, so let me summarize.

Today I mainly talked about three things. The first is Flutter's core concepts: it is beautiful, smooth, efficient, and open.

Flutter is a very young engine, but it has developed rapidly. In just over two years it has attracted a huge number of developers, and many large companies in China use it.

As it was used in more and more scenarios, though, we found many problems, such as memory, performance, and package size. So we built a custom engine called Hummer. Hummer's work currently focuses on three areas:

First, memory optimization and the supporting tools built around it.

Second, a comprehensive optimization of overall performance.

Third, enhanced engine capabilities so it can better support business scenarios.

Finally, some plans for the future. I mentioned several pain points earlier, including dynamic capability, package size, and hybrid-stack development. We will keep optimizing around them.

First, inertial scrolling smoothness. I covered some optimizations today, but there is still a gap between where we are and our goal of matching native. We will keep exploring this area and add more optimizations.

Second, dynamic capability. We are building the Aion solution with the client team, and one of my colleagues will introduce it in detail in the final session.

Third, package size. Our investment here has not been sufficient so far. Many apps now target lower-tier markets and have stricter package size requirements, so we will put more people on this area.

As for the hybrid development capability mentioned earlier, several Chinese frameworks support it, such as FlutterBoost and Thrio from Hello Bike, but they still have problems. FlutterBoost 2.0, for example, is costly to upgrade because it copies some code. Its interface usage and memory usage are also not great, and it has many bugs. We will jointly build FlutterBoost 3.0 with the Xianyu team. It will continue to be developed as open source and will probably be released in the first quarter of this year.

The last one is memory diagnosis. As mentioned earlier, we have made some optimizations, but our overall tooling and platform are still insufficient, so we still cannot find and diagnose problems quickly. We will keep exploring and improving this area to raise efficiency.

That’s all for today’s sharing. Thank you.

Search for "U4 Kernel Technology" to get our latest technical updates.