Recruiting
We are urgently hiring engineers for browser rendering engine and Flutter rendering engine work. You are welcome to join us.
Preface
Image memory usage in Flutter has long been criticized, for two main reasons:
- GC: a Dart image object holds a reference to a native SkImage, so the native memory cannot be released until the Dart object is garbage collected.
- Page rendering: a page may resolve a large number of invisible images, producing a high memory peak; images may also be wrongly referenced by business code so that they can never be garbage collected, which leaks memory.
For problem 1, the Flutter team is also optimizing; see the official document “Downward Memory Pressure”. The main idea is to make it easier for the engine to trigger idle GC.
Problem 2 has to be investigated and optimized by the business side, but there is currently no good official tool for tracking down leaked image references, so troubleshooting is difficult.
This column previously published an article about a memory leak detection tool, for your reference.
Drawing on years of rendering engine experience, the U4 kernel technology team designed and implemented a new image memory optimization scheme in the Hummer engine. Compared with the original Flutter engine, it is a significant improvement in both memory usage and performance, and it solves the image memory problem of the Flutter engine.
This article first compares the rendering architecture of the Flutter engine with that of a typical Web engine, then presents the optimization scheme, and finally uses videos from a real application to show that the scheme improves the experience.
Note: this article uses some common rendering terminology, so readers should have some knowledge of the Flutter rendering engine and be able to read Flutter code.
Comparison
Flutter's rendering architecture uses direct rasterization. Its image layout, decoding and rasterization differ considerably from the asynchronous rasterization used by common Web rendering engines. The following compares how each engine renders a single-frame image loaded from the network.
The Web rendering engine described below has a multi-process architecture, and its threading model is simplified here. Paint follows the usual rendering vocabulary and means recording paint instructions into a Picture or DisplayList.
Flutter engine
- When the Image widget is mounted into the Element Tree, the image resolve process is triggered (the dashed part of the flow). If the image is already in the ImageCache, the dashed part is skipped. The purpose of this step is to obtain a decoded and uploaded SkImage;
- Layout and Paint use the SkImage above. If the SkImage is not ready, the image's size does not take part in layout and nothing is recorded into the Picture. See RenderImage::_sizeForConstraints and the paint method;
- During rasterization, the Picture is played back directly with the SkImage, and Skia calls the GPU API to rasterize.
Web rendering engine
- Unlike Flutter, which requires an image to be fully downloaded before it can be decoded, the Web rendering engine supports progressive decoding. As soon as the received image data is larger than the file header, the header can be parsed to obtain the width and height for layout, and the part of the image already received can be decoded. What Paint records is an SkImage that holds the raw (still encoded) data of the image received so far;
- The layer containing the image is rasterized into tiles. If the image is not in the ImageDecodeCache, it must first be decoded and uploaded;
- After the layer has been rasterized, the tiles are composited and output.
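As a rough illustration of why the file header alone is enough for layout, the sketch below reads the width and height of a PNG from its first 24 bytes without decoding any pixel data. PNG is used only as an example format, the function name `png_size_from_header` is hypothetical, and this is not the code used by either engine.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size_from_header(data: bytes):
    """Return (width, height) from a partial PNG byte stream, or None if too short.

    Only the 8-byte signature and the start of the IHDR chunk are inspected;
    no pixel data is decoded.
    """
    # 8 bytes signature + 4 bytes chunk length + 4 bytes chunk type + 8 bytes w/h
    if len(data) < 24:
        return None  # header not fully received yet
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG stream")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


# Hand-built first 24 bytes of a 640x480 PNG; the rest of the file is not needed.
partial = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
print(png_size_from_header(partial))       # (640, 480)
print(png_size_from_header(partial[:10]))  # None
```

Once the size is known, layout can reserve the final box for the image while the remaining bytes are still arriving or waiting to be decoded.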
Scheme
In the optimized scheme, the image cache is split into two:
- ImageCache: the ImageCache of the original framework layer is downgraded to a cache of raw (encoded) image data;
- ImageDecodeCache: a native MRU cache that holds the decoded SkImages, internally divided into InUseCache and PersistentCache.
Process
- The three steps of the original scheme (load, decode, upload) are reduced to two steps, load and parse the file header, which use the ImageCache;
- Layout and Paint use an SkImage that holds only the raw data;
- Before a layer is rasterized, the undecoded images inside the visible area of the Picture are scheduled for decoding, which uses the ImageDecodeCache. During rasterization, each raw-data SkImage is replaced with its decoded SkImage.
The dashed parts of the flow use the ImageCache and the ImageDecodeCache respectively. Both caches have upper limits on size and capacity. Once a limit is reached, images are evicted using an LRU policy.
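The eviction behavior described above can be modeled with a small byte-budgeted LRU cache in which entries that are currently in use are pinned. This is a minimal sketch under stated assumptions: the class `DecodeCache` and its `lock`/`unlock` methods are hypothetical names, sizes are plain integers standing in for decoded bitmaps, and the real split between InUseCache and PersistentCache in the Hummer engine may differ.

```python
from collections import OrderedDict


class DecodeCache:
    """Toy byte-budgeted LRU cache with pinning for entries that are in use."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # key -> size in bytes, oldest first
        self._in_use = {}              # key -> reference count (pinned entries)

    def lock(self, key: str, decoded_size: int) -> bool:
        """Pin an image for rasterization. Returns True on a cache hit."""
        hit = key in self._entries
        if hit:
            self._entries.move_to_end(key)      # mark as most recently used
        else:
            self._entries[key] = decoded_size   # stands in for a real decode
            self.used_bytes += decoded_size
        self._in_use[key] = self._in_use.get(key, 0) + 1
        self._evict()
        return hit

    def unlock(self, key: str) -> None:
        """Unpin an image; it becomes evictable once its count reaches zero."""
        self._in_use[key] -= 1
        if self._in_use[key] == 0:
            del self._in_use[key]
        self._evict()

    def _evict(self) -> None:
        # Drop least recently used, unpinned entries until under budget.
        for key in list(self._entries):
            if self.used_bytes <= self.max_bytes:
                break
            if key not in self._in_use:
                self.used_bytes -= self._entries.pop(key)

    def keys(self):
        return list(self._entries)


cache = DecodeCache(max_bytes=100)
cache.lock("a", 40)
cache.unlock("a")
cache.lock("b", 40)
cache.unlock("b")
cache.lock("a", 40)   # hit: "a" becomes most recently used
cache.unlock("a")
cache.lock("c", 40)   # over budget: "b" is the least recently used entry
print(cache.keys())   # ['a', 'c']
```

Because eviction runs on every lock and unlock rather than waiting for a garbage collector, memory returns to the budget as soon as an unpinned entry is available to drop.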
Advantages
- Transparent to business code: the major changes are in the engine and do not affect how business code is written. By contrast, the “external texture” scheme commonly used in the industry makes its main changes in the framework, requires the business to use special widgets, involves a certain amount of adaptation work, and has extra memory and compatibility problems on Android;
- Decodes only images in the viewport: avoids memory spent on unnecessary decoding;
- Clear memory savings: decoded SkImages live in the native ImageDecodeCache and can be released actively, without waiting for GC. In addition, even a leaked image will eventually be evicted from the ImageDecodeCache by LRU, so its decoded memory is not held indefinitely;
- Low-end device optimization: on low-end models, a smaller ImageDecodeCache limit can be set and the decoded image size reduced, lowering memory usage;
- Faster Layout/Paint: only the file header has to be parsed, which noticeably helps the first frame and fast scrolling.
With an engine optimized by this scheme, together with the memory leak detection tool, businesses can use the Image widget without having to worry about OOM.
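The "decode only what is in the viewport" point can be sketched as a rectangle intersection test run before rasterization. The names `Rect` and `images_to_decode` are hypothetical, and the list of (id, bounds) pairs only stands in for the image draw records of a Picture; the engine's actual data structures are not shown in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def intersects(self, other: "Rect") -> bool:
        # Two rectangles overlap only if they overlap on both axes.
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)


def images_to_decode(recorded, viewport, decoded):
    """Return the ids of recorded images that are visible and not yet decoded.

    recorded: list of (image_id, Rect) pairs, as they might appear in a Picture.
    viewport: Rect of the visible area.
    decoded:  set of image ids already present in the decode cache.
    """
    return [image_id for image_id, bounds in recorded
            if bounds.intersects(viewport) and image_id not in decoded]


# A vertical list of five 100x100 images; the viewport covers y from 150 to 350.
recorded = [(f"img{i}", Rect(0, i * 100, 100, (i + 1) * 100)) for i in range(5)]
viewport = Rect(0, 150, 100, 350)
print(images_to_decode(recorded, viewport, decoded={"img2"}))  # ['img1', 'img3']
```

Images outside the viewport (`img0`, `img4`) keep only their raw data, and the already decoded `img2` is not scheduled again.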
Actual results
We modified the Backdrop demo in the Gallery app, added more images, and set the cache limit to 30 MB so that the effect of memory reclamation shows up sooner.
Faster layout/paint
The following video shows the original behavior. There is noticeable jumping as the screen scrolls. This happens because images have not been decoded yet or have been evicted, so the images on screen cannot be decoded in time and only the text below each picture is displayed. The visible jumping is caused by the resulting layout changes.
Click the link to watch the video
The following video shows the optimized behavior. The screen does not jump at all during scrolling, because the engine only has to parse the image file header, so the image can take part in layout quickly. Decoding is still delayed, but there is no sense of jumping.
Click the link to watch the video
Controlled memory
The memory test reads the “Gfx dev” value from the app's meminfo, which represents GPU memory usage. The test method is to tap Backdrop, scroll up and down quickly, and run
watch -n 0.2 "adb shell dumpsys meminfo io.flutter.demo.gallery >> mem.log"
This appends the meminfo output to mem.log every 0.2 seconds; the Gfx dev value is then extracted from the log.
- Although the cache limit is set to 30 MB, the measured value is larger because of the RasterCache, mipmaps, and other buffers allocated internally by Skia. Judging from the curve, the steady-state value is about 60 MB;
- At the start, GPU memory is ~8 MB with the optimized scheme and ~32 MB with the original scheme. This is because the ListView in the original scheme decodes multiple images, while the optimized scheme decodes only the two images in the viewport;
- During scrolling, image memory exceeds the cache limit and both schemes evict images. Because the original scheme depends on GC, which does not run promptly, its memory peak exceeds 100 MB, falls back to about 70 MB after GC, and shows many similar fluctuations afterwards. Its average GPU memory usage is also higher than that of the optimized scheme, which raises the probability of OOM. With the optimized scheme, GPU memory stays at ~60 MB during scrolling without obvious fluctuation.
Since memory release in the optimized scheme is no longer bound by GC, the active release mechanism keeps image memory at a stable level and balances memory against performance, which better addresses the problems raised in the preface.
Outlook
We have already shipped this optimization in clients within the group, where it runs in real business scenarios. In the future, we will continue to optimize in depth for business scenarios that use multiple FlutterViews.
The UC kernel technology team focuses on rendering engine and virtual machine technology. As an important participant in building Flutter within the group's ecosystem, we embrace the community and strive to bring maximum value to the business. Hummer is our deeply customized and optimized version of the Flutter engine, incorporating the team's years of experience with Web rendering engines. We will present more optimization practices in the future, so stay tuned.
The U4 kernel is committed to building the best-performing and most secure Web platform, so that the Web can do anything.