Author: Idle Fish Technology – Furnace army

Preface

I remember working on group video calls back in 2013, when rendering multiple video streams was a serious performance bottleneck on the server. The reason was that the frequent present operation of each view (PresentRenderBuffer or SwapBuffer, that is, putting the contents of the render buffer onto the screen) consumed a great deal of CPU and GPU resources.

The solution at the time was to separate drawing from presenting: the multiple views were abstracted into a single drawing tree, which was traversed for drawing; once all drawing was complete, a single unified present operation was performed, triggered by the unified VSync signal rather than separately by each view. This saved a great deal of performance overhead.

We even considered having OpenGL render the entire UI, which would have further reduced the overhead of animations such as audio spectrums and breathing effects. For various reasons, however, the idea was never put into practice.
 

Unexpectedly, this idea of rendering the entire interface with OpenGL turns out to be useful for cross-platform development as well.

Flutter's rendering framework

Here is a simplified diagram of Flutter's rendering framework:

[Figure: Flutter rendering framework]

Each leaf node in the tree represents an interface element (Button, Image, and so on).

Skia: Google's cross-platform rendering framework. On today's iOS and Android, Skia ultimately calls OpenGL to draw; Vulkan support is still immature, and Metal is not supported yet.

Shell: here Shell refers specifically to the platform-specific part of the engine, including the iOS and Android platform implementations: EAGLContext management, the present (on-screen) operation, and the external-texture implementation introduced later.

As the figure shows, after the Runtime finishes layout and outputs a LayerTree, the pipeline traverses each leaf node of the LayerTree, and each leaf node ultimately calls the Skia engine to draw its interface element. When drawing is complete, glPresentRenderBuffer (iOS) or glSwapBuffer (Android) is called to put the frame on screen.
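
As a mental model only, the raster step can be sketched roughly as follows. This is illustrative pseudocode, not the engine's actual source; CollectLeaves and PresentSurface are hypothetical helpers.

// Illustrative pseudocode only; CollectLeaves and PresentSurface are hypothetical.
void RasterizeFrame(Layer* root, SkCanvas* canvas) {
  for (Layer* leaf : CollectLeaves(root)) {
    leaf->Paint(*canvas);   // each leaf draws itself through Skia
  }
  canvas->flush();          // flush Skia's pending GL commands
  PresentSurface();         // glPresentRenderBuffer (iOS) / glSwapBuffer (Android)
}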

On top of this basic principle, Flutter isolates the UI between Native and the Flutter Engine, so cross-platform UI code can be written without worrying about each platform's implementation.

The problem

But every upside has a downside: this design also erects a mountain between the Flutter Engine and Native, making it very difficult for Flutter to access data with a high memory footprint that lives on the Native side (camera frames, video frames, album images, and so on). Traditional frameworks such as RN and Weex can obtain such data directly by bridging Native APIs, but by its basic design Flutter cannot. The channel mechanism Flutter defines is essentially a message-passing mechanism, and passing images and similar data through it inevitably incurs enormous memory and CPU cost.

The solution

To address this, Flutter provides a special mechanism: the external texture.

[Figure: LayerTree with a TextureLayer leaf node]

The diagram above shows the simplified structure of the LayerTree mentioned earlier. Each leaf node represents a widget in the Dart layout, and at the end you can see a TextureLayer node, which corresponds to the Texture widget inside Flutter. (Note that this Texture is different from a GPU texture: it is a Flutter widget.) When a Texture widget is created in Flutter, it means the data displayed by that widget must be provided by Native.

Below is the final drawing code for the TextureLayer on iOS (Android is similar, with minor differences). The whole process can be divided into three steps:

1: Call copyPixelBuffer on external_texture_ to obtain a CVPixelBuffer.

2: Create an OpenGL texture from the CVPixelBuffer with CVOpenGLESTextureCacheCreateTextureFromImage (this is a true GPU texture).

3: Wrap the OpenGL texture as an SkImage and call Skia's drawImage to complete the drawing.

void IOSExternalTextureGL::Paint(SkCanvas& canvas, const SkRect& bounds) {
  if (!cache_ref_) {
    CVOpenGLESTextureCacheRef cache;
    CVReturn err = CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL,
                                                [EAGLContext currentContext],
                                                NULL, &cache);
    if (err == noErr) {
      cache_ref_.Reset(cache);
    } else {
      FXL_LOG(WARNING) << "Failed to create GLES texture cache: " << err;
      return;
    }
  }
  fml::CFRef<CVPixelBufferRef> bufferRef;
  bufferRef.Reset([external_texture_ copyPixelBuffer]);
  if (bufferRef != nullptr) {
    CVOpenGLESTextureRef texture;
    CVReturn err = CVOpenGLESTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, cache_ref_, bufferRef, nullptr, GL_TEXTURE_2D,
        GL_RGBA, static_cast<int>(CVPixelBufferGetWidth(bufferRef)),
        static_cast<int>(CVPixelBufferGetHeight(bufferRef)), GL_BGRA,
        GL_UNSIGNED_BYTE, 0, &texture);
    texture_ref_.Reset(texture);
    if (err != noErr) {
      FXL_LOG(WARNING) << "Could not create texture from pixel buffer: " << err;
      return;
    }
  }
  if (!texture_ref_) {
    return;
  }
  GrGLTextureInfo textureInfo = {CVOpenGLESTextureGetTarget(texture_ref_),
                                 CVOpenGLESTextureGetName(texture_ref_),
                                 GL_RGBA8_OES};
  GrBackendTexture backendTexture(bounds.width(), bounds.height(),
                                  GrMipMapped::kNo, textureInfo);
  sk_sp<SkImage> image = SkImage::MakeFromTexture(
      canvas.getGrContext(), backendTexture, kTopLeft_GrSurfaceOrigin,
      kRGBA_8888_SkColorType, kPremul_SkAlphaType, nullptr);
  if (image) {
    canvas.drawImage(image, bounds.x(), bounds.y());
  }
}
The core of all this is the external_texture_ object. Where does it come from?

void PlatformViewIOS::RegisterExternalTexture(int64_t texture_id,
                                              NSObject<FlutterTexture>* texture) {
  RegisterTexture(std::make_shared<IOSExternalTextureGL>(texture_id, texture));
}


As you can see, before the Native side calls RegisterExternalTexture, it needs to create an object that implements the FlutterTexture protocol; this object is ultimately assigned to external_texture_. external_texture_ is the bridge between Flutter and Native: during rendering, Flutter continuously obtains the current image data through it.
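
As a concrete illustration, here is a minimal sketch of what that Native-side object might look like. MyCameraTexture is a hypothetical class of ours; FlutterTexture (with its copyPixelBuffer method) and registerTexture: on FlutterTextureRegistry are the public Flutter iOS plugin APIs.

#import <Flutter/Flutter.h>
#import <CoreVideo/CoreVideo.h>

// A minimal sketch, not production code. MyCameraTexture is hypothetical.
@interface MyCameraTexture : NSObject <FlutterTexture>
@property(atomic, assign) CVPixelBufferRef latestPixelBuffer;
@end

@implementation MyCameraTexture
// Called by the engine for each frame it draws; the engine releases the buffer.
- (CVPixelBufferRef _Nullable)copyPixelBuffer {
  CVPixelBufferRef buffer = self.latestPixelBuffer;
  if (buffer) {
    CVPixelBufferRetain(buffer);
  }
  return buffer;
}
@end

// Registration (registry comes from the plugin registrar); the returned id is
// passed to Dart, where Texture(textureId: id) is placed in the widget tree.
// int64_t textureId = [registry registerTexture:[MyCameraTexture new]];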

[Figure: External-texture data flow between Native and Flutter]

As the figure shows, with the external texture the data carrier between Flutter and Native is a PixelBuffer: the Native data source (camera, player, and so on) writes its data into the PixelBuffer, and Flutter takes the PixelBuffer, turns it into an OpenGL ES texture, and lets Skia draw it.

This allows Flutter to easily draw any data that Native wants to display. Besides dynamic image data such as camera and player output, image display is another scenario: the external texture offers an alternative to the Image widget, which is especially useful when Native already has a large image-loading library such as SDWebImage, since rewriting one in Dart on the Flutter side would be time-consuming and laborious.

Optimization

The process described above seems to perfectly solve the problem of displaying Native's large data in Flutter, but reality is less kind:

[Figure: The GPU -> CPU -> GPU round trip in the copyPixelBuffer path]

In engineering practice, for performance reasons the Native side usually processes video and image data on the GPU, while the interface Flutter defines is copyPixelBuffer, so the whole data flow has to travel GPU -> CPU -> GPU. Anyone familiar with GPU programming knows that memory transfer between CPU and GPU is the most expensive operation of all, usually taking longer than the entire rest of the pipeline.
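
For reference, the GPU-to-CPU leg of that round trip is essentially a pixel readback. A minimal sketch, where fbo and cpuPixels stand in for the native pipeline's framebuffer and a preallocated CPU buffer:

// Read the rendered 720P frame back from the GPU into CPU memory.
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glReadPixels(0, 0, 1280, 720, GL_RGBA, GL_UNSIGNED_BYTE, cpuPixels);
// ...the Flutter side then re-uploads these bytes as a GL texture,
// paying the CPU -> GPU cost a second time.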

Since the Skia rendering engine wants a GPU texture, and Native's data processing outputs a GPU texture, can we just use that texture directly? The answer is yes, with one precondition: EAGLContext resource sharing. (An EAGLContext manages the current GL environment and ensures that resources in different environments are isolated.)

To explain why, we first need to introduce Flutter's thread structure:

[Figure: Flutter's thread structure with four TaskRunners]

As the figure shows, Flutter usually creates four runners. The TaskRunner mechanism is similar to iOS GCD: tasks are executed on a queue. Usually each Runner runs on its own thread, while the Platform Runner runs on the main thread. Three of the runners matter for this article: the GPU Runner, the IO Runner, and the Platform Runner.

GPU Runner: Responsible for GPU rendering operations.

IO Runner: responsible for loading resources.

Platform Runner: runs on the main thread and is responsible for all interaction between Native and the Flutter Engine.

Normally, an app that uses OpenGL has one thread responsible for loading resources (from image to texture) and another responsible for rendering. To let textures created on the loading thread be used by the rendering thread, the two threads commonly share one EAGLContext. But according to the specification this usage is unsafe: multi-threaded access to the same context requires locking, which inevitably hurts performance, and careless code can even deadlock. Flutter therefore uses a different mechanism for its EAGLContexts: the two threads each use their own EAGLContext, and the two contexts share texture data through a ShareGroup. (One thing worth mentioning: although the two contexts are used by the GPU and IO Runners respectively, in Flutter's current logic both are created on the Platform Runner. We do not know why Flutter does it this way, but this design caused us considerable trouble; more on that later.)

Modules that use OpenGL on the Native side likewise create their own contexts on their own threads. To feed a texture created in such a context to Flutter and hand it to Skia for drawing, we did the following: when Flutter creates its two internal contexts, we expose their ShareGroup and save it on the Native side; when Native creates its own context, it creates it from this ShareGroup. This achieves texture sharing between Native and Flutter.
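
A minimal sketch of the idea, assuming the engine has been modified to expose its EAGLSharegroup (this is our custom change, not a stock Flutter API); initWithAPI:sharegroup: is the standard EAGLContext initializer:

// flutterSharegroup: obtained from our modified engine (not a stock API).
EAGLSharegroup* flutterSharegroup = /* exposed by the engine change */;
EAGLContext* nativeContext =
    [[EAGLContext alloc] initWithAPI:kEAGLRenderingAPIOpenGLES2
                          sharegroup:flutterSharegroup];
// On the native worker thread:
[EAGLContext setCurrentContext:nativeContext];
// Textures created here (e.g. the camera pipeline's output) now live in the
// same sharegroup as Flutter's GPU/IO contexts, so Skia can draw them directly.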

Implementing the external texture this way has two advantages:

First, it saves CPU time. In our tests, reading a 720P RGBA video frame from GPU to CPU takes about 5 ms, and sending it from CPU back to GPU takes another 5 ms. Even with PBOs introduced, it still takes about 5 ms, which is clearly unacceptable for high-frame-rate scenarios.

Second, it saves CPU memory: the data stays on the GPU, which matters especially for image scenarios, where many images may need to be displayed at once.

Afterword

At this point we have covered the basic principle of Flutter's external texture and our optimization strategy. You might wonder: if sharing textures is so convenient, why does Google use PixelBuffer as the external-texture carrier? Using Flutter's texture directly requires exposing the ShareGroup, which means opening up Flutter's GL environment. If external OpenGL code misbehaves, Flutter's own GL objects are at risk: to the CPU, a GL object is just a number; a Texture or a FrameBuffer we see is just a GLuint. When environments are isolated, calling deleteTexture or deleteFrameBuffer cannot affect objects in other environments, but once the environment is opened up, these calls may destroy objects belonging to Flutter's contexts. As a framework designer, guaranteeing the framework's closedness and integrity is essential.

During development we hit a weird problem that took a long time to track down. The cause: on the main thread, without calling setCurrentContext first, we called glDeleteFramebuffers and mistakenly deleted Flutter's FrameBuffer, making Flutter crash while rendering. So if you adopt this scheme, your Native GL code should at minimum observe the following points (see the sketch after the list):

1: Try not to do GL operations on the main thread.

2: Call setCurrentContext before any function that performs GL operations.
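
A minimal sketch of a wrapper that enforces point 2 (RunGLTask is our hypothetical name; nativeContext is the context created from the shared ShareGroup earlier):

// Save, switch, run, restore: stray GL calls such as glDeleteFramebuffers
// then always hit our own context, never whatever context happened to be
// current (for example Flutter's).
static void RunGLTask(EAGLContext* nativeContext, void (^task)(void)) {
  EAGLContext* previous = [EAGLContext currentContext];
  [EAGLContext setCurrentContext:nativeContext];
  task();
  [EAGLContext setCurrentContext:previous];
}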

On Android the overall principle of Flutter is the same, but the implementation differs slightly. Android Flutter's external texture is implemented with SurfaceTexture, whose mechanism is effectively a copy from CPU memory to GPU memory. Android OpenGL has no ShareGroup; it uses a shareContext instead, so you pass the Context itself out. Also, the Shell layer's Android GL implementation is based on C++, so the Context is a C++ object. To share it with the Java EGLContext object on the Android native side, you need JNI code like this:

static jobject GetContext(JNIEnv* env,
                          jobject jcaller,
                          jlong shell_holder) {
  jclass eglcontextClassLocal = env->FindClass("android/opengl/EGLContext");
  jmethodID eglcontextConstructor =
      env->GetMethodID(eglcontextClassLocal, "<init>", "(J)V");

  void* cxt = ANDROID_SHELL_HOLDER->GetPlatformView()->GetContext();

  if ((EGLContext)cxt == EGL_NO_CONTEXT) {
    return env->NewObject(eglcontextClassLocal, eglcontextConstructor,
                          reinterpret_cast<jlong>(EGL_NO_CONTEXT));
  }

  return env->NewObject(eglcontextClassLocal, eglcontextConstructor,
                        reinterpret_cast<jlong>(cxt));
}
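
On the native-module side, the shared context would then be passed as the share_context argument when creating that module's own EGL context, roughly like this (display, config, and flutterSharedContext are assumed to be obtained elsewhere; eglCreateContext is the standard EGL call):

#include <EGL/egl.h>

// Create the native module's context sharing with Flutter's context, so
// textures created here are visible to Flutter's GL environment.
EGLint attribs[] = {EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE};
EGLContext nativeContext =
    eglCreateContext(display, config, flutterSharedContext, attribs);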

 

Contact us

If you have any questions about or corrections to this article, please let us know.
The Idle Fish technology team is a small but capable engineering team. We care not only about solving business problems effectively, but also about breaking down the divisions between technology stacks (unifying the programming model and language across Android/iOS/Html5/Server) and pushing forward cutting-edge practice of computer vision technology on mobile. As a software engineer on the Idle Fish team, you will have the chance to show your talent and courage in the evolution of the whole product and in solving user problems, proving that technical development is a life-changing force.