Flutter has been one of the hottest cross-platform frameworks of the past two years. Due to the pandemic, real-time audio and video have become ever more integrated into people's daily work and life, with online meetings and online education as obvious examples. What kind of spark can the two create together? With a Flutter real-time audio and video SDK, we can quickly develop cross-platform apps for meetings, entertainment, education, and more. LiveVideoStackCon 2021 invited Niu Zan, a senior engineer at Tencent Cloud, to share how to use Flutter to render real-time audio and video and how to optimize video rendering performance.

Speaker | Niu Zan

Edited by | LiveVideoStack

I am from the Tencent Cloud audio and video team, and the theme of this talk is the application of cross-platform technology to front-end audio and video.

I joined Tencent in 2015 and have worked on Honor of Kings, League of Legends match prediction, QQ Membership, and other businesses. I am currently responsible for front-end development of Tencent Cloud's real-time audio and video product, TRTC.

1. Cross-platform technology

Why use a cross-platform framework in the first place? Ideally, it means writing code once, running it on multiple platforms, and sharing components, which improves efficiency. For managers, it reduces labor costs and removes the need for separate iOS and Android teams. For developers, learning one cross-platform framework is enough to ship for both platforms, which lowers the learning cost and raises their business value.

There are several stages in the development of cross-platform technology:

The first stage — the hybrid app. The core idea is to wrap native interfaces and expose them to JavaScript, with the business logic running in a WebView. Backed by the huge front-end ecosystem, development and iteration are very fast. The downsides are that capabilities are limited by the bridging layer, extensibility is weak, and WebView rendering performance is poor.

The second stage — in 2015, Facebook launched React Native. Its core change was to abandon inefficient WebView rendering: the UI is converted into native controls and drawn by the system. The advantage is that developers can keep using the front-end toolchain (the huge React ecosystem), and because rendering is handed to the system, performance is better than a WebView. The disadvantage is that rendering requires communication with the native side, and in scenarios with frequent communication, poor handling leads to jank. In addition, React Native is driven by JavaScript, which can only be compiled just in time (JIT), so there is still a gap between its performance and that of native apps.

The third stage — in 2018, Google launched Flutter, which lets one codebase build applications for multiple platforms at the same time. It supports hot reload for efficient development, uses the Dart language with both JIT and AOT compilation, and ships its own rendering engine, Skia, delivering near-native performance.

Comparing the popularity trends of Flutter and React Native using data from Statista, GitHub, and Stack Overflow shows that Flutter has surpassed React Native: it arrived later in the cross-platform field, but it is now the hottest cross-platform technology.

The diagram shows the architecture of Flutter. The green part is the Flutter Framework, a UI SDK implemented in Dart. From top to bottom it includes two component libraries, basic widgets, graphics drawing, gesture recognition, animation, and other functions. The two major component libraries implement UI components in the iOS (Cupertino) and Material styles respectively, making UI development more flexible. The blue part is the Flutter Engine, which implements the rendering engine, the Dart virtual machine, the platform communication channel, event notification, the plug-in architecture, and other functions.

The platform communication channel is what we use to wrap the SDK interfaces: it delivers messages asynchronously between Flutter and the native side. Both the request and the response are asynchronous, so the UI is never blocked. The Flutter engine has already done the bridging work; we only need to write the underlying iOS/Android code in the communication layer, and it can then be called directly from Dart.
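For illustration, here is a minimal Dart sketch of what such an asynchronous channel call might look like; the channel and method names are assumptions, not the real SDK's.

```dart
import 'package:flutter/services.dart';

// Minimal sketch of an asynchronous platform channel call.
// The channel and method names are illustrative, not the real SDK's.
class TrtcBridge {
  static const MethodChannel _channel = MethodChannel('trtc_cloud_channel');

  // Dart sends a request to the native layer and awaits the reply;
  // both directions are asynchronous, so the UI thread is never blocked.
  Future<String?> getSdkVersion() {
    return _channel.invokeMethod<String>('getSDKVersion');
  }
}
```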

2. Design of TRTC Flutter SDK architecture

The picture shows the architecture of our Flutter SDK. It is a wrapper around the native iOS/Android SDKs, so it stays aligned with the native SDKs and reuses existing capabilities such as audio and video capture, encoding, and decoding to the greatest extent.

During the process of designing the framework, we did the following:

  1. Optimized data communication: the message channel between Flutter and the native SDK only supports simple data such as basic types, so we extended it to carry more complex data structures.

  2. Made the layered design more extensible: considering that Flutter will support more platforms later, this makes it easier to extend to additional platforms in the future.

  3. Aggregated the beauty-filter, device, and audio related APIs, making the Tencent Cloud API easier for developers to use.

  4. Optimized video rendering: GPU performance basically reaches the level of the native SDK.

Communication between Flutter and the native side only supports basic data types, which brings the following challenges:

  1. How to implement complex class structure transfer?

  2. How can images be transferred efficiently between Flutter and the native SDK?

  3. Flutter does not have a system view component similar to the native platform. How does Flutter render video?

  4. How to help developers quickly access the various API interfaces?

These four questions will be discussed in detail below.

In essence, a Flutter call means Dart invoking a native interface and asynchronously receiving native data back. The native SDK defines many class structures; for example, TRTCParams in the enter-room interface carries the application ID, user ID, user signature, and other information. Since the raw message channel does not support passing such class structures, we upgraded the data communication capability.

First, the class structure defined in Flutter is converted into a Map and then serialized to JSON. The underlying message channel efficiently serializes the transmitted data into binary. After receiving the data, the communication layer deserializes it, converts it into the corresponding Java class structure, and passes it to the native SDK.

Class structures constrain parameters and allow type checking, which helps developers catch problems early and improves usability.
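As a rough illustration, the Dart side of this conversion might look like the sketch below; the field, channel, and method names are assumptions rather than the real TRTC definitions.

```dart
import 'dart:convert';
import 'package:flutter/services.dart';

// Illustrative class structure; the fields are assumed, not the real TRTCParams.
class TRTCParams {
  final int sdkAppId;
  final String userId;
  final String userSig;
  final int roomId;

  const TRTCParams(this.sdkAppId, this.userId, this.userSig, this.roomId);

  // Convert the class structure into a Map so it can be serialized to JSON.
  Map<String, dynamic> toJson() => {
        'sdkAppId': sdkAppId,
        'userId': userId,
        'userSig': userSig,
        'roomId': roomId,
      };
}

const MethodChannel _channel = MethodChannel('trtc_cloud_channel');

Future<void> enterRoom(TRTCParams params) {
  // The JSON string crosses the message channel; the communication layer
  // deserializes it into the corresponding Java/Objective-C class structure.
  return _channel.invokeMethod('enterRoom', {'param': jsonEncode(params)});
}
```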

In live-streaming scenarios, videos sometimes need a watermark (such as the panda watermark in the lower-right corner of the upper-left picture). Interfaces such as set-watermark require passing image resources defined in the Flutter project down to the native SDK, but the underlying SDK only accepts a Bitmap object, and Flutter has no Bitmap type. How do we convert the Flutter project's image resources into the Bitmap required by the native SDK?

The first approach is to copy the image from the Flutter assets into the document directory and pass the file path to the communication layer, which parses it into the Bitmap object required by the Android native SDK. Although sharing can be achieved by passing file paths through the document directory, tests with a 100 KB image showed that copying the file is time-consuming, so we considered whether the copy step could be removed to make image transfer more efficient.

By reading the Flutter source code, we found that a Flutter project's image resources are packaged into the native resource bundle, and Flutter exposes an API that lets the communication layer resolve the resource path. This allows the asset path of a Flutter image to be passed directly to the communication layer, which reads the resource through the AssetManager API and converts it into the Bitmap object required by Android.
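A hedged sketch of what the Dart side of such a watermark call might look like, where only the asset path string crosses the channel; the method and parameter names are hypothetical.

```dart
import 'package:flutter/services.dart';

const MethodChannel _channel = MethodChannel('trtc_cloud_channel');

// Hypothetical watermark call: only the asset key declared in pubspec.yaml is
// sent across the channel; no file copy happens on the Dart side. The native
// communication layer can resolve this key through the engine's asset lookup
// and decode the resource into the Bitmap the Android SDK expects.
Future<void> setWatermark(String assetPath, {double x = 0.9, double y = 0.9}) {
  return _channel.invokeMethod('setWatermark', {
    'image': assetPath, // e.g. 'assets/images/watermark.png'
    'x': x,
    'y': y,
  });
}
```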

Network images can be pre-downloaded to the document directory ahead of time, which covers the case of passing network images.

If rendering were done through Flutter's message channel, every frame captured by the camera would have to be transmitted from the native side to Flutter, and streaming frame data through the message channel in real time would inevitably consume a huge amount of CPU and GPU.

To this end, Flutter offers the following two video rendering schemes:

  • External texture: native OpenGL image data can be shared with Flutter for rendering. This requires the native SDK to provide a video-frame data callback interface and is relatively complex to implement.

  • PlatformView: mainly used for components that are hard to implement directly in Flutter, such as WebView, video players, and maps. It gives Flutter the ability to embed native views on Android and iOS, and it injected strong vitality into Flutter while its infrastructure was still maturing: components that are hard to build on the Flutter side can simply embed the existing native view.

The native SDK provides a video rendering view component, so we only need PlatformView's embedding ability to place the native video view inside Flutter. On Android, PlatformView is implemented as AndroidView; the first argument shown at the bottom of the image, viewType, uniquely identifies the widget and associates it with the AndroidView.
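A minimal sketch of embedding such a native video view from the Dart side; the viewType string and creation parameters below are assumed names that would have to match the factory registered on the native side.

```dart
import 'package:flutter/services.dart';
import 'package:flutter/widgets.dart';

// Minimal sketch of embedding a native video view via PlatformView.
// 'trtc_video_view' is an assumed viewType, not the real SDK's identifier.
class NativeVideoView extends StatelessWidget {
  final String userId;
  const NativeVideoView({Key? key, required this.userId}) : super(key: key);

  @override
  Widget build(BuildContext context) {
    return AndroidView(
      viewType: 'trtc_video_view',         // uniquely identifies the widget
      creationParams: {'userId': userId},  // handed to the native view factory
      creationParamsCodec: const StandardMessageCodec(),
    );
  }
}
```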

Implementing a PlatformView is not complicated: after a quick read of the official documentation, you can construct one by following the official template and framework code.

We first tried the PlatformView approach, which was easy to implement. After wrapping video rendering, we tested the rendered output on a low-end OPPO device. With 6 users in the room, rendering on the second screen was abnormal (figure 2 on the right).

Profiling with Tencent's PerfDog showed that GPU usage was abnormally high, so we carried out a series of optimizations.

Flutter's default ListView does not support lazy loading, so we replaced it with ListView.builder. In early tests lazy loading still did not seem to take effect, because preloading is enabled by default. Since each additional rendered video adds a large GPU load, we dropped the preloading behavior and additionally recycled videos outside the visible area: when the user scrolls to the second screen, pull-stream rendering of the first-screen videos is stopped, as sketched below.
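This is a simplified sketch of that lazy video list; the start/stop calls are placeholders standing in for the SDK's render interface, not the real API.

```dart
import 'package:flutter/material.dart';

// Sketch of the optimized video list. ListView.builder creates items lazily,
// and when an item scrolls far enough off screen its State is disposed, at
// which point the pull stream is stopped.
class VideoList extends StatelessWidget {
  final List<String> userIds;
  const VideoList({Key? key, required this.userIds}) : super(key: key);

  @override
  Widget build(BuildContext context) {
    return ListView.builder(
      itemCount: userIds.length,
      itemExtent: 200,
      itemBuilder: (context, index) => _VideoItem(userId: userIds[index]),
    );
  }
}

class _VideoItem extends StatefulWidget {
  final String userId;
  const _VideoItem({Key? key, required this.userId}) : super(key: key);

  @override
  State<_VideoItem> createState() => _VideoItemState();
}

class _VideoItemState extends State<_VideoItem> {
  @override
  void initState() {
    super.initState();
    debugPrint('startRemoteView(${widget.userId})'); // start pulling the stream (assumed call)
  }

  @override
  void dispose() {
    debugPrint('stopRemoteView(${widget.userId})'); // recycled off screen: stop the stream
    super.dispose();
  }

  @override
  Widget build(BuildContext context) => Container(color: Colors.black);
}
```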

After optimizing the video list, GPU usage dropped from 72% to around 50%, and the video images rendered normally.

We did not stop after this first phase of optimization. Once the optimized Flutter SDK launched, customers began comparing Flutter against native, and the data showed that CPU and memory usage were similar to Android native.

However, the GPU usage gap between Flutter and native was significant, as much as 15%.

We then looked closely at how PlatformView is implemented and found that on Android, in virtual display mode, the underlying mechanism is also an external texture, with a graphics buffer in the middle. Every pixel of the native video view flows through this graphics buffer; the texture data is then drawn onto a SurfaceTexture (the drawing surface provided by Flutter), and finally Flutter renders the whole video from that surface.

The main cost in these steps is the graphics buffer: video already rendered by the native view is redrawn onto the SurfaceTexture, wasting significant video memory and graphics performance. The solution is to skip the native view component entirely and draw the SDK's raw video frame data directly onto the SurfaceTexture via OpenGL.

The final video rendering architecture is shown in the figure. When a remote user enters the room, the local device receives the room-entry signal through the cloud service. For example, with many people already in a room, when a new user enters, the local device needs to render that user: it first sends the pull-stream instruction, the Android native SDK calls back the video frame texture data frame by frame, and the frames are drawn onto the SurfaceTexture via OpenGL. The texture ID returned by the communication layer is handed to Flutter, which uses it to find the corresponding drawing data on the GPU, and the Flutter engine performs the final rendering.
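A rough Dart-side sketch of this texture-based path, where the texture ID returned from the communication layer is handed to Flutter's Texture widget; the channel and method names are assumptions.

```dart
import 'package:flutter/services.dart';
import 'package:flutter/widgets.dart';

const MethodChannel _channel = MethodChannel('trtc_cloud_channel');

// Ask the native layer to start pulling the remote stream and to register a
// SurfaceTexture; the returned id identifies the texture on the GPU.
// 'startRemoteRender' is an assumed method name.
Future<int?> startRemoteRender(String userId) {
  return _channel.invokeMethod<int>('startRemoteRender', {'userId': userId});
}

// Flutter samples that GPU texture directly, so frame data never travels
// through the message channel.
Widget buildRemoteVideo(int textureId) {
  return Texture(textureId: textureId);
}
```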

The optimized rendering improves GPU performance by about 10%, almost reaching the level of the Android native SDK.

The original SDK has many APIs, over 100 in the Flutter SDK alone. So many APIs are expensive for developers and customers to understand and learn, and integration takes a long time. So we built a series of scenario demos: customers can find the scenario that matches their business, refer to the source code implementation, and integrate faster.

TRTC application scenarios include audio and video calls, multi-person meetings, online education, interactive live streaming, voice chat rooms, Werewolf-style games, online healthcare, online karaoke, and more.

Scenario design mainly separates the UI from the scenario SDK: customers can develop directly against the provided UI, or use the encapsulated scenario SDK and build a custom UI. The backend of each scenario uses Tencent Cloud's serverless cloud functions to lower the barrier to entry; all components are serverless, require no operations work, and save labor costs. The bottom layer relies on the TRTC SDK for audio and video transmission, and the IM SDK provides signaling and group chat capabilities.

Next, let me introduce some of the application scenarios we have implemented.

In the voice/video call scenario, a user sends a call request; once accepted, an audio and video call is established, similar to WeChat's voice and video calls. Interactive live streaming includes interactive co-anchoring (link mic), anchor PK, low-latency viewing, bullet-screen chat, and so on. Latency can be kept within 300 ms, and advanced beauty effects such as face slimming can be applied during the broadcast; the before-and-after comparison is clearly visible in the picture. Video conferencing is great for communication. Voice salons, such as Clubhouse, which was very popular at the beginning of the year, let users join a room on topics they are interested in: guests speak while the rest of the audience listens, and an audience member who wants to speak can raise a hand, apply to become a guest, and then ask questions or speak. In the online education scenario, teachers can choose teaching methods such as voice, video, and screen sharing.

Taking the online education scenario as an example, let me briefly introduce the idea behind the scenario SDK. This SDK is a secondary wrapper over real-time audio and video and messaging capabilities for online education. It encapsulates not only basic audio/video chat and screen sharing, but also classroom interactions such as the teacher asking a question, students raising their hands, the teacher inviting a student to answer, and finishing the answer. If customers are not satisfied with the default UI, they can use the scenario SDK for customized development. A scenario SDK has no more than about 30 interfaces, and the semantics map directly to the scenario. Integrating against the original API takes customers 2-3 months; integrating with the scenario SDK takes only about a month; and reusing the scenario UI component library can get a product launched in as little as a week.
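Purely as an illustration of what "scenario-based semantics" means, a hypothetical slice of such an education SDK surface might look like the sketch below; all names are made up, not the real TRTC scenario SDK.

```dart
// Hypothetical scenario-SDK surface for online education. The point is that
// each method maps to a classroom action rather than a low-level
// audio/video call; none of these names are the real API.
abstract class EducationRoom {
  Future<void> createClassRoom(int roomId);          // teacher side
  Future<void> enterClassRoom(int roomId);           // student side
  Future<void> startQuestion(int durationSeconds);   // teacher asks a question
  Future<void> raiseHand();                          // student raises a hand
  Future<void> inviteToAnswer(String studentId);     // teacher invites a student
  Future<void> finishAnswer(String studentId);       // answering finished
  Future<void> startScreenShare();
  Future<void> stopScreenShare();
}
```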

More and more companies are trying Flutter in new projects. Typical users include Yell Live, Binance, and Tencent Games Youth Live for interactive live-streaming scenarios, and Tanzhou Education, Rio Tinto Feiyuan, and zhaopin.com for audio and video calls.

If a new business needs audio and video, Flutter can be a great choice.

3. Future vision of Flutter audio and video

Currently, Flutter is mainly used on mobile devices, both iOS and Android. Flutter's vision is to be a multi-platform UI framework supporting not only mobile but also the Web and the desktop (macOS/Windows). Officially, desktop support is expected by the end of this year. Our team has already integrated TRTC audio and video capabilities with the desktop support that is still in beta, and opened up macOS/Windows support: audio and video calls work, but capabilities such as screen sharing are still missing and will be added later.

The overall architectures of Flutter Web and Flutter native are similar; they share the Framework layer. The core difference is that Flutter Web rewrites the engine layer and uses DOM/Canvas to match Flutter native's UI rendering capability, so UI written in Flutter displays properly in the browser. Although Flutter officially opened Web support earlier this year, the following issues remain:

  1. At present, everything is bundled into a single main.dart.js, which inevitably produces a very large artifact, typically 1-2 MB. There is also no JS code splitting, file hashing, or similar tooling, which hurts page-load performance.

  2. Flutter Web implements its own page-scrolling mechanism, so during scrolling it constantly recalculates positions and re-renders the scrolled area, which leads to poor scrolling performance.

Currently, Flutter Web is officially recommended for the following three scenarios:

  1. Progressive Web applications built with Flutter;

  2. Single-page applications;

  3. Publish existing mobile applications to the Web.

Flutter Web is not suitable for the document-centric, waterfall-flow pages that are common on the Web.

Currently, our SDK has opened Web support in a dev/beta version. Compared with native, there is an additional Web compatibility layer, designed mainly to keep the API compatible with the Flutter native API design.

Web APIs differ a lot from native APIs, so the Web communication layer also contains a lot of logic to smooth over the differences. The underlying Web SDK is a real-time audio and video call solution based on WebRTC; it currently supports mainly Chrome 58+ and Safari.

In the future, Flutter's support for desktop and Web applications will keep improving; a single framework for all platforms is something to look forward to.

4. Visualizing audio and video capabilities on the Web

Audio and video capabilities on the Web are also evolving, and the browser has become a fairly complete multimedia engine. Three new browser features are worth highlighting: for encoding, WebCodecs enables low-latency encoding and decoding with dynamic control of keyframes and bitrate; for transport, WebTransport provides flexible, controllable, high-performance UDP-based transmission; and WebAssembly lets you reuse complex algorithms written in C++, compiling them into code the browser can run, for example for audio noise reduction and echo cancellation.

Building on Tencent's 20 years of audio and video experience, we are recreating a custom RTC engine inside the browser to further extend its capabilities. The benefits are many:

  1. One TRTC stack can be reused: a single C++ codebase runs on multiple platforms, and the best practices we have accumulated on the native side can be applied to the Web as well.

  2. More controllable RTC QoS tuning; for example, in live-streaming scenarios we can trade a certain amount of latency for clarity.

  3. Richer usage scenarios: the underlying technology can also be reused in the live push-stream SDK and the player SDK.

The next-generation Web RTC engine is expected to be released next year.

In today's video conferencing products, virtual backgrounds have become a standard capability. For example, in a video conference your background might be your home, which looks less formal, so you can replace it with an appropriate background image. The Web SDK implements portrait segmentation using the graphics rendering capability of WebGPU/WebGL, the machine learning capability of TensorFlow, and the multi-threaded computing capability of WebAssembly.

In addition, we did a lot of optimization to shrink the model file below 1 MB, and we provide a rich API that helps users strike a good balance between effect and performance.

Many other features can be built on these new browser capabilities. A WebAR engine can be built with shader algorithms, suitable for beauty filters, virtual makeup, fun stickers, and other live-streaming effects.

OBS has become the de facto standard for enterprise live streaming. As browser capabilities improve, a Web version of OBS becomes possible, with the following advantages:

  • Multi-source support: it can mix multiple sources, including multi-person call live streams;

  • WYSIWYG: the layout can be changed by dragging;

  • Simple operation: open the web page and start broadcasting.

Web OBS can cover about 80% of what OBS does, and can be applied to enterprise live streaming, online events, and other scenarios.

Thanks for sharing!
