Recap videos of the first two RTC Dev Meetup sessions, on Flutter development technology, have been uploaded to the RTC developer community. One of them is "Flutter Real-time Audio and Video Practice," shared by Agora senior architect Ganze Zhang. The following is a transcript of that talk.
What I'm going to talk about today is my experience with real-time audio and video development in Flutter. My name is Ganze Zhang. I graduated from the University of Oxford in the UK and previously worked as a mobile architect at SAP. I joined Agora in 2017 and am now a senior architect there, mainly responsible for applying RTC technology in industries such as entertainment, live streaming, and education.
First of all, what is an RTC application? RTC stands for Real-Time Communications, and we run into RTC applications all the time in daily life. One example is live streaming: people watch hosts play games, sing, and dance in real time on platforms like Panda TV and Douyu from their phones. Another is education, which can now happen over the Internet; for example, students can interact with teachers in the United States in real time.
Health care is another area where we have seen a lot of cases recently. It used to be that if you wanted to see a doctor, you went to the hospital, registered, queued for a whole morning, saw the doctor briefly, and then had to register and wait all over again if you needed to come back the next day. Now, with RTC, you can see a doctor from home over video; the doctor asks you questions, and for simple cases you don't even need to go to the hospital.
Today we'll take a common video calling application as an example and look at how to implement it with Flutter and where the difficulties lie.
Architecture logic and implementation ideas
Let's start with the architectural logic of Flutter. Those of you who are familiar with Flutter probably know that the biggest difference in how it renders, compared with React Native, is that it does not use the system's own native views to back its controls. So when you develop an application, how does it actually get rendered?
Flutter builds a Layer Tree, its own tree structure, and hands the drawing data over to Skia. When the VSync signal arrives, Skia sends the data to the GPU for unified rendering. There is no way to pass native views such as UIView into this pipeline.
Once I understood that concept, I started building the app. Flutter itself provides ready-made components for the common parts of an app, such as the overall app scaffold, buttons, and the navigation bar, so with some very simple configuration you can draw these things directly.
Flutter also provides a way to communicate with native SDKs, called platform channels (sending messages over a MethodChannel), much like how React Native calls into native code. So we had the Agora SDK to handle the audio encoding, decoding, and transmission for us, and we could use Flutter's own components to draw the UI; it looked like everything was going to work. Just when everything seemed so promising, I got stuck on video. How does Flutter render video?
On native platforms there are system components for rendering video, but Flutter has nothing like that, and what I needed to render was a live video stream.
So I looked through Flutter's source code and found the Video Player component, which is used to play video. Underneath it is the Texture widget, which provides the mechanism it uses for rendering video.
Implementation idea 1: Texture Widget
First, a video is made up of many frames. Flutter's Texture widget is a component that can be placed into the Layer Tree, but its data source has to be supplied by you from the native side.
On iOS, what you provide is a CVPixelBufferRef, a buffer whose data corresponds to one frame of the video. Feed these buffers to the Texture widget as its data source, and the widget will display them, frame after frame, eventually turning them into video.
Above is a rough implementation architecture. The first step is to implement a native plugin for the Dart side. The PluginRegistrar is the entry point of a plugin, and it exposes a TextureRegistry where you can register your own texture class. That class implements a protocol called FlutterTexture. Two things are needed to make this work: one is copyPixelBuffer, which returns the CVPixelBufferRef that serves as your texture's data source; the other is textureFrameAvailable, which you call to tell the TextureRegistry that it can now update the screen with the data you have provided.
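To make this concrete, here is a minimal sketch of what the texture class could look like on the iOS side, written in Swift. Only the FlutterTexture protocol and its copyPixelBuffer method come from Flutter itself; the class name AgoraTexture and the update(_:) helper are made up for illustration.

```swift
import Flutter
import CoreVideo

// A minimal sketch of a texture class; AgoraTexture and update(_:) are
// illustrative names, not part of Flutter or the Agora SDK.
class AgoraTexture: NSObject, FlutterTexture {
    // The most recent frame delivered from the native side.
    private var latestBuffer: CVPixelBuffer?
    private let lock = NSLock()

    // Called by the Flutter engine whenever it wants to draw this texture.
    func copyPixelBuffer() -> Unmanaged<CVPixelBuffer>? {
        lock.lock(); defer { lock.unlock() }
        guard let buffer = latestBuffer else { return nil }
        return Unmanaged.passRetained(buffer)
    }

    // Store a new frame; the caller should then notify the TextureRegistry.
    func update(_ buffer: CVPixelBuffer) {
        lock.lock(); defer { lock.unlock() }
        latestBuffer = buffer
    }
}
```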
One question here: I have a Texture widget, but when I update a data source, how does the widget know which texture I want to update? In fact, when you register a texture, the registry returns a textureId, and you pass that textureId in when you create your Texture widget. The Texture widget then binds to the texture you registered and knows that the data source you provide is its data. In the same way, we can have many textures.
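Here is a sketch of how that registration and ID hand-off might look in the plugin, again in Swift. The channel name "agora_texture_plugin" and the method name "createTexture" are placeholders I made up for this example; on the Dart side the returned ID would be passed to Texture(textureId: id).

```swift
import Flutter

// Sketch of the plugin entry point: register a texture and hand its ID to Dart.
// AgoraTexture is the class from the sketch above.
public class AgoraTexturePlugin: NSObject, FlutterPlugin {
    private let registry: FlutterTextureRegistry
    private var textures: [Int64: AgoraTexture] = [:]

    init(registry: FlutterTextureRegistry) {
        self.registry = registry
        super.init()
    }

    public static func register(with registrar: FlutterPluginRegistrar) {
        let channel = FlutterMethodChannel(name: "agora_texture_plugin",
                                           binaryMessenger: registrar.messenger())
        let instance = AgoraTexturePlugin(registry: registrar.textures())
        registrar.addMethodCallDelegate(instance, channel: channel)
    }

    public func handle(_ call: FlutterMethodCall, result: @escaping FlutterResult) {
        switch call.method {
        case "createTexture":
            let texture = AgoraTexture()
            let textureId = registry.register(texture)   // returns an Int64 ID
            textures[textureId] = texture
            result(textureId)                            // Dart builds Texture(textureId: id)
        default:
            result(FlutterMethodNotImplemented)
        }
    }
}

// Later, whenever a texture receives a new frame:
//   textures[textureId]?.update(pixelBuffer)
//   registry.textureFrameAvailable(textureId)
```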
Going back to our earlier design (as shown above): in a video calling app we have the local video and a number of remote videos. Each of these can be a separate Texture widget placed inside the Dart page's container; we then provide a data source for each of them and render its data.
Does it feel like everything is ready now? There is still one problem. Flutter's official GitHub repository provides some sample code, but what it demonstrates is playing back a static, already-stored video. What we have here is a live video stream, not a video file, so we cannot simply read a file, convert it frame by frame, and pass it to the Texture widget. We need a way to obtain this real-time data and feed it to the Texture as its data source.
This is where the Agora SDK comes in. First, let me introduce a concept. Imagine you are watching a live broadcast: on the host's side there is a camera, and the host plays games, sings, and dances in front of it. The overall flow is quite simple, as shown in the figure above. First, the host's camera captures data; this is raw data. Once we have that data, it needs to be encoded. During encoding the data can be processed and formatted, for example to a specific resolution and frame rate. Live streaming does not transmit everything the camera captures; the data is compressed and processed, and that is what encoding does. Once encoding is done, we have video data, which is transmitted quickly to your phone through Agora's cloud service. On the phone, the Agora SDK receives the data, decodes it, and provides a way to hand the decoded video data to a UI view, which finally displays the video you see.
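From the host's side, the SDK hides most of this pipeline. As a rough illustration (not code from the talk), creating the engine and joining a channel is enough to start the capture -> encode -> transmit flow; "YourAppId" and "demo-channel" are placeholders, and the exact module name differs by SDK version.

```swift
// Illustration only: the engine wraps camera capture, encoding, and transmission.
// Module name varies by SDK version (AgoraRtcKit vs. AgoraRtcEngineKit).
import AgoraRtcKit

final class BroadcastSession: NSObject, AgoraRtcEngineDelegate {
    private var engine: AgoraRtcEngineKit!

    func start() {
        engine = AgoraRtcEngineKit.sharedEngine(withAppId: "YourAppId", delegate: self)
        engine.enableVideo()
        // Joining a channel publishes the local stream through Agora's cloud;
        // other clients in the channel receive, decode, and render it.
        engine.joinChannel(byToken: nil, channelId: "demo-channel",
                           info: nil, uid: 0, joinSuccess: nil)
    }
}
```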
As shown in the figure above, we add the Agora SDK to the architecture we just described. The SDK has a mechanism called AgoraVideoSink, which provides a callback that delivers every video frame you receive, in the format you want. After we get a frame from this callback, we hand it to the Flutter Texture. We can also set an update frequency and notify the TextureRegistry to refresh the texture with the new data each time the callback fires. Through this process, we were able to build a real-time audio and video app with Flutter.
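Tying the pieces together, the bridge between the frame callback and the texture might look roughly like this. The onFrame(_:) entry point is a stand-in for whatever callback your SDK version exposes (the talk calls it AgoraVideoSink), and AgoraTexture is the class from the earlier sketch.

```swift
import Flutter
import CoreVideo

// Sketch of wiring a per-frame callback into the Flutter texture.
final class TextureRenderer {
    private let texture: AgoraTexture
    private let textureId: Int64
    private let registry: FlutterTextureRegistry

    init(registry: FlutterTextureRegistry) {
        let texture = AgoraTexture()
        self.texture = texture
        self.registry = registry
        self.textureId = registry.register(texture)
    }

    // Called once per decoded frame with a CVPixelBuffer.
    func onFrame(_ pixelBuffer: CVPixelBuffer) {
        texture.update(pixelBuffer)                 // stash the latest frame
        registry.textureFrameAvailable(textureId)   // ask Flutter to redraw
    }
}
```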
Implementation idea 2: PlatformView
Since the Texture approach involves quite a lot of rendering work, many people find it a bit complicated. So in the 1.0 release of Flutter, Google introduced something new called PlatformView.
It gives us a way to create native UI views and add them to Dart's Layer Tree. The corresponding Dart widgets are UiKitView on iOS and AndroidView on Android.
So how do you use PlatformView? In the PluginRegistrar we register a view factory. The method that creates the view is the only one that has to be implemented: it receives an identifier, and from it you return the platform view you want, which is then bound to the Dart widget.
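As a minimal Swift sketch, the iOS side might look like the following; the view type string "agora-video-view" is a placeholder chosen for this example.

```swift
import Flutter
import UIKit

// Sketch of a platform-view factory and platform view on iOS.
class VideoViewFactory: NSObject, FlutterPlatformViewFactory {
    func create(withFrame frame: CGRect,
                viewIdentifier viewId: Int64,
                arguments args: Any?) -> FlutterPlatformView {
        // viewId identifies this particular instance on the Dart side.
        return VideoPlatformView(frame: frame, viewId: viewId)
    }
}

class VideoPlatformView: NSObject, FlutterPlatformView {
    private let nativeView: UIView

    init(frame: CGRect, viewId: Int64) {
        // A plain UIView; the Agora engine will render video into it later.
        nativeView = UIView(frame: frame)
        super.init()
    }

    // The UIView that Flutter embeds into its layer tree via UiKitView.
    func view() -> UIView {
        return nativeView
    }
}

// In the plugin's register(with:) entry point:
//   registrar.register(VideoViewFactory(), withId: "agora-video-view")
// On the Dart side:
//   UiKitView(viewType: "agora-video-view")
```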
Our SDK supports passing in a native view and rendering the video onto it. As shown above, let's adjust the architecture we just described: a custom ViewFactory creates a native view, the SDK's AgoraRtcEngine binds to that view, and when a stream of data arrives it is automatically rendered onto the view. Since every view has a ViewID, all we need to do is look up the view by its ViewID and render to it.
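The binding step might look something like the sketch below. How you look views up by their ViewID is up to your plugin, and choosing between local and remote setup by uid is a simplification for illustration.

```swift
import AgoraRtcKit
import UIKit

// Sketch of binding the Agora engine to a native view created by the factory above.
func bindVideo(engine: AgoraRtcEngineKit, to nativeView: UIView, uid: UInt) {
    let canvas = AgoraRtcVideoCanvas()
    canvas.view = nativeView          // the UIView returned by the platform view
    canvas.renderMode = .hidden
    canvas.uid = uid
    if uid == 0 {
        engine.setupLocalVideo(canvas)    // local camera preview
    } else {
        engine.setupRemoteVideo(canvas)   // a remote user's stream
    }
}
```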
Performance comparison
To sum up, there are two ways to implement live audio and video in Flutter, one is Texture and the other is PlatformView. We compared their performance with that of Native implementations, and the results are shown in the figure below.
The Texture implementation performs relatively poorly, while PlatformView performance is close to native. As far as this demo is concerned, there are two possible reasons why the Texture approach performs worse:
First, the texture is updated as soon as each frame of video data arrives. There is rarely a need to update that often, and throttling the updates could improve performance (see the sketch after the second point).
Second, we use a callback to get the frame data and then hand it to the Flutter Texture. This hand-off is done by the CPU, while rendering is generally done by the GPU, so moving the data from the GPU to the CPU and back to the GPU is very resource-intensive.
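Regarding the first point, one possible optimization (my own illustration, not from the talk) is to keep storing every frame but only ask Flutter to redraw at a capped rate. The 30 fps cap and the class names below are made up; AgoraTexture is the class from the earlier sketch.

```swift
import Flutter
import CoreVideo
import QuartzCore

// Variation of the earlier TextureRenderer sketch: store every frame, but only
// notify the registry at a capped rate (here roughly 30 fps).
final class ThrottledRenderer {
    private let texture: AgoraTexture
    private let textureId: Int64
    private let registry: FlutterTextureRegistry
    private let minInterval: CFTimeInterval = 1.0 / 30.0
    private var lastNotify: CFTimeInterval = 0

    init(registry: FlutterTextureRegistry) {
        let texture = AgoraTexture()
        self.texture = texture
        self.registry = registry
        self.textureId = registry.register(texture)
    }

    func onFrame(_ pixelBuffer: CVPixelBuffer) {
        texture.update(pixelBuffer)
        let now = CACurrentMediaTime()
        guard now - lastNotify >= minInterval else { return }  // skip extra redraws
        lastNotify = now
        registry.textureFrameAvailable(textureId)
    }
}
```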
To summarize, here is my personal comparison of the advantages and disadvantages of Texture and PlatformView. Texture is the more Dart-like approach: it involves very little native machinery such as UIView, so it fits the Dart ecosystem better and is purer in design. But it also has drawbacks. Without special handling it is hard to avoid the intermediate data copies between the CPU and GPU, which costs performance. And if you wanted to use it to embed something like a map, you would have to render the map data inside Flutter and reimplement everything with Flutter's own logic. For most developers, that is a real obstacle to building a thriving ecosystem, which I personally think is why Google added PlatformView in Flutter 1.0.
What are the advantages of PlatformView? As I said, it lets you take functionality already implemented in a native view and drop it directly into Dart, which makes it easier for developers to use. It also avoids the performance loss caused by copying data between the CPU and GPU. The downside is that it is less pure in design and introduces factors that you cannot fully control.
The detailed PowerPoint slides and the replay video can be found here.