Why we developed our own WebRTC framework
There is no good WebRTC-based framework on the market right now, so we wanted to build one that takes full advantage of the Web's free extensibility, security, and its excellent development features and ecosystem. Traditional native audio and video applications are usually optimized specifically for the client side of one platform. For example, on the Windows platform, MFC or WCF is used to achieve HD decoding of 16/32/64 channels of video; some content-distribution applications instead need to emphasize content updating and layout extensibility. The desktop version of WeChat, which we use every day, combines both: besides content rendering, the network communication part is handled on the native side to ensure transmission security.
What will the future of Web applications look like? When Google's CEO defined Web 3.0 in 2007, he said he expected future applications to be small, fast, and cross-platform. Today we see this in the Mini Programs inside WeChat and Alipay, and in Huawei Quick Apps: we can jump directly to the target application by scanning a QR code or clicking a link. These are some of the advantages of Web applications, and front-end development should carry them forward.
Problems faced by Web applications based on WebRTC
At present, the biggest problem for WebRTC-based Web applications is that the user experience is not ideal: playback often stutters. We can analyze the main causes from the following three points:
- The Web platform is seriously fragmented. The official WebRTC standard was only settled in the last couple of years and still iterates quickly, and browsers such as Chrome and Firefox abroad, and 360 Browser and QQ Browser in China, each have their own ideas about how to implement the optional parts.
- WebRTC implementation details are not controllable. As Web developers we tend to focus on the <video> element and other media-related HTML tags, but if we want to control parts of the network transmission such as FEC, weak-network resilience, bandwidth estimation, priority-protection algorithms, or custom encoders and decoders, that cannot be done with the pure Web; it is not the browser's mission. To get this lower level of control, you have to find another way to participate.
- Computationally intensive algorithms have few options. We often need to pre-process the video, for example with a beauty filter. The common practice today is to take the raw video data captured locally from the user, draw it to a canvas, obtain ImageData through the canvas interface, pass the ImageData to WebAssembly, run further algorithms on it such as skin smoothing and brightening, and then redraw it to the canvas: a long chain (a sketch of this pipeline follows after this list). Another point is that WebAssembly itself is still evolving and has its own limitations: in Chrome its execution is driven by the renderer's main thread, and although Web Workers later brought worker threads and parallel computing, the execution performance was still not particularly good. If we hope to make full use of local computing resources, that is difficult to achieve in this situation.
In addition, on the GPU side we are constrained by the WebGL implementation details: the browser renders the entire tab page rather than just the video window, so its performance does not compare well with professional video rendering.
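For illustration, here is a minimal sketch of that canvas → ImageData → WebAssembly → canvas round trip. The `beautify.wasm` module and its `alloc`/`smooth` exports are hypothetical placeholders, and a real implementation would reuse the allocated buffer rather than allocating per frame:

```typescript
// Sketch of the pre-processing chain: draw the capture to canvas, read pixels,
// run a wasm filter on them, and draw the result back.
async function startBeautify(video: HTMLVideoElement, canvas: HTMLCanvasElement) {
  const ctx = canvas.getContext('2d')!;
  // Hypothetical module exporting alloc(len) and smooth(ptr, len).
  const { instance } = await WebAssembly.instantiateStreaming(fetch('beautify.wasm'));
  const alloc = instance.exports.alloc as (len: number) => number;
  const smooth = instance.exports.smooth as (ptr: number, len: number) => void;
  const memory = instance.exports.memory as WebAssembly.Memory;

  const render = () => {
    // 1. Draw the locally captured frame and read it back as ImageData.
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);

    // 2. Copy the pixels into wasm linear memory and run the filter.
    const ptr = alloc(frame.data.length);
    new Uint8ClampedArray(memory.buffer, ptr, frame.data.length).set(frame.data);
    smooth(ptr, frame.data.length);

    // 3. Copy the processed pixels back out and redraw -- a long round trip.
    frame.data.set(new Uint8ClampedArray(memory.buffer, ptr, frame.data.length));
    ctx.putImageData(frame, 0, 0);
    requestAnimationFrame(render);
  };
  requestAnimationFrame(render);
}
```

Every frame crosses the JS/wasm boundary twice by copy, which is exactly the overhead described above.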
Some of Agora's practices and solutions
- Unified rendering engine. To solve the fragmentation of the Web platform, we provide a consistent rendering engine so that users run in a unified rendering environment, removing the fragmentation caused by differences between vendors and versions.
- Treat the WebRTC implementation details and algorithms as the extensible part of the application. We encapsulate these capabilities as much as possible inside the native-platform implementation.
Framework selection for PC applications
When we did the technical evaluation at the beginning, there were several candidate frameworks: Electron, CEF, Qt, and so on. Many of them, such as CEF and Qt, are based on technology stacks that favor C, C++, Java, etc.; in development habits and ecosystem they are far from the front-end development experience, and in particular they require building an independent toolchain. So we finally chose Electron, a technical solution based on Node.js: front-end developers can keep their familiar environment, such as npm and Yarn, for the whole build process, including subsequent testing and packaging.
Some issues when injecting native extension capabilities
- For injecting native extension capabilities, we implement some key business logic in C++ against the V8 engine, combine it with native capabilities, and then expose the result to the upper layer through a bridge. The biggest problem here is that the underlying C++ interface differs between V8 versions, and the interface implementation also varies slightly between Node.js versions, which has long troubled the community. Our solution is to use the Node-API provided by Node.js, which is backward compatible, to keep the underlying interface consistent and to decouple the plug-in interface from the Node version.
When the JS layer's calls reach native through the bridge, they drive the native SDK developed by another team, which contains Agora's main functions and algorithms, such as RTN network access, AI noise reduction, background segmentation, super-resolution, and so on. In terms of front-end development experience, users only need to change a small amount of code to adapt an app they built on the Agora SDK NG, which is one of the original intentions of this framework: keep the external developer experience unchanged while strengthening the capabilities inside.
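As a rough illustration, the bridge layer might look something like the following. The addon path and the `setEventCallback`/`joinChannel` methods are hypothetical, not the actual Agora SDK surface:

```typescript
import { EventEmitter } from 'events';

// Hypothetical Node-API addon built with node-addon-api. Because Node-API is
// ABI-stable, the same .node binary keeps working across the Node.js/Electron
// versions bundled with the app, without tracking V8 internals.
const native = require('./build/Release/agora_native.node');

export class RtcBridge extends EventEmitter {
  constructor() {
    super();
    // The native SDK produces events (video frames, network stats, ...);
    // the JS layer consumes them and forwards them to the application.
    native.setEventCallback((event: string, payload: unknown) => {
      this.emit(event, payload);
    });
  }

  joinChannel(channel: string, uid: number): void {
    native.joinChannel(channel, uid);
  }
}

// Usage in the upper application layer:
//   const rtc = new RtcBridge();
//   rtc.on('videoFrame', (frame) => uploadFrame(frame));
//   rtc.joinChannel('demo', 1234);
```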
- Asynchronous event-driven design. A characteristic of audio and video applications is the large data volume: depending on the user's video resolution, a single frame may range from a few KB to tens of MB, and at the same time we also have to decrypt, encode, and so on. This requires making full use of the user's local computing power, which involves scheduling work onto sub-threads. With an asynchronous event-driven design we can manage a queue of asynchronous events in which the native platform produces and the main thread consumes. In addition, because each piece of data is large, exchanging data between the rendering thread and the asynchronous threads by copying would have a predictable memory cost, so we use the Node-API external ArrayBuffer technique, similar to a smart-pointer reference: we only need to guarantee the buffer's lifetime, i.e. it stays valid while the JS layer uses it, and it is destroyed together with the pointer to avoid memory leaks.
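A minimal sketch of this producer/consumer queue, under the assumption that the addon hands frames to JS as external ArrayBuffers (for example created with napi_create_external_arraybuffer on the native side); all names are illustrative:

```typescript
// The frame's data wraps native memory instead of copying it; the JS layer
// only has to respect the buffer's lifetime.
interface NativeVideoFrame {
  uid: number;
  width: number;
  height: number;
  data: ArrayBuffer; // external ArrayBuffer referencing the native buffer
}

const frameQueue: NativeVideoFrame[] = [];
const MAX_PENDING = 3; // bound the queue so a slow renderer drops old frames

// Producer: called by the native addon once the frame is marshalled onto the JS thread.
export function onNativeFrame(frame: NativeVideoFrame): void {
  if (frameQueue.length >= MAX_PENDING) frameQueue.shift(); // drop the oldest frame
  frameQueue.push(frame);
}

// Consumer: the main thread drains one frame per animation tick.
function consume(): void {
  const frame = frameQueue.shift();
  if (frame) {
    uploadFrame(frame); // e.g. update a WebGL texture (see the next sketch)
  }
  requestAnimationFrame(consume);
}
requestAnimationFrame(consume);

declare function uploadFrame(frame: NativeVideoFrame): void;
```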
- Dynamic resolution. Limited by network congestion and other conditions, our users often cannot keep a constant 1080p, so we need balancing algorithms that trade frame rate against resolution. In this case the video size can change, and we have to handle it specially: the canvas adapts to the width, height, and stride of each video data frame when the texture is updated, and the overall rendering proportion and layout of the page are then controlled through CSS styles. There is no need to re-sample to preserve the aspect ratio, which is an obvious difference from traditional native video rendering.
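A minimal sketch of this adaptive texture upload, assuming RGBA frames arrive from the bridge (real SDK frames may be YUV with stride padding) and omitting the shader and draw call for brevity:

```typescript
const canvas = document.getElementById('remote-video') as HTMLCanvasElement;
const gl = canvas.getContext('webgl')!;
const texture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, texture);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);

let lastW = 0;
let lastH = 0;

function uploadFrame(frame: { width: number; height: number; data: ArrayBuffer }): void {
  const pixels = new Uint8Array(frame.data);
  gl.bindTexture(gl.TEXTURE_2D, texture);
  if (frame.width !== lastW || frame.height !== lastH) {
    // Resolution changed mid-call: reallocate the texture and the backing store.
    canvas.width = frame.width;
    canvas.height = frame.height;
    gl.viewport(0, 0, frame.width, frame.height);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, frame.width, frame.height, 0,
                  gl.RGBA, gl.UNSIGNED_BYTE, pixels);
    lastW = frame.width;
    lastH = frame.height;
  } else {
    // Same size: cheaper sub-image update.
    gl.texSubImage2D(gl.TEXTURE_2D, 0, 0, 0, frame.width, frame.height,
                     gl.RGBA, gl.UNSIGNED_BYTE, pixels);
  }
  // On-screen size and aspect ratio are left to CSS, e.g.
  //   canvas { width: 100%; height: 100%; object-fit: contain; }
  // so no resampling is needed here.
}
```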
Audio playback
Local audio capture generally does not require local playback, but in some business scenarios users want special processing based on hearing their own voice; for example, a user may adjust the relative position of the microphone by intuitively hearing their own volume. We also do this to stay consistent with the Agora SDK NG features. Because browser audio playback is asynchronous, the main thread forwards audio data to the audio thread, and only after the audio thread has played it does the main thread handle the next audio segment; this round trip always has a cost. The intuitive result is that during real-time playback the delay accumulates over time, and obvious noise appears at the boundaries between data segments.
Our solution is relatively simple and direct: we use the AudioWorkletProcessor interface to control the rate at which audio is handed to the audio rendering thread, streaming the audio data to that asynchronous thread and avoiding the delay accumulation caused by the round trip between threads. The playback sounds smooth, each data segment is short, around 10 ms, and users hear their own voice become louder without any obvious echo.
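A minimal sketch of this approach, assuming mono Float32 PCM chunks of roughly 10 ms are posted from the main thread; the processor name and message format are illustrative:

```typescript
// Runs in the AudioWorkletGlobalScope (the audio rendering thread).
class FeedbackProcessor extends AudioWorkletProcessor {
  private queue: Float32Array[] = [];

  constructor() {
    super();
    // ~10 ms PCM chunks are streamed in from the main thread as they are captured.
    this.port.onmessage = (e: MessageEvent<Float32Array>) => {
      this.queue.push(e.data);
    };
  }

  process(_inputs: Float32Array[][], outputs: Float32Array[][]): boolean {
    const out = outputs[0][0]; // one 128-sample render quantum, mono
    let filled = 0;
    while (filled < out.length && this.queue.length > 0) {
      const chunk = this.queue[0];
      const n = Math.min(chunk.length, out.length - filled);
      out.set(chunk.subarray(0, n), filled);
      filled += n;
      if (n === chunk.length) this.queue.shift();
      else this.queue[0] = chunk.subarray(n);
    }
    return true; // keep the processor alive; unfilled samples stay silent
  }
}
registerProcessor('feedback-processor', FeedbackProcessor);

// Main thread side:
//   const ctx = new AudioContext();
//   await ctx.audioWorklet.addModule('feedback-processor.js');
//   const node = new AudioWorkletNode(ctx, 'feedback-processor');
//   node.connect(ctx.destination);
//   node.port.postMessage(tenMsPcmChunk); // push each ~10 ms chunk as it arrives
```

The worklet pulls from its own queue on the audio thread, so the main thread never waits on playback feedback and delay cannot accumulate.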
Future
In the future, we hope to expose more of the Agora native SDK's capabilities to upper-layer applications, such as desktop sharing, AI noise reduction, voice beautification, VP9 encoding and decoding, background segmentation, beauty filters, super-resolution, and so on; these have already entered internal testing at Agora. We also hope to give our users more convenient and more powerful Web application development capabilities.