The rise of online meetings, online education, e-commerce live streaming and similar scenarios has moved real-time interactive technology from behind the scenes to center stage, attracting wider attention. RTE-related technologies such as codecs, network transmission and computer vision are also showing stronger vitality. In 2021, with the help of deep learning, 5G and other technologies, what further possibilities will RTE open up? The Agora Developer Community, together with InfoQ, invited a number of technical experts from the Agora developer community to write the “2021 Real-time Interactive Technology Outlook Series”, offering a glimpse of new technology trends from the perspectives of video transmission, computer vision, codec standard development, WebRTC, machine learning, audio technology and more.
Zoe Liu is chief scientist and co-founder of Microframe Technology. This series is co-curated by the Agora developer community and InfoQ and reviewed by InfoQ, where it was first published.
In June 2018, AOM (Alliance for Open Media) released AV1 (AOMedia Video 1), a next-generation video coding standard. So far, AOM has 47 corporate members, including 14 Board Members and 33 Promoter Members.
The initial version of AV1 was derived from libvpx, the open-source, royalty-free codec library that also implements VP9, and incorporates research and development results from Google VP10, Mozilla Daala and Cisco Thor. As of June 2018, AV1 had introduced more than 100 new coding tools compared with its predecessor VP9, representing the latest coding technology in the industry.
In this article, we explore the technology trends that may emerge for AV1 in real-time scenarios. Because published data on AV1 in real-time scenarios is limited, we also share performance data for Aurora AV1 in real-time scenarios, along with comparisons against existing encoders, including H.264, VP9 and other open source encoders, to illustrate the changes more directly. Aurora’s examples and data are intended to demonstrate that the AV1 standard is fully usable in real-time scenarios. We look forward to discussing these results with our industry colleagues.
Application practice and ecosystem development of AV1 in RTC scenarios
RTC technology and its applications have expanded rapidly in recent years. During the epidemic in 2020 in particular, the RTC field saw explosive growth across video conferencing, online education, remote terminals, interactive gaming, interactive e-commerce live broadcast, telemedicine, online finance and other fields. Typical video content falls mainly into two categories: screen content and camera (talking-head) content. In ultra-low-delay interactive RTC scenarios, beyond the basic considerations of coding efficiency and picture quality, a video encoder must meet strict requirements on encoding delay, coding speed, coding complexity, adaptive rate control, and error resilience in cooperation with the network layer. AV1’s rich set of coding tools, such as its dedicated screen content coding tools, makes it possible for AV1 to improve the user experience in real-time interactive RTC scenarios.
WebRTC is the most influential open source project for real-time interaction in the industry, providing audio and video APIs for Web and mobile RTC applications. In January 2021, the W3C officially published WebRTC 1.0 as a Recommendation. The WebRTC open source code base mainly includes three open source video encoders: VP8 and VP9 from libvpx, and H.264 from OpenH264. AV1 is derived from VP9 and fits naturally with WebRTC, including support for temporal scalability. AV1 is also the first video coding standard to include Screen Content Coding (SCC) tools in its main body, meaning that every standard-compliant AV1 decoder must support SCC. This gives AV1 a huge advantage over other standards when handling computer-generated content in real-time scenarios.
Efficient software decoding of AV1 is essential for RTC scenarios, on both PC and mobile platforms. Open source AV1 software decoders currently include libaom, maintained by AOM/Google; SVT-AV1, maintained by AOM/Intel; libgav1, launched by Google specifically for Android devices; and dav1d, maintained by VideoLAN and the FFmpeg open source community and funded by AOM. According to feedback from our users, dav1d performs best overall, and dav1d 0.8, released in January 2021, brought further improvements on AMD and ARM architectures.
The real-time preset of the AOM/AV1 open source codec libaom, also known as libaom-RT, has been integrated into WebRTC and has been officially enabled since Chrome version 89. In 2020, Google’s real-time calling product Duo and its video conferencing product Meet adopted AV1 based on libaom-RT, making them the first applications of AV1 in RTC scenarios. Cisco WebEx has since announced that it is starting to use AV1 on PC for its video conferencing scenarios, especially for screen sharing.
The Microframe team launched its self-developed Aurora AV1 encoder in 2019, becoming the world’s first provider of a commercial AV1 encoder for RTC scenarios. Aurora AV1 has been honed and refined in practical applications, and now works well for both PC screen content coding and camera talking-head scenarios. The performance data in this article are based on Aurora AV1, which is also maturing on mobile and other ARM platforms.
Of course, a coding standard, no matter how advanced, needs a complete, sustainable ecosystem to support it. AOM members cover the complete chain from video capture and production, through transmission and sharing, to playback and consumption. In the field of RTC, AOM members include many globally leading enterprises in RTC technology and applications, such as Agora, Cisco/WebEx and Poly. AOM members also include browser providers such as Google (Chrome), Apple (Safari), Microsoft (Edge) and Mozilla (Firefox); hardware manufacturers such as Intel, AMD, Nvidia, ARM, Samsung, Xilinx, Broadcom and China’s Huawei; cloud service providers such as Amazon (AWS), Microsoft (Azure), Google (GCP) and IBM in North America, as well as Alibaba (Ali Cloud), Tencent (Tencent Cloud), Jinshan Cloud and Huawei (Huawei Cloud) in China; and networking and systems providers such as Cisco. AV1 therefore has natural ecosystem advantages.
AV1 RTC is currently supported in browsers (except Safari) and on the Android mobile OS, and hardware decoding support is growing. Apple is a member of AOM’s board of directors and has shown a positive attitude towards AV2, so it is expected that Apple will support AV1 in the near future. In addition, although Qualcomm (QCOM) is not an AOM member, it is widely believed in the industry that it will launch a chip with AV1 hardware decoding support by the end of 2021 or early 2022 at the latest.
AV1 RTC Screen content encoding
The AV1 standard provides dedicated tools for screen content coding, such as IntraBC and Palette mode. In addition, the CfL (Chroma-from-Luma) tool, while not designed specifically for screen content, is also effective for screen content coding.
Note: the x264 results in the figure use the FFmpeg command line: ffmpeg -r 30 -s 1920x1080 -c:v libx264 -x264-params bframes=0 -tune zerolatency -preset superfast -threads 1
Aurora AV1 is superior to existing coding standards, including VP9 and H.264, in compression efficiency for screen content at different resolutions. As shown in the figure, for the 1080p30 screen content test sequence set, Aurora achieves a BD-rate (PSNR) gain of 81.25% over the open source x264 superfast real-time preset when encoding on a single core of an ordinary PC. In other words, for this benchmark set, Aurora AV1 needs only (1 - 81.25%) = 18.75% of x264’s bit rate, less than 1/5, to obtain a similar PSNR objective quality.
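The arithmetic above can be written as a minimal Python sketch. The function names and the 1000 kbps anchor bit rate are hypothetical and chosen only for illustration; only the 81.25% figure comes from the article.
```python
def bitrate_fraction(bd_rate_gain_percent: float) -> float:
    """Fraction of the anchor bit rate needed for similar PSNR,
    given a BD-rate gain (bit rate saving) expressed in percent."""
    return 1.0 - bd_rate_gain_percent / 100.0


def required_bitrate_kbps(anchor_kbps: float, bd_rate_gain_percent: float) -> float:
    """Bit rate needed to match the anchor's quality on average."""
    return anchor_kbps * bitrate_fraction(bd_rate_gain_percent)


# 81.25% is the gain reported in the article for 1080p30 screen content.
fraction = bitrate_fraction(81.25)

# Hypothetical anchor: an x264 stream at 1000 kbps (illustrative only).
needed = required_bitrate_kbps(1000.0, 81.25)

print(f"fraction of anchor bit rate: {fraction:.4f}")  # 0.1875, i.e. less than 1/5
print(f"bit rate for similar PSNR: {needed:.1f} kbps")  # 187.5 kbps
```
Note that a BD-rate figure is an average over a quality range, so the saving at any single operating point may differ from this value.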
The figure above compares the coding speed of Aurora AV1 with that of the x264 superfast preset. For single-threaded encoding of 1080p screen content video, x264 reaches 132+ FPS (frames per second), while Aurora reaches 46+ FPS, about a third of x264’s encoding speed. Although Aurora is not as fast as x264, the frame rate required for screen content in most scenarios is generally lower than that required for ordinary camera content, so AV1 is fully usable for screen content RTC scenarios.
AV1 RTC temporal scalable coding
Temporal scalability and adaptive frame dropping are particularly important in RTC scenarios. Because network conditions such as bandwidth, RTT, jitter and packet loss change dynamically, the encoder needs to work with the network control layer to adapt. For a video encoder, temporal scalability matters more than spatial scalability: it offers better overall performance in coping with dynamic bandwidth changes, in error resilience, in coding efficiency and in subjective video experience, and it allows dynamic adjustment while keeping subjective quality stable.
As shown in the figure below, the Aurora AV1 encoder currently implements two temporal scalability modes. In both modes, video frames outside the base layer can be dropped adaptively to match the dynamically changing network bandwidth. AV1’s temporal scalability inherits the encoder characteristics of VP8 and VP9 on the WebRTC platform, and fits naturally with WebRTC.
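The idea of dropping frames outside the base layer can be illustrated with a minimal Python sketch. The article does not specify Aurora's two modes, so this sketch assumes the common dyadic patterns with 2 and 3 temporal layers; the function names `temporal_layer` and `frames_to_send` are hypothetical and are not part of any encoder API.
```python
def temporal_layer(frame_index: int, num_layers: int) -> int:
    """Temporal layer of a frame in an assumed dyadic hierarchy.

    Layer 0 is the base layer. With 3 layers the repeating pattern
    is 0, 2, 1, 2; with 2 layers it is 0, 1.
    """
    period = 1 << (num_layers - 1)      # frames per repeating pattern
    pos = frame_index % period
    if pos == 0:
        return 0                        # base-layer frame
    layer = num_layers - 1
    while pos % 2 == 0:                 # each factor of 2 moves one layer down
        pos //= 2
        layer -= 1
    return layer


def frames_to_send(num_frames: int, num_layers: int, max_layer: int) -> list[int]:
    """Indices of frames kept when all layers above max_layer are dropped."""
    return [i for i in range(num_frames)
            if temporal_layer(i, num_layers) <= max_layer]


if __name__ == "__main__":
    print([temporal_layer(i, 3) for i in range(8)])  # [0, 2, 1, 2, 0, 2, 1, 2]
    print(frames_to_send(8, 3, max_layer=1))         # [0, 2, 4, 6] -> half frame rate
    print(frames_to_send(8, 3, max_layer=0))         # [0, 4] -> quarter frame rate
```
In such a hierarchy, frames in a given layer reference only frames in the same or lower layers, so dropping the highest layers lowers the frame rate without breaking the decoding of the remaining frames.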
AV1 RTC camera content coding
Beyond screen content, AV1 has also been refined to bring out the advantages of the standard in video conferencing talking-head scenarios.
The two figures below compare Aurora AV1 with the x264 medium preset in 480p and 720p video conferencing scenarios, using 2-thread encoding on an AMD Ryzen 9 3900X 12-core processor (12C24T). Aurora superfast obtains an average BD-rate (PSNR) gain of more than 20%, together with a coding speed advantage of more than 30%.
Note: the x264 command line options are --nal-hrd none --preset medium --profile main --threads 2 --tune zerolatency --no-psy --aq-mode 0 --no-scenecut
AV1 RTC mobile platform coding performance
The complexity of the AV1 coding tools makes the standard more challenging to deploy on mobile phones.
However, as mentioned earlier in this article, WebRTC/Chrome has enabled AV1 RTC support based on libaom-RT, and the performance of the open source libaom-RT encoder is also improving.
Aurora is compared with libvpx-VP9, x264 and libaom-RT in terms of encoding efficiency and speed for RTC mobile applications. The encoding platform is a Snapdragon 845 mobile phone with a single-thread CBR setting; the test set consists of 40 typical 180p real-time scene videos, with target bit rates ranging from 50 kbps to 200 kbps.
Each curve in the figure represents the performance of one encoder, and each point on a curve represents a specific speed preset of that encoder. The vertical axis shows BD-rate (PSNR); for every encoder preset, the anchor is the x264 medium preset. A negative BD-rate means that, compared with the anchor, the same video quality is obtained at a lower bit rate. Therefore, the lower a point sits, the greater the compression advantage of the encoder. The horizontal axis shows coding speed: the further to the right a point sits, the faster the corresponding encoding.
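To make the meaning of a negative BD-rate concrete, here is a minimal Python sketch of a simplified BD-rate calculation. It is an illustration only: the standard Bjøntegaard method fits a cubic polynomial to log bit rate as a function of PSNR, whereas this sketch uses piecewise-linear interpolation to stay dependency-free. The function names and the rate/PSNR points are hypothetical and are not data from the article.
```python
import math


def _interp(x, xs, ys):
    """Linearly interpolate y at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x is outside the interpolation range")


def _integrate(xs, ys, lo, hi):
    """Exact integral of the piecewise-linear curve over [lo, hi]."""
    grid = [lo] + [x for x in xs if lo < x < hi] + [hi]
    vals = [_interp(x, xs, ys) for x in grid]
    return sum((b - a) * (va + vb) / 2
               for a, b, va, vb in zip(grid, grid[1:], vals, vals[1:]))


def bd_rate_linear(anchor, test):
    """Simplified BD-rate in percent; negative means `test` saves bit rate.

    `anchor` and `test` are lists of (bitrate_kbps, psnr_db) points with
    distinct PSNR values and overlapping PSNR ranges.
    """
    a = sorted((psnr, math.log10(rate)) for rate, psnr in anchor)
    t = sorted((psnr, math.log10(rate)) for rate, psnr in test)
    a_x, a_y = zip(*a)
    t_x, t_y = zip(*t)
    lo, hi = max(a_x[0], t_x[0]), min(a_x[-1], t_x[-1])
    if hi <= lo:
        raise ValueError("PSNR ranges do not overlap")
    # Average difference of log10(bit rate) over the shared PSNR range.
    avg_diff = (_integrate(t_x, t_y, lo, hi)
                - _integrate(a_x, a_y, lo, hi)) / (hi - lo)
    return (10 ** avg_diff - 1) * 100


if __name__ == "__main__":
    # Hypothetical points: the test encoder needs half the bit rate at each PSNR.
    anchor = [(100, 30.0), (200, 33.0), (400, 36.0), (800, 39.0)]
    test = [(50, 30.0), (100, 33.0), (200, 36.0), (400, 39.0)]
    print(round(bd_rate_linear(anchor, test), 2))  # about -50.0
```
With this sign convention, the 81.25% gain reported earlier for screen content corresponds to a BD-rate of -81.25% relative to the anchor.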
The figure shows that Aurora is far more efficient than VP9 and x264. Aurora is still being optimized, and the current superfast and ultrafast presets are likely to become slower speed settings as optimization continues. Aurora will provide multiple speed presets for RTC scenarios, from medium, fast, faster, veryfast and superfast to ultrafast. Compared with libaom-RT AV1 in WebRTC, Aurora is significantly better in both encoding speed and efficiency. Aurora will continue to get faster while retaining the full advantages of the AV1 standard. (Note: the Aurora and libaom-RT versions compared are both from March 5, 2021.)
For both the open source libaom-RT and the commercial encoder Aurora, the optimization of AV1 on mobile platforms will continue along its historical track. For some time to come, performance will keep improving, meeting the needs of more and more RTC scenarios and further improving the user experience beyond what existing coding standards offer.
Combination of AV1 and AI
In RTC scenarios, combining AV1 with AI should play a significant role in optimizing encoder performance in many respects, including pre-processing, content classification, ROI scene optimization, and the design and implementation of intelligent rate control. With AI, AV1 can show further potential. In cooperation with many domestic and overseas universities, the Microframe team wrote a paper entitled “Advances in Video Compression System Using Deep Neural Network: A Review and Case Studies”, which has been accepted by the Proceedings of the IEEE, a top IEEE journal. Based on AV1, the paper discusses the combination of video coding and AI in pre-processing and post-processing, along with some preliminary exploration of using AI in future coding standards such as AV2. The paper can be downloaded directly from arXiv.org (link: arxiv.org/abs/2101.06…).
AV1 subjective coding performance
As shown in the figure, at the same bit rate, that is, the same bandwidth, the image quality of Aurora AV1 encoding is significantly better than that of x264 encoding.
Given the strong performance of AV1 described above and its natural fit with RTC applications, we expect the AV1 ecosystem to develop rapidly over the next two to three years, alongside the explosive growth of RTC applications driven by WebRTC, browsers and Android mobile devices.