As networks improve and mobile devices become more powerful, more end users expect a smooth, high-definition viewing experience. High-resolution video in turn places new demands on compression standards. The move from H.264/AVC to H.265/HEVC doubled compression efficiency, but that is still not enough for HD video. This led to the H.266/VVC standard, which offers even higher compression efficiency. When the standard first appeared, its coding complexity and the lack of hardware decoding support in end-user devices made commercial deployment seem out of reach.

Users get the ultimate viewing experience through HD and ultra HD video

H.266/VVC doubles the compression efficiency of H.265/HEVC

Versatile Video Coding (VVC, also known as H.266) is the latest-generation video coding standard. It was developed by the Joint Video Experts Team (JVET) and finalized in July 2020. As the successor to HEVC, H.266/VVC doubles compression efficiency at the same quality. Wider adoption of the H.266/VVC standard opens up considerable room for the future development of video content.

H.265/HEVC is the generation before H.266/VVC. With the rapid growth of mobile devices in recent years, H.265/HEVC has been widely adopted, but its compression efficiency still falls short for the large volume of 1080p and 4K high-definition video. Large compressed video files take up storage space and consume network bandwidth, which leads to frequent stalls during playback.

At the same coding quality, H.266/VVC achieves twice the compression ratio of H.265/HEVC.

For example, if watching a movie encoded with H.265/HEVC uses 1 GB of mobile data, the same movie encoded with H.266/VVC uses only about 500 MB, and the picture quality on a smart device stays the same. With the H.266/VVC standard, HD online video no longer eats up a data plan.
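
The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration: the function names, the 50% bitrate saving used as a default, and the two-hour movie length are assumptions made for the example, not figures from any measurement.

```python
def vvc_size_mb(hevc_size_mb: float, bitrate_saving: float = 0.5) -> float:
    """Estimated H.266/VVC size for a stream of equal quality.

    bitrate_saving is an assumed fraction (0.5 = half the bits).
    """
    return hevc_size_mb * (1.0 - bitrate_saving)


def average_bitrate_kbps(size_mb: float, duration_s: float) -> float:
    """Average bitrate of a stream, using decimal units (1 MB = 10**6 bytes)."""
    return size_mb * 8000.0 / duration_s


hevc_mb = 1000.0              # the 1 GB movie from the example above
movie_seconds = 2 * 60 * 60   # assumed two-hour running time

vvc_mb = vvc_size_mb(hevc_mb)
hevc_kbps = average_bitrate_kbps(hevc_mb, movie_seconds)
vvc_kbps = average_bitrate_kbps(vvc_mb, movie_seconds)
print(f"HEVC: {hevc_mb:.0f} MB, about {hevc_kbps:.0f} kbps")
print(f"VVC:  {vvc_mb:.0f} MB, about {vvc_kbps:.0f} kbps")
```

Halving the file size halves the average bitrate the network has to sustain, which is why the saving shows up both in data usage and in fewer stalls on slow connections.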

Although H.266/VVC compresses very well, its decoding complexity is significantly higher than that of H.265/HEVC. While decoding, a device can heat up, draw more power, and stutter, all of which hurt the viewing experience. With hardware decoder chips not yet available, designing and building a very high-performance VVC software decoder became an important goal for us.

BVC decoder: covering the last mile of H.266/VVC deployment

BVC is a new-generation decoder developed by ByteDance that officially supports the H.266/VVC standard. It runs on a range of platforms, including Android, iOS, Linux, macOS, and Windows.

Because mobile users are numerous and device performance varies widely, we optimized the BVC decoder specifically for the Arm platform on Android and iOS. On some devices it decodes tens of times faster than the reference software. BVC's efficient decoding brings commercial deployment of the H.266/VVC standard within reach.

Here is a detailed set of data. According to the standards meeting proposal JVET-V128, on an iPhone 12 with the A14 processor, single-threaded BVC decodes the standard 4K test bitstreams at an average of 22 FPS, and two threads are enough to decode 4K video in real time. For the standard 1080p test bitstreams, single-threaded BVC averages 86 FPS, so a single thread can decode 1080p video in real time. On high-end phones, the BVC decoder therefore supports smooth playback of HD and even ultra HD video.
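
The real-time claims above can be checked with a rough estimate. The sketch below is an illustration under stated assumptions, not a description of how BVC schedules its threads: the 30 FPS playback target, the `parallel_efficiency` parameter, the six-thread cap, and the function name are all hypothetical, and real multi-thread scaling is usually less than linear.

```python
from typing import Optional


def min_threads_for_realtime(
    single_thread_fps: float,
    target_fps: float,
    parallel_efficiency: float = 1.0,  # assumed: 1.0 means ideal linear scaling
    max_threads: int = 6,              # assumed cap for the example
) -> Optional[int]:
    """Smallest thread count whose estimated speed reaches target_fps.

    Each extra thread is assumed to add parallel_efficiency times the
    single-thread speed. Returns None if max_threads is not enough.
    """
    for threads in range(1, max_threads + 1):
        estimated_fps = single_thread_fps * (1 + (threads - 1) * parallel_efficiency)
        if estimated_fps >= target_fps:
            return threads
    return None


print(min_threads_for_realtime(22, 30))  # 4K single-thread average from the text
print(min_threads_for_realtime(86, 30))  # 1080p single-thread average from the text
```

Under these assumptions the estimate gives two threads for 4K and one thread for 1080p, which matches the figures quoted above. A lower `parallel_efficiency` or a higher playback target would raise the thread count.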

Figure 1: Decoding speed of BVC at different thread counts compared with VTM-11.0

Figure 1 compares the speed of BVC and the reference software VTM-11.0 when decoding videos of different resolutions on the iPhone 12. The horizontal axis shows the video resolution: 4K, 1080p, 480p, and 240p. The vertical axis shows the average frame rate for VTM-11.0 and for BVC with 1, 2, 4, and 6 threads. A larger number means faster decoding. The dashed line marks the frame rate commonly used for real-time playback at each resolution. Single-threaded BVC decodes 6 to 10 times faster than VTM-11.0.

In addition to the standard test bitstreams above, we also tested TikTok video bitstreams generated with the BVC encoder, and the BVC decoder again performed strongly. On the Mi 6, which is less powerful than the iPhone 12, a single BVC thread can decode 1080p video in real time. On other mid-range and low-end phones, a single BVC thread can decode 720p video in real time. The BVC decoder makes it possible to decode H.266/VVC video on devices across the performance range and to give users a smooth viewing experience.

How the BVC decoder achieves its technical breakthroughs

To reduce the computational complexity of the BVC decoder and speed up decoding, we optimized four areas: parallelism, code framework, assembly instructions, and memory access efficiency. Together they improve performance significantly.

• Fine-grained parallel algorithms: BVC supports parallelism at several levels: frame level, block level, and module level. Frame-level parallelism decodes multiple video frames at the same time; it makes full use of a multi-core CPU and offers the highest degree of parallelism. Block-level parallelism decodes multiple coding blocks at the same time. Module-level parallelism uses the remaining CPU resources to run several complex modules at the same time. Combining block-level and module-level parallelism reduces the output delay of video frames and keeps playback smooth in real-time scenarios such as video conferencing and live streaming.

• Pipeline-friendly code framework: BVC has a very lightweight code framework that suits smaller, less powerful mobile devices. BVC uses different algorithms tailored to each functional module to minimize branching and keep the CPU pipeline as full as possible.

• High-throughput assembly optimization: We used high-throughput SIMD instructions to hand-optimize complex modules in assembly, including intra prediction, inter-frame interpolation, quantization, transform, reconstruction, and loop filtering. Each of these modules runs several times faster as a result, which maximizes the computational efficiency of the CPU.

• Efficient memory access design: Mobile devices have small memory and cache and limited memory bandwidth, which severely restricts decoder performance. We therefore optimized memory access in the BVC decoder by reducing the number of memory reads and writes, keeping memory usage concentrated, and improving the cache hit ratio. After optimization, memory access is no longer a bottleneck when decoding UHD video on mobile devices.
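
To make the first item in the list above more concrete, frame-level parallelism can be sketched as a worker pool that decodes several frames at once and hands them back in display order. This is a minimal illustration and not BVC's implementation: `decode_frame` is a hypothetical stand-in that does no real decoding, the sketch assumes frames can be decoded independently (real inter-predicted frames depend on reference frames), and Python threads only show the structure rather than true multi-core speed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def decode_frame(packet: bytes) -> bytes:
    """Hypothetical stand-in for decoding one independent frame."""
    return packet[::-1]  # placeholder work, not real video decoding


def decode_frames_parallel(packets: Sequence[bytes], workers: int = 4) -> List[bytes]:
    """Decode frames concurrently and return them in input (display) order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() spreads the frames over the worker threads and yields the
        # results in the same order as the input packets.
        return list(pool.map(decode_frame, packets))


frames = decode_frames_parallel([b"frame-0", b"frame-1", b"frame-2"], workers=2)
print(len(frames), "frames decoded")
```

Keeping output in input order matters for playback: frames may finish decoding out of order on different threads, but the player must still receive them in sequence.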

Detailed performance data

We ran a set of tests using VTM-11.0, the official VVC reference software. Several groups of 8-bit bitstreams were generated under the common configuration, with all tools in the standard test conditions enabled, including the more complex DMVR, BDOF, and ALF. The test sequences are the standard common test sequences in classes A, B, C, D, and F. Class F is screen content with resolutions from 480p to 1080p. Classes A to D are natural scenes with resolutions of 4K, 1080p, 480p, and 240p, respectively.

On the iPhone 12 (A14 processor), single-threaded BVC decodes the 4K, 8-bit standard test bitstreams at an average of 22 FPS, 10 times the speed of the reference software VTM-11.0. With all six threads, the average rises to 55 FPS, with a maximum of 78 FPS. For the 1080p, 8-bit standard test bitstreams, the BVC decoder averages 86 FPS on a single thread, up to 8.8 times the speed of the reference software.

Table 1: Detailed decoder speed comparison

Conclusion

The BVC decoder can decode ultra-high-definition, high-quality video quickly and in real time. It shows excellent decoding capability on mobile devices and helps advance both the video industry and the deployment of the H.266/VVC standard.

Going forward, our technical team will keep optimizing the performance of the BVC decoder in response to the problems and challenges that arise in practice, and aims to contribute further to the adoption of the new generation of standards.

About the Volcano Engine Multimedia Lab

The Multimedia Lab researches and explores cutting-edge multimedia technologies, takes part in international and domestic multimedia standardization work, and provides software and hardware solutions for multimedia content analysis, processing, compression, transmission, innovative interaction, and related fields. Many of the lab's innovative algorithms are already widely used in the on-demand video, live streaming, real-time communication, image, and other multimedia services of TikTok, Xigua Video, and other products, where they improve cost, experience, and capability and deliver the best possible video technology and product experience.