Authors: Zhang Li, Wang Yekui
2016 saw the emergence and explosive growth of short video applications. On March 1, 2019, the Ministry of Industry and Information Technology, the National Radio and Television Administration and the China Media Group jointly released the Action Plan for UHD Video Industry Development (2019-2022) to promote the development of the ultra-high-definition (UHD) video industry and its application in related fields. In the same year, e-commerce live streaming began to lead a new consumption model. At the beginning of 2020, video conferencing became popular all over the world. In 2021, the CCTV Spring Festival Gala was presented in 8K ultra HD for the first time, integrating virtual reality and augmented reality technologies to deliver a striking audio-visual experience to the audience. Meanwhile, according to Cisco's forecast, online video will account for more than 82 percent of all consumer Internet traffic by 2022, a 15-fold increase over 2017. All of the above shows that video is now everywhere in people's daily work and life: it is used not only for entertainment, leisure and shopping, but is also gradually replacing text as the most important way for people to obtain knowledge and information.
Behind these applications lies a series of very complex technologies, one of the most core and fundamental of which is video coding/compression technology. Video signals carry a huge amount of data. Take UHD video with a resolution of 3840×2160 pixels and a frame rate of 60 (i.e., 60 images per second): one second of uncompressed video amounts to more than 11.94 billion bits (3840×2160 pixels/image × 24 bits/pixel × 60 images/second). With such a huge amount of data, transmitting and storing video signals directly, without compression, is almost impossible. After compression, the data volume of a video signal can be reduced to 1/10 or even 1/100 of the original while visual perception remains essentially unaffected. Video coding technology is what makes it possible to play remote video smoothly and clearly.
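As a quick check of the arithmetic above, the short Python sketch below computes the uncompressed bit rate and the corresponding 1/10 and 1/100 compressed rates. It simply uses the numbers from the example in the text, assuming three 8-bit color components per pixel with no chroma subsampling (which is what the 24 bits/pixel figure implies):

```python
# Back-of-the-envelope bit-rate check for uncompressed 4K UHD video.
# Assumes 3 color components at 8 bits each (24 bits/pixel), no chroma subsampling.
width, height, fps = 3840, 2160, 60
bits_per_pixel = 24

raw_bps = width * height * bits_per_pixel * fps
print(f"Uncompressed: {raw_bps / 1e9:.2f} Gbit/s")            # ~11.94 Gbit/s
for ratio in (10, 100):
    print(f"Compressed to 1/{ratio}: {raw_bps / ratio / 1e6:.0f} Mbit/s")
```

The output (about 11.94 Gbit/s uncompressed, roughly 1194 Mbit/s and 119 Mbit/s after 10x and 100x compression) makes it clear why compression is indispensable for practical transmission and storage.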
At present, the widely used video coding and compression technologies are mainly embodied in video coding standards. Why should video coding be standardized? The main purpose is interoperability between products from different companies; for example, a video bitstream produced by one manufacturer's encoder can be played by decoders produced by other manufacturers. A video coding standard, as something every manufacturer must follow, carries inestimable commercial value. International industry giants such as Qualcomm, Samsung, LG, Sony, Intel and Ericsson have invested heavily over the long term and obtained huge returns from patents on the current mainstream international standards. Each video coding standard embodies the wisdom of many video coding experts, and the release of a new generation of video coding standards often drives the emergence and popularity of new video applications. For example, the H.262/MPEG-2 standard drove the transition from analog TV to digital TV, H.264/AVC made HD video and Internet video widely available, and H.265/HEVC successfully promoted the popularization of 4K ultra HD video. H.266/VVC, in turn, provides better support for new video types such as 8K ultra HD, screen content, high-dynamic-range and 360-degree panoramic video, as well as for applications such as adaptive-bandwidth/resolution streaming media and real-time communication.
The four video coding standards mentioned above are the result of joint efforts by members of two international video expert groups: the Moving Picture Experts Group (MPEG) of the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC), and the Video Coding Experts Group (VCEG) of the ITU Telecommunication Standardization Sector (ITU-T). In addition to these standards, several others have emerged over the long history of video coding standards, as shown in Figure 1.
Figure 1. Overview of video coding standards
At an opening plenary session of MPEG, Leonardo Chiariglione, the founder of MPEG and its chairman for 32 years, quoted the opening words of the Romance of the Three Kingdoms: "The empire, long divided, must unite; long united, must divide." The previous generations of international video coding standards reaffirmed this historical dynamic: MPEG and VCEG developed their own standards separately, then worked together, then separated, then collaborated again, and so on. MPEG independently developed the first versions of the MPEG-1, MPEG-4 Visual and MPEG-5/EVC standards in 1993, 1999 and 2020, respectively. VCEG independently developed the first versions of the H.261 and H.263 standards in 1990 and 1995, respectively. H.262/MPEG-2, H.264/AVC, H.265/HEVC and H.266/VVC were jointly developed by the two groups, with first versions completed in 1994, 2003, 2013 and 2020, respectively. Chinese experts in the field of video coding began tracking international standardization work in 1996. Beyond the two international standardization organizations mentioned above, China formally established the Audio and Video Coding Standard Working Group (AVS Working Group) in June 2002. Its main task is to meet the needs of China's information industry by uniting domestic enterprises and research institutions to formulate and revise common technical standards for the compression, decompression, processing and representation of digital audio and video. Since its establishment, the AVS Working Group has developed three generations of AVS standards: AVS, AVS+/AVS2 and AVS3. In addition, some technologically strong companies have developed their own video coding standards: Microsoft developed the VC-1 standard in 2003, Xiph.org launched the Theora standard in 2004, and RealNetworks launched RMHD (RealMedia HD) in 2015. Google launched VP9 in 2013, followed by VP10. In 2015, Google and other companies founded AOM (the Alliance for Open Media), which launched AV1 in 2019. It should be noted that video coding standards define bitstream formats and decoding (decompression) processes rather than prescribing specific encoding processes, giving encoder developers the flexibility to develop non-standardized encoding optimization algorithms.
Looking back at the history of these video coding standards, two main threads run through it: applications and technology. The primary target application of the original H.261 standard was video telephony over ISDN (a network now essentially obsolete); VCEG did not exist at the time, and the working group was called the Specialists Group on Coding for Visual Telephony. The supported resolutions were small: 352×288 and 176×144. The main target application of MPEG-1 was the VCD, which many young people today have never even seen. MPEG-2's primary target application was digital television. The main target applications of H.263 were video telephony and, in addition, multi-party video conferencing. Starting with MPEG-4 Visual, the target applications of each video coding standard have included those of its predecessors; streaming media began to appear around the time of MPEG-4 Visual standardization, and since then streaming has been a target application of every new video coding standard, one of increasing importance. The new target applications of H.264/AVC, H.265/HEVC and H.266/VVC have already been mentioned and will not be repeated here.
From the perspective of technology evolution, all previous generations of video coding standards are based on the hybrid video coding framework, which typically combines motion-compensated prediction with transform and quantization of the prediction residual; a minimal sketch of this coding loop is shown below. On top of this, more and more coding tools have been introduced into the standards, such as in-loop filtering and decoder-side motion information refinement. In general, the standards have made full use of Moore's Law, gradually trading higher computational complexity for better compression performance, with algorithm designs becoming increasingly complex and adaptive, and further gains increasingly hard to obtain. For H.264/AVC in 2001 and 2002, a tool with a performance improvement below 3% might attract little interest; for H.266/VVC in 2019 and 2020, a proposal with a performance improvement of 0.5% could attract attention. Interwoven with the development of video coding standards is another thread: the change in participants. With the rapid development of the whole video industry, more and more people have participated in the development of video standards. During the H.264/AVC period, the number of participants was usually below 100, and there were typically only a few dozen input documents per meeting, with a peak of about 150. During the H.265/HEVC period, JCT-VC document numbers grew from three digits to four digits starting with the February 2012 meeting, which had 738 input documents and 255 participants. During the H.266/VVC period, JVET document numbers started at four digits, and the July 2019 meeting had 1,178 input documents and 340 participants.
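To make the hybrid framework concrete, the following toy Python sketch runs one block through motion-compensated prediction, residual transform and quantization, and the inverse path used for reconstruction. It is only an illustration under stated assumptions (the block size, search range and quantization step are arbitrary and are not taken from any particular standard):

```python
# A toy walk-through of the hybrid video coding loop: motion-compensated
# prediction, residual transform, quantization, and the inverse path used
# to reconstruct the block for future prediction.  Block size, search range
# and quantization step are illustrative assumptions, not from any standard.
import numpy as np

BLOCK = 8  # illustrative block size


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)


D = dct_matrix(BLOCK)


def motion_search(cur_blk, ref, y, x, search_range=4):
    """Brute-force integer-pel motion search around (y, x) in the reference frame."""
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy <= ref.shape[0] - BLOCK and 0 <= xx <= ref.shape[1] - BLOCK:
                cost = np.abs(cur_blk - ref[yy:yy + BLOCK, xx:xx + BLOCK]).sum()  # SAD
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dy, dx)
    return best_mv


def code_block(cur_blk, pred_blk, qstep=16.0):
    residual = cur_blk - pred_blk                 # prediction residual
    coeffs = D @ residual @ D.T                   # 2-D transform of the residual
    levels = np.round(coeffs / qstep)             # quantization (the only lossy step)
    recon_residual = D.T @ (levels * qstep) @ D   # dequantize + inverse transform
    recon_blk = pred_blk + recon_residual         # reconstruction kept as future reference
    return levels, recon_blk


# Toy usage: predict one block of the current frame from the previous frame.
rand = np.random.default_rng(0)
ref_frame = rand.integers(0, 256, (64, 64)).astype(np.float64)
cur_frame = np.roll(ref_frame, shift=(1, 2), axis=(0, 1))  # simulate simple global motion
y, x = 16, 16
cur_blk = cur_frame[y:y + BLOCK, x:x + BLOCK]
dy, dx = motion_search(cur_blk, ref_frame, y, x)
pred_blk = ref_frame[y + dy:y + dy + BLOCK, x + dx:x + dx + BLOCK]
levels, recon_blk = code_block(cur_blk, pred_blk)
print("motion vector:", (dy, dx), "max reconstruction error:", np.abs(recon_blk - cur_blk).max())
```

A real encoder additionally entropy-codes the quantized coefficients and motion vectors, applies intra prediction and in-loop filtering, and makes rate-distortion-optimized mode decisions, but the prediction/transform/quantization loop sketched above is the shared core of the hybrid framework used by all the standards discussed here.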
Figure 2. The birth of ISO standards
We have described the overall history of coding standards above; what does the development process of a specific standard look like? The birth of an international standard usually goes through seven stages: preliminary exploration; Call for Evidence (CfE); Call for Proposals (CfP) and responses; formal launch of the standardization project and formation of the Working Draft (WD); Committee Draft (CD); Draft International Standard (DIS); and Final Draft International Standard (FDIS), after which the International Standard (IS) is officially published. The whole process is shown in Figure 2. Each stage may span one or more standards meeting cycles, and the goals of each stage differ. For national standards or companies' proprietary standards, some of these steps are adjusted slightly. Figure 3 depicts the key milestones of H.266/VVC. From January 2015 to October 2015, the work was in the KTA (Key Technology Area) stage, in which relatively divergent technical exploration could be carried out [1]. In October 2015, with the submission of a technical proposal exceeding HEVC's coding performance by more than 10% [2], JVET (the Joint Video Exploration Team) was formally established, and its software platform was named JEM (Joint Exploration Model). From then on, new technologies were validated on top of JEM, and a new version of JEM was released after each standards meeting. By July 2017, JEM had gone through seven iterations over seven JVET meetings and achieved about a 30% compression performance improvement over HEVC. This sent a strong signal to the industry that the next-generation standard was well on track to achieve its stated goal (a 50% bit-rate reduction at the same subjective quality). The standardization work then entered the next phase, the Call for Evidence and Call for Proposals stage with its responses, which spanned three standards meetings. In April 2018, the test results of the CfP responses were made public: the best-performing of the 23 responses achieved about 40% bit-rate savings over HEVC. This showed that the next generation of video compression technology was mature, and development of the VVC standard officially began. From April 2018, after ten standards meetings, the review of thousands of technical proposals, and the day-and-night efforts of hundreds of experts around the world, the final version of the VVC standard (VVC v1) was officially completed in July 2020. Figure 4 lists some of the companies involved in VVC standardization; encouragingly, the participation of Chinese companies is very high, and they are playing an increasingly important role on the international stage.
Figure 3. VVC standard birth process
Figure 4. VVC standard main participating companies
For manufacturers that apply video coding standards, how should one choose among so many standards? Everyone has their own opinion. The authors believe that each standard's advantages and disadvantages should be analyzed in light of one's own situation, and the most suitable standard chosen accordingly. Of course, we would like to see relatively fair licensing fees for every technologically competitive standard, so that more users can benefit from the most advanced video coding technology. Finally, many may ask: when will the next-generation standard after H.266/VVC come out? Unfortunately, there is no way to know at present, but it is clear that people's exploration of video coding technology and the demand for efficient video coding will not go away. JVET has recently begun exploring two directions: one is emerging deep learning-based video compression (including combinations of deep learning with the traditional hybrid video coding framework), and the other is continued exploration of traditional hybrid video coding framework technology. Although it is still early days, some encouraging results have already appeared. For example, a deep learning-based adaptive in-loop filtering algorithm [3] proposed by the authors' team at ByteDance delivers performance gains of 10%, 28% and 28% for the three color components (Y, U, V), and Qualcomm has recently reported a number of improvements to the hybrid video coding framework that together deliver gains of 11%, 13% and 13% [4]. We believe that, through the continued efforts of our colleagues in the standards community, more and more new technologies will emerge in the near future. When compression performance has improved by another 30 percent or so, the next generation of video coding standards will not be far away.
References:
[1] J. Chen, Y. Chen, M. Karczewicz, X. Li, H. Liu, L. Zhang, "Coding tools for next-generation video coding", ITU-T SG16 Doc. COM16-C806, Feb. 2015.
[2] M. Karczewicz, J. Chen, W.-J. Chien, X. Li, A. Said, L. Zhang, X. Zhao, "Study of coding efficiency improvements beyond HEVC", ISO/IEC MPEG Doc. m37102, Oct. 2015.
[3] Y. Li, L. Zhang, K. Zhang, "AHG11: Convolutional Neural Network-based In-loop Filter with Adaptive Model Selection", JVET-U0068, Jan. 2021.
[4] Y.-J. Chang, C.-C. Chen, J. Chen, J. Dong, et al., "Compression efficiency methods beyond VVC", JVET-U0100, Jan. 2021.