Authors: Wang Meng, Zhang Li

Background:

With the rapid development of communication, Internet, multimedia, and display technologies, video applications have penetrated every aspect of daily life. The explosive growth of video data puts great pressure on existing storage and bandwidth, while users' demand for a high-quality viewing experience keeps rising, posing unprecedented challenges to existing video coding technology. Video coding is the core technology behind video applications (especially UHD video applications); it aims to represent video data efficiently and compactly, minimize the quality loss caused by compression, and reduce the cost of video transmission and storage. To further improve compression efficiency, the Joint Video Experts Team (JVET) developed Versatile Video Coding (VVC), a new-generation video coding standard. Meanwhile, to meet the needs of China's 8K video and 5G industries, the Audio Video coding Standard (AVS) workgroup officially launched the development of AVS3 in 2018, a new-generation video coding standard with independent intellectual property rights targeting 8K UHD video applications.

The new video coding standards still adopt the classical hybrid coding framework based on prediction, transform, quantization, and entropy coding. During encoding, video frames are first divided into non-overlapping coding tree units (CTUs), which are further divided into coding units (CUs). The CU is thus the basic unit of block-based coding, and the partition method and structure determine its specific size and shape. An efficient CU partition strategy can better adapt to the local texture and motion characteristics of the video, which helps improve block-level prediction efficiency and, in turn, the overall compression efficiency.

Figure 1. Coding unit division method

Figure 1 illustrates the coding unit partition methods adopted by various video coding standards/models. Recursive quad-tree (QT) partitioning [1] is widely used in video coding, e.g., in H.264, H.265, and AVS2. A QT partition generates four square sub-coding units of exactly the same size and shape. In addition, H.265 and AVS2 introduce the prediction unit and transform unit on top of the coding unit, so that prediction and transform can better match the texture content and residual distribution, improving prediction accuracy and enhancing the energy compaction of the transform. However, the starting point for dividing prediction units and transform units is the square sub-coding unit produced by QT partitioning, which limits their flexibility to some extent. Moreover, signaling the prediction unit and transform unit consumes extra bits.
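As a concrete illustration of the recursion described above, here is a minimal sketch, in our own notation rather than reference-codec code, of a QT split; `decide_split` stands in for the encoder's rate-distortion decision:

```python
# Hypothetical sketch of recursive QT partitioning: every split yields
# four equal squares. Representation and names are ours, not HPM/VTM code.

def qt_split(x, y, size, min_size, decide_split):
    """Return the leaf CUs of a square block as (x, y, size) tuples."""
    if size <= min_size or not decide_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):      # top row, then bottom row
        for dx in (0, half):  # left column, then right column
            leaves += qt_split(x + dx, y + dy, half, min_size, decide_split)
    return leaves

# Split a 64x64 CTU only at the top level: four 32x32 sub-CUs result.
cus = qt_split(0, 0, 64, 8, lambda x, y, s: s == 64)
```

In a real encoder the split decision is made by rate-distortion optimization rather than a fixed predicate.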

To further improve compression efficiency, as shown in Figure 2, JEM [2] combines the QT and binary-tree (BT) partition structures to recursively divide coding tree units into combinations of squares and rectangles. VVC introduces the ternary-tree (TT) structure on top of QTBT [3]. During standardization, AVS3 retained the QT partition of AVS2, introduced BT partitioning, and adopted the Extended quad-tree (EQT) partition proposed by us [4][5].
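The BT and TT split geometries can be sketched as follows; this is an illustrative fragment of ours, with blocks represented as (x, y, w, h) rectangles:

```python
# Hypothetical sketch of BT and TT split geometries (names are ours).

def bt_split(x, y, w, h, horizontal):
    """Binary-tree split: two equal halves along one direction."""
    if horizontal:
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]

def tt_split(x, y, w, h, horizontal):
    """Ternary-tree split: 1/4, 1/2, 1/4 along one direction."""
    if horizontal:
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    q = w // 4
    return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]

halves = bt_split(0, 0, 32, 32, horizontal=True)   # two 32x16 blocks
thirds = tt_split(0, 0, 32, 32, horizontal=False)  # 8-, 16-, 8-wide columns
```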

Figure 2. Schematic diagram of QT, BT and TT partitioning

Extended quadtree partitioning:

EQT partitioning significantly improves the flexibility and content adaptability of coding unit partitioning, effectively compensates for the shortcomings of existing partition methods, and further improves the compression performance of the new-generation video coding standards. As shown in Figure 2, QT and BT partition paths always run through the entire parent coding unit and generate sub-coding units of identical size and shape, so their flexibility is limited. The proposed EQT partition divides an M×N CU into four sub-coding units of different shapes, as shown in Figure 3: two sub-blocks of size M×N/4 located at the two ends of the parent unit, and two sub-blocks of size M/2×N/2 located on either side of the parent block's center. Taking the AVS3 implementation as an example, Figure 4 shows a set of QT, BT, and EQT partitions together with the corresponding coding order and coding tree structure. On the child nodes produced by QT partitioning, we allow recursive EQT and BT partitioning. In the VVC implementation, EQT can be applied recursively together with BT and TT. Since the side lengths of the sub-coding units generated by an EQT partition are powers of two, no transform kernels of additional sizes need to be introduced to transform them.
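The EQT geometry just described can be sketched in code (our own illustrative representation, not AVS3/HPM source): the horizontal case splits an M×N parent into two M×N/4 strips at the ends and two M/2×N/2 blocks beside the center.

```python
# Hypothetical sketch of the horizontal EQT split geometry described above.

def eqt_split_horizontal(x, y, m, n):
    """Return the four (x, y, w, h) sub-CUs of a horizontal EQT split."""
    q = n // 4
    return [
        (x, y, m, q),                         # top strip, M x N/4
        (x, y + q, m // 2, n // 2),           # centre-left, M/2 x N/2
        (x + m // 2, y + q, m // 2, n // 2),  # centre-right, M/2 x N/2
        (x, y + 3 * q, m, q),                 # bottom strip, M x N/4
    ]

subs = eqt_split_horizontal(0, 0, 32, 32)
# The sub-blocks tile the parent, and all side lengths stay powers of two,
# so no transform kernels of new sizes are required.
assert sum(w * h for _, _, w, h in subs) == 32 * 32
```

The vertical EQT case is the transpose of this layout.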

Figure 3. Schematic diagram of horizontal and vertical division of EQT

Figure 4. Coding unit division structure of QTBT+EQT in AVS3. The black solid lines represent QT divisions, black dotted lines represent BT divisions, and red dotted lines represent EQT divisions

Figure 5 shows a frame of the test sequence "BasketballDrive" with the partition structure of the QTBT+EQT method visualized. The white lines represent coding units produced by QT and BT, and the green lines represent coding units produced by EQT. It can be observed that flat, slowly moving background areas, such as walls and floors, tend to be coded with larger coding units, while areas with complex textures and heavy motion are frequently divided using EQT. In addition, comparing (a) and (b) in Figure 6, without EQT partitioning, QT and BT must iterate frequently in the athlete's fast-moving head region to encode its details; with EQT, the same region can be expressed effectively with a few EQT partitions.

Figure 5. Structure diagram of QTBT+EQT division

Figure 6. Comparison between QTBT division and QTBT+EQT division of local areas

Figure 7 shows the visualized partition structure of QT, BT, TT, and the proposed EQT partition method on the same frame of the test sequence "BasketballDrive". The white and red lines represent the partition traces of QTBT and TT respectively, and the green lines represent coding units produced by EQT. Even on top of the state-of-the-art QTBT+TT partition structure, the proposed EQT partition is still frequently selected in areas with complex texture and motion. In addition, as can be seen from Figure 8, the encoder tends to choose TT partitioning to encode regions whose center is smooth, while EQT partitioning is better suited to regions whose center differs in content from the surroundings.

Figure 7. QTBT+TT+EQT division structure diagram

Performance Report:

We verified the performance of the EQT partition method on the AVS3 reference software platform (HPM-4.0.1) [6] and the VVC reference software platform (VTM-4.0) [7] under the common test conditions [8][9]. Coding performance is measured by BD-rate [10], where a negative value represents bit-rate savings (a performance gain). The tests covered the AVS3 and VVC common test sequences with different bit depths (8, 10) and resolutions (4K, 1080p, 720p). The performance of EQT on the AVS3 reference platform HPM-4.0.1 is shown in Table 1 (switch test). The experimental configurations include All Intra (AI) and Random Access (RA). Under the AI configuration, EQT brings coding gains of 1.1% and 2.1% for the luma and chroma components, respectively. Under the RA configuration, EQT brings coding gains of 1.7%, 2.7%, and 2.6% for the Y, U, and V components, respectively, with an encoding complexity increase of 54% and unchanged decoding complexity. In addition, EQT brings a 0.66% compression gain on the VTM-4.0 platform.
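For readers unfamiliar with the metric, the BD-rate idea from [10] can be sketched as follows. This is a simplified illustration of ours, not the reference implementation: it interpolates log-rate as a function of PSNR with a cubic through the four test points and averages the gap over the shared PSNR range.

```python
# Simplified BD-rate sketch (illustrative, not the reference tool):
# fit log(bitrate) vs. PSNR per codec, integrate the difference over the
# overlapping PSNR interval, and convert back to a percentage.
import math

def lagrange(xs, ys, x):
    """Evaluate the interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def bd_rate(psnr_a, rate_a, psnr_b, rate_b, samples=1000):
    """Average bit-rate difference (%) of codec B vs. A; negative = savings."""
    la = [math.log(r) for r in rate_a]
    lb = [math.log(r) for r in rate_b]
    lo = max(min(psnr_a), min(psnr_b))
    hi = min(max(psnr_a), max(psnr_b))
    step = (hi - lo) / samples
    diff = 0.0
    for k in range(samples):
        p = lo + (k + 0.5) * step  # midpoint rule over the shared range
        diff += lagrange(psnr_b, lb, p) - lagrange(psnr_a, la, p)
    avg = diff / samples
    return (math.exp(avg) - 1.0) * 100.0
```

For example, a codec that needs 10% less bit rate at every tested quality point yields a BD-rate of about -10%.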

References:

[1] I. Kim, J. Min, T. Lee, W. Han, and J. Park, "Block Partitioning Structure in the HEVC Standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1697-1706, Dec. 2012.

[2] "JVET software repository," jvet.hhi.fraunhofer.de/svn/svn_HMJ…

[3] J. Chen, B. Bross, and S. Liu, "Versatile Video Coding (Draft 4)," JVET-M1001, 2019.

[4] M. Wang, J. Li, L. Zhang, K. Zhang, H. Liu, Y. Wang, P. Zhao, D. Hong, and S. Wang, "Extended Quad-tree Partitions," AVS3-P2 document M4507, Oct. 2018.

[5] M. Wang, J. Li, L. Zhang, K. Zhang, H. Liu, S. Wang, S. Kwong, and S. Ma, "Extended Coding Unit Partitioning for Future Video Coding," IEEE Transactions on Image Processing, vol. 29, pp. 2931-2946, 2020.

[6] “AVS3 software repository,” gitlab.com/AVS3_Softwa…

[7] "VVC software VTM-4.0," vcgit.hhi.fraunhofer.de/jvet/VVCSof…

[8] K. Fan, "AVS3-P2 Common Test Conditions," AVS document N2654, Mar. 2019.

[9] F. Bossen, J. Boyce, K. Suehring, X. Li, and V. Seregin, "JVET Common Test Conditions and Software Reference Configurations for SDR Video," JVET-M1010, Jan. 2019.

[10] G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” ITU-T SG.16 Q.6 VCEG-M33, 2001.