This article is adapted from a talk given by Chen Cheng, a software engineer in Google's core video compression algorithm group, at the codec technology session of the RTC 2018 Real-Time Internet Conference. In the talk he presented the development status of AV1 and discussed its coding algorithms and compression performance in detail.
Welcome to visit the RTC developer community to share your experience with more RTC developers.
According to a report released by Cisco this year, video is expected to become the dominant Internet workload by 2021, accounting for more than 80 percent of Internet traffic. By then, video-related services and demand will have grown by about 50 percent, UHD demand will have risen by about 30 percent, and Internet demand for live streaming and other real-time video services will be 15 times greater.

This rapid growth in demand is what drives Internet companies to promote a new generation of video codec technology.
I. Overview of AV1 and AOM
The new generation of codec technology, AV1, which was finalized in June 2018, has four features:

- AV1 is an open-source, royalty-free ecosystem that members of the Alliance for Open Media (AOM) will support
- AOM has set aside funds for legal assistance and patent defense of AV1
- AV1 uses more advanced coding techniques than previous-generation codecs and achieves better compression efficiency
- The AOM cooperation framework provides an open environment for collaboration on AV1
As we all know, AV1 is the successor of VP9, which was launched by Google. Compared with VP9, AV1 has three characteristics: Performance, Platform and Potential. The AOM framework makes AV1, and the future of video, more dynamic. Although AV1 is still in the early stages of development and rollout, we believe that with the support of the industry, it will be more widely used than VP9.
AOM, the Alliance for Open Media, led by Google, Amazon, Cisco and other companies, is dedicated to promoting and developing multimedia video codec technology. In addition to the Internet companies already closely tied to the video industry, the alliance includes hardware makers, content providers, and the major browser makers; Apple recently joined AOM. This industry ecosystem will also support AV1.
There are four AV1 working groups: the software R&D group, the hardware R&D group, the Tapas group and the test group. The software and hardware R&D groups jointly conduct the research and development of AV1, while the test group checks AV1 conformance. The Tapas group reviews patents and advises on legal issues.

The AV1 development team is a vibrant community. Throughout the development of AV1, more than 15 academic papers have been published, and about 100 R&D proposals have been adopted into AV1. The reference code, which is available online, runs to about 300,000 lines, with an average of 15 commits submitted every day.
From research and development to deployment, AV1 will go through four stages:

- Stage 1: development and finalization of the standard
- Stage 2: desktop browser support for decoding
- Stage 3: promoting AV1 support in more hardware and software
- Stage 4: support for AV1 software and hardware encoding across the AOM ecosystem
We have achieved our goals for the first phase, starting with research and development in 2015 and finalizing in June 2018. We’re currently in phase two, where Google’s Chrome browser has implemented software decoding. In future phases 3 and 4, we will focus on AV1 support from hardware devices and content providers. AV1 is expected to become more widely used by 2020.
II. AV1 coding and algorithms
Like other video codecs, AV1 is organized as a series of modules, including partitioning, prediction, transform, quantization, entropy coding, in-loop filtering, etc. In block partitioning, AV1 supports more partition patterns and larger blocks. Its predecessor, VP9, supports blocks of up to 64×64, each of which can be recursively divided into four sub-blocks. AV1 supports a maximum block size of 128×128, ten recursive partition types, and a minimum block size of 4×4.
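The recursive split described above can be sketched as follows. This is a minimal Python model of quadtree partitioning only; AV1's rectangular and T-shaped partition types, and its real rate-distortion decision, are omitted, and `should_split` is a hypothetical callback standing in for the encoder's decision.

```python
def partition(x, y, size, should_split, min_size=4):
    """Recursively partition a block: keep it whole, or split it into
    four half-size sub-blocks (a simplified model of AV1's recursive
    partitioning; the other partition shapes are omitted)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        blocks = []
        for dy in (0, half):
            for dx in (0, half):
                blocks += partition(x + dx, y + dy, half, should_split, min_size)
        return blocks
    return [(x, y, size)]  # leaf block: position and size

# Split everything larger than 64x64 once, keep the rest whole:
blocks = partition(0, 0, 128, lambda x, y, s: s > 64)
```

Splitting a 128×128 superblock all the way down would yield 32×32 = 1024 blocks of size 4×4, the smallest partition AV1 allows.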
Prediction is divided into inter-frame and intra-frame prediction. There are four main tools for intra-frame prediction:

- More directional prediction modes
- Prediction of chroma values from luma values
- Palette mode
- Intra-frame copy mode
AV1 supports prediction in 56 directions. Using the reconstructed pixels on the upper and left boundaries of the current block, it predicts the pixel values of the current block by directional interpolation. The angle is expressed by selecting one of eight nominal directions and a delta value that determines the exact angle.
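The 8-nominal-angles-plus-delta scheme can be enumerated directly. The sketch below uses AV1's eight nominal directional angles and a delta of up to ±3 steps of 3 degrees, which is how 8 × 7 = 56 distinct directions arise.

```python
NOMINAL_ANGLES = [45, 67, 90, 113, 135, 157, 180, 203]  # degrees: AV1's 8 base directions
ANGLE_STEP = 3  # one delta step is 3 degrees

def all_prediction_angles():
    """Enumerate the 56 directional intra angles: 8 nominal
    directions, each refined by a delta in [-3, +3] steps."""
    return [base + delta * ANGLE_STEP
            for base in NOMINAL_ANGLES
            for delta in range(-3, 4)]

angles = all_prediction_angles()  # 8 * 7 = 56 angles
```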
In addition to directional prediction, AV1 supports other ways to generate predictions for individual pixels or for smooth gradient blocks. As shown in the figure below, there are four different interpolation methods for predicting the current value, and the current value P is obtained by interpolating the pixels of the dark blue blocks. It is worth mentioning another, recursive way to predict gradient blocks: a filter is applied recursively to predict the value of each pixel, at the cost of increased codec complexity.
The tool that predicts chroma values from luma values exploits the high similarity between a video image's luma channel and its chroma channels: by selecting appropriate parameters, the reconstructed luma channel is used to predict the chroma values. This tool compresses game video particularly well.
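The luma-to-chroma idea can be sketched as a linear model: the chroma prediction is the chroma DC value plus a scaled, zero-mean version of the reconstructed luma block. This is a simplified illustration of the principle; the `alpha` scaling parameter stands in for the signalled parameters mentioned above, and AV1's subsampling and quantized parameter set are omitted.

```python
def cfl_predict(luma_block, chroma_dc, alpha):
    """Chroma-from-luma sketch: chroma = chroma DC + alpha * (luma AC),
    where "AC" means the luma block with its mean removed."""
    n = len(luma_block) * len(luma_block[0])
    mean = sum(sum(row) for row in luma_block) / n
    return [[chroma_dc + alpha * (v - mean) for v in row]
            for row in luma_block]

pred = cfl_predict([[10, 20], [30, 40]], chroma_dc=100, alpha=0.5)
```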
Palette mode treats the pixels in a block as a few discrete colors: instead of transmitting the pixel values themselves, it achieves compression by transmitting the color index of each pixel. AV1 supports palette mode for blocks from 8×8 to 64×64, and the encoder automatically decides whether to use it based on the video content. In the example shown in the figure, on the left is a current block divided into 3 different colors; the pixels in the block are coded one by one in wavefront order, and the color indices to the left of and above each pixel serve as context in entropy coding. Palette mode is useful when the current block has only a few distinct tones, which typically occurs when compressing screen content.
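The core of palette coding, replacing pixel values by a small color table plus an index map, can be sketched as below. The wavefront scan order and neighbor-context entropy coding described above are omitted; this only shows how a block collapses into a palette and indices.

```python
def palette_encode(block):
    """Palette-mode sketch: represent a block as a small ordered set
    of colors plus a per-pixel index map, instead of coding each
    pixel value directly."""
    colors = sorted(set(v for row in block for v in row))
    index = {c: i for i, c in enumerate(colors)}
    return colors, [[index[v] for v in row] for row in block]

# A block containing only three distinct values:
colors, idx_map = palette_encode([[5, 5, 9], [9, 7, 5]])
```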
Another tool that is important for screen-content compression is intra-frame copy. When predicting the current block, it searches the already-reconstructed parts of the current frame: for the second and third letter A in the figure below, it finds the first, already-encoded letter A as the prediction block, and the prediction is very accurate, which improves compression efficiency. Because the image contains many repeated letters, intra-frame copy predicts them very well; for this image alone, compression efficiency increases by 50%.
Inter-frame compression tools are richer than intra-frame tools. Compared with VP9, AV1 further optimizes the number and structure of reference frames: it keeps eight reference frames and can predict from seven of them, whereas VP9 uses only three. AV1 also supports layered structures for bidirectional prediction.
In current video compression standards, motion information occupies a large share of the bit rate. Motion estimation generally predicts the current block by searching for a motion vector that points to a corresponding block in a reference frame. In AV1 we adopted a method called motion field projection to obtain motion vectors. It derives a motion trajectory from two reference frames of the current frame, finds the reference frame of the reference frame, and projects the trajectory onto the current frame to obtain two motion vectors. When the object moves linearly, these two motion vectors predict its trajectory well, yielding more accurate prediction and better compression.
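Under the linear-motion assumption stated above, projection is just temporal scaling of a motion vector. The sketch below shows that scaling step only; frame-buffer bookkeeping and rounding to AV1's sub-pixel MV precision are omitted.

```python
def project_mv(mv, d_ref, d_cur):
    """Motion-field projection sketch: a motion vector `mv` observed
    over temporal distance `d_ref` between two reference frames is
    linearly scaled to distance `d_cur` between the current frame and
    a reference, assuming the object moves linearly."""
    scale = d_cur / d_ref
    return (mv[0] * scale, mv[1] * scale)

# An object that moved (8, 4) pixels over 4 frames moves (2, 1) per frame:
mv_cur = project_mv((8, 4), d_ref=4, d_cur=1)
```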
Once we have these motion vectors, we need to transmit them, and in existing video standards transmitted motion vectors occupy a large proportion of the bitstream. Therefore AV1 adopts a method called dynamic motion vector indexing: instead of transmitting the coordinates of a motion vector directly, it transmits the vector's index in a motion vector list, and the decoder obtains the vector by lookup. Motion vectors obtained by motion field projection are added to the list, the list is ranked using motion statistics, and the encoder selects a good index to transmit to the decoder.
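The list-plus-index idea can be sketched as follows. Ranking by how often a candidate vector occurs is an illustrative stand-in for AV1's actual ranking statistics, and `max_len` is a hypothetical list-size cap.

```python
def build_mv_list(candidates, max_len=4):
    """Dynamic MV index sketch: deduplicate candidate motion vectors,
    rank them (here by occurrence count, as a stand-in for the real
    ranking), and keep a short list. The encoder then signals only
    an index into this list instead of MV coordinates."""
    counts = {}
    for mv in candidates:
        counts[mv] = counts.get(mv, 0) + 1
    ranked = sorted(counts, key=lambda mv: -counts[mv])
    return ranked[:max_len]

mv_list = build_mv_list([(1, 0), (2, 3), (1, 0), (0, 0), (1, 0)])
chosen_index = mv_list.index((1, 0))  # the encoder transmits this index
```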
Current video compression standards support bidirectional prediction, and how to fuse two reference frames into one prediction block is worth exploring. AV1 supports a variety of ways to generate the prediction block, including averaged motion compensation, weighted motion compensation, wedge-partition prediction, etc.
Overlapped block motion compensation was proposed as early as the H.263 era and is adopted by AV1. Its principle is to blend overlapping motion-compensated blocks with smooth filtering, which removes motion discontinuities between blocks and improves prediction accuracy. In bidirectional prediction, we can generate different predictions by adjusting the weights of the two references. In addition to averaged motion compensation, AV1 generates predictions weighted by reference-frame distance: when one reference frame is temporally close to the current frame, its prediction tends to be accurate, so it receives a higher weight, while a reference frame that is farther away receives a lower weight. The weights are not transmitted directly in the bitstream; instead an index is transmitted and the weights are looked up in a preset table.
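The distance-weighting principle can be sketched with exact ratio-based weights. Note this is an illustration of the idea only: as described above, real AV1 signals an index into a preset table of quantized weights rather than computing weights from the ratio.

```python
def distance_weighted_pred(p0, p1, d0, d1):
    """Distance-weighted compound prediction sketch: the reference
    that is closer in time (smaller d) gets the larger weight.
    p0 and p1 are the two motion-compensated prediction blocks,
    flattened to 1-D for simplicity."""
    w0 = d1 / (d0 + d1)  # small d0 -> large w0
    w1 = d0 / (d0 + d1)
    return [w0 * a + w1 * b for a, b in zip(p0, p1)]

# Reference 0 is 1 frame away, reference 1 is 3 frames away:
pred = distance_weighted_pred([100, 100], [200, 200], d0=1, d1=3)
```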
AV1 also supports wedge-partition prediction, which compensates for the inability of block motion compensation to follow object boundaries accurately. The wedge partitions are stored in the codec as a preset lookup table, and the encoder selects the best wedge mode and transmits it in the bitstream to inform the decoder.
One big problem with block-based motion compensation is that it cannot model the rotation, scaling and warping common in real motion. AV1 solves this with motion compensation based on affine transforms, supporting both global frame-to-frame transforms and local block-level transforms.
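What the affine model adds over a single translational motion vector can be shown by how it maps pixel positions. The sketch below applies a 2×3 affine model to one position; AV1's actual warp filtering and parameter derivation are omitted.

```python
def affine_warp_point(params, x, y):
    """Affine motion sketch: map position (x, y) in the current block
    to a position in the reference frame using the 2x3 affine model
    [a, b, tx, c, d, ty], which can represent rotation, scaling and
    shearing that one translational motion vector cannot."""
    a, b, tx, c, d, ty = params
    return (a * x + b * y + tx, c * x + d * y + ty)

# A pure translation by (2, 1) is the special case a=d=1, b=c=0:
pt = affine_warp_point((1, 0, 2, 0, 1, 1), 3, 4)
```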
For transforms, AV1 supports ADST, flipped ADST and the identity transform (IDT) in addition to the traditional DCT. Because the 2-D transform is separable, AV1 supports 16 combinations in total. AV1 also supports a variety of transform sizes, from a maximum of 64×64 down to a minimum of 4×4, including rectangular transform blocks. DCT is supported because it is close to optimal for natural signals; ADST and flipped ADST compress well when the residual varies monotonically; and the identity transform compresses well when the video contains step-like transitions.
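The count of 16 combinations follows directly from separability: each of the four 1-D transforms can be chosen independently for the rows and the columns.

```python
TX_1D = ["DCT", "ADST", "FLIPADST", "IDTX"]  # the four 1-D transform kernels

def tx_combinations():
    """Because the 2-D transform is separable, the row and column
    transforms are chosen independently: 4 x 4 = 16 combinations."""
    return [(row, col) for row in TX_1D for col in TX_1D]

combos = tx_combinations()  # 16 (row, column) transform pairs
```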
Compared with VP9, AV1 has many more transform sizes and transform types, which greatly increases its search space and its codec complexity.
For quantization, AV1 adds several new tools over VP9, including delta-Q and quantization matrices. These give AV1 greater flexibility in quantization, which is useful in specific scenarios; for example, quantization matrices can improve subjective quality.
For entropy coding, AV1 uses multi-symbol arithmetic coding, which offers high throughput and fast adaptation of the probability models.
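The "fast adaptation" part can be sketched as a CDF update: after each coded symbol, the cumulative distribution is nudged toward that symbol. The 15-bit scale matches AV1's probability precision, but the shift-based update rate here is illustrative, not AV1's exact rule.

```python
def update_cdf(cdf, symbol, rate=5):
    """Multi-symbol adaptation sketch: cdf[i] is the cumulative
    probability of symbols <= i, scaled so cdf[-1] == 1 << 15.
    After coding `symbol`, each entry moves a fraction of the way
    toward a distribution concentrated on that symbol; `rate`
    controls how fast the model tracks the data."""
    total = 1 << 15
    for i in range(len(cdf) - 1):  # the last entry stays fixed at `total`
        target = total if i >= symbol else 0
        cdf[i] += (target - cdf[i]) >> rate
    return cdf

cdf = [8192, 16384, 24576, 32768]  # uniform 4-symbol CDF, 15-bit scale
update_cdf(cdf, symbol=0)          # symbol 0 becomes slightly more probable
```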
In video compression, the compressed transform coefficients account for most of the bitstream, often more than 50%. AV1 transmits the transform coefficient matrix using layered coding, which compresses the current block's coefficients in two scans: the first scan codes the absolute values of the coefficients, and the second codes their signs. To illustrate: in the figure below, the left side shows the current coefficient matrix and scan order. The first row shows the first pass, which starts at the end position and proceeds in reverse scan order toward the start position; the dark yellow block is the current coefficient, and the light yellow blocks are neighboring values that may be used as context. The second row shows the second pass, which runs from the start position to the end position. The first pass codes only the absolute values of the coefficients; the second codes their signs.
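The two-pass split can be sketched as below. Only the level/sign separation is shown; the reverse scan order of the first pass is modeled by reversing the list, and the neighbor-based context modeling described above is omitted.

```python
def split_level_sign(coeffs):
    """Layered coefficient coding sketch: pass 1 codes absolute
    values (levels) in reverse scan order; pass 2 codes the signs
    of the nonzero coefficients only (zeros need no sign)."""
    levels = [abs(c) for c in reversed(coeffs)]   # pass 1, reverse scan
    signs = [c < 0 for c in coeffs if c != 0]     # pass 2, forward scan
    return levels, signs

levels, signs = split_level_sign([4, -2, 0, 1])
```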
In-loop filtering is an essential part of current video standards. In addition to the traditional deblocking filter, AV1 adds new tools such as the constrained directional enhancement filter (CDEF), frame super-resolution, the in-loop restoration filter, and a film-grain synthesizer.
AV1's deblocking filter applies different filtering strengths to the Y, U and V channels, and for the Y channel uses different strengths for horizontal and vertical filtering. This design leaves more room to optimize deblocking.
The constrained directional enhancement filter (CDEF), applied after the deblocking filter, estimates the direction of edges in each 8×8 block and applies an enhancement filter along that direction. It preserves edge sharpness and improves the quality of the reconstructed image.
Frame super-resolution is applied after CDEF. The image is first down-sampled horizontally and only the low-resolution image is encoded; the decoder then restores the original resolution by up-sampling. This method significantly improves subjective quality at low bit rates.
The in-loop restoration filter is another important post-processing step that enhances image quality. It consists of two filters, and the encoder chooses one of the two: the Wiener filter, which is separable and symmetric, and the self-guided projection filter, which approximates the source signal with a linear combination of two restored versions of the signal. The encoder selects appropriate parameters by comparing the filtered results and transmits them to the decoder.
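The self-guided projection step can be sketched as the linear combination described above. The two restored versions `f1` and `f2` are taken as given here (in AV1 they come from guided filtering of the degraded frame, which is omitted), and `alpha`/`beta` stand in for the signalled projection parameters.

```python
def self_guided_project(base, f1, f2, alpha, beta):
    """Self-guided restoration sketch: combine two restored versions
    (f1, f2) of the degraded signal `base` linearly, with projection
    parameters (alpha, beta) chosen by the encoder to best
    approximate the source and signalled to the decoder."""
    return [b + alpha * (x1 - b) + beta * (x2 - b)
            for b, x1, x2 in zip(base, f1, f2)]

restored = self_guided_project([10, 10], [12, 8], [14, 6],
                               alpha=0.5, beta=0.25)
```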
Film grain synthesis is a tool designed for high-quality video. Film grain is difficult to preserve with traditional video compression, so AV1 treats grain synthesis as a post-processing step and handles the grain separately: the grain is separated from the original video before encoding, the de-grained image is encoded and decoded, and the grain effect is then combined with the decoded image to produce the final output video.
III. Comparison of AV1 compression efficiency
We compared AV1 with VP9 and HEVC; strictly speaking, we compared the reference software implementations of these standards: libaom for AV1, libvpx for VP9, and x265 for HEVC. Our test environment is AWCY (Are We Compressed Yet), an open test set containing more than 30 test videos from 360p to 1080p, tested at fixed QP over 60 frames. In this environment, libaom improves compression efficiency by about 30% over libvpx and 27% over x265.
Facebook also tested AV1, VP9, and H.264 in their real-world applications. The efficiency of AV1 is 50% higher than that of H.264 and 30% higher than that of VP9.
Moscow State University also conducts annual encoder performance tests, and in their results, AV1 achieves the best compression performance, significantly surpassing H.264, H.265 and VP9.
As we all know, current video compression standards trade higher codec complexity for better compression efficiency; of course, in practice we cannot increase complexity without limit, so where is the balance point? For video-on-demand, Netflix has an answer: they believe that if AV1's encoding complexity can be held at 4-10 times that of VP9, it can be used in their product. So how complex is AV1? As of early August this year, we compared the encoding complexity of AV1 and VP9 at speed settings 0 to 3. To explain: speed 0 compresses most efficiently but runs slowest, while speed 3 is the fastest of the four. Compared to VP9, AV1's encoding complexity was about 70 times higher at speed 0 and less than 10 times higher at speed 3. The AOM software team is optimizing the AV1 codec, and these numbers keep dropping.
IV. The next evolution of AV1
An important current goal for AV1 is codec optimization, which requires more SIMD code support and redesign of the codec, especially to reduce complexity on the hardware side. On the encoding side, we need to speed up encoding with more efficient partitioning algorithms, better filter selection, coding-mode decisions, motion estimation algorithms, and so on.
In AV1, we use many machine learning algorithms that let the encoder make fast decisions, such as which partition types to search for a block. We believe better machine learning algorithms will further speed up AV1 encoding and decoding. In the future we will continue to try other new compression tools, such as optical-flow algorithms and machine-learning-based prediction, synthesis and transform methods. We believe that under the AOM framework, with the support of its members and the whole ecosystem, AV1 will be widely used in the near future.
Questions from the audience
Audience: When will AV1 be available to all?

Jiang: AV1 is currently in phase 2: we have finalized the standard and added browser software decoding support. Hardware design and optimization will be completed in the next year or two, and it is expected that by 2020 AV1 will be available to everyone within the AOM organization.
Audience: We are concerned about AV1's performance. Compared with other encoders, how much bit rate can it save at the same quality on the same content? You have just provided some detailed data. My impression is that AV1 saves half the bit rate compared with H.264 in Facebook's tests, and we know HEVC also improves on H.264 by about half, so from this perspective the compression ratios of AV1 and H.265 should be comparable. However, other reports show AV1 only slightly better than H.265, with the data fluctuating between 20% and 40%. Can you provide an authoritative interpretation? How much better is AV1 than H.265?
Jiang: This is only my personal opinion and cannot be cited as an official explanation. Different test environments produce different results, and I think two differences contribute: the test set used and the test conditions. On test conditions: AV1 is being developed to provide video-on-demand compression for Internet companies. YouTube and Netflix compress to a fixed bit rate, while many AV1-versus-HEVC comparisons use fixed QP as the test condition. AV1 is designed to compress at a fixed bit rate and is not optimized for fixed QP, so in many comparison tests AV1 did not appear to improve much over HEVC. Based on test results from YouTube, Netflix and other companies in actual application environments, they believe AV1 has surpassed both H.264 and H.265.
Welcome to follow the "Agora" WeChat official account and reply "RTC" to watch the video replay of the talk and get the slides.