In audio and video development, encoding and decoding is undoubtedly the core function, and each update to codec standards has greatly advanced audio and video technology and changed how we consume media. From TV to online video, and now live streaming, on-demand playback, and video conferencing, every one of these shifts has depended on advances in codec technology: for example, H.264 (still the most widely used codec today), H.265/HEVC (adopted by big players such as Youku and Tencent), and China's domestic AVS series.

H.26x series

A brief history of video coding standards


H.261: The founder of video coding

H.261 was designed to transmit video of acceptable quality over ISDN (Integrated Services Digital Network), whose bandwidth comes in multiples of 64 kbps. The codec was designed to work at bit rates between 40 kbps and 2 Mbps, encoding video at both CIF and QCIF resolutions: with 4:2:0 sampling, CIF is 352×288 for luma and 176×144 for chroma, while QCIF is 176×144 for luma and 88×72 for chroma.

H.261 used the now-familiar discrete cosine transform (DCT) for image coding, an algorithm that later played a major role in JPEG. More importantly, it introduced a series of video-specific features, chiefly macroblocks and macroblock-based motion compensation, that laid the foundation for modern video coding.

H.261 uses the YCbCr color space with 4:2:0 chroma subsampling: each macroblock contains a 16×16 block of luma samples and two corresponding 8×8 blocks of chroma samples. YCbCr (often loosely called YUV) is still the color space used by today's codec specifications.

Macroblock and interframe prediction based on motion compensation

As we know, a video is a sequence of images, frame by frame. A one-second video typically contains 24, 25, 30, 60 or more pictures, played at fixed intervals, which the eye fuses into smooth motion thanks to persistence of vision. Consecutive frames actually contain a great deal of repetition, as in the following example:

A white billiard ball moves on a green table

The ball's direction and distance of motion are enough to describe the change between frames

If we compressed every frame independently in the traditional way, the compressed video would obviously still contain a lot of redundancy. So what can be done? H.261 introduced the idea of macroblocks: slice the whole picture into many small blocks, then apply inter-frame prediction based on motion compensation. Most of the picture is still, so the still blocks simply reuse the previous frame's compressed result, while each moving block is described by a motion vector, a direction plus a distance. That saves a lot of storage space, right?
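To make the idea concrete, here is a toy sketch of block-matching motion estimation using the sum of absolute differences (SAD). This is a hypothetical illustration, not H.261's actual algorithm: real encoders use far faster search strategies than this exhaustive scan.

```python
# Toy block-matching motion estimation (illustrative only).
# Frames are assumed to be 2-D lists of grayscale sample values.

def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the block at (cx, cy) in the
    current frame and a candidate block at (rx, ry) in the reference."""
    return sum(
        abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
        for y in range(size) for x in range(size)
    )

def find_motion_vector(cur, ref, cx, cy, size=16, search=4):
    """Exhaustively scan a small window in the reference frame and
    return the (dx, dy) displacement with the lowest SAD."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - size and 0 <= ry <= h - size:
                cost = sad(cur, ref, cx, cy, rx, ry, size)
                if cost < best_cost:
                    best_cost, best = cost, (dx, dy)
    return best, best_cost
```

Only the winning vector (and the residual after subtracting the matched block) needs to be stored, which is where the savings come from.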

DCT algorithm

Divide 8×8 pixels into a block

The DCT algorithm originated in the 1970s, and in the mid-to-late 1980s researchers began to apply it to image compression. It transforms an image from the spatial domain to the frequency domain, after which quantization discards much of the high-frequency information to which the human eye is less sensitive while retaining most of the low-frequency information, thus shrinking the image. Finally, the quantized data is further compressed with efficient entropy coding, using zig-zag scanning and variable-length coding.
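As a rough illustration of the pipeline just described (not how any production encoder is written), the sketch below computes a naive 2-D DCT-II of an 8×8 block, quantizes the coefficients, and builds the zig-zag scan order that groups low-frequency coefficients first:

```python
import math

N = 8  # classic 8x8 transform blocks

def dct2(block):
    """Naive O(N^4) 2-D DCT-II of an 8x8 block; real codecs use fast
    factorized transforms, this is purely for illustration."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(
                block[y][x]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for y in range(N) for x in range(N)
            )
            out[v][u] = c(u) * c(v) * s
    return out

def quantize(coeffs, q=16):
    """Uniform quantization: small (mostly high-frequency) coefficients
    collapse to zero, which is where the compression comes from."""
    return [[round(c / q) for c in row] for row in coeffs]

def zigzag_order(n=8):
    """(y, x) coordinates in zig-zag scan order: diagonals of increasing
    frequency, alternating direction, so zeros cluster at the end."""
    return sorted(
        ((y, x) for y in range(n) for x in range(n)),
        key=lambda p: (p[0] + p[1],
                       p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )
```

For a flat block the energy collapses into the single DC coefficient, which is exactly why the transform helps compression.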

In H.261 and the later codecs built on its framework, the DCT is mainly used to compress key frames, the frames that serve as references for motion compensation. Like a keyframe in a Flash animation, a key frame defines a starting point from which subsequent frames are calculated. Because it uses only intra-frame compression and involves no other frames, it is called an intra frame, or I-frame for short.

MPEG-1: Introducing frame types

MPEG-1 is a video and audio compression format designed for CD media. It uses block motion compensation, the discrete cosine transform (DCT), quantization and other techniques, and is optimized for a transmission rate of about 1.2 Mbps. MPEG-1 was subsequently adopted by Video CD as its core technology.

Audio – MP3

MPEG-1 audio is divided into three layers; the most famous is the third, MPEG-1 Audio Layer III, better known as MP3, which is still widely used for audio compression.

Video – Introduces B frames and GOP

The concept of video frames already existed in H.261: the key frame described above (a complete image that can be decoded on its own) and the other frames calculated from it by motion-compensation algorithms.

However, it was MPEG-1 that really formalized frame categories. The original key frame became the "I frame", and the frames derived from it by inter-frame prediction became "P frames". On top of these two existing types, MPEG-1 introduced a new one: the bidirectionally predicted frame, better known as the B frame.

However, introducing B-frames also increases the complexity of encoding and decoding. MPEG-1 therefore also proposed the GOP (Group of Pictures): the arrangement of frames between one I-frame and the next.

A group of pictures is a run of consecutive frames within an MPEG-coded movie or video stream; every MPEG-coded movie or video stream consists of successive groups of pictures.

Below is a GOP example
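One consequence of B-frames is that decode order differs from display order. The helper below is a sketch assuming a simple closed GOP in the classic IBBP pattern, where each B-frame must wait for the next I- or P-frame it references:

```python
# Sketch of GOP frame reordering (closed GOP, IBBP pattern assumed).
# A B-frame references both the previous and the NEXT I/P frame, so
# the decoder must receive that next I/P frame before the Bs.

def decode_order(display):
    """Reorder frames from display order into decode order: each run of
    B-frames is emitted after the I/P frame that follows it."""
    out, pending_b = [], []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)   # must wait for the next I/P
        else:
            out.append(frame)         # I/P frame decodes immediately
            out.extend(pending_b)     # now the Bs that needed it
            pending_b = []
    return out + pending_b            # trailing Bs (open-GOP edge case)
```

For the display sequence I0 B1 B2 P3 B4 B5 P6, this yields the decode sequence I0 P3 B1 B2 P6 B4 B5, which is the kind of reordering a GOP diagram normally shows.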

MPEG-2: The DVD standard

Compared with MPEG-1, MPEG-2 did not change much; it mostly targeted DVD applications and the digital broadcast era.

Support for interlaced scanning

Interlaced scanning (English: interlaced) is a method of displaying an image on a raster-scan display in which the device alternately scans the even and odd lines, requiring less bandwidth than progressive scan.

A slowed-down illustration of interlaced scanning

H.263: Familiar 3GP video

The original H.261 and MPEG-1 were aimed at low-bit-rate applications. With the rapid development of the Internet and communication technology, demand for network video kept growing, and higher-quality video at low bit rates became the new goal. As the standards setter of the communications industry, the ITU-T introduced H.263, the direct successor to H.261, in 1995.

3GP was also all the rage in the feature-phone era: its lower storage and bandwidth requirements suited the limited space available on phones. H.263 still dominates in 3GP today.

H.264/MPEG-4 AVC: Into familiar territory

H.264/AVC is a block-oriented, motion-compensated video coding standard. By 2014 it had become one of the most commonly used formats for high-definition video recording, compression and distribution.

H.264/AVC includes a number of new features that allow it to compress video more efficiently than previous codecs and to adapt to a variety of network applications. These new features include:

  • Multi-reference-frame motion compensation. H.264/AVC uses previously encoded frames as references in a far more flexible way than earlier video coding standards: in some cases up to 32 reference frames can be used (in previous standards the number was 1, or 2 for B frames). This can reduce the bit rate or improve quality for most scene sequences, and for some types of content, such as rapid repetitive flashing, repeated cuts, or background occlusion, it reduces the bit rate significantly.
  • Variable block-size motion compensation. Blocks from 16×16 down to 4×4 can be used for motion estimation and compensation, allowing finer segmentation of the moving regions in an image sequence. The supported sizes are 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4.
  • To reduce aliasing and obtain sharper images, a six-tap filter (a sixth-order digital filter) is used to generate the half-pel predicted luma samples.
  • Flexible interlaced-scan video coding.
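The half-pel interpolation mentioned above can be sketched for one dimension. H.264's luma half-pel filter has taps (1, -5, 20, 20, -5, 1) with normalization by 32; the border handling below (clamping to the edge sample) is a simplification of the standard's exact edge rules:

```python
# One-dimensional sketch of H.264-style six-tap half-pel luma
# interpolation. Border handling is simplified to edge clamping.

def half_pel(samples, i):
    """Interpolate the half-pel value between samples[i] and
    samples[i + 1] using the (1, -5, 20, 20, -5, 1)/32 filter."""
    taps = (1, -5, 20, 20, -5, 1)
    acc = 0
    for k, t in enumerate(taps):
        j = i + k - 2                          # taps span i-2 .. i+3
        j = max(0, min(j, len(samples) - 1))   # clamp at the borders
        acc += t * samples[j]
    return max(0, min((acc + 16) >> 5, 255))   # round, divide by 32, clip
```

Because the tap weights sum to 32 and are symmetric, a flat region stays flat and a linear ramp interpolates exactly halfway, while the negative outer taps sharpen edges compared with plain bilinear averaging.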

H.265/HEVC: The awkward successor

As the successor to H.264, HEVC was expected not only to improve image quality but also to double the compression ratio of H.264/MPEG-4 AVC (a 50% bit-rate reduction at the same picture quality), and to support 4K and even ultra-high-definition TV (UHDTV), with resolutions up to 8192×4320 (8K).

The following figure compares the subjective video quality of H.265 and H.264

The comparison above shows that H.265 beats H.264 on every metric, so why call it an awkward successor?

  1. Most existing audio and video is still encoded in H.264, and H.264 is good enough for most scenarios.
  2. The licensing fees are too expensive. Audio and video service providers at home and abroad have already been fleeced once by H.264; supporting H.265 means paying licensing fees all over again, so currently only some large players (Tencent, Youku) use it, and only for specific titles.
  3. H.266 has already been released. Anyone with a real need for a next-generation codec can wait and adopt H.266, while those with less need simply stay on H.264, leaving H.265 stuck in an awkward position.

H.266 may still need a few more years to mature, so H.265's window is closing, but it still has a chance.

H.266/VVC: Future coding

Here it comes, striding in behind its elder siblings and ready to lead a new generation of the audio and video world: H.266/VVC.

In July 2020, the H.266/VVC video codec standard was finalized, a moment that sets the direction of audio and video development for the next ten years.

VVC stands for Versatile Video Coding; it is also known as H.266, MPEG-I Part 3, or Future Video Coding (FVC). VVC aims for a higher compression ratio and lower bit rate at the same video quality, enabling more scenarios such as 4K and 8K ultra HD and 360° panoramic video. VVC also offers other features:

  • Lossless and subjectively lossless compression.
  • 4K to 16K resolution and panoramic video
  • 10 to 16 bit YCbCr 4:4:4, 4:2:2 and 4:2:0, BT.2100 wide gamut
  • High dynamic range (HDR) with peak brightness of 1000, 4000 and 10000 nits
  • Auxiliary channel (for recording depth, transparency, etc.)
  • Variable and fractional frame rates from 0 to 120 Hz
  • Adaptive video coding across temporal (frame rate), spatial (resolution), signal-to-noise-ratio, gamut and dynamic-range differences
  • Stereo/multi-view coding
  • Panoramic format
  • Static image coding

Depending on the quality of the encoding algorithm, the standard is expected to have several times (up to ten times) the encoding complexity of HEVC, while its decoding complexity is expected to be about twice that of HEVC.

Video standard updates and coding-efficiency gains

PS

• VTM = VVC Test Model; the latest version is VTM-10.0

• JVET = Joint Video Experts Team of the ITU-T VCEG and ISO/IEC MPEG (VVC Standards Committee)

H.266/VVC advantages

Reducing costs

The existing H.264/H.265 meets most audio and video service requirements, but it has hit a bottleneck in some services, and CDN bandwidth is a major expense. If a higher compression ratio and lower bit rate can be achieved at the same video quality, the same CDN capacity can serve more customers, reducing costs and improving efficiency.

Enabling more scenarios

Emerging businesses such as VR (Virtual Reality), AR (Augmented Reality) and 360° panoramas need 4K or even 8K resolution to be effective. The question becomes how to transmit data faster (low latency), better (high resolution) and leaner (low bit rate), and existing codec schemes can no longer keep up.

Development status of VVC at home and abroad

Development of H.266 in China

  1. Chinese companies actively participated in drafting the H.266 standard. Tencent and Alibaba, among others, submitted hundreds of proposals during the standardization process, and more than half were adopted. Taking an active part in rule-making earns you a say later on; that is a lesson learned the hard way.
  2. Tencent open-sourced the first H.266 codec: github.com/TencentClou… For more information, see www.infoq.cn/article/auu…

AVS series

AVS development history

Because the AVS codec standards started late domestically and most of the patents are held abroad, most domestic companies would still end up being fleeced. In addition, the performance of the AVS coding system is still lacking. According to an IEEE comparison of HEVC/VP9/AVS2 coding efficiency, under random-access conditions HEVC outperforms VP9 by 24.9% and AVS2 by 6.5%; under low-delay conditions HEVC outperforms VP9 by 8.7% and AVS2 by 14.5%. In some areas AVS2 is not far behind HEVC, but in overall performance and scale of deployment it still has a long way to go. So even though the country is vigorously promoting AVS, its application scenarios remain relatively few, mostly in state-owned enterprises.

Google series

VP8

From a technical point of view, VP8 uses techniques similar to H.264. Although VP8 was advertised as compressing better than H.264, in practice it fell short due to some design flaws. It eventually made it into Web standards, but hardly anyone used it. Instead, WebP, derived from its intra-frame compression technology, became popular.

VP9

VP8 didn’t fare well, and Google quickly introduced its successor, VP9. This time the reference point was HEVC, again aiming at efficient coding at high resolutions. Some designs in VP9 were influenced by HEVC; its Super Block, for instance, also has a maximum size of 64×64. The result is that VP9 delivers up to 50% efficiency gains over VP8. That looks on par with HEVC, but VP9 suffers from a problem similar to VP8's: its use is largely limited to Google's own YouTube, and it lacks broader real-world adoption.

Thoughts and prospects for the future of audio and video

Deep learning and end-to-end intelligence will enable the future of audio and video.

Deep learning

Train AI models to adjust codec parameters intelligently.

End-to-end intelligence

Establish an end-to-end communication link