Takeaway: H.265 is the new-generation video coding standard developed by ITU-T VCEG after H.264. Compared with H.264, H.265 further improves compression efficiency and picture quality, and it is increasingly widely used in audio and video scenarios. We have done a lot of engineering work on H.265 in NetEase Yunxin NERTC. This article is the third in our video experience sharing series and covers the topic from four aspects.

Capability negotiation

Whether a client can send a stream with a given feature depends not only on whether the local end can encode it, but also on whether the other receivers in the room can decode it. **That is, the sender and the receivers jointly determine what kind of stream the local end can send.** For H.265, the following situation may occur:

So how do these clients communicate with each other? If client B sends an H.265 stream, clients A and D cannot decode it because they only support H.264 decoding. Therefore, we need a capability negotiation mechanism to ensure normal communication between clients with different codec capabilities.

Below is the overall design diagram of capability negotiation:

The following sections describe our capability negotiation design in detail.

Capability set design

The capability set is defined as {uint32 Key: [uint8 value1, uint8 value2…] }, using a 1-bit mask

The **SDK** key range is [0, 2^8)

The video key range is [2^8, 2^16)

The audio key range is [2^16, 2^24)
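As a rough illustration of the key layout above, the sketch below maps a key to its domain. It assumes half-open ranges, so that key 2^8 belongs to video, consistent with the VideoCodec key 256 used later in this article. The function name `key_domain` is hypothetical and not part of the SDK.

```python
def key_domain(key: int) -> str:
    """Classify a capability key by the range it falls in.

    Assumption: ranges are half-open, i.e. SDK [0, 2^8),
    video [2^8, 2^16), audio [2^16, 2^24).
    """
    if key < 0 or key >= 2 ** 24:
        raise ValueError(f"key {key} is outside the defined ranges")
    if key < 2 ** 8:
        return "sdk"
    if key < 2 ** 16:
        return "video"
    return "audio"


# Example: the VideoCodec key (256) lands in the video range.
domain = key_domain(256)
```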

The bytes are shown as follows:

Examples are as follows:

Capability Negotiation Process (client)

  1. The client defines the capability set locally

  2. The capability set reporting and capability set delivery mechanism is implemented locally on the client

  3. The server provides the generation, synthesis, and delivery of capability sets within channels

For example:

Definition: capability key **VideoCodec: 256 (2^8)**, capability values **H.264: 0, H.265: 1**

When clients A and B enter the same channel, client A reports capability {256: [0, 1]}, and client B reports capability {256: [0]}.

After receiving the capability sets of clients A and B, the server knows that A supports H.264 and H.265 and that B supports only H.264. After merging the two capability sets, the server concludes that the channel supports H.264 and sends the capability set {256: [0]}.

After receiving the capability set, client A actively disables its own H.265 capability. Client B leaves its capability set unchanged.
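The client-side step in this example can be sketched as follows. This is a minimal illustration, not the NERTC implementation: the function `apply_room_capabilities` and the constants are hypothetical names, and leaving untouched any key the room does not mention is an assumption.

```python
from typing import Dict, List

CapabilitySet = Dict[int, List[int]]

# Values taken from the example in the text.
VIDEO_CODEC_KEY = 256  # 2^8
H264, H265 = 0, 1


def apply_room_capabilities(local: CapabilitySet, room: CapabilitySet) -> CapabilitySet:
    """Return the capabilities a client may still use after negotiation."""
    effective: CapabilitySet = {}
    for key, values in local.items():
        allowed = room.get(key)
        if allowed is None:
            # Assumption: keys the room does not mention stay unchanged.
            effective[key] = list(values)
        else:
            effective[key] = [v for v in values if v in allowed]
    return effective


client_a = {VIDEO_CODEC_KEY: [H264, H265]}
client_b = {VIDEO_CODEC_KEY: [H264]}
room_caps = {VIDEO_CODEC_KEY: [H264]}

# Client A drops H.265; client B is unchanged.
a_after = apply_room_capabilities(client_a, room_caps)
b_after = apply_room_capabilities(client_b, room_caps)
```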

Capability Negotiation Process (Server)

  1. When a room is created, a default capability set is generated. It is provided by the engine and stored on the server as a global configuration

  2. If the first edge_login request carries a capability set field, that capability set overrides the default and becomes the room’s capability set; if it carries no capability set field, the default capability set is used as the room’s capability set. Because this is the first user, no broadcast is needed.

  3. If a subsequent user’s capability set is larger than the room’s, the room’s capability set is returned to that user and remains unchanged. An example is shown below:

  4. If a subsequent user’s capability set is smaller than the room’s, the intersection is taken, the room’s capability set is reduced, and the result is broadcast to all users. An example is shown below:
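The server-side rules above could be sketched roughly like this. The class `Room` and method `on_login` are hypothetical names used only for illustration. Two behaviors are assumptions: a later user who sends no capability set leaves the room unchanged, and keys missing from either side are dropped during intersection.

```python
from typing import Dict, List, Optional, Tuple

CapabilitySet = Dict[int, List[int]]


def normalize(caps: CapabilitySet) -> CapabilitySet:
    """Sort and de-duplicate values so sets can be compared directly."""
    return {k: sorted(set(v)) for k, v in caps.items()}


def intersect(a: CapabilitySet, b: CapabilitySet) -> CapabilitySet:
    # Assumption: a key missing from either side is dropped.
    return {k: sorted(set(a[k]) & set(b[k])) for k in a.keys() & b.keys()}


class Room:
    def __init__(self, default_caps: CapabilitySet):
        self.caps = normalize(default_caps)  # rule 1: engine-provided default
        self.has_users = False

    def on_login(self, user_caps: Optional[CapabilitySet]) -> Tuple[CapabilitySet, bool]:
        """Return (room capability set, whether it must be broadcast)."""
        if not self.has_users:
            # Rule 2: the first user overrides the default; no broadcast.
            self.has_users = True
            if user_caps is not None:
                self.caps = normalize(user_caps)
            return self.caps, False
        if user_caps is None:
            return self.caps, False
        merged = intersect(self.caps, user_caps)
        changed = merged != self.caps
        # Rule 3: unchanged room set is simply returned to the new user.
        # Rule 4: a reduced room set is stored and broadcast to everyone.
        self.caps = merged
        return self.caps, changed


room = Room({256: [0]})
first = room.on_login({256: [0, 1]})    # first user sets the room to H.264 + H.265
second = room.on_login({256: [0]})      # H.264-only user shrinks the room set
```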

H.265 Codec practices

Each platform has its own hardware H.265 codecs, and there are also various open-source software H.265 codecs. How do we choose among them to get the best results?

Below we describe our H.265 codec practice on Android, iOS, Mac, and Windows. For software encoding we evaluated x265; for software decoding we evaluated FFmpeg and libhevc.

Android

First, let’s look at the power consumption and bit rate of the Android hardware codec (test device Mi 10: Qualcomm Technologies, Inc SM8250; test profile: 720p 30 fps, 1.7 Mbps)

Next, the performance of each software codec on Android (test device Mi 10: Qualcomm Technologies, Inc SM8250; test profile: 720p 30 fps, 1.7 Mbps)

Finally, the picture quality comparison (test device Mi 10: Qualcomm Technologies, Inc SM8250; test profile: 720p 30 fps, 858 kbps), with H.264 on the left and H.265 on the right

Conclusion:

  1. Android H.265 hardware encoding is roughly on par with H.264 in power consumption, and its bit rate is slightly more stable than that of H.264
  2. H.265 software encoding on Android still performs relatively poorly and cannot meet our needs
  3. FFmpeg H.265 decoding performs poorly on arm64, using up to 15% of the CPU, while libhevc uses up to 4.5%
  4. The picture is clearly sharper with Android H.265 hardware encoding than with H.264 hardware encoding, an obvious quality benefit

Therefore, our H.265 codec strategy on Android is as follows:

  1. Prefer H.265 hardware decoding. On devices where H.265 hardware decoding has compatibility problems, high-end models fall back to the H.265 software decoder, while low-end devices are treated as not supporting H.265 decoding
  2. Prefer H.265 hardware encoding. Devices where H.265 hardware encoding has compatibility problems are treated as not supporting H.265 encoding and fall back to H.264 encoding
  3. Because libhevc software decoding performs significantly better than FFmpeg software decoding for H.265, we use libhevc as the H.265 software decoder
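The Android strategy above amounts to a small decision function. The sketch below is illustrative only: the `AndroidDevice` fields and the returned labels are hypothetical, and how a device is classified as high-end or compatible is outside its scope.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AndroidDevice:
    # Hypothetical flags; in practice these would come from device testing.
    hw_h265_decode_ok: bool
    hw_h265_encode_ok: bool
    high_end: bool


def choose_h265_decoder(device: AndroidDevice) -> Optional[str]:
    """Return the H.265 decoder to use, or None if H.265 decoding is unsupported."""
    if device.hw_h265_decode_ok:
        return "hardware"
    if device.high_end:
        return "libhevc"  # software fallback on high-end models only
    return None


def choose_encoder(device: AndroidDevice) -> str:
    """Return the video encoder to use; there is no software H.265 fallback."""
    return "h265_hardware" if device.hw_h265_encode_ok else "h264"


flagship_with_broken_hw = AndroidDevice(False, False, True)
decoder = choose_h265_decoder(flagship_with_broken_hw)
encoder = choose_encoder(flagship_with_broken_hw)
```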

iOS

First, let’s look at the power consumption of the iOS hardware codec (test device iPhone 11; test profile: 720p 30 fps, 1.7 Mbps)

In addition, we found that on some models or in some scenarios, iOS H.265 hardware encoding falls significantly short of the target bit rate (test device iPhone XR; test profile: 720p 30 fps, 1.7 Mbps).

H.264:

H.265:

To handle severe bit rate shortfalls, we designed a bit rate monitoring method based on iOS hardware encoding rate control. If the bit rate is seriously insufficient, the encoder falls back from H.265 to H.264

Finally, the picture quality comparison when the hardware encoding bit rate is stable (test device iPhone 11; test profile: 720p 30 fps, 858 kbps)

H.264 on the left, H.265 on the right

Conclusion:

  1. The iOS H.265 hardware codec consumes significantly more power than the H.264 hardware codec
  2. iOS H.265 hardware encoding occasionally falls short of the target bit rate, which makes its picture quality worse than that of H.264
  3. iOS hardware-encoded H.265 is somewhat sharper than hardware-encoded H.264, a modest quality benefit

Therefore, our final H.265 codec strategy on iOS is as follows:

  • Prefer H.265 hardware encoding; H.265 software encoding is not supported
  • Prefer H.265 hardware decoding. Only when hardware decoding fails repeatedly and cannot recover does the client fall back to FFmpeg H.265 software decoding
  • Because H.265 hardware encoding on iOS consumes more power than H.264 hardware encoding, we monitor the device’s battery level and switch from the H.265 hardware encoder back to the H.264 hardware encoder when the battery is low (for example, at the 20% threshold)
  • The iOS hardware encoding rate control module monitors the actual encoded bit rate. On an obvious bit rate shortfall or overshoot, H.265 hardware encoding switches back to H.264 hardware encoding

iOS hardware encoding rate controller

The following figure shows the design of the hardware encoding rate control module:

Rate control process:

  • First, the hardware encoder passes the Target Bitrate to the HWBitrateController

  • After each frame is encoded, the HWBitrateController is updated with the encoded frame size and estimates the Estimated Bitrate per second

  • Calculate the difference Diff between the Target Bitrate and the Estimated Bitrate

  • Using a bisection-style step, take Target Bitrate + 0.5 × Diff as the Adjusted Bitrate

  • Set the Adjusted Bitrate back on the HWEncoder as its Target Bitrate

  • Return to step 1: the hardware encoder passes the current Target Bitrate to the HWBitrateController

  • Calculate Diff / Target Bitrate. If this ratio stays above 30%, the bit rate is considered obviously insufficient and the encoder fallback is triggered
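A minimal sketch of this control loop is shown below. It is an illustration under stated assumptions, not the actual HWBitrateController: the method names are hypothetical, the controller keeps the requested target fixed and only changes the value handed to the encoder, and "stays above 30%" is modeled as a hypothetical `patience` count of consecutive one-second windows.

```python
class HWBitrateController:
    def __init__(self, target_bps: float, shortfall_ratio: float = 0.30, patience: int = 3):
        self.target_bps = target_bps            # bit rate requested by the application
        self.shortfall_ratio = shortfall_ratio  # 30% threshold from the text
        self.patience = patience                # assumption: consecutive seconds tolerated
        self._bits_this_second = 0
        self._bad_seconds = 0

    def on_frame_encoded(self, frame_size_bytes: int) -> None:
        """Accumulate the size of each encoded frame."""
        self._bits_this_second += frame_size_bytes * 8

    def on_second_elapsed(self) -> float:
        """Close the one-second window and return the Adjusted Bitrate."""
        estimated_bps = self._bits_this_second
        self._bits_this_second = 0
        diff = self.target_bps - estimated_bps
        if diff / self.target_bps > self.shortfall_ratio:
            self._bad_seconds += 1
        else:
            self._bad_seconds = 0
        # Move half of the difference, as described in the text.
        return self.target_bps + 0.5 * diff

    def should_fall_back(self) -> bool:
        """True when the shortfall has persisted long enough to degrade to H.264."""
        return self._bad_seconds >= self.patience


controller = HWBitrateController(target_bps=1_000_000, patience=2)
for _ in range(30):
    controller.on_frame_encoded(2_500)      # 30 frames of 2,500 bytes = 600 kbit
adjusted = controller.on_second_elapsed()   # value to set on the hardware encoder
```

In this usage the encoder produced 600 kbit against a 1 Mbit target, so the adjusted value handed to the encoder is raised above the target to compensate.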

Mac

First, let’s look at the CPU usage and bit rate of the H.265 hardware and software codecs on Mac (MacBook Pro (15-inch, 2016): Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz; test profile: 720p 30 fps, 1.6 Mbps)

H.265 hardware encoding bit rate:

H.265 software encoding bit rate:

In addition, by dumping the Mac hardware-encoded H.265 bitstream, we found that it uses forward B frames instead of P frames

MacBook Pro (15-inch, 2016): Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz; profile: 720p 30 fps, 1 Mbps

Conclusion:

  1. On Mac, hardware H.265 uses less CPU than software H.265, a significant performance gain
  2. The bit rate of H.265 hardware encoding is less stable than that of software encoding and fluctuates more, but it fluctuates around the target bit rate. Long-running tests show no obvious overshoot or shortfall on average
  3. The hardware-encoded bitstream uses forward B frames instead of P frames, which gives a higher overall compression ratio and better picture quality at the same bit rate
  4. The picture quality of H.265 hardware encoding is clearly better than that of H.264, while software-encoded H.265 shows little difference from hardware-encoded H.265

Therefore, our H.265 codec strategy on Mac is as follows:

  1. Prefer H.265 hardware decoding. On devices where H.265 hardware decoding has compatibility problems or is not supported, use FFmpeg software decoding if the CPU is strong; devices with weak CPUs are treated as not supporting H.265 decoding
  2. Prefer H.265 hardware encoding. On devices where H.265 hardware encoding has compatibility problems or is not supported, use H.265 software encoding if the CPU is strong; devices with weak CPUs are treated as not supporting H.265 encoding

Windows

On Windows, because hardware encoding and decoding are severely fragmented, we have not adopted them for now and currently rely mainly on software encoding and decoding.

The following figure shows software encoding and decoding performance on Windows (test device Dell Latitude 5290: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz; test profile: 720p 30 fps, 1.6 Mbps)

You can see:

1. H.265 software encoding and decoding perform worse on x86 than on x86_64

2. On x86_64, H.265 software encoding consumes much more CPU than H.264 software encoding

Therefore, our H.265 codec strategy on Windows is as follows:

1. On Windows x86, the software H.265 codec is not supported

2. On Windows x86_64, the software H.265 codec is enabled on devices with strong CPU performance. Because H.265 software encoding demands more CPU than H.265 software decoding, encoding has stricter device performance requirements

Engineering strategy

Whitelist policy

As mentioned above, device compatibility must be considered when using H.265 hardware encoding and decoding. Device performance must also be considered, because encoding and decoding are computationally heavy.

To provide the best user experience at the overall project level, we use a whitelist strategy: a delivered whitelist indicates whether each device supports H.265 hardware and software encoding and decoding.

The following is our specific approach:

  • Through extensive device adaptation testing, devices with good H.265 hardware encoding and decoding support are added to the online whitelist
  • Device CPU performance is classified by benchmark scores. High-performance devices support H.265 software encoding and decoding, low-performance devices do not, and these results are also written to the online whitelist
  • Finally, through delivery of the online whitelist configuration, the client learns whether hardware encoding, hardware decoding, software encoding, and software decoding are supported

H.265 capability negotiation

What is negotiated is the H.265 decoding capability; the result is ultimately applied on the encoding side to decide whether to use H.265 encoding

  • The local H.265 decoding capability is derived from the delivered configuration, user settings, and device support
  • The capability negotiation module generates the capability set from the H.265 decoding capability and reports it to the capability negotiation server
  • The capability negotiation server merges the capability sets of all clients in the channel, generates a new room capability set, and sends it to each client
  • The client receives the capability set from the server and parses the H.265 capability of the current channel
  • Based on the channel’s H.265 capability, the delivered H.265 configuration, user settings, the current device’s support, and the server’s H.265 support, the client determines whether H.265 is currently supported, and therefore whether the encoding side can encode an H.265 stream
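The decision chain above can be summarized in a short sketch. All function and parameter names here are hypothetical, and the assumption that H.264 is always reported as supported is ours, not stated in the text.

```python
from typing import Dict, List

CapabilitySet = Dict[int, List[int]]

VIDEO_CODEC_KEY = 256  # 2^8, as defined earlier
H264, H265 = 0, 1


def local_h265_decode_enabled(config_allows: bool, user_allows: bool,
                              device_supports: bool) -> bool:
    """H.265 decoding is available only if every local source agrees."""
    return config_allows and user_allows and device_supports


def build_reported_caps(h265_decode_enabled: bool) -> CapabilitySet:
    """Capability set reported to the negotiation server.

    Assumption: H.264 is always reported as supported.
    """
    codecs = [H264, H265] if h265_decode_enabled else [H264]
    return {VIDEO_CODEC_KEY: codecs}


def can_encode_h265(room_caps: CapabilitySet, config_allows: bool, user_allows: bool,
                    device_supports: bool, server_supports: bool) -> bool:
    """Final encoder-side decision after the room capability set arrives."""
    channel_supports = H265 in room_caps.get(VIDEO_CODEC_KEY, [])
    return (channel_supports and config_allows and user_allows
            and device_supports and server_supports)


reported = build_reported_caps(local_h265_decode_enabled(True, True, True))
use_h265 = can_encode_h265({VIDEO_CODEC_KEY: [H264]}, True, True, True, True)
```

In the usage lines, the local client supports H.265 but the room only supports H.264, so the encoder stays on H.264.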

CPU OverUse strategy

Using benchmark scores to distinguish high and low CPU performance across devices may not be completely accurate. In practice, a device may have a high benchmark score but insufficient encoding performance. In this case, we need to monitor the current video encoding time in real time to determine whether the CPU is overloaded.

What we do is:

  • With the hardware H.265 encoder, pipeline delay is possible, so we currently do not measure per-frame encoding time
  • With the software H.265 encoder, the encoding time of each frame is measured. If encoding takes too long, the CPU is considered overloaded and the encoder immediately falls back from software H.265 to H.264
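One way to turn "encoding takes too long" into a concrete check is sketched below. The text does not give the actual criterion, so the rule used here (the average encode time over a sliding window exceeds the per-frame budget of 1000 / fps milliseconds) and the class name `EncodeTimeMonitor` are assumptions for illustration only.

```python
from collections import deque


class EncodeTimeMonitor:
    def __init__(self, fps: int = 30, window: int = 30):
        self.budget_ms = 1000.0 / fps          # time available per frame
        self.samples = deque(maxlen=window)    # most recent encode times

    def add(self, encode_ms: float) -> bool:
        """Record one frame's encode time; return True if the CPU looks overloaded."""
        self.samples.append(encode_ms)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough data yet
        average = sum(self.samples) / len(self.samples)
        return average > self.budget_ms


monitor = EncodeTimeMonitor(fps=30, window=3)
overloaded = False
for encode_ms in (10.0, 50.0, 50.0):
    overloaded = monitor.add(encode_ms)
# When overloaded is True, the caller would switch from software H.265 to H.264.
```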

Adjust the QP threshold

**Our QoE module adjusts frame rate and resolution according to the QP threshold to achieve the best subjective video quality.** Therefore, setting a reasonable encoder QP threshold is very important. In engineering practice, how do we find reasonable upper and lower QP thresholds for the hardware H.265 encoder?

What we do is:

  • With the subjective picture quality of H.264 and H.265 basically aligned, we log the QP values and plot the QP curves. From the QP curves we obtain the approximate range of the upper and lower QP thresholds

Take this model as an example: (Test model Mi 10: Qualcomm Technologies, Inc SM8250, test Profile: 720P 30FPS)

You can see:

  1. At the 720p 30 fps profile, with the picture quality of hardware-encoded H.265 and H.264 basically aligned, hardware H.265 saves 40% of the bit rate, the QP fluctuations are basically similar, and the upper and lower thresholds are also basically similar. Therefore, if the current QP thresholds for Android H.264 hardware encoding are [A, B], we recommend an upper QP threshold range of [A-1, A+1] and a lower QP threshold range of [B-1, B+1] for Android H.265 hardware encoding.

  2. Based on the upper and lower QP threshold ranges obtained in step 1, network loss is gradually relaxed while we observe how the QoE module adjusts resolution and frame rate based on the QP threshold, and the most reasonable QP threshold is selected from the ranges, as shown in the following data:

We found that overall QoE was best when the QP threshold was [A-1, B-1], so [A-1, B-1] was selected as the final QP threshold for H.265 hardware encoding.

Benefits

At present, the first-stage benefit evaluation is done with resolution and bit rate aligned to H.264, and we look at four indicators: picture quality, end-to-end delay, CPU usage, and fluency. Below we list the picture quality gains of Android and iOS H.265 hardware encoding over H.264 hardware encoding. In our evaluation, the other three indicators (end-to-end delay, CPU usage, and fluency) show basically no difference. The picture quality gains are shown in the following chart:

In general:

  1. Compared with H.264, H.265 has obvious video quality benefits

  2. At low bit rates, on weak networks, or with rapidly changing pictures, the picture quality gains are more obvious

  3. Android H.265 hardware encoding shows significantly larger picture quality gains than iOS H.265 hardware encoding

Conclusion

Whether hardware or software codecs are used, H.265 has obvious video quality benefits over H.264. In the future, NetEase Yunxin will further explore aligning picture quality while saving bit rate.

About the author

Shi Lingkai, audio and video engine development engineer at NetEase Yunxin, responsible for the development and maintenance of the NERTC video engine.