1. Background

In an era when everyone watches video, video playback is a very important business for the Baidu APP. With the arrival of 5G, standard definition and HD no longer satisfy users; ultra HD and even 4K, once a luxury, have become part of everyday viewing. As video sources get sharper and video encoding gets more complex, the demands on the APP's video decoding capability keep rising.

At the same time, phone performance keeps improving, and many phones now offer strong hardware decoding. Software decoding, after many years of development, still has advantages that cannot be replaced. How to make sensible use of both the software and hardware decoding capabilities of a phone, play to the strengths of each, and give users a better playback experience has therefore become the focus of our optimization work.

1.1 Advantages and disadvantages of software/hardware decoding

There are two kinds of decoder: software and hardware. For software decoding the industry has the mature FFmpeg, which decodes on the CPU. Hardware decoding developed later. On Android phones a dedicated decoding chip does the work, and the system provides MediaCodec for access to the underlying hardware decoder.

Both decoding modes have advantages and disadvantages, and many players support both. The trade-offs between hardware and software decoding are familiar to audio and video developers.

Note: Strictly speaking, MediaCodec on Android is a framework provided by Google. Because chip vendors and phone manufacturers implement it differently, compatibility problems are common. In addition, MediaCodec takes a long time to initialize, and some phones buffer several frames internally before outputting the first frame. These two factors make first-frame decoding significantly slower with hardware decoding than with software decoding.

1.2 Efficiency Comparison

1. Software decoding: FFmpeg decodes to YUV data, which must be converted to RGB with libyuv before it is rendered to the screen.

2. Hardware decoding, buffer mode: MediaCodec decodes and the YUV data is read from its output buffer; it must also be converted to RGB with libyuv before it is rendered to the screen.

3. Hardware decoding, surface mode: MediaCodec decodes with a Surface bound to the decoder, and after decoding the system API renders the frame directly to the screen. The official documentation describes this as the most efficient mode.

  • Online statistics of first frame decoding time:

(Baidu APP version V11.20.0.14, Data date: March 20, 2020)

  • Decoding frame rate and CPU usage have to be measured with a stress test (audio/video synchronization is turned off so that the decoder runs at full speed), so the following two items use offline test data. Test setup: 16th, 4K HEVC.

    Note: A higher decoding frame rate means more frames decoded per second and less time per frame, and therefore higher performance.

CPU usage is shown below:

As the data shows, MediaCodec surface mode is the most efficient mode for video playback. It takes full advantage of hardware decoding, and because the system renders directly to the screen it also cuts the time spent on data copies and YUV-to-RGB conversion, which effectively reduces CPU load and memory consumption.

1.3 Pain points

To sum up, the ideal for video playback is to use hardware decoding in surface mode whenever possible, then hardware decoding in buffer mode, and software decoding only as the last choice. At the same time, first-screen decoding speed and the hardware decoding compatibility of each phone model must be taken into account; in those scenarios software decoding is preferred.

To achieve this, we need to address the following pain points:

1. How can we judge hardware decoding compatibility more reliably?

2. How can we use hardware decoding as much as possible while keeping first-screen decoding fast?

2. Our solution

Pain point 1: How can we judge hardware decoding compatibility more reliably?

  • Mainstream practice: test the hardware decoding compatibility of each phone model offline and maintain a static hardware decoding blacklist.

  • Disadvantages: testing is labor-intensive, and offline testing can hardly cover the variety of models in use online. Phone models also iterate constantly, so newly released problem models cannot be blocked in time.

  • Our scheme 1: add decoder monitoring on top of the static hardware decoding blacklist.

Pain point 2: How can we use hardware decoding as much as possible while keeping first-screen decoding fast?

  • Mainstream practice: choose hardware decoding for playback scenarios that need decoding efficiency, and software decoding for scenarios that need first-screen speed.

  • Disadvantages: scenarios that use software decoding cannot benefit from the phone's hardware decoding.

  • Our scheme 2:

  • Set a first-screen decoding time threshold, for example 200 ms.

  • Obtain the historical first-screen time of hardware decoding from the decoder monitoring module and use it as a prediction. If it is below the threshold, use MediaCodec surface mode directly. If it is above the threshold, start playback with software decoding and then switch seamlessly to MediaCodec buffer mode.

The two schemes are described in detail below.

Scheme 1: decoder monitoring

1. Design of decoder monitoring module

  • The decoding monitoring module uses encoding type (H264/HEVC), profile, and level together as an ID, and records how hardware and software decoding behave for each kind of encoded source.

    Profile specifies the set of coding tools a stream may use, which determines its compression efficiency. Level constrains resolution, frame rate, and bit rate. Both are important properties of a video encoding. For the same encoding type, a decoder may support only some profile/level combinations.

  • For each ID, record the first-screen decoding speed and the average decoding speed of both hardware and software decoding.

  • For hardware decoding, also record whether the hardware decoder has crashed, the number of runs, and the number of anomalies during those runs (exceptions thrown by the decoding interface, decoding stalls, and so on).

  • For a period right after the Baidu APP is installed, videos are played with software or hardware decoding chosen at random, to collect data on how the device's decoders behave.
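The records described above could be kept in a small per-device table. The following is a minimal sketch, not the Baidu APP's implementation: the class name `DecoderMonitor`, the string key format, and all field names are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a decoder monitor keyed by encoding type, profile, and level (hypothetical names). */
class DecoderMonitor {
    /** Running statistics for one decoder (software or hardware) on one kind of source. */
    static class Stats {
        int runs;              // playbacks recorded so far
        int anomalies;         // exceptions, stalls, etc. seen across those playbacks
        boolean crashed;       // true once the decoder has crashed on this kind of source
        long firstFrameMsSum;  // sum of first-frame decoding times, in milliseconds
        double avgFrameMsSum;  // sum of the per-playback average frame times

        double meanFirstFrameMs() {
            return runs == 0 ? Double.NaN : (double) firstFrameMsSum / runs;
        }

        double anomaliesPerRun() {
            return runs == 0 ? 0.0 : (double) anomalies / runs;
        }
    }

    private final Map<String, Stats> table = new HashMap<>();

    /** Builds the ID, for example "HEVC/Main/5.1/hw". The format is an assumption. */
    static String key(String codec, String profile, String level, boolean hardware) {
        return codec + "/" + profile + "/" + level + "/" + (hardware ? "hw" : "sw");
    }

    /** Returns the statistics for a key, creating an empty entry on first use. */
    Stats get(String key) {
        return table.computeIfAbsent(key, k -> new Stats());
    }

    /** Folds the outcome of one playback into the statistics for that key. */
    void record(String key, long firstFrameMs, double avgFrameMs, int anomalies, boolean crashed) {
        Stats s = get(key);
        s.runs++;
        s.firstFrameMsSum += firstFrameMs;
        s.avgFrameMsSum += avgFrameMs;
        s.anomalies += anomalies;
        s.crashed |= crashed;
    }
}
```

A real module would also persist this table across launches so that the history survives between plays; that part is omitted here.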

2. Process

  1. In the prepare stage, the static hardware decoding blacklist is checked first. Then, with the encoding type subdivided by profile and level, the decoding monitoring module is queried for this source's encoding: has hardware decoding crashed on it, and if not, has it produced too many anomalies? The answers decide whether hardware decoding is compatible.

  2. The first-frame times of software and hardware decoding for the current source's encoding are read from the decoding monitoring module and used to predict the first-screen time. If hardware decoding meets the required first-screen time, hardware decoding is preferred; otherwise playback starts with software decoding.

  3. After playback, the first-frame decoding time and the decoder's behavior (whether it crashed, whether anomalies occurred, the average time per frame) are written back to the monitoring module for use in predicting the next play.

If hardware decoding has crashed or produced too many anomalies, we conclude that it has compatibility problems and play the whole video with software decoding.

If the only problem is that the hardware first-screen time exceeds the threshold, compatibility is actually fine, so after starting playback quickly with software decoding we can optimize further with scheme 2.
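The decision flow above can be sketched as a single function. This is an illustration under stated assumptions rather than the actual implementation: `DecoderSelector`, `HardwareHistory`, and `Plan` are hypothetical names, the 200 ms threshold is the example value mentioned earlier, and the anomaly limit of 0.2 per run is invented for the sketch. The article does not say what happens when no history exists, so the sketch assumes a software start in that case.

```java
/** Sketch of the start-up decision: compatibility first, then first-screen prediction. */
class DecoderSelector {
    enum Plan { SOFTWARE_ONLY, HARDWARE_SURFACE, SOFTWARE_THEN_HARDWARE_BUFFER }

    /** Hardware decoding history for one encoding type/profile/level on this device. */
    static class HardwareHistory {
        final boolean crashed;
        final int runs;
        final int anomalies;
        final double meanFirstFrameMs;

        HardwareHistory(boolean crashed, int runs, int anomalies, double meanFirstFrameMs) {
            this.crashed = crashed;
            this.runs = runs;
            this.anomalies = anomalies;
            this.meanFirstFrameMs = meanFirstFrameMs;
        }
    }

    static final double FIRST_SCREEN_THRESHOLD_MS = 200.0; // example value from the article
    static final double MAX_ANOMALIES_PER_RUN = 0.2;       // assumed limit, not from the article

    static Plan choose(boolean blacklisted, HardwareHistory h) {
        // Compatibility checks: static blacklist, then crash history, then anomaly count.
        if (blacklisted || h.crashed) {
            return Plan.SOFTWARE_ONLY;
        }
        if (h.runs > 0 && (double) h.anomalies / h.runs > MAX_ANOMALIES_PER_RUN) {
            return Plan.SOFTWARE_ONLY;
        }
        // First-screen prediction: use hardware directly only when history says it starts fast.
        if (h.runs > 0 && h.meanFirstFrameMs < FIRST_SCREEN_THRESHOLD_MS) {
            return Plan.HARDWARE_SURFACE;
        }
        // Compatible but slow to start (or no history yet): start in software, switch later.
        return Plan.SOFTWARE_THEN_HARDWARE_BUFFER;
    }
}
```

The third outcome is the case that scheme 2 handles: hardware decoding is compatible but slow to produce its first frame.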

Scheme 2: Seamless switching of hardware and software decoder

1. Unification of decoding channels

Why unify? A craftsman who wants to do good work must first sharpen his tools.

With a unified decoding module that encapsulates the software and hardware decoders (all three decoding modes) and exposes a single access interface, the Player can keep using it as if it were one ordinary decoder. The overall architecture becomes cleaner and easier to maintain and extend.

The module maintains the foreground and background decoders, the internal state, frame switching, and related logic internally; none of this is visible from outside. I-frame flags, switch flags, and similar markers can be carried in the incoming packet (PKT), so the decoding module does not need extra interfaces unrelated to decoding, which keeps the interface design clean.
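One way to picture the unified module is as a facade over a foreground and a background decoder. The sketch below is only an assumption about its shape: `Packet`, `Decoder`, `FakeDecoder`, and `UnifiedDecoder` are hypothetical names, and `FakeDecoder` stands in for real FFmpeg or MediaCodec wrappers. It models only the simpler seek case from the decoder switching logic section, where the switch happens on flush.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical packet type: the I-frame and switch flags travel with the data. */
class Packet {
    final long pts;
    final boolean keyFrame;    // marks an I frame
    final boolean switchable;  // marks a point where the decoder may be switched

    Packet(long pts, boolean keyFrame, boolean switchable) {
        this.pts = pts;
        this.keyFrame = keyFrame;
        this.switchable = switchable;
    }
}

/** The one interface the player sees, whichever decoder is behind it. */
interface Decoder {
    String name();
    List<Long> decode(Packet pkt);  // returns the PTS of each frame produced
    void flush();
}

/** Stand-in for a real decoder wrapper: one frame out per packet in. */
class FakeDecoder implements Decoder {
    private final String name;
    int flushes;

    FakeDecoder(String name) { this.name = name; }
    public String name() { return name; }
    public List<Long> decode(Packet pkt) { return Collections.singletonList(pkt.pts); }
    public void flush() { flushes++; }
}

/** Facade that hides the foreground/background pair behind the Decoder interface. */
class UnifiedDecoder implements Decoder {
    private Decoder foreground;
    private Decoder background;

    UnifiedDecoder(Decoder soft, Decoder hard) {
        this.foreground = soft;
        this.background = hard;
    }

    public String name() { return foreground.name(); }
    public List<Long> decode(Packet pkt) { return foreground.decode(pkt); }

    /** On flush (for example after a seek), promote the background decoder if one is waiting. */
    public void flush() {
        foreground.flush();
        if (background != null) {
            foreground = background;
            background = null;
        }
    }
}
```

Because `UnifiedDecoder` implements the same `Decoder` interface as the decoders it wraps, the player code does not change when the active decoder does.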

2. Decoder switching logic

Switching can happen at two points: 1) when playback has decoded up to the second GOP; 2) when the Player seeks.

Switching when playback reaches the second GOP:

  1. When playback begins, the decoding module opens the software decoder as the foreground decoder. At the same time it creates a background hardware decoder thread in a waiting state, which does not block the foreground decoding task.

  2. The Player starts playing and hands the PKTs of the first GOP to the decoding module. Thanks to the foreground software decoder, the first video frame is decoded quickly for rendering and display, so playback starts fast.

  3. After 4-5 seconds the second GOP arrives. Its PKT (video packet) carries the switchable flag, which notifies the decoding module. The module feeds multiple PKTs of GOP2 to the background hardware decoder at once and enters the frame-chasing state. The foreground software decoder carries on unchanged, decoding one frame for each PKT it receives.

  4. When the PTS of the background hardware decoder catches up with the PTS of the software decoder, the hardware decoder thread can be closed and the foreground and background decoders are swapped. Frame continuity must be preserved during this step so that the switch is seamless and invisible to the user.

  5. The remaining PKTs of GOP2 and the following GOP3/4/5… are decoded in MediaCodec buffer mode. In this way software decoding guarantees first-frame decoding speed while the decoding advantages of MediaCodec are used as much as possible.

Switching when the Player seeks: this scenario is simpler. When a Player seek calls the decoder's flush, we take the opportunity to switch the foreground decoder to the hardware decoder and use MediaCodec buffer mode from then on.

3. Ensure seamless switching of decoder

Two situations can arise during frame chasing and decoder switching:

  • (Left picture) The hardware decoder decodes N frames of GOP2 before it catches up with the software decoder. These repeated frames (the gray part) must be discarded so that the picture does not repeat or jump backward.

  • (Right picture) The hardware decoder's first decoded GOP2 frame is already level with or ahead of the software decoder's output. In this case an empty PKT must be fed to the software decoder so that it outputs every frame held in its internal cache, which prevents the picture from jumping forward.
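Both situations come down to comparing PTS values at the switch point. The sketch below models them offline on lists of PTS values; it is an illustration only, with hypothetical names (`SeamlessSwitch`, `merge`), and it assumes every list is in ascending display order. A real decoder module would apply the same two rules frame by frame as output arrives.

```java
import java.util.ArrayList;
import java.util.List;

/** Offline model of the two switch cases, working on PTS values only. */
class SeamlessSwitch {
    /**
     * softShown:  PTS of frames the software decoder has already output.
     * softCached: PTS of frames still held inside the software decoder
     *             (these come out when it is fed an empty packet).
     * hardOut:    PTS of frames the hardware decoder has produced so far.
     * Returns the PTS sequence to render, with no repeats and no skipped cached frames.
     */
    static List<Long> merge(List<Long> softShown, List<Long> softCached, List<Long> hardOut) {
        List<Long> render = new ArrayList<>(softShown);
        long last = render.isEmpty() ? Long.MIN_VALUE : render.get(render.size() - 1);

        // Right-hand case: the first hardware frame is already ahead of the software
        // output, so drain the software decoder's cache to fill the gap.
        if (!hardOut.isEmpty() && hardOut.get(0) > last) {
            long firstHard = hardOut.get(0);
            for (long pts : softCached) {
                if (pts > last && pts < firstHard) {
                    render.add(pts);
                    last = pts;
                }
            }
        }

        // Left-hand case: drop hardware frames that were already shown, keep the rest.
        for (long pts : hardOut) {
            if (pts > last) {
                render.add(pts);
                last = pts;
            }
        }
        return render;
    }
}
```

For example, if the software decoder has shown frames up to PTS 4080 and the hardware decoder has produced 4000 through 4160, the frames at 4000, 4040, and 4080 are dropped and rendering continues from 4120.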

3. Conclusion

At present, on the Android version of the Baidu APP, hardware decoding accounts for 87% of video playback with no degradation in first-screen speed or decoding error rate, as follows:

In this era of video services, codecs keep advancing, new encoding methods keep emerging, and devices keep strengthening their decoding capabilities to match. Decoding is an important part of video playback, so we expect to keep exploring and optimizing on-device decoding to give users a better experience.
