LTR-based weak-network countermeasures require special handling when implemented with a hardware decoder, because decoder-side feedback is needed. In addition, some hardware decoders do not implement LTR particularly well, leading to decoding errors. This article, the third in our QoS weak-network optimization series, explains in detail the LTR weak-network resistance principle in Alibaba Cloud RTC's QoS policy, the pitfalls we encountered when implementing LTR with hardware decoding, and the corresponding solutions.
Authors | An Jicheng, Tao Senbai, Tian Weifeng
Reviewer | Taiyi
How Long Term Reference (LTR) resists weak networks
Recovering from reference frame loss with an I frame
In RTC scenarios, the usual encoding reference strategy is to reference the immediately preceding frame (leaving temporal SVC aside), because in general a shorter reference distance means higher similarity and better compression. For latency reasons there are only I frames and P frames, no B frames. When a P frame is lost, the receiver must request a new I frame before it can resume decoding and rendering correctly.
As shown in the figure above, encoding normally proceeds as an I P P P sequence. If a P frame in the middle (marked ✖️) is lost to a weak network and cannot be recovered, the receiver asks the sender to encode a new I frame. An I frame, however, can use only intra prediction, so its coding efficiency is low.
Recovering from reference frame loss with LTR
A long-term reference (LTR) frame is a cross-frame reference selection strategy: it breaks the traditional restriction of referencing only the nearest preceding frame and chooses reference frames more flexibly. The goal of the long-term reference strategy is that when a P frame is lost, the receiver can resume decoding and rendering correctly without requesting a new I frame, which significantly improves coding efficiency and saves bandwidth compared with sending an I frame. The technique bypasses the lost frame and encodes/decodes against a long-term reference frame received before the loss, improving video smoothness under weak networks.
The frame-loss recovery strategy with LTR is shown in the figure above. While the network is healthy, encoding still proceeds as a normal I P P P sequence, except that some P frames are marked as LTR frames (the green P frames in the figure, hereafter "LTR-marked frames"). If a P frame in the middle (marked ✖️) is lost under a weak network and cannot be recovered, the receiver asks the sender (encoder) to recover using LTR. The encoder then encodes a P frame (the red P frame in the figure, hereafter the "LTR recovery frame") that references an LTR-marked frame previously confirmed as received.
Because that LTR-marked frame has already been received by the decoder, it is guaranteed to be in the decoder's reference frame buffer, so the red P frame referencing it is guaranteed to decode correctly. And since the LTR recovery frame is a predicted P frame, its coding efficiency is far better than an I frame's.
These characteristics show that LTR is a reference frame selection technique realized by the network module and the encoder working together. It requires receiver-side feedback: when an LTR-marked frame sent by the encoder (a green P frame in the figure) is successfully received, the decoder must notify the encoder, so that when an LTR recovery request arrives the encoder can reference that frame with confidence.
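To make this cooperation concrete, here is a minimal sender-side sketch of the feedback handling described above. Every name in it (LtrSenderState, onLtrAck, the encode stubs) is hypothetical rather than an actual Alibaba Cloud RTC or codec API; a real implementation would drive the encoder's reference-picture management (for H.264, long-term picture indices and MMCO commands).

```cpp
#include <cstdint>
#include <optional>

struct Encoder {};  // opaque stand-in for the real encoder handle (assumption)

// Hypothetical hooks into the encoder (stubs so the sketch compiles).
void encodeIdrFrame(Encoder&) { /* request a fresh IDR frame */ }
void encodePFrameReferencingLtr(Encoder&, uint16_t /*ltrFrameNum*/) {
    /* encode a P frame that references the confirmed LTR frame */
}

class LtrSenderState {
public:
    // The receiver acknowledged that an LTR-marked frame arrived intact
    // and now sits in its reference buffer.
    void onLtrAck(uint16_t frameNum) { confirmedLtr_ = frameNum; }

    // The receiver reported an unrecoverable loss: reference the confirmed
    // LTR frame if we have one, otherwise fall back to a full IDR frame.
    void onRecoveryRequest(Encoder& enc) {
        if (confirmedLtr_)
            encodePFrameReferencingLtr(enc, *confirmedLtr_);
        else
            encodeIdrFrame(enc);
    }

private:
    std::optional<uint16_t> confirmedLtr_;  // last acknowledged LTR frame
};
```

The fallback branch matters: until at least one LTR-marked frame has been acknowledged, the only safe recovery is still a full I frame.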
LTR was also touched on in the previous two articles of this series; interested readers may refer to:
1. Several Encoder-Related Optimizations in Alibaba Cloud RTC QoS Screen-Sharing Weak-Network Optimization
2. Variable Resolution Coding for Alibaba Cloud RTC QoS Weak-Network Resistance
LTR support in hardware decoding
Advantages of hardware decoding
Compared with software decoding, hardware decoding has a natural advantage in power consumption, so it should be preferred whenever it is available and does not hurt the viewing experience.
Reading LTR information from the bitstream
With a software decoder, developers can add an interface inside the decoder that reads LTR information directly from the bitstream, such as whether a frame is an LTR-marked frame and what its frame number is. If a frame is LTR-marked, its frame number is fed back to the encoder to confirm reception.
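As an illustration, such a decoder hook might look like the following (a hypothetical interface of our own, not any particular decoder's API):

```cpp
#include <cstdint>

// Per-frame LTR information a software decoder can surface (hypothetical).
struct LtrFrameInfo {
    bool isLtrMarked;   // the frame is marked as a long-term reference
    uint16_t frameNum;  // frame number to feed back to the encoder
};

// The RTC layer implements this observer; when isLtrMarked is true it sends
// frameNum back to the encoder as an LTR acknowledgement.
class DecoderLtrObserver {
public:
    virtual ~DecoderLtrObserver() = default;
    virtual void onFrameDecoded(const LtrFrameInfo& info) = 0;
};
```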
With hardware decoders, however, application developers cannot modify the interfaces, and typical hardware decoders expose no way to read LTR information. So how can it be obtained?
Our approach is to parse the bitstream a second time at the RTC layer, outside the hardware decoder, read the LTR-related information, and feed it back to the encoder. Since this information lives in the high-level syntax of the stream, such as the slice header, parsing it costs little.
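For H.264, the relevant fields sit near the front of the slice header and can be read with a small Exp-Golomb parser. The sketch below extracts frame_num; it assumes the 0x03 emulation-prevention bytes have already been stripped from the RBSP, that separate_colour_plane_flag is 0, and that log2MaxFrameNum was taken from the active SPS. Detecting the LTR marking itself would continue on to dec_ref_pic_marking() (long_term_reference_flag and the MMCO commands), which we omit for brevity.

```cpp
#include <cstdint>
#include <cstddef>

// Minimal MSB-first bit reader over an RBSP buffer.
class BitReader {
public:
    BitReader(const uint8_t* data, size_t size) : data_(data), size_(size) {}

    uint32_t readBits(int n) {                  // fixed-length u(n)
        uint32_t v = 0;
        while (n-- > 0 && (pos_ >> 3) < size_) {
            v = (v << 1) | ((data_[pos_ >> 3] >> (7 - (pos_ & 7))) & 1);
            ++pos_;
        }
        return v;
    }

    uint32_t readUe() {                         // Exp-Golomb ue(v)
        int zeros = 0;
        while (readBits(1) == 0 && (pos_ >> 3) < size_) ++zeros;
        return ((1u << zeros) - 1) + readBits(zeros);
    }

    size_t bitPos() const { return pos_; }      // current offset in bits

private:
    const uint8_t* data_;
    size_t size_;
    size_t pos_ = 0;
};

// rbsp points just past the one-byte NAL header of a coded-slice NAL unit.
uint16_t parseFrameNum(const uint8_t* rbsp, size_t size, int log2MaxFrameNum) {
    BitReader br(rbsp, size);
    br.readUe();                                // first_mb_in_slice
    br.readUe();                                // slice_type
    br.readUe();                                // pic_parameter_set_id
    return static_cast<uint16_t>(br.readBits(log2MaxFrameNum));  // frame_num
}
```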
Some hardware decoders do not support LTR: problems and workarounds
Because this use of LTR is not particularly common in codecs, some hardware vendors do not implement it well. Our testing uncovered several problems.
In the figure above, the LTR recovery frame (the red P frame) decodes correctly on the hardware decoders we tested as long as the normal P frames in the red box are not lost. But if the P frames in the red box are lost, some hardware decoders fail to decode the LTR recovery frame that follows. Across the phones we measured, devices with Apple, Qualcomm, and Samsung chips decoded correctly, while devices with Huawei (HiSilicon) and MediaTek chips could not decode the LTR recovery frame in this case, returning decoding errors or outputting a corrupted picture.
Since a real weak-network event is accompanied by frame loss (the P frames in the red box will in practice be lost), failing to decode the LTR recovery frame at that point makes LTR effectively unusable on those hardware decoders. This appears to be a flaw in those decoders' own implementations; that is, they do not fully follow the standard. So how can it be worked around?
Further testing showed that the decoding errors were caused by the frame-number discontinuity left by the lost P frames in the red box: if the frame numbers of the subsequent frames were rewritten to be continuous, decoding succeeded. Our solution is therefore: before a frame's bitstream is sent to an affected hardware decoder, check whether its frame number is discontinuous, and if so rewrite the bitstream in place to make the numbering continuous before submitting it. This neatly sidesteps these decoders' inability to decode LTR recovery frames, preserving both low power consumption and the weak-network video experience.
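A sketch of this rewrite for H.264, reusing the BitReader from the earlier sketch: frame_num is a fixed-length u(v) field, so it can be overwritten in place without changing the buffer size. The same assumptions apply (emulation-prevention bytes stripped, separate_colour_plane_flag equal to 0), and a production implementation would also have to re-apply emulation prevention and handle frame_num wraparound modulo 2^log2MaxFrameNum.

```cpp
#include <cstdint>
#include <cstddef>

// Overwrite n bits, MSB first, starting at bit offset bitPos.
void writeBits(uint8_t* data, size_t bitPos, int n, uint32_t value) {
    for (int i = n - 1; i >= 0; --i, ++bitPos) {
        size_t byte = bitPos >> 3;
        int shift = 7 - static_cast<int>(bitPos & 7);
        data[byte] = static_cast<uint8_t>(
            (data[byte] & ~(1u << shift)) | (((value >> i) & 1u) << shift));
    }
}

// Patch frame_num in a coded-slice NAL unit so the sequence seen by the
// hardware decoder stays continuous. rbsp points just past the NAL header.
void rewriteFrameNum(uint8_t* rbsp, size_t size, int log2MaxFrameNum,
                     uint16_t newFrameNum) {
    BitReader br(rbsp, size);                 // from the earlier sketch
    br.readUe();                              // first_mb_in_slice
    br.readUe();                              // slice_type
    br.readUe();                              // pic_parameter_set_id
    writeBits(rbsp, br.bitPos(), log2MaxFrameNum, newFrameNum);
}
```

The receive path then only needs to track the last frame_num it forwarded and, after a detected gap, rewrite each subsequent slice to the next consecutive value before queuing it to the hardware decoder.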