1. Overview

Rate control (RC) is a very important module in a practical video encoder. Depending on the application scenario, for example real-time (online) versus offline encoding, or traditional TV broadcasting versus Internet streaming, the encoder's output bit rate is subject to different requirements and constraints, such as an average or maximum target bit rate, a buffer size, or an initial delay. Rate control serves these scenarios by controlling the quantization parameter (QP) of each frame, and even of each coding unit (CU), so that the bit rate of the encoded video satisfies the applicable constraints while coding performance, in terms of both coding efficiency and subjective quality, is optimized as far as possible.

In practice, the bit rate constraints of typical application scenarios, i.e., the rate control modes, can be summarized as follows:

1.1 CBR (Constant Bit Rate)

CBR RC assumes that the channel transmission bandwidth, or bit rate, is constant. It is often used in traditional broadcast TV and in Internet streaming. In practice, the video decoder is usually assumed to have an input bitstream buffer of a certain size, so even over a constant-rate channel the encoder can let the number of bits spent on each frame fluctuate within a certain range, which yields better coding performance and quality.

The decoder buffer is modeled as a “leaky bucket”: its input is the encoded video stream arriving over the channel at a certain bit rate, and its output is the coded data of each frame, removed from the buffer at that frame's decoding time. The industry commonly calls this model the VBV (Video Buffering Verifier). Since H.264/AVC, the MPEG series of international video coding standards, including H.265/HEVC and H.266/VVC, define it as the HRD (Hypothetical Reference Decoder). These models specify the behavior of the decoder's buffer. An HRD is defined by three parameters: channel bit rate, buffer size, and initial removal delay.
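As a minimal sketch with hypothetical names, the three HRD parameters can be grouped as below. It assumes the leaky bucket fills at the constant channel rate from time 0, so when the first frame is removed the buffer holds bit rate × initial removal delay bits, and that level must not exceed the buffer size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HRDParams:
    """The three parameters that define an HRD (field names are illustrative)."""
    bit_rate: float               # channel bit rate R, in bits per second
    buffer_size: float            # buffer size B, in bits
    initial_removal_delay: float  # seconds between first arrival and first removal

    def initial_level(self) -> float:
        # Bits that arrive at the constant rate R before the first frame is removed.
        return self.bit_rate * self.initial_removal_delay

    def is_valid(self) -> bool:
        # The buffer must not already overflow before the first removal.
        return 0 <= self.initial_level() <= self.buffer_size


params = HRDParams(bit_rate=4_000_000, buffer_size=6_000_000, initial_removal_delay=1.0)
print(params.initial_level(), params.is_valid())  # 4000000.0 True
```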

The HRD model and CBR rate control are described in more detail later.

1.2 VBR (Variable Bit Rate)

VBR RC assumes that the channel has a maximum transmission bit rate. The actual transmission rate can be equal to or lower than this maximum, even zero, but can never exceed it. VBR RC is often used for offline file storage, such as compressed storage of film and TV content, and for VOD (video on demand) services. The corresponding HRD model defined in the MPEG video coding standards is the VBR HRD.

1.3 ABR (Average Bit Rate)

ABR RC requires that the average bit rate of each chunk of video frames equal a target bit rate, where a chunk is, for example, a window of two seconds of video; there is generally no HRD buffer constraint. ABR is an informal rate control mode used in the industry, usually for adaptive streaming. In such services, the sender switches between bitstreams with different average bit rates at chunk boundaries as the available network bandwidth changes, so that transmission stays efficient.
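The ABR requirement can be checked with a short sketch. It assumes per-frame sizes in bits, a fixed frame rate, two-second chunks as in the example above, and a hypothetical ±5% tolerance; real services define their own chunk length and tolerance.

```python
def chunk_average_bitrates(frame_bits, fps, chunk_seconds=2.0):
    """Average bit rate (bits/s) of each consecutive chunk of frames."""
    frames_per_chunk = max(1, round(fps * chunk_seconds))
    rates = []
    for start in range(0, len(frame_bits), frames_per_chunk):
        chunk = frame_bits[start:start + frames_per_chunk]
        duration = len(chunk) / fps  # the last chunk may be shorter
        rates.append(sum(chunk) / duration)
    return rates


def meets_abr(frame_bits, fps, target_bps, tolerance=0.05, chunk_seconds=2.0):
    """True if every chunk average is within +/- tolerance of the target (assumed criterion)."""
    return all(abs(rate - target_bps) <= tolerance * target_bps
               for rate in chunk_average_bitrates(frame_bits, fps, chunk_seconds))


# 90 frames at 30 fps: one full 2 s chunk plus a 1 s tail chunk.
print(chunk_average_bitrates([100_000] * 90, fps=30))  # [3000000.0, 3000000.0]
```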

1.4 Capped VBR

Capped VBR requires that, for every frame, the average bit rate within a 1-second sliding window starting at that frame not exceed a maximum bit rate; apart from this, there is generally no HRD buffer constraint. This is also an informal rate control mode commonly used in the industry, typically for streaming services delivered over the Internet.
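A sketch of the capped VBR check, assuming a fixed frame rate so that a 1-second window starting at frame i covers a fixed number of frames. A running sum slides the window one frame at a time, so the check is linear in the number of frames; windows at the end of the stream simply contain fewer frames and are checked against the same bit budget.

```python
def capped_vbr_violations(frame_bits, fps, max_bps, window_seconds=1.0):
    """Indices i whose window of frames [i, i + fps * window_seconds) exceeds the cap."""
    n = max(1, round(fps * window_seconds))  # frames per window
    cap = max_bps * window_seconds           # bit budget of one window
    violations = []
    window_sum = sum(frame_bits[:n])         # bits in frames i .. i + n - 1, starting with i = 0
    for i in range(len(frame_bits)):
        if window_sum > cap:
            violations.append(i)
        window_sum -= frame_bits[i]          # slide the window forward by one frame
        if i + n < len(frame_bits):
            window_sum += frame_bits[i + n]
    return violations


# 10 fps, cap of 150 bits/s: one 100-bit frame pushes every window containing it over the cap.
frames = [10] * 10 + [100] + [10] * 9
print(capped_vbr_violations(frames, fps=10, max_bps=150))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```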

1.5 Stat-Mux (Statistical Multiplexing)

Stat-Mux (statistical multiplexing) is a common rate control mode in broadcast TV. In this scenario, several TV programs or video channels are transmitted over the same channel from the encoder side, so the total channel bandwidth must be allocated adaptively among all programs to maximize their overall coding efficiency.
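Stat-mux allocation policies vary widely, and the text does not prescribe one. As an assumed, simplified policy, the sketch below reserves a minimum rate for each program and splits the remaining bandwidth in proportion to a per-program complexity estimate (for example, from pre-analysis). The function name and the proportional rule are hypothetical.

```python
def statmux_allocate(total_bps, complexities, min_bps=0.0):
    """Split total_bps among programs: a floor each, the rest proportional to complexity."""
    k = len(complexities)
    if total_bps < k * min_bps:
        raise ValueError("total bandwidth cannot cover the per-program minimum")
    spare = total_bps - k * min_bps
    total_c = sum(complexities)
    if total_c == 0:
        return [total_bps / k] * k          # no information: split evenly
    return [min_bps + spare * c / total_c for c in complexities]


# A complex program (3) gets three times the share of a simple one (1).
print(statmux_allocate(20_000_000, [1, 3]))  # [5000000.0, 15000000.0]
```

The allocation always sums to `total_bps`, so the shared channel is fully used; a real system would recompute it as program complexities change.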

1.6 CRF (Constant Rate Factor)

CRF is a constant-quality coding mode commonly used in practice [1]. The user sets a target quality level, the CRF value, and the encoder maximizes overall coding efficiency by controlling how the QP of each frame and each CU fluctuates around the target CRF value. A relatively simple and effective CRF rate control algorithm used in practice is the CU-tree algorithm [2], which is described in detail later. Because CRF mode specifies a target quality level rather than a target bit rate, it can be regarded as a VBR mode without an explicit maximum bit rate. In practical applications such as encoding and storing large offline video files, the CRF value has to be chosen or tuned empirically from the service's quality requirements and channel bit rate so that the actual bit rate meets the scenario's requirements. CRF can also be combined with capped VBR to support applications such as streaming.

2. HRD Conformance

Given its practical importance, a video decoder model based on a leaky-bucket buffer, the HRD, has been defined in the MPEG series of international video coding standards since H.264/AVC [3]. These standards define two types of HRD conformance. The input bitstream of a Type I HRD consists of only two kinds of NAL (Network Abstraction Layer) units: VCL (Video Coding Layer) NAL units, which contain the actual coded video frames, and filler data NAL units. A Type II HRD takes into account all types of NAL units, i.e., in addition to VCL and filler data units, also SPS (Sequence Parameter Set), PPS (Picture Parameter Set), SEI (Supplemental Enhancement Information) messages, and so on. The HRD has two modes, CBR and VBR. A CBR HRD is defined by the average bit rate, buffer size, and initial removal delay. A VBR HRD is defined by three similar parameters, except that the average bit rate is replaced by the maximum input bit rate (that is, the input bit rate is variable and can take any value from zero up to the maximum).
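The difference between the two conformance types is which NAL units count toward the HRD input. The sketch below uses the H.264/AVC `nal_unit_type` values (1 to 5 are VCL slice units, 12 is filler data, 6/7/8 are SEI/SPS/PPS); other standards number their NAL units differently. Representing a stream as `(nal_unit_type, size_in_bytes)` pairs is an assumption for illustration.

```python
VCL_TYPES = range(1, 6)  # H.264/AVC coded slice NAL unit types
FILLER_TYPE = 12         # H.264/AVC filler data NAL unit type


def hrd_input_bits(nal_units, hrd_type):
    """Bits counted by a Type I or Type II HRD; nal_units are (nal_unit_type, bytes) pairs."""
    if hrd_type not in ("I", "II"):
        raise ValueError("hrd_type must be 'I' or 'II'")
    total = 0
    for nal_type, size in nal_units:
        if hrd_type == "II" or nal_type in VCL_TYPES or nal_type == FILLER_TYPE:
            total += 8 * size
    return total


# SPS, PPS, SEI, an IDR slice, a non-IDR slice, and filler data.
units = [(7, 10), (8, 5), (6, 20), (5, 1000), (1, 300), (12, 50)]
print(hrd_input_bits(units, "I"), hrd_input_bits(units, "II"))  # 10800 11080
```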

Dividing the buffer size by the bit rate gives the maximum delay introduced by the buffer. The appropriate delay depends closely on the application: low-latency applications such as video conferencing and online games require very small delays, for example under 0.3 seconds; broadcast TV and online streaming typically use delays of 0.5 to 3 seconds; and offline storage of large video files can tolerate long delays, for example more than 10 seconds. In practice, whether an HRD is needed, and with which parameters, is decided by the service requirements.
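A worked example of this relation, with illustrative numbers: a 6 Mbit buffer at 4 Mbit/s falls in the broadcast/streaming range, while a low-latency service at 2 Mbit/s that must stay under 0.3 s is limited to a buffer of less than 0.6 Mbit.

$$
D_{\max} = \frac{B}{R} = \frac{6\ \text{Mbit}}{4\ \text{Mbit/s}} = 1.5\ \text{s},
\qquad
D_{\max} < 0.3\ \text{s},\ R = 2\ \text{Mbit/s} \;\Rightarrow\; B < 0.3\ \text{s} \times 2\ \text{Mbit/s} = 0.6\ \text{Mbit}.
$$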

Figure 1 shows how the CBR HRD buffer operates at the decoder. At the removal times $t_r(0), t_r(1), \ldots, t_r(i)$, spaced one frame interval apart, the coded data of each frame is removed from the decoder buffer. The interval from time 0 to $t_r(0)$ is the initial removal delay, and $L_{init}$ is the initial buffer level. The buffer is filled at a constant bit rate $R$, the interval between frames is $1/FR$, the reciprocal of the frame rate $FR$, $\mathrm{Bits}(i)$ is the size in bits of frame $i$, and $B$ is the buffer size, i.e., its upper limit.

If the buffer level exceeds this upper limit, the buffer overflows. Overflow can usually be avoided by appending an appropriate amount of filler data (bits that carry no video information) after the preceding coded frame. However, filler data lowers the overall coding efficiency, so situations that would cause overflow should be avoided as far as possible.

Conversely, if the buffer level would fall below zero when a frame is removed, the buffer underflows: the current frame's data has not been fully delivered to the decoder, so the decoder must wait, which causes playback jitter. To avoid underflow, the encoder usually has to lower the bit budget of the current frame appropriately and re-encode it. However, if the current frame's budget is cut too far, its quality may become extremely poor. Avoiding decoder buffer underflow across all kinds of complex video content and scene transitions, while keeping the overall coding quality good and never producing frames of very poor quality, is a key robustness requirement and challenge for a video encoder's rate control algorithm.
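The buffer behavior described above can be simulated in a few lines. This sketch assumes a constant fill rate $R$, a first removal at the initial removal delay, and one removal every $1/FR$ seconds. On overflow it pads the previous frame with just enough filler bits to bring the level back to $B$; on underflow it only reports the frame, where a real encoder would re-encode that frame with fewer bits. The function name and return format are hypothetical.

```python
def simulate_cbr_hrd(frame_bits, bit_rate, frame_rate, buffer_size, initial_delay):
    """Leaky-bucket simulation of a CBR HRD decoder buffer.

    Returns (levels, filler, underflows): the buffer level just before each
    removal, filler bits appended to each frame to prevent overflow, and the
    indices of frames whose bits would not have fully arrived in time.
    """
    per_frame_in = bit_rate / frame_rate      # bits arriving between two removals
    level = bit_rate * initial_delay          # L_init, the level at t_r(0)
    if level > buffer_size:
        raise ValueError("initial removal delay overflows the buffer")
    levels, filler, underflows = [], [0.0] * len(frame_bits), []
    for i, bits in enumerate(frame_bits):
        if i > 0:
            level += per_frame_in
            if level > buffer_size:           # overflow: pad frame i - 1 with filler data
                filler[i - 1] = level - buffer_size
                level = buffer_size
        levels.append(level)
        level -= bits                         # frame i is removed at t_r(i)
        if level < 0:                         # underflow; a real encoder would re-encode frame i
            underflows.append(i)
    return levels, filler, underflows


# R = 1000 bit/s, FR = 10, B = 500 bits, initial delay 0.25 s: the last frame is too large.
print(simulate_cbr_hrd([200, 50, 50, 400], 1000, 10, 500, 0.25))
# ([250.0, 150.0, 200.0, 250.0], [0.0, 0.0, 0.0, 0.0], [3])
```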

Figure 1. Operation of the decoder CBR HRD buffer

In the MPEG video coding standards, the CBR HRD operation shown in Figure 1 is defined in terms of each frame's removal time from the buffer and the final arrival time of all of that frame's coded bits, as shown in Figure 2 below:

Figure 2. Decoder CBR HRD buffer as defined in the standard by frame removal and arrival times

For a CBR HRD, the definitions in Figure 1 and Figure 2 are equivalent, because the channel transmits at a constant rate. For a VBR HRD, however, the transmission rate is variable, and adjustments of the input rate can only be expressed and checked unambiguously through the relationship between frame removal and arrival times. Buffer overflow in a VBR HRD can only be detected with the condition shown in Figure 2, $t_r(i) - t_{af}(i-1) > B/R$, where $t_{af}(i-1)$ is the final arrival time of frame $i-1$. When this condition holds, the input bit rate for frame $i-1$ is reduced accordingly to avoid overflow. A buffer level diagram cannot express this overflow check and rate reduction clearly; for example, it cannot show whether any frame before $t_r(0)$ had its input rate reduced to avoid overflow. The standard therefore defines both the CBR and VBR modes of the HRD using the more fundamental and general frame removal and arrival times [3].
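The timing formulation can be sketched for the constant-rate case, where each frame's bits start arriving as soon as the previous frame's have finished. Frame $i$ underflows when $t_{af}(i) > t_r(i)$, and the overflow test is the condition $t_r(i) - t_{af}(i-1) > B/R$ given above. With continuous arrival from time 0, $t_{af}(i-1) = \sum_{j<i}\mathrm{Bits}(j)/R$, so the overflow test is the same as the buffer level just before removing frame $i$ exceeding $B$. This is a simplified illustration with hypothetical names, not the full arrival-time rules of the standard (it does not model the paused arrival of a VBR HRD).

```python
def arrival_removal_check(frame_bits, bit_rate, frame_rate, buffer_size, initial_delay):
    """Return (overflows, underflows) frame indices using removal and arrival times (CBR arrival)."""
    t_af_prev = 0.0                               # final arrival time of the previous frame
    overflows, underflows = [], []
    for i, bits in enumerate(frame_bits):
        t_r = initial_delay + i / frame_rate      # removal time t_r(i)
        t_ai = t_af_prev                          # initial arrival: right after frame i - 1
        t_af = t_ai + bits / bit_rate             # final arrival time t_af(i)
        if i > 0 and t_r - t_af_prev > buffer_size / bit_rate:
            overflows.append(i)                   # t_r(i) - t_af(i-1) > B / R
        if t_af > t_r:
            underflows.append(i)                  # frame i not fully arrived when it is removed
        t_af_prev = t_af
    return overflows, underflows


print(arrival_removal_check([200, 50, 50, 400], 1000, 10, 500, 0.25))  # ([], [3])
```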

3. Rate control algorithm design in practice

In practice, a rate control algorithm should make the overall bit rate of the encoded video hit the target average bit rate accurately and, if an HRD buffer constraint applies, satisfy that buffer's requirements. On this basis, it should improve overall coding efficiency as much as possible. A practical video encoder product also places high demands on the stability and robustness of rate control: across all the kinds of video content encountered in practice, including content whose characteristics change continuously and sharply, the algorithm must still make the encoded video meet the target bit rate and any HRD buffer constraint, while avoiding problem frames of extremely poor quality.

The R-Q model is a rate model that estimates or predicts the number of bits each frame will take when coded with a given quantizer $Q$. There is a substantial body of research on R-Q models, for example:

  • Linear model: $R = a \cdot X / Q$, where $X$ is the coding complexity of the frame.
  • Quadratic model: $R = a \cdot \frac{1}{Q} + b \cdot \frac{1}{Q^2}$
  • R-Q models fitted to the results of actual encodes at multiple QP points
  • ρ-domain model: $R = a \cdot (1 - \rho(Q)) + b$, where $\rho(Q)$ is the fraction of transform coefficients that are quantized to zero with quantizer $Q$ [4].
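Rate control usually inverts these models: given a frame's bit budget, it solves for $Q$. A minimal sketch for the linear and quadratic models follows, assuming $Q$ is the quantizer step size and the model parameters $a$, $b$ are already known. The helper `qstep_to_qp` uses the H.264/HEVC relation that the step size doubles every 6 QP and equals 1 at QP 4; all function names are hypothetical.

```python
import math


def q_linear(target_bits, a, complexity):
    """Solve the linear model R = a * X / Q for Q."""
    return a * complexity / target_bits


def q_quadratic(target_bits, a, b):
    """Solve the quadratic model R = a / Q + b / Q**2 for Q > 0.

    Multiplying by Q**2 gives R*Q**2 - a*Q - b = 0; with R > 0 and b >= 0 the
    '+' root is the positive one.
    """
    return (a + math.sqrt(a * a + 4 * target_bits * b)) / (2 * target_bits)


def qstep_to_qp(qstep):
    """Approximate H.264/HEVC QP for a quantizer step size (Qstep = 1 at QP 4, doubles every 6 QP)."""
    return 4 + 6 * math.log2(qstep)


q = q_quadratic(target_bits=10, a=10, b=200)
print(q, 10 / q + 200 / q**2)  # 5.0 10.0
```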

Of these, the linear model is simple and widely used. Models fitted to multiple actually coded QP points predict accurately but are computationally expensive. The ρ-domain model also predicts accurately, and its computational cost, which includes the transform and a quantization look-up table, is modest, so it is also relatively easy to use.
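The look-up-table idea behind the ρ-domain model can be sketched as follows: sort the coefficient magnitudes once, then each candidate step size costs one binary search. The sketch assumes a plain rounding quantizer, level = floor(|c| / qstep + 0.5), so a coefficient becomes zero when |c| < qstep / 2; real encoders use dead-zone quantizers with a different threshold. Parameters `a` and `b` are assumed to be fitted elsewhere.

```python
import bisect


def rho(coeffs, qstep):
    """Fraction of coefficients quantized to zero, assuming level = floor(|c| / qstep + 0.5)."""
    return sum(1 for c in coeffs if abs(c) < qstep / 2) / len(coeffs)


def rho_table(coeffs, qsteps):
    """rho for many step sizes: sort magnitudes once, then one binary search per step."""
    mags = sorted(abs(c) for c in coeffs)
    return {q: bisect.bisect_left(mags, q / 2) / len(mags) for q in qsteps}


def rate_rho_domain(coeffs, qstep, a, b):
    """The rho-domain model R = a * (1 - rho(Q)) + b."""
    return a * (1 - rho(coeffs, qstep)) + b


coeffs = [0, 1, -2, 3, -8, 20, 0.5, -0.2]
print(rho_table(coeffs, [4, 10]))  # {4: 0.5, 10: 0.75}
```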

How to make good use of data from previously coded frames, such as their pre-analysis complexity, actual QP, and actual bit count, is an important question in rate control design. As encoding proceeds frame by frame, a large number of frames may already have been coded before the current one, so how to store and use this data is a very practical problem. At the same time, to maintain good performance, the algorithm must adapt promptly when the characteristics of the video over a long period have clearly changed compared with the past. This is another practical issue that needs attention.
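One common way to address both storage and adaptation is exponential forgetting. The sketch below is an assumed approach, not one prescribed by the text: it fits the parameter $a$ of the linear model $R = a \cdot X / Q$ from two geometrically decayed sums, so memory stays constant however many frames have been coded, and a `decay` below 1 lets older frames fade out as the content changes. The class and parameter names are hypothetical.

```python
class DecayingLinearModel:
    """Hypothetical online estimate of `a` in R = a * X / Q that forgets old frames."""

    def __init__(self, decay=0.5, initial_a=1.0):
        self.decay = decay            # weight kept by older frames at each update
        self.initial_a = initial_a    # used before any frame has been coded
        self.weighted_rq = 0.0        # decayed sum of bits * Q
        self.weighted_x = 0.0         # decayed sum of complexity X

    def update(self, bits, qstep, complexity):
        """Add one coded frame: its actual bits, quantizer step, and pre-analysis complexity."""
        self.weighted_rq = self.decay * self.weighted_rq + bits * qstep
        self.weighted_x = self.decay * self.weighted_x + complexity

    @property
    def a(self):
        # If every frame obeyed R = a * X / Q exactly, this ratio would equal a.
        return self.weighted_rq / self.weighted_x if self.weighted_x > 0 else self.initial_a

    def predict_bits(self, qstep, complexity):
        return self.a * complexity / qstep


model = DecayingLinearModel(decay=0.5)
model.update(bits=1000, qstep=10, complexity=5000)  # a frame with a = 2
model.update(bits=2000, qstep=10, complexity=5000)  # content changes: a = 4
print(model.a)  # between 2 and 4, weighted toward the recent frame (25000 / 7500)
```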

4. CRF rate control algorithm

Common applications allow a certain amount of look-ahead delay. CRF rate control aims to improve coding efficiency as much as possible by fully exploiting information from a certain number of not-yet-coded frames. CRF rate control has been studied extensively. A globally or quasi-globally optimal solution works as follows: over the whole look-ahead video segment, for each GOP from an I-frame up to the frame before the next I-frame (assuming closed GOPs, i.e., no inter prediction across GOP boundaries), the QP of every CU in every frame is searched frame by frame and CU by CU so as to minimize the sum of the RD (rate-distortion) cost of the current CU and of all subsequent CUs that reference it directly or indirectly. This quasi-global RDO (RD optimization) of every CU QP over the whole GOP has to be repeated many times until the QPs of all CUs in all frames converge. Such a quasi-globally optimal CRF algorithm is therefore very computationally expensive and difficult to apply in practice, so many studies have proposed lower-complexity CRF schemes instead. A common, simple, and effective one is the CU-tree algorithm [2].

Unlike the quasi-global CRF algorithm, which starts from the current CU and examines, going forward, the RD cost of all CUs that reference it directly or indirectly, the CU-tree algorithm propagates coding complexity backward. A GOP is divided into miniGOPs: a miniGOP consists of a non-B frame in temporal layer 0 together with all bidirectionally predicted B-frames between it and the previous temporal-layer-0 non-B frame. Processing runs from the last miniGOP back to the first; within each miniGOP it starts from the non-reference B-frames in the highest temporal layer and ends at the non-B frame in temporal layer 0. For each frame, the total coding cost of every CU is computed CU by CU: it is the intra cost of the current CU from pre-analysis plus the sum of all propagation costs passed back, against the prediction direction, from the already processed higher-temporal-layer frames that reference this CU directly or indirectly. Finally, the QP offset of each CU is determined by the ratio of its total coding cost including propagation cost, i.e., intra cost plus propagation cost, to its original coding cost, which is its intra cost alone. The calculation proceeds as follows.

  • For each CU: $\mathrm{PropCostOut} = (\mathrm{PropCostIn} + \mathrm{IntraCost}) \cdot \mathrm{PropRatio}$, with $\mathrm{PropRatio} = (\mathrm{IntraCost} - \mathrm{InterCost}) / \mathrm{IntraCost}$. Here PropCostIn and PropCostOut are the incoming and outgoing propagation costs of the current CU, IntraCost is the pre-analysis coding cost of the CU's best intra prediction mode, and InterCost is the pre-analysis coding cost of its best inter prediction mode.
  • Each CU's PropCostOut is propagated back to the CUs it references through inter prediction, where it becomes part of their PropCostIn. The share received by each reference CU is the current CU's PropCostOut multiplied by the ratio of the area of the current CU's motion-compensated prediction that falls in that reference CU to the total CU area.
  • For each CU: $\Delta QP = -\mathrm{strength} \cdot \log_2\left(\frac{\mathrm{IntraCost} + \mathrm{PropCost}}{\mathrm{IntraCost}}\right)$, where PropCost is the CU's accumulated incoming propagation cost (PropCostIn).
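The three steps above can be sketched in Python. This is a simplified illustration under stated assumptions, not the implementation from [2]: frames are listed in coding order (each after the frames it references), so walking the list in reverse handles every referencing CU before the CUs it points to, which also matches the highest-temporal-layer-first order described above. Each CU lists `(frame index, CU index, area fraction)` for the CUs it predicts from; clamping InterCost to IntraCost and the default `strength` are assumptions, and all names are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CU:
    intra_cost: float
    inter_cost: float
    # (frame index, CU index, area fraction) for each CU this CU predicts from
    refs: list = field(default_factory=list)
    prop_cost_in: float = 0.0


def cutree(frames, strength=2.0):
    """Propagate costs backward through `frames` (lists of CU, coding order); return QP offsets."""
    for frame in reversed(frames):
        for cu in frame:
            # Assumption: an inter cost above the intra cost propagates nothing.
            inter = min(cu.inter_cost, cu.intra_cost)
            prop_ratio = (cu.intra_cost - inter) / cu.intra_cost
            prop_out = (cu.prop_cost_in + cu.intra_cost) * prop_ratio
            for f, c, area_fraction in cu.refs:
                frames[f][c].prop_cost_in += prop_out * area_fraction
    return [[-strength * math.log2((cu.intra_cost + cu.prop_cost_in) / cu.intra_cost)
             for cu in frame] for frame in frames]


# Frame 0 is intra-only; frame 1's single CU predicts entirely from frame 0's CU.
frames = [
    [CU(intra_cost=100, inter_cost=100)],
    [CU(intra_cost=100, inter_cost=20, refs=[(0, 0, 1.0)])],
]
print(cutree(frames))  # frame 0 gets a negative offset (lower QP); frame 1 gets zero
```

Because `cutree` accumulates into `prop_cost_in`, it expects freshly built CUs on each call. The referenced CU in frame 0 receives PropCost = 80, so its QP is lowered by $2 \log_2 1.8 \approx 1.7$: bits spent there are reused by the frame that predicts from it.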

The CU-tree algorithm thus essentially uses a CU's IntraCost to characterize the bits needed to code that CU, and uses PropCostIn to represent the influence of the current CU on the coded bits of all CUs in subsequent frames that reference it directly or indirectly. Because the coding complexity is computed by backward propagation, the computational complexity of CU-tree is low, which makes it convenient and practical.

5. Summary

Rate control is a comprehensive module of a practical video encoder. Its design must take into account practical application requirements and problems, standard conformance, product quality, and interaction with the encoder's other technical modules. A good rate control algorithm not only improves overall coding efficiency and quality but is also closely tied to how stable the encoded video quality is across the wide range of real video characteristics. Beyond developing and debugging an initial baseline version, a rate control algorithm used in practice is typically refined continuously, with the details of its design improved in response to new test findings and user feedback.

References

[1] slhck.info/video/2017/…

[2] J. Garrett-Glaser, “A Novel Macroblock-tree Algorithm for Dependent Video Coding in H.264/AVC,” 2011.

[3] ITU-T, “Advanced video coding for generic audiovisual services,” Recommendation H.264, Mar. 2010.

[4] Z. He, Y. K. Kim, S. K. Mitra, “Low-delay rate control for DCT video coding via ρ-domain source modeling,” IEEE Trans. CSVT, vol. 11, no. 8, pp. 928-940, Aug. 2001.