
In 5G remote control scenarios, real-time audio and video transmission must meet very demanding requirements for latency, stall rate, and resilience to weak networks. This article describes how to exploit the characteristics of 5G networks to jointly optimize the real-time audio and video communication link, so as to meet the remote control needs of industrial scenarios and reduce picture latency.

In the previous article, we covered the technical essentials of remote control. Starting with this article, the author will introduce the application and optimization of the three technologies of remote control. We begin with real-time audio and video communication, which transmits the picture and sound of the environment around the controlled equipment or vehicle to the remote control end in real time, so that the remote driver or operator can clearly see the surrounding conditions and control the equipment accordingly. For example, the front, side, and rear views of a moving vehicle, and the view of the grab arm while an excavator is working, all need to be transmitted remotely through real-time audio and video technology.

To keep control real-time and smooth, remote control places much higher demands on picture transmission than on sound transmission, especially for core indicators such as picture latency, stall rate, and resilience to weak networks. Taking low-speed remote driving as an example, the latency should be below 200 ms and as close to 100 ms as possible, the stall rate should be below 2‰, and the link should withstand delay fluctuation comparable to the average RTT and, in extreme cases, a packet loss rate of 20%-30%. These requirements are often significantly stricter than those of earlier scenarios such as remote conferencing, live streaming, and surveillance. For real-time audio and video technology, however, reducing latency, reducing the stall rate, and improving weak-network resilience often conflict with one another, so this is a very big challenge.

Remote control compared with other application scenarios

Follow the map to explore the key points of optimization

The following figure shows a typical video transmission link, which consists mainly of capture, encoding, sending, transmission, receiving, decoding, and rendering modules.

Typical video transmission link diagram

Capture: The original image frame data is captured from the camera

Encoding: The original image frames are encoded

Sending: The encoded video frames are packetized and sent

Transmission: The packetized data is carried over the network

Receiving: The packetized data is received and the video frames are reassembled

Decoding: The video frames are decoded to recover the original image frame data

Rendering: The original image frame data is rendered to the screen

In real-time audio and video communication, the jitter buffer in the receiving module is mainly responsible for absorbing network fluctuation and reducing the stall rate, and it is also one of the main contributors to latency. Jitter buffer implementations differ slightly between projects, but they generally include packet reordering, frame completeness detection, frame caching, and similar functions. The jitter buffer is responsible for correctly receiving and appropriately caching video frames. Once a frame is confirmed to be decodable, the jitter buffer smooths its release according to the estimated inter-frame delay (the difference between the receive-time interval and the send-time interval of two frames) and then hands it to the subsequent decoding and rendering modules. In this way, even if the network fluctuates somewhat, the smoothing lets adjacent video frames be rendered at close to the intended interval, so playback stays smooth. In general, to cope with packet loss, reordering, and delay jitter, the larger the network RTT and delay jitter are, the larger the jitter buffer must be, and the larger the buffer, the larger the video latency. This is the root of the conflict among the three indicators.
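
To make the inter-frame delay concrete, here is a minimal Python sketch. It computes the inter-frame delay as defined above, smooths its magnitude with the running-average form used for interarrival jitter in RTP (RFC 3550, gain 1/16), and derives a buffer target from it. The function names and the `safety_factor` multiplier are assumptions made for illustration and are not taken from any particular jitter buffer implementation.

```python
def inter_frame_delays(send_ms, recv_ms):
    """d[i] = (recv[i] - recv[i-1]) - (send[i] - send[i-1]), in milliseconds."""
    return [
        (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        for i in range(1, len(send_ms))
    ]


def smoothed_jitter(delays, gain=1 / 16):
    """Running average of |d|, in the style of the RFC 3550 jitter estimate."""
    jitter = 0.0
    for d in delays:
        jitter += (abs(d) - jitter) * gain
    return jitter


def target_buffer_ms(jitter_ms, safety_factor=3.0):
    """Hypothetical sizing rule: hold a few multiples of the jitter estimate."""
    return safety_factor * jitter_ms


send = [0, 33, 66, 99, 132]        # frames sent every 33 ms
steady = [50, 83, 116, 149, 182]   # constant network delay
bursty = [50, 95, 118, 170, 183]   # fluctuating network delay

for name, recv in (("steady", steady), ("bursty", bursty)):
    delays = inter_frame_delays(send, recv)
    print(name, delays, round(target_buffer_ms(smoothed_jitter(delays)), 2))
```

On the steady trace every inter-frame delay is zero, so the target stays at zero; on the bursty trace the target grows. This is the trade-off described above: more jitter calls for a larger buffer and therefore more latency.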

Now let us look at the modules other than the receiving module. As chip computing power has grown, the latency of encoding, decoding, and rendering has become very small, generally within 10 ms or even around 5 ms. There is little room left for optimization there, and these modules have little impact on the three core indicators. The latency of the capture and transmission modules is determined mainly by external conditions: the former depends on the camera, the latter on the network. The sending module affects packet loss, delay, and jitter during transmission, which in turn affects the result at the receiver. Therefore, achieving the three core indicators requires jointly optimizing the sending and receiving modules. By optimizing the sending module, the jitter buffer at the receiver can be made as small as possible while still guaranteeing the stall rate and weak-network resilience, thereby reducing latency.

Aim at the target and design the optimization scheme

The joint optimization of the sending and receiving modules is implemented differently across projects, and complexity and results vary greatly. The following diagram shows a relatively complex implementation of the sending and receiving modules in a real-time audio and video communication architecture. Tencent's remote control product uses this kind of structure for its real-time audio and video communication.

Schematic diagram of sending and receiving modules

The sending module consists mainly of the packetization protocol, congestion control, the sending window, error correction coding, and so on. To improve transmission efficiency and weak-network resilience, packetization is usually based on the standard RTP protocol, with UDP underneath. Congestion control mainly estimates the network state and recommends the pacing window and bit rate for sending. Error correction coding is mainly used to counter the loss of RTP packets by providing forward error correction, so that some lost packets can be recovered by decoding without relying on retransmission.

In addition to the reordering buffer, frame detection buffer, and frame buffer that make up the jitter buffer, the receiving module contains depacketization, error correction decoding, and link state estimation and feedback. Link state estimation and feedback estimates the packet loss, delay, and jitter of the link; the results guide the sizing of the jitter buffer and serve as input to congestion control at the sender.

As mentioned above, the goal of optimization is to reduce the size of the jitter buffer, and fluctuation in inter-frame delay is the core factor that determines that size. Besides network fluctuation, packet loss and retransmission are the main contributors to peaks in delay fluctuation. Therefore, the first consideration in joint send-receive optimization is to reduce packet loss and retransmission. For 5G remote control scenarios, Tencent has focused its optimization on congestion control and error correction coding, reducing the probability of packet loss and retransmission.

Congestion control: Among the congestion control methods commonly used for real-time audio and video today, BBR and GCC are among the better ones.

BBR is based mainly on the bandwidth-delay product of the network. It probes the maximum bandwidth and the minimum delay of the network separately and treats their product as the amount of data the network can carry. Its advantage is that it tolerates random noise in network delay and fluctuating packet loss. Its disadvantages are that measuring the minimum delay reduces throughput, and that when the network deteriorates suddenly it takes a relatively long time to come down to the actual bandwidth. Moreover, BBR was not originally designed for video transmission and has a limited track record in real-time audio and video applications.
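
The bandwidth-delay product can be illustrated with a few lines of Python. This is only a sketch of the core quantity, not of BBR itself: real BBR keeps windowed max and min filters and cycles through probing states, all of which is omitted here. The function name and the sample values are made up for illustration.

```python
def bdp_bytes(delivery_rate_samples_bps, rtt_samples_ms):
    """Bandwidth-delay product from recent samples.

    Bottleneck bandwidth is taken as the maximum observed delivery rate and
    propagation delay as the minimum observed RTT, as described above.
    """
    btl_bw_bps = max(delivery_rate_samples_bps)
    rt_prop_ms = min(rtt_samples_ms)
    # bits/s * ms / 1000 gives bits; dividing by 8 gives bytes
    return btl_bw_bps * rt_prop_ms / 8000


# Illustrative samples: rates in bit/s, RTTs in milliseconds
rates = [18_000_000, 20_000_000, 19_000_000]
rtts = [32, 25, 40]
print(bdp_bytes(rates, rtts))  # 20 Mbit/s * 25 ms = 62500.0 bytes
```

Because the estimate relies on the maximum rate and minimum RTT seen over a window, a sudden drop in capacity is only reflected once the old samples age out, which is one way to read the slow convergence mentioned above.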

GCC runs delay-based congestion control and loss-based congestion control at the same time and takes the smaller of the two resulting rates. In the delay-based controller, GCC uses a Kalman filter to smooth out the effect of network fluctuation noise on the delay gradient estimate. The advantage of GCC is that it accounts for delay and packet loss together and is well proven in practice.
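
The "take the minimum" rule can be sketched as follows. The loss thresholds (2% and 10%) and the update factors follow the loss-based controller as described in the GCC Internet-Draft, to the best of the editor's recollection; treat them as assumptions. The delay-based estimate is passed in as a plain number, since the Kalman-filter part is not reproduced here.

```python
def loss_based_rate(prev_rate_bps, loss_fraction):
    """Loss-based controller sketch (thresholds assumed from the GCC draft)."""
    if loss_fraction > 0.10:
        # Heavy loss: back off in proportion to the loss fraction
        return prev_rate_bps * (1.0 - 0.5 * loss_fraction)
    if loss_fraction < 0.02:
        # Negligible loss: probe upward by 5%
        return prev_rate_bps * 1.05
    # Moderate loss: hold the current rate
    return prev_rate_bps


def gcc_target_rate(delay_based_bps, prev_loss_rate_bps, loss_fraction):
    """Final target is the smaller of the delay-based and loss-based rates."""
    return min(delay_based_bps, loss_based_rate(prev_loss_rate_bps, loss_fraction))


# Delay-based estimate is the tighter constraint here
print(gcc_target_rate(4_000_000, 5_000_000, 0.20))
# Loss-based estimate is the tighter constraint here
print(gcc_target_rate(6_000_000, 5_000_000, 0.20))
```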

Error correction coding: In network transmission, packet loss can be modeled as an erasure channel, in which packets are randomly erased in transit. Forward error correction (FEC) codes suited to erasure channels can therefore be used to recover lost packets by adding redundant packets during transmission. Balancing error correction performance against computational complexity, audio and video transmission mainly uses linear block codes, most commonly XOR codes and RS (Reed-Solomon) codes. Because FEC is designed mainly for random errors, it can withstand random packet loss to some extent with a short code length (number of packets in a coded block). With a short code length, however, it still cannot withstand burst loss caused by congestion or deteriorating network quality. The traditional remedy is to increase the time interval between packets and increase the code length.
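
As a small illustration of an XOR code on an erasure channel, the sketch below protects a group of packets with one parity packet and rebuilds a single lost packet. It assumes all packets in the group have the same length (real RTP FEC schemes also protect length and header fields), and the function names are chosen for illustration only.

```python
def xor_parity(packets):
    """Build one parity packet as the bytewise XOR of equal-length packets."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild the single missing packet (marked None) in a protected group.

    Returns None when zero or more than one packet is missing, because a
    single XOR parity packet can repair exactly one erasure.
    """
    missing = [i for i, packet in enumerate(received) if packet is None]
    if len(missing) != 1:
        return None
    rebuilt = bytearray(parity)
    for packet in received:
        if packet is not None:
            for i, byte in enumerate(packet):
                rebuilt[i] ^= byte
    return bytes(rebuilt)


group = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(group)

print(recover_missing([b"abcd", None, b"ijkl"], parity))  # b'efgh'
print(recover_missing([None, None, b"ijkl"], parity))     # None
```

The second call shows the limitation discussed above: a burst that removes two packets from the same short group cannot be repaired.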

Optimization and enhancement based on the 5G air interface

In 5G remote control scenarios, the delay and fluctuation of the 5G air interface account for a large share of the network delay, and the network model of the 5G air interface differs in several ways from that of a traditional router. A traditional router drops packets under congestion and does not retransmit; the 5G air interface has both bit-error loss and congestion loss, and performs a certain amount of retransmission itself. Delay increases on a traditional router are caused mainly by congestion; the 5G air interface additionally shows delay fluctuation due to resource scheduling cycles, especially for uplink transmission. The bandwidth of the 5G air interface depends on the signal-to-noise ratio and the air-interface load and changes over time; the bandwidth of a traditional router is fixed, and what is available is affected mainly by network load.

Comparison of features between a router and the 5G air interface

Congestion control optimization: The 5G air interface clearly differs greatly from a traditional router, and BBR has limited applicability when faced with delay jitter caused by resource scheduling cycles and bandwidth fluctuation caused by signal quality. Since signal quality on the 5G air interface can change the network bandwidth greatly, congestion control based on estimates of the air-interface signal-to-interference-plus-noise ratio (SINR) and the network load can be added on top of GCC's delay-based and loss-based control, giving a faster response to changes in the 5G air interface. At the same time, the Kalman filter that GCC uses for delay gradient estimation can be modified to better smooth out the jitter caused by resource scheduling cycles.
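
One way to picture a radio-aware term is as a third rate estimate that is also fed into the minimum. The sketch below is purely illustrative and is not the scheme described in this article: it uses the Shannon capacity formula as a rough upper bound, scaled by the unused share of the cell and by an assumed efficiency factor. The function names, the `efficiency` parameter, and all sample values are hypothetical.

```python
import math


def shannon_capacity_bps(bandwidth_hz, sinr_db):
    """Shannon bound C = B * log2(1 + SINR), with SINR given in dB."""
    sinr_linear = 10 ** (sinr_db / 10)
    return bandwidth_hz * math.log2(1 + sinr_linear)


def radio_aware_rate(gcc_rate_bps, bandwidth_hz, sinr_db, cell_load, efficiency=0.5):
    """Cap the GCC target with a hypothetical SINR- and load-based estimate.

    cell_load is the fraction of air-interface resources already in use
    (0.0 to 1.0); efficiency is an assumed derating of the Shannon bound.
    """
    radio_cap = shannon_capacity_bps(bandwidth_hz, sinr_db) * (1 - cell_load) * efficiency
    return min(gcc_rate_bps, radio_cap)


# Good signal: the GCC estimate remains the limiting factor
print(radio_aware_rate(8_000_000, 20e6, 20.0, cell_load=0.5))
# Deep fade: the radio-based cap takes over before delay or loss is observed
print(radio_aware_rate(8_000_000, 20e6, -5.0, cell_load=0.5))
```

The point of such a term is timing: SINR and load are available before their effects show up as delay gradient or packet loss, so the sender can reduce its rate earlier.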

Error correction coding optimization: Because the 5G air interface performs its own retransmission, its probability of random packet loss is low, so a shorter code length is enough to counter random loss. Burst loss on the 5G air interface is often caused by a sudden drop in signal quality. The period of such deep fades is usually related to mobility: the faster the movement, the shorter the period, and at low speed it is about 10 ms. The traditional approach of simply lengthening the packet interval and increasing the code length cannot handle this effectively, and it increases the amount of data sent, which makes packet loss worse. Combined with the congestion control estimate based on air-interface SINR, burst loss can be predicted promptly, and its probability can be reduced by lowering the bit rate and stretching the sending time, without increasing the code length. In addition, packet interleaving can be introduced across coded blocks to provide some resistance to burst loss.
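
The effect of interleaving can be shown with a simple block interleaver. In this sketch, packets are tagged as hypothetical `(group, index)` pairs, where each group is one short FEC block; sending them column by column means that a burst no longer than the number of groups removes at most one packet per group. The function names and the burst position are illustrative assumptions.

```python
def interleave(groups):
    """Send order that takes one packet from each FEC group in turn."""
    group_len = len(groups[0])
    return [group[i] for i in range(group_len) for group in groups]


def losses_per_group(send_order, burst_start, burst_len):
    """Count how many packets each group loses to one contiguous burst."""
    counts = {}
    for group_id, _ in send_order[burst_start:burst_start + burst_len]:
        counts[group_id] = counts.get(group_id, 0) + 1
    return counts


# Three FEC groups of three packets each
groups = [[(g, i) for i in range(3)] for g in range(3)]

sequential = [packet for group in groups for packet in group]
interleaved = interleave(groups)

# The same 3-packet burst hits both send orders at position 2
print(losses_per_group(sequential, 2, 3))   # {0: 1, 1: 2}
print(losses_per_group(interleaved, 2, 3))  # {2: 1, 0: 1, 1: 1}
```

With one XOR parity packet per group, the sequential order leaves group 1 unrecoverable, while the interleaved order keeps every group within the single-erasure limit. The cost is extra delay, since a group is complete only after packets from the other groups have also arrived.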

Overall, 5G remote control places very high demands on audio and video latency. By exploiting the characteristics of 5G and jointly optimizing sending and receiving, the needs of some low-speed industrial remote control scenarios can be met, but reaching the industry's ideal target of 100 ms remains a challenge, especially for cross-region remote control. In the future, more joint optimization methods that integrate with the network will need to be introduced. In addition, further gains in camera capture and encoding can be explored to improve the end-to-end result as far as possible.

Previous article by Mao Junling: Freeing people from the distance to their equipment: how remote control is achieved in the 5G era
