VoIP

IP-based audio and video transmission (VoIP) is a real-time calling technology that uses the Internet Protocol to carry audio and video calls as well as multimedia conferences. VoIP can carry signaling, audio and video calls, short messages, and control information across Internet access devices such as VoIP phones, smartphones, and personal computers.

Background

Once mobile phones or monitoring devices are connected to the network, some loss of audio and video packets in transit is inevitable, because the Internet is heterogeneous and the transmission efficiency of the various media degrades; this directly affects what users see, hear, and subjectively experience. TCP uses ACK feedback to confirm that packets arrived. Over UDP, NACK is used to detect and report packet loss, and RTCP RR and SR reports are used to collect RTT data. Then WebRTC burst onto the scene with its own JitterBuffer and NetEQ implementations, which give audio and video transmission over UDP a solid safeguard.
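To make the RR/SR point concrete, here is a minimal sketch of how a sender can derive RTT from a receiver report, following the formula in RFC 3550 section 6.4.1 (RTT = A - LSR - DLSR). The function name and arguments are my own for illustration; this is not WebRTC's code. All three values are in the 32-bit "middle NTP" format, that is, units of 1/65536 second.

```python
def rtt_from_receiver_report(arrival_ntp32: int, lsr: int, dlsr: int) -> float:
    """Return the round-trip time in seconds from one RTCP report block.

    arrival_ntp32: time the RR arrived at the sender (middle 32 bits of NTP time)
    lsr:           "last SR" field echoed back by the receiver
    dlsr:          "delay since last SR" field, measured by the receiver
    """
    if lsr == 0:
        # RFC 3550: LSR is zero when the receiver has not seen an SR yet.
        raise ValueError("no SR received yet, RTT is undefined")
    # Masking to 32 bits keeps the subtraction correct across counter wraparound.
    rtt_units = (arrival_ntp32 - lsr - dlsr) & 0xFFFFFFFF
    return rtt_units / 65536.0


# Worked example using the numbers from RFC 3550, figure 2.
print(rtt_from_receiver_report(0xB7108000, 0xB7052000, 0x00054000))  # 6.125
```

In practice the result is usually smoothed over several reports before it drives any decision, because a single sample can be noisy.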

At Agora's (声网) RTE conference in 2020, I had the honor of joining the online sessions and learned a great deal. I was especially impressed by the optimization items covered in the real-time audio and video transmission track and by Agora's optimization results. What follows is an introduction to the slides I watched at the time, together with the targeted optimizations we made after studying real-time audio and video.

Data-driven

Zhang Xingong of the Wangxuan Institute of Computer Technology at Peking University presented the session on data-driven real-time video transmission technology. With the Internet flourishing, real-time video is now everywhere: video conferencing, live streaming, VR/AR, 360° panoramic video, audio and video surveillance, audio and video calls, and more.

However, real-time video transmission faces many challenges: limited network capacity, transmission delay between different networks, jitter, and heavy packet loss during network handover or on 4G. These lower video quality and readily cause stalls, mosaic artifacts, black screens, and green screens, all of which directly hurt the user experience. TCP solves part of the problem, but it reacts slowly to network changes and adds delay, which makes it a poor fit for real-time transmission. WebRTC solves most of these problems; its congestion control includes controllers based on packet loss and on delay, which greatly alleviate them. Introducing reinforcement learning improves matters further. The BBR model then offers a good approach to transmission, aiming at low latency and high bandwidth. However, it still relies on an RTT-based model and has no fairness criterion, so its adaptability is limited. BBR also depends on probing, so its view of the network lags behind the actual state.
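As an illustration of what a loss-based controller does, here is a minimal sketch of the update rule described in the IETF draft "A Google Congestion Control Algorithm for Real-Time Communication" (draft-ietf-rmcat-gcc). The 2% and 10% thresholds and the 1.05 factor are taken from that draft; the function name is my own, and the code shipped in any given WebRTC version may use different constants.

```python
def loss_based_estimate(prev_bps: float, loss_fraction: float) -> float:
    """Update the sender-side bandwidth estimate from the reported loss fraction.

    prev_bps:      previous estimate in bits per second
    loss_fraction: fraction of packets lost since the last report, 0.0 to 1.0
    """
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss_fraction must be between 0 and 1")
    if loss_fraction > 0.10:
        # Heavy loss: back off in proportion to the loss.
        return prev_bps * (1.0 - 0.5 * loss_fraction)
    if loss_fraction < 0.02:
        # Almost no loss: probe upward by 5%.
        return prev_bps * 1.05
    # Between 2% and 10%: hold the current estimate.
    return prev_bps


print(round(loss_based_estimate(1_000_000, 0.20)))  # 900000
print(round(loss_based_estimate(1_000_000, 0.05)))  # 1000000
```

In the draft, the final sending rate is the minimum of this loss-based estimate and the delay-based estimate, which is why the two controllers are described together.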

Reference model

The congestion control (CC) scheme from Professor Zhang’s team, which combines a mathematical model with a statistical model, offers a good idea: a mathematical model whose objective function is fairness is combined with a model-free statistical model of the network state, as shown in the figure below:

The main goal of this model is to solve two unknowns and one lag: the user is unknowable, the network state is unknowable, and network state feedback lags.

Optimization and improvement

Drawing on this experience, we optimized our products, mainly in the following areas:

The first step was to improve the test environment. Most of our products use wired connections, some of them fiber, so the network environment is relatively stable. We therefore added the Traffic Control (TC) command to our Android and Linux products to simulate network conditions at the sending end. TC supports packet loss, jitter, delay, bandwidth limits, and other impairments, which brings the lab setup as close as possible to a real network and makes laboratory simulation and testing more accurate. To give weak-network optimization a more complete and convenient test tool, we also built a test app on top of the TC command; it lets the parameters be set freely, so testers without development experience can use it as they please.
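The kind of TC invocation a test app like this wraps can be sketched as follows. This is a hypothetical helper, not our actual app: it only builds a `tc qdisc ... netem` command line for the sending interface, and the interface name `eth0` and the parameter names are assumptions. Running the resulting command requires root, and `tc` with netem support must be present on the device.

```python
import shlex
from typing import List, Optional


def netem_command(dev: str, loss_pct: float = 0.0, delay_ms: int = 0,
                  jitter_ms: int = 0, rate_kbit: Optional[int] = None) -> List[str]:
    """Build a tc/netem command that impairs outgoing traffic on `dev`."""
    if jitter_ms > 0 and delay_ms <= 0:
        # netem expresses jitter as a variation around a base delay.
        raise ValueError("jitter_ms needs a base delay_ms")
    cmd = ["tc", "qdisc", "replace", "dev", dev, "root", "netem"]
    if delay_ms > 0:
        cmd += ["delay", f"{delay_ms}ms"]
        if jitter_ms > 0:
            cmd.append(f"{jitter_ms}ms")
    if loss_pct > 0:
        cmd += ["loss", f"{loss_pct:g}%"]
    if rate_kbit is not None:
        cmd += ["rate", f"{rate_kbit}kbit"]
    return cmd


# 20% loss with 100 ms delay and 20 ms jitter on the sending interface.
print(shlex.join(netem_command("eth0", loss_pct=20, delay_ms=100, jitter_ms=20)))
# tc qdisc replace dev eth0 root netem delay 100ms 20ms loss 20%
```

To clear the impairment afterwards, `tc qdisc del dev eth0 root` removes the qdisc again.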

The second step was to improve our weak-network countermeasures. Our company uses an older version of WebRTC that can no longer match the latest release. For the sake of stability, however, we can only optimize individual functions step by step and release them after stress testing, which makes maintenance harder for the software engineers. We studied the latest BBR model and the data-driven network model proposed by Professor Zhang, improved the accuracy of network detection, enabled FEC and NACK to work at the same time, changed the decision conditions for some procedures in the JitterBuffer (JTB), and improved processing efficiency. Previously, in VGA mode the picture froze once TC was set to a 20% packet loss rate; now 720P video still plays smoothly with TC set to 30%. After more than a month of stress testing by the quality department, the algorithm went live, and the clear improvement has been well received by users. In a head-to-head comparison with competitors, our company won the customer's business thanks to better video quality on weak networks and signed a long-term memorandum.
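The idea of FEC and NACK working together can be sketched with a toy single-parity XOR scheme: if exactly one packet in a protected group is lost, it is rebuilt locally from the parity packet; otherwise the missing packets are requested again via NACK. This is a simplified illustration under my own assumptions, not our production code and not WebRTC's ULPFEC/FlexFEC format. In particular, the packet lengths are passed in explicitly here, whereas real FEC schemes protect the length as well.

```python
from typing import List, Optional, Tuple


def xor_parity(packets: List[bytes]) -> bytes:
    """XOR all packets together, padding shorter ones with zero bytes."""
    parity = bytearray(max(len(p) for p in packets))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def repair_or_nack(received: List[Optional[bytes]], parity: Optional[bytes],
                   lengths: List[int]) -> Tuple[List[Optional[bytes]], List[int]]:
    """Return (packets after FEC repair, indices that still need a NACK).

    `received` holds one entry per media packet in the group, None if lost.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    repaired = list(received)
    if len(missing) == 1 and parity is not None:
        idx = missing[0]
        present = [p for p in received if p is not None]
        # XOR of the parity with every surviving packet leaves the lost one.
        repaired[idx] = xor_parity(present + [parity])[:lengths[idx]]
        return repaired, []
    # Zero losses, several losses, or a lost parity packet: FEC cannot help.
    return repaired, missing


group = [b"alpha", b"be", b"gamma!"]
parity = xor_parity(group)
print(repair_or_nack([b"alpha", None, b"gamma!"], parity, [5, 2, 6]))
# ([b'alpha', b'be', b'gamma!'], [])
print(repair_or_nack([None, None, b"gamma!"], parity, [5, 2, 6])[1])  # [0, 1]
```

The trade-off is the usual one: FEC costs extra bandwidth but repairs a loss without waiting a round trip, while NACK costs nothing until a loss happens but adds at least one RTT of delay.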

The third step was to tune the H264 codec parameters. Different H264 encoding parameters have a large effect on the encoded bit rate, so we adjusted and optimized several of them: the choice between CABAC and CAVLC (the interface from our previous vendor exposed it, but the original designers and development leads never used it); the rate control parameters, which we studied and changed; the IDR and intra-refresh parameters, which we introduced and tuned; and long and short reference frame adaptation (LRT and SRT), which still has to be integrated with the vendor. Appropriate fine-tuning of codec parameters keeps the encoded bit rate near its optimum without affecting video quality or users' subjective impression. That greatly relieves the pressure on a poor network, saves as much bit rate as possible at the source, improves encoding quality, and protects the user experience.

Problems and Objectives

These three parts are the optimizations we have been working on recently. They handle packet loss and delay well but cannot cope with very severe jitter. Because the WiFi module in our products has limited capability (a cost decision), WiFi data transmission suffers heavy jitter and a certain packet loss rate. This hardware limitation directly undermines the weak-network countermeasures we developed. Besides switching to a more stable and reliable WiFi module, high jitter is the next challenge our weak-network team will face.
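To quantify how bad the jitter is, one common starting point is the interarrival jitter estimator from RFC 3550 section 6.4.1, where J = J + (|D| - J) / 16 and D is the change in transit time between consecutive packets. Below is a minimal sketch; the class name is my own, both inputs must be in the same units (RTP timestamp units here), and timestamp wraparound is ignored for brevity.

```python
from typing import Optional


class JitterEstimator:
    """Running interarrival jitter estimate as defined in RFC 3550."""

    def __init__(self) -> None:
        self.jitter = 0.0
        self._prev_transit: Optional[int] = None

    def update(self, arrival: int, rtp_timestamp: int) -> float:
        """Feed one packet and return the updated jitter estimate.

        arrival:       local arrival time, converted to RTP timestamp units
        rtp_timestamp: timestamp carried in the packet's RTP header
        """
        transit = arrival - rtp_timestamp
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)
            # The 1/16 gain smooths out noise, as specified by the RFC.
            self.jitter += (d - self.jitter) / 16.0
        self._prev_transit = transit
        return self.jitter


est = JitterEstimator()
est.update(0, 0)
print(est.update(3160, 3000))  # 10.0   (transit time jumped by 160 units)
print(est.update(6160, 6000))  # 9.375  (transit time unchanged, estimate decays)
```

Logging this value per stream on the WiFi devices would give the weak-network team a concrete number to compare before and after any module or algorithm change.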

In the next development cycle, we will keep studying the material carefully to understand the data-driven model proposed by Professor Zhang’s team in depth. We will try to combine it with our own device environment and usage scenarios to build a self-developed data model for network condition detection, congestion control, and network-sensitive feedback, to further improve the reliability and quality of video transmission over WiFi and 4G, and to give strong support to the rollout of the company’s products.

Conclusion

The way ahead is so long without ending, yet high and low I’ll search with my will unbending. Countering weak networks in real-time audio and video transmission is a long-term effort. We will be bold in trying new things and learning from others. Our team's goal is not to guarantee industry leadership but to deliver excellent real-time video call quality; after all, tomorrow’s achievements come from refusing to compromise today.

That is all I have to share. You are welcome to discuss it with me at any time, and if you are interested, please like, favorite, and share.