This article is shared by the Taobao Live audio and video algorithm team. The original title is "5G Era: Taobao Live's Exploration of High-Quality, Low-Latency Technology"; it has been edited.
1. Introduction
At present, the rollout of 5G is steadily advancing. Compared with the widely deployed 4G, it offers higher speed, larger capacity, lower latency, and higher reliability.
In the 5G era, thanks to increased network bandwidth, video will become the mainstream medium. More and more businesses and applications will move to video and live streaming, and large amounts of interactive content will be delivered over 5G as low-latency video. 5G will place ever higher demands on video resolution and clarity.
As a short-video and live-streaming platform with hundreds of millions of users, Taobao has diverse services, broadcasters and viewers spread across many regions, and complex device and network conditions, all of which pose great challenges for the storage and distribution of multimedia content.
This article, shared by the Taobao Live audio and video algorithm team, is an in-depth summary of how our high-definition, low-latency real-time live video technology is implemented. We hope you find it useful.
2. Introductory articles
If you don't know much about live video technology, read the following introductory articles first:
Detailed Explanation of Real-time Audio and Video Live Broadcasting Technology on Mobile (1): The Beginning
Detailed Explanation of Real-time Audio and Video Live Broadcasting Technology on Mobile (2): Collection
Detailed Explanation of Real-time Audio and Video Live Broadcasting Technology on Mobile (3): Processing
Detailed Explanation of Real-time Audio and Video Live Broadcasting Technology on Mobile (4): Coding and Encapsulation
Detailed Explanation of Real-time Audio and Video Live Broadcasting Technology on Mobile (5): Push Streaming and Transmission
Detailed Explanation of Real-time Audio and Video Live Broadcasting Technology on Mobile (6): Delay Optimization
3. Overview
Facing the high demands of real-time live video, the main problems are:
- 1) quality and cost must be controlled during content production;
- 2) user experience must be ensured throughout content distribution and consumption.
To solve these problems, we have two optimization goals:
- 1) reduce the bit rate while keeping picture quality constant;
- 2) improve the picture quality while keeping the bit rate constant.
In terms of bit-rate reduction, we effectively cut video bandwidth by the following means:
- 1) a self-developed high-efficiency encoder;
- 2) an upgraded playback architecture;
- 3) intelligent ROI;
- 4) scene-aware encoding;
- 5) intelligent rate-control tools.
Among these technologies:
- 1) The efficient encoder significantly reduces bit rate without changing quality;
- 2) Scene encoding configures appropriate encoding parameters for different kinds of screen content;
- 3) ROI detection picks out the regions the human eye pays most attention to and has the encoder encode them with priority;
- 4) Intelligent rate control exploits subjective characteristics of human vision to eliminate bits wasted beyond the threshold of human perception.
In terms of picture quality, we use the following algorithms to improve the perceived quality of the produced content:
- 1) pre-processing enhancement;
- 2) denoising;
- 3) high dynamic range, and so on.
**In terms of experience optimization:** low-delay encoding technology reduces encoding latency with very little bit-rate loss, improving the experience for both viewers and broadcasters.
To improve the efficiency of problem discovery and troubleshooting, we built a full-link monitoring system for Taobao Live with capabilities for data collection and storage, exception-event collection, intelligent alerting, alert-data operations, an encoding diagnosis platform, automatic fault handling, and change linkage. It tackles existing and potential problems across the whole Taobao Live link from three aspects, audio, video, and network, and supports optimization of the entire high-quality, low-delay system.
At the same time, we established objective and subjective quality evaluation systems, using metrics such as VMAF, PSNR, and SSIM for objective quality. For the many scenarios where no reference video is available, we also built a CNN-based no-reference evaluation model to keep quality assessment accurate in those cases. These evaluation methods let us guarantee "unchanged picture quality" and monitor online video quality.
The following sections share these key technical practices in depth.
4. Narrowband HD practice
4.1 Self-developed S265 encoder
Bandwidth is one of the heaviest infrastructure costs of a video service. Reducing that cost while guaranteeing video quality is a crucial link in the whole pipeline.
The raw data rate of digital video captured by a camera is usually very high: 720p at 25fps, for example, is about 263.67 Mbps, which is difficult to store and transmit.
Fortunately, video contains very high correlation both within a frame and between frames; by using video compression to remove this correlation, the bandwidth can be reduced by a factor of 100 to 400. (For an accessible introduction to video coding, see "Zero Foundation, The Most Popular Video Coding Technology in History".)
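To make the raw-bandwidth figure concrete, here is a quick sanity check (a sketch assuming 8-bit YUV 4:2:0 sampling; the 263.67 figure corresponds to binary megabits):

```python
# Raw bandwidth of uncompressed 720p 25fps video, 8-bit YUV 4:2:0.
width, height, fps = 1280, 720, 25
bits_per_pixel = 12            # 8 bits luma + 4 bits chroma in 4:2:0
raw_bps = width * height * bits_per_pixel * fps
print(raw_bps / 1e6)           # ~276.48 Mbps (decimal megabits)
print(raw_bps / 2**20)         # ~263.67 Mbps (binary megabits, as above)
# A 100-400x compression factor brings this down to roughly 0.7-2.8 Mbps.
```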
Video compression standards mainly come from the MPEG series developed by ISO (the International Organization for Standardization) and the H.26x series led by the ITU (International Telecommunication Union). Roughly every 10 years, a new generation of standards doubles the compression ratio.
As a newer-generation standard than H.264, H.265 provides a more flexible coding structure and partitioning modes, with extensive improvements in motion compensation, motion vector prediction, intra prediction, transforms, deblocking filtering, entropy coding, and more. Thanks to these new coding tools and features, it can save about half the bit rate of H.264 at the same picture quality. To save bit rate without sacrificing quality, H.265 became our preferred coding standard.
Ali265 is Taobao's self-developed high-performance H.265 encoder. It achieves a BD-rate gain of about 20% over the open-source x265 and more than 40% over x264. It is already used in production in Taobao Live, Youku video, Arirang conference, VMate, UC cloud disk, and other services.
The Taobao Live technical team, together with the Alibaba Cloud team, developed the S265 encoder. Compared with the widely used open-source x265, under 1-pass encoding at the same PSNR:
- 1) the veryslow preset saves 28% bit rate;
- 2) the medium preset saves 36% bit rate;
- 3) CRF mode achieves bit rates close to those of ABR mode.
S265's coding quality is optimized along two axes, rate control and coding tools, and its speed is optimized through fast algorithms and engineering work. These are described in detail in the next section.
4.2 Main optimization methods of S265 encoder
4.2.1) Bit rate control:
With the encoder framework fixed by the standard, the main direction of algorithm optimization is to find strategies that select the optimal coding modes and parameters, so as to obtain better bit-rate savings.
Allocating bit rate reasonably is one of the encoder's most important jobs. The goal of rate control is to spend bits where they are most valuable: to minimize coding distortion at a target bit rate, or to minimize bit rate at a fixed level of distortion.
Rate control needs to solve two classic problems:
- 1) first, frame-level and block-level rate control must allocate bits to each GOP, frame, and coding block according to the target bit rate;
- 2) second, when encoding within a frame, those bits must be distributed across the coding blocks in the most reasonable way.
In frame-level rate control, the traditional method computes the long-term complexity of all encoded frames and derives QP from the ratio between that long-term complexity and the current bit budget.
As a result, QP becomes less and less sensitive to per-frame complexity, leading to degraded quality or bit-rate overshoot. This is especially true for the first frame, whose QP the traditional algorithm sets from an empirical value related only to the target bit rate. Based on CUtree theory, we accurately estimate the bit-rate ratio of I/P/B frames within the pre-analysis window and the expected coded size, obtaining more accurate quantization parameters before encoding.
Block-level bit allocation is driven by temporal CUtree and spatial AQ. In the temporal domain, the importance of I/B/P frames differs markedly: blocks referenced by subsequent frames affect not only their own quality but also the quality of those later frames, so heavily referenced blocks must be encoded at higher quality.
Based on intra- and inter-frame prediction costs, the CUtree algorithm computes the proportion of information that propagates from a block to later frames, estimates the current block's impact on the subsequent sequence, and adjusts its QP offset accordingly. However, the savings that a better-encoded reference block yields for subsequent frames differ with noise energy, motion intensity, texture and edge strength, and the coding parameters. In S265 we therefore train these parameters to capture the influence of multiple factors on propagation efficiency, obtain a more accurate propagation estimate, and allocate rate more reasonably in the temporal domain.
▲ Cutree transfer process
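As a rough illustration of the idea, an x264/x265-style CUtree adjustment lowers QP for blocks whose information propagates widely (a minimal sketch; the strength constant and cost inputs are illustrative, not S265's trained values):

```python
import math

def cutree_qp_offset(intra_cost, propagate_cost, strength=2.0):
    # Blocks whose information is reused by many future blocks
    # (high propagate_cost) get a negative QP offset, i.e. higher quality.
    # `strength` is an illustrative constant, not S265's trained value.
    log2_ratio = math.log2((intra_cost + propagate_cost) / intra_cost)
    return -strength * log2_ratio

# A heavily referenced block vs. a barely referenced one:
print(cutree_qp_offset(intra_cost=1000, propagate_cost=3000))  # -4.00
print(cutree_qp_offset(intra_cost=1000, propagate_cost=100))   # about -0.27
```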
On the other hand, the importance of blocks also differs in the spatial domain.
The human eye is the final observer of video. Starting from the human visual system, visual redundancy differs from block to block: because of visual masking, the eye is insensitive to noise in strongly textured regions and near strong edges, so allocating rate to the flat regions the eye is more sensitive to yields better subjective quality.
In the encoder, we compute each block's variance energy and edge energy as its cost, study the relationship between block energy and the degree of human perception, estimate how the rate split between blocks affects visual attention, and allocate bits to the perceptually more important blocks, improving the efficiency of perceptual video coding.
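A minimal sketch of variance-based spatial AQ in this spirit (the constants are illustrative; S265's trained mapping from block energy to QP offset is not shown here):

```python
import math

def spatial_aq_offset(block_variance, aq_strength=1.0, bias=14.0):
    # Flat blocks (low variance) get a negative QP offset (more bits);
    # busy textured blocks get a positive one, exploiting visual masking.
    # aq_strength and bias are illustrative constants.
    energy = math.log2(max(block_variance, 1.0))
    return aq_strength * (energy - bias)

print(spatial_aq_offset(50))     # flat region -> QP lowered (~ -8.4)
print(spatial_aq_offset(60000))  # strong texture -> QP raised (~ +1.9)
```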
4.2.2) Coding tools:
Among the coding tools, S265 improves the traditional algorithms for scene-change detection, frame-type decision, SAO, deblocking, 2-pass encoding, and RDOQ, and implements a number of new coding tools.
**For example:** in the reference-frame module, several tools improve reference efficiency.
**First:** frame types such as long-term reference frames and generalized B-frames improve prediction quality. For live scenes with little background change, long-term reference frames effectively reduce the loss accumulated as information propagates through many frames; referencing a long-term frame raises quality by about 0.25 dB on average. Converting traditional P-frames to generalized B-frames replaces unidirectional prediction with bidirectional prediction, suppressing residual sources such as noise, lighting changes, and sampling error.
After expanding the frame types, we make I/B/P frame-type decisions based on reference strength.
**Then:** within a mini-GOP, we use a pyramid reference structure to obtain shorter reference distances than the traditional structure.
**Finally:** when managing and selecting reference frames, we account for the difference between stationary and moving blocks: stationary blocks tend to reference high-quality frames, while moving blocks tend to reference temporally closer frames, so retaining both kinds of reference frames for the scene yields better reference quality.
4.2.3) Speed optimization:
HEVC encoders improve coding efficiency, but many of the new coding tools are computationally expensive.
**Therefore:** optimizing encoder speed lets us enable more coding tools and search a larger mode space on high-end devices, further improving quality, while on low-end devices it reduces CPU heating and encoding lag.
HEVC can partition blocks from 64×64 all the way down to 4×4, and the resulting proliferation of block types and modes, several times the number available in H.264, makes block partitioning and mode decision a major bottleneck.
**Therefore:** in RDO, reducing the number of CU depths searched and evaluating only the necessary levels is an important way to cut computation.
**First:** using temporal and spatial correlation, prior information can be obtained from reference blocks; combined with the current block's motion and texture information, the maximum and minimum CU depths to evaluate can be predicted analytically.
**Second:** early termination in the decision process also greatly reduces computation. We exit the current mode traversal early based on the flatness of the texture or on RD-cost comparisons across modes, and for scenes where these heuristics break down we use a CNN deep-learning model to assist the mode decision.
**Inside the decision module:** there are also many expensive computations.
Intra prediction has 35 modes. After evaluating the cheapest modes, we use Bayesian reasoning to estimate where the best mode is most likely to lie, doubling the speed of intra-mode screening while keeping the loss within 0.01 dB.
In addition, the motion search of inter prediction finds the best matching block in a reference frame, and its sub-pixel search requires 7- or 8-tap interpolation filtering, which is expensive. We therefore use the integer-pixel costs to build a bivariate quadratic error-surface equation and estimate the best sub-pixel position, avoiding the full sub-pixel search.
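A minimal sketch of the error-surface idea, assuming we already have SAD costs at the best integer-pixel position and its neighbors (a standard parabola fit, not S265's exact equation):

```python
def subpel_offset(c_left, c_center, c_right):
    # Fit a 1-D parabola through three integer-pel costs and return the
    # estimated sub-pixel offset of the minimum, clamped to [-0.5, 0.5].
    denom = 2.0 * (c_left + c_right - 2.0 * c_center)
    if denom <= 0:            # flat or degenerate surface: stay at center
        return 0.0
    return max(-0.5, min(0.5, (c_left - c_right) / denom))

# Applied independently in x and y, this replaces half/quarter-pel search:
dx = subpel_offset(120.0, 100.0, 110.0)  # ~ +0.17: minimum right of center
dy = subpel_offset(140.0, 100.0, 140.0)  # 0.0: symmetric costs, stay put
```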
When evaluating how good a mode is, RD cost is usually used as the mode's cost; this requires computing both the number of coded bits and the coding distortion. That in turn means running entropy coding to get the bitstream length and transforming the coefficients back to the pixel domain to measure distortion.
To reduce the computation of RD cost, we adopt linear estimators for distortion and rate, in two parts:
- 1) first, the quantization error energy is computed in the frequency domain: by the energy-preserving property of the IDCT, the sum of squared quantization residuals estimates the distortion;
- 2) second, a linear relationship is established between features of the coded coefficients and the bitstream size, so the entropy-coded size is estimated directly from coefficient features.

With this method we can skip the entropy coding in mode-cost calculation, as well as the inverse transform, inverse quantization, reconstruction, and SSE computation, saving a great deal of work.
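A minimal sketch of such an estimator, assuming the transform coefficients and their quantized levels are available (the rate-model weights a and b are illustrative placeholders, not S265's fitted values):

```python
import numpy as np

def fast_rd_cost(coeffs, levels, qstep, lam, a=4.0, b=0.5):
    # Distortion: by Parseval, squared quantization error in the frequency
    # domain equals the pixel-domain SSE, so no inverse transform is needed.
    distortion = np.sum((coeffs - levels * qstep) ** 2)
    # Rate: linear model on coefficient features instead of entropy coding
    # (a and b are illustrative fitted weights).
    est_bits = a * np.count_nonzero(levels) + b * np.sum(np.abs(levels))
    return distortion + lam * est_bits

coeffs = np.array([100.0, -40.0, 12.0, 3.0, 0.5])
qstep = 8.0
levels = np.round(coeffs / qstep)  # simple scalar quantizer
print(fast_rd_cost(coeffs, levels, qstep, lam=10.0))
```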
Beyond RDO, we also improved the slice-type decision algorithm, dynamic Lagrange-multiplier adjustment, and fast deblocking and SAO decisions.
In terms of engineering optimization, we added several improvements:
- 1) C-level function optimization: by streamlining logic, splitting special paths, merging branches, using lookup tables, and optimizing loops, modules such as RDOQ, coefficient analysis, and deblocking were sped up nearly twofold;
- 2) for computation-intensive functions, we wrote SIMD implementations and optimized the assembly code for execution speed.

Optimized at both the fast-algorithm and engineering levels, S265 brings significant performance improvements to HEVC encoding, enabling real-time 720p 30fps encoding even on low-end iPhones.
4.3 Intelligent rate control
Intelligent rate control is Taobao's self-developed bit-rate control algorithm.
Ordinary ABR or CBR rate control chases the target bit rate, so large numbers of bits are wasted in low-complexity scenes. According to subjective quality models of the human eye, once PSNR exceeds a certain threshold, further quality improvement is imperceptible and only consumes extra bits.
We use machine learning to estimate, from 17 kinds of historical coding statistics and the complexity of the frame to be encoded, the quantization parameter that keeps the frame just above the quality threshold, capped by the ABR target bit rate, so that every frame is encoded at the most appropriate rate.
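A minimal sketch of that capping logic, where `predict_qp_for_psnr()` (the learned model) and `abr_qp()` (the ordinary rate-control decision) are hypothetical helpers:

```python
def choose_frame_qp(frame_features, psnr_threshold=42.0):
    # predict_qp_for_psnr() stands in for the model learned from 17 kinds
    # of historical coding statistics; abr_qp() is plain ABR rate control.
    # Both helpers are hypothetical; the threshold is illustrative.
    qp_perceptual = predict_qp_for_psnr(frame_features, psnr_threshold)
    qp_abr = abr_qp(frame_features)
    # Higher QP means fewer bits: never spend more than either the
    # perceptual threshold or the ABR budget allows.
    return max(qp_perceptual, qp_abr)
```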
Verified online in Taobao Live streaming, it saves 15% of bandwidth; in DingTalk live streaming it saves 52% of bandwidth and reduces push-side stalls by 62%.
4.4 Scenario Coding
Because Taobao live streams are so varied, the texture, lighting, background, and degree of motion differ from scene to scene.
For example:
- 1) outdoor broadcasters often walk around, so frames change frequently;
- 2) most beauty broadcasters sit indoors under relatively bright light;
- 3) jewelry broadcasters mainly shoot objects, and the picture is mostly static.

Facing such varied live scenarios, a single encoder configuration cannot meet Taobao Live's needs: enabling or disabling particular coding tools affects different content differently. How to select the best parameters for the content has become a research direction for the industry.
To meet this need, we propose encoding-parameter configuration policies based on the scene.
**First:** we use several deep-learning and machine-learning models to train classifiers on tens of thousands of live videos covering all kinds of content.
The features fall into two broad dimensions:
- 1) semantic features;
- 2) signal features.

Semantic features include:
- 1) broadcaster category;
- 2) commodity features;
- 3) environment features;
- 4) sound features;
- 5) temporal ROI.

Signal features include:
- 1) motion features;
- 2) texture features;
- 3) noise features;
- 4) luminance features.
For video sets with different features, we use a large server cluster to search for the best coding parameters, automatically and efficiently finding the parameter combination best suited to the current video, improving picture quality while reducing bit rate as much as possible. Finally, based on the searched parameter sets, the clusters are consolidated into multiple parameter configuration profiles.
When a broadcaster starts streaming, the stream is first encoded with the standard parameter configuration. After enough data has been collected, we feed the video's semantic and signal features into an adaptive decision engine, which classifies the video with an internal deep neural network and decides which encoding configuration the current stream should use; the new configuration is then sent back to the encoder for subsequent streaming. This lets the broadcaster obtain the best video coding quality achievable in the current situation.
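A minimal sketch of that decision flow, with hypothetical names (`classify_scene()`, the `CONFIGS` table, and all its values are illustrative, not Taobao's actual profiles):

```python
# Hypothetical scene -> encoder-parameter profiles; all values illustrative.
CONFIGS = {
    "outdoor_moving": {"preset": "fast",   "bframes": 3, "aq_strength": 0.8},
    "indoor_beauty":  {"preset": "medium", "bframes": 5, "aq_strength": 1.2},
    "static_objects": {"preset": "slow",   "bframes": 7, "aq_strength": 1.0},
}
STANDARD = {"preset": "medium", "bframes": 4, "aq_strength": 1.0}

def decide_encoder_config(semantic_feats, signal_feats):
    # classify_scene() stands in for the deep-network classifier that maps
    # semantic + signal features to one of the offline-searched profiles.
    scene = classify_scene(semantic_feats, signal_feats)  # hypothetical
    return CONFIGS.get(scene, STANDARD)  # fall back to the standard config
```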
With this method we obtained 7-10% BD-rate gains in Taobao Live and 40% BD-rate gains in the Taobao video-shooting scenario.
4.5 Low-delay coding
In live broadcasting, low latency means high efficiency and a good experience.
Consider the following scenarios:
- 1) Scenario 1: after the broadcaster moves on to the next product, questions about the previous product only arrive 10 seconds later;
- 2) Scenario 2: in a live class, the teacher asks a question but gets no timely feedback from students, wasting time.

These situations give users a poor experience and make live selling and live teaching inefficient.
When 5G is widespread it will bring lower latency and a better experience, but for now most users are still on 4G, so latency must be reduced by other means.
The end-to-end delay is mainly distributed across:
- 1) capture;
- 2) encoding;
- 3) transmission;
- 4) transcoding;
- 5) distribution;
- 6) playback.
This section focuses on optimizing encoding latency.
Encoding delay consists of:
- 1) delay introduced by multithreading;
- 2) frame-cache (lookahead) delay;
- 3) delay caused by B-frames, etc.
The largest part of the encoding delay comes from the encoder's frame cache: analyzing the cached frames before encoding greatly improves coding efficiency. Crudely shrinking the cache does achieve lower latency, but the quality loss is high.
So the idea is: can a small cache simulate the effect of a larger one?
By analyzing the principle of CUtree together with the statistical relationship between lookahead length and propagation cost, we found a strong linear relationship between cache length and propagation cost.
As shown in the figure below:
Different variants of the prediction model can be used for different scenes, finally achieving the effect of simulating a longer lookahead with a shorter one. In tests on live feeds, the optimized lookahead of 4 frames saved 13.5% bit rate compared with the unoptimized one, effectively reducing encoding delay.
The results are as follows:
At the same time, earlier tests showed the optimization is insensitive to the scene: it works equally well for simple and for complex motion.
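A minimal sketch of the idea, assuming a linear model fitted offline between short-lookahead and full-lookahead propagation costs (the coefficients are illustrative):

```python
def extrapolate_propagate_cost(short_cost, a=1.6, b=0.0):
    # Simulate a long lookahead with a short one: scale the propagate cost
    # measured over a short lookahead window by a linear model fitted
    # offline against a full-length lookahead. a and b are illustrative.
    return a * short_cost + b

# CUtree then uses the extrapolated cost as if the full window were cached,
# so QP offsets approximate those of a longer, higher-latency cache.
```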
Over the past year, these optimizations let us cut the bit rate of our H.265 streams from 1.4 Mbps to 800 Kbps without changing picture quality.
4.6 Image quality enhancement
In Taobao Live, the top broadcasters have professional equipment and teams, and the audio and video they produce are of relatively high quality. For small and medium broadcasters, however, shooting conditions are not controllable.
The result is that many small and medium broadcasters produce low-quality video and attract few viewers.
In view of this, we selected several of the most common and serious problem cases and enhanced the picture quality for these broadcasters, significantly improving the viewing experience.
Here are some of the results achieved so far.
4.6.1) De-shaking:
▲ De-jitter effect (the original video can be viewed at the source)
Modern encoders handle flat textures and translational motion well: intra prediction removes spatial correlation, while motion search removes temporal correlation between frames.
But camera shake during capture introduces frame-to-frame jitter that the encoder cannot handle well.
Since shake is usually severe for small and medium broadcasters, whose equipment tends to be old, we chose to improve the frames at the capture source, using a camera-path smoothing algorithm to remove the jitter from video frames.
4.6.2) Denoising:
When the lighting of a live stream is poor, the images captured by the camera contain obvious floating noise and Gaussian white noise, which seriously degrade viewers' perception of the content. In such cases the video must be denoised.
Many excellent denoising algorithms run in the cloud, but deep-learning approaches are not suitable for mobile.
Although there are many deep-learning frameworks for mobile, they are not well matched to the hardware; in practice, many mid- and low-end phones cannot run this generation of models.
For mobile we therefore prioritize efficiency, adopting a temporal noise-reduction algorithm based on Wiener filtering, which we implemented, trained, and optimized ourselves.
4.6.3) Super-resolution:
Some small broadcasters' recording devices support only 360p, and the video viewers finally see is upscaled to 720p by interpolation or other traditional methods. Frames obtained this way are inevitably blurry, hurting the viewing experience.
Thanks to deep-learning optimization on mobile, we have achieved real-time super-resolution of video frames on some high-end phones.
Among the many network architectures, we finally chose the best-performing FSRCNN scheme. The network architecture diagram is shown below.
**During training:** we selected 10,000+ high-definition images across Taobao categories, combined them with public high-definition datasets, and used sample-augmentation techniques, training for about 5,000 epochs until the model converged.
**In addition:** to eliminate the boundary artifacts caused by tiling the image, we overlap and blend the tiles, which yields better super-resolution quality at the cost of some extra computation.
To run in real time on mobile without occupying too many resources, we optimized the deconvolution computation and, following characteristics of human vision, apply super-resolution only to strongly textured pixels and selected pixels in static regions, greatly improving efficiency on mobile.
5. Low latency transmission practice
5.1 Low latency Player
5.1.1) Delay analysis of regular players:
Currently, live streaming over TCP mainly uses two protocols: HLS and RTMP/HTTP-FLV.
HLS live latency is generally more than 10 seconds, while HTTP-FLV latency is generally 6 to 9 seconds. Across the whole live link of stream pushing, CDN distribution, and playback, the biggest delay comes from the playback end.
In the player, almost every thread has its own buffer. These buffers smooth the jitter of the entire playback link, and their sizes determine both the playback delay and the playback smoothness.
The VideoBuffer and AudioBuffer store the audio and video packets awaiting decoding. They exist to smooth network jitter: the jitter from stream pushing, CDN transmission, and the player's downloads all accumulates at the playback end, making this the largest source of delay in a conventional player, usually more than 5 seconds.
TCP-based media transmission is not suitable for low-delay live streaming, for the following reasons:
- 1) Slow retransmission: TCP pursues complete reliability and ordering; after a loss it keeps retransmitting until the packet is acknowledged, and subsequent packets cannot be delivered to the upper layer until then;
- 2) No room for upper-layer optimization: TCP's congestion control and QoS policies are implemented in the operating-system kernel;
- 3) Inaccurate congestion detection: loss-based congestion control does not match actual network conditions; packet loss does not equal congestion, and this approach also causes bufferbloat on the sending path, inflating the link RTT.
Our low-latency transmission SDK is built on WebRTC, using several of its core modules: RTP/RTCP, FEC, NACK, NetEQ, JitterBuffer, audio-video synchronization, congestion control, and so on.
NetEQ and the JitterBuffer are the network jitter caches for audio and video respectively, and they are among the largest sources of latency in the transmission SDK.
RTP over UDP copes better with public-network packet loss. Combined with adaptive buffering and QoS optimization, our JitterBuffer delay is kept below 700 ms while overall playback remains smooth, and end-to-end viewing latency is about 1 second.
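A minimal sketch of adaptive jitter-buffer sizing in this spirit, assuming the target delay tracks a high percentile of recently observed inter-arrival jitter (the window size, percentile, and margins are illustrative):

```python
from collections import deque

class AdaptiveJitterBuffer:
    # Track recent inter-arrival jitter and set the target buffer delay to
    # a high percentile of it, capped at 700 ms as described above.
    # Window size, percentile, and margins are illustrative choices.
    def __init__(self, window=500, percentile=0.95, cap_ms=700):
        self.jitters = deque(maxlen=window)
        self.percentile, self.cap_ms = percentile, cap_ms

    def on_packet(self, expected_arrival_ms, actual_arrival_ms):
        self.jitters.append(abs(actual_arrival_ms - expected_arrival_ms))

    def target_delay_ms(self):
        if not self.jitters:
            return 100  # conservative initial delay before any samples
        ranked = sorted(self.jitters)
        idx = min(len(ranked) - 1, int(self.percentile * len(ranked)))
        return min(self.cap_ms, max(100, 2 * int(ranked[idx])))
```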
5.1.2) Adapting the player to the low-delay transmission SDK:
We wrapped the low-delay transmission module as an FFmpeg extension demuxer, registered this demuxer supporting the low-delay protocol with FFmpeg, and let the player open network connections and read data through FFmpeg as before. This integration scheme barely touches the player's original logic and requires few changes.
The main changes are as follows:
1) Buffer size control:
When pulling a stream over the low-delay protocol, the network jitter buffer is the underlying transmission module's JitterBuffer, so the player-layer buffer should be set to 0 seconds; otherwise extra delay is introduced.
2) Stall statistics changes:
A player normally detects stalls from its buffer level: when the buffer is empty, or stays empty for a period of time, playback freezes and a stall event is triggered. Once the player's JitterBuffer is taken over by the low-delay transmission SDK, stall events should also be raised by that SDK.
3) Audio decoding process:
The audio coming out of NetEQ is already PCM, so the player can render the audio data it reads directly. If hardware audio decoding is still applied, decoding compatibility problems may occur, typically manifesting as no sound; FFmpeg software decoding, by contrast, remains compatible.
5.2 Low-Latency Servers
Low-latency transmission is a holistic problem: it must be approached as a whole, considered not only at the design level but through close coordination among the client, the server, and the data system.
The transport protocol design adopts RTP/RTCP: semi-reliable transmission over UDP, a mature technology well suited to audio and video. The difficulty is reducing stalls and delay at the same time.
The overall algorithm strategy we use is as follows:
- A) Congestion control: the GCC and BBR congestion-control algorithms are deeply optimized for the live scene, balancing instant startup and low delay.
- B) Layered frame dropping: an SVC algorithm and a B-frame-based GOP dropping strategy quickly lower the bit rate when the network is congested, relieving the congestion.
- C) Retransmission control: retransmission control must both suppress retransmission storms and guarantee fast retransmission.
- D) Pacing optimization: pacing policies prevent network bursts and smooth the traffic, with deep customization for the instant-startup scenario; a redesigned sending mechanism and algorithm greatly improve sending performance.
- E) Instant-startup optimization: multiple startup strategies, coordinated between server and client, ensure fast playback start; Taobao Live's average instant-open rate exceeds 94%.
- F) Signaling optimization: signaling uses a proprietary RTCP APP protocol and shares one socket connection with the audio and video transmission; the protocol is streamlined so that media data is delivered within 1 RTT.
In addition, many improvements and optimizations from strategy down to algorithm have been made, driven by data and iterated continuously for each scenario.
5.3 E2E Segment Statistics
The end-to-end segmented delay statistics system we designed can measure both the total delay of a single playback session and the delay of each stage.
It does not rely on NTP time, and is suitable for very large-scale networks.
By analyzing the per-stage delays of the push end, server, and player across different platforms, and displaying them on a dashboard, targeted optimizations can be made.
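A minimal sketch of NTP-free segmented accounting, assuming each hop stamps only durations measured on its own local clock into stream metadata (all field names are hypothetical):

```python
# Each hop records how long the data spent inside it, measured purely on
# its own monotonic clock, so no cross-machine clock sync (NTP) is needed.
# Network legs are approximated from measured RTTs. Names are hypothetical.
def add_stage_duration(metadata, stage, enter_ts_ms, leave_ts_ms):
    metadata.setdefault("stage_ms", {})[stage] = leave_ts_ms - enter_ts_ms
    return metadata

def total_delay_ms(metadata, network_rtt_ms):
    return sum(metadata["stage_ms"].values()) + network_rtt_ms

meta = {}
add_stage_duration(meta, "capture+encode", 1000, 1080)   # 80 ms on pusher
add_stage_duration(meta, "server_transcode", 500, 650)   # 150 ms on server
add_stage_duration(meta, "player_buffer", 2000, 2700)    # 700 ms on player
print(meta["stage_ms"], total_delay_ms(meta, network_rtt_ms=60))
```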
6. Looking to the future
As 5G networks roll out, the delay from the broadcaster side to the viewer side will get shorter and shorter.
The performance of mobile devices will keep improving, and various picture-quality enhancement and image-rendering techniques will gradually move into hardware.
Deep-learning models on mobile are also becoming lighter, allowing more and more advanced academic innovations to be engineered into products.