Live video has several key experience indicators, such as stall rate, clarity, latency, and the "second-open" rate, i.e. how often playback starts within about a second. This article introduces some of the measures we took to optimize the second-open experience of live video.

Live streaming is very different from on-demand video, and harder to optimize. On-demand video can be pre-downloaded from a known playlist, while a live broadcast is a dynamic, real-time stream rather than a static file, so there is no way to pre-download it for caching. With caching, our most powerful tool, off the table, how do we still start playback in a second?

Defining the metric

To optimize a metric, we first need data to support the optimization, which means instrumenting the client with analytics events; the statistical definition must be unambiguous, or the reported data has no reference value. For a timing metric, we must define the start time and the end time; their difference is the second-open time we report.

Start time: when the page is selected (on Android, the onPageSelected callback of ViewPager);

End time: the player's first-frame-rendered callback;

Second-open time = end time - start time
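As a minimal sketch of this measurement (the class and method names are our own, for illustration only, not a real reporting SDK), the two timestamps and their difference can be modeled like this:

```java
// Sketch of the second-open measurement. Start: page selected (ViewPager's
// onPageSelected); end: the player's first-frame-rendered callback.
class SecondOpenTracker {
    private long startMs = -1;

    // Called from onPageSelected with the current wall-clock time.
    void onPageSelected(long nowMs) {
        startMs = nowMs;
    }

    // Called from the player's first-frame callback. Returns the
    // second-open time to report, or -1 if no start was recorded.
    long onFirstFrameRendered(long nowMs) {
        if (startMs < 0) return -1;
        long secondOpenMs = nowMs - startMs;
        startMs = -1; // one report per page selection
        return secondOpenMs;
    }
}
```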

Analyzing the live streaming pipeline

To improve the second-open experience, we first need to map out the live streaming pipeline:

In simple terms: on the production side, the broadcaster pushes the stream to a streaming media server, which distributes it to CDNs everywhere; users then pull and consume the stream through a player. The chain is: broadcaster pushes stream -> streaming media server -> CDN edge node -> end device.

By endpoint, we can roughly divide the pipeline into three parts: the push side, the streaming service, and the pull side.

Note that the second-open data we actually collect covers only the pull side. The push side does not directly affect the pull side's optimization strategy, but it does affect second-open data indirectly; for example, under otherwise identical conditions, second-open performance differs between H.264 and H.265 streams. So, to cover the whole live pipeline, this article also mentions some push-side settings that affect second-open optimization.

Pull-side optimization

HttpDNS direct IP connection

Pulling a stream in the player can be roughly divided into the following steps:

DNS resolution -> connect to CDN node -> receive GOP packets -> decode first frame -> render first frame

Of these, DNS resolution takes the longest. Therefore, to reduce first-screen time, we must first reduce DNS resolution time.

The traditional DNS resolution process, known in the industry as the localDNS scheme, is very simple: the app sends a domain name resolution request to the carrier's DNS server, which issues a recursive query to the CDN's GSLB (Global Server Load Balancing) system. GSLB infers the carrier and geographic location from the IP address of the carrier's DNS server and returns several suitable CDN edge node IP addresses.

HttpDNS resolves domain names over HTTP instead of the traditional UDP-based DNS protocol. Resolution requests are sent directly to an HttpDNS server (e.g. Aliyun's), bypassing the carrier's Local DNS and avoiding the domain hijacking and inaccurate scheduling that Local DNS can cause.

The principle is actually simpler than localDNS: the app calls the HttpDNS API provided by the CDN directly over HTTP to obtain a set of suitable edge node IPs. With one fewer hop in the middle, this saves tens of milliseconds. When connecting via these optimized node IPs, we must take care not to let a large number of users cluster on a few nodes and unbalance the node load.
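A sketch of what the client does with an HttpDNS result. The resolver call itself (an HTTP GET to the CDN vendor's API) is omitted here; we assume it already returned a list of edge-node IPs. The hostnames and class names below are hypothetical:

```java
import java.util.List;
import java.util.Random;

// Sketch: turn a pull URL plus HttpDNS results into a direct-IP request.
class HttpDnsDirect {
    private static final Random RANDOM = new Random();

    // Pick one IP at random so users spread across edge nodes instead of
    // all clustering on the first entry and unbalancing node load.
    static String pickIp(List<String> ips) {
        return ips.get(RANDOM.nextInt(ips.size()));
    }

    // Replace the host in the pull URL with the resolved IP. The original
    // domain must still be sent as the HTTP Host header so the edge node
    // can route the request correctly.
    static String directUrl(String url, String host, String ip) {
        return url.replaceFirst(host, ip);
    }
}
```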

Multi-player logic for swipe-up/down switching

Swiping up is an important way for users to switch between live rooms; with some work, we can make swipe switching start instantly.

A closer look shows that while we are swiping, the video of the next live room is already being displayed, and after we release, the current room's data is not reloaded unless we leave it.

From the user's perspective, the next room's picture is visible as soon as they swipe to it; in effect, we make full use of the time between the start of the swipe and the finger leaving the screen to pull the stream.

The core logic is multiple players: one for the current room, one for the room above, and one for the room below. When the user swipes the room card, we determine from the target position and the current position whether the swipe is up or down, and have the corresponding room's Fragment start its player.

The data shows that after this layer of optimization, the swipe-up/down second-open rate is very high: 65%+ of plays achieve a 0 s start.
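The direction decision above can be sketched as follows (a minimal model; in the real app the positions come from ViewPager's onPageSelected, and the returned slot picks which preloaded player to start):

```java
// Sketch of the three-player swipe logic: the player for the adjacent
// room starts as soon as the user settles on it, so stream pulling
// overlaps the swipe gesture.
class MultiPlayerSwitcher {
    static final int CURRENT = 0, PREVIOUS = -1, NEXT = 1;

    private int currentPosition;

    MultiPlayerSwitcher(int startPosition) { currentPosition = startPosition; }

    // Decide which of the three preloaded players should start when the
    // user lands on newPosition.
    int playerFor(int newPosition) {
        if (newPosition > currentPosition) return NEXT;     // swiped up
        if (newPosition < currentPosition) return PREVIOUS; // swiped down
        return CURRENT;
    }

    // Commit the new position once the page change settles.
    void settle(int newPosition) { currentPosition = newPosition; }
}
```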

Lazy loading of UI modules

The UI layout of a live room is fairly complex, so instead of loading the whole layout at once, can we load it progressively?

In simple terms, progressive loading is a piece-by-piece process: the work for the current frame finishes before the next piece is loaded.

The benefit of this design is that it reduces the pressure of inflating all Views at once. We can load the core View first and then the others gradually; in our live room, the video controls load first, followed by the other layout controls in turn. Observing Douyin's live implementation, it also loads progressively: the video appears first, then the interactive UI.

At present, apart from the core parent layout and the player controls, which the live room Fragment inflates directly from XML, everything else is loaded progressively through the ViewStub mechanism.
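The idea can be sketched framework-free like this (on Android, each queued step would be a ViewStub.inflate() posted to the main-thread Handler; the names here are illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of progressive loading: inflate the core view immediately,
// queue the rest, and inflate one piece per "frame" tick so no single
// frame pays the full layout cost.
class ProgressiveLoader {
    private final Deque<Runnable> pending = new ArrayDeque<>();
    final List<String> loaded = new ArrayList<>();

    // Core view (the video surface) is loaded synchronously, up front.
    void loadCore(String name) { loaded.add(name); }

    // Secondary modules (chat, gifts, ...) are deferred.
    void queue(String name) { pending.add(() -> loaded.add(name)); }

    // Run one deferred step; returns false when nothing is left.
    boolean loadNext() {
        Runnable step = pending.poll();
        if (step == null) return false;
        step.run();
        return true;
    }
}
```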

Since second-open time is measured after the page UI has loaded, this optimization does not show up in the second-open data, but it does improve the perceived second-open experience.

Benefit: UI first-frame layout time improved by 90%+.

First-frame optimization in our in-house player

Live broadcasting places high demands on the player. The overall playback pipeline is: CDN connection, media file download, file parsing, audio/video decoding, and audio/video rendering; first-frame second-open optimization works along this pipeline, and the three links up to and including file parsing can all be optimized. In terms of strategy, there are two dimensions, which we call buffer configuration and buffer management.

Buffer configuration: ensure playback stays normal while making startup as fast as possible.

Buffer management: ensure playback keeps up with the current live point.

For the CDN connection, the in-house player adds IP direct-connection logic: given a known server domain, the player can connect to the CDN server directly by IP.

For the media download, the in-house player adjusts the playback buffer and sets no extra preconditions for starting playback beyond having enough video data to parse.

At the file-parsing level, the player starts playing as soon as a small amount of data has arrived.

Through iterative testing across versions, we tuned the player's internal buffers, buffer sizes, and frame-chasing strategy at each link and reached a fairly good second-open result, currently on par with third-party players.

Push-side settings and optimization

Choosing a push protocol

Currently we push with RTMP and pull with HTTP-FLV.

RTMP was designed for streaming media and is widely used for pushing; most CDN vendors support it.

HTTP-FLV uses an HTTP long connection to carry an RTMP-like stream, distributed by dedicated streaming servers, taking the best of both worlds: it can reuse existing HTTP distribution infrastructure, its real-time performance matches RTMP, it saves some protocol handshake time compared with RTMP (giving a shorter first-screen time), and it is more extensible. HLS, the live protocol proposed by Apple, holds an unshakable position on iOS and is also supported on Android, but its latency is unacceptable for real-time interactive broadcasting, so we did not consider it.

Dynamic bit rate

The push side can push at different quality levels and dynamically adjusts bit rate and resolution to network conditions, so second-open data should be broken down and compared by bit rate and resolution, e.g. the 720P second-open rate versus the 1080P second-open rate.

Quality                      Resolution    H.264 bit rate   H.265 bit rate (30% lower than H.264)
Fluency (360P)               640 x 360     400 Kbps         280 Kbps
Standard definition (480P)   854 x 480     600 Kbps         420 Kbps
High definition (720P)       1280 x 720    1000 Kbps        700 Kbps
Ultra HD (1080P)             1920 x 1080   2000 Kbps        1400 Kbps
2K                           2560 x 1440   7000 Kbps        4900 Kbps
4K                           3840 x 2160   8000 Kbps        5600 Kbps

H.265

H.265 is designed to deliver higher-quality Internet video over limited bandwidth, needing roughly half the bandwidth of H.264 for the same quality. H.265/HEVC provides many more bit-rate-reduction tools than H.264/AVC. In terms of coding units, every macroblock (MB) in H.264 is a fixed 16x16 pixels, while H.265 coding units range from 8x8 up to 64x64. H.265's intra-frame prediction supports 33 directions (H.264 supports only 8) and provides better motion compensation and motion vector prediction.

Quality comparison tests show that at the same image quality, H.265 reduces video size by about 39-44% compared with H.264. At 51-74% of the H.264 bit rate, H.265-encoded video quality is similar to or better than H.264, essentially better than the expected PSNR would suggest.

As with dynamic bit rate, whether the pulled stream is H.264 or H.265 is reported as an analytics parameter, and we then compare the second-open rates of H.264 and H.265 at the same bit rate and resolution.

Streaming media server optimization

GOP caching

In addition to the optimization on the client side, we can also optimize from the streaming media server side.

Live stream image frames are divided into I, P, and B frames. Only I frames can be decoded independently of other frames, which means the player can render immediately upon receiving an I frame, while P and B frames must wait for the frames they depend on before decoding and rendering can complete. That waiting period is the "black screen".

Therefore, the server can cache a GOP (in H.264 a GOP is closed: a sequence of image frames beginning with an I frame), ensuring that a player joining the broadcast obtains an I frame and can render a picture immediately, which optimizes the first-screen experience.
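The server-side mechanism can be sketched as follows (an illustrative model, not the actual streaming server code): keep every frame since the latest I frame, and send a new viewer that cache first so decoding always starts from an I frame.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of server-side GOP caching for instant first frames.
class GopCache {
    static class Frame {
        final boolean keyFrame; // true for I-frames
        final String data;
        Frame(boolean keyFrame, String data) {
            this.keyFrame = keyFrame;
            this.data = data;
        }
    }

    private final Deque<Frame> gop = new ArrayDeque<>();

    // On each incoming frame: a closed GOP restarts at every I-frame,
    // so the cache is cleared and begins again from that keyframe.
    void onFrame(Frame f) {
        if (f.keyFrame) gop.clear();
        gop.add(f);
    }

    // Frames to send a player that just joined; the list always starts
    // with an I-frame, so the player can render without a black screen.
    List<Frame> framesForNewViewer() {
        return new ArrayList<>(gop);
    }
}
```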

This is the basic functionality of the streaming server, so there’s really nothing we need to do.

Hybrid cloud (in progress)

Introduction to hybrid cloud: hybrid cloud combines public and private clouds and has been the main mode and direction of cloud computing in recent years. Private clouds mainly serve enterprise users: for security, companies prefer to keep data in a private cloud, yet they also want the computing resources of the public cloud. As a result, hybrid cloud is increasingly adopted; it mixes and matches public and private clouds for the best effect, a personalized solution that is both economical and secure.

Because CDN node distribution differs across cloud platforms (for example, Aliyun may have more, faster CDN nodes in Hangzhou while Qiniu may have more in Shanghai; this is just an illustration), we can integrate several cloud platforms and evaluate which channel is currently fastest for each user, improving connection speed. Beyond speed, this also enables a multi-cloud, multi-active architecture and disaster-recovery backup. We are currently integrating platforms such as Aliyun, Wangsu, Kingsoft, Qiniu, and Huawei.
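The selection step can be sketched like this (vendor names and the probe mechanism are illustrative; a real client would measure, say, TCP connect time to each vendor's edge node):

```java
import java.util.Map;

// Sketch of multi-cloud channel selection: given probe results per CDN
// vendor (e.g. measured connect time in ms), pull from the fastest one.
class CdnSelector {
    static String fastest(Map<String, Long> connectMillisByVendor) {
        String best = null;
        long bestMs = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : connectMillisByVendor.entrySet()) {
            if (e.getValue() < bestMs) {
                bestMs = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```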

Optimization gains

After one quarter of this work, the overall second-open rate rose from about 60% to 85%+.

Looking ahead

  • QUIC (Quick UDP Internet Connections) is a next-generation UDP-based transport protocol developed by Google. Since 2018 the IETF has been advancing QUIC as the basis of HTTP/3. Compared with TCP, QUIC is better suited to data transmission on weak networks and in high-packet-loss scenarios.

  • Narrowband HD

    At the same quality, a slow-motion scene needs a lower bit rate than a high-motion scene, and beyond a certain point, raising the bit rate adds little quality. So if we find the right bit rate for the content, a lower-bit-rate encode can match the quality of a higher-bit-rate one. This requires analyzing the complexity of the video content: obtain scene information, compute spatial-domain and temporal-domain complexity, derive the overall complexity of the video sequence, and finally determine the encoding scene. That is the point of scene classification.

