Author: Kun Jun

Background

The first frame time is the time from when the user taps play to when the first frame of the video is rendered. "Zero first frame" does not literally mean a 0 ms start; it means the first frame time is so short that the user can hardly perceive it, which in our playback quality metrics corresponds to a first frame time of under 100 ms. Our player provides aggressive first-frame optimizations at every stage of the pipeline, and when conditions allow, the first frame time can be compressed to under 100 ms: what users perceive is completely smooth playback, with no visible pause on the first screen. Of course, real business scenarios cannot always use every optimization; for example, random playback cannot be preloaded, and some scenarios are not suitable for player reuse. By applying as many of the optimizations described here as a given scenario allows, most users can experience a zero first frame.

Composition of the first frame

The concept of the first frame, and its impact on the video playback experience

The first frame time is a core metric of video applications and one of the key factors affecting the viewing experience. If a video takes several seconds to show its first frame, most users will give up before it appears, so optimizing the first frame of video playback is extremely important. The figure above shows the whole process of clicking and playing a video. As it shows, the first frame time mainly consists of the following parts: obtaining the video play link, establishing the network connection, downloading the video header data, and decoding and rendering the audio and video. Starting from this end-to-end playback process, this article introduces general first frame optimization schemes, and at the end discusses scenario-specific optimization, taking long-video playback and resuming from historical progress as examples.

General first frame optimization methods

Get the play address

The play address is delivered with the feed

The first step of video playback is obtaining the playback URL of the video resource. Generally, each video resource has a unique video ID, and the VOD server exposes a service that returns the playback URL for a given video ID. If the app server can call the VOD service itself to generate the playback address and deliver it along with the feed, the client saves one network request.
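A minimal sketch of this idea is below. The field names are hypothetical; the point is that a feed item can carry its play URL (and optionally other metadata used later), so the client only falls back to the VOD lookup when the address is missing.

```kotlin
// Hypothetical feed item: the server has already called the VOD service,
// so each item carries its play URL and the client skips one network round trip.
data class FeedItem(
    val videoId: String,
    val playUrl: String?,          // pre-resolved play address, may be absent
    val codec: String? = null,     // optional decode info, useful later for async decoder init
    val moovSize: Long? = null     // optional moov size, useful later for preload sizing
)

fun resolvePlayUrl(item: FeedItem, vodLookup: (String) -> String): String =
    // Fall back to the VOD lookup only when the feed did not deliver the address.
    item.playUrl ?: vodLookup(item.videoId)
```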

Network connection establishment

DNS pre-resolution, pre-connection, connection reuse, TLS 1.3 False Start for HTTPS, 0-RTT session resumption

After getting the play address, the player connects to the CDN, which first requires DNS resolution. To guard against DNS hijacking, most clients use HTTPDNS for resolution, which costs a network request. We can adopt DNS pre-resolution: for example, when the app starts, the server delivers the domain names the app is likely to use, and the client resolves and caches them in advance. Connection reuse is supported in HTTP/1.1, so we can create a few socket connections to the CDN ahead of time and reuse them directly when playback starts. In addition, to defend against content hijacking, HTTPS is enabled for playback in some regions; compared with HTTP, the TLS handshake adds up to two extra RTTs before the first frame. With TLS False Start and session resumption, a 0-RTT handshake can be achieved.
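The sketch below illustrates the pre-resolution and pre-connection idea using plain JDK APIs; a production player would typically use an HTTPDNS SDK and let its HTTP stack handle connection reuse, and the domain list here is assumed to come from the server.

```kotlin
import java.net.InetAddress
import java.net.InetSocketAddress
import java.net.Socket
import java.util.concurrent.ConcurrentHashMap
import kotlin.concurrent.thread

// Minimal sketch of DNS pre-resolution at app start.
object DnsWarmup {
    private val cache = ConcurrentHashMap<String, List<InetAddress>>()

    fun prefetch(domains: List<String>) {
        domains.forEach { host ->
            thread(isDaemon = true) {
                // Resolve off the main thread and cache the result for later use.
                runCatching { cache[host] = InetAddress.getAllByName(host).toList() }
            }
        }
    }

    fun lookup(host: String): List<InetAddress>? = cache[host]
}

// Optional pre-connection: open a TCP socket to the CDN ahead of time so the
// handshake cost is paid before the user taps play.
fun preconnect(host: String, port: Int = 443, timeoutMs: Int = 3000): Socket? =
    runCatching {
        Socket().apply { connect(InetSocketAddress(host, port), timeoutMs) }
    }.getOrNull()
```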

First audio and video packet

Reducing format probing; moov position

After the player connects to the CDN, it starts downloading the video file. First, the player probes the file to detect its container format, encoding, and other information. If video sources are transcoded uniformly on the server, this probing step can be skipped. It is also worth mentioning that a typical MP4 file contains a moov box, which stores the audio and video track information, such as decoding parameters and the mapping between audio/video frames and file offsets (used for seeking), so players usually download the moov data first. The position of the moov box therefore affects startup: if the moov sits at the end of the file, the player downloads the beginning of the video, finds no moov, and has to jump to the end of the file to look for it, which costs two extra network requests. To solve this, we can transcode videos so that the moov is placed at the head of the file, reducing the first frame time.
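A rough sketch of how one might check whether a file already has its moov at the front (the "faststart" layout) is shown below. It only scans top-level box headers; error handling is minimal and the function is illustrative rather than a complete MP4 parser.

```kotlin
import java.io.DataInputStream
import java.io.EOFException
import java.io.InputStream

// Scan top-level MP4 boxes and report whether moov appears before mdat.
// If it does not, a remux/transcode step can move it to the head of the file.
fun moovComesFirst(input: InputStream): Boolean {
    val din = DataInputStream(input)
    return try {
        var result: Boolean? = null
        while (result == null) {
            val size = din.readInt().toLong() and 0xFFFFFFFFL
            val type = ByteArray(4).also { din.readFully(it) }.toString(Charsets.US_ASCII)
            when (type) {
                "moov" -> result = true
                "mdat" -> result = false
                else -> {
                    // size == 1 means a 64-bit "largesize" follows the 8-byte header
                    val boxSize = if (size == 1L) din.readLong() else size
                    var toSkip = boxSize - if (size == 1L) 16 else 8
                    while (toSkip > 0) {
                        val skipped = din.skip(toSkip)
                        if (skipped <= 0) return false
                        toSkip -= skipped
                    }
                }
            }
        }
        result!!
    } catch (e: EOFException) {
        false   // reached end of stream without finding moov or mdat
    }
}
```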

Audio and video decoding

Asynchronous decoder initialization, decoder reuse

Usually, once the player has read the video data and obtained the decoding information, it creates a decoder and starts decoding. However, creating a decoder, especially MediaCodec on the Android platform, is a time-consuming operation. Two optimizations help here: asynchronous decoder initialization and decoder reuse. If the app server passes the video's decoding information to the player in advance, the player can initialize the decoder asynchronously while the network connection is being established, hiding the hardware decoder creation time. Decoder reuse can eliminate this time entirely; following the same idea, we can also reuse the player's threads or even the whole player instance. Together, these methods greatly reduce the first frame time.
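The following is a minimal sketch of both ideas on Android, assuming the MIME type and resolution are known ahead of time (for example delivered with the feed); a real player would need more careful lifecycle and error handling around MediaCodec.

```kotlin
import android.media.MediaCodec
import android.media.MediaFormat
import android.view.Surface
import java.util.concurrent.Executors

// Asynchronous decoder initialization plus simple reuse.
object DecoderPool {
    private val executor = Executors.newSingleThreadExecutor()
    private var cached: MediaCodec? = null
    private var cachedMime: String? = null

    // Create (or take a cached) decoder on a worker thread so the cost overlaps
    // with connection setup instead of sitting on the first-frame critical path.
    fun warmUp(mime: String, width: Int, height: Int, surface: Surface, onReady: (MediaCodec) -> Unit) {
        executor.execute {
            val codec = synchronized(this) {
                if (cachedMime == mime) cached!!.also { cached = null; cachedMime = null }
                else MediaCodec.createDecoderByType(mime)
            }
            codec.configure(MediaFormat.createVideoFormat(mime, width, height), surface, null, 0)
            codec.start()
            onReady(codec)
        }
    }

    // Instead of release(), stop the codec and keep it; the next warmUp() with the
    // same MIME skips createDecoderByType(), which is the expensive step on Android.
    fun recycle(codec: MediaCodec, mime: String) {
        codec.stop()
        synchronized(this) { cached = codec; cachedMime = mime }
    }
}
```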

Start-play threshold

In theory, to achieve the fastest possible first frame, the video can be shown as soon as the first frame is decoded. In practice, however, this increases stalling, especially within the first 1–3 s after playback starts. After extensive experiments, we found that placing a small restriction on playback start, such as requiring a certain amount of buffered data before playback begins, greatly reduces stall time while having little impact on the first frame, and significantly improves watch time and video views (VV).
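A minimal sketch of such a start-play threshold is below; the threshold values are illustrative and would in practice be tuned through experiments.

```kotlin
// Wait for a small buffer before starting playback, but cap how long the first
// frame can be held so the impact on first-frame time stays bounded.
data class StartPolicy(val minBufferMs: Long = 300, val maxWaitMs: Long = 800)

fun shouldStartPlayback(bufferedMs: Long, waitedMs: Long, policy: StartPolicy): Boolean =
    bufferedMs >= policy.minBufferMs || waitedMs >= policy.maxWaitMs
```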

Preloading

Preloading is a common first frame optimization: a portion of the video data is downloaded in advance so playback can start quickly. When to preload, how much to preload, and how many preloads to run in parallel are all practical considerations. The first issue is timing. For a 15 s short video, preloading can simply start after the current video has finished downloading, so it never competes with the current playback for bandwidth. For a video longer than one minute, the timing needs more thought. Specifically, we need to consider the buffered data of the currently playing video, the current download speed, the bitrate of the current video and of the video to be preloaded, and the number of concurrent preloads. With this data we can build a model that predicts whether stalling will occur: if a stall is unlikely, preloading can be enabled; otherwise it is disabled or paused.
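An illustrative gating function is sketched below. All names and thresholds are assumptions; the idea it demonstrates is checking whether adding a preload task would starve the currently playing video.

```kotlin
data class PlaybackState(
    val bufferedSec: Double,        // buffered data of the currently playing video, in seconds
    val downloadKBps: Double,       // measured network download speed
    val currentBitrateKbps: Double, // bitrate of the current video
    val nextBitrateKbps: Double,    // bitrate of the candidate preload
    val activePreloads: Int         // preload tasks already running
)

fun allowPreload(s: PlaybackState, maxParallel: Int = 2, safetyBufferSec: Double = 5.0): Boolean {
    if (s.activePreloads >= maxParallel) return false
    // Bandwidth that would remain for the current video if one more preload starts.
    val remainingKBps = s.downloadKBps - s.nextBitrateKbps / 8
    val neededKBps = s.currentBitrateKbps / 8
    // Preload only if the current video still downloads faster than it plays,
    // or if it already has a comfortable buffer to absorb the contention.
    return remainingKBps >= neededKBps || s.bufferedSec >= safetyBufferSec
}
```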

The other question is how much to preload. Intuitively, at least enough to guarantee that the first frame can be shown. A rough estimate is the moov size plus the video's average bitrate multiplied by the desired preload duration. The server can deliver the moov size and the average bitrate, and the preload duration can then be tuned through experiments on the app, which in turn adjusts the preload size.
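As a worked form of that estimate (parameter names are illustrative):

```kotlin
// Preload size = moov size + average bitrate * preload duration.
// moovBytes and avgBitrateKbps are assumed to be delivered by the server;
// preloadSec is the knob tuned via experiments.
fun preloadSizeBytes(moovBytes: Long, avgBitrateKbps: Double, preloadSec: Double): Long =
    moovBytes + (avgBitrateKbps * 1000 / 8 * preloadSec).toLong()

// Example: a 50 KB moov, a 1500 kbps video, and 2 s of media give roughly 425 KB to preload.
```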

Pre-rendering

Principle

Preloading only removes the network request time; the player still has to demultiplex, decode, and render, which can take more than 200 ms on mid- and low-end devices. If the first frame can be rendered in advance without starting playback, this time is removed as well. Pre-rendering is exactly that technique: the first frame of the video is decoded and rendered ahead of time, while the audio is not played. For example, in a short-video feed, pre-rendering starts as soon as the user begins swiping to the next card. During the swipe, the first frame is very likely to be rendered already, so when the card settles in the center the video plays immediately and the user barely perceives any loading.
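The sketch below shows the control flow only. The player interface is hypothetical (not a real library API); it just names the operations a pre-render path needs: bind the data source and surface, mute audio, render the first frame, and later resume normal playback.

```kotlin
import android.view.Surface

// Hypothetical interface used only to illustrate the pre-render idea.
interface PrerenderablePlayer {
    fun setDataSource(url: String)
    fun setSurface(surface: Surface)
    fun setVolume(volume: Float)
    fun prepareAndRenderFirstFrame(onFirstFrame: () -> Unit)
    fun play()
}

// Called when the next card starts sliding into view.
fun prerenderNextCard(player: PrerenderablePlayer, url: String, surface: Surface) {
    player.setDataSource(url)
    player.setSurface(surface)
    player.setVolume(0f)                      // no audio during pre-render
    player.prepareAndRenderFirstFrame {
        // First frame is already on screen; nothing else to do until the swipe ends.
    }
}

// Called when the card settles in the center: unmute and start playback.
fun onCardCentered(player: PrerenderablePlayer) {
    player.setVolume(1f)
    player.play()
}
```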

Optimizing for specific scenarios

The first frame optimizations described above are general strategies; scenario-specific optimization is just as important. The rest of this article introduces two such scenarios: 1. optimizing playback of long videos; 2. optimizing playback that resumes from historical progress.

Long video scene optimization

MP4 format, and the relationship between moov size and duration

Short videos are usually MP4 files. As mentioned above, downloading the moov box is a precondition for starting an MP4 video. The size of the moov is positively correlated with the video's duration, roughly 40 KB per minute, so the moov of a one-hour long video is about 2.4 MB. At an average network speed of 1 MB/s, it takes about 2.4 s to load, which is a poor experience for users on weak networks. The fMP4 format solves this well: it splits a complete video into many small segments, and the index of each segment is stored in a sidx box, so the amount of data needed to start playback is much smaller and the first frame time is shortened accordingly.
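The back-of-the-envelope relationship from the text can be written down directly; the 40 KB/min figure is the rough estimate quoted above, not a property of every encoder.

```kotlin
// moov grows with duration at roughly 40 KB per minute, so a 1-hour MP4 carries
// about 2.4 MB of moov, which takes ~2.4 s to fetch at 1 MB/s (1000 KB/s here).
fun moovLoadSeconds(durationMinutes: Double, moovKbPerMinute: Double = 40.0, networkKBps: Double = 1000.0): Double =
    durationMinutes * moovKbPerMinute / networkKBps

// moovLoadSeconds(60.0) ≈ 2.4 s with the defaults above.
```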

In addition, long videos often have pre-roll ads. Combined with pre-rendering, we can also preload the first frame of the main feature while the pre-roll ad is playing.

Resuming from historical progress

Keyframe start

Long-video players typically remember the historical progress and resume playback from it. The usual implementation seeks to the latest keyframe before the saved progress, feeds the video frames into the decoder, and drops the decoded frames until the PTS reaches the saved position. Assuming a bitrate of 4 Mbps and a GOP size of 5 s, the worst case in this scenario downloads an extra 4 × 5 = 20 Mb of data just to start. If we instead allow playback to start directly from that keyframe, this extra download is avoided and the first frame time is significantly shortened.
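A minimal sketch of keyframe-aligned resume on Android is below, using MediaExtractor's sync-frame seek mode; how the small backward offset is surfaced to the UI is up to the product.

```kotlin
import android.media.MediaExtractor

// Start playback directly at the nearest keyframe before the saved progress,
// instead of decoding and dropping frames up to the exact history position.
fun seekForResume(extractor: MediaExtractor, historyPositionUs: Long): Long {
    extractor.seekTo(historyPositionUs, MediaExtractor.SEEK_TO_PREVIOUS_SYNC)
    // The backward offset is at most one GOP and is usually imperceptible,
    // while up to one GOP of extra download and decode work is avoided.
    return extractor.sampleTime
}
```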

Conclusion

This article has walked through the optimizations available at each stage of the first frame, and briefly introduced two powerful tools for first frame optimization: preloading and pre-rendering. Finally, it described scenario-specific optimizations for long videos and for playback that resumes from historical progress.