Short video caters to people's need for entertainment in fragmented moments of time, or to an environment that chases quick thrills. I am a little addicted to short video myself and open some short video app whenever I have nothing to do, to the point of neglecting a certain headlines app and other news apps, which I now hardly open. In the time spent, I could have read the Bible several times over. Of course, this is a technical article; the psychological, sociological, and product questions are not discussed here, and we are not at that level anyway.
Without further ado: we have also launched short-video products.
In the short video experience, start-up time is undoubtedly one of the indicators that affects users most, because short videos are very short, ranging from about ten seconds to a few minutes. If a ten-second video takes three seconds to load, the experience is bound to be bad. Therefore, when the product was first defined, the target start-up time was set at about 1 second, with most starts falling within 1 second. The industry calls this “second play”, and reaching it requires optimizing the playback start process. Well, we're finally getting to the point.
Before we get to specific optimizations, let's look at the general process of playing a short video.
Then, by breaking the process down, we find the modules or points that can be optimized, connect them, and arrive at an optimization scheme.
As pictured above, the mobile player takes the domain name from the video URL, asks the DNS service for the IP address, opens a TCP connection to the video server at that address, establishes an HTTP connection on top of it, and requests the data. The data is then handed to the player for parsing and audio/video decoding, and the user finally sees the picture and hears the sound. That completes one playback start.
To better locate what can be optimized, we break the figure above down further and, based on the actual situation, arrive at the following figure.
The blue parts in the figure can be optimized in principle, but in practice Flyme only integrates the content on the client, while the content itself sits on the servers of the CP (content provider). There is room for optimization, but Flyme cannot carry it out; we will still describe that room later. This is the status quo of the industry, but wouldn't it be better to have access to a few more content options? The gray parts in the figure cannot be optimized. The process itself leaves no room for optimization, and these parts are easily affected by network conditions. The optimizations discussed below therefore assume mostly normal network conditions. Some of the logic also applies to extreme network conditions, but that is not the focus of our discussion. The green parts of the figure can be optimized and have been implemented in the actual project.
Each of these items is described below.
One, DNS: domain name resolution
Cause of the delay:
The DNS request packet is first sent to the local DNS server. If the name is not found there, the query is forwarded recursively all the way up to the root DNS servers, which is time-consuming. Results are generally cached, but the cache lifetime is very short and has to be refreshed by new requests, so the resolution time is highly uncertain.
Solution:
1. For compatibility, ffmpeg sets the address family for DNS resolution to AF_UNSPEC, which requests both IPv4 and IPv6 addresses. In our actual projects, however, there are no IPv6 addresses, so the IPv6 lookup goes all the way up to the root DNS servers without finding anything, which wastes a lot of time. Setting the family to AF_INET requests only IPv4 addresses and saves more than half of the time. The first request, or the first request after the cache expires, then takes roughly tens of milliseconds to 100 milliseconds. You can verify this by measuring the elapsed time of the getaddrinfo function.
hints.ai_family = AF_INET;
getaddrinfo(hostname, portstr, &hints, &ai);
2. Preset the IP address or resolve DNS in advance. For second play, 100 milliseconds is still a lot of time. The solution is to resolve the domain name ahead of time and use the IP address directly on every playback. This scheme has limits, though: it may suit a fixed set of audio and video streams, but short-video playback addresses vary widely, which makes it hard to operate. In addition, CPs shift traffic and the CPs we integrate get replaced, so for now these 100 ms have to stay where they are.
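To compare the two address families, the elapsed time of getaddrinfo can be measured directly. The following is a minimal sketch, assuming a POSIX system; the helper name timed_resolve is hypothetical and is not part of ffmpeg.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: resolve `host` with the given address family
 * (AF_INET or AF_UNSPEC) and store the elapsed time in *elapsed_ms.
 * Returns the getaddrinfo result code (0 on success). */
int timed_resolve(const char *host, const char *port, int family, double *elapsed_ms)
{
    struct addrinfo hints, *ai = NULL;
    struct timespec t0, t1;
    int rc;

    assert(elapsed_ms != NULL);
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = family;      /* AF_INET skips the IPv6 lookup */
    hints.ai_socktype = SOCK_STREAM;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    rc = getaddrinfo(host, port, &hints, &ai);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *elapsed_ms = (t1.tv_sec - t0.tv_sec) * 1000.0
                + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    if (rc == 0)
        freeaddrinfo(ai);
    return rc;
}
```

Calling it once with AF_UNSPEC and once with AF_INET for the same host name shows the difference on a given network. Repeated calls may be answered from a cache, so only the first call after the cache expires is representative.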
Two, Socket buffer
Cause of the delay:
A socket has the concept of a buffer. The sender first writes data into a buffer, and on the receiving side the data first arrives in a buffer and is then read from it. The mobile device is the receiver. If its receive buffer is too small, efficiency suffers. If it is too large, bandwidth is used up in a short burst; when bandwidth is insufficient this causes network transmission problems, and traffic is wasted. Both cases affect the timely delivery of first-screen data.
Solution:
Adjust the receive buffer size according to the actual situation, and arrive at a reasonable value through calculation and test data. It can be adjusted in ffmpeg's network and TCP code, which is a fairly low-level modification. For generality, you can extend the HTTP/TCP options and pass the relevant parameters through from the avformat API layer using the AVDictionary mechanism provided by ffmpeg.
av_dict_set(&avDictionary, "param", "value of param", 0);
setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &len, sizeof(len));
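When tuning the receive buffer, it helps to read the value back, because the kernel may round or cap what was requested. A minimal sketch, assuming a POSIX socket API; the helper name set_recv_buffer is hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: request a receive buffer of `bytes` on socket `fd`
 * and return the size actually in effect, or -1 on error. */
int set_recv_buffer(int fd, int bytes)
{
    int effective = 0;
    socklen_t len = sizeof(effective);

    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) != 0)
        return -1;
    /* Read the value back: it can differ from the requested one. */
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) != 0)
        return -1;
    return effective;
}
```

Logging the returned value next to the measured first-screen time for several candidate sizes is one way to collect the test data mentioned above.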
Three, Probe buffer
Cause of the delay:
At the start, the player knows nothing about the video it is about to play, such as the container format, resolution, or audio and video codecs. It has to read a piece of data and probe it to obtain this information, and it needs a buffer to hold the probe data. If the buffer is too small, the information cannot be worked out and probing has to be repeated. If the buffer is too large, more time is spent receiving the stream and the first screen is delayed. Either way, delay is introduced.
Solution:
Adjust the buffer according to the actual situation, and arrive at a reasonable value through calculation and test data. The buffer limits can be passed through via probesize and max_analyze_duration of the ffmpeg AVFormatContext structure. You can evaluate the probe time by measuring the time spent in the avformat_find_stream_info function.
AVFormatContext->probesize = n;
AVFormatContext->max_analyze_duration = m*AV_TIME_BASE;
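As a rough aid for choosing a probe size, the time needed just to download the probe data can be estimated from the link speed. This is a back-of-the-envelope sketch with a hypothetical helper name; it assumes the probe reads the full buffer and ignores connection setup and parsing time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: milliseconds needed to download `probe_bytes`
 * over a link of `kbps` kilobits per second.
 * bytes * 8 bits / (kbps * 1000 bits/s) * 1000 ms/s = bytes * 8 / kbps */
double probe_download_ms(int64_t probe_bytes, double kbps)
{
    assert(probe_bytes >= 0 && kbps > 0.0);
    return (double)probe_bytes * 8.0 / kbps;
}
```

For example, with illustrative numbers, 1,000,000 bytes of probe data on an 8,000 kbps link cost 1000 ms, which already uses up the whole one-second budget, while 64 KiB costs about 66 ms.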
Four, Probe list
Cause of the delay:
This is also part of the probing process. At the start, the player does not know the format of the data, so each supported format is probed to produce a score, and the matching format is then chosen based on the scores, similar to sniffing in Android. The more formats ffmpeg supports, the longer the probe list and the longer the probing takes. For short videos, the format of the CP's content is basically fixed: MP4 + H264/H265 (uncommon) + AAC. Probing for so many formats is unnecessary.
Solution:
Remove the formats you do not need in the ffmpeg build configuration and keep only the required ones, such as MP4, to keep the probe list as short as possible. You can evaluate the probe time by measuring the time spent in the avformat_find_stream_info function. For example: disable avi, disable asf, disable mkv, and so on…
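The cost of a long probe list can be illustrated with a toy version of score-based probing. This is a simplified sketch, not ffmpeg's actual code: the Format type, the two probe functions, and the lists are all hypothetical, and the signature checks only cover the common case (MP4 files usually begin with an ftyp box, AVI files with a RIFF header).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    int (*probe)(const unsigned char *buf, size_t len); /* score, 0 = no match */
} Format;

/* Hypothetical probe: box type "ftyp" at offset 4. */
int probe_mp4(const unsigned char *buf, size_t len)
{
    return (len >= 8 && memcmp(buf + 4, "ftyp", 4) == 0) ? 100 : 0;
}

/* Hypothetical probe: "RIFF" at offset 0 and "AVI " at offset 8. */
int probe_avi(const unsigned char *buf, size_t len)
{
    return (len >= 12 && memcmp(buf, "RIFF", 4) == 0 &&
            memcmp(buf + 8, "AVI ", 4) == 0) ? 100 : 0;
}

const Format FULL_LIST[] = { { "avi", probe_avi }, { "mp4", probe_mp4 } };
const Format MP4_ONLY[]  = { { "mp4", probe_mp4 } };

/* Run every probe in the list and return the best-scoring format, or NULL.
 * *probes_run receives the number of probe calls, which grows with the list. */
const Format *probe_best(const Format *list, size_t count,
                         const unsigned char *buf, size_t len, size_t *probes_run)
{
    const Format *best = NULL;
    int best_score = 0;

    assert(probes_run != NULL);
    *probes_run = 0;
    for (size_t i = 0; i < count; i++) {
        int score = list[i].probe(buf, len);
        (*probes_run)++;
        if (score > best_score) {
            best_score = score;
            best = &list[i];
        }
    }
    return best;
}
```

Every entry in the list costs one probe call per file, so a list trimmed to the formats the CP actually delivers does the same job with fewer calls.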
Five, Player buffer
Cause of the delay:
For non-live players, it is common practice to design a buffer into the player for smooth playback and audio/video synchronization. This buffer becomes especially important when the network is unstable or poor. Its size is usually set by frame count; some players set it to 1 to 2 seconds, others to 3 to 5 seconds. For online movies and TV series, where the whole viewing experience lasts tens of minutes or even hours, buffering for a few seconds at the start is acceptable. In the short video scenario it is not.
Solution:
Optimize the strategy: make sure the first video frame is output as early as possible, and move the buffering mechanism to after the first screen has been shown. Audio also has to be taken care of here to keep audio and video in sync, so some trade-offs have to be made. This is actually a very important part. The design of the Android NuPlayer framework is constrained by these factors, so its start-up time is far from meeting the requirement. Then there is ExoPlayer, but without secondary development ExoPlayer still cannot meet the requirement either. The solution has to be designed around your own playback framework. We use Normandie, a playback framework we developed ourselves, which has been online for nearly two years and supports multiple audio and video businesses. We will not elaborate on it here.
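One way to picture "first frame first, buffering afterwards" is as a small state machine. The sketch below is an interpretation of the strategy and not the Normandie implementation; the phase names and the frame-count threshold are hypothetical, and audio synchronization is left out.

```c
#include <assert.h>

typedef enum {
    WAIT_FIRST_FRAME, /* nothing shown yet */
    BUFFERING,        /* first frame is on screen, buffer is filling */
    PLAYING           /* normal playback */
} Phase;

/* Hypothetical policy: given the current phase and the number of decoded
 * frames waiting, return the next phase. */
Phase next_phase(Phase phase, int buffered_frames, int threshold)
{
    assert(buffered_frames >= 0 && threshold > 0);
    switch (phase) {
    case WAIT_FIRST_FRAME:
        /* Show the first frame as soon as one exists; do not wait for
         * the buffer to fill. */
        return buffered_frames >= 1 ? BUFFERING : WAIT_FIRST_FRAME;
    case BUFFERING:
        return buffered_frames >= threshold ? PLAYING : BUFFERING;
    case PLAYING:
        /* Fall back to buffering only when the buffer runs dry. */
        return buffered_frames == 0 ? BUFFERING : PLAYING;
    }
    return phase;
}
```

A conventional player would stay in its initial state until the threshold is reached; here the threshold only applies once the first picture is already on screen.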
Six, MP4 Size: Resolution/QP/I-frame position
Cause of the delay:
Resolution is easy to understand. If the resolution of a video file is very high, each frame carries a lot of data and takes correspondingly longer to transmit. Choosing an appropriate resolution when recording or transcoding content therefore also means taking the load on the player into account. On mobile, about 720P is enough for short videos such as personal shows, and the resolution of content-aggregation short videos can be lower. QP is the quantization parameter, which is closely tied to encoding and determines image quality: at the same resolution an image can be encoded at many QP levels. The lower the QP, the higher the image quality, the higher the bit rate, and the longer the transmission time. Likewise, higher quality is not always better. For 720P videos without rapid scene changes, there is little visible difference between bit rates of 3 Mbps and 5 Mbps. Choose the QP that balances picture quality against transmission time.
I-frame position refers to where the first I-frame sits at the start of the video file. To avoid problems such as a corrupted picture, a player generally looks for the first I-frame before decoding when it starts playback or seeks. Video files generally have 25 to 30 frames per second, so it clearly makes a difference whether the I-frame is the first frame or the last frame of that first second.
Solution:
According to the actual situation, choose an appropriate resolution and QP in the product's service chain, and put an I-frame at the very beginning of the file.
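Both effects can be put into rough numbers. The two helpers below are hypothetical and use simple arithmetic only: one gives the duration of video that lies before the first I-frame, the other the time to transfer a stretch of video at a given bit rate over a given link.

```c
#include <assert.h>

/* Hypothetical helper: milliseconds of video that precede the first I-frame,
 * where `iframe_index` is the zero-based index of that frame. */
double leading_ms_before_iframe(int iframe_index, double fps)
{
    assert(iframe_index >= 0 && fps > 0.0);
    return iframe_index * 1000.0 / fps;
}

/* Hypothetical helper: milliseconds needed to transfer `seconds` of video
 * encoded at `video_kbps` over a link of `link_kbps`. */
double transfer_ms(double seconds, double video_kbps, double link_kbps)
{
    assert(seconds >= 0.0 && video_kbps >= 0.0 && link_kbps > 0.0);
    return seconds * video_kbps * 1000.0 / link_kbps;
}
```

With illustrative numbers: at 25 frames per second, an I-frame at index 24 leaves 960 ms of undecodable data ahead of it, and on a 10,000 kbps link the first second of video takes 300 ms to arrive at 3,000 kbps versus 500 ms at 5,000 kbps.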
Seven, MP4 MOOV Box Position & HTTP Re-Connection
Cause of the delay:
If an HTTP re-connection occurs during playback start, the time spent on HTTP connections inevitably increases, and the HTTP connection time itself can hardly be optimized. HTTP re-connections must therefore be avoided. MP4 is the mainstream container format for short video, and the position of the MOOV box in the file directly determines whether an HTTP re-connection occurs. To put it plainly: if the MOOV box sits at the end of the file, that is, after the MDAT box, an HTTP re-connection occurs and introduces delay.
Solution:
1. Put the MOOV box at the front when uploading MP4 files.
2. After MP4 files are uploaded to the server, repackage them and move the MOOV box to the front.
If you are a traditional audio and video engineer, you should be familiar with the MOOV box. Here is a brief introduction. An MP4 file is made up of many boxes. One of them is the MOOV box, which stores the information needed before playback can start, such as the audio and video encoding. When the MOOV box is at the front of the file, a single HTTP request is enough to read it, parse it, and continue reading the audio and video data for playback. When the MOOV box is at the end of the file, the player does not know this in advance. It issues an HTTP request to read the beginning of the file, finds no MOOV box there, seeks to the end of the file, reads the MOOV box, and then seeks back to the front to read the audio and video frame data from the start position. Every seek issues a new HTTP connection request. With the MOOV box at the back, there are therefore two more HTTP connections than with the MOOV box at the front, and connecting takes about three times as long. The following are two short videos: one with the MOOV box at the front and the other with it at the back. When the MOOV box is at the back, the NmdPlayer log shows that a re-connection occurs, requesting the data at the end of the file to obtain the MOOV box.
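Whether a file has its MOOV box at the front can be checked by walking the top-level boxes: each box starts with a 4-byte big-endian size followed by a 4-byte type, and a size of 1 means that a 64-bit size follows the type. The function below is a minimal sketch of that walk over an in-memory buffer; its name and return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a big-endian 32-bit integer. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Walk the top-level boxes in `buf`.
 * Returns 1 if "moov" appears before "mdat", 0 if "mdat" appears first,
 * and -1 if neither is found in the bytes available. */
int moov_before_mdat(const unsigned char *buf, size_t len)
{
    size_t pos = 0;

    assert(buf != NULL);
    while (len - pos >= 8) {
        uint64_t size = be32(buf + pos);
        const unsigned char *type = buf + pos + 4;

        if (memcmp(type, "moov", 4) == 0) return 1;
        if (memcmp(type, "mdat", 4) == 0) return 0;

        if (size == 1) {                 /* 64-bit size follows the type */
            if (len - pos < 16) break;
            size = ((uint64_t)be32(buf + pos + 8) << 32) | be32(buf + pos + 12);
            if (size < 16) break;        /* malformed */
        } else if (size < 8) {
            break;                       /* 0 = runs to end of file, or malformed */
        }
        if (size > len - pos) break;     /* next box lies beyond the buffer */
        pos += (size_t)size;
    }
    return -1;
}
```

Running such a check on the first bytes of an uploaded file tells the server whether repackaging is needed. For the repackaging itself, ffmpeg's MP4 muxer offers the option -movflags +faststart, which moves the MOOV box to the front.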
Eight, Server/CDN
Cause of the delay:
CDN node deployment, routing policy, caching, and stream pulling all affect the delay.
Solution:
Carry out the corresponding optimizations on the server side.
Nine, TCP connection
Cause of the delay:
Protocol time, such as the TCP handshake. It is fixed on a stable network and long on a poor network.
Solution:
Deploying the CDN on the backbone network can improve this situation.
Ten, HTTP connection
Cause of the delay:
Protocol time. It is fixed on a stable network and long on a poor network.
Solution:
Deploying the CDN on the backbone network can improve this situation.
For engineers, there is nothing profound here. What we need to do is grasp the core of the problem, break a big problem down into small ones, complete the small goals one by one, and let the result come together in the end. Many things work this way: treat it as a technical problem, break it down into small problems, and complete a small goal first, such as earning 100 million.
We too have achieved a small goal: after optimization, the start-up time of most short videos falls within 1 second, reaching the industry's “second play” level.
One more thing… The pursuit of the ultimate experience has always been our goal, and the ultimate experience of every function and every detail adds up to the ultimate product.
1. Based on the analysis above, we believe there is still at least 100-200 ms of room for optimization.
2. Our next plan is to use big data to monitor the cases that do not fall within 1 second, and to drive the optimization of those cases based on the data analysis reports. That is our next small goal.
I recorded a video a while ago of playing a short video online at a resolution of 960×720, which is relatively high. Personal-show apps such as Kuaishou and Douyin use about the same resolution, with Douyin possibly a little lower. Content aggregators such as Xigua use much lower resolutions, mostly 640×360 or 480P.
S70901-18153290: http://v.youku.com/v_show/id_XMzU0MTg3MzE0NA==.html
Before optimization (undecoupled Android framework)
slow: http://v.youku.com/v_show/id_XMzU0MTg3NTU5Ng==.html
After optimization (decoupled Normandie framework + short-video-specific optimizations)
fast: http://v.youku.com/v_show/id_XMzU0MTg3NTE0NA==.html
Original author: Walker. Xu. Original link: https://segmentfault.com/a/1190000014405913