Background
With the development of technology and changes in how people live and entertain themselves, video consumption has become an indispensable part of daily life. Users now watch videos anytime and anywhere, so their network environment can be complicated. To keep video streaming working in difficult scenarios such as low bandwidth, weak connectivity, and frequent network jitter, adaptive streaming emerged. Adaptive streaming is both an idea and a set of techniques that can give users a better playback experience under complex network conditions.
Take a simple example: by simulating changes in network speed, we can observe how the automatically selected quality changes. When network speed is throttled, playback quality drops from 1080P to 360P to reduce the chance of rebuffering and keep playback as smooth as possible. When the throttle is lifted, the quality is gradually raised again to play higher-definition video and improve the viewing experience.
Adaptive streaming media transmission
QoE
Quality of Experience (QoE) refers to a user's overall subjective perception of the quality and performance of a device, network, system, application, or service. In video playback scenarios, the experience is usually measured by startup time, clarity, and smoothness. The QoE goals pursued by adaptive bit rate strategies generally include minimizing the number and frequency of rebuffering events, maximizing the average playback bit rate, minimizing the frequency and amplitude of bit rate switches, and minimizing startup delay.
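These goals are often combined into a single score that an ABR algorithm tries to maximize. The following is a minimal sketch, not a standard formula; the metric names and all weights are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class PlaybackStats:
    avg_bitrate_kbps: float   # average video bit rate actually played
    rebuffer_seconds: float   # total time spent stalled
    switch_count: int         # number of bit rate switches
    startup_seconds: float    # time from click to first frame

def qoe_score(s: PlaybackStats,
              w_bitrate: float = 1.0,     # illustrative weights only
              w_rebuffer: float = 50.0,
              w_switch: float = 2.0,
              w_startup: float = 10.0) -> float:
    """Reward average bit rate; penalize stalls, switches, and slow startup."""
    return (w_bitrate * s.avg_bitrate_kbps / 1000.0
            - w_rebuffer * s.rebuffer_seconds
            - w_switch * s.switch_count
            - w_startup * s.startup_seconds)
```

With these weights, a session that plays a slightly lower bit rate but never stalls scores far better than one that rebuffers, matching the priority order described above.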
Streaming media transmission technology
Streaming media
Streaming media refers to the technology and process of compressing a series of media data, sending it over the Internet in segments, and delivering video and audio for immediate viewing. Without this technique, the entire media file would have to be downloaded before it could be used.
Streaming media is not a new kind of media but a mode of media transmission: audio streams, video streams, text streams, image streams, animation streams, and so on. Its defining technical feature is streaming transmission, which lets data flow continuously like water; "streaming" is the general term for transmitting media over a network in this way.
Adaptive bit rate
What is Adaptive Bit-Rate (ABR)?
To adapt to the varying network bandwidth of different users, avoid stalls during playback, and preserve viewing quality, most video products adopt an HTTP-based adaptive bit rate (ABR) strategy (e.g. Bilibili, Youku, YouTube), and different products implement ABR differently. Simply put, the player offers the user an "auto" quality level and, based on real-time playback conditions, selects the most suitable quality on the user's behalf.
Streaming transmission implementation
There are two main ways to implement streaming: real-time streaming and progressive streaming.
- Real-time streaming
For example: RTSP (Real Time Streaming Protocol), RTMP (Real Time Messaging Protocol)
Real-time streaming protocols offer low latency, high stability, broad codec compatibility, and good security. Their disadvantages are that a dedicated streaming media server must be deployed and that the traffic risks being blocked by firewalls.
- Progressive HTTP streaming
Media data is transmitted from the server to the client over HTTP. Instead of waiting for the entire file to download, the player only needs the metadata and part of the audio/video data to start playing, commonly called "play while downloading".
Advantages: no additional server deployment is required. Disadvantages: relatively high startup delay and lower security.
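To sketch the "play while downloading" idea, a progressive-download client typically fetches the file piecewise with HTTP Range requests so playback can begin before the download finishes. The helper below only builds the `Range` headers such a client would send; the chunk size is an arbitrary choice for illustration:

```python
def plan_range_requests(total_size: int, chunk_size: int) -> list:
    """Build the Range headers a progressive-download client would send,
    covering the file in order so early chunks can play while later
    chunks are still in flight."""
    headers = []
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1  # Range ends are inclusive
        headers.append(f"bytes={start}-{end}")
    return headers
```

The client issues these requests in order and can start decoding as soon as the metadata and the first chunk have arrived.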
Development history of online video playback logic
Adaptive HTTP streaming media transmission
The core of the adaptive multi-bit-rate scheme is to dynamically adjust the clarity (bit rate) of the requested video stream according to network and playback state, striking a balance between smoothness and clarity to maximize the user experience. HTTP-based adaptation works by splitting the whole stream into many small HTTP-downloadable files, fetched a few at a time. While the media plays, the client can download the same content from different alternate renditions at different rates, allowing the streaming session to adapt to changing throughput.
Key concepts
- Multi-bit rate: renditions of different clarities (bit rates) are generated from the same video source.
- Segmentation: each rendition is cut into fixed-length segments, producing multiple video fragments.
- Segment alignment: segments of different clarities at the same timestamp contain the same content.
The intuitive effect of adaptation is shown in the figure below:
Image from https://mp.weixin.qq.com/s/thnhhbw2ieFywCFSCHXyGQ
Process
- Host live-stream upload / user video source
- Server-side encoding and packaging
- Streaming media distribution
  - Description file (manifest)
  - Segmented media files
- Client
  - Loads and parses the description file to build segment download URLs
  - Selects and loads the appropriate video segments for playback based on current network speed and supported codecs
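The last client step — choosing a rendition from the manifest based on measured throughput — can be sketched as below. The 0.8 safety factor is an assumption for illustration, not a standard value:

```python
def pick_representation(bitrates_bps, measured_bps, safety=0.8):
    """Choose the highest rendition bit rate that fits within a safety
    margin of the measured throughput; fall back to the lowest one."""
    usable = safety * measured_bps
    candidates = [b for b in sorted(bitrates_bps) if b <= usable]
    return candidates[-1] if candidates else min(bitrates_bps)
```

For instance, with the three video bandwidths from the manifest later in this article (864691, 1371070, 1973187 bps) and 2 Mbps of measured throughput, the middle rendition is chosen, since the top one exceeds 80% of the measurement.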
Comparison of adaptive protocols based on HTTP
The main HTTP-based adaptive protocols are listed below. HLS appeared first, proposed by Apple, mainly to solve some problems of RTMP. For example, RTMP does not transmit data over standard HTTP interfaces, so it may be blocked by firewalls in certain network environments. HLS is based on HTTP and is easy to deliver over a CDN (Content Delivery Network). Later HDS, MSS, and DASH emerged, of which MPEG-DASH is an international standard. The protocols share the same basic idea and differ only in implementation details.
Apple HTTP Live Streaming (HLS)
Adobe HTTP Dynamic Streaming (HDS)
Microsoft Smooth Streaming (MSS)
MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH)
Example: the DASH protocol
A DASH description file (Media Presentation Description, MPD) describes the duration, encoding, clarity, and encryption of the audio and video segments. By parsing it, the client obtains the set of available renditions and the URL template rules, from which the concrete segment addresses are resolved.
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" xmlns:cenc="urn:mpeg:cenc:2013" minBufferTime="PT5.00S" type="static" mediaPresentationDuration="PT1H22M27.000S" profiles="urn:mpeg:dash:profile:isoff-live:2011">
<ProgramInformation moreInformationURL="http://gpac.sourceforge.net">
<Title>Merged MPD</Title>
</ProgramInformation>
<Period>
<AdaptationSet segmentAlignment="true" bitstreamSwitching="true" maxWidth="688" maxHeight="382" mimeType="video/mp4">
<Representation id="1" codecs="avc1.640015" width="572" height="320" frameRate="25" bandwidth="1371070" scanType="progressive">
<BaseURL>tos-cn-vd-0026/abd43ed9655c493ab9fa7a51ef696a92/</BaseURL>
<SegmentTemplate timescale="1000" media="1_$Number%04d$" startNumber="1" duration="5000" initialization="1_init"></SegmentTemplate>
<ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc" cenc:default_KID="5ef57228-d0bb-309b-c310-b6670002043b"></ContentProtection>
</Representation>
<Representation id="2" codecs="avc1.640015" width="430" height="240" frameRate="25" bandwidth="864691" scanType="progressive">
<BaseURL>tos-cn-vd-0026/8970023cc97246c8861313657dd9d028/</BaseURL>
<SegmentTemplate timescale="1000" media="1_$Number%04d$" startNumber="1" duration="5000" initialization="1_init"></SegmentTemplate>
<ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc" cenc:default_KID="5ef57228-d0bb-309b-c310-b6670002043b"></ContentProtection>
</Representation>
<Representation id="3" codecs="avc1.64001E" width="688" height="382" frameRate="25" bandwidth="1973187" scanType="progressive">
<BaseURL>tos-cn-vd-0026/23f05dc6987c46b69c92fc21636ca748/</BaseURL>
<SegmentTemplate timescale="1000" media="1_$Number%04d$" startNumber="1" duration="5000" initialization="1_init"></SegmentTemplate>
<ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc" cenc:default_KID="5ef57228-d0bb-309b-c310-b6670002043b"></ContentProtection>
</Representation>
</AdaptationSet>
<AdaptationSet segmentAlignment="true" mimeType="audio/mp4">
<Representation id="4" codecs="mp4a.40.2" bandwidth="70947">
<BaseURL>tos-cn-vd-0026/abd43ed9655c493ab9fa7a51ef696a92/</BaseURL>
<SegmentTemplate timescale="1000" media="2_$Number%04d$" startNumber="1" duration="5000" initialization="2_init"></SegmentTemplate>
<AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="2"></AudioChannelConfiguration>
</Representation>
<Representation id="5" codecs="mp4a.40.2" bandwidth="70947">
<BaseURL>tos-cn-vd-0026/8970023cc97246c8861313657dd9d028/</BaseURL>
<SegmentTemplate timescale="1000" media="2_$Number%04d$" startNumber="1" duration="5000" initialization="2_init"></SegmentTemplate>
<AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="2"></AudioChannelConfiguration>
</Representation>
<Representation id="6" codecs="mp4a.40.2" bandwidth="102853">
<BaseURL>tos-cn-vd-0026/23f05dc6987c46b69c92fc21636ca748/</BaseURL>
<SegmentTemplate timescale="1000" media="2_$Number%04d$" startNumber="1" duration="5000" initialization="2_init"></SegmentTemplate>
<AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="2"></AudioChannelConfiguration>
</Representation>
</AdaptationSet>
</Period>
</MPD>
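Such a manifest can be inspected programmatically. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming only the MPD namespace shown above, lists each Representation with its bit rate:

```python
import xml.etree.ElementTree as ET

# The default namespace declared on the <MPD> root element above.
NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

def list_representations(mpd_xml: str):
    """Return one dict per Representation in the manifest, carrying the
    attributes an ABR client needs to build its bit rate ladder."""
    root = ET.fromstring(mpd_xml)
    reps = []
    for aset in root.findall(".//mpd:AdaptationSet", NS):
        mime = aset.get("mimeType")
        for rep in aset.findall("mpd:Representation", NS):
            reps.append({
                "id": rep.get("id"),
                "mimeType": mime,
                "bandwidth": int(rep.get("bandwidth")),
                "width": rep.get("width"),    # None for audio renditions
                "height": rep.get("height"),
            })
    return reps
```

Sorting the returned `bandwidth` values per `mimeType` yields exactly the ladder the client switches across.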
Bit rate adaptive policy
HTTP-based dynamic adaptive streaming has become the dominant technology for online video delivery. From a system perspective, adaptive bit rate strategies fall into several categories:
- Adaptive bit rate strategy based on client
- Server-based adaptive bit rate strategy
- Network-information-based adaptive bit rate strategy (taking network-layer information into account)
- Mixed-mode adaptive bit rate strategy (combining one or more of the client-based, server-based, and network-information-based strategies)
The specific categories are shown in the figure below:
Client-based adaptive bit rate strategy
A client-based adaptive bit rate strategy generally monitors information such as the client's available bandwidth and the player's buffer occupancy, and switches to the appropriate bit rate segments to adapt to bandwidth fluctuations and improve the user's QoE.
Client-based adaptive bit rate policies can also be divided into the following categories:
- Bandwidth-prediction-based methods: the client's available bandwidth is predicted and used as the basis for choosing the bit rate of the next segment. Because bandwidth prediction is inaccurate, these methods generally struggle to deliver high QoE and tend to cause frequent buffer jitter, degrading viewing quality. There are usually three ways to estimate the client's available bandwidth:
  - Last-segment bandwidth (LSB)
  - Session-average bandwidth (SAB)
  - Sliding-window average bandwidth (WAB)
  In video-on-demand, the description file supplies the bit rate of each rendition; a higher bit rate is chosen when the predicted bandwidth is high and a lower one when it is low.
- Buffer-based methods: the client uses the player's buffer occupancy as the criterion for selecting the bit rate of the next segment.
- Hybrid methods: the bit rate decision jointly considers available bandwidth, buffer occupancy, playback speed, player viewport size, and other factors.
Image from https://zhuanlan.zhihu.com/p/160774505
- Machine-learning-based methods: big data, reinforcement learning, deep learning, and similar techniques are used to train a model that then makes the decisions. Compared with the heuristic strategies above, which control rate selection through hand-crafted models or rules, machine learning can "autonomously" learn an ABR algorithm adapted to the current network state.
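As an illustrative sketch (not a production algorithm), the three bandwidth estimators above and a simple buffer-threshold selector might look as follows; the window size and buffer thresholds are arbitrary values chosen for the example:

```python
def bw_last_segment(samples_bps):
    """LSB: use the throughput of the most recent segment download."""
    return samples_bps[-1]

def bw_session_average(samples_bps):
    """SAB: average throughput over the whole session."""
    return sum(samples_bps) / len(samples_bps)

def bw_sliding_window(samples_bps, window=3):
    """WAB: average over the last `window` segment downloads."""
    recent = samples_bps[-window:]
    return sum(recent) / len(recent)

def pick_by_buffer(buffer_s, bitrates_bps, low=10.0, high=30.0):
    """Buffer-based rule of thumb: a nearly empty buffer forces the lowest
    rate, a full one allows the highest, otherwise hold a middle rate."""
    ladder = sorted(bitrates_bps)
    if buffer_s < low:
        return ladder[0]
    if buffer_s > high:
        return ladder[-1]
    return ladder[len(ladder) // 2]
```

Note how the estimators trade responsiveness for stability: LSB reacts instantly to throughput spikes, SAB smooths them away entirely, and WAB sits in between, which is why hybrid strategies often combine such signals with buffer occupancy.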
Server-based adaptive bit rate strategy
In a server-based adaptive bit rate strategy, the server actively adjusts the bit rate, so adaptation needs no extra support from the client. In this server-driven approach, the HTTP server is responsible for the rate selection decisions of media distribution: based on the server's send buffer size or the playback buffer size reported by the client, the measured throughput of the link between server and client, and client capabilities such as resolution and processing power, the server chooses which bit rate of media segments to send. This is a "thin client, fat server" design: it reduces the complexity of client software development and simplifies client updates. However, the server must separately track the send buffer state of every client and collect the network state of every connection to decide which segments to send, which increases server load and hurts system scalability. Moreover, the server-driven method requires modifying the web server and obtaining support from the network service provider, which makes deployment harder. For these reasons, mainstream streaming media distribution systems mostly adopt client-driven rate selection.
Network-information-based adaptive bit rate strategy
A network-information-based strategy lets the client make its bit rate decisions using network-layer information. Typically, some network state is measured and the client is then notified to choose an appropriate bit rate. Collecting and processing this information requires dedicated modules (such as network proxies), but the method can fully exploit network information to make better rate selection decisions.
Mixed-mode bit rate adaptive strategy
A mixed-mode adaptive bit rate strategy gathers a wide range of information from the network, the client, and the server to help the client make a better bit rate decision.
Conclusion
Adaptive streaming can itself deliver a better user experience, but in practice different products emphasize different QoE goals. For example, a product may need to cap the number of rebuffering events while still showing users the highest possible definition; such requirements in turn drive adjustments to the adaptive algorithm.