Background

MP4 is the video format we use most often, and a common approach is to request the MP4 source from the server and play it directly. In practice this is not ideal. The MP4 header (the ftyp and moov boxes) can be large, and initial playback has to download and parse the complete header before it can download a playable stretch of video. As the video gets longer, the header gets bigger and startup takes longer. We need a way to speed up this initial parsing, and HLS is Apple's solution to the problem.
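To get a feel for how much of a file the header occupies, we can walk the top-level boxes of an MP4 and print their sizes. The following is a minimal sketch, assuming the whole file (or at least its box headers) is already in memory; the function name top_level_boxes is made up for this example.

```python
import struct

def top_level_boxes(data: bytes):
    """Yield (box_type, size_in_bytes) for each top-level MP4 box in data."""
    offset = 0
    while offset + 8 <= len(data):
        # Every box starts with a 32-bit big-endian size and a 4-byte type.
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header_len = 8
        if size == 1:
            # size == 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header_len = 16
        elif size == 0:
            # size == 0 means the box runs to the end of the file.
            size = len(data) - offset
        if size < header_len:
            break  # malformed box, stop instead of looping forever
        yield box_type.decode("ascii", "replace"), size
        offset += size
```

For a real file, comparing the size reported for moov with the total file size shows how much must be fetched before the first frame can be decoded.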

HLS

HLS is short for HTTP Live Streaming, an HTTP-based streaming protocol proposed by Apple. It supports both live and on-demand playback, as well as multiple quality levels, separate audio and video tracks, and subtitles. The principle is to split a whole video into many small segments; playback is stitched together from these segments.

HLS is widely used on mobile devices. Clients that currently support HLS include:

  • iOS 3.0 and later (AVPlayer supports HLS natively)
  • Android 3.0 and later
  • Adobe Flash Player 11.0 and later

It works like this:

1. Capture audio and video

2. Encode the audio and video on the server

3. After encoding, deliver it to a stream segmenter as an MPEG-2 Transport Stream.

4. The segmenter creates an index file and a series of TS segments. The index file points to the audio and video locations, and the TS files are the actual media fragments

5. Place these resources on an HTTP server

6. The client requests the index file and uses it to locate the content to play
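The index-file half of step 4 can be sketched in a few lines. This is only an illustration, assuming the TS segments already exist and their durations are known; build_media_playlist and its parameters are names invented for this example, and the tags it writes are the M3U8 tags defined in RFC 8216.

```python
import math

def build_media_playlist(segments, first_sequence=1):
    """Build an on-demand index file.

    segments: list of (uri, duration_in_seconds) pairs in playback order.
    """
    # The target duration must be an integer no smaller than any segment.
    target = math.ceil(max(duration for _, duration in segments))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:4",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")  # duration of the next segment
        lines.append(uri)                          # path of the segment itself
    lines.append("#EXT-X-ENDLIST")                 # no more segments will follow
    return "\n".join(lines) + "\n"
```

Calling build_media_playlist([("0640_00001.ts", 6.006), ("0640_00002.ts", 6.006)]) produces a small playlist that could be placed on the HTTP server next to the two segments.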

HTTP Live Streaming Document

M3U8

A key step in implementing HLS is step 4 above: organizing the index files and TS segments. This is done in the M3U8 format. M3U8 is the Unicode version of M3U (the 8 stands for UTF-8 encoding); both M3U and M3U8 are multimedia playlist file formats.

Let’s use a WWDC video as an example to see what the M3U8 format looks like. Not all of the M3U8 format fields are shown below, but some of the commonly used fields are included, and are sufficient to help us understand the M3U8 format.

The playback page is Developer.apple.com/videos/play…. We can obtain the M3U8 files requested during playback by capturing the traffic with Charles.

Before analyzing the format, we need to know that M3U8 files come in two kinds. One is the Master Playlist, which contains descriptions of and paths to the audio, video, and subtitles. Each path in the Master Playlist points to another M3U8 file of the second kind, the Media Playlist. A Media Playlist also contains paths, which point to the segment (TS) files that hold the actual media content.

hls_vod_mvp.m3u8 is the master playlist file, and 0640.m3u8 is the video playlist file.

M3U8 Format description

Sometimes, for testing or in special cases, we may need to edit the contents of an M3U8 file by hand, so we need some understanding of its format. The format is defined in RFC 8216. Some points to note:

  • M3U8 files must be encoded in UTF-8, must not contain a byte order mark (BOM), and must not contain UTF-8 control characters (U+0000 to U+001F and U+007F to U+009F), except for CR and LF.
  • Each line of an M3U8 file is either blank, a URI, or a string that starts with #. Whitespace must not appear except where the format explicitly allows it.
  • Built-in tags all start with #EXT and are case sensitive. Other lines that start with # are comments.
  • A URI is a content path, which can be relative or absolute
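These rules are easy to check mechanically before hand-editing a playlist. Below is a minimal sketch of such a check; check_encoding and classify_line are names invented for this example, and the sketch covers only the rules listed above, not the whole of RFC 8216.

```python
BOM = b"\xef\xbb\xbf"

def check_encoding(data: bytes) -> str:
    """Return the decoded playlist text, or raise ValueError if a rule is broken."""
    if data.startswith(BOM):
        raise ValueError("playlist must not start with a byte order mark")
    text = data.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
    for ch in text:
        code = ord(ch)
        is_control = code <= 0x1F or 0x7F <= code <= 0x9F
        if is_control and ch not in "\r\n":  # CR and LF are the allowed exceptions
            raise ValueError(f"control character U+{code:04X} is not allowed")
    return text

def classify_line(line: str) -> str:
    """Classify one playlist line as 'blank', 'tag', 'comment', or 'uri'."""
    if line == "":
        return "blank"
    if line.startswith("#EXT"):  # tags are case sensitive, so no lowercasing here
        return "tag"
    if line.startswith("#"):
        return "comment"
    return "uri"
```

Running classify_line over the output of check_encoding(data).splitlines() gives a quick overview of what each line of a playlist is.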

Master M3U8 list file

The master M3U8 index file is typically used to list multiple index sources. Let's first analyze the contents of the master M3U8 file hls_vod_mvp.m3u8. Its header looks like this:

Header format

#EXTM3U
#EXT-X-VERSION:7
#EXT-X-INDEPENDENT-SEGMENTS

#EXTM3U indicates that the file is in extended M3U format; every M3U8 file must put this tag on the first line.

#EXT-X-VERSION indicates the compatibility version of the playlist, 7 in this case.

#EXT-X-INDEPENDENT-SEGMENTS indicates that all media samples in a media segment can be decoded independently, without relying on information from other segments.

Subtitle format

Further down are some subtitle entries. Subtitles are optional.

#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="eng",URI="subtitles/eng/prog_index.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subsC",NAME="English",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="eng",URI="subtitles/engc/prog_index.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Japanese",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="jpn",URI="subtitles/jpn/prog_index.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subsC",NAME="Japanese",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="jpn",URI="subtitles/jpnc/prog_index.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Chinese",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="zho",URI="subtitles/zho/prog_index.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subsC",NAME="Chinese",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="zho",URI="subtitles/zhoc/prog_index.m3u8"

#EXT-X-MEDIA specifies alternative renditions of the same content, such as different languages.

TYPE is the resource type. The options are AUDIO, VIDEO, SUBTITLES, and CLOSED-CAPTIONS.

The entries above use TYPE=SUBTITLES, that is, subtitles.

GROUP-ID specifies the group that the rendition belongs to. This attribute is required.

NAME is a human-readable description of the rendition, which corresponds to the displayName of AVMediaSelectionOption.

DEFAULT, AUTOSELECT, and FORCED are three boolean values. DEFAULT marks the rendition to play when the user has given no information indicating another choice. AUTOSELECT marks a rendition the player may choose automatically, when the user has made no explicit choice, because it matches the current environment (for example, the system language). FORCED applies only to the subtitle type and marks a rendition containing content considered essential for playback.

LANGUAGE specifies the language, set according to the ISO 639 language code standard (www.w3.org/WAI/ER/WD-A…). When the system default player shows the list of subtitles, the displayed names are derived from this value.

URI is the location of the resource, which in this case is a subtitle M3U8 file. subtitles/eng/prog_index.m3u8 is a relative path.

Based on the above, we can read the entries as follows: the video currently supports subtitles in three languages, English, Japanese, and Chinese. However, each language has two #EXT-X-MEDIA entries in different groups, one in subs and one in subsC. Why are there two groups? We'll come back to that later.
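Reading these entries by eye gets tedious, so here is a small sketch that turns an #EXT-X-MEDIA line into a dictionary and lists the subtitle languages in each group. The helper names parse_attributes and languages_by_group are invented for this example, and the regular expression is a simplification that assumes quoted values contain no double quotes.

```python
import re
from collections import defaultdict

# NAME=value pairs; a value is either a quoted string or runs up to the next comma.
ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')

def parse_attributes(tag_line: str) -> dict:
    """Parse the attribute list that follows the first ':' of a tag line."""
    _, _, attribute_list = tag_line.partition(":")
    attributes = {}
    for name, value in ATTRIBUTE.findall(attribute_list):
        attributes[name] = value.strip('"')  # drop surrounding quotes, if any
    return attributes

def languages_by_group(tag_lines):
    """Map each subtitle GROUP-ID to the LANGUAGE codes it contains."""
    groups = defaultdict(list)
    for line in tag_lines:
        attrs = parse_attributes(line)
        if attrs.get("TYPE") == "SUBTITLES":
            groups[attrs["GROUP-ID"]].append(attrs["LANGUAGE"])
    return dict(groups)
```

Applied to the six subtitle entries of this playlist, languages_by_group would report the same three language codes under both subs and subsC.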

Video format

Look further down for an index of video content:

#EXT-X-STREAM-INF:BANDWIDTH=827299,AVERAGE-BANDWIDTH=747464,CODECS="avc1.64001f,mp4a.40.2",RESOLUTION=640x360,FRAME-RATE=29.970,AUDIO="program_audio",SUBTITLES="subs"
0640/0640.m3u8
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=360849,AVERAGE-BANDWIDTH=320932,CODECS="avc1.64001f",RESOLUTION=640x360,URI="0640/0640_I-Frame.m3u8"

#EXT-X-STREAM-INF specifies a variant stream, that is, a playback path for the video together with some information about it. The attributes are as follows. BANDWIDTH is the peak bit rate: 827299 means 827299 bit/s, or roughly 101 KB of traffic per second at peak.

AVERAGE-BANDWIDTH is the average bit rate, here 747464 bit/s.

CODECS is the codec information: avc1.64001f,mp4a.40.2. avc1 denotes the H.264 video codec, and 64001f is a set of codec parameters in hexadecimal: 64, 00, and 1f are three separate values (profile 0x64 = 100, the High profile; constraint flags 0x00; level 0x1f = 31, that is, level 3.1). mp4a is the audio codec, and 40.2 gives its parameters (MPEG-4 AAC-LC).
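The three hexadecimal values in the avc1 string can be decoded with a few lines of code. This is a minimal sketch; parse_avc1 is a name invented for this example, and the profile table lists only three common H.264 profiles.

```python
# A few common H.264 profile_idc values; the table is deliberately incomplete.
PROFILE_NAMES = {66: "Baseline", 77: "Main", 100: "High"}

def parse_avc1(codec: str) -> dict:
    """Decode an 'avc1.PPCCLL' codec string into profile, constraints, and level."""
    sample_entry, _, hex_part = codec.partition(".")
    if sample_entry != "avc1" or len(hex_part) != 6:
        raise ValueError(f"not an avc1 codec string: {codec!r}")
    profile_idc = int(hex_part[0:2], 16)       # first byte: profile
    constraint_flags = int(hex_part[2:4], 16)  # second byte: constraint flags
    level_idc = int(hex_part[4:6], 16)         # third byte: level times ten
    return {
        "profile_idc": profile_idc,
        "profile": PROFILE_NAMES.get(profile_idc, "unknown"),
        "constraint_flags": constraint_flags,
        "level": level_idc / 10,
    }
```

For the value in this playlist, parse_avc1("avc1.64001f") reports the High profile at level 3.1.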

RESOLUTION is the resolution of the video source, 640×360.

FRAME-RATE indicates the maximum frame rate; 29.970 means at most 29.970 frames per second.

AUDIO indicates the audio group; program_audio is the name of the corresponding audio group.

SUBTITLES indicates the subtitle group; subs is the name of the corresponding subtitle group. The GROUP-ID in the subtitle entries above corresponds to this value.

The URI is the content path, given on the line after the tag. 0640/0640.m3u8 is the path of the M3U8 file for this video source, as seen in the packet capture.
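Because the path is relative, a client has to resolve it against the URL of the master playlist before requesting it. The sketch below pairs each #EXT-X-STREAM-INF tag with the URI line that follows it and resolves that URI; parse_variants is a name invented for this example, and the base URL used in the usage note is a placeholder, not the real address of the WWDC stream.

```python
import re
from urllib.parse import urljoin

def parse_variants(master_text: str, base_url: str) -> list:
    """Return [{'bandwidth': int, 'url': str}, ...] for each variant stream."""
    variants = []
    pending_bandwidth = None
    for raw in master_text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            # Require ':' or ',' before the name so AVERAGE-BANDWIDTH is not matched.
            match = re.search(r"[:,]BANDWIDTH=(\d+)", line)
            pending_bandwidth = int(match.group(1)) if match else None
        elif line and not line.startswith("#") and pending_bandwidth is not None:
            # The first URI line after the tag belongs to that variant.
            variants.append({
                "bandwidth": pending_bandwidth,
                "url": urljoin(base_url, line),
            })
            pending_bandwidth = None
    return variants
```

With a hypothetical base of https://example.com/videos/hls_vod_mvp.m3u8, the entry 0640/0640.m3u8 resolves to https://example.com/videos/0640/0640.m3u8. The I-frame entry is skipped here because its URI is an attribute of the tag rather than a separate line.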

Below #EXT-X-STREAM-INF is #EXT-X-I-FRAME-STREAM-INF, which points to a playlist containing the I-frames (key frames) of the media. Since I-frames are only video frames, this entry carries no audio information; the rest of its attributes have the same format as for the video entry.

After that come video sources at other resolutions: 1920×1080, 1280×720, 960×540, and 480×270. Because HLS switches resolution according to network conditions, several resolutions are usually prepared to choose from. In the captured traffic, the first segment played was the 640 variant, segments 2 to 8 were the 480 variant, and playback then switched back to the 640 variant.
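How a player decides when to switch is up to the player, and AVPlayer's actual heuristics are not described here. As a rough illustration only, a simple rule is to pick the highest-bandwidth variant that fits within a fraction of the measured throughput. In the sketch below, pick_variant and the safety factor are assumptions for this example, and apart from the 640×360 entry the bandwidth figures in the usage note are invented.

```python
def pick_variant(variants, measured_bps, safety=0.8):
    """Pick a (bandwidth, label) pair from variants for the measured throughput.

    Chooses the highest bandwidth not exceeding measured_bps * safety,
    falling back to the lowest variant when none fits.
    """
    budget = measured_bps * safety
    affordable = [v for v in variants if v[0] <= budget]
    if affordable:
        return max(affordable, key=lambda v: v[0])
    return min(variants, key=lambda v: v[0])
```

For example, with variants [(500000, "480x270"), (827299, "640x360"), (2000000, "960x540")] and a measured throughput of 1,200,000 bit/s, the rule selects the 640×360 variant; if throughput drops to 100,000 bit/s it falls back to 480×270.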

Audio format

Further down is the index of the corresponding audio:

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio",LANGUAGE="eng",NAME="Alternate Audio",AUTOSELECT=YES,DEFAULT=YES,URI="audio1/audio1.m3u8"

#EXT-X-MEDIA is the same tag described above for subtitles.

TYPE=AUDIO: this time the type is audio.

GROUP-ID is the group ID, which corresponds to the AUDIO attribute in #EXT-X-STREAM-INF.

URI is the audio playlist path, audio1/audio1.m3u8.

Alternate sources for different encoding formats

In the master M3U8 file we can also see another 640-resolution video source that differs from the one above. It looks like this:

#EXT-X-STREAM-INF:BANDWIDTH=1922391,AVERAGE-BANDWIDTH=1276855,VIDEO-RANGE=SDR,CODECS="hvc1.2.4.H150.B0,mp4a.40.2",RESOLUTION=640x360,FRAME-RATE=29.970,AUDIO="program_audio_0",SUBTITLES="subsC"
0640c/prog_index.m3u8
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=1922391,AVERAGE-BANDWIDTH=1276855,CODECS="hvc1.2.4.H150.B0",RESOLUTION=640x360,URI="0640c/iframe_index.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="program_audio_0",LANGUAGE="eng",NAME="Alternate Audio",AUTOSELECT=YES,DEFAULT=YES,URI="audioc/prog_index.m3u8"

The CODECS value is hvc1.2.4.H150.B0,mp4a.40.2. The audio codec has not changed, but the video codec has. hvc1 is a form of HEVC (H.265), a newer-generation video coding standard that Apple supports in HLS. Because of compatibility problems, many clients cannot decode this format, so it is not yet widespread, and the video source in this format appears here as an alternative. Comparing the two entries with the same resolution, the hvc1 variant declares a higher bit rate than the avc1 variant, so in this playlist the hvc1 content is the larger of the two. This reflects how this particular stream was encoded; HEVC is generally designed to compress more efficiently than H.264.

For the hvc1 video source, the subtitle group and audio group change as well. This is why there are two copies of each subtitle language above: they correspond to the avc1 and hvc1 video sources respectively.

That covers the master M3U8 playlist. Here the audio and video are delivered separately, but they can also be packaged together.

M3U8 file containing media segments

Take the file 0640.m3u8 as an example:

#EXTM3U
#EXT-X-VERSION:4
#EXT-X-TARGETDURATION:7
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:6.006,
0640_00001.ts
#EXTINF:6.006,
0640_00002.ts
#EXTINF:6.006,
0640_00003.ts
....
#EXT-X-ENDLIST

#EXTM3U and #EXT-X-VERSION are the M3U header and the compatibility version number respectively. This file uses an older format, so its version number is lower than that of the master file.

#EXT-X-TARGETDURATION specifies the maximum duration of each segment. 7 means 7 seconds: no segment in this playlist may exceed 7 seconds.

#EXT-X-MEDIA-SEQUENCE indicates the sequence number of the first segment in the playlist. 1 means the segments start from 1.

#EXTINF indicates the duration of a segment; 6.006 means the segment lasts 6.006 s. The total length of the video is obtained by adding up these values.
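Adding up the #EXTINF values and checking them against the target duration is straightforward to script. The following is a minimal sketch; summarize_media_playlist is a name invented for this example, and it reads only the two tags it needs.

```python
def summarize_media_playlist(text: str) -> dict:
    """Return the segment count, total duration, and any over-long segments."""
    target = None
    durations = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-TARGETDURATION:"):
            target = int(line.split(":", 1)[1])
        elif line.startswith("#EXTINF:"):
            # Format is '#EXTINF:<duration>,[<title>]'.
            value = line.split(":", 1)[1].split(",", 1)[0]
            durations.append(float(value))
    # A segment is too long if its duration, rounded to an integer,
    # exceeds the target duration.
    too_long = [d for d in durations if target is not None and round(d) > target]
    return {
        "segments": len(durations),
        "total_seconds": sum(durations),
        "too_long": too_long,
    }
```

For a playlist of three 6.006 s segments with a target duration of 7, the total comes to about 18.018 s and no segment is flagged.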

0640_00001.ts is the relative path of the segment. A segment file is a piece of video or audio, and can be in TS, MP4, AAC, or other formats. Since we have already specified that numbering starts at 1, the first file is 0640_00001.

#EXT-X-ENDLIST marks the end of the media content. An M3U8 playlist can describe either on-demand or live content, and the difference lies in whether this tag appears at the end of the file. If it does not, the stream is live and new segments keep being added.
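For a live stream the client has to reload the playlist and work out which segments are new. A minimal sketch of that bookkeeping follows, assuming the client keeps the previously fetched playlist text; parse_segments and new_segments are names invented for this example, and the segment file names in the usage note are hypothetical.

```python
def parse_segments(text: str):
    """Return ({sequence_number: uri}, ended) for a media playlist."""
    first_sequence = 0  # the media sequence defaults to 0 when the tag is absent
    uris = []
    ended = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            first_sequence = int(line.split(":", 1)[1])
        elif line == "#EXT-X-ENDLIST":
            ended = True
        elif line and not line.startswith("#"):
            uris.append(line)
    numbered = {first_sequence + i: uri for i, uri in enumerate(uris)}
    return numbered, ended

def new_segments(previous_text: str, reloaded_text: str):
    """Return (uris not seen before, whether the stream has ended)."""
    old, _ = parse_segments(previous_text)
    new, ended = parse_segments(reloaded_text)
    fresh = [new[n] for n in sorted(new) if n not in old]
    return fresh, ended
```

If the first fetch lists live_00001.ts to live_00003.ts starting at sequence 1, and the reload starts at sequence 2 and lists live_00002.ts to live_00004.ts, only live_00004.ts is reported as new. Once the reloaded playlist contains #EXT-X-ENDLIST, the second return value becomes True and the client can stop reloading.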

The audio file audio1.m3u8 and the subtitle file prog_index.m3u8 have similar content; the difference is that their segments are AAC audio files and WebVTT subtitle files respectively.

An M3U8 file with segment content can also be used on its own as a video link, in which case the segments need to contain both audio and video.

File encryption

The HLS protocol supports encryption. If the index file contains information about a key file, the media files that follow must be decrypted with that key before they can be played. HLS currently supports AES-128 encryption with a 16-octet key. The key file is a packed array of 16 octets (bytes) in binary format.
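To make the key format concrete, the sketch below writes a random 16-byte key file and builds the playlist line that refers to it. The #EXT-X-KEY tag with METHOD, URI, and IV attributes comes from RFC 8216 and is not shown elsewhere in this article; make_key_file, key_tag, and the file names are assumptions for this example, and the encryption of the segments themselves is left out.

```python
import secrets

def make_key_file(path: str) -> bytes:
    """Write a random 16-octet AES-128 key to path as raw binary."""
    key = secrets.token_bytes(16)
    with open(path, "wb") as key_file:
        key_file.write(key)  # exactly 16 bytes, no text encoding or newline
    return key

def key_tag(key_uri: str, iv: bytes = None) -> str:
    """Build the index-file line that tells the client where the key is."""
    tag = f'#EXT-X-KEY:METHOD=AES-128,URI="{key_uri}"'
    if iv is not None:
        tag += f",IV=0x{iv.hex().upper()}"  # 128-bit IV as a hexadecimal sequence
    return tag
```

Calling key_tag("key.bin") yields #EXT-X-KEY:METHOD=AES-128,URI="key.bin", which would be placed in the index file before the segments it applies to.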

There are three encryption configuration modes:

Mode 1: you specify the path to an existing key file on disk. The segmenter inserts the URL of that key file into the index file. All media files are encrypted with this key.

Mode 2: the segmenter generates a random key file, stores it at the specified path, and references it in the index file. All media files are encrypted with this random key.

Mode 3: the segmenter generates a new random key file every n segments, saves it to the specified location, and references it in the index file. In this mode the key rotates: each group of n segment files is encrypted with a different key.
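The third mode can be pictured as a key tag inserted before every group of n segments, since an #EXT-X-KEY tag (defined in RFC 8216) applies to the segments that follow it until the next key tag. The sketch below shows only this layout; playlist_lines_with_rotation and the keys/key_{}.bin naming are assumptions for this example, and the #EXTINF lines and the actual encryption are omitted for brevity.

```python
def playlist_lines_with_rotation(segment_uris, n, key_uri_template="keys/key_{}.bin"):
    """Interleave a key tag before every group of n segment URIs."""
    if n < 1:
        raise ValueError("n must be at least 1")
    lines = []
    for index, uri in enumerate(segment_uris):
        if index % n == 0:
            # Start of a new group: reference the next key file.
            key_uri = key_uri_template.format(index // n)
            lines.append(f'#EXT-X-KEY:METHOD=AES-128,URI="{key_uri}"')
        lines.append(uri)
    return lines
```

With five segments and n = 2, the output references three key files: the first two keys cover two segments each and the third covers the last segment.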

Reference: HLS-ios Video Playback service architecture in-depth exploration (I)