Nowadays, more and more businesses want to show video on their own pages. An app that doesn't support video today is as unimaginable as an app that couldn't display pictures N years ago.

Playing video is, of course, inseparable from the <video> element of the HTML5 standard, so I want to start from the essence of video itself to understand how the video tag works and how it performs.

I. Usage


MDN: The HTML <video> element is used to embed a media player that supports video playback into an HTML or XHTML document.

<video controls width="250">
    <source src="/media/cc0-videos/flower.webm"
            type="video/webm">
    <source src="/media/cc0-videos/flower.mp4"
            type="video/mp4">
    Sorry, your browser doesn't support embedded videos.
</video>

The example above shows the basic usage of the <video> element.

The content between the <video> tags is a fallback for browsers that do not support the element.

Not all browsers support the same video formats, so you can provide multiple video sources via <source> elements, and the browser will use the first one it supports.

II. Browsers and players


It's easy to use, but the sentence "not all browsers support the same video formats" contains two important pieces of information.

1. Different browsers support different video formats

Or to put it another way: Different browsers support different video formats because they have different built-in video players

The video element is a **replaced element**. A replaced element is one whose rendering the browser determines from the element's tag and attributes; such elements usually have no actual content, i.e. they are empty elements.

In Chrome the built-in player is CPlayer, while in Safari it is SPlayer.

Different players naturally support different video formats. Beyond that, they also differ in UI, controllability, and several other respects.

So to discuss which video formats the video tag supports, you must first specify which browser you mean.

2. Browsers only support specific video formats

The following chart shows the mainstream video formats supported by the major browsers

Looking at the first row of the table (Chrome's supported formats), only Ogg, MP4, and WebM appear in parentheses. What really determines whether a video can play are the names outside the parentheses: the relatively unfamiliar H.264, HEVC, VP8, and so on.

So what exactly a video format is has to be explained starting from how a video is composed.

III. Video formats


The essence of video is simple: images played back in quick succession.

A few simple concepts:

  • Frame: each frame is a single still image
  • Frame rate (FPS): the number of images displayed per second. The higher the frame rate, the smoother the picture; the lower the frame rate, the choppier the picture
    • 12 FPS: because of the physiology of the human eye, image sequences above roughly 10-12 frames per second are perceived as continuous motion
    • 24 FPS: sound film is shot and played back at 24 frames per second, which most people find acceptable
    • 30 FPS: in early, highly dynamic video games, frame rates below 30 frames per second looked choppy because there was no motion blur to smooth the motion
    • 60 FPS: in practice, 60 FPS feels noticeably better than 30 FPS
    • 85 FPS: roughly the upper limit of what the brain can process for video

1. The source file

Let's calculate the size of a video in its simplest, uncompressed form:

  • Describing one RGB color channel takes the values 0 to 255, i.e. 256 possibilities, which is exactly 1 byte; so one pixel (R, G, B) takes 3 B
  • An ordinary 980 * 720 image therefore needs 980 * 720 * 3 B ≈ 2 MB
  • An ordinary video frame rate of 30 means 30 images are shown per second, so 1 minute of video is 2 MB * 30 * 60 ≈ 3.5 GB

A 1-minute raw video at 980 resolution and 30 FPS is therefore about 3.5 GB in size.

Raw video source files, in YUV or RGB format, really are that big.
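
As a quick sanity check, here is the same arithmetic in a few lines of JavaScript, using the resolution, frame rate, and byte counts assumed above:

```javascript
// Rough size of 1 minute of uncompressed RGB video at 980x720, 30 FPS.
const bytesPerPixel = 3;                       // 1 byte each for R, G, B
const frameBytes = 980 * 720 * bytesPerPixel;  // one raw frame
const videoBytes = frameBytes * 30 * 60;       // 30 FPS for 60 seconds

console.log((frameBytes / 1024 / 1024).toFixed(2) + ' MB per frame');  // ~2.02
console.log((videoBytes / 1024 ** 3).toFixed(2) + ' GB per minute');   // ~3.55
```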

2. Encoding

A video of this size is obviously unusable, and at current Internet speeds, a video of a few minutes can take half a day to download

So the first task is compression; the technical term is encoding. Video is generally compressed along two lines:

2.1 Intra-frame coding

Intra-frame coding compresses the individual images, for example compressing every frame as a JPEG (the underlying idea being that the human eye is more sensitive to brightness than to color). This step alone can usually reduce the size of the video by about 90%.

2.2 Inter-frame coding

Inter-frame coding exploits the correlation between frames. For example, if only the sun and the people change between the first frame and the second, then the second frame only needs to store those changes (similar to git, which only stores diffs). Compressed frames are divided into I frames, P frames, and B frames:

  • I frame: key frame; compressed with intra-frame compression only
  • P frame: forward-predicted frame; it refers only to previously processed frames and uses inter-frame compression
  • B frame: bidirectionally predicted frame; it refers to the frames both before and after it and uses inter-frame compression

Beyond I/P/B frames there is also the GOP (group of pictures): the image sequence between two I frames; each such sequence contains exactly one I frame.
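
To make the diff idea concrete, here is a toy sketch (nothing like a real codec): a P frame stores only the pixels that changed relative to its reference frame.

```javascript
// Toy illustration of inter-frame "diff" storage (not a real codec).
// A frame is an array of pixel values; a P frame keeps only the pixels
// that differ from the reference frame, as [index, newValue] pairs.
function encodePFrame(referenceFrame, currentFrame) {
  const diff = [];
  currentFrame.forEach((pixel, i) => {
    if (pixel !== referenceFrame[i]) diff.push([i, pixel]);
  });
  return diff;
}

function decodePFrame(referenceFrame, diff) {
  const frame = referenceFrame.slice();
  diff.forEach(([i, pixel]) => { frame[i] = pixel; });
  return frame;
}

const iFrame = [10, 10, 10, 10];                           // key frame, stored in full
const pFrameDiff = encodePFrame(iFrame, [10, 10, 99, 10]); // => [[2, 99]]
console.log(decodePFrame(iFrame, pFrameDiff));             // [10, 10, 99, 10]
```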

3. Encoding format

The coding ideas broadly fall into these two categories; as for how to implement them in practice, everyone has their own tricks.

Different companies, organizations, and generations produce different solutions, with very different efficiency and cost.

Common coding standards (roughly ordered by coding efficiency from low to high, i.e. the further down the list, the better):

| Standard | Organization | Notes |
| --- | --- | --- |
| H.262, MPEG-2 Part 2 | MPEG / ITU-T | The predecessor of MPEG-4; MPEG-2 was the standard codec for DVDs and early Blu-ray discs. It is rarely used for streaming video. |
| AVC (H.264), MPEG-4 Part 10 | MPEG / ITU-T | H.264 provides high-quality video at low bit rates; it is the most common, mainstream standard |
| VP8 | Google | Roughly comparable to H.264 |
| HEVC (H.265) | MPEG / ITU-T | Designed to deliver higher-quality web video over limited bandwidth; takes about half the space of H.264 at the same visual quality |
| VP9 | Google | Roughly comparable to H.265 |
| AV1 | AOM | Aims to cut bit rate by roughly 30% compared with VP9 and HEVC, with a decoding-complexity target of roughly twice VP9 |

Of course, these are just specifications (like ES6 and ES10 on the front end). No matter how advanced a specification is, it is useless if nobody promotes and adopts it.

Whether a coding standard becomes widely adopted depends not only on its efficiency, but also on factors such as the organization behind it, patent licensing, and so on. The most popular and widely used coding standard is H.264.

In other words, H.264 is currently the most popular video encoding format.

4. Container format

The xxx.mp4 and xxx.avi files we usually see are, strictly speaking, container (encapsulation) formats: a container that holds the video stream (e.g. H.264), the audio stream (e.g. AAC), synchronization information, subtitles, metadata (e.g. the title), and so on.

| Codec (short name) | Full codec name | Container support |
| --- | --- | --- |
| AV1 | AOMedia Video 1 | MP4, WebM |
| AVC (H.264) | Advanced Video Coding | 3GP, MP4, WebM, TS, FLV |
| H.263 | H.263 Video | 3GP |
| HEVC (H.265) | High Efficiency Video Coding | MP4 |
| MP4V-ES | MPEG-4 Video Elemental Stream | 3GP, MP4 |
| MPEG-1 | MPEG-1 Part 2 Visual | MPEG, QuickTime |
| MPEG-2 | MPEG-2 Part 2 Visual | MP4, MPEG, QuickTime |
| Theora | Theora | Ogg |
| VP8 | Video Processor 8 | 3GP, Ogg, WebM |
| VP9 | Video Processor 9 | MP4, Ogg, WebM |

Now go back to the original support chart and look at Chrome: H.264 and H.265 (HEVC) can both be packaged into MP4, but Chrome only supports the former.
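
If you want to check at runtime what the current browser can actually play, canPlayType probes a container + codec combination; the codec strings below are common examples, not an exhaustive list:

```javascript
// Probe container + codec support in the current browser.
// canPlayType returns "probably", "maybe", or "" (not supported).
const video = document.createElement('video');

console.log(video.canPlayType('video/mp4; codecs="avc1.42E01E"')); // H.264 in MP4
console.log(video.canPlayType('video/mp4; codecs="hvc1"'));        // HEVC in MP4
console.log(video.canPlayType('video/webm; codecs="vp9"'));        // VP9 in WebM
```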

5. Audio

A complete "video" contains a video stream, an audio stream, subtitles, and much other information.

Video encoding and audio encoding generally go together. **Like video, audio goes from source, to encoding, to encapsulation.** The process can be understood by analogy, so this article does not explain it separately.

Common audio coding standards

| Standard | Organization | Notes |
| --- | --- | --- |
| AAC | MPEG | All domains (newer) |
| AC-3 | Dolby | Film |
| MP3 | MPEG | All domains (older) |
| WMA | Microsoft | Microsoft platforms |

The most widely used audio encoding format is AAC

IV. Video processing

As we have learned above, whether a player can play a video depends on its encoding format, but the container format cannot be ignored either.

Take the widely used FLV format as an example:

1. Packaging and unpacking

H.264 and AAC can be packaged as either MP4 or FLV, yet Chrome can only play the former and not the latter. That seems unreasonable when you think about it: it is as if two items could be taken apart in box A but not in box B.

Because essentially what matters is the contents (H.264, H.265), not the box (MP4, FLV): what the browser really differs on is which video encodings it supports, and Chrome supports H.264. So we still need to understand how the contents are converted between boxes.

The step of wrapping encoded streams into a container file is called remuxing, and the step of taking the file apart back into encoded streams is called demuxing.

Packaging, viewed abstractly, is really just arranging the encoded data in a particular layout.

Suppose the H.264 binary data is 123456789; different containers arrange it as follows (the letters can be thought of as audio, subtitles, etc.):

  • MP4
a
b
123456789
c
d
  • FLV
a
b
123
c
456
d
789

So it looks like we could demux FLV back into H.264, audio, and subtitles, and then remux them into MP4.

2. MSE

There is an even easier way: we don't have to repackage at all.

With the Media Source Extensions API, the demuxed video and audio encodings can be fed to the player and played directly.

The Media Source Extensions API (MSE) provides the ability to implement plug-in-free, web-based streaming media. Using MSE, media streams can be created in JavaScript and played using the <audio> and <video> elements.
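
As a minimal sketch of the MSE flow (the segment URL and codec string below are assumptions for illustration, and the segment is assumed to be fragmented MP4):

```javascript
// Minimal MSE sketch: feed one fragmented MP4 segment into a <video> element.
const video = document.querySelector('video');
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  // The MIME type must match the actual encoding inside the segment.
  const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
  const response = await fetch('/media/segment-0.m4s'); // hypothetical segment URL
  const data = await response.arrayBuffer();
  sourceBuffer.addEventListener('updateend', () => mediaSource.endOfStream());
  sourceBuffer.appendBuffer(data); // hand the encoded data to the player
});
```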

The best-known library in this area is flv.js, open-sourced by a former Bilibili engineer; it demuxes FLV containing supported encodings into H.264 and AAC and plays them through MSE. MSE is currently supported pretty much everywhere except iOS and IE.
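
For illustration, typical flv.js usage (following the pattern documented in its README; the stream URL is a placeholder) looks roughly like this:

```javascript
// Play an HTTP-FLV stream through MSE with flv.js.
import flvjs from 'flv.js';

const videoElement = document.querySelector('video');

if (flvjs.isSupported()) {            // checks that MSE is available
  const player = flvjs.createPlayer({
    type: 'flv',
    url: 'https://example.com/live/stream.flv', // placeholder stream address
  });
  player.attachMediaElement(videoElement);
  player.load();                      // starts fetching and demuxing the FLV
  player.play();
}
```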

3. Other approaches

Going a step further: if JS can demux, can it also decode? If you look at how video is implemented, you'll find the browser decodes via FFmpeg. In fact, JS or Wasm can take over this step.

Suppose we now have a video format that the browser doesn’t support

  1. After demuxing, we find the encoding format is not supported
  2. Gritting our teeth, we decode it with FFmpeg compiled to JS or Wasm and obtain the raw video source

Then there are two ideas:

  1. Re-encode it into H.264 and play it through MSE
  2. From the raw YUV/RGB source, take the original picture of each frame, draw the frames continuously on a Canvas to form the video, and play the audio through an Audio element in parallel (see the sketch below)
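
A rough sketch of the second idea, assuming a hypothetical decodeNextFrame() (for example backed by a Wasm decoder) that returns one frame of raw RGBA pixels at a time:

```javascript
// Toy "player": draw decoded frames onto a canvas at a fixed frame rate.
// decodeNextFrame() is a hypothetical function that returns
// { width, height, rgba: Uint8ClampedArray } or null at the end of the video.
const canvas = document.querySelector('canvas');
const ctx = canvas.getContext('2d');
const audio = document.querySelector('audio'); // decoded audio played in parallel

async function playFrames(fps) {
  audio.play();
  const interval = 1000 / fps;
  let frame;
  while ((frame = await decodeNextFrame()) !== null) {
    const image = new ImageData(frame.rgba, frame.width, frame.height);
    ctx.putImageData(image, 0, 0);                                  // show this frame
    await new Promise((resolve) => setTimeout(resolve, interval));  // crude pacing
  }
}

playFrames(30);
```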

There are industry counterparts for both methods, but neither has been widely used due to performance and compatibility issues

So in theory, any video format can be played

V. Live streaming


Everything above is about playing a complete video. In reality, even a few minutes of video is well over 10 MB, and major video sites mostly serve dramas and variety shows running tens of minutes, as well as live streams of all sizes.

1. Video transmission

In these cases, you certainly can’t wait for the video to be fully downloaded before playing it. There are two major solutions for handling large video data

  • Slice file: Divide the video into 3-10s video clips, download and play them one by one
  • Continuous stream: A video stream is transmitted and played in data frames (essentially smaller segments).

There are four common video transmission protocols

| Protocol | HTTP-FLV | RTMP | HLS | DASH |
| --- | --- | --- | --- | --- |
| Transport | HTTP | TCP | HTTP | HTTP |
| Container format | FLV | FLV tags | TS file slices | MP4, 3GP, WebM slices |
| Latency | Low | Low | High | High |
| Data segmentation | Continuous stream | Continuous stream | Sliced files | Sliced files |
| HTML5 playback | Demuxed and played via HTML5 (flv.js) | Not supported | Native support, or via hls.js | Played directly, or demuxed and played via HTML5 |

The conclusion: sliced files are more compatible and better suited to video on demand (VOD); continuous streams have lower latency and are better suited to live streaming.

In fact, these two approaches only solve the transmission problem. For example, after a 10 s video is divided into ten 1 s clips, the video has to be reloaded each time a 1 s clip finishes playing.

In effect, after every second of playback the progress bar jumps from start to end, and at the code level the video's initialization and teardown are re-executed every second. That experience is clearly not usable.

2. Streaming media

There are two solutions

2.1 Demux the data and play it through MSE

The binary data managed by MSE can be appended and removed at will, so new data from the stream is appended directly to the video's source and the video never reloads.
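
A sketch of continuous appending for live playback, assuming a sourceBuffer created as in the earlier MSE example and a hypothetical nextSegment() that resolves with the next demuxed chunk:

```javascript
// Keep appending newly arrived data; the <video> element never reloads.
const queue = [];

function appendNext() {
  if (queue.length === 0 || sourceBuffer.updating) return;
  sourceBuffer.appendBuffer(queue.shift()); // append the next chunk in order
}

sourceBuffer.addEventListener('updateend', appendNext);

(async function pump() {
  while (true) {
    queue.push(await nextSegment()); // hypothetical: fetch + demux the latest data
    appendNext();
  }
})();
```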

But iOS Safari, which has poor MSE support, doesn’t work with this solution

2.2 Browser Handling

The first solution requires us to handle the following parts ourselves:

  • Keep requesting the latest video data
  • Demux the video and audio encodings
  • Push the encoded data into MSE

HTTP Live Streaming (HLS) is a live-streaming protocol proposed by Apple, so it is natively supported by the browsers on Apple's platforms.

By assigning the HLS address directly to the video's src attribute, the browser does all of this for us.
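
For example (the manifest URL is a placeholder), native HLS on Safari plus an hls.js fallback over MSE can look like this:

```javascript
// Native HLS where supported (Safari), hls.js (MSE-based) elsewhere.
import Hls from 'hls.js';

const video = document.querySelector('video');
const src = 'https://example.com/live/stream.m3u8'; // placeholder HLS address

if (video.canPlayType('application/vnd.apple.mpegurl')) {
  video.src = src;            // the browser handles requests, demuxing, buffering
} else if (Hls.isSupported()) {
  const hls = new Hls();
  hls.loadSource(src);
  hls.attachMedia(video);     // hls.js does the work and feeds MSE
}
video.play();
```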

And, as mentioned at the beginning, video is a replaced element, so many browsers (especially Chinese mobile browsers) substitute their own in-house players that differ from the standard one.

  • The downside is that they are less controllable than the standard video element; for example, you cannot control the stacking order, and the player always sits on top of the page…
  • The upside is that they handle many video formats and even streaming protocols for you

For example, QQ Browser and UC Browser can play FLV and HLS streams directly.

So I put together a decision list for live-streaming implementations that always uses the best available solution: (a very few browsers, after hijacking the player, do not support HTTP-FLV streams but do support HLS streams; such cases can be handled with a whitelist)

Reference:

  • What happens between recording the video and playing it in the browser
  • Web video codec guide
  • HTML5 Video Support by Codec
  • Wiki: HTML 5 Video
  • Multimedia Front End Manual