Project background

Huajiao launched an innovation project this spring: a live variety show. The front end's main task was to build the PC home site, whose front page needed a player that could play live FLV streams; and when a user clicked the video-review button, a pop-up window had to play an HLS stream. We didn't think much about it when we started developing this player; we just used the simplest approach anyone could think of: flv.js and hls.js! When a video plays, the Player exported by a middleware module (video.js) is called; it initializes the underlying player according to the suffix of the video address (new Hls() or flvjs.createPlayer()) and provides a consistent interface over the players created by hls.js and flv.js. This was perfectly fine for the product, but the code felt a bit silly to write, and the combined size of hls.js (208 KB) and flv.js (169 KB) was a bit tear-jerking. Then we had an idea: could the two be combined into one lib that can play both FLV and HLS video? The ideal is plump, but the reality is thin. Although both libs are written in JavaScript and both belong to the video domain, we had only ever called them and never understood them thoroughly. Still, we started to try, led by our resident expert (LI).
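
For reference, here is a minimal sketch of that first wrapper approach (not our production code), using only the public APIs of the two libraries: it picks hls.js or flv.js based on the address suffix and exposes a single play entry point.

import Hls from 'hls.js';
import flvjs from 'flv.js';

// Minimal sketch of the original dual-library wrapper: choose the lib by URL suffix.
function playUrl(video: HTMLVideoElement, url: string, isLive: boolean) {
  if (url.endsWith('.m3u8')) {
    const hls = new Hls();
    hls.loadSource(url);                                   // download and parse the playlist
    hls.attachMedia(video);                                // bind MediaSource to the <video>
    hls.on(Hls.Events.MANIFEST_PARSED, () => video.play());
    return hls;
  }
  // Otherwise treat it as an FLV (live) stream.
  const player = flvjs.createPlayer({ type: 'flv', url, isLive });
  player.attachMediaElement(video);
  player.load();
  player.play();
  return player;
}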

flv.js analysis

flv.js works by downloading the FLV file, transcoding it into ISO BMFF (fragmented MP4) fragments, and then feeding those MP4 fragments to the HTML5 Video tag for playback through Media Source Extensions. Its structure is shown below:

src/flv.js is the external entry of flv.js, exposing components, events and errors so users can react to the events it throws and obtain the corresponding playback information. The most important things it returns are the two players: NativePlayer and FLVPlayer. NativePlayer is a repackaging of the browser's own player so that, like FLVPlayer, it responds to common events and operations. FLVPlayer is the player we care about most; its most important parts can be divided into two: 1. MSEController; 2. Transmuxer.

MSEController

The MSEController is responsible for establishing the connection between the HTML Video Element and the SourceBuffer. It accepts the InitSegment (ftyp + moov in ISO BMFF) and MediaSegments (moof + mdat in ISO BMFF), appends these fragments to the SourceBuffer in sequence, and provides some control over and state feedback from the SourceBuffer.
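
To make that concrete, here is a bare-bones sketch of the underlying Media Source Extensions calls that a controller like this wraps (plain browser APIs, not the actual flv.js code; the codec string and the segment variables are placeholders): create a MediaSource, open a SourceBuffer, then append the InitSegment first and the MediaSegments afterwards, waiting for each updateend.

// Plain MSE sketch; `initSegment` / `mediaSegments` stand in for the transmuxer output.
declare const initSegment: ArrayBuffer;
declare const mediaSegments: ArrayBuffer[];

const video = document.querySelector('video')!;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', () => {
  const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f, mp4a.40.2"');
  const queue: ArrayBuffer[] = [initSegment, ...mediaSegments]; // ftyp+moov first, then moof+mdat
  const appendNext = () => {
    if (!sb.updating && queue.length) sb.appendBuffer(queue.shift()!);
  };
  sb.addEventListener('updateend', appendNext);
  appendNext();
});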

Transmuxer

Transmuxer is mainly responsible for downloading, decoding, transcoding and dispatching segments. It contains two modules, TransmuxingWorker and TransmuxingController. TransmuxingWorker enables multi-threading: it runs TransmuxingController in a worker and forwards the events TransmuxingController throws. TransmuxingController is the drudge department that actually downloads, decodes, transcodes and sends segments, doing all the hard work; Transmuxer (the real boss) and TransmuxingWorker (the stand-in boss) both just call its functions and pass on its output.

Now let's meet the department that does the hard work.

TransmuxingController

TransmuxingController is itself a large department, with three groups working under it: IOController, Demuxer, and Remuxer.

  1. IOController

IOController has three main functions: first, it selects the loader under it that best suits the current browser environment and uses it to load the media stream from the server; second, it stores the data the loader sends back; third, it forwards the data to the Demuxer for decoding and stores whatever data the Demuxer leaves unprocessed.

  2. Demuxer

Demuxer is the employee in charge of decoding. It parses the FLV data sent by IOController into a videoTrack and an audioTrack and hands the parsed data to the Remuxer for transcoding. After decoding, it returns the length of the processed data to the IOController, which stores the unprocessed remainder (total data minus processed data) and prepends it to the next chunk of data sent to the Demuxer.

  3. Remuxer

Remuxer is responsible for converting the videoTrack and audioTrack into the InitSegment and MediaSegments and sending them upward, synchronizing audio and video during the conversion.

So the whole flow is: FLVPlayer says start => Loader loads data => IOController stores and forwards data => Demuxer decodes data => Remuxer transcodes data => TransmuxingController, TransmuxingWorker and Transmuxer forward data => MSEController receives data => SourceBuffer. After this series of actions, the video can play.

hls.js analysis

hls.js works by downloading the index.m3u8 file and parsing it into Levels, then downloading the corresponding TS files based on the Fragments in those Levels and transcoding them into ISO BMFF fragments, and finally feeding the MP4 fragments to the HTML5 Video tag for playback through Media Source Extensions.

The structure of hls.js is as follows

Compared with flv.js' multi-tier layering, hls.js is rather flat. The hls instance inherits the Observer's trigger capability and goes deep into the various departments (i.e. the various controllers and loaders) to issue commands (hls.trigger(HlsEvents.XXX, data) operations); each department inherits EventHandler and declares its own responsibilities at instantiation. Take buffer-controller.js as an example:


constructor (hls: any) {
    super(hls,
      Events.MEDIA_ATTACHING,
      Events.MEDIA_DETACHING,
      Events.MANIFEST_PARSED,
      Events.BUFFER_RESET,
      Events.BUFFER_APPENDING,
      Events.BUFFER_CODECS,
      Events.BUFFER_EOS,
      Events.BUFFER_FLUSHING,
      Events.LEVEL_PTS_UPDATED,
      Events.LEVEL_UPDATED);
    this.config = hls.config;
  }


buffer-controller.js is mainly responsible for the following functions:

  1. Resets the media buffer in response to the BUFFER_RESET event
  2. Initializes the SourceBuffer with the appropriate codec information in response to the BUFFER_CODECS event
  3. Appends the MP4 fragment to the SourceBuffer in response to the BUFFER_APPENDING event
  4. Triggers the BUFFER_APPENDED event after the buffer is appended successfully
  5. Flushes the specified buffer range in response to the BUFFER_FLUSHING event
  6. Triggers the BUFFER_FLUSHED event after the buffer is flushed successfully

buffer-controller.js only responds to the events it registered at initialization (Events.MEDIA_ATTACHING, Events.MEDIA_DETACHING, and so on), handling them in methods such as onMediaAttaching and onMediaDetaching and ignoring everything else. After completing a task, it informs the other departments through hls that its work is done and hands the result over to them. For example, line 581 of buffer-controller.js, this.hls.trigger(Events.BUFFER_FLUSHED), tells the other departments (controllers) that the flush is finished.

Tip: whenever you see this.hls.trigger(Events.XXXX), you can find the next step by searching the whole codebase for the corresponding onXxxx method (the event name with the underscores removed and camel-cased).
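
A tiny sketch of that mapping, purely to help when reading the code (hls.js's own dispatch works off the event string values, so this is just the naming rule the tip describes):

// 'BUFFER_FLUSHING' -> 'onBufferFlushing': the handler name to search for.
function handlerNameFor(eventConstant: string): string {
  const camel = eventConstant
    .toLowerCase()
    .split('_')
    .map(part => part.charAt(0).toUpperCase() + part.slice(1))
    .join('');
  return 'on' + camel;
}

console.log(handlerNameFor('BUFFER_FLUSHING')); // "onBufferFlushing"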

Once we understand how to read the hls.js code, we can see the general flow hls.js uses to play an HLS stream more clearly:

  1. hls.js only plays HLS streams and has no NativePlayer, so the top-level src/hls.js corresponds to flv.js' FLVPlayer: it directly provides the API, responds to operations from the outside world and emits information. When it is ready to start playing, it emits HlsEvents.MANIFEST_LOADING.
  2. When playlist-loader receives HlsEvents.MANIFEST_LOADING, it uses XHRLoader to load the M3U8 document. After parsing the document, it obtains the Levels contained in it (level[0] is the data we want); playlist-loader then emits LEVEL_LOADED, carrying the level information.
  3. level-controller records the level information, calculates the interval at which the M3U8 should be refreshed, and keeps reloading the M3U8 file to update the level. stream-controller, after a series of operations, loads a fragment (a TS file listed in the M3U8 document): it emits a FRAG_LOADING event and initializes the decoder and transcoder (the Demuxer object; the Remuxer is initialized when the Demuxer is instantiated).
  4. When fragment-loader receives FRAG_LOADING, it loads the corresponding TS file; after loading completes it emits a FRAG_LOADED event, carrying the TS data as a Uint8Array together with other fragment information.
  5. When stream-controller receives the FRAG_LOADED event, it calls its onFragLoaded method, in which the Demuxer parses the TS file; through the collaboration of Demuxer and Remuxer it generates the InitSegment (the data carried by the FRAG_PARSING_INIT_SEGMENT event) and MediaSegment (the data carried by the FRAG_PARSING_DATA event), which are passed via stream-controller to buffer-controller and finally appended to the SourceBuffer.

How to combine

From the analysis of flv.js and hls.js, their common process is: download, decode, transcode, and transfer to the SourceBuffer. They have loaders with the same role (FragmentLoader and FetchStreamLoader), the same decode/transcode stage (demuxer and remuxer), and the same SourceBuffer controller (MSEController and buffer-controller). The differences are their control flow, and that the HLS flow has the extra step of parsing the playlist.

Let's think about how to combine the two libs:

  1. Based on the purpose of the project: the project is mainly a live-broadcast site with on-demand playback as a secondary feature. FLV live playback is the most important function; HLS playback is only needed when users click video review to watch past programme videos.

  2. Based on the requirements of other projects: the Huajiao main site also presents live streams as HTTP-FLV, and HLS streams are planned for playing smaller streamers' videos (on demand).

  3. Based on the industry situation: HTTP-FLV (mature infrastructure, simple technology, low latency) is still basically the norm for live streaming, while HLS streams are generally used on mobile terminals.

Therefore, we decided to take the loader, Demuxer and Remuxer from hls.js and, on the basis of flv.js, build a new player library that can play FLV videos as well as HLS streams (per the needs of the project it only covers single-bitrate live and on-demand streams; it does not include multi-bitrate streams, automatic bitrate switching, decryption and other features).

Specific implementation process

First of all, we planned how to integrate the borrowed components:

  1. Integrating the loader

Playing HLS in hls.js involves FragmentLoader, XHRLoader, M3U8Parser, LevelController, StreamController, and so on. FragmentLoader controls the loading of TS files and reports the loading state of fragments; XHRLoader performs the actual loading of TS files and playlist files; M3U8Parser parses the playlist; LevelController manages the levels; StreamController determines which TS file of the current level to load next. When integrating into flv.js, FragmentLoader also takes on the duties of LevelController and StreamController: when IOController calls startLoad, FragmentLoader has XHRLoader fetch and parse the playlist, stores the Level details, selects the Level, checks the Fragment sequence number to get the address of the next TS file, and lets XHRLoader load it. (FragmentLoader came to the new company with new responsibilities.)
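
Below is a much-simplified, hypothetical sketch of what that repurposed FragmentLoader's startLoad does; names such as parsePlaylist and emit are placeholders for illustration, not the real HJPlayer or hls.js code.

interface Fragment { sn: number; url: string; start: number; duration: number }
interface Level { live: boolean; fragments: Fragment[] }

// Hypothetical placeholder: parse an M3U8 document into a Level (details omitted).
declare function parsePlaylist(m3u8Text: string, baseUrl: string): Level;

class FragmentLoader {
  private level: Level | null = null;
  private nextSn = 0;

  constructor(private playlistUrl: string,
              private emit: (event: string, data: unknown) => void) {}

  // Roughly: fetch/refresh the playlist, pick the next fragment by sequence
  // number, download it, and hand the TS bytes downstream.
  async startLoad(): Promise<void> {
    if (!this.level) {
      const text = await (await fetch(this.playlistUrl)).text();
      this.level = parsePlaylist(text, this.playlistUrl);
      this.nextSn = this.level.fragments[0]?.sn ?? 0;
    }
    const frag = this.level.fragments.find(f => f.sn === this.nextSn);
    if (!frag) return;                        // nothing new yet (live playlist)
    const data = new Uint8Array(await (await fetch(frag.url)).arrayBuffer());
    this.nextSn += 1;
    this.emit('FRAG_LOADED', { frag, data }); // pass the TS data on for demuxing
  }
}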

  2. Integrating the Demuxer and Remuxer

FLV and TS files are parsed in different ways, but inside TransmuxingController both need to access the unified data source, IOController. Therefore FLV decoding and transcoding were wrapped into an FLVCodec object exposed externally, and TS decoding and transcoding were concentrated into a TSCodec object; the decoder and transcoder are instantiated according to the type of the incoming media.

  3. Connecting IOController and _mediaCodec

In TransmuxingController, a _mediaCodec object manages FLVCodec and TSCodec, and the bindDataSource method that both provide is called to connect the data source, IOController. One thing to note here: the FLVCodec parsing function returns a number, consumed, which is the length of data that FLVCodec has decoded and transcoded. It has to be returned to IOController so that IOController can strip off the decoded data, store the undecoded remainder, and prepend it to the data passed to FLVCodec next time. Because of the TS file structure (TS data comes in whole 188-byte packets), TSCodec processes everything each time and only needs to return consumed = 0.
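
Here is a tiny illustrative sketch of the consumed contract on the FLV side (hypothetical names, far simpler than the real IOController): the controller keeps whatever the codec did not consume and prepends it to the next chunk.

// Hypothetical sketch: the codec reports how many bytes it decoded,
// and the data source keeps the rest for the next round.
interface FlvLikeCodec {
  parseChunks(chunk: Uint8Array, byteStart: number): number; // returns `consumed`
}

class StashingDataSource {
  private stash = new Uint8Array(0); // undecoded leftover from the previous chunk

  constructor(private codec: FlvLikeCodec) {}

  onChunkArrival(chunk: Uint8Array, byteStart: number): void {
    // Prepend whatever was left over last time.
    const buffer = new Uint8Array(this.stash.length + chunk.length);
    buffer.set(this.stash, 0);
    buffer.set(chunk, this.stash.length);

    const consumed = this.codec.parseChunks(buffer, byteStart - this.stash.length);
    // Strip the decoded part; keep the undecoded tail for next time.
    this.stash = buffer.slice(consumed);
  }
}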

  4. Integrating seek for HLS on-demand streams

In flv.js, every seek uses the keyframe information in MediaInfo to find the corresponding range point and then reloads from that range. For HLS on-demand streams, you instead query the Level information held by FragmentLoader and loop over the Fragments to determine whether the seek time falls within the playing time of the current Fragment; if it does, that Fragment is loaded immediately.
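
A minimal sketch of that lookup, with hypothetical field names (fragment start and duration in seconds):

interface SeekableFragment { url: string; start: number; duration: number }

// Find the fragment whose playing time contains the seek target.
function findFragmentForSeek(fragments: SeekableFragment[], seekTime: number): SeekableFragment | null {
  for (const frag of fragments) {
    if (seekTime >= frag.start && seekTime < frag.start + frag.duration) {
      return frag; // load this fragment immediately and resume playback from it
    }
  }
  return null;     // seek target falls outside the playlist
}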

  5. Handling various unexpected situations

We added logger output in the integrated components and wired their errors back into the flv.js framework, so that the player can return the corresponding error information and log information.

The specific structure is shown as follows:

In addition, we did the following:

  1. We also brought in TypeScript to enable type checking of function parameters.
  2. We integrated Jamken's PR 354 against flv.js into the FLV mp4-remuxer (thanks ❤ Jamken) to fix audio/video sync issues in flv.js;
  3. We also added parsing of video Supplemental Enhancement Information: by listening for the HJPlayer.Events.GET_SEI_INFO event you can get the custom SEI information as a Uint8Array;

An attempt at live video interaction

In the project, the host poses a choice about the direction the show should take while the programme is playing; a panel then pops up on the front end for users to choose a direction, and the programme continues live according to the answers. In the usual scheme, a Socket server pushes the question, the front end displays the options when the message arrives, and the user chooses and submits an answer. Last year Aliyun introduced a novel live quiz solution: the options are no longer delivered by a Socket server but by the video cloud server along with the video, and the playback SDK parses the supplemental enhancement information in the video and displays the options. We put this scheme into practice, and the general flow is as follows:

After the host asks a question, the backstage staff fill in the question in the admin backend; it is pushed through the video cloud SDK to the 360 video cloud, which processes the video and inserts the added information into it. When the playback SDK receives the video, it decodes the SEI information, de-duplicates it, and passes the contained information to the show's interaction component; the interaction component displays the options, users click to choose an answer and submit it to the backend for tallying, and the programme content changes according to the tallied answers.

Compared with the traditional scheme, delivering the interaction through video SEI information has the following advantages:

  1. It stays synchronized with the host's audio and video, avoiding the problem of the host having announced the start while the panel has not appeared because the server did not push the message in time.
  2. Low cost: the question is delivered by the video rather than by a server. The delay is higher, but the question can be inserted into the video ahead of time and shown after the host raises it, reducing the delay;

The content of video supplemental enhancement information is generally specified by the cloud vendor, and apart from the leading 16-byte UUID each vendor's format is different. Therefore the player simply throws the raw SEI information (Uint8Array data) out through the GET_SEI_INFO event, and users need to parse it according to the format their own video cloud provides. Also note that the same SEI message is sent repeatedly over a period of time, so users need to de-duplicate it themselves.
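
One simple way to do that de-duplication on the consumer side is sketched below (illustrative only; a real implementation might instead compare a message id your video cloud embeds in the payload):

// Remember recently seen SEI payloads and drop repeats.
const seenSei = new Set<string>();

function isNewSeiMessage(payload: Uint8Array): boolean {
  const key = Array.from(payload).join(','); // cheap fingerprint, fine for a sketch
  if (seenSei.has(key)) return false;
  seenSei.add(key);
  if (seenSei.size > 100) seenSei.clear();   // keep the memory bounded
  return true;
}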

Finally

After finishing this project, we applied it to the Huajiao main site to play FLV live streams. We have also open-sourced the project as HJPlayer, hoping it helps programmers with similar needs. If you run into any problems using it, please raise them in the issues and let's discuss and solve them together.

Side notes

  1. One might ask: why doesn't the video review use FLV files, so you could just play them with flv.js?

A: Clicking video review needs to play the content of the previous five minutes. If FLV files were used, an FLV file would have to be generated by cutting a clip out of the stored video each time, and the front end would then pull that file to play; this would create a great many video file fragments and bring a series of storage problems. With an HLS stream, the corresponding TS files can be found in the stored HLS review files according to the timestamp sent by the front end, and only an M3U8 document needs to be generated.

  2. What is video Supplemental Enhancement Information?

Supplemental enhancement information (SEI) is a feature of the H.264 video compression standard that provides a way to add extra information into the video stream. It is not required for decoding: it may be helpful, but it doesn't matter if it is absent. SEI can be inserted at the content generation end and during transmission, and the inserted information travels over the network to the playback SDK together with the rest of the video. In the H.264/AVC encoding format, the NAL unit header has a type field indicating the NAL unit type; when type = 6, the NAL unit carries supplemental enhancement information (SEI).

  3. The SEI message format

The byte after the NAL unit type is the SEI payload type; user-defined SEI information is generally type 5, i.e. user_data_unregistered. After the payload type come the length bytes: every 0xFF byte adds 255, and the first byte that is not 0xFF ends the length field, the sum giving the length of the carried data. Next comes the 16-byte UUID, and everything after the UUID up to the end of the payload is the custom message content, so content length = length of the data carried by the SEI message − the 16-byte UUID. How the custom content is parsed should be defined according to the data format your video cloud provides.
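
Based on the layout described above, here is a hedged sketch of pulling the custom content out of a user_data_unregistered SEI payload handed out as a Uint8Array; treat it as a starting point and verify the offsets against the data your own video cloud actually produces.

// Sketch: assumes `data` starts at the SEI payload-type byte (after the NAL unit header).
function parseUserDataUnregistered(data: Uint8Array): Uint8Array | null {
  let offset = 0;

  // Payload type: 0xFF bytes accumulate, terminated by a byte smaller than 0xFF.
  let payloadType = 0;
  while (data[offset] === 0xff) { payloadType += 255; offset++; }
  payloadType += data[offset++];
  if (payloadType !== 5) return null;        // not user_data_unregistered

  // Payload size: encoded the same way.
  let payloadSize = 0;
  while (data[offset] === 0xff) { payloadSize += 255; offset++; }
  payloadSize += data[offset++];

  offset += 16;                              // skip the 16-byte UUID
  // Custom content length = payload size minus the 16-byte UUID.
  return data.subarray(offset, offset + payloadSize - 16);
}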