0 Background

In the current scenario, a user uploads a video; after the upload succeeds, a frame-extraction service runs on the backend and returns images that are displayed as recommended covers. Because this scheme has to wait for the upload to finish before the backend frame-extraction task can run, the user ends up waiting a long time.

Therefore, frame extraction on the front end is worth considering: recommended covers can be generated as soon as the video starts uploading, improving the user experience.

1 Scheme Comparison

1.1 Canvas frame extraction

Set the video element's currentTime (video.currentTime = seconds) and then draw the current frame onto a canvas. There is a related open source library whose demo you can try. However, this approach is limited to the formats and codecs the browser itself can decode.
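
A minimal sketch of this scheme, assuming an already-loaded video element (the function name is illustrative):

function captureFrame(video, seconds) {
  return new Promise((resolve) => {
    video.currentTime = seconds; // seek to the target time
    video.addEventListener('seeked', () => {
      const canvas = document.createElement('canvas');
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      // draw the current video frame onto the canvas
      canvas.getContext('2d').drawImage(video, 0, 0, canvas.width, canvas.height);
      canvas.toBlob(resolve, 'image/jpeg'); // export the frame as a JPEG blob
    }, { once: true });
  });
}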

1.2 WebAssembly frame extraction

The idea is to take FFmpeg, a powerful audio/video library written in C/C++, compile it with Emscripten into WASM + JS glue code, and then drive the frame extraction from JS. In terms of compatibility, WebAssembly is supported by all major browsers; for the few browsers that still lack support, the old scheme is used as a fallback.

1.2.1 ffmpeg.wasm

There is already an open source library, ffmpeg.wasm. It consists of:

  • @ffmpeg/core: compiles FFmpeg and produces ffmpeg-core.wasm plus the JS glue code.
  • @ffmpeg/ffmpeg: wraps the glue code generated in the previous step and exposes APIs such as load and run. If @ffmpeg/core does not fit your needs, you can also build a custom ffmpeg-core.wasm.

So, can we just use it as-is? There are still some problems to solve:

  • Browser compatibility: the browser's JS thread is single-threaded and mutually exclusive with the rendering thread. To avoid blocking page rendering and the JS main thread, @ffmpeg/core compiles FFmpeg with pthreads enabled, and the resulting JS glue code relies on SharedArrayBuffer. SharedArrayBuffer allows data sharing between the main thread and workers, as well as among workers, which is ideal for this scenario. However, because of security issues it is disabled by default in all major browsers and requires extra response headers to enable (see the sketch after this list), and its support is not good enough to meet production requirements.

  • Wasm bloat: the ffmpeg-core.wasm compiled by @ffmpeg/core includes almost all of FFmpeg's features, many of which are not needed for frame extraction.
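
For reference, the extra response headers mentioned above are the cross-origin isolation headers that SharedArrayBuffer requires. A minimal sketch, assuming a Node.js/Express static server (this is not the scheme adopted here):

const express = require('express');
const app = express();

app.use((req, res, next) => {
  // cross-origin isolation is a precondition for using SharedArrayBuffer
  res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
  res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
  next();
});

app.use(express.static('dist'));
app.listen(8080);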

Therefore, the final solution is a WebAssembly-based frame extraction implemented as follows:

  1. Custom-compile FFmpeg to produce ffmpeg-core.wasm + JS that does not depend on SharedArrayBuffer, and optimize the size of the ffmpeg-core.wasm file.
  2. Use @ffmpeg/ffmpeg to call the FFmpeg build from the previous step and implement the frame extraction.
  3. Run the frame-extraction business code in a web worker to avoid blocking the main thread.

2 Compile FFmpeg

2.1 Understanding Concepts

First, let’s look at the concepts and principles involved.

2.1.1 FFmpeg

FFmpeg is an excellent C/C++ audio and video processing library that can take video screenshots. The libraries involved in frame capture are:

  • libavcodec: encodes and decodes audio and video.
  • libavformat: muxes and demuxes audio and video.
  • libavutil: common utility functions, including arithmetic, string handling, etc.
  • libswscale: image scaling and pixel format conversion.

Components involved:

  • Demuxer: unwraps the video container
  • Decoder: decodes the video stream into frames
  • Encoder: encodes the decoded frames into images
  • Muxer: wraps the encoded images into the output format

2.1.2 WebAssembly

WebAssembly, or WASM, is a new format that is portable, compact, fast to load, and Web-compatible. It provides an efficient compilation target for languages such as C, C++ and Rust, enabling code written in many languages to run at near-native speed on the Web.

2.1.3 Emscripten

Emscripten is a compiler toolchain for WebAssembly. Specifically, C/C++ source is compiled by the Clang front end into LLVM intermediate representation (IR), and then from LLVM IR into WASM. The browser downloads the WebAssembly module and compiles it further into the target machine's assembly and machine code (x86/ARM, etc.).

What are LLVM and Clang?

LLVM is a compiler framework whose different front ends and back ends all share the same LLVM Intermediate Representation (LLVM IR). Clang is a sub-project of LLVM: a C/C++/Objective-C compiler front end built on the LLVM architecture.

  • Frontend: lexical analysis, syntax analysis, semantic analysis, and intermediate code generation
  • Optimizer: Intermediate code optimization (loop optimization, remove useless code, etc.)
  • Backend: generates the target code. If the target code is machine code, it can be executed directly; if it is assembly, it must first be assembled into machine code by an assembler before it can run.

2.2 Build an Emscripten environment

You can install Emscripten by following the instructions on its website. A better approach is to install Docker Desktop and use a pre-built Emscripten image via Docker, which avoids pitfalls in the local development environment. Download the FFmpeg source code; following @ffmpeg/core, version 4.3.1 is used here. In the source directory, write the following script to run Docker:

#!/bin/bash
set -euo pipefail

EM_VERSION=2.0.8

docker pull emscripten/emsdk:$EM_VERSION

# bind-mount the FFmpeg source directory into the container and run build.sh inside it
docker run \
  --rm \
  -v "$PWD":/src \
  emscripten/emsdk:$EM_VERSION \
  sh -c 'bash ./build.sh'


You can run docker run --help to see the command's options. The next step is to write the build script, build.sh.

2.3 Configuring FFMPEG compilation Parameters

Use emconfigure to set the appropriate environment variables and configure FFmpeg's compilation parameters. Documentation for the configuration:

  • Run emconfigure ./configure --help to see all available options.
  • A detailed description of the FFmpeg configure options can be viewed here.
  • FFmpeg muxer and demuxer documentation
  • FFmpeg encoder and decoder documentation
# FFmpeg configure flags (a bash array keeps the per-flag comments valid)
CONFIG_ARGS=(
  --target-os=none        # use none to prevent any OS-specific configuration
  --arch=x86_32           # use x86_32 to achieve minimal architectural optimization
  --enable-cross-compile  # enable cross compiling
  --disable-x86asm        # disable x86 asm
  --disable-inline-asm    # disable inline asm
  --disable-stripping     # disable stripping
  --disable-programs      # disable building the programs (incl. ffplay, ffprobe & ffmpeg)
  --disable-doc           # disable docs
  --nm="llvm-nm"
  --ar=emar
  --ranlib=emranlib
  --cc=emcc
  --cxx=em++
  --objcc=emcc
  --dep-cc=emcc
  # remove components that are not needed
  --disable-avdevice
  --disable-swresample
  --disable-postproc
  --disable-network
  --disable-pthreads
  --disable-w32threads
  --disable-os2threads
  # demuxers, decoders, encoders, etc.
  --disable-everything    # the key to reducing the wasm size: disable everything, then re-enable only the components below
  --enable-filters
  --enable-muxer=image2
  --enable-demuxer=mov    # mov, mp4, m4a, 3gp, 3g2, mj2
  --enable-demuxer=flv
  --enable-demuxer=h264
  --enable-demuxer=asf
  --enable-encoder=mjpeg
  --enable-decoder=hevc
  --enable-decoder=h264
  --enable-decoder=mpeg4
  --enable-protocol=file
)
emconfigure ./configure "${CONFIG_ARGS[@]}"

emmake make -j4  # build

2.4 Generate JS + WASM

Link the object code generated by the make step in the previous section into JavaScript + WebAssembly. You can view the emcc options with emcc --help and the Clang options with clang --help.

# emcc link flags (again a bash array, so the per-flag comments stay valid)
EMCC_ARGS=(
  -I. -I./fftools                    # add directories to the include search path
  # add directories to the library search path
  -Llibavcodec -Llibavdevice -Llibavfilter -Llibavformat -Llibavresample -Llibavutil -Llibpostproc -Llibswscale -Llibswresample
  -Qunused-arguments                 # don't emit warnings for unused driver arguments
  # output file and input sources
  -o wasm/dist/ffmpeg-core.js fftools/ffmpeg_opt.c fftools/ffmpeg_filter.c fftools/ffmpeg_hw.c fftools/cmdutils.c fftools/ffmpeg.c
  -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -lm   # libraries to link
  -s USE_SDL=2                       # use SDL2
  -s MODULARIZE=1                    # use the modularized version to be more flexible
  -s EXPORT_NAME="createFFmpegCore"  # assign an export name for the browser
  -s EXPORTED_FUNCTIONS="[_main]"    # export the main function
  -s EXTRA_EXPORTED_RUNTIME_METHODS="[FS, cwrap, ccall, setValue, writeAsciiToMemory]"  # export extra runtime methods
  -s INITIAL_MEMORY=33554432         # 33554432 bytes = 32 MB
  -s ALLOW_MEMORY_GROWTH=1           # allow the total amount of memory used to grow with the application's demands
  --post-js wasm/post-js.js          # emit a file after the generated code; used here to expose the exit function
  -O3                                # optimize the code and reduce its size
)
emcc "${EMCC_ARGS[@]}"

The final ffmpeg-core.wasm build is about 5 MB, and it gets even smaller after gzip.

3 Implement the frame extraction feature

Here is a simple implementation that runs the compiled ffmpeg-core.js.

3.1 Calling JS glue code

Calling the JS glue code is already implemented in the open source @ffmpeg/ffmpeg library, so we can simply use its API.

const { createFFmpeg } = require('@ffmpeg/ffmpeg');
const ffmpeg = createFFmpeg({ log: true });

(async () => {
  await ffmpeg.load();
  // capture 8 frames, key frames only
  await ffmpeg.run('-i', 'example.mp4', '-r', '80', '-vf', 'select="eq(pict_type,I)"', '-frames', '8', 'frame-%04d.jpg');
})();
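
To load the custom-built core from section 2 instead of the default one, @ffmpeg/ffmpeg can be pointed at it via its corePath option (the path below is a placeholder):

const { createFFmpeg } = require('@ffmpeg/ffmpeg');

// point @ffmpeg/ffmpeg at the custom-compiled glue code; the path is a placeholder
const ffmpeg = createFFmpeg({
  log: true,
  corePath: '/static/ffmpeg-core.js',
});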

So how do we implement the load and run methods?

The first thing to understand is that JavaScript can only exchange Numbers with C. Since JavaScript and C/C++ have completely different type systems, Number is the only intersection between the two, so when they call each other they are essentially exchanging numbers.

Therefore, if an argument is a string, an array, or any other non-Number type, the call has to be split into the following steps (sketched after the list):

  • Use Module._malloc() to allocate memory on the Module heap and obtain its address ptr;
  • Copy the string/array data into memory at ptr;
  • Call the C/C++ function with ptr as the argument;
  • Free ptr with Module._free().
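
For example, passing a single string to a hypothetical exported C function int handle(char *str) would look roughly like this (a sketch; _handle is not a real FFmpeg export):

function callWithString(Module, str) {
  const ptr = Module._malloc(str.length + 1); // 1. allocate memory on the Emscripten heap
  Module.writeAsciiToMemory(str, ptr);        // 2. copy the string into memory at ptr
  const ret = Module._handle(ptr);            // 3. call the C function with ptr as the argument
  Module._free(ptr);                          // 4. free ptr
  return ret;
}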

As you can see, the call process is quite tedious. To simplify it, Emscripten provides the cwrap function.

Here is part of the source code of @ffmpeg/ffmpeg:

const createFFmpegCore = require('path/to/ffmpeg-core.js');
let Core;
let ffmpeg;

// load
const load = async () => {
  Core = await createFFmpegCore({
    print: (message) => {},
  });
  ffmpeg = Core.cwrap('_main', 'number', ['number', 'number']); // cwrap the exported main function
};

const parseArgs = (Core, args) => {
  const argsPtr = Core._malloc(args.length * Uint32Array.BYTES_PER_ELEMENT);
  args.forEach((s, idx) => {
    const buf = Core._malloc(s.length + 1);
    Core.writeAsciiToMemory(s, buf);
    Core.setValue(argsPtr + (Uint32Array.BYTES_PER_ELEMENT * idx), buf, 'i32');
  });
  return [args.length, argsPtr]; // [argc, argv pointer]
};

// run an ffmpeg command
const run = (..._args) => {
  return new Promise((resolve) => {
    ffmpeg(...parseArgs(Core, _args)); // pass the command-line arguments to main()
    // the promise is resolved elsewhere, via the exit function exposed through post-js.js
  });
};

module.exports = {
  load,
  run,
};

3.2 Frame extraction optimization

Extracting n frames with a single command is very slow, because FFmpeg reads frame by frame until it reaches the frame at the specified time; the later the timestamp, the longer the extraction takes.

Therefore, the extraction is changed as follows: seek directly (with -ss placed before -i) to the key frame at each specified time, which speeds things up a hundredfold.

// ...
(async () => {
  // ...
  // obtain the video duration first
  const frameNum = 8;
  const per = duration / (frameNum - 1);
  for (let i = 0; i < frameNum; i++) {
    await ffmpeg.run('-ss', `${Math.floor(per * i)}`, '-i', 'example.mp4', '-s', '960x540', '-f', 'image2', '-frames', '1', `frame-${i + 1}.jpeg`);
  }
})();

4 Web worker

Because -s USE_PTHREADS=1 was not set at build time, the ffmpeg calls above block the JS main thread and page rendering: while the recommended covers are being generated, the upload progress cannot be updated and the page cannot respond to other buttons. Therefore, the work needs to run in a web worker.

Web Workers are scripts that run on a thread separate from the browser's page thread, and they can be used to offload almost all heavy processing from the page thread. The main thread and the worker communicate via the postMessage() method and the onmessage event:

main.js

var myWorker = new Worker('worker.js');
myWorker.onmessage = function(e) {
  result.textContent = e.data;
  console.log('Message received from worker');
}
first.onchange = function() {
  myWorker.postMessage([first.value,second.value]);
  console.log('Message posted to worker');
}

worker.js

onmessage = function(e) {
  console.log('Message received from main script');
  var workerResult = 'Result: ' + (e.data[0] * e.data[1]);
  console.log('Posting message back to main script');
  postMessage(workerResult);
}

Comlink (1.1 KB) is recommended here: it provides an RPC implementation that makes this message-based API much friendlier. For example, the frame extraction communication can be implemented as follows:

main.js

import * as Comlink from 'comlink';

async function onFileUpload(file) {
  const ffmpegWorker = Comlink.wrap(new Worker('./worker.js'));
  const frameU8Arrs = await ffmpegWorker.getFrames(file);
  console.log('get frameU8Arrs from worker', frameU8Arrs);
}

worker.js

import * as Comlink from 'comlink';

async function getFrames(file) {
  // ...
  // write the uploaded file into MEMFS (e.g. as example.mp4) and obtain its duration first
  const frameNum = 8;
  const per = duration / (frameNum - 1);
  const frameU8Arrs = [];
  for (let i = 0; i < frameNum; i++) {
    await ffmpeg.run('-ss', `${Math.floor(per * i)}`, '-i', 'example.mp4', '-s', '960x540', '-f', 'image2', '-frames', '1', `frame-${i + 1}.jpeg`);
  }
  // read the image binary data (Uint8Array) from MEMFS, then remove the temporary files
  for (let i = 0; i < frameNum; i++) {
    const u8arr = ffmpeg.FS('readFile', `frame-${i + 1}.jpeg`);
    frameU8Arrs.push(u8arr);
    ffmpeg.FS('unlink', `frame-${i + 1}.jpeg`);
  }
  return frameU8Arrs;
}

Comlink.expose({
  getFrames,
});

In addition, if you are using webpack, you can configure worker-plugin so that the worker.js path is resolved correctly.
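
Back on the main thread, the returned Uint8Array data can then be turned into displayable cover candidates, for example:

function renderCovers(frameU8Arrs, container) {
  frameU8Arrs.forEach((u8arr) => {
    // wrap the raw JPEG bytes in a Blob and display it via an object URL
    const url = URL.createObjectURL(new Blob([u8arr], { type: 'image/jpeg' }));
    const img = document.createElement('img');
    img.src = url;
    container.appendChild(img);
  });
}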

5 Online Effect

After going online, in browsers that support this scheme, users can select and edit the video cover without waiting for the upload to finish, and frame extraction takes on average 50% less time than the old scheme. Remaining points to optimize:

  • Some browsers report errors; browser support needs further improvement (for example, QQ Browser does not support WebAssembly.Memory, and some versions of Safari fail to fetch the wasm).
  • There is still room to reduce the wasm size (for example, --enable-filters currently pulls in all filters).

References

  • Build FFmpeg WebAssembly version (= ffmpeg.wasm): Part.2 Compile with Emscripten
  • Front-end video frame extraction with FFmpeg + WebAssembly