0 Background

In the current flow, a user uploads a video; after the upload succeeds, the backend runs a frame-capture service and finally returns an image to the user as the recommended cover. Under this scheme the user has to wait for the video to upload, for the backend to read it, and for the frame-capture task to run, which takes a long time.

We therefore considered capturing frames on the front end, generating the recommended cover while the video uploads, to improve the user experience.

1 Comparison of Schemes

1.1 canvas frame capture

Use the video element to load the video, seek to a given time, and draw the current frame onto a canvas with drawImage() to obtain an image.

However, this approach can only handle formats that the browser itself can decode, and seeking and drawing behavior differs across browsers, so coverage is limited.
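
A minimal sketch of the canvas approach, assuming a video element that has already loaded the file (the helper names are illustrative, and captureFrameAt is browser-only):

```javascript
// Evenly spaced capture timestamps across the video duration (pure helper).
function evenTimestamps(duration, frameNum) {
  const per = duration / (frameNum - 1);
  const ts = [];
  for (let i = 0; i < frameNum; i++) ts.push(Math.floor(per * i));
  return ts;
}

// Draw the frame at `time` seconds onto a canvas and return it as a data URL.
// Browser-only: relies on <video>/<canvas>, and only works for formats the
// browser itself can decode.
function captureFrameAt(video, time) {
  return new Promise((resolve) => {
    video.addEventListener('seeked', function onSeeked() {
      video.removeEventListener('seeked', onSeeked);
      const canvas = document.createElement('canvas');
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      canvas.getContext('2d').drawImage(video, 0, 0);
      resolve(canvas.toDataURL('image/jpeg'));
    });
    video.currentTime = time;
  });
}
```

The timestamp helper mirrors the `duration / (frameNum - 1)` spacing used by the WebAssembly scheme later in this article.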

1.2 WebAssembly frame capture

Compile FFmpeg, the powerful C/C++ library, into wasm + JS glue code with the Emscripten compiler, then implement the video frame-capture function in JS.

As for compatibility, WebAssembly is supported by all major browsers; for the small number of browsers that still lack support, fall back to the old scheme.
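
A common feature-detection sketch for choosing between the two schemes (the function name is illustrative); it tries to compile the smallest valid wasm module:

```javascript
// Returns true when the environment can compile & instantiate WebAssembly.
function supportsWasm() {
  try {
    if (typeof WebAssembly === 'object' &&
        typeof WebAssembly.instantiate === 'function') {
      // Smallest valid wasm binary: "\0asm" magic number + version 1.
      const mod = new WebAssembly.Module(
        Uint8Array.of(0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00)
      );
      return mod instanceof WebAssembly.Module;
    }
  } catch (e) { /* fall through to the old scheme */ }
  return false;
}
```

When it returns false, the old backend frame-capture scheme is used instead.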

This scheme has already been put into practice on Bilibili and other platforms, and their implementations can be used as references. It is the scheme we finally decided on.

1.3 Comparison of WebAssembly frame-capture implementations

1.3.1 ffmpeg.wasm

At present there is an open-source library, ffmpeg.wasm, which includes:

  • @ffmpeg/core: compiles FFmpeg into ffmpeg-core.wasm + JS glue code.
  • @ffmpeg/ffmpeg: implements the calls into the glue code generated in the previous step and provides APIs such as load and run. Developers who are not satisfied with @ffmpeg/core can also build a custom ffmpeg-core.wasm.

So, can we just use it as-is? These problems remain to be solved:

  • Browser compatibility: as we know, the browser's JS thread is single-threaded and mutually exclusive with the render thread. To avoid blocking the page's rendering and main JS thread, @ffmpeg/core enables pthreads when compiling FFmpeg, so the generated JS glue code uses SharedArrayBuffer. SharedArrayBuffer allows data sharing between the main thread and workers, and among workers, which is ideal for this scenario.

However, due to security concerns, SharedArrayBuffer is disabled by default in all mainstream browsers and is only re-enabled when specific response headers are configured; support is therefore not good enough to ship.
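
The headers in question are Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp; without them the page is not cross-origin isolated and SharedArrayBuffer is hidden. A runtime check (function name illustrative) might look like:

```javascript
// True when SharedArrayBuffer is actually usable in the current context.
// In browsers this is only the case on cross-origin isolated pages.
function supportsSharedArrayBuffer() {
  try {
    return typeof SharedArrayBuffer === 'function' &&
      new SharedArrayBuffer(8).byteLength === 8;
  } catch (e) {
    return false;
  }
}
```
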

  • Wasm redundancy: the ffmpeg-core.wasm compiled by @ffmpeg/core includes almost all FFmpeg features; the file is 24MB (8.5MB after gzip), much of which is not needed for frame capture.

1.3.2 Implementations on other platforms

By customizing the FFmpeg build to business requirements (supporting only a few formats), the resulting wasm file can be reduced to 4.7MB, and even smaller after gzip.

However, these implementations maintain a C-language entry file that calls FFmpeg's internal libraries to implement frame capture, and compile it together with FFmpeg.

This approach demands a deep understanding of FFmpeg and ties the code to a specific FFmpeg version; APIs and directory layouts may change as FFmpeg is upgraded. In addition, as the business evolves we may need more FFmpeg features and would have to modify the C code again, so maintainability is poor.

1.4 Summary

Therefore, the final solution is WebAssembly frame capture, implemented as follows:

  1. Customize the FFmpeg build to optimize the wasm file size.
  2. Use the fftools/ffmpeg.c entry file shipped with FFmpeg (v4.3.1), so no C code of our own has to be written.
  3. Compile an ffmpeg-core.wasm + JS bundle that does not depend on SharedArrayBuffer, and run the frame-capture business code in a Web Worker to avoid blocking the main thread.
  4. Call the compiler-generated FFmpeg JS glue code to implement frame capture; @ffmpeg/ffmpeg can be reused for this part.

2 Custom FFmpeg compilation

2.1 Run Docker with the official Emscripten image

Emscripten is a compiler toolchain for WebAssembly.

Download Docker Desktop and run the pre-built Emscripten image via Docker to avoid pitfalls in the local development environment. (In our tests, Docker Desktop on macOS kept getting stuck connecting; the docker command under Windows/Ubuntu was more stable.)

In the FFmpeg source directory, write the following script to run Docker:

#!/bin/bash
set -euo pipefail

EM_VERSION=2.0.8

docker pull emscripten/emsdk:$EM_VERSION
# bind-mount the source tree and the Emscripten cache into the container
docker run \
  --rm \
  -v $PWD:/src \
  -v $PWD/wasm/cache:/emsdk_portable/.data/cache/wasm \
  emscripten/emsdk:$EM_VERSION \
  sh -c 'bash ./build.sh'

2.1.1 Understanding how Emscripten works

Specifically, C/C++ (and other languages) is turned by the Clang front end into LLVM intermediate representation (LLVM IR), and then from LLVM IR into wasm. The browser downloads the WebAssembly, and the WebAssembly module is translated into the target machine's assembly code and then into machine code (x86/ARM, etc.).

So what are LLVM and Clang?

  • LLVM is a compiler framework in which different front ends and back ends share a unified intermediate representation, LLVM IR.
  • Clang is a subproject of LLVM, a C/C++/Objective-C compiler front end based on the LLVM architecture.

A typical compiler is structured in three parts:

  • Frontend: lexical analysis, syntax analysis, semantic analysis, generation of intermediate code.
  • Optimizer: intermediate code optimization (loop optimization, removing dead code, etc.).
  • Backend: generates object code. If the object code is machine code, it can be executed immediately; if it is assembly, it must first be assembled into machine code before it can run.

Next, write the build script build.sh.

2.2 Configure FFmpeg build parameters to eliminate redundancy

FFmpeg is an excellent C/C++ audio/video processing library with which video screenshots can be implemented.

First, we need to know the libraries and components involved in implementing screenshots.

Libraries involved:

  • libavcodec: encoding and decoding of audio and video.
  • libavformat: muxing and demuxing of audio/video container formats.
  • libavutil: common utility functions (arithmetic, string handling, etc.).
  • libswscale: image scaling and pixel-format conversion.

Components involved:

  • Demuxer: unpacks the video container
  • Decoder: decodes the video stream
  • Encoder: encodes the decoded frames as images
  • Muxer: packages the encoded image for output
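
These components correspond one-to-one to the ffmpeg arguments used for frame capture later in this article; a sketch of assembling that argument list (the function name and values are illustrative, mirroring the run() call in section 3):

```javascript
// Build ffmpeg CLI args for grabbing one frame at `time` seconds:
// demuxer (input container) -> decoder -> encoder (mjpeg) -> muxer (image2).
function buildFrameArgs(time, input, output) {
  return [
    '-ss', `${time}`,  // seek before -i: fast input seeking
    '-i', input,       // demuxer + decoder are chosen from the input file
    '-s', '960x540',   // scale via libswscale
    '-f', 'image2',    // image2 muxer
    '-frames', '1',    // encode a single frame (mjpeg -> .jpeg)
    output,
  ];
}
```
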

Use emconfigure to set the appropriate environment variables and configure the FFmpeg build parameters. Documentation on the configuration:

  • Run emconfigure ./configure --help to view all available options.
  • Detailed notes on the FFmpeg configure options can be viewed here.
# configure FFmpeg with Emscripten
# (bash allows inline comments inside an array literal, unlike after "\" continuations)
CONFIG_ARGS=(
  --target-os=none        # use none to prevent any os specific configurations
  --arch=x86_32           # use x86_32 to achieve minimal architectural optimization
  --enable-cross-compile  # enable cross compile
  --disable-x86asm        # disable x86 asm
  --disable-inline-asm    # disable inline asm
  --disable-stripping     # disable stripping
  --disable-programs      # disable programs build (incl. ffplay, ffprobe & ffmpeg)
  --disable-doc           # disable doc
  --nm="llvm-nm"
  --ar=emar
  --ranlib=emranlib
  --cc=emcc
  --cxx=em++
  --objcc=emcc
  --dep-cc=emcc
  # remove unnecessary libraries
  --disable-avdevice
  --disable-swresample
  --disable-postproc
  --disable-network
  --disable-pthreads
  --disable-w32threads
  --disable-os2threads
  # configure the required demuxers, decoders, encoders, etc.
  --disable-everything    # the key to reducing the wasm size: disable everything, then re-enable only the components below
  --enable-filters
  --enable-muxer=image2
  --enable-demuxer=mov    # mov,mp4,m4a,3gp,3g2,mj2
  --enable-demuxer=flv
  --enable-demuxer=h264
  --enable-demuxer=asf
  --enable-encoder=mjpeg
  --enable-decoder=hevc
  --enable-decoder=h264
  --enable-decoder=mpeg4
  --enable-protocol=file
)
emconfigure ./configure "${CONFIG_ARGS[@]}"

# build dependencies
emmake make -j4

2.4 Generate JS + wasm

Use emcc to compile the linked code generated by make in the previous step into JavaScript + WebAssembly. Here fftools/ffmpeg.c is used as the entry file, so there is no need to maintain a C entry file of our own.

Emcc parameter options can be viewed with emcc --help, and Clang parameter options with clang --help.

EMCC_ARGS=(
  -I. -I./fftools # add directories to the include search path
  -Llibavcodec -Llibavdevice -Llibavfilter -Llibavformat -Llibavresample -Llibavutil -Llibpostproc -Llibswscale -Llibswresample # add directories to the library search path
  -Qunused-arguments # don't emit warnings for unused driver arguments
  -o wasm/dist/ffmpeg-core.js # output
  fftools/ffmpeg_opt.c fftools/ffmpeg_filter.c fftools/ffmpeg_hw.c fftools/cmdutils.c fftools/ffmpeg.c # input
  -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -lm # libraries
  -s USE_SDL=2 # use SDL2
  -s MODULARIZE=1 # use the modularized version to be more flexible
  -s EXPORT_NAME="createFFmpegCore" # assign an export name for the browser
  -s EXPORTED_FUNCTIONS="[_main]" # export the main function
  -s EXTRA_EXPORTED_RUNTIME_METHODS="[FS, cwrap, ccall, setValue, writeAsciiToMemory]" # export extra runtime methods
  -s INITIAL_MEMORY=33554432 # 33554432 bytes = 32MB
  -s ALLOW_MEMORY_GROWTH=1 # allow the total amount of memory used to grow with the demands of the application
  -s ASSERTIONS=1 # for debugging
  --post-js wasm/post-js.js # emit a file after the generated code; used to expose an exit function
  -O3 # optimize code and reduce code size
)
emcc "${EMCC_ARGS[@]}"

The final ffmpeg-core.wasm build is 5MB, and smaller still after gzip.

Source: build.sh

At this point, the FFmpeg compilation is complete! Now back to the familiar front end.

3 Implement the frame-capture function

3.1 Calling the JS glue code

The part that calls the JS glue code is already implemented in the open-source library @ffmpeg/ffmpeg, and we can simply use its API.

const { createFFmpeg } = require('@ffmpeg/ffmpeg');
const ffmpeg = createFFmpeg({ log: true });

(async () => {
  await ffmpeg.load();
  // ... the part that obtains duration is omitted
  const frameNum = 8;
  const per = duration / (frameNum - 1);
  for (let i = 0; i < frameNum; i++) {
    await ffmpeg.run('-ss', `${Math.floor(per * i)}`, '-i', 'example.mp4', '-s', '960x540', '-f', 'image2', '-frames', '1', `frame-${i + 1}.jpeg`);
  }
})();

Along the way it was also found that placing -ss before -i seeks directly to the specified time instead of decoding frame by frame up to it, which speeds up the screenshots. See the related API documentation.

P.S. @ffmpeg/ffmpeg does not currently support loading an ffmpeg-core.wasm + JS bundle built without pthreads out of the box.

3.1.1 Exchanging data between JavaScript and C

So how do the load and run methods work under the hood? The first thing to know is that JavaScript can only use Number when exchanging data with C. JavaScript and C/C++ have completely different type systems, and Number is their only intersection, so essentially every cross-language call exchanges numbers.

Therefore, if a parameter is a string, array, or other non-Number type, the call must be split into the following steps:

  • Use Module._malloc() to allocate memory on the Module heap and obtain its address ptr;
  • Copy the string/array data into memory at ptr;
  • Call the C/C++ function with ptr as the argument;
  • Use Module._free() to release ptr.

Below is a simplified excerpt of how @ffmpeg/ffmpeg does this:

const createFFmpegCore = require('path/to/ffmpeg-core.js');
let Core;
let ffmpeg;

// load
const load = async () => {
  Core = await createFFmpegCore({
    print: (message) => {},
  });
  ffmpeg = Core.cwrap('main', 'number', ['number', 'number']); // cwrap the exported main function
};

const parseArgs = (Core, args) => {
  const argsPtr = Core._malloc(args.length * Uint32Array.BYTES_PER_ELEMENT);
  args.forEach((s, idx) => {
    const buf = Core._malloc(s.length + 1);
    Core.writeAsciiToMemory(s, buf);
    Core.setValue(argsPtr + (Uint32Array.BYTES_PER_ELEMENT * idx), buf, 'i32');
  });
  return [args.length, argsPtr]; // [argc, argv pointer]
};

// run ffmpeg
const run = (..._args) => {
  return new Promise((resolve) => {
    ffmpeg(...parseArgs(Core, _args)); // pass the command-line arguments
  });
};

module.exports = {
  load,
  run,
};

4 Web Worker

Because -s USE_PTHREADS=1 was not configured in the build, calling FFmpeg as above blocks the page's main JS thread and rendering. For example, while the recommended covers are being generated, the upload progress cannot be updated and clicks on other buttons on the page get no response. Therefore a Web Worker is needed to run it.

A Web Worker is a script that runs on a thread separate from the browser's page thread and can take over almost all heavy processing from the page thread. The main thread and the worker communicate via the postMessage() method and the onmessage event.
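
For comparison, a hand-written postMessage()/onmessage round trip for the frame-capture call might look like this (browser-only sketch; the message shape and worker path are illustrative):

```javascript
// main.js — raw round trip without any RPC library.
function getFramesRaw(file) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./worker.js');
    worker.onmessage = (e) => {
      if (e.data.type === 'frames') resolve(e.data.frames);
    };
    worker.onerror = reject;
    worker.postMessage({ type: 'getFrames', file });
  });
}

// worker.js — the matching handler would look roughly like:
// self.onmessage = async (e) => {
//   if (e.data.type === 'getFrames') {
//     const frames = await getFrames(e.data.file);
//     self.postMessage({ type: 'frames', frames });
//   }
// };
```

Every call needs its own message type and handler branch, which is exactly the boilerplate the next paragraph complains about.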

But writing the communication by hand with the postMessage() method and onmessage events makes the code cumbersome. Comlink (1.1kB) is recommended to make the code friendlier and hide the messaging almost entirely.

For example, frame capture communication: main.js

import * as Comlink from 'comlink';

async function onFileUpload(file) {
  const ffmpegWorker = Comlink.wrap(new Worker('./worker.js'));
  const frameU8Arrs = await ffmpegWorker.getFrames(file);
}

worker.js

import * as Comlink from 'comlink';

async function getFrames(file) {
  // ... the parts that load ffmpeg, write the file to MEMFS and obtain duration are omitted
  const frameNum = 8;
  const per = duration / (frameNum - 1);
  const frameU8Arrs = [];
  for (let i = 0; i < frameNum; i++) {
    await ffmpeg.run('-ss', `${Math.floor(per * i)}`, '-i', 'example.mp4', '-s', '960x540', '-f', 'image2', '-frames', '1', `frame-${i + 1}.jpeg`);
  }
  // Read the image binary data (Uint8Array) from MEMFS
  for (let i = 0; i < frameNum; i++) {
    const fileName = `frame-${i + 1}.jpeg`;
    const u8arr = ffmpeg.FS('readFile', fileName);
    frameU8Arrs.push(u8arr);
    ffmpeg.FS('unlink', fileName);
  }
  return frameU8Arrs;
}

Comlink.expose({
  getFrames,
});

Comlink is an RPC implementation based on ES6 Proxy and postMessage(). In the example, the object behind ffmpegWorker lives in worker.js; main.js only holds a proxy handle to it, and methods such as ffmpegWorker.getFrames actually execute in worker.js.

The only pitfall is that this library ships ES6 output, which needs to be transpiled to ES5 in the build configuration.

4.1 Webpack configuration

Also, if you use webpack, you may run into problems loading the correct worker.js path. worker-plugin can be configured like this:

const WorkerPlugin = require('worker-plugin');
const isPub = true; // whether this is the production build

{
  // ...
  plugins: [
    new WorkerPlugin({
      globalObject: 'this',
      filename: isPub ? '[name].[chunkhash:9].worker.js' : '[name].worker.js',
    }),
  ],
}

5 Online Effect

For browsers that support this solution, users can select and edit video covers without waiting for videos to be uploaded.

Frame capture on the front end also takes much less time than in the backend, and the gap widens as the video gets larger.

6 Follow-up optimization points

6.1 Improving Browser Support

Some browsers still report errors; we will keep optimizing to improve browser support (for example, a fetch-wasm error in certain Safari versions).

6.2 Reducing the WASM File Size

There is still room to shrink the wasm. (For example, the build currently passes --enable-filters, which enables all filters.)

6.3 Optimization of Reading Video Files

Because MEMFS is used by default, the entire video file is written into memory before processing. With a large video, e.g. an 800MB+ file, the task occupies nearly 3GB of memory in Firefox 90 and the browser crashes.

const getVideoInfo = async (file) => {
  // ... implement the fileToUint8Array method first
  const bufferArr = await fileToUint8Array(file);
  ffmpeg.FS('writeFile', 'example.mp4', bufferArr); // write to MEMFS first
  await ffmpeg.run('-i', 'example.mp4', '-loglevel', 'info');
};

The solution that comes to mind is WORKERFS. WORKERFS runs in a Web Worker and provides read-only access to File and Blob objects inside the worker without copying the whole file into memory, which fits this need.
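
A sketch of what the switch could look like, assuming a custom glue build that links in WORKERFS (-lworkerfs.js) and exports the FS object via EXTRA_EXPORTED_RUNTIME_METHODS (the stock @ffmpeg/ffmpeg API does not expose this directly):

```javascript
// Inside the Web Worker: mount the File object read-only instead of
// copying all of its bytes into MEMFS.
function mountInputFile(Core, file) {
  const { FS } = Core;
  FS.mkdir('/input');
  // WORKERFS reads from the underlying File/Blob on demand,
  // so the whole video never has to live in memory at once.
  FS.mount(FS.filesystems.WORKERFS, { files: [file] }, '/input');
  return `/input/${file.name}`; // pass this path to ffmpeg as the -i argument
}
```
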

reference

  • Build FFmpeg WebAssembly version (= ffmpeg.wasm): Part.2 Compile with Emscripten
  • Front-end video frame extraction: ffmpeg + WebAssembly