0 Background
In the current service scenario, a user uploads a video; after the upload succeeds, the backend runs a frame-capture service and finally returns an image to the user as the recommended cover. With this scheme the user has to wait for the video to upload, for the backend to read it, and then for the frame-capture task to run, which takes a long time.
Therefore, we considered doing frame capture on the front end and generating the recommended covers while the video uploads, to improve the user experience.
1 Comparison of Schemes
1.1 Canvas frame capture
Use the <video> element to load the video and draw the current frame onto a <canvas> to capture an image. However, this approach can only handle video formats that the browser's <video> tag is able to decode, so its format coverage is limited.
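A minimal sketch of this scheme, assuming the browser can decode the format and the video URL is accessible to the page (captureFrame and its parameters are illustrative names):

function captureFrame(videoUrl, time) {
  return new Promise((resolve, reject) => {
    const video = document.createElement('video');
    video.src = videoUrl;
    video.addEventListener('loadedmetadata', () => {
      video.currentTime = time; // seek to the target time once metadata is ready
    });
    video.addEventListener('seeked', () => {
      // draw the current frame onto a canvas and export it as a JPEG blob
      const canvas = document.createElement('canvas');
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      canvas.getContext('2d').drawImage(video, 0, 0);
      canvas.toBlob(resolve, 'image/jpeg');
    });
    video.addEventListener('error', reject);
  });
}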
1.2 WebAssembly frame capture
Compile the powerful C/C++ library FFmpeg into wasm + JS form with the Emscripten compiler, then use JS to implement the video frame-capture function.
In terms of compatibility, WebAssembly is supported by all major browsers; for the few browsers that still do not support it, the old backend scheme is used as a fallback.
This scheme has already been put into practice on platform B and other platforms, whose implementations can be referred to. In the end, this scheme was chosen.
1.3 Comparison of WebAssembly frame-capture implementations
1.3.1 ffmpeg.wasm
There is currently an open-source library, ffmpeg.wasm. It consists of:
- @ffmpeg/core: compiles FFmpeg into ffmpeg-core.wasm plus JS glue code.
- @ffmpeg/ffmpeg: implements the part that calls the glue code generated in the previous step, providing APIs such as load and run. If developers are not happy with @ffmpeg/core, they can build a custom ffmpeg-core.wasm.
So, can we just use it directly? These problems remain to be solved:
- Browser compatibility: as we know, the browser's JS thread is single-threaded and mutually exclusive with the render thread. In order not to block the page's render thread and main JS thread, @ffmpeg/core is compiled with pthreads enabled, so the resulting JS glue code relies on SharedArrayBuffer. SharedArrayBuffer supports data sharing between the main thread and workers, as well as among multiple workers, which is ideal for this scenario. However, due to security issues it is disabled by default in all mainstream browsers; re-enabling it requires certain response headers to be configured (shown after this list), and support is not good enough to meet the bar for going live.
- Wasm redundancy: the ffmpeg-core.wasm compiled from @ffmpeg/core includes almost all of FFmpeg's features. The file size is 24MB (8.5MB after gzip), much of which is not needed for frame capture.
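For reference, re-enabling SharedArrayBuffer requires the page to be cross-origin isolated, which is what the response-header configuration mentioned above refers to:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp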
1.3.2 Implementations on other platforms
By customizing the FFmpeg build according to business requirements (supporting only a few formats), the resulting wasm file can be reduced to 4.7MB, and even smaller after gzip.
However, these platforms maintain their own C-language entry file that implements frame capture against FFmpeg's internal libraries, and compile it together with FFmpeg.
This approach demands a deep understanding of FFmpeg and is tied to a specific FFmpeg version, whose APIs and directory layout may change as FFmpeg is upgraded. Moreover, as the business develops we may need more of FFmpeg's features and would have to modify the C code again, so maintainability is low.
1.4 Summary
Therefore, the final solution is implemented with WebAssembly frame capture:
- Customize the FFmpeg build to optimize the wasm file size.
- Use the fftools/ffmpeg.c entry file shipped with FFmpeg (v4.3.1), so there is no need to write our own C code.
- Compile an ffmpeg-core.wasm + js that does not depend on SharedArrayBuffer (no pthreads), and run the frame-capture business code in a Web Worker to avoid blocking the main thread.
- Call the compiled FFmpeg JS glue code to implement the frame-capture function; @ffmpeg/ffmpeg can be used as a reference for this part.
2 Custom FFmpeg Compilation
2.1 Run Docker using the official Emscripten environment
Emscripten is a WebAssembly compiler toolchain.
Download Docker Desktop and use the ready-made Emscripten environment by running Docker, which avoids the pitfalls of setting up a local development environment. (In our experience Docker Desktop on Mac kept running into connection problems, while the docker command under Windows/Ubuntu proved more stable.)
In the FFmpeg source directory, write the following script to run Docker:
#!/bin/bash

set -euo pipefail

EM_VERSION=2.0.8

docker pull emscripten/emsdk:$EM_VERSION

# bind-mount the source tree and the Emscripten cache into the container
docker run \
  --rm \
  -v $PWD:/src \
  -v $PWD/wasm/cache:/emsdk_portable/.data/cache/wasm \
  emscripten/emsdk:$EM_VERSION \
  sh -c 'bash ./build.sh'
2.1.1 Understanding how Emscripten works
Specifically, C/C++ (and other languages) are translated by the Clang front end into LLVM intermediate representation (LLVM IR), and the LLVM IR is then compiled to wasm. The browser downloads the WebAssembly, translates the module into the target machine's assembly code, and then into machine code (x86/ARM, etc.).
So what are LLVM and Clang?
- LLVM is a compiler framework in which the different front ends and back ends communicate through a unified intermediate representation (LLVM IR).
- Clang is a subproject of LLVM: a C/C++/Objective-C compiler front end built on the LLVM architecture.
- Frontend: lexical analysis, syntax analysis, semantic analysis, intermediate code generation
- Optimizer: optimizes the intermediate code (loop optimization, removal of useless code, etc.)
- Backend: generates the object code. If the object code is absolute instruction code (machine code), it can be executed immediately; if it is assembly instruction code, it must first be assembled into machine code before it can run.
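As a quick taste of the toolchain, compiling a trivial C file already yields the js + wasm pair (a sketch; hello.c stands for any C source with a main function):

# clang front end -> LLVM IR -> wasm, plus JS glue code
emcc hello.c -o hello.js   # emits hello.js and hello.wasm
node hello.js              # the generated glue code also runs under Node.js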
Next, write the build script, build.sh.
2.2 Configure FFmpeg Compilation Parameters to Eliminate Redundancy
FFmpeg is an excellent C/C++ audio and video processing library, with which video screenshots can be implemented.
First, we need to know the libraries and components involved in implementing screenshots.
Libraries involved:
- libavcodec: audio/video encoding and decoding.
- libavformat: muxing and demuxing of audio/video container formats.
- libavutil: common utility functions, including arithmetic operations, character manipulation, etc.
- libswscale: image scaling and pixel-format conversion.
Components involved (the annotated call after this list shows how they cooperate):
- Demuxer: unpacks the video container
- Decoder: decodes the video stream
- Encoder: encodes the decoded frames into images
- Muxer: wraps the encoded images into the output format
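To make the division of labor concrete, here is how one screenshot maps onto these components (an illustrative call written with the run API introduced later; the flags mirror the frame-capture commands used in this article):

// demuxer (mov):   unpack example.mp4 into elementary streams
// decoder (h264):  decode the video stream into raw frames
// libswscale:      scale the frame, e.g. to 960x540
// encoder (mjpeg): encode the selected frame as JPEG
// muxer (image2):  write the encoded image to a file
await ffmpeg.run('-i', 'example.mp4', '-s', '960x540', '-f', 'image2', '-frames', '1', 'frame.jpeg');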
Use emconfigure to set the appropriate environment variables and configure the FFmpeg build parameters. Documentation on the configuration:
- Run emconfigure ./configure --help to view all available options.
- Detailed instructions on FFmpeg configuration can be viewed here.
# Configure FFmpeg with Emscripten
CONF_FLAGS=(
  --target-os=none        # use none to prevent any OS-specific configuration
  --arch=x86_32           # use x86_32 to achieve minimal architectural optimization
  --enable-cross-compile  # enable cross compilation
  --disable-x86asm        # disable x86 asm
  --disable-inline-asm    # disable inline asm
  --disable-stripping     # disable stripping
  --disable-programs      # disable program builds (incl. ffplay, ffprobe & ffmpeg)
  --disable-doc           # disable docs
  --nm="llvm-nm"
  --ar=emar
  --ranlib=emranlib
  --cc=emcc
  --cxx=em++
  --objcc=emcc
  --dep-cc=emcc

  # remove unnecessary libraries
  --disable-avdevice
  --disable-swresample
  --disable-postproc
  --disable-network
  --disable-pthreads
  --disable-w32threads
  --disable-os2threads

  # configure only the demuxers, decoders, encoder and muxer we need;
  # --disable-everything is the key to reducing the wasm size: it disables
  # every individual component except the ones enabled below
  --disable-everything
  --enable-filters
  --enable-muxer=image2
  --enable-demuxer=mov    # mov,mp4,m4a,3gp,3g2,mj2
  --enable-demuxer=flv
  --enable-demuxer=h264
  --enable-demuxer=asf
  --enable-encoder=mjpeg
  --enable-decoder=hevc
  --enable-decoder=h264
  --enable-decoder=mpeg4
  --enable-protocol=file
)
emconfigure ./configure "${CONF_FLAGS[@]}"

# build dependencies
emmake make -j4
2.4 Generate js + wasm
Use emcc to compile the linked code produced by make in the previous step into JavaScript + WebAssembly. fftools/ffmpeg.c is used as the entry file here, so there is no need to maintain our own C entry file.
The emcc options can be viewed with emcc --help, and the Clang options with clang --help.
# Compile and link to JS + wasm with emcc
EMCC_ARGS=(
  -I. -I./fftools          # add directories to the include search path
  -Llibavcodec -Llibavdevice -Llibavfilter -Llibavformat -Llibavresample
  -Llibavutil -Llibpostproc -Llibswscale -Llibswresample   # library search paths
  -Qunused-arguments       # don't emit warnings for unused driver arguments
  -o wasm/dist/ffmpeg-core.js   # output
  fftools/ffmpeg_opt.c fftools/ffmpeg_filter.c fftools/ffmpeg_hw.c
  fftools/cmdutils.c fftools/ffmpeg.c   # input
  -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -lm   # libraries
  -s USE_SDL=2             # use SDL2
  -s MODULARIZE=1          # use the modularized version to be more flexible
  -s EXPORT_NAME="createFFmpegCore"   # assign an export name for the browser
  -s EXPORTED_FUNCTIONS="[_main]"     # export the main function
  -s EXTRA_EXPORTED_RUNTIME_METHODS="[FS, cwrap, ccall, setValue, writeAsciiToMemory]"   # export extra runtime methods
  -s INITIAL_MEMORY=33554432    # 33554432 bytes = 32MB
  -s ALLOW_MEMORY_GROWTH=1      # allow total memory usage to grow with the application's demands
  -s ASSERTIONS=1               # for debugging
  --post-js wasm/post-js.js     # append a file after the emitted code; used to expose an exit function
  -O3                           # optimize code and reduce code size
)
emcc "${EMCC_ARGS[@]}"
The final ffmpeg-core.wasm build is 5MB, and smaller still after gzip.
Source code: build.sh
At this point, the FFmpeg compilation is complete! Now back to the familiar front end.
3 Implement the Frame-Capture Function
3.1 Calling the JS glue code
Calling the JS glue code is already implemented in the open-source library @ffmpeg/ffmpeg, so we can simply use its API.
const { createFFmpeg } = require('@ffmpeg/ffmpeg');
const ffmpeg = createFFmpeg({ log: true });

(async () => {
  await ffmpeg.load();
  // ... the part that obtains the duration is omitted
  const frameNum = 8;
  const per = duration / (frameNum - 1);
  for (let i = 0; i < frameNum; i++) {
    await ffmpeg.run('-ss', `${Math.floor(per * i)}`, '-i', 'example.mp4', '-s', '960x540', '-f', 'image2', '-frames', '1', `frame-${i + 1}.jpeg`);
  }
})();
Along the way it was also found that placing -ss before -i seeks directly to the specified time instead of reading frame by frame, which improves screenshot speed. See the related API documentation.
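The difference between the two placements, as a sketch (times and file names are illustrative):

// input seeking: -ss before -i first jumps near the target time (by keyframe), then decodes (fast)
await ffmpeg.run('-ss', '10', '-i', 'example.mp4', '-frames', '1', 'fast.jpeg');
// output seeking: -ss after -i decodes and discards every frame up to the 10s mark (slow)
await ffmpeg.run('-i', 'example.mp4', '-ss', '10', '-frames', '1', 'slow.jpeg');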
P.S. @ffmpeg/ffmpeg does not currently support loading an ffmpeg-core.wasm + js built without pthreads.
3.1.1 Exchanging data between JavaScript and C
So how do the load and run methods actually work? The first thing to know is that when JavaScript exchanges data with C, only Numbers can be used as arguments. From the language point of view, JavaScript and C/C++ have completely different type systems, and Number is their only intersection, so when they call each other they are essentially exchanging numbers.
Therefore, if a parameter is a string, an array, or another non-Number type, the call must be split into the following steps:
- Use Module._malloc() to allocate memory on the Module heap and obtain its address ptr;
- Copy the string/array data into memory at ptr;
- Call the C/C++ function with ptr as the argument;
- Use Module._free() to release ptr.
A simplified version of what @ffmpeg/ffmpeg does:
const createFFmpegCore = require('path/to/ffmpeg-core.js');
let Core;
let ffmpeg;

// load the wasm module
const load = async () => {
  Core = await createFFmpegCore({
    print: (message) => {},
  });
  ffmpeg = Core.cwrap('main', 'number', ['number', 'number']); // cwrap the exported _main function
};

// Build a C argv from the JS string arguments
const parseArgs = (Core, args) => {
  const argsPtr = Core._malloc(args.length * Uint32Array.BYTES_PER_ELEMENT);
  args.forEach((s, idx) => {
    const buf = Core._malloc(s.length + 1);
    Core.writeAsciiToMemory(s, buf);
    Core.setValue(argsPtr + Uint32Array.BYTES_PER_ELEMENT * idx, buf, 'i32');
  });
  return [args.length, argsPtr]; // [argc, argv]
};

// run ffmpeg
const run = (..._args) => {
  return new Promise((resolve) => {
    ffmpeg(...parseArgs(Core, _args)); // pass the command-line arguments
    // resolution is hooked to ffmpeg's exit via the exit function exposed in post-js.js
  });
};

module.exports = {
  load,
  run,
};
4 Web Worker
Because -s USE_PTHREADS=1 was not configured at build time, calling FFmpeg as above blocks the page's main JS thread and rendering. For example, while the recommended covers are being generated, the upload progress of the video cannot be updated, and clicks on other buttons on the page get no response. Therefore, a Web Worker is needed to run it.
Web Workers run scripts in a thread separate from the browser's page thread, and almost any heavy processing can be offloaded from the page thread to them. The main thread and the worker communicate through the postMessage() method and onmessage events.
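For a sense of the raw mechanism, a minimal sketch of hand-written messaging (the message shape is hypothetical):

// main.js
const worker = new Worker('./worker.js');
worker.postMessage({ type: 'getFrames', file });
worker.onmessage = (e) => {
  console.log('frames received', e.data);
};

// worker.js
self.onmessage = async (e) => {
  if (e.data.type === 'getFrames') {
    self.postMessage(await getFrames(e.data.file));
  }
};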
But hand-writing the communication with the postMessage() method and onmessage events, as above, makes the code cumbersome. Comlink (1.1kB) is recommended to make the code friendlier and the communication almost imperceptible.
Take the frame-capture communication as an example: main.js
import * as Comlink from 'comlink';

async function onFileUpload(file) {
  const ffmpegWorker = Comlink.wrap(new Worker('./worker.js'));
  const frameU8Arrs = await ffmpegWorker.getFrames(file);
}
worker.js
import * as Comlink from 'comlink';

async function getFrames(file) {
  // ... obtaining the duration, etc. is omitted
  const frameNum = 8;
  const per = duration / (frameNum - 1);
  for (let i = 0; i < frameNum; i++) {
    await ffmpeg.run('-ss', `${Math.floor(per * i)}`, '-i', 'example.mp4', '-s', '960x540', '-f', 'image2', '-frames', '1', `frame-${i + 1}.jpeg`);
  }
  // Read the images' binary data (Uint8Array) out of MEMFS
  const frameU8Arrs = [];
  for (let i = 0; i < frameNum; i++) {
    const fileName = `frame-${i + 1}.jpeg`;
    const u8arr = ffmpeg.FS('readFile', fileName);
    frameU8Arrs.push(u8arr);
    ffmpeg.FS('unlink', fileName); // clean the frame out of MEMFS once it has been read
  }
  return frameU8Arrs;
}

Comlink.expose({
  getFrames,
});
Comlink is an RPC implementation based on ES6 Proxy and postMessage(). In the example, the real ffmpegWorker object lives in worker.js; what main.js obtains is only a proxy handle to it, and methods such as ffmpegWorker.getFrames are actually executed in worker.js.
The only pitfall is that the library's published output is ES6 code, which needs to be transpiled to ES5 through the build configuration.
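For instance, with webpack and babel-loader, the include list can be extended so that comlink is transpiled together with the application code (a sketch; the paths depend on the project layout):

// in webpack module.rules
{
  test: /\.js$/,
  include: [path.resolve(__dirname, 'src'), path.resolve(__dirname, 'node_modules/comlink')],
  use: 'babel-loader',
}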
4.1 Webpack Configuration
Also, if you use webpack, you may have trouble resolving the correct worker.js path. The worker-plugin can be configured like this:
const WorkerPlugin = require('worker-plugin');
const isPub = true; // whether this is the production build

{
  // ...
  plugins: [
    new WorkerPlugin({
      globalObject: 'this',
      filename: isPub ? '[name].[chunkhash:9].worker.js' : '[name].worker.js',
    }),
  ],
}
5 Online Effect
For browsers that support this solution, users can select and edit video covers without waiting for videos to be uploaded.
Frame capture on the front end also takes much less time than in the backend, and the advantage is more obvious the larger the video is.
6 Follow-up Optimization Points
6.1 Improving Browser Support
Some browsers still report errors; we will continue to optimize and improve browser support (for example, a fetch wasm error in a certain Safari version).
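A typical guard for choosing between the wasm scheme and the old backend scheme (a sketch; useBackendCover is a hypothetical fallback):

const supportsWasm =
  typeof WebAssembly === 'object' &&
  typeof WebAssembly.instantiate === 'function';

if (!supportsWasm) {
  // fall back to the old scheme: let the backend generate the cover
  useBackendCover();
}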
6.2 Reducing the WASM File Size
There is still room to reduce the wasm size. (For example, --enable-filters in the current build configuration enables all filters.)
6.3 Optimization of Reading Video Files
Because MEMFS is used by default, the entire video file is written into memory and then processed. With large video files this becomes a problem: running the task on an 800MB+ video in Firefox 90 occupies nearly 3GB of memory, and the browser crashes.
const getVideoInfo = async (file) => {
  // ... implement the fileToUint8Array method first
  const bufferArr = await fileToUint8Array(file);
  ffmpeg.FS('writeFile', 'example.mp4', bufferArr); // write the whole file into MEMFS first
  await ffmpeg.run('-i', 'example.mp4', '-loglevel', 'info');
};
The solution that comes to mind is WORKERFS. WORKERFS runs in the Web Worker and provides read-only access to File and Blob objects inside the worker without copying the entire data into memory, which fits our needs exactly.
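Per the Emscripten documentation, the mount looks roughly like this (a sketch; it assumes the build links WORKERFS, e.g. with -lworkerfs.js, and that the module's FS object is accessible inside the worker):

// inside the Web Worker, after the wasm module has been initialised
FS.mkdir('/mounted');
FS.mount(WORKERFS, { files: [file] }, '/mounted'); // file is the uploaded File object
// the video can now be read as '/mounted/' + file.name without copying it into MEMFS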
References
- Build FFmpeg WebAssembly version (= ffmpeg.wasm): Part.2 Compile with Emscripten
- Front-end video frame extraction: FFmpeg + WebAssembly