Overview

This article takes FFmpeg video decoding as its theme. It first introduces the main process and basic principles of FFmpeg video decoding. It then describes simple applications built on top of basic FFmpeg video decoding, including how to play video along a given timeline and how to add seek logic during playback. In addition, the article highlights details that are easy to overlook when decoding video, and finally briefly describes how to encapsulate a basic video decoding component, VideoDecoder.

Preface

FFmpeg

FFmpeg is a set of open-source programs for recording, converting, and streaming digital audio and video. It also provides libraries for processing and manipulating multimedia data, including libavcodec and libavformat, the leading audio/video codec and container libraries.

Six common FFmpeg modules

  • libavformat: muxing and demuxing library for multimedia container formats such as MP4 and FLV, and network protocols such as RTMP and RTSP;
  • libavcodec: the core audio and video encoding/decoding library;
  • libavfilter: filtering library for audio, video, and subtitles;
  • libswscale: image scaling and pixel format conversion library;
  • libswresample: audio resampling library;
  • libavutil: utility library.

Basic introduction to video decoding

  1. Demultiplexing (demux): demultiplexing is also called decapsulation. A container (package) format is a combination of audio and video data; common examples are MP4, FLV, and MKV. Roughly speaking, encapsulation combines the audio stream, video stream, subtitle stream, and other attachments into one package according to certain rules. Decapsulation does the opposite: it splits a streaming media file back into audio data and video data. At this point the split data is still compressed; a common compressed video format is H.264.

  2. Decode: decompress the compressed, encoded data back into raw video pixel data. The most common raw pixel format is YUV.

  3. Color space conversion: displays usually render images with the RGB model, but the YUV model saves bandwidth when transmitting image data. Therefore, before rendering, YUV pixel data must be converted to the RGB pixel format (a minimal conversion sketch follows this list).
  4. Render: send each decoded, color-converted video frame to the graphics card to be rendered on the screen.
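
To make the color space conversion step concrete, here is a minimal sketch of one common BT.601 full-range approximation for converting a single YUV pixel to RGB. It is only illustrative; in the decoder described later, conversion of whole frames (and scaling) is delegated to libswscale via sws_scale().

#include <algorithm>
#include <cstdint>

// Minimal sketch (assumption: BT.601 full-range coefficients) of converting one YUV
// pixel to RGB. Real players leave this work to libswscale's sws_scale().
static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                       uint8_t& r, uint8_t& g, uint8_t& b) {
    const int c = y;
    const int d = u - 128;
    const int e = v - 128;
    auto clamp = [](int x) { return static_cast<uint8_t>(std::min(255, std::max(0, x))); };
    r = clamp(static_cast<int>(c + 1.402 * e));
    g = clamp(static_cast<int>(c - 0.344136 * d - 0.714136 * e));
    b = clamp(static_cast<int>(c + 1.772 * d));
}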

1. Preparation before introducing FFmpeg

1.1 Compiling the FFmpeg .so libraries

  • Download and unpack the source library from FFmpeg’s official website.
  • Download the NDK library and decompress it.
  • Edit the configure script in the extracted FFmpeg source directory and change the following parameters. The main purpose is to generate .so names in the lib<name>.so form that Android can load, instead of names carrying a trailing version number.
# ······ build settings
SHFLAGS='-shared -Wl,-soname,$$(@F)'
LIBPREF="lib"
LIBSUF=".a"
FULLNAME='$(NAME)$(BUILDSUF)'
LIBNAME='$(LIBPREF)$(FULLNAME)$(LIBSUF)'
SLIBPREF="lib"
SLIBSUF=".so"
SLIBNAME='$(SLIBPREF)$(FULLNAME)$(SLIBSUF)'
SLIBNAME_WITH_VERSION='$(SLIBNAME).$(LIBVERSION)'

# Changed configuration
SLIBNAME_WITH_MAJOR='$(SLIBPREF)$(FULLNAME)-$(LIBMAJOR)$(SLIBSUF)'
LIB_INSTALL_EXTRA_CMD='$$(RANLIB) "$(LIBDIR)/$(LIBNAME)"'
SLIB_INSTALL_NAME='$(SLIBNAME_WITH_MAJOR)'
SLIB_INSTALL_LINKS='$(SLIBNAME)'
# ······
  • Create a new script file named build_android_arm_v8a.sh in the FFmpeg source directory, configure the path of the NDK in the file, and add the rest of the following content;
# Clean up the previous build first
make clean

# Configure your NDK path here
export NDK=/Users/bytedance/Library/Android/sdk/ndk/21.4.7075529
TOOLCHAIN=$NDK/toolchains/llvm/prebuilt/darwin-x86_64

function build_android
{
./configure \
--prefix=$PREFIX \
--disable-postproc \
--disable-debug \
--disable-doc \
--enable-ffmpeg \
--disable-symver \
--disable-static \
--enable-shared \
--cross-prefix=$CROSS_PREFIX \
--target-os=android \
--arch=$ARCH \
--cpu=$CPU \
--cc=$CC \
--cxx=$CXX \
--enable-cross-compile \
--sysroot=$SYSROOT \
--extra-cflags="-Os -fpic $OPTIMIZE_CFLAGS" \
--extra-ldflags="$ADDI_LDFLAGS"

make clean
make -j16
make install

echo "============================ build android arm64-v8a success =========================="
}

# arm64-v8a
ARCH=arm64
CPU=armv8-a
API=21
CC=$TOOLCHAIN/bin/aarch64-linux-android$API-clang
CXX=$TOOLCHAIN/bin/aarch64-linux-android$API-clang++
SYSROOT=$NDK/toolchains/llvm/prebuilt/darwin-x86_64/sysroot
CROSS_PREFIX=$TOOLCHAIN/bin/aarch64-linux-android-
PREFIX=$(pwd)/android/$CPU
OPTIMIZE_CFLAGS="-march=$CPU"

echo $CC

build_android
  • Grant permissions to all files in the NDK folder: chmod 777 -R NDK;
  • Run the script ./build_android_arm_v8a.sh in a terminal to start compiling FFmpeg. The compiled output ends up in the android directory of the FFmpeg source tree, where several .so files appear;

  • To compile for armeabi-v7a, copy the above script to build_android_arm_v7a.sh and modify its variables as follows.
#armv7-a
ARCH=arm
CPU=armv7-a
API=21
CC=$TOOLCHAIN/bin/armv7a-linux-androideabi$API-clang
CXX=$TOOLCHAIN/bin/armv7a-linux-androideabi$API-clang++
SYSROOT=$NDK/toolchains/llvm/prebuilt/darwin-x86_64/sysroot
CROSS_PREFIX=$TOOLCHAIN/bin/arm-linux-androideabi-
PREFIX=$(pwd)/android/$CPU
OPTIMIZE_CFLAGS="-mfloat-abi=softfp -mfpu=vfp -marm -march=$CPU "

1.2 Introducing the FFmpeg .so libraries into Android

  • Prepare the NDK environment, the CMake build tool, and LLDB (the C/C++ debugging tool);
  • When creating a C++ module, the following important files are generated: CMakeLists.txt, native-lib.cpp, MainActivity;
  • In the app/src/main/ directory, create a directory named jniLibs; this is the default directory where Android Studio looks for .so dynamic libraries. Then create an arm64-v8a directory inside jniLibs and paste the compiled .so files into it. Next, paste the .h headers generated during compilation (the interfaces exposed by FFmpeg) into the include directory under cpp. The .so library directory and the .h header directory will be explicitly declared and linked in CMakeLists.txt;
  • In MainActivity, load the library compiled from the C/C++ code: native-lib. In CMakeLists.txt, native-lib is added as a library named "ffmpeg", so the argument passed to System.loadLibrary() is "ffmpeg";
class MainActivity : AppCompatActivity() {

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        // Example of a call to a native method
        sample_text.text = stringFromJNI()
    }

    // Declare an external method whose implementation lives in the C/C++ layer
    external fun stringFromJNI(): String

    companion object {
        // Load the library compiled from the C/C++ code inside init {}
        init {
            System.loadLibrary("ffmpeg")
        }
    }
}
  • native-lib.cpp is the C++ interface file in which the external methods declared in the Java/Kotlin layer are implemented;
#include <jni.h>
#include <string>
extern "C" JNIEXPORT jstring JNICALL
Java_com_bytedance_example_MainActivity_stringFromJNI(
        JNIEnv *env,
        jobject /* this */) {
    std::string hello = "Hello from C++";
    return env->NewStringUTF(hello.c_str());
}
  • CMakeLists.txt is the build script that configures how the native-lib .so library is compiled and linked;
# For more information about using CMake with Android Studio, read the
# documentation: https://d.android.com/studio/projects/add-native-code.html

# Sets the minimum version of CMake required to build the native library.
cmake_minimum_required(VERSION 3.10.2)

# Declares the project.
project("ffmpeg")

# Creates and names a library, sets it as either STATIC
# or SHARED, and provides the relative paths to its source code.
# You can define multiple libraries, and CMake builds them for you.
# Gradle automatically packages shared libraries with your APK.
set(FFmpeg_lib_dir ${CMAKE_SOURCE_DIR}/../${ANDROID_ABI})
set(FFmpeg_head_dir ${CMAKE_SOURCE_DIR}/FFmpeg)

add_library( # Sets the name of the library.
        ffmpeg
        # Sets the library as a shared library.
        SHARED
        # Provides a relative path to your source file(s).
        native-lib.cpp )

# Searches for a specified prebuilt library and stores the path as a
# variable. Because CMake includes system libraries in the search path by
# default, you only need to specify the name of the public NDK library
# you want to add. CMake verifies that the library exists before
# completing its build.
add_library( avutil
        SHARED
        IMPORTED )
set_target_properties( avutil
        PROPERTIES IMPORTED_LOCATION
        ${FFmpeg_lib_dir}/libavutil.so )

add_library( swresample
        SHARED
        IMPORTED )
set_target_properties( swresample
        PROPERTIES IMPORTED_LOCATION
        ${FFmpeg_lib_dir}/libswresample.so )

add_library( avcodec
        SHARED
        IMPORTED )
set_target_properties( avcodec
        PROPERTIES IMPORTED_LOCATION
        ${FFmpeg_lib_dir}/libavcodec.so )

find_library( # Sets the name of the path variable.
        log-lib
        # Specifies the name of the NDK library that
        # you want CMake to locate.
        log )

# Specifies libraries CMake should link to your target library. You
# can link multiple libraries, such as libraries you define in this
# build script, prebuilt third-party libraries, or system libraries.
target_link_libraries( # Specifies the target library.
        ffmpeg
        # Links the FFmpeg .so libraries imported above to the target library.
        avutil
        swresample
        avcodec
        -landroid
        # Links the target library to the log library
        # included in the NDK.
        ${log-lib} )
  • This brings FFmpeg into the Android project.

2. Principles and details of FFmpeg video decoding

2.1 Main Process

2.2 Basic Principles

2.2.1 Common FFmpeg interfaces

// 1. Allocate an AVFormatContext
avformat_alloc_context();
// 2. Open the input
avformat_open_input(AVFormatContext **ps, const char *url, const AVInputFormat *fmt, AVDictionary **options);
// 3. Probe the stream information
avformat_find_stream_info(AVFormatContext *ic, AVDictionary **options);
// 4. Allocate a codec context
avcodec_alloc_context3(const AVCodec *codec);
// 5. Fill the codec context from the codec parameters attached to the stream
avcodec_parameters_to_context(AVCodecContext *codec, const AVCodecParameters *par);
// 6. Find a registered decoder by id
avcodec_find_decoder(enum AVCodecID id);
// 7. Open the codec
avcodec_open2(AVCodecContext *avctx, const AVCodec *codec, AVDictionary **options);
// 8. Read one frame of compressed data from the stream
av_read_frame(AVFormatContext *s, AVPacket *pkt);
// 9. Send the packet with compressed data to the decoder
avcodec_send_packet(AVCodecContext *avctx, const AVPacket *avpkt);
// 10. Receive a decoded frame from the decoder
avcodec_receive_frame(AVCodecContext *avctx, AVFrame *frame);

2.2.2 Overall idea of video decoding

  • First, register libavformat and all codecs, (de)muxers, protocols, and so on. This used to be the first function called in any FFmpeg-based application, and only after calling it could FFmpeg's features be used properly. In the latest versions of FFmpeg this line of code can be omitted (a version-guarded sketch follows the snippet below);
av_register_all();
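
A minimal sketch of guarding this call behind a version check, assuming the deprecation happened around libavformat 58.9.100 (FFmpeg 4.0); verify the threshold against the headers of the FFmpeg build actually in use:

extern "C" {
#include <libavformat/avformat.h>
}

// Sketch: call av_register_all() only when building against an FFmpeg old enough
// to still require it. The version threshold below is an assumption; check the
// headers of the FFmpeg build actually in use.
void register_ffmpeg_if_needed() {
#if LIBAVFORMAT_VERSION_INT < AV_VERSION_INT(58, 9, 100)
    av_register_all();
#endif
}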
  • Open the video file and extract the stream information from it (an error-checked sketch follows the snippet below);
auto av_format_context = avformat_alloc_context();
avformat_open_input(&av_format_context, path_.c_str(), nullptr, nullptr);
avformat_find_stream_info(av_format_context, nullptr);
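
The snippet above omits error handling; both calls return a negative AVERROR code on failure. A minimal sketch with return-value checks (the helper name is illustrative):

extern "C" {
#include <libavformat/avformat.h>
}
#include <string>

// Sketch: open an input file and probe its streams, with basic error handling.
AVFormatContext* OpenInputOrNull(const std::string& path) {
    AVFormatContext* fmt_ctx = avformat_alloc_context();
    if (avformat_open_input(&fmt_ctx, path.c_str(), nullptr, nullptr) < 0) {
        // On failure avformat_open_input frees the context and sets the pointer to null
        return nullptr;
    }
    if (avformat_find_stream_info(fmt_ctx, nullptr) < 0) {
        avformat_close_input(&fmt_ctx);
        return nullptr;
    }
    return fmt_ctx;
}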
  • Then find the index of the video media stream in the file (an alternative using av_find_best_stream is sketched after the snippet below);
int video_stream_index = -1;
for (int i = 0; i < av_format_context->nb_streams; i++) {
    // Match the index of the video media stream
    if (av_format_context->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
        video_stream_index = i;
        LOGD(TAG, "find video stream index = %d", video_stream_index);
        break;
    }
}
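
As an alternative to the manual loop, FFmpeg also provides av_find_best_stream(), which picks the most suitable stream of a given type; a minimal sketch:

extern "C" {
#include <libavformat/avformat.h>
}

// Sketch: let FFmpeg choose the video stream. Returns the stream index,
// or a negative AVERROR code if no video stream is found.
int FindVideoStream(AVFormatContext* fmt_ctx) {
    return av_find_best_stream(fmt_ctx, AVMEDIA_TYPE_VIDEO,
                               -1 /* wanted_stream_nb */,
                               -1 /* related_stream */,
                               nullptr /* decoder_ret */,
                               0 /* flags */);
}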
  • Get the video media stream, find the registered decoder, allocate the decoder context, fill it with the stream's codec parameters, and open the decoder;
// Get the video media stream
auto stream = av_format_context->streams[video_stream_index];
// Find the registered decoder
auto codec = avcodec_find_decoder(stream->codecpar->codec_id);
// Allocate the decoder context
AVCodecContext* codec_ctx = avcodec_alloc_context3(codec);
// Fill the decoder context with the parameters of the media stream
auto ret = avcodec_parameters_to_context(codec_ctx, stream->codecpar);
if (ret >= 0) {
    // Open the decoder
    avcodec_open2(codec_ctx, codec, nullptr);
    // ......
}
  • Compute the buffer size required for the chosen pixel format, image width, and image height, and allocate the output buffer. Since the frames are drawn by the upper layer, an ANativeWindow is needed, and ANativeWindow_setBuffersGeometry sets the properties of this drawing window;
video_width_ = codec_ctx->width;
video_height_ = codec_ctx->height;

int buffer_size = av_image_get_buffer_size(AV_PIX_FMT_RGBA,
                                           video_width_, video_height_, 1);
// Allocate memory for the output buffer
out_buffer_ = (uint8_t*) av_malloc(buffer_size * sizeof(uint8_t));

// Setting the width and height limits the number of pixels in the buffer,
// rather than the physical display size of the screen.
// If the buffer does not match the physical display size, the image may be
// stretched or compressed when shown.
int result = ANativeWindow_setBuffersGeometry(native_window_, video_width_,
                                              video_height_, WINDOW_FORMAT_RGBA_8888);
  • Allocate an AVFrame for the RGBA pixel format to hold the frame data after conversion to RGBA, and fill the buffers of rgba_frame so that they are associated with out_buffer_;
auto rgba_frame = av_frame_alloc();
av_image_fill_arrays(rgba_frame->data, rgba_frame->linesize,
                     out_buffer_,
                     AV_PIX_FMT_RGBA,
                     video_width_, video_height_, 1);
  • Obtain a SwsContext, which sws_scale() uses for image format conversion and scaling. When converting YUV420P to RGBA, sws_scale may fail the conversion and not return the correct height value; the reason is related to the flags passed when calling sws_getContext, which need to be switched from SWS_BICUBIC to SWS_FULL_CHR_H_INT | SWS_ACCURATE_RND (a sketch with these flags follows the snippet below);
struct SwsContext* data_convert_context = sws_getContext(
                    video_width_, video_height_, codec_ctx->pix_fmt,
                    video_width_, video_height_, AV_PIX_FMT_RGBA,
                    SWS_BICUBIC, nullptr, nullptr, nullptr);
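
For reference, a minimal sketch of creating the same conversion context with the flags suggested above for the problematic YUV420P-to-RGBA case (the helper name is illustrative):

extern "C" {
#include <libswscale/swscale.h>
}

// Sketch: conversion context using SWS_FULL_CHR_H_INT | SWS_ACCURATE_RND instead of SWS_BICUBIC.
SwsContext* MakeRgbaConverter(int width, int height, AVPixelFormat src_fmt) {
    return sws_getContext(width, height, src_fmt,
                          width, height, AV_PIX_FMT_RGBA,
                          SWS_FULL_CHR_H_INT | SWS_ACCURATE_RND,
                          nullptr, nullptr, nullptr);
}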
  • Allocate an AVFrame to hold the raw (decoded) frame data, and an AVPacket to hold the compressed data before decoding;
auto frame = av_frame_alloc();
auto packet = av_packet_alloc();
  • Read a frame of compressed data from the video stream, then start decoding;
ret = av_read_frame(av_format_context, packet);
if (packet->size) {
    Decode(codec_ctx, packet, frame, stream, lock, data_convert_context, rgba_frame);
}
  • In the Decode() function, the packet containing the compressed data is passed to the decoder as input;
/* send the packet with the compressed data to the decoder */
ret = avcodec_send_packet(codec_ctx, pkt);
  • The decoder returns the decoded frame data into the given frame; the pts of the decoded frame can then be converted into a timestamp, and frames are drawn onto the playback surface one by one in display order along the timeline;
while (ret >= 0 && !is_stop_) {
    // Return the decoded data into frame
    ret = avcodec_receive_frame(codec_ctx, frame);
    if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
        return;
    } else if (ret < 0) {
        return;
    }
    // Convert the pts of the currently decoded frame into a millisecond timestamp
    auto decode_time_ms = frame->pts * 1000 / stream->time_base.den;
    if (decode_time_ms >= time_ms_) {
        last_decode_time_ms_ = decode_time_ms;
        is_seeking_ = false;
        // ......
        // Convert the image data format
        // ......
        // Draw the converted data to the screen
    }
    av_packet_unref(pkt);
}
  • Before drawing the picture, the image data format needs to be converted, using the SwsContext obtained earlier;
// Convert the image data format
int result = sws_scale(
        sws_context,
        (const uint8_t* const*) frame->data, frame->linesize,
        0, video_height_,
        rgba_frame->data, rgba_frame->linesize);

if (result <= 0) {
    LOGE(TAG, "Player Error : data convert fail");
    return;
}
  • Since the image is drawn on the upper layer, ANativeWindow and ANativeWindow_Buffer are used. Before drawing, lock the window's next drawing surface, write the frame data to be displayed into the buffer, and finally unlock the window's drawing surface to post the buffer data to the screen;
// Lock the window's next drawing surface
result = ANativeWindow_lock(native_window_, &window_buffer_, nullptr);
if (result < 0) {
    LOGE(TAG, "Player Error : Can not lock native window");
} else {
    // Draw the image onto the window
    // Note: the stride of the window buffer is in pixels, so multiply by 4 for RGBA bytes
    auto bits = (uint8_t*) window_buffer_.bits;
    for (int h = 0; h < video_height_; h++) {
        memcpy(bits + h * window_buffer_.stride * 4,
               out_buffer_ + h * rgba_frame->linesize[0],
               rgba_frame->linesize[0]);
    }
    ANativeWindow_unlockAndPost(native_window_);
}
  • That is the main decoding process. In addition, since C++ requires manual resource and memory management, the corresponding free/release interfaces must be called after decoding to avoid memory leaks.
sws_freeContext(data_convert_context);
av_free(out_buffer_);
av_frame_free(&rgba_frame);
av_frame_free(&frame);
av_packet_free(&packet);

avcodec_close(codec_ctx);
avcodec_free_context(&codec_ctx);

avformat_close_input(&av_format_context);
avformat_free_context(av_format_context);
ANativeWindow_release(native_window_);

2.3 Simple Applications

To better understand the video decoding process, a decoder VideoDecoder is encapsulated here; initially it has the following functions:

VideoDecoder(const char* path, std::function<void(long timestamp)> on_decode_frame);

void Prepare(ANativeWindow* window);

bool DecodeFrame(long time_ms);

void Release();

In this video decoder, passing in a specified timestamp returns the decoded frame data for that time. The most important function is DecodeFrame(long time_ms): the caller passes in the timestamp of the desired frame, and the decoder decodes the corresponding frame data. In addition, a synchronization lock can be added so that decoding runs on its own thread, separate from the caller.

2.3.1 Adding a synchronization lock for video playback

If the goal is only to decode (extract) frames, there is no need for synchronous waiting.

However, if the video is to be played back, the lock must be used to wait after each frame is decoded and drawn. This is because decoding and drawing have to be separated during playback, and they must proceed in the order and at the speed dictated by the timeline.

condition_.wait(lock);

The waiting thread is woken up when the upper layer calls DecodeFrame with the next timestamp to decode, allowing the decode-and-draw loop to continue (a standalone sketch of this wait/notify interplay follows the snippet below).

bool VideoDecoder::DecodeFrame(long time_ms) {
    // ......
    time_ms_ = time_ms;
    condition_.notify_all();
    return true;
}
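
The fragments above come from the full class in the appendix. To isolate the idea, here is a self-contained sketch (not the article's class) of how a decode loop waiting on a condition variable cooperates with an upper-layer call that wakes it up:

#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

// Sketch: one thread "decodes" and then waits; another thread wakes it up with a
// new target timestamp, mirroring condition_.wait(lock) / condition_.notify_all().
int main() {
    std::mutex mtx;
    std::condition_variable cond;
    long target_ms = -1;
    bool stop = false;

    std::thread decode_loop([&] {
        std::unique_lock<std::mutex> lock(mtx);
        while (!stop) {
            if (target_ms >= 0) {
                // Decode and draw the frame for target_ms here
                std::printf("decoded frame at %ld ms\n", target_ms);
                target_ms = -1;
            }
            cond.wait(lock);   // sleep until the next DecodeFrame/Release call
        }
    });

    auto decode_frame = [&](long time_ms) {   // plays the role of DecodeFrame()
        std::lock_guard<std::mutex> lock(mtx);
        target_ms = time_ms;
        cond.notify_all();
    };

    decode_frame(40);
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    decode_frame(80);
    std::this_thread::sleep_for(std::chrono::milliseconds(100));

    {   // plays the role of Release()
        std::lock_guard<std::mutex> lock(mtx);
        stop = true;
        cond.notify_all();
    }
    decode_loop.join();
    return 0;
}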

2.3.2 Adding seek logic during playback

In normal playback, the video is decoded and played frame by frame. However, when the progress bar is dragged to a specified seek point, decoding frame by frame from the beginning up to that point is not efficient. In that case, the timestamp of the seek target needs to be checked against certain rules, and when the conditions are met, we seek directly to the specified timestamp.

av_seek_frame in FFmpeg
  • av_seek_frame can locate either keyframes or non-keyframes, depending on the chosen flag value. Because video decoding depends on keyframes, we generally need to locate a keyframe;
int av_seek_frame(AVFormatContext *s, int stream_index, int64_t timestamp,
                  int flags);
  • The flag in av_seek_frame specifies the positional relationship between the located I-frame and the passed-in timestamp. When seeking to a given timestamp, the timestamp may not fall exactly on an I-frame, but since decoding depends on I-frames, an I-frame near that timestamp must be found first; the flag indicates whether to locate the I-frame before or after the requested timestamp;
  • flag has four options:

    • AVSEEK_FLAG_BACKWARD: seek to the nearest keyframe before the requested timestamp. Seeking is generally done in ms, and if the specified timestamp does not land on a keyframe (very likely), the seek automatically backtracks to the nearest keyframe. Positioning with this flag is not very precise, but it deals well with blocky "mosaic" artifacts, because decoding starts from the keyframe that was located.
    • AVSEEK_FLAG_BYTE: seek to a position in the file measured in bytes; conceptually similar to AVSEEK_FLAG_FRAME, but using a different search algorithm.
    • AVSEEK_FLAG_ANY: seek to any frame, not necessarily a keyframe, so blocky "mosaic" artifacts may appear on screen, but the progress follows the drag gesture exactly.
    • AVSEEK_FLAG_FRAME: the seek target is the frame number corresponding to the timestamp; it can be understood as locating the nearest keyframe in the forward direction, the opposite direction to BACKWARD.
  • flag may contain more than one of the above values at the same time, for example AVSEEK_FLAG_BACKWARD | AVSEEK_FLAG_BYTE;
  • FRAME and BACKWARD compute the seek target from the interval between frames and are suitable for fast forward and rewind; BYTE is suitable for large jumps.
Seek scenarios
  • If the timestamp passed in for decoding moves forward and is some distance beyond the previous frame's timestamp, a seek is needed. The "some distance" here is estimated through experimentation; it is not necessarily the 1000 ms used in the code below.
  • If the direction is backward, i.e. the timestamp is smaller than the last decoded timestamp, and the distance from the last decoded timestamp is large (for example more than 50 ms), seek to the previous keyframe;
  • A bool variable is_seeking_ is used to prevent other operations from interfering with the seek; its purpose is to ensure that only one seek operation is in progress at a time.
if (!is_seeking_ && (time_ms_ > last_decode_time_ms_ + 1000 ||
                     time_ms_ < last_decode_time_ms_ - 50)) {
    is_seeking_ = true;
    // The seek target is a timestamp in the stream's time_base, so convert time_ms_
    // back using the inverse of the calculation used when reading the pts
    LOGD(TAG, "seek frame time_ms_ = %ld, last_decode_time_ms_ = %ld",
         time_ms_, last_decode_time_ms_);
    av_seek_frame(av_format_context, video_stream_index,
                  time_ms_ * stream->time_base.den / 1000, AVSEEK_FLAG_BACKWARD);
}
Where to insert the seek logic

Insert the seek logic before the av_read_frame function (which reads one frame of the video stream). When the seek conditions are met, use av_seek_frame to jump to the appropriate I-frame; av_read_frame then continues reading, and decoding proceeds until the target timestamp is reached.

int ret = av_read_frame(av_format_context, packet);

2.4 Details in the decoding process

2.4.1 Seek conditions in DecodeFrame

When using the av_seek_frame function, the correct flag must be specified and the conditions for triggering a seek must be defined; otherwise the video may show blocky "mosaic" artifacts.

if (!is_seeking_ && (time_ms_ > last_decode_time_ms_ + 1000 ||
                     time_ms_ < last_decode_time_ms_ - 50)) {
    is_seeking_ = true;
    av_seek_frame(..., ..., ..., AVSEEK_FLAG_BACKWARD);
}

2.4.2 Reducing the number of decode operations

During video decoding, decoding can be skipped for the passed-in timestamp under certain conditions. For example:

  1. If the passed-in timestamp equals the previous decode timestamp or the timestamp currently pending, no decoding is required;
  2. If the passed-in timestamp is not larger than the last decoded timestamp and is close to it (for example, no more than 50 ms behind it), no decoding is required.
bool VideoDecoder::DecodeFrame(long time_ms) {
    LOGD(TAG, "DecodeFrame time_ms = %ld", time_ms);
    if (last_decode_time_ms_ == time_ms || time_ms_ == time_ms) {
        LOGD(TAG, "DecodeFrame last_decode_time_ms_ == time_ms");
        return false;
    }
    if (time_ms <= last_decode_time_ms_ &&
        time_ms + 50 >= last_decode_time_ms_) {
        return false;
    }
    time_ms_ = time_ms;
    condition_.notify_all();
    return true;
}

With these constraints, some unnecessary decoding operations are avoided.

2.4.3 Using the pts of AVFrame

  1. AVPacket stores the data before decoding (encoded data: H.264/AAC, etc.); it holds data after demuxing but before decoding, so it is still compressed. AVFrame stores the decoded data (pixel data: YUV/RGB/PCM, etc.);
  2. The pts of AVPacket and the pts of AVFrame have different meanings. The former indicates when the decompressed packet should be presented; the latter indicates when the frame should be shown to the user.
// pts of AVPacket
/**
 * Presentation timestamp in AVStream->time_base units; the time at which
 * the decompressed packet will be presented to the user.
 * Can be AV_NOPTS_VALUE if it is not stored in the file.
 * pts MUST be larger or equal to dts as presentation cannot happen before
 * decompression, unless one wants to view hex dumps. Some formats misuse
 * the terms dts and pts/cts to mean something different. Such timestamps
 * must be converted to true pts/dts before they are stored in AVPacket.
 */
int64_t pts;

// pts of AVFrame
/**
 * Presentation timestamp in time_base units (time when frame should be shown to user).
 */
int64_t pts;
  3. Whether the currently decoded frame is drawn to the screen depends on comparing the timestamp passed in for decoding with the timestamp of the frame returned by the decoder. The pts of AVPacket cannot be used here, since it is most likely not a monotonically increasing timestamp;
  4. The frame is rendered only when the requested decode timestamp is not greater than the timestamp converted from the pts of the currently decoded frame (a more general pts-to-milliseconds conversion is sketched after the snippet below).
auto decode_time_ms = frame->pts * 1000 / stream->time_base.den;
LOGD(TAG, "decode_time_ms = %ld", decode_time_ms);
if (decode_time_ms >= time_ms_) {
    last_decode_time_ms_ = decode_time_ms;
    is_seeking_ = false;
    // Draw the picture
    // ......
}
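
Note that frame->pts * 1000 / stream->time_base.den implicitly assumes time_base.num == 1, which holds for typical MP4 video streams. A more general conversion, sketched below with an illustrative helper name, uses av_rescale_q:

extern "C" {
#include <libavutil/mathematics.h>
#include <libavutil/rational.h>
}

// Sketch: convert a pts expressed in the stream's time_base into milliseconds
// without assuming time_base.num == 1.
long PtsToMs(int64_t pts, AVRational stream_time_base) {
    return (long) av_rescale_q(pts, stream_time_base, AVRational{1, 1000});
}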

2.4.4 Handling the end of the stream when decoding the last frame

av_read_frame(av_format_context, packet) reads one frame of the video stream into the AVPacket. If the returned int is 0, it is a success; if it is less than 0, it indicates an error or end of file (EOF).

So if a value less than 0 is returned during playback, avcodec_flush_buffers is called to reset the decoder state and flush its buffers, a seek is then made to the currently requested timestamp, the decode callback is completed, and the synchronization lock waits (a drain-mode alternative is sketched after the snippet below).

// Read a frame of compressed data from the video stream
ret = av_read_frame(av_format_context, packet);
if (ret < 0) {
    avcodec_flush_buffers(codec_ctx);
    av_seek_frame(av_format_context, video_stream_index,
                  time_ms_ * stream->time_base.den / 1000, AVSEEK_FLAG_BACKWARD);
    LOGD(TAG, "ret < 0, condition_.wait(lock)");
    // Prevent the case where the video has no more data when decoding the last frame
    on_decode_frame_(last_decode_time_ms_);
    condition_.wait(lock);
}
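
As an alternative to resetting and seeking, FFmpeg decoders also support a drain mode: sending a null packet signals end of input, after which avcodec_receive_frame returns the frames still buffered inside the decoder until AVERROR_EOF. A minimal sketch of this approach (not what the decoder above does; the helper name is illustrative):

extern "C" {
#include <libavcodec/avcodec.h>
}

// Sketch: drain the frames buffered inside the decoder at end of stream.
// handle_frame stands for whatever conversion/drawing the caller performs.
void DrainDecoder(AVCodecContext* codec_ctx, AVFrame* frame,
                  void (*handle_frame)(AVFrame*)) {
    avcodec_send_packet(codec_ctx, nullptr);   // enter drain mode
    while (avcodec_receive_frame(codec_ctx, frame) == 0) {
        handle_frame(frame);
    }
    // After draining, avcodec_flush_buffers() must be called before the decoder
    // can accept packets again (for example after a seek).
}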

2.5 Wrapping the decoder as VideoDecoder in the upper layer

To wrap a VideoDecoder in the upper layer, simply expose the C++ VideoDecoder interface in native-lib.cpp and let the upper layer call it through JNI.

For example, when the upper layer wants to decode at a specified timestamp, it writes a decodeFrame method and passes the timestamp down to the C++ layer's nativeDecodeFrame for decoding; the implementation of nativeDecodeFrame is written in native-lib.cpp (a hypothetical sketch of the corresponding creation bridge appears after the native-lib.cpp snippet below).

// FFmpegVideoDecoder.kt
class FFmpegVideoDecoder(
    path: String,
    val onDecodeFrame: (timestamp: Long, texture: SurfaceTexture, needRender: Boolean) -> Unit
) {

    fun decodeFrame(timeMS: Long, sync: Boolean = false) {
        // Pass the specified frame timestamp down to the C++ layer for decoding
        if (nativeDecodeFrame(decoderPtr, timeMS) && sync) {
            // ......
        } else {
            // ......
        }
    }

    private external fun nativeDecodeFrame(decoder: Long, timeMS: Long): Boolean

    companion object {
        const val TAG = "FFmpegVideoDecoder"

        init {
            System.loadLibrary("ffmpeg")
        }
    }
}

Then, in native-lib.cpp, call the DecodeFrame interface of the C++ VideoDecoder; this establishes the connection between the upper layer and the C++ layer through JNI.

// native-lib.cpp
extern "C"
JNIEXPORT jboolean JNICALL
Java_com_example_decoder_video_FFmpegVideoDecoder_nativeDecodeFrame(JNIEnv* env,
                                                               jobject thiz,
                                                               jlong decoder,
                                                               jlong time_ms) {
    auto videoDecoder = (codec::VideoDecoder*)decoder;
    return videoDecoder->DecodeFrame(time_ms);
}
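
The Kotlin class above reads a decoderPtr that the article does not show being created. A hypothetical sketch of how such a creation bridge could look (the method name nativeCreateDecoder and the empty callback are assumptions, not the article's code):

// native-lib.cpp (hypothetical addition)
#include <jni.h>
#include "VideoDecoder.h"

extern "C"
JNIEXPORT jlong JNICALL
Java_com_example_decoder_video_FFmpegVideoDecoder_nativeCreateDecoder(JNIEnv* env,
                                                                      jobject thiz,
                                                                      jstring path) {
    const char* c_path = env->GetStringUTFChars(path, nullptr);
    // A real callback would call back into Java/Kotlin with the decoded timestamp; omitted here.
    auto* decoder = new codec::VideoDecoder(c_path, [](long timestamp) { /* ...... */ });
    env->ReleaseStringUTFChars(path, c_path);
    // Hand the pointer back to the Kotlin layer as a jlong handle (the decoderPtr above).
    return reinterpret_cast<jlong>(decoder);
}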

3. Takeaways

Technical experience

  • Compiling FFmpeg and integrating it with Android to implement video decoding and playback is quite convenient.
  • Because the actual decoding logic is implemented in the C++ layer, there is a learning curve; some C++ background helps.

4. Appendix

C++ encapsulated VideoDecoder

  • VideoDecoder.h
#include <jni.h>
#include <mutex>
#include <android/native_window.h>
#include <android/native_window_jni.h>
#include <time.h>

extern "C" {
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <libswresample/swresample.h>
#include <libswscale/swscale.h>
}

#include <string>

/*
 * VideoDecoder can be used to decode the video media stream in an audio/video file (e.g. .mp4).
 * After the Java layer passes in the specified file path, it can pass in timestamps at a certain
 * fps to decode (extract) frames; this is done by the DecodeFrame method provided by the C++ layer.
 * At the end of each decode, the timestamp of the decoded frame is called back to the upper-layer
 * decoder for use by other operations.
 */
namespace codec {

class VideoDecoder {

private:
    std::string path_;

    long time_ms_ = -1;

    long last_decode_time_ms_ = -1;

    bool is_seeking_ = false;

    ANativeWindow* native_window_ = nullptr;

    ANativeWindow_Buffer window_buffer_{};

    // Video width and height
    int video_width_ = 0;

    int video_height_ = 0;

    uint8_t* out_buffer_ = nullptr;

    // on_decode_frame_ calls the timestamp of the extracted frame back to the upper-layer
    // decoder for other operations.
    std::function<void(long timestamp)> on_decode_frame_ = nullptr;

    bool is_stop_ = false;

    // The std::mutex used together with the lock "std::unique_lock<std::mutex>" for loop synchronization
    std::mutex work_queue_mtx;

    // The condition variable used for synchronous waiting and waking up
    std::condition_variable condition_;

    // The actual decoding loop
    void Decode(AVCodecContext* codec_ctx, AVPacket* pkt, AVFrame* frame, AVStream* stream,
                std::unique_lock<std::mutex>& lock, SwsContext* sws_context, AVFrame* rgba_frame);

public:
    // Create a new decoder, passing in the media file path and a decode callback on_decode_frame.
    VideoDecoder(const char* path, std::function<void(long timestamp)> on_decode_frame);

    // Pass in the ANativeWindow to draw on and start the decoding loop
    void Prepare(ANativeWindow* window);

    // Decode the frame at the specified timestamp
    bool DecodeFrame(long time_ms);

    // Release the decoder resources
    void Release();

    // Get the current time in milliseconds
    static int64_t GetCurrentMilliTime(void);
};

}
  • VideoDecoder.cpp
#include "VideoDecoder.h"
#include "../log/Logger.h"
#include <thread>
#include <utility>

extern "C" {
#include <libavutil/imgutils.h>
}

#define TAG "VideoDecoder"
namespace codec {

VideoDecoder::VideoDecoder(const char* path, std::function<void(long timestamp)> on_decode_frame)
        : on_decode_frame_(std::move(on_decode_frame)) {
    path_ = std::string(path);
}

void VideoDecoder::Decode(AVCodecContext* codec_ctx, AVPacket* pkt, AVFrame* frame, AVStream* stream,
                     std::unique_lock<std::mutex>& lock, SwsContext* sws_context,
                     AVFrame* rgba_frame) {

    int ret;
    /* send the packet with the compressed data to the decoder */
    ret = avcodec_send_packet(codec_ctx, pkt);
    if (ret == AVERROR(EAGAIN)) {
        LOGE(TAG,
             "Decode: Receive_frame and send_packet both returned EAGAIN, which is an API violation.");
    } else if (ret < 0) {
        return;
    }

    // read all the output frames (in general there may be any number of them)
    while (ret >= 0 && !is_stop_) {
        // For frame, avcodec_receive_frame internally calls av_frame_unref(frame) first each time
        ret = avcodec_receive_frame(codec_ctx, frame);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
            return;
        } else if (ret < 0) {
            return;
        }
        int64_t startTime = GetCurrentMilliTime();
        LOGD(TAG, "decodeStartTime: %ld", startTime);
        // Convert the pts of the currently decoded frame into a millisecond timestamp
        auto decode_time_ms = frame->pts * 1000 / stream->time_base.den;
        LOGD(TAG, "decode_time_ms = %ld", decode_time_ms);
        if (decode_time_ms >= time_ms_) {
            LOGD(TAG, "decode decode_time_ms = %ld, time_ms_ = %ld", decode_time_ms, time_ms_);
            last_decode_time_ms_ = decode_time_ms;
            is_seeking_ = false;

            // Convert the image data format
            int result = sws_scale(
                    sws_context,
                    (const uint8_t* const*) frame->data, frame->linesize,
                    0, video_height_,
                    rgba_frame->data, rgba_frame->linesize);

            if (result <= 0) {
                LOGE(TAG, "Player Error : data convert fail");
                return;
            }

            // Draw (play) the frame
            result = ANativeWindow_lock(native_window_, &window_buffer_, nullptr);
            if (result < 0) {
                LOGE(TAG, "Player Error : Can not lock native window");
            } else {
                // Draw the image onto the window surface
                auto bits = (uint8_t*) window_buffer_.bits;
                for (int h = 0; h < video_height_; h++) {
                    memcpy(bits + h * window_buffer_.stride * 4,
                           out_buffer_ + h * rgba_frame->linesize[0],
                           rgba_frame->linesize[0]);
                }
                ANativeWindow_unlockAndPost(native_window_);
            }
            on_decode_frame_(decode_time_ms);
            int64_t endTime = GetCurrentMilliTime();
            LOGD(TAG, "decodeEndTime - decodeStartTime: %ld", endTime - startTime);
            LOGD(TAG, "finish decode frame");
            condition_.wait(lock);
        }
        // Clears all the data held by the AVPacket, re-initializes it, and sets data and size
        // to 0 so it is ready for the next call.
        // Release the packet reference
        av_packet_unref(pkt);
    }
}

void VideoDecoder::Prepare(ANativeWindow* window) {
    native_window_ = window;
    av_register_all();
    auto av_format_context = avformat_alloc_context();
    avformat_open_input(&av_format_context, path_.c_str(), nullptr, nullptr);
    avformat_find_stream_info(av_format_context, nullptr);
    int video_stream_index = -1;
    for (int i = 0; i < av_format_context->nb_streams; i++) {
        // Find the index of the video media stream
        if (av_format_context->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
            video_stream_index = i;
            LOGD(TAG, "find video stream index = %d", video_stream_index);
            break;
        }
    }

    // run once
    do {
        if (video_stream_index == -1) {
            codec::LOGE(TAG, "Player Error : Can not find video stream");
            break;
        }
        std::unique_lock<std::mutex> lock(work_queue_mtx);

        // Get the video media stream
        auto stream = av_format_context->streams[video_stream_index];
        // Find the registered decoder
        auto codec = avcodec_find_decoder(stream->codecpar->codec_id);
        // Allocate the decoder context
        AVCodecContext* codec_ctx = avcodec_alloc_context3(codec);
        auto ret = avcodec_parameters_to_context(codec_ctx, stream->codecpar);

        if (ret >= 0) {
            // Open the decoder
            avcodec_open2(codec_ctx, codec, nullptr);
            // The width and height values are only available after the decoder is opened
            video_width_ = codec_ctx->width;
            video_height_ = codec_ctx->height;

            AVFrame* rgba_frame = av_frame_alloc();
            int buffer_size = av_image_get_buffer_size(AV_PIX_FMT_RGBA, video_width_, video_height_,
                                                       1);
            // Allocate memory for the output buffer
            out_buffer_ = (uint8_t*) av_malloc(buffer_size * sizeof(uint8_t));
            av_image_fill_arrays(rgba_frame->data, rgba_frame->linesize, out_buffer_,
                                 AV_PIX_FMT_RGBA,
                                 video_width_, video_height_, 1);

            // Setting the width and height limits the number of pixels in the buffer,
            // rather than the physical display size of the screen.
            // If the buffer does not match the physical display size, the image may be
            // stretched or compressed when shown.
            int result = ANativeWindow_setBuffersGeometry(native_window_, video_width_,
                                                          video_height_, WINDOW_FORMAT_RGBA_8888);
            if (result < 0) {
                LOGE(TAG, "Player Error : Can not set native window buffer");
                avcodec_close(codec_ctx);
                avcodec_free_context(&codec_ctx);
                av_free(out_buffer_);
                break;
            }

            auto frame = av_frame_alloc();
            auto packet = av_packet_alloc();

            struct SwsContext* data_convert_context = sws_getContext(
                    video_width_, video_height_, codec_ctx->pix_fmt,
                    video_width_, video_height_, AV_PIX_FMT_RGBA,
                    SWS_BICUBIC, nullptr, nullptr, nullptr);
            while (!is_stop_) {
                LOGD(TAG, "front seek time_ms_ = %ld, last_decode_time_ms_ = %ld", time_ms_,
                     last_decode_time_ms_);
                if (!is_seeking_ && (time_ms_ > last_decode_time_ms_ + 1000 ||
                                     time_ms_ < last_decode_time_ms_ - 50)) {
                    is_seeking_ = true;
                    LOGD(TAG, "seek frame time_ms_ = %ld, last_decode_time_ms_ = %ld", time_ms_,
                         last_decode_time_ms_);
                    // The seek target is a timestamp in the stream's time_base, so convert the
                    // original time_ms back using the inverse of the calculation used when reading the pts
                    av_seek_frame(av_format_context, video_stream_index,
                                  time_ms_ * stream->time_base.den / 1000, AVSEEK_FLAG_BACKWARD);
                }
                // Read one complete frame of video; what we get is a frame of compressed data,
                // which is then decoded
                ret = av_read_frame(av_format_context, packet);
                if (ret < 0) {
                    avcodec_flush_buffers(codec_ctx);
                    av_seek_frame(av_format_context, video_stream_index,
                                  time_ms_ * stream->time_base.den / 1000, AVSEEK_FLAG_BACKWARD);
                    LOGD(TAG, "ret < 0, condition_.wait(lock)");
                    // Prevent the case where the video has no more data when decoding the last frame
                    on_decode_frame_(last_decode_time_ms_);
                    condition_.wait(lock);
                }
                if (packet->size) {
                    Decode(codec_ctx, packet, frame, stream, lock, data_convert_context,
                           rgba_frame);
                }
            }
            // Release resources
            sws_freeContext(data_convert_context);
            av_free(out_buffer_);
            av_frame_free(&rgba_frame);
            av_frame_free(&frame);
            av_packet_free(&packet);

        }
        avcodec_close(codec_ctx);
        avcodec_free_context(&codec_ctx);

    } while (false);
    avformat_close_input(&av_format_context);
    avformat_free_context(av_format_context);
    ANativeWindow_release(native_window_);
    delete this;
}

bool VideoDecoder::DecodeFrame(long time_ms) {
    LOGD(TAG, "DecodeFrame time_ms = %ld", time_ms);
    if (last_decode_time_ms_ == time_ms || time_ms_ == time_ms) {
        LOGD(TAG, "DecodeFrame last_decode_time_ms_ == time_ms");
        return false;
    }
    if (last_decode_time_ms_ >= time_ms && last_decode_time_ms_ <= time_ms + 50) {
        return false;
    }
    time_ms_ = time_ms;
    condition_.notify_all();
    return true;
}

void VideoDecoder::Release() {
    is_stop_ = true;
    condition_.notify_all();
}

/**
 * Get the current time in milliseconds
 */
int64_t VideoDecoder::GetCurrentMilliTime(void) {
    struct timeval tv{};
    gettimeofday(&tv, nullptr);
    return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

}

Join us

We are ByteDance's image team, currently developing a number of products including photo editing, CapCut, Light Color Camera, Wake image, and Faceu. Our business covers a wide range of image creation scenarios. As of June 2021, photo editing, Light Color Camera, and CapCut have repeatedly topped the free-app charts of the App Store at home and abroad, and continue to grow rapidly. Join us to build the world's most popular image creation products.

Experienced hire application link: job.toutiao.com/s/NFYMcaq

Enrollment code: 5A38FTT

Campus recruitment application link: jobs.bytedance.com/campus/posi…

Internal referral for the ByteDance Interactive Entertainment R&D image team: bytedance.feishu.cn/docx/doxcnM…