Preface

This article follows the Android video decoder from Dr. Lei Xiaohua's blog, using FFmpeg to decode a video file into YUV data. The code stays consistent with the original post, replacing some API calls for FFmpeg 4.2.2, walking through the decoding flow, and briefly analyzing the key functions.

Prerequisites

  • An FFmpeg .so library cross-compiled for the ARM platform
  • An Android project with native (NDK) support

Background knowledge

The pipeline audio and video pass through on the way to playback

Protocol handling parses data arriving in a network streaming-media format into the corresponding container format. Over a network, audio and video are transmitted between the communicating parties by some protocol; commonly used streaming protocols include RTMP, MMS, and HTTP.

Demuxing (decapsulation) separates container-format data into its audio and video parts. Common file formats such as MP4, MKV, and FLV bundle compressed (encoded) audio data and compressed video data together in a specific container layout.

Decoding is the central step of audio/video processing: it turns the compressed, encoded data back into raw data. Common audio compression standards include AAC and MP3; common video compression standards include H.264, MPEG-2, and VC-1. After decoding, the audio side yields audio sample data such as PCM, and the video side yields pixel (color) data such as YUV420P or RGB.
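To make the "YUV420P" part concrete, here is a minimal standalone sketch (not from the original post; the 1920x1080 numbers are purely illustrative) that computes how much storage one decoded frame needs:

#include <cstdio>

// In YUV420P the Y plane holds one byte per pixel, while U and V are each
// subsampled 2x horizontally and 2x vertically, i.e. width*height/4 bytes apiece.
int main() {
    int width = 1920, height = 1080;           // illustrative resolution
    int y_size = width * height;               // 2,073,600 bytes
    int uv_size = y_size / 4;                  // 518,400 bytes each for U and V
    int frame_size = y_size + 2 * uv_size;     // 3,110,400 bytes per frame
    printf("One %dx%d YUV420P frame = %d bytes\n", width, height, frame_size);
    return 0;
}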

Mapping between container formats and coding standards

Refer to Dr. Lei Xiaohua's "Learning audio/video codec technology from scratch" for the full table.
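The gist of that table, condensed from common pairings (typical only; each container can carry several codecs):

| Container | Typical video coding | Typical audio coding |
| --- | --- | --- |
| MP4 | H.264 / MPEG-4 | AAC |
| FLV | H.264 | AAC / MP3 |
| MKV | H.264 / H.265 / VP9 | AAC / Vorbis / FLAC |
| TS | H.264 / MPEG-2 | AAC / MP3 |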

A brief introduction to the FFmpeg libraries used in this example

Libavformat

The container (muxing/demuxing) library: it multiplexes and demultiplexes audio, video, and subtitle streams, and contains muxers and demuxers for a wide range of multimedia container formats.

Libavcodec

The codec library: it contains decoders and encoders for audio, video, and subtitle streams, as well as several bitstream filters.

Libswscale

Used for image scaling and for color-space and pixel-format conversion.

Demuxing and decoding video with FFmpeg

Sample code snippet

First, include the FFmpeg headers and the Android log header, and define log-printing macros.

#include <jni.h>
#include <android/log.h>
#include <cstdarg>  // va_list, used by the av_log callback below
#include <cstdio>   // FILE, fopen(), sprintf()
#include <cstring>  // strcpy(), strlen()
#include <ctime>    // clock_t, clock()

extern "C" {
#include "libavcodec/avcodec.h"
#include "libavformat/avformat.h"
#include "libswscale/swscale.h"
}

#define LOGD(format, ...)  __android_log_print(ANDROID_LOG_DEBUG,  "xeemon", format, ##__VA_ARGS__)
#define LOGE(format, ...)  __android_log_print(ANDROID_LOG_ERROR,  "xeemon", format, ##__VA_ARGS__)

Write an av_log() callback that appends FFmpeg's log output to a file on the sdcard.

// Receive FFmpeg's av_log() output and append it to a file on the sdcard
void custom_log(void *ptr, int level, const char *fmt, va_list vl) {
    FILE *fp = fopen("/storage/emulated/0/av_log.txt", "a+");
    if (fp) {
        vfprintf(fp, fmt, vl);
        fflush(fp);
        fclose(fp);
    }
}

On the Java side, declare the native method. It takes two parameters: the path of the input video file and the path of the YUV file to write. Since the C++ implementation returns a status code (jint), the method is declared to return int.

public native int decode(String input_url, String output_url);

Now write the decoding logic in C++.

extern "C" JNIEXPORT jint JNICALL
Java_cn_helloworld_FFmpegUtils_decode(JNIEnv *env, jobject thiz, jstring input_url, jstring output_url) {

    // Define the associated struct variables
    AVFormatContext *pFormatCtx;
    int i, videoIndex;
    AVCodecContext *pCodecCtx;
    AVCodecParameters *pCodecpar;
    AVCodec *pCodec;
    AVFrame *pFrame, *pFrameYUV;
    uint8_t *out_buffer;
    AVPacket *packet;
    int y_size;
    int ret;
    struct SwsContext *img_convert_ctx;
    FILE *fp_yuv;
    int frame_cnt;
    clock_t time_start, time_finish;
    double time_duration = 0.0;
    
    char input_str[500] = {0};
    char output_str[500] = {0};
    char info[1000] = {0};
    // Copy the input video path and the output YUV path into local buffers,
    // then release the JNI strings so they are not leaked
    const char *in_path = env->GetStringUTFChars(input_url, NULL);
    const char *out_path = env->GetStringUTFChars(output_url, NULL);
    sprintf(input_str, "%s", in_path);
    sprintf(output_str, "%s", out_path);
    env->ReleaseStringUTFChars(input_url, in_path);
    env->ReleaseStringUTFChars(output_url, out_path);

    // Route av_log() output to the sdcard file via our callback
    av_log_set_callback(custom_log);

Initialize the demuxer, open the input, and find the video stream.

    avformat_network_init(); // Initialize networking (and the underlying TLS libraries). Optional when no network input is used, but recommended.
    pFormatCtx = avformat_alloc_context(); // Create the AVFormatContext.
    // tip: this line can actually be omitted; avformat_open_input() below allocates
    // the context itself when it finds the pointer uninitialized.
    // When do you need to allocate the AVFormatContext in advance?
    // Per the official docs: when you use custom functions to read input data instead of lavf's internal I/O layer.

    // avformat_open_input() allocates memory for the AVFormatContext,
    // detects the container format of the file, hooks the source up to the internal buffer, and reads the file header
    if (avformat_open_input(&pFormatCtx, input_str, NULL, NULL) != 0) {
        LOGE("Couldn't open input stream.\n");
        return -1;
    }

    // Some files have no header, or the header does not carry enough information.
    // avformat_find_stream_info() parses the file further: it reads and decodes a few
    // frames to recover the missing details and fills in the AVStream entries of the
    // AVFormatContext. Internally it runs a small demux/decode pass to gather the stream information.
    if (avformat_find_stream_info(pFormatCtx, NULL) < 0) {
        LOGE("Couldn't find stream information.\n");
        return -1;
    }

    videoIndex = av_find_best_stream(pFormatCtx, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
    if (videoIndex < 0) {
        LOGE("Couldn't find a video stream.\n");
        return -1;
    }
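At this point it can be handy to dump what the demuxer discovered. This call is not in the original post, but av_dump_format() is a standard libavformat helper that logs the container, duration, bitrate, and per-stream codecs (here it would land in our av_log file):

    // Optional: log a human-readable summary of the opened input.
    // Arguments: context, stream index hint, the URL to print, is_output (0 = input).
    av_dump_format(pFormatCtx, 0, input_str, 0);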

From AVFormatContext->streams[videoIndex], take the codecpar field, read its codec_id, and look up the matching decoder.

    pCodecpar = pFormatCtx->streams[videoIndex]->codecpar;
    pCodec = avcodec_find_decoder(pCodecpar->codec_id); // Find the matching decoder
    if (pCodec == NULL) {
        LOGE("Couldn't find Codec, codec is NULL.\n");
        return -1;
    }
    pCodecCtx = avcodec_alloc_context3(pCodec); // Allocate the AVCodecContext
    if (pCodecCtx == NULL) {
        LOGE("Couldn't allocate decoder context.\n");
        return -1;
    }
    // avcodec_parameters_to_context() copies the stream parameters into the AVCodecContext
    if (avcodec_parameters_to_context(pCodecCtx, pCodecpar) < 0) {
        LOGE("Couldn't copy decoder context.\n");
        return -1;
    }

    // avcodec_open2() opens the decoder
    if (avcodec_open2(pCodecCtx, pCodec, NULL) < 0) {
        LOGE("Couldn't open Codec.\n");
        return -1;
    }

With the steps above done, we could already loop over av_read_frame() and start parsing; but to output YUV data we must first prepare the buffers that will receive it.

    pFrame = av_frame_alloc();
    pFrameYUV = av_frame_alloc();

    out_buffer = (unsigned char *) av_malloc(
            av_image_get_buffer_size(AV_PIX_FMT_YUV420P, pCodecCtx->width, pCodecCtx->height, 1));
    //av_image_fill_arrays() associates the data members of AVFrame to an address space
    // This is a storage location for the output of frame information processed by av_read_frame() and sws_scale()
    av_image_fill_arrays(pFrameYUV->data, pFrameYUV->linesize, out_buffer,
                         AV_PIX_FMT_YUV420P, pCodecCtx->width, pCodecCtx->height, 1);

    packet = av_packet_alloc(); // Allocate and initialize an AVPacket (safer than a raw av_malloc())

Next, initialize the SwsContext in preparation for the later sws_scale() calls. The decoded YUV data held in an AVFrame is not a contiguous run of valid pixels: for alignment and optimization reasons each row may carry padding (linesize can exceed the visible width), so sws_scale() is called to convert the frame into tightly packed YUV420P. A sketch after the next code block illustrates the stride handling.

    //sws_getContext() initializes SwsContext.
    // srcW: width of source image
    // srcH: height of the source image
    // srcFormat: the pixel format of the source image
    // dstW: target image width
    // dstH: Target image height
    // dstFormat: The pixel format of the target image
    // flags: Sets the algorithm used for image stretching
    img_convert_ctx = sws_getContext(pCodecCtx->width, pCodecCtx->height, pCodecCtx->pix_fmt,
                                     pCodecCtx->width, pCodecCtx->height, AV_PIX_FMT_YUV420P,
                                     SWS_BICUBIC, NULL, NULL, NULL);

    // Build up the info string. Appending at info + strlen(info) avoids the
    // undefined behavior of passing 'info' as both source and destination of sprintf()
    sprintf(info, "[Input ]%s\n", input_str);
    sprintf(info + strlen(info), "[Output ]%s\n", output_str);
    sprintf(info + strlen(info), "[Format ]%s\n", pFormatCtx->iformat->name);
    sprintf(info + strlen(info), "[Codec ]%s\n", pCodecCtx->codec->name);
    sprintf(info + strlen(info), "[Resolution]%dx%d\n", pCodecCtx->width, pCodecCtx->height);
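To make the padding point concrete, here is a sketch of what you would otherwise have to do by hand (copy_plane is a hypothetical helper, not part of the original post or of FFmpeg): without sws_scale(), each plane must be copied row by row, honoring the stride.

// Sketch only: pack one padded AVFrame plane into a tight buffer.
// 'stride' is AVFrame::linesize[i]; 'width' and 'height' are the plane's own
// dimensions (for YUV420P, the U and V planes are width/2 x height/2).
static void copy_plane(uint8_t *dst, const uint8_t *src,
                       int stride, int width, int height) {
    for (int row = 0; row < height; row++) {
        memcpy(dst + row * width, src + row * stride, width);
    }
}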

This is where the real decoding begins. Each successful av_read_frame() call returns an AVPacket; avcodec_send_packet() then feeds that packet to the decoder, and once decoding succeeds avcodec_receive_frame() returns a decoded AVFrame.

    fp_yuv = fopen(output_str, "wb+");
    if (fp_yuv == NULL) {
        LOGE("Cannot open output file.\n");
        return -1;
    }
    frame_cnt = 0;
    time_start = clock();
    // Read data from the opened AVFormatContext by repeatedly calling av_read_frame().
    // Each successful call returns one AVPacket holding encoded data from one AVStream.
    while (av_read_frame(pFormatCtx, packet) >= 0) {
        if (packet->stream_index == videoIndex) {
            ret = avcodec_send_packet(pCodecCtx, packet);
            if (ret < 0) {
                LOGE("Decode error.\n");
                return -1;
            }
            ret = avcodec_receive_frame(pCodecCtx, pFrame);
            if (ret != 0) {
                // The first call to avcodec_receive_frame() may return AVERROR(EAGAIN) (-11):
                // the decoder needs more input before it can emit a frame, so keep feeding it
                continue;
            }
            // sws_scale() converts the frame into tightly packed YUV420P
            if (sws_scale(img_convert_ctx, pFrame->data, pFrame->linesize, 0, pCodecCtx->height,
                          pFrameYUV->data, pFrameYUV->linesize) > 0) {
                y_size = pCodecCtx->width * pCodecCtx->height;
                fwrite(pFrameYUV->data[0], 1, y_size, fp_yuv);      // Y
                fwrite(pFrameYUV->data[1], 1, y_size / 4, fp_yuv);  // U
                fwrite(pFrameYUV->data[2], 1, y_size / 4, fp_yuv);  // V

                //Output info
                char pictype_str[10] = {0};
                switch (pFrame->pict_type) {
                    case AV_PICTURE_TYPE_I:
                        strcpy(pictype_str, "I");
                        break;
                    case AV_PICTURE_TYPE_P:
                        strcpy(pictype_str, "P");
                        break;
                    case AV_PICTURE_TYPE_B:
                        strcpy(pictype_str, "B");
                        break;
                    default:
                        strcpy(pictype_str, "Other");
                }
                LOGD("Frame Index: %5d. Type:%s", frame_cnt, pictype_str);
                frame_cnt++;
            }
        }
        av_packet_unref(packet);
    }
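One caveat: avcodec_send_packet() and avcodec_receive_frame() are not strictly one-to-one; a single packet can yield zero or several frames. A more robust variant of the loop body (a sketch, not the original code) drains every frame that is ready after each packet:

            // Sketch: receive all frames the decoder has ready for this packet.
            // AVERROR(EAGAIN) means "send more input"; AVERROR_EOF means fully drained.
            if (avcodec_send_packet(pCodecCtx, packet) == 0) {
                while (avcodec_receive_frame(pCodecCtx, pFrame) == 0) {
                    // ... sws_scale() + fwrite() as above ...
                }
            }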

The last step drains the decoder. Decoders may hold frames in an internal buffer, so after the last real packet you put the decoder into draining mode by sending a NULL packet, then keep calling avcodec_receive_frame() until it stops returning frames.

    // Flush the decoder: enter draining mode with a NULL packet, then pull
    // out whatever frames are still buffered until AVERROR_EOF
    avcodec_send_packet(pCodecCtx, NULL);
    while (avcodec_receive_frame(pCodecCtx, pFrame) == 0) {
        sws_scale(img_convert_ctx, pFrame->data, pFrame->linesize, 0, pCodecCtx->height,
                  pFrameYUV->data, pFrameYUV->linesize);
        y_size = pCodecCtx->width * pCodecCtx->height;
        fwrite(pFrameYUV->data[0], 1, y_size, fp_yuv);      // Y
        fwrite(pFrameYUV->data[1], 1, y_size / 4, fp_yuv);  // U
        fwrite(pFrameYUV->data[2], 1, y_size / 4, fp_yuv);  // V

        //Output info
        char pictype_str[10] = {0};
        switch (pFrame->pict_type) {
            case AV_PICTURE_TYPE_I:
                strcpy(pictype_str, "I");
                break;
            case AV_PICTURE_TYPE_P:
                strcpy(pictype_str, "P");
                break;
            case AV_PICTURE_TYPE_B:
                strcpy(pictype_str, "B");
                break;
            default:
                strcpy(pictype_str, "Other");
        }
        LOGD("Frame Index: %5d. Type:%s", frame_cnt, pictype_str);
        frame_cnt++;
    }
    time_finish = clock();
    // clock() counts CPU ticks; convert to milliseconds via CLOCKS_PER_SEC
    time_duration = (double) (time_finish - time_start) / CLOCKS_PER_SEC * 1000;

    sprintf(info + strlen(info), "[Time ]%fms\n", time_duration);
    sprintf(info + strlen(info), "[Count ]%d\n", frame_cnt);

    LOGD("Info:\n%s", info);

Finally, release the resources opened earlier. Many of these functions come in open/close pairs, so it is a good habit to write the matching release call as soon as you write the acquiring one, then fill in the logic between them so nothing is forgotten.

    // Pairs with sws_getContext()
    sws_freeContext(img_convert_ctx);

    fclose(fp_yuv);

    av_packet_free(&packet);  // Pairs with av_packet_alloc()
    av_free(out_buffer);      // Pairs with av_malloc()
    av_frame_free(&pFrameYUV);
    av_frame_free(&pFrame);
    avcodec_free_context(&pCodecCtx); // Pairs with avcodec_alloc_context3(); replaces the deprecated avcodec_close()
    // Pairs with avformat_open_input()
    avformat_close_input(&pFormatCtx);

    return 0;
}
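If everything works, output_url now holds raw YUV420P frames. Assuming ffplay is available on your machine (the original post does not cover playback), the file can be previewed with, for example, `ffplay -f rawvideo -pixel_format yuv420p -video_size 1280x720 output.yuv`, adjusting video_size to the actual resolution of the source.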

References

  1. Lei Xiaohua. The simplest mobile-platform FFmpeg example: Android video decoder
  2. FFmpeg Documentation
  3. "I am small north dig ha ha" (blog handle). Audio and video frames: FFmpeg decoding routines
  4. A simple player from scratch: 4. FFmpeg decodes video into YUV data