Overview
This article follows the Android video decoder from Dr. Lei Xiaohua's blog and uses FFmpeg to decode a video file into YUV data. The code stays consistent with the original post, but replaces some API functions for FFmpeg 4.2.2, walks through the decoding flow, and briefly analyzes the key functions.
Preparation
- An FFmpeg .so library cross-compiled for the ARM platform
- An Android project with native (C/C++) support
Basic knowledge
How audio and video get from demuxing to playback
Protocol handling parses data in a network streaming format into the corresponding container format. On the network, audio and video are transmitted between the communicating parties over some protocol; common streaming protocols include RTMP, MMS, and HTTP.
Demuxing (decapsulation) separates container-format data into its audio and video parts. Common container files such as MP4, MKV, and FLV bundle compressed audio data and compressed video data together in a defined layout.
Decoding is the most important step in audio and video processing: it turns the compressed data back into raw data. Common audio compression standards include AAC and MP3; common video compression standards include H.264, MPEG-2, and VC-1. After decoding, the audio side yields sampled audio data such as PCM, and the video side yields pixel data such as YUV420P or RGB.
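To get a sense of the data volume involved, here is a minimal sketch (my own illustration, not from the original post) of how much one YUV420P frame occupies: the Y plane is full resolution, while the U and V planes are each subsampled to a quarter of the pixels, giving width × height × 3/2 bytes per frame.
#include <cstdio>

// Back-of-the-envelope size of one YUV420P frame.
// Y: one byte per pixel; U and V: one byte per 4 pixels each.
int main() {
    int width = 1920, height = 1080;              // example resolution
    long y_size = (long) width * height;          // Y plane bytes
    long frame_bytes = y_size + 2 * (y_size / 4); // + U and V planes
    printf("One %dx%d YUV420P frame = %ld bytes\n", width, height, frame_bytes); // 3110400
    return 0;
}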
For a table mapping container formats to codec standards, refer to Dr. Lei Xiaohua's post on zero-basis learning of video and audio codec technology.
A brief introduction to the FFmpeg libraries used in this example
Libavformat
The container (muxing/demuxing) library. It multiplexes and demultiplexes audio, video, and subtitle streams, and contains muxers and demuxers for a variety of multimedia container formats.
Libavcodec
The codec library. It contains decoders and encoders for audio, video, and subtitle streams, as well as several bitstream filters.
Libswscale
Used for image scaling, and for color space and pixel format conversion.
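As a quick link-check of the three libraries, here is a minimal standalone sketch (my own, assuming the cross-compiled FFmpeg headers and .so files are on the include and link paths) that prints their runtime versions:
#include <cstdio>
extern "C" {
#include "libavcodec/avcodec.h"
#include "libavformat/avformat.h"
#include "libswscale/swscale.h"
}

// Print the runtime version of each FFmpeg library used in this article.
// Useful for verifying that the cross-compiled .so files actually link.
int main() {
    printf("libavformat: %u\n", avformat_version());
    printf("libavcodec : %u\n", avcodec_version());
    printf("libswscale : %u\n", swscale_version());
    printf("build config: %s\n", avcodec_configuration());
    return 0;
}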
Demuxing and decoding video with FFmpeg
Sample code snippet
Include the FFmpeg headers and the Android log header, and define log-printing macros.
#include <jni.h>
#include <android/log.h>
#include <cstdio>  // FILE, fopen(), fwrite(), sprintf()
#include <cstring> // strlen(), strcpy()
#include <ctime>   // clock(), CLOCKS_PER_SEC
extern "C" {
#include "libavcodec/avcodec.h"
#include "libavformat/avformat.h"
#include "libavutil/imgutils.h" // av_image_get_buffer_size(), av_image_fill_arrays()
#include "libswscale/swscale.h"
}
#define LOGD(format, ...) __android_log_print(ANDROID_LOG_DEBUG, "xeemon", format, ##__VA_ARGS__)
#define LOGE(format, ...) __android_log_print(ANDROID_LOG_ERROR, "xeemon", format, ##__VA_ARGS__)
Write an av_log() callback that writes the log to a file on the sdcard.
// Receive FFmpeg's av_log() output
void custom_log(void *ptr, int level, const char *fmt, va_list vl) {
    FILE *fp = fopen("/storage/emulated/0/av_log.txt", "a+");
    if (fp) {
        vfprintf(fp, fmt, vl);
        fflush(fp);
        fclose(fp);
    }
}
The Java side declares the native method, which takes two parameters: the path of the input video file and the path of the YUV file to write. It returns an int (0 on success, -1 on failure) to match the jint the native function returns.
public native int decode(String input_url, String output_url);
Write the logic in C++.
extern "C" JNIEXPORT jint JNICALL
Java_cn_helloworld_FFmpegUtils_decode(JNIEnv *env, jobject thiz, jstring input_url, jstring output_url) {
    // Declare the related struct variables
    AVFormatContext *pFormatCtx;
    int i, videoIndex;
    AVCodecContext *pCodecCtx;
    AVCodecParameters *pCodecpar;
    AVCodec *pCodec;
    AVFrame *pFrame, *pFrameYUV;
    uint8_t *out_buffer;
    AVPacket *packet;
    int y_size;
    int ret;
    struct SwsContext *img_convert_ctx;
    FILE *fp_yuv;
    int frame_cnt;
    clock_t time_start, time_finish;
    double time_duration = 0.0;
    char input_str[500] = {0};
    char output_str[500] = {0};
    char info[1000] = {0};
    // Copy the incoming video path and the output YUV path into local buffers
    const char *in_chars = env->GetStringUTFChars(input_url, NULL);
    const char *out_chars = env->GetStringUTFChars(output_url, NULL);
    sprintf(input_str, "%s", in_chars);
    sprintf(output_str, "%s", out_chars);
    // Release the JVM string copies to avoid leaking them
    env->ReleaseStringUTFChars(input_url, in_chars);
    env->ReleaseStringUTFChars(output_url, out_chars);
    // Route av_log() output to the sdcard file
    av_log_set_callback(custom_log);
Initialize the demuxer, open the input, and find the video stream.
    avformat_network_init(); // Initialize networking and the underlying TLS library. Optional when no network input is used, but recommended.
    pFormatCtx = avformat_alloc_context(); // Create the AVFormatContext.
    // tips: this line could be omitted; avformat_open_input() below allocates the context itself when it finds it uninitialized
    // When do you need to allocate the AVFormatContext in advance?
    // Per the official docs: when you need custom functions to read input data instead of lavf's internal I/O layer
    // avformat_open_input() allocates memory for the AVFormatContext,
    // detects the container format of the file, fills the internal buffer from the source, and finally reads the header
    if (avformat_open_input(&pFormatCtx, input_str, NULL, NULL) != 0) {
        LOGE("Couldn't open input stream.\n");
        return -1;
    }
    // Some files have no header, or the header does not carry enough information
    // avformat_find_stream_info() parses the file further: it reads and decodes a few frames to recover the missing
    // information and fills in the AVStream entries of the AVFormatContext,
    // effectively running a miniature demux/decode pass to probe the streams
    if (avformat_find_stream_info(pFormatCtx, NULL) < 0) {
        LOGE("Couldn't find stream information.\n");
        return -1;
    }
    videoIndex = av_find_best_stream(pFormatCtx, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
    if (videoIndex < 0) {
        LOGE("Couldn't find a video stream.\n");
        return -1;
    }
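av_find_best_stream() replaces the manual scan over pFormatCtx->streams that older FFmpeg tutorials typically used. For comparison, a minimal sketch of the manual equivalent, as a hypothetical find_video_stream() helper:
// Manual equivalent of av_find_best_stream(): walk the streams array and
// take the first stream whose codec parameters identify it as video.
int find_video_stream(AVFormatContext *fmt_ctx) {
    for (unsigned int i = 0; i < fmt_ctx->nb_streams; i++) {
        if (fmt_ctx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
            return (int) i;
        }
    }
    return -1; // no video stream found
}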
Read the codec parameters from AVFormatContext->streams[videoIndex]->codecpar to get the codec_id and find the matching decoder.
    pCodecpar = pFormatCtx->streams[videoIndex]->codecpar;
    pCodec = avcodec_find_decoder(pCodecpar->codec_id); // Find the matching decoder
    if (pCodec == NULL) {
        LOGE("Couldn't find Codec, codec is NULL.\n");
        return -1;
    }
    pCodecCtx = avcodec_alloc_context3(pCodec); // Allocate the AVCodecContext
    if (pCodecCtx == NULL) {
        LOGE("Couldn't allocate decoder context.\n");
        return -1;
    }
    // avcodec_parameters_to_context() copies the stream parameters into the AVCodecContext
    if (avcodec_parameters_to_context(pCodecCtx, pCodecpar) < 0) {
        LOGE("Couldn't copy decoder context.\n");
        return -1;
    }
    // avcodec_open2() opens the decoder
    if (avcodec_open2(pCodecCtx, pCodec, NULL) < 0) {
        LOGE("Couldn't open Codec.\n");
        return -1;
    }
With the above steps done, you could start looping av_read_frame() to demux, but to output YUV data you must first allocate the buffers that will receive the frames.
    pFrame = av_frame_alloc();
    pFrameYUV = av_frame_alloc();
    out_buffer = (unsigned char *) av_malloc(
            av_image_get_buffer_size(AV_PIX_FMT_YUV420P, pCodecCtx->width, pCodecCtx->height, 1));
    // av_image_fill_arrays() points the data members of the AVFrame at the buffer:
    // this is where the frame data produced by av_read_frame() + sws_scale() will land
    av_image_fill_arrays(pFrameYUV->data, pFrameYUV->linesize, out_buffer,
                         AV_PIX_FMT_YUV420P, pCodecCtx->width, pCodecCtx->height, 1);
    packet = av_packet_alloc(); // returns a properly initialized packet, unlike a raw av_malloc()
Initialize the SwsContext in preparation for sws_scale(). The decoded data in an AVFrame is not a tightly packed run of visible pixels: for alignment and other optimizations each row may carry padding (linesize can be larger than the width), so sws_scale() is called to convert into a tightly packed YUV420P buffer.
    // sws_getContext() initializes the SwsContext.
    // srcW: width of the source image
    // srcH: height of the source image
    // srcFormat: pixel format of the source image
    // dstW: width of the target image
    // dstH: height of the target image
    // dstFormat: pixel format of the target image
    // flags: selects the scaling algorithm
    img_convert_ctx = sws_getContext(pCodecCtx->width, pCodecCtx->height, pCodecCtx->pix_fmt,
                                     pCodecCtx->width, pCodecCtx->height, AV_PIX_FMT_YUV420P,
                                     SWS_BICUBIC, NULL, NULL, NULL);
    // Append via the strlen() offset; sprintf(info, "%s...", info, ...) on the same buffer is undefined behavior
    sprintf(info, "[Input ]%s\n", input_str);
    sprintf(info + strlen(info), "[Output ]%s\n", output_str);
    sprintf(info + strlen(info), "[Format ]%s\n", pFormatCtx->iformat->name);
    sprintf(info + strlen(info), "[Codec ]%s\n", pCodecCtx->codec->name);
    sprintf(info + strlen(info), "[Resolution]%dx%d\n", pCodecCtx->width, pCodecCtx->height);
This is where the real decoding begins. Each successful av_read_frame() call returns one AVPacket; avcodec_send_packet() submits the packet to the decoder, and avcodec_receive_frame() then returns a decoded AVFrame once one is available.
    fp_yuv = fopen(output_str, "wb+");
    if (fp_yuv == NULL) {
        LOGE("Cannot open output file.\n");
        return -1;
    }
    frame_cnt = 0;
    time_start = clock();
    // Read data from the opened AVFormatContext by repeatedly calling av_read_frame()
    // Each successful call fills the AVPacket,
    // which holds the coded data of one AVStream
    while (av_read_frame(pFormatCtx, packet) >= 0) {
        if (packet->stream_index == videoIndex) {
            ret = avcodec_send_packet(pCodecCtx, packet);
            if (ret < 0) {
                LOGE("Decode error.\n");
                return -1;
            }
            ret = avcodec_receive_frame(pCodecCtx, pFrame);
            if (ret != 0) {
                // The first call returns AVERROR(EAGAIN) (-11): the decoder needs more input before it can emit a frame
                continue;
            }
            // sws_scale() converts the image into the packed buffer
            if (sws_scale(img_convert_ctx, pFrame->data, pFrame->linesize, 0, pCodecCtx->height,
                          pFrameYUV->data, pFrameYUV->linesize) > 0) {
                y_size = pCodecCtx->width * pCodecCtx->height;
                fwrite(pFrameYUV->data[0], 1, y_size, fp_yuv); // Y
                fwrite(pFrameYUV->data[1], 1, y_size / 4, fp_yuv); // U
                fwrite(pFrameYUV->data[2], 1, y_size / 4, fp_yuv); // V
                // Output info
                char pictype_str[10] = {0};
                switch (pFrame->pict_type) {
                    case AV_PICTURE_TYPE_I:
                        strcpy(pictype_str, "I");
                        break;
                    case AV_PICTURE_TYPE_P:
                        strcpy(pictype_str, "P");
                        break;
                    case AV_PICTURE_TYPE_B:
                        strcpy(pictype_str, "B");
                        break;
                    default:
                        strcpy(pictype_str, "Other");
                }
                LOGD("Frame Index: %5d. Type:%s", frame_cnt, pictype_str);
                frame_cnt++;
            }
        }
        av_packet_unref(packet);
    }
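One caveat about the loop above: it calls avcodec_receive_frame() at most once per packet, while the send/receive API allows a single packet to yield several frames. The canonical pattern drains receive until AVERROR(EAGAIN). A sketch of that pattern, with the per-frame work factored into a hypothetical write_frame() helper standing in for the sws_scale()/fwrite() block above:
// Canonical send/receive pattern: one send, then drain receive until the
// decoder asks for more input. write_frame() is a hypothetical helper
// standing in for the sws_scale()/fwrite()/LOGD() block above.
int decode_packet(AVCodecContext *ctx, AVPacket *pkt, AVFrame *frame) {
    int ret = avcodec_send_packet(ctx, pkt);
    if (ret < 0) {
        return ret; // a real error, not "try again"
    }
    while (ret >= 0) {
        ret = avcodec_receive_frame(ctx, frame);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
            return 0; // needs more input / fully drained: not an error
        }
        if (ret < 0) {
            return ret; // decoding error
        }
        write_frame(frame); // hypothetical: convert and write the YUV planes
    }
    return 0;
}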
Finally, flush (drain) the decoder: sending a NULL packet puts it into draining mode, after which avcodec_receive_frame() returns the frames still buffered inside the decoder until it reports AVERROR_EOF.
    // flush decoder: a NULL packet puts the decoder into draining mode,
    // then avcodec_receive_frame() returns the frames still buffered inside it
    ret = avcodec_send_packet(pCodecCtx, NULL);
    while (ret >= 0) {
        ret = avcodec_receive_frame(pCodecCtx, pFrame);
        if (ret < 0) {
            break; // AVERROR_EOF: fully drained
        }
        sws_scale(img_convert_ctx, pFrame->data, pFrame->linesize, 0, pCodecCtx->height,
                  pFrameYUV->data, pFrameYUV->linesize);
        y_size = pCodecCtx->width * pCodecCtx->height;
        fwrite(pFrameYUV->data[0], 1, y_size, fp_yuv); // Y
        fwrite(pFrameYUV->data[1], 1, y_size / 4, fp_yuv); // U
        fwrite(pFrameYUV->data[2], 1, y_size / 4, fp_yuv); // V
        // Output info
        char pictype_str[10] = {0};
        switch (pFrame->pict_type) {
            case AV_PICTURE_TYPE_I:
                strcpy(pictype_str, "I");
                break;
            case AV_PICTURE_TYPE_P:
                strcpy(pictype_str, "P");
                break;
            case AV_PICTURE_TYPE_B:
                strcpy(pictype_str, "B");
                break;
            default:
                strcpy(pictype_str, "Other");
        }
        LOGD("Frame Index: %5d. Type:%s", frame_cnt, pictype_str);
        frame_cnt++;
    }
    time_finish = clock();
    time_duration = (double) (time_finish - time_start) * 1000.0 / CLOCKS_PER_SEC; // clock() counts ticks, not ms
    sprintf(info + strlen(info), "[Time ]%fms\n", time_duration);
    sprintf(info + strlen(info), "[Count ]%d\n", frame_cnt);
    LOGD("Info:\n%s", info);
Close the resources opened earlier. Many of these functions come in alloc/free pairs, so it helps to write the matching release call as soon as you write the allocation, then fill in the logic between them so nothing gets forgotten.
    // Pairs with sws_getContext()
    sws_freeContext(img_convert_ctx);
    fclose(fp_yuv);
    av_packet_free(&packet); // pairs with av_packet_alloc()
    av_free(out_buffer); // pairs with av_malloc()
    av_frame_free(&pFrameYUV);
    av_frame_free(&pFrame);
    avcodec_free_context(&pCodecCtx); // pairs with avcodec_alloc_context3(); replaces the deprecated avcodec_close()
    // Pairs with avformat_open_input()
    avformat_close_input(&pFormatCtx);
    return 0;
}
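Once the function returns, the raw output can be sanity-checked on a desktop with ffplay. A raw .yuv file has no header, so the resolution and pixel format must be given explicitly, e.g. ffplay -f rawvideo -pixel_format yuv420p -video_size 1280x720 output.yuv (substitute the actual resolution logged above).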
References
- Lei Xiaohua. The simplest FFmpeg-based mobile example: Android video decoder
- FFmpeg Documentation
- I am small north dig ha ha. Video and video frames: FFmpeg decoding routines
- Simple player from zero: 4. FFmpeg decodes video to YUV data