Preface
Our first three articles covered configuring the development environment on each of the major platforms, so it's time to get down to coding. Since this series is about writing a cross-platform player from 0 to 1, I'll go from shallow to deep, from basic to advanced.
Let’s start with a flow chart:
This series breaks the figure above down into specific code modules. This article focuses on how to use the FFmpeg API to decapsulate (demux) input data, read the raw audio and video stream information, and then perform some basic operations on the audio and video. In short, we need a working understanding of the FFmpeg APIs used in the player module.
PS: If you don't know how to configure the QT & FFmpeg environment on Mac OS, Windows, or Linux, refer to the following articles:
Cross-platform Player development (1) QT for Mac OS & FFmpeg environment build
Cross-platform Player development (2) QT for Linux & FFmpeg environment build
Cross-platform Player development (3) QT for Windows & FFmpeg environment build
FFmpeg basics
Decapsulation
We use the FFmpeg API to decapsulate (demux) the input video. Let's first look at the flow of the API calls:
Do you have a general idea of the decapsulation APIs now? Reading a compressed data stream from an input URL takes only a few steps, and they are quite simple. Let's demonstrate with actual code:
1. Register all components
```cpp
av_register_all();
```
Note that this function is deprecated in newer FFmpeg versions (4.0 and later) and is only required on older versions.
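A minimal sketch of how to keep one codebase building across versions, assuming the standard FFmpeg version macros:

```cpp
extern "C" {
#include <libavformat/avformat.h>
}

void register_all_compat() {
    // av_register_all() was deprecated in FFmpeg 4.0 (libavformat 58.9.100);
    // on newer versions registration happens automatically, so guard the call.
#if LIBAVFORMAT_VERSION_INT < AV_VERSION_INT(58, 9, 100)
    av_register_all();
#endif
}
```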
2. Register the network module
```cpp
// Initialize the network library (needed to open RTMP, RTSP, HTTP and other streaming protocols)
avformat_network_init();
```
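For symmetry, a hedged note: there is a matching teardown call that can be made at program exit (it is optional on recent FFmpeg versions):

```cpp
// Matching teardown for avformat_network_init(), typically at program exit
avformat_network_deinit();
```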
3. Open the input stream and read the header
```cpp
// Set parameters
AVDictionary *opts = NULL;
// Make the RTSP stream use TCP
av_dict_set(&opts, "rtsp_transport", "tcp", 0);
// Maximum network delay
av_dict_set(&opts, "max_delay", "1000", 0);
// Demuxing context
AVFormatContext *ic = NULL;
int re = avformat_open_input(
        &ic,
        inpath,
        0,      // 0 means the demuxer is selected automatically
        &opts   // parameter settings, such as the RTSP delay time
);
// A return value of 0 means success
if (re != 0) {
    char buf[1024] = {0};
    av_strerror(re, buf, sizeof(buf) - 1);
    cout << "open " << inpath << " failed! :" << buf << endl;
    getchar();
    return -1;
}
```
Note: if this call succeeds, you must call avformat_close_input() when you are done.
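A minimal cleanup sketch for the open call above; avformat_close_input() also frees the context, and the options dictionary needs its own free:

```cpp
// Cleanup matching avformat_open_input(): closes the input, frees ic and sets it to NULL
avformat_close_input(&ic);
// The options dictionary is not freed by the open call; release it separately
av_dict_free(&opts);
```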
4. Read packets to probe the stream information
```cpp
// return >= 0 if OK, AVERROR_xxx on error
re = avformat_find_stream_info(ic, 0);
// Print the media details
av_dump_format(ic, 0, inpath, 0);
```
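Once the stream info has been probed, we can also derive the total duration in milliseconds (the totalMs value that shows up in the log later). A small sketch, assuming ic->duration is valid; it is expressed in AV_TIME_BASE (microsecond) units:

```cpp
// Total duration in ms; ic->duration is in AV_TIME_BASE (1/1000000 s) units
long long totalMs = ic->duration / (AV_TIME_BASE / 1000);
cout << "totalMs = " << totalMs << endl;
```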
5. Obtain audio and video stream information
- Get it by traversing the streams manually
```cpp
// Get audio/video stream information by traversal
for (int i = 0; i < ic->nb_streams; i++) {
    AVStream *as = ic->streams[i];
    cout << "codec_id = " << as->codecpar->codec_id << endl;
    cout << "format = " << as->codecpar->format << endl;
    // Audio: AVMEDIA_TYPE_AUDIO
    if (as->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
        audioStream = i;
        cout << i << " Audio info" << endl;
        cout << "sample_rate = " << as->codecpar->sample_rate << endl;
        // For audio, format is an AVSampleFormat
        cout << "channels = " << as->codecpar->channels << endl;
        // Number of samples per channel in one frame
        cout << "frame_size = " << as->codecpar->frame_size << endl;
        // e.g. 1024 samples * 2 channels * 2 bytes = 4096 bytes per frame;
        // audio frames per second = sample_rate / frame_size
    }
    // Video: AVMEDIA_TYPE_VIDEO
    else if (as->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
        videoStream = i;
        cout << i << " Video info" << endl;
        cout << "width = " << as->codecpar->width << endl;
        cout << "height = " << as->codecpar->height << endl;
        // Frame rate: convert the FPS fraction to a double
        cout << "video fps = " << r2d(as->avg_frame_rate) << endl;
    }
}
```
- Obtain it using the av_find_best_stream() API
```cpp
// Get the best video stream index
videoStream = av_find_best_stream(ic, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
AVStream *as = ic->streams[videoStream];
cout << videoStream << " Video info" << endl;
cout << "width = " << as->codecpar->width << endl;
cout << "height = " << as->codecpar->height << endl;
// Frame rate: convert the FPS fraction to a double
cout << "video fps = " << r2d(as->avg_frame_rate) << endl;
```
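The snippets above use an r2d() helper that the article never shows. A minimal sketch of what it presumably does, converting an AVRational fraction such as avg_frame_rate or time_base to a double (FFmpeg's own av_q2d() is equivalent):

```cpp
// Convert an AVRational (a num/den fraction) to a double, guarding against den == 0
static double r2d(AVRational r) {
    return r.den == 0 ? 0.0 : (double)r.num / (double)r.den;
}
```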
6. Read the compressed data packet
```cpp
AVPacket *pkt = av_packet_alloc();
for (;;) {
    int re = av_read_frame(ic, pkt);
    if (re != 0) {
        // End of stream (to loop playback we could seek back here instead)
        cout << "==============================end==============================" << endl;
        break;
    }
    cout << "pkt->size = " << pkt->size << endl;
    // Presentation timestamp
    cout << "pkt->pts = " << pkt->pts << endl;
    // Convert to milliseconds for easier synchronization
    cout << "pkt->pts ms = " << pkt->pts * (r2d(ic->streams[pkt->stream_index]->time_base) * 1000) << endl;
    // Decoding timestamp
    cout << "pkt->dts = " << pkt->dts << endl;
    if (pkt->stream_index == videoStream) {
        cout << "Image" << endl;
    }
    if (pkt->stream_index == audioStream) {
        cout << "Audio" << endl;
    }
    // Unreference the packet; the buffer is freed once the reference count drops to 0
    av_packet_unref(pkt);
}
```
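To make the pts-to-milliseconds conversion concrete, here is a worked example with hypothetical values:

```cpp
// Hypothetical example: a 90 kHz time base (common in MPEG streams)
AVRational tb = {1, 90000};           // time_base = 1/90000 s per tick
int64_t pts = 270000;                 // 270000 ticks
double ms = pts * av_q2d(tb) * 1000;  // 270000 * (1/90000) * 1000 = 3000 ms
```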
Log after debugging
Decoding
Calling the FFmpeg API to decode the compressed audio and video data is actually quite simple. We mainly use the following APIs, see the picture below:
We then extend the decapsulation code as follows:
```cpp
// Find the video decoder
AVCodec *vcodec = avcodec_find_decoder(ic->streams[videoStream]->codecpar->codec_id);
if (!vcodec) {
    cout << "can't find the codec id " << ic->streams[videoStream]->codecpar->codec_id << endl;
    getchar();
    return -1;
}
// Create the video decoder context
AVCodecContext *vctx = avcodec_alloc_context3(vcodec);
// Copy the stream parameters into the decoder context
avcodec_parameters_to_context(vctx, ic->streams[videoStream]->codecpar);
// Configure the decoder thread count
vctx->thread_count = 8;
// Open the decoder context
re = avcodec_open2(vctx, 0, 0);
if (re != 0) {
    char buf[1024] = {0};
    av_strerror(re, buf, sizeof(buf) - 1);
    cout << "video avcodec_open2 failed!" << buf << endl;
    getchar();
    return -1;
}
cout << "video avcodec_open2 success!" << endl;
```
The decoder can also be found by name with the following API:
```cpp
AVCodec *avcodec_find_decoder_by_name(const char *name);
```
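For example, a hedged usage sketch; the decoder name "h264" here is just an illustration:

```cpp
// Look up a decoder by its registered name instead of by codec id
AVCodec *vcodec = avcodec_find_decoder_by_name("h264");
```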
Opening the audio decoder is the same code with the audio parameters swapped in; a sketch follows, and after it the real decoding.
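Here is a hedged sketch of that audio decoder setup, assuming the same error handling as the video path (elided here); actx is the audio decoder context that the resampling section uses later:

```cpp
// Find the audio decoder
AVCodec *acodec = avcodec_find_decoder(ic->streams[audioStream]->codecpar->codec_id);
// Create and configure the audio decoder context
AVCodecContext *actx = avcodec_alloc_context3(acodec);
avcodec_parameters_to_context(actx, ic->streams[audioStream]->codecpar);
actx->thread_count = 8;
// Open the audio decoder context (error handling as in the video path)
re = avcodec_open2(actx, 0, 0);
```

With both decoder contexts open, the real decoding loop looks like this: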
```cpp
// Allocate and initialize the packet and frame
AVPacket *pkt = av_packet_alloc();
AVFrame *frame = av_frame_alloc();
for (;;) {
    int re = av_read_frame(ic, pkt);
    if (re != 0) break;
    // Pick the decoder context that matches the packet's stream
    // (vctx for video, actx for audio, from the sketch above)
    AVCodecContext *avcc = (pkt->stream_index == videoStream) ? vctx : actx;
    // Send the compressed packet to the decoder
    re = avcodec_send_packet(avcc, pkt);
    // The packet can be unreferenced as soon as it has been sent
    av_packet_unref(pkt);
    // Receive every frame the decoder has ready
    for (;;) {
        re = avcodec_receive_frame(avcc, frame);
        if (re != 0) break;
        // ... use the frame, then release the reference
        av_frame_unref(frame);
    }
}
```
Now we can print some parameters, such as the audio sampling information and the video width and height:
```
Total duration: totalMs = 10534 ms
bitrate = 907500
fps = 30.0003
codec_id = 86018
format = AV_PIX_FMT_YUV420P
1080 x 1920
pict_type = AV_PICTURE_TYPE_I
Audio info: sample_rate = 48000, channels = 2
```
Video pixel format conversion
Video pixel format conversion is essentially a YUV-to-RGB process. FFmpeg provides an API for it, but it uses the CPU to do the math, so efficiency is relatively low. Our player will convert on the GPU with OpenGL, which is more efficient. Even so, the FFmpeg API is worth learning. The process is as follows:
Only two APIs are needed to convert or scale YUV: sws_getCachedContext() and sws_scale(). Example code:
```cpp
const int in_width = frame->width;
const int in_height = frame->height;
const int out_width = in_width / 2;
const int out_height = in_height / 2;

/**
 * @param context   pass NULL to create a new context; if the passed context
 *                  already matches the parameters it is reused, otherwise it
 *                  is freed and a new one is created and returned
 * @param srcW      input width
 * @param srcH      input height
 * @param srcFormat input pixel format
 * @param dstW      output width
 * @param dstH      output height
 * @param dstFormat output pixel format
 * @param flags     a choice of algorithms (fast bilinear, bilinear, bicubic, ...);
 *                  they differ in performance, fast bilinear being relatively
 *                  fast; only matters when the size changes
 * @param srcFilter input filter
 * @param dstFilter output filter
 * @param param     extra parameters for the chosen algorithm, usually 0
 * @return          the scaling context
 */
// vsctx is a SwsContext* initialized to NULL elsewhere
vsctx = sws_getCachedContext(
        vsctx,
        in_width, in_height, (AVPixelFormat)frame->format,  // input width, height, format
        out_width, out_height, AV_PIX_FMT_RGBA,             // output width, height, format
        SWS_BILINEAR,                                       // size conversion algorithm
        0, 0, 0);

// Output buffer: one packed RGBA plane
uint8_t *rgba = new uint8_t[out_width * out_height * 4];
uint8_t *data[1] = { rgba };
int lines[1] = { out_width * 4 };  // bytes per output line

/**
 * @param srcSlice  input data planes (e.g. the Y, U and V planes)
 * @param srcStride bytes per line of each input plane
 * @param srcSliceY first row of the slice to process
 * @param srcSliceH height of the slice
 * @param dst       output data planes
 * @param dstStride bytes per line of each output plane
 * @return          height of the output slice
 */
re = sws_scale(vsctx,
        frame->data,      // input planes
        frame->linesize,  // input strides
        0, in_height,
        data,             // output planes (uint8_t *const [])
        lines);           // output strides
```
The comments above are quite detailed, so they should be clear. Finally, let's look at the debug log:
```
Pixel format/size conversion context created or reused successfully!
in_width = 1080, in_height = 1920
out_width = 540, out_height = 960
sws_scale success! return height of the output slice = 960
==============================end==============================
```
Resampling
Resampling converts whatever the audio input parameters happen to be into one fixed set of output parameters. This has the advantage of normalizing the player's sound output. So how do we use the FFmpeg API for resampling? Let's start with a flow chart:
Let's continue with the code from before. Our uniform output parameters are sample_rate = 48000, channels = 2, and sample_fmt = AV_SAMPLE_FMT_S16.
```cpp
...
// Create the resampling context
SwrContext *asctx = swr_alloc();
// Set the resampling parameters
asctx = swr_alloc_set_opts(
        asctx,                                          // resampling context
        av_get_default_channel_layout(2),               // output channel layout
        AV_SAMPLE_FMT_S16,                              // output sample format
        48000,                                          // output sample rate
        av_get_default_channel_layout(actx->channels),  // input channel layout
        actx->sample_fmt,                               // input sample format
        actx->sample_rate,                              // input sample rate
        0, 0);
// Initialize the resampling context
re = swr_init(asctx);
if (re != 0) {
    char buf[1024] = {0};
    av_strerror(re, buf, sizeof(buf) - 1);
    cout << "audio swr_init failed!" << buf << endl;
    return -1;
}
...
// Output PCM buffer
unsigned char *pcm = NULL;
for (;;) {
    int re = av_read_frame(ic, pkt);
    if (re != 0) {
        cout << "==============================end==============================" << endl;
        // To loop playback, seek back to a position and continue instead:
        // int ms = 3000;
        // long long pos = (double)ms / (double)1000 / r2d(ic->streams[videoStream]->time_base);
        // av_seek_frame(ic, videoStream, pos, AVSEEK_FLAG_BACKWARD | AVSEEK_FLAG_FRAME);
        // continue;
        break;
    }
    cout << "pkt->size = " << pkt->size << endl;
    // Presentation timestamp
    cout << "pkt->pts = " << pkt->pts << endl;
    cout << "pkt->pts ms = " << pkt->pts * (r2d(ic->streams[pkt->stream_index]->time_base) * 1000) << endl;
    // Decoding timestamp
    cout << "pkt->dts = " << pkt->dts << endl;
    // Pick the decoder context that matches the packet's stream
    AVCodecContext *avcc = NULL;
    if (pkt->stream_index == videoStream) {
        cout << "Image" << endl;
        avcc = vctx;
    }
    if (pkt->stream_index == audioStream) {
        cout << "Audio" << endl;
        avcc = actx;
    }
    // Send the packet to the decoder; it can be unreferenced right after sending
    re = avcodec_send_packet(avcc, pkt);
    av_packet_unref(pkt);
    if (re != 0) {
        char buf[1024] = {0};
        av_strerror(re, buf, sizeof(buf) - 1);
        cout << "avcodec_send_packet failed!" << buf << endl;
        continue;
    }
    // Receive decoded frames
    for (;;) {
        re = avcodec_receive_frame(avcc, frame);
        if (re != 0) break;
        ...
        // Audio frame: resample it to the uniform output format
        if (avcc == actx) {
            uint8_t *data[2] = {0};
            // S16 stereo: nb_samples * 16 bits / 8 * 2 channels
            if (!pcm) pcm = new uint8_t[frame->nb_samples * 16 / 8 * 2];
            data[0] = pcm;
            int len = swr_convert(asctx,
                    data, frame->nb_samples,                            // output
                    (const uint8_t **)frame->data, frame->nb_samples);  // input
            if (len >= 0) {
                cout << "swr_convert success return len = " << len << endl;
            } else {
                cout << "swr_convert failed return len = " << len << endl;
            }
        }
    }
    ...
}
if (asctx) swr_close(asctx);
if (asctx) swr_free(&asctx);
```
Log after conversion:
swr_convert success return len = 1024
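A side note, as a hedged sketch: instead of hand-computing the PCM buffer size as nb_samples * 16/8 * 2, FFmpeg's av_samples_get_buffer_size() can compute it for us:

```cpp
// Buffer size for 2 channels of S16 samples, with no alignment padding
int out_bytes = av_samples_get_buffer_size(NULL, 2, frame->nb_samples,
                                           AV_SAMPLE_FMT_S16, 1);
```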
The seek operation
FFmpeg provides the av_seek_frame() function for seeking within the media. It takes 4 parameters, with the following meanings:
```cpp
/**
 * Seek to a key frame, based on a timestamp and a stream index.
 *
 * @param s            media format context
 * @param stream_index stream index; -1 selects a default stream
 * @param timestamp    target timestamp, in units of the stream's time base
 * @param flags        seek flags, see below
 * @return >= 0 on success
 */
int av_seek_frame(AVFormatContext *s, int stream_index,
                  int64_t timestamp, int flags);
```
Let's focus on the last parameter, flags:
- AVSEEK_FLAG_BACKWARD: seek to the nearest key frame at or before the requested position
- AVSEEK_FLAG_BYTE: the seek position is given in bytes
- AVSEEK_FLAG_ANY: seek to any frame, not necessarily a key frame, so garbled frames may appear when decoding starts there
- AVSEEK_FLAG_FRAME: seek based on frame number
We usually seek in this form:
```cpp
int ms = 3000; // the 3-second position, converted using the time base (a fraction)
// Divide by the time base: the inverse of the pts-to-ms conversion shown earlier
long long pos = (double)ms / (double)1000 / r2d(ic->streams[videoStream]->time_base);
av_seek_frame(ic, videoStream, pos, AVSEEK_FLAG_BACKWARD | AVSEEK_FLAG_FRAME);
```
This means playback resumes from the key frame at or before the 3000 ms position. We will cover accurate seeking when we implement the player's seek function later.
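One practical note, as a hedged sketch: after a successful av_seek_frame(), the decoders still hold frames from the old position, so flush them before decoding resumes:

```cpp
// Drop any buffered frames from before the seek
if (vctx) avcodec_flush_buffers(vctx);
if (actx) avcodec_flush_buffers(actx);
```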
That's all about FFmpeg for this article; I'll go into more detail later in the development process as new APIs come up.
Conclusion
That's about all you need to know about FFmpeg for now, and you can see that these APIs are fairly simple. At this point, you should have an impression and an understanding of them.
The next article will show you how QT renders PCM and YUV data.
The above code can be accessed from this address