Preface
Our first three articles covered configuring the development environment on each of the major platforms, so it's time to get down to coding. Since this series is about writing a cross-platform player from 0 to 1, I'll go from shallow to deep, from basic to advanced.
Let’s start with a flow chart:
This series breaks the figure above down into specific code modules. This article focuses on how to use the FFmpeg API to decapsulate (demux) input data, read the raw audio and video stream information, and then perform some basic operations on the audio and video. In short, we need a working understanding of the FFmpeg APIs used in the player module.
PS: If you don't know how to configure the QT & FFmpeg environment on Mac OS, Windows, or Linux, refer to the following articles:
Cross-platform Player development (1) QT for Mac OS & FFmpeg environment build
Cross-platform Player development (2) QT for Linux & FFmpeg environment build
Cross-platform Player development (3) QT for Windows & FFmpeg environment build
FFmpeg basics
Decapsulation
We use the FFmpeg API to decapsulate (demux) the input video. Let's first look at the flow of the API calls:
Do you have a general idea of the decapsulation APIs now? Reading a compressed data stream from an input URL takes only a few steps, and they are quite simple. Let's demonstrate with actual code:
1. Register all components
```cpp
av_register_all();
```
Note that this function is deprecated in newer FFmpeg versions (4.0 and later) and is only required on older versions.
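A minimal sketch of how to keep one codebase building across versions, assuming the standard FFmpeg version macros:

```cpp
extern "C" {
#include <libavformat/avformat.h>
}

void register_all_compat() {
    // av_register_all() was deprecated in FFmpeg 4.0 (libavformat 58.9.100);
    // on newer versions registration happens automatically, so guard the call.
#if LIBAVFORMAT_VERSION_INT < AV_VERSION_INT(58, 9, 100)
    av_register_all();
#endif
}
```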
2. Register the network module
```cpp
// Initialize the network library (needed to open RTMP, RTSP, HTTP and other streaming protocols)
avformat_network_init();
```
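For symmetry, a hedged note: there is a matching teardown call that can be made at program exit (it is optional on recent FFmpeg versions):

```cpp
// Matching teardown for avformat_network_init(), typically at program exit
avformat_network_deinit();
```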
3. Open the input stream and read the header
```cpp
// Set parameters
AVDictionary *opts = NULL;
// Make the RTSP stream use TCP
av_dict_set(&opts, "rtsp_transport", "tcp", 0);
// Maximum network delay
av_dict_set(&opts, "max_delay", "1000", 0);
// Demuxing context
AVFormatContext *ic = NULL;
int re = avformat_open_input(
        &ic,
        inpath,
        0,      // 0 means the demuxer is selected automatically
        &opts   // parameter settings, such as the RTSP delay time
);
// A return value of 0 means success
if (re != 0) {
    char buf[1024] = {0};
    av_strerror(re, buf, sizeof(buf) - 1);
    cout << "open " << inpath << " failed! :" << buf << endl;
    getchar();
    return -1;
}
```
Note: if this call succeeds, you must call avformat_close_input() when you are done.
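A minimal cleanup sketch for the open call above; avformat_close_input() also frees the context, and the options dictionary needs its own free:

```cpp
// Cleanup matching avformat_open_input(): closes the input, frees ic and sets it to NULL
avformat_close_input(&ic);
// The options dictionary is not freed by the open call; release it separately
av_dict_free(&opts);
```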
4. Read packets to probe the stream information
```cpp
// return >= 0 if OK, AVERROR_xxx on error
re = avformat_find_stream_info(ic, 0);
// Print the media details
av_dump_format(ic, 0, inpath, 0);
```
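Once the stream info has been probed, we can also derive the total duration in milliseconds (the totalMs value that shows up in the log later). A small sketch, assuming ic->duration is valid; it is expressed in AV_TIME_BASE (microsecond) units:

```cpp
// Total duration in ms; ic->duration is in AV_TIME_BASE (1/1000000 s) units
long long totalMs = ic->duration / (AV_TIME_BASE / 1000);
cout << "totalMs = " << totalMs << endl;
```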
5. Obtain audio and video stream information
- Get it by traversing the streams manually
```cpp
// Get audio/video stream information by traversal
for (int i = 0; i < ic->nb_streams; i++) {
    AVStream *as = ic->streams[i];
    cout << "codec_id = " << as->codecpar->codec_id << endl;
    cout << "format = " << as->codecpar->format << endl;
    // Audio: AVMEDIA_TYPE_AUDIO
    if (as->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
        audioStream = i;
        cout << i << " Audio info" << endl;
        cout << "sample_rate = " << as->codecpar->sample_rate << endl;
        // For audio, format is an AVSampleFormat
        cout << "channels = " << as->codecpar->channels << endl;
        // Number of samples per channel in one frame
        cout << "frame_size = " << as->codecpar->frame_size << endl;
        // e.g. 1024 samples * 2 channels * 2 bytes = 4096 bytes per frame;
        // audio frames per second = sample_rate / frame_size
    }
    // Video: AVMEDIA_TYPE_VIDEO
    else if (as->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
        videoStream = i;
        cout << i << " Video info" << endl;
        cout << "width = " << as->codecpar->width << endl;
        cout << "height = " << as->codecpar->height << endl;
        // Frame rate: convert the FPS fraction to a double
        cout << "video fps = " << r2d(as->avg_frame_rate) << endl;
    }
}
```
- Obtain it using the av_find_best_stream() API
```cpp
// Get the best video stream index
videoStream = av_find_best_stream(ic, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
AVStream *as = ic->streams[videoStream];
cout << videoStream << " Video info" << endl;
cout << "width = " << as->codecpar->width << endl;
cout << "height = " << as->codecpar->height << endl;
// Frame rate: convert the FPS fraction to a double
cout << "video fps = " << r2d(as->avg_frame_rate) << endl;
```
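The snippets above use an r2d() helper that the article never shows. A minimal sketch of what it presumably does, converting an AVRational fraction such as avg_frame_rate or time_base to a double (FFmpeg's own av_q2d() is equivalent):

```cpp
// Convert an AVRational (a num/den fraction) to a double, guarding against den == 0
static double r2d(AVRational r) {
    return r.den == 0 ? 0.0 : (double)r.num / (double)r.den;
}
```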
6. Read the compressed data packet
```cpp
AVPacket *pkt = av_packet_alloc();
for (;;) {
    int re = av_read_frame(ic, pkt);
    if (re != 0) {
        // End of stream (to loop playback we could seek back here instead)
        cout << "==============================end==============================" << endl;
        break;
    }
    cout << "pkt->size = " << pkt->size << endl;
    // Presentation timestamp
    cout << "pkt->pts = " << pkt->pts << endl;
    // Convert to milliseconds for easier synchronization
    cout << "pkt->pts ms = " << pkt->pts * (r2d(ic->streams[pkt->stream_index]->time_base) * 1000) << endl;
    // Decoding timestamp
    cout << "pkt->dts = " << pkt->dts << endl;
    if (pkt->stream_index == videoStream) {
        cout << "Image" << endl;
    }
    if (pkt->stream_index == audioStream) {
        cout << "Audio" << endl;
    }
    // Unreference the packet; the buffer is freed once the reference count drops to 0
    av_packet_unref(pkt);
}
```
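To make the pts-to-milliseconds conversion concrete, here is a worked example with hypothetical values:

```cpp
// Hypothetical example: a 90 kHz time base (common in MPEG streams)
AVRational tb = {1, 90000};           // time_base = 1/90000 s per tick
int64_t pts = 270000;                 // 270000 ticks
double ms = pts * av_q2d(tb) * 1000;  // 270000 * (1/90000) * 1000 = 3000 ms
```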
Log after debugging
Decoding
Calling the FFmpeg API to decode the compressed audio and video data is actually quite simple. We mainly use the following APIs, see the picture below:
We then extend the decapsulation code as follows:
```cpp
// Find the video decoder
AVCodec *vcodec = avcodec_find_decoder(ic->streams[videoStream]->codecpar->codec_id);
if (!vcodec) {
    cout << "can't find the codec id " << ic->streams[videoStream]->codecpar->codec_id << endl;
    getchar();
    return -1;
}
// Create the video decoder context
AVCodecContext *vctx = avcodec_alloc_context3(vcodec);
// Copy the stream parameters into the decoder context
avcodec_parameters_to_context(vctx, ic->streams[videoStream]->codecpar);
// Configure the decoder thread count
vctx->thread_count = 8;
// Open the decoder context
re = avcodec_open2(vctx, 0, 0);
if (re != 0) {
    char buf[1024] = {0};
    av_strerror(re, buf, sizeof(buf) - 1);
    cout << "video avcodec_open2 failed!" << buf << endl;
    getchar();
    return -1;
}
cout << "video avcodec_open2 success!" << endl;
```
The decoder can also be found by name with the following API:
```cpp
AVCodec *avcodec_find_decoder_by_name(const char *name);
```
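For example, a hedged usage sketch; the decoder name "h264" here is just an illustration:

```cpp
// Look up a decoder by its registered name instead of by codec id
AVCodec *vcodec = avcodec_find_decoder_by_name("h264");
```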
Opening the audio decoder is the same code with the audio parameters swapped in; a sketch follows, and after it the real decoding.
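Here is a hedged sketch of that audio decoder setup, assuming the same error handling as the video path (elided here); actx is the audio decoder context that the resampling section uses later:

```cpp
// Find the audio decoder
AVCodec *acodec = avcodec_find_decoder(ic->streams[audioStream]->codecpar->codec_id);
// Create and configure the audio decoder context
AVCodecContext *actx = avcodec_alloc_context3(acodec);
avcodec_parameters_to_context(actx, ic->streams[audioStream]->codecpar);
actx->thread_count = 8;
// Open the audio decoder context (error handling as in the video path)
re = avcodec_open2(actx, 0, 0);
```

With both decoder contexts open, the real decoding loop looks like this: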
```cpp
// Allocate and initialize the packet and frame
AVPacket *pkt = av_packet_alloc();
AVFrame *frame = av_frame_alloc();
for (;;) {
    int re = av_read_frame(ic, pkt);
    if (re != 0) break;
    // Pick the decoder context that matches the packet's stream
    // (vctx for video, actx for audio, from the sketch above)
    AVCodecContext *avcc = (pkt->stream_index == videoStream) ? vctx : actx;
    // Send the compressed packet to the decoder
    re = avcodec_send_packet(avcc, pkt);
    // The packet can be unreferenced as soon as it has been sent
    av_packet_unref(pkt);
    // Receive every frame the decoder has ready
    for (;;) {
        re = avcodec_receive_frame(avcc, frame);
        if (re != 0) break;
        // ... use the frame, then release the reference
        av_frame_unref(frame);
    }
}
```
Now we can print some parameters, such as the audio sampling information and the video width and height:
```
Total duration: totalMs = 10534 ms
bitrate = 907500
fps = 30.0003
codec_id = 86018
format = AV_PIX_FMT_YUV420P
1080 x 1920
pict_type = AV_PICTURE_TYPE_I
Audio info: sample_rate = 48000, channels = 2
```
Video pixel format conversion
Video pixel format conversion is essentially a YUV-to-RGB process. FFmpeg provides an API for it, but it uses the CPU to do the math, so efficiency is relatively low. Our player will convert on the GPU with OpenGL, which is more efficient. Even so, the FFmpeg API is worth learning. The process is as follows:
Only two APIs are needed to convert or scale YUV: sws_getCachedContext() and sws_scale(). Example code:
```cpp
const int in_width = frame->width;
const int in_height = frame->height;
const int out_width = in_width / 2;
const int out_height = in_height / 2;

/**
 * @param context   pass NULL to create a new context; if the passed context
 *                  already matches the parameters it is reused, otherwise it
 *                  is freed and a new one is created and returned
 * @param srcW      input width
 * @param srcH      input height
 * @param srcFormat input pixel format
 * @param dstW      output width
 * @param dstH      output height
 * @param dstFormat output pixel format
 * @param flags     a choice of algorithms (fast bilinear, bilinear, bicubic, ...);
 *                  they differ in performance, fast bilinear being relatively
 *                  fast; only matters when the size changes
 * @param srcFilter input filter
 * @param dstFilter output filter
 * @param param     extra parameters for the chosen algorithm, usually 0
 * @return          the scaling context
 */
// vsctx is a SwsContext* initialized to NULL elsewhere
vsctx = sws_getCachedContext(
        vsctx,
        in_width, in_height, (AVPixelFormat)frame->format,  // input width, height, format
        out_width, out_height, AV_PIX_FMT_RGBA,             // output width, height, format
        SWS_BILINEAR,                                       // size conversion algorithm
        0, 0, 0);

// Output buffer: one packed RGBA plane
uint8_t *rgba = new uint8_t[out_width * out_height * 4];
uint8_t *data[1] = { rgba };
int lines[1] = { out_width * 4 };  // bytes per output line

/**
 * @param srcSlice  input data planes (e.g. the Y, U and V planes)
 * @param srcStride bytes per line of each input plane
 * @param srcSliceY first row of the slice to process
 * @param srcSliceH height of the slice
 * @param dst       output data planes
 * @param dstStride bytes per line of each output plane
 * @return          height of the output slice
 */
re = sws_scale(vsctx,
        frame->data,      // input planes
        frame->linesize,  // input strides
        0, in_height,
        data,             // output planes (uint8_t *const [])
        lines);           // output strides
```
The comments above are quite detailed, so they should be clear. Finally, let's look at the debug log:
```
Pixel format/size conversion context created or reused successfully!
in_width = 1080, in_height = 1920
out_width = 540, out_height = 960
sws_scale success! return height of the output slice = 960
==============================end==============================
```
Resampling
Resampling converts whatever the audio input parameters happen to be into one fixed set of output parameters. This has the advantage of normalizing the player's sound output. So how do we use the FFmpeg API for resampling? Let's start with a flow chart:
Let's continue with the code from before. Our uniform output parameters are sample_rate = 48000, channels = 2, and sample_fmt = AV_SAMPLE_FMT_S16.
```cpp
...
// Create the resampling context
SwrContext *asctx = swr_alloc();
// Set the resampling parameters
asctx = swr_alloc_set_opts(
        asctx,                                          // resampling context
        av_get_default_channel_layout(2),               // output channel layout
        AV_SAMPLE_FMT_S16,                              // output sample format
        48000,                                          // output sample rate
        av_get_default_channel_layout(actx->channels),  // input channel layout
        actx->sample_fmt,                               // input sample format
        actx->sample_rate,                              // input sample rate
        0, 0);
// Initialize the resampling context
re = swr_init(asctx);
if (re != 0) {
    char buf[1024] = {0};
    av_strerror(re, buf, sizeof(buf) - 1);
    cout << "audio swr_init failed!" << buf << endl;
    return -1;
}
...
// Output PCM buffer
unsigned char *pcm = NULL;
for (;;) {
    int re = av_read_frame(ic, pkt);
    if (re != 0) {
        cout << "==============================end==============================" << endl;
        // To loop playback, seek back to a position and continue instead:
        // int ms = 3000;
        // long long pos = (double)ms / (double)1000 / r2d(ic->streams[videoStream]->time_base);
        // av_seek_frame(ic, videoStream, pos, AVSEEK_FLAG_BACKWARD | AVSEEK_FLAG_FRAME);
        // continue;
        break;
    }
    cout << "pkt->size = " << pkt->size << endl;
    // Presentation timestamp
    cout << "pkt->pts = " << pkt->pts << endl;
    cout << "pkt->pts ms = " << pkt->pts * (r2d(ic->streams[pkt->stream_index]->time_base) * 1000) << endl;
    // Decoding timestamp
    cout << "pkt->dts = " << pkt->dts << endl;
    // Pick the decoder context that matches the packet's stream
    AVCodecContext *avcc = NULL;
    if (pkt->stream_index == videoStream) {
        cout << "Image" << endl;
        avcc = vctx;
    }
    if (pkt->stream_index == audioStream) {
        cout << "Audio" << endl;
        avcc = actx;
    }
    // Send the packet to the decoder; it can be unreferenced right after sending
    re = avcodec_send_packet(avcc, pkt);
    av_packet_unref(pkt);
    if (re != 0) {
        char buf[1024] = {0};
        av_strerror(re, buf, sizeof(buf) - 1);
        cout << "avcodec_send_packet failed!" << buf << endl;
        continue;
    }
    // Receive decoded frames
    for (;;) {
        re = avcodec_receive_frame(avcc, frame);
        if (re != 0) break;
        ...
        // Audio frame: resample it to the uniform output format
        if (avcc == actx) {
            uint8_t *data[2] = {0};
            // S16 stereo: nb_samples * 16 bits / 8 * 2 channels
            if (!pcm) pcm = new uint8_t[frame->nb_samples * 16 / 8 * 2];
            data[0] = pcm;
            int len = swr_convert(asctx,
                    data, frame->nb_samples,                            // output
                    (const uint8_t **)frame->data, frame->nb_samples);  // input
            if (len >= 0) {
                cout << "swr_convert success return len = " << len << endl;
            } else {
                cout << "swr_convert failed return len = " << len << endl;
            }
        }
    }
    ...
}
if (asctx) swr_close(asctx);
if (asctx) swr_free(&asctx);
```
Log after conversion:
swr_convert success return len = 1024
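A side note, as a hedged sketch: instead of hand-computing the PCM buffer size as nb_samples * 16/8 * 2, FFmpeg's av_samples_get_buffer_size() can compute it for us:

```cpp
// Buffer size for 2 channels of S16 samples, with no alignment padding
int out_bytes = av_samples_get_buffer_size(NULL, 2, frame->nb_samples,
                                           AV_SAMPLE_FMT_S16, 1);
```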
The seek operation
FFmpeg provides the av_seek_frame() function for seeking within the media. It takes 4 parameters, with the following meanings:
```cpp
/**
 * Seek to a key frame, based on a timestamp and a stream index.
 *
 * @param s            media format context
 * @param stream_index stream index; -1 selects a default stream
 * @param timestamp    target timestamp, in units of the stream's time base
 * @param flags        seek flags, see below
 * @return >= 0 on success
 */
int av_seek_frame(AVFormatContext *s, int stream_index,
                  int64_t timestamp, int flags);
```
Let's focus on the last parameter, flags:
- AVSEEK_FLAG_BACKWARD: seek to the nearest key frame at or before the requested position
- AVSEEK_FLAG_BYTE: the seek position is given in bytes
- AVSEEK_FLAG_ANY: seek to any frame, not necessarily a key frame, so garbled frames may appear when decoding starts there
- AVSEEK_FLAG_FRAME: seek based on frame number
We usually seek in this form:
```cpp
int ms = 3000; // the 3-second position, converted using the time base (a fraction)
// Divide by the time base: the inverse of the pts-to-ms conversion shown earlier
long long pos = (double)ms / (double)1000 / r2d(ic->streams[videoStream]->time_base);
av_seek_frame(ic, videoStream, pos, AVSEEK_FLAG_BACKWARD | AVSEEK_FLAG_FRAME);
```
This means playback resumes from the key frame at or before the 3000 ms position. We will cover accurate seeking when we implement the player's seek function later.
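One practical note, as a hedged sketch: after a successful av_seek_frame(), the decoders still hold frames from the old position, so flush them before decoding resumes:

```cpp
// Drop any buffered frames from before the seek
if (vctx) avcodec_flush_buffers(vctx);
if (actx) avcodec_flush_buffers(actx);
```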
That's all about FFmpeg for this article; I'll go into more detail later in the development process as new APIs come up.
Conclusion
That's about all you need to know about FFmpeg for now, and you can see that these APIs are fairly simple. At this point, you should have an impression and an understanding of them.
The next article will show you how QT renders PCM and YUV data.
The above code can be accessed from this address