1. FFmpeg introduction and clipping
1.1 Introduction to FFmpeg
FFmpeg (Fast Forward MPEG) is a free, open-source, cross-platform audio/video solution that provides a complete toolchain for recording, encoding/decoding, converting, and streaming audio and video. The source directory structure of ffmpeg 4.0.2 is as follows:
FFmpeg
|-- compat         compatibility files for older versions
|-- doc            documentation
|-- ffbuild        build scripts
|-- libavcodec     core audio/video codec library
|-- libavdevice    input and output devices, such as Video4Linux2, VfW, DShow, and ALSA
|-- libavfilter    filter effects processing
|-- libavformat    I/O operations and container formats (muxer/demuxer)
|-- libswresample  audio resampling, format conversion, and mixing:
|      (1) resampling: change the audio sample rate, e.g. from 44100 Hz down to 8000 Hz
|      (2) rematrixing: change the number of audio channels, e.g. from stereo to mono
|      (3) format conversion: change the audio sample size, e.g. from 16-bit to 8-bit samples
|-- libavutil      utility library, e.g. arithmetic and string operations
|-- libpostproc    post-processing effects, such as image deblocking
|-- libswscale     video pixel processing, including scaling and pixel/color space conversion
|-- presets        preset configuration files
|-- configure      build configuration script
|-- tests          test instances, compiled and run against ffmpeg
1.2 Command Line Tools
The FFmpeg framework also provides several command-line tools for processing audio and video data, including ffplay, ffserver, and ffprobe:
- ffplay
Fast Forward Play, a media player implemented with FFmpeg
- ffserver
Fast Forward Server, an RTSP streaming server implemented with FFmpeg
- ffprobe
Fast Forward Probe, used to analyze input streams
2. FFmpeg architecture analysis
In section 1.1, we gave a brief introduction to the overall architecture of FFmpeg and described the function of each module in the framework. Building on that, this section focuses on the important steps, data structures, and related functions involved in audio and video development with FFmpeg.
2.1 Main points of FFmpeg processing
Generally speaking, the FFmpeg framework is mainly used for demuxing (unpacking), decoding, and transcoding multimedia data. To give a more intuitive picture of FFmpeg in audio/video applications, the flow chart below illustrates parsing an RTSP network stream. The figure demonstrates the general process from opening the RTSP stream to extracting decoded data or transcoding. Explanation of terms:
muxer
: audio/video multiplexer (packager), which combines video, audio, and subtitle files (if any) into one container format, e.g. a.avi, a.mp3, and a.srt into an MKV video file;
demuxer
: audio/video demultiplexer (unpacker), the reverse of a muxer;
transcode
: transcoding, the conversion of audio/video data from one format into another;
RTP
: Real-time Transport Protocol, a UDP-based network transport protocol. It sits between the application layer and the transport layer and is responsible for encapsulating streaming media data and transmitting media streams in real time.
ES stream
: Elementary Stream, also known as the raw video/audio stream; the data stream output directly by an encoder. It can be a video stream (such as H.264 or MJPEG) or an audio stream (such as AAC).
PES stream
: Packetized Elementary Stream, a data structure used for conveying an ES, formed by grouping and packetizing the ES data and adding header information.
protocol resolution
: stripping protocol-specific packet information from a network data flow to obtain the real audio/video data. Common protocols include RTSP, RTMP, HTTP, and MMS.
demuxing
: unpacking the container format, which can be .mp4/.avi/.flv/.mkv, etc.
decoding
: restoring the encoded data to the original content, e.g. H.264 decoded to YUV, AAC decoded to PCM.
2.2 Important structures of FFmpeg
There are many important structures in FFmpeg, such as the I/O structures AVIOContext, URLContext, and URLProtocol; the format structures AVFormatContext, AVInputFormat, and AVOutputFormat; the codec structures AVCodec and AVCodecContext; and the data structures AVStream, AVPacket, and AVFrame. When first coming into contact with FFmpeg, it can be a little difficult to tell these structures apart. Fortunately, there is a "big brother" among them, AVFormatContext, which runs through the whole of FFmpeg development, so we will start with it.
AVFormatContext
The AVFormatContext structure describes the composition and basic information of a multimedia file or stream. It is the most fundamental structure in FFmpeg and the root of all the others. Its member variables iformat and oformat point to the corresponding demuxing and muxing objects, of types AVInputFormat and AVOutputFormat respectively. pb is a pointer that controls the underlying data reads and writes, of type AVIOContext. nb_streams is the number of data streams in the multimedia file or stream, and streams is a double pointer to all of the streams, of type AVStream. video_codec and audio_codec are the video and audio codecs, of type AVCodec, and so on. The AVFormatContext structure is defined in libavformat/avformat.h:
typedef struct AVFormatContext {
const AVClass *av_class;
// Input container format
// Set when avformat_open_input() is called; demuxing only
struct AVInputFormat *iformat;
// Output container format
// Set when avformat_alloc_output_context2() is called; muxing only
struct AVOutputFormat *oformat;
/**
 * Format private data. This is an AVOptions-enabled struct
 * if and only if iformat/oformat.priv_class is not NULL.
 *
 * - muxing: set by avformat_write_header()
 * - demuxing: set by avformat_open_input()
 */
void *priv_data;
// Input/output (I/O) cache
// Demuxing: set by avformat_open_input()
// Muxing: set by avio_open2(), which must happen before avformat_write_header()
AVIOContext *pb;
// Stream info
int ctx_flags;
// The number of data streams in AVFormatContext.streams
// Set by avformat_new_stream()
unsigned int nb_streams;
// List of all streams in the file. New streams are created with avformat_new_stream()
// The streams are freed when avformat_free_context() is called
// Demuxing: streams are populated when avformat_open_input() is called
// Muxing: streams are created by the user before calling avformat_write_header()
AVStream **streams;
// Input or output file name, e.g. input: rtsp://184.72.239.149/vod/mp4:BigBuckBunny_115k.mov
// Demuxing: set when avformat_open_input() is called
// Muxing: set after avformat_alloc_output_context2(), before avformat_write_header()
char filename[1024];
// Position of the first frame of the component, set by libavformat during demuxing only
int64_t start_time;
// Duration of the stream, set by libavformat during demuxing only
int64_t duration;
// Total bit rate (bit/s), audio and video combined
int64_t bit_rate;
...
// Video codec ID
// Note: during demuxing, set by the user
enum AVCodecID video_codec_id;
// Audio codec ID
// Note: during demuxing, set by the user
enum AVCodecID audio_codec_id;
// Subtitle codec ID
// Note: during demuxing, set by the user
enum AVCodecID subtitle_codec_id;
...
// File metadata
// Demuxing: set when avformat_open_input() is called
// Muxing: set before calling avformat_write_header()
// Note: the metadata is freed by libavformat when avformat_free_context() is called
AVDictionary *metadata;
// The real (wall-clock) time when the stream starts
int64_t start_time_realtime;
...
// Video codec, specified by the user during demuxing
AVCodec *video_codec;
// Audio codec, specified by the user during demuxing
AVCodec *audio_codec;
// Subtitle codec, specified by the user during demuxing
AVCodec *subtitle_codec;
// Data codec, specified by the user during demuxing
AVCodec *data_codec;
...
// Data codec ID
enum AVCodecID data_codec_id;
...
} AVFormatContext;
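To make the role of AVFormatContext concrete, here is a minimal demuxing sketch, assuming the FFmpeg 4.x API (with FFmpeg 4.0, av_register_all() must have been called once at startup; the function name demux_example and the shortened error handling are for illustration only):

#include <libavformat/avformat.h>

int demux_example(const char *url) {
    AVFormatContext *fmt_ctx = NULL;
    AVPacket pkt;
    int ret;

    // Demuxing: fills fmt_ctx->iformat, pb, streams, filename, ...
    if ((ret = avformat_open_input(&fmt_ctx, url, NULL, NULL)) < 0)
        return ret;
    // Probe the streams so that duration, bit_rate, codec parameters, ... get filled in
    if ((ret = avformat_find_stream_info(fmt_ctx, NULL)) < 0)
        goto end;
    av_dump_format(fmt_ctx, 0, url, 0); // Print container and stream info

    while (av_read_frame(fmt_ctx, &pkt) >= 0) {
        // pkt.stream_index tells us which fmt_ctx->streams[] entry the packet belongs to
        av_packet_unref(&pkt);
    }
end:
    avformat_close_input(&fmt_ctx); // Frees the streams, pb, ...
    return ret;
}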
1. Multiplexing (muxing) / Demultiplexing (demuxing)
(1) AVInputFormat structure
AVInputFormat is the demultiplexing/unpacking (demuxing) object. It contains information about the demuxer as well as its operation functions: for example, the name member variable is the name of the container format, such as "aac" or "mov"; the read_header member function reads the container's header data; the read_packet member function reads one AVPacket; and so on. The AVInputFormat structure is defined in libavformat/avformat.h:
typedef struct AVInputFormat {
// Container format name, such as "mp4", "mov", etc.
const char *name;
// Descriptive (long) name of the container format
const char *long_name;
int flags;
const char *extensions;
const struct AVCodecTag * const *codec_tag;
const AVClass *priv_class;
const char *mime_type;
// Next registered demuxer in the linked list
struct AVInputFormat *next;
int raw_codec_id;
// Size of the private context of this format, such as MovContext
int priv_data_size;
int (*read_probe)(AVProbeData *);
// Read the format header and initialize the AVFormatContext structure
// Returns 0 on success
int (*read_header)(struct AVFormatContext *);
// Read one packet of data and store it in the memory pointed to by pkt
// Returns 0 on success; on failure a negative number is returned and pkt is not allocated
int (*read_packet)(struct AVFormatContext *, AVPacket *pkt);
// Close the stream; AVFormatContext and AVStreams are not freed here
int (*read_close)(struct AVFormatContext *);
/**
 * Seek to a given timestamp
 * @param stream_index the stream index, must not be -1
 * @param flags selects which direction to seek if there is no exact match
 * @return >= 0 on success
 */
int (*read_seek)(struct AVFormatContext *,
int stream_index, int64_t timestamp, int flags);
/**
 * Get the next timestamp of stream[stream_index]
 * @return the timestamp, or AV_NOPTS_VALUE if an error occurs
 */
int64_t (*read_timestamp)(struct AVFormatContext *s, int stream_index,
int64_t *pos, int64_t pos_limit);
// Start/resume playing - RTSP only
int (*read_play)(struct AVFormatContext *);
// Pause playing - RTSP only
int (*read_pause)(struct AVFormatContext *);
// Returns the device list with its properties, see avdevice_list_devices()
int (*get_device_list)(struct AVFormatContext *s, struct AVDeviceInfoList *device_list);
// Initialize the device capabilities submodule, see avdevice_capabilities_create()
int (*create_device_capabilities)(struct AVFormatContext *s, struct AVDeviceCapabilitiesQuery *caps);
// Free the device capabilities submodule, see avdevice_capabilities_free()
int (*free_device_capabilities)(struct AVFormatContext *s, struct AVDeviceCapabilitiesQuery *caps);
} AVInputFormat;
By calling av_register_all(), all of FFmpeg's demuxers are stored in a linked list, with first_iformat as the head pointer and last_iformat as the tail pointer. Taking the AAC demuxer as an example, its AVInputFormat object is initialized in libavformat/aacdec.c as follows:
AVInputFormat ff_aac_demuxer = {
.name = "aac", // Name of the demuxer
.long_name = NULL_IF_CONFIG_SMALL("raw ADTS AAC (Advanced Audio Coding)"), // Long name of the AAC format
.read_probe = adts_aac_probe, // Probe function
.read_header = adts_aac_read_header, // Function that reads the header data
.read_packet = adts_aac_read_packet, // Function that reads one packet
.flags = AVFMT_GENERIC_INDEX,
.extensions = "aac", // File name suffix
.mime_type = "audio/aac,audio/aacp,audio/x-aac",
.raw_codec_id = AV_CODEC_ID_AAC, // AAC decoder ID
};
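As a quick sketch of how a registered demuxer can be looked up by its short name and forced on an input, assuming the FFmpeg 4.x API (open_as_aac is a hypothetical helper):

#include <libavformat/avformat.h>

AVFormatContext *open_as_aac(const char *url) {
    // Look up the demuxer registered above under the short name "aac"
    AVInputFormat *ifmt = av_find_input_format("aac");
    AVFormatContext *fmt_ctx = NULL;

    // Passing a non-NULL ifmt skips probing (read_probe) and forces ff_aac_demuxer;
    // its read_header/read_packet callbacks then drive the parsing
    if (!ifmt || avformat_open_input(&fmt_ctx, url, ifmt, NULL) < 0)
        return NULL;
    return fmt_ctx;
}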
(2) AVOutputFormat structure
In contrast to AVInputFormat, AVOutputFormat is the multiplexing/packaging (muxing) object. It contains information about the muxer and its operation functions: for example, the name member variable is the name of the container format, such as "mp4" or "3gp"; the write_header member function writes the container's header data; the write_packet member function writes one AVPacket; and so on. The AVOutputFormat structure is defined in libavformat/avformat.h:
typedef struct AVOutputFormat {
// Container format name, such as "mp4"
const char *name;
// Descriptive (long) name of the container format
const char *long_name;
// The MIME type
const char *mime_type;
const char *extensions; /**< comma-separated file extensions */
/* output support */
enum AVCodecID audio_codec; /**< default audio codec */
enum AVCodecID video_codec; /**< default video codec */
enum AVCodecID subtitle_codec; /**< default subtitle codec */
/**
 * flags can be set to: AVFMT_NOFILE, AVFMT_NEEDNUMBER,
 * AVFMT_GLOBALHEADER, AVFMT_NOTIMESTAMPS, AVFMT_VARIABLE_FPS,
 * AVFMT_NODIMENSIONS, AVFMT_NOSTREAMS, AVFMT_ALLOW_FLUSH,
 * AVFMT_TS_NONSTRICT, AVFMT_TS_NEGATIVE
 */
int flags;
const struct AVCodecTag * const *codec_tag;
const AVClass *priv_class; ///< AVClass for the private context
// Next registered muxer in the linked list
struct AVOutputFormat *next;
// Size of the private data
int priv_data_size;
// Write the header
int (*write_header)(struct AVFormatContext *);
// Write one packet. If flags contains AVFMT_ALLOW_FLUSH, pkt can be NULL in order to flush the data buffered in the muxer
// When flushing, returns 0 if there is still data to flush in the buffer, or 1 if the buffer is fully flushed
int (*write_packet)(struct AVFormatContext *, AVPacket *pkt);
int (*write_trailer)(struct AVFormatContext *);
// Currently only used to set the pixel format if it is not YUV420P
int (*interleave_packet)(struct AVFormatContext *, AVPacket *out,
AVPacket *in, int flush);
// Test whether the given codec can be stored in this container
int (*query_codec)(enum AVCodecID id, int std_compliance);
void (*get_output_timestamp)(struct AVFormatContext *s, int stream,
int64_t *dts, int64_t *wall);
int (*control_message)(struct AVFormatContext *s, int type,
void *data, size_t data_size);
// Write an unencoded AVFrame. See av_write_uncoded_frame().
int (*write_uncoded_frame)(struct AVFormatContext *, int stream_index,
AVFrame **frame, unsigned flags);
/**
 * Returns the device list with its properties.
 * @see avdevice_list_devices() for more details.
 */
int (*get_device_list)(struct AVFormatContext *s, struct AVDeviceInfoList *device_list);
/**
 * Initialize device capabilities submodule.
 * @see avdevice_capabilities_create() for more details.
 */
int (*create_device_capabilities)(struct AVFormatContext *s, struct AVDeviceCapabilitiesQuery *caps);
// Free the device capabilities submodule, see avdevice_capabilities_free().
int (*free_device_capabilities)(struct AVFormatContext *s, struct AVDeviceCapabilitiesQuery *caps);
enum AVCodecID data_codec; /**< default data codec */
/**
 * Initialize the format: may allocate data and set any AVFormatContext or
 * AVStream parameters that need to be set here. Used together with deinit()
 * to free the allocated memory resources. A negative return value means failure.
 */
int (*init)(struct AVFormatContext *);
/** Free the resources allocated by init(), called whether or not init() succeeded */
void (*deinit)(struct AVFormatContext *);
/**
 * Set up any necessary bitstream filtering.
 * Returns 0 if more packets from this stream still need to be checked; 1 if not
 */
int (*check_bitstream)(struct AVFormatContext *, const AVPacket *pkt);
} AVOutputFormat;
Similarly, by calling av_register_all(), all of FFmpeg's muxers are stored in a linked list, with first_oformat as the head pointer and last_oformat as the tail pointer. Taking the MP4 muxer as an example, its AVOutputFormat object is initialized in libavformat/movenc.c as follows:
AVOutputFormat ff_mp4_muxer = {
.name = "mp4", // Name of the muxer
.long_name = NULL_IF_CONFIG_SMALL("MP4 (MPEG-4 Part 14)"), // Long name of the MP4 format
.mime_type = "video/mp4", // The MIME type
.extensions = "mp4", // File name extension
.priv_data_size = sizeof(MOVMuxContext),
.audio_codec = AV_CODEC_ID_AAC, // Audio encoder ID
.video_codec = CONFIG_LIBX264_ENCODER ?
AV_CODEC_ID_H264 : AV_CODEC_ID_MPEG4, // Video encoder ID
.init = mov_init, // Initialization function
.write_header = mov_write_header, // Write the header
.write_packet = mov_write_packet, // Write one packet
.write_trailer = mov_write_trailer,
.deinit = mov_free, // Free resources
.flags = AVFMT_GLOBALHEADER | AVFMT_ALLOW_FLUSH | AVFMT_TS_NEGATIVE,
.codec_tag = (const AVCodecTag* const []){ codec_mp4_tags, 0 },
.check_bitstream = mov_check_bitstream,
.priv_class = &mp4_muxer_class,
};
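The muxer side is driven through the same AVFormatContext. The following skeleton sketches the call sequence that exercises init(), write_header(), write_packet(), and write_trailer(), assuming the FFmpeg 4.x API (stream creation is elided; mux_skeleton is a hypothetical name):

#include <libavformat/avformat.h>

int mux_skeleton(const char *filename) {
    AVFormatContext *ofmt_ctx = NULL;
    int ret;

    // Guesses the muxer from the file name (e.g. ff_mp4_muxer for "out.mp4")
    // and sets ofmt_ctx->oformat
    ret = avformat_alloc_output_context2(&ofmt_ctx, NULL, NULL, filename);
    if (ret < 0)
        return ret;

    // ... create streams with avformat_new_stream() and fill their codecpar here ...

    // Open the underlying I/O context (ofmt_ctx->pb) unless the muxer needs no file
    if (!(ofmt_ctx->oformat->flags & AVFMT_NOFILE)) {
        ret = avio_open(&ofmt_ctx->pb, filename, AVIO_FLAG_WRITE);
        if (ret < 0)
            goto end;
    }

    ret = avformat_write_header(ofmt_ctx, NULL); // Calls init() and write_header()
    // ... feed packets with av_interleaved_write_frame(ofmt_ctx, pkt) ...
    if (ret >= 0)
        av_write_trailer(ofmt_ctx); // Calls write_trailer() and deinit()
end:
    if (!(ofmt_ctx->oformat->flags & AVFMT_NOFILE))
        avio_closep(&ofmt_ctx->pb);
    avformat_free_context(ofmt_ctx);
    return ret;
}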
2. Input/Output (I/O)
(1) AVIOContext structure
AVIOContext is the FFmpeg structure that manages input/output (I/O) data. It sits at the top of the protocol (file) layer and provides buffered read/write operations. The meaning of the read/write operations and the buffer member variables is illustrated by the comment diagrams in the source code (one diagram for reading, one for writing). AVIOContext is defined in libavformat/avio.h:
typedef struct AVIOContext {
const AVClass *av_class;
unsigned char *buffer; // Data buffer
int buffer_size; // Cache size
unsigned char *buf_ptr; // Pointer to the current position in the buffer; can be less than buffer + buffer_size
unsigned char *buf_end; // Reads/writes the end of the data in the cache
// Private pointer, associated with the URLContext structure; passed as the first argument to the read/write/seek/... callbacks
// It is used to read and write the generalized input file, pointing to a URLContext object
void *opaque;
// Read the packet data
int (*read_packet)(void *opaque, uint8_t *buf, int buf_size);
// Write the packet data
int (*write_packet)(void *opaque, uint8_t *buf, int buf_size);
// Seek to a position in the stream
int64_t (*seek)(void *opaque, int64_t offset, int whence);
int64_t pos; // Position in the file of the current buffered region
int eof_reached; // True if the end of the file has been reached
int write_flag; // True if the context was opened for writing
int max_packet_size; // Maximum size of packet
unsigned long checksum;
unsigned char *checksum_ptr;
unsigned long (*update_checksum)(unsigned long checksum, const uint8_t *buf, unsigned int size);
// Error code, 0 means no error
int error;
// Pause or resume playback, for network streaming protocols
int (*read_pause)(void *opaque, int pause);
int64_t (*read_seek)(void *opaque, int stream_index,
int64_t timestamp, int flags);
// 0 means the stream (e.g. network traffic) cannot seek
int seekable;
// In write mode, the furthest position reached in the buffer; used to keep track of written data for later flushing
unsigned char *buf_ptr_max;
// The minimum packet size
int min_packet_size;
// Most of the following fields are only used internally by libavformat, or are only used sparingly, and are not explained here
int64_t maxsize;
int direct;
int64_t bytes_read;
int seek_count;
int writeout_count;
int orig_buffer_size;
int short_seek_threshold;
const char *protocol_whitelist;
const char *protocol_blacklist;
int (*write_data_type)(void *opaque, uint8_t *buf, int buf_size,
enum AVIODataMarkerType type, int64_t time);
int ignore_boundary_point;
enum AVIODataMarkerType current_type;
int64_t last_time;
int (*short_seek_get)(void *opaque);
int64_t written;
} AVIOContext;
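A practical use of AVIOContext is feeding FFmpeg from memory instead of a file: avio_alloc_context() wires the read_packet callback described above to a custom data source. A minimal read-only sketch, assuming the FFmpeg 4.x API (MemSource and mem_read are hypothetical helpers):

#include <string.h>
#include <libavformat/avformat.h>
#include <libavutil/mem.h>

typedef struct MemSource { const uint8_t *data; size_t size, pos; } MemSource;

// Matches the AVIOContext.read_packet signature: copy up to buf_size bytes into buf
static int mem_read(void *opaque, uint8_t *buf, int buf_size) {
    MemSource *src = opaque;
    size_t left = src->size - src->pos;
    if (left == 0)
        return AVERROR_EOF;
    if ((size_t)buf_size > left)
        buf_size = (int)left;
    memcpy(buf, src->data + src->pos, buf_size);
    src->pos += buf_size;
    return buf_size;
}

AVIOContext *make_mem_io(MemSource *src) {
    unsigned char *buffer = av_malloc(4096); // Internal cache, owned by the AVIOContext
    if (!buffer)
        return NULL;
    // write_flag = 0 (read-only); the seek callback is omitted, so seekable stays 0
    return avio_alloc_context(buffer, 4096, 0, src, mem_read, NULL, NULL);
}

The returned context is then assigned to AVFormatContext.pb (together with the AVFMT_FLAG_CUSTOM_IO flag) before avformat_open_input() is called.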
The AVIOContext member variable opaque points to a URLContext object. URLContext is the context for operating on a specific resource (file), and it includes a pointer variable prot of type URLProtocol. URLProtocol groups resources by type and bundles the set of functions for operating on each type of resource. The URLContext structure reads as follows:
typedef struct URLContext {
const AVClass *av_class;
// Points to the URLProtocol that operates on this kind of generalized input file
const struct URLProtocol *prot;
// Associate a handle to a specific generalized input file, such as fd for a file handle and socket for a network handle
void *priv_data;
char *filename; // The specified URL
int flags;
int max_packet_size;
int is_streamed; // True if the resource is a stream (cannot seek); the default is false
int is_connected;
AVIOInterruptCB interrupt_callback;
int64_t rw_timeout; // Read/write operation timeout period
const char *protocol_whitelist;
const char *protocol_blacklist;
int min_packet_size;
} URLContext;
Copy the code
(2) URLProtocol structure
The URLProtocol structure represents the generalized input file. It is FFmpeg's structure for operating on I/O, covering local files, network data streams (TCP, RTP, ...), and so on; each protocol corresponds to one URLProtocol structure. The structure is located in libavformat/url.h and includes open, close, read, write, seek, and other operations. Part of the source code is as follows:
typedef struct URLProtocol {
// Protocol name
const char *name;
int (*url_open)( URLContext *h, const char *url, int flags);
int (*url_open2)(URLContext *h, const char *url, int flags, AVDictionary **options);
int (*url_accept)(URLContext *s, URLContext **c);
int (*url_handshake)(URLContext *c);
/** * Read data from the protocol. */
int (*url_read)( URLContext *h, unsigned char *buf, int size);
int (*url_write)(URLContext *h, const unsigned char *buf, int size);
int64_t (*url_seek)( URLContext *h, int64_t pos, int whence);
int (*url_close)(URLContext *h);
int (*url_read_pause)(URLContext *h, int pause);
int64_t (*url_read_seek)(URLContext *h, int stream_index,
int64_t timestamp, int flags);
int (*url_get_file_handle)(URLContext *h);
int (*url_get_multi_file_handle)(URLContext *h, int **handles,
int *numhandles);
int (*url_get_short_seek)(URLContext *h);
int (*url_shutdown)(URLContext *h, int flags);
int priv_data_size;
const AVClass *priv_data_class;
int flags;
int (*url_check)(URLContext *h, int mask);
int (*url_open_dir)(URLContext *h);
int (*url_read_dir)(URLContext *h, AVIODirEntry **next);
int (*url_close_dir)(URLContext *h);
int (*url_delete)(URLContext *h);
int (*url_move)(URLContext *h_src, URLContext *h_dst);
const char *default_whitelist;
} URLProtocol;
Next, the HTTP protocol is taken as an example to show how a URLProtocol structure is initialized; it also illustrates that each protocol (including plain files) corresponds to one URLProtocol object. ff_http_protocol is defined in libavformat/http.c:
const URLProtocol ff_http_protocol = {
.name = "http", // Protocol name
.url_open2 = http_open, // Open operation
.url_accept = http_accept, // Accept operation
.url_handshake = http_handshake, // Handle the handshake
.url_read = http_read, // Read data operation
.url_write = http_write, // Write data operation
.url_seek = http_seek, // Seek operation
.url_close = http_close, // Close operation
.url_get_file_handle = http_get_file_handle,
.url_get_short_seek = http_get_short_seek,
.url_shutdown = http_shutdown,
.priv_data_size = sizeof(HTTPContext),
.priv_data_class = &http_context_class,
.flags = URL_PROTOCOL_FLAG_NETWORK,
.default_whitelist = "http,https,tls,rtp,tcp,udp,crypto,httpproxy"
};
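Applications normally do not use URLProtocol directly; avio_open2() picks the matching protocol (here ff_http_protocol) from the URL scheme and drives it through a URLContext. A minimal sketch, assuming the FFmpeg 4.x API (http_read_demo is a hypothetical name):

#include <libavformat/avio.h>

int http_read_demo(const char *url) { // e.g. "http://example.com/a.aac"
    AVIOContext *io = NULL;
    unsigned char buf[4096];
    int n, ret;

    // Internally creates a URLContext whose prot points at ff_http_protocol
    ret = avio_open2(&io, url, AVIO_FLAG_READ, NULL, NULL);
    if (ret < 0)
        return ret;
    while ((n = avio_read(io, buf, sizeof(buf))) > 0) {
        // ... consume n bytes ...
    }
    avio_closep(&io);
    return 0;
}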
3. Encoding/Decoding
(1) AVCodec structure
AVCodec is the data structure most closely tied to the codec. It contains codec-related attribute parameters and codec operation functions, such as name, the name of the codec, and pix_fmts, the video frame pixel formats supported by the codec. Each codec corresponds to one AVCodec structure. The AVCodec structure (in libavcodec/avcodec.h) is as follows:
typedef struct AVCodec {
// Codec name
const char *name;
// Descriptive (long) name of the codec
const char *long_name;
// Media type
enum AVMediaType type;
// ID of the codec
enum AVCodecID id;
int capabilities;
// Frame rates supported by the codec
const AVRational *supported_framerates;
// Pixel formats supported by the codec for video frames/images
const enum AVPixelFormat *pix_fmts;
// Audio sample rates supported by the codec
const int *supported_samplerates;
// Audio sample formats supported by the codec
const enum AVSampleFormat *sample_fmts;
// Channel layouts supported by the codec
const uint64_t *channel_layouts;
// Maximum lowres value supported by the decoder
uint8_t max_lowres;
const AVClass *priv_class;
const AVProfile *profiles;
const char *wrapper_name;
int priv_data_size;
struct AVCodec *next;
int (*init_thread_copy)(AVCodecContext *);
int (*update_thread_context)(AVCodecContext *dst, const AVCodecContext *src);
const AVCodecDefault *defaults;
// Called when avcodec_register() is invoked,
// used to initialize the codec's static data
void (*init_static_data)(struct AVCodec *codec);
// Initialization
int (*init)(AVCodecContext *);
int (*encode_sub)(AVCodecContext *, uint8_t *buf, int buf_size,
const struct AVSubtitle *sub);
/**
 * Store the encoded data in an AVPacket
 * @param avctx the codec context
 * @param avpkt the output AVPacket
 * @param[in] frame the frame to encode
 * @param[out] got_packet_ptr set by the encoder to 0 or 1 to indicate that a non-empty packet was returned in avpkt
 * @return 0 on success
 */
int (*encode2)(AVCodecContext *avctx, AVPacket *avpkt, const AVFrame *frame,
int *got_packet_ptr);
// Decode operation
int (*decode)(AVCodecContext *, void *outdata, int *outdata_size, AVPacket *avpkt);
// Close the codec
int (*close)(AVCodecContext *);
// Encode API with decoupled packet/frame dataflow
int (*send_frame)(AVCodecContext *avctx, const AVFrame *frame);
int (*receive_packet)(AVCodecContext *avctx, AVPacket *avpkt);
// Decode API with decoupled packet/frame dataflow
int (*receive_frame)(AVCodecContext *avctx, AVFrame *frame);
// Flush the buffers; called on seeking operations
void (*flush)(AVCodecContext *);
...
} AVCodec;
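AVCodec instances are looked up rather than created by hand. A small sketch of the lookup helpers, assuming the FFmpeg 4.x API and that libx264 was compiled in (list_codec_info is a hypothetical name):

#include <stdio.h>
#include <libavcodec/avcodec.h>

void list_codec_info(void) {
    // Look up a decoder by ID and an encoder by its exact name
    AVCodec *dec = avcodec_find_decoder(AV_CODEC_ID_H264);
    AVCodec *enc = avcodec_find_encoder_by_name("libx264"); // NULL if not compiled in

    if (dec)
        printf("decoder: %s (%s)\n", dec->name, dec->long_name);
    // pix_fmts is an AV_PIX_FMT_NONE-terminated list of supported pixel formats
    if (enc && enc->pix_fmts)
        printf("first pix_fmt of %s: %d\n", enc->name, enc->pix_fmts[0]);
}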
(2) AVCodecContext structure
You may notice that, in addition to AVCodec, another structure appears very frequently in the member functions of AVCodec. It is safe to say that most codec-related functions take it as a parameter: that structure is AVCodecContext. The AVCodecContext structure stores information about the codec used by a video or audio stream, such as codec_type, the type of the codec, and codec, the codec in use. The AVCodecContext structure is as follows:
typedef struct AVCodecContext {
enum AVMediaType codec_type; /* Type of codec (video, audio...) */
const struct AVCodec *codec; // The codec in use (H.264, MPEG-2...)
enum AVCodecID codec_id; /* see AV_CODEC_ID_xxx */
// Bit rate (average bit rate of the audio or video)
int64_t bit_rate;
// Compression level of the encoder
int compression_level;
// Extra data needed by specific codecs (e.g. SPS, PPS, etc. for an H.264 decoder)
uint8_t *extradata;
int extradata_size;
// Time base
// With this parameter, a pts can be converted to real time (in seconds)
AVRational time_base;
// Image width and height, video only
int width, height;
// Pixel format, video only
enum AVPixelFormat pix_fmt;
// Get the pixel format
enum AVPixelFormat (*get_format)(struct AVCodecContext *s, const enum AVPixelFormat * fmt);
// Maximum number of B-frames between non-B-frames
int max_b_frames;
// Qscale factor between I/P-frames and B-frames
float b_quant_factor;
// Sample aspect ratio
AVRational sample_aspect_ratio;
// Number of audio samples per frame
int frame_size;
// Audio channel layout
uint64_t channel_layout;
// Frame rate
AVRational framerate;
...
} AVCodecContext;
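Putting AVCodec and AVCodecContext together, the decoupled send/receive API shown in the AVCodec callbacks above is used from the application side like this minimal sketch, assuming the FFmpeg 4.x API (decode_packet is a hypothetical helper; error paths shortened):

#include <libavcodec/avcodec.h>

int decode_packet(AVCodecContext *dec_ctx, const AVPacket *pkt, AVFrame *frame) {
    // Feed one compressed packet to the decoder (pkt == NULL flushes it)
    int ret = avcodec_send_packet(dec_ctx, pkt);
    if (ret < 0)
        return ret;
    // One packet may yield zero or more frames
    while ((ret = avcodec_receive_frame(dec_ctx, frame)) >= 0) {
        // frame->width/height, frame->format, frame->pts are now valid
        av_frame_unref(frame);
    }
    return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
}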
4. Data-related structures
(1) AVStream structure
The AVStream structure is used to store information about a video or audio stream. The field nb_frames indicates how many frames the stream contains, the field duration indicates the length of the stream, the field index is the index of this (audio or video) stream within the container, and so on.
typedef struct AVStream {
// Index of this video/audio stream in AVFormatContext.streams
int index; /**< stream index in AVFormatContext */
// Points to the AVCodecContext of the video/audio stream
// @deprecated use the codecpar struct instead
AVCodecContext *codec;
// Time base. This value can be used to convert pts and dts to real time
AVRational time_base;
// Length of the video/audio stream
int64_t duration;
// Number of frames in this video/audio stream
int64_t nb_frames;
// Metadata information
AVDictionary *metadata;
// Frame rate (important for video)
AVRational avg_frame_rate;
// Attached picture; for example, some MP3 and AAC audio files carry an album cover
AVPacket attached_pic;
...
// Codec parameters associated with the video or audio stream
// Allocated by avformat_new_stream() and freed by avformat_free_context()
AVCodecParameters *codecpar;
} AVStream;
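A common pattern combines AVStream with the codec structures: locate the desired stream, then open a decoder from its codecpar. A minimal sketch, assuming the FFmpeg 4.x API (open_video_decoder is a hypothetical helper):

#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>

AVCodecContext *open_video_decoder(AVFormatContext *fmt_ctx) {
    AVCodec *dec = NULL;
    // Returns the index into fmt_ctx->streams[] and the matching decoder
    int idx = av_find_best_stream(fmt_ctx, AVMEDIA_TYPE_VIDEO, -1, -1, &dec, 0);
    if (idx < 0)
        return NULL;

    AVStream *st = fmt_ctx->streams[idx];
    AVCodecContext *dec_ctx = avcodec_alloc_context3(dec);
    if (!dec_ctx)
        return NULL;
    // Copy width/height/extradata/... from the stream into the codec context
    if (avcodec_parameters_to_context(dec_ctx, st->codecpar) < 0 ||
        avcodec_open2(dec_ctx, dec, NULL) < 0) {
        avcodec_free_context(&dec_ctx);
        return NULL;
    }
    return dec_ctx;
}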
(2) AVPacket structure
The AVPacket structure is used to store information related to compressed (encoded) video or audio data. The field stream_index indicates whether the AVPacket belongs to the audio stream or the video stream. Taking H.264 as an example, the data of one AVPacket usually corresponds to one NAL unit; for video, one packet typically holds one compressed frame. The AVPacket structure source code is as follows:
typedef struct AVPacket {
AVBufferRef *buf;
/** * Presentation timestamp in AVStream->time_base units; the time at which * the decompressed packet will be presented to the user. */
// Presentation timestamp
int64_t pts;
/** * Decompression timestamp in AVStream->time_base units; the time at which * the packet is decompressed. */
// Decompression timestamp
int64_t dts;
// Compress encoded video or audio data
uint8_t *data;
// Size of data
int size;
// Indicates whether the AVPacket belongs to an audio stream or a video stream
int stream_index;
int flags;
AVPacketSideData *side_data;
int side_data_elems;
/** * Duration of this packet in AVStream->time_base units, 0 if unknown. * Equals next_pts - this_pts in presentation order. */
// Duration of this AVPacket
int64_t duration;
// The byte position of the AVPacket in the stream. -1 indicates unknown
int64_t pos;
} AVPacket;
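Because pts, dts, and duration are expressed in AVStream->time_base units, converting them to seconds requires the owning stream's time base; a small sketch, assuming the FFmpeg 4.x API (print_packet_time is a hypothetical name):

#include <stdio.h>
#include <libavformat/avformat.h>
#include <libavutil/rational.h>

void print_packet_time(const AVFormatContext *fmt_ctx, const AVPacket *pkt) {
    AVRational tb = fmt_ctx->streams[pkt->stream_index]->time_base;
    if (pkt->pts != AV_NOPTS_VALUE) // pts may be unset for some streams
        printf("stream %d pts: %.3f s\n", pkt->stream_index,
               pkt->pts * av_q2d(tb)); // av_q2d() turns the rational into a double
}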
(3) AVFrame structure
The AVFrame structure is used to store information related to decoded video/audio data; it represents one frame of data. For a video frame, the data array holds the image, the width and height fields are the width and height of the image, key_frame is the key frame flag, and so on. For audio, the data array holds the audio samples (one AVFrame may contain multiple audio frames), sample_rate is the audio sample rate, channels is the number of audio channels, and so on. The AVFrame structure source code is as follows:
typedef struct AVFrame {
// Decoded raw data (video - YUV or RGB; audio - PCM)
// For packed data (e.g. RGB24), everything is stored in data[0]
// For planar data (e.g. YUV420P), the Y plane is stored in data[0], U in data[1], and V in data[2]
uint8_t *data[AV_NUM_DATA_POINTERS];
// Length of one row of data
// Note: it is not necessarily equal to the image width; it is usually larger than the width
int linesize[AV_NUM_DATA_POINTERS];
// Width and height of the video frame
int width, height;
// Number of audio samples (per channel) contained in this AVFrame
int nb_samples;
// Format of the raw data after decoding, such as YUV420, RGB...
// For audio, see AVSampleFormat
// For video, see AVPixelFormat
int format;
// Whether this is a key frame; very important for video
// 1 -> key frame, 0 -> not
int key_frame;
// Frame type, e.g. I-frame, B-frame, P-frame...
enum AVPictureType pict_type;
// Aspect ratio of the video frame, e.g. 16:9, 4:3...
AVRational sample_aspect_ratio;
// Presentation timestamp
int64_t pts;
// Coded picture number
int coded_picture_number;
// Display picture number
int display_picture_number;
// Audio sample rate
int sample_rate;
// Audio channel layout
uint64_t channel_layout;
// YUV color space type
enum AVColorSpace colorspace;
// Metadata
AVDictionary *metadata;
// Number of audio channels
int channels;
...
} AVFrame;
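As an illustration of the data/linesize layout described above, the following sketch writes one decoded YUV420P frame to a raw file row by row, honoring the linesize padding (write_yuv420p is a hypothetical helper; it assumes frame->format is AV_PIX_FMT_YUV420P):

#include <stdio.h>
#include <libavutil/frame.h>

void write_yuv420p(const AVFrame *frame, FILE *out) {
    // Plane 0 is Y at full resolution; planes 1 and 2 are U and V at half width/height
    for (int p = 0; p < 3; p++) {
        int w = (p == 0) ? frame->width  : frame->width  / 2;
        int h = (p == 0) ? frame->height : frame->height / 2;
        for (int y = 0; y < h; y++) // linesize[p] may be larger than w (row padding)
            fwrite(frame->data[p] + y * frame->linesize[p], 1, w, out);
    }
}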
At this point, we have basically combed through the most important structures in the FFmpeg framework. Finally, we borrow once more the FFmpeg key-structure relationship diagram by the "Thor" of the FFmpeg community (Lei Xiaohua) as the ending: first, so that this article echoes from beginning to end; and second, to pay tribute to the master!
GitHub practical project: https://github.com/jiangdongguo/FFMPEG4Android