1. FFmpeg introduction and clipping
1.1 Introduction to FFmpeg
FFmpeg (Fast Forward MPEG) is a free, open-source, cross-platform audio/video solution that provides a complete toolchain for recording, encoding/decoding, converting, and streaming audio and video. The source directory structure of ffmpeg 4.0.2 is as follows:
FFmpeg
|-- compat         compatibility files for older versions
|-- doc            documentation
|-- ffbuild        build scripts
|-- libavcodec     core audio/video codec library
|-- libavdevice    input and output devices, such as Video4Linux2, VfW, DShow, and ALSA
|-- libavfilter    filter effects processing
|-- libavformat    I/O operations and container formats (muxer/demuxer)
|-- libswresample  audio resampling, format conversion, and mixing:
|      (1) resampling: change the audio sample rate, e.g. from 44100 Hz down to 8000 Hz
|      (2) rematrixing: change the number of audio channels, e.g. from stereo to mono
|      (3) format conversion: change the audio sample size, e.g. from 16-bit to 8-bit samples
|-- libavutil      utility library, e.g. arithmetic and string operations
|-- libpostproc    post-processing effects, such as image deblocking
|-- libswscale     video pixel processing, including scaling and pixel/color space conversion
|-- presets        preset configuration files
|-- configure      build configuration script
|-- tests          test instances, compiled and run against ffmpeg
1.2 Command Line Tools
The FFmpeg framework also provides several command-line tools for processing audio and video data, including ffplay, ffserver, and ffprobe:
- ffplay
Fast Forward Play, a media player implemented with FFmpeg
- ffserver
Fast Forward Server, an RTSP streaming server implemented with FFmpeg
- ffprobe
Fast Forward Probe, used to analyze input streams
2. FFmpeg architecture analysis
In section 1.1, we gave a brief introduction to the overall architecture of FFmpeg and described the function of each module in the framework. Building on that, this section focuses on the important steps, data structures, and related functions involved in audio and video development with FFmpeg.
2.1 Main points of FFmpeg processing
Generally speaking, the FFmpeg framework is mainly used for demuxing (unpacking), decoding, and transcoding multimedia data. To give a more intuitive picture of FFmpeg in audio/video applications, the flow chart below illustrates parsing an RTSP network stream. The figure demonstrates the general process from opening the RTSP stream to extracting decoded data or transcoding. Explanation of terms:
muxer
: audio/video multiplexer (packager), which combines video, audio, and subtitle files (if any) into one container format, e.g. a.avi, a.mp3, and a.srt into an MKV video file;
demuxer
: audio/video demultiplexer (unpacker), the reverse of a muxer;
transcode
: transcoding, the conversion of audio/video data from one format into another;
RTP
: Real-time Transport Protocol, a UDP-based network transport protocol. It sits between the application layer and the transport layer and is responsible for encapsulating streaming media data and transmitting media streams in real time.
ES stream
: Elementary Stream, also known as the raw video/audio stream; the data stream output directly by an encoder. It can be a video stream (such as H.264 or MJPEG) or an audio stream (such as AAC).
PES stream
: Packetized Elementary Stream, a data structure used for conveying an ES, formed by grouping and packetizing the ES data and adding header information.
protocol resolution
: stripping protocol-specific packet information from a network data flow to obtain the real audio/video data. Common protocols include RTSP, RTMP, HTTP, and MMS.
demuxing
: unpacking the container format, which can be .mp4/.avi/.flv/.mkv, etc.
decoding
: restoring the encoded data to the original content, e.g. H.264 decoded to YUV, AAC decoded to PCM.
2.2 Important structures of FFmpeg
There are many important structures in FFmpeg, such as the I/O structures AVIOContext, URLContext, and URLProtocol; the format structures AVFormatContext, AVInputFormat, and AVOutputFormat; the codec structures AVCodec and AVCodecContext; and the data structures AVStream, AVPacket, and AVFrame. When first coming into contact with FFmpeg, it can be a little difficult to tell these structures apart. Fortunately, there is a "big brother" among them, AVFormatContext, which runs through the whole of FFmpeg development, so we will start with it.
AVFormatContext
The AVFormatContext structure describes the composition and basic information of a multimedia file or stream. It is the most fundamental structure in FFmpeg and the root of all the others. Its member variables iformat and oformat point to the corresponding demuxing and muxing objects, of types AVInputFormat and AVOutputFormat respectively. pb is a pointer that controls the underlying data reads and writes, of type AVIOContext. nb_streams is the number of data streams in the multimedia file or stream, and streams is a double pointer to all of the streams, of type AVStream. video_codec and audio_codec are the video and audio codecs, of type AVCodec, and so on. The AVFormatContext structure is defined in libavformat/avformat.h:
typedef struct AVFormatContext {
const AVClass *av_class;
// Input container format
// Set when avformat_open_input() is called; demuxing only
struct AVInputFormat *iformat;
// Output container format
// Set when avformat_alloc_output_context2() is called; muxing only
struct AVOutputFormat *oformat;
/**
 * Format private data. This is an AVOptions-enabled struct
 * if and only if iformat/oformat.priv_class is not NULL.
 *
 * - muxing: set by avformat_write_header()
 * - demuxing: set by avformat_open_input()
 */
void *priv_data;
// Input/output (I/O) cache
// Demuxing: set by avformat_open_input()
// Muxing: set by avio_open2(), which must happen before avformat_write_header()
AVIOContext *pb;
// Stream info
int ctx_flags;
// The number of data streams in AVFormatContext.streams
// Set by avformat_new_stream()
unsigned int nb_streams;
// List of all streams in the file. New streams are created with avformat_new_stream()
// The streams are freed when avformat_free_context() is called
// Demuxing: streams are populated when avformat_open_input() is called
// Muxing: streams are created by the user before calling avformat_write_header()
AVStream **streams;
// Input or output file name, e.g. input: rtsp://184.72.239.149/vod/mp4:BigBuckBunny_115k.mov
// Demuxing: set when avformat_open_input() is called
// Muxing: set after avformat_alloc_output_context2(), before avformat_write_header()
char filename[1024];
// Position of the first frame of the component, set by libavformat during demuxing only
int64_t start_time;
// Duration of the stream, set by libavformat during demuxing only
int64_t duration;
// Total bit rate (bit/s), audio and video combined
int64_t bit_rate;
...
// Video codec ID
// Note: during demuxing, set by the user
enum AVCodecID video_codec_id;
// Audio codec ID
// Note: during demuxing, set by the user
enum AVCodecID audio_codec_id;
// Subtitle codec ID
// Note: during demuxing, set by the user
enum AVCodecID subtitle_codec_id;
...
// File metadata
// Demuxing: set when avformat_open_input() is called
// Muxing: set before calling avformat_write_header()
// Note: the metadata is freed by libavformat when avformat_free_context() is called
AVDictionary *metadata;
// The real (wall-clock) time when the stream starts
int64_t start_time_realtime;
...
// Video codec, specified by the user during demuxing
AVCodec *video_codec;
// Audio codec, specified by the user during demuxing
AVCodec *audio_codec;
// Subtitle codec, specified by the user during demuxing
AVCodec *subtitle_codec;
// Data codec, specified by the user during demuxing
AVCodec *data_codec;
...
// Data codec ID
enum AVCodecID data_codec_id;
...
} AVFormatContext;
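To make the role of AVFormatContext concrete, here is a minimal demuxing sketch, assuming the FFmpeg 4.x API (with FFmpeg 4.0, av_register_all() must have been called once at startup; the function name demux_example and the shortened error handling are for illustration only):

#include <libavformat/avformat.h>

int demux_example(const char *url) {
    AVFormatContext *fmt_ctx = NULL;
    AVPacket pkt;
    int ret;

    // Demuxing: fills fmt_ctx->iformat, pb, streams, filename, ...
    if ((ret = avformat_open_input(&fmt_ctx, url, NULL, NULL)) < 0)
        return ret;
    // Probe the streams so that duration, bit_rate, codec parameters, ... get filled in
    if ((ret = avformat_find_stream_info(fmt_ctx, NULL)) < 0)
        goto end;
    av_dump_format(fmt_ctx, 0, url, 0); // Print container and stream info

    while (av_read_frame(fmt_ctx, &pkt) >= 0) {
        // pkt.stream_index tells us which fmt_ctx->streams[] entry the packet belongs to
        av_packet_unref(&pkt);
    }
end:
    avformat_close_input(&fmt_ctx); // Frees the streams, pb, ...
    return ret;
}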
1. Multiplexing (muxing) / Demultiplexing (demuxing)
(1) AVInputFormat structure
AVInputFormat is the demultiplexing/unpacking (demuxing) object. It contains information about the demuxer as well as its operation functions: for example, the name member variable is the name of the container format, such as "aac" or "mov"; the read_header member function reads the container's header data; the read_packet member function reads one AVPacket; and so on. The AVInputFormat structure is defined in libavformat/avformat.h:
typedef struct AVInputFormat {
// Container format name, such as "mp4", "mov", etc.
const char *name;
// Descriptive (long) name of the container format
const char *long_name;
int flags;
const char *extensions;
const struct AVCodecTag * const *codec_tag;
const AVClass *priv_class;
const char *mime_type;
// Next registered demuxer in the linked list
struct AVInputFormat *next;
int raw_codec_id;
// Size of the private context of this format, such as MovContext
int priv_data_size;
int (*read_probe)(AVProbeData *);
// Read the format header and initialize the AVFormatContext structure
// Returns 0 on success
int (*read_header)(struct AVFormatContext *);
// Read one packet of data and store it in the memory pointed to by pkt
// Returns 0 on success; on failure a negative number is returned and pkt is not allocated
int (*read_packet)(struct AVFormatContext *, AVPacket *pkt);
// Close the stream; AVFormatContext and AVStreams are not freed here
int (*read_close)(struct AVFormatContext *);
/**
 * Seek to a given timestamp
 * @param stream_index the stream index, must not be -1
 * @param flags selects which direction to seek if there is no exact match
 * @return >= 0 on success
 */
int (*read_seek)(struct AVFormatContext *,
int stream_index, int64_t timestamp, int flags);
/**
 * Get the next timestamp of stream[stream_index]
 * @return the timestamp, or AV_NOPTS_VALUE if an error occurs
 */
int64_t (*read_timestamp)(struct AVFormatContext *s, int stream_index,
int64_t *pos, int64_t pos_limit);
// Start/resume playing - RTSP only
int (*read_play)(struct AVFormatContext *);
// Pause playing - RTSP only
int (*read_pause)(struct AVFormatContext *);
// Returns the device list with its properties, see avdevice_list_devices()
int (*get_device_list)(struct AVFormatContext *s, struct AVDeviceInfoList *device_list);
// Initialize the device capabilities submodule, see avdevice_capabilities_create()
int (*create_device_capabilities)(struct AVFormatContext *s, struct AVDeviceCapabilitiesQuery *caps);
// Free the device capabilities submodule, see avdevice_capabilities_free()
int (*free_device_capabilities)(struct AVFormatContext *s, struct AVDeviceCapabilitiesQuery *caps);
} AVInputFormat;
By calling av_register_all(), all of FFmpeg's demuxers are stored in a linked list, with first_iformat as the head pointer and last_iformat as the tail pointer. Taking the AAC demuxer as an example, its AVInputFormat object is initialized in libavformat/aacdec.c as follows:
AVInputFormat ff_aac_demuxer = {
.name = "aac", // Name of the demuxer
.long_name = NULL_IF_CONFIG_SMALL("raw ADTS AAC (Advanced Audio Coding)"), // Long name of the AAC format
.read_probe = adts_aac_probe, // Probe function
.read_header = adts_aac_read_header, // Function that reads the header data
.read_packet = adts_aac_read_packet, // Function that reads one packet
.flags = AVFMT_GENERIC_INDEX,
.extensions = "aac", // File name suffix
.mime_type = "audio/aac,audio/aacp,audio/x-aac",
.raw_codec_id = AV_CODEC_ID_AAC, // AAC decoder ID
};
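As a quick sketch of how a registered demuxer can be looked up by its short name and forced on an input, assuming the FFmpeg 4.x API (open_as_aac is a hypothetical helper):

#include <libavformat/avformat.h>

AVFormatContext *open_as_aac(const char *url) {
    // Look up the demuxer registered above under the short name "aac"
    AVInputFormat *ifmt = av_find_input_format("aac");
    AVFormatContext *fmt_ctx = NULL;

    // Passing a non-NULL ifmt skips probing (read_probe) and forces ff_aac_demuxer;
    // its read_header/read_packet callbacks then drive the parsing
    if (!ifmt || avformat_open_input(&fmt_ctx, url, ifmt, NULL) < 0)
        return NULL;
    return fmt_ctx;
}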
(2) AVOutputFormat structure
In contrast to AVInputFormat, AVOutputFormat is the multiplexing/packaging (muxing) object. It contains information about the muxer and its operation functions: for example, the name member variable is the name of the container format, such as "mp4" or "3gp"; the write_header member function writes the container's header data; the write_packet member function writes one AVPacket; and so on. The AVOutputFormat structure is defined in libavformat/avformat.h:
typedef struct AVOutputFormat {
// Container format name, such as "mp4"
const char *name;
// Descriptive (long) name of the container format
const char *long_name;
// The MIME type
const char *mime_type;
const char *extensions; /**< comma-separated file extensions */
/* output support */
enum AVCodecID audio_codec; /**< default audio codec */
enum AVCodecID video_codec; /**< default video codec */
enum AVCodecID subtitle_codec; /**< default subtitle codec */
/**
 * flags can be set to: AVFMT_NOFILE, AVFMT_NEEDNUMBER,
 * AVFMT_GLOBALHEADER, AVFMT_NOTIMESTAMPS, AVFMT_VARIABLE_FPS,
 * AVFMT_NODIMENSIONS, AVFMT_NOSTREAMS, AVFMT_ALLOW_FLUSH,
 * AVFMT_TS_NONSTRICT, AVFMT_TS_NEGATIVE
 */
int flags;
const struct AVCodecTag * const *codec_tag;
const AVClass *priv_class; ///< AVClass for the private context
// Next registered muxer in the linked list
struct AVOutputFormat *next;
// Size of the private data
int priv_data_size;
// Write the header
int (*write_header)(struct AVFormatContext *);
// Write one packet. If flags contains AVFMT_ALLOW_FLUSH, pkt can be NULL in order to flush the data buffered in the muxer
// When flushing, returns 0 if there is still data to flush in the buffer, or 1 if the buffer is fully flushed
int (*write_packet)(struct AVFormatContext *, AVPacket *pkt);
int (*write_trailer)(struct AVFormatContext *);
// Currently only used to set the pixel format if it is not YUV420P
int (*interleave_packet)(struct AVFormatContext *, AVPacket *out,
AVPacket *in, int flush);
// Test whether the given codec can be stored in this container
int (*query_codec)(enum AVCodecID id, int std_compliance);
void (*get_output_timestamp)(struct AVFormatContext *s, int stream,
int64_t *dts, int64_t *wall);
int (*control_message)(struct AVFormatContext *s, int type,
void *data, size_t data_size);
// Write an unencoded AVFrame. See av_write_uncoded_frame().
int (*write_uncoded_frame)(struct AVFormatContext *, int stream_index,
AVFrame **frame, unsigned flags);
/**
 * Returns the device list with its properties.
 * @see avdevice_list_devices() for more details.
 */
int (*get_device_list)(struct AVFormatContext *s, struct AVDeviceInfoList *device_list);
/**
 * Initialize device capabilities submodule.
 * @see avdevice_capabilities_create() for more details.
 */
int (*create_device_capabilities)(struct AVFormatContext *s, struct AVDeviceCapabilitiesQuery *caps);
// Free the device capabilities submodule, see avdevice_capabilities_free().
int (*free_device_capabilities)(struct AVFormatContext *s, struct AVDeviceCapabilitiesQuery *caps);
enum AVCodecID data_codec; /**< default data codec */
/**
 * Initialize the format: may allocate data and set any AVFormatContext or
 * AVStream parameters that need to be set here. Used together with deinit()
 * to free the allocated memory resources. A negative return value means failure.
 */
int (*init)(struct AVFormatContext *);
/** Free the resources allocated by init(), called whether or not init() succeeded */
void (*deinit)(struct AVFormatContext *);
/**
 * Set up any necessary bitstream filtering.
 * Returns 0 if more packets from this stream still need to be checked; 1 if not
 */
int (*check_bitstream)(struct AVFormatContext *, const AVPacket *pkt);
} AVOutputFormat;
Similarly, by calling av_register_all(), all of FFmpeg's muxers are stored in a linked list, with first_oformat as the head pointer and last_oformat as the tail pointer. Taking the MP4 muxer as an example, its AVOutputFormat object is initialized in libavformat/movenc.c as follows:
AVOutputFormat ff_mp4_muxer = {
.name = "mp4", // Name of the muxer
.long_name = NULL_IF_CONFIG_SMALL("MP4 (MPEG-4 Part 14)"), // Long name of the MP4 format
.mime_type = "video/mp4", // The MIME type
.extensions = "mp4", // File name extension
.priv_data_size = sizeof(MOVMuxContext),
.audio_codec = AV_CODEC_ID_AAC, // Audio encoder ID
.video_codec = CONFIG_LIBX264_ENCODER ?
AV_CODEC_ID_H264 : AV_CODEC_ID_MPEG4, // Video encoder ID
.init = mov_init, // Initialization function
.write_header = mov_write_header, // Write the header
.write_packet = mov_write_packet, // Write one packet
.write_trailer = mov_write_trailer,
.deinit = mov_free, // Free resources
.flags = AVFMT_GLOBALHEADER | AVFMT_ALLOW_FLUSH | AVFMT_TS_NEGATIVE,
.codec_tag = (const AVCodecTag* const []){ codec_mp4_tags, 0 },
.check_bitstream = mov_check_bitstream,
.priv_class = &mp4_muxer_class,
};
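The muxer side is driven through the same AVFormatContext. The following skeleton sketches the call sequence that exercises init(), write_header(), write_packet(), and write_trailer(), assuming the FFmpeg 4.x API (stream creation is elided; mux_skeleton is a hypothetical name):

#include <libavformat/avformat.h>

int mux_skeleton(const char *filename) {
    AVFormatContext *ofmt_ctx = NULL;
    int ret;

    // Guesses the muxer from the file name (e.g. ff_mp4_muxer for "out.mp4")
    // and sets ofmt_ctx->oformat
    ret = avformat_alloc_output_context2(&ofmt_ctx, NULL, NULL, filename);
    if (ret < 0)
        return ret;

    // ... create streams with avformat_new_stream() and fill their codecpar here ...

    // Open the underlying I/O context (ofmt_ctx->pb) unless the muxer needs no file
    if (!(ofmt_ctx->oformat->flags & AVFMT_NOFILE)) {
        ret = avio_open(&ofmt_ctx->pb, filename, AVIO_FLAG_WRITE);
        if (ret < 0)
            goto end;
    }

    ret = avformat_write_header(ofmt_ctx, NULL); // Calls init() and write_header()
    // ... feed packets with av_interleaved_write_frame(ofmt_ctx, pkt) ...
    if (ret >= 0)
        av_write_trailer(ofmt_ctx); // Calls write_trailer() and deinit()
end:
    if (!(ofmt_ctx->oformat->flags & AVFMT_NOFILE))
        avio_closep(&ofmt_ctx->pb);
    avformat_free_context(ofmt_ctx);
    return ret;
}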
2. Input/Output (I/O)
(1) AVIOContext structure
AVIOContext is the FFmpeg structure that manages input/output (I/O) data. It sits at the top of the protocol (file) layer and provides buffered read/write operations. The meaning of the read/write operations and the buffer member variables is illustrated by the comment diagrams in the source code (one diagram for reading, one for writing). AVIOContext is defined in libavformat/avio.h:
typedef struct AVIOContext {
const AVClass *av_class;
unsigned char *buffer; // Data buffer
int buffer_size; // Cache size
unsigned char *buf_ptr; // Pointer to the current position in the buffer; can be less than buffer + buffer_size
unsigned char *buf_end; // Reads/writes the end of the data in the cache
// Private pointer, associated with the URLContext structure; passed as the first argument to the read/write/seek/... callbacks
// It is used to read and write the generalized input file, pointing to a URLContext object
void *opaque;
// Read the packet data
int (*read_packet)(void *opaque, uint8_t *buf, int buf_size);
// Write the packet data
int (*write_packet)(void *opaque, uint8_t *buf, int buf_size);
// Seek to a position in the stream
int64_t (*seek)(void *opaque, int64_t offset, int whence);
int64_t pos; // Position in the file of the current buffered region
int eof_reached; // True if the end of the file has been reached
int write_flag; // True if the context was opened for writing
int max_packet_size; // Maximum size of packet
unsigned long checksum;
unsigned char *checksum_ptr;
unsigned long (*update_checksum)(unsigned long checksum, const uint8_t *buf, unsigned int size);
// Error code, 0 means no error
int error;
// Pause or resume playback, for network streaming protocols
int (*read_pause)(void *opaque, int pause);
int64_t (*read_seek)(void *opaque, int stream_index,
int64_t timestamp, int flags);
// 0 means the stream (e.g. network traffic) cannot seek
int seekable;
// In write mode, the furthest position reached in the buffer; used to keep track of written data for later flushing
unsigned char *buf_ptr_max;
// The minimum packet size
int min_packet_size;
// Most of the following fields are only used internally by libavformat, or are only used sparingly, and are not explained here
int64_t maxsize;
int direct;
int64_t bytes_read;
int seek_count;
int writeout_count;
int orig_buffer_size;
int short_seek_threshold;
const char *protocol_whitelist;
const char *protocol_blacklist;
int (*write_data_type)(void *opaque, uint8_t *buf, int buf_size,
enum AVIODataMarkerType type, int64_t time);
int ignore_boundary_point;
enum AVIODataMarkerType current_type;
int64_t last_time;
int (*short_seek_get)(void *opaque);
int64_t written;
} AVIOContext;
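A practical use of AVIOContext is feeding FFmpeg from memory instead of a file: avio_alloc_context() wires the read_packet callback described above to a custom data source. A minimal read-only sketch, assuming the FFmpeg 4.x API (MemSource and mem_read are hypothetical helpers):

#include <string.h>
#include <libavformat/avformat.h>
#include <libavutil/mem.h>

typedef struct MemSource { const uint8_t *data; size_t size, pos; } MemSource;

// Matches the AVIOContext.read_packet signature: copy up to buf_size bytes into buf
static int mem_read(void *opaque, uint8_t *buf, int buf_size) {
    MemSource *src = opaque;
    size_t left = src->size - src->pos;
    if (left == 0)
        return AVERROR_EOF;
    if ((size_t)buf_size > left)
        buf_size = (int)left;
    memcpy(buf, src->data + src->pos, buf_size);
    src->pos += buf_size;
    return buf_size;
}

AVIOContext *make_mem_io(MemSource *src) {
    unsigned char *buffer = av_malloc(4096); // Internal cache, owned by the AVIOContext
    if (!buffer)
        return NULL;
    // write_flag = 0 (read-only); the seek callback is omitted, so seekable stays 0
    return avio_alloc_context(buffer, 4096, 0, src, mem_read, NULL, NULL);
}

The returned context is then assigned to AVFormatContext.pb (together with the AVFMT_FLAG_CUSTOM_IO flag) before avformat_open_input() is called.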
The AVIOContext member variable opaque points to a URLContext object. URLContext is the context for operating on a specific resource (file), and it includes a pointer variable prot of type URLProtocol. URLProtocol groups resources by type and bundles the set of functions for operating on each type of resource. The URLContext structure reads as follows:
typedef struct URLContext {
const AVClass *av_class;
// Points to the URLProtocol that operates on this kind of generalized input file
const struct URLProtocol *prot;
// Associate a handle to a specific generalized input file, such as fd for a file handle and socket for a network handle
void *priv_data;
char *filename; // The specified URL
int flags;
int max_packet_size;
int is_streamed; // True if the resource is a stream (cannot seek); the default is false
int is_connected;
AVIOInterruptCB interrupt_callback;
int64_t rw_timeout; // Read/write operation timeout period
const char *protocol_whitelist;
const char *protocol_blacklist;
int min_packet_size;
} URLContext;
Copy the code
(2) URLProtocol structure
The URLProtocol structure represents the generalized input file. It is FFmpeg's structure for operating on I/O, covering local files, network data streams (TCP, RTP, ...), and so on; each protocol corresponds to one URLProtocol structure. The structure is located in libavformat/url.h and includes open, close, read, write, seek, and other operations. Part of the source code is as follows:
typedef struct URLProtocol {
// Protocol name
const char *name;
int (*url_open)( URLContext *h, const char *url, int flags);
int (*url_open2)(URLContext *h, const char *url, int flags, AVDictionary **options);
int (*url_accept)(URLContext *s, URLContext **c);
int (*url_handshake)(URLContext *c);
/** * Read data from the protocol. */
int (*url_read)( URLContext *h, unsigned char *buf, int size);
int (*url_write)(URLContext *h, const unsigned char *buf, int size);
int64_t (*url_seek)( URLContext *h, int64_t pos, int whence);
int (*url_close)(URLContext *h);
int (*url_read_pause)(URLContext *h, int pause);
int64_t (*url_read_seek)(URLContext *h, int stream_index,
int64_t timestamp, int flags);
int (*url_get_file_handle)(URLContext *h);
int (*url_get_multi_file_handle)(URLContext *h, int **handles,
int *numhandles);
int (*url_get_short_seek)(URLContext *h);
int (*url_shutdown)(URLContext *h, int flags);
int priv_data_size;
const AVClass *priv_data_class;
int flags;
int (*url_check)(URLContext *h, int mask);
int (*url_open_dir)(URLContext *h);
int (*url_read_dir)(URLContext *h, AVIODirEntry **next);
int (*url_close_dir)(URLContext *h);
int (*url_delete)(URLContext *h);
int (*url_move)(URLContext *h_src, URLContext *h_dst);
const char *default_whitelist;
} URLProtocol;
Next, the HTTP protocol is taken as an example to show how a URLProtocol structure is initialized; it also illustrates that each protocol (including plain files) corresponds to one URLProtocol object. ff_http_protocol is defined in libavformat/http.c:
const URLProtocol ff_http_protocol = {
.name = "http", // Protocol name
.url_open2 = http_open, // Open operation
.url_accept = http_accept, // Accept operation
.url_handshake = http_handshake, // Handle the handshake
.url_read = http_read, // Read data operation
.url_write = http_write, // Write data operation
.url_seek = http_seek, // Seek operation
.url_close = http_close, // Close operation
.url_get_file_handle = http_get_file_handle,
.url_get_short_seek = http_get_short_seek,
.url_shutdown = http_shutdown,
.priv_data_size = sizeof(HTTPContext),
.priv_data_class = &http_context_class,
.flags = URL_PROTOCOL_FLAG_NETWORK,
.default_whitelist = "http,https,tls,rtp,tcp,udp,crypto,httpproxy"
};
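Applications normally do not use URLProtocol directly; avio_open2() picks the matching protocol (here ff_http_protocol) from the URL scheme and drives it through a URLContext. A minimal sketch, assuming the FFmpeg 4.x API (http_read_demo is a hypothetical name):

#include <libavformat/avio.h>

int http_read_demo(const char *url) { // e.g. "http://example.com/a.aac"
    AVIOContext *io = NULL;
    unsigned char buf[4096];
    int n, ret;

    // Internally creates a URLContext whose prot points at ff_http_protocol
    ret = avio_open2(&io, url, AVIO_FLAG_READ, NULL, NULL);
    if (ret < 0)
        return ret;
    while ((n = avio_read(io, buf, sizeof(buf))) > 0) {
        // ... consume n bytes ...
    }
    avio_closep(&io);
    return 0;
}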
3. Encoding/Decoding
(1) AVCodec structure
AVCodec is the data structure most closely tied to the codec. It contains codec-related attribute parameters and codec operation functions, such as name, the name of the codec, and pix_fmts, the video frame pixel formats supported by the codec. Each codec corresponds to one AVCodec structure. The AVCodec structure (in libavcodec/avcodec.h) is as follows:
typedef struct AVCodec {
// Codec name
const char *name;
// Descriptive (long) name of the codec
const char *long_name;
// Media type
enum AVMediaType type;
// ID of the codec
enum AVCodecID id;
int capabilities;
// Frame rates supported by the codec
const AVRational *supported_framerates;
// Pixel formats supported by the codec for video frames/images
const enum AVPixelFormat *pix_fmts;
// Audio sample rates supported by the codec
const int *supported_samplerates;
// Audio sample formats supported by the codec
const enum AVSampleFormat *sample_fmts;
// Channel layouts supported by the codec
const uint64_t *channel_layouts;
// Maximum lowres value supported by the decoder
uint8_t max_lowres;
const AVClass *priv_class;
const AVProfile *profiles;
const char *wrapper_name;
int priv_data_size;
struct AVCodec *next;
int (*init_thread_copy)(AVCodecContext *);
int (*update_thread_context)(AVCodecContext *dst, const AVCodecContext *src);
const AVCodecDefault *defaults;
// Called when avcodec_register() is invoked,
// used to initialize the codec's static data
void (*init_static_data)(struct AVCodec *codec);
// Initialization
int (*init)(AVCodecContext *);
int (*encode_sub)(AVCodecContext *, uint8_t *buf, int buf_size,
const struct AVSubtitle *sub);
/**
 * Store the encoded data in an AVPacket
 * @param avctx the codec context
 * @param avpkt the output AVPacket
 * @param[in] frame the frame to encode
 * @param[out] got_packet_ptr set by the encoder to 0 or 1 to indicate that a non-empty packet was returned in avpkt
 * @return 0 on success
 */
int (*encode2)(AVCodecContext *avctx, AVPacket *avpkt, const AVFrame *frame,
int *got_packet_ptr);
// Decode operation
int (*decode)(AVCodecContext *, void *outdata, int *outdata_size, AVPacket *avpkt);
// Close the codec
int (*close)(AVCodecContext *);
// Encode API with decoupled packet/frame dataflow
int (*send_frame)(AVCodecContext *avctx, const AVFrame *frame);
int (*receive_packet)(AVCodecContext *avctx, AVPacket *avpkt);
// Decode API with decoupled packet/frame dataflow
int (*receive_frame)(AVCodecContext *avctx, AVFrame *frame);
// Flush the buffers; called on seeking operations
void (*flush)(AVCodecContext *);
...
} AVCodec;
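AVCodec instances are looked up rather than created by hand. A small sketch of the lookup helpers, assuming the FFmpeg 4.x API and that libx264 was compiled in (list_codec_info is a hypothetical name):

#include <stdio.h>
#include <libavcodec/avcodec.h>

void list_codec_info(void) {
    // Look up a decoder by ID and an encoder by its exact name
    AVCodec *dec = avcodec_find_decoder(AV_CODEC_ID_H264);
    AVCodec *enc = avcodec_find_encoder_by_name("libx264"); // NULL if not compiled in

    if (dec)
        printf("decoder: %s (%s)\n", dec->name, dec->long_name);
    // pix_fmts is an AV_PIX_FMT_NONE-terminated list of supported pixel formats
    if (enc && enc->pix_fmts)
        printf("first pix_fmt of %s: %d\n", enc->name, enc->pix_fmts[0]);
}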
(2) AVCodecContext structure
You may notice that, in addition to AVCodec, another structure appears very frequently in the member functions of AVCodec. It is safe to say that most codec-related functions take it as a parameter: that structure is AVCodecContext. The AVCodecContext structure stores information about the codec used by a video or audio stream, such as codec_type, the type of the codec, and codec, the codec in use. The AVCodecContext structure is as follows:
typedef struct AVCodecContext {
enum AVMediaType codec_type; /* Type of codec (video, audio...) */
const struct AVCodec *codec; // The codec in use (H.264, MPEG-2...)
enum AVCodecID codec_id; /* see AV_CODEC_ID_xxx */
// Bit rate (average bit rate of the audio or video)
int64_t bit_rate;
// Compression level of the encoder
int compression_level;
// Extra data needed by specific codecs (e.g. SPS, PPS, etc. for an H.264 decoder)
uint8_t *extradata;
int extradata_size;
// Time base
// With this parameter, a pts can be converted to real time (in seconds)
AVRational time_base;
// Image width and height, video only
int width, height;
// Pixel format, video only
enum AVPixelFormat pix_fmt;
// Get the pixel format
enum AVPixelFormat (*get_format)(struct AVCodecContext *s, const enum AVPixelFormat * fmt);
// Maximum number of B-frames between non-B-frames
int max_b_frames;
// Qscale factor between I/P-frames and B-frames
float b_quant_factor;
// Sample aspect ratio
AVRational sample_aspect_ratio;
// Number of audio samples per frame
int frame_size;
// Audio channel layout
uint64_t channel_layout;
// Frame rate
AVRational framerate;
...
} AVCodecContext;
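Putting AVCodec and AVCodecContext together, the decoupled send/receive API shown in the AVCodec callbacks above is used from the application side like this minimal sketch, assuming the FFmpeg 4.x API (decode_packet is a hypothetical helper; error paths shortened):

#include <libavcodec/avcodec.h>

int decode_packet(AVCodecContext *dec_ctx, const AVPacket *pkt, AVFrame *frame) {
    // Feed one compressed packet to the decoder (pkt == NULL flushes it)
    int ret = avcodec_send_packet(dec_ctx, pkt);
    if (ret < 0)
        return ret;
    // One packet may yield zero or more frames
    while ((ret = avcodec_receive_frame(dec_ctx, frame)) >= 0) {
        // frame->width/height, frame->format, frame->pts are now valid
        av_frame_unref(frame);
    }
    return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
}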
4. Data-related structures
(1) AVStream structure
The AVStream structure is used to store information about a video or audio stream. The field nb_frames indicates how many frames the stream contains, the field duration indicates the length of the stream, the field index is the index of this (audio or video) stream within the container, and so on.
typedef struct AVStream {
// Index of this video/audio stream in AVFormatContext.streams
int index; /**< stream index in AVFormatContext */
// Points to the AVCodecContext of the video/audio stream
// @deprecated use the codecpar struct instead
AVCodecContext *codec;
// Time base. This value can be used to convert pts and dts to real time
AVRational time_base;
// Length of the video/audio stream
int64_t duration;
// Number of frames in this video/audio stream
int64_t nb_frames;
// Metadata information
AVDictionary *metadata;
// Frame rate (important for video)
AVRational avg_frame_rate;
// Attached picture; for example, some MP3 and AAC audio files carry an album cover
AVPacket attached_pic;
...
// Codec parameters associated with the video or audio stream
// Allocated by avformat_new_stream() and freed by avformat_free_context()
AVCodecParameters *codecpar;
} AVStream;
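A common pattern combines AVStream with the codec structures: locate the desired stream, then open a decoder from its codecpar. A minimal sketch, assuming the FFmpeg 4.x API (open_video_decoder is a hypothetical helper):

#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>

AVCodecContext *open_video_decoder(AVFormatContext *fmt_ctx) {
    AVCodec *dec = NULL;
    // Returns the index into fmt_ctx->streams[] and the matching decoder
    int idx = av_find_best_stream(fmt_ctx, AVMEDIA_TYPE_VIDEO, -1, -1, &dec, 0);
    if (idx < 0)
        return NULL;

    AVStream *st = fmt_ctx->streams[idx];
    AVCodecContext *dec_ctx = avcodec_alloc_context3(dec);
    if (!dec_ctx)
        return NULL;
    // Copy width/height/extradata/... from the stream into the codec context
    if (avcodec_parameters_to_context(dec_ctx, st->codecpar) < 0 ||
        avcodec_open2(dec_ctx, dec, NULL) < 0) {
        avcodec_free_context(&dec_ctx);
        return NULL;
    }
    return dec_ctx;
}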
(2) AVPacket structure
The AVPacket structure is used to store information related to compressed (encoded) video or audio data. The field stream_index indicates whether the AVPacket belongs to the audio stream or the video stream. Taking H.264 as an example, the data of one AVPacket usually corresponds to one NAL unit; for video, one packet typically holds one compressed frame. The AVPacket structure source code is as follows:
typedef struct AVPacket {
AVBufferRef *buf;
/** * Presentation timestamp in AVStream->time_base units; the time at which * the decompressed packet will be presented to the user. */
// Presentation timestamp
int64_t pts;
/** * Decompression timestamp in AVStream->time_base units; the time at which * the packet is decompressed. */
// Decompression timestamp
int64_t dts;
// Compress encoded video or audio data
uint8_t *data;
// Size of data
int size;
// Indicates whether the AVPacket belongs to an audio stream or a video stream
int stream_index;
int flags;
AVPacketSideData *side_data;
int side_data_elems;
/** * Duration of this packet in AVStream->time_base units, 0 if unknown. * Equals next_pts - this_pts in presentation order. */
// Duration of this AVPacket
int64_t duration;
// The byte position of the AVPacket in the stream. -1 indicates unknown
int64_t pos;
} AVPacket;
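Because pts, dts, and duration are expressed in AVStream->time_base units, converting them to seconds requires the owning stream's time base; a small sketch, assuming the FFmpeg 4.x API (print_packet_time is a hypothetical name):

#include <stdio.h>
#include <libavformat/avformat.h>
#include <libavutil/rational.h>

void print_packet_time(const AVFormatContext *fmt_ctx, const AVPacket *pkt) {
    AVRational tb = fmt_ctx->streams[pkt->stream_index]->time_base;
    if (pkt->pts != AV_NOPTS_VALUE) // pts may be unset for some streams
        printf("stream %d pts: %.3f s\n", pkt->stream_index,
               pkt->pts * av_q2d(tb)); // av_q2d() turns the rational into a double
}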
(3) AVFrame structure
The AVFrame structure is used to store information related to decoded video/audio data; it represents one frame of data. For a video frame, the data array holds the image, the width and height fields are the width and height of the image, key_frame is the key frame flag, and so on. For audio, the data array holds the audio samples (one AVFrame may contain multiple audio frames), sample_rate is the audio sample rate, channels is the number of audio channels, and so on. The AVFrame structure source code is as follows:
typedef struct AVFrame {
// Decoded raw data (video - YUV or RGB; audio - PCM)
// For packed data (e.g. RGB24), everything is stored in data[0]
// For planar data (e.g. YUV420P), the Y plane is stored in data[0], U in data[1], and V in data[2]
uint8_t *data[AV_NUM_DATA_POINTERS];
// Length of one row of data
// Note: it is not necessarily equal to the image width; it is usually larger than the width
int linesize[AV_NUM_DATA_POINTERS];
// Width and height of the video frame
int width, height;
// Number of audio samples (per channel) contained in this AVFrame
int nb_samples;
// Format of the raw data after decoding, such as YUV420, RGB...
// For audio, see AVSampleFormat
// For video, see AVPixelFormat
int format;
// Whether this is a key frame; very important for video
// 1 -> key frame, 0 -> not
int key_frame;
// Frame type, e.g. I-frame, B-frame, P-frame...
enum AVPictureType pict_type;
// Aspect ratio of the video frame, e.g. 16:9, 4:3...
AVRational sample_aspect_ratio;
// Presentation timestamp
int64_t pts;
// Coded picture number
int coded_picture_number;
// Display picture number
int display_picture_number;
// Audio sample rate
int sample_rate;
// Audio channel layout
uint64_t channel_layout;
// YUV color space type
enum AVColorSpace colorspace;
// Metadata
AVDictionary *metadata;
// Number of audio channels
int channels;
...
} AVFrame;
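As an illustration of the data/linesize layout described above, the following sketch writes one decoded YUV420P frame to a raw file row by row, honoring the linesize padding (write_yuv420p is a hypothetical helper; it assumes frame->format is AV_PIX_FMT_YUV420P):

#include <stdio.h>
#include <libavutil/frame.h>

void write_yuv420p(const AVFrame *frame, FILE *out) {
    // Plane 0 is Y at full resolution; planes 1 and 2 are U and V at half width/height
    for (int p = 0; p < 3; p++) {
        int w = (p == 0) ? frame->width  : frame->width  / 2;
        int h = (p == 0) ? frame->height : frame->height / 2;
        for (int y = 0; y < h; y++) // linesize[p] may be larger than w (row padding)
            fwrite(frame->data[p] + y * frame->linesize[p], 1, w, out);
    }
}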
At this point, we have basically combed through the most important structures in the FFmpeg framework. Finally, we borrow once more the FFmpeg key-structure relationship diagram by the "Thor" of the FFmpeg community (Lei Xiaohua) as the ending: first, so that this article echoes from beginning to end; and second, to pay tribute to the master!
GitHub practical project: https://github.com/jiangdongguo/FFMPEG4Android