FFmpeg module classification

Open the FFmpeg source tree and you’ll find a series of libav* modules that define the structure and division of the code:

  • libavformat: container formats, muxing and demuxing
  • libavcodec: codecs, encoding and decoding
  • libavutil: common audio/video utilities, such as pixel formats, I/O, and time helpers
  • libavfilter: filters, used for audio and video effects processing
  • libavdevice: device input and output
  • libswscale: video image scaling and pixel format conversion
  • libavresample: audio resampling (deprecated, superseded by libswresample)
  • libswresample: audio resampling, the audio counterpart of libswscale
  • libpostproc: post-processing

The first three, libavformat, libavcodec, and libavutil, are the most important for getting started; they are the foundational libraries. Let’s take a look at their basic structure.

The Context of FFmpeg

If you browse the FFmpeg code, you’ll quickly notice a family of structures that share a naming convention: XxxxContext.

  • AVFormatContext
  • AVCodecContext
  • AVCodecParserContext
  • AVIOContext
  • AVFilterContext

Of course there are many more contexts; this is just a list of the typical ones. At first glance the naming looks object-oriented, and that is no accident. A Context is the object that holds state while data moves along the processing pipeline; FFmpeg is applying object-oriented thinking in C. An XxxxContext can be thought of as the C implementation of a “class”: C has no class syntax, but a struct can describe a set of member variables, and free functions play the role of constructor and destructor. Here’s a simple example:

struct AVFormatContext {
    struct AVInputFormat  *iformat;   /* the demuxer in use */
    struct AVOutputFormat *oformat;   /* the muxer in use */
    /* ... many more fields ... */
};

AVFormatContext *avformat_alloc_context(void);   /* the "constructor" */
void avformat_free_context(AVFormatContext *s);  /* the "destructor" */

In C++, the same design would be written as:
class AVFormatContext {
private:
    AVInputFormat  *iformat;
    AVOutputFormat *oformat;

public:
    AVFormatContext();   // plays the role of avformat_alloc_context()
    ~AVFormatContext();  // plays the role of avformat_free_context()
};

In fact, FFmpeg’s XxxxContext structures are designed along object-oriented lines; readers familiar with object orientation should find the naming natural.

AVFormatContext

AVFormatContext is the structure you need in order to open a media file in FFmpeg. As mentioned before, the container format is a core audio/video concept, so you will work with AVFormatContext constantly: the demuxer and the muxer are not operated directly, but through AVFormatContext.

There are three common kinds of AVFormatContext operations:

  • Generic functions such as create and destroy, the equivalents of C++ constructors and destructors.
  • Read operations on an input stream, i.e. input handling: operating on a stream through the Demuxer is a read operation.
  • Write operations on an output stream, i.e. output handling: operating on a stream through the Muxer is a write operation.

The iformat field corresponds to AVInputFormat and oformat to AVOutputFormat. The distinction between AVFormatContext and AVInputFormat/AVOutputFormat is this: AVFormatContext holds the data passed along the whole processing path, state that exists for and can be reused across the entire flow, while AVInputFormat/AVOutputFormat contain the actions, i.e. how to parse or write the data.

The AVStream **streams field holds the streams contained in the media file; a file can contain several streams, such as audio, video, and subtitles.

  • avformat_alloc_context() creates an AVFormatContext for an input media file
  • avformat_alloc_output_context2() creates an AVFormatContext for an output media file
  • av_dump_format() prints the format details
  • avformat_open_input() opens a media file and detects its container format
  • avformat_close_input() closes the media file and frees the context
  • avformat_find_stream_info() probes the file for its streams: how many there are, and the basic information about each
  • av_read_frame() reads one packet at a time from the media file, i.e. the compressed data before decoding
  • avformat_write_header() writes the output file’s header
  • av_interleaved_write_frame() writes a packet to the output file, taking care of correct interleaving between streams
  • av_write_uncoded_frame() writes an uncoded (raw) frame to the output file
  • av_write_frame() writes a packet to the output file as-is, leaving interleaving to the caller
  • av_write_trailer() writes the output file’s trailer

Using AVFormatContext is mainly about reading and writing media files. The basic flows are as follows:

Video reading process:

  • 1. Create the AVFormatContext

AVFormatContext *ifmt_ctx = avformat_alloc_context()

  • 2. Open the video file

avformat_open_input(&ifmt_ctx, in_filename, 0, 0)

  • 3. Continuously read video frames

while(…) { av_read_frame(ifmt_ctx, &pkt) }

  • 4. Close the AVFormatContext

avformat_close_input(&ifmt_ctx)
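The four steps assemble into a minimal read loop roughly like the following sketch (error handling mostly trimmed; it needs the FFmpeg development headers and a real input file):

```c
#include <libavformat/avformat.h>

int main(int argc, char **argv)
{
    const char *in_filename = argv[1];
    AVFormatContext *ifmt_ctx = NULL;
    AVPacket *pkt = av_packet_alloc();

    /* 1-2. Open the file; this probes the container format. Passing a
       NULL context makes avformat_open_input() allocate one itself. */
    if (avformat_open_input(&ifmt_ctx, in_filename, NULL, NULL) < 0)
        return 1;
    avformat_find_stream_info(ifmt_ctx, NULL);
    av_dump_format(ifmt_ctx, 0, in_filename, 0);

    /* 3. Read compressed packets until EOF. */
    while (av_read_frame(ifmt_ctx, pkt) >= 0) {
        /* ... hand pkt to a decoder here ... */
        av_packet_unref(pkt);   /* release the packet buffer each loop */
    }

    /* 4. Close the file; this also frees the context. */
    avformat_close_input(&ifmt_ctx);
    av_packet_free(&pkt);
    return 0;
}
```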

Video writing process:

  • 1. Create an output context

avformat_alloc_output_context2(&ofmt_ctx, NULL, NULL, out_filename)

  • 2. Write the format header

avformat_write_header(ofmt_ctx, NULL)

  • 3. Continuously output frames

while(…) { av_interleaved_write_frame(ofmt_ctx, &pkt) }

  • 4. Write the end of the format

av_write_trailer(ofmt_ctx)

  • 5. Close the context

avformat_free_context(ofmt_ctx)
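The write side, sketched here under the assumption of a remux (packets coming from an already-opened input; out.mp4 is an illustrative name). Note two steps the list above leaves implicit: streams must be added to the context, and the output file must be opened before the header is written:

```c
#include <libavformat/avformat.h>

int main(void)
{
    const char *out_filename = "out.mp4";   /* illustrative name */
    AVFormatContext *ofmt_ctx = NULL;

    /* 1. Create the output context; the muxer is guessed from the
       file name extension. */
    avformat_alloc_output_context2(&ofmt_ctx, NULL, NULL, out_filename);

    /* A stream must be added for each stream you intend to write; in
       a remux, its codec parameters would be copied from the input
       stream with avcodec_parameters_copy(). */
    AVStream *st = avformat_new_stream(ofmt_ctx, NULL);
    (void)st;

    /* Unless the muxer handles I/O itself (AVFMT_NOFILE), the output
       file must be opened explicitly before writing the header. */
    if (!(ofmt_ctx->oformat->flags & AVFMT_NOFILE))
        avio_open(&ofmt_ctx->pb, out_filename, AVIO_FLAG_WRITE);

    /* 2-4. Header, packets, trailer. */
    avformat_write_header(ofmt_ctx, NULL);
    /* while (have packets) av_interleaved_write_frame(ofmt_ctx, pkt); */
    av_write_trailer(ofmt_ctx);

    /* 5. Close I/O and free the context. */
    if (!(ofmt_ctx->oformat->flags & AVFMT_NOFILE))
        avio_closep(&ofmt_ctx->pb);
    avformat_free_context(ofmt_ctx);
    return 0;
}
```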

AVInputFormat

The demuxer’s formal structure is AVInputFormat. It is effectively an interface whose job is to open the container format and extract the encoded audio and video inside. In short, an unpacking tool.

Each concrete multimedia format, such as MP4, MP3, or FLV, has its own AVInputFormat implementation for reading.

There are many demuxers, and which ones are built in is configurable; the full list is generated in demuxer_list.c. There are too many to cover, so let’s take the MP4 demuxer as an example.

Here is the demuxer for the MP4/MOV family of formats, ff_mov_demuxer, defined in mov.c:

AVInputFormat ff_mov_demuxer = {
    .name           = "mov,mp4,m4a,3gp,3g2,mj2",
    .long_name      = NULL_IF_CONFIG_SMALL("QuickTime / MOV"),
    .priv_class     = &mov_class,
    .priv_data_size = sizeof(MOVContext),
    .extensions     = "mov,mp4,m4a,3gp,3g2,mj2",
    .read_probe     = mov_probe,
    .read_header    = mov_read_header,
    .read_packet    = mov_read_packet,
    .read_close     = mov_read_close,
    .read_seek      = mov_read_seek,
    .flags          = AVFMT_NO_BYTE_SEEK | AVFMT_SEEK_TO_PTS,
};

We see several function pointers:

  • read_probe

Probe whether the input matches this container format

  • read_header

Read the container header data

  • read_packet

Read one demuxed packet

  • read_close

Close the demuxer

  • read_seek

Seek control within the format

As you can see, AVInputFormat provides interface-like functionality, and ff_mov_demuxer is one concrete implementation of it. The core logic of FFmpeg isn’t really complicated; it’s the richness of supported formats that accounts for most of the code. FFmpeg is easier to understand if you ignore most of the formats and focus on its implementation of a few of them.

AVOutputFormat

The muxer’s corresponding structure is AVOutputFormat. It is also an interface; its job is to package encoded audio and video into a container format. In short, a packaging tool.

Like the demuxer, it has implementations for MP4, MP3, FLV, and other formats; the difference is that a muxer is used for output.

Like demuxers, there are many muxer types; the full list is generated in muxer_list.c. Let’s look at the MP3 muxer, in mp3enc.c:

AVOutputFormat ff_mp3_muxer = {
    .name              = "mp3",
    .long_name         = NULL_IF_CONFIG_SMALL("MP3 (MPEG audio layer 3)"),
    .mime_type         = "audio/mpeg",
    .extensions        = "mp3",
    .priv_data_size    = sizeof(MP3Context),
    .audio_codec       = AV_CODEC_ID_MP3,
    .video_codec       = AV_CODEC_ID_PNG,
    .write_header      = mp3_write_header,
    .write_packet      = mp3_write_packet,
    .write_trailer     = mp3_write_trailer,
    .query_codec       = query_codec,
    .flags             = AVFMT_NOTIMESTAMPS,
    .priv_class        = &mp3_muxer_class,
};

The function pointers here are the inverse of the demuxer’s: write_header, write_packet, and write_trailer produce the container rather than parse it.

AVCodecContext

Just as with AVFormatContext, the Encoder and Decoder are generally not operated directly; to encode or decode, you work through AVCodecContext.

Like demuxers and muxers, codecs are split into decoder and encoder implementations. As an example, look at ff_libx264_encoder in libx264.c:

AVCodec ff_libx264_encoder = {
    .name             = "libx264",
    .long_name        = NULL_IF_CONFIG_SMALL("libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10"),
    .type             = AVMEDIA_TYPE_VIDEO,
    .id               = AV_CODEC_ID_H264,
    .priv_data_size   = sizeof(X264Context),
    .init             = X264_init,
    .encode2          = X264_frame,
    .close            = X264_close,
    .capabilities     = AV_CODEC_CAP_DELAY | AV_CODEC_CAP_AUTO_THREADS |
                        AV_CODEC_CAP_ENCODER_REORDERED_OPAQUE,
    .priv_class       = &x264_class,
    .defaults         = x264_defaults,
    .init_static_data = X264_init_static,
    .caps_internal    = FF_CODEC_CAP_INIT_CLEANUP,
    .wrapper_name     = "libx264",
};

The core entry point is encode2, which here maps to the X264_frame function.
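From the application side, that function pointer is reached through the public AVCodecContext API. A minimal encode loop using the modern send/receive API might look like this sketch (context setup via avcodec_find_encoder_by_name("libx264"), avcodec_alloc_context3(), and avcodec_open2() is omitted):

```c
#include <libavcodec/avcodec.h>

/* Encode one AVFrame and drain any packets the encoder produces.
   Passing frame == NULL flushes the encoder at end of stream. */
static int encode(AVCodecContext *enc_ctx, AVFrame *frame, AVPacket *pkt)
{
    int ret = avcodec_send_frame(enc_ctx, frame);
    if (ret < 0)
        return ret;

    while (ret >= 0) {
        ret = avcodec_receive_packet(enc_ctx, pkt);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
            return 0;           /* encoder needs more input, or is done */
        if (ret < 0)
            return ret;
        /* ... write pkt, e.g. with av_interleaved_write_frame() ... */
        av_packet_unref(pkt);
    }
    return 0;
}
```

Internally, the codec’s registered function pointers (such as libx264’s X264_frame) are what this public API ends up invoking.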

The FFmpeg Parser

The decoder’s input must be a packet containing a complete frame, but both network transmission and file reading generally fill a fixed-size buffer rather than reading exactly one frame of the container format. So a Parser is needed to reassemble the byte stream into frame-sized packets.
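A sketch of how a parser is driven from application code (parser and decoder setup are omitted; av_parser_init(AV_CODEC_ID_H264) would create the parser, and the function name here is illustrative):

```c
#include <libavcodec/avcodec.h>

/* Feed a raw buffer of arbitrary size to the parser; it splits the
   bytes into complete frame-sized packets ready for the decoder. */
static void parse_buffer(AVCodecContext *dec_ctx,
                         AVCodecParserContext *parser,
                         AVPacket *pkt,
                         const uint8_t *data, int size)
{
    while (size > 0) {
        int used = av_parser_parse2(parser, dec_ctx,
                                    &pkt->data, &pkt->size,
                                    data, size,
                                    AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
        data += used;
        size -= used;

        if (pkt->size > 0) {
            /* A complete frame packet is ready, e.g.:
               avcodec_send_packet(dec_ctx, pkt); */
        }
    }
}
```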

The parsers are declared globally in parsers.c, and the generated list lives in parser_list.c. Take a look at ff_h264_parser in h264_parser.c:

AVCodecParser ff_h264_parser = {
    .codec_ids      = { AV_CODEC_ID_H264 },
    .priv_data_size = sizeof(H264ParseContext),
    .parser_init    = init,
    .parser_parse   = h264_parse,
    .parser_close   = h264_close,
    .split          = h264_split,
};

The H264ParseContext structure holds the parsing state for H.264 frame data:

typedef struct H264ParseContext {
    ParseContext pc;
    H264ParamSets ps;
    H264DSPContext h264dsp;
    H264POCContext poc;
    H264SEIContext sei;
    int is_avc;
    int nal_length_size;
    int got_first;
    int picture_structure;
    uint8_t parse_history[6];
    int parse_history_count;
    int parse_last_mb;
    int64_t reference_dts;
    int last_frame_num, last_picture_structure;
} H264ParseContext;

H264ParamSets holds the SPS and PPS (sequence and picture parameter sets), from which the properties of the current frame’s macroblocks can be quickly looked up:

typedef struct H264ParamSets {
    AVBufferRef *sps_list[MAX_SPS_COUNT];
    AVBufferRef *pps_list[MAX_PPS_COUNT];

    AVBufferRef *pps_ref;
    AVBufferRef *sps_ref;
    /* currently active parameters sets */
    const PPS *pps;
    const SPS *sps;
} H264ParamSets;

Summary

  • Learning FFmpeg is a difficult process, but once the structure is combed through, the overall code becomes much clearer. This article does not cover libavfilter and the other core modules; the filter module is large and customizable, and will be covered separately later.
  • Learning FFmpeg by doing will make progress faster. Here are some practical project ideas:
  • FFmpeg code structure
  • FFmpeg cross-compilation
  • FFmpeg demuxing (decapsulation)
  • FFmpeg remuxing
  • FFmpeg decoding
  • FFmpeg separating audio and video streams