Requirement
Use FFmpeg to parse audio and video streams. The streams can come from a standard RTMP URL or from a file. The streams are parsed and decoded; the video is then rendered on the screen and the audio is played through the speaker.
Implementation principle
The libavformat module in the FFmpeg framework can parse an audio/video stream into compressed frames through the function `av_read_frame`. If you decode with FFmpeg directly, parsing down to an `AVPacket` and handing it to the decoding module is enough. If you decode with VideoToolbox, then for video data you also need to extract the (VPS,) SPS and PPS from the NALU headers for later use.
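For orientation: once stream information has been probed (via `avformat_find_stream_info`, covered below), the (VPS,) SPS and PPS bytes live in the stream's `extradata` field. A minimal sketch of where to find them:

// extradata holds the avcC/hvcC box containing (VPS,) SPS and PPS.
AVCodecParameters *par = formatContext->streams[videoStreamIndex]->codecpar;
uint8_t *extraData = par->extradata;
int      extraSize = par->extradata_size;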
Prerequisites for reading:
- Build FFmpeg environment in iOS
- FFmpeg basics
- Fundamentals of Audio and Video
GitHub address (with code): iOS Parse
Juejin address: iOS Parse
Jianshu address: iOS Parse
Blog address: iOS Parse
Basic flow
Usage flow
- Initialize the parser: `- (instancetype)initWithPath:(NSString *)path;`
- Start parsing: `startParseWithCompletionHandler`
- Get the parsed data: the block passed to `startParseWithCompletionHandler` in the previous step receives the parsed audio and video data (a sketch of this interface follows below).
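A plausible shape for that interface, inferred from the `handler(...)` calls shown later in this article (the exact block signature is an assumption, not copied from the project):

// Hypothetical interface sketch; argument order inferred from the handler calls below.
@interface XDXAVParseHandler : NSObject

- (instancetype)initWithPath:(NSString *)path;

- (void)startParseWithCompletionHandler:(void (^)(BOOL isVideoFrame,
                                                  BOOL isFinish,
                                                  struct XDXParseVideoDataInfo *videoInfo,
                                                  struct XDXParseAudioDataInfo *audioInfo))handler;

@end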
FFmpeg parsing flow
- Create the format context: `avformat_alloc_context`
- Open the input stream: `avformat_open_input`
- Find stream information: `avformat_find_stream_info`
- Get the index of the audio/video stream: `formatContext->streams[i]->codecpar->codec_type == (isVideoStream ? AVMEDIA_TYPE_VIDEO : AVMEDIA_TYPE_AUDIO)`
- Get the audio/video stream: `m_formatContext->streams[m_audioStreamIndex]`
- Read audio/video frames: `av_read_frame`
- Get the extra data (SPS/PPS): `av_bitstream_filter_filter` (a minimal sketch tying these calls together follows this list)
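Before walking through each step, here is a minimal sketch (assuming a local `path` C string; error handling omitted) of how these calls chain together:

// Minimal sketch of the overall parse loop; error handling omitted.
AVFormatContext *ctx = avformat_alloc_context();
avformat_open_input(&ctx, path, NULL, NULL);   // open the file or RTMP URL
avformat_find_stream_info(ctx, NULL);          // probe stream information

AVPacket packet;
av_init_packet(&packet);
while (av_read_frame(ctx, &packet) >= 0) {     // one compressed frame per call
    // packet.stream_index identifies the audio or video stream
    av_packet_unref(&packet);
}
avformat_close_input(&ctx);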
Specific steps
1. Import the FFmpeg framework into the project
The link below contains detailed steps for setting up the FFmpeg environment required on iOS; you can read it first.
Building FFmpeg for iOS
After importing the FFmpeg framework, rename any file that uses FFmpeg to the `.mm` extension, because the code mixes Objective-C with C/C++.
Then import the FFmpeg headers in that file:
// FFmpeg Header File
#ifdef __cplusplus
extern "C" {
#endif
#include "libavformat/avformat.h"
#include "libavcodec/avcodec.h"
#include "libavutil/avutil.h"
#include "libswscale/swscale.h"
#include "libswresample/swresample.h"
#include "libavutil/opt.h"
#ifdef __cplusplus
};
#endif
Note: FFmpeg is a widely used framework with a complex structure. Generally it is imported in the format above, with the folder name as the root directory of the import. For details, refer to the link above.
2. Initialization
2.1 Register FFmpeg
void av_register_all(void);
Initializes libavformat and registers all muxers, demuxers, and protocols. If you do not call this function, you can instead register only the specific formats you want to support. (In FFmpeg 4.0 and later this function is deprecated and no longer necessary.)
Initialize FFmpeg in the program's `main` function, or in the app delegate's startup method `- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions`.
av_register_all();
2.2 Create the format context from the video file
- `avformat_alloc_context()`: allocates and initializes an `AVFormatContext`.
- `int avformat_open_input(AVFormatContext **ps, const char *url, AVInputFormat *fmt, AVDictionary **options)`: opens an input stream and reads its header. The `fmt` parameter forces a specific input format if non-NULL; passing NULL selects the format automatically.
- `int avformat_find_stream_info(AVFormatContext *ic, AVDictionary **options);`: reads packets from the media file to obtain stream information.
- (AVFormatContext *)createFormatContextbyFilePath:(NSString *)filePath {
    if (filePath == nil) {
        log4cplus_error(kModuleName, "%s: file path is NULL",__func__);
        return NULL;
    }

    AVFormatContext *formatContext = NULL;
    AVDictionary    *opts          = NULL;

    av_dict_set(&opts, "timeout", "1000000", 0); // Set a timeout of 1 second (value in microseconds)

    formatContext = avformat_alloc_context();
    BOOL isSuccess = avformat_open_input(&formatContext, [filePath cStringUsingEncoding:NSUTF8StringEncoding], NULL, &opts) < 0 ? NO : YES;
    av_dict_free(&opts);
    if (!isSuccess) {
        if (formatContext) {
            avformat_free_context(formatContext);
        }
        return NULL;
    }

    if (avformat_find_stream_info(formatContext, NULL) < 0) {
        avformat_close_input(&formatContext);
        return NULL;
    }

    return formatContext;
}
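In the parser this method might be used like the following (a sketch; `m_formatContext` and `path` follow this project's naming):

m_formatContext = [self createFormatContextbyFilePath:path];
if (m_formatContext == NULL) {
    log4cplus_error(kModuleName, "%s: create format context failed", __func__);
    return;
}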
2.3. Get the index of the Audio/Video stream.
You can find the audio or video stream index for later use by traversing the `streams` array (its length is `nb_streams`) in the format context.
Note: you only need the audio and video stream indexes to quickly read the corresponding stream's information from the format context.
- (int)getAVStreamIndexWithFormatContext:(AVFormatContext *)formatContext isVideoStream:(BOOL)isVideoStream {
    int avStreamIndex = -1;
    for (int i = 0; i < formatContext->nb_streams; i++) {
        if ((isVideoStream ? AVMEDIA_TYPE_VIDEO : AVMEDIA_TYPE_AUDIO) == formatContext->streams[i]->codecpar->codec_type) {
            avStreamIndex = i;
        }
    }

    if (avStreamIndex == -1) {
        log4cplus_error(kModuleName, "%s: Not find video stream",__func__);
        return -1; // was `return NULL` in the original; -1 signals "not found" for an int return value
    } else {
        return avStreamIndex;
    }
}
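The indexes would then typically be cached for later lookups (a sketch following this project's naming):

m_videoStreamIndex = [self getAVStreamIndexWithFormatContext:m_formatContext isVideoStream:YES];
m_audioStreamIndex = [self getAVStreamIndexWithFormatContext:m_formatContext isVideoStream:NO];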
2.4 Check whether the audio/video streams are supported
Currently, only the H.264 and H.265 video encoding formats are supported. In practice, decoded video may carry different rotation angles, and different device models support different decodable formats, so you can use this method to manually filter out unsupported cases. Please download the code for the details; only the supported combinations measured in practice are listed here.
/* Maximum resolution / FPS combinations supported by each model:
   iPhone 6S:  60fps -> 720p,  30fps -> 4K
   iPhone 7P:  60fps -> 1080p, 30fps -> 4K
   iPhone 8:   60fps -> 1080p, 30fps -> 4K
   iPhone 8P:  60fps -> 1080p, 30fps -> 4K
   iPhone X:   60fps -> 1080p, 30fps -> 4K
   iPhone XS:  60fps -> 1080p, 30fps -> 4K
*/
For audio, this example only supports the AAC format. Other formats can be added as needed.
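A minimal sketch of such a codec check (an illustration only; the real filtering in the sample project is more involved):

// Hypothetical support check based on codec IDs.
enum AVCodecID videoCodec = formatContext->streams[m_videoStreamIndex]->codecpar->codec_id;
enum AVCodecID audioCodec = formatContext->streams[m_audioStreamIndex]->codecpar->codec_id;
BOOL videoSupported = (videoCodec == AV_CODEC_ID_H264 || videoCodec == AV_CODEC_ID_HEVC);
BOOL audioSupported = (audioCodec == AV_CODEC_ID_AAC);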
3. Start parsing
- Initialize an AVPacket to store the parsed data
The AVPacket structure stores compressed data: for video it usually contains one compressed frame, for audio it may contain several. The structure is allocated with `av_malloc()`, copied with `av_packet_ref()`, and freed with `av_packet_unref()`.
AVPacket packet;
av_init_packet(&packet);
- Parse the data
`int av_read_frame(AVFormatContext *s, AVPacket *pkt);`: this function returns what is stored in the file and does not validate that it contains valid frames for the decoder. It splits the file contents into frames and returns one per call. It does not omit invalid data between valid frames, so that the decoder gets the maximum possible information for decoding.
int size = av_read_frame(formatContext, &packet);
if (size < 0 || packet.size < 0) {
    handler(YES, YES, NULL, NULL);
    log4cplus_error(kModuleName, "%s: Parse finish",__func__);
    break;
}
- Obtain NALU headers such as SPS and PPS
By calling `av_bitstream_filter_filter`, NALU headers such as SPS and PPS can be extracted from the stream.
`av_bitstream_filter_init`: creates and initializes a bitstream filter context given the name of a bitstream filter.
`av_bitstream_filter_filter`: filters the data in the `buf` parameter (of size `buf_size`) and places the filtered result in the buffer pointed to by `poutbuf`. The output buffer must be freed by the caller.
attribute_deprecated int av_bitstream_filter_filter(
    AVBitStreamFilterContext *bsfc,
    AVCodecContext *avctx,
    const char *args,
    uint8_t **poutbuf,      // receives the filtered output buffer
    int *poutbuf_size,      // receives the size of the filtered output buffer
    const uint8_t *buf,     // the raw data provided to the filter
    int buf_size,           // the size of the raw data provided to the filter
    int keyframe            // set to non-zero if the buffer being filtered belongs to a keyframe packet
);
Note: `new_packet` is used below to avoid the memory leak caused by `av_bitstream_filter_filter`: the buffer it allocates is released through `new_packet` after each use (see `av_free(new_packet.data)` later).
if (packet.stream_index == videoStreamIndex) {
    static char filter_name[32];
    if (formatContext->streams[videoStreamIndex]->codecpar->codec_id == AV_CODEC_ID_H264) {
        strncpy(filter_name, "h264_mp4toannexb", 32);
        videoInfo.videoFormat = XDXH264EncodeFormat;
    } else if (formatContext->streams[videoStreamIndex]->codecpar->codec_id == AV_CODEC_ID_HEVC) {
        strncpy(filter_name, "hevc_mp4toannexb", 32);
        videoInfo.videoFormat = XDXH265EncodeFormat;
    } else {
        break;
    }

    AVPacket new_packet = packet;
    if (self->m_bitFilterContext == NULL) {
        self->m_bitFilterContext = av_bitstream_filter_init(filter_name);
    }
    av_bitstream_filter_filter(self->m_bitFilterContext, formatContext->streams[videoStreamIndex]->codec, NULL, &new_packet.data, &new_packet.size, packet.data, packet.size, 0);
}
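Note that `av_bitstream_filter_filter` is deprecated in newer FFmpeg releases. If you build against FFmpeg 4.x or later, the replacement is the `AVBSFContext` API; a rough sketch (not from the sample project):

// Modern bitstream-filter API; error handling omitted.
const AVBitStreamFilter *bsf = av_bsf_get_by_name("h264_mp4toannexb");
AVBSFContext *bsfCtx = NULL;
av_bsf_alloc(bsf, &bsfCtx);
avcodec_parameters_copy(bsfCtx->par_in, formatContext->streams[videoStreamIndex]->codecpar);
av_bsf_init(bsfCtx);

AVPacket filtered;
av_init_packet(&filtered);
av_bsf_send_packet(bsfCtx, &packet);                // feed the raw packet
while (av_bsf_receive_packet(bsfCtx, &filtered) == 0) {
    // filtered.data now holds Annex B formatted NALUs
    av_packet_unref(&filtered);
}
av_bsf_free(&bsfCtx);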
- Generate timestamps based on specific rules
You can customize the timestamp generation rules to your needs. Here the timestamp is generated from the current system timestamp plus the packet's own PTS/DTS.
CMSampleTimingInfo timingInfo;
CMTime presentationTimeStamp = kCMTimeInvalid;
presentationTimeStamp = CMTimeMakeWithSeconds(current_timestamp + packet.pts * av_q2d(formatContext->streams[videoStreamIndex]->time_base), fps);
timingInfo.presentationTimeStamp = presentationTimeStamp;
timingInfo.decodeTimeStamp = CMTimeMakeWithSeconds(current_timestamp + av_rescale_q(packet.dts, formatContext->streams[videoStreamIndex]->time_base, input_base), fps);
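The snippet above relies on `fps`, `current_timestamp`, and `input_base` being prepared elsewhere in the project. Plausible definitions (assumptions for illustration, not the project's exact code):

int fps = (int)av_q2d(formatContext->streams[videoStreamIndex]->avg_frame_rate); // average frame rate of the stream
Float64 current_timestamp = [[NSDate date] timeIntervalSince1970];               // wall-clock base time
AVRational input_base = av_make_q(1, fps);                                       // assumed target time base for DTS rescaling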
- Get the parsed video data
This example puts the parsed data into a custom structure and passes it to the method caller via a block callback, where the caller can process the parsed video data.
struct XDXParseVideoDataInfo {
    uint8_t              *data;
    int                  dataSize;
    uint8_t              *extraData;
    int                  extraDataSize;
    Float64              pts;
    Float64              time_base;
    int                  videoRotate;
    int                  fps;
    CMSampleTimingInfo   timingInfo;
    XDXVideoEncodeFormat videoFormat;
};

...

videoInfo.data          = video_data;
videoInfo.dataSize      = video_size;
videoInfo.extraDataSize = formatContext->streams[videoStreamIndex]->codec->extradata_size;
videoInfo.extraData     = (uint8_t *)malloc(videoInfo.extraDataSize);
videoInfo.timingInfo    = timingInfo;
videoInfo.pts           = packet.pts * av_q2d(formatContext->streams[videoStreamIndex]->time_base);
videoInfo.fps           = fps;

memcpy(videoInfo.extraData, formatContext->streams[videoStreamIndex]->codec->extradata, videoInfo.extraDataSize);
av_free(new_packet.data);

// send video info
if (handler) {
    handler(YES, NO, &videoInfo, NULL);
}

free(videoInfo.extraData);
free(videoInfo.data);
- Get the parsed audio data

struct XDXParseAudioDataInfo {
    uint8_t     *data;
    int         dataSize;
    int         channel;
    int         sampleRate;
    Float64     pts;
};

...

if (packet.stream_index == audioStreamIndex) {
    XDXParseAudioDataInfo audioInfo = {0};
    audioInfo.data = (uint8_t *)malloc(packet.size);
    memcpy(audioInfo.data, packet.data, packet.size);
    audioInfo.dataSize   = packet.size;
    audioInfo.channel    = formatContext->streams[audioStreamIndex]->codecpar->channels;
    audioInfo.sampleRate = formatContext->streams[audioStreamIndex]->codecpar->sample_rate;
    audioInfo.pts        = packet.pts * av_q2d(formatContext->streams[audioStreamIndex]->time_base);

    // send audio info
    if (handler) {
        handler(NO, NO, NULL, &audioInfo);
    }

    free(audioInfo.data);
}
- Release the packet
Since the key data in the packet has been copied into our custom structure, the packet must be released after use.
av_packet_unref(&packet);
- Release related resources after parsing finishes
- (void)freeAllResources {
    if (m_formatContext) {
        avformat_close_input(&m_formatContext);
        m_formatContext = NULL;
    }

    if (m_bitFilterContext) {
        av_bitstream_filter_close(m_bitFilterContext);
        m_bitFilterContext = NULL;
    }
}
Note: If you decode with FFmpeg itself (rather than VideoToolbox), only the AVPacket data is needed; there is no need to copy the data into a custom structure.
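For that FFmpeg decoding path, handing the packet to the decoder might look like this (a sketch using the modern send/receive API; the `codecContext` setup via `avcodec_open2` is assumed and omitted):

// Assumes codecContext was opened for this stream with avcodec_open2.
avcodec_send_packet(codecContext, &packet);
AVFrame *frame = av_frame_alloc();
while (avcodec_receive_frame(codecContext, frame) == 0) {
    // frame now holds a decoded audio or video frame
}
av_frame_free(&frame);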
4. External call
After the above operations are complete, the parsed data can be obtained through the block below. Usually you would go on to decode the audio and video; stay tuned for more on this later.
XDXAVParseHandler *parseHandler = [[XDXAVParseHandler alloc] initWithPath:path];
[parseHandler startParseGetAVPackeWithCompletionHandler:^(BOOL isVideoFrame, BOOL isFinish, AVPacket packet) {
    if (isFinish) {
        // parse finished
        ...
        return;
    }

    if (isVideoFrame) {
        // decode video
        ...
    } else {
        // decode audio
        ...
    }
}];