Recently, during development I ran into a problem: the key frames of the H.264 stream inside an MP4 file could not be extracted for processing, and the AAC audio saved to a local file would not play. Debugging showed that the H.264 and AAC obtained by demultiplexing the MP4 are elementary streams (ES): the video lacks the start codes and SPS/PPS needed for decoding, and the audio lacks ADTS headers. Although the earlier article "Android live development tour (3): AAC encoding format analysis and MP4 file packaging" gave a brief introduction to MP4, to get to the root of this problem this article first elaborates the MP4 packaging rules, and then presents solutions to the problems above together with a practical example.

1. MP4 format analysis

1.1 Introduction to MP4

The MP4 container format is defined based on the QuickTime container format: media description and media data are kept separate. It is currently widely used to package H.264 video and AAC audio, and is representative of HD video (HDV). All data in an MP4 file is encapsulated in boxes (corresponding to Atoms in QuickTime). That is, an MP4 file is made up of several boxes, each of which has a length and a type, and each box can contain further sub-boxes. The basic structure of a box is shown below:

As the figure above shows, the basic structure of a box consists of two parts: BoxHeader and BoxData. The BoxHeader consists of size, type, and (when needed) largesize, which occupy 4 bytes, 4 bytes, and 8 bytes respectively. size is the size of the entire box (BoxHeader + BoxData); if the box is larger than the maximum uint32 value, size is set to 1, and largesize holds the real box size instead. type identifies the kind of box, e.g. ftyp, moov, or mdat. BoxData stores the actual payload (not necessarily audio or video), and its size is determined by that payload.
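To make the layout concrete, here is a minimal sketch (not from the original project) of parsing a BoxHeader from raw bytes, including the largesize case:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Minimal BoxHeader parser sketch. Field names follow the description
 * above; all multi-byte fields are big-endian per the MP4 spec. */
typedef struct {
    uint64_t size;       /* total box size, BoxHeader included       */
    char     type[5];    /* four-character code, NUL-terminated      */
    int      header_len; /* 8 normally, 16 when largesize is present */
} BoxHeader;

static uint32_t be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint64_t be64(const uint8_t *p) {
    return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

int parse_box_header(const uint8_t *buf, size_t len, BoxHeader *out) {
    if (len < 8) return -1;
    uint32_t size32 = be32(buf);
    memcpy(out->type, buf + 4, 4);
    out->type[4] = '\0';
    if (size32 == 1) {            /* size == 1: 64-bit largesize follows type */
        if (len < 16) return -1;
        out->size = be64(buf + 8);
        out->header_len = 16;
    } else {
        out->size = size32;
        out->header_len = 8;
    }
    return 0;
}
```

For example, the ftyp header bytes 00 00 00 18 66 74 79 70 discussed later parse to type "ftyp" with size 24.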

1.2 MP4 structure analysis

The box is the basic unit of an MP4 file: an MP4 file is composed of several boxes, and each box can contain sub-boxes. At the top level, the format consists of three boxes: ftyp, moov, and mdat. ftyp is the first box of the file and the only one of its type; it identifies the file type (e.g. MP4). moov stores the basic metadata of the video, such as timing information, trak information, and the media index. mdat holds the actual video and audio data. Note that the order in which the moov box and the mdat box appear in the file is not fixed, but the ftyp box must appear first. The structure of an MP4 file is as follows:

We can also open an MP4 file with the MP4Info software to inspect its structure. As shown below, the software not only displays the box structure of the file, but also lists the audio format (mp4a) with its sampling rate, channel count, and bit rate, and the video format (avc1) with its width, height, bit rate, frame rate, and other information. Note that, depending on the recording device, the generated MP4 file may contain free boxes. These usually appear between moov and mdat, and their data is typically all zeros; they act as placeholders. As moov grows during real-time recording, this reserved space is handed over to moov. If no free space is reserved, the mdat data has to be moved backwards repeatedly to make more and more room for the growing moov.
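Building on the same header layout, a small sketch (a hypothetical helper, not from the article's project) can enumerate the top-level boxes of an MP4 held in memory, which is essentially what MP4Info shows at the outermost level:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t rd_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Walk the top-level boxes of an MP4 buffer and record their four-character
 * types. Returns the number of boxes found, or -1 on a malformed header.
 * (size == 0, "box extends to end of file", is treated as the last box.) */
int list_top_level_boxes(const uint8_t *buf, size_t len,
                         char types[][5], int max_boxes) {
    size_t pos = 0;
    int n = 0;
    while (pos + 8 <= len && n < max_boxes) {
        uint64_t size = rd_be32(buf + pos);
        memcpy(types[n], buf + pos + 4, 4);
        types[n][4] = '\0';
        n++;
        if (size == 1) {                  /* 64-bit largesize follows the type */
            if (pos + 16 > len) return -1;
            size = ((uint64_t)rd_be32(buf + pos + 8) << 32) | rd_be32(buf + pos + 12);
        } else if (size == 0) {
            break;                        /* box runs to the end of the file */
        }
        if (size < 8 || pos + size > len) return -1;
        pos += size;
    }
    return n;
}
```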

  • ftyp box

ftyp is a four-character code used to identify the encoding type, compatibility, or intended use of a media file. It exists in MP4 and MOV files, and also in 3GP files. In an MP4 file, the ftyp box is placed at the beginning of the file to mark the file type as MP4, and there is only one such box in the file. Opening an MP4 file with the WinHex tool, we can see the details of the ftyp box, as shown below:

A box consists of BoxHeader and BoxData, and the BoxHeader consists of size, type, and (optionally) largesize. Here the ftyp box header is 0x00 00 00 18 66 74 79 70, where 0x00 00 00 18 is the size of the ftyp box, i.e. 24 bytes, and 0x66 74 79 70 marks the box type as ftyp; together they form the ftyp header. The following 0x6D 70 34 32 is the major brand, here "mp42", which may vary between files, and 0x00 00 00 00 is the minor version.

  • moov box

The moov box mainly stores media timing information, trak information, the media index, and similar metadata. As seen in MP4Info, the moov box follows the ftyp box, so its header is 0x00 00 28 D1 6D 6F 6F 76, where 0x00 00 28 D1 is the size (10449 bytes) and 0x6D 6F 6F 76 marks the box type as moov; the remaining bytes are the BoxData. The moov box in turn contains an mvhd box and trak boxes. The mvhd box, marked 0x6D 76 68 64, stores general information about the file, such as duration and creation time. The trak box, marked 0x74 72 61 6B, stores the index information of a video or audio track. The structure of the moov box is shown in the figure below:

Generally speaking, when parsing a media file, the most important properties are the width and height, duration, bit rate, encoding format, frame list, key-frame list, and the corresponding timestamps and positions in the file. In MP4, this information is stored, according to specific algorithms, in several boxes under the stbl box, and all of them have to be parsed to reconstruct the media information: stsd (sample descriptions, including codec configuration), stts (sample timestamps), stss (which samples are key frames), stsc (sample-to-chunk mapping), stsz (sample sizes), and stco/co64 (chunk offsets in the file).

  • mdat box

The mdat box stores all the media data, and its type is marked by 0x6D 64 61 74. The media data in mdat has no sync words and no delimiters; it can only be accessed through the index located in moov. The position of mdat is flexible: it can appear before or after moov, but it must stay consistent with the information in stbl. The BoxHeader of the mdat box looks like this:

1.3 Analysis of the H.264 bit stream in MP4

From the analysis of the MP4 file structure, we know that all multimedia data in an MP4 file is stored in the mdat box, and that this data has no sync words or delimiters and is only accessible through the index in moov. This means the H.264 and AAC streams stored in mdat are not segmented by start codes (0x00 00 00 01 or 0x00 00 01) or ADTS headers, which can be confirmed by parsing an MP4 file with the MP4Info software: the encapsulated audio and video formats are reported as mp4a and avc1. According to the relevant material on the H.264 encoding format, H.264 video comes in two forms: the H.264 stream with start codes, the familiar H264/X264 form, and the H.264 stream without start codes, referred to as AVC1. Among the media subtypes of H.264: the MP4 container stores H.264 data without start codes; instead, each NALU is prefixed with a length field giving the length of the NALU in bytes, and this length field can vary in size, typically 1, 2, or 4 bytes. In addition, in a standard (Annex B) H.264 stream, the SPS and PPS are carried as NALUs inside the stream itself, while in an MP4 file they are stored in the AVCDecoderConfigurationRecord structure. The sequence parameter set (SPS) applies to a series of consecutive coded pictures, while the picture parameter set (PPS) applies to one or more individual pictures in the coded video sequence; if the decoder does not receive these two parameter sets correctly, the remaining NALUs cannot be decoded. Concretely, the SPS and PPS of H.264 are stored in the avcC box of the MP4 file (moov -> trak -> mdia -> minf -> stbl -> stsd -> avc1 -> avcC). The AVCDecoderConfigurationRecord structure is shown below. From the figure we can read:
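The length-prefix scheme described above can be illustrated with a small sketch (assuming the common 4-byte length prefix; the real prefix size comes from lengthSizeMinusOne in the avcC box): it rewrites each length field into an Annex B start code in place, since both occupy 4 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

static uint32_t nalu_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Convert a buffer of 4-byte-length-prefixed NALUs (MP4/AVC1 style) to
 * Annex B by overwriting each length field with the 00 00 00 01 start code.
 * Returns 0 on success, -1 if the lengths do not add up. */
int avcc_to_annexb(uint8_t *data, size_t len) {
    size_t pos = 0;
    while (pos + 4 <= len) {
        uint32_t nalu_len = nalu_be32(data + pos);
        if (nalu_len == 0 || pos + 4 + nalu_len > len) return -1;
        data[pos]     = 0x00;
        data[pos + 1] = 0x00;
        data[pos + 2] = 0x00;
        data[pos + 3] = 0x01;
        pos += 4 + nalu_len;
    }
    return pos == len ? 0 : -1;
}
```

This only restores the start codes; the SPS/PPS still have to be taken from the avcC box and written in front of the stream, which is exactly what FFmpeg's h264_mp4toannexb filter (Section 2) automates.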

  • 0x00 00 00 2E: the length of the avcC box, i.e. 46 bytes;
  • 0x61 76 63 43: the type code of the avcC box; together with 0x00 00 00 2E it forms the BoxHeader of the avcC box;
  • 0x00 17: the SPS length, i.e. 23 bytes;
  • 0x67 64 ... 80 01: the contents of the SPS;
  • 0x00 04: the PPS length, i.e. 4 bytes;
  • 0x68 EF BC B0: the contents of the PPS.
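As a sketch of how these fields are read in practice (layout per the AVCDecoderConfigurationRecord definition in ISO/IEC 14496-15; the record is the avcC payload after the 8-byte BoxHeader), the following hypothetical helper extracts the length-prefix size and the first SPS/PPS lengths:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef struct {
    int length_size; /* bytes per NALU length prefix in mdat: 1, 2, or 4 */
    int sps_len;     /* length of the first SPS, in bytes                */
    int pps_len;     /* length of the first PPS, in bytes                */
} AvcCInfo;

static uint16_t avcc_be16(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

int parse_avcc_record(const uint8_t *rec, size_t len, AvcCInfo *out) {
    if (len < 7 || rec[0] != 1) return -1;  /* configurationVersion must be 1 */
    out->length_size = (rec[4] & 0x03) + 1; /* lengthSizeMinusOne + 1 */
    out->sps_len = out->pps_len = 0;
    int num_sps = rec[5] & 0x1F;            /* numOfSequenceParameterSets */
    size_t pos = 6;
    for (int i = 0; i < num_sps; i++) {
        if (pos + 2 > len) return -1;
        int sz = avcc_be16(rec + pos);
        if (pos + 2 + (size_t)sz > len) return -1;
        if (i == 0) out->sps_len = sz;
        pos += 2 + sz;
    }
    if (pos + 1 > len) return -1;
    int num_pps = rec[pos++];               /* numOfPictureParameterSets */
    for (int i = 0; i < num_pps; i++) {
        if (pos + 2 > len) return -1;
        int sz = avcc_be16(rec + pos);
        if (pos + 2 + (size_t)sz > len) return -1;
        if (i == 0) out->pps_len = sz;
        pos += 2 + sz;
    }
    return 0;
}
```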

2. Demultiplexing MP4 with FFmpeg

If we extract the H.264 stream from an MP4 and save it directly to a local file, that file cannot be decoded and played, because the saved H.264 data has no SPS/PPS and no start code in front of each NALU. Fortunately, FFmpeg provides a filter called h264_mp4toannexb that extracts the SPS/PPS and inserts the start codes for us. For MP4 files, an AVPacket in FFmpeg may contain one or more NALUs; for example, SPS, PPS, and an I-frame may arrive in the same packet, and instead of a start code each NALU is preceded by its length information, which occupies 4 bytes. The AVPacket.data layout is shown below.

2.1 The h264_mp4toannexb filter

FFmpeg provides a number of bit stream filters that perform encapsulation transformations for certain formats, such as aac_adtstoasc and h264_mp4toannexb; the full list can be printed with ./configure --list-bsfs. This section explains how to use the h264_mp4toannexb filter to convert the H.264 bit stream from the MP4 packaging to Annex B format, i.e. AVC1 -> H264.

(1) Initialize the h264_mp4toannexb filter

This process mainly consists of looking up the AVBitStreamFilter with the given name, creating the filter's context structure AVBSFContext, copying the context parameters, and initializing the AVBSFContext. The code is as follows:

/** (1) Look up the h264_mp4toannexb bit stream filter structure AVBitStreamFilter.
 *  Declared in libavcodec/avcodec.h:
 *  typedef struct AVBitStreamFilter {
 *      const char *name;
 *      // List of codec ids supported by the filter
 *      const enum AVCodecID *codec_ids;
 *      const AVClass *priv_class;
 *      ...
 *  }
 */
const AVBitStreamFilter *avBitStreamFilter = av_bsf_get_by_name("h264_mp4toannexb");
if (!avBitStreamFilter) {
    RLOG_E("get AVBitStreamFilter failed");
    return -1;
}
/** (2) Create the AVBSFContext for the given filter; it stores the filter state.
 *  Declared in libavcodec/avcodec.h:
 *  typedef struct AVBSFContext {
 *      ...
 *      const struct AVBitStreamFilter *filter;
 *      // Input/output stream parameter information:
 *      // allocated by av_bsf_alloc(), initialized by av_bsf_init()
 *      AVCodecParameters *par_in;
 *      AVCodecParameters *par_out;
 *      // Time base of input/output packets; set before calling av_bsf_init()
 *      AVRational time_base_in;
 *      AVRational time_base_out;
 *  }
 */
int ret = av_bsf_alloc(avBitStreamFilter, &avBSFContext);
if (ret < 0) {
    RLOG_E_("av_bsf_alloc failed,err = %d", ret);
    return ret;
}
/** (3) Copy the input stream parameters into the filter's AVBSFContext */
ret = avcodec_parameters_copy(avBSFContext->par_in,
                              inputFormatCtx->streams[id_video_stream]->codecpar);
if (ret < 0) {
    RLOG_E_("copy codec params to filter failed,err = %d", ret);
    return ret;
}
/** (4) Put the filter into the ready state; call after all parameters have been set */
ret = av_bsf_init(avBSFContext);
if (ret < 0) {
    RLOG_E_("Prepare the filter failed,err = %d", ret);
    return ret;
}

(2) Processing the AVPacket

In this step, av_bsf_send_packet submits the packet to the filter for processing, and av_bsf_receive_packet is then called to read the processed data. Note that one input packet may produce multiple output packets, so av_bsf_receive_packet may need to be called repeatedly until it returns AVERROR(EAGAIN), which means the filter needs more input; a return value of 0 means one processed packet has been read. The code is as follows:

/** (5) Send the input packet to the filter */
int ret = av_bsf_send_packet(avBSFContext, avPacket);
if (ret < 0) {
    av_packet_unref(avPacket);
    return ret;
}
/** (6) Read from the filter in a loop: 0 means a processed packet was
 *  produced (consume it and read again); AVERROR(EAGAIN) means the filter
 *  needs more input, so we go back and send the next packet. */
for (;;) {
    ret = av_bsf_receive_packet(avBSFContext, avPacket);
    if (ret < 0) {
        break;  // AVERROR(EAGAIN) or a real error: stop reading for now
    }
    // ret == 0: a converted packet is available in avPacket; consume it here
}

(3) Release all resources allocated by the filter

/**(7) Release the filter resource */
if(avBSFContext) {
    av_bsf_free(&avBSFContext);
}
2.2 Practice: Save H.264 and AAC in MP4 to local files

(1) Execution flow chart

(2) Code implementation

  • ffmpeg_demuxer.cpp: FFmpeg wrapper functions
// FFmpeg demuxing helper functions
// Created by Jiangdg on 2019/9/25.
//

#include "ffmpeg_demuxer.h"

FFmpegDexmuer g_demuxer;

int createDemuxerFFmpeg(char *url) {
    if (!url) {
        RLOG_E("createRenderFFmpeg failed,url can not be null");
        return -1;
    }
    // Initialize the FFmPEG engine
    av_register_all();
    avcodec_register_all();
    av_log_set_level(AV_LOG_VERBOSE);
    g_demuxer.avPacket = av_packet_alloc();
    av_init_packet(g_demuxer.avPacket);
    g_demuxer.id_video_stream = -1;
    g_demuxer.id_audio_stream = -1;

    // Open the input URL
    g_demuxer.inputFormatCtx = avformat_alloc_context();
    if (!g_demuxer.inputFormatCtx) {
        releaseDemuxerFFmpeg();
        RLOG_E("avformat_alloc_context failed.");
        return -1;
    }
    }
    int ret = avformat_open_input(&g_demuxer.inputFormatCtx, url, NULL, NULL);
    if(ret < 0) {
        releaseDemuxerFFmpeg();
        RLOG_E_("avformat_open_input failed,err=%d", ret);
        return -1;
    }
    ret = avformat_find_stream_info(g_demuxer.inputFormatCtx, NULL);
    if(ret < 0) {
        releaseDemuxerFFmpeg();
        RLOG_E_("avformat_find_stream_info failed,err=%d", ret);
        return -1;
    }
    // Get the stream ID
    // Get the stream indexes
    for (int i = 0; i < g_demuxer.inputFormatCtx->nb_streams; i++) {
        AVStream *avStream = g_demuxer.inputFormatCtx->streams[i];
        if (!avStream) {
            continue;
        }
        AVMediaType type = avStream->codecpar->codec_type;
        if (g_demuxer.id_video_stream == -1 || g_demuxer.id_audio_stream == -1) {
            if (type == AVMEDIA_TYPE_VIDEO) {
                g_demuxer.id_video_stream = i;
            }
            if (type == AVMEDIA_TYPE_AUDIO) {
                g_demuxer.id_audio_stream = i;
            }
        }
    }
    // Initialize the h264_mp4toannexb filter:
    // it converts the H264 packaging from MP4 mode to Annex B mode
    const AVBitStreamFilter *avBitStreamFilter = av_bsf_get_by_name("h264_mp4toannexb");
    if (!avBitStreamFilter) {
        releaseDemuxerFFmpeg();
        RLOG_E("get AVBitStreamFilter failed");
        return -1;
    }
    ret = av_bsf_alloc(avBitStreamFilter, &g_demuxer.avBSFContext);
    if(ret < 0) {
        releaseDemuxerFFmpeg();
        RLOG_E_("av_bsf_alloc failed,err = %d", ret);
        return ret;
    }
    ret = avcodec_parameters_copy(g_demuxer.avBSFContext->par_in,
          g_demuxer.inputFormatCtx->streams[g_demuxer.id_video_stream]->codecpar);
    if(ret < 0) {
        releaseDemuxerFFmpeg();
        RLOG_E_("copy codec params to filter failed,err = %d", ret);
        return ret;
    }
    ret = av_bsf_init(g_demuxer.avBSFContext);
    if(ret < 0) {
        releaseDemuxerFFmpeg();
        RLOG_E_("Prepare the filter failed,err = %d", ret);
        return ret;
    }
    return ret;
}

int readDataFromAVPacket(void) {
    int ret = -1;
    // On success, return the AVPacket data size
    if (g_demuxer.avPacket) {
        ret = av_read_frame(g_demuxer.inputFormatCtx, g_demuxer.avPacket);
        if (ret == 0) {
            return g_demuxer.avPacket->size;
        }
    }
    return ret;
}

int handlePacketData(uint8_t *out, int size) {
    if (!g_demuxer.avPacket || !out) {
        return -1;
    }
    // H264 package format conversion: MP4 mode -> AnnexB mode
    int stream_index = g_demuxer.avPacket->stream_index;
    if(stream_index == getVideoStreamIndex()) {
        int ret = av_bsf_send_packet(g_demuxer.avBSFContext, g_demuxer.avPacket);
        if(ret < 0) {
            av_packet_unref(g_demuxer.avPacket);
            av_init_packet(g_demuxer.avPacket);
            return ret;
        }

        // Read back the converted packet; h264_mp4toannexb normally emits one
        // output packet per input. Note: when the filter prepends SPS/PPS to a
        // key frame, avPacket->size can become larger than the original size.
        ret = av_bsf_receive_packet(g_demuxer.avBSFContext, g_demuxer.avPacket);
        if (ret < 0) {
            return ret;
        }
        memcpy(out, g_demuxer.avPacket->data, size);
    } else if(stream_index == getAudioStreamIndex()){
        memcpy(out, g_demuxer.avPacket->data, size);
    }
    av_packet_unref(g_demuxer.avPacket);
    av_init_packet(g_demuxer.avPacket);
    // Returns the data type of AVPacket
    return stream_index;
}

void releaseDemuxerFFmpeg(void) {
    if(g_demuxer.inputFormatCtx) {
        avformat_close_input(&g_demuxer.inputFormatCtx);
        avformat_free_context(g_demuxer.inputFormatCtx);
    }
    if(g_demuxer.avPacket) {
        av_packet_free(&g_demuxer.avPacket);
        g_demuxer.avPacket = NULL;
    }
    if(g_demuxer.avBSFContext) {
        av_bsf_free(&g_demuxer.avBSFContext);
    }
    RLOG_I("release FFmpeg engine over!");
}

int getVideoStreamIndex(void) {
    return g_demuxer.id_video_stream;
}

int getAudioStreamIndex(void) {
    return g_demuxer.id_audio_stream;
}

int getAudioSampleRateIndex(void) {
    int rates[] = {96000, 88200, 64000, 48000, 44100, 32000, 24000, 22050,
                   16000, 12000, 11025, 8000, 7350, -1, -1, -1};
    int sample_rate = g_demuxer.inputFormatCtx->streams[getAudioStreamIndex()]
        ->codecpar->sample_rate;
    for (int index = 0; index < 16; index++) {
        if (sample_rate == rates[index]) {
            return index;
        }
    }
    return -1;
}

int getAudioProfile(void) {
    return g_demuxer.inputFormatCtx->streams[getAudioStreamIndex()]->codecpar->profile;
}

int getAudioChannels(void) {
    return g_demuxer.inputFormatCtx->streams[getAudioStreamIndex()]->codecpar->channels;
}
  • native_demuxer.cpp: the interface called from the Java layer; runs the MP4 demuxing worker thread
// Demuxing and saving worker thread (part of the code)
// ffmpeg error codes: https://my.oschina.net/u/3700450/blog/1545657
// Created by Jiangdg on 2019/9/23.
//

void *save_thread(void *args) {
    // Detach the child thread from the main thread so that
    // its resources are reclaimed automatically when it exits
    pthread_detach(pthread_self());
    DemuxerThreadParams *params = (DemuxerThreadParams *)args;
    if (!params) {
        RLOG_E("pass params to demuxer thread failed");
        return NULL;
    }
    // Bind the current thread to JavaVM and get JNIEnv* from the JVM
    JNIEnv *env = NULL;
    jmethodID id_cb = NULL;
    if (g_jvm && global_cb_obj) {
        // Try to get a JNIEnv* for this thread; if the thread is not yet
        // attached to the JVM, attach it first
        if (g_jvm->GetEnv(reinterpret_cast<void **>(&env), JNI_VERSION_1_4) != JNI_OK) {
            if (JNI_OK != g_jvm->AttachCurrentThread(&env, NULL)) {
                RLOG_E("attach thread failed");
                return NULL;
            }
        }
        jclass cb_cls = env->GetObjectClass(global_cb_obj);
        id_cb = env->GetMethodID(cb_cls, "onCallback", "(I)V");
    }

    // Open the input stream
    RLOG_I_("#### input url = %s", params->url);
    int ret = createDemuxerFFmpeg(params->url);
    if(ret < 0) {
        if(params) {
            free(params->url);
            free(params->h264path);
            free(params);
        }
        if(id_cb && g_jvm) {
            env->CallVoidMethod(global_cb_obj, id_cb, -1);
            env->DeleteGlobalRef(global_cb_obj);
            g_jvm->DetachCurrentThread();
        }
        return NULL;
    }
    // Open the file
    RLOG_I_("#### h264 save path = %s", params->h264path);
    RLOG_I_("#### aac save path = %s", params->aacpath);
    FILE * h264file = fopen(params->h264path, "wb+");
    FILE * aacfile = fopen(params->aacpath, "wb+");
    if(h264file == NULL || aacfile == NULL) {
        RLOG_E("open save file failed");
        if(params) {
            free(params->url);
            free(params->h264path);
            free(params->aacpath);
            free(params);
        }
        releaseDemuxerFFmpeg();
        if(id_cb && g_jvm) {
            env->CallVoidMethod(global_cb_obj, id_cb, -2);
            env->DeleteGlobalRef(global_cb_obj);
            g_jvm->DetachCurrentThread();
        }
        return NULL;
    }
    int size = -1;
    int audio_profile = getAudioProfile();
    int rate_index = getAudioSampleRateIndex();
    int audio_channels = getAudioChannels();
    if(id_cb) {
        env->CallVoidMethod(global_cb_obj, id_cb, 0);
    }
    bool is_reading = false;
    while ((size = readDataFromAVPacket()) > 0) {
        if(g_exit) {
            break;
        }
        if (!is_reading) {
            is_reading = true;
            if (id_cb) {
                env->CallVoidMethod(global_cb_obj, id_cb, 1);
            }
        }
        uint8_t *out_buffer = (uint8_t *)malloc(size * sizeof(uint8_t));
        memset(out_buffer, 0, size * sizeof(uint8_t));
        int stream_index = handlePacketData(out_buffer, size);
        if (stream_index < 0) {
            free(out_buffer);  // avoid leaking the buffer when the packet is skipped
            continue;
        }
        if(stream_index == getVideoStreamIndex()) {
            fwrite(out_buffer, size, 1, h264file);
            RLOG_I_("--->write a video data, size=%d", size);
        } else if(stream_index == getAudioStreamIndex()) {
            // Add the ADTS header; note the ADTS frame-length field
            // counts the 7 header bytes plus the AAC payload
            int adtslen = 7;
            int frame_len = size + adtslen;
            uint8_t *ret = (uint8_t *)malloc(size * sizeof(uint8_t) + adtslen * sizeof(char));
            memset(ret, 0, size * sizeof(uint8_t) + adtslen * sizeof(char));
            char *adts = (char *)malloc(adtslen * sizeof(char));
            adts[0] = 0xFF;
            adts[1] = 0xF1;
            adts[2] = (((audio_profile - 1) << 6) + (rate_index << 2) + (audio_channels >> 2));
            adts[3] = (((audio_channels & 3) << 6) + (frame_len >> 11));
            adts[4] = ((frame_len & 0x7FF) >> 3);
            adts[5] = (((frame_len & 7) << 5) + 0x1F);
            adts[6] = 0xFC;

            memcpy(ret, adts, adtslen);
            memcpy(ret+adtslen, out_buffer, size);
            fwrite(ret, size+adtslen, 1, aacfile);
            free(adts);
            free(ret);
            RLOG_I_("--->write a AUDIO data, header=%d, size=%d", adtslen, size);
        }
        free(out_buffer);
    }
    // Release resources
    if(h264file) {
        fclose(h264file);
    }
    if(aacfile) {
        fclose(aacfile);
    }
    if(params) {
        free(params->url);
        free(params->h264path);
        free(params->aacpath);
        free(params);
    }
    releaseDemuxerFFmpeg();
    if(id_cb && g_jvm) {
        env->CallVoidMethod(global_cb_obj, id_cb, 2);
        env->DeleteGlobalRef(global_cb_obj);
        g_jvm->DetachCurrentThread();
    }
    RLOG_I("##### stop save success.");
    return NULL;
}
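The 7-byte ADTS header built inline above can be factored into a standalone sketch (a hypothetical helper; `object_type` here is the MPEG-4 audio object type, e.g. 2 for AAC-LC, matching the `(profile - 1)` encoding used in the code). Per the ADTS layout, the frame-length field covers the header bytes plus the AAC payload:

```c
#include <assert.h>
#include <stdint.h>

/* Build a 7-byte ADTS header (MPEG-4, no CRC) for a raw AAC frame.
 * rate_index is the sampling-frequency index (4 = 44100 Hz, see
 * getAudioSampleRateIndex above); aac_len is the raw AAC payload size. */
void make_adts_header(uint8_t hdr[7], int object_type, int rate_index,
                      int channels, int aac_len) {
    int frame_len = aac_len + 7;  /* frame length includes the header itself */
    hdr[0] = 0xFF;                /* syncword 0xFFF ...                      */
    hdr[1] = 0xF1;                /* ... MPEG-4, layer 0, no CRC             */
    hdr[2] = (uint8_t)(((object_type - 1) << 6) | (rate_index << 2) | (channels >> 2));
    hdr[3] = (uint8_t)(((channels & 3) << 6) | (frame_len >> 11));
    hdr[4] = (uint8_t)((frame_len & 0x7FF) >> 3);
    hdr[5] = (uint8_t)(((frame_len & 7) << 5) | 0x1F);  /* buffer fullness hi */
    hdr[6] = 0xFC;                                      /* buffer fullness lo */
}
```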

Note: only the core code is shown here; for the full details please see the project on GitHub: DemoDemuxerMP4