1. Analysis of AAC coding format

1.1 Introduction to AAC

Advanced Audio Coding (AAC) is an MPEG-4-based audio coding technology, developed jointly by Dolby Laboratories, AT&T, and other companies as a replacement for MP3. As a high-compression-ratio audio codec, AAC achieves a data compression ratio of roughly 18:1 while delivering sound quality comparable to uncompressed CD audio. Compared with MP3, WMA, and other audio coding standards, it therefore needs a lower bit rate for the same quality, which saves transmission bandwidth and makes it widely used in Internet streaming media, IPTV, and similar fields (low bit rate, high sound quality). Its main characteristics are as follows:

  • Bit rate: AAC – up to 512 kbps (two channels) / MP3 – 32~320 kbps
  • Sampling rate: AAC – up to 96 kHz / MP3 – up to 48 kHz
  • Number of channels: AAC – up to 48 full-range channels / MP3 – two channels
  • Sampling precision: AAC – up to 32 bit / MP3 – up to 16 bit

The disadvantage of AAC is that it is a lossy compression format; compared with mainstream lossless formats such as APE and FLAC, there is a noticeable gap in the “fullness” of the sound. In addition, outside of streaming-media network transmission, it is supported by fewer devices.

1.2 AAC encapsulation formats

Before compression and encoding, audio data is sampled and quantized into sample values. The encoder outputs a stream organized as audio frames, and each audio frame contains the compressed data of a number of audio samples. One AAC audio frame contains 960 or 1024 sample values. These compressed, encoded audio frames are called raw data blocks; because they exist in the form of frames, they are also referred to as raw frames. The length of a raw frame is variable. If a raw frame is encapsulated with ADTS, it becomes an ADTS frame; if it is encapsulated with ADIF, it is an ADIF frame. The differences are as follows:

  • ADIF: Audio Data Interchange Format. In this format, decoding must start at a well-defined beginning of the audio data stream, so it is typically used for disk files;

  • ADTS: Audio Data Transport Stream. This format is characterized by a bit stream with synchronization words, which allows decoding to start at any frame of the audio data stream; in other words, each frame carries its own header.

The length of an AAC raw data block is variable. An ADTS frame is formed by prefixing the raw frame with an ADTS header. Each frame of AAC audio (an ADTS frame) consists of an ADTS header and AAC audio data (containing 1 to 4 raw audio frames). The ADTS header is 7 or 9 bytes long and is made up of two parts: fixed header information (adts_fixed_header) and variable header information (adts_variable_header). The data in the fixed header is the same in every frame; it defines key information such as the audio sampling rate, number of channels, and frame length, which is exactly what a decoder needs to decode AAC. The variable header information changes from frame to frame.

The AAC data flow structure composed of multiple ADTS frames is shown as follows:

(a) Fixed header

Description:

  • syncword: occupies 12 bits. The synchronization word marking the start of an ADTS frame, always 0xFFF. Its presence is what makes decoding from any frame possible;
  • ID: occupies 1 bit. The MPEG version: 0 for MPEG-4, 1 for MPEG-2;
  • layer: occupies 2 bits. Always “00”;
  • protection_absent: occupies 1 bit. If it is 0, the ADTS header is 9 bytes long; if it is 1, the ADTS header is 7 bytes long;
  • profile: occupies 2 bits. Indicates which AAC profile is used: the values 00, 01, and 10 correspond to Main, LC, and SSR respectively;
  • sampling_frequency_index: occupies 4 bits. The index of the sampling rate to use; the actual rate is looked up in a sampling-rate table with this index. For example, index 0xB corresponds to 8000 Hz;
  • channel_configuration: occupies 3 bits. Indicates the number of channels, for example 1 for mono or 2 for stereo.

(b) Variable header

Description:

  • frame_length: occupies 13 bits. The length of the whole ADTS frame, i.e. the ADTS header (7 or 9 bytes) plus the size of the AAC data;
  • adts_buffer_fullness: occupies 11 bits. The value 0x7FF indicates a variable-bit-rate stream;
  • number_of_raw_data_blocks_in_frame: occupies 2 bits. Indicates that the ADTS frame contains (number_of_raw_data_blocks_in_frame + 1) raw AAC frames (a small header-parsing sketch follows this list).
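To make the bit layout above concrete, here is a minimal parsing sketch in Java. It is not part of the project's code: the class name AdtsHeaderParser is made up for illustration, and the method assumes the byte array really starts at a 0xFFF syncword (for a 9-byte header with protection_absent = 0, the extra 2 CRC bytes follow byte 6, so the field positions below are unchanged).

// AdtsHeaderParser.java - minimal sketch, assumes `h` starts at an ADTS syncword (0xFFF)
public final class AdtsHeaderParser {

    public static void parse(byte[] h) {
        int syncword         = ((h[0] & 0xFF) << 4) | ((h[1] & 0xF0) >> 4); // 12 bits, must be 0xFFF
        int id               = (h[1] >> 3) & 0x01;                          // 0 = MPEG-4, 1 = MPEG-2
        int layer            = (h[1] >> 1) & 0x03;                          // always 00
        int protectionAbsent =  h[1]       & 0x01;                          // 1 -> 7-byte header, 0 -> 9-byte header
        int profile          = (h[2] >> 6) & 0x03;                          // 00 Main, 01 LC, 10 SSR
        int samplingIndex    = (h[2] >> 2) & 0x0F;                          // index into the sampling-rate table
        int channelConfig    = ((h[2] & 0x01) << 2) | ((h[3] >> 6) & 0x03); // 3 bits
        int frameLength      = ((h[3] & 0x03) << 11) | ((h[4] & 0xFF) << 3) | ((h[5] >> 5) & 0x07); // 13 bits
        int bufferFullness   = ((h[5] & 0x1F) << 6) | ((h[6] >> 2) & 0x3F); // 11 bits, 0x7FF = VBR
        int rawBlocksInFrame =  h[6] & 0x03;                                // frame holds (value + 1) raw frames

        System.out.printf("sync=0x%X id=%d profile=%d freqIdx=%d chan=%d frameLen=%d fullness=0x%X blocks=%d%n",
                syncword, id, profile, samplingIndex, channelConfig, frameLength, bufferFullness, rawBlocksInFrame);
    }
}

A parser like this is also handy for verifying the header produced by the addADTStoPacket() method shown later in this section.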

(c) Packaging AAC into ADTS format

As mentioned above, when MediaCodec is used to compress PCM into AAC, what the encoder outputs are raw frames without an ADTS header. If we save them directly to an AAC file or push them as a stream, tools such as VLC cannot decode and play the data. Therefore, we need to add an ADTS header to every raw AAC frame that MediaCodec produces from the PCM input, and only then save the file or push the stream. The MediaCodec code is as follows:

private void encodeBytes(byte[] audioBuf, int readBytes) {
    ByteBuffer[] inputBuffers = mAudioEncoder.getInputBuffers();
    ByteBuffer[] outputBuffers = mAudioEncoder.getOutputBuffers();
    int inputBufferIndex = mAudioEncoder.dequeueInputBuffer(TIMES_OUT);
    if (inputBufferIndex >= 0) {
        ByteBuffer inputBuffer = null;
        if (!isLollipop()) {
            inputBuffer = inputBuffers[inputBufferIndex];
        } else {
            inputBuffer = mAudioEncoder.getInputBuffer(inputBufferIndex);
        }
        if (audioBuf == null || readBytes <= 0) {
            mAudioEncoder.queueInputBuffer(inputBufferIndex, 0, 0, getPTSUs(), MediaCodec.BUFFER_FLAG_END_OF_STREAM);
        } else {
            inputBuffer.clear();
            inputBuffer.put(audioBuf);
            mAudioEncoder.queueInputBuffer(inputBufferIndex, 0, readBytes, getPTSUs(), 0);
        }
    }
    // Returns an output buffer handle; -1 indicates that no output buffer is currently available.
    // The mBufferInfo argument receives the metadata of the encoded data, and TIMES_OUT is the timeout to wait.
    MediaCodec.BufferInfo mBufferInfo = new MediaCodec.BufferInfo();
    int outputBufferIndex = -1;
    do {
        outputBufferIndex = mAudioEncoder.dequeueOutputBuffer(mBufferInfo, TIMES_OUT);
        if (outputBufferIndex == MediaCodec.INFO_TRY_AGAIN_LATER) {
            Log.i(TAG, "Get encoder output buffer timeout");
        } else if (outputBufferIndex == MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED) {

        } else if (outputBufferIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {

        } else {
            if ((mBufferInfo.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG) != 0) {
                mBufferInfo.size = 0;
            }
            if ((mBufferInfo.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                break;
            }
            // Get a read-only output buffer that contains the encoded data
            ByteBuffer mBuffer = ByteBuffer.allocate(10240);
            ByteBuffer outputBuffer = null;
            if (!isLollipop()) {
                outputBuffer = outputBuffers[outputBufferIndex];
            } else {
                outputBuffer = mAudioEncoder.getOutputBuffer(outputBufferIndex);
            }
            if (mBufferInfo.size != 0) {
                Log.i(TAG, "Add ADTS header to AAC stream, cache to mBuffer");
                mBuffer.clear();
                // Copy the raw AAC frame from outputBuffer into mBuffer starting at the 8th byte;
                // the first 7 bytes of mBuffer (array indices 0 to 6) are reserved for the ADTS header
                outputBuffer.get(mBuffer.array(), 7, mBufferInfo.size);
                outputBuffer.clear();
                // Set the position of the buffer to 7 + mBufferInfo.size
                mBuffer.position(7 + mBufferInfo.size);
                // Add the ADTS header, where (mBufferInfo.size + 7) is the ADTS frame length
                addADTStoPacket(mBuffer.array(), mBufferInfo.size + 7);
                // Reset the position of the buffer to 0
                mBuffer.flip();
                // ... push the AAC (ADTS) frame here ...
            }
            mAudioEncoder.releaseOutputBuffer(outputBufferIndex, false);
        }
    } while (outputBufferIndex >= 0);
}


// ----------------------------- add ADTS header (7 bytes) -----------------------------
    private void addADTStoPacket(byte[] packet, int packetLen) {
        int profile = 2;                  // AAC LC
        int chanCfg = 1;                  // mono
        int freqIdx = mSamplingRateIndex; // index into the sampling-rate table, not the rate itself
        packet[0] = (byte) 0xFF;
        packet[1] = (byte) 0xF1;
        packet[2] = (byte) (((profile - 1) << 6) + (freqIdx << 2) + (chanCfg >> 2));
        packet[3] = (byte) (((chanCfg & 3) << 6) + (packetLen >> 11));
        packet[4] = (byte) ((packetLen & 0x7FF) >> 3);
        packet[5] = (byte) (((packetLen & 7) << 5) + 0x1F);
        packet[6] = (byte) 0xFC;
    }

Note: mSamplingRateIndex is the index of the sampling rate in the table below.

    public static final int[] AUDIO_SAMPLING_RATES = {
            96000, // 0
            88200, // 1
            64000, // 2
            48000, // 3
            44100, // 4
            32000, // 5
            24000, // 6
            22050, // 7
            16000, // 8
            12000, // 9
            11025, // 10
            8000,  // 11
            7350,  // 12
            -1,    // 13
            -1,    // 14
            -1,    // 15
    };

The byte-by-byte assignments in addADTStoPacket() may look a bit confusing, so let us refer to the FFmpeg source code for a further explanation of how the ADTS header is filled in, to deepen the understanding. The adts_write_frame_header() function can be found in FFmpeg's libavformat/adtsenc.c.

static int adts_write_frame_header(ADTSContext *ctx,
                                   uint8_t *buf, int size, int pce_size)
{
    PutBitContext pb;

    unsigned full_frame_size = (unsigned)ADTS_HEADER_SIZE + size + pce_size;
    if (full_frame_size > ADTS_MAX_FRAME_BYTES) {
        av_log(NULL, AV_LOG_ERROR, "ADTS frame size too large: %u (max %d)\n",
               full_frame_size, ADTS_MAX_FRAME_BYTES);
        return AVERROR_INVALIDDATA;
    }

    init_put_bits(&pb, buf, ADTS_HEADER_SIZE);

    /* adts_fixed_header */
    // Write the ADTS header. put_bits takes the number of bits as its second argument and the value as its third
    // Note: put_bits is defined in libavcodec/put_bits.h
    put_bits(&pb, 12, 0xfff);   /* syncword */
    put_bits(&pb, 1, 0);        /* ID */
    put_bits(&pb, 2, 0);        /* layer */
    put_bits(&pb, 1, 1);        /* protection_absent */
    put_bits(&pb, 2, ctx->objecttype); /* profile_objecttype */
    put_bits(&pb, 4, ctx->sample_rate_index); /* sampling_frequency_index */
    put_bits(&pb, 1, 0);        /* private_bit */
    put_bits(&pb, 3, ctx->channel_conf); /* channel_configuration */
    put_bits(&pb, 1, 0);        /* original_copy */
    put_bits(&pb, 1, 0);        /* home */

    /* adts_variable_header */
    put_bits(&pb, 1, 0);        /* copyright_identification_bit */
    put_bits(&pb, 1, 0);        /* copyright_identification_start */
    put_bits(&pb, 13, full_frame_size); /* aac_frame_length, the ADTS frame length */
    put_bits(&pb, 11, 0x7ff);   /* adts_buffer_fullness */
    put_bits(&pb, 2, 0);        /* number_of_raw_data_blocks_in_frame */

    flush_put_bits(&pb);

    return 0;
}

As adts_write_frame_header() shows, apart from profile, sampling_frequency_index, channel_configuration, and aac_frame_length, whose values depend on the encoder configuration, the other fields are essentially fixed; even the profile can simply be set to a default value. With that in mind, it is easy to form a general picture of the header.

Below is an AAC file opened with UltraEdit; an ADTS frame looks like this:

2. Analysis of the MP4 encapsulation format

Because the MP4 format is rather complex, this article only gives a brief introduction. The MP4 encapsulation format is defined on the basis of the QuickTime container format; media description and media data are separated, and it is currently widely used to package H.264 video and AAC audio, making it representative of HD video (HDV). All data in an MP4 file is encapsulated in boxes (a box corresponds to an Atom in QuickTime). In other words, an MP4 file is made up of a number of boxes, each of which has a length and a type, and each box may contain further sub-boxes. The basic structure of a box is as follows:

Here, size specifies the size of the entire box, including the header. If a box is very large (for example, an mdat box storing the actual media data) and its size exceeds the maximum value of a uint32, size is set to 1 and the following 8 bytes (a uint64 largesize field) are used to store the actual size; a short box-walking sketch is given after this paragraph. In general, an MP4 file consists of several such boxes. A common MP4 file structure is:
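As a minimal, self-contained sketch of this size/type layout (a hypothetical utility, not taken from the project), the following code walks the top-level boxes of an MP4 file and prints each box's type and size:

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Minimal sketch: print the top-level boxes of an MP4 file (size + fourCC type).
public final class BoxDumper {
    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            while (true) {
                long size = in.readInt() & 0xFFFFFFFFL;   // 32-bit box size (includes the header)
                byte[] type = new byte[4];
                in.readFully(type);                        // 4-character box type, e.g. "ftyp", "moov", "mdat"
                long headerLen = 8;
                if (size == 1) {                           // 64-bit largesize follows the type
                    size = in.readLong();
                    headerLen = 16;
                }
                if (size == 0) {                           // size 0 means the box extends to the end of the file
                    System.out.printf("box %s, to end of file%n", new String(type, "US-ASCII"));
                    break;
                }
                System.out.printf("box %s, size %d%n", new String(type, "US-ASCII"), size);
                long skipped = in.skip(size - headerLen);  // jump to the next top-level box (sub-boxes not parsed)
                if (skipped < size - headerLen) break;     // reached end of file
            }
        } catch (java.io.EOFException eof) {
            // normal termination: no more boxes
        }
    }
}

On a typical MP4 recording this prints boxes such as ftyp, moov and mdat, which matches the common structure described here.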

Generally speaking, when parsing a media file, the information we care about most is the video's width and height, duration, bit rate, encoding format, frame list, key-frame list, and the corresponding timestamps and positions in the file. In MP4, this information is stored, according to specific algorithms, in several boxes under the stbl box; all of the boxes under stbl need to be parsed to reconstruct the media information. (A quicker way to get at much of this information on Android is sketched below.) The following table describes what each of these important boxes stores:
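In everyday Android development you rarely need to parse stbl by hand: MediaExtractor reads the moov/stbl metadata and exposes it through MediaFormat. The sketch below is a hypothetical helper (the method name and the file path passed in by the caller are made up for illustration) showing how to read the track type, duration and video dimensions that way:

import android.media.MediaExtractor;
import android.media.MediaFormat;

// Minimal sketch: let MediaExtractor surface the metadata stored under moov/stbl.
public static void dumpTracks(String path) throws java.io.IOException {
    MediaExtractor extractor = new MediaExtractor();
    extractor.setDataSource(path);
    for (int i = 0; i < extractor.getTrackCount(); i++) {
        MediaFormat format = extractor.getTrackFormat(i);
        String mime = format.getString(MediaFormat.KEY_MIME);        // e.g. "video/avc" or "audio/mp4a-latm"
        long durationUs = format.getLong(MediaFormat.KEY_DURATION);  // track duration in microseconds
        if (mime.startsWith("video/")) {
            int width = format.getInteger(MediaFormat.KEY_WIDTH);    // from the sample description (stsd)
            int height = format.getInteger(MediaFormat.KEY_HEIGHT);
            android.util.Log.i("BoxInfo", mime + " " + width + "x" + height + " " + durationUs + "us");
        } else {
            android.util.Log.i("BoxInfo", mime + " " + durationUs + "us");
        }
    }
    extractor.release();
}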

3. Encapsulate H.264 and AAC as MP4 files

To gain a deeper understanding of the H.264 and AAC encoding formats, we will use the MediaCodec and MediaMuxer classes provided by the Android API to compress and encode YUV video data and PCM audio data captured from the hardware, and to encapsulate the encoded data into an MP4 file. MediaCodec was introduced in Android 4.1; it gives access to the hardware codecs at the bottom of the system, and by specifying a MIME type we can select the corresponding encoder or decoder for audio and video. MediaMuxer is a muxer that can mix an H.264 video stream and an AAC audio stream into an MP4 file, or write just an H.264 video stream on its own.

3.1 Encode YUV video data as H.264

First, create a MediaCodec object and map it to the underlying H.264 hardware encoder by specifying its MIME type as “video/avc”. MediaCodec's configure() method is then called to configure the encoder, for example to specify the video encoder's bit rate, frame rate, color format and other information.

MediaFormat mFormat = MediaFormat.createVideoFormat("video/avc", 640, 480);
// The bit rate ranges from 600kbps to 5000kbps, depending on the resolution and network conditions
mFormat.setInteger(MediaFormat.KEY_BIT_RATE, BIT_RATE);
// Frame rate: 15-30fps
mFormat.setInteger(MediaFormat.KEY_FRAME_RATE, FRAME_RATE);
// Color format: COLOR_FormatYUV420Planar or COLOR_FormatYUV420SemiPlanar
mFormat.setInteger(MediaFormat.KEY_COLOR_FORMAT, mColorFormat);
// Keyframe interval, that is, the interval between encoding a keyframe
mFormat.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, FRAME_INTERVAL);
// Create, configure and start the encoder
MediaCodec mVideoEncodec = MediaCodec.createByCodecName(mCodecInfo.getName());
mVideoEncodec.configure(mFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
mVideoEncodec.start();

Second, each codec owns several input and output buffers. When API <= 20, getInputBuffers() and getOutputBuffers() can be used to obtain all of the input/output buffers owned by the encoder. After the encoder has been started with MediaCodec's start() method, the app does not yet own any of these buffers; it still needs to call MediaCodec's dequeueInputBuffer(long) and dequeueOutputBuffer(MediaCodec.BufferInfo, long) to bind a buffer to the app, which return the index (handle) of the corresponding input/output buffer. Once the app owns an available input buffer, it can fill it with valid data and submit the data block to the encoder for encoding through MediaCodec's queueInputBuffer(int, int, int, long, int) method.

ByteBuffer[] inputBuffers = mVideoEncodec.getInputBuffers();
// Returns an input buffer handle to the encoder. -1 indicates that no input buffer is currently available
int inputBufferIndex = mVideoEncodec.dequeueInputBuffer(TIMES_OUT);
if (inputBufferIndex >= 0) {
    // Bind an empty, writable inputBuffer to the client
    ByteBuffer inputBuffer = null;
    if (!isLollipop()) {
        inputBuffer = inputBuffers[inputBufferIndex];
    } else {
        inputBuffer = mVideoEncodec.getInputBuffer(inputBufferIndex);
    }
    // Write valid raw data to the input buffer and submit it to the encoder for encoding
    inputBuffer.clear();
    inputBuffer.put(mFrameData);
    mVideoEncodec.queueInputBuffer(inputBufferIndex, 0, mFrameData.length, getPTSUs(), 0);
}

After the raw data has been processed and encoded, the encoded data is saved into an output buffer bound to the app, which is obtained by calling MediaCodec's dequeueOutputBuffer(MediaCodec.BufferInfo, long). When the output buffer has been fully processed (for example streamed, or muxed into an MP4 file), MediaCodec's releaseOutputBuffer(int, boolean) method can be called to return the output buffer to the encoder.

// Returns an output buffer handle. A value of -1 indicates that no output buffer is currently available.
// The mBufferInfo argument receives the metadata of the encoded data, and TIMES_OUT is the timeout to wait.
MediaCodec.BufferInfo mBufferInfo = new MediaCodec.BufferInfo();
int outputBufferIndex = -1;
do {
    outputBufferIndex = mVideoEncodec.dequeueOutputBuffer(mBufferInfo, TIMES_OUT);
    if (outputBufferIndex == MediaCodec.INFO_TRY_AGAIN_LATER) {
        Log.e(TAG, "Get encoder output buffer timeout");
    } else if (outputBufferIndex == MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED) {
        // If the API is less than 21, the app needs to rebind the encoder's output buffers;
        // if the API is 21 or higher, INFO_OUTPUT_BUFFERS_CHANGED does not need to be handled
        if (!isLollipop()) {
            outputBuffers = mVideoEncodec.getOutputBuffers();
        }
    } else if (outputBufferIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
        // The encoder output format changes, usually once before any data is produced.
        // Set the muxer's video track here; the muxer is only started once the audio track
        // has also been added (to keep audio and video in sync).
        MediaFormat newFormat = mVideoEncodec.getOutputFormat();
        MediaMuxerUtils mMuxerUtils = muxerRunnableRf.get();
        if (mMuxerUtils != null) {
            mMuxerUtils.setMediaFormat(MediaMuxerUtils.TRACK_VIDEO, newFormat);
        }
        Log.i(TAG, "Encoder output format changed, add video track to muxer");
    } else {
        // Get a read-only output buffer that contains encoded data
        ByteBuffer outputBuffer = null;
        if (!isLollipop()) {
            outputBuffer = outputBuffers[outputBufferIndex];
        } else {
            outputBuffer = mVideoEncodec.getOutputBuffer(outputBufferIndex);
        }
        // If API <= 19, the position of the ByteBuffer must be adjusted according to the offset in BufferInfo,
        // and the readable length must be limited, otherwise the output will be garbled
        if (isKITKAT()) {
            outputBuffer.position(mBufferInfo.offset);
            outputBuffer.limit(mBufferInfo.offset + mBufferInfo.size);
        }
        // Determine keyframes according to the NALU type
        MediaMuxerUtils mMuxerUtils = muxerRunnableRf.get();
        int type = outputBuffer.get(4) & 0x1F;
        if (type == 7 || type == 8) {
            Log.i(TAG, "------ SPS/PPS frames (non-image data), ignore -------");
            mBufferInfo.size = 0;
        } else if (type == 5) {
            Log.i(TAG, "------ I frame (keyframe), add to muxer -------");
            if (mMuxerUtils != null && mMuxerUtils.isMuxerStarted()) {
                mMuxerUtils.addMuxerData(new MediaMuxerUtils.MuxerData(
                        MediaMuxerUtils.TRACK_VIDEO, outputBuffer, mBufferInfo));
                prevPresentationTimes = mBufferInfo.presentationTimeUs;
                isAddKeyFrame = true;
            }
        } else {
            if (isAddKeyFrame) {
                Log.d(TAG, "------ non-I frame (type=1), add to muxer -------");
                if (mMuxerUtils != null && mMuxerUtils.isMuxerStarted()) {
                    mMuxerUtils.addMuxerData(new MediaMuxerUtils.MuxerData(
                            MediaMuxerUtils.TRACK_VIDEO, outputBuffer, mBufferInfo));
                    prevPresentationTimes = mBufferInfo.presentationTimeUs;
                }
            }
        }
        // Release the output buffer back to the encoder
        mVideoEncodec.releaseOutputBuffer(outputBufferIndex, false);
    }
} while (outputBufferIndex >= 0);

There are a few things to note here, because if they are not handled properly, MediaMuxer may fail to produce the MP4 file, or the recorded MP4 file may show heavy mosaic artifacts at the beginning or have the audio and video out of sync.

A) How to ensure audio and video synchronization?

To ensure that the audio and video in the recorded MP4 file stay in sync, two things need to be done. First, when the output buffer handle outputBufferIndex equals MediaCodec.INFO_OUTPUT_FORMAT_CHANGED, we set the video format on MediaMuxer, but the MediaMuxer is only started once the audio track has also been added. Second, the presentation-time parameter (presentationTimeUs) passed to MediaCodec's queueInputBuffer() should increase monotonically, for example:

long prevPresentationTimes = mBufferInfo.presentationTimeUs;

private long getPTSUs() {
    long result = System.nanoTime() / 1000;
    if (result < prevPresentationTimes) {
        // never go backwards in time; clamp up to the previous timestamp
        result = (prevPresentationTimes - result) + result;
    }
    return result;
}

B) The first few frames of the recorded MP4 file are blurred?

The main reason for the mosaic is that the first frame written to the MP4 file is not a key frame (I frame). From the principles of H.264 coding we know that an H.264 stream sequence is composed of SPS, PPS, key frames, B frames, P frames, and so on. B frames and P frames are predicted frames: the image information they carry is incomplete, so the missing parts of the picture appear as mosaic. To avoid this we can use a frame-dropping strategy: ordinary (non-key) frames are discarded, and are only written once a key frame has already been written. Note that MediaMuxer does not need SPS and PPS, so SPS/PPS buffers can simply be ignored when they are encountered.

C) Stop muxer failed, resulting in invalid MP4 files?

The “stop muxer failed” exception reported by MediaMuxer is usually caused by sync frames (key frames) not being written to the muxer correctly; the sketch below shows one defensive way to stop and release the muxer.
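This is only a minimal sketch under the assumption that the muxer lives in the same utility class as in section 3.3 (the mediaMuxer, isMediaMuxerStart, isVideoAdd and isAudioAdd fields are those used there); it is not the project's actual code:

private void stopMediaMuxer() {
    if (mediaMuxer == null) {
        return;
    }
    try {
        // stop() throws IllegalStateException if it is called before start()
        // or if no valid samples were ever written to the started muxer
        if (isMediaMuxerStart) {
            mediaMuxer.stop();
        }
    } catch (IllegalStateException e) {
        Log.e(TAG, "Stop muxer failed, the output file is likely invalid: " + e);
    } finally {
        mediaMuxer.release();
        mediaMuxer = null;
        isMediaMuxerStart = false;
        isVideoAdd = false;
        isAudioAdd = false;
    }
}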

D) The recorded video shows garbled colors or overlapping images

Garbled or overlapping images appear while encoding the YUV data because the color space of the YUV frames captured by the Camera differs from the color space required by the MediaCodec encoder. In other words, the Camera delivers YV12 (YUV 4:2:0 planar) or NV21 (YUV 4:2:0 semi-planar), while Android encoders expect formats such as COLOR_FormatYUV420Planar (I420) or COLOR_FormatYUV420SemiPlanar (NV12); different encoders support different color spaces. The I420 format (YYYY UU VV) is the standard YUV 4:2:0 planar format and has a layout similar to YV12 (YYYY VV UU), only with the U and V planes swapped. A conversion sketch follows.
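As a minimal sketch (hypothetical helper methods, assuming frames without row padding or stride alignment), converting the Camera's NV21 output to the NV12 layout expected by a COLOR_FormatYUV420SemiPlanar encoder only requires swapping the interleaved V/U bytes, and converting YV12 to I420 only requires swapping the U and V planes:

// NV21 (Y plane, then interleaved VU) -> NV12 (Y plane, then interleaved UV): swap each V/U pair
public static void nv21ToNv12(byte[] nv21, byte[] nv12, int width, int height) {
    int ySize = width * height;
    System.arraycopy(nv21, 0, nv12, 0, ySize);           // Y plane is identical
    for (int i = ySize; i < ySize + ySize / 2; i += 2) {
        nv12[i]     = nv21[i + 1];                        // U
        nv12[i + 1] = nv21[i];                            // V
    }
}

// YV12 (Y plane, V plane, U plane) -> I420 (Y plane, U plane, V plane): swap the two chroma planes
public static void yv12ToI420(byte[] yv12, byte[] i420, int width, int height) {
    int ySize = width * height;
    int cSize = ySize / 4;
    System.arraycopy(yv12, 0, i420, 0, ySize);                  // Y
    System.arraycopy(yv12, ySize + cSize, i420, ySize, cSize);  // U (stored after V in YV12)
    System.arraycopy(yv12, ySize, i420, ySize + cSize, cSize);  // V
}

Whether such a conversion is needed, and which one, depends on the color format reported by the encoder (mColorFormat in the configuration code above).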

3.2 Encode PCM audio data into AAC

Since the principle of using MediaCodec to encode audio is the same as for video, it is not described again in detail here; please refer to this blog post for the configuration of the audio parameters. In addition, AudioRecord is used here to obtain the PCM audio stream, which is also relatively easy; see this blog post for details. The code is as follows:

MediaCodec mMediaCodec = MediaCodec.createEncoderByType("audio/mp4a-latm");
MediaFormat format = new MediaFormat();
format.setString(MediaFormat.KEY_MIME, "audio/mp4a-latm");   // Encoder type, AAC
format.setInteger(MediaFormat.KEY_BIT_RATE, 16000);          // The bit rate is 16kbps
format.setInteger(MediaFormat.KEY_CHANNEL_COUNT, 1);         // Channel count, 1
format.setInteger(MediaFormat.KEY_SAMPLE_RATE, 8000);        // The sampling rate is 8000Hz
format.setInteger(MediaFormat.KEY_AAC_PROFILE,
           MediaCodecInfo.CodecProfileLevel.AACObjectLC);    // AAC profile, LC
format.setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, 1600);     // Maximum input buffer size, 1600
mMediaCodec.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
mMediaCodec.start();
 
/** Use AudioRecord to record PCM audio */
Process.setThreadPriority(Process.THREAD_PRIORITY_AUDIO);
int samplingRate = 8000;
int bufferSize = AudioRecord.getMinBufferSize(samplingRate,
        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
if (bufferSize < 1600) {
    bufferSize = 1600;
}
// Configure the audio source, sampling rate, mono channel, and sampling precision of the recording device
AudioRecord mAudioRecord = new AudioRecord(MediaRecorder.AudioSource.MIC, samplingRate,
        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bufferSize);
mAudioRecord.startRecording();

// The MediaCodec encoding core is similar to the video case; since MediaMuxer does not require
// ADTS headers, no header is added to each frame of data here
byte[] audioBuf = new byte[AUDIO_BUFFER_SIZE];
int readBytes = mAudioRecord.read(audioBuf, 0, AUDIO_BUFFER_SIZE);
if (readBytes > 0) {
    try {
        ByteBuffer[] inputBuffers = mAudioEncoder.getInputBuffers();
        ByteBuffer[] outputBuffers = mAudioEncoder.getOutputBuffers();
        // Returns an input buffer handle to the encoder. -1 indicates that no input buffer is currently available
        int inputBufferIndex = mAudioEncoder.dequeueInputBuffer(TIMES_OUT);
        if (inputBufferIndex >= 0) {
            // Bind an empty, writable inputBuffer to the client
            ByteBuffer inputBuffer = null;
            if (!isLollipop()) {
                inputBuffer = inputBuffers[inputBufferIndex];
            } else {
                inputBuffer = mAudioEncoder.getInputBuffer(inputBufferIndex);
            }
            // Write valid raw data to the input buffer and submit it to the encoder for encoding
            if (audioBuf == null || readBytes <= 0) {
                mAudioEncoder.queueInputBuffer(inputBufferIndex, 0, 0, getPTSUs(), MediaCodec.BUFFER_FLAG_END_OF_STREAM);
            } else {
                inputBuffer.clear();
                inputBuffer.put(audioBuf);
                mAudioEncoder.queueInputBuffer(inputBufferIndex, 0, readBytes, getPTSUs(), 0);
            }
        }
        // Returns an output buffer handle. A value of -1 indicates that no output buffer is currently available.
        // The mBufferInfo argument receives the metadata of the encoded data, and TIMES_OUT is the timeout to wait.
        MediaCodec.BufferInfo mBufferInfo = new MediaCodec.BufferInfo();
        int outputBufferIndex = -1;
        do {
            outputBufferIndex = mAudioEncoder.dequeueOutputBuffer(mBufferInfo, TIMES_OUT);
            if (outputBufferIndex == MediaCodec.INFO_TRY_AGAIN_LATER) {
                Log.i(TAG, "Get encoder output buffer timeout");
            } else if (outputBufferIndex == MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED) {
                // If the API is less than 21, the app needs to rebind the encoder's output buffers;
                // if the API is 21 or higher, INFO_OUTPUT_BUFFERS_CHANGED does not need to be handled
                if (!isLollipop()) {
                    outputBuffers = mAudioEncoder.getOutputBuffers();
                }
            } else if (outputBufferIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
                // The encoder output format changes, usually once before any data is produced.
                // Set the muxer's audio track here; if the video track has also been added,
                // the muxer is started (to keep audio and video in sync).
                MediaFormat newFormat = mAudioEncoder.getOutputFormat();
                MediaMuxerUtils mMuxerUtils = muxerRunnableRf.get();
                if (mMuxerUtils != null) {
                    mMuxerUtils.setMediaFormat(MediaMuxerUtils.TRACK_AUDIO, newFormat);
                }
                Log.i(TAG, "Encoder output format changed, add audio track to muxer");
            } else {
                // If the BUFFER_FLAG_CODEC_CONFIG flag is set, the buffer holds codec config data rather than media data
                if ((mBufferInfo.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG) != 0) {
                    Log.i(TAG, "Codec config buffer, BufferInfo size set to 0.");
                    mBufferInfo.size = 0;
                }
                // End-of-stream flag: exit the loop
                if ((mBufferInfo.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                    Log.i(TAG, "End of data flow, exit loop.");
                    break;
                }
                // Get a read-only output buffer that contains encoded data
                ByteBuffer outputBuffer = null;
                if (!isLollipop()) {
                    outputBuffer = outputBuffers[outputBufferIndex];
                } else {
                    outputBuffer = mAudioEncoder.getOutputBuffer(outputBufferIndex);
                }
                if (mBufferInfo.size != 0) {
                    // Failed to get the output buffer, throw an exception
                    if (outputBuffer == null) {
                        throw new RuntimeException("encodecOutputBuffer " + outputBufferIndex + " was null");
                    }
                    // If API <= 19, the position of the ByteBuffer must be adjusted according to the offset in BufferInfo,
                    // and the readable length must be limited, otherwise the output will be garbled
                    if (isKITKAT()) {
                        outputBuffer.position(mBufferInfo.offset);
                        outputBuffer.limit(mBufferInfo.offset + mBufferInfo.size);
                    }
                    // Mux the AAC data in the output buffer
                    MediaMuxerUtils mMuxerUtils = muxerRunnableRf.get();
                    mBufferInfo.presentationTimeUs = getPTSUs();
                    if (mMuxerUtils != null && mMuxerUtils.isMuxerStarted()) {
                        Log.d(TAG, "------ Mux audio data -------");
                        mMuxerUtils.addMuxerData(new MediaMuxerUtils.MuxerData(
                                MediaMuxerUtils.TRACK_AUDIO, outputBuffer, mBufferInfo));
                        prevPresentationTimes = mBufferInfo.presentationTimeUs;
                    }
                }
                // Release the output buffer back to the encoder
                mAudioEncoder.releaseOutputBuffer(outputBufferIndex, false);
            }
        } while (outputBufferIndex >= 0);
    } catch (IllegalStateException e) {
        // Catch the state exception thrown when the thread is interrupted and dequeueOutputBuffer stops
        e.printStackTrace();
    } catch (NullPointerException e) {
        // Catch the null pointer exception thrown when the thread is interrupted and MediaCodec stops
        e.printStackTrace();
    }
}

If you want to push the AAC data as a stream, you need to add an ADTS header to every frame of audio data; refer to the ADTS header format and the related settings in the FFmpeg function above. In Java, the ADTS header can be filled in as follows:

private void addADTStoPacket(byte[] packet, int packetLen) {
     packet[0] = (byte) 0xFF;
     packet[1] = (byte) 0xF1;
     packet[2] = (byte) (((2 - 1) << 6) + (mSamplingRateIndex << 2) + (1 >> 2));
     packet[3] = (byte) (((1 & 3) << 6) + (packetLen >> 11));
     packet[4] = (byte) ((packetLen & 0x7FF) >> 3);
     packet[5] = (byte) (((packetLen & 7) << 5) + 0x1F);
     packet[6] = (byte) 0xFC;
}

Here packetLen is the full ADTS frame length (the raw frame data plus the 7-byte header), and mSamplingRateIndex is the index into the custom sampling-rate array:

public static final int[] AUDIO_SAMPLING_RATES = {
            96000, // 0
            88200, // 1
            64000, // 2
            48000, // 3
            44100, // 4
            32000, // 5
            24000, // 6
            22050, // 7
            16000, // 8
            12000, // 9
            11025, // 10
            8000,  // 11
            7350,  // 12
            -1,    // 13
            -1,    // 14
            -1,    // 15
};

3.3 Mux H.264 + AAC into an MP4 file with MediaMuxer

MediaMuxer itself is relatively simple to use, but the following three steps must be strictly followed:

Step 1: Configure the muxer's audio and video tracks

public synchronized void setMediaFormat(int index, MediaFormat mediaFormat) {
      if (mediaMuxer == null) {
             return;
      }
      // Set the video track format
      if (index == TRACK_VIDEO) {
             if (videoMediaFormat == null) {
                    videoMediaFormat = mediaFormat;
                    videoTrackIndex = mediaMuxer.addTrack(mediaFormat);
                    isVideoAdd = true;
                    Log.i(TAG, "Add video track");
             }
      } else {
             if (audioMediaFormat == null) {
                    audioMediaFormat = mediaFormat;
                    audioTrackIndex = mediaMuxer.addTrack(mediaFormat);
                    isAudioAdd = true;
                    Log.i(TAG, "Add audio track");
             }
      }
      // Start the muxer
      startMediaMuxer();
}

Step 2: Once both audio and video tracks have been added, start the muxer

  private void startMediaMuxer() {
          if (mediaMuxer == null) {
                 return;
          }
          if (isMuxerFormatAdded()) {
                 mediaMuxer.start();
                 isMediaMuxerStart = true;
                 Log.i(TAG, "Start the muxer and start waiting for data input.....");
          }
  }

Step 3: Write audio and video data to the muxer

public void addMuxerData(MuxerData data) {
      int track = 0;
      if (data.trackIndex == TRACK_VIDEO) {
             track = videoTrackIndex;
      } else {
             track = audioTrackIndex;
      }
      try {
             ByteBuffer outputBuffer = data.byteBuf;
             BufferInfo bufferInfo = data.bufferInfo;
             if (isMediaMuxerStart && bufferInfo.size != 0) {
                    outputBuffer.position(bufferInfo.offset);
                    outputBuffer.limit(bufferInfo.offset + bufferInfo.size);
                    Log.i(TAG, "Write muxed data, track " + data.trackIndex + ", size --> " + bufferInfo.size);
                    mediaMuxer.writeSampleData(track, outputBuffer, bufferInfo);
             }
             if ((bufferInfo.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                    Log.i(TAG, "BUFFER_FLAG_END_OF_STREAM received");
             }
      } catch (Exception e) {
             Log.e(TAG, "Write muxed data failed!" + e.toString());
             // restartMediaMuxer();
      }
}

Github project address: github.com/jiangdonggu…