Original article; please do not reproduce without the author's permission.

The distant hills like dark eyebrows, the moonlit ripples long; autumn shadows of evening clouds dip into the Xiaoxiang. My drunken soul should chase a dream across the waves, entrusting this night's cool to the west wind.

On Android, audio and video development is a large area, and those who want to get into it often don't know where to start. The blogger has been involved in audio and video development for some time and has been writing blogs along the way as a review and supplement to his own learning. Readers who are interested can get to know the earlier posts first:

  • MediaCodec hard-encodes a photo set into an MP4 video file
  • MediaCodec hard-decodes an entire video and stores it as picture files, using two different methods
  • MediaCodec hard-decodes specified video frames

Overview

Recently, while reviewing what I had learned, I found there were still many gaps. I have therefore decided to write several summaries, using an Android video recording project as the example, to sort out my knowledge of audio and video development. If you want to learn audio and video development but don't know where to start, you can follow along and build a similar demo. The entry point of this article's project is the Camera2Record interface, and the business logic is implemented in Camera2Recorder.

APIs

The technical logic of video in the project is as follows:

The Camera2 API captures raw frame data, which is encoded into an H.264 stream via MediaCodec + Surface + OpenGL.

The technical logic of audio is:

AudioRecord captures raw PCM data, which MediaCodec then encodes into AAC.

For muxing, MediaMuxer encapsulates the encoded video and audio frame data into an MP4 file. Overall, the APIs involved are:

  • MediaCodec
  • AudioRecord
  • MediaMuxer
  • OpenGL (not covered in detail here)

Architecture Design

As a simple audio and video recording application, there are no fancy features yet (they will be added gradually later). The whole business flow is straightforward: record video → produce a video file. Subdivided, the business has three main parts: first the picture, i.e. the video part; second the sound, i.e. the audio part; and third the mixer, which mixes video and audio and generates the video file. Having split the business up, we work backwards from the result: since we are generating an MP4 file, what data do we need to provide? So we tease out the detailed responsibilities of each module starting from the output, the mixer.

Video encapsulation

In the mixer module, the MediaMuxer provided by Android is used as the output tool for packaging the video. MediaMuxer supports three output formats: MP4, WebM and 3GP files. The mixer output for this project is naturally MP4. MP4 is the generalized file extension defined by the official MPEG-4 container format. It can be streamed and supports a wide range of multimedia content: multiple tracks, video streams, subtitles, images, and variable frame rates and bit rates. When producing MP4 files, MPEG-4 standard video/audio formats should be preferred. Generally speaking, there are two relatively common encoding combinations for the MP4 container (a minimal muxing sketch follows the list):

  • H.264 video coding with AAC audio coding
  • Xvid video coding with MP3 audio coding
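
As a quick illustration of how MediaMuxer packages encoded streams into MP4, here is a minimal sketch. This is not the project's actual code; the function shape and parameters are assumptions, but the MediaMuxer calls themselves are the standard Android API:

```kotlin
import android.media.MediaFormat
import android.media.MediaMuxer

// Minimal MediaMuxer flow: add tracks, start, write encoded samples, stop.
// The two MediaFormats are assumed to come from the video and audio encoders.
fun muxToMp4(
    outputPath: String,
    videoFormat: MediaFormat,
    audioFormat: MediaFormat,
    writeFrames: (muxer: MediaMuxer, videoTrack: Int, audioTrack: Int) -> Unit
) {
    val muxer = MediaMuxer(outputPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
    val videoTrack = muxer.addTrack(videoFormat)  // tracks must be added before start()
    val audioTrack = muxer.addTrack(audioFormat)
    muxer.start()
    // The caller writes each encoded frame with muxer.writeSampleData(track, buffer, bufferInfo)
    writeFrames(muxer, videoTrack, audioTrack)
    muxer.stop()
    muxer.release()
}
```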

Video coding algorithm

In this project, the video encoding format the blogger adopted is H.264. H.264 offers one of the highest compression ratios among common video compression formats: at the same picture quality, it produces smaller files than comparable encoders. It has two names: one comes from the ITU-T organization's H.26x line, H.264; the other is MPEG-4 AVC, which stands for Advanced Video Coding. MP4 is the standard container format used for H.264 streams.
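
For reference, here is a hedged sketch of configuring a MediaCodec H.264 encoder. The resolution, bit rate and frame rate values are illustrative assumptions, not the project's settings:

```kotlin
import android.media.MediaCodec
import android.media.MediaCodecInfo
import android.media.MediaFormat

// Sketch: create and configure an H.264 ("video/avc") encoder.
fun createAvcEncoder(width: Int = 1280, height: Int = 720): MediaCodec {
    val format = MediaFormat.createVideoFormat(MediaFormat.MIMETYPE_VIDEO_AVC, width, height).apply {
        // Input comes from a Surface (fed by OpenGL), as in this project's video pipeline
        setInteger(MediaFormat.KEY_COLOR_FORMAT,
            MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface)
        setInteger(MediaFormat.KEY_BIT_RATE, 4_000_000)  // assumed bit rate
        setInteger(MediaFormat.KEY_FRAME_RATE, 30)       // assumed frame rate
        setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1)  // one key frame per second
    }
    return MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_VIDEO_AVC).apply {
        configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
    }
}
```

Since the frames come from OpenGL via a Surface, the codec's createInputSurface() would be called after configure() and before start().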

Audio coding algorithm

The audio encoding format used by the blogger is AAC. AAC can support up to 48 full audio tracks and 15 low-frequency tracks simultaneously, and provides better sound quality than MP3 at roughly 30 percent smaller size. AAC was originally based on MPEG-2 audio coding technology; with the introduction of the MPEG-4 standard, AAC reintegrated other technologies and became the current MPEG-4 AAC standard. In general, the AAC encoding commonly used today refers to MPEG-4 AAC. MPEG-4 AAC has six sub-specifications:

  • MPEG-4 AAC LC (Low Complexity): the audio in the MP4 files now common on mobile phones generally uses this profile
  • MPEG-4 AAC Main: includes all features except gain control and has the best sound quality
  • MPEG-4 AAC SSR (Scalable Sample Rate)
  • MPEG-4 AAC LTP (Long Term Prediction)
  • MPEG-4 AAC LD (Low Delay)
  • MPEG-4 AAC HE (High Efficiency): intended for low bit rate encoding; supported by the NeroAAC encoder

The most widely used profiles at the moment are LC and HE. Note that MPEG-4 AAC LC is the "Low Complexity" profile and generally suits medium bit rates, which usually means 96 kbps to 192 kbps; if you use LC encoding, it is best to keep the bit rate within this range (a hedged encoder-configuration sketch follows).
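
To tie this to the project's toolchain, here is a minimal sketch of configuring a MediaCodec AAC LC encoder. The 44.1 kHz mono and 128 kbps values are assumptions chosen to sit inside the bit rate range above:

```kotlin
import android.media.MediaCodec
import android.media.MediaCodecInfo
import android.media.MediaFormat

// Sketch: create and configure an AAC LC ("audio/mp4a-latm") encoder.
fun createAacEncoder(sampleRate: Int = 44100, channels: Int = 1): MediaCodec {
    val format = MediaFormat.createAudioFormat(MediaFormat.MIMETYPE_AUDIO_AAC, sampleRate, channels).apply {
        setInteger(MediaFormat.KEY_AAC_PROFILE, MediaCodecInfo.CodecProfileLevel.AACObjectLC)
        setInteger(MediaFormat.KEY_BIT_RATE, 128_000)  // medium bit rate, suits the LC profile
    }
    return MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_AUDIO_AAC).apply {
        configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
    }
}
```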

Working process

Once the business logic is sorted out, the specific functions of each module become much clearer. The rough flow is as follows:

  • VideoRecorder (OpenGL + Surface + MediaCodec) → VideoPacket: a ByteArray of encoded frame data plus a BufferInfo, which mainly carries the frame's timestamp and related metadata
  • AudioRecorder → PCM data as a ByteArray (AudioRecord can deliver either ShortArray or ByteArray; since the video side already outputs ByteArray, ByteArray is chosen here as well) → AudioEncoder → raw AAC data, where "raw" means data without ADTS headers added
  • Mux: both packet streams feed into the mixer, MediaMuxer, which writes the MP4 file

A sketch of these packet types follows.
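
For concreteness, here is what the packet types in the flow above might look like. The class and field names are assumptions based on the diagram, not the project's exact definitions:

```kotlin
import android.media.MediaCodec

// Hypothetical packet shapes matching the flow chart above.
class VideoPacket(
    val data: ByteArray,                   // one encoded H.264 frame
    val bufferInfo: MediaCodec.BufferInfo  // size, flags and presentation timestamp
)

class AudioPacket(
    val data: ByteArray,                   // one raw AAC frame (no ADTS header)
    val bufferInfo: MediaCodec.BufferInfo
)
```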

Audio recording and coding

The audio module is divided into two small modules, recording and encoding, each running on its own worker thread. There is not much to say about the recording module; it is simply a secondary encapsulation of AudioRecord (code address: AudioRecorder). Once running, the recording module calls back available PCM data to the outside, and the data is pushed into a thread-safe linked list. The encoding module, AudioEncoder, continuously pulls data from that list and uses MediaCodec to encode the PCM into AAC audio frames. Since MediaMuxer does not require ADTS headers when encapsulating AAC tracks, the raw AAC frame data produced by AudioEncoder needs no further processing.
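Before looking at the encoder loop, here is a minimal sketch of what the recording side boils down to, assuming 44.1 kHz mono 16-bit PCM (the real AudioRecorder class wraps this in its own thread and callback; the RECORD_AUDIO permission is required):

```kotlin
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import java.util.concurrent.LinkedBlockingDeque

// Thread-safe queue shared between the recording and encoding threads.
val pcmDataQueue = LinkedBlockingDeque<ByteArray>()

fun recordPcm(isRecording: () -> Boolean, sampleRate: Int = 44100) {
    val minSize = AudioRecord.getMinBufferSize(
        sampleRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT)
    val recorder = AudioRecord(MediaRecorder.AudioSource.MIC,
        sampleRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, minSize)
    recorder.startRecording()
    val buffer = ByteArray(minSize)
    while (isRecording()) {
        val read = recorder.read(buffer, 0, buffer.size)  // blocking read of PCM bytes
        if (read > 0) pcmDataQueue.offer(buffer.copyOf(read))
    }
    recorder.stop()
    recorder.release()
}
```

The encoding loop below then consumes this queue.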

```kotlin
var presentationTimeUs = 0L
var totalBytes = 0L
val bufferInfo = MediaCodec.BufferInfo()
// popSafe() and dequeueValidInputBuffer() are helper extensions from the project
while (isRecording || pcmDataQueue.isNotEmpty()) {
    val bytes = pcmDataQueue.popSafe()
    bytes?.apply {
        val (id, inputBuffer) = codec.dequeueValidInputBuffer(1000)
        inputBuffer?.let {
            totalBytes += size
            it.clear()
            it.put(this)
            // Queue the end-of-stream flag together with the last (empty) buffer
            codec.queueInputBuffer(id, 0, size, presentationTimeUs,
                if (isEmpty()) MediaCodec.BUFFER_FLAG_END_OF_STREAM else 0)
            // presentationTimeUs = 1,000,000 * sample count / sampleRate
            // (totalBytes / 2 because each 16-bit mono sample is 2 bytes)
            presentationTimeUs = 1000000L * (totalBytes / 2) / format.sampleRate
        }
    }

    loopOut@ while (true) {
        val outputBufferId = codec.dequeueOutputBuffer(bufferInfo, defTimeOut)
        if (outputBufferId == MediaCodec.INFO_TRY_AGAIN_LATER) {
            break@loopOut
        } else if (outputBufferId == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
            // the encoder's output format changed; handled when setting up the muxer
        } else if (outputBufferId >= 0) {
            if (bufferInfo.flags and MediaCodec.BUFFER_FLAG_END_OF_STREAM != 0) {
                break@loopOut
            }
            val outputBuffer = codec.getOutputBuffer(outputBufferId)
            if (bufferInfo.size > 0) {
                frameCount++
                dataCallback.invoke(outputBuffer, bufferInfo)
            }
            codec.releaseOutputBuffer(outputBufferId, false)
        }
    }
}
```

Here is how it works: whenever there is data in the PCM linked list, it is fed into MediaCodec's available input buffers. A chunk of PCM data does not necessarily correspond to exactly one audio frame, so all the engineering has to do is keep feeding data to the encoder while continuously draining its output, until everything fed in has been encoded. One thing to note is that when the input data is fully consumed, a BUFFER_FLAG_END_OF_STREAM flag must be queued to tell MediaCodec that the input has ended; if this flag is not sent, the encoded audio will lose its last short stretch. Another very important point is the timestamp calculation for AAC encoding; for that, please read the blogger's earlier post on solving the timestamp problem of AAC encoding.
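
As a quick worked example of the timestamp formula in the code above (the numbers are illustrative, not from the original post): with 44.1 kHz mono 16-bit PCM there are 2 bytes per sample, so after totalBytes = 88,200 bytes have been queued we have 88,200 / 2 = 44,100 samples, and presentationTimeUs = 1,000,000 × 44,100 / 44,100 = 1,000,000 µs, i.e. exactly one second of audio.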

To be continued

Due to limited space, this post only covers the encoding of the audio; in the next post the blogger will share the recording and encoding of the video.

Notes

  • 1. The architecture design of this article references the chapter "Implementing a video recording application" in Advanced Guide to Audio and Video Development.
  • 2. Reference: a complete introduction to MP4 encoding.
  • 3. Reference: audio/video container formats and encoding formats.
  • 4. Reference: an introduction to the AAC audio encoding format.