Preface

In 004 - Video H264 coding details (part 1) through VIII - Realization of AVFoundation video data collection (3), we covered the callback methods triggered by audio and video capture, the principles of video H264 encoding and decoding, and rendering. All that is left for this article is the audio codec part.
1. Audio fundamentals

First of all, let's go over a few knowledge points related to audio.
1.1 Sound
Sound is a wave generated by the vibration of an object and transmitted through a medium (air, a solid, or a liquid), and it can be perceived by human or animal hearing organs. As we all learned in junior-high physics, sound is characterized by three elements:
- Pitch: how high or low a sound is (treble vs. bass). It is determined by frequency, measured in Hz (Hertz); the higher the frequency, the higher the pitch.
- Volume: the amplitude of the sound vibration, i.e. the loudness a person subjectively perceives.
- Timbre: also known as tone quality. The waveform determines the timbre of a sound, and different vibrating materials produce different waveforms. Timbre itself is an abstract thing, and the waveform is what makes this abstraction intuitive: different waveforms mean different timbres, and different timbres can be completely distinguished by their waveforms.
Psychoacoustic model
As can be seen from the figure above, human hearing ranges from 20 Hz to 20 kHz. Anything below 20 Hz is called infrasound, and anything above 20 kHz is called ultrasound. Since we cannot hear either, we can simply discard them when encoding and decoding the audio stream.
1.2 Pulse Code Modulation (PCM)
So the question now is: how do we convert the sounds of real life into digital signals? The answer is pulse code modulation (PCM); a general understanding of it is enough here.

The process of converting sound into a digital signal is shown below.

It can be roughly divided into three stages:
- Sampling
- Quantization
- Encoding
Suppose an analog signal f(t) passes through a switch. The output of the switch depends on the switch's state:

- When the switch is in the closed position, the output equals the input, i.e. y(t) = f(t).
- When the switch is in the open position, the output y(t) is zero.
It follows that if the switch is controlled by a narrow pulse train (sequence), so that the switch closes when a pulse appears and opens when it disappears, then the output y(t) is a pulse train of varying amplitude, where the amplitude of each pulse is the instantaneous value of the input signal f(t) at the moment the pulse occurs. Therefore, y(t) is the sampled signal (sample sequence) of f(t).

Figure 3-2(a) shows a narrow pulse sequence p(t) with time interval Ts. Because it is used for sampling, it is called the sampling pulse.
In Figure 3-2(b), v(t) is the analog voltage signal to be sampled. The values of the discrete signal k(t) after sampling are k(0) = 0.2, k(Ts) = 0.4, k(2Ts) = 1.8, k(3Ts) = 2.8, k(4Ts) = 3.6, k(5Ts) = 5.1, k(6Ts) = 6.0, k(7Ts) = 5.7, k(8Ts) = 3.9, k(9Ts) = 2.0, k(10Ts) = 1.2. The values fall anywhere between 0 and 6, which means there are infinitely many possible values.

In Figure 3-2(c), in order to turn the infinite number of possible values into a finite number, we quantize the values of k(t) (here by rounding to the nearest integer) to get m(t): m(0) = 0.0, m(Ts) = 0.0, m(2Ts) = 2.0, m(3Ts) = 3.0, m(4Ts) = 4.0, m(5Ts) = 5.0, m(6Ts) = 6.0, m(7Ts) = 6.0, m(8Ts) = 4.0, m(9Ts) = 2.0, m(10Ts) = 1.0. Now there are only seven possible values, 0 through 6.

As shown in Figure 3-2(d), m(t) has become a digital signal, but not yet the binary digital signal used in practice. Encoding m(t) naturally with a 3-bit binary code yields the digital signal d(t) in Figure 3-2(d). With that, A/D conversion is complete and pulse code modulation is realized. That is the whole process of PCM.
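To make the three stages concrete, here is a minimal sketch (plain C, with the sample values copied from Figure 3-2) that quantizes each sampled value by rounding and encodes the resulting level as a 3-bit natural binary code:

```objc
#include <math.h>
#include <stdio.h>

int main(void) {
    // Sampled values k(nTs) from Figure 3-2(b)
    double k[] = {0.2, 0.4, 1.8, 2.8, 3.6, 5.1, 6.0, 5.7, 3.9, 2.0, 1.2};
    int n = sizeof(k) / sizeof(k[0]);
    for (int i = 0; i < n; i++) {
        int m = (int)round(k[i]); // Quantization: round to the nearest level, 0..6
        // Encoding: 3-bit natural binary, most significant bit first
        printf("k(%2dTs) = %.1f  ->  m = %d  ->  d = %d%d%d\n",
               i, k[i], m, (m >> 2) & 1, (m >> 1) & 1, m & 1);
    }
    return 0;
}
```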
1.3 Understanding the quantization process
Quantization maps an infinite set of values of a continuous function onto a finite set of values of a discrete function.

- Quantized value: a value after quantization
- Quantization level: the number of possible quantized values
- Quantization interval: the difference between two adjacent quantized values
In the figure above, the sampled signal k(t) of v(t) differs from the quantized signal m(t); for example, k(0) = 0.2 while m(0) = 0.

The receiver can only recover the quantized signal m(t), not k(t). The resulting error between the sent and received signals is called quantization error, or quantization noise.

Since quantization here is done by rounding, the maximum quantization noise is 0.5. In general, the maximum absolute quantization error is 0.5 quantization interval. Quantization with equal intervals throughout is called uniform quantization.
1.4 Audio compression coding principles & standards
The quality of digital audio depends on two parameters: the sampling frequency and the number of quantization bits. To keep the sampled points as dense as possible along the time axis, the sampling frequency should be high; to resolve the amplitude as finely as possible, the number of quantization bits should be high. The immediate consequence is pressure on storage capacity and on transmission channel capacity.
Transmission rate of audio signal = sampling frequency * number of quantized bits of sample * number of channels
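As a quick sanity check of this formula, here is a small sketch using the standard CD-quality parameters listed later in this section (44.1 kHz, 16 bits, stereo):

```objc
NSInteger sampleRate   = 44100; // sampling frequency in Hz
NSInteger sampleSize   = 16;    // quantization bits per sample
NSInteger channelCount = 2;     // normal stereo

NSInteger bitsPerSecond  = sampleRate * sampleSize * channelCount; // 1,411,200 bit/s, i.e. ~1.4 Mbit/s
NSInteger bytesPerSecond = bitsPerSecond / 8;                      // 176,400 bytes = 176.4 KB per second
NSLog(@"%ld bit/s = %ld bytes/s", (long)bitsPerSecond, (long)bytesPerSecond);
```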
Audio compression coding principles

- Lossy coding: eliminates redundant data. The captured signal contains sounds at all kinds of frequencies; we can discard the parts the human ear cannot hear directly at the data source, which greatly reduces the amount of data to store.

- Lossless coding: for example Huffman coding. Apart from compressing the sounds inaudible to the human ear, it preserves all the sound data (i.e. the sounds the human ear can hear), and the compressed data can be restored exactly (short codes for high-frequency symbols, long codes for low-frequency symbols).

- Compression method: remove the redundant information from the captured audio, including data beyond the range of human hearing and masked audio signals.
  - Masking effect: a weaker sound can be covered by a stronger one. In terms of signal behavior, masking appears as frequency-domain masking and time-domain masking.
Audio compression encoding formats

- MPEG-1
- Dolby AC-3
- MPEG-2
- MPEG-4
- AAC
Standard

- Sampling frequency = 44.1 kHz
- Quantization bits per sample = 16
- Number of channels in normal stereo = 2
- Digital signal transmission bit stream ≈ 1.4 Mbit/s
- The amount of data per second is 1.4 Mbit / 8 = 176.4 KB (176,400 bytes), equal to the data of 88,200 Chinese characters (2 bytes each)
2. Frequency domain masking and time domain masking

Now let's take a closer look at frequency-domain masking and time-domain masking. What exactly do they mean?
2.1 Frequency domain masking
- The X-axis is the frequency domain. It starts at 20 Hz because sounds below 20 Hz are inaudible. The Y-axis is decibels; sounds below 40 dB are inaudible to human ears.
- Look at the purple columns: on their own, these sounds are audible. But a red column stands nearby, and its height means that sound is particularly loud, so it masks all the surrounding purple columns.
- The green columns are smaller than the purple and red ones, meaning those sounds are comparatively weak. When green collides with red, it is doomed to lose.
- However, some green columns sit a fair distance away from the red column, meaning their frequency-domain values differ greatly. To a listener, both of those sounds remain audible.
2.2 Time domain masking
- In the figure, at the same moment there is a treble (top line) and a bass (bottom line). When they sound simultaneously, the treble completely drowns out the bass.
- There is a special case (pre-masking): a quiet sound starts first, and then suddenly, within a short time, a loud sound occurs; the quiet sound is masked.
- Because sound takes time to travel, masking has a time extent: it covers roughly 50 ms forward (pre-masking) and roughly 100 ms backward (post-masking).
3. Audio AAC encoding

Audio AAC encoding is completely different from video, so do not follow the earlier video-encoding flow! Video encoding uses VideoToolbox, while audio uses AudioToolbox.

As before, we encapsulate the audio encoder in a utility class.
3.1 Encoding tool class header file & initialization
Audio parameter configuration class
First, initialization is handled the same way as in video encoding. We also define an audio parameter configuration class, named CCAudioConfig:

```objc
@interface CCAudioConfig : NSObject
/** Bit rate */
@property (nonatomic, assign) NSInteger bitrate;      // (default 96000)
/** Channel count */
@property (nonatomic, assign) NSInteger channelCount; // (default 1)
/** Sample rate */
@property (nonatomic, assign) NSInteger sampleRate;   // (default 44100)
/** Quantization bits per sample */
@property (nonatomic, assign) NSInteger sampleSize;   // (default 16)

+ (instancetype)defaultConifg;
@end
```
The implementation:

```objc
@implementation CCAudioConfig

+ (instancetype)defaultConifg {
    return [[CCAudioConfig alloc] init];
}

- (instancetype)init
{
    self = [super init];
    if (self) {
        self.bitrate = 96000;
        self.channelCount = 1;
        self.sampleSize = 16;
        self.sampleRate = 44100;
    }
    return self;
}

@end
```
Utility class callback
```objc
/** AAC encoder delegate */
@protocol CCAudioEncoderDelegate <NSObject>
- (void)audioEncodeCallback:(NSData *)aacData;
@end
```
Note that audio differs from video here: when we encode audio, we output binary NSData directly.
Utility class header file
The utility class is named CCAudioEncoder, and the header file contains the initialization method and the encoding method:

```objc
/** AAC hard encoder */
@interface CCAudioEncoder : NSObject
@property (nonatomic, strong) CCAudioConfig *config;
@property (nonatomic, weak) id<CCAudioEncoderDelegate> delegate;

/** Initialize, passing in the encoder configuration */
- (instancetype)initWithConfig:(CCAudioConfig *)config;
/** Encode */
- (void)encodeAudioSamepleBuffer:(CMSampleBufferRef)sampleBuffer;
@end
```
Initialization Process
As with video encoding, we need two queues, an encoding queue and a callback queue, to asynchronously process the audio encoding and the callback of its result:

```objc
@property (nonatomic, strong) dispatch_queue_t encoderQueue;
@property (nonatomic, strong) dispatch_queue_t callbackQueue;
```
Also required are the audio-encoding-related properties:

```objc
/** Audio converter object */
@property (nonatomic, unsafe_unretained) AudioConverterRef audioConverter;
/** PCM buffer */
@property (nonatomic) char *pcmBuffer;
/** PCM buffer size */
@property (nonatomic) size_t pcmBufferSize;
```
In the initialization method, these properties are handled as follows:

```objc
- (instancetype)initWithConfig:(CCAudioConfig *)config {
    self = [super init];
    if (self) {
        // Encoding queue
        _encoderQueue = dispatch_queue_create("aac hard encoder queue", DISPATCH_QUEUE_SERIAL);
        // Callback queue
        _callbackQueue = dispatch_queue_create("aac hard encoder callback queue", DISPATCH_QUEUE_SERIAL);
        // Audio converter
        _audioConverter = NULL;
        _pcmBufferSize = 0;
        _pcmBuffer = NULL;
        _config = config;
        if (config == nil) {
            _config = [[CCAudioConfig alloc] init];
        }
    }
    return self;
}
```
3.2 Preparation before encoding

Before encoding audio we need to configure the encoding parameters. This involves the audio-related structures and the functions for creating the converter and configuring its properties.
3.2.1 Audio parameter structure

The audio parameter structure is AudioStreamBasicDescription:

This structure provides a description of the audio data format.

An audio file is produced as: analog signal -> digital signal (via PCM) -> compressed, encoded audio file.
- The sampling frequency used during PCM is called the sample rate.
- Each sampling instant can yield several samples of data, corresponding to multiple channels.
- The samples obtained at one sampling instant are grouped together and called a frame.
- Several frames grouped together are called a packet.
The members, interpreted:

- mSampleRate: the sample rate
- mBitsPerChannel: the number of bits per sample
- mChannelsPerFrame: can be understood as the number of channels, i.e. how many samples are produced at each sampling instant
- mFramesPerPacket: the number of frames per packet, equal to the number of sampling intervals the packet spans
- mBytesPerPacket: the number of bytes of data per packet
- mBytesPerFrame: the number of bytes of data per frame
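For uncompressed PCM these members follow directly from one another. A minimal sketch for 44.1 kHz, 16-bit mono PCM (the same derivation reappears in the decoder and player configurations later):

```objc
AudioStreamBasicDescription pcmDes = {0};
pcmDes.mSampleRate       = 44100;                 // sample rate
pcmDes.mFormatID         = kAudioFormatLinearPCM;
pcmDes.mChannelsPerFrame = 1;                     // mono: one sample per sampling instant
pcmDes.mBitsPerChannel   = 16;                    // bits per sample
pcmDes.mFramesPerPacket  = 1;                     // PCM puts one frame in each packet
pcmDes.mBytesPerFrame    = pcmDes.mBitsPerChannel / 8 * pcmDes.mChannelsPerFrame; // 2 bytes
pcmDes.mBytesPerPacket   = pcmDes.mBytesPerFrame * pcmDes.mFramesPerPacket;       // 2 bytes
```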
3.2.2 Creating the converter

The relevant function is AudioConverterNewSpecific:

Parameter descriptions:

- Parameter 1: the input audio format description
- Parameter 2: the output audio format description
- Parameter 3: the number of class descriptions
- Parameter 4: the class descriptions
- Parameter 5: the converter that gets created
3.2.3 Setting converter properties

The relevant function is AudioConverterSetProperty.

Parameter descriptions:

- Parameter 1: the converter
- Parameter 2: the property key; see the AudioConverterPropertyID enumeration
- Parameter 3: the size of the property value's data type
- Parameter 4: the address of the property value
3.2.4 Encoder type description

The structure describing the encoder type is AudioClassDescription:

The members, interpreted:

- mType: the format of the encoded output, for example AAC
- mSubType: the sub-format of the encoded output
- mManufacturer: the codec implementation, i.e. software encoding or hardware encoding
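The setup code below calls a helper, getAudioCalssDescriptionWithType:fromManufacture:, to look up a matching AudioClassDescription. Only the decoder variant is listed in section 4.2, so here is a sketch of what the encoder variant might look like, under the assumption that it mirrors the decoder version but queries kAudioFormatProperty_Encoders:

```objc
- (AudioClassDescription *)getAudioCalssDescriptionWithType:(AudioFormatID)type fromManufacture:(uint32_t)manufacture {
    static AudioClassDescription desc;
    UInt32 encoderSpecific = type;
    // Total size of the list of available encoders for this format
    UInt32 size;
    OSStatus status = AudioFormatGetPropertyInfo(kAudioFormatProperty_Encoders, sizeof(encoderSpecific), &encoderSpecific, &size);
    if (status != noErr) { return nil; }
    // Number of encoders, then fetch their descriptions
    unsigned int count = size / sizeof(AudioClassDescription);
    AudioClassDescription description[count];
    status = AudioFormatGetProperty(kAudioFormatProperty_Encoders, sizeof(encoderSpecific), &encoderSpecific, &size, &description);
    if (status != noErr) { return nil; }
    // Pick the encoder matching the requested format and manufacturer
    for (unsigned int i = 0; i < count; i++) {
        if (type == description[i].mSubType && manufacture == description[i].mManufacturer) {
            desc = description[i];
            return &desc;
        }
    }
    return nil;
}
```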
3.3 Encoding

Now we arrive at the key encoding flow. Our public encoding method is - (void)encodeAudioSamepleBuffer:(CMSampleBufferRef)sampleBuffer;, which is called from the capture callback of the capture class CCSystemCapture:
```objc
// Capture audio/video callback
- (void)captureSampleBuffer:(CMSampleBufferRef)sampleBuffer type:(CCSystemCaptureType)type {
    if (type == CCSystemCaptureTypeAudio) {
        // Audio data
        // 1. Play the PCM data directly
        NSData *pcmData = [self convertAudioSamepleBufferToPcmData:sampleBuffer];
        [_pcmPlayer playPCMData:pcmData];
        // 2. AAC encoding
        [_audioEncoder encodeAudioSamepleBuffer:sampleBuffer];
    } else {
        [_videoEncoder encodeVideoSampleBuffer:sampleBuffer];
    }
}
```
There are two ways to handle the captured audio:

- Play the PCM data directly
- Encode it to AAC

Playing PCM directly is covered later; let's look at AAC encoding first:
```objc
- (void)encodeAudioSamepleBuffer:(CMSampleBufferRef)sampleBuffer {
    CFRetain(sampleBuffer);
    // Check whether the audio converter has been created. If not,
    // configure the audio encoding parameters and create the converter first
    if (!_audioConverter) {
        [self setupEncoderWithSampleBuffer:sampleBuffer];
    }
    // Encode on the asynchronous queue (body shown below)
    dispatch_async(_encoderQueue, ^{
        // ...
    });
}
```
First, look at configuring the audio encoding parameters and creating the converter:

```objc
- (void)setupEncoderWithSampleBuffer:(CMSampleBufferRef)sampleBuffer {
    // Obtain the input parameters
    AudioStreamBasicDescription inputAduioDes = *CMAudioFormatDescriptionGetStreamBasicDescription(CMSampleBufferGetFormatDescription(sampleBuffer));

    // Set the output parameters
    AudioStreamBasicDescription outputAudioDes = {0};
    outputAudioDes.mSampleRate       = (Float64)_config.sampleRate;    // Output sample rate
    outputAudioDes.mFormatID         = kAudioFormatMPEG4AAC;           // Output format
    outputAudioDes.mFormatFlags      = kMPEG4Object_AAC_LC;            // 0 means lossless encoding
    outputAudioDes.mBytesPerPacket   = 0;                              // Packet size (0: let the codec decide)
    outputAudioDes.mFramesPerPacket  = 1024;                           // Frames per packet; AAC uses 1024
    outputAudioDes.mBytesPerFrame    = 0;                              // Bytes per frame
    outputAudioDes.mChannelsPerFrame = (uint32_t)_config.channelCount; // Output channel count
    outputAudioDes.mBitsPerChannel   = 0;                              // Bits sampled per channel in a frame
    outputAudioDes.mReserved         = 0;                              // 8-byte alignment padding

    // Fill in the output description
    UInt32 outDesSize = sizeof(outputAudioDes);
    AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &outDesSize, &outputAudioDes);

    // Get the encoder description
    AudioClassDescription *audioClassDesc = [self getAudioCalssDescriptionWithType:outputAudioDes.mFormatID fromManufacture:kAppleSoftwareAudioCodecManufacturer];

    // Create the converter
    OSStatus status = AudioConverterNewSpecific(&inputAduioDes, &outputAudioDes, 1, audioClassDesc, &_audioConverter);
    if (status != noErr) {
        NSLog(@"Error!: hardware AAC encoder creation failed, status = %d", (int)status);
        return;
    }

    /* kAudioConverterQuality_Max    = 0x7F,
       kAudioConverterQuality_High   = 0x60,
       kAudioConverterQuality_Medium = 0x40,
       kAudioConverterQuality_Low    = 0x20,
       kAudioConverterQuality_Min    = 0 */
    // Codec rendering quality
    UInt32 temp = kAudioConverterQuality_High;
    AudioConverterSetProperty(_audioConverter, kAudioConverterCodecQuality, sizeof(temp), &temp);

    // Set the bit rate
    uint32_t audioBitrate = (uint32_t)self.config.bitrate;
    uint32_t audioBitrateSize = sizeof(audioBitrate);
    status = AudioConverterSetProperty(_audioConverter, kAudioConverterEncodeBitRate, audioBitrateSize, &audioBitrate);
    if (status != noErr) {
        NSLog(@"Error!: hardware AAC encoder failed to set bit rate");
    }
}
```
Then we come to the encoding work on the asynchronous queue. The steps are:

- Get the PCM data stored in the BlockBuffer

```objc
// Get the BlockBuffer holding the audio data, plus the address of that data
CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
CFRetain(blockBuffer);
OSStatus status = CMBlockBufferGetDataPointer(blockBuffer, 0, NULL, &_pcmBufferSize, &_pcmBuffer);
NSError *error = nil;
if (status != kCMBlockBufferNoErr) {
    error = [NSError errorWithDomain:NSOSStatusErrorDomain code:status userInfo:nil];
    NSLog(@"Error: AAC encode get data pointer error: %@", error);
    return;
}
```
- Wrap an output buffer into an AudioBufferList

First, take a look at AudioBufferList:

Clearly, we need to set up mBuffers[0], the AudioBuffer member of the AudioBufferList, as the output buffer. The code:

```objc
uint8_t *pcmBuffer = malloc(_pcmBufferSize);
// Zero out pcmBuffer
memset(pcmBuffer, 0, _pcmBufferSize);

AudioBufferList outAudioBufferList = {0};
outAudioBufferList.mNumberBuffers = 1;
outAudioBufferList.mBuffers[0].mNumberChannels = (uint32_t)_config.channelCount;
outAudioBufferList.mBuffers[0].mDataByteSize = (UInt32)_pcmBufferSize;
outAudioBufferList.mBuffers[0].mData = pcmBuffer;
```
- Configure the fill function and get the output data

This fill function is the callback that supplies input data whenever the converter needs it, and it is configured via AudioConverterFillComplexBuffer. Let's look at AudioConverterFillComplexBuffer:
It converts data supplied by an input callback function. The parameters, interpreted:

- Parameter 1: inAudioConverter, the audio converter
- Parameter 2: inInputDataProc, the callback function that provides the audio data to convert. It is called repeatedly whenever the converter is ready to accept new input data.
- Parameter 3: inInputDataProcUserData, i.e. self
- Parameter 4: ioOutputDataPacketSize, the output buffer size, expressed in packets
- Parameter 5: outOutputData, the converted output data
- Parameter 6: outPacketDescription, the output packet descriptions
The code:

```objc
// The output packet size is 1
UInt32 outputDataPacketSize = 1;
// Convert the data supplied by the input callback
status = AudioConverterFillComplexBuffer(_audioConverter, aacEncodeInputDataProc, (__bridge void * _Nullable)(self), &outputDataPacketSize, &outAudioBufferList, NULL);
if (status == noErr) {
    // Get the encoded AAC data
    NSData *rawAAC = [NSData dataWithBytes:outAudioBufferList.mBuffers[0].mData length:outAudioBufferList.mBuffers[0].mDataByteSize];
    // Free pcmBuffer
    free(pcmBuffer);
    // Add the ADTS header. If you want a raw stream, skip this;
    // to write to a file, the ADTS header must be added.
    // NSData *adtsHeader = [self adtsDataForPacketLength:rawAAC.length];
    // NSMutableData *fullData = [NSMutableData dataWithCapacity:adtsHeader.length + rawAAC.length];
    // [fullData appendData:adtsHeader];
    // [fullData appendData:rawAAC];
    // Pass the data to the callback queue
    dispatch_async(_callbackQueue, ^{
        [_delegate audioEncodeCallback:rawAAC];
    });
} else {
    error = [NSError errorWithDomain:NSOSStatusErrorDomain code:status userInfo:nil];
}
// Release
CFRelease(blockBuffer);
CFRelease(sampleBuffer);
if (error) {
    NSLog(@"Error: AAC encode failed %@", error);
}
```
The encoding input callback function is aacEncodeInputDataProc; see 3.4 Encoding callback for details.

Moving on, note that after a successful conversion there are two ways to handle the result:

- Write it to a disk file, which requires adding an ADTS header first
- Pass the data to the callback queue, for the caller to feed into the decoding process
AAC audio format
Speaking of the ADTS header, we must first explain the two AAC audio formats:

- ADIF: Audio Data Interchange Format. Its defining feature is that the start of the audio data can be located unambiguously; decoding cannot begin in the middle of the stream but must start at its clearly defined beginning. This format is therefore commonly used for disk files.

- ADTS: Audio Data Transport Stream. Its defining feature is a bit stream with sync words, so decoding can start anywhere within the stream. Its characteristics resemble the MP3 data stream format.

Simply put, ADTS can be decoded starting at any frame, which means every frame carries its own header. ADIF has only one unified header, so decoding requires all the data up front. The two header formats also differ. The audio streams we encode and extract here are all in ADTS format:
Adding the ADTS header

Here is the process of adding the ADTS header (just for understanding):

```objc
/**
 * Add an ADTS header at the beginning of each and every AAC packet.
 * This is needed as the encoder generates packets of raw AAC data.
 * Note the packetLen must count in the ADTS header itself.
 * See: http://wiki.multimedia.cx/index.php?title=ADTS
 * Also: http://wiki.multimedia.cx/index.php?title=MPEG-4_Audio#Channel_Configurations
 **/
- (NSData *)adtsDataForPacketLength:(NSUInteger)packetLength {
    int adtsLength = 7;
    char *packet = malloc(sizeof(char) * adtsLength);
    // Variables recycled by addADTStoPacket
    int profile = 2; // AAC LC
    // 39 = MediaCodecInfo.CodecProfileLevel.AACObjectELD
    int freqIdx = 4; // 44.1 kHz
    int chanCfg = 1; // MPEG-4 Audio Channel Configuration: 1 = channel front-center
    NSUInteger fullLength = adtsLength + packetLength;
    // Fill in the ADTS data
    packet[0] = (char)0xFF; // 11111111    = syncword
    packet[1] = (char)0xF9; // 1111 1 00 1 = syncword, MPEG-2 layer, CRC
    packet[2] = (char)(((profile - 1) << 6) + (freqIdx << 2) + (chanCfg >> 2));
    packet[3] = (char)(((chanCfg & 3) << 6) + (fullLength >> 11));
    packet[4] = (char)((fullLength & 0x7FF) >> 3);
    packet[5] = (char)(((fullLength & 7) << 5) + 0x1F);
    packet[6] = (char)0xFC;
    NSData *data = [NSData dataWithBytesNoCopy:packet length:adtsLength freeWhenDone:YES];
    return data;
}
```
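To see that the bit packing does what it claims, here is a small sketch of the inverse operation: reading the fields back out of a 7-byte ADTS header produced by the method above (dumpADTSHeader: is a hypothetical helper, for illustration only):

```objc
- (void)dumpADTSHeader:(const uint8_t *)p {
    int profile  = ((p[2] >> 6) & 0x3) + 1;                                  // 2 = AAC LC
    int freqIdx  = (p[2] >> 2) & 0xF;                                        // 4 = 44.1 kHz
    int chanCfg  = ((p[2] & 0x1) << 2) | ((p[3] >> 6) & 0x3);                // 1 = front-center
    int frameLen = ((p[3] & 0x3) << 11) | (p[4] << 3) | ((p[5] >> 5) & 0x7); // header + payload, in bytes
    NSLog(@"profile=%d freqIdx=%d chanCfg=%d frameLength=%d", profile, freqIdx, chanCfg, frameLen);
}
```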
3.4 Encoding callback
Finally, let’s look at the process of coding the callback function aacEncodeInputDataProc ππ»
static OSStatus aacEncodeInputDataProc(AudioConverterRef inAudioConverter, UInt32 *ioNumberDataPackets, AudioBufferList *ioData, AudioStreamPacketDescription **outDataPacketDescription, Void *inUserData) {// Get self CCAudioEncoder *aacEncoder = (__bridge CCAudioEncoder *)(inUserData); PcmBuffsize if (! aacEncoder.pcmBufferSize) { *ioNumberDataPackets = 0; return - 1; } // Make ioData->mBuffers[0]. MData = aacEncoder. ioData->mBuffers[0].mDataByteSize = (uint32_t)aacEncoder.pcmBufferSize; ioData->mBuffers[0].mNumberChannels = (uint32_t)aacEncoder.config.channelCount; AacEncoder. PcmBufferSize = 0; *ioNumberDataPackets = 1; return noErr; }Copy the code
In essence, the PCM data waiting to be encoded (cached in the CCAudioEncoder instance) is filled into the AudioBufferList.
3.5 Summary

As shown in the figure above, encoding takes three steps in total:

- Configure the encoder and prepare to encode;
- Collect the PCM data and pass it to the encoder;
- When encoding completes, either hand the result to the callback or write it to a file.
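To tie the three steps together, here is a minimal usage sketch. The view controller and its wiring are assumptions for illustration; only CCAudioEncoder, CCAudioConfig, and the delegate method come from the code above:

```objc
@interface ViewController () <CCAudioEncoderDelegate>
@property (nonatomic, strong) CCAudioEncoder *audioEncoder;
@end

@implementation ViewController

- (void)setupAudioEncoder {
    // Step 1: configure the encoder
    self.audioEncoder = [[CCAudioEncoder alloc] initWithConfig:[CCAudioConfig defaultConifg]];
    self.audioEncoder.delegate = self;
}

// Step 2 happens in the capture callback:
//   [_audioEncoder encodeAudioSamepleBuffer:sampleBuffer];

// Step 3: called on the callback queue with one encoded AAC packet (no ADTS header)
- (void)audioEncodeCallback:(NSData *)aacData {
    // e.g. hand the packet to a muxer or a network sender here
    NSLog(@"encoded %lu bytes of AAC", (unsigned long)aacData.length);
}

@end
```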
4. Audio AAC decoding
Having analyzed the encoding process, the natural next step is decoding. We again package a utility class, CCAudioDecoder.
4.1 Decoding tool class header file
Decoding mirrors encoding, so let's go straight to the code.

```objc
#import <Foundation/Foundation.h>
#import <AVFoundation/AVFoundation.h>
@class CCAudioConfig;

/** AAC decoder delegate */
@protocol CCAudioDecoderDelegate <NSObject>
- (void)audioDecodeCallback:(NSData *)pcmData;
@end

@interface CCAudioDecoder : NSObject
@property (nonatomic, strong) CCAudioConfig *config;
@property (nonatomic, weak) id<CCAudioDecoderDelegate> delegate;

// Initialize, passing in the decoder configuration
- (instancetype)initWithConfig:(CCAudioConfig *)config;

/** Decode AAC data */
- (void)decodeAudioAACData:(NSData *)aacData;
@end
```
The class extension in the .m file:

```objc
@interface CCAudioDecoder ()
@property (nonatomic, strong) dispatch_queue_t decoderQueue;
@property (nonatomic, strong) dispatch_queue_t callbackQueue;
// Audio converter object
@property (nonatomic) AudioConverterRef audioConverter;
// AAC buffer
@property (nonatomic) char *aacBuffer;
// AAC buffer size
@property (nonatomic) UInt32 aacBufferSize;
// Description of the packets in the audio stream
@property (nonatomic) AudioStreamPacketDescription *packetDesc;
@end
```
4.2 Initialization

Next, the initialization:
```objc
- (instancetype)initWithConfig:(CCAudioConfig *)config {
    self = [super init];
    if (self) {
        _decoderQueue = dispatch_queue_create("aac hard decoder queue", DISPATCH_QUEUE_SERIAL);
        _callbackQueue = dispatch_queue_create("aac hard decoder callback queue", DISPATCH_QUEUE_SERIAL);
        _audioConverter = NULL;
        _aacBufferSize = 0;
        _aacBuffer = NULL;
        _config = config;
        if (_config == nil) {
            _config = [[CCAudioConfig alloc] init];
        }
        // _packetDesc must point at storage that outlives this method,
        // so allocate it on the heap rather than pointing at a stack local
        _packetDesc = malloc(sizeof(AudioStreamPacketDescription));
        memset(_packetDesc, 0, sizeof(AudioStreamPacketDescription));
        [self setupEncoder];
    }
    return self;
}
```
And then setupEncoder, which configures the decoder's converter:

```objc
- (void)setupEncoder {
    // Output parameters: PCM
    AudioStreamBasicDescription outputAudioDes = {0};
    outputAudioDes.mSampleRate = (Float64)_config.sampleRate;        // Sample rate
    outputAudioDes.mChannelsPerFrame = (UInt32)_config.channelCount; // Output channel count
    outputAudioDes.mFormatID = kAudioFormatLinearPCM;                // Output format
    outputAudioDes.mFormatFlags = (kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked); // Encoding flags
    outputAudioDes.mFramesPerPacket = 1;                             // Frames per packet
    outputAudioDes.mBitsPerChannel = 16;                             // Bits sampled per channel in a frame
    outputAudioDes.mBytesPerFrame = outputAudioDes.mBitsPerChannel / 8 * outputAudioDes.mChannelsPerFrame; // Bytes per frame = bits per sample / 8 * channel count
    outputAudioDes.mBytesPerPacket = outputAudioDes.mBytesPerFrame * outputAudioDes.mFramesPerPacket;      // Bytes per packet = bytes per frame * frames per packet
    outputAudioDes.mReserved = 0;                                    // 8-byte alignment padding

    // Input parameters: AAC
    AudioStreamBasicDescription inputAduioDes = {0};
    inputAduioDes.mSampleRate = (Float64)_config.sampleRate;
    inputAduioDes.mFormatID = kAudioFormatMPEG4AAC;
    inputAduioDes.mFormatFlags = kMPEG4Object_AAC_LC;
    inputAduioDes.mFramesPerPacket = 1024;
    inputAduioDes.mChannelsPerFrame = (UInt32)_config.channelCount;

    // Fill in the input description
    UInt32 inDesSize = sizeof(inputAduioDes);
    AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &inDesSize, &inputAduioDes);

    // Get the decoder description; the lookup is by the AAC (input) format ID
    AudioClassDescription *audioClassDesc = [self getAudioCalssDescriptionWithType:inputAduioDes.mFormatID fromManufacture:kAppleSoftwareAudioCodecManufacturer];

    // Create the converter
    OSStatus status = AudioConverterNewSpecific(&inputAduioDes, &outputAudioDes, 1, audioClassDesc, &_audioConverter);
    if (status != noErr) {
        NSLog(@"Error!: hardware AAC decoder creation failed, status = %d", (int)status);
        return;
    }
}
```
This is basically the same flow as the encoder's setupEncoderWithSampleBuffer:. The difference lies in the decoder-lookup method getAudioCalssDescriptionWithType:fromManufacture::
```objc
- (AudioClassDescription *)getAudioCalssDescriptionWithType:(AudioFormatID)type fromManufacture:(uint32_t)manufacture {
    static AudioClassDescription desc;
    UInt32 decoderSpecific = type;
    // Get the total size of the list of available AAC decoders
    UInt32 size;
    OSStatus status = AudioFormatGetPropertyInfo(kAudioFormatProperty_Decoders, sizeof(decoderSpecific), &decoderSpecific, &size);
    if (status != noErr) {
        NSLog(@"Error!: hardware AAC decoder get info failed, status = %d", (int)status);
        return nil;
    }
    // Compute the number of decoders
    unsigned int count = size / sizeof(AudioClassDescription);
    // Create an array holding `count` decoder descriptions
    AudioClassDescription description[count];
    // Fill the array; note this queries kAudioFormatProperty_Decoders
    // (the encoder-side helper queries kAudioFormatProperty_Encoders)
    status = AudioFormatGetProperty(kAudioFormatProperty_Decoders, sizeof(decoderSpecific), &decoderSpecific, &size, &description);
    if (status != noErr) {
        NSLog(@"Error!: hardware AAC decoder get property failed, status = %d", (int)status);
        return nil;
    }
    // Pick the decoder matching the requested format and manufacturer
    for (unsigned int i = 0; i < count; i++) {
        if (type == description[i].mSubType && manufacture == description[i].mManufacturer) {
            desc = description[i];
            return &desc;
        }
    }
    return nil;
}
```
There are a few differences:

- The output parameters differ: encoding outputs AAC, while decoding outputs PCM.
- The lookup differs: encoding queries kAudioFormatProperty_Encoders, while decoding queries kAudioFormatProperty_Decoders.
4.3 Preparation before decoding
We encapsulate a struct, CCAudioUserData, which records the AAC information and is passed as the user-data argument to the decode callback function:
```objc
typedef struct {
    char *data;
    UInt32 size;
    UInt32 channelCount;
    AudioStreamPacketDescription packetDesc;
} CCAudioUserData;
```
4.4 Decoding

Then comes the decoding flow itself:

```objc
- (void)decodeAudioAACData:(NSData *)aacData {
    if (!_audioConverter) { return; }
    dispatch_async(_decoderQueue, ^{
        // CCAudioUserData records the AAC information passed to the decode callback
        CCAudioUserData userData = {0};
        userData.channelCount = (UInt32)_config.channelCount;
        userData.data = (char *)[aacData bytes];
        userData.size = (UInt32)aacData.length;
        userData.packetDesc.mDataByteSize = (UInt32)aacData.length;
        userData.packetDesc.mStartOffset = 0;
        userData.packetDesc.mVariableFramesInPacket = 0;

        // Output size and packet count
        UInt32 pcmBufferSize = (UInt32)(2048 * _config.channelCount);
        UInt32 pcmDataPacketSize = 1024;

        // Create a temporary PCM container
        uint8_t *pcmBuffer = malloc(pcmBufferSize);
        memset(pcmBuffer, 0, pcmBufferSize);

        // Output buffer list
        AudioBufferList outAudioBufferList = {0};
        outAudioBufferList.mNumberBuffers = 1;
        outAudioBufferList.mBuffers[0].mNumberChannels = (uint32_t)_config.channelCount;
        outAudioBufferList.mBuffers[0].mDataByteSize = (UInt32)pcmBufferSize;
        outAudioBufferList.mBuffers[0].mData = pcmBuffer;

        // Output packet description
        AudioStreamPacketDescription outputPacketDesc = {0};

        // Configure the fill function to get the output data
        OSStatus status = AudioConverterFillComplexBuffer(_audioConverter, &AudioDecoderConverterComplexInputDataProc, &userData, &pcmDataPacketSize, &outAudioBufferList, &outputPacketDesc);
        if (status != noErr) {
            NSLog(@"Error: AAC decoder error, status = %d", (int)status);
            return;
        }

        // If data was produced, hand it to the delegate on the callback queue
        if (outAudioBufferList.mBuffers[0].mDataByteSize > 0) {
            NSData *rawData = [NSData dataWithBytes:outAudioBufferList.mBuffers[0].mData length:outAudioBufferList.mBuffers[0].mDataByteSize];
            dispatch_async(_callbackQueue, ^{
                [_delegate audioDecodeCallback:rawData];
            });
        }
        free(pcmBuffer);
    });
}
```
The decode input callback function is AudioDecoderConverterComplexInputDataProc.
4.5 Decoding Callback
```objc
static OSStatus AudioDecoderConverterComplexInputDataProc(AudioConverterRef inAudioConverter,
                                                          UInt32 *ioNumberDataPackets,
                                                          AudioBufferList *ioData,
                                                          AudioStreamPacketDescription **outDataPacketDescription,
                                                          void *inUserData) {
    CCAudioUserData *audioDecoder = (CCAudioUserData *)(inUserData);
    if (audioDecoder->size <= 0) {
        *ioNumberDataPackets = 0;
        return -1;
    }

    // Fill in the packet description
    *outDataPacketDescription = &audioDecoder->packetDesc;
    (*outDataPacketDescription)[0].mStartOffset = 0;
    (*outDataPacketDescription)[0].mDataByteSize = audioDecoder->size;
    (*outDataPacketDescription)[0].mVariableFramesInPacket = 0;

    // Fill in the input data
    ioData->mBuffers[0].mData = audioDecoder->data;
    ioData->mBuffers[0].mDataByteSize = audioDecoder->size;
    ioData->mBuffers[0].mNumberChannels = audioDecoder->channelCount;

    // One AAC packet is supplied per call
    *ioNumberDataPackets = 1;
    return noErr;
}
```
As you’ll see in decoding, there’s no need to bridge the self object because we’re caching the data in our custom CCAudioUserData structure. This is also different from coding.
At this point, the decoding tool class is wrapped! πΊ πΊ πΊ πΊ πΊ πΊ
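As a usage sketch (the wiring below is hypothetical): AAC packets arriving from the encoder callback can be fed straight back through the decoder, and the decoded PCM handed to the player covered in the next section:

```objc
// Round trip: the encoder delegate feeds the decoder, the decoder delegate feeds the player
- (void)audioEncodeCallback:(NSData *)aacData {
    [self.audioDecoder decodeAudioAACData:aacData]; // self.audioDecoder: a CCAudioDecoder
}

- (void)audioDecodeCallback:(NSData *)pcmData {
    [self.pcmPlayer playPCMData:pcmData];           // self.pcmPlayer: a CCAudioPCMPlayer (section 5)
}
```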
5. Audio PCM playback

Finally, one more piece: audio PCM playback. In some scenarios the captured audio may be played directly, with no encoding or decoding.
We can define a method for this in the encoding utility class:
```objc
- (NSData *)convertAudioSamepleBufferToPcmData:(CMSampleBufferRef)sampleBuffer {
    // Get the size of the PCM data
    size_t size = CMSampleBufferGetTotalSampleSize(sampleBuffer);
    // Allocate space
    int8_t *audio_data = (int8_t *)malloc(size);
    memset(audio_data, 0, size);
    // Get the CMBlockBuffer, which holds the PCM data
    CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    // Copy the data into the space we allocated
    CMBlockBufferCopyDataBytes(blockBuffer, 0, size, audio_data);
    NSData *data = [NSData dataWithBytes:audio_data length:size];
    free(audio_data);
    return data;
}
```
It extracts the PCM data from the sampleBuffer; the data is then passed back to the ViewController and played directly.

Playback requires a player class, CCAudioPCMPlayer:
```objc
@class CCAudioConfig;

@interface CCAudioPCMPlayer : NSObject
- (instancetype)initWithConfig:(CCAudioConfig *)config;
/** Play PCM */
- (void)playPCMData:(NSData *)data;
/** Set the volume gain, 0.0 - 1.0 */
- (void)setupVoice:(Float32)gain;
/** Dispose */
- (void)dispose;
@end
```
The implementation:

```objc
#import "CCAudioPCMPlayer.h"
#import <AudioToolbox/AudioToolbox.h>
#import <AVFoundation/AVFoundation.h>
#import "CCAVConfig.h"
#import "CCAudioDataQueue.h"

#define MIN_SIZE_PER_FRAME 2048            // Minimum data length per frame
static const int kNumberBuffers_play = 3;  // 1

typedef struct AQPlayerState {
    AudioStreamBasicDescription mDataFormat;           // 2
    AudioQueueRef mQueue;                              // 3
    AudioQueueBufferRef mBuffers[kNumberBuffers_play]; // 4
    AudioStreamPacketDescription *mPacketDescs;        // 9
} AQPlayerState;

@interface CCAudioPCMPlayer ()
@property (nonatomic, assign) AQPlayerState aqps;
@property (nonatomic, strong) CCAudioConfig *config;
@property (nonatomic, assign) BOOL isPlaying;
@end

@implementation CCAudioPCMPlayer

static void TMAudioQueueOutputCallback(void *inUserData, AudioQueueRef inAQ, AudioQueueBufferRef inBuffer) {
    AudioQueueFreeBuffer(inAQ, inBuffer);
}

- (instancetype)initWithConfig:(CCAudioConfig *)config {
    self = [super init];
    if (self) {
        _config = config;
        // Configuration
        AudioStreamBasicDescription dataFormat = {0};
        dataFormat.mSampleRate = (Float64)_config.sampleRate;        // Sample rate
        dataFormat.mChannelsPerFrame = (UInt32)_config.channelCount; // Channel count
        dataFormat.mFormatID = kAudioFormatLinearPCM;                // Output format
        dataFormat.mFormatFlags = (kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked); // Encoding flags
        dataFormat.mFramesPerPacket = 1;                             // Frames per packet
        dataFormat.mBitsPerChannel = 16;                             // Bits sampled per channel in a frame
        dataFormat.mBytesPerFrame = dataFormat.mBitsPerChannel / 8 * dataFormat.mChannelsPerFrame; // Bytes per frame = bits per sample / 8 * channel count
        dataFormat.mBytesPerPacket = dataFormat.mBytesPerFrame * dataFormat.mFramesPerPacket;      // Bytes per packet = bytes per frame * frames per packet
        dataFormat.mReserved = 0;

        AQPlayerState state = {0};
        state.mDataFormat = dataFormat;
        _aqps = state;

        [self setupSession];

        // Create the playback queue
        OSStatus status = AudioQueueNewOutput(&_aqps.mDataFormat, TMAudioQueueOutputCallback, NULL, NULL, NULL, 0, &_aqps.mQueue);
        if (status != noErr) {
            NSError *error = [[NSError alloc] initWithDomain:NSOSStatusErrorDomain code:status userInfo:nil];
            NSLog(@"Error: AudioQueue create error = %@", [error description]);
            return self;
        }
        [self setupVoice:1];
        _isPlaying = false;
    }
    return self;
}

- (void)setupSession {
    NSError *error = nil;
    // Activate the session. Note that activating an audio session is a synchronous (blocking) operation
    [[AVAudioSession sharedInstance] setActive:YES error:&error];
    if (error) {
        NSLog(@"Error: audioQueue player AVAudioSession error: %@", error);
    }
    // Set the session category
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord error:&error];
    if (error) {
        NSLog(@"Error: audioQueue player AVAudioSession error: %@", error);
    }
}

- (void)playPCMData:(NSData *)data {
    // Points to an audio queue buffer
    AudioQueueBufferRef inBuffer;
    /* Ask the audio queue object to allocate an audio queue buffer.
       Parameter 1: the audio queue to allocate the buffer for
       Parameter 2: the capacity required for the new buffer (in bytes)
       Parameter 3: output, a pointer to the newly allocated audio queue buffer */
    AudioQueueAllocateBuffer(_aqps.mQueue, MIN_SIZE_PER_FRAME, &inBuffer);
    // Copy the data into the buffer and set its size
    memcpy(inBuffer->mAudioData, data.bytes, data.length);
    inBuffer->mAudioDataByteSize = (UInt32)data.length;

    /* Add the buffer to the buffer queue of a recording or playback audio queue.
       Parameter 1: the audio queue that owns the buffer
       Parameter 2: the audio queue buffer to add
       Parameter 3: the number of audio packets in the buffer. Use 0 in any of these cases:
         * when playing a constant bit rate (CBR) format;
         * when the audio queue is a recording (input) queue;
         * when the buffer was allocated with AudioQueueAllocateBufferWithPacketDescriptions;
           in that case the callback should describe the buffer's packets in the buffer's
           mPacketDescriptions and mPacketDescriptionCount fields.
       Parameter 4: an array of packet descriptions; use NULL in the same cases as above */
    OSStatus status = AudioQueueEnqueueBuffer(_aqps.mQueue, inBuffer, 0, NULL);
    if (status != noErr) {
        NSLog(@"Error: audio queue player enqueue error: %d", (int)status);
    }
    /* Start playing or recording audio.
       Parameter 1: the audio queue to start
       Parameter 2: the time the audio queue should start. To specify a start time relative
       to the timeline of the associated audio device, use the mSampleTime field of an
       AudioTimeStamp structure. Use NULL to start as soon as possible */
    AudioQueueStart(_aqps.mQueue, NULL);
}

// This function is not required here
//- (void)pause {
//    AudioQueuePause(_aqps.mQueue);
//}

// Set the volume gain, 0.0 - 1.0
- (void)setupVoice:(Float32)gain {
    Float32 gain0 = gain;
    if (gain < 0) {
        gain0 = 0;
    } else if (gain > 1) {
        gain0 = 1;
    }
    /* Set an audio queue parameter value.
       Parameter 1: the audio queue
       Parameter 2: the parameter key
       Parameter 3: the value */
    AudioQueueSetParameter(_aqps.mQueue, kAudioQueueParam_Volume, gain0);
}

// Dispose of the player
- (void)dispose {
    AudioQueueStop(_aqps.mQueue, true);
    AudioQueueDispose(_aqps.mQueue, true);
}

@end
```
There are detailed comments in the code, which will not be repeated here.
Conclusion

- Audio fundamentals
  - Three elements of sound: pitch, volume, and timbre
  - Range of human hearing: 20 Hz - 20 kHz; below 20 Hz is infrasound, above 20 kHz is ultrasound
  - Pulse code modulation (PCM) has three stages: sampling, quantization, and encoding
  - Audio compression coding principles
    - Transmission rate of an audio signal = sampling frequency * quantization bits per sample * channel count
    - Lossy coding: eliminates redundant data (the parts the human ear cannot hear)
    - Lossless coding (e.g. Huffman coding): compresses the inaudible parts and keeps everything else intact
    - Compression method: remove redundant information from the captured audio
      - Masking effect: one sound covers another
    - Audio compression encoding formats: MPEG-1, Dolby AC-3, MPEG-2, MPEG-4, and AAC
    - Standard parameters for audio
      - Sampling frequency = 44.1 kHz
      - Quantization bits per sample = 16
      - Number of channels in normal stereo = 2
      - Digital signal transmission bit stream ≈ 1.4 Mbit/s
      - Data per second = 1.4 Mbit / 8 = 176.4 KB, equal to the data of 88,200 Chinese characters
- Frequency domain masking and time domain masking
  - Frequency domain masking
    - A weaker sound is covered by a stronger sound nearby
    - Sounds far apart in the frequency domain barely affect each other; both remain audible
  - Time domain masking
    - Sounding at the same moment, a treble completely drowns out a bass
    - Special case (pre-masking): a quiet sound starts, and within a short time (about 50 ms) a loud sound follows, masking the quiet one
    - Sound propagation takes time, so masking has a time extent: about 50 ms forward and about 100 ms backward
- Audio AAC encoding utility class
  - Two queues: encoding queue + callback queue
  - Preparation before encoding
    - The audio parameter structure AudioStreamBasicDescription
    - Create the converter: AudioConverterNewSpecific
    - Set converter properties: AudioConverterSetProperty
    - The encoder type description structure AudioClassDescription
  - AAC encoding process
    - Check whether the audio converter has been created
      - Created: proceed directly
      - Not created: configure the audio encoding parameters and create the converter
        - Get the input parameters (AudioStreamBasicDescription)
        - Set the output parameters (AAC-format AudioStreamBasicDescription), filled in via AudioFormatGetProperty
        - Get the encoder description AudioClassDescription
        - Create the audio converter with AudioConverterNewSpecific
        - Configure converter properties with AudioConverterSetProperty: codec quality, bit rate, etc.
    - Do the encoding on the asynchronous queue
      - Get the PCM data stored in the BlockBuffer
      - Wrap the output buffer into an AudioBufferList
      - AudioConverterFillComplexBuffer converts the data supplied by the input callback
      - After a successful conversion, turn the AAC output into NSData; from here there are two options:
        1. Write it to a disk file, which requires adding an ADTS header first
        2. Pass the data to the callback queue, for the caller's decoding process
      - Release blockBuffer and sampleBuffer
  - AAC audio formats
    - ADIF (Audio Data Interchange Format): commonly used for disk files
    - ADTS (Audio Data Transport Stream): used for network transmission; the code adds an ADTS header per frame
  - AAC encoding callback: fills the PCM data to be encoded (cached in the CCAudioEncoder instance) into the AudioBufferList
- Audio AAC decoding utility class
  - Two queues: decoding queue + callback queue
  - Audio converter initialization
    - Output PCM configuration (AudioStreamBasicDescription)
    - Input AAC configuration (AudioStreamBasicDescription), filled in via AudioFormatGetProperty with kAudioFormatProperty_FormatInfo
    - Get the decoder description AudioClassDescription, passing kAppleSoftwareAudioCodecManufacturer
    - Create the converter with AudioConverterNewSpecific
  - Preparation before decoding
    - A custom struct CCAudioUserData containing the data address, the data size, the channel count, and an AudioStreamPacketDescription describing the packets in the data buffer
    - It records the AAC information and serves as the user-data parameter of the decode callback function
  - AAC decoding process
    - If the audio converter has not been created, return directly
    - Decode asynchronously on the decoding queue
      - Configure CCAudioUserData to cache the AAC information
      - Set the output size pcmBufferSize and the packet count pcmDataPacketSize
      - malloc a temporary PCM container and memset it to initialize the space
      - Configure the output buffer AudioBufferList
      - Initialize the output description AudioStreamPacketDescription
      - AudioConverterFillComplexBuffer configures the fill function and gets the output data
      - Convert the output data to NSData and delegate it out asynchronously on the callback queue
  - Decoding callback: fills the cached CCAudioUserData into the callback's AudioStreamPacketDescription and AudioBufferList parameters
- Audio PCM playback: see the code above