Preface

If you recall, in 004 - Video H264 coding details (part 1) and VIII. Realization of AVFoundation video Data Collection (3), we walked through the callback methods triggered by audio and video capture, as well as the principles, codec, and rendering display of video H264. All that remains for this article is the audio codec part.

I. Audio principles

First of all, let’s understand a few knowledge points related to audio.

1.1 Sound

Sound is a sound wave generated by the vibration of an object. It is a wave phenomenon transmitted through a medium (air, solid, or liquid) that can be perceived by the hearing organs of humans or animals. As we all learned in junior high school physics, sound is made up of three elements 👇🏻

  1. Pitch: the highness or lowness of a sound (treble vs. bass), determined by its frequency; the higher the frequency, the higher the pitch (unit: Hz, Hertz).
  2. Volume: the amplitude of the sound vibration; the loudness a person subjectively perceives.
  3. Timbre: also known as tone quality; it is determined by the waveform. Timbre itself is an abstract thing, but the waveform gives this abstraction an intuitive representation. Different waveforms mean different timbres, and different timbres can be completely distinguished through their waveforms.

Psychoacoustic model

As can be seen from the figure above, human hearing ranges from 20 Hz to 20,000 Hz. Anything below 20 Hz is called infrasound, and anything above 20,000 Hz is called ultrasound. Since we can hear neither, we can simply discard them when encoding and decoding the audio stream.

1.2 Pulse Code Modulation (PCM)

So the question now is 👉🏻 how do we convert the sounds of real life into digital signals? The answer is pulse code modulation (PCM); a general understanding of it is enough.

The process of converting sound into digital signals is shown below 👇🏻

It can be roughly divided into three stages 👇🏻

  1. Sampling
  2. Quantization
  3. Encoding

Suppose an analog signal f(t) passes through a switch; the output of the switch then depends on the state of the switch:

  • When the switch is in the closed position, the output equals the input, that is, y(t) = f(t).
  • When the switch is in the open position, the output y(t) is zero.

It follows that if the switch is controlled by a narrow pulse train (sequence), closing when a pulse appears and opening when it disappears, then the output y(t) is a pulse train with varying amplitude: the amplitude of each pulse equals the instantaneous value of the input signal f(t) at the moment that pulse appears. Therefore, y(t) is the sampled signal, or sample sequence, of f(t).

Figure 3-2(a) shows a narrow pulse sequence p(t) with Ts as the time interval. Because it is used for sampling, it is called the sampling pulse.

In Figure 3-2(b), v(t) is the analog voltage signal to be sampled. The values of the discrete signal k(t) after sampling are k(0) = 0.2, k(Ts) = 0.4, k(2Ts) = 1.8, k(3Ts) = 2.8, k(4Ts) = 3.6, k(5Ts) = 5.1, k(6Ts) = 6.0, k(7Ts) = 5.7, k(8Ts) = 3.9, k(9Ts) = 2.0, k(10Ts) = 1.2. The samples can take any value between 0 and 6, which means there is an infinite number of possible values.

In Figure 3-2(c), in order to turn this infinite number of possible values into a finite set, the values of k(t) must be quantized (here, rounded to the nearest integer) to get m(t): m(0) = 0.0, m(Ts) = 0.0, m(2Ts) = 2.0, m(3Ts) = 3.0, m(4Ts) = 4.0, m(5Ts) = 5.0, m(6Ts) = 6.0, m(7Ts) = 6.0, m(8Ts) = 4.0, m(9Ts) = 2.0, m(10Ts) = 1.0. Now there are only seven possible values, 0 through 6.

As shown in Figure 3-2(d), m(t) has become a digital signal, but not yet the binary digital signal used in practice. Encoding m(t) naturally with a 3-bit binary code yields the digital signal d(t) in Figure 3-2(d). With that, the A/D conversion is complete and pulse code modulation is realized. That is the whole process of PCM.
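To make the three stages concrete, here is a tiny C sketch (runnable as-is, and equally valid inside an Objective-C file) that quantizes and 3-bit-encodes the sample values read off Figure 3-2; the sample array comes from the figure, everything else is purely illustrative 👇🏻

#include <stdio.h>
#include <math.h>

int main(void) {
    //sample values k(nTs) read off Figure 3-2(b)
    double k[] = {0.2, 0.4, 1.8, 2.8, 3.6, 5.1, 6.0, 5.7, 3.9, 2.0, 1.2};
    int n = sizeof(k) / sizeof(k[0]);
    for (int i = 0; i < n; i++) {
        int m = (int)round(k[i]);  //quantization: round to the nearest level (uniform quantization)
        double noise = k[i] - m;   //quantization noise, always within ±0.5 of an interval
        //encoding: 3 bits are enough for the 7 levels 0..6
        printf("k(%2dTs) = %.1f -> m = %d -> d = %d%d%d (noise %+.1f)\n",
               i, k[i], m, (m >> 2) & 1, (m >> 1) & 1, m & 1, noise);
    }
    return 0;
}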

1.3 Understanding the quantization process

Quantization 👉🏻 maps the infinite set of values of a continuous function onto a finite set of values of a discrete function.

  • Quantized value 👉🏻 a value after quantization
  • Quantization level 👉🏻 the number of quantized values
  • Quantization interval 👉🏻 the difference between two adjacent quantized values

In the figure above, the sampled signal k(t) of v(t) is different from the quantized signal m(t); for example, k(0) = 0.2 while m(0) = 0.

The receiver can only recover the quantized signal m(t), not the original k(t). The resulting error between the sent and the received signal is called quantization error, or quantization noise.

Since the quantization here is done by rounding, the maximum quantization noise is 0.5. In general, the maximum absolute quantization error is 0.5 quantization intervals. Quantization with equal intervals is called uniform quantization.

1.4 Audio compression coding principle & standard

The quality of digital audio depends on two parameters: the sampling frequency and the number of quantization bits. To keep the sample points along the time axis as dense as possible, the sampling frequency should be high; to resolve the amplitude as finely as possible, the number of quantization bits should be high. The direct consequence is pressure on storage capacity and on transmission channel capacity.

Transmission rate of audio signal = sampling frequency * number of quantized bits of sample * number of channels

Audio compression coding principle
  • Lossy coding: lossy coding eliminates redundant data. The captured signal contains sound at all kinds of frequencies, and the parts the human ear cannot hear can be discarded right at the data source, which greatly reduces the amount of data to store.

  • Lossless coding: lossless coding, typified by Huffman coding, keeps all of the sound data (that is, the sounds audible to the human ear), and the compressed data can be restored exactly (short codes for high-frequency symbols, long codes for low-frequency symbols).

  • Compression methods

    • Remove the redundant information from the captured audio, including data beyond the range of human hearing and masked audio signals.
    • Masking effect 👉🏻 a weaker sound can be covered by a stronger one.
    • In the signal, the masking effect appears as 👉🏻 frequency domain masking and time domain masking.
  • Audio compression encoding format

    • MPEG-1
    • Dolby AC-3
    • MPEG-2
    • MPEG-4
    • AAC
Standard
  • Sampling frequency = 44.1kHz
  • The number of quantized bits of the sample value = 16
  • The number of signal channels in normal stereo = 2
  • Digital signal transmission bit stream is about 1.4M bit/s
  • The amount of data per second is 1.4 Mbit ÷ 8 bit/Byte = 176.4 KByte (176,400 Bytes), equal to the amount of data in about 88,200 Chinese characters
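As a quick sanity check of the transmission-rate formula above, a few lines of C reproduce these numbers (the constants are the standard CD-quality parameters from the list; 2 Bytes per Chinese character is the assumption behind the 88,200 figure):

#include <stdio.h>

int main(void) {
    long sampleRate = 44100; //sampling frequency (Hz)
    long sampleSize = 16;    //quantization bits per sample
    long channels   = 2;     //normal stereo
    long bitsPerSec  = sampleRate * sampleSize * channels; //1,411,200 bit/s ≈ 1.4 Mbit/s
    long bytesPerSec = bitsPerSec / 8;                     //176,400 Byte ≈ 176.4 KByte per second
    printf("bit rate: %ld bit/s, per second: %ld Byte = %ld Chinese characters\n",
           bitsPerSec, bytesPerSec, bytesPerSec / 2);      //88,200 characters at 2 Byte each
    return 0;
}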

II. Frequency domain masking and time domain masking

Now, let's focus on frequency domain masking and time domain masking: what exactly do they mean?

2.1 Frequency domain masking

  1. The X-axis is the frequency domain. It starts at 20 Hz because sound below 20 Hz cannot be heard.
  2. The Y-axis is decibels. Sound below 40 dB is inaudible to the human ear.
  3. Look at the purple columns: those sounds are audible on their own, but there is a red column nearby; a taller column means a particularly loud sound, so the neighboring purple columns are all masked by it.
  4. The green column is smaller than both the purple and the red ones, which shows that this sound is comparatively weak.
  5. When green and red collide, it is like a quarrel between boys and girls: green is certain to lose and doomed to be masked 😂
  6. However, a green column that keeps some "distance" from the red column, meaning their frequencies differ greatly, can still be heard; from a third party's point of view, both are audible.

2.2 Time domain masking

  1. In the figure, "simultaneously" there is a treble (top line) and a bass (bottom line) sounding at the same time; the treble will completely drown out the bass.
  2. There is a special case 👉🏻 "pre-masking": a quiet sound starts first, and then suddenly (for a short time) a loud sound appears; at that point the quiet sound is masked as well.
  3. Sound takes time to travel, so masking has a time extent: it covers roughly 50 ms forward and about 100 ms backward.

III. Audio AAC encoding

Audio AAC encoding is completely different from video, so do not carry over the earlier video encoding process! Video encoding uses VideoToolBox, while audio uses AudioToolBox.

As before, we encapsulate audio encoding in a utility class as well.

3.1 Encoding tool class header file & initialization

Audio parameter configuration class

First, initialization is handled in the same way as video encoding. We also define an audio parameter configuration class, named CCAudioConfig 👇🏻

@interface CCAudioConfig : NSObject
/** bit rate */
@property (nonatomic, assign) NSInteger bitRate;      //(default 96000)
/** channel count */
@property (nonatomic, assign) NSInteger channelCount; //(default 1)
/** sample rate */
@property (nonatomic, assign) NSInteger sampleRate;   //(default 44100)
/** sample point quantization */
@property (nonatomic, assign) NSInteger sampleSize;   //(default 16)

+ (instancetype)defaultConifg;
@end

Implementation section 👇🏻

@implementation CCAudioConfig

+ (instancetype)defaultConifg {
    return  [[CCAudioConfig alloc] init];
}
- (instancetype)init
{
    self = [super init];
    if (self) {
        self.bitRate = 96000;
        self.channelCount = 1;
        self.sampleSize = 16;
        self.sampleRate = 44100;
    }
    return self;
}
@end
Utility class callback
/** AAC encoder delegate */
@protocol CCAudioEncoderDelegate <NSObject>
- (void)audioEncodeCallback:(NSData *)aacData;
@end

Unlike video, when we encode audio the callback outputs binary NSData directly.

Utility class header file

The utility class is named CCAudioEncoder, and the header file should contain the initialization method and the encoding method 👇🏻

@interface CCAudioEncoder : NSObject
@property (nonatomic, strong) CCAudioConfig *config;
@property (nonatomic, weak) id<CCAudioEncoderDelegate> delegate;

/** initialize, passing in the encoder configuration */
- (instancetype)initWithConfig:(CCAudioConfig *)config;

/** encode */
- (void)encodeAudioSamepleBuffer:(CMSampleBufferRef)sampleBuffer;
@end
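Before moving on, here is a minimal usage sketch. The property name audioEncoder and the capture-side method name are placeholders invented for illustration; only the CCAudioEncoder API above comes from the text 👇🏻

//hypothetical wiring in a controller that adopts CCAudioEncoderDelegate
- (void)setupAudioEncoder {
    CCAudioConfig *config = [CCAudioConfig defaultConifg]; //96000 bit/s, 1 channel, 44100 Hz, 16 bit
    self.audioEncoder = [[CCAudioEncoder alloc] initWithConfig:config];
    self.audioEncoder.delegate = self;
}

//call this for every captured audio CMSampleBufferRef
- (void)handleCapturedAudioSampleBuffer:(CMSampleBufferRef)sampleBuffer {
    [self.audioEncoder encodeAudioSamepleBuffer:sampleBuffer];
}

//CCAudioEncoderDelegate: one encoded AAC packet per callback
- (void)audioEncodeCallback:(NSData *)aacData {
    //send aacData over the network, or prepend an ADTS header and append it to a file
}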
Initialization Process

As with video encoding, we also need to define two queues 👉🏻 an encoding queue and a callback queue, to asynchronously process the audio encoding and its callback result 👇🏻

@property (nonatomic, strong) dispatch_queue_t encoderQueue;
@property (nonatomic, strong) dispatch_queue_t callbackQueue;

Also required are the audio-encoding-related properties 👇🏻

//audio converter object
@property (nonatomic, unsafe_unretained) AudioConverterRef audioConverter;
//PCM buffer
@property (nonatomic) char *pcmBuffer;
//PCM buffer size
@property (nonatomic) size_t pcmBufferSize;

In the initialization method, the processing for these properties is 👇🏻

- (instancetype)initWithConfig:(CCAudioConfig *)config {
    self = [super init];
    if (self) {
        //encoding queue
        _encoderQueue = dispatch_queue_create("aac hard encoder queue", DISPATCH_QUEUE_SERIAL);
        //callback queue
        _callbackQueue = dispatch_queue_create("aac hard encoder callback queue", DISPATCH_QUEUE_SERIAL);
        //audio converter
        _audioConverter = NULL;
        _pcmBufferSize = 0;
        _pcmBuffer = NULL;
        _config = config;
        if (config == nil) {
            _config = [[CCAudioConfig alloc] init];
        }
    }
    return self;
}

3.2 Preparation before encoding

Before encoding the audio, we need to configure the audio encoding parameters, which involves the audio-related structures, the function that creates the converter, and the function that configures its properties.

3.2.1 Structure of audio parameters

The audio parameter structure is AudioStreamBasicDescription 👇🏻
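The original post shows this declaration as an image; for reference, here it is as declared in CoreAudioTypes.h:

struct AudioStreamBasicDescription {
    Float64          mSampleRate;       //sample frames per second
    AudioFormatID    mFormatID;         //format identifier, e.g. kAudioFormatMPEG4AAC
    AudioFormatFlags mFormatFlags;      //format-specific flags
    UInt32           mBytesPerPacket;   //bytes in a packet (0 if variable)
    UInt32           mFramesPerPacket;  //frames in each packet
    UInt32           mBytesPerFrame;    //bytes in a frame (0 for compressed formats)
    UInt32           mChannelsPerFrame; //channels in each frame
    UInt32           mBitsPerChannel;   //bits per channel (0 for compressed formats)
    UInt32           mReserved;         //pads the struct to an 8-byte boundary
};
typedef struct AudioStreamBasicDescription AudioStreamBasicDescription;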

This structure provides a description of the audio file.

An audio file is produced as: analog signal -> digital signal via PCM -> compressed, encoded audio file.

  • The sampling frequency used during PCM is called the sample rate.
  • Each sampling instant can yield several samples of data, one per channel.
  • The samples produced at one sampling instant are grouped together and called a frame.
  • A number of frames combined together are called a packet.

The members are interpreted as 👇🏻

  • mSampleRate: the sampling frequency
  • mBitsPerChannel: the number of bits per sample
  • mChannelsPerFrame: can be understood as the number of channels, i.e. the number of samples produced at one sampling instant
  • mFramesPerPacket: the number of frames in each packet, equal to the number of sampling intervals the packet spans
  • mBytesPerPacket: the number of bytes of data in each packet
  • mBytesPerFrame: the number of bytes of data in each frame

3.2.2 Creating the converter

The relevant function is AudioConverterNewSpecific 👇🏻
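Its declaration in AudioConverter.h (shown here in place of the screenshot, with the parameter notes below as comments):

OSStatus AudioConverterNewSpecific(
    const AudioStreamBasicDescription *inSourceFormat,           //input audio format description
    const AudioStreamBasicDescription *inDestinationFormat,      //output audio format description
    UInt32                             inNumberClassDescriptions,//number of class descriptions
    const AudioClassDescription      *inClassDescriptions,       //class descriptions
    AudioConverterRef                *outAudioConverter);        //the created converter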

Parameter description 👇🏻

  • Parameter 1: the input audio format description
  • Parameter 2: the output audio format description
  • Parameter 3: the number of class descriptions
  • Parameter 4: the class descriptions
  • Parameter 5: the created converter

3.2.3 Setting Converter Properties

The relevant function is AudioConverterSetProperty. Parameter description 👇🏻
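Declared in AudioConverter.h as:

OSStatus AudioConverterSetProperty(
    AudioConverterRef        inAudioConverter,   //the converter
    AudioConverterPropertyID inPropertyID,       //which property to set
    UInt32                   inPropertyDataSize, //size of the property value
    const void              *inPropertyData);    //address of the property value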

  • Parameter 1: the converter
  • Parameter 2: the key of the property; see the enumeration AudioConverterPropertyID
  • Parameter 3: the size of the property value's data type
  • Parameter 4: the address of the property value

3.2.4 Description of the encoder type

The structure that describes the encoder type is AudioClassDescription 👇🏻
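From AudioFormat.h:

struct AudioClassDescription {
    OSType mType;         //the codec type
    OSType mSubType;      //the codec sub-type, e.g. kAudioFormatMPEG4AAC
    OSType mManufacturer; //software or hardware codec, e.g. kAppleSoftwareAudioCodecManufacturer
};
typedef struct AudioClassDescription AudioClassDescription;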

The members are interpreted as 👇🏻

  • mType: the format of the encoded output, for example AAC
  • mSubType: the sub-format of the encoded output
  • mManufacturer: the codec implementation, software coding or hardware coding

3.3 Encoding

Now we come to the key encoding process. Our public encoding method is - (void)encodeAudioSamepleBuffer:(CMSampleBufferRef)sampleBuffer;, which is called in the capture callback of the capture class CCSystemCapture 👇🏻

//capture audio/video callback
- (void)captureSampleBuffer:(CMSampleBufferRef)sampleBuffer
                       type:(CCSystemCaptureType)type {
    if (type == CCSystemCaptureTypeAudio) {
        //audio data
        //1. play the PCM data directly
        NSData *pcmData = [self convertAudioSamepleBufferToPcmData:sampleBuffer];
        [_pcmPlayer playPCMData:pcmData];
        //2. AAC encoding
        [_audioEncoder encodeAudioSamepleBuffer:sampleBuffer];
    } else {
        [_videoEncoder encodeVideoSampleBuffer:sampleBuffer];
    }
}

There are two ways to handle the captured audio 👇🏻

  1. Play PCM data directly
  2. AAC encoding

We will come back to playing PCM data directly; first, let's look at AAC encoding 👇🏻

- (void)encodeAudioSamepleBuffer:(CMSampleBufferRef)sampleBuffer {
    CFRetain(sampleBuffer);
    //check whether the audio converter has been created; if not,
    //configure the audio encoding parameters and create the transcoder
    if (!_audioConverter) {
        [self setupEncoderWithSampleBuffer:sampleBuffer];
    }
    //the actual encoding runs on the encoding queue
    dispatch_async(_encoderQueue, ^{
        //... (the steps below go here)
    });
}

First, take a look at the process of configuring the audio encoding parameters and creating the transcoder 👇🏻

- (void)setupEncoderWithSampleBuffer:(CMSampleBufferRef)sampleBuffer {
    //get the input parameters from the sample buffer
    AudioStreamBasicDescription inputAduioDes = *CMAudioFormatDescriptionGetStreamBasicDescription(CMSampleBufferGetFormatDescription(sampleBuffer));
    
    //set the output parameters
    AudioStreamBasicDescription outputAudioDes = {0};
    outputAudioDes.mSampleRate = (Float64)_config.sampleRate;          //sample rate
    outputAudioDes.mFormatID = kAudioFormatMPEG4AAC;                   //output format
    outputAudioDes.mFormatFlags = kMPEG4Object_AAC_LC;                 //if set to 0, it means lossless encoding
    outputAudioDes.mBytesPerPacket = 0;                                //bytes per packet (0: let the codec decide)
    outputAudioDes.mFramesPerPacket = 1024;                            //frames per packet: AAC uses 1024
    outputAudioDes.mBytesPerFrame = 0;                                 //bytes per frame
    outputAudioDes.mChannelsPerFrame = (uint32_t)_config.channelCount; //output channel count
    outputAudioDes.mBitsPerChannel = 0;                                //bits per channel in a data frame
    outputAudioDes.mReserved = 0;                                      //alignment, 0 (8-byte alignment)
    
    //fill in the remaining output format info
    UInt32 outDesSize = sizeof(outputAudioDes);
    AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &outDesSize, &outputAudioDes);
    
    //get the encoder description (software encoder here)
    AudioClassDescription *audioClassDesc = [self getAudioCalssDescriptionWithType:outputAudioDes.mFormatID fromManufacture:kAppleSoftwareAudioCodecManufacturer];
    
    //create the converter
    OSStatus status = AudioConverterNewSpecific(&inputAduioDes, &outputAudioDes, 1, audioClassDesc, &_audioConverter);
    if (status != noErr) {
        NSLog(@"Error!: hard-encoded AAC converter creation failed, status = %d", (int)status);
        return;
    }
    
    /* codec quality levels:
       kAudioConverterQuality_Max    = 0x7F,
       kAudioConverterQuality_High   = 0x60,
       kAudioConverterQuality_Medium = 0x40,
       kAudioConverterQuality_Low    = 0x20,
       kAudioConverterQuality_Min    = 0 */
    UInt32 temp = kAudioConverterQuality_High;
    //set the codec rendering quality
    AudioConverterSetProperty(_audioConverter, kAudioConverterCodecQuality, sizeof(temp), &temp);
    
    //set the bit rate
    uint32_t audioBitrate = (uint32_t)self.config.bitRate;
    uint32_t audioBitrateSize = sizeof(audioBitrate);
    status = AudioConverterSetProperty(_audioConverter, kAudioConverterEncodeBitRate, audioBitrateSize, &audioBitrate);
    if (status != noErr) {
        NSLog(@"Error!: hard-encoded AAC set bit rate failed");
    }
}

Back on the encoding asynchronous queue, the steps that need to be processed are 👇🏻

  • Get the PCM data stored in the BlockBuffer

CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
CFRetain(blockBuffer);
//get the audio data in the BlockBuffer and its address
OSStatus status = CMBlockBufferGetDataPointer(blockBuffer, 0, NULL, &_pcmBufferSize, &_pcmBuffer);
NSError *error = nil;
if (status != kCMBlockBufferNoErr) {
    error = [NSError errorWithDomain:NSOSStatusErrorDomain code:status userInfo:nil];
    NSLog(@"Error: ACC encode get data point error: %@", error);
    return;
}
  • Wrap the output buffer into an AudioBufferList

First, take a look at AudioBufferList 👇🏻
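Shown as an image in the original; as declared in CoreAudioTypes.h:

struct AudioBuffer {
    UInt32  mNumberChannels; //number of interleaved channels in the buffer
    UInt32  mDataByteSize;   //size of the buffer in bytes
    void   *mData;           //pointer to the audio data
};
typedef struct AudioBuffer AudioBuffer;

struct AudioBufferList {
    UInt32      mNumberBuffers; //number of AudioBuffers
    AudioBuffer mBuffers[1];    //variable-length array of buffers
};
typedef struct AudioBufferList AudioBufferList;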

Obviously, we want to store the blockBuffer obtained above into mBuffers[0], the AudioBuffer member of the AudioBufferList, like so 👇🏻

uint8_t *pcmBuffer = malloc(_pcmBufferSize);
//zero out the _pcmBufferSize bytes of pcmBuffer
memset(pcmBuffer, 0, _pcmBufferSize);
AudioBufferList outAudioBufferList = {0};
outAudioBufferList.mNumberBuffers = 1;
outAudioBufferList.mBuffers[0].mNumberChannels = (uint32_t)_config.channelCount;
outAudioBufferList.mBuffers[0].mDataByteSize = (UInt32)_pcmBufferSize;
outAudioBufferList.mBuffers[0].mData = pcmBuffer;
  • Configure the fill function to get the output data

This fill function is the input callback handed to AudioConverterFillComplexBuffer; the converter calls it whenever it needs more source data. Let's look at AudioConverterFillComplexBuffer 👇🏻
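Its declaration in AudioConverter.h, together with the input-callback type it expects:

OSStatus AudioConverterFillComplexBuffer(
    AudioConverterRef                  inAudioConverter,        //the converter
    AudioConverterComplexInputDataProc inInputDataProc,         //callback that supplies the input data
    void                              *inInputDataProcUserData, //user data handed to the callback
    UInt32                            *ioOutputDataPacketSize,  //in: output capacity in packets; out: packets actually written
    AudioBufferList                   *outOutputData,           //the converted output data
    AudioStreamPacketDescription      *outPacketDescription);   //descriptions of the output packets (may be NULL)

typedef OSStatus (*AudioConverterComplexInputDataProc)(
    AudioConverterRef              inAudioConverter,
    UInt32                        *ioNumberDataPackets,
    AudioBufferList               *ioData,
    AudioStreamPacketDescription **outDataPacketDescription,
    void                          *inUserData);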

It converts data supplied by the input callback function. The parameters are interpreted as 👇🏻

  1. Parameter 1: inAudioConverter, the audio converter
  2. Parameter 2: inInputDataProc, the callback function that supplies the audio data to be converted. It is called repeatedly whenever the converter is ready to accept new input data.
  3. Parameter 3: inInputDataProcUserData, the user data; here it is self
  4. Parameter 4: ioOutputDataPacketSize, the output buffer size in packets
  5. Parameter 5: outOutputData, the converted output data
  6. Parameter 6: outPacketDescription, the output packet descriptions

The code is 👇🏻

//the output packet size is 1
UInt32 outputDataPacketSize = 1;
//convert the data supplied by the input callback function
status = AudioConverterFillComplexBuffer(_audioConverter, aacEncodeInputDataProc, (__bridge void * _Nullable)(self), &outputDataPacketSize, &outAudioBufferList, NULL);
if (status == noErr) {
    //get the encoded AAC data
    NSData *rawAAC = [NSData dataWithBytes:outAudioBufferList.mBuffers[0].mData length:outAudioBufferList.mBuffers[0].mDataByteSize];
    //free the temporary PCM buffer
    free(pcmBuffer);
    //add an ADTS header. If you want a raw stream, skip this.
    //Adding the ADTS header is mandatory when writing to a file.
    //NSData *adtsHeader = [self adtsDataForPacketLength:rawAAC.length];
    //NSMutableData *fullData = [NSMutableData dataWithCapacity:adtsHeader.length + rawAAC.length];
    //[fullData appendData:adtsHeader];
    //[fullData appendData:rawAAC];
    //hand the data to the callback queue
    dispatch_async(_callbackQueue, ^{
        [_delegate audioEncodeCallback:rawAAC];
    });
} else {
    error = [NSError errorWithDomain:NSOSStatusErrorDomain code:status userInfo:nil];
}
//release
CFRelease(blockBuffer);
CFRelease(sampleBuffer);
if (error) {
    NSLog(@"Error: AAC encoding failed %@", error);
}

The encoding input callback function is aacEncodeInputDataProc. For details, see 3.4 Encoding callback.

Moving on, we notice that on a successful conversion there are two ways to handle the result 👇🏻

  1. Write it to a disk file; the prerequisite is adding an ADTS header
  2. Hand the data to the callback queue, so the caller can process it (for example, run the decoding flow)
AAC audio format

Speaking of the ADTS header, we must first explain the AAC audio formats, of which there are two 👇🏻

  1. ADIF: Audio Data Interchange Format. Its defining feature is that the start of the audio data can be found deterministically; decoding cannot begin in the middle of the stream but must start from the clearly defined beginning. This format is therefore commonly used for disk files.

  2. ADTS: Audio Data Transport Stream. Its defining feature is a bit stream with sync words, so decoding can start anywhere within the stream. It is similar in character to the MP3 data stream format.

Simply put, ADTS can start decoding at any frame because every frame carries its own header, whereas ADIF has a single unified header and therefore needs all of the data before decoding can begin. The two header formats also differ. The audio streams we encode and extract here are all in ADTS format 👇🏻
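For orientation before reading the code, the 7-byte ADTS header being prepended has roughly the following bit layout (the no-CRC variant used below; see the multimedia.cx wiki referenced in the code comments for the authoritative table):

AAAAAAAA AAAABCCD EEFFFFGH HHIJKLMM MMMMMMMM MMMOOOOO OOOOOOPP
A (12 bits): syncword, all 1s (0xFFF)
B  (1 bit):  MPEG version, 0 = MPEG-4, 1 = MPEG-2
C  (2 bits): layer, always 00
D  (1 bit):  protection absent, 1 = no CRC
E  (2 bits): profile minus 1 (AAC LC = 2 is stored as 01)
F  (4 bits): sampling frequency index (4 = 44.1 kHz)
G  (1 bit):  private bit
H  (3 bits): channel configuration (1 = mono, front-center)
I/J/K/L (4 bits): originality, home, copyright id bit, copyright id start
M (13 bits): frame length, including these 7 header bytes
O (11 bits): buffer fullness (0x7FF signals variable bit rate)
P  (2 bits): number of AAC frames in this ADTS frame minus 1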

Adding the ADTS header

Here is the process of adding the ADTS header 👇🏻 (just for understanding)

/**
 *  Add ADTS header at the beginning of each and every AAC packet.
 *  This is needed as MediaCodec encoder generates a packet of raw AAC data.
 *
 *  Note the packetLength must count in the ADTS header itself.
 *  See: http://wiki.multimedia.cx/index.php?title=ADTS
 *  Also: http://wiki.multimedia.cx/index.php?title=MPEG-4_Audio#Channel_Configurations
 **/
- (NSData *)adtsDataForPacketLength:(NSUInteger)packetLength {
    int adtsLength = 7;
    char *packet = malloc(sizeof(char) * adtsLength);
    //variables recycled by addADTStoPacket
    int profile = 2; //AAC LC
    //39 = MediaCodecInfo.CodecProfileLevel.AACObjectELD
    int freqIdx = 4; //44.1 kHz
    int chanCfg = 1; //MPEG-4 Audio Channel Configuration: 1 channel front-center
    NSUInteger fullLength = adtsLength + packetLength;
    //fill in ADTS data
    packet[0] = (char)0xFF; //11111111    = syncword
    packet[1] = (char)0xF9; //1111 1 00 1 = syncword MPEG-2 Layer CRC
    packet[2] = (char)(((profile - 1) << 6) + (freqIdx << 2) + (chanCfg >> 2));
    packet[3] = (char)(((chanCfg & 3) << 6) + (fullLength >> 11));
    packet[4] = (char)((fullLength & 0x7FF) >> 3);
    packet[5] = (char)(((fullLength & 7) << 5) + 0x1F);
    packet[6] = (char)0xFC;
    NSData *data = [NSData dataWithBytesNoCopy:packet length:adtsLength freeWhenDone:YES];
    return data;
}

3.4 Encoding callback

Finally, let's look at the encoding input callback function aacEncodeInputDataProc 👇🏻

static OSStatus aacEncodeInputDataProc(AudioConverterRef inAudioConverter,
                                       UInt32 *ioNumberDataPackets,
                                       AudioBufferList *ioData,
                                       AudioStreamPacketDescription **outDataPacketDescription,
                                       void *inUserData) {
    //get self back from the user data
    CCAudioEncoder *aacEncoder = (__bridge CCAudioEncoder *)(inUserData);
    //if there is no PCM data pending, tell the converter so
    if (!aacEncoder.pcmBufferSize) {
        *ioNumberDataPackets = 0;
        return -1;
    }
    //fill in the buffer with the cached PCM data
    ioData->mBuffers[0].mData = aacEncoder.pcmBuffer;
    ioData->mBuffers[0].mDataByteSize = (uint32_t)aacEncoder.pcmBufferSize;
    ioData->mBuffers[0].mNumberChannels = (uint32_t)aacEncoder.config.channelCount;
    //mark the cache as consumed
    aacEncoder.pcmBufferSize = 0;
    *ioNumberDataPackets = 1;
    return noErr;
}

Basically, the PCM data waiting to be encoded (cached in the CCAudioEncoder instance) is filled into the AudioBufferList.

3.5 Summary

As shown in the figure above, encoding takes three steps in total:

  1. Configure the encoder and prepare to encode;
  2. Collect the PCM data and feed it to the encoder;
  3. When encoding completes, call back the result or write it to a file.

IV. Audio AAC decoding

Having analyzed the encoding process, the natural next step is decoding 👉🏻 we again wrap it in a utility class, CCAudioDecoder.

4.1 Decoding tool class header file

Decoding mirrors encoding, so let's go straight to the code.

#import <Foundation/Foundation.h>
#import <AVFoundation/AVFoundation.h>
@class CCAudioConfig;

/** AAC decoder delegate */
@protocol CCAudioDecoderDelegate <NSObject>
- (void)audioDecodeCallback:(NSData *)pcmData;
@end

@interface CCAudioDecoder : NSObject
@property (nonatomic, strong) CCAudioConfig *config;
@property (nonatomic, weak) id<CCAudioDecoderDelegate> delegate;

//initialize, passing in the decoding configuration
- (instancetype)initWithConfig:(CCAudioConfig *)config;

/** decode */
- (void)decodeAudioAACData:(NSData *)aacData;
@end

The class extension in the .m file 👇🏻

@interface CCAudioDecoder ()
@property (nonatomic, strong) dispatch_queue_t decoderQueue;
@property (nonatomic, strong) dispatch_queue_t callbackQueue;
//audio converter object
@property (nonatomic) AudioConverterRef audioConverter;
//AAC buffer
@property (nonatomic) char *aacBuffer;
//AAC buffer size
@property (nonatomic) UInt32 aacBufferSize;
//description of the packets in the audio stream
@property (nonatomic) AudioStreamPacketDescription *packetDesc;
@end

4.2 Initialization

Next, look at initialization 👇🏻

- (instancetype)initWithConfig:(CCAudioConfig *)config {
    self = [super init];
    if (self) {
        _decoderQueue = dispatch_queue_create("aac hard decoder queue", DISPATCH_QUEUE_SERIAL);
        _callbackQueue = dispatch_queue_create("aac hard decoder callback queue", DISPATCH_QUEUE_SERIAL);
        _audioConverter = NULL;
        _aacBufferSize = 0;
        _aacBuffer = NULL;
        _config = config;
        if (_config == nil) {
            _config = [[CCAudioConfig alloc] init];
        }
        AudioStreamPacketDescription desc = {0};
        _packetDesc = &desc;
        [self setupEncoder];
    }
    return self;
}

And then setupEncoder 👇🏻

- (void)setupEncoder {
    //output parameters: PCM
    AudioStreamBasicDescription outputAudioDes = {0};
    outputAudioDes.mSampleRate = (Float64)_config.sampleRate;        //sample rate
    outputAudioDes.mChannelsPerFrame = (UInt32)_config.channelCount; //channel count
    outputAudioDes.mFormatID = kAudioFormatLinearPCM;                //output format
    outputAudioDes.mFormatFlags = (kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked); //encoding flags
    outputAudioDes.mFramesPerPacket = 1;                             //frames per packet
    outputAudioDes.mBitsPerChannel = 16;                             //bits per channel in a data frame
    outputAudioDes.mBytesPerFrame = outputAudioDes.mBitsPerChannel / 8 * outputAudioDes.mChannelsPerFrame; //bytes per frame = bits / 8 * channels
    outputAudioDes.mBytesPerPacket = outputAudioDes.mBytesPerFrame * outputAudioDes.mFramesPerPacket;      //bytes per packet = frame size * frame count
    outputAudioDes.mReserved = 0;                                    //alignment, 0 (8-byte alignment)
    
    //input parameters: AAC
    AudioStreamBasicDescription inputAduioDes = {0};
    inputAduioDes.mSampleRate = (Float64)_config.sampleRate;
    inputAduioDes.mFormatID = kAudioFormatMPEG4AAC;
    inputAduioDes.mFormatFlags = kMPEG4Object_AAC_LC;
    inputAduioDes.mFramesPerPacket = 1024;
    inputAduioDes.mChannelsPerFrame = (UInt32)_config.channelCount;
    
    //fill in the remaining input format info
    UInt32 inDesSize = sizeof(inputAduioDes);
    AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &inDesSize, &inputAduioDes);
    
    //get the decoder description (note: the decoder is looked up by the input AAC format)
    AudioClassDescription *audioClassDesc = [self getAudioCalssDescriptionWithType:inputAduioDes.mFormatID fromManufacture:kAppleSoftwareAudioCodecManufacturer];
    
    //create the converter
    OSStatus status = AudioConverterNewSpecific(&inputAduioDes, &outputAudioDes, 1, audioClassDesc, &_audioConverter);
    if (status != noErr) {
        NSLog(@"Error!: hard-decoded AAC converter creation failed, status = %d", (int)status);
        return;
    }
}

And coding – (void) setupEncoderWithSampleBuffer: (CMSampleBufferRef) sampleBuffer process is basically the same. Difference is that create decoder method getAudioCalssDescriptionWithType: fromManufacture: πŸ‘‡ 🏻

- (AudioClassDescription *)getAudioCalssDescriptionWithType:(AudioFormatID)type fromManufacture:(uint32_t)manufacture {
    static AudioClassDescription desc;
    UInt32 decoderSpecific = type;
    //get the total size of the AAC decoder list
    UInt32 size;
    OSStatus status = AudioFormatGetPropertyInfo(kAudioFormatProperty_Decoders, sizeof(decoderSpecific), &decoderSpecific, &size);
    if (status != noErr) {
        NSLog(@"Error!: hard-decoded AAC get info failed, status = %d", (int)status);
        return nil;
    }
    //compute the number of decoders
    unsigned int count = size / sizeof(AudioClassDescription);
    //create an array holding count decoder descriptions
    AudioClassDescription description[count];
    //fill the array (note: Decoders here, not Encoders as in the encoding class)
    status = AudioFormatGetProperty(kAudioFormatProperty_Decoders, sizeof(decoderSpecific), &decoderSpecific, &size, &description);
    if (status != noErr) {
        NSLog(@"Error!: hard-decoded AAC get propery failed, status = %d", (int)status);
        return nil;
    }
    for (unsigned int i = 0; i < count; i++) {
        if (type == description[i].mSubType && manufacture == description[i].mManufacturer) {
            desc = description[i];
            return &desc;
        }
    }
    return nil;
}

There are several differences 👇🏻

  1. The output parameters differ: encoding outputs AAC, decoding outputs PCM
  2. The property used to enumerate converters differs: encoding uses kAudioFormatProperty_Encoders, decoding uses kAudioFormatProperty_Decoders

4.3 Preparation before decoding

We encapsulate a structure, CCAudioUserData, which records the AAC information and serves as the user-data argument of the decoding callback function 👇🏻

typedef struct {
    char *data;
    UInt32 size;
    UInt32 channelCount;
    AudioStreamPacketDescription packetDesc;
} CCAudioUserData;

4.4 Decoding

Then comes the decoding process 👇🏻

- (void)decodeAudioAACData:(NSData *)aacData {
    if (!_audioConverter) {
        return;
    }
    dispatch_async(_decoderQueue, ^{
        //CCAudioUserData records the AAC information and is passed to the decoding callback
        CCAudioUserData userData = {0};
        userData.channelCount = (UInt32)_config.channelCount;
        userData.data = (char *)[aacData bytes];
        userData.size = (UInt32)aacData.length;
        userData.packetDesc.mDataByteSize = (UInt32)aacData.length;
        userData.packetDesc.mStartOffset = 0;
        userData.packetDesc.mVariableFramesInPacket = 0;
        
        //output size and packet count
        UInt32 pcmBufferSize = (UInt32)(2048 * _config.channelCount);
        UInt32 pcmDataPacketSize = 1024;
        //create a temporary PCM container
        uint8_t *pcmBuffer = malloc(pcmBufferSize);
        memset(pcmBuffer, 0, pcmBufferSize);
        
        //output buffer
        AudioBufferList outAudioBufferList = {0};
        outAudioBufferList.mNumberBuffers = 1;
        outAudioBufferList.mBuffers[0].mNumberChannels = (uint32_t)_config.channelCount;
        outAudioBufferList.mBuffers[0].mDataByteSize = (UInt32)pcmBufferSize;
        outAudioBufferList.mBuffers[0].mData = pcmBuffer;
        
        //output description
        AudioStreamPacketDescription outputPacketDesc = {0};
        
        //configure the fill function and get the output data
        OSStatus status = AudioConverterFillComplexBuffer(_audioConverter, &AudioDecoderConverterComplexInputDataProc, &userData, &pcmDataPacketSize, &outAudioBufferList, &outputPacketDesc);
        if (status != noErr) {
            NSLog(@"Error: AAC Decoder error, status = %d", (int)status);
            return;
        }
        //if we got data, hand it to the callback queue
        if (outAudioBufferList.mBuffers[0].mDataByteSize > 0) {
            NSData *rawData = [NSData dataWithBytes:outAudioBufferList.mBuffers[0].mData length:outAudioBufferList.mBuffers[0].mDataByteSize];
            dispatch_async(_callbackQueue, ^{
                [_delegate audioDecodeCallback:rawData];
            });
        }
        free(pcmBuffer);
    });
}

The decoding input callback function is AudioDecoderConverterComplexInputDataProc.

4.5 Decoding Callback

static OSStatus AudioDecoderConverterComplexInputDataProc(AudioConverterRef inAudioConverter,
                                                          UInt32 *ioNumberDataPackets,
                                                          AudioBufferList *ioData,
                                                          AudioStreamPacketDescription **outDataPacketDescription,
                                                          void *inUserData) {
    CCAudioUserData *audioDecoder = (CCAudioUserData *)(inUserData);
    if (audioDecoder->size <= 0) {
        *ioNumberDataPackets = 0;
        return -1;
    }
    //fill in the packet description
    *outDataPacketDescription = &audioDecoder->packetDesc;
    (*outDataPacketDescription)[0].mStartOffset = 0;
    (*outDataPacketDescription)[0].mDataByteSize = audioDecoder->size;
    (*outDataPacketDescription)[0].mVariableFramesInPacket = 0;
    //fill in the AAC data to be decoded
    ioData->mBuffers[0].mData = audioDecoder->data;
    ioData->mBuffers[0].mDataByteSize = audioDecoder->size;
    ioData->mBuffers[0].mNumberChannels = audioDecoder->channelCount;
    //mark the cache as consumed so the converter stops asking (mirrors the encoder callback)
    audioDecoder->size = 0;
    *ioNumberDataPackets = 1;
    return noErr;
}

As you can see, decoding has no need to bridge the self object, because we cache the data in our custom CCAudioUserData structure. This is another difference from encoding.

At this point, the decoding tool class is wrapped up! 🍺🍺🍺🍺🍺🍺

V. Audio PCM playback

Finally, one more thing 👉🏻 audio PCM playback. In some scenarios the captured audio may be played directly, without going through the codec.

We can define a method for this in the encoding utility class 👇🏻

- (NSData *)convertAudioSamepleBufferToPcmData:(CMSampleBufferRef)sampleBuffer {
    //get the size of the PCM data
    size_t size = CMSampleBufferGetTotalSampleSize(sampleBuffer);
    //allocate space
    int8_t *audio_data = (int8_t *)malloc(size);
    memset(audio_data, 0, size);
    //get the CMBlockBuffer, which holds the PCM data
    CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    //copy the data into the space we allocated
    CMBlockBufferCopyDataBytes(blockBuffer, 0, size, audio_data);
    NSData *data = [NSData dataWithBytes:audio_data length:size];
    free(audio_data);
    return data;
}

This extracts the PCM data from the sampleBuffer; it is then called back to the ViewController, which plays the PCM data directly.

Playback requires the player class CCAudioPCMPlayer, which is structured much like a decoder 👇🏻

@class CCAudioConfig;
@interface CCAudioPCMPlayer : NSObject
- (instancetype)initWithConfig:(CCAudioConfig *)config;

/** play PCM */
- (void)playPCMData:(NSData *)data;

/** set volume gain, 0.0 - 1.0 */
- (void)setupVoice:(Float32)gain;

/** dispose */
- (void)dispose;
@end

Implementation section 👇🏻

#import "CCAudioPCMPlayer.h" #import <AudioToolbox/AudioToolbox.h> #import <AVFoundation/AVFoundation.h> #import "Ccavconfig. h" #import" ccaudiodataqueue. h" #define MIN_SIZE_PER_FRAME 2048 // Minimum data length per frame static const int kNumberBuffers_play = 3; // 1 typedef struct AQPlayerState { AudioStreamBasicDescription mDataFormat; // 2 AudioQueueRef mQueue; // 3 AudioQueueBufferRef mBuffers[kNumberBuffers_play]; // 4 AudioStreamPacketDescription *mPacketDescs; // 9 }AQPlayerState; @interface CCAudioPCMPlayer () @property (nonatomic, assign) AQPlayerState aqps; @property (nonatomic, strong) CCAudioConfig *config; @property (nonatomic, assign) BOOL isPlaying; @end @implementation CCAudioPCMPlayer static void TMAudioQueueOutputCallback(void * inUserData, AudioQueueRef inAQ, AudioQueueBufferRef inBuffer) { AudioQueueFreeBuffer(inAQ, inBuffer); } - (instancetype)initWithConfig:(CCAudioConfig *)config { self = [super init]; if (self) { _config = config; / / configuration AudioStreamBasicDescription dataFormat = {0}; dataFormat.mSampleRate = (Float64)_config.sampleRate; / / sampling rate dataFormat mChannelsPerFrame = (UInt32) _config. ChannelCount; Dataformat. mFormatID = kAudioFormatLinearPCM; dataformat. mFormatID = kAudioFormatLinearPCM; / / output format dataFormat. MFormatFlags = (kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked); / / code 12 dataFormat. MFramesPerPacket = 1; // Number of frames per packet; dataFormat.mBitsPerChannel = 16; // The number of bits sampled for each channel in the data frame. dataFormat.mBytesPerFrame = dataFormat.mBitsPerChannel / 8 *dataFormat.mChannelsPerFrame; / / each frame size (sampling) the digits / 8 * the number of channels dataFormat. MBytesPerPacket = dataFormat. MBytesPerFrame * dataFormat mFramesPerPacket; // Each packet size (frame size * frame number) dataformat. mReserved = 0; AQPlayerState state = {0}; state.mDataFormat = dataFormat; _aqps = state; [self setupSession]; / / create play queue OSStatus status = AudioQueueNewOutput (& _aqps mDataFormat, TMAudioQueueOutputCallback, NULL, NULL, NULL, 0, &_aqps.mQueue); if (status ! = noErr) { NSError *error = [[NSError alloc] initWithDomain:NSOSStatusErrorDomain code:status userInfo:nil]; NSLog(@"Error: AudioQueue create error = %@", [error description]); return self; } [self setupVoice:1]; _isPlaying = false; } return self; } - (void)setupSession { NSError *error = nil; // Set the session to active or inactive. Please note that activating an audio session is a synchronous (blocking) operation [[AVAudioSession sharedInstance] setActive:YES error:&error]; if (error) { NSLog(@"Error: audioQueue palyer AVAudioSession error, error: %@", error); } / / set the session category [[AVAudioSession sharedInstance] setCategory: AVAudioSessionCategoryPlayAndRecord error: & error]; if (error) { NSLog(@"Error: audioQueue palyer AVAudioSession error, error: %@", error); }} - (void)playPCMData:(NSData *)data {// point to audio queue buffer AudioQueueBufferRef inBuffer; /* Asks the audio queue object to allocate an audio queue buffer. Parameter 1: audio queue parameter to allocate buffer 2: capacity required for new buffer (in bytes) Parameter 3: output, Pointer to the newly allocated audio queue buffer */ AudioQueueAllocateBuffer(_AQps.mqueue, MIN_SIZE_PER_FRAME, &inBuffer); Memcpy (inBuffer->mAudioData, data.bytes, data.length); / / set inBuffer. MAudioDataByteSize inBuffer - > mAudioDataByteSize = (UInt32) data. Length; // Adds the buffer to the buffer queue for recording or playing audio. 
/* Parameter 1: the audio queue with the audio queue buffer parameter 2: the audio queue buffer to add to the buffer queue. Parameter 3: The number of audio packets in the inBuffer parameter. For any of the following cases, use the value 0: * when playing constant bitrate (CBR) format. * When an audio queue is a recording (input) audio queue. * when using audioqueueallocateBufferWithPacketDescriptions functions assigned to the end of the line in the buffer. In this case, the callback should describe the packets of the buffer in the mpackedDescriptions and mpackedDescriptionCount fields of the buffer. Parameter 4: Description of a set of packets. For any of the following cases, use null value * when playing constant bitrate (CBR) format. * When an audio queue is an input (recorded) audio queue. * when using audioqueueallocateBufferWithPacketDescriptions functions assigned to the end of the line in the buffer. In this case, The callback should describe the packets */ OSStatus status = in the buffer's mpackedDescriptions and mpackedDescriptionCount fields AudioQueueEnqueueBuffer(_aqps.mQueue, inBuffer, 0, NULL); if (status ! = noErr) { NSLog(@"Error: audio queue palyer enqueue error: %d",(int)status); } // Start playing or recording audio /* Parameter 1: the audio queue to start parameter 2: the time the audio queue should start. To specify the start time relative to the timeline of the associated audio device, use the MsampleTime field of the audioTimestamp structure. Use NULL to indicate that the audio queue should start as soon as possible */ AudioQueueStart(_aqps.mqueue, NULL); } // This function is not required, //- (void)pause {// AudioQueuePause(_aqps.mqueue); // set the volume increment // 0.0-1.0 - (void)setupVoice:(Float32)gain {Float32 gain0 = gain; if (gain < 0) { gain0 = 0; }else if (gain > 1) { gain0 = 1; } // Set audio queue parameter values /* Parameter 1: audio queue parameter to start 2: property parameter 3:value */ AudioQueueSetParameter(_AQps.mqueue, kAudioQueueParam_Volume, gain0); Dispose {AudioQueueStop(_aqps.mqueue, true); dispose {AudioQueueStop(_aqps.mqueue, true); AudioQueueDispose(_aqps.mQueue, true); } @endCopy the code

There are detailed comments in the code, which will not be repeated here.

Conclusion

  • Audio principles

    • Three elements of sound: pitch, volume and timbre
    • Range of human hearing: 20 Hz - 20 kHz; below 20 Hz is infrasound, above 20 kHz is ultrasound
    • Pulse code modulation (PCM)
      • There are three stages: sampling, quantization and encoding
    • Audio compression coding principle
      • Transmission rate of an audio signal = sampling frequency * number of quantization bits per sample * number of channels
      • Lossy coding: eliminates redundant data (sounds inaudible to the human ear)
      • Lossless coding: typified by Huffman coding; the compressed data can be restored exactly (short codes for frequent symbols, long codes for rare ones)
      • Compression methods
        • Remove the redundant information from the captured audio
        • Masking effect: one sound covers another
      • Audio compression encoding formats: MPEG-1, Dolby AC-3, MPEG-2, MPEG-4 and AAC
    • Standard parameters for audio
      • Sampling frequency = 44.1 kHz
      • Number of quantization bits per sample = 16
      • Number of channels in normal stereo = 2
      • The digital transmission bit stream is about 1.4 Mbit/s
      • The amount of data per second is 1.4 Mbit ÷ 8 bit/Byte = 176.4 KByte, equal to about 88,200 Chinese characters
  • Frequency domain masking and time domain masking

    • Frequency domain masking
      • A weaker sound is covered by a stronger sound nearby
      • Sounds that differ greatly in frequency and decibel level have little influence on each other
    • Time domain masking
      • Sounding at the same time, the treble completely drowns out the bass
      • Special masking: a quiet sound starts, and within a short time another, louder sound starts; the latter covers the former
      • Masking extends through time: about 50 ms forward and about 100 ms backward
  • Audio AAC encoding tool class

    • Two queues: encoding queue + callback queue
    • Preparation before encoding
      • The audio parameter struct AudioStreamBasicDescription
      • Create the converter: function AudioConverterNewSpecific
      • Set converter properties: function AudioConverterSetProperty
      • The encoder type description struct AudioClassDescription
    • AAC encoding process
      1. Check whether the audio converter has been created:
        1. Already created 👉🏻 go straight to encoding
        2. Not created 👉🏻 configure the audio encoding parameters and create the transcoder
          1. Get the input parameters AudioStreamBasicDescription
          2. Set the output parameters (AAC format AudioStreamBasicDescription)
          3. AudioFormatGetProperty fills in the output parameters
          4. Obtain the encoder description AudioClassDescription
          5. AudioConverterNewSpecific creates the audio converter
          6. AudioConverterSetProperty configures the converter properties: codec quality, bit rate, etc.
      2. The audio encoding is processed on the asynchronous queue
        1. Get the PCM data stored in the BlockBuffer
        2. Wrap the buffer (PCM data) into an AudioBufferList
        3. AudioConverterFillComplexBuffer converts the data supplied by the input callback function
        4. On success, the AAC output data is converted to NSData
        5. At this point there are two options: 5.1 write to a disk file, which requires adding an ADTS header; 5.2 hand the data to the callback queue for the caller to process (e.g. the decoding flow)
        6. Release blockBuffer and sampleBuffer
    • AAC audio formats
      • ADIF: Audio Data Interchange Format. Commonly used for disk files.
      • ADTS: Audio Data Transport Stream. Used in network transmission scenarios.
        • The code implementation adds the ADTS header
    • AAC encoding callback: fills the PCM data to be encoded (cached in the CCAudioEncoder instance) into the AudioBufferList.
  • Audio AAC decoding tool class

    • Two queues: decoding queue + callback queue
    • Audio converter initialization
      • Output parameters: PCM configuration AudioStreamBasicDescription
      • Input parameters: AAC configuration AudioStreamBasicDescription
      • AudioFormatGetProperty fills in the format info via kAudioFormatProperty_FormatInfo
      • Get the decoder description AudioClassDescription
        • Only kAppleSoftwareAudioCodecManufacturer is passed in
      • AudioConverterNewSpecific creates the converter
    • Preparation before decoding:
      • The custom struct CCAudioUserData, containing
        • the data address
        • the data size
        • the channel count
        • AudioStreamPacketDescription, the description of the packets in the data buffer
      • It records the AAC information and serves as the argument of the decoding callback function
    • AAC decoding process
      1. If the audio converter has not been created, return directly
      2. Process asynchronously on the decoding queue
        1. Configure CCAudioUserData to cache the AAC information
        2. Set the output size pcmBufferSize and the packet count pcmDataPacketSize
        3. malloc a temporary PCM container; memset initializes the space
        4. Configure the output buffer AudioBufferList
        5. Initialize the output description AudioStreamPacketDescription struct
        6. AudioConverterFillComplexBuffer configures the fill function and gets the output data
        7. Convert the output data to NSData and deliver it asynchronously through the delegate on the callback queue
    • Decoding callback: fills the cached CCAudioUserData into the callback's AudioStreamPacketDescription and AudioBufferList parameters
  • Audio PCM playback: refer to the specific code.