Preface

In 004 - Video H264 coding details (part 1) through VIII - Realization of AVFoundation video data collection (3), we covered the callback methods triggered by audio and video capture, the principles of video H264 encoding and decoding, and rendering. All that is left for this article is the audio codec part.
1. Audio fundamentals

First of all, let's go over a few knowledge points related to audio.
1.1 Sound
Sound is a wave generated by the vibration of an object and transmitted through a medium (air, a solid, or a liquid), and it can be perceived by human or animal hearing organs. As we all learned in junior-high physics, sound is characterized by three elements:
- Pitch: how high or low a sound is (treble vs. bass). It is determined by frequency, measured in Hz (Hertz); the higher the frequency, the higher the pitch.
- Volume: the amplitude of the sound vibration, i.e. the loudness a person subjectively perceives.
- Timbre: also known as tone quality. The waveform determines the timbre of a sound, and different vibrating materials produce different waveforms. Timbre itself is an abstract thing, and the waveform is what makes this abstraction intuitive: different waveforms mean different timbres, and different timbres can be completely distinguished by their waveforms.
Psychoacoustic model
As can be seen from the figure above, human hearing ranges from 20 Hz to 20 kHz. Anything below 20 Hz is called infrasound, and anything above 20 kHz is called ultrasound. Since we cannot hear either, we can simply discard them when encoding and decoding the audio stream.
1.2 Pulse Code Modulation (PCM)
So the question now is: how do we convert the sounds of real life into digital signals? The answer is pulse code modulation (PCM); a general understanding of it is enough here.

The process of converting sound into a digital signal is shown below.

It can be roughly divided into three stages:
- Sampling
- Quantization
- Encoding
Suppose an analog signal f(t) passes through a switch. The output of the switch depends on the switch's state:

- When the switch is in the closed position, the output equals the input, i.e. y(t) = f(t).
- When the switch is in the open position, the output y(t) is zero.
It follows that if the switch is controlled by a narrow pulse train (sequence), so that the switch closes when a pulse appears and opens when it disappears, then the output y(t) is a pulse train of varying amplitude, where the amplitude of each pulse is the instantaneous value of the input signal f(t) at the moment the pulse occurs. Therefore, y(t) is the sampled signal (sample sequence) of f(t).

Figure 3-2(a) shows a narrow pulse sequence p(t) with time interval Ts. Because it is used for sampling, it is called the sampling pulse.
In Figure 3-2(b), v(t) is the analog voltage signal to be sampled. The values of the discrete signal k(t) after sampling are k(0) = 0.2, k(Ts) = 0.4, k(2Ts) = 1.8, k(3Ts) = 2.8, k(4Ts) = 3.6, k(5Ts) = 5.1, k(6Ts) = 6.0, k(7Ts) = 5.7, k(8Ts) = 3.9, k(9Ts) = 2.0, k(10Ts) = 1.2. The values fall anywhere between 0 and 6, which means there are infinitely many possible values.

In Figure 3-2(c), in order to turn the infinite number of possible values into a finite number, we quantize the values of k(t) (here by rounding to the nearest integer) to get m(t): m(0) = 0.0, m(Ts) = 0.0, m(2Ts) = 2.0, m(3Ts) = 3.0, m(4Ts) = 4.0, m(5Ts) = 5.0, m(6Ts) = 6.0, m(7Ts) = 6.0, m(8Ts) = 4.0, m(9Ts) = 2.0, m(10Ts) = 1.0. Now there are only seven possible values, 0 through 6.

As shown in Figure 3-2(d), m(t) has become a digital signal, but not yet the binary digital signal used in practice. Encoding m(t) naturally with a 3-bit binary code yields the digital signal d(t) in Figure 3-2(d). With that, A/D conversion is complete and pulse code modulation is realized. That is the whole process of PCM.
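To make the three stages concrete, here is a minimal sketch (plain C, with the sample values copied from Figure 3-2) that quantizes each sampled value by rounding and encodes the resulting level as a 3-bit natural binary code:

```objc
#include <math.h>
#include <stdio.h>

int main(void) {
    // Sampled values k(nTs) from Figure 3-2(b)
    double k[] = {0.2, 0.4, 1.8, 2.8, 3.6, 5.1, 6.0, 5.7, 3.9, 2.0, 1.2};
    int n = sizeof(k) / sizeof(k[0]);
    for (int i = 0; i < n; i++) {
        int m = (int)round(k[i]); // Quantization: round to the nearest level, 0..6
        // Encoding: 3-bit natural binary, most significant bit first
        printf("k(%2dTs) = %.1f  ->  m = %d  ->  d = %d%d%d\n",
               i, k[i], m, (m >> 2) & 1, (m >> 1) & 1, m & 1);
    }
    return 0;
}
```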
1.3 Understanding the quantization process
Quantization maps an infinite set of values of a continuous function onto a finite set of values of a discrete function.

- Quantized value: a value after quantization
- Quantization level: the number of possible quantized values
- Quantization interval: the difference between two adjacent quantized values
In the figure above, the sampled signal k(t) of v(t) differs from the quantized signal m(t); for example, k(0) = 0.2 while m(0) = 0.

The receiver can only recover the quantized signal m(t), not k(t). The resulting error between the sent and received signals is called quantization error, or quantization noise.

Since quantization here is done by rounding, the maximum quantization noise is 0.5. In general, the maximum absolute quantization error is 0.5 quantization interval. Quantization with equal intervals throughout is called uniform quantization.
1.4 Audio compression coding principles & standards
The quality of digital audio depends on two parameters: the sampling frequency and the number of quantization bits. To keep the sampled points as dense as possible along the time axis, the sampling frequency should be high; to resolve the amplitude as finely as possible, the number of quantization bits should be high. The immediate consequence is pressure on storage capacity and on transmission channel capacity.
Transmission rate of audio signal = sampling frequency * number of quantized bits of sample * number of channels
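As a quick sanity check of this formula, here is a small sketch using the standard CD-quality parameters listed later in this section (44.1 kHz, 16 bits, stereo):

```objc
NSInteger sampleRate   = 44100; // sampling frequency in Hz
NSInteger sampleSize   = 16;    // quantization bits per sample
NSInteger channelCount = 2;     // normal stereo

NSInteger bitsPerSecond  = sampleRate * sampleSize * channelCount; // 1,411,200 bit/s, i.e. ~1.4 Mbit/s
NSInteger bytesPerSecond = bitsPerSecond / 8;                      // 176,400 bytes = 176.4 KB per second
NSLog(@"%ld bit/s = %ld bytes/s", (long)bitsPerSecond, (long)bytesPerSecond);
```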
Audio compression coding principles

- Lossy coding: eliminates redundant data. The captured signal contains sounds at all kinds of frequencies; we can discard the parts the human ear cannot hear directly at the data source, which greatly reduces the amount of data to store.

- Lossless coding: for example Huffman coding. Apart from compressing the sounds inaudible to the human ear, it preserves all the sound data (i.e. the sounds the human ear can hear), and the compressed data can be restored exactly (short codes for high-frequency symbols, long codes for low-frequency symbols).

- Compression method: remove the redundant information from the captured audio, including data beyond the range of human hearing and masked audio signals.
  - Masking effect: a weaker sound can be covered by a stronger one. In terms of signal behavior, masking appears as frequency-domain masking and time-domain masking.
Audio compression encoding formats

- MPEG-1
- Dolby AC-3
- MPEG-2
- MPEG-4
- AAC
Standard

- Sampling frequency = 44.1 kHz
- Quantization bits per sample = 16
- Number of channels in normal stereo = 2
- Digital signal transmission bit stream ≈ 1.4 Mbit/s
- The amount of data per second is 1.4 Mbit / 8 = 176.4 KB (176,400 bytes), equal to the data of 88,200 Chinese characters (2 bytes each)
2. Frequency domain masking and time domain masking

Now let's take a closer look at frequency-domain masking and time-domain masking. What exactly do they mean?
2.1 Frequency domain masking
- The X-axis is the frequency domain. It starts at 20 Hz because sounds below 20 Hz are inaudible. The Y-axis is decibels; sounds below 40 dB are inaudible to human ears.
- Look at the purple columns: on their own, these sounds are audible. But a red column stands nearby, and its height means that sound is particularly loud, so it masks all the surrounding purple columns.
- The green columns are smaller than the purple and red ones, meaning those sounds are comparatively weak. When green collides with red, it is doomed to lose.
- However, some green columns sit a fair distance away from the red column, meaning their frequency-domain values differ greatly. To a listener, both of those sounds remain audible.
2.2 Time domain masking
- In the figure, at the same moment there is a treble (top line) and a bass (bottom line). When they sound simultaneously, the treble completely drowns out the bass.
- There is a special case (pre-masking): a quiet sound starts first, and then suddenly, within a short time, a loud sound occurs; the quiet sound is masked.
- Because sound takes time to travel, masking has a time extent: it covers roughly 50 ms forward (pre-masking) and roughly 100 ms backward (post-masking).
3. Audio AAC encoding

Audio AAC encoding is completely different from video, so do not follow the earlier video-encoding flow! Video encoding uses VideoToolbox, while audio uses AudioToolbox.

As before, we encapsulate the audio encoder in a utility class.
3.1 Encoding tool class header file & initialization
Audio parameter configuration class
First, initialization is handled the same way as in video encoding. We also define an audio parameter configuration class, named CCAudioConfig:

```objc
@interface CCAudioConfig : NSObject
/** Bit rate */
@property (nonatomic, assign) NSInteger bitrate;      // (default 96000)
/** Channel count */
@property (nonatomic, assign) NSInteger channelCount; // (default 1)
/** Sample rate */
@property (nonatomic, assign) NSInteger sampleRate;   // (default 44100)
/** Quantization bits per sample */
@property (nonatomic, assign) NSInteger sampleSize;   // (default 16)

+ (instancetype)defaultConifg;
@end
```
The implementation:

```objc
@implementation CCAudioConfig

+ (instancetype)defaultConifg {
    return [[CCAudioConfig alloc] init];
}

- (instancetype)init
{
    self = [super init];
    if (self) {
        self.bitrate = 96000;
        self.channelCount = 1;
        self.sampleSize = 16;
        self.sampleRate = 44100;
    }
    return self;
}

@end
```
Utility class callback
```objc
/** AAC encoder delegate */
@protocol CCAudioEncoderDelegate <NSObject>
- (void)audioEncodeCallback:(NSData *)aacData;
@end
```
Note that audio differs from video here: when we encode audio, we output binary NSData directly.
Utility class header file
The utility class is named CCAudioEncoder, and the header file contains the initialization method and the encoding method:

```objc
/** AAC hard encoder */
@interface CCAudioEncoder : NSObject
@property (nonatomic, strong) CCAudioConfig *config;
@property (nonatomic, weak) id<CCAudioEncoderDelegate> delegate;

/** Initialize, passing in the encoder configuration */
- (instancetype)initWithConfig:(CCAudioConfig *)config;
/** Encode */
- (void)encodeAudioSamepleBuffer:(CMSampleBufferRef)sampleBuffer;
@end
```
Initialization Process
As with video encoding, we need two queues, an encoding queue and a callback queue, to asynchronously process the audio encoding and the callback of its result:

```objc
@property (nonatomic, strong) dispatch_queue_t encoderQueue;
@property (nonatomic, strong) dispatch_queue_t callbackQueue;
```
Also required are the audio-encoding-related properties:

```objc
/** Audio converter object */
@property (nonatomic, unsafe_unretained) AudioConverterRef audioConverter;
/** PCM buffer */
@property (nonatomic) char *pcmBuffer;
/** PCM buffer size */
@property (nonatomic) size_t pcmBufferSize;
```
In the initialization method, these properties are handled as follows:

```objc
- (instancetype)initWithConfig:(CCAudioConfig *)config {
    self = [super init];
    if (self) {
        // Encoding queue
        _encoderQueue = dispatch_queue_create("aac hard encoder queue", DISPATCH_QUEUE_SERIAL);
        // Callback queue
        _callbackQueue = dispatch_queue_create("aac hard encoder callback queue", DISPATCH_QUEUE_SERIAL);
        // Audio converter
        _audioConverter = NULL;
        _pcmBufferSize = 0;
        _pcmBuffer = NULL;
        _config = config;
        if (config == nil) {
            _config = [[CCAudioConfig alloc] init];
        }
    }
    return self;
}
```
3.2 Preparation before encoding

Before encoding audio we need to configure the encoding parameters. This involves the audio-related structures and the functions for creating the converter and configuring its properties.
3.2.1 Audio parameter structure

The audio parameter structure is AudioStreamBasicDescription:

This structure provides a description of the audio data format.

An audio file is produced as: analog signal -> digital signal (via PCM) -> compressed, encoded audio file.
- The sampling frequency used during PCM is called the sample rate.
- Each sampling instant can yield several samples of data, corresponding to multiple channels.
- The samples obtained at one sampling instant are grouped together and called a frame.
- Several frames grouped together are called a packet.
The members, interpreted:

- mSampleRate: the sample rate
- mBitsPerChannel: the number of bits per sample
- mChannelsPerFrame: can be understood as the number of channels, i.e. how many samples are produced at each sampling instant
- mFramesPerPacket: the number of frames per packet, equal to the number of sampling intervals the packet spans
- mBytesPerPacket: the number of bytes of data per packet
- mBytesPerFrame: the number of bytes of data per frame
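For uncompressed PCM these members follow directly from one another. A minimal sketch for 44.1 kHz, 16-bit mono PCM (the same derivation reappears in the decoder and player configurations later):

```objc
AudioStreamBasicDescription pcmDes = {0};
pcmDes.mSampleRate       = 44100;                 // sample rate
pcmDes.mFormatID         = kAudioFormatLinearPCM;
pcmDes.mChannelsPerFrame = 1;                     // mono: one sample per sampling instant
pcmDes.mBitsPerChannel   = 16;                    // bits per sample
pcmDes.mFramesPerPacket  = 1;                     // PCM puts one frame in each packet
pcmDes.mBytesPerFrame    = pcmDes.mBitsPerChannel / 8 * pcmDes.mChannelsPerFrame; // 2 bytes
pcmDes.mBytesPerPacket   = pcmDes.mBytesPerFrame * pcmDes.mFramesPerPacket;       // 2 bytes
```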
3.2.2 Creating the converter

The relevant function is AudioConverterNewSpecific:

Parameter descriptions:

- Parameter 1: the input audio format description
- Parameter 2: the output audio format description
- Parameter 3: the number of class descriptions
- Parameter 4: the class descriptions
- Parameter 5: the converter that gets created
3.2.3 Setting converter properties

The relevant function is AudioConverterSetProperty.

Parameter descriptions:

- Parameter 1: the converter
- Parameter 2: the property key; see the AudioConverterPropertyID enumeration
- Parameter 3: the size of the property value's data type
- Parameter 4: the address of the property value
3.2.4 Encoder type description

The structure describing the encoder type is AudioClassDescription:

The members, interpreted:

- mType: the format of the encoded output, for example AAC
- mSubType: the sub-format of the encoded output
- mManufacturer: the codec implementation, i.e. software encoding or hardware encoding
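The setup code below calls a helper, getAudioCalssDescriptionWithType:fromManufacture:, to look up a matching AudioClassDescription. Only the decoder variant is listed in section 4.2, so here is a sketch of what the encoder variant might look like, under the assumption that it mirrors the decoder version but queries kAudioFormatProperty_Encoders:

```objc
- (AudioClassDescription *)getAudioCalssDescriptionWithType:(AudioFormatID)type fromManufacture:(uint32_t)manufacture {
    static AudioClassDescription desc;
    UInt32 encoderSpecific = type;
    // Total size of the list of available encoders for this format
    UInt32 size;
    OSStatus status = AudioFormatGetPropertyInfo(kAudioFormatProperty_Encoders, sizeof(encoderSpecific), &encoderSpecific, &size);
    if (status != noErr) { return nil; }
    // Number of encoders, then fetch their descriptions
    unsigned int count = size / sizeof(AudioClassDescription);
    AudioClassDescription description[count];
    status = AudioFormatGetProperty(kAudioFormatProperty_Encoders, sizeof(encoderSpecific), &encoderSpecific, &size, &description);
    if (status != noErr) { return nil; }
    // Pick the encoder matching the requested format and manufacturer
    for (unsigned int i = 0; i < count; i++) {
        if (type == description[i].mSubType && manufacture == description[i].mManufacturer) {
            desc = description[i];
            return &desc;
        }
    }
    return nil;
}
```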
3.3 Encoding

Now we arrive at the key encoding flow. Our public encoding method is - (void)encodeAudioSamepleBuffer:(CMSampleBufferRef)sampleBuffer;, which is called from the capture callback of the capture class CCSystemCapture:
```objc
// Capture audio/video callback
- (void)captureSampleBuffer:(CMSampleBufferRef)sampleBuffer type:(CCSystemCaptureType)type {
    if (type == CCSystemCaptureTypeAudio) {
        // Audio data
        // 1. Play the PCM data directly
        NSData *pcmData = [self convertAudioSamepleBufferToPcmData:sampleBuffer];
        [_pcmPlayer playPCMData:pcmData];
        // 2. AAC encoding
        [_audioEncoder encodeAudioSamepleBuffer:sampleBuffer];
    } else {
        [_videoEncoder encodeVideoSampleBuffer:sampleBuffer];
    }
}
```
There are two ways to handle the captured audio:

- Play the PCM data directly
- Encode it to AAC

Playing PCM directly is covered later; let's look at AAC encoding first:
```objc
- (void)encodeAudioSamepleBuffer:(CMSampleBufferRef)sampleBuffer {
    CFRetain(sampleBuffer);
    // Check whether the audio converter has been created. If not,
    // configure the audio encoding parameters and create the converter first
    if (!_audioConverter) {
        [self setupEncoderWithSampleBuffer:sampleBuffer];
    }
    // Encode on the asynchronous queue (body shown below)
    dispatch_async(_encoderQueue, ^{
        // ...
    });
}
```
First, look at configuring the audio encoding parameters and creating the converter:

```objc
- (void)setupEncoderWithSampleBuffer:(CMSampleBufferRef)sampleBuffer {
    // Obtain the input parameters
    AudioStreamBasicDescription inputAduioDes = *CMAudioFormatDescriptionGetStreamBasicDescription(CMSampleBufferGetFormatDescription(sampleBuffer));

    // Set the output parameters
    AudioStreamBasicDescription outputAudioDes = {0};
    outputAudioDes.mSampleRate       = (Float64)_config.sampleRate;    // Output sample rate
    outputAudioDes.mFormatID         = kAudioFormatMPEG4AAC;           // Output format
    outputAudioDes.mFormatFlags      = kMPEG4Object_AAC_LC;            // 0 means lossless encoding
    outputAudioDes.mBytesPerPacket   = 0;                              // Packet size (0: let the codec decide)
    outputAudioDes.mFramesPerPacket  = 1024;                           // Frames per packet; AAC uses 1024
    outputAudioDes.mBytesPerFrame    = 0;                              // Bytes per frame
    outputAudioDes.mChannelsPerFrame = (uint32_t)_config.channelCount; // Output channel count
    outputAudioDes.mBitsPerChannel   = 0;                              // Bits sampled per channel in a frame
    outputAudioDes.mReserved         = 0;                              // 8-byte alignment padding

    // Fill in the output description
    UInt32 outDesSize = sizeof(outputAudioDes);
    AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &outDesSize, &outputAudioDes);

    // Get the encoder description
    AudioClassDescription *audioClassDesc = [self getAudioCalssDescriptionWithType:outputAudioDes.mFormatID fromManufacture:kAppleSoftwareAudioCodecManufacturer];

    // Create the converter
    OSStatus status = AudioConverterNewSpecific(&inputAduioDes, &outputAudioDes, 1, audioClassDesc, &_audioConverter);
    if (status != noErr) {
        NSLog(@"Error!: hardware AAC encoder creation failed, status = %d", (int)status);
        return;
    }

    /* kAudioConverterQuality_Max    = 0x7F,
       kAudioConverterQuality_High   = 0x60,
       kAudioConverterQuality_Medium = 0x40,
       kAudioConverterQuality_Low    = 0x20,
       kAudioConverterQuality_Min    = 0 */
    // Codec rendering quality
    UInt32 temp = kAudioConverterQuality_High;
    AudioConverterSetProperty(_audioConverter, kAudioConverterCodecQuality, sizeof(temp), &temp);

    // Set the bit rate
    uint32_t audioBitrate = (uint32_t)self.config.bitrate;
    uint32_t audioBitrateSize = sizeof(audioBitrate);
    status = AudioConverterSetProperty(_audioConverter, kAudioConverterEncodeBitRate, audioBitrateSize, &audioBitrate);
    if (status != noErr) {
        NSLog(@"Error!: hardware AAC encoder failed to set bit rate");
    }
}
```
Then we come to the encoding work on the asynchronous queue. The steps are:

- Get the PCM data stored in the BlockBuffer

```objc
// Get the BlockBuffer holding the audio data, plus the address of that data
CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
CFRetain(blockBuffer);
OSStatus status = CMBlockBufferGetDataPointer(blockBuffer, 0, NULL, &_pcmBufferSize, &_pcmBuffer);
NSError *error = nil;
if (status != kCMBlockBufferNoErr) {
    error = [NSError errorWithDomain:NSOSStatusErrorDomain code:status userInfo:nil];
    NSLog(@"Error: AAC encode get data pointer error: %@", error);
    return;
}
```
- Wrap an output buffer into an AudioBufferList

First, take a look at AudioBufferList:

Clearly, we need to set up mBuffers[0], the AudioBuffer member of the AudioBufferList, as the output buffer. The code:

```objc
uint8_t *pcmBuffer = malloc(_pcmBufferSize);
// Zero out pcmBuffer
memset(pcmBuffer, 0, _pcmBufferSize);

AudioBufferList outAudioBufferList = {0};
outAudioBufferList.mNumberBuffers = 1;
outAudioBufferList.mBuffers[0].mNumberChannels = (uint32_t)_config.channelCount;
outAudioBufferList.mBuffers[0].mDataByteSize = (UInt32)_pcmBufferSize;
outAudioBufferList.mBuffers[0].mData = pcmBuffer;
```
- Configure the fill function and get the output data

This fill function is the callback that supplies input data whenever the converter needs it, and it is configured via AudioConverterFillComplexBuffer. Let's look at AudioConverterFillComplexBuffer:
It converts data supplied by an input callback function. The parameters, interpreted:

- Parameter 1: inAudioConverter, the audio converter
- Parameter 2: inInputDataProc, the callback function that provides the audio data to convert. It is called repeatedly whenever the converter is ready to accept new input data.
- Parameter 3: inInputDataProcUserData, i.e. self
- Parameter 4: ioOutputDataPacketSize, the output buffer size, expressed in packets
- Parameter 5: outOutputData, the converted output data
- Parameter 6: outPacketDescription, the output packet descriptions
The code:

```objc
// The output packet size is 1
UInt32 outputDataPacketSize = 1;
// Convert the data supplied by the input callback
status = AudioConverterFillComplexBuffer(_audioConverter, aacEncodeInputDataProc, (__bridge void * _Nullable)(self), &outputDataPacketSize, &outAudioBufferList, NULL);
if (status == noErr) {
    // Get the encoded AAC data
    NSData *rawAAC = [NSData dataWithBytes:outAudioBufferList.mBuffers[0].mData length:outAudioBufferList.mBuffers[0].mDataByteSize];
    // Free pcmBuffer
    free(pcmBuffer);
    // Add the ADTS header. If you want a raw stream, skip this;
    // to write to a file, the ADTS header must be added.
    // NSData *adtsHeader = [self adtsDataForPacketLength:rawAAC.length];
    // NSMutableData *fullData = [NSMutableData dataWithCapacity:adtsHeader.length + rawAAC.length];
    // [fullData appendData:adtsHeader];
    // [fullData appendData:rawAAC];
    // Pass the data to the callback queue
    dispatch_async(_callbackQueue, ^{
        [_delegate audioEncodeCallback:rawAAC];
    });
} else {
    error = [NSError errorWithDomain:NSOSStatusErrorDomain code:status userInfo:nil];
}
// Release
CFRelease(blockBuffer);
CFRelease(sampleBuffer);
if (error) {
    NSLog(@"Error: AAC encode failed %@", error);
}
```
The encoding input callback function is aacEncodeInputDataProc; see 3.4 Encoding callback for details.

Moving on, note that after a successful conversion there are two ways to handle the result:

- Write it to a disk file, which requires adding an ADTS header first
- Pass the data to the callback queue, for the caller to feed into the decoding process
AAC audio format
Speaking of the ADTS header, we must first explain the two AAC audio formats:

- ADIF: Audio Data Interchange Format. Its defining feature is that the start of the audio data can be located unambiguously; decoding cannot begin in the middle of the stream but must start at its clearly defined beginning. This format is therefore commonly used for disk files.

- ADTS: Audio Data Transport Stream. Its defining feature is a bit stream with sync words, so decoding can start anywhere within the stream. Its characteristics resemble the MP3 data stream format.

Simply put, ADTS can be decoded starting at any frame, which means every frame carries its own header. ADIF has only one unified header, so decoding requires all the data up front. The two header formats also differ. The audio streams we encode and extract here are all in ADTS format:
Adding the ADTS header

Here is the process of adding the ADTS header (just for understanding):

```objc
/**
 * Add an ADTS header at the beginning of each and every AAC packet.
 * This is needed as the encoder generates packets of raw AAC data.
 * Note the packetLen must count in the ADTS header itself.
 * See: http://wiki.multimedia.cx/index.php?title=ADTS
 * Also: http://wiki.multimedia.cx/index.php?title=MPEG-4_Audio#Channel_Configurations
 **/
- (NSData *)adtsDataForPacketLength:(NSUInteger)packetLength {
    int adtsLength = 7;
    char *packet = malloc(sizeof(char) * adtsLength);
    // Variables recycled by addADTStoPacket
    int profile = 2; // AAC LC
    // 39 = MediaCodecInfo.CodecProfileLevel.AACObjectELD
    int freqIdx = 4; // 44.1 kHz
    int chanCfg = 1; // MPEG-4 Audio Channel Configuration: 1 = channel front-center
    NSUInteger fullLength = adtsLength + packetLength;
    // Fill in the ADTS data
    packet[0] = (char)0xFF; // 11111111    = syncword
    packet[1] = (char)0xF9; // 1111 1 00 1 = syncword, MPEG-2 layer, CRC
    packet[2] = (char)(((profile - 1) << 6) + (freqIdx << 2) + (chanCfg >> 2));
    packet[3] = (char)(((chanCfg & 3) << 6) + (fullLength >> 11));
    packet[4] = (char)((fullLength & 0x7FF) >> 3);
    packet[5] = (char)(((fullLength & 7) << 5) + 0x1F);
    packet[6] = (char)0xFC;
    NSData *data = [NSData dataWithBytesNoCopy:packet length:adtsLength freeWhenDone:YES];
    return data;
}
```
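To see that the bit packing does what it claims, here is a small sketch of the inverse operation: reading the fields back out of a 7-byte ADTS header produced by the method above (dumpADTSHeader: is a hypothetical helper, for illustration only):

```objc
- (void)dumpADTSHeader:(const uint8_t *)p {
    int profile  = ((p[2] >> 6) & 0x3) + 1;                                  // 2 = AAC LC
    int freqIdx  = (p[2] >> 2) & 0xF;                                        // 4 = 44.1 kHz
    int chanCfg  = ((p[2] & 0x1) << 2) | ((p[3] >> 6) & 0x3);                // 1 = front-center
    int frameLen = ((p[3] & 0x3) << 11) | (p[4] << 3) | ((p[5] >> 5) & 0x7); // header + payload, in bytes
    NSLog(@"profile=%d freqIdx=%d chanCfg=%d frameLength=%d", profile, freqIdx, chanCfg, frameLen);
}
```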
3.4 Encoding callback
Finally, let’s look at the process of coding the callback function aacEncodeInputDataProc ππ»
static OSStatus aacEncodeInputDataProc(AudioConverterRef inAudioConverter, UInt32 *ioNumberDataPackets, AudioBufferList *ioData, AudioStreamPacketDescription **outDataPacketDescription, Void *inUserData) {// Get self CCAudioEncoder *aacEncoder = (__bridge CCAudioEncoder *)(inUserData); PcmBuffsize if (! aacEncoder.pcmBufferSize) { *ioNumberDataPackets = 0; return - 1; } // Make ioData->mBuffers[0]. MData = aacEncoder. ioData->mBuffers[0].mDataByteSize = (uint32_t)aacEncoder.pcmBufferSize; ioData->mBuffers[0].mNumberChannels = (uint32_t)aacEncoder.config.channelCount; AacEncoder. PcmBufferSize = 0; *ioNumberDataPackets = 1; return noErr; }Copy the code
In essence, the PCM data waiting to be encoded (cached in the CCAudioEncoder instance) is filled into the AudioBufferList.
3.5 Summary

As shown in the figure above, encoding takes three steps in total:

- Configure the encoder and prepare to encode;
- Collect the PCM data and pass it to the encoder;
- When encoding completes, either hand the result to the callback or write it to a file.
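To tie the three steps together, here is a minimal usage sketch. The view controller and its wiring are assumptions for illustration; only CCAudioEncoder, CCAudioConfig, and the delegate method come from the code above:

```objc
@interface ViewController () <CCAudioEncoderDelegate>
@property (nonatomic, strong) CCAudioEncoder *audioEncoder;
@end

@implementation ViewController

- (void)setupAudioEncoder {
    // Step 1: configure the encoder
    self.audioEncoder = [[CCAudioEncoder alloc] initWithConfig:[CCAudioConfig defaultConifg]];
    self.audioEncoder.delegate = self;
}

// Step 2 happens in the capture callback:
//   [_audioEncoder encodeAudioSamepleBuffer:sampleBuffer];

// Step 3: called on the callback queue with one encoded AAC packet (no ADTS header)
- (void)audioEncodeCallback:(NSData *)aacData {
    // e.g. hand the packet to a muxer or a network sender here
    NSLog(@"encoded %lu bytes of AAC", (unsigned long)aacData.length);
}

@end
```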
4. Audio AAC decoding
Having analyzed the encoding process, the natural next step is decoding. We again package a utility class, CCAudioDecoder.
4.1 Decoding tool class header file
Decoding mirrors encoding, so let's go straight to the code.

```objc
#import <Foundation/Foundation.h>
#import <AVFoundation/AVFoundation.h>
@class CCAudioConfig;

/** AAC decoder delegate */
@protocol CCAudioDecoderDelegate <NSObject>
- (void)audioDecodeCallback:(NSData *)pcmData;
@end

@interface CCAudioDecoder : NSObject
@property (nonatomic, strong) CCAudioConfig *config;
@property (nonatomic, weak) id<CCAudioDecoderDelegate> delegate;

// Initialize, passing in the decoder configuration
- (instancetype)initWithConfig:(CCAudioConfig *)config;

/** Decode AAC data */
- (void)decodeAudioAACData:(NSData *)aacData;
@end
```
The class extension in the .m file:

```objc
@interface CCAudioDecoder ()
@property (nonatomic, strong) dispatch_queue_t decoderQueue;
@property (nonatomic, strong) dispatch_queue_t callbackQueue;
// Audio converter object
@property (nonatomic) AudioConverterRef audioConverter;
// AAC buffer
@property (nonatomic) char *aacBuffer;
// AAC buffer size
@property (nonatomic) UInt32 aacBufferSize;
// Description of the packets in the audio stream
@property (nonatomic) AudioStreamPacketDescription *packetDesc;
@end
```
4.2 Initialization

Next, the initialization:
```objc
- (instancetype)initWithConfig:(CCAudioConfig *)config {
    self = [super init];
    if (self) {
        _decoderQueue = dispatch_queue_create("aac hard decoder queue", DISPATCH_QUEUE_SERIAL);
        _callbackQueue = dispatch_queue_create("aac hard decoder callback queue", DISPATCH_QUEUE_SERIAL);
        _audioConverter = NULL;
        _aacBufferSize = 0;
        _aacBuffer = NULL;
        _config = config;
        if (_config == nil) {
            _config = [[CCAudioConfig alloc] init];
        }
        // _packetDesc must point at storage that outlives this method,
        // so allocate it on the heap rather than pointing at a stack local
        _packetDesc = malloc(sizeof(AudioStreamPacketDescription));
        memset(_packetDesc, 0, sizeof(AudioStreamPacketDescription));
        [self setupEncoder];
    }
    return self;
}
```
And then setupEncoder, which configures the decoder's converter:

```objc
- (void)setupEncoder {
    // Output parameters: PCM
    AudioStreamBasicDescription outputAudioDes = {0};
    outputAudioDes.mSampleRate = (Float64)_config.sampleRate;        // Sample rate
    outputAudioDes.mChannelsPerFrame = (UInt32)_config.channelCount; // Output channel count
    outputAudioDes.mFormatID = kAudioFormatLinearPCM;                // Output format
    outputAudioDes.mFormatFlags = (kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked); // Encoding flags
    outputAudioDes.mFramesPerPacket = 1;                             // Frames per packet
    outputAudioDes.mBitsPerChannel = 16;                             // Bits sampled per channel in a frame
    outputAudioDes.mBytesPerFrame = outputAudioDes.mBitsPerChannel / 8 * outputAudioDes.mChannelsPerFrame; // Bytes per frame = bits per sample / 8 * channel count
    outputAudioDes.mBytesPerPacket = outputAudioDes.mBytesPerFrame * outputAudioDes.mFramesPerPacket;      // Bytes per packet = bytes per frame * frames per packet
    outputAudioDes.mReserved = 0;                                    // 8-byte alignment padding

    // Input parameters: AAC
    AudioStreamBasicDescription inputAduioDes = {0};
    inputAduioDes.mSampleRate = (Float64)_config.sampleRate;
    inputAduioDes.mFormatID = kAudioFormatMPEG4AAC;
    inputAduioDes.mFormatFlags = kMPEG4Object_AAC_LC;
    inputAduioDes.mFramesPerPacket = 1024;
    inputAduioDes.mChannelsPerFrame = (UInt32)_config.channelCount;

    // Fill in the input description
    UInt32 inDesSize = sizeof(inputAduioDes);
    AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &inDesSize, &inputAduioDes);

    // Get the decoder description; the lookup is by the AAC (input) format ID
    AudioClassDescription *audioClassDesc = [self getAudioCalssDescriptionWithType:inputAduioDes.mFormatID fromManufacture:kAppleSoftwareAudioCodecManufacturer];

    // Create the converter
    OSStatus status = AudioConverterNewSpecific(&inputAduioDes, &outputAudioDes, 1, audioClassDesc, &_audioConverter);
    if (status != noErr) {
        NSLog(@"Error!: hardware AAC decoder creation failed, status = %d", (int)status);
        return;
    }
}
```
This is basically the same flow as the encoder's setupEncoderWithSampleBuffer:. The difference lies in the decoder-lookup method getAudioCalssDescriptionWithType:fromManufacture::
```objc
- (AudioClassDescription *)getAudioCalssDescriptionWithType:(AudioFormatID)type fromManufacture:(uint32_t)manufacture {
    static AudioClassDescription desc;
    UInt32 decoderSpecific = type;
    // Get the total size of the list of available AAC decoders
    UInt32 size;
    OSStatus status = AudioFormatGetPropertyInfo(kAudioFormatProperty_Decoders, sizeof(decoderSpecific), &decoderSpecific, &size);
    if (status != noErr) {
        NSLog(@"Error!: hardware AAC decoder get info failed, status = %d", (int)status);
        return nil;
    }
    // Compute the number of decoders
    unsigned int count = size / sizeof(AudioClassDescription);
    // Create an array holding `count` decoder descriptions
    AudioClassDescription description[count];
    // Fill the array; note this queries kAudioFormatProperty_Decoders
    // (the encoder-side helper queries kAudioFormatProperty_Encoders)
    status = AudioFormatGetProperty(kAudioFormatProperty_Decoders, sizeof(decoderSpecific), &decoderSpecific, &size, &description);
    if (status != noErr) {
        NSLog(@"Error!: hardware AAC decoder get property failed, status = %d", (int)status);
        return nil;
    }
    // Pick the decoder matching the requested format and manufacturer
    for (unsigned int i = 0; i < count; i++) {
        if (type == description[i].mSubType && manufacture == description[i].mManufacturer) {
            desc = description[i];
            return &desc;
        }
    }
    return nil;
}
```
There are a few differences:

- The output parameters differ: encoding outputs AAC, while decoding outputs PCM.
- The lookup differs: encoding queries kAudioFormatProperty_Encoders, while decoding queries kAudioFormatProperty_Decoders.
4.3 Preparation before decoding
We encapsulate a struct, CCAudioUserData, which records the AAC information and is passed as the user-data argument to the decode callback function:
```objc
typedef struct {
    char *data;
    UInt32 size;
    UInt32 channelCount;
    AudioStreamPacketDescription packetDesc;
} CCAudioUserData;
```
4.4 Decoding

Then comes the decoding flow itself:

```objc
- (void)decodeAudioAACData:(NSData *)aacData {
    if (!_audioConverter) { return; }
    dispatch_async(_decoderQueue, ^{
        // CCAudioUserData records the AAC information passed to the decode callback
        CCAudioUserData userData = {0};
        userData.channelCount = (UInt32)_config.channelCount;
        userData.data = (char *)[aacData bytes];
        userData.size = (UInt32)aacData.length;
        userData.packetDesc.mDataByteSize = (UInt32)aacData.length;
        userData.packetDesc.mStartOffset = 0;
        userData.packetDesc.mVariableFramesInPacket = 0;

        // Output size and packet count
        UInt32 pcmBufferSize = (UInt32)(2048 * _config.channelCount);
        UInt32 pcmDataPacketSize = 1024;

        // Create a temporary PCM container
        uint8_t *pcmBuffer = malloc(pcmBufferSize);
        memset(pcmBuffer, 0, pcmBufferSize);

        // Output buffer list
        AudioBufferList outAudioBufferList = {0};
        outAudioBufferList.mNumberBuffers = 1;
        outAudioBufferList.mBuffers[0].mNumberChannels = (uint32_t)_config.channelCount;
        outAudioBufferList.mBuffers[0].mDataByteSize = (UInt32)pcmBufferSize;
        outAudioBufferList.mBuffers[0].mData = pcmBuffer;

        // Output packet description
        AudioStreamPacketDescription outputPacketDesc = {0};

        // Configure the fill function to get the output data
        OSStatus status = AudioConverterFillComplexBuffer(_audioConverter, &AudioDecoderConverterComplexInputDataProc, &userData, &pcmDataPacketSize, &outAudioBufferList, &outputPacketDesc);
        if (status != noErr) {
            NSLog(@"Error: AAC decoder error, status = %d", (int)status);
            return;
        }

        // If data was produced, hand it to the delegate on the callback queue
        if (outAudioBufferList.mBuffers[0].mDataByteSize > 0) {
            NSData *rawData = [NSData dataWithBytes:outAudioBufferList.mBuffers[0].mData length:outAudioBufferList.mBuffers[0].mDataByteSize];
            dispatch_async(_callbackQueue, ^{
                [_delegate audioDecodeCallback:rawData];
            });
        }
        free(pcmBuffer);
    });
}
```
The decode input callback function is AudioDecoderConverterComplexInputDataProc.
4.5 Decoding Callback
```objc
static OSStatus AudioDecoderConverterComplexInputDataProc(AudioConverterRef inAudioConverter,
                                                          UInt32 *ioNumberDataPackets,
                                                          AudioBufferList *ioData,
                                                          AudioStreamPacketDescription **outDataPacketDescription,
                                                          void *inUserData) {
    CCAudioUserData *audioDecoder = (CCAudioUserData *)(inUserData);
    if (audioDecoder->size <= 0) {
        *ioNumberDataPackets = 0;
        return -1;
    }

    // Fill in the packet description
    *outDataPacketDescription = &audioDecoder->packetDesc;
    (*outDataPacketDescription)[0].mStartOffset = 0;
    (*outDataPacketDescription)[0].mDataByteSize = audioDecoder->size;
    (*outDataPacketDescription)[0].mVariableFramesInPacket = 0;

    // Fill in the input data
    ioData->mBuffers[0].mData = audioDecoder->data;
    ioData->mBuffers[0].mDataByteSize = audioDecoder->size;
    ioData->mBuffers[0].mNumberChannels = audioDecoder->channelCount;

    // One AAC packet is supplied per call
    *ioNumberDataPackets = 1;
    return noErr;
}
```
As you’ll see in decoding, there’s no need to bridge the self object because we’re caching the data in our custom CCAudioUserData structure. This is also different from coding.
At this point, the decoding tool class is wrapped! πΊ πΊ πΊ πΊ πΊ πΊ
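As a usage sketch (the wiring below is hypothetical): AAC packets arriving from the encoder callback can be fed straight back through the decoder, and the decoded PCM handed to the player covered in the next section:

```objc
// Round trip: the encoder delegate feeds the decoder, the decoder delegate feeds the player
- (void)audioEncodeCallback:(NSData *)aacData {
    [self.audioDecoder decodeAudioAACData:aacData]; // self.audioDecoder: a CCAudioDecoder
}

- (void)audioDecodeCallback:(NSData *)pcmData {
    [self.pcmPlayer playPCMData:pcmData];           // self.pcmPlayer: a CCAudioPCMPlayer (section 5)
}
```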
5. Audio PCM playback

Finally, one more piece: audio PCM playback. In some scenarios the captured audio may be played directly, with no encoding or decoding.
We can define a method for this in the encoding utility class:
```objc
- (NSData *)convertAudioSamepleBufferToPcmData:(CMSampleBufferRef)sampleBuffer {
    // Get the size of the PCM data
    size_t size = CMSampleBufferGetTotalSampleSize(sampleBuffer);
    // Allocate space
    int8_t *audio_data = (int8_t *)malloc(size);
    memset(audio_data, 0, size);
    // Get the CMBlockBuffer, which holds the PCM data
    CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    // Copy the data into the space we allocated
    CMBlockBufferCopyDataBytes(blockBuffer, 0, size, audio_data);
    NSData *data = [NSData dataWithBytes:audio_data length:size];
    free(audio_data);
    return data;
}
```
It extracts the PCM data from the sampleBuffer; the data is then passed back to the ViewController and played directly.

Playback requires a player class, CCAudioPCMPlayer:
```objc
@class CCAudioConfig;

@interface CCAudioPCMPlayer : NSObject
- (instancetype)initWithConfig:(CCAudioConfig *)config;
/** Play PCM */
- (void)playPCMData:(NSData *)data;
/** Set the volume gain, 0.0 - 1.0 */
- (void)setupVoice:(Float32)gain;
/** Dispose */
- (void)dispose;
@end
```
The implementation:

```objc
#import "CCAudioPCMPlayer.h"
#import <AudioToolbox/AudioToolbox.h>
#import <AVFoundation/AVFoundation.h>
#import "CCAVConfig.h"
#import "CCAudioDataQueue.h"

#define MIN_SIZE_PER_FRAME 2048            // Minimum data length per frame
static const int kNumberBuffers_play = 3;  // 1

typedef struct AQPlayerState {
    AudioStreamBasicDescription mDataFormat;           // 2
    AudioQueueRef mQueue;                              // 3
    AudioQueueBufferRef mBuffers[kNumberBuffers_play]; // 4
    AudioStreamPacketDescription *mPacketDescs;        // 9
} AQPlayerState;

@interface CCAudioPCMPlayer ()
@property (nonatomic, assign) AQPlayerState aqps;
@property (nonatomic, strong) CCAudioConfig *config;
@property (nonatomic, assign) BOOL isPlaying;
@end

@implementation CCAudioPCMPlayer

static void TMAudioQueueOutputCallback(void *inUserData, AudioQueueRef inAQ, AudioQueueBufferRef inBuffer) {
    AudioQueueFreeBuffer(inAQ, inBuffer);
}

- (instancetype)initWithConfig:(CCAudioConfig *)config {
    self = [super init];
    if (self) {
        _config = config;
        // Configuration
        AudioStreamBasicDescription dataFormat = {0};
        dataFormat.mSampleRate = (Float64)_config.sampleRate;        // Sample rate
        dataFormat.mChannelsPerFrame = (UInt32)_config.channelCount; // Channel count
        dataFormat.mFormatID = kAudioFormatLinearPCM;                // Output format
        dataFormat.mFormatFlags = (kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked); // Encoding flags
        dataFormat.mFramesPerPacket = 1;                             // Frames per packet
        dataFormat.mBitsPerChannel = 16;                             // Bits sampled per channel in a frame
        dataFormat.mBytesPerFrame = dataFormat.mBitsPerChannel / 8 * dataFormat.mChannelsPerFrame; // Bytes per frame = bits per sample / 8 * channel count
        dataFormat.mBytesPerPacket = dataFormat.mBytesPerFrame * dataFormat.mFramesPerPacket;      // Bytes per packet = bytes per frame * frames per packet
        dataFormat.mReserved = 0;

        AQPlayerState state = {0};
        state.mDataFormat = dataFormat;
        _aqps = state;

        [self setupSession];

        // Create the playback queue
        OSStatus status = AudioQueueNewOutput(&_aqps.mDataFormat, TMAudioQueueOutputCallback, NULL, NULL, NULL, 0, &_aqps.mQueue);
        if (status != noErr) {
            NSError *error = [[NSError alloc] initWithDomain:NSOSStatusErrorDomain code:status userInfo:nil];
            NSLog(@"Error: AudioQueue create error = %@", [error description]);
            return self;
        }
        [self setupVoice:1];
        _isPlaying = false;
    }
    return self;
}

- (void)setupSession {
    NSError *error = nil;
    // Activate the session. Note that activating an audio session is a synchronous (blocking) operation
    [[AVAudioSession sharedInstance] setActive:YES error:&error];
    if (error) {
        NSLog(@"Error: audioQueue player AVAudioSession error: %@", error);
    }
    // Set the session category
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord error:&error];
    if (error) {
        NSLog(@"Error: audioQueue player AVAudioSession error: %@", error);
    }
}

- (void)playPCMData:(NSData *)data {
    // Points to an audio queue buffer
    AudioQueueBufferRef inBuffer;
    /* Ask the audio queue object to allocate an audio queue buffer.
       Parameter 1: the audio queue to allocate the buffer for
       Parameter 2: the capacity required for the new buffer (in bytes)
       Parameter 3: output, a pointer to the newly allocated audio queue buffer */
    AudioQueueAllocateBuffer(_aqps.mQueue, MIN_SIZE_PER_FRAME, &inBuffer);
    // Copy the data into the buffer and set its size
    memcpy(inBuffer->mAudioData, data.bytes, data.length);
    inBuffer->mAudioDataByteSize = (UInt32)data.length;

    /* Add the buffer to the buffer queue of a recording or playback audio queue.
       Parameter 1: the audio queue that owns the buffer
       Parameter 2: the audio queue buffer to add
       Parameter 3: the number of audio packets in the buffer. Use 0 in any of these cases:
         * when playing a constant bit rate (CBR) format;
         * when the audio queue is a recording (input) queue;
         * when the buffer was allocated with AudioQueueAllocateBufferWithPacketDescriptions;
           in that case the callback should describe the buffer's packets in the buffer's
           mPacketDescriptions and mPacketDescriptionCount fields.
       Parameter 4: an array of packet descriptions; use NULL in the same cases as above */
    OSStatus status = AudioQueueEnqueueBuffer(_aqps.mQueue, inBuffer, 0, NULL);
    if (status != noErr) {
        NSLog(@"Error: audio queue player enqueue error: %d", (int)status);
    }
    /* Start playing or recording audio.
       Parameter 1: the audio queue to start
       Parameter 2: the time the audio queue should start. To specify a start time relative
       to the timeline of the associated audio device, use the mSampleTime field of an
       AudioTimeStamp structure. Use NULL to start as soon as possible */
    AudioQueueStart(_aqps.mQueue, NULL);
}

// This function is not required here
//- (void)pause {
//    AudioQueuePause(_aqps.mQueue);
//}

// Set the volume gain, 0.0 - 1.0
- (void)setupVoice:(Float32)gain {
    Float32 gain0 = gain;
    if (gain < 0) {
        gain0 = 0;
    } else if (gain > 1) {
        gain0 = 1;
    }
    /* Set an audio queue parameter value.
       Parameter 1: the audio queue
       Parameter 2: the parameter key
       Parameter 3: the value */
    AudioQueueSetParameter(_aqps.mQueue, kAudioQueueParam_Volume, gain0);
}

// Dispose of the player
- (void)dispose {
    AudioQueueStop(_aqps.mQueue, true);
    AudioQueueDispose(_aqps.mQueue, true);
}

@end
```
There are detailed comments in the code, which will not be repeated here.
Conclusion

- Audio fundamentals
  - Three elements of sound: pitch, volume, and timbre
  - Range of human hearing: 20 Hz - 20 kHz; below 20 Hz is infrasound, above 20 kHz is ultrasound
  - Pulse code modulation (PCM) has three stages: sampling, quantization, and encoding
  - Audio compression coding principles
    - Transmission rate of an audio signal = sampling frequency * quantization bits per sample * channel count
    - Lossy coding: eliminates redundant data (the parts the human ear cannot hear)
    - Lossless coding (e.g. Huffman coding): compresses the inaudible parts and keeps everything else intact
    - Compression method: remove redundant information from the captured audio
      - Masking effect: one sound covers another
    - Audio compression encoding formats: MPEG-1, Dolby AC-3, MPEG-2, MPEG-4, and AAC
    - Standard parameters for audio
      - Sampling frequency = 44.1 kHz
      - Quantization bits per sample = 16
      - Number of channels in normal stereo = 2
      - Digital signal transmission bit stream ≈ 1.4 Mbit/s
      - Data per second = 1.4 Mbit / 8 = 176.4 KB, equal to the data of 88,200 Chinese characters
- Frequency domain masking and time domain masking
  - Frequency domain masking
    - A weaker sound is covered by a stronger sound nearby
    - Sounds far apart in the frequency domain barely affect each other; both remain audible
  - Time domain masking
    - Sounding at the same moment, a treble completely drowns out a bass
    - Special case (pre-masking): a quiet sound starts, and within a short time (about 50 ms) a loud sound follows, masking the quiet one
    - Sound propagation takes time, so masking has a time extent: about 50 ms forward and about 100 ms backward
- Audio AAC encoding utility class
  - Two queues: encoding queue + callback queue
  - Preparation before encoding
    - The audio parameter structure AudioStreamBasicDescription
    - Create the converter: AudioConverterNewSpecific
    - Set converter properties: AudioConverterSetProperty
    - The encoder type description structure AudioClassDescription
  - AAC encoding process
    - Check whether the audio converter has been created
      - Created: proceed directly
      - Not created: configure the audio encoding parameters and create the converter
        - Get the input parameters (AudioStreamBasicDescription)
        - Set the output parameters (AAC-format AudioStreamBasicDescription), filled in via AudioFormatGetProperty
        - Get the encoder description AudioClassDescription
        - Create the audio converter with AudioConverterNewSpecific
        - Configure converter properties with AudioConverterSetProperty: codec quality, bit rate, etc.
    - Do the encoding on the asynchronous queue
      - Get the PCM data stored in the BlockBuffer
      - Wrap the output buffer into an AudioBufferList
      - AudioConverterFillComplexBuffer converts the data supplied by the input callback
      - After a successful conversion, turn the AAC output into NSData; from here there are two options:
        1. Write it to a disk file, which requires adding an ADTS header first
        2. Pass the data to the callback queue, for the caller's decoding process
      - Release blockBuffer and sampleBuffer
  - AAC audio formats
    - ADIF (Audio Data Interchange Format): commonly used for disk files
    - ADTS (Audio Data Transport Stream): used for network transmission; the code adds an ADTS header per frame
  - AAC encoding callback: fills the PCM data to be encoded (cached in the CCAudioEncoder instance) into the AudioBufferList
- Audio AAC decoding utility class
  - Two queues: decoding queue + callback queue
  - Audio converter initialization
    - Output PCM configuration (AudioStreamBasicDescription)
    - Input AAC configuration (AudioStreamBasicDescription), filled in via AudioFormatGetProperty with kAudioFormatProperty_FormatInfo
    - Get the decoder description AudioClassDescription, passing kAppleSoftwareAudioCodecManufacturer
    - Create the converter with AudioConverterNewSpecific
  - Preparation before decoding
    - A custom struct CCAudioUserData containing the data address, the data size, the channel count, and an AudioStreamPacketDescription describing the packets in the data buffer
    - It records the AAC information and serves as the user-data parameter of the decode callback function
  - AAC decoding process
    - If the audio converter has not been created, return directly
    - Decode asynchronously on the decoding queue
      - Configure CCAudioUserData to cache the AAC information
      - Set the output size pcmBufferSize and the packet count pcmDataPacketSize
      - malloc a temporary PCM container and memset it to initialize the space
      - Configure the output buffer AudioBufferList
      - Initialize the output description AudioStreamPacketDescription
      - AudioConverterFillComplexBuffer configures the fill function and gets the output data
      - Convert the output data to NSData and delegate it out asynchronously on the callback queue
  - Decoding callback: fills the cached CCAudioUserData into the callback's AudioStreamPacketDescription and AudioBufferList parameters
- Audio PCM playback: see the code above