While reading the LFLiveKit code, I noticed that the audio part is built on audioUnit, so I spent some time learning audioUnit. To sum up, it covers several parts: playback, recording, audio file writing, and audio file reading.

The code is in the VideoGather repository, which includes audioUnitTest and singASong; the latter combines the various audio processing components into a "play an accompaniment + sing a song" feature.

### Basic understanding

The official document Audio Unit Hosting Fundamentals has a few good figures:

For a general-purpose audioUnit, there can be one or two input and output streams, and inputs and outputs do not have to match; for example, a mixer can combine two audio input streams into a single output stream. Each element represents an audio processing context and is also called a bus. Each element has an input part and an output part, called scopes: the input scope and the output scope. There is also a global scope, which has only one element, element0, and some properties can only be set on the global scope.

For an audioUnit of type remote IO, i.e. one that captures from hardware and outputs to hardware, the layout is fixed: there are exactly two elements; the microphone feeds the APP through element1, and the APP feeds the speaker through element0.

What we can control is the "in-App processing" part in the middle. In the figure above, the light yellow part is what the APP controls. Element1 links the microphone to the APP: its input side is controlled by the system, its output side by the APP. Element0 links the APP to the speaker: its input side is controlled by the APP, its output side by the system.

This figure shows a complete recording + mixing + playback chain. The stream format is set on the two sides of each component; in code, that corresponds to the scope.

### File reading

The demo is in the TFAudioUnitPlayer class; playing with audioUnit requires reading the audio file and outputting its data.

Files are read using ExtAudioFile. As far as I know, two things matter here: the "input + output" mode of operation described next, and the fact that only PCM is processed on the client side.

Not just ExtAudioFile: the other audioUnits are streaming data processors by nature as well. These components all work in an "input + output" mode, and that mode means you have to set the input format, the output format, and so on.

  • ExtAudioFileOpenURL opens the file. The format of the audio in an ExtAudioFile is stored in the file itself, so it does not need to be set, but it can be read out, for example to get the sample rate for later processing.
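
A minimal sketch of opening a file and reading its stored format (fileURL stands for the file's NSURL; error handling omitted, variable names are mine rather than the demo's):

   ExtAudioFileRef audioFile;
   OSStatus status = ExtAudioFileOpenURL((__bridge CFURLRef)fileURL, &audioFile);

   // The format the data is stored in on disk; read-only, no need to set it.
   AudioStreamBasicDescription fileDesc;
   UInt32 size = sizeof(fileDesc);
   status = ExtAudioFileGetProperty(audioFile, kExtAudioFileProperty_FileDataFormat, &size, &fileDesc);
   // fileDesc.mSampleRate etc. can now feed into the client format below.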

  • Set the output format:

   AudioStreamBasicDescription clientDesc;
   clientDesc.mSampleRate = fileDesc.mSampleRate;
   clientDesc.mFormatID = kAudioFormatLinearPCM;
   clientDesc.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
   clientDesc.mReserved = 0;
   clientDesc.mChannelsPerFrame = 1; //2
   clientDesc.mBitsPerChannel = 16;
   clientDesc.mFramesPerPacket = 1;
   clientDesc.mBytesPerFrame = clientDesc.mChannelsPerFrame * clientDesc.mBitsPerChannel / 8;
   clientDesc.mBytesPerPacket = clientDesc.mBytesPerFrame;

PCM is unencoded, uncompressed, and easier to process, so this is the format we output. The format is described with the AudioStreamBasicDescription structure, which brings in some audio fundamentals:

  • mSampleRate: the sample rate, i.e. the number of samples per second.

  • Frame: one frame is the set of samples taken at a single sampling instant (one sample per channel).

  • mChannelsPerFrame: our two ears perceive the same sound source slightly differently, which gives us distance and position cues; multiple channels exist to create that sense of space. Each channel has its own sample data, so one more channel means one more batch of data.

  • Finally, the data format of a single channel's sample is determined by mFormatFlags and mBitsPerChannel. mBitsPerChannel is the size of each sample, i.e. the bit depth: the larger it is, the larger the representable range and the less likely the data is to overflow. mFormatFlags covers signedness, integer or float, big or little endian, and so on. Signed means the values can be positive or negative; sound is a wave, and the vibration swings both ways. The s16 format is used here, i.e. signed 16-bit integers.

  • There are mSampleRate samples per second, and each sampling instant produces one frame; each frame has mChannelsPerFrame channel samples, and each sample occupies mBitsPerChannel bits. So all the other size fields can be derived from these, provided the data is not encoded or compressed.

  • Set the client format:

   size = sizeof(clientDesc);
   status = ExtAudioFileSetProperty(audioFile, kExtAudioFileProperty_ClientDataFormat, size, &clientDesc);

The APP side is the "client", the file side is the "file"; client here means setting the format on the APP side. Testing with an MP3 file shows that the sample rate can be converted: an MP3 file with a sample rate of 11025 can be read out directly as 44100 Hz data.

  • Read the data with ExtAudioFileRead(audioFile, &framesNum, bufferList). framesNum is, on input, the number of frames you want to read and, on output, the number of frames actually read; the data is written into bufferList. The mData of each AudioBuffer in bufferList must have memory allocated for it.
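
A minimal read sketch, assuming the 16-bit mono client format set above (buffer sizes and names are illustrative):

   UInt32 framesNum = 1024;                              // frames we want per read
   UInt32 bytesPerFrame = clientDesc.mBytesPerFrame;     // 2 bytes for 16-bit mono

   AudioBufferList bufferList;
   bufferList.mNumberBuffers = 1;                        // interleaved: one buffer
   bufferList.mBuffers[0].mNumberChannels = clientDesc.mChannelsPerFrame;
   bufferList.mBuffers[0].mDataByteSize = framesNum * bytesPerFrame;
   bufferList.mBuffers[0].mData = malloc(framesNum * bytesPerFrame);

   OSStatus status = ExtAudioFileRead(audioFile, &framesNum, &bufferList);
   // framesNum now holds the number of frames actually read; 0 means end of file.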

### Playback

To play with AudioUnit, three related things come first: AudioComponentDescription, AudioComponent and AudioComponentInstance. AudioUnit and AudioComponentInstance are the same thing; the typedef is just an alias.

AudioComponentDescription is the description, used as the filter for finding components, similar to the conditions after WHERE in a SQL statement.

An AudioComponent is an abstraction of a component, like the concept of a class; AudioComponentFindNext finds a component that matches the description.

An AudioComponentInstance is a concrete component, like the concept of an object, built with AudioComponentInstanceNew.
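
Putting the three together, building a remote IO unit looks roughly like this (a sketch; the demo's own variable names may differ):

   // Describe the component we want: an output unit of the RemoteIO subtype.
   AudioComponentDescription desc;
   desc.componentType = kAudioUnitType_Output;
   desc.componentSubType = kAudioUnitSubType_RemoteIO;
   desc.componentManufacturer = kAudioUnitManufacturer_Apple;
   desc.componentFlags = 0;
   desc.componentFlagsMask = 0;

   // Find the matching component ("class"), then instantiate it ("object").
   AudioComponent component = AudioComponentFindNext(NULL, &desc);
   AudioUnit audioUnit;
   OSStatus status = AudioComponentInstanceNew(component, &audioUnit);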

After building audioUnit, set the properties:

  • kAudioOutputUnitProperty_EnableIO: enables I/O. By default element0, the IO from APP to speaker, is on, while element1, the IO from microphone to APP, is off. It is set with AudioUnitSetProperty, which takes several arguments (a call sketch follows below):
    • 1. The audioUnit to configure
    • 2. The property ID
    • 3. The scope; here we are playing and want to open the output side towards the system, so kAudioUnitScope_Output
    • 4. The element: element0 or element1, depending on whether you want to play or record
    • 5. The value to set
    • 6. The size of the value.

The tricky parts are element and scope; to get them right you need to understand how an audioUnit works, i.e. the first two diagrams.
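
For playback the call might look like this (a sketch; element0's output IO is already on by default, so this mainly illustrates the argument order):

   UInt32 flag = 1;
   AudioUnitSetProperty(audioUnit,
                        kAudioOutputUnitProperty_EnableIO,  // property
                        kAudioUnitScope_Output,             // scope: towards the system
                        0,                                  // element0: APP -> speaker
                        &flag,                              // value: 1 = enabled
                        sizeof(flag));                      // size of the value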

  • AudioUnitSetProperty(audioUnit, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Input, renderAudioElement, &audioDesc, sizeof(audioDesc)); sets the data format on the input scope, using the AudioStreamBasicDescription structure. The output side is controlled by the system, so we don't need to set it.

  • Then set up how the data is provided. It works like this: after the audioUnit is started, the system plays a chunk of audio data from an audioBuffer; when that chunk finishes, it calls back into the APP to ask for the next chunk, and the loop continues until you stop the audioUnit. The key points are:

    • 1. The system takes the initiative to ask you for data; it is not our program pushing data to it
    • 2. It does this through a callback function. Think of the APP as a factory and the system as a store: whenever the store runs out of stock, or is about to, it comes to the factory to restock, until the factory shuts down or stops supplying

So set the playback callback function:

   AURenderCallbackStruct callbackSt;
   callbackSt.inputProcRefCon = (__bridge void * _Nullable)(self);
   callbackSt.inputProc = playAudioBufferCallback;
   AudioUnitSetProperty(audioUnit, kAudioUnitProperty_SetRenderCallback, kAudioUnitScope_Group, renderAudioElement, &callbackSt, sizeof(callbackSt));

The value passed in is an AURenderCallbackStruct: inputProc is the callback function, and inputProcRefCon is passed to the callback's inRefCon parameter when it is invoked. This is a common design for callback APIs; elsewhere it might be called a context. By passing self, I can get the current player object inside the callback and fetch the audio data from it.

The callback function

The main job of the callback function is to fill ioData, an AudioBufferList, with the audio data to be played. Combined with the file reading above, ExtAudioFileRead can be used to read data and thereby play an audio file.
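
A minimal sketch of such a callback; for brevity it assumes inRefCon carries the ExtAudioFileRef directly, whereas the demo passes self and would look the file up on the player object:

   static OSStatus playAudioBufferCallback(void *inRefCon,
                                           AudioUnitRenderActionFlags *ioActionFlags,
                                           const AudioTimeStamp *inTimeStamp,
                                           UInt32 inBusNumber,
                                           UInt32 inNumberFrames,
                                           AudioBufferList *ioData) {
       ExtAudioFileRef file = (ExtAudioFileRef)inRefCon;
       UInt32 framesRead = inNumberFrames;                            // in: frames wanted
       OSStatus status = ExtAudioFileRead(file, &framesRead, ioData); // out: frames read
       if (status != noErr || framesRead == 0) {
           // End of file or error: output silence so playback does not glitch.
           for (UInt32 i = 0; i < ioData->mNumberBuffers; i++) {
               memset(ioData->mBuffers[i].mData, 0, ioData->mBuffers[i].mDataByteSize);
           }
           *ioActionFlags |= kAudioUnitRenderAction_OutputIsSilence;
       }
       return noErr;
   }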

The playback side does not depend on the data source: because a callback function is used, either a file or a remote data stream can be played.

### Recording

The recording class is TFAudioRecorder, and the file writing classes are TFAudioFileWriter and TFAACFileWriter. To combine audio processing components more freely, the TFAudioOutput class and the TFAudioInput protocol are defined: TFAudioOutput defines methods for outputting data, while TFAudioInput receives data.
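
Roughly, the shape is something like this (a sketch of the idea, not the demo's exact declarations):

   // Anything that can receive audio data: a file writer, an encoder, a mixer input, etc.
   @protocol TFAudioInput <NSObject>
   - (void)setAudioDesc:(AudioStreamBasicDescription)audioDesc;    // format of the incoming data
   - (void)receiveNewAudioBuffers:(AudioBufferList *)bufferList;   // the data itself
   @end

   // Anything that produces audio data and forwards it to a TFAudioInput target.
   @interface TFAudioOutput : NSObject
   @property (nonatomic, strong) id<TFAudioInput> target;          // next component in the chain
   @end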

Four kinds of tests are set up in the setupRecorder method of the TFAudioUnitRecordViewController class:

  • The PCM stream is written directly to a CAF file
  • PCM is written through ExtAudioFile, which converts it internally to AAC and writes an M4A file
  • PCM is converted to an AAC stream and written to an ADTS file
  • Compare the performance of methods 2 and 3
1. Use audioUnit to obtain recording data

As with playback, build an AudioComponentDescription, use AudioComponentFindNext to find the audioComponent, and again use AudioComponentInstanceNew to build the audioUnit.

  • Enable IO:
   UInt32 flag = 1;
   status = AudioUnitSetProperty(audioUnit,
                                 kAudioOutputUnitProperty_EnableIO, // use IO
                                 kAudioUnitScope_Input,             // enable input
                                 kInputBus,                         // element1 is the hardware-to-APP component
                                 &flag,                             // 1 = enabled
                                 sizeof(flag));

Element1 is the element from system hardware input to the APP; passing 1 enables it.

  • Set the output format:
AudioStreamBasicDescription audioFormat;
   audioFormat = [self audioDescForType:encodeType];
   status = AudioUnitSetProperty(audioUnit,
                                 kAudioUnitProperty_StreamFormat,
                                 kAudioUnitScope_Output,
                                 kInputBus,
                                 &audioFormat,
                                 sizeof(audioFormat));

The audioDescForType: method only handles the AAC and PCM formats. For PCM the fields can be calculated by hand or filled in with FillOutASBDForLPCM, a helper provided by the system; the logic is the same as described above, and the relationships between sample rate, channels, sample size and so on are easy to follow.

For AAC, because the data is encoded and compressed, AAC fixes 1024 frames per packet. Many fields are not used, such as mBytesPerFrame, but they must still be set to 0; leaving them undefined can cause problems.

  • Set the input callback function:
AURenderCallbackStruct callbackStruct;
   callbackStruct.inputProc = recordingCallback;
   callbackStruct.inputProcRefCon = (__bridge void * _Nullable)(self);
   status = AudioUnitSetProperty(audioUnit,kAudioOutputUnitProperty_SetInputCallback,
                                 kAudioUnitScope_Global,
                                 kInputBus,
                                 &callbackStruct,
                                 sizeof(callbackStruct));

The property kAudioOutputUnitProperty_SetInputCallback specifies the input callback; kInputBus is 1, i.e. element1.
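
kInputBus is not a system constant; it is just the demo's name for the element number, something along the lines of (kOutputBus added here for symmetry):

   static const AudioUnitElement kInputBus  = 1;   // element1: microphone -> APP
   static const AudioUnitElement kOutputBus = 0;   // element0: APP -> speaker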

  • Set up and activate the AVAudioSession:
   AVAudioSession *session = [AVAudioSession sharedInstance];
   [session setPreferredSampleRate:44100 error:&error];
   [session setCategory:AVAudioSessionCategoryRecord withOptions:AVAudioSessionCategoryOptionDuckOthers
                  error:&error];
   [session setActive:YES error:&error];

Either AVAudioSessionCategoryRecord or AVAudioSessionCategoryPlayAndRecord works; the latter can play while recording, e.g. a karaoke-style APP that plays the accompaniment while recording the voice.

  • Finally, use the callback function to get the audio data

Build an AudioBufferList and then use AudioUnitRender to fetch the data. The memory for the AudioBufferList has to be allocated by us, so the buffer size must be computed from the number of frames and channels passed in.
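
A sketch of what that callback can look like, assuming 16-bit mono PCM (2 bytes per frame) and a hypothetical audioUnit accessor on the recorder:

   static OSStatus recordingCallback(void *inRefCon,
                                     AudioUnitRenderActionFlags *ioActionFlags,
                                     const AudioTimeStamp *inTimeStamp,
                                     UInt32 inBusNumber,
                                     UInt32 inNumberFrames,
                                     AudioBufferList *ioData) {
       TFAudioRecorder *recorder = (__bridge TFAudioRecorder *)inRefCon;

       // One interleaved buffer; 2 bytes per frame for 16-bit mono PCM.
       AudioBufferList bufferList;
       bufferList.mNumberBuffers = 1;
       bufferList.mBuffers[0].mNumberChannels = 1;
       bufferList.mBuffers[0].mDataByteSize = inNumberFrames * 2;
       bufferList.mBuffers[0].mData = malloc(inNumberFrames * 2);

       // Pull the newly captured samples out of element1 (mic -> APP).
       OSStatus status = AudioUnitRender(recorder.audioUnit,   // hypothetical accessor
                                         ioActionFlags,
                                         inTimeStamp,
                                         inBusNumber,
                                         inNumberFrames,
                                         &bufferList);
       if (status == noErr) {
           // Hand the PCM off to the file writer / converter here.
       }
       free(bufferList.mBuffers[0].mData);
       return status;
   }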

2. Write PCM data into the CAF file

In the TFAudioFileWriter class, ExtAudioFile is used to write the audio data.

  • build
OSStatus status = ExtAudioFileCreateWithURL((__bridge CFURLRef _Nonnull)(recordFilePath),_fileType, &_audioDesc, NULL, kAudioFileFlags_EraseFile, &mAudioFileRef);

The parameters are: the file URL, the file type, the audio format, auxiliary settings (here, erase any existing file), and the ExtAudioFileRef output variable.

_audioDesc is passed in from outside via - (void)setAudioDesc:(AudioStreamBasicDescription)audioDesc; it is the output data format of the recording.

  • write
OSStatus status = ExtAudioFileWrite(mAudioFileRef, _bufferData->inNumberFrames, &_bufferData->bufferList);

After audio data is received, it is written continuously in AudioBufferList form. The middle parameter is the number of frames to write. The frame count, together with the sample rate in audioDesc, determines the calculated duration of the audio; if the frame count is wrong, the duration will be wrong.

3. Use ExtAudioFile's built-in converter to record AAC-encoded audio files

The recording audioUnit outputs PCM data; testing shows it can be fed directly into ExtAudioFile to record an AAC-encoded audio file. Set the format when building the ExtAudioFile:

AudioStreamBasicDescription outputDesc;
            outputDesc.mFormatID = kAudioFormatMPEG4AAC;
            outputDesc.mFormatFlags = kMPEG4Object_AAC_Main;
            outputDesc.mChannelsPerFrame = _audioDesc.mChannelsPerFrame;
            outputDesc.mSampleRate = _audioDesc.mSampleRate;
            outputDesc.mFramesPerPacket = 1024;
            outputDesc.mBytesPerFrame = 0;
            outputDesc.mBytesPerPacket = 0;
            outputDesc.mBitsPerChannel = 0;
            outputDesc.mReserved = 0;


The key fields are mFormatID and mFormatFlags; one pitfall is forgetting to reset the unused fields to 0.

Then create ExtAudioFile: OSStatus status = ExtAudioFileCreateWithURL((__bridge CFURLRef _Nonnull)(recordFilePath),_fileType, &outputDesc, NULL, kAudioFileFlags_EraseFile, &mAudioFileRef);

Set the client (input) data format: ExtAudioFileSetProperty(mAudioFileRef, kExtAudioFileProperty_ClientDataFormat, sizeof(_audioDesc), &_audioDesc);

Writing is still a loop of ExtAudioFileWrite calls, just as with PCM, but ExtAudioFileDispose must be called to mark the end of writing, which is probably related to the file format.
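
Something like this when recording stops (a sketch using the same mAudioFileRef as above):

   if (mAudioFileRef) {
       ExtAudioFileDispose(mAudioFileRef);   // flushes and finalizes the file
       mAudioFileRef = NULL;
   }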

4. Encoding PCM to AAC

Here the AudioConverter is used; the demo code is in the TFAudioConvertor class.

  • build

OSStatus status = AudioConverterNew(&sourceDesc, &_outputDesc, &_audioConverter);

As with the other components, the input and output data formats must be configured. The input is the PCM format output by the recording audioUnit. For the output we want AAC: set mFormatID to kAudioFormatMPEG4AAC and mFramesPerPacket to 1024, set the sample rate in mSampleRate and the channel count in mChannelsPerFrame, and set everything else to 0. For simplicity, the sample rate and channel count can be kept the same as the input PCM data.

Because the encoded data is compressed, the output size is not known in advance; the property kAudioConverterPropertyMaximumOutputPacketSize gives the maximum output packet size, which is used to allocate a suitably sized output buffer.
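
A sketch of the converter setup described above (sourceDesc is the recording's PCM format; names may differ from the demo):

   AudioStreamBasicDescription outputDesc = {0};   // zero everything first
   outputDesc.mFormatID         = kAudioFormatMPEG4AAC;
   outputDesc.mFramesPerPacket  = 1024;            // AAC: 1024 frames per packet
   outputDesc.mSampleRate       = sourceDesc.mSampleRate;
   outputDesc.mChannelsPerFrame = sourceDesc.mChannelsPerFrame;

   AudioConverterRef audioConverter;
   OSStatus status = AudioConverterNew(&sourceDesc, &outputDesc, &audioConverter);

   // Compressed packets vary in size; ask for the maximum so the output
   // buffer can be allocated once with a safe size.
   UInt32 maxPacketSize = 0;
   UInt32 size = sizeof(maxPacketSize);
   AudioConverterGetProperty(audioConverter,
                             kAudioConverterPropertyMaximumOutputPacketSize,
                             &size, &maxPacketSize);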

  • Input and transformation

First, determine the amount of data handled per conversion: bufferLengthPerConvert = audioDesc.mBytesPerFrame * _outputDesc.mFramesPerPacket * PACKET_PER_CONVERT;

That is, bytes per frame * frames per packet * number of packets per conversion. Each conversion packs multiple frames into packets, so the number of frames must be a multiple of mFramesPerPacket.

The receiveNewAudioBuffers method keeps accepting incoming audio data. Because the number of bytes received per call may not match, or even be a multiple of, the number of bytes consumed per transcode, one input may trigger several transcodes, and the data left over from the previous input has to be taken into account.

So:

  1. leftLength records the length of data left over after the previous input was transcoded, and leftBuf keeps that leftover data

  2. Each new input is merged with the leftover data; a loop then converts bufferLengthPerConvert bytes at a time until what remains is not enough, and the remainder is saved to leftBuf for the next round (a sketch of this follows)
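
A rough sketch of that buffering logic (leftBuf, workBuf and the helper method are illustrative names, not the demo's exact code):

   // Assumed ivars: UInt8 *leftBuf, *workBuf; UInt32 leftLength, bufferLengthPerConvert.
   - (void)receiveNewAudioBuffers:(AudioBuffer *)buffer {
       UInt32 totalLength = leftLength + buffer->mDataByteSize;
       memcpy(workBuf, leftBuf, leftLength);                               // leftover from last time first
       memcpy(workBuf + leftLength, buffer->mData, buffer->mDataByteSize); // then the new data

       UInt32 offset = 0;
       while (totalLength - offset >= bufferLengthPerConvert) {
           // Convert exactly bufferLengthPerConvert bytes per pass (hypothetical helper).
           [self convertChunk:workBuf + offset length:bufferLengthPerConvert];
           offset += bufferLengthPerConvert;
       }

       leftLength = totalLength - offset;             // not enough for another conversion
       memcpy(leftBuf, workBuf + offset, leftLength); // keep it for the next call
   }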

The conversion function itself is simple: AudioConverterFillComplexBuffer(_audioConverter, convertDataProc, &encodeBuffer, &packetPerConvert, &outputBuffers, NULL);

The parameters are: the converter, the input callback function, the value passed to the callback's inUserData parameter, the number of packets to convert (in/out), the output buffer list, and optional packet descriptions (NULL here).

The input data is supplied inside the callback function; here it is passed in through the inUserData parameter, though the data could also be read directly in the callback.

OSStatus convertDataProc(AudioConverterRef inAudioConverter,UInt32 *ioNumberDataPackets,AudioBufferList *ioData,AudioStreamPacketDescription **outDataPacketDescription,void *inUserData){
    
    AudioBuffer *buffer = (AudioBuffer *)inUserData;
    
    ioData->mBuffers[0].mNumberChannels = buffer->mNumberChannels;
    ioData->mBuffers[0].mData = buffer->mData;
    ioData->mBuffers[0].mDataByteSize = buffer->mDataByteSize;
    return noErr;
}