On iOS, every audio framework is built on top of AudioUnit. The higher-level audio frameworks (Media Player, AV Foundation, OpenAL, and Audio Toolbox) all wrap AudioUnit and expose higher-level APIs (interfaces with fewer features and more focused responsibilities).

Developers should use the AudioUnit API when building audio and video products that need a higher degree of control, performance, or flexibility over audio, or when they need special features such as echo cancellation. As the Apple documentation describes, AudioUnit provides fast, modular audio processing, and it is a better fit than the high-level audio frameworks in the following scenarios.

  • You want low-latency audio I/O (input or output), for example in VoIP scenarios.
  • You need multichannel sound synthesis and playback, such as in games or musical-instrument synthesis applications.
  • You need features that only AudioUnit provides, such as echo cancellation, mixing two audio tracks, or effects such as equalizers, compressors, and reverbs.
  • You need to assemble audio processing modules into a flexible graph structure. Apple provides this API to audio developers for exactly that purpose.

When building an AudioUnit, you need to specify its type, subtype, and manufacturer. The type identifies which of the major AudioUnit categories (described later in this article) the unit belongs to; the subtype is a subdivision of that category (for example, the Effect type has subtypes such as EQ, Compressor, and Limiter). The manufacturer is generally fixed and can be written as kAudioUnitManufacturer_Apple. With these three fields a developer can describe an AudioUnit completely; for example, the following code describes an AudioUnit of the RemoteIO type:

AudioComponentDescription ioUnitDescription;
ioUnitDescription.componentType = kAudioUnitType_Output;
ioUnitDescription.componentSubType = kAudioUnitSubType_RemoteIO;
ioUnitDescription.componentManufacturer = kAudioUnitManufacturer_Apple;
ioUnitDescription.componentFlags = 0;
ioUnitDescription.componentFlagsMask = 0;

The code above constructs a description structure for an AudioUnit of the RemoteIO type. So how do you use this description to build the actual AudioUnit? There are two ways: the first is to create the AudioUnit directly ("raw" creation); the second is to build it with AUGraph and AUNode (an AUNode is a wrapper around an AudioUnit). Both are shown below.

(1) Raw creation mode

First, find the actual AudioUnit type based on the description of the AudioUnit:

AudioComponent ioUnitRef = AudioComponentFindNext(NULL,&ioUnitDescription);

Then declare an AudioUnit reference:

AudioUnit ioUnitInstance;

Finally, create an AudioUnit instance based on type:

AudioComponentInstanceNew(ioUnitRef,&ioUnitInstance);

(2) AUGraph creation method

First declare and instantiate an AUGraph:

AUGraph processingGraph;
NewAUGraph(&processingGraph);

Then add an AUNode to AUGraph as described by AudioUnit:

AUNode ioNode;
AUGraphAddNode(processingGraph,&ioUnitDescription,&ioNode);

Next, open the AUGraph. Opening the AUGraph indirectly instantiates all of the AUNodes it contains. Note that the AUGraph must be opened before retrieving the AudioUnit, otherwise the correct AudioUnit cannot be obtained from the corresponding AUNode:

AUGraphOpen(processingGraph);

Finally, get the AudioUnit reference from the corresponding node in the AUGraph:

AudioUnit ioUnit;
AUGraphNodeInfo(processingGraph,ioNode,NULL,&ioUnit);
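For reference, here is a minimal sketch (not part of the original snippets) of the calls that typically follow once the units have been configured as described in the next section: the graph is initialized and started so that audio begins to flow, and torn down when finished.

AUGraphInitialize(processingGraph);
AUGraphStart(processingGraph);
// ...and when finished:
AUGraphStop(processingGraph);
AUGraphUninitialize(processingGraph);
AUGraphClose(processingGraph);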

General parameter settings for AudioUnit

This section uses RemoteIO as an example to explain how to set AudioUnit parameters. RemoteIO is the unit related to hardware I/O; the I stands for input and the O for output. The input side is usually the microphone, and the output side is usually the speaker or headphones. If you want to use input and output at the same time, such as the ear-monitoring feature in karaoke apps (while the user sings or speaks, the headphones play back the voice picked up by the microphone so the user can hear themselves), you need to do some setup to connect the two.

As shown in the figure above, the RemoteIO unit is divided into Element0 and Element1: Element0 controls output and Element1 controls input, and each element is further divided into an Input Scope and an Output Scope. To use the speaker's playback capability, you must connect the Output Scope of the unit's Element0 to the speaker. To use the microphone's recording capability, you must connect the Input Scope of the unit's Element1 to the microphone. The code for enabling the speaker is as follows:

OSStatus status = noErr;
UInt32 oneFlag = 1;
UInt32 busZero = 0; // Element 0
status = AudioUnitSetProperty(remoteIOUnit,kAudioOutputUnitProperty_EnableIO,kAudioUnitScope_Output,busZero,&oneFlag,sizeof(oneFlag));
CheckStatus(status,@"Could not Connect To Speaker",YES);

This code connects the Output Scope of RemoteIOUnit's Element0 to the speaker. AudioUnitSetProperty returns a value of type OSStatus; the custom CheckStatus function checks it for errors and prints the message "Could not Connect To Speaker". The CheckStatus function is as follows:

static void CheckStatus(OSStatus status,NSString *message,BOOL fatal)
{
    if(status != noErr) {
        char fourCC[16];
        *(UInt32 *)fourCC = CFSwapInt32HostToBig(status);
        fourCC[4] = '\0';
        if(isprint(fourCC[0]) && isprint(fourCC[1]) && isprint(fourCC[2]) && isprint(fourCC[3]))
            NSLog(@"%@:%s",message,fourCC);
        else
            NSLog(@"%@:%d",message,(int)status);
        if(fatal)
            exit(1);
    }
}

Let’s look at the code for starting the microphone:

UInt32 busOne = 1; // Element 1
AudioUnitSetProperty(remoteIOUnit,kAudioOutputUnitProperty_EnableIO,kAudioUnitScope_Input,busOne,&oneFlag,sizeof(oneFlag));

The code above connects the Input Scope of RemoteIOUnit's Element1 to the microphone. Next, let's look at how to set the audio stream format (AudioStreamBasicDescription) for an AudioUnit:

UInt32 bytesPerSample = sizeof(Float32);
AudioStreamBasicDescription asbd;
bzero(&asbd,sizeof(asbd));
asbd.mFormatID = kAudioFormatLinearPCM;
asbd.mSampleRate = _sampleRate;
asbd.mChannelsPerFrame = channels;
asbd.mFramesPerPacket = 1;
asbd.mFormatFlags = kAudioFormatFlagsNativeFloatPacked | kAudioFormatFlagIsNonInterleaved;
asbd.mBitsPerChannel = 8*bytesPerSample;
asbd.mBytesPerFrame = bytesPerSample;
asbd.mBytesPerPacket = bytesPerSample;

The code above shows how to populate an AudioStreamBasicDescription structure. Anyone who has done iOS audio or video development for a while will recognize it: many audio and video APIs expose an AudioStreamBasicDescription, which describes the concrete format of the media data. Let's walk through how the code above fills in that format.

  • The mFormatID field specifies the audio encoding; here it is linear PCM.
  • Next come the sample rate, the number of channels, and the number of frames per packet.
  • mFormatFlags describes how each sample is represented. The first flag in the code specifies that each sample is stored as a Float; samples could similarly be stored as two bytes (SInt16). The NonInterleaved flag means the channels are stored separately: for stereo audio the left channel occupies mBuffers[0] of the AudioBufferList and the right channel occupies mBuffers[1]. If mFormatFlags specified Interleaved instead, the left and right channel samples would be interleaved together in mBuffers[0].
  • mBitsPerChannel is the number of bits one sample of one channel occupies. As mentioned above, each sample is a Float, so the value is 8 times the number of bytes per sample.
  • Finally, mBytesPerFrame and mBytesPerPacket are assigned according to mFormatFlags. For NonInterleaved data each channel is stored separately, so both fields are simply bytesPerSample, i.e. the number of bytes one frame of one channel occupies.

Now that the AudioStreamBasicDescription structure is fully constructed, let's set it on the corresponding AudioUnit as follows:

AudioUnitSetProperty(remoteIOUnit,kAudioUnitProperty_StreamFormat,kAudioUnitScope_Output,1,&asbd,sizeof(asbd));
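The call above sets the format on the Output Scope of Element1 (the microphone data handed to the application). For the ear-monitoring scenario, a companion call, sketched below under the assumption that the same asbd is reused, would set the same format on the Input Scope of Element0 (the data the application feeds to the speaker):

AudioUnitSetProperty(remoteIOUnit,kAudioUnitProperty_StreamFormat,kAudioUnitScope_Input,0,&asbd,sizeof(asbd));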

The classification of AudioUnit

With the general settings of AudioUnit covered, this section introduces the categories of AudioUnit. iOS classifies AudioUnits into five types based on usage. This section introduces each type and its subtypes from a global perspective, and describes their uses and the meanings of the corresponding parameters.

(1) Effect Unit

The type is kAudioUnitType_Effect, which provides sound-effect processing. Its subtypes and uses are described below.

  • Equalizer: the subtype is kAudioUnitSubType_NBandEQ. It boosts or cuts the energy in certain frequency bands of the sound. You specify a number of bands, then set the frequency, bandwidth, and gain for each band, which ultimately reshapes how the sound's energy is distributed in the frequency domain (a configuration sketch follows this list).
  • Compressor: the subtype is kAudioUnitSubType_DynamicsProcessor. It raises the level when the sound is too quiet and, once the level exceeds a set threshold, lowers it. The attack time, release time, and threshold must be chosen sensibly so that the sound's energy is eventually compressed into a certain range in the time domain.
  • Reverb: the subtype is kAudioUnitSubType_Reverb2, a very important effect for vocal processing. Imagine standing in an empty house: on top of the original sound there are many reflections, which can make the sound fuller and more powerful, but also blurrier, because the details of the original sound get masked. The right amount of reverb therefore varies a great deal from person to person and can be tuned to taste. There are many other effect subtypes under the Effect Unit type, such as HighPass, LowPass, BandPass, Delay, and Limiter; try them yourself to hear what each one does.
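As a rough illustration of the equalizer described above, the sketch below configures a three-band NBandEQ. The variable eqUnit and the chosen frequencies and gains are assumptions, not code from the original article; the per-band parameter IDs are formed by adding the band index to the base constants.

UInt32 numBands = 3;
AudioUnitSetProperty(eqUnit,kAUNBandEQProperty_NumberOfBands,kAudioUnitScope_Global,0,&numBands,sizeof(numBands));
Float32 frequencies[3] = {100.0, 1000.0, 8000.0}; // band center frequencies in Hz (assumed values)
Float32 gains[3] = {3.0, 0.0, -6.0};              // per-band gain in dB (assumed values)
for (UInt32 i = 0; i < numBands; i++) {
    AudioUnitSetParameter(eqUnit,kAUNBandEQParam_Frequency + i,kAudioUnitScope_Global,0,frequencies[i],0);
    AudioUnitSetParameter(eqUnit,kAUNBandEQParam_Gain + i,kAudioUnitScope_Global,0,gains[i],0);
}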

(2) Mixer Units

The type is kAudioUnitType_Mixer, which mainly provides multichannel mixing. The subtypes and uses are as follows.

  • 3D Mixer: this mixer is not available on mobile devices; it exists only on OS X, so it is not covered here.
  • MultiChannelMixer: the subtype is kAudioUnitSubType_MultiChannelMixer. It is the mixer used for multichannel mixing: it accepts multiple channels of audio input, can adjust the gain of each input and switch it on or off individually, and mixes all the inputs into a single output. It is extremely useful when building audio graph structures (a configuration sketch follows this list).
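The sketch below shows one way a MultiChannelMixer might be configured; mixerUnit and the bus count of two are assumptions made for illustration.

UInt32 busCount = 2; // two input buses to be mixed into one output (assumed)
AudioUnitSetProperty(mixerUnit,kAudioUnitProperty_ElementCount,kAudioUnitScope_Input,0,&busCount,sizeof(busCount));
// Per-bus gain and on/off switch:
AudioUnitSetParameter(mixerUnit,kMultiChannelMixerParam_Volume,kAudioUnitScope_Input,0,0.8,0);
AudioUnitSetParameter(mixerUnit,kMultiChannelMixerParam_Volume,kAudioUnitScope_Input,1,0.5,0);
AudioUnitSetParameter(mixerUnit,kMultiChannelMixerParam_Enable,kAudioUnitScope_Input,1,1,0);
// Gain of the mixed output:
AudioUnitSetParameter(mixerUnit,kMultiChannelMixerParam_Volume,kAudioUnitScope_Output,0,1.0,0);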

(3) I/O Units

The type is kAudioUnitType_Output. As the category name suggests, it mainly provides I/O functionality. The subtypes and uses are described below.

  • RemoteIO: the subtype is kAudioUnitSubType_RemoteIO. As the name suggests, it is used to capture and play audio; developers use it whenever an application needs the microphone or the speaker.
  • GenericOutput: the subtype is kAudioUnitSubType_GenericOutput. It is used when you need to do offline processing, or when the speaker should not be the thing driving the data flow through the AUGraph and you instead want an output you control (for example, writing into a memory queue or doing disk I/O) to drive the data (an offline-rendering sketch follows this list).
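To make the GenericOutput case more concrete, here is a hedged sketch of an offline pull loop; genericOutputUnit, totalFrames, and the stereo interleaved Float32 format are assumptions for illustration only.

UInt32 framesPerSlice = 1024;
Float64 sampleTime = 0;
while (sampleTime < totalFrames) {
    AudioUnitRenderActionFlags flags = 0;
    AudioTimeStamp timeStamp;
    memset(&timeStamp, 0, sizeof(timeStamp));
    timeStamp.mFlags = kAudioTimeStampSampleTimeValid;
    timeStamp.mSampleTime = sampleTime;
    AudioBufferList bufferList;
    bufferList.mNumberBuffers = 1;
    bufferList.mBuffers[0].mNumberChannels = 2;
    bufferList.mBuffers[0].mDataByteSize = framesPerSlice * sizeof(Float32) * 2;
    bufferList.mBuffers[0].mData = malloc(bufferList.mBuffers[0].mDataByteSize);
    // Each render call drives the upstream nodes connected to the GenericOutput unit.
    AudioUnitRender(genericOutputUnit, &flags, &timeStamp, 0, framesPerSlice, &bufferList);
    // ...write bufferList to disk or push it onto a memory queue here...
    free(bufferList.mBuffers[0].mData);
    sampleTime += framesPerSlice;
}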

(4) Format Converter Units

The type is kAudioUnitType_FormatConverter. It is used for format conversion, for example from Float to SInt16 samples, between interleaved and non-interleaved (planar) layouts, or between mono and stereo. Its subtypes and uses are described below.

  • AUConverter: the subtype is kAudioUnitSubType_AUConverter, a format-conversion unit. It is needed whenever an effect unit has strict requirements on its input format, when the developer feeds audio data to some other encoder, or when the developer wants SInt16 PCM raw data in order to run audio algorithms elsewhere. A typical scenario: in a custom audio player, the PCM data decoded by FFmpeg is in SInt16 format and cannot be handed directly to the RemoteIO Unit for playback, so a converter node is inserted in between (a configuration sketch follows this list).
  • Time Pitch: the subtype is kAudioUnitSubType_NewTimePitch, which adjusts the pitch and speed of a sound.
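As a sketch of the FFmpeg scenario described above (converterUnit, channels, and _sampleRate are assumptions; asbd is the Float32 format built earlier), the converter's input scope is given the SInt16 layout and its output scope the Float32 layout:

AudioStreamBasicDescription sint16Format;
bzero(&sint16Format,sizeof(sint16Format));
sint16Format.mFormatID = kAudioFormatLinearPCM;
sint16Format.mSampleRate = _sampleRate;
sint16Format.mChannelsPerFrame = channels;
sint16Format.mFramesPerPacket = 1;
sint16Format.mFormatFlags = kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
sint16Format.mBitsPerChannel = 16;
sint16Format.mBytesPerFrame = 2 * channels;
sint16Format.mBytesPerPacket = 2 * channels;
// SInt16 PCM (e.g. decoded by FFmpeg) goes into the converter...
AudioUnitSetProperty(converterUnit,kAudioUnitProperty_StreamFormat,kAudioUnitScope_Input,0,&sint16Format,sizeof(sint16Format));
// ...and Float32 PCM comes out, ready for units that expect it.
AudioUnitSetProperty(converterUnit,kAudioUnitProperty_StreamFormat,kAudioUnitScope_Output,0,&asbd,sizeof(asbd));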

(5) Generator Units

The type is kAudioUnitType_Generator. In development it is often used to provide playback functionality; its subtype and purpose are described below.

  • AudioFilePlayer: the subtype is kAudioUnitSubType_AudioFilePlayer. Use it when the input in an AudioUnit graph is not the microphone but a media file. Note that the AUGraph must be initialized before the AudioFilePlayer's data source and play range are configured, otherwise errors occur. Under the hood, AudioFile's decoding facilities are still used to decompress the compressed data in the media file into raw PCM, which is then handed to the AudioFilePlayer Unit for subsequent processing (a scheduling sketch follows this bullet).
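The sketch below shows how such a player unit is typically scheduled after AUGraphInitialize has run; filePlayerUnit and playerFile (an AudioFileID obtained from AudioFileOpenURL) are assumptions made for illustration.

AudioUnitSetProperty(filePlayerUnit,kAudioUnitProperty_ScheduledFileIDs,kAudioUnitScope_Global,0,&playerFile,sizeof(playerFile));
// Schedule the region of the file to play (here: the whole file).
ScheduledAudioFileRegion region;
memset(&region,0,sizeof(region));
region.mTimeStamp.mFlags = kAudioTimeStampSampleTimeValid;
region.mTimeStamp.mSampleTime = 0;
region.mAudioFile = playerFile;
region.mLoopCount = 0;
region.mStartFrame = 0;
region.mFramesToPlay = (UInt32)-1; // play to the end of the file
AudioUnitSetProperty(filePlayerUnit,kAudioUnitProperty_ScheduledFileRegion,kAudioUnitScope_Global,0,&region,sizeof(region));
// Start as soon as possible: -1 means "on the next render cycle".
AudioTimeStamp startTime;
memset(&startTime,0,sizeof(startTime));
startTime.mFlags = kAudioTimeStampSampleTimeValid;
startTime.mSampleTime = -1;
AudioUnitSetProperty(filePlayerUnit,kAudioUnitProperty_ScheduleStartTimeStamp,kAudioUnitScope_Global,0,&startTime,sizeof(startTime));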

Construct an AUGraph

In a real karaoke application, the user's voice is captured, processed, and immediately played back as ear monitoring (within about 50 ms the sound is output to the headphones, so the user can hear themselves). So how can RemoteIOUnit take the sound collected by the microphone, pass it through intermediate effect units, and finally output it to the speaker for the user to hear? The following shows how to use AUGraph to manage the whole pipeline of capturing, processing, and outputting sound.

The first thing to understand is that the data flow is driven from the right-most node, the speaker (RemoteIO Unit): it requests data from the AUNode one level upstream, that node requests data from the node before it, and so on until the request reaches Element1 (the microphone) of RemoteIOUnit. The data then flows back level by level in the opposite direction, finally reaching Element0 (the speaker) of RemoteIOUnit and being played to the user. And of course, what should drive the graph when you want to do offline processing? In that case the AudioUnit with the GenericOutput subtype under the Output type is used as the driver. So how are AudioUnits, or AUNodes, connected to one another? There are two ways: the first is to connect the AUNodes directly; the second is to connect them via a callback.

(1) Direct connection

AUGraphConnectNodeInput(mPlayerGraph,mPlayerNode,0,mPlayerIONode,0);

This connects the AudioFilePlayer Unit to the RemoteIO Unit: when the RemoteIO Unit needs data to play, it asks the AudioFilePlayer Unit for it.

(2) Connection via callback

AURenderCallbackStruct renderProc;
renderProc.inputProc = &inputAvailableCallback;
renderProc.inputProcRefCon = (__bridge void *)self;
AUGraphSetNodeInputCallback(mGraph,ioNode,0,&renderProc);

This code first constructs an AURenderCallbackStruct and sets it as the input callback of the RemoteIO Unit. When the RemoteIO Unit needs input data, the callback function is invoked:

static OSStatus renderCallback(void *inRefCon,AudioUnitRenderActionFlags *ioActionFlags,const AudioTimeStamp *inTimeStamp,UInt32 inBusNumber,UInt32 inNumberFrames,AudioBufferList *ioData)
{
      OSStatus result = noErr;
      __unsafe_unretained AUGraphRecorder *THIS = (__bridge AUGraphRecorder *)inRefCon;
      AudioUnitRender(THIS->mixerUnit,ioActionFlags,inTimeStamp,0,inNumberFrames,ioData);
      result = ExtAudioFileWriteAsync(THIS->finalAudiofile,inNumberFrames,ioData);
      return result;
}

This callback does two things. First, it asks the Mixer Unit for data: calling AudioUnitRender drives the Mixer Unit to produce data and fills the ioData parameter with it, which is what connects the Mixer Unit to the RemoteIO Unit. Second, it uses ExtAudioFile to encode the sound and write it to a file on the local disk.
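For completeness, here is a hedged sketch of how THIS->finalAudiofile could have been created before the callback starts writing; the output path, the AAC/M4A file format, and reusing the asbd from earlier as the client format are all assumptions.

NSString *path = [NSTemporaryDirectory() stringByAppendingPathComponent:@"record.m4a"];
CFURLRef url = (__bridge CFURLRef)[NSURL fileURLWithPath:path];
AudioStreamBasicDescription fileFormat;
bzero(&fileFormat,sizeof(fileFormat));
fileFormat.mFormatID = kAudioFormatMPEG4AAC;   // encode to AAC while writing
fileFormat.mSampleRate = _sampleRate;
fileFormat.mChannelsPerFrame = channels;
ExtAudioFileCreateWithURL(url,kAudioFileM4AType,&fileFormat,NULL,kAudioFileFlags_EraseFile,&finalAudiofile);
// Tell ExtAudioFile what format the callback will hand it (the asbd built earlier);
// ExtAudioFile performs the conversion and encoding internally.
ExtAudioFileSetProperty(finalAudiofile,kExtAudioFileProperty_ClientDataFormat,sizeof(asbd),&asbd);
// An initial async write with 0 frames primes the asynchronous writing machinery.
ExtAudioFileWriteAsync(finalAudiofile,0,NULL);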

The sample code

The sample code is available at github.com/Nicholas86/…