Starting with Windows Vista, the audio system was substantially redesigned, introducing a new set of low-level interfaces called the Core Audio APIs. These provide services for higher-level APIs such as Media Foundation (which is intended to replace older high-level APIs such as DirectShow), and are characterized by low latency, high reliability, and improved security.

This article mainly introduces how to use these APIs in real-time audio and video scenarios.

The Core Audio APIs comprise MMDevice, EndpointVolume, WASAPI, and other components. For a real-time audio/video system, the MMDevice and EndpointVolume APIs are the ones mainly used.

In a real-time audio/video system, the use of audio devices can be broken down into:

1. Device list management

2. Initialize the device

3. Device capability management

4. Data interaction

5. Volume management

6. Device endpoint monitoring

Next, we will introduce the implementation of relevant functions:

1. Device list management

Audio device management is implemented with the MMDevice API.

We start by creating an IMMDeviceEnumerator object, through which the related functions are called.

IMMDeviceEnumerator* ptrEnumerator = NULL;
CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL,
                 __uuidof(IMMDeviceEnumerator),
                 reinterpret_cast<void**>(&ptrEnumerator));
Through IMMDeviceEnumerator we can obtain the default device (GetDefaultAudioEndpoint), enumerate devices into an IMMDeviceCollection, open a specific device (GetDevice), and register an IMMNotificationClient. With these methods, we can get the system default device, traverse the device list, open a specified device, and listen for device changes, which covers the device-management needs of a real-time audio/video application.
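As a sketch of how these pieces fit together (error handling omitted for brevity, and COM must already be initialized on the calling thread), enumerating the active capture devices and printing their friendly names might look like this:

```cpp
#include <mmdeviceapi.h>
#include <functiondiscoverykeys_devpkey.h>
#include <cstdio>

// Sketch: list all active capture (microphone) endpoints.
// Every HRESULT should be checked in production code.
void ListCaptureDevices()
{
    CoInitializeEx(NULL, COINIT_MULTITHREADED);

    IMMDeviceEnumerator* ptrEnumerator = NULL;
    CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL,
                     __uuidof(IMMDeviceEnumerator),
                     reinterpret_cast<void**>(&ptrEnumerator));

    IMMDeviceCollection* pCollection = NULL;
    ptrEnumerator->EnumAudioEndpoints(eCapture, DEVICE_STATE_ACTIVE, &pCollection);

    UINT count = 0;
    pCollection->GetCount(&count);
    for (UINT i = 0; i < count; ++i)
    {
        IMMDevice* pDevice = NULL;
        pCollection->Item(i, &pDevice);

        // The human-readable name lives in the endpoint's property store
        IPropertyStore* pProps = NULL;
        pDevice->OpenPropertyStore(STGM_READ, &pProps);

        PROPVARIANT varName;
        PropVariantInit(&varName);
        pProps->GetValue(PKEY_Device_FriendlyName, &varName);
        wprintf(L"device %u: %s\n", i, varName.pwszVal);

        PropVariantClear(&varName);
        pProps->Release();
        pDevice->Release();
    }

    pCollection->Release();
    ptrEnumerator->Release();
    CoUninitialize();
}
```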

2. Device initialization

Starting the audio device is a critical point for the reliability of the whole audio module. By device type and data-capture mode, devices fall into three categories: microphone capture, speaker playback, and speaker (loopback) capture.

First we need an IMMDevice object, which can be obtained through any of the device-management methods above.

IMMDevice* pDevice = NULL;
// Get the default device
ptrEnumerator->GetDefaultAudioEndpoint((EDataFlow)dir,
                                       (ERole)role /* eCommunications */,
                                       &pDevice);
// Get a device by its endpoint ID
ptrEnumerator->GetDevice(device_path, &pDevice);
// Get a device by its index in a collection
pCollection->Item(index, &pDevice);
Then an IAudioClient is obtained from the IMMDevice, and the device is formatted and initialized through the IAudioClient object. Devices are generally opened in shared mode: microphone capture and speaker playback process data in event-driven mode, while speaker capture drives data processing in loopback mode. A simple example:

// Microphone capture
ptrClient->Initialize(AUDCLNT_SHAREMODE_SHARED,
                      AUDCLNT_STREAMFLAGS_EVENTCALLBACK |
                      AUDCLNT_STREAMFLAGS_NOPERSIST,
                      0, 0, (WAVEFORMATEX*)&Wfx, NULL);
// Speaker playback (render)
ptrClient->Initialize(AUDCLNT_SHAREMODE_SHARED,
                      AUDCLNT_STREAMFLAGS_EVENTCALLBACK,
                      0, 0, (WAVEFORMATEX*)&Wfx, NULL);
// Speaker capture (loopback)
ptrClient->Initialize(AUDCLNT_SHAREMODE_SHARED,
                      AUDCLNT_STREAMFLAGS_LOOPBACK,
                      0, 0, (WAVEFORMATEX*)&Wfx, NULL);
Wfx is the device format parameter. To ensure availability, the device's default format (obtained via IAudioClient::GetMixFormat) is generally used. If a custom format is required, IAudioClient::IsFormatSupported can be used to probe which formats the device supports.
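A minimal sketch of that negotiation logic (the function name and fallback policy here are illustrative, not from the original): try the desired format with IAudioClient::IsFormatSupported, which may propose a closest match, and otherwise fall back to the mix format:

```cpp
// Sketch: negotiate a shared-mode format. 'desired' is a caller-supplied
// WAVEFORMATEX; falls back to the engine mix format if it is not supported.
WAVEFORMATEX* NegotiateFormat(IAudioClient* ptrClient, WAVEFORMATEX* desired)
{
    WAVEFORMATEX* closest = NULL;
    HRESULT hr = ptrClient->IsFormatSupported(AUDCLNT_SHAREMODE_SHARED,
                                              desired, &closest);
    if (hr == S_OK)
        return desired;            // device accepts the custom format as-is
    if (hr == S_FALSE && closest)  // device proposed a closest match
        return closest;            // caller must CoTaskMemFree() this later

    // Fall back to the engine's default mix format
    WAVEFORMATEX* mix = NULL;
    ptrClient->GetMixFormat(&mix); // caller must CoTaskMemFree() this later
    return mix;
}
```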

3. Device capability management

For microphone devices, we usually need to process the captured data. Some hardware and systems support built-in noise suppression, gain control, echo cancellation, and other functions. However, on typical Windows machines the devices are varied and uncontrollable, so software algorithms are mostly used instead. To check whether a device offers built-in processing and to query the related parameters, use the Topology module.

IDeviceTopology* pTopo = NULL;
pDevice->Activate(__uuidof(IDeviceTopology), CLSCTX_INPROC_SERVER, 0, (void**)&pTopo);
With IDeviceTopology, we can traverse IConnector objects, obtain capability objects such as IAudioAutoGainControl and IAudioVolumeLevel, and query or control the corresponding capabilities.

Note: connectors may be nested, and the graph can contain cycles, so while iterating you need to check the type of each member IPart of an IConnector.
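A hedged sketch of that traversal (simplified to a single hop; a real implementation recurses through nested connectors, guards against cycles, and checks every HRESULT): starting from the endpoint's connector, cross to the adapter device and try to activate a capability interface on the part found there.

```cpp
// Sketch: look for a hardware AGC control on the path into an endpoint.
// Only one level of traversal is shown; a full version continues walking
// via IPart::EnumPartsIncoming and checks each part's type.
IAudioAutoGainControl* FindAgc(IMMDevice* pDevice)
{
    IDeviceTopology* pTopo = NULL;
    pDevice->Activate(__uuidof(IDeviceTopology), CLSCTX_INPROC_SERVER, 0,
                      (void**)&pTopo);

    IConnector* pConnEndpoint = NULL;
    pTopo->GetConnector(0, &pConnEndpoint);      // endpoint-side connector

    IConnector* pConnDevice = NULL;
    pConnEndpoint->GetConnectedTo(&pConnDevice); // cross to the adapter device

    IPart* pPart = NULL;
    pConnDevice->QueryInterface(__uuidof(IPart), (void**)&pPart);

    // Try to activate the capability on this part; NULL if unsupported
    IAudioAutoGainControl* pAgc = NULL;
    pPart->Activate(CLSCTX_INPROC_SERVER, __uuidof(IAudioAutoGainControl),
                    (void**)&pAgc);

    pPart->Release();
    pConnDevice->Release();
    pConnEndpoint->Release();
    pTopo->Release();
    return pAgc; // NULL if the part exposes no AGC control
}
```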

4. Data interaction

During device initialization, we selected a startup mode per device type, and each mode drives data differently: microphone capture and speaker playback are driven by events signaled by the audio engine, while speaker capture is driven in loopback mode by the render stream.



During data interaction with the device, we obtain the service object matching the data-acquisition mode. On the capture side, the IAudioCaptureClient service is used to read device data; on the playback side, the IAudioRenderClient service is used to obtain a buffer pointer into which playback data is written. An example:

// Capture (microphone or loopback)
IAudioCaptureClient* ptrCaptureClient = NULL;
ptrClient->GetService(__uuidof(IAudioCaptureClient), (void**)&ptrCaptureClient);

// Worker thread: wait for the event, then drain the packet
ptrCaptureClient->GetBuffer(&pData,            // packet ready to be read
                            &framesAvailable,  // #frames in the captured packet (can be zero)
                            &flags,            // status flags (check for silence/discontinuity)
                            &recPos,           // device position of the first audio frame
                            &recTime);         // performance-counter value at record time
// ... process pData ...
ptrCaptureClient->ReleaseBuffer(framesAvailable);

// Render (playback)
IAudioRenderClient* ptrRenderClient = NULL;
ptrClient->GetService(__uuidof(IAudioRenderClient), (void**)&ptrRenderClient);

// Worker thread
BYTE* pData = NULL;
UINT32 bufferLength = 0;
ptrClient->GetBufferSize(&bufferLength);
UINT32 playBlockSize = nSamplesPerSec / 100; // one 10 ms block
// Wait for the event, then top up the buffer
UINT32 padding = 0;
ptrClient->GetCurrentPadding(&padding);
if (bufferLength - padding > playBlockSize)
{
    ptrRenderClient->GetBuffer(playBlockSize, &pData);
    // ... fill pData with playback data ...
    ptrRenderClient->ReleaseBuffer(playBlockSize, 0);
}
In actual data interaction, a dedicated thread is needed to drive GetBuffer and ReleaseBuffer. Microphone capture and speaker playback are event-driven by the device; the event handle that the device signals can be set once initialization is complete, via IAudioClient::SetEventHandle.
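Putting those pieces together, the event wiring for the capture thread might look like this (a sketch; ShouldStop() and ProcessPacket() are hypothetical application hooks, and the 500 ms timeout is an arbitrary choice):

```cpp
// Sketch: event-driven capture loop.
// ptrClient was Initialized with AUDCLNT_STREAMFLAGS_EVENTCALLBACK.
HANDLE hEvent = CreateEvent(NULL, FALSE, FALSE, NULL); // auto-reset event
ptrClient->SetEventHandle(hEvent);                     // must precede Start()
ptrClient->Start();

while (!ShouldStop())
{
    // The audio engine signals the event each time a packet is ready
    DWORD waitResult = WaitForSingleObject(hEvent, 500);
    if (waitResult != WAIT_OBJECT_0)
        break; // timeout or error: treat as a device exception

    BYTE* pData = NULL;
    UINT32 framesAvailable = 0;
    DWORD flags = 0;
    // Drain every pending packet before waiting again
    while (ptrCaptureClient->GetBuffer(&pData, &framesAvailable,
                                       &flags, NULL, NULL) == S_OK)
    {
        ProcessPacket(pData, framesAvailable, flags);
        ptrCaptureClient->ReleaseBuffer(framesAvailable);
    }
}

ptrClient->Stop();
CloseHandle(hEvent);
```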

In the whole audio/video system, the device data thread also needs to track statistics such as data-processing time and the capture/playback buffer levels; these are used to check device health and to calculate the AEC delay.

5. Volume management

Volume management generally only needs to act on the currently selected device, so IAudioEndpointVolume is typically used. The object is obtained from the IMMDevice device object:

IAudioEndpointVolume* pVolume = NULL;
pDevice->Activate(__uuidof(IAudioEndpointVolume), CLSCTX_ALL, NULL,
                  reinterpret_cast<void**>(&pVolume));
With the IAudioEndpointVolume object, we can handle volume controls for the current device:

float fLevel = 0.0f;
pVolume->GetMasterVolumeLevelScalar(&fLevel);
pVolume->SetMasterVolumeLevelScalar(fLevel, NULL);
Mute control:

BOOL mute;
pVolume->GetMute(&mute);
pVolume->SetMute(mute, NULL);
And we can register an IAudioEndpointVolumeCallback to monitor volume-state changes:

IAudioEndpointVolumeCallback* cbSessionVolume; // must point to an object implementing the interface
pVolume->RegisterControlChangeNotify(cbSessionVolume);
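The registered callback must be a real object implementing the interface. A minimal sketch (the class name is illustrative, and reference counting is simplified; production code should follow full COM rules):

```cpp
// Sketch: minimal IAudioEndpointVolumeCallback implementation.
class VolumeCallback : public IAudioEndpointVolumeCallback
{
    LONG refCount_ = 1;
public:
    // IUnknown boilerplate
    ULONG STDMETHODCALLTYPE AddRef() override
    {
        return InterlockedIncrement(&refCount_);
    }
    ULONG STDMETHODCALLTYPE Release() override
    {
        ULONG ref = InterlockedDecrement(&refCount_);
        if (ref == 0) delete this;
        return ref;
    }
    HRESULT STDMETHODCALLTYPE QueryInterface(REFIID riid, void** ppv) override
    {
        if (riid == __uuidof(IUnknown) ||
            riid == __uuidof(IAudioEndpointVolumeCallback))
        {
            *ppv = this;
            AddRef();
            return S_OK;
        }
        *ppv = NULL;
        return E_NOINTERFACE;
    }
    // Called by the system whenever the volume or mute state changes
    HRESULT STDMETHODCALLTYPE OnNotify(PAUDIO_VOLUME_NOTIFICATION_DATA pNotify) override
    {
        // pNotify->fMasterVolume is the new scalar volume,
        // pNotify->bMuted the new mute state
        return S_OK;
    }
};
```

Remember to call UnregisterControlChangeNotify before releasing the IAudioEndpointVolume object.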

6. Device endpoint monitoring

IAudioSessionEvents is generally used to monitor endpoint and session events:

IAudioSessionControl* ptrSessionControl;
ptrClient->GetService(__uuidof(IAudioSessionControl), (void**)&ptrSessionControl);
IAudioSessionEvents* notify; // must point to an object implementing the interface
ptrSessionControl->RegisterAudioSessionNotification(notify);
Through this callback we can observe the device's connection status, display-name changes, and so on.

Some considerations:

1. Thread priority

During real project development, the audio worker threads need special handling. The usual approach is to load the system module avrt.dll, resolve its functions dynamically, and associate the calling thread with the "Pro Audio" MMCSS task. The code:

Function binding:

avrt_module_ = LoadLibrary(TEXT("Avrt.dll"));
if (avrt_module_)
{
	_PAvRevertMmThreadCharacteristics = (PAvRevertMmThreadCharacteristics)GetProcAddress(avrt_module_, "AvRevertMmThreadCharacteristics");
	_PAvSetMmThreadCharacteristicsA = (PAvSetMmThreadCharacteristicsA)GetProcAddress(avrt_module_, "AvSetMmThreadCharacteristicsA");
	_PAvSetMmThreadPriority = (PAvSetMmThreadPriority)GetProcAddress(avrt_module_, "AvSetMmThreadPriority");
}
Then, in the actual data-processing thread, associate the thread with the task:

hMmTask_ = _PAvSetMmThreadCharacteristicsA("Pro Audio", &taskIndex);
if (hMmTask_)
{
	_PAvSetMmThreadPriority(hMmTask_, AVRT_PRIORITY_CRITICAL);
}
Binding the thread to this task effectively improves the reliability of the audio data-processing thread.

2. Worker threads

Device initialization and release should be handled on a single, consistent thread. Some system COM objects must be released on the thread that created them; otherwise releasing them may crash. Volume selection, monitoring, and similar operations can be handled on the user's thread, but multi-threaded safety must be ensured.

3. Device format selection

When selecting a format (sampling rate, channel count, etc.), a custom format may fail to match, or the device may fail to initialize even after a matching format has been chosen. In such scenarios, the default format is generally used to start the device.

4. Handling data-processing exceptions

While processing audio data, the data thread commonly encounters event-response timeouts, device-object exceptions, and so on. The usual approach is to exit the data thread and stop the device, check whether the current device is still functional, and then either restart it or fall back to the default device.
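A sketch of that recovery flow (StopAndReleaseDevice, IsDeviceStillPresent, RestartCurrentDevice, and StartDefaultDevice are hypothetical helpers standing in for the steps described above):

```cpp
// Sketch: handle a device exception in the data thread.
// AUDCLNT_E_DEVICE_INVALIDATED is returned when the endpoint disappears,
// e.g. the device was unplugged or its shared format changed.
HRESULT hr = ptrCaptureClient->GetBuffer(&pData, &framesAvailable,
                                         &flags, NULL, NULL);
if (hr == AUDCLNT_E_DEVICE_INVALIDATED)
{
    StopAndReleaseDevice();      // hypothetical: Stop(), release COM objects
    if (IsDeviceStillPresent())  // hypothetical: re-check via IMMDeviceEnumerator
        RestartCurrentDevice();  // re-initialize the same endpoint
    else
        StartDefaultDevice();    // fall back to the system default endpoint
}
```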