WebRTC consists of three modules: the voice engine, the video engine, and network transmission. The voice engine is one of the most valuable technologies in WebRTC, implementing the full audio processing chain: capture, pre-processing, encoding, sending, receiving, decoding, mixing, post-processing, and playback.
Audio working mechanism
Audio Engine core class diagram:
1. AudioDeviceModule is responsible for the hardware layer, including audio capture, playback, and hardware operations.
2. AudioProcessing, the audio 3A processor (APM), is mainly responsible for pre-processing audio data, including acoustic echo cancellation (AEC), automatic gain control (AGC), and noise suppression (NS). APM handles two streams: the near-end stream, which is the data coming in from the microphone, and the far-end stream, which is the data received from the remote peer. (A configuration sketch follows this list.)
3. AudioEncoderFactory provides the codecs, including Opus, iSAC, G.711, G.722, iLBC, and L16.
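To make the 3A pipeline concrete, here is a minimal sketch using the modern webrtc::AudioProcessing config API. Field names and the return type of Create() vary across WebRTC versions (older branches return a raw pointer), so treat this as illustrative rather than definitive:

```cpp
#include <array>
#include "modules/audio_processing/include/audio_processing.h"

int main() {
  // Build an APM instance and enable the 3A components.
  rtc::scoped_refptr<webrtc::AudioProcessing> apm =
      webrtc::AudioProcessingBuilder().Create();

  webrtc::AudioProcessing::Config config;
  config.echo_canceller.enabled = true;   // AEC
  config.gain_controller1.enabled = true; // AGC
  config.noise_suppression.enabled = true; // NS
  apm->ApplyConfig(config);

  // 10 ms of mono 48 kHz audio = 480 samples.
  webrtc::StreamConfig stream_cfg(48000, 1);
  std::array<int16_t, 480> near_end{}, far_end{}, out{};

  // Feed the far-end (render) stream first so AEC has its reference signal,
  // then process the near-end (capture) stream.
  apm->ProcessReverseStream(far_end.data(), stream_cfg, stream_cfg,
                            far_end.data());
  apm->ProcessStream(near_end.data(), stream_cfg, stream_cfg, out.data());
  return 0;
}
```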
Audio workflow flow chart:
1. The sender feeds the captured sound signal to the APM module for echo cancellation (AEC), noise suppression (NS), and automatic gain control (AGC).
2. The sender transmits the encoded data through the RtpRtcp transport module, across the Internet, to the receiver (a sketch of enumerating the built-in encoders follows this list).
3. The receiver sends the processed audio data to the sound card for playback.
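The encoded data in step 2 comes from a codec produced by the AudioEncoderFactory listed above. A minimal sketch of enumerating the built-in encoders; header paths follow the upstream WebRTC tree and may differ between versions:

```cpp
#include <cstdio>
#include "api/audio_codecs/builtin_audio_encoder_factory.h"

int main() {
  rtc::scoped_refptr<webrtc::AudioEncoderFactory> factory =
      webrtc::CreateBuiltinAudioEncoderFactory();

  // List every codec the factory can produce (Opus, G.711, G.722, ...).
  for (const webrtc::AudioCodecSpec& spec : factory->GetSupportedEncoders()) {
    std::printf("%s/%d/%zu\n", spec.format.name.c_str(),
                spec.format.clockrate_hz, spec.format.num_channels);
  }
  return 0;
}
```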
The NetEQ module is the core of the WebRTC voice engine: it combines an adaptive jitter buffer with the decoder and packet loss concealment (PLC), absorbing network jitter and masking lost packets while keeping playout latency low.
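As a rough illustration, NetEQ exposes a Config struct for tuning its jitter buffer. The fields below are recalled from the api/neteq headers and should be checked against your WebRTC version; construction goes through a NetEqFactory, omitted here:

```cpp
#include "api/neteq/neteq.h"

// Sketch: tune NetEQ's jitter buffer before constructing it via a factory.
webrtc::NetEq::Config config;
config.sample_rate_hz = 48000;         // initial output sample rate
config.max_packets_in_buffer = 50;     // cap buffering (and therefore delay)
config.enable_fast_accelerate = true;  // catch up faster after jitter spikes
```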
Audio data flow
Building on the audio workflow introduced above, we now refine the audio data flow, highlighting the role of AudioTransportImpl, the hub of the data flow (its two callbacks are sketched after the lists below).
RecordedDataIsAvailable internal main flow:

- Resample the audio data captured by the hardware to the sending sample rate
- Run audio pre-processing (APM) on the resampled data
- VAD processing
- Digital gain to adjust the capture volume
- Call the audio data back externally for external pre-processing
- Mix all the audio data to be sent, including the captured data and any accompaniment data
- Calculate the energy value of the audio data
- Distribute the data to all sending Streams

NeedMorePlayData internal main flow:

- Mix the audio data received by all Streams
- Under specific conditions, inject noise as a reference signal for the capture side
- Mix in local audio
- Digital gain to adjust the playback volume
- Call the audio data back externally for external post-processing
- Calculate the energy value of the audio data
- Resample the audio to the requested output sample rate
- Feed the audio data to the APM as the reference signal
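Both flows hang off the AudioTransport interface that AudioTransportImpl implements: the ADM pushes capture data and pulls playout data through it. Here is a sketch of the two callbacks, paraphrased from modules/audio_device/include/audio_device_defines.h; parameter lists vary slightly across versions, so this is not a drop-in copy:

```cpp
#include <cstddef>
#include <cstdint>

// Paraphrased from WebRTC's audio_device_defines.h.
class AudioTransport {
 public:
  // Capture path: the ADM delivers a 10 ms chunk of recorded audio.
  virtual int32_t RecordedDataIsAvailable(const void* audio_samples,
                                          size_t n_samples,
                                          size_t n_bytes_per_sample,
                                          size_t n_channels,
                                          uint32_t samples_per_sec,
                                          uint32_t total_delay_ms,
                                          int32_t clock_drift,
                                          uint32_t current_mic_level,
                                          bool key_pressed,
                                          uint32_t& new_mic_level) = 0;

  // Playout path: the ADM asks for the next 10 ms chunk to play.
  virtual int32_t NeedMorePlayData(size_t n_samples,
                                   size_t n_bytes_per_sample,
                                   size_t n_channels,
                                   uint32_t samples_per_sec,
                                   void* audio_samples,
                                   size_t& n_samples_out,
                                   int64_t* elapsed_time_ms,
                                   int64_t* ntp_time_ms) = 0;

  virtual ~AudioTransport() = default;
};
```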
The AudioDeviceModule handles playback and capture, and the AudioDeviceBuffer always moves audio in 10 ms chunks in each direction. For platforms whose audio callbacks cannot deliver or consume exactly 10 ms of audio, a FineAudioBuffer is inserted between the platform AudioDeviceModule and the AudioDeviceBuffer to convert the platform's native buffer sizes into the 10 ms frames WebRTC expects. Every 10 seconds, the AudioDeviceBuffer recomputes the number of samples and the sample rate actually delivered by the current hardware device, which can be used to detect the working status of the hardware.
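As a simplified illustration of that reframing (this is not the actual FineAudioBuffer code, just the idea behind it): accumulate whatever the platform delivers and emit fixed 10 ms frames.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Simplified sketch of FineAudioBuffer's job: turn arbitrary-sized platform
// callbacks into fixed 10 ms frames (sample_rate / 100 samples per channel).
class TenMsFramer {
 public:
  TenMsFramer(int sample_rate_hz, int channels)
      : frame_size_(static_cast<size_t>(sample_rate_hz / 100 * channels)) {}

  // Push whatever the platform recorded, in any chunk size.
  void Push(const int16_t* data, size_t len) {
    fifo_.insert(fifo_.end(), data, data + len);
  }

  // Pop exactly one 10 ms frame once enough samples have accumulated.
  bool Pop10ms(std::vector<int16_t>* frame) {
    if (fifo_.size() < frame_size_) return false;
    frame->assign(fifo_.begin(), fifo_.begin() + frame_size_);
    fifo_.erase(fifo_.begin(), fifo_.begin() + frame_size_);
    return true;
  }

 private:
  const size_t frame_size_;
  std::deque<int16_t> fifo_;
};
```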
Audio related changes
- Audio profile implementation supporting VoIP and music scenarios, with a combined strategy for sample rate, encoding bitrate, encoding mode, and number of channels (see the sketch after this list). On iOS, capture and playback run on separate threads, and dual-channel playback is supported.
- Compatibility adaptation of audio 3A parameters, with adaptation solutions delivered via configuration.
- Headset adaptation: Bluetooth headsets and ordinary wired headsets, with dynamic 3A switching.
- A Noise_Injection algorithm that, used as a reference signal, is particularly important for echo cancellation in headset scenarios.
- Support for local audio files and network audio files (HTTP & HTTPS).
- Audio NACK implementation, which improves audio resilience to packet loss; in-band FEC is currently in progress.
- Audio processing optimization for single-talk and double-talk.
- Research on the iOS built-in AGC:
  (1) The microphone hardware gain differs across models: iPhone 7 Plus > iPhone 8 > iPhone X. Therefore, even with both software AGC and hardware AGC turned off, the volume heard at the remote end differs between models.
  (2) On most iOS models, the input volume decreases in speaker mode after the headset is re-inserted. The current solution is to apply a preGain that brings the input volume back to normal after the headset has been re-inserted.
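As a sketch of what the audio profile strategy above might look like (the names and values here are hypothetical illustrations, not the actual implementation):

```cpp
// Hypothetical illustration of a per-scenario audio profile; the real
// profile table and values live in the engine code, not shown here.
enum class AudioScenario { kVoip, kMusic };

struct AudioProfile {
  int sample_rate_hz;
  int bitrate_bps;
  int channels;
  bool dtx_enabled;  // discontinuous transmission for speech
};

AudioProfile SelectProfile(AudioScenario scenario) {
  switch (scenario) {
    case AudioScenario::kVoip:
      // Speech: lower rate, mono, DTX on to save bandwidth during silence.
      return {16000, 24000, 1, true};
    case AudioScenario::kMusic:
      // Music: full-band stereo, higher bitrate, DTX off for fidelity.
      return {48000, 128000, 2, false};
  }
  return {48000, 64000, 1, false};  // unreachable fallback
}
```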