Background

In the online classroom scenario, sound is one of the most important channels for delivering content, and keeping it stable and reliable is a critical part of classroom quality. At the same time, many feature modules in an online classroom involve sound, so handling the sound conflicts between modules becomes an important topic.

AVAudioSession

On iOS, it is hard to talk about sound without AVAudioSession. Its role is to manage access to the audio hardware, the device's single shared resource, and to adapt that resource to an app's functional needs by configuring the appropriate session. When the audio scene switches, the AVAudioSession must be switched accordingly.

Set the audio session category, mode, and options:

    // Get the singleton instance.
    let audioSession = AVAudioSession.sharedInstance()
    do {
        // Set the audio session category, mode, and options.
        try audioSession.setCategory(.playback, mode: .moviePlayback, options: [])
    } catch {
        print("Failed to set audio session category: \(error)")
    }


AVAudioSession.Category

Categories define sets of audio behaviors, and each category has a corresponding mode that can refine it; as described under AVAudioSession.Mode below, the configurations provided by the playback, record, and playAndRecord categories can be fine-tuned with audio session modes. AVAudioSession divides audio usage scenarios into seven categories. Setting the session to a category controls:

  • Whether activating the session interrupts the audio of other apps that do not support mixing
  • Whether audio is silenced when the user flips the Ring/Silent switch or locks the screen
  • Whether recording is supported in the current state
  • Whether playback is supported in the current state

Every app starts in the default state: it interrupts other apps' audio, and it is silenced by the Ring/Silent switch and the screen lock. The behavior of each category breaks down as in the following table:
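This default is easy to observe by reading the session before configuring anything (a minimal sketch; the values reflect whatever state the session is in when it runs):

```swift
import AVFoundation

// Before any configuration, a freshly launched app reports
// the default category and empty options.
let session = AVAudioSession.sharedInstance()
print(session.category)        // .soloAmbient on a fresh launch
print(session.categoryOptions) // no options set by default
```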

| Category | Silenced by the Silent switch or screen lock | Interrupts non-mixable apps' audio | Recording / playback |
| --- | --- | --- | --- |
| .ambient | Yes | No | Playback only |
| .soloAmbient (default) | Yes | Yes | Playback only |
| .playback | No | Yes by default | Playback only |
| .record | No | Yes | Recording only |
| .playAndRecord | No | No by default | Recording and playback |
| .multiRoute | No | Yes | Recording and playback |
| .audioProcessing (deprecated) | — | — | Neither |
  1. .ambient: playback only, and it mixes with other apps' audio. For example, a game that lets the user keep listening to QQ Music sets its background sound to this category. Audio is silenced when the user locks the screen or flips the Silent switch. This is the usual category for apps whose audio is purely background sound.

  2. .soloAmbient: also playback only, but unlike .ambient it does not let QQ Music keep playing; apps that should not be disturbed by other audio, such as a rhythm game, use it. It is likewise silenced by the Silent switch or screen lock (with the screen locked you cannot play the rhythm game anyway).

  3. .playback: what if audio should keep playing after the screen locks? Use this category. The app itself is a player, and while it plays, apps like QQ Music cannot, so this category is typically used by player apps.

  4. .record: paired with playback there must be recording; WeChat voice messages, for example, use this category. Since recording wants quiet, other playback such as QQ Music is interrupted. Think of the WeChat voice scenario to know when to use it.

  5. .playAndRecord: what if the app needs to play and record at the same time? VoIP and phone-call-style scenarios are exactly what this category was designed for.

  6. .multiRoute: imagine a DJ app where the phone plays the current track through an HDMI-connected speaker while cueing the next track in the headphones. A niche scenario, but this category supports multiple input and output devices.

  7. .audioProcessing: mainly for offline audio format processing, usable together with AudioUnit (deprecated).
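To make the mapping concrete, choosing a category per scenario might look like this sketch (the scenario enum and function are hypothetical illustrations, not part of any API):

```swift
import AVFoundation

/// Hypothetical app scenarios mapped to session categories.
enum AudioScenario {
    case backgroundMusic, rhythmGame, player, voiceMemo, voip
}

func preferredCategory(for scenario: AudioScenario) -> AVAudioSession.Category {
    switch scenario {
    case .backgroundMusic: return .ambient        // mixes with other apps, silenced on lock
    case .rhythmGame:      return .soloAmbient    // interrupts other audio, silenced on lock
    case .player:          return .playback       // keeps playing when locked
    case .voiceMemo:       return .record         // input only
    case .voip:            return .playAndRecord  // simultaneous input and output
    }
}

do {
    try AVAudioSession.sharedInstance().setCategory(preferredCategory(for: .voip))
} catch {
    print("Failed to set category: \(error)")
}
```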

AVAudioSession.CategoryOptions

CoreAudio's approach is to set a baseline behavior with one of the seven categories and then fine-tune it. For that fine-tuning, it provides a set of options per category.

| Option | Applicable categories | Role |
| --- | --- | --- |
| mixWithOthers | playAndRecord, playback, multiRoute | Mix with audio from other background apps |
| duckOthers | ambient, playAndRecord, playback, multiRoute | Lower (duck) other apps' audio |
| interruptSpokenAudioAndMixWithOthers | playAndRecord, playback, multiRoute | Mix with other apps, but pause their spoken audio while this app plays |
| allowBluetooth | record, playAndRecord | Support Bluetooth (HFP) headsets |
| allowBluetoothA2DP | playAndRecord | Allow higher-quality Bluetooth A2DP output |
| allowAirPlay | playAndRecord | Allow AirPlay devices |
| defaultToSpeaker | playAndRecord | Route output to the built-in speaker by default |
| overrideMutedMicrophoneInterruption | playAndRecord (iOS 14.5+) | Avoid interruption when the hardware microphone mute is toggled |
  • mixWithOthers: if you use AVAudioSessionCategoryPlayback for background audio but still want QQ Music to play, setting this option under the playback category lets the two coexist.
  • duckOthers: in real-time call scenarios, for example, QQ Music's volume is automatically lowered during a video call; this option suppresses other apps' audio.
  • allowBluetooth: this option must be set to support Bluetooth headsets.
  • defaultToSpeaker: this option is required to enable speakerphone by default in VoIP mode.
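Putting the options together, a typical VoIP-style configuration could be sketched as follows (a minimal example, not a complete call setup):

```swift
import AVFoundation

// VoIP-style session: play and record, coexist with other apps' audio,
// default output to the speaker, and allow Bluetooth headsets.
let session = AVAudioSession.sharedInstance()
do {
    try session.setCategory(.playAndRecord,
                            options: [.mixWithOthers, .defaultToSpeaker, .allowBluetooth])
    try session.setActive(true)
} catch {
    print("Audio session configuration failed: \(error)")
}
```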

AVAudioSession.Mode

The category sets the baseline sound behavior; modes then refine that behavior for a specific use case.

| Mode | Applicable categories | Scenario |
| --- | --- | --- |
| default | All categories | The default audio session mode. |
| gameChat | playAndRecord | Set by the GameKit framework on apps that use GameKit's voice chat service. |
| measurement | playAndRecord, record, playback | The app is measuring audio input or output; the system applies minimal signal processing. |
| moviePlayback | playback | The app is playing back movie content; video playback. |
| spokenAudio | playback | Continuous spoken audio; pauses when another app plays a short audio prompt. |
| videoChat | playAndRecord | The app is engaging in online video conferencing; video calls. |
| videoRecording | playAndRecord, record | The app is recording a movie. |
| voiceChat | playAndRecord | Two-way voice communication, such as VoIP. |
| voicePrompt | playback, playAndRecord | The app plays audio using text-to-speech. |

Each mode is valid only with certain categories, so there are not 7 × 9 = 63 combinations. If a mode does not exist under the current category, setting it fails. After setting a category, you can check the availableModes attribute to see which modes are supported and verify validity.

  • default: the default mode for every category; set it to restore the default behavior.
  • voiceChat: applies to VoIP scenarios. The system selects the best input device, for example the headset microphone once headphones are plugged in. As a side effect, it adds AVAudioSessionCategoryOptionAllowBluetooth to the category options to support Bluetooth headsets.
  • videoChat: mainly for video calls, such as QQ video calls or FaceTime. The system likewise selects the best input device (e.g., the headset mic when headphones are plugged in) and adds AVAudioSessionCategoryOptionAllowBluetooth and AVAudioSessionCategoryOptionDefaultToSpeaker to the category options.
  • gameChat: applies to capture and playback in game apps, via a GKVoiceChat object; it is not set manually.

The remaining modes are less relevant to audio-centric apps; generally we only need to care about the VoIP and video call modes. Set the mode after setting the category.

Of course, these modes are just CoreAudio's built-in presets and do not necessarily meet every requirement; they can be tweaked. From iOS 10 onward there is a combined interface, -(BOOL)setCategory:(NSString *)category mode:(NSString *)mode options:(AVAudioSessionCategoryOptions)options error:(NSError **)outError, while on iOS 9 and earlier the mode can only be set after the category. The two approaches are essentially the same; the combined call can be considered API sugar, a convenience wrapper.
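A defensive sketch of mode setting, validating against availableModes before applying (assumes iOS 10+ for the combined call):

```swift
import AVFoundation

let session = AVAudioSession.sharedInstance()
do {
    try session.setCategory(.playAndRecord)
    // Only apply the mode if the current category actually supports it.
    if session.availableModes.contains(.videoChat) {
        try session.setMode(.videoChat)
    }
    // From iOS 10, category, mode, and options can be set in one call.
    try session.setCategory(.playAndRecord, mode: .videoChat, options: [.allowBluetooth])
} catch {
    print("Audio session setup failed: \(error)")
}
```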

Call volume and media volume

Generally speaking, call volume is the volume used for voice and video calls, while media volume is the volume used for playing music, video, game sound effects, and background audio.

In practice, the differences are:

  • Call volume comes with strong echo cancellation; media volume offers better sound quality.
  • Media volume can be lowered to 0; call volume cannot.

Only one of call volume and media volume can be in effect at a time, so the two must be distinguished. The system volume adjusted with the device's buttons maps to the call volume during a call and to the media volume otherwise. Media volume and call volume belong to two different, independent systems; setting one does not affect the other. After entering a call, playback loudness is controlled by the call volume; after leaving it, by the media volume. In the education scenario, when a student is only in the audience, the media volume is used so the teacher's voice sounds fuller and more dimensional; when the student goes on mic, the call volume is used to guarantee call quality.

In short, media volume is used when no one is on mic and call volume when someone is, and the two have separate volume control mechanisms.

When a media resource is played with a player such as AVPlayer, the player's underlying AudioUnit uses the RemoteIO description.

The RTC SDK internally maintains an AudioUnit whose description is VoiceProcessingIO under call volume and RemoteIO under media volume. When the volume mode switches, the original AudioUnit is destroyed and a new one created, so an AudioUnit for audio playback is always kept alive.

Under call volume, the output of RemoteIO AudioUnits such as AVPlayer's is suppressed. Likewise, under media volume the RTC SDK's AudioUnit uses the RemoteIO description, so if another module switches to call volume by changing AVAudioSession, the RTC sound is suppressed.

Industry status and problems

In the online classroom scenario, many features play sound: in-class audio and video playback, after-class replay, courseware embedded in a webView (audio, video, and sound effects), classroom audio and video, in-class game sounds and UI sound effects, and so on.

The classroom also has many recording features: mic linking, follow-along reading, group speaking, voice chat input, speech recognition, and so on.

These features appear in the classroom in all kinds of combinations, each with different requirements on AVAudioSession, yet AVAudioSession is a singleton. Without unified management logic, chaotic and conflicting settings easily occur. The problems most commonly seen in the industry are RTC audio becoming inaudible and media audio being suppressed.

The RTC sound cannot be heard

The main reason RTC audio becomes inaudible is that another feature sets AVAudioSession without including AVAudioSessionCategoryOptionMixWithOthers in the options, so the RTC audio is interrupted by a higher-priority process. For example, when audio embedded in a webView plays in non-mixing mode, the WebView plays sound in a system process with the highest priority, so the RTC audio in the app's own process is suppressed and cannot play normally.

Problems of this kind are usually well hidden. When only a single simple scenario is involved they are generally caught in testing before release, but when several features are chained together the trigger is hard to hit during testing. Without a complete log query system, diagnosing such problems online is also very difficult, and the root cause often remains unlocated for a long time.

Media voices are suppressed

Under call volume, media audio is ducked and therefore sounds quieter. A common case is the small-class scenario: while students are publishing streams (on mic), in-class audio/video and other media resources play at a lower volume than the RTC voice, so the media sound cannot be heard clearly.

In call mode (with mics on), media audio is lowered because iOS enables echo cancellation to protect the voice experience, which reduces the level of the media channel and background sounds.

Some leading apps in the education industry have not fundamentally solved this problem; many sidestep it at the product level, trading product compromises for technical ones. For example, while in-class audio/video resources are playing, all students' mics are forced off by default; with the mics off, students' media volume is no longer in its suppressed state, and students are allowed to turn their mics back on only after playback ends. Solving a problem by avoiding the scenarios that trigger it is not a reusable approach.

The RTC voice becomes quieter

The main reason RTC sound becomes quieter is that output is routed through the earpiece rather than the speaker, so the sound merely appears reduced. In addition, on iOS 14, after using RTC in call mode and switching back to media mode, calling setCategory with PlayAndRecord + DefaultToSpeaker can trigger this low-volume problem.

The solution

For the industry pain points above, combining analysis of the underlying principles with real project experience, we organized a workable set of solutions covering coding conventions, a fallback strategy, and an alarm mechanism.

The RTC voice is not heard, or becomes quieter

RTC sound problems essentially come down to another module changing AVAudioSession and failing to restore the settings RTC needs once its feature ends. The audio/video SDKs themselves (Agora, Zego, etc.) carry some fallback logic for this situation, but because that logic is intrusive it is not always appropriate, so it has limitations.

AudioSession modification conventions

Within a single process, the system cannot tell which module changed the AudioSession. To avoid losing RTC audio, other modules need to follow these principles while RTC is in use:

  1. Before calling setCategory, a module should first check whether the current AudioSession already meets its needs; if so, do not set it again, which also avoids triggering the iOS 14 system bug.

    • When a module needs to record, it should use the PlayAndRecord category (not Record, which would interrupt any audio being played), and call setCategory only if the current category is not already PlayAndRecord.
    • When a module only needs playback and the current category is PlayAndRecord or Playback, setCategory is unnecessary.
  2. If the current category is not suitable, the module should save the current AudioSession state before calling setCategory, then use the audio feature; when finished, it should reset the AudioSession to the saved state.

  3. When setting the AudioSession, categoryOptions should contain AVAudioSessionCategoryOptionDefaultToSpeaker and AVAudioSessionCategoryOptionMixWithOthers, and on iOS 10 and above also AVAudioSessionCategoryOptionAllowBluetooth.

The core code is as follows:

    if ([AVAudioSession sharedInstance].category != AVAudioSessionCategoryPlayAndRecord) {
        // Save the current audio session before modifying it.
        [RTCAudioSessionCacheManager cacheCurrentAudioSession];
        AVAudioSessionCategoryOptions categoryOptions = AVAudioSessionCategoryOptionDefaultToSpeaker | AVAudioSessionCategoryOptionMixWithOthers;
        if (@available(iOS 10.0, *)) {
            categoryOptions |= AVAudioSessionCategoryOptionAllowBluetooth;
        }
        [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord withOptions:categoryOptions error:nil];
        [[AVAudioSession sharedInstance] setActive:YES error:nil];
    }

    // When the feature ends, reset the audio session.
    [RTCAudioSessionCacheManager resetToCachedAudioSession];
    static AVAudioSessionCategory cachedCategory = nil;
    static AVAudioSessionCategoryOptions cachedCategoryOptions = 0;
    static BOOL hasCachedAudioSession = NO;

    @implementation RTCAudioSessionCacheManager

    // Cache the current audioSession settings before the RTC-related modification.
    + (void)cacheCurrentAudioSession {
        if (![[AVAudioSession sharedInstance].category isEqualToString:AVAudioSessionCategoryPlayback] &&
            ![[AVAudioSession sharedInstance].category isEqualToString:AVAudioSessionCategoryPlayAndRecord]) {
            return;
        }
        @synchronized (self) {
            cachedCategory = [AVAudioSession sharedInstance].category;
            cachedCategoryOptions = [AVAudioSession sharedInstance].categoryOptions;
            hasCachedAudioSession = YES;
        }
    }

    // Reset to the cached audioSession settings.
    + (void)resetToCachedAudioSession {
        if (!hasCachedAudioSession) {
            return;
        }
        BOOL needResetAudioSession = ![[AVAudioSession sharedInstance].category isEqualToString:cachedCategory]
            || [AVAudioSession sharedInstance].categoryOptions != cachedCategoryOptions;
        if (needResetAudioSession) {
            dispatch_async(dispatch_get_global_queue(0, 0), ^{
                [[AVAudioSession sharedInstance] setCategory:cachedCategory withOptions:cachedCategoryOptions error:nil];
                [[AVAudioSession sharedInstance] setActive:YES error:nil];
                @synchronized (self) {
                    cachedCategory = nil;
                    cachedCategoryOptions = 0;
                    hasCachedAudioSession = NO;
                }
            });
        }
    }

    @end

Fallback strategy

Given the complexity of the online classroom, making every piece of classroom feature code follow the AVAudioSession modification conventions still carries human risk, even with strict code review. As business features keep iterating, we cannot fully guarantee nothing goes wrong online, so a reliable fallback strategy is essential.

The basic idea of the fallback is to hook changes to AVAudioSession: whenever a module's settings do not meet the conventions, we forcibly amend them in a way that does not affect that module's functionality, for example by adding the mixing option to the options.

By exchanging method implementations (method swizzling) we can hook AVAudioSession's changes, for example swapping kk_setCategory:withOptions:error: with the system's setCategory:withOptions:error:. In the swapped-in method, we check whether the options contain AVAudioSessionCategoryOptionMixWithOthers and append it if they do not.

    - (BOOL)kk_setCategory:(AVAudioSessionCategory)category withOptions:(AVAudioSessionCategoryOptions)options error:(NSError **)outError {
        // In scenarios that need fixing (RTC live), if the options do not
        // contain mixWithOthers, append it.
        BOOL addMixWithOthersEnable = shouldFixAudioSession && !(options & AVAudioSessionCategoryOptionMixWithOthers);
        if (addMixWithOthersEnable) {
            return [self kk_setCategory:category withOptions:options | AVAudioSessionCategoryOptionMixWithOthers error:outError];
        }
        return [self kk_setCategory:category withOptions:options error:outError];
    }

But this hook only covers calls that go through setCategory:withOptions:error:. If a module sets the AVAudioSession by calling setCategory:error:, that method defaults the options to 0, which does not include AVAudioSessionCategoryOptionMixWithOthers.

After hooking setCategory:error:, rather than adjusting its parameters to add the mixing option, the swapped-in method could call setCategory:withOptions:error: with the options set to AVAudioSessionCategoryOptionMixWithOthers, which would meet our requirement.

The problem is that setCategory:withOptions:error: internally makes a nested call to setCategory:error:, which we have also hooked and whose swapped-in method calls setCategory:withOptions:error: again, forming an infinite loop.

To address this, we instead listen for AVAudioSessionRouteChangeNotification to hook category changes. The notification fires when setCategory:error: triggers a change, rather than directly when setCategory:withOptions:error: is called, so it forms a good complement to the swizzling approach above.

    // Add the AVAudioSessionRouteChange observer.
    [[NSNotificationCenter defaultCenter] addObserver:self
                                             selector:@selector(handleRouteChangeNotification:)
                                                 name:AVAudioSessionRouteChangeNotification
                                               object:nil];

    - (void)handleRouteChangeNotification:(NSNotification *)notification {
        NSNumber *reasonNumber = notification.userInfo[AVAudioSessionRouteChangeReasonKey];
        AVAudioSessionRouteChangeReason reason = (AVAudioSessionRouteChangeReason)reasonNumber.unsignedIntegerValue;
        if (reason == AVAudioSessionRouteChangeReasonCategoryChange) {
            AVAudioSessionCategoryOptions currentCategoryOptions = [AVAudioSession sharedInstance].categoryOptions;
            AVAudioSessionCategory currentCategory = [AVAudioSession sharedInstance].category;
            // In scenarios that need fixing (RTC live), if the new category's
            // options do not contain mixWithOthers, append it.
            if (shouldFixAudioSession && !(currentCategoryOptions & AVAudioSessionCategoryOptionMixWithOthers)) {
                [[AVAudioSession sharedInstance] setCategory:currentCategory withOptions:currentCategoryOptions | AVAudioSessionCategoryOptionMixWithOthers error:nil];
            }
        }
    }

Alarm mechanism

Even with the modification conventions and the fallback strategy in place, the ongoing iteration of classroom features and iOS upgrades means we cannot guarantee zero problems online. So we built an alarm mechanism: when a problem occurs online, an alert arrives in the work group in time, and its details guide further investigation through the logs. The alarm mechanism lets us respond to online problems quickly, rather than passively relying on students' complaints and feedback, and push them to resolution as fast as possible.

When the RTC sound is interrupted, the underlying audio/video SDK reports a warning code (for example, Agora's warningCode 1025). When that code appears, it is posted as a message to the Lark (Feishu) group through Slardar's alarm feature. Meanwhile, when a change to AVAudioSession is hooked, the call stack can be captured to identify which module triggered the change; combined with the user information attached to the alert, this makes the problem much easier to locate.

Media voices are suppressed

When media audio is playing at media volume and the session switches to call volume because someone connects their mic, the system's behavior suppresses the media volume under the call volume, so the sound becomes quieter.

To solve this, we use the audio mixing and stream mixing capabilities provided by the audio/video SDK. The basic principle: when playing a media resource, we take its PCM audio data and push it to the RTC SDK's AudioUnit to be mixed, so everything is played uniformly by RTC's playback unit. If RTC uses call volume, the media resource also plays at call volume, and vice versa. Media resources and RTC thus always share the same volume control mechanism, eliminating the loudness difference.

Audio mixing means handing the SDK a local file path or URL, which the SDK reads and plays itself. Stream mixing means the player decodes the media file itself: it plays the video frames and throws the decoded audio data to the SDK in real time, and the SDK mixes that incoming audio with the RTC audio and plays it. In our project we use the on-demand player SDK TTVideoEngine for video playback and audio forwarding.
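The stream-mixing path can be sketched as follows. The protocol and method names here are hypothetical placeholders standing in for a concrete RTC SDK's external-audio interface, not any vendor's actual API:

```swift
import Foundation

// Hypothetical RTC SDK surface for the two approaches, for illustration only.
protocol RTCAudioMixing {
    // Audio mixing: hand the SDK a file/URL to decode and play by itself.
    func startAudioMixing(filePath: String)
    // Stream mixing: push externally decoded PCM frames for the SDK to mix.
    func pushExternalAudioFrame(_ pcm: Data, sampleRate: Int, channels: Int)
}

final class CoursewarePlayer {
    let rtc: RTCAudioMixing
    init(rtc: RTCAudioMixing) { self.rtc = rtc }

    // The player decodes the media itself and forwards each PCM buffer to RTC,
    // so playback always follows RTC's current (call or media) volume mechanism.
    func onDecodedAudio(_ pcm: Data) {
        rtc.pushExternalAudioFrame(pcm, sampleRate: 44100, channels: 2)
    }
}
```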

Conclusion

Through the combined solution introduced above, the sound problems were effectively resolved while keeping up with the needs of the rapidly iterating classroom, noticeably improving the online classroom experience.


By the ByteDance Technology Team. Link: juejin.cn/post/693498… The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.