An overview of AV Foundation audio
This article covers some of the audio features of AV Foundation, including:
- Digital audio
- Audio playback and recording
- Text to speech
Digital audio
Sound is a wave produced by a vibrating object. It propagates through a medium (such as air) and can be perceived by human or animal hearing. In essence, the vibrating object makes the surrounding medium vibrate as well, producing alternating regions of compression and rarefaction, that is, a longitudinal wave.
Sound has three important characteristics: pitch, loudness, and timbre. Pitch is how high or low a sound is and is determined by frequency: the higher the frequency, the higher the pitch (frequency is measured in Hz, hertz). The human ear hears from about 20 Hz to 20,000 Hz; sound below 20 Hz is infrasound, and sound above 20,000 Hz is ultrasound. Loudness is the subjectively perceived size of a sound (commonly called volume) and is determined by the amplitude and the distance between the listener and the source: the larger the amplitude and the shorter the distance, the greater the loudness. Timbre, also called tone quality, is determined by the waveform of the sound.
The sound waveform is shown below:
- The x-axis shows the frequency
- The y-axis shows the amplitude
Digital audio is the technology of recording, storing, editing, compressing, and playing sound by digital means. Because a sound can be decomposed by the Fourier transform into a superposition of sine waves of different frequencies and intensities, an analog signal can be converted into a digital one and stored on a computer as 0s and 1s.
Digital audio involves two important variables. One is the sampling frequency, that is, how often the signal is measured: the shorter the sampling period, the higher the frequency and the more faithfully the sound is reproduced. The other is the bit depth (quantization), that is, how many bits are used to store each sample. A computer cannot store values with unlimited precision, so each sampled value has to be rounded; common bit depths are 8, 16, and 32 bits.
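To make these two variables concrete, here is a small sketch in plain Swift (not tied to any audio framework) that samples a 440 Hz sine wave at 44.1 kHz and quantizes each sample to 16 bits; the tone, rate, and duration are arbitrary example values:

import Foundation

let sampleRate = 44_100.0              // sampling frequency in Hz
let toneFrequency = 440.0              // frequency of the tone in Hz
let duration = 0.01                    // seconds of audio to generate

let sampleCount = Int(sampleRate * duration)
let samples: [Int16] = (0..<sampleCount).map { n in
    // Sample the continuous wave at discrete points in time...
    let t = Double(n) / sampleRate
    let value = sin(2.0 * Double.pi * toneFrequency * t)   // in the range -1.0 ... 1.0
    // ...and quantize each value to a 16-bit integer (the bit depth)
    return Int16(value * Double(Int16.max))
}
print("Generated \(samples.count) samples at \(Int(sampleRate)) Hz, 16-bit")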
The following figure shows the sampled data of a digital audio signal:
Digitizing sound takes a lot of space if the raw data is kept without any compression. For example, one minute of 44.1 kHz, 16-bit LPCM audio takes roughly 10 MB of storage (assuming two channels: 44,100 samples/s × 16 bits × 2 ≈ 1.4 Mbit/s, or about 10 MB per minute). The industry has therefore produced a number of standard formats for compressing digital audio. Some commonly used formats are:
- WAV: an audio format developed by Microsoft; it supports compressed audio but is usually used to store uncompressed, lossless audio
- MP3: a common lossy compression format used to dramatically reduce the size of audio files
- AAC: currently one of the most popular formats; compared with MP3 it offers better sound quality at a smaller size, and in the ideal case can compress a file to about 1/18 of its original size
- APE: a lossless compression format that can shrink a file to roughly half its original size
- FLAC: a free, open lossless compression format
Audio playback and recording
iOS has many audio-related frameworks, including high-level frameworks such as AVKit and AV Foundation and low-level frameworks such as Core Audio and Core Media. AV Foundation wraps these low-level frameworks and exposes them through high-level interfaces that are convenient for developers. The whole iOS audio and video processing stack is shown in the figure below:
Audio session
When processing audio with AV Foundation, one core object is involved: AVAudioSession, the audio session. The audio session is an intermediary between the application and the operating system; it describes, in semantic terms, how the app intends to use audio so the system can manage audio behavior on its behalf.
To use the audio session, you need to configure its category, AVAudioSession.Category. Different categories grant different capabilities, as shown in the following table:
Category | Typical use | Mixing with other audio | Audio input | Audio output
---|---|---|---|---
Ambient | Games, productivity apps | allowed | | allowed
Solo Ambient | Games, productivity apps | | | allowed
Playback | Audio and video players | optional | | allowed
Record | Recorders, audio capture | | allowed |
Play and Record | VoIP, voice chat | optional | allowed | allowed
Audio Processing | Offline sessions and processing | | |
Multi-Route | Advanced A/V applications using external hardware | | allowed | allowed
An AVAudioSession instance is a singleton within the application. Developers cannot create one directly with an initializer; it must be obtained through the singleton method sharedInstance(). The session configuration can be modified at any point in the application's life cycle, but it is usually configured once, typically in application(_:didFinishLaunchingWithOptions:), as shown in the following code:
do {
    try AVAudioSession.sharedInstance().setCategory(.playAndRecord)
    try AVAudioSession.sharedInstance().setActive(true, options: [])
} catch {
    // Handle the error, for example by logging it
}
AVAudioPlayer plays audio
AVAudioPlayer is the first choice for audio playback with AV Foundation, and arguably for iOS audio playback in general. It provides all the core functionality of Audio Queue Services and is well suited to playing local files and to scenarios that are not latency-sensitive.
AVAudioPlayer provides two main initializers: init(contentsOf:) for a local file URL and init(data:) for in-memory audio data. A typical creation sequence looks like this:

let fileURL: URL = ... // a local audio file URL
self.player = try? AVAudioPlayer(contentsOf: fileURL)
self.player?.prepareToPlay()
It is recommended to call prepareToPlay() during setup: it acquires the audio hardware and preloads the Audio Queue buffers before play() is called, which reduces the delay between calling play() and actually hearing sound. If prepareToPlay() is not called explicitly, it is invoked implicitly when play() is called.
AVAudioPlayer provides a set of methods that control the playback life cycle:
- play(): plays the audio, and resumes audio that was paused or stopped
- pause(): pauses playback; playback can be resumed with play(), and the resources prepared by prepareToPlay() are kept
- stop(): stops playback and releases the resources prepared by prepareToPlay(); playback can still be resumed with play()
AVAudioPlayer also exposes several properties that control playback, as shown below (a combined usage sketch follows this list):
- volume: the player's volume, a floating-point value from 0.0 to 1.0
- pan: the stereo pan position, from -1.0 (hard left) to 1.0 (hard right); the default is 0.0 (centered)
- rate: the playback rate, from 0.5 (half speed) to 2.0 (double speed)
- numberOfLoops: the number of additional times the audio loops; 0 (the default) plays once, n > 0 loops n more times, and -1 loops indefinitely until the player is stopped
- isMeteringEnabled: whether audio-level metering is enabled, so metering data can be read and visualized
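Below is a minimal usage sketch that combines these properties with the life-cycle methods; the bundled file name song.mp3 and the force-unwraps are placeholders for brevity:

import AVFoundation

// Assumes an audio file bundled with the app; "song.mp3" is a placeholder name
let url = Bundle.main.url(forResource: "song", withExtension: "mp3")!
let player = try! AVAudioPlayer(contentsOf: url)

player.enableRate = true         // must be set before prepareToPlay() for rate to take effect
player.rate = 1.5                // play at 1.5x speed
player.volume = 0.8              // 80% volume
player.pan = 0.0                 // centered
player.numberOfLoops = -1        // loop until stopped
player.isMeteringEnabled = true  // allow reading level-metering data

player.prepareToPlay()
player.play()
// ... later
player.pause()                   // keeps the prepared resources; play() resumes
player.stop()                    // releases the prepared resources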
Background playback
A very common requirement is that after the app leaves the foreground, it keeps playing audio in the background until the user stops it.
Playing audio in the background is not difficult. There are only two steps:
- Set the audio session category to playback, which lets audio keep playing even when the device is switched to silent
- Add the Required background modes array to the app's Info.plist and include the item App plays audio or streams audio/video using AirPlay
With these two steps, audio playback can continue in the background.
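A minimal sketch of the corresponding session setup (assuming the Info.plist entry above is already in place):

import AVFoundation

func configureBackgroundPlayback() {
    // Requires the "audio" entry in UIBackgroundModes (Info.plist) for background playback
    do {
        try AVAudioSession.sharedInstance().setCategory(.playback)
        try AVAudioSession.sharedInstance().setActive(true, options: [])
    } catch {
        print("Failed to configure the audio session: \(error)")
    }
}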
Interruption handling
Sometimes audio playback is interrupted by a phone call or a FaceTime call. When the user declines the call or the call ends, the audio should resume from where it was paused.
Implementing this behavior relies on AVAudioSession's interruption notification. By observing this notification, the app is told when an interruption begins and when it ends. The sample code is as follows:
func setupNotifications() {
    let nc = NotificationCenter.default
    nc.addObserver(self,
                   selector: #selector(handleInterruption),
                   name: AVAudioSession.interruptionNotification,
                   object: AVAudioSession.sharedInstance())
}

@objc func handleInterruption(notification: Notification) {
    // See the expanded example below
}
- The interruption notification carries a userInfo dictionary with the information needed to decide how the audio should behave, i.e. whether to pause or resume
- handleInterruption(notification:) is where interruption notifications are handled in one place
Example code for handling the interruption notification in handleInterruption(notification:):
@objc func handleInterruption(notification: Notification) {
    guard let userInfo = notification.userInfo,
          let typeValue = userInfo[AVAudioSessionInterruptionTypeKey] as? UInt,
          let type = AVAudioSession.InterruptionType(rawValue: typeValue) else {
        return
    }

    switch type {
    case .began:
        // The interruption began: pause playback and update the UI
        break
    case .ended:
        // The interruption ended: decide whether playback should resume
        guard let optionsValue = userInfo[AVAudioSessionInterruptionOptionKey] as? UInt else { return }
        let options = AVAudioSession.InterruptionOptions(rawValue: optionsValue)
        if options.contains(.shouldResume) {
            // The system suggests resuming playback
        } else {
            // Do not resume automatically; wait for the user
        }
    default: ()
    }
}
Route change handling
A common scenario in music apps is that the audio route changes, for example from the built-in speaker to headphones or from headphones back to the speaker. Continuing to play through the speaker after the headphones are unplugged can be undesirable, because what the user is listening to may be private.
For this scenario, AVAudioSession provides a route change notification. When the audio route on the device changes (for example, from the speaker to headphones), AVAudioSession.routeChangeNotification is posted, and the developer should play or pause audio in accordance with iOS user-experience guidelines.
Example code for listening for route change notifications is as follows:
func setupNotifications() {
    let nc = NotificationCenter.default
    nc.addObserver(self,
                   selector: #selector(handleRouteChange),
                   name: AVAudioSession.routeChangeNotification,
                   object: nil)
}

@objc func handleRouteChange(notification: Notification) {
}
- This notification is posted when the audio output or the output device changes
- The notification carries a userInfo dictionary from which you can read the reason for the change and a description of the previous route
Example code for handleRouteChange(notification:):
@objc func handleRouteChange(notification: Notification) {
    // Get the reason for the route change
    guard let userInfo = notification.userInfo,
          let reasonValue = userInfo[AVAudioSessionRouteChangeReasonKey] as? UInt,
          let reason = AVAudioSession.RouteChangeReason(rawValue: reasonValue) else {
        return
    }

    // Act on the reason for the change
    switch reason {
    case .newDeviceAvailable: // A new device (such as headphones) became available
        let session = AVAudioSession.sharedInstance()
        // headphonesConnected is assumed to be a Bool property on the containing class
        headphonesConnected = hasHeadphones(in: session.currentRoute)
    case .oldDeviceUnavailable: // The old device was disconnected
        // Inspect the previous route description
        if let previousRoute =
            userInfo[AVAudioSessionRouteChangePreviousRouteKey] as? AVAudioSessionRouteDescription {
            headphonesConnected = hasHeadphones(in: previousRoute)
        }
    default: ()
    }
}

func hasHeadphones(in routeDescription: AVAudioSessionRouteDescription) -> Bool {
    // Check whether any output port is a headphone port
    return !routeDescription.outputs.filter({ $0.portType == .headphones }).isEmpty
}
AVAudioRecorder records audio
AVAudioRecorder is the AV Foundation interface used for audio recording. It is a high-level wrapper around Audio Queue Services, and recording with it is not complicated.
Creating an AVAudioRecorder is very simple; there are two main steps:
- Create a URL that tells the recorder where to write the audio file
- Create a settings dictionary describing the format of the recorded audio and pass it to the recorder
Example code for creating an AVAudioRecorder:
do {
    self.recorder = try AVAudioRecorder(url: fileURL, settings: setting)
    self.recorder.delegate = self
    self.recorder.isMeteringEnabled = true
    self.recorder.prepareToRecord()
} catch {
    fatalError(error.localizedDescription)
}
- prepareToRecord() allocates the resources required for recording, including creating the output file, and minimizes the delay when recording starts
- The settings dictionary describes the audio format, sample rate, and so on
- The file extension in the URL must match the chosen audio format, otherwise you will run into problems
The settings dictionary specifies the recording format of the audio stream. Common keys are listed below (a sample dictionary follows the list):
- AVFormatIDKey: the audio format
- AVSampleRateKey: the sample rate
- AVNumberOfChannelsKey: the number of channels
- AVEncoderBitDepthHintKey: the bit depth
- AVEncoderAudioQualityKey: the audio quality
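A sample settings dictionary using these keys might look like the sketch below; AAC at 44.1 kHz, stereo, 16-bit is just one reasonable combination, not the only one:

import AVFoundation

let setting: [String: Any] = [
    AVFormatIDKey: Int(kAudioFormatMPEG4AAC),               // audio format: AAC
    AVSampleRateKey: 44_100.0,                              // sampling rate: 44.1 kHz
    AVNumberOfChannelsKey: 2,                               // two channels (stereo)
    AVEncoderBitDepthHintKey: 16,                           // 16-bit depth hint
    AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue  // audio quality
]
// With AAC, the file URL passed to AVAudioRecorder should end in .m4a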
When recording audio with AVAudioRecorder, set the audio session category to playAndRecord, create the AVAudioRecorder, and implement the AVAudioRecorderDelegate protocol. The protocol is very simple: it mainly provides callbacks for recording completion and encoding errors; most of its other methods are deprecated.
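A minimal sketch of the two delegate callbacks; the class name RecorderController is a placeholder:

import AVFoundation

class RecorderController: NSObject, AVAudioRecorderDelegate {
    var recorder: AVAudioRecorder?

    // Called when recording finishes or is stopped
    func audioRecorderDidFinishRecording(_ recorder: AVAudioRecorder, successfully flag: Bool) {
        print("Recording finished, success: \(flag)")
    }

    // Called when an encoding error occurs while recording
    func audioRecorderEncodeErrorDidOccur(_ recorder: AVAudioRecorder, error: Error?) {
        print("Encoding error: \(String(describing: error))")
    }
}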
Text to speech
AV Foundation provides speech synthesis interfaces for managing speech and speech synthesis. The most commonly used class is AVSpeechSynthesizer.
Adding text-to-speech to an app takes only two steps:
- Create an AVSpeechUtterance object with the string to speak and configure speech parameters such as the voice, rate, and pitch
- Hand the AVSpeechUtterance object to an AVSpeechSynthesizer instance, which controls the speech life cycle
A code example is as follows:
let utterance = AVSpeechUtterance(string: "The quick brown fox jumped over the lazy dog.")
utterance.rate = 0.57               // speech rate
utterance.pitchMultiplier = 0.8     // pitch
utterance.postUtteranceDelay = 0.2  // pause after the utterance
utterance.volume = 0.8              // volume

let voice = AVSpeechSynthesisVoice(language: "en-GB")
utterance.voice = voice

let synthesizer = AVSpeechSynthesizer()
synthesizer.speak(utterance)
AVSpeechSynthesizer also has a corresponding delegate, AVSpeechSynthesizerDelegate, which mainly provides life-cycle callbacks while the speech is being produced. Interested readers can look at the related API; a minimal sketch of the delegate follows.
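For reference, a minimal sketch of an AVSpeechSynthesizerDelegate that observes the start and end of an utterance; the class name SpeechObserver is a placeholder:

import AVFoundation

class SpeechObserver: NSObject, AVSpeechSynthesizerDelegate {
    // Called when the synthesizer begins speaking an utterance
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didStart utterance: AVSpeechUtterance) {
        print("Started speaking")
    }

    // Called when the synthesizer finishes speaking an utterance
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        print("Finished speaking")
    }
}
// Usage: assign an instance to synthesizer.delegate before calling speak(_:)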
Conclusion
This article mainly shares the following contents:
- The nature of digital audio
- Playing audio with AVAudioPlayer, including audio session categories, background playback, interruption handling, and route changes
- AVAudioRecorder records audio
- AV Foundation text to speech
This article is an introduction to audio processing with AV Foundation. I hope it helps you quickly understand digital audio and how to work with it on iOS. The examples in the article have corresponding sample code for reference (source portal). If anything in this article is wrong or poorly described, please point it out.
References
Voice (Baidu Encyclopedia)
Responding to Audio Session Interruptions
AVSpeechSynthesizerDelegate
Responding to Audio Session Route Changes
AVFAudio