An overview of AV Foundation audio
This article covers some of the audio features of AV Foundation, including:
- Digital audio
- Audio playback and recording
- Text to speech
Digital audio
Sound is a wave produced by a vibrating object. It propagates through a medium (such as air) and can be perceived by human or animal hearing. In essence, the vibrating object makes the surrounding medium vibrate as well, producing alternating regions of compression and rarefaction, that is, a longitudinal wave.
Sound has three important characteristics: pitch, loudness, and timbre. Pitch is how high or low a sound is and is determined by frequency: the higher the frequency, the higher the pitch (frequency is measured in Hz, hertz). The human ear hears from about 20 Hz to 20,000 Hz; sound below 20 Hz is infrasound, and sound above 20,000 Hz is ultrasound. Loudness is the subjectively perceived size of a sound (commonly called volume) and is determined by the amplitude and the distance between the listener and the source: the larger the amplitude and the shorter the distance, the greater the loudness. Timbre, also called tone quality, is determined by the waveform of the sound.
The sound waveform is shown below:
- The x-axis shows the frequency
- The y-axis shows the amplitude
Digital audio is the technology of recording, storing, editing, compressing, and playing sound by digital means. Because a sound can be decomposed by the Fourier transform into a superposition of sine waves of different frequencies and intensities, an analog signal can be converted into a digital one and stored on a computer as 0s and 1s.
Digital audio involves two important variables. One is the sampling frequency, that is, how often the signal is measured: the shorter the sampling period, the higher the frequency and the more faithfully the sound is reproduced. The other is the bit depth (quantization), that is, how many bits are used to store each sample. A computer cannot store values with unlimited precision, so each sampled value has to be rounded; common bit depths are 8, 16, and 32 bits.
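To make these two variables concrete, here is a small sketch in plain Swift (not tied to any audio framework) that samples a 440 Hz sine wave at 44.1 kHz and quantizes each sample to 16 bits; the tone, rate, and duration are arbitrary example values:

import Foundation

let sampleRate = 44_100.0              // sampling frequency in Hz
let toneFrequency = 440.0              // frequency of the tone in Hz
let duration = 0.01                    // seconds of audio to generate

let sampleCount = Int(sampleRate * duration)
let samples: [Int16] = (0..<sampleCount).map { n in
    // Sample the continuous wave at discrete points in time...
    let t = Double(n) / sampleRate
    let value = sin(2.0 * Double.pi * toneFrequency * t)   // in the range -1.0 ... 1.0
    // ...and quantize each value to a 16-bit integer (the bit depth)
    return Int16(value * Double(Int16.max))
}
print("Generated \(samples.count) samples at \(Int(sampleRate)) Hz, 16-bit")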
The following figure shows the sampled data of a digital audio signal:
Digitizing sound takes a lot of space if the raw data is kept without any compression. For example, one minute of 44.1 kHz, 16-bit LPCM audio takes roughly 10 MB of storage (assuming two channels: 44,100 samples/s × 16 bits × 2 ≈ 1.4 Mbit/s, or about 10 MB per minute). The industry has therefore produced a number of standard formats for compressing digital audio. Some commonly used formats are:
- WAV: an audio format developed by Microsoft; it supports compressed audio but is usually used to store uncompressed, lossless audio
- MP3: a common lossy compression format used to dramatically reduce the size of audio files
- AAC: currently one of the most popular formats; compared with MP3 it offers better sound quality at a smaller size, and in the ideal case can compress a file to about 1/18 of its original size
- APE: a lossless compression format that can shrink a file to roughly half its original size
- FLAC: a free, open lossless compression format
Audio playback and recording
iOS has many audio-related frameworks, including high-level frameworks such as AVKit and AV Foundation and low-level frameworks such as Core Audio and Core Media. AV Foundation wraps these low-level frameworks and exposes them through high-level interfaces that are convenient for developers. The whole iOS audio and video processing stack is shown in the figure below:
Audio session
When processing audio with AV Foundation, one core object is involved: AVAudioSession, the audio session. The audio session is an intermediary between the application and the operating system; it describes, in semantic terms, how the app intends to use audio so the system can manage audio behavior on its behalf.
To use the audio session, you need to configure its category, AVAudioSession.Category. Different categories grant different capabilities, as shown in the following table:
Category | Typical use | Mixing with other audio | Audio input | Audio output
---|---|---|---|---
Ambient | Games, productivity apps | allowed | | allowed
Solo Ambient | Games, productivity apps | | | allowed
Playback | Audio and video players | optional | | allowed
Record | Recorders, audio capture | | allowed |
Play and Record | VoIP, voice chat | optional | allowed | allowed
Audio Processing | Offline sessions and processing | | |
Multi-Route | Advanced A/V applications using external hardware | | allowed | allowed
An AVAudioSession instance is a singleton within the application. Developers cannot create one directly with an initializer; it must be obtained through the singleton method sharedInstance(). The session configuration can be modified at any point in the application's life cycle, but it is usually configured once, typically in application(_:didFinishLaunchingWithOptions:), as shown in the following code:
do {
    try AVAudioSession.sharedInstance().setCategory(.playAndRecord)
    try AVAudioSession.sharedInstance().setActive(true, options: [])
} catch {
    // Handle the error, for example by logging it
}
AVAudioPlayer plays audio
AVAudioPlayer is the first choice for audio playback with AV Foundation, and arguably for iOS audio playback in general. It provides all the core functionality of Audio Queue Services and is well suited to playing local files and to scenarios that are not latency-sensitive.
AVAudioPlayer provides two main initializers: init(contentsOf:) for a local file URL and init(data:) for in-memory audio data. A typical creation sequence looks like this:

let fileURL: URL = ... // a local audio file URL
self.player = try? AVAudioPlayer(contentsOf: fileURL)
self.player?.prepareToPlay()
It is recommended to call prepareToPlay() during setup: it acquires the audio hardware and preloads the Audio Queue buffers before play() is called, which reduces the delay between calling play() and actually hearing sound. If prepareToPlay() is not called explicitly, it is invoked implicitly when play() is called.
AVAudioPlayer provides a set of methods that control the playback life cycle:
- play(): plays the audio, and resumes audio that was paused or stopped
- pause(): pauses playback; playback can be resumed with play(), and the resources prepared by prepareToPlay() are kept
- stop(): stops playback and releases the resources prepared by prepareToPlay(); playback can still be resumed with play()
AVAudioPlayer also exposes several properties that control playback, as shown below (a combined usage sketch follows this list):
- volume: the player's volume, a floating-point value from 0.0 to 1.0
- pan: the stereo pan position, from -1.0 (hard left) to 1.0 (hard right); the default is 0.0 (centered)
- rate: the playback rate, from 0.5 (half speed) to 2.0 (double speed)
- numberOfLoops: the number of additional times the audio loops; 0 (the default) plays once, n > 0 loops n more times, and -1 loops indefinitely until the player is stopped
- isMeteringEnabled: whether audio-level metering is enabled, so metering data can be read and visualized
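Below is a minimal usage sketch that combines these properties with the life-cycle methods; the bundled file name song.mp3 and the force-unwraps are placeholders for brevity:

import AVFoundation

// Assumes an audio file bundled with the app; "song.mp3" is a placeholder name
let url = Bundle.main.url(forResource: "song", withExtension: "mp3")!
let player = try! AVAudioPlayer(contentsOf: url)

player.enableRate = true         // must be set before prepareToPlay() for rate to take effect
player.rate = 1.5                // play at 1.5x speed
player.volume = 0.8              // 80% volume
player.pan = 0.0                 // centered
player.numberOfLoops = -1        // loop until stopped
player.isMeteringEnabled = true  // allow reading level-metering data

player.prepareToPlay()
player.play()
// ... later
player.pause()                   // keeps the prepared resources; play() resumes
player.stop()                    // releases the prepared resources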
Background playback
A very common requirement is that after the app leaves the foreground, it keeps playing audio in the background until the user stops it.
Playing audio in the background is not difficult. There are only two steps:
- Set the audio session category to playback, which lets audio keep playing even when the device is switched to silent
- Add the Required background modes array to the app's Info.plist and include the item App plays audio or streams audio/video using AirPlay
With these two steps, audio playback can continue in the background.
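A minimal sketch of the corresponding session setup (assuming the Info.plist entry above is already in place):

import AVFoundation

func configureBackgroundPlayback() {
    // Requires the "audio" entry in UIBackgroundModes (Info.plist) for background playback
    do {
        try AVAudioSession.sharedInstance().setCategory(.playback)
        try AVAudioSession.sharedInstance().setActive(true, options: [])
    } catch {
        print("Failed to configure the audio session: \(error)")
    }
}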
Interruption handling
Sometimes audio playback is interrupted by a phone call or a FaceTime call. When the user declines the call or the call ends, the audio should resume from where it was paused.
Implementing this behavior relies on AVAudioSession's interruption notification. By observing this notification, the app is told when an interruption begins and when it ends. The sample code is as follows:
func setupNotifications() {
    let nc = NotificationCenter.default
    nc.addObserver(self,
                   selector: #selector(handleInterruption),
                   name: AVAudioSession.interruptionNotification,
                   object: AVAudioSession.sharedInstance())
}

@objc func handleInterruption(notification: Notification) {
    // See the expanded example below
}
- The interruption notification carries a userInfo dictionary with the information needed to decide how the audio should behave, i.e. whether to pause or resume
- handleInterruption(notification:) is where interruption notifications are handled in one place
Example code for handling the interruption notification in handleInterruption(notification:):
@objc func handleInterruption(notification: Notification) {
    guard let userInfo = notification.userInfo,
          let typeValue = userInfo[AVAudioSessionInterruptionTypeKey] as? UInt,
          let type = AVAudioSession.InterruptionType(rawValue: typeValue) else {
        return
    }

    switch type {
    case .began:
        // The interruption began: pause playback and update the UI
        break
    case .ended:
        // The interruption ended: decide whether playback should resume
        guard let optionsValue = userInfo[AVAudioSessionInterruptionOptionKey] as? UInt else { return }
        let options = AVAudioSession.InterruptionOptions(rawValue: optionsValue)
        if options.contains(.shouldResume) {
            // The system suggests resuming playback
        } else {
            // Do not resume automatically; wait for the user
        }
    default: ()
    }
}
Route change handling
A common scenario in music apps is that the audio route changes, for example from the built-in speaker to headphones or from headphones back to the speaker. Continuing to play through the speaker after the headphones are unplugged can be undesirable, because what the user is listening to may be private.
For this scenario, AVAudioSession provides a route change notification. When the audio route on the device changes (for example, from the speaker to headphones), AVAudioSession.routeChangeNotification is posted, and the developer should play or pause audio in accordance with iOS user-experience guidelines.
Example code for listening for route change notifications is as follows:
func setupNotifications() {
    let nc = NotificationCenter.default
    nc.addObserver(self,
                   selector: #selector(handleRouteChange),
                   name: AVAudioSession.routeChangeNotification,
                   object: nil)
}

@objc func handleRouteChange(notification: Notification) {
}
- This notification is posted when the audio output or the output device changes
- The notification carries a userInfo dictionary from which you can read the reason for the change and a description of the previous route
Example code for handleRouteChange(notification:):
@objc func handleRouteChange(notification: Notification) {
    // Get the reason for the route change
    guard let userInfo = notification.userInfo,
          let reasonValue = userInfo[AVAudioSessionRouteChangeReasonKey] as? UInt,
          let reason = AVAudioSession.RouteChangeReason(rawValue: reasonValue) else {
        return
    }

    // Act on the reason for the change
    switch reason {
    case .newDeviceAvailable: // A new device (such as headphones) became available
        let session = AVAudioSession.sharedInstance()
        // headphonesConnected is assumed to be a Bool property on the containing class
        headphonesConnected = hasHeadphones(in: session.currentRoute)
    case .oldDeviceUnavailable: // The old device was disconnected
        // Inspect the previous route description
        if let previousRoute =
            userInfo[AVAudioSessionRouteChangePreviousRouteKey] as? AVAudioSessionRouteDescription {
            headphonesConnected = hasHeadphones(in: previousRoute)
        }
    default: ()
    }
}

func hasHeadphones(in routeDescription: AVAudioSessionRouteDescription) -> Bool {
    // Check whether any output port is a headphone port
    return !routeDescription.outputs.filter({ $0.portType == .headphones }).isEmpty
}
AVAudioRecorder records audio
AVAudioRecorder is the AV Foundation interface used for audio recording. It is a high-level wrapper around Audio Queue Services, and recording with it is not complicated.
Creating an AVAudioRecorder is very simple; there are two main steps:
- Create a URL that tells the recorder where to write the audio file
- Create a settings dictionary describing the format of the recorded audio and pass it to the recorder
Example code for creating an AVAudioRecorder:
do {
    self.recorder = try AVAudioRecorder(url: fileURL, settings: setting)
    self.recorder.delegate = self
    self.recorder.isMeteringEnabled = true
    self.recorder.prepareToRecord()
} catch {
    fatalError(error.localizedDescription)
}
- prepareToRecord() allocates the resources required for recording, including creating the output file, and minimizes the delay when recording starts
- The settings dictionary describes the audio format, sample rate, and so on
- The file extension in the URL must match the chosen audio format, otherwise you will run into problems
The settings dictionary specifies the recording format of the audio stream. Common keys are listed below (a sample dictionary follows the list):
- AVFormatIDKey: the audio format
- AVSampleRateKey: the sample rate
- AVNumberOfChannelsKey: the number of channels
- AVEncoderBitDepthHintKey: the bit depth
- AVEncoderAudioQualityKey: the audio quality
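A sample settings dictionary using these keys might look like the sketch below; AAC at 44.1 kHz, stereo, 16-bit is just one reasonable combination, not the only one:

import AVFoundation

let setting: [String: Any] = [
    AVFormatIDKey: Int(kAudioFormatMPEG4AAC),               // audio format: AAC
    AVSampleRateKey: 44_100.0,                              // sampling rate: 44.1 kHz
    AVNumberOfChannelsKey: 2,                               // two channels (stereo)
    AVEncoderBitDepthHintKey: 16,                           // 16-bit depth hint
    AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue  // audio quality
]
// With AAC, the file URL passed to AVAudioRecorder should end in .m4a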
When recording audio with AVAudioRecorder, set the audio session category to playAndRecord, create the AVAudioRecorder, and implement the AVAudioRecorderDelegate protocol. The protocol is very simple: it mainly provides callbacks for recording completion and encoding errors; most of its other methods are deprecated.
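A minimal sketch of the two delegate callbacks; the class name RecorderController is a placeholder:

import AVFoundation

class RecorderController: NSObject, AVAudioRecorderDelegate {
    var recorder: AVAudioRecorder?

    // Called when recording finishes or is stopped
    func audioRecorderDidFinishRecording(_ recorder: AVAudioRecorder, successfully flag: Bool) {
        print("Recording finished, success: \(flag)")
    }

    // Called when an encoding error occurs while recording
    func audioRecorderEncodeErrorDidOccur(_ recorder: AVAudioRecorder, error: Error?) {
        print("Encoding error: \(String(describing: error))")
    }
}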
Text to speech
AV Foundation provides speech synthesis interfaces for managing speech and speech synthesis. The most commonly used class is AVSpeechSynthesizer.
Adding text-to-speech to an app takes only two steps:
- Create an AVSpeechUtterance object with the string to speak and configure speech parameters such as the voice, rate, and pitch
- Hand the AVSpeechUtterance object to an AVSpeechSynthesizer instance, which controls the speech life cycle
A code example is as follows:
let utterance = AVSpeechUtterance(string: "The quick brown fox jumped over the lazy dog.")
utterance.rate = 0.57               // speech rate
utterance.pitchMultiplier = 0.8     // pitch
utterance.postUtteranceDelay = 0.2  // pause after the utterance
utterance.volume = 0.8              // volume

let voice = AVSpeechSynthesisVoice(language: "en-GB")
utterance.voice = voice

let synthesizer = AVSpeechSynthesizer()
synthesizer.speak(utterance)
AVSpeechSynthesizer also has a corresponding delegate, AVSpeechSynthesizerDelegate, which mainly provides life-cycle callbacks while the speech is being produced. Interested readers can look at the related API; a minimal sketch of the delegate follows.
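For reference, a minimal sketch of an AVSpeechSynthesizerDelegate that observes the start and end of an utterance; the class name SpeechObserver is a placeholder:

import AVFoundation

class SpeechObserver: NSObject, AVSpeechSynthesizerDelegate {
    // Called when the synthesizer begins speaking an utterance
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didStart utterance: AVSpeechUtterance) {
        print("Started speaking")
    }

    // Called when the synthesizer finishes speaking an utterance
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        print("Finished speaking")
    }
}
// Usage: assign an instance to synthesizer.delegate before calling speak(_:)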
Conclusion
This article mainly shares the following contents:
- The nature of digital audio
- Playing audio with AVAudioPlayer, including audio session categories, background playback, interruption handling, and route changes
- AVAudioRecorder records audio
- AV Foundation text to speech
This article is an introduction to audio processing with AV Foundation. I hope it helps you quickly understand digital audio and how to work with it on iOS. The examples in the article have corresponding sample code for reference (source portal). If anything in this article is wrong or poorly described, please point it out.
References
Voice (Baidu Encyclopedia)
Responding to Audio Session Interruptions
AVSpeechSynthesizerDelegate
Responding to Audio Session Route Changes
AVFAudio