If you want to see the final result before deciding whether to read the article, watch it here: bilibili



Sample code download

iOS Audio Spectrum Animation

This is the second in a series of articles. The first part covers audio playback and spectrum data calculation, and this part covers data processing and animation.

Preface

In the last article we obtained the spectrum data, and we know that each element of the array represents an amplitude. But what frequencies do those elements correspond to? According to the principle of the FFT, N audio samples produce N/2 output values (2048/2 = 1024), with a frequency resolution of ΔF = Fs/N = 44100/2048 ≈ 21.5 Hz, and adjacent values are spaced by that same interval. The 1024 values therefore represent the amplitudes at 0 Hz, 21.5 Hz, 43.0 Hz, …, up to 22050 Hz.
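To make that mapping concrete, here is a minimal standalone sketch (not part of the project) that converts a bin index into its frequency, assuming the 44100 Hz sample rate and fftSize of 2048 from the previous article:

// Frequency of FFT bin `index` is simply index * ΔF.
let sampleRate: Float = 44100
let fftSize: Float = 2048
func binFrequency(_ index: Int) -> Float {
    return Float(index) * sampleRate / fftSize   // index * ΔF
}
print(binFrequency(0), binFrequency(1), binFrequency(2))   // 0.0 21.533203 43.066406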

Could we animate these 1024 values directly? Sure, if you happen to want to display 1024 animated objects! But if you want to be flexible about that number, you need to do band partitioning.

Strictly speaking, there are 1025 values: the 1025th was discarded directly in the FFT calculation of the previous article via fftInOut.imagp[0] = 0. That value is the real part of the Nyquist-frequency component; for why it ends up stored in the imaginary part of the first FFT output element, see the first article.

Band division

The more important reason for band division is this: according to psychoacoustics, the human ear can easily tell the difference in pitch between 100 Hz and 200 Hz, but can hardly tell 8100 Hz from 8200 Hz, even though both pairs differ by 100 Hz. The relationship between frequency and perceived pitch is not linear but roughly logarithmic. So for the animation, it sounds more natural to the human ear to regroup the equally spaced frequency data into bands whose widths grow progressively.

Open the project AudioSpectrum02-Starter and you’ll find that, unlike the previous AudioSpectrum01 project, it moves the FFT-related calculations into a new class, RealtimeAnalyzer, which makes the responsibilities of the AudioSpectrumPlayer and RealtimeAnalyzer classes clearer.

If you just want to browse the finished implementation, open the project AudioSpectrum02-Final, which already contains all the code for this article.

Look at the code of the RealtimeAnalyzer class, which defines the frequencyBands, startFrequency, and endFrequency properties. They determine the number of bands and the start and end of the frequency range.

public var frequencyBands: Int = 80 // Number of bands
public var startFrequency: Float = 100 // Start frequency
public var endFrequency: Float = 18000 // Cutoff frequency

You can now determine the new bands based on these properties:

private lazy var bands: [(lowerFrequency: Float, upperFrequency: Float)] = {
    var bands = [(lowerFrequency: Float, upperFrequency: Float)]()
    //1: Determine the growth exponent from the frequency range and the number of bands: 2^n
    let n = log2(endFrequency/startFrequency) / Float(frequencyBands)
    var nextBand: (lowerFrequency: Float, upperFrequency: Float) = (startFrequency, 0)
    for i in 1...frequencyBands {
        //2: The upper frequency of each band is 2^n times its lower frequency
        let highFrequency = nextBand.lowerFrequency * powf(2, n)
        nextBand.upperFrequency = i == frequencyBands ? endFrequency : highFrequency
        bands.append(nextBand)
        nextBand.lowerFrequency = highFrequency
    }
    return bands
}()
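As a quick sanity check on the math, here is a throwaway snippet using the default property values above (not part of the project):

import Foundation

// With 80 bands from 100 Hz to 18000 Hz, each band's upper edge is a fixed
// multiple of its lower edge.
let n = log2(Float(18000) / Float(100)) / Float(80)   // ≈ 0.0937
let growth = powf(2, n)                               // ≈ 1.067
print(growth)   // every band is roughly 6.7% wider than the one before it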

The function findMaxAmplitude is then created; it computes the value of a new band by taking the maximum of the original amplitude data that falls within it:

private func findMaxAmplitude(for band: (lowerFrequency: Float, upperFrequency: Float), in amplitudes: [Float], with bandWidth: Float) -> Float {
    let startIndex = Int(round(band.lowerFrequency / bandWidth))
    let endIndex = min(Int(round(band.upperFrequency / bandWidth)), amplitudes.count - 1)
    return amplitudes[startIndex...endIndex].max()!
}
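To get a feel for the index math, here is a rough standalone illustration (assuming the 44100 Hz sample rate and fftSize of 2048 from before): the lowest band, roughly 100–106.7 Hz, is narrower than one FFT bin, so it maps to a single element of the amplitude array.

import Foundation

let bandWidth = Float(44100) / Float(2048)             // ≈ 21.5 Hz per FFT bin
let startIndex = Int(round(Float(100) / bandWidth))    // 5
let endIndex = Int(round(Float(106.7) / bandWidth))    // 5: this band covers just one bin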

The new analyse function receives audio raw data and provides the processed spectrum data:

func analyse(with buffer: AVAudioPCMBuffer) -> [[Float]] {
    let channelsAmplitudes = fft(buffer)
    var spectra = [[Float]]()
    for amplitudes in channelsAmplitudes {
        let spectrum = bands.map {
            findMaxAmplitude(for: $0, in: amplitudes, with: Float(buffer.format.sampleRate) / Float(self.fftSize))
        }
        spectra.append(spectrum)
    }
    return spectra
}
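For context, this is roughly how the analyzer receives its buffers: AudioSpectrumPlayer taps the audio it plays (as set up in the first article), hands each buffer to RealtimeAnalyzer, and forwards the result through its delegate. The snippet below is only a hedged sketch of that flow; property names such as engine and analyzer are assumptions and may not match the project exactly.

// Inside AudioSpectrumPlayer (sketch only): tap the mixer, analyse each buffer,
// and pass the spectra to the delegate.
engine.mainMixerNode.installTap(onBus: 0, bufferSize: AVAudioFrameCount(fftSize), format: nil) { [weak self] buffer, _ in
    guard let self = self else { return }
    let spectra = self.analyzer.analyse(with: buffer)
    self.delegate?.player(self, didGenerateSpectrum: spectra)
}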

Animation

Looks like we’ve got our data sorted out, so let’s roll up our sleeves and start animating! Open the custom view SpectrumView file and first create two CAGradientLayers:

var leftGradientLayer = CAGradientLayer()
var rightGradientLayer = CAGradientLayer()

Create a new function, setupView(), and set their colors and locations properties, which determine the colors and color positions of each gradient layer. Then add the two layers as sublayers of the view’s layer; they will host the animation of the left and right channels respectively.

private func setupView() {
    rightGradientLayer.colors = [UIColor.init(red: 52/255, green: 232/255, blue: 158/255, alpha: 1.0).cgColor,
                                 UIColor.init(red: 15/255, green: 52/255, blue: 67/255, alpha: 1.0).cgColor]
    rightGradientLayer.locations = [0.6, 1.0]
    self.layer.addSublayer(rightGradientLayer)

    leftGradientLayer.colors = [UIColor.init(red: 194/255, green: 21/255, blue: 0/255, alpha: 1.0).cgColor,
                                UIColor.init(red: 255/255, green: 197/255, blue: 0/255, alpha: 1.0).cgColor]
    leftGradientLayer.locations = [0.6, 1.0]
    self.layer.addSublayer(leftGradientLayer)
}

Call setupView() in the view’s initializers init(frame: CGRect) and init?(coder aDecoder: NSCoder) so that SpectrumView is set up correctly whether it is created in code or from a Storyboard.

override init(frame: CGRect) {
    super.init(frame: frame)
    setupView()
}
required init?(coder aDecoder: NSCoder) {
    super.init(coder: aDecoder)
    setupView()
}

The key is to define a spectra property to receive the spectrum data. In its didSet observer, build a UIBezierPath bar graph for each of the two channels, wrap each path in a CAShapeLayer, and assign it to the mask property of the corresponding CAGradientLayer. The result is a gradient-filled bar chart.

var spectra: [[Float]]? {
    didSet {
        if let spectra = spectra {
            // left channel
            let leftPath = UIBezierPath()
            for (i, amplitude) in spectra[0].enumerated() {
                let x = CGFloat(i) * (barWidth + space) + space
                let y = translateAmplitudeToYPosition(amplitude: amplitude)
                let bar = UIBezierPath(rect: CGRect(x: x, y: y, width: barWidth, height: bounds.height - bottomSpace - y))
                leftPath.append(bar)
            }
            let leftMaskLayer = CAShapeLayer()
            leftMaskLayer.path = leftPath.cgPath
            leftGradientLayer.frame = CGRect(x: 0, y: topSpace, width: bounds.width, height: bounds.height - topSpace - bottomSpace)
            leftGradientLayer.mask = leftMaskLayer

            // right channel
            if spectra.count >= 2 {
                let rightPath = UIBezierPath()
                for (i, amplitude) in spectra[1].enumerated() {
                    let x = CGFloat(spectra[1].count - 1 - i) * (barWidth + space) + space
                    let y = translateAmplitudeToYPosition(amplitude: amplitude)
                    let bar = UIBezierPath(rect: CGRect(x: x, y: y, width: barWidth, height: bounds.height - bottomSpace - y))
                    rightPath.append(bar)
                }
                let rightMaskLayer = CAShapeLayer()
                rightMaskLayer.path = rightPath.cgPath
                rightGradientLayer.frame = CGRect(x: 0, y: topSpace, width: bounds.width, height: bounds.height - topSpace - bottomSpace)
                rightGradientLayer.mask = rightMaskLayer
            }
        }
    }
}

The translateAmplitudeToYPosition function converts an amplitude into a Y coordinate in the view’s coordinate system:

private func translateAmplitudeToYPosition(amplitude: Float) -> CGFloat {
    let barHeight: CGFloat = CGFloat(amplitude) * (bounds.height - bottomSpace - topSpace)
    return bounds.height - bottomSpace - barHeight
}
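The drawing code above relies on a few layout properties (barWidth, space, topSpace, bottomSpace) that the starter project’s SpectrumView already defines. A minimal sketch of what they might look like is below; the names come from the code above, but the particular values are only illustrative and may differ from the project.

// Layout properties assumed by the drawing code above (illustrative values).
var barWidth: CGFloat = 3.0             // width of each bar
var space: CGFloat = 1.0                // horizontal gap between bars
private var topSpace: CGFloat = 0.0     // padding above the bars
private var bottomSpace: CGFloat = 0.0  // padding below the bars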

Go back to the ViewController and, in the AudioSpectrumPlayerDelegate method, pass the received data straight to the spectrumView:

// MARK: AudioSpectrumPlayerDelegate
extension ViewController: AudioSpectrumPlayerDelegate {
    func player(_ player: AudioSpectrumPlayer, didGenerateSpectrum spectra: [[Float]]) {
        DispatchQueue.main.async {
            //1: Passes data to spectrumView
            self.spectrumView.spectra = spectra
        }
    }
}

After all this code, we can finally run it and see what it looks like! Er… it doesn’t look very good. Relax, grab a cup of coffee, and we’ll deal with the problems one at a time.

Adjusting and optimizing

The poor result comes down to three points: 1) the animation does not match the rhythm of the music; 2) the bars are too jagged; 3) the animation flickers noticeably. Let’s start with the first one:

Rhythm matching

Part of the reason for the poor match is that the current animation amplitude is too small, especially in the middle and high frequencies. Let’s start by amplifying everything 5x to see the effect; modify the analyse function:

func analyse(with buffer: AVAudioPCMBuffer) -> [[Float]] {
    let channelsAmplitudes = fft(buffer)
    var spectra = [[Float]]()
    for amplitudes in channelsAmplitudes {
        let spectrum = bands.map {
            //1: Multiply by 5 directly after this function call
            findMaxAmplitude(for: $0, in: amplitudes, with: Float(buffer.format.sampleRate) / Float(self.fftSize)) * 5
        }
        spectra.append(spectrum)
    }
    return spectra
}

Now the low frequencies carry much more energy than the middle frequencies, yet to the ear they don’t sound that prominent. Why is that? This is where the concept of loudness comes in:

Loudness is the perceived magnitude of a sound, corresponding to its intensity. Sound intensity is an objective physical quantity, while loudness is a subjective psychological one. Loudness depends not only on intensity but also on frequency: pure tones of different frequencies need different sound pressure levels to sound as loud as a 1000 Hz pure tone at a given level. Plotting those levels as a function of frequency yields an equal-loudness contour, and varying the level of the 1000 Hz reference tone yields a family of such contours. The lowest curve, at 0 phon, is the quietest sound humans can hear, the hearing threshold; the top curve is the loudest sound humans can tolerate, the pain threshold.

So the human ear has different sensitivities at different frequencies: even if two sounds have the same sound pressure level, they are perceived with different loudness when their frequencies differ. For this reason, a frequency weighting is applied to make the data better reflect what the ear actually hears. Commonly used weightings are A, B, C, and D. A-weighting is the most widely used and attenuates the low-frequency range the most, so A-weighting is used here as well.
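For reference, the weighting that the next function computes is the standard A-weighting magnitude response in its linear (non-dB) form; the 20.6, 107.7, 737.9, and 12194 Hz corner frequencies below are rounded versions of the more precise constants used in the code:

$$R_A(f) = \frac{12194^2 \, f^4}{(f^2 + 20.6^2)\,\sqrt{(f^2 + 107.7^2)(f^2 + 737.9^2)}\,(f^2 + 12194^2)}$$

The factor 1.2589 in the code is 10^(2/20), which scales the weight so that it is roughly 1 at 1 kHz.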

Create a new function, createFrequencyWeights(), in the RealtimeAnalyzer class; it returns an array of A-weighting coefficients:

private func createFrequencyWeights() -> [Float] {
    let Δf = 44100.0 / Float(fftSize)
    let bins = fftSize / 2 // Size of the returned array
    var f = (0..<bins).map { Float($0) * Δf }
    f = f.map { $0 * $0 }

    let c1 = powf(12194.217, 2.0)
    let c2 = powf(20.598997, 2.0)
    let c3 = powf(107.65265, 2.0)
    let c4 = powf(737.86223, 2.0)

    let num = f.map { c1 * $0 * $0 }
    let den = f.map { ($0 + c2) * sqrtf(($0 + c3) * ($0 + c4)) * ($0 + c1) }
    let weights = num.enumerated().map { (index, ele) in
        return 1.2589 * ele / den[index]
    }
    return weights
}

Update the code in the analyse function:

func analyse(with buffer: AVAudioPCMBuffer) -> [[Float]] {
    let channelsAmplitudes = fft(buffer)
    var spectra = [[Float]]()
    //1: Create the weight array
    let aWeights = createFrequencyWeights()
    for amplitudes in channelsAmplitudes {
        //2: Multiply the original spectrum data by the weights, element by element
        let weightedAmplitudes = amplitudes.enumerated().map { (index, element) in
            return element * aWeights[index]
        }
        let spectrum = bands.map {
            //3: findMaxAmplitude now searches for the maximum in the new weightedAmplitudes
            findMaxAmplitude(for: $0, in: weightedAmplitudes, with: Float(buffer.format.sampleRate) / Float(self.fftSize)) * 5
        }
        spectra.append(spectrum)
    }
    return spectra
}

Run the project again and see what happens. Better, right?

Sawtooth elimination

Next is the problem of excessive jaggedness. We smooth it out by pulling each bar toward its neighbours: overly tall bars are lowered and overly short ones are raised, typically with a weighted average. Create the function highlightWaveform():

private func highlightWaveform(spectrum: [Float]) -> [Float] {
    //1: Define the weights; the 5 in the middle is the weight of the element itself
    //   These can be changed freely, but the count needs to be odd
    let weights: [Float] = [1, 2, 3, 5, 3, 2, 1]
    let totalWeights = Float(weights.reduce(0, +))
    let startIndex = weights.count / 2
    //2: The first few elements do not take part in the averaging
    var averagedSpectrum = Array(spectrum[0..<startIndex])
    for i in startIndex..<spectrum.count - startIndex {
        //3: zip: zip([a, b, c], [x, y, z]) -> [(a, x), (b, y), (c, z)]
        let zipped = zip(Array(spectrum[i - startIndex...i + startIndex]), weights)
        let averaged = zipped.map { $0.0 * $0.1 }.reduce(0, +) / totalWeights
        averagedSpectrum.append(averaged)
    }
    //4: The last few elements do not take part in the averaging either
    averagedSpectrum.append(contentsOf: Array(spectrum.suffix(startIndex)))
    return averagedSpectrum
}
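As a quick sanity check (assuming you temporarily expose highlightWaveform, which is private in RealtimeAnalyzer), a single spike gets flattened and partly spread over its two nearest neighbours:

// Feeding a single spike through the weighted average.
let spiky: [Float] = [0, 0, 0, 0, 1, 0, 0, 0, 0]
let smoothed = highlightWaveform(spectrum: spiky)
print(smoothed)   // ≈ [0, 0, 0, 0.18, 0.29, 0.18, 0, 0, 0]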

The analyse function needs to be updated again:

func analyse(with buffer: AVAudioPCMBuffer) -> [[Float]] {
    let channelsAmplitudes = fft(buffer)
    var spectra = [[Float]]()
    let aWeights = createFrequencyWeights()
    for amplitudes in channelsAmplitudes {
        let weightedAmplitudes = amplitudes.enumerated().map { (index, element) in
            return element * aWeights[index]
        }
        let spectrum = bands.map {
            findMaxAmplitude(for: $0, in: weightedAmplitudes, with: Float(buffer.format.sampleRate) / Float(self.fftSize)) * 5
        }
        //1: Call highlightWaveform before appending to the array
        spectra.append(highlightWaveform(spectrum: spectrum))
    }
    return spectra
}

Flicker optimization

The flickering animation looks like dropped frames. It happens because a band’s value changes too much between two consecutive frames. We can cache the value of the previous frame and blend it with the value of the current frame. Yes, a weighted average again! Define the following properties first:

// Cache the values from the previous frame
private var spectrumBuffer = [[Float]]()
// The larger the value, the smoother (and slower to respond) the animation
public var spectrumSmooth: Float = 0.5 {
    didSet {
        spectrumSmooth = max(0.0, spectrumSmooth)
        spectrumSmooth = min(1.0, spectrumSmooth)
    }
}
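To get a feel for what this blending does, here is a tiny standalone sketch (values chosen only for illustration): with spectrumSmooth = 0.5, a band that jumps from 0 to 1 approaches its new value over several frames instead of snapping to it.

// Exponential smoothing between frames:
// displayed = previous * spectrumSmooth + current * (1 - spectrumSmooth)
let spectrumSmooth: Float = 0.5
var displayed: Float = 0    // value shown in the previous frame
let target: Float = 1       // value the band suddenly jumps to
for frame in 1...4 {
    displayed = displayed * spectrumSmooth + target * (1 - spectrumSmooth)
    print("frame \(frame): \(displayed)")   // 0.5, 0.75, 0.875, 0.9375
}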

Next modify the analyse function:

func analyse(with buffer: AVAudioPCMBuffer) -> [[Float]] {
    let channelsAmplitudes = fft(buffer)
    let aWeights = createFrequencyWeights()
    //1: Initialize spectrumBuffer
    if spectrumBuffer.count == 0 {
        for _ in 0..<channelsAmplitudes.count {
            spectrumBuffer.append(Array<Float>(repeating: 0, count: frequencyBands))
        }
    }
    //2: Use the channel index when assigning into spectrumBuffer
    for (index, amplitudes) in channelsAmplitudes.enumerated() {
        let weightedAmplitudes = amplitudes.enumerated().map { (index, element) in
            return element * aWeights[index]
        }
        var spectrum = bands.map {
            findMaxAmplitude(for: $0, in: weightedAmplitudes, with: Float(buffer.format.sampleRate) / Float(self.fftSize)) * 5
        }
        spectrum = highlightWaveform(spectrum: spectrum)
        //3: zip was described earlier
        let zipped = zip(spectrumBuffer[index], spectrum)
        spectrumBuffer[index] = zipped.map { $0.0 * spectrumSmooth + $0.1 * (1 - spectrumSmooth) }
    }
    return spectrumBuffer
}

Run the project again to get the final result:

Final words

The animation implementation of the audio spectrum is now complete. I had no prior experience in audio or acoustics, and the methods and theories involved in these two articles are all drawn from material found online. There are surely many mistakes, and corrections are welcome.

References
[1] Wikipedia, Octave band, en.wikipedia.org/wiki/Octave…
[2] Wikipedia, Loudness, zh.wikipedia.org/wiki/%E9%9F…
[3] MathWorks, A-weighting Filter with Matlab, www.mathworks.com/matlabcentr…
[4] Animation style reference: NetEase Cloud Music app, MOO Music app. If you are interested, play the piano version of Canon in these two apps and compare the differences.