Disclaimer: this is an original article and may not be reproduced without permission.
Audio visualization is a very "showy" topic, and its complexity depends heavily on the visual design (some examples), which in turn determines your technical stack, such as three.js, pixi.js, and so on.
Whatever rendering stack you choose, though, the audio-signal-processing part is the same. This article elaborates on that processing, hoping to popularize some audio basics (mistakes are hard to avoid given my limited knowledge; corrections are welcome).
The first five parts are mainly theory and basic concepts; skip them if you are not interested.
- GitHub address: sound-processor
- Three examples (the audio is loaded from GitHub and may take a moment to load):
- Demo1;
- Demo2;
- Demo3.
One, what is sound?
Sound comes from vibration and propagates as sound waves. Countless hair cells in the human ear convert the vibration into electrical signals, which travel along the auditory nerve to the brain and form "sound" in our subjective consciousness. After sound waves enter the ear, different parts of the cochlea respond differently to different frequencies because of its special structure:
High-frequency sounds are perceived near the base of the cochlea, while low-frequency sounds are perceived near its apex. As a result, our perception of different frequencies is nonlinear, which is the basis of the acoustic weighting discussed below.
Two, acoustic weighting
Common acoustic weightings include frequency weighting and time weighting. Their purpose is to simulate the human ear's nonlinear response to sound at different frequencies:
- insensitive to low-frequency sound;
- most sensitive between 1 and 5 kHz;
- upper hearing limit between 15 and 20 kHz.
The auditory range of the human ear is shown in the figure below:
2.1 Frequency weighting
Frequency weighting is applied to the spectrum of the audio signal. Four curves are commonly used: A, B, C, and D:
Among them, A-weighting is the closest to subjective human hearing: it attenuates the low- and high-frequency regions where the ear is insensitive. Audio visualization should therefore use A-weighting; a detailed explanation can be found on the Wiki.
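For illustration, the standard A-weighting curve can be computed directly from the IEC 61672 formula; a minimal sketch (the function name is ours, not from any library):

```javascript
// A-weighting gain in dB for a frequency f in Hz (IEC 61672 formula).
// Roughly 0 dB at 1 kHz, strongly negative at low and very high frequencies.
function aWeighting(f) {
  const f2 = f * f;
  const ra =
    (12194 ** 2 * f2 * f2) /
    ((f2 + 20.6 ** 2) *
      Math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2)) *
      (f2 + 12194 ** 2));
  return 20 * Math.log10(ra) + 2.0;
}
```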
2.2 Time weighting
In reality sound is continuous, and what we perceive is the accumulated result over time (imagine: the first sound wave sets the eardrum vibrating, and before that vibration has stopped the second wave arrives, so the actual motion of the eardrum is the accumulation of sound waves over time). Time weighting averages the continuous signal over a time window: for fast-changing signals an interval of 125 ms can be used for averaging, and for slowly changing signals an interval of 1000 ms.
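A minimal sketch of time weighting as a moving average over recent frames (the names and the frame-count-to-interval mapping are illustrative; at roughly 60 fps, about 8 frames spans approximately 125 ms):

```javascript
// Average the last `historyLen` frames of a spectrum to smooth it over time.
function createTimeWeighting(historyLen) {
  const history = [];
  return (frame) => {
    history.push(Float32Array.from(frame));
    if (history.length > historyLen) history.shift(); // drop the oldest frame
    const out = new Float32Array(frame.length);
    for (const h of history) {
      for (let i = 0; i < out.length; i++) out[i] += h[i] / history.length;
    }
    return out;
  };
}
```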
Three, sound measurement
Sound Pressure Level (SPL) is the most commonly used physical quantity in sound measurement. The audible sound-pressure range of the human ear is 2×10⁻⁵ Pa ~ 20 Pa, corresponding to a sound pressure level range of 0 ~ 120 dB.
Sound pressures of common sounds
Sound pressure is usually expressed in decibels. Note that the decibel is only a relative measure: it represents the logarithm of the ratio of a measured value to a reference value.
Definition of sound pressure level:

SPL = 20 · log10(P / P_ref)

where P is the measured sound pressure and P_ref is the minimum sound pressure the ear can hear at 1000 Hz: 20 μPa (2×10⁻⁵ Pa).
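In code, this definition is a one-liner (a sketch; 2e-5 Pa is the reference pressure P_ref above):

```javascript
// Sound pressure level in dB relative to P_ref = 2×10⁻⁵ Pa.
const toSPL = (p) => 20 * Math.log10(p / 2e-5);

toSPL(2e-5); // 0 dB, threshold of hearing
toSPL(20);   // 120 dB, upper end of the audible range
```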
Four, octave bands
First of all, a continuous signal contains a large amount of data, and we do not need to process all of it: we sample it and divide the continuous frequency range into intervals for analysis. A frequency band is a range of frequencies, and octave division is one scheme for dividing them. Specifically, within one band the ratio of the upper frequency to the lower frequency is constant: f_u = 2^(1/N) · f_l.
See this article, “What is octave?”
When N equals 1 the band is one octave (an "octave band" for short); when N equals 2 it is a 1/2 octave, and so on. Once the bands are divided, taking the mean square of the spectrum values that fall within each band yields the octave power spectrum:
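A sketch of that band-power computation (illustrative names; `spectrum` holds linear magnitudes from the FFT and `freqPerBin = sampleRate / fftSize`):

```javascript
// Mean square of the spectrum bins that fall inside [lower, upper) Hz.
function bandPower(spectrum, freqPerBin, lower, upper) {
  let sum = 0;
  let count = 0;
  for (let i = 0; i < spectrum.length; i++) {
    const f = i * freqPerBin;
    if (f >= lower && f < upper) {
      sum += spectrum[i] * spectrum[i];
      count++;
    }
  }
  return count > 0 ? sum / count : 0;
}
```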
Five, WebAudio audio processing
Audio visualization on the Web relies on the Web Audio API, whose most important method here is getByteFrequencyData (document). It returns the frequency-domain signal converted from the time-domain signal. The detailed process is as follows:
- Obtain the original time-domain signal;
- Apply a Blackman window to it, which compensates for the signal distortion and energy leakage caused by the DFT;
- Apply the Fast Fourier Transform to convert the time domain into the frequency domain;
- Smooth over time, i.e. take a weighted average over successive frames (WebAudio uses only 2 frames: the current and the previous one);
- Convert to dB according to the sound pressure formula above;
- Normalize. WebAudio maps the dB values linearly from [minDecibels, maxDecibels] to byte values: byte[k] = ⌊255 / (maxDecibels − minDecibels) × (Y[k] − minDecibels)⌋, clamped to [0, 255].
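A sketch of that byte normalization (assuming the AnalyserNode defaults minDecibels = -100 and maxDecibels = -30):

```javascript
// Map a dB value linearly from [minDecibels, maxDecibels] onto [0, 255].
function normalizeToByte(db, minDecibels = -100, maxDecibels = -30) {
  const scaled = (255 / (maxDecibels - minDecibels)) * (db - minDecibels);
  return Math.max(0, Math.min(255, Math.floor(scaled)));
}
```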
Six, signal processing scheme in audio visualization
Combining the content above, we think a reasonable processing pipeline is as follows:
6.1 Filtering
One might ask: getByteFrequencyData already applies a window function internally, so why filter again?
Because the window function inside WebAudio mainly compensates for signal distortion and energy leakage, and its parameters are fixed. In audio visualization, visual perception usually takes precedence over data accuracy, so we add a Gaussian filter to remove spikes and smooth the signal, and the degree of smoothing can be controlled freely through its parameters.
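A minimal sketch of such a Gaussian smoothing pass (an illustrative implementation, not the library's exact code; sigma and radius correspond to the filterParams options in section 7.3):

```javascript
// One-dimensional Gaussian smoothing over a spectrum array.
function gaussianSmooth(data, sigma = 1, radius = 2) {
  // Build the kernel from the Gaussian density and normalize it to sum to 1.
  const kernel = [];
  let kernelSum = 0;
  for (let x = -radius; x <= radius; x++) {
    const w = Math.exp(-(x * x) / (2 * sigma * sigma));
    kernel.push(w);
    kernelSum += w;
  }
  return Array.from(data, (_, i) => {
    let acc = 0;
    for (let k = -radius; k <= radius; k++) {
      // Clamp indices at the edges of the array.
      const j = Math.min(data.length - 1, Math.max(0, i + k));
      acc += data[j] * kernel[k + radius];
    }
    return acc / kernelSum;
  });
}
```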
6.2 Weighting
The visual presentation should match subjective human hearing, so weighting is necessary; a JavaScript implementation of A-weighting is audiojs/a-weighting. In addition, we provide optional time weighting, which internally averages the last 5 frames of historical data.
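Applied to a dB-valued spectrum, frequency weighting is just a per-bin addition (a sketch reusing the aWeighting function sketched in section 2.1; dbSpectrum and freqPerBin = sampleRate / fftSize are assumed):

```javascript
// Add the A-weighting gain (in dB) to each bin of a dB spectrum.
const weighted = dbSpectrum.map((v, i) => v + aWeighting(i * freqPerBin));
```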
6.3 Frequency division
We automatically divide the frequency bands according to the given lower and upper frequency limits and the desired number of output bands. The core code:
// Determine the band ratio from the number of output bands:
// fu = 2^(1/N) * fl  =>  1/N = log2(endFrequency / startFrequency) / outBandsQty
let n = Math.log2(endFrequency / startFrequency) / outBandsQty;
n = Math.pow(2, n); // n = 2^(1/N), the constant ratio of upper to lower frequency
const bands = [];
const nextBand = {
lowerFrequency: Math.max(startFrequency, 0),
upperFrequency: 0
};
for (let i = 0; i < outBandsQty; i++) {
// The upper frequency of each band is n = 2^(1/N) times its lower frequency
const upperFrequency = nextBand.lowerFrequency * n;
nextBand.upperFrequency = Math.min(upperFrequency, endFrequency);
bands.push({
lowerFrequency: nextBand.lowerFrequency,
upperFrequency: nextBand.upperFrequency
});
nextBand.lowerFrequency = upperFrequency;
}
Seven, sound-processor
sound-processor is a small (gzip < 3KB) library for processing audio signals. As the underlying layer of an audio visualization, it processes the raw audio signal in a relatively principled way and outputs a signal consistent with subjective human hearing. The internal processing flow is as follows:
7.1 Installation
npm install sound-processor
7.2 Usage
import { SoundProcessor } from "sound-processor";
const processor = new SoundProcessor(options);
// `input` is the original signal
// `analyser` is the AnalyserNode
const input = new Uint8Array(analyser.frequencyBinCount);
analyser.getByteFrequencyData(input);
const out = processor.process(input);
7.3 Options
- filterParams: filter parameters, an object, default undefined (no filtering):
  - sigma: sigma of the Gaussian distribution, default 1 (the standard normal distribution). The larger sigma is, the stronger the smoothing; range 0.1 ~ 250;
  - radius: filter radius, default 2;
- sampleRate: sampling rate, which can be taken from the WebAudio context (audioContext.sampleRate), typically 48000;
- fftSize: Fourier transform size, default 1024;
- startFrequency: start frequency, default 0;
- endFrequency: cut-off frequency, default 10000; together with startFrequency, any frequency band can be selected;
- outBandsQty: number of output bands, corresponding to the number of visual objects; default is half of fftSize;
- tWeight: whether to enable time weighting, default false;
- aWeight: whether to enable A-weighting, default true.
7.4 Frequency range selection
Generally, the frequency range of music lies between 50 and 10000 Hz, but in practice a narrower range, such as 100 ~ 7000 Hz, often works better. It is hard to find one perfect range for every style and instrument, and different visual styles may also call for different frequency ranges.
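For example, a configuration restricted to the narrower 100 ~ 7000 Hz range mentioned above might look like this (a usage sketch based on the options in 7.3, assuming an existing audioContext):

```javascript
const processor = new SoundProcessor({
  filterParams: { sigma: 1, radius: 2 },
  sampleRate: audioContext.sampleRate, // typically 48000
  fftSize: 1024,
  startFrequency: 100,
  endFrequency: 7000,
  outBandsQty: 64, // one band per visual element
  aWeight: true,
});
```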
Reference material
- Why acoustic weighting
- A-weighting
- What is sound pressure level?
- What is octave?
- AnalyserNode.getByteFrequencyData
- Step by step to teach you how to achieve iOS audio spectrum animation
- One-dimensional Gaussian distribution