Search WeChat for “gjZKeyFrame” and follow the “Keyframe” account to get the latest audio and video technical articles.
“Sound” is a physical phenomenon we are all very familiar with: we make sounds when we sing, hear sounds with our ears, and record and share sounds with our phones. As audio and video developers, we also handle a great deal of sound data in our work. But how well do we really understand sound?
In the previous article, “Representation of Sound (1)”, we raised a question: what happens along the way from the “sound” we hear to the “audio data” we process on our phones and computers? Starting from this question, we discussed “what sound is” and “what the characteristics of sound are”. In this article we continue with the next question: “how to describe sound mathematically”.
3. How to describe sound mathematically?
Now that we have defined sound and identified its characteristics, we can describe those characteristics mathematically.
3.1 Mathematical description of loudness
Loudness is a subjective psychological quantity: the strength of a sound as perceived by the human ear. By loudness, sounds can be ordered from soft to loud.
The objective physical quantities corresponding to loudness are sound intensity and sound pressure. To understand sound intensity, we first need the concept of “sound energy”.
Sound energy is the energy a sound wave adds to the medium it travels through. Because a sound wave is the propagation of vibrations of the medium's particles about their equilibrium positions, sound energy is defined as the sum of the kinetic energy of the particles' vibration and the potential energy of their displacement from equilibrium. Energy is measured in joules (J); the sound energy transmitted per unit time (sound power) is measured in watts (W).
Sound intensity, denoted I, is the average sound energy passing per unit time through a unit area perpendicular to the direction of propagation. It is measured in watts per square meter (W/m²). The intensities the human ear can perceive range from about 10⁻¹² W/m² to about 1 W/m², a span of twelve orders of magnitude, which is too wide to work with conveniently. In addition, psychophysical studies show that perceived loudness is proportional not to sound intensity itself but to its logarithm, so we introduce the “sound intensity level” to express sound intensity.
Although sound intensity is, in principle, an objective and measurable quantity describing the strength of a sound wave at a point, it is rarely used in everyday work to describe the amplitude of sound. Because the human ear responds to pressure, and pressure is comparatively easy to measure in the field, sound pressure is what is mostly used in practice to represent the amplitude of sound waves.
Sound pressure, denoted p, is the change in pressure caused by a sound wave as it passes through a medium. Its unit is newtons per square meter (N/m²), i.e. pascals (Pa). When sound travels through air, the vibrating object drives the surrounding air into vibration, producing alternating compressions and rarefactions, so the pressure change alternates between positive and negative. For this reason the root-mean-square (RMS) value of sound pressure is usually used; it is called the effective sound pressure, and unless stated otherwise, “sound pressure” means effective sound pressure. Audible sound pressures range from about 0.00002 to 20 N/m², again too wide a range to work with directly. Likewise, perceived loudness is proportional to the logarithm of sound pressure, so we introduce the “sound pressure level” to express sound pressure.
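To see why the RMS value is used, here is a minimal sketch assuming the instantaneous pressure is available as a list of samples in Pa; the function name `effective_pressure` and the test signal are hypothetical. For a pure sine wave, the plain average is about zero because the positive and negative halves cancel, while the RMS value comes out as the peak divided by √2.

```python
import math

def effective_pressure(samples):
    """RMS (effective) value of instantaneous sound-pressure samples, in Pa."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(p * p for p in samples) / len(samples))

# Hypothetical signal: one full period of a sine wave with a 1 Pa peak, 1000 samples.
n = 1000
samples = [1.0 * math.sin(2 * math.pi * k / n) for k in range(n)]

mean = sum(samples) / n              # compressions and rarefactions cancel: about 0
p_eff = effective_pressure(samples)  # about 0.7071 Pa, i.e. peak / sqrt(2)
print(f"|mean| = {abs(mean):.6f} Pa, effective = {p_eff:.4f} Pa")
```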
The relation between sound pressure and sound intensity: in a free sound field, the sound intensity at a point is directly proportional to the square of the effective sound pressure there, and inversely proportional to the product of the density of the medium and the speed of sound:
$$I = \frac{p^2}{\rho c}$$
where p is the effective sound pressure, ρ is the density of the medium, and c is the speed of sound.
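The relation I = p²/(ρc) can be sketched in a few lines. The density and speed of sound below are assumed textbook values for air at roughly 20 °C, not figures from the article, and `intensity_from_pressure` is a hypothetical helper. The sketch also shows the square law: doubling the pressure quadruples the intensity.

```python
# Assumed properties of air at roughly 20 °C (illustrative values, not from the article).
RHO_AIR = 1.204  # density, kg/m^3
C_AIR = 343.0    # speed of sound, m/s

def intensity_from_pressure(p_eff, rho=RHO_AIR, c=C_AIR):
    """Free-field intensity I = p^2 / (rho * c) in W/m^2, from effective pressure in Pa."""
    return p_eff ** 2 / (rho * c)

I_1 = intensity_from_pressure(1.0)
I_2 = intensity_from_pressure(2.0)
print(f"p = 1 Pa -> I = {I_1:.3e} W/m^2")
print(f"doubling p multiplies I by {I_2 / I_1:.1f}")  # intensity scales with p squared
```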
A “level” is a dimensionless quantity used for relative comparison, for example sound intensity level and sound pressure level.
Sound intensity level (SIL), denoted L_I, is ten times the base-10 logarithm of the ratio of a sound intensity I to the reference intensity I₀ = 10⁻¹² W/m², and is measured in decibels (dB). Why multiply by 10? The unit obtained directly from the logarithm is the bel, which is too large to be convenient. The decibel is one tenth of a bel, so multiplying by 10 expresses the result in decibels and spreads the values out, making the differences between them easier to see.
$$L_I = 10\log_{10}\frac{I}{I_0}\ \text{dB}, \qquad I_0 = 10^{-12}\ \text{W/m}^2$$
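As a quick check of the formula, here is a minimal sketch of L_I = 10·log10(I/I₀) with the reference intensity from the text; the function name is hypothetical. It shows how the audible range of about 10⁻¹² to 1 W/m² compresses to about 0 to 120 dB, and that doubling the intensity adds only about 3 dB.

```python
import math

I0 = 1e-12  # reference intensity in W/m^2, as given in the text

def sound_intensity_level(intensity):
    """Sound intensity level L_I = 10 * log10(I / I0), in dB."""
    if intensity <= 0:
        raise ValueError("intensity must be positive")
    return 10 * math.log10(intensity / I0)

# Twelve orders of magnitude in W/m^2 become a 0 .. 120 dB scale.
for I in (1e-12, 1e-6, 1.0):
    print(f"{I:g} W/m^2 -> {sound_intensity_level(I):.1f} dB")

# Doubling the intensity adds 10 * log10(2), roughly 3 dB.
print(f"{sound_intensity_level(2e-6) - sound_intensity_level(1e-6):.2f} dB")
```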
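Sound pressure level (SPL), denoted L_p, is twenty times the base-10 logarithm of the ratio of an effective sound pressure p to the reference pressure p₀ = 2×10⁻⁵ N/m², and is also measured in decibels (dB). Why 20 instead of 10? Substituting the relation between sound pressure and sound intensity given above (I = p²/(ρc)) into the formula for sound intensity level turns a ratio of intensities into a ratio of squared pressures (the reference values are chosen so that p₀²/(ρc) ≈ I₀), and the square becomes a factor of 2 in front of the logarithm:
$$L_p = 10\log_{10}\frac{p^2}{p_0^2} = 20\log_{10}\frac{p}{p_0}\ \text{dB}, \qquad p_0 = 2\times10^{-5}\ \text{N/m}^2$$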
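The SPL formula and its inverse can be sketched as follows, using the reference pressure from the text; both function names are hypothetical. The audible range of about 2×10⁻⁵ to 20 Pa maps to about 0 to 120 dB SPL, and going the other way, a 120 dB sound such as an airplane taking off corresponds to an effective pressure of about 20 Pa.

```python
import math

P0 = 2e-5  # reference sound pressure p0 in Pa (N/m^2), as given in the text

def sound_pressure_level(p_eff):
    """Sound pressure level L_p = 20 * log10(p / p0), in dB SPL."""
    if p_eff <= 0:
        raise ValueError("effective pressure must be positive")
    return 20 * math.log10(p_eff / P0)

def pressure_from_spl(spl_db):
    """Inverse mapping: effective pressure in Pa for a given level in dB SPL."""
    return P0 * 10 ** (spl_db / 20)

# The audible range of about 2e-5 .. 20 Pa compresses to about 0 .. 120 dB SPL.
for p in (2e-5, 0.02, 20.0):
    print(f"{p:g} Pa -> {sound_pressure_level(p):.1f} dB SPL")

# Because intensity scales with p^2, 20*log10(p/p0) equals 10*log10(p^2/p0^2).
p = 0.5
print(math.isclose(20 * math.log10(p / P0), 10 * math.log10(p ** 2 / P0 ** 2)))
```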
How loud the human ear perceives a sound to be depends not only on sound pressure but also on frequency: pure tones with the same SPL but different frequencies sound differently loud.
To estimate the loudness of a pure tone quantitatively, we compare it with a 1000 Hz pure tone of known sound pressure level. When the two sound equally loud, the sound pressure level of the 1000 Hz tone is defined as the loudness level of the pure tone at the other frequency. The unit of loudness level is the phon.
For example, according to the equal-loudness contours, a 1000 Hz pure tone must have a sound pressure level of 40 dB SPL to have a loudness level of 40 phon.
In the figure below, the horizontal axis is frequency and the vertical axis is sound pressure level. The curves are equal-loudness contours: each one connects the frequency and sound pressure level combinations that are perceived as equally loud, i.e. that have the same loudness level.
Loudness level therefore accounts for both the physical properties of a sound and the physiological response of the human ear; it expresses the ear's subjective judgment of the sound.
The “decibels” we talk about in everyday life are usually sound pressure levels. For example, an airplane taking off produces about 120 dB. If we also know the frequency of the sound, we can read its loudness level from the equal-loudness contours.
3.2 Mathematical description of pitch
Pitch is the human ear's subjective perception of how high or low a sound is. The objective physical quantity corresponding to pitch is the frequency of the sound wave. Pitch is determined mainly by frequency, and the two are positively correlated: the higher the frequency, the higher the pitch.
We are all familiar with frequency, which is measured in hertz (Hz). But how is pitch measured? One approach uses the mel as the unit of pitch. A pure tone at 1000 Hz with a sound pressure level of 40 dB is taken as the reference and assigned 1000 mel; a pure tone that sounds twice as high is 2000 mel, one that sounds half as high is 500 mel, and so on. In this way a pitch scale can be built across the entire audible frequency range.
The pitch of musical tones (complex tones) is more complicated, but it can generally be considered to be determined mainly by the frequency of the fundamental.
The relationship between pitch and frequency is shown below:
Below about 500 Hz, pitch and frequency are nearly linearly related, while at middle and high frequencies the relationship is approximately logarithmic.
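The mel scale described above is defined by listening experiments, but a widely used analytic approximation is m = 2595·log10(1 + f/700). That formula is an assumption brought in for illustration, not the article's definition. It reproduces the reference point (1000 Hz maps to about 1000 mel), stays close to linear at low frequencies, and grows roughly logarithmically at higher ones, so doubling 1000 Hz to 2000 Hz gives noticeably less than 2000 mel.

```python
import math

def hz_to_mel(f_hz):
    """One common analytic approximation of the mel scale (assumed, not from the article)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)

# Near-linear growth at low frequencies, compressed growth at high frequencies.
for f in (100, 200, 400, 1000, 2000, 4000, 8000):
    print(f"{f:>5} Hz -> {hz_to_mel(f):7.1f} mel")
```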
Pitches are usually written in “scientific pitch notation”, which combines a letter (the note name) with a number (the octave) to identify the fundamental frequency.
Two notes whose frequencies differ by a factor of 2 (or 4, 8, and so on) sound very similar, so we group them into the same “pitch class”. If the frequency of one note is twice that of another, the two notes are said to be an octave apart. To describe a note completely, you must state both its pitch class and which octave it is in. Traditional music theory uses the first seven Latin letters, A, B, C, D, E, F, and G (in this order the notes rise in pitch), plus modified forms of them (see below), to name the different notes. The letter names repeat over and over: after G comes A again, an octave above the previous A. To distinguish notes with the same name (the same pitch class) at different heights, scientific pitch notation specifies a note with a letter plus an Arabic numeral that indicates the octave. For example, the current standard pitch of 440 Hz is called A4; an octave above it is A5, and so on upward without limit, while below A4 come A3, A2, and so on. For historical reasons, each numbered octave starts at C and ends at B: C, D, E, F, G, A, B (in this order the notes rise).
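Since each octave doubles the frequency, the frequency of A in any octave follows directly from A4 = 440 Hz. The helper `a_frequency` below is a hypothetical name for this one-line rule.

```python
A4_HZ = 440.0  # standard pitch A4, as given in the text

def a_frequency(octave):
    """Frequency of the note A in a given octave: each octave up doubles the frequency."""
    return A4_HZ * 2.0 ** (octave - 4)

for octave in range(2, 7):
    print(f"A{octave}: {a_frequency(octave):g} Hz")
# A2: 110 Hz, A3: 220 Hz, A4: 440 Hz, A5: 880 Hz, A6: 1760 Hz
```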
Accidentals such as sharps and flats are sometimes added to note names. They raise or lower the original note by a semitone; in twelve-tone equal temperament (the most widely used tuning system today), that means multiplying or dividing the original frequency by $2^{1/12} \approx 1.0595$. Raising a note by n semitones multiplies its frequency by $2^{n/12}$, and lowering it by n semitones multiplies it by $2^{-n/12}$. The sharp sign is ♯ and the flat sign is ♭; they are written after the note name, as in F♯ (F sharp) and B♭ (B flat). Traditional music also uses other accidentals, such as the double sharp and the double flat, which raise or lower a note by a whole tone (two semitones). Because of enharmonic equivalence, the same pitch can be spelled as different notes: for example, B♯ is enharmonically equivalent to C. Setting these enharmonic spellings aside, the full chromatic scale adds five pitch classes to the original seven, and any two adjacent pitch classes differ by a semitone.
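The 2^(n/12) rule translates directly into a transposition helper. This is a minimal sketch assuming twelve-tone equal temperament; `transpose` is a hypothetical name. Moving up twelve semitones lands exactly one octave higher.

```python
SEMITONE_RATIO = 2.0 ** (1.0 / 12.0)  # about 1.0595 in twelve-tone equal temperament

def transpose(freq_hz, semitones):
    """Raise (semitones > 0) or lower (semitones < 0) a frequency in equal temperament."""
    return freq_hz * 2.0 ** (semitones / 12.0)

a4 = 440.0
print(f"A#4 / Bb4: {transpose(a4, 1):.2f} Hz")   # one semitone up
print(f"G#4 / Ab4: {transpose(a4, -1):.2f} Hz")  # one semitone down
print(f"A5: {transpose(a4, 12):.2f} Hz")         # twelve semitones = one octave
```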
Note that only five altered notes fit between the seven natural notes: there is no note between E and F, or between B and C. In total there are 12 semitones in an octave. Seven of them (C, D, E, F, G, A, B) are called natural notes, and the other five are called altered (chromatic) notes. Adjacent natural notes are usually two semitones apart (a whole tone), but E and F, and B and C, are only one semitone apart.
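Putting the pieces together, a note in scientific pitch notation can be converted to an equal-tempered frequency by counting semitones from A4. The offset table encodes the spacing just described (one semitone for E to F and B to C, two elsewhere); the helper names are hypothetical, and only single sharps and flats are handled, written here as `#` and `b`.

```python
# Semitone offset of each natural note above C within the same octave.
# E-F and B-C are one semitone apart; other adjacent naturals are two.
NATURAL_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTALS = {"": 0, "#": 1, "b": -1}

def note_to_hz(letter, octave, accidental="", a4_hz=440.0):
    """Equal-tempered frequency of a note in scientific pitch notation."""
    # Octave numbers change at C, so count semitones from C4; A4 is 9 semitones above C4.
    semitones_from_c4 = 12 * (octave - 4) + NATURAL_OFFSETS[letter] + ACCIDENTALS[accidental]
    return a4_hz * 2.0 ** ((semitones_from_c4 - 9) / 12.0)

# The 12 semitones from C4 (middle C) up to B4, using sharps for the five altered notes.
CHROMATIC = [("C", ""), ("C", "#"), ("D", ""), ("D", "#"), ("E", ""), ("F", ""),
             ("F", "#"), ("G", ""), ("G", "#"), ("A", ""), ("A", "#"), ("B", "")]
for letter, acc in CHROMATIC:
    print(f"{letter}{acc}4: {note_to_hz(letter, 4, acc):.2f} Hz")
```

Because B♯3 and C4 reduce to the same semitone count, the sketch also returns identical frequencies for these enharmonic spellings.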
The diagram below shows the complete ascending chromatic scale within one octave, starting from C4 (middle C):
Table: common note names in international notation and in male-voice and female-voice notation, with their corresponding frequencies.
Loudness describes how loud a sound is, and pitch corresponds to its frequency. These two characteristics are relatively easy to understand.
3.3 Mathematical description of timbre
So how do you understand the timbre of a sound?
A real sound waveform is not a simple sine wave but a complex wave. This complex wave can be decomposed into a series of sine waves: a fundamental at frequency F0, which corresponds to the pitch of the sound, and harmonics at integer multiples of F0 (2F0, 3F0, 4F0, and so on), which correspond to its overtones. The amplitudes of these components stand in particular proportions, and those proportions give each sound its character, which is its timbre. A signal with no harmonic components, a pure sinusoid at the fundamental frequency alone, is a pure tone. This is why the frequency range of a musical instrument covers both its fundamentals and their harmonics.
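The decomposition can be sketched numerically. The example below builds two hypothetical tones with the same fundamental (so the same pitch) but different harmonic amplitude ratios (so different timbres), then recovers each harmonic's amplitude with a single-bin DFT. The sample rate, duration, and amplitude values are assumptions chosen so that every harmonic completes a whole number of cycles in the window, which keeps the components exactly separable.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz; 8000 samples = 1 second

def synthesize(f0, harmonic_amps, n_samples=8000):
    """Sum of sinusoids at f0, 2*f0, 3*f0, ... with the given amplitudes."""
    return [sum(a * math.sin(2 * math.pi * (k + 1) * f0 * n / SAMPLE_RATE)
                for k, a in enumerate(harmonic_amps))
            for n in range(n_samples)]

def harmonic_amplitude(signal, freq):
    """Amplitude of the component at `freq`, via a single-bin DFT (correlation with cos and sin)."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / SAMPLE_RATE) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n

# Same F0 (same pitch), different harmonic ratios (different timbre).
f0 = 220.0
tone_a = synthesize(f0, [1.0, 0.5, 0.25])
tone_b = synthesize(f0, [1.0, 0.1, 0.6])
for name, tone in (("A", tone_a), ("B", tone_b)):
    amps = [harmonic_amplitude(tone, k * f0) for k in (1, 2, 3)]
    print(name, [round(a, 3) for a in amps])
```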
As discussed in the previous section, the pitch of a sound is determined by its fundamental frequency. This is why different people singing the same note sound completely different: the fundamental frequency is the same, but the harmonics differ.
Therefore, the sound timbre is determined by the harmonic spectrum, which can also be said to be determined by the sound waveform.
(From the discussion above, we know how to describe the loudness, pitch, and timbre of sound mathematically. Along the way we introduced many physical quantities and concepts: sound energy, sound intensity, sound pressure, sound intensity level, sound pressure level, and loudness level, related to loudness; frequency, the mel scale, scientific pitch notation, and twelve-tone equal temperament, related to pitch; and the fundamental, harmonics, and overtones, related to timbre. These quantities and concepts are the tools and bridges for describing sound mathematically, and the mathematical models built on them are the basis of sound digitization. We will discuss the digitization of sound in a future article, so stay tuned.)
Recommended reading
Representation of Sound, part 1: Definition and Characteristics of Sound