Contents

1. Spectral subtraction principle

2. Realization of spectral subtraction

3. Realization of Berouti spectral subtraction

Speech noise reduction is usually the first step in speech signal processing, and many mature algorithms exist for it. Spectral subtraction is a classical noise reduction algorithm that is simple to implement and fast to run, and it is widely used for speech noise reduction.

1. Spectral subtraction principle

Spectral subtraction, as the name implies, subtracts the spectrum of the noise from the spectrum of the noisy signal (either the magnitude spectrum or the power spectrum can be used). It rests on a simple assumption: if the noise in the speech is purely additive, and the noise is stationary or slowly varying, then the spectrum of the clean speech can be obtained by subtracting the noise spectrum from the spectrum of the noisy speech. The subtraction is carried out on the short-time spectrum (frames of about 25 ms), over which the spectrum is treated as stationary. Assuming that the noisy signal Y is composed of the clean signal X and additive noise D, the power spectrum of the clean signal can be estimated by the following formula:

$$|\hat{X}(\omega)|^2 = |Y(\omega)|^2 - |\hat{D}(\omega)|^2 \tag{1}$$

where $|\hat{D}(\omega)|^2$ is the estimated power spectrum of the noise.

The key to spectral subtraction is estimating the noise spectrum. It is generally assumed that the first few frames of the recording contain no speech activity, that is, they are pure noise, so these frames are used to estimate the noise spectrum.
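The noise estimate described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper names `frame_signal` and `estimate_noise_power` are hypothetical, the 400-sample frame corresponds to 25 ms at an assumed 16 kHz sampling rate, and the 50% frame shift and Hann window are common choices rather than values given in this article.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=200):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def estimate_noise_power(x, n_noise_frames=5, frame_len=400, hop=200):
    """Average power spectrum of the first n_noise_frames frames.

    Assumes those leading frames contain noise only (no speech activity).
    """
    frames = frame_signal(x, frame_len, hop)[:n_noise_frames]
    window = np.hanning(frame_len)
    spectra = np.fft.rfft(frames * window, axis=1)
    return np.mean(np.abs(spectra) ** 2, axis=0)
```

If the recording starts with speech rather than silence, this estimate will include speech energy and the later subtraction will remove part of the speech, which is why the noise-only assumption matters.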

2. Realization of spectral subtraction

A simple spectral subtraction implementation consists of the following steps:

  1. Compute the average spectrum of the first few frames of the signal (typically the first 5 frames) and use it as the noise spectrum

  2. Split the signal into frames, apply the STFT to each frame, and subtract the noise spectrum from the spectrum of each frame

  3. If negative values occur, set them to 0

  4. Apply the inverse STFT (ISTFT) to the noise-subtracted spectrum to obtain the time-domain frames

  5. Reconstruct the time-domain signal from the frames according to the frame length and frame shift
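The five steps above can be sketched as follows. This is a minimal illustration, not the code behind this article: the function name `spectral_subtraction` and its defaults are hypothetical, a 16 kHz sampling rate is assumed (400 samples = 25 ms), the frame shift is assumed to be half the frame length, and the phase of the noisy signal is reused for reconstruction, which is the usual practice but is not stated in the text.

```python
import numpy as np

def spectral_subtraction(y, frame_len=400, hop=200, n_noise_frames=5):
    """Basic power spectral subtraction (illustrative sketch).

    Assumes hop == frame_len // 2 and that the first n_noise_frames
    frames of y contain noise only.
    """
    # Periodic Hann window: at 50% overlap the shifted windows sum to 1,
    # so plain overlap-add needs no extra normalization.
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len)

    # Step 2 (first half): framing and STFT of every frame.
    n_frames = 1 + (len(y) - frame_len) // hop
    idx = n[None, :] + hop * np.arange(n_frames)[:, None]
    spec = np.fft.rfft(y[idx] * window, axis=1)
    power = np.abs(spec) ** 2
    phase = np.angle(spec)

    # Step 1: noise power spectrum from the leading frames.
    noise_power = power[:n_noise_frames].mean(axis=0)

    # Steps 2 and 3: subtract, then set negative values to 0.
    clean_power = np.maximum(power - noise_power, 0.0)

    # Step 4: inverse transform, reusing the noisy phase (assumption).
    clean_spec = np.sqrt(clean_power) * np.exp(1j * phase)
    frames = np.fft.irfft(clean_spec, n=frame_len, axis=1)

    # Step 5: overlap-add reconstruction.
    out = np.zeros(len(y))
    for i in range(n_frames):
        out[i * hop : i * hop + frame_len] += frames[i]
    return out
```

The input `y` is expected to be a 1-D NumPy float array. The first and last half-frames of the output are tapered by the window because only one frame covers them.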

In the spectral subtraction algorithm, if subtracting the estimated noise spectrum from the magnitude spectrum of the noisy speech (the same applies to the power spectrum) gives a negative value, the noise has been overestimated. The simplest way to handle this is to set the negative values to 0 so that the magnitude spectrum stays non-negative. However, this treatment leaves small, isolated peaks at random locations in the spectrum of each frame, which are heard as musical noise. Berouti proposed an improved method to reduce musical noise.

3. Realization of Berouti spectral subtraction

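Musical noise can be reduced by over-subtracting the noise spectrum and by setting a lower limit on the result of the subtraction, instead of setting negative values to 0. With $|Y(\omega)|^2$ the power spectrum of the noisy speech and $|\hat{D}(\omega)|^2$ the estimated noise power spectrum, this can be expressed as:

$$|\hat{X}(\omega)|^2 = \begin{cases} |Y(\omega)|^2 - \alpha\,|\hat{D}(\omega)|^2, & |Y(\omega)|^2 > (\alpha + \beta)\,|\hat{D}(\omega)|^2 \\ \beta\,|\hat{D}(\omega)|^2, & \text{otherwise} \end{cases} \tag{2}$$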

Alpha (greater than or equal to 1) is the over-subtraction factor, which mainly affects the degree of distortion of the speech spectrum. Beta (greater than 0 and less than 1) is the spectral floor parameter, which controls the amount of residual noise and the level of musical noise. During spectral subtraction, beta is fixed, while alpha is computed from the SNR of the current frame. Berouti spectral subtraction consists of the following steps:

  1. Compute the average spectrum of the first few frames of the signal (typically the first 5 frames) and use it as the noise spectrum

  2. Split the signal into frames and apply the STFT to each frame

  3. Compute the SNR of the current frame and then compute the over-subtraction factor alpha

  4. Process the spectrum of the frame according to formula (2)

  5. Apply the inverse STFT (ISTFT) to the noise-subtracted spectrum to obtain the time-domain frames

  6. Reconstruct the time-domain signal from the frames according to the frame length and frame shift
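A sketch of these six steps is given below, again only as an illustration. The text does not say how alpha is derived from the SNR, so the mapping in the hypothetical helper `berouti_alpha` (alpha = 4 - 3/20 * SNR, with the SNR in dB limited to the range -5 to 20) is an assumption based on the rule commonly cited for this method. The value `beta=0.01`, the 16 kHz sampling rate, the 50% frame shift, and the reuse of the noisy phase are likewise assumptions.

```python
import numpy as np

def berouti_alpha(snr_db, alpha0=4.0, slope=3.0 / 20.0):
    """Over-subtraction factor from the frame SNR in dB (assumed mapping)."""
    snr_db = np.clip(snr_db, -5.0, 20.0)
    return alpha0 - slope * snr_db

def berouti_spectral_subtraction(y, frame_len=400, hop=200,
                                 n_noise_frames=5, beta=0.01):
    """Over-subtraction with a spectral floor (illustrative sketch).

    Assumes hop == frame_len // 2 and noise-only leading frames.
    """
    # Periodic Hann window; shifted copies sum to 1 at 50% overlap.
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len)

    # Step 2: framing and STFT.
    n_frames = 1 + (len(y) - frame_len) // hop
    idx = n[None, :] + hop * np.arange(n_frames)[:, None]
    spec = np.fft.rfft(y[idx] * window, axis=1)
    power = np.abs(spec) ** 2
    phase = np.angle(spec)

    # Step 1: noise power spectrum from the leading frames.
    noise_power = power[:n_noise_frames].mean(axis=0)

    # Step 3: per-frame SNR in dB, then one alpha per frame.
    eps = 1e-12
    snr_db = 10.0 * np.log10((power.sum(axis=1) + eps)
                             / (noise_power.sum() + eps))
    alpha = berouti_alpha(snr_db)[:, None]

    # Step 4: over-subtract, and apply the floor beta * noise_power
    # wherever the result does not exceed it.
    subtracted = power - alpha * noise_power
    floor = beta * noise_power
    clean_power = np.where(subtracted > floor, subtracted, floor)

    # Step 5: inverse transform, reusing the noisy phase (assumption).
    clean_spec = np.sqrt(clean_power) * np.exp(1j * phase)
    frames = np.fft.irfft(clean_spec, n=frame_len, axis=1)

    # Step 6: overlap-add reconstruction.
    out = np.zeros(len(y))
    for i in range(n_frames):
        out[i * hop : i * hop + frame_len] += frames[i]
    return out
```

Compared with setting negative values to 0, the floor keeps a small amount of broadband noise in every bin, which tends to mask the isolated peaks that cause musical noise. A larger `beta` leaves more residual noise; a smaller one brings the behavior closer to plain zeroing.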

Finally, let’s look at the effect of spectral subtraction, as shown below. It can be seen that Berouti’s improved method gives a better result than the basic spectral subtraction method.

Original speech waveform:

Waveform after basic spectral subtraction:

Waveform after Berouti spectral subtraction:
