I. Overview
Spectral subtraction is the most commonly used method in speech denoising; it is a development and application of an early, mature speech-denoising algorithm. Under the assumptions that the noise is additive, uncorrelated with the speech, and statistically stationary, the algorithm estimates the noise spectrum from measurements taken during speech-free (silence) segments, subtracts that estimate from the spectrum of the noisy speech, and thereby obtains an estimate of the clean speech spectrum. Spectral subtraction is widely used because the algorithm is simple and computationally cheap: fast processing is easy to realize, and a high output signal-to-noise ratio can be obtained. The shortcoming of the classical form of the algorithm is that a "musical noise" with a certain rhythmic fluctuation is generated after processing.
When converted back to the time domain, the isolated spectral peaks left by subtraction sound like multiple tones with random frequency changes from frame to frame. This is particularly pronounced in silent segments. This "noise" caused by half-wave rectification is called "musical noise". Fundamentally, its main causes are:
(1) The negative values produced by spectral subtraction are handled nonlinearly (set to zero by half-wave rectification);
(2) Inaccurate estimation of the noise spectrum;
(3) The suppression (gain) function has large variability.
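The effect of cause (1) can be made concrete with a short sketch. The following Python/NumPy fragment is an illustrative rewrite, not part of the original MATLAB sources; the function name and the constant noise-power estimate are assumptions. It performs single-frame power spectral subtraction with half-wave rectification, reusing the noisy phase:

```python
import numpy as np

def spectral_subtract(noisy, noise_psd_est, frame_len=256):
    """One-frame power spectral subtraction with half-wave rectification."""
    frame = noisy[:frame_len] * np.hamming(frame_len)
    Y = np.fft.fft(frame)
    power = np.abs(Y) ** 2
    # Subtract the noise-power estimate; negative results are clamped to
    # zero -- this half-wave rectification is what creates musical noise.
    clean_power = np.maximum(power - noise_psd_est, 0.0)
    # Rebuild the spectrum with the noisy phase and invert.
    S = np.sqrt(clean_power) * np.exp(1j * np.angle(Y))
    return np.real(np.fft.ifft(S))
```

Because the subtracted estimate is non-negative, the processed frame can never gain energy; the audible damage comes entirely from which bins survive the clamping.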
1. Principle
2. Flow chart
Disadvantages of spectral subtraction
1) As a result of half-wave rectification of negative values, small, isolated peaks appear at random frequencies in each frame's spectrum. Translated into the time domain, these peaks sound like multiple tones whose frequencies vary randomly from frame to frame, commonly known as "musical noise".
2) In addition, spectral subtraction uses the phase of the noisy speech as the phase of the enhanced speech, so the quality of the resulting speech can be rough. Especially at low signal-to-noise ratios, this phase error can reach the level of auditory perception and degrade speech quality.
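The emergence of musical noise in silent segments can be reproduced numerically. In this small sketch (illustrative Python/NumPy, assuming a white-noise "silence" frame and a per-bin noise estimate equal to the average noise power), subtracting an estimate that is correct *on average* still leaves isolated random peaks:

```python
import numpy as np

rng = np.random.default_rng(1)
frame_len = 256
# A "silent" frame containing only white noise.
noise_frame = rng.standard_normal(frame_len)
Y = np.fft.fft(noise_frame)
power = np.abs(Y) ** 2
# Subtract the average noise power per bin: the estimate is right on
# average, but each bin fluctuates around it.
residual = np.maximum(power - power.mean(), 0.0)
# Roughly a third of the bins survive as isolated random peaks -- exactly
# the "musical noise" heard after processing.
surviving = np.count_nonzero(residual) / frame_len
```

Repeating this frame after frame moves the surviving peaks to new random frequencies each time, which is why the artifact sounds tonal and fluttering rather than like steady noise.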
To better understand spectral subtraction speech enhancement, a simple simulation of the algorithm is carried out here; the simulation parameters are set as follows.
II. Source code
%% Test script for three speech-enhancement methods
% The speech file read by audioread and the SNR of the added noise
% can both be changed below.
[Input, Fs] = audioread('sp01.wav');
Input = Input(:,1);
Time = (0:1/Fs:(length(Input)-1)/Fs)';
SNR = 5;                                  % SNR (dB) of the added noise; change this value as needed
[NoisyInput, Noise] = add_noise(Input, SNR);
[spectruesub_enspeech] = spectruesub(NoisyInput);
[wiener_enspeech] = wienerfilter(NoisyInput);
[Klaman_Output] = kalman(NoisyInput, Fs, Noise);
sound(Input, 8000)
sound(spectruesub_enspeech, 8000)
sound(wiener_enspeech, 8000)
sound(Klaman_Output, 8000)
%% Trim all signals to a common length
sig_len = length(spectruesub_enspeech);
NoisyInput = NoisyInput(1:sig_len);
Input = Input(1:sig_len);
wiener_enspeech = wiener_enspeech(1:sig_len);
Klaman_Output = Klaman_Output(1:sig_len);
Time = (0:1/Fs:(sig_len-1)/Fs)';
%% Spectral-subtraction plots
figure(1)
MAX_Am(1) = max(Input);
MAX_Am(2) = max(NoisyInput);
MAX_Am(3) = max(spectruesub_enspeech);
subplot(3,1,1);
plot(Time, Input)
ylim([-max(MAX_Am), max(MAX_Am)]);
xlabel('Time')
ylabel('Amplitude')
title('Raw signal')
subplot(3,1,2);
plot(Time, NoisyInput)
ylim([-max(MAX_Am), max(MAX_Am)]);
xlabel('Time')
ylabel('Amplitude')
title('Add noise signal')
%% Wiener plots
figure(2)
MAX_Am(1) = max(Input);
MAX_Am(2) = max(NoisyInput);
MAX_Am(3) = max(wiener_enspeech);
subplot(3,1,1);
plot(Time, Input)
ylim([-max(MAX_Am), max(MAX_Am)]);
xlabel('Time')
ylabel('Amplitude')
title('Raw signal')
subplot(3,1,2);
plot(Time, NoisyInput)
ylim([-max(MAX_Am), max(MAX_Am)]);
xlabel('Time')
ylabel('Amplitude')
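The helper add_noise called by the script above is not listed in this excerpt. A minimal sketch of what it plausibly does (Python/NumPy; the exact MATLAB implementation may differ) is to scale white Gaussian noise so that the signal-to-noise ratio equals the requested value in dB:

```python
import numpy as np

def add_noise(clean, snr_db, rng=None):
    """Scale white Gaussian noise so that 10*log10(Ps/Pn) == snr_db."""
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal(len(clean))
    p_signal = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Rescale the noise to hit the requested SNR exactly.
    noise = noise * np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return clean + noise, noise
```

Returning the noise alongside the noisy mixture matches the script's call signature, since the Kalman routine needs the noise to estimate its variance.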
title('Add noise signal')

function [wiener_enspeech] = wienerfilter(testsignal)
% testsignal = testsignal';
frame_len = 256;                        % frame length
step_len = 0.5*frame_len;               % step size at framing, i.e. 50% overlap
wav_length = length(testsignal);
R = step_len;
L = frame_len;
f = (wav_length - mod(wav_length, frame_len))/frame_len;
k = 2*f - 1;                            % number of frames
h = sqrt(1/101.3434)*hamming(256)';     % Hamming window, scaled so the overlapped windows satisfy the reconstruction condition
testsignal = testsignal(1:f*L);
win = zeros(1, f*L);                    % initial values
wiener_enspeech = zeros(1, f*L);
% ------------------------------ framing ------------------------------
for r = 1:k
    y = testsignal(1+(r-1)*R : L+(r-1)*R);   % frames overlap by half
    y = y.*h;                                % window each frame
    w = fft(y);                              % Fourier transform of each frame
    Y(1+(r-1)*L : r*L) = w(1:L);             % store the transform in Y
end
% -------------------------- noise estimation --------------------------
NOISE = stationary_noise_evaluate(Y, L, k);  % minimum-tracking noise estimate for each frame
% -------------------------- Wiener filtering --------------------------
for t = 1:k
    X = abs(Y).^2;
    S = max(X(1+(t-1)*L:t*L) - NOISE(1+(t-1)*L:t*L), 0);
    G_k = (X(1+(t-1)*L:t*L) - NOISE(1+(t-1)*L:t*L))./X(1+(t-1)*L:t*L);
    S = sqrt(S);
    A1 = G_k.*S;
    A = Y(1+(t-1)*L:t*L)./abs(Y(1+(t-1)*L:t*L));  % phase of the noisy speech
    % The ear is largely insensitive to phase, so the noisy phase is reused.
    S = A1.*A;
    s = ifft(S);
    s = real(s);                                  % take the real part
    wiener_enspeech(1+(t-1)*L/2 : L+(t-1)*L/2) = wiener_enspeech(1+(t-1)*L/2 : L+(t-1)*L/2) + s;  % overlap-add of the frames
    win(1+(t-1)*L/2 : L+(t-1)*L/2) = win(1+(t-1)*L/2 : L+(t-1)*L/2) + h;                          % overlap-add of the windows
end
wiener_enspeech = wiener_enspeech./win;  % remove the gain introduced by windowing
wiener_enspeech = wiener_enspeech';
end

function [Output] = kalman(NoisyInput, Fs, Noise)
Len_windowT = 0.0025;   % 2.5 ms window
Hop_Percent = 1;        % window-shift ratio
AR_Order = 20;          % autoregressive model order
Num_Iter = 7;           % number of iterations
% [Input, Fs] = audioread('six.wav');
% Input = Input(:,1);
% Noise = normrnd(0, sqrt(0.001), size(Input));
% NoisyInput = Input + Noise;
% Time = (0:1/Fs:(length(Input)-1)/Fs)';
Len_WinFrame = fix(Len_windowT * Fs);
Window = ones(Len_WinFrame, 1);
[Frame_Signal, Num_Frame] = KFrame(NoisyInput, Len_WinFrame, Window, Hop_Percent);
H = [zeros(1, AR_Order-1), 1];          % observation matrix
R = var(Noise);                         % measurement-noise variance
[FiltCoeff, Q] = lpc(Frame_Signal, AR_Order);   % LPC prediction coefficients and error variances
P = R * eye(AR_Order, AR_Order);
Output = zeros(1, size(NoisyInput,1));
Output(1:AR_Order) = NoisyInput(1:AR_Order,1)';
OutputP = NoisyInput(1:AR_Order,1);
i = AR_Order + 1;
j = AR_Order + 1;
% -------------------------- Kalman filtering --------------------------
for k = 1:Num_Frame                 % process one frame at a time
    jStart = j;                     % remember j at the start of each iteration
    OutputOld = OutputP;            % keep the first AR_Order estimates for each iteration
    for l = 1:Num_Iter
        A = [zeros(AR_Order-1,1) eye(AR_Order-1); fliplr(-FiltCoeff(k,2:end))];
        for ii = i:Len_WinFrame     % basic Kalman filter equations
            OutputC = A * OutputP;
            Pc = (A * P * A') + (H' * Q(k) * H);
            K = (Pc * H')/((H * Pc * H') + R);
            OutputP = OutputC + (K * (Frame_Signal(ii,k) - (H * OutputC)));
            Output(j-AR_Order+1:j) = OutputP';
            P = (eye(AR_Order) - K * H) * Pc;
            j = j + 1;
        end
        i = 1;
        if l < Num_Iter
            j = jStart;
            OutputP = OutputOld;
        end
        % re-estimate the LPC coefficients from the current frame estimate
        [FiltCoeff(k,:), Q(k)] = lpc(Output((k-1)*Len_WinFrame+1 : k*Len_WinFrame), AR_Order);
    end
end
end
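The state-space update inside the triple loop above can be condensed into a single function. The following Python/NumPy sketch is an illustrative translation of one filtering step, not the author's code; the function and argument names are my own, and the AR coefficients follow the same convention as MATLAB's lpc output (leading 1 dropped):

```python
import numpy as np

def kalman_step(x_prev, P_prev, a, y, q, r):
    """One Kalman update for an AR(p) speech model in companion form.

    a : LPC coefficients a[1..p]; y : current noisy sample;
    q : process-noise variance; r : measurement-noise variance.
    """
    p = len(a)
    # Companion-form transition: the state holds the last p samples, and
    # the newest sample is predicted as -sum(a[i] * x[n-i]).
    A = np.vstack([np.hstack([np.zeros((p - 1, 1)), np.eye(p - 1)]),
                   -a[::-1].reshape(1, p)])
    H = np.zeros((1, p)); H[0, -1] = 1.0         # observe the newest sample
    x_pred = A @ x_prev                          # time update
    P_pred = A @ P_prev @ A.T + q * (H.T @ H)
    K = P_pred @ H.T / (H @ P_pred @ H.T + r)    # Kalman gain
    x_new = x_pred + K @ (y - H @ x_pred)        # measurement update
    P_new = (np.eye(p) - K @ H) @ P_pred
    return x_new, P_new
```

The MATLAB routine wraps exactly these five equations in an outer loop over frames and an iteration loop that re-estimates the LPC coefficients from the filtered output.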
III. Operation results
IV. Note
MATLAB version: 2014a