I. Introduction
Voice communication is one of the most natural and basic means of human communication. The speech signal that carries this information, however, is a time-varying, non-stationary signal that can be regarded as stationary only over very short intervals (usually 10~30 ms). During speech generation, processing and transmission, interference from environmental noise is unavoidable, and it greatly degrades the performance of speech processing systems such as speech coders and speech recognizers. To improve speech quality and intelligibility, various speech enhancement methods are used to suppress background noise based on the characteristics of speech and noise. Speech denoising is nevertheless a very complex problem: one must consider the characteristics of speech itself, the characteristics of the ever-changing noise, how the human ear perceives speech, and how the brain processes signals. For these reasons, speech enhancement remains an enduring topic in speech signal processing.
Although the theory and methods of speech denoising are far from complete, researchers have proposed many methods for additive noise over the past 40 years, tailored to different noise types and applications. Popular speech enhancement methods include the Wiener filter, the Kalman filter, spectral subtraction and adaptive filtering. The Wiener filter is an optimal minimum-mean-square-error estimator under stationarity assumptions, which makes it poorly suited to speech. The Kalman filter removes the stationarity requirement and remains optimal in the minimum-mean-square-error sense for non-stationary signals, but in its basic form it is applicable only to white noise. Spectral subtraction is widely used, but at low signal-to-noise ratios it noticeably damages the clarity and naturalness of speech and introduces musical noise into the reconstructed signal. Adaptive filtering is one of the most effective speech enhancement methods, but it requires a reference noise source that is difficult to obtain in real environments, and, like spectral subtraction, it suffers from musical noise. Moreover, all of the above methods require some knowledge of the noise characteristics or statistics, and it is difficult to extract the speech signal from noisy speech without such prior knowledge.
The wavelet transform is a local time-frequency analysis method that has developed rapidly over the last decade or so. It overcomes the fixed-resolution limitation of the short-time Fourier transform and can decompose a signal at multiple scales and resolutions; the wavelet coefficients obtained at each scale represent the signal's information at a different resolution. At the same time, the wavelet transform closely resembles the auditory analysis performed by the human ear, which makes it convenient for researchers to exploit auditory characteristics. It is a powerful tool for analysing non-stationary signals such as speech, so in recent years many researchers have used it to process speech. The principle of wavelet denoising is that the energy of the speech signal is concentrated in the low-frequency bands, while the noise energy is concentrated mainly in the high-frequency bands. The wavelet coefficients at scales where noise dominates can therefore be set to zero or given a small weight, and the processed coefficients can then be used to reconstruct the signal. As wavelet theory has developed, wavelet denoising techniques have been continuously enriched and have achieved good results: Mallat proposed denoising based on the modulus maxima of the wavelet transform in 1992, and Donoho proposed nonlinear wavelet threshold denoising in 1995, an approach that made wavelet denoising widely used and attracted many researchers.
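As a rough illustration of this principle, here is a minimal MATLAB sketch (not from the cited literature) that adds white noise to the built-in mtlb speech sample, zeroes the finest detail scale, attenuates the next one, and reconstructs; it assumes the Wavelet Toolbox, and the wavelet 'db4', the 4-level depth, the noise level and the 0.5 weight are arbitrary illustrative choices.

% minimal sketch: zero/attenuate the noise-dominated fine scales, then reconstruct
load('mtlb.mat');                          % built-in speech sample (variables mtlb, Fs)
x = mtlb(:)' + 0.02*randn(1, numel(mtlb)); % add synthetic white noise
[C, L] = wavedec(x, 4, 'db4');             % 4-level discrete wavelet transform
n1 = L(end-1);                             % length of the finest detail block cD1
n2 = L(end-2);                             % length of the next detail block cD2
C(end-n1+1:end) = 0;                       % discard the finest-scale details (mostly noise)
C(end-n1-n2+1:end-n1) = 0.5*C(end-n1-n2+1:end-n1);   % attenuate the cD2 details
xden = waverec(C, L, 'db4');               % reconstruct the denoised signal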
1. Wavelet decomposition
In speech enhancement, the purpose of signal decomposition is to concentrate the signal energy in a few coefficients within certain frequency bands so that the noise can be suppressed effectively. Researchers generally use orthogonal wavelets, because the orthogonal wavelet transform removes the correlation in the original signal to the greatest extent and concentrates its energy in a few sparse wavelet coefficients of relatively large amplitude. Ordinary wavelet decomposition splits only the low-frequency component at each stage and leaves the high-frequency component untouched, which cannot provide both good time resolution and good frequency resolution, so researchers began to use orthogonal wavelet packets to decompose speech, making it easier to exploit the auditory masking effect of the human ear for enhancement. Some literature uses a wavelet packet algorithm, which offers flexible time-frequency analysis and better matches the frequency analysis performed by the basilar membrane of the ear: according to the mapping between the Bark scale and the frequency scale, a fixed wavelet packet decomposition divides the 0~4000 Hz band into 52 sub-bands corresponding to 18 Bark bands, and under monaural conditions the enhanced speech has higher clarity and intelligibility than that produced by traditional spectral subtraction. In other literature, a 5-level wavelet packet decomposition is used to obtain 17 frequency bands corresponding to the Bark scale, or a 6-level decomposition to obtain 24 critical bands. The aim in all cases is to exploit the auditory characteristics of the human ear: the noise does not need to be suppressed completely during enhancement, only made imperceptible, which reduces unnecessary speech distortion while denoising.
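To make the wavelet packet idea concrete, the following MATLAB sketch (again only an illustration, assuming the Wavelet Toolbox) builds a full depth-5 wavelet packet tree of the mtlb sample, which yields 32 uniform sub-bands, and computes the energy per band; a perceptually motivated scheme such as those cited above would instead group or split the nodes so that the terminal bands approximate Bark critical bands. The wavelet 'db8' and the depth are arbitrary choices.

% sketch: level-5 wavelet packet decomposition into 32 uniform sub-bands
load('mtlb.mat');                 % built-in speech sample
x = mtlb(:)';
wpt = wpdec(x, 5, 'db8');         % full wavelet packet tree of depth 5
tn  = leaves(wpt);                % the 32 terminal (sub-band) nodes
E = zeros(size(tn));
for k = 1:numel(tn)
    c = wpcoef(wpt, tn(k));       % coefficients of the k-th sub-band
    E(k) = sum(c.^2);             % sub-band energy (could drive band-wise thresholding)
end
x1 = wprcoef(wpt, tn(1));         % reconstruction of the first (all-lowpass) sub-band alone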
Researchers generally adopt a fixed decomposition depth when using wavelet packet decomposition, usually five levels or more. Numerous experiments show that the decomposition depth has a strong influence on the denoising performance. If too many levels are used, some important local features of the signal are lost, the signal-to-noise ratio may actually decrease, and the computation and delay become large. If too few levels are used, the modulus maxima associated with the noise are not attenuated sufficiently, signal and noise cannot be separated well, the noise reduction is unsatisfactory, and the SNR improvement is limited. A fixed decomposition depth therefore limits the denoising performance of the algorithm to a large extent. A novel method for adaptively selecting the decomposition depth has been proposed in the literature; it effectively improves the performance of wavelet threshold denoising, but it further increases the delay and the amount of computation.
Because first-generation wavelets suffer from long delay, relatively complex algorithms and large memory requirements, some literature uses the adaptive lifting wavelet for speech enhancement; experimental results show that this reduces the complexity of the algorithm, removes noise effectively and maintains good speech intelligibility. According to wavelet theory, orthogonal wavelet decomposition cannot guarantee a linear phase in the intermediate stages of the decomposition, which is unfavourable for speech processing, whereas biorthogonal wavelet decomposition avoids phase distortion in the intermediate stages; biorthogonal wavelet packets have therefore been used for speech denoising with good results. At the same time, because wavelet decomposition is ultimately implemented with filter banks, it inevitably introduces delay, which limits the range of applications of wavelet theory. It is therefore necessary to design low-delay filters for wavelet decomposition, which lays a solid foundation for further applications of wavelet theory.
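The linear-phase property mentioned above is easy to try out in MATLAB: the sketch below (illustrative only, assuming the Wavelet Toolbox) decomposes the mtlb sample with the symmetric biorthogonal wavelet 'bior4.4' and verifies perfect reconstruction; any other 'biorN.M' wavelet would serve equally well.

% sketch: biorthogonal (linear-phase) wavelet decomposition and reconstruction
load('mtlb.mat');
x = mtlb(:)';
[C, L] = wavedec(x, 5, 'bior4.4');   % 'bior4.4' has symmetric, linear-phase filters
xrec = waverec(C, L, 'bior4.4');
max(abs(x - xrec))                   % close to machine precision: perfect reconstruction up to round-off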
In short, research on wavelet decomposition has progressed from plain wavelet decomposition to wavelet packet decomposition, wavelet packet trees that exploit the auditory characteristics of the human ear, adaptive selection of the decomposition depth, lifting wavelet decomposition, biorthogonal wavelet packet decomposition that guarantees linear phase, and low-delay filter design. All of this prepares the signal for the subsequent processing stages and ultimately improves the denoising performance.
2. Modulus maxima denoising
The principle of modulus maxima denoising is that the modulus maxima of the speech signal increase or stay the same as the scale increases, while the modulus maxima of the noise decrease as the scale increases. Based on this behaviour, the modulus maxima produced by noise are removed, those produced by speech are retained, and the retained maxima are used to reconstruct the speech, thereby removing the noise. The main steps are: (1) apply the discrete dyadic wavelet transform to the noisy speech, usually with 4 or 5 decomposition scales; (2) compute the modulus maxima of the wavelet coefficients at each scale; (3) choose a threshold at the largest scale and set to zero the maxima whose modulus is below it, keeping the rest; (4) trace the propagation of the maxima across scales, retaining the maxima produced by speech and discarding those produced by noise; (5) reconstruct the denoised speech from the retained maxima at each scale.
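The scale behaviour this method relies on can be illustrated crudely in MATLAB. The sketch below is only an illustration, not the maxima method itself: it uses the undecimated stationary wavelet transform so that coefficients at different scales are aligned, compares coefficient magnitudes across scales at the same position instead of tracing true maxima propagation, and reconstructs with iswt rather than with the alternating-projection reconstruction from retained maxima. The wavelet, the 4 scales and the noise level are arbitrary.

% crude sketch: keep positions where the detail magnitude does not decay with scale
load('mtlb.mat');
x = mtlb(:)';
n = 2^floor(log2(numel(x)));            % swt needs a length divisible by 2^J
x = x(1:n) + 0.02*randn(1, n);          % add synthetic white noise
J = 4;                                  % number of scales (typically 4 or 5)
[swa, swd] = swt(x, J, 'db4');          % undecimated DWT: J rows of detail coefficients
swdDen = swd;
for j = 1:J-1
    decaying = abs(swd(j+1, :)) < abs(swd(j, :));   % magnitude shrinks as scale grows
    swdDen(j, decaying) = 0;                        % treat such positions as noise
end
xden = iswt(swa, swdDen, 'db4');        % approximate reconstruction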
Although denoising based on the modulus maxima of the wavelet transform has a sound theoretical basis, many factors affect its accuracy in practice and the denoising results are not satisfactory. Some literature uses interpolation based on the frequency response characteristics of the wavelet transform to reconstruct the modulus maxima at the small scales, and then reconstructs the signal with an analytic iterative projection operator based on the contraction mapping principle, but the performance of this approach is limited. There are also several practical issues to be resolved, such as how to choose a suitable decomposition scale, and the fact that only a limited number of maxima are available for reconstruction, so the reconstructed signal inevitably deviates from the original. How to construct wavelet coefficients close to those of the original signal therefore limits the further application of this method, and there is relatively little literature on it.
3. Correlation denoising
The principle of correlation denoising is that the wavelet coefficients of the speech signal are strongly correlated across scales, whereas the wavelet coefficients of the noise show no obvious correlation between scales. The main steps are: (1) compute the correlation of the wavelet coefficients at the same spatial position on adjacent scales, Corr(j,k) = w(j,k) · w(j+1,k), where j is the scale, k the position, w(j,k) the k-th wavelet coefficient at scale j and w(j+1,k) the k-th wavelet coefficient at scale j+1; (2) compare the magnitude of the (suitably normalized) correlation with that of the wavelet coefficient: if the correlation is larger, the coefficient is considered signal and retained, otherwise it is considered noise and set to zero; (3) reconstruct the denoised signal from the processed coefficients. Computing the correlation enhances the edge features of the signal and makes its characteristics easier to extract, and several papers report good denoising results with this method.
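A single-pass version of this spatially selective filtering can be sketched in MATLAB as follows (illustrative only; the cited methods normalize and iterate more carefully). The undecimated transform is used so that w(j,k) and w(j+1,k) are aligned, and the correlation is rescaled to the energy of the current scale before the comparison; 'db4', the 4 scales and the noise level are arbitrary choices.

% sketch: inter-scale correlation denoising on undecimated wavelet details
load('mtlb.mat');
x = mtlb(:)';
n = 2^floor(log2(numel(x)));
x = x(1:n) + 0.02*randn(1, n);
J = 4;
[swa, swd] = swt(x, J, 'db4');
swdDen = swd;
for j = 1:J-1
    corrJK = swd(j, :).*swd(j+1, :);                               % Corr(j,k) = w(j,k)*w(j+1,k)
    corrJK = corrJK*sqrt(sum(swd(j, :).^2)/(sum(corrJK.^2)+eps));  % match energies before comparing
    noiseLike = abs(corrJK) < abs(swd(j, :));                      % correlation does not dominate
    swdDen(j, noiseLike) = 0;                                      % treat as noise and zero it
end
xden = iswt(swa, swdDen, 'db4');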
However, once the wavelet decomposition deviates at some point, the computed correlation no longer accurately represents the true correlation at position k, and the performance of correlation denoising degrades. Some literature introduces a region-based correlation denoising method to address this: it considers not only the wavelet coefficient at position k but also the coefficients in its neighbourhood, which reduces the influence of deviations in individual coefficients. However, this method has to compute a correlation for every point, which is relatively costly, and it has not attracted extensive research.
4. Threshold denoising
Threshold denoising was first introduced by Donoho in 1995; the nonlinear wavelet threshold denoising he proposed led to wavelet denoising being studied in depth and widely applied.
The theoretical basis of threshold denoising is that the wavelet coefficients of the noise and of the useful signal behave differently in amplitude: in the low-frequency bands the wavelet coefficients of speech are larger than those of the noise, and in the high-frequency bands the opposite holds. A suitable threshold is therefore set for the wavelet coefficients of each level to separate signal from noise. The steps of the algorithm are: (1) apply the wavelet transform to the noisy signal; (2) apply nonlinear threshold processing to the wavelet coefficients; (3) reconstruct the denoised signal from the processed coefficients.
There are two key problems in wavelet threshold denoising: (1) how the threshold is applied (for example, hard or soft thresholding); (2) how the threshold value itself is estimated. These two choices directly determine the denoising performance.
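As a concrete example of both choices, the following MATLAB sketch (assuming the Wavelet Toolbox) applies Donoho's universal threshold with soft thresholding to a noisy copy of the mtlb sample; the noise level is estimated from the finest-scale details with the median absolute deviation. The wavelet 'db4', the 5 levels and the added noise level are arbitrary, and changing 's' to 'h' in wthresh switches from soft to hard thresholding.

% sketch: wavelet threshold denoising with the universal threshold
load('mtlb.mat');
x = mtlb(:)' + 0.02*randn(1, numel(mtlb));
[C, L] = wavedec(x, 5, 'db4');
d1    = detcoef(C, L, 1);                  % finest-scale details
sigma = median(abs(d1))/0.6745;            % robust noise estimate (MAD)
thr   = sigma*sqrt(2*log(numel(x)));       % universal threshold
Cden  = C;
Cden(L(1)+1:end) = wthresh(C(L(1)+1:end), 's', thr);   % soft-threshold the details only
xden  = waverec(Cden, L, 'db4');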
5. Hybrid denoising methods
Although pure wavelet denoising can achieve good results, the enhanced speech is not very good at low signal-to-noise ratios or under colored noise. To exploit the advantages of the wavelet transform and remove noise more thoroughly, the current trend is to combine different wavelet methods with each other or with other techniques. To eliminate musical noise, some literature proposes a low-variance spectral estimation method based on wavelet thresholding; experiments show that multi-band spectral estimation combined with wavelet thresholding suppresses musical noise and improves speech quality better than spectral subtraction, although its performance on other noise types is inferior to its performance on Gaussian white noise. Other literature applies adaptive filtering to the small-scale wavelet coefficients and spectral subtraction or Wiener filtering to the large-scale coefficients in the wavelet domain.
Experimental results show that this method combines the advantages of wavelet denoising, adaptive filtering and spectral subtraction; it damages speech less than pure threshold denoising and also reduces musical noise, but it introduces extra computation and delay. To prevent high-frequency unvoiced sounds from being removed as noise, some literature first makes a voiced/unvoiced decision based on the energy of the wavelet coefficients; for unvoiced frames only the low-frequency components at the smallest scale are denoised so that the unvoiced sounds are preserved, while for other frames denoising is applied at all scales. This retains as much unvoiced information as possible while suppressing noise, improves the naturalness of the enhanced speech and reduces distortion, but under colored noise the residual noise is not removed cleanly. In another approach, the noisy speech is first decomposed by the wavelet transform into several critical bands; a series of components is extracted and fed to a feed-forward subsystem, an average normalized time-frequency energy threshold is used to guide that subsystem to suppress stationary noise, an improved wavelet threshold is used at the same time to suppress non-stationary and colored noise, and finally a fixed soft threshold is used to enhance the unvoiced segments. Combined with an artificial-neural-network denoising stage, this method has been used successfully in speech recognition, but its delay is too long for real-time processing. Some literature introduces the Kalman filter in the wavelet domain, combining the advantages of Kalman filtering with those of wavelet decomposition to simulate the perceptual characteristics of the human ear; it suppresses non-stationary and colored noise to a certain extent with little speech distortion. Other literature introduces spectral subtraction in the wavelet domain to compute the masking threshold and the optimal weighting coefficients, together with a parametric method based on noise estimation, and removes various noises well, but the algorithm is rather complex. Although the above methods can remove stationary, non-stationary, white and colored noise to some extent, the denoising effect is poor at very low SNR and a small amount of musical noise remains; some literature therefore proposes decomposing speech with the bionic wavelet transform. Compared with the ordinary wavelet transform, the bionic wavelet transform can adjust its time-frequency scale not only according to the signal frequency but also according to the instantaneous amplitude of the signal and its first-order derivative. Experiments show that this method preserves the original clean speech better. If the methods above were applied on top of the bionic wavelet transform, better results should be obtainable.
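As a toy illustration of how such hybrids are assembled, the sketch below chains magnitude spectral subtraction in the STFT domain with wavelet soft thresholding of its output; it does not reproduce any of the cited schemes, the noise spectrum is crudely estimated from the first frame only, and the window, hop, wavelet and threshold are all arbitrary choices.

% toy hybrid: spectral subtraction followed by wavelet soft thresholding
load('mtlb.mat');
x   = mtlb(:)' + 0.03*randn(1, numel(mtlb));
win = 256; hop = 128; w = hamming(win)';
nmag = abs(fft(x(1:win).*w));              % crude noise estimate: first frame only
y = zeros(size(x)); wsum = zeros(size(x));
for s = 1:hop:numel(x)-win+1
    idx = s:s+win-1;
    Y   = fft(x(idx).*w);
    mag = max(abs(Y)-nmag, 0);             % spectral subtraction, floored at zero
    y(idx)    = y(idx) + real(ifft(mag.*exp(1i*angle(Y))));   % overlap-add
    wsum(idx) = wsum(idx) + w;
end
y = y./max(wsum, eps);                     % undo the window gain
[C, L] = wavedec(y, 5, 'db4');             % wavelet stage on the subtracted signal
thr = median(abs(detcoef(C, L, 1)))/0.6745*sqrt(2*log(numel(y)));
C(L(1)+1:end) = wthresh(C(L(1)+1:end), 's', thr);
enhanced = waverec(C, L, 'db4');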
II. Source code
clear all;
% read the speech file
[speech,fs,nbits]=wavread('1.wav');
% define parameters
winsize=256;                 % frame length
n=0.04;                      % noise amplitude
size=length(speech);
numofwin=floor(size/winsize);
ham=hamming(winsize)';
hamwin=zeros(1,size);
enhanced=zeros(1,size);
x=speech'+n*randn(1,size);   % noisy speech
noisy=n*randn(1,winsize);    % noise-only segment used as the noise estimate
N=fft(noisy);
nmag=abs(N);                 % noise magnitude spectrum
% frame-by-frame processing with 50% overlap
for q=1:2*numofwin-1
    frame=x(1+(q-1)*winsize/2:winsize+(q-1)*winsize/2);
    hamwin(1+(q-1)*winsize/2:winsize+(q-1)*winsize/2)=...
        hamwin(1+(q-1)*winsize/2:winsize+(q-1)*winsize/2)+ham;
    y=fft(frame.*ham);
    mag=abs(y);
    phase=angle(y);
    % magnitude spectral subtraction
    for i=1:winsize
        if mag(i)-nmag(i)>0
            clean(i)=mag(i)-nmag(i);
        else
            clean(i)=0;
        end
    end
    % recombine magnitude and phase
    spectral=clean.*exp(j*phase);
    % inverse Fourier transform and overlap-add
    enhanced(1+(q-1)*winsize/2:winsize+(q-1)*winsize/2)=...
        enhanced(1+(q-1)*winsize/2:winsize+(q-1)*winsize/2)+real(ifft(spectral));
end
% remove the Hamming window gain
for i=1:size
    if hamwin(i)==0
        enhanced(i)=0;
    else
        enhanced(i)=enhanced(i)/hamwin(i);
    end
end
            % piecewise (firm) threshold mapping for the sub-band coefficients recoefs1
            % (this fragment parallels the recoefs2 loop below)
            recoefs1(i)=sign(recoefs1(i))*(0.003*abs(recoefs1(i))-0.000003)/0.002;
        else
            recoefs1(i)=recoefs1(i);
        end
    elseif output1(i)==0
        recoefs1(i)=recoefs1(i);
    end
end
count5=fix(count3/zhen);
for i=1:count5
    n=160*(i-1)+1:160+160*(i-1);
    s=sound(n);                  % take one 160-sample frame of the signal
    w=hamming(160);
    sw=s.*w;
    a=aryule(sw,10);             % 10th-order LPC coefficients (Yule-Walker)
    sw=filter(a,1,sw);           % LPC residual of the windowed frame
    sw=sw/sum(sw);
    r=xcorr(sw,'biased');        % autocorrelation of the normalized residual
    corr=max(r);
    if corr>=0.8
        output2(i)=0;            % flag that decides whether to threshold this frame
    elseif corr<=0.1
        output2(i)=1;
    end
end
for i=1:count5
    n=160*(i-1)+1:160+160*(i-1);
    if output2(i)==1
        % piecewise (firm) thresholding of the sub-band coefficients recoefs2
        if abs(recoefs2(i))<=0.002
            recoefs2(i)=0;
        elseif abs(recoefs2(i))>0.002 && abs(recoefs2(i))<=0.003
            recoefs2(i)=sign(recoefs2(i))*(0.003*abs(recoefs2(i))-0.000003)/0.002;
        else
            recoefs2(i)=recoefs2(i);   % large coefficients are kept unchanged
        end
    elseif output2(i)==0
        recoefs2(i)=recoefs2(i);       % flagged frames are left untouched
    end
end
III. Operation results
IV. Remarks
MATLAB version: R2014a