Abstract: Starting from the causes, research status and technical difficulties of nonlinear acoustic echo cancellation, this article describes in detail the dual-coupling acoustic echo cancellation algorithm developed by the Huawei Cloud audio and video team, along with its experimental test results.

Nonlinear acoustic echo is very common in practical acoustic systems and hard to cancel. So far there is no particularly effective solution, and very little published literature addresses it. Huawei Cloud has focused on the audio and video industry for more than 20 years. How does Huawei Cloud handle nonlinear acoustic echo cancellation, and how well does it work? Fan Zhan, a Huawei Cloud audio and video expert, starts from the causes, research status and technical difficulties of nonlinear acoustic echo cancellation and then describes in detail the dual-coupling acoustic echo cancellation algorithm developed by the Huawei Cloud audio and video team, together with its experimental results. The following is a transcript of the talk:

Introduction: Why share nonlinear acoustic echo cancellation techniques

Today’s topic is nonlinear acoustic echo cancellation. I chose this direction for two main reasons. First, nonlinear acoustic echo cancellation has been a technical difficulty in the industry for many years: the problem is very common in real acoustic systems, it is hard to solve, and so far there is no particularly effective method. I guess you might be interested in this topic.

The second reason is that, although we have done some technical research, very little of the existing public literature covers nonlinear acoustic echo cancellation. So I want to take this opportunity to introduce Huawei Cloud’s latest progress in this field, in the hope that it helps your future research, and also to exchange ideas with the experts here.

Today’s talk has four parts:

1. The first part covers what nonlinear acoustic echo is: how it arises, the research status and the technical difficulties;

2. The second part focuses on the dual-coupling acoustic echo cancellation algorithm;

3. The third part tests the performance of the algorithm through experiments;

4. Finally, a brief summary.

First, nonlinear acoustic echo

1. What is nonlinear acoustic echo

So let’s go straight to the first part: what is nonlinear acoustic echo? Here is a diagram of the acoustic echo path, with the transmitting end on the left and the receiving end on the right. The signal we send is first converted by D/A from the digital domain to the analog domain, then amplified by a power amplifier that drives the speaker to produce sound. The sound travels through the air, is picked up by a microphone at the receiving end, passes through an amplifier, and is finally converted by A/D from analog back to digital. The y[k] here is the echo signal we receive.

2. How to judge linear echo and nonlinear echo

So the question is: is the echo y[k] we receive linear or nonlinear? And how should we judge?

I think the key to answering this is to know clearly whether each link in the path below is a linear or a nonlinear system. If every link is linear, y[k] is naturally a linear echo; if even one link is nonlinear, the echo is a nonlinear echo.

Here I divide the whole echo path into four parts: A, B, C and D. Which of them is most likely to be nonlinear? The answer is B, that is, the power amplifier and speaker in the echo path, for reasons discussed later.

Now I want to explain a little more why A, C and D are not nonlinear. A and D are easy: both are linear time-invariant systems. C is harder to judge, because in more complex scenarios the acoustic echo often reaches the receiving end after multiple reflections along several different paths, accompanied by very strong reverberation. In more extreme cases, the relative position of the speaker and microphone also changes, so the echo path changes rapidly over time. Together, these factors often cause a sharp degradation in the performance of echo cancellation algorithms, or even complete failure.

Some of you might ask: isn’t something this complicated nonlinear? I think C should be regarded as a linear time-varying acoustic system, because the principle of superposition is the main criterion for distinguishing linear from nonlinear systems. The complex scenarios mentioned above still satisfy the superposition principle, so C is a linear system.

There is a power amplifier in B and also an amplifier in C. Why can the power amplifier in B produce nonlinear distortion while the amplifier in C does not? The main difference is that B outputs a large signal after amplification, which is used to drive the speaker, whereas C’s output is still a small signal after amplification, which usually does not produce nonlinear distortion.
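The superposition test that separates C from B can be tried numerically. This is a minimal sketch under assumed models, not measurements: a short FIR filter with a fading gain stands in for a linear time-varying air path, and a `tanh` soft limiter stands in for an overdriven power amplifier; all coefficients are hypothetical. The time-varying path passes the test, and the amplifier passes it only at small signal levels, which mirrors the small-signal versus large-signal argument above.

```python
import numpy as np

def superposition_error(system, x1, x2, a=2.0, b=-0.5):
    """Relative mismatch between system(a*x1 + b*x2) and a*system(x1) + b*system(x2).

    Close to 0 for any linear system, time-invariant or time-varying.
    """
    lhs = system(a * x1 + b * x2)
    rhs = a * system(x1) + b * system(x2)
    return np.linalg.norm(lhs - rhs) / np.linalg.norm(rhs)

# Hypothetical stand-in for the air path C: a short FIR room response...
h = np.array([1.0, 0.5, 0.25, -0.1])
def air_path(x):
    return np.convolve(x, h)[: len(x)]

# ...whose gain also fades over time, i.e. linear but time-varying.
def moving_air_path(x):
    return air_path(x) * np.linspace(1.0, 0.2, len(x))

# Hypothetical stand-in for the power amplifier in B: soft saturation.
def amplifier(x):
    return np.tanh(3.0 * x)

rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(1000), rng.standard_normal(1000)
for name, system, scale in [
    ("moving air path", moving_air_path, 1.0),
    ("amplifier, small signal", amplifier, 0.01),
    ("amplifier, large signal", amplifier, 1.0),
]:
    print(f"{name:24s} {superposition_error(system, scale * x1, scale * x2):.1e}")
```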

3. Causes of nonlinear acoustic echo

I’ve listed two causes of nonlinear acoustic echo. The first is the miniaturization and low cost of acoustic devices. The acoustic devices here are the power amplifiers and speakers in B.

Why does miniaturization tend to produce nonlinear distortion? The answer starts from how a speaker produces sound. Sound waves are essentially physical vibrations: a speaker uses an electric current to drive its diaphragm, and the diaphragm makes the surrounding air molecules vibrate accordingly, which produces sound. To make a louder sound, we need to deliver more electrical power per unit time so that more air molecules vibrate.

Suppose two speakers of different sizes are driven with the same power. The larger speaker has a larger contact area with the air, so it sets more air molecules vibrating per unit time and produces a louder sound. For a small speaker to be as loud as a large one, the drive power has to be increased, which can push the power amplifier into saturation and cause nonlinear distortion. This is one of the main reasons miniaturized acoustic devices are prone to nonlinear distortion. It is fairly easy to understand, so I won’t go into more detail.
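To see why saturation counts as nonlinear distortion, the sketch below drives a 500 Hz tone through a hard clipper, a hypothetical and idealized model of an amplifier hitting its supply rails, and measures total harmonic distortion (THD). Below the rail the output is a clean sine; once the drive exceeds it, energy appears at harmonics of 500 Hz that were not in the input, which no linear time-invariant system can produce.

```python
import numpy as np

def clipped_sine_thd(drive, rail=1.0, fs=16000, f0=500):
    """THD (as a ratio) of a sine of amplitude `drive` hard-clipped at +/-rail.

    One second of signal at fs gives 1 Hz FFT bins, so f0 (an integer, in Hz)
    and its harmonics fall exactly on bins and no window is needed.
    """
    t = np.arange(fs) / fs
    y = np.clip(drive * np.sin(2 * np.pi * f0 * t), -rail, rail)
    spectrum = np.abs(np.fft.rfft(y))
    harmonics = spectrum[2 * f0 :: f0]  # bins at 2*f0, 3*f0, ... up to Nyquist
    return np.sqrt(np.sum(harmonics**2)) / spectrum[f0]

for drive in (0.5, 1.5, 4.0):
    print(f"drive {drive}: THD = {100 * clipped_sine_thd(drive):.1f} %")
```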

The second cause is improper acoustic structure design. The most typical example is poor vibration isolation. The speaker unit and the microphone pickup unit usually need to be isolated from each other. Without vibration isolation, the vibration produced while the speaker plays is transmitted mechanically to the microphone at the receiving end and modulates the acoustic signal the microphone picks up. This structural vibration is essentially random and nonlinear, so it inevitably introduces nonlinear distortion.

We surveyed the major mobile phone models on the market, focusing on their acoustic characteristics. Surprisingly, more than half of the models have less-than-ideal acoustic characteristics, falling into the “poor” and “very poor” categories. The echo leakage and double-talk clipping we often run into when using a phone for gaming or voice calls are directly related to these poor acoustic characteristics.

Of course, these data cover only mobile phones, but many other electronic products on the market are likely to have similar problems. The data show that nonlinear distortion is a common problem in everyday electronic products, and I believe research on it is a very valuable and meaningful direction.

4. Research status of nonlinear acoustic echo cancellation technology

Earlier, a search of the IEEE Digital Library for “acoustic echo cancellation” returned a total of 3,402 items, including conference papers, journal articles, magazines and books. The same search for “nonlinear acoustic echo cancellation” returned only 254, less than 1/10 of the former, which indicates that nonlinear acoustic echo cancellation is a relatively neglected area within the acoustic echo cancellation field.

If this direction is so valuable and meaningful, why has it received so little attention? One answer I can think of is that it is very difficult and challenging. Let’s look at the technical difficulties.

5. Technical difficulties in nonlinear acoustic echo cancellation

I compared linear and nonlinear echo cancellation along six dimensions. The first is the system transfer function. In a linear system, we treat the transfer function as slowly time-varying, and an adaptive filter can approximate it to suppress the echo effectively. In a nonlinear system, however, the transfer function usually changes rapidly and abruptly. If we approximate it with a linear method, the filter’s updates cannot keep up with changes in the transfer function, and the echo cancellation is unsatisfactory.

The second dimension is the optimization model. For linear systems, we have a complete linear optimization framework, and the whole path from constructing the objective function to solving the optimization problem is very clear. For nonlinear systems, there is no comparably effective model.

The remaining four dimensions correspond to four problems that are known hard problems in linear echo cancellation and also exist in the nonlinear case. The first is strong reverberation. If we hold a video conference in a small meeting room, sound is reflected many times by the walls, producing strong and long-lasting reverberation. To suppress such strongly reverberant echo, the linear filter has to be made longer, which creates a new problem: according to Widrow’s adaptive filtering theory, the longer the filter, the slower the convergence and the greater the weight noise, so echo cancellation under strong reverberation is unsatisfactory.
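The trade-off Widrow describes can be made concrete with the standard textbook LMS approximations (not figures from the talk), assuming a white input of power σ_x², a small step size μ, and Widrow’s update convention w ← w + 2μ e x. The misadjustment 𝓜, the excess MSE caused by weight noise, grows with the filter length L, and for a fixed misadjustment the MSE time constant τ_mse (in samples) grows in proportion to L:

```latex
\begin{aligned}
\mathcal{M} &\approx \mu \,\operatorname{tr}(\mathbf{R}) = \mu L \sigma_x^2, \\
\tau_{\mathrm{mse}} &\approx \frac{1}{4\mu\sigma_x^2}
\quad\Longrightarrow\quad
\mathcal{M} \approx \frac{L}{4\,\tau_{\mathrm{mse}}} .
\end{aligned}
```

For example, at a 16 kHz sampling rate a 300 ms reverberation tail needs L = 4800 taps; holding 𝓜 at 10 % then gives τ_mse ≈ 4800 / 0.4 = 12 000 samples, about 0.75 s, versus about 0.125 s for a 50 ms tail (L = 800).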

The second problem is delay jumps, which are common in real-time audio and video calls: the delay between the signal captured at the microphone end and the echo reference signal suddenly jumps. After each jump, the signals must be realigned, and some echo leaks through in the meantime.
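Realignment after a delay jump usually starts from re-estimating the delay between the reference and the microphone signal. A minimal sketch, assuming plain time-domain cross-correlation over a bounded lag range (production systems often work on frames or in the frequency domain, and the talk does not say which method it uses) and a simulated jump from 120 to 200 samples:

```python
import numpy as np

def estimate_delay(mic, ref, max_lag):
    """Lag d in [0, max_lag] that maximizes |sum mic[n] * ref[n - d]|.

    Plain time-domain cross-correlation; assumes the echo lags the reference.
    """
    scores = [abs(np.dot(mic[d:], ref[: len(ref) - d])) for d in range(max_lag + 1)]
    return int(np.argmax(scores))

# Simulated delay jump: the echo delay changes from 120 to 200 samples.
rng = np.random.default_rng(1)
ref = rng.standard_normal(8000)

def delayed_echo(d):
    echo = 0.6 * np.concatenate([np.zeros(d), ref[: len(ref) - d]])
    return echo + 0.05 * rng.standard_normal(len(ref))  # small sensor noise

before, after = delayed_echo(120), delayed_echo(200)
print(estimate_delay(before, ref, 400), estimate_delay(after, ref, 400))  # 120 200
```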

The third problem is howling. Howling detection and howling suppression are recognized as classic problems in the echo field.

Finally, there is double talk. Double-talk performance is an important index for evaluating echo cancellation algorithms, but it is also hard to handle, because double talk can easily cause the filter coefficients to diverge.
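The talk does not describe how double talk is detected. For illustration only, the classic Geigel detector is sketched here: it flags near-end speech when a microphone sample is larger than a fraction of the recent peak reference level, so that adaptation can be frozen and the coefficients protected from divergence. The 0.5 threshold is the common choice that assumes at least 6 dB of echo-path loss; acoustic paths on loud handsets may need a different value, and the 0.3 echo gain and near-end burst below are hypothetical.

```python
import numpy as np

def geigel_double_talk(mic, ref, window=256, threshold=0.5):
    """Per-sample Geigel decision: True where near-end talk is likely.

    Flags sample n when |mic[n]| > threshold * max(|ref[n-window+1 .. n]|),
    i.e. when the microphone is louder than the echo path alone could make it.
    """
    flags = np.zeros(len(mic), dtype=bool)
    for n in range(len(mic)):
        recent_ref = np.abs(ref[max(0, n - window + 1) : n + 1])
        flags[n] = np.abs(mic[n]) > threshold * recent_ref.max()
    return flags

rng = np.random.default_rng(4)
ref = rng.standard_normal(2000)
mic = 0.3 * ref                                    # echo only (hypothetical gain)
mic[1000:1500] += 3.0 * rng.standard_normal(500)   # near-end talker joins
flags = geigel_double_talk(mic, ref)
print(flags[:1000].mean(), flags[1000:1500].mean(), flags[1500:].mean())
```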

Combining these dimensions, we can see that nonlinear acoustic echo cancellation is a very challenging research direction.

Second, the dual-coupling acoustic echo cancellation algorithm

This algorithm was proposed by our team. Its main feature is that it builds characteristics of nonlinear acoustic echo into the filter model, so it has an inherent advantage in suppressing nonlinear echo.

1. Modeling the nonlinear acoustic echo system

Let’s go back to the acoustic echo path diagram, which we have simplified. The speaker side on the left is represented by Wn, assumed to be the nonlinear echo path transfer function; the microphone side on the right is represented by Wl, the linear echo path transfer function. Under these assumptions, the received signal Y can be expressed as the convolution of the transmitted signal X with these two transfer functions.

We then simplify the model, mainly by mathematical decomposition: we assume the nonlinear transfer function can be decomposed into a combination of a linear system function and a nonlinear system function, which gives the intermediate equation.

Substituting variables in the intermediate equation then gives the final expression, whose physical meaning is very clear: the whole echo path can be expressed as the sum of a linear echo path and a nonlinear echo path.
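The slide equations are not in the transcript. One reconstruction consistent with the verbal description uses * for convolution and hypothetical symbols h_L and h_N for the linear and nonlinear parts of the speaker-side path; writing the nonlinear part as a convolution is a notational simplification carried over from the talk:

```latex
\begin{aligned}
y[k] &= w_l[k] * w_n[k] * x[k] \\
w_n[k] &= h_{\mathrm{L}}[k] + h_{\mathrm{N}}[k] \\
y[k] &= \underbrace{(w_l * h_{\mathrm{L}})[k]}_{\tilde{w}_l[k]} * x[k]
      + \underbrace{(w_l * h_{\mathrm{N}})[k]}_{\tilde{w}_n[k]} * x[k]
\end{aligned}
```

In this reading, the substituted paths w̃_l and w̃_n are what the linear and nonlinear filters of the dual-coupled structure learn.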

2. Dual-coupling adaptive filter

Based on this mathematical model, we construct a new filter structure called the dual-coupled adaptive filter. It differs from a traditional linear adaptive filter in two respects. First, a traditional linear filter has only one learning unit, while ours has two: a linear echo path filter, denoted Wl, and a nonlinear echo path filter, denoted Wn.

Second, we add a coupling factor between the two filters so that they cooperate and each works at full efficiency, ideally achieving a 1+1 > 2 effect.

3. Dual-coupling filter design

Once the filter structure is fixed, we need to design the filter coefficients. The design process has three steps: first, build the optimization criterion; second, solve for the filter weight coefficients Wl and Wn; and third, build the coupling mechanism.

The first step is to build the optimization criterion. I think this is probably the most important step in the whole filter design, because it determines the upper limit of filter performance. What is a good optimization criterion? In my opinion, it must match the physical characteristics of the problem. So before constructing the criterion, we first analyzed nonlinear acoustic echo, hoping to uncover some of its physical characteristics.

Our analysis is based on the function above, which we call the short-time correlation. It measures how similar the waveforms of two signals are within a short observation window of length T. Note that this function is statistical, because we take its mathematical expectation. We also add a phase correction factor to the last term of the numerator to align the initial phases of the two signals.
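The exact formula is on the slide rather than in the transcript, so the following is a hypothetical stand-in that keeps the described ingredients: a normalized waveform similarity over a window of T samples, an average over windows in place of the mathematical expectation, and a small lag search in place of the phase correction factor. With a delayed and scaled echo it returns 1, with a saturated echo it drops, and with unrelated noise it is near 0, matching the ordering of the green, red and blue curves qualitatively; it does not attempt to reproduce the curves’ dependence on T.

```python
import numpy as np

def short_time_correlation(x, y, T, max_lag=0):
    """Hypothetical short-time correlation of x and y over windows of T samples.

    Each window's normalized correlation is maximized over lags in
    [-max_lag, max_lag] (standing in for the phase correction), and the
    per-window values are averaged (standing in for the expectation).
    """
    scores = []
    for start in range(max_lag, len(x) - T - max_lag + 1, T):
        xs = x[start : start + T]
        best = 0.0
        for lag in range(-max_lag, max_lag + 1):
            ys = y[start + lag : start + lag + T]
            denom = np.sqrt(np.dot(xs, xs) * np.dot(ys, ys))
            if denom > 0:
                best = max(best, abs(np.dot(xs, ys)) / denom)
        scores.append(best)
    return float(np.mean(scores))

rng = np.random.default_rng(6)
x = rng.standard_normal(8000)
echo = 0.5 * np.concatenate([np.zeros(2), x[:-2]])  # linear echo: 2-sample delay, gain 0.5
for name, y in [("linear", echo),
                ("saturated", np.tanh(3.0 * echo)),
                ("noise", rng.standard_normal(8000))]:
    print(f"{name:10s} {short_time_correlation(x, y, T=800, max_lag=3):.3f}")
```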

Using this short-time correlation function, we analyzed a large amount of acoustic echo data and selected several typical groups. The green curve corresponds to echo data with very good linearity: its short-time correlation stays very high over the whole range of T, above 0.97, close to 1. The yellow curve corresponds to data with relatively weak nonlinear distortion: as T increases, the short-time correlation gradually decreases and finally settles at a relatively stable value. The red curve is data with strong nonlinear distortion. For comparison, we also show a blue curve, the short-time correlation between signal and noise, which is very small over the whole range of T.

Comparing these curves leads to two conclusions. First, the short-time correlation function we constructed objectively reflects the linearity of the acoustic system: the better the linearity, the larger the value. Second, a system with strong nonlinear distortion still shows strong correlation within a short observation window (for example, T < 100 ms), as the red curve shows.

Based on this property, we construct a new error function called the “short-time cumulative error function”. Note that it accumulates the residual over an observation time window T.

Based on this error function, we further construct a new optimization criterion, the “minimum mean short-time cumulative error criterion”. Through this criterion, we want the filter weights to have two properties. First, the filter should be optimal in the statistical sense, that is, globally optimal, so the objective function includes a mathematical expectation. Second, it should also be optimal at the scale of a short observation window, that is, locally optimal, so inside the expectation we also accumulate the error over that short window.

This criterion is fundamentally different from that of a traditional linear adaptive filter, which is based on the minimum mean square error criterion: optimal only in the statistical sense, with no local optimality constraint.
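The transcript gives neither error function in closed form. One reading consistent with the description, assuming the residual e(n) = y(n) − ŷ(n) is summed (not squared) over the last T samples before squaring and taking the expectation, contrasts the two criteria as follows:

```latex
\begin{aligned}
J_{\mathrm{MSE}} &= \mathbb{E}\!\left[e^{2}(n)\right] \\
\varepsilon_T(n) &= \sum_{k=0}^{T-1} e(n-k) \\
J_{\mathrm{STCE}} &= \mathbb{E}\!\left[\varepsilon_T^{2}(n)\right]
\end{aligned}
```

The expectation supplies the global (statistical) optimality and the inner sum supplies the local one. If the residual were squared before summing instead, the criterion would reduce to T·J_MSE for a stationary signal and lose the local constraint, which is why the sum-then-square reading is assumed here.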

4. Dual-coupling filter design (continued)

Let’s first solve for Wl, the linear filter. The approach is to assume that Wn is the optimal solution of the nonlinear filter and substitute it into the optimization problem above, which gives the simplified objective function.

Here we also make some prior assumptions: if the first- and second-order statistics of the nonlinear filter are zero, the optimization problem simplifies further into a very familiar equation, the Wiener-Hopf equation. This tells us that the optimal linear filter coincides with the optimal solution of a traditional adaptive filter, namely the theoretical Wiener-Hopf solution. We can therefore use existing mature algorithms, such as NLMS and RLS, to solve for it iteratively. That is the design of Wl.
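For the linear filter alone, the Wiener-Hopf equation R w = p (R the input autocorrelation matrix, p the cross-correlation between the input and the microphone signal) can be solved directly from sample statistics, and NLMS approaches the same solution iteratively. A minimal sketch with a hypothetical 4-tap echo path and white input:

```python
import numpy as np

def wiener_solution(x, d, M):
    """Solve the Wiener-Hopf equation R w = p using sample estimates."""
    X = np.array([x[n - M + 1 : n + 1][::-1] for n in range(M - 1, len(x))])  # rows [x(n), ..., x(n-M+1)]
    D = d[M - 1 :]
    R = X.T @ X / len(D)   # autocorrelation matrix estimate
    p = X.T @ D / len(D)   # cross-correlation vector estimate
    return np.linalg.solve(R, p)

def nlms(x, d, M, mu=0.5, eps=1e-6):
    """Normalized LMS: iterative estimate of the same Wiener solution."""
    w = np.zeros(M)
    e = np.zeros(len(x))
    for n in range(M - 1, len(x)):
        u = x[n - M + 1 : n + 1][::-1]          # newest sample first
        e[n] = d[n] - w @ u                      # a-priori residual echo
        w += mu * e[n] * u / (u @ u + eps)       # step normalized by input power
    return w, e

rng = np.random.default_rng(2)
h = np.array([0.8, -0.4, 0.2, 0.1])              # hypothetical echo path
x = rng.standard_normal(5000)
d = np.convolve(x, h)[: len(x)] + 0.01 * rng.standard_normal(len(x))
w_opt = wiener_solution(x, d, M=4)
w_nlms, _ = nlms(x, d, M=4)
print(np.round(w_opt, 3), np.round(w_nlms, 3))
```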

Now the design of Wn, which is similar to that of Wl: we substitute the optimal linear filter into the original optimization problem, which simplifies it into the equation below. After a series of variable substitutions, we obtain the optimal solution of the nonlinear filter, which has the form of a least squares estimate.
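The least squares form of Wn can be illustrated with a regularized solve of the normal equations over a short window of recent samples. This is a sketch under stated assumptions: the talk does not say what regressor the nonlinear filter uses, so delayed reference samples serve as a placeholder, and the regularizer `delta` and window length `T` are hypothetical parameters. Each call costs O(N³) for the solve; a recursive, RLS-style update lowers this to O(N²) per sample.

```python
import numpy as np

def windowed_ls(x, r, N, T, delta=1e-6):
    """Least squares fit of residual r from the last T samples of x.

    Builds rows [x(n), ..., x(n-N+1)] for the most recent T samples and solves
    the regularized normal equations (Phi^T Phi + delta I) w = Phi^T r.
    The regressor is a placeholder; the talk does not specify it.
    """
    n_end = len(x)
    Phi = np.array([x[n - N + 1 : n + 1][::-1] for n in range(n_end - T, n_end)])
    target = r[n_end - T : n_end]
    A = Phi.T @ Phi + delta * np.eye(N)
    return np.linalg.solve(A, Phi.T @ target)

rng = np.random.default_rng(3)
x = rng.standard_normal(1000)
g = np.array([0.3, -0.2, 0.1])          # hypothetical residual path
r = np.convolve(x, g)[: len(x)]         # residual left after the linear stage
w = windowed_ls(x, r, N=3, T=200)
print(np.round(w, 4))
```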

Step 3 is to build the coupling mechanism. Before describing it, let me explain the behavior I expect from it. In a highly linear acoustic system, the linear filter should dominate while the nonlinear filter stays dormant or switched off. Conversely, when the system’s nonlinearity is strong, the nonlinear filter should dominate while the linear filter is semi-dormant. In practical acoustic systems, linear and nonlinear states constantly alternate and overlap, so we need a mechanism to control the coupling between these two states.

To design the coupling mechanism, we need to measure linearity and nonlinearity. So we define two factors, a linearity factor and a nonlinearity factor, corresponding to the two equations on the left. The basic idea of coupling control is to feed the values of these two factors into the NLMS and least squares algorithms to adjust their respective learning speeds.
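The two factors’ equations are on the slide, so here is a hypothetical coupling rule that only reproduces the behavior described in step 3. A linearity factor in [0, 1] (for example, a short-time correlation between the linear echo estimate and the microphone signal; this choice is an assumption) scales the NLMS step size, and its complement, standing in for the nonlinearity factor, scales the learning speed of the least squares branch. The floor keeps the linear filter semi-dormant rather than frozen, and switches the nonlinear filter off when the system is almost linear. All parameter values are hypothetical.

```python
def coupled_step_sizes(linearity, mu_l_max=0.5, mu_n_max=1.0, floor=0.05):
    """Hypothetical coupling rule: split learning speed between the two filters.

    Returns (mu_l, mu_n): the NLMS step size for the linear filter and the
    update weight for the least squares (nonlinear) branch.
    """
    lin = min(max(linearity, 0.0), 1.0)    # clamp the linearity factor to [0, 1]
    nonlin = 1.0 - lin                      # stand-in for the nonlinearity factor
    mu_l = mu_l_max * max(lin, floor)       # linear filter: never fully frozen
    mu_n = mu_n_max * nonlin if nonlin > floor else 0.0  # nonlinear filter off when nearly linear
    return mu_l, mu_n

for linearity in (1.0, 0.97, 0.5, 0.0):
    print(linearity, coupled_step_sizes(linearity))
```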

To give you a qualitative understanding of the dual-coupling acoustic echo cancellation algorithm, I have drawn another set of curves. The left group corresponds to a linear echo scenario. First, the NLMS algorithm: the yellow curve is the true system transfer function, and the red curve is the NLMS estimate. In the linear scenario, the linear filter learned by NLMS approximates the true transfer function well and can therefore suppress the linear acoustic echo effectively.

Now the dual-coupling algorithm. In a linear echo scenario, its nonlinear filter is dormant, so its value tends to 0, and the linear filter plays the leading role.

Now the nonlinear acoustic echo scenario on the right. Suppose the nonlinear distortion occurs mainly between t1 and t2; you can see that the yellow curve changes abruptly during that interval. When the nonlinear distortion occurs, the NLMS linear filter tries to track it, but because its learning speed cannot keep up with how fast the true transfer function changes, there is always a large gap between the estimate and the true value. And when the nonlinear distortion disappears, NLMS takes some time to return to normal, so echo leakage occurs throughout this period.

Next, the dual-coupling algorithm. Once nonlinear distortion appears, the coupling mechanism described above slows the linear filter’s updates and puts it into a relatively dormant state, so its value changes only slowly while the nonlinearity is present.

Meanwhile, the nonlinear filter starts working: it quickly tracks the change in the nonlinear characteristics, and when the nonlinear distortion disappears, it goes dormant again. Combining the two filters lets the algorithm track changes in the acoustic echo path effectively. This is only an illustrative example; real situations are often much more complicated.

Next, we compare the two algorithms along four dimensions. The first is the optimization criterion: NLMS is based on the minimum mean square error criterion, while the dual-coupling algorithm is based on the minimum mean short-time cumulative error criterion.

The second is the theoretical optimal solution. NLMS converges toward the Wiener-Hopf solution; the linear filter of the dual-coupling algorithm also corresponds to the Wiener-Hopf solution, while its nonlinear filter has a least squares solution.

The third dimension is computational cost. NLMS costs O(M) per sample, where M is the filter order. Because the dual-coupling algorithm has a second filter, it costs an extra O(N²), where N is the order of the nonlinear filter; the square comes from the matrix inversion required by the least squares solution (O(N²) per sample assumes the inverse is updated recursively, as in RLS; inverting directly would cost O(N³)). So it needs much more computation than linear NLMS.
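To put the complexity gap in numbers, take illustrative orders M = 1024 for the linear filter and N = 64 for the nonlinear one (hypothetical values; constant factors ignored):

```latex
\begin{aligned}
\text{NLMS:}\quad & \mathcal{O}(M) \approx 1024 \\
\text{dual-coupling:}\quad & \mathcal{O}(M) + \mathcal{O}(N^2) \approx 1024 + 64^2 = 5120 \approx 5 \times 1024
\end{aligned}
```

Under these assumptions the dual-coupling algorithm costs roughly five times as much per sample as NLMS, and the N² term dominates as soon as N exceeds √M.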

The fourth is the control mechanism. NLMS has only one filter and is controlled mainly by adjusting the step size, which is relatively simple. The dual-coupling algorithm has to control the coupling of two filters, so its control is much more complex.

Third, analysis of experimental results

Here I mainly compare the performance of the dual-coupling algorithm and NLMS in two test scenarios: single talk and double talk.

Let’s first look at the single-talk test scenario. The first example has strong nonlinear distortion. The three plots on the left are, respectively, the spectrogram of the original signal, the spectrogram after echo cancellation with NLMS, and the spectrogram after the dual-coupling algorithm; darker colors mean higher energy. The plot on the right shows the echo suppression ratio, where larger is better: the red curve is the dual-coupling algorithm and the black curve is standard NLMS.

NLMS reaches only about 10 dB of echo suppression after convergence, which is relatively low, while the dual-coupling algorithm reaches more than 25 dB after convergence, about 15 dB more than NLMS. This advantage is very clear.
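The talk does not define its echo suppression ratio precisely. Assuming it matches the common ERLE (echo return loss enhancement) measure, it can be computed per frame during single talk as the ratio of microphone power to residual power in dB. A minimal sketch, with 160-sample frames (10 ms at 16 kHz) as an assumed frame size and a synthetic residual 25 dB below the microphone signal:

```python
import numpy as np

def erle_db(mic, residual, frame=160, eps=1e-12):
    """Per-frame ERLE in dB: 10*log10(mic energy / residual energy).

    Meaningful during single talk, when the microphone holds echo only.
    """
    n = len(mic) // frame * frame
    m = np.asarray(mic[:n], dtype=float).reshape(-1, frame)
    r = np.asarray(residual[:n], dtype=float).reshape(-1, frame)
    return 10 * np.log10((np.sum(m**2, axis=1) + eps) / (np.sum(r**2, axis=1) + eps))

rng = np.random.default_rng(5)
mic = rng.standard_normal(16000)       # one second of echo-only signal
residual = mic * 10 ** (-25 / 20)      # residual attenuated by 25 dB
print(np.round(erle_db(mic, residual)[:3], 2))
```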

The second example has weak nonlinear distortion, again with spectrograms on the left and the echo suppression ratio on the right. Echo suppression ratio and convergence speed are the main indicators of single-talk performance. After convergence, NLMS suppresses about 22 to 25 dB. Its convergence is very slow: it takes about 100 frames to reach a relatively converged state.

Once stable, the dual-coupling algorithm suppresses 35 to 40 dB, about 15 to 20 dB better than NLMS. It also has a very clear advantage in convergence speed: it converges almost as soon as the echo arrives.

The next figure compares the echo suppression ratio across different phone models: red is the dual-coupling algorithm and blue is NLMS. The data show that the dual-coupling algorithm generally improves echo suppression by more than 10 dB over NLMS, a relatively large advantage.

Finally, the double-talk test scenario. First, the test case: this data comes from a video conference. The left plot is the spectrogram of the original microphone signal and the right plot is the spectrogram of the echo reference signal.

Echo suppression ratio and near-end speech distortion are the main indicators of double-talk performance. The three plots above are spectrograms after echo cancellation; the middle one is the NLMS result. Its echo suppression is not ideal: considerable residual echo remains in both single-talk and double-talk periods. The bottom plot is the spectrogram from the dual-coupling algorithm: echo is suppressed fairly cleanly in both single-talk and double-talk periods, and near-end speech is barely damaged during double talk. Because this data comes from a video conference scenario, a final NLP (nonlinear processing) stage is still needed.

The figure above is the output of NLP applied after the dual-coupling algorithm. The whole spectrum is very clean, the echo is removed cleanly, the near-end speech spectrum is not noticeably damaged, and double talk is fully transparent.

Fourth, summary

To briefly summarize, today I covered three aspects. The first is nonlinear acoustic echo: its causes, the current research status and the technical difficulties.

Second, I focused on the dual-coupling acoustic echo cancellation algorithm from Huawei Cloud audio and video. Our main contributions are twofold: first, a dual-coupled adaptive filter structure; second, the minimum mean short-time cumulative error criterion and its solution, in which the linear filter of the dual-coupled filter takes the form of the optimal Wiener-Hopf solution and the nonlinear filter has a least squares solution.

Finally, we tested the algorithm experimentally and found clear performance improvements in strong nonlinear distortion, linear, and double-talk scenarios: the echo suppression ratio increased by more than 10 dB, and convergence is faster, taking less than 30 milliseconds. The algorithm also has drawbacks: the amount of computation is large, and there are many coupling control stages, which makes it relatively complex.

