Google recently released SoundStream, an AI-based audio codec. According to Google, SoundStream is the first neural network codec that can encode different sound types while providing high-quality audio in real time on a smartphone CPU. Earlier this year, Google released Lyra, an ultra-low-bit-rate audio compression codec. Within a single year, Google has released two AI-based audio codecs. What is the difference between them? Why is Google so focused on low-bit-rate audio compression? Will SoundStream become a general-purpose audio codec, or will it focus on a specific domain? Is it possible for the new Lyra to replace Opus?

SoundStream Technical Interview #004

With these questions in mind, LiveVideoStack spoke with Jamieson Brettle, senior product manager, and Jan Skoglund, senior software engineer, both involved in SoundStream's audio codec development.

LiveVideoStack: Hello Jamieson and Jan. Congratulations to Google on SoundStream. SoundStream's launch is big news in audio and video technology, and Chinese audio engineers are watching its progress closely. To give readers a deeper look at this new AI audio codec, we've prepared some questions for you.

——

Q1: Now that people have more and more bandwidth, why is Google focusing on low-bit-rate audio compression?

Jamieson & Jan: While the infrastructure is improving, it will still take time for the Internet to become fully universal. In addition, user and application demand for bandwidth keeps growing, so even as available bandwidth continues to increase, demand will outstrip supply. We therefore do everything we can to reduce bandwidth consumption and improve the overall user experience.

Q2: What are the main differences between the new SoundStream and Lyra, the neural network audio codec released earlier this year?

Jamieson & Jan: The first version of Lyra used a built-in synthesis engine based on WaveRNN, while SoundStream uses a network similar to an autoencoder. SoundStream will be at the heart of the new version of Lyra.
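As background for readers: the SoundStream paper describes an encoder-quantizer-decoder design in which the quantizer is a residual vector quantizer (a stack of codebooks, each encoding what the previous stage left over). The sketch below is an illustrative toy implementation of that quantization idea only, not Google's code; the codebook sizes and dimensions are invented for the example.

```python
import numpy as np

def residual_vq(frame, codebooks):
    """Toy residual vector quantization of one embedding vector.

    Each stage picks the codeword closest to the residual left by
    the previous stage; more stages means higher fidelity and a
    higher bitrate (log2(codebook size) bits per stage).
    """
    residual = frame.astype(float)
    indices = []
    quantized = np.zeros_like(residual)
    for cb in codebooks:
        # distance from the current residual to every codeword
        dists = np.linalg.norm(cb - residual, axis=1)
        idx = int(np.argmin(dists))
        indices.append(idx)
        quantized += cb[idx]
        residual = residual - cb[idx]
    return indices, quantized

# Toy setup: a 4-dim embedding, 3 stages of 8 codewords each,
# so each frame costs 3 stages * log2(8) = 9 bits.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(8, 4)) for _ in range(3)]
frame = rng.normal(size=4)
indices, approx = residual_vq(frame, codebooks)
err = np.linalg.norm(frame - approx)
```

In the real codec the codebooks are learned jointly with the encoder and decoder networks, and dropping later stages at inference time is what lets a single model serve multiple bitrates.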

Q3: Why did Google develop two AI codecs, SoundStream and Lyra? Can Google reveal its roadmap for this? How will SoundStream integrate into Lyra?

Jamieson & Jan: Audio encoding using ML is still in its infancy, and as research in this area continues to grow, we are seeing rapid advances in AI codecs. By running these projects in parallel, we are able to quickly productize research and apply the best codecs to practical applications. Future versions of Lyra will use SoundStream as the underlying engine, so today's developers can continue to use the same Lyra API but with significantly improved performance.

Q4: According to the paper, SoundStream has surpassed Lyra in sound quality (at the same bit rate), robustness to all kinds of audio signals (speech, music, clean and noisy), algorithmic delay, and computational complexity. Will Lyra be completely replaced?

Jamieson & Jan: We've seen SoundStream come a long way in terms of sound quality, noise robustness, and handling all kinds of audio signals. The new SoundStream engine will replace the autoregressive engine found in the first version of Lyra.

Q5: The experimental results in the paper suggest that SoundStream's performance at 12 kbps is approaching saturation. Does Google think that AI audio coding is only suitable for low-bit-rate scenarios? Are there opportunities for AI audio coding to surpass traditional coding at medium and high rates, such as typical AAC rates?

Jamieson & Jan: We think AI codecs will benefit a variety of bandwidths and applications. We are now working on improving neural-network-based audio encoding at higher bit rates.

Q6: Is SoundStream suitable for coding speech, music, and mixed signals at low bit rates?

Jamieson & Jan: SoundStream does not categorize sound types; it handles different kinds of sound with a single model.

Q7: Do neural network codecs have obvious advantages in complexity over traditional signal processing codecs?

Jamieson & Jan: So far, neural network codecs have lower encoding complexity but higher decoding complexity, which usually makes their overall complexity much higher than that of codecs such as Opus. Over time, however, we believe there are several ways to improve the efficiency of neural network coding through better hardware support and algorithmic improvements.

Q8: Will SoundStream become a general-purpose audio codec, or will it only focus on specific areas?

Jamieson & Jan: The initial applications will likely focus on real-time communications, but in the future SoundStream is expected to be used for general-purpose audio coding.

Q9: Now that SoundStream will be integrated into the next, improved version of Lyra, is it possible that this new Lyra will replace Opus in the future?

Jamieson & Jan: At least in the short term, Opus and Lyra will coexist. In fact, our team continues to study and improve Opus.

Q10: What’s next for Google in audio compression?

Jamieson & Jan: We will continue to use both ML and traditional encoding methods to improve the efficiency of audio compression, and we will continue to explore a variety of application areas.

Translation/editing | Alex

Thanks to Wang Jing, Wang Lizhong and Wang Zhe for providing clues and reviewing this interview.
