What technical challenges must the newly viral interactive podcasting solve?

Hey, do you listen to podcasts?

According to some statistics, the most popular podcast in the United States reached 23.7 million listeners in January of last year alone, before the pandemic changed people’s lives. With the spread of RTC technology and changes in lifestyle, podcasting has evolved into a new form. This January it took off as “interactive podcasting,” fueled by RTC technology and the traffic Elon Musk brought with him. Since then, interactive podcasting has been winning more and more fans at home and abroad.

Innovations and highlights of “interactive podcasting”

“Interactive podcasting” is a new kind of online voice chat built around interests and topics. Celebrities, verified influencers (“big Vs”), and ordinary users alike can open a room or join one for wide-ranging conversation, and listeners can “raise their hands” at any time to go on mic and take part in real time.

“Interactive podcasting” sounds like a casual voice chat room, but the two differ in many ways, including content, user relationships, and information distribution.

Many people online have already dissected the “product success theory” behind it. So what changed in the underlying technology on the way from “podcasting” to “interactive podcasting”? And what needs to be optimized to deliver a good interactive audio experience?

Technical changes behind the new form of podcasting

I. Real-time interaction technology enables “content equality”

Many people say a big part of the success of interactive podcasting is “content equality”: anyone can create a podcast room on the platform to start a discussion, or join a room at any time to interact. But “content equality” also has to be supported by technology. Why?

First of all, compared with traditional podcasting, the most important feature of “interactive podcasting” is that the host, the guests, and every listener can talk to each other by voice in real time, which gives the audience a much stronger sense of interaction and participation. Going on mic, however, requires not only ultra-low-latency transmission but also a network architecture that supports two-way transmission. A CDN supports only one-way transmission, so it cannot support on-mic interaction. This calls for a global real-time transmission network such as Agora SD-RTN™, which lets users around the world interact over low-latency data transmission. That is the network-level change in the underlying technology.

The Nova audio engine delivers professional-equipment audio quality

On the other hand, to ensure sound quality and a good listening experience, traditional podcasters buy professional equipment and choose a suitable recording environment to avoid ambient noise and echo. In “interactive podcasts,” users can pick up their phone and talk or listen in a room anytime, anywhere, and get comparable sound quality without professional equipment. This is because a set of software algorithms, from the codec to the 3A algorithms, takes over the role of professional equipment, so that non-professional users also get good sound quality.

On the codec side, Agora has integrated Agora Nova™, a self-developed codec optimized for real-time audio interaction, into the SDK. To provide better sound quality in voice-only scenarios, Nova™ does not use the 8 kHz or 16 kHz sampling rates common to other voice codecs; it uses a 32 kHz sampling rate to capture more speech detail. Through theoretical derivation and extensive experimental verification, we also designed a simplified coding scheme for the high-frequency components of speech to optimize Nova™’s coding complexity. For packet-loss resilience, we chose the most balanced scheme that still preserves coding efficiency; experiments confirm that it maintains both compression efficiency and the recovery rate when packets are lost. Agora Nova™ delivers better speech coding quality than Opus in both subjective and objective evaluations.
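To make the sampling-rate comparison concrete: by the Nyquist criterion, audio sampled at a given rate can represent frequencies only up to half that rate. The minimal Python sketch below computes that limit for the three rates mentioned above. The function name is illustrative and is not part of any Agora API.

```python
def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2


# Compare the three sampling rates mentioned in the text.
for rate in (8_000, 16_000, 32_000):
    limit_khz = nyquist_limit_hz(rate) / 1000
    print(f"{rate // 1000} kHz sampling -> audio bandwidth up to {limit_khz:.0f} kHz")
```

In other words, 32 kHz sampling can carry speech components up to 16 kHz, twice the range available at 16 kHz sampling.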

In terms of 3A (noise reduction, echo cancellation, and automatic gain control), the Agora SDK can intelligently recognize different environments, fully eliminate echo, and deliver excellent double-talk performance. In the noise reduction module, the SDK accurately detects noise signals and dynamically adjusts the type and parameters of the noise reduction algorithm, effectively removing all kinds of noise without damaging voice quality. Automatic gain control keeps the experience excellent even in noisy environments, maximizing audio quality for a clear interactive podcast experience.

In the interactive podcast scenario, users may come from all over the world, and all of them should get a consistent, smooth experience. Agora’s multi-dimensional network estimation model intelligently identifies the network link and each user’s network environment, then adapts the bit rate and frame rate to the user’s network environment, device performance, and network link. In addition, the jitter buffer mechanism and anti-packet-loss algorithms keep audio calls smooth even at 80% packet loss.

In addition, Agora beautifies the human voice by coordinating its pitch, timbre, dynamics, rhythm, and spatial effect. It also supports temporal and spatial processing of one or more frequency bands of the voice, in order to improve sound quality and adjust vocal timbre.

Building on this complete set of technologies (network transmission, encoding and decoding, noise reduction and echo cancellation, bit rate adaptation, and resilience under weak networks), we launched the “interactive podcast scenario solution.”
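As a rough illustration of what a jitter buffer does, the Python sketch below reorders packets by sequence number and reports a gap as a lost frame that the decoder would have to conceal. This is a textbook-style simplification written for this article, not Agora’s algorithm; the `JitterBuffer` class and its methods are hypothetical.

```python
import heapq
from typing import List, Optional, Tuple


class JitterBuffer:
    """Hypothetical minimal jitter buffer: reorder by sequence number, flag gaps."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, bytes]] = []  # min-heap ordered by sequence number
        self._next_seq = 0                        # next sequence number due for playout

    def push(self, seq: int, payload: bytes) -> None:
        # Packets that arrive after their playout slot has passed are dropped.
        if seq >= self._next_seq:
            heapq.heappush(self._heap, (seq, payload))

    def pop(self) -> Optional[bytes]:
        """Return the frame for the current slot, or None if it is missing."""
        # Discard stale duplicates of frames that were already played.
        while self._heap and self._heap[0][0] < self._next_seq:
            heapq.heappop(self._heap)
        frame = None
        if self._heap and self._heap[0][0] == self._next_seq:
            frame = heapq.heappop(self._heap)[1]
        self._next_seq += 1
        return frame


# Packets 0, 2, 1 arrive out of order; packet 3 is lost; packet 4 arrives.
buffer = JitterBuffer()
for seq in (0, 2, 1, 4):
    buffer.push(seq, f"frame{seq}".encode())
played = [buffer.pop() for _ in range(5)]
print(played)
```

A production buffer would also adapt its depth to measured jitter and hand missing slots to a loss-concealment or forward-error-correction stage; both are omitted here.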

Agora interactive podcast scenario solution

Low latency interaction with global coverage

The transmission layer of the interactive podcast solution is built on Agora SD-RTN™. SD-RTN™ is deployed worldwide, covering more than 200 countries and territories, and provides “dedicated line” quality and interactive experiences for real-time audio and video. Based on an intelligent routing strategy and Agora AUT, a self-developed transport protocol, its global coverage rate for high-quality network transmission exceeds 99%.

Globally consistent high sound quality experience

At the same bit rate as the industry’s leading codecs, Agora’s Nova™ voice engine delivers superior audio capture and effective frequency range, preserving sound fidelity. At a given sampling rate, Nova™ also needs a lower bit rate, which reduces the load on the user’s bandwidth and helps maintain a good audio experience under weak network conditions.

Agora also uses industry-leading software 3A algorithms that adapt intelligently to all kinds of environments and effectively eliminate noise and echo without damaging voice quality, maximizing the interactive audio experience.

Tenfold elastic network architecture that easily absorbs sudden traffic surges

The SD-RTN™ network, built for real-time transmission, uses an ultra-elastic architecture designed to handle more than 10 times the load. It can calmly absorb sudden traffic surges from the many listed companies, high-traffic platforms, and fast-growing customers it serves.

Security and compliance with global information security and privacy regulations

Agora meets the requirements of ISO 27001, ISO 27017, ISO 27018, and other relevant standards, and has obtained the corresponding certifications issued by DNV. Our network architecture and infrastructure conform to SOC 2 standards, ensuring that all physical and virtual access is effectively managed, monitored, and controlled. Agora has also engaged global privacy and security experts, including Trustwave Holdings, and has passed third-party privacy audits as well as network penetration, application vulnerability, and compliance assessment tests. It complies with GDPR, CCPA, COPPA, and HIPAA, as well as China’s Data Security Law (draft), Personal Information Protection Law (draft), and other relevant international and domestic regulations.

In terms of privacy protection, Agora does not access or store any personally identifiable information (PII) of users. It collects only the operational information necessary to provide the service: IP addresses (used to identify users’ locations for compliance with regional regulations and for network connections), metering data (since Agora bills by usage time), and quality-of-experience data (which helps customers monitor experience quality through the “Crystal Ball” tool).

In terms of information security, Agora provides application developers with a number of default and configurable security options, such as authentication, data encryption, and network geo-fencing, to protect their audio and video streams. The Agora SDK provides built-in AES encryption that customers can enable directly. The encryption key is managed by the customer’s application and transmitted between end-user devices outside the Agora network.

Agora also works with a number of the world’s most trusted security organizations to ensure that vulnerabilities are discovered and reported, helping customers quickly apply the necessary fixes.

Interactive audio best practice support

Over the past seven years, Agora has served thousands of customers, including Changba, Pocket Werewolf Kill, Lizhi, and Momo, and has accumulated best practices across a wide range of real-time audio and video scenarios, along with extensive hands-on experience and contingency safeguards.

XLA experience quality assurance

In July 2020, Agora defined and launched XLA, the first experience-quality standard in the real-time interaction industry, based on nearly one trillion minutes of user experience data and extensive subjective user evaluations. If any metric falls short of the standard, Agora will pay compensation of up to 100%. This commitment reflects Agora’s technical strength and service quality in real-time interaction. At present, no other provider on the market offers a comparable guarantee for real-time audio and video services.

More than “Quick implementation”

In fact, in February of this year, a developer in our community built and open-sourced an application in two days using the Agora Web SDK. According to him, the audio interaction in the app took only about seven lines of code. The architecture of a mature interactive podcast scenario today is shown below. Users enter a podcast room through the room list or another top-level entry point. Inside the room there is a virtual stage (a channel), where the host and guests talk.

If this process is translated into API call logic, it looks like this:
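As a rough illustration of the room flow described in this article (join as a listener, raise a hand, be let on mic, leave), here is a minimal Python sketch. It models only the state changes inside one room. The `PodcastRoom` class, the `Role` enum, and all method names are hypothetical and are not Agora SDK calls; a real client would map these steps onto the SDK’s own channel and role APIs.

```python
from enum import Enum
from typing import Dict, Set


class Role(Enum):
    AUDIENCE = "audience"  # listens only
    SPEAKER = "speaker"    # on mic


class PodcastRoom:
    """Hypothetical model of one podcast room (one channel)."""

    def __init__(self, host: str) -> None:
        self.host = host
        self.members: Dict[str, Role] = {host: Role.SPEAKER}
        self.raised_hands: Set[str] = set()

    def join(self, user: str) -> None:
        # New arrivals start as listeners.
        self.members.setdefault(user, Role.AUDIENCE)

    def raise_hand(self, user: str) -> None:
        # Only listeners can ask to go on mic.
        if self.members.get(user) is Role.AUDIENCE:
            self.raised_hands.add(user)

    def approve(self, user: str) -> None:
        # The host lets a listener who raised a hand go on mic.
        if user in self.raised_hands:
            self.raised_hands.discard(user)
            self.members[user] = Role.SPEAKER

    def leave(self, user: str) -> None:
        self.members.pop(user, None)
        self.raised_hands.discard(user)


room = PodcastRoom(host="host")
room.join("alice")
room.raise_hand("alice")
room.approve("alice")
print(room.members["alice"].value)
```

Keeping the raised-hand queue separate from the role map means a listener cannot be promoted without first asking, which mirrors the “raise your hand” interaction described earlier.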

This is the most common interactive podcast format. As users explore it, though, we are seeing many other formats. Developers can use the Agora API to get the best sound quality for each of them. Let’s look at some typical parameter combinations:

A typical guest talk, Elon Musk style

In this case the guests are relatively fixed, and people do not go on and off mic frequently. All the guests are simply talking. So we can set the AudioProfile to Speech Standard and the AudioScenario to Default. Voice is then encoded at the default 32 kHz sampling rate, with a maximum bit rate of 18 Kbps.

RtcEngine.setAudioProfile(AUDIO_PROFILE_SPEECH_STANDARD, AUDIO_SCENARIO_DEFAULT)
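For a sense of scale, the sketch below compares the 18 Kbps ceiling with uncompressed audio at the same sampling rate. The 16-bit mono PCM baseline is an assumption chosen for illustration; the text above does not state a bit depth or channel count.

```python
SAMPLE_RATE_HZ = 32_000    # sampling rate quoted for the Speech Standard profile
MAX_BITRATE_BPS = 18_000   # maximum bit rate quoted for the profile
PCM_BITS_PER_SAMPLE = 16   # assumption: 16-bit mono PCM as the uncompressed baseline

raw_bps = SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE           # uncompressed bit rate
compression_ratio = raw_bps / MAX_BITRATE_BPS            # raw vs. encoded
bits_per_sample = MAX_BITRATE_BPS / SAMPLE_RATE_HZ       # encoded bits per sample

print(f"uncompressed: {raw_bps / 1000:.0f} kbps")
print(f"compression ratio: about {compression_ratio:.1f}:1")
print(f"encoded bits per sample: {bits_per_sample:.4f}")
```

Under that assumption the encoder has to spend well under one bit per sample on average, which is why codec efficiency matters so much for voice at this sampling rate.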

Open communication

You may have seen rooms that the owner describes as an open place for conversation, where any listener can apply to come on stage. In this case people go on and off mic frequently. Here we can set the AudioProfile to Music Standard and the AudioScenario to ChatroomEntertainment.

RtcEngine.setAudioProfile(AUDIO_PROFILE_MUSIC_STANDARD, AUDIO_SCENARIO_CHATROOM_ENTERTAINMENT)

Online concert

During the Spring Festival, some users, perhaps hoping for a gala made up only of song-and-dance acts, began organizing friends to hold online concerts in interactive podcast rooms. To ensure sound quality here, we need music encoding. We can set the AudioProfile to MusicHighQuality for high sound quality, and set the AudioScenario to GameStreaming to keep real-time interaction smooth even at that quality.

RtcEngine.setAudioProfile(AUDIO_PROFILE_MUSIC_HIGH_QUALITY, AUDIO_SCENARIO_GAME_STREAMING)
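The three combinations above can be collected into a small lookup table. The sketch below is plain Python for illustration only: the constant names are the ones quoted above, stored as strings, while the use-case keys and the `pick_audio_settings` helper are hypothetical and not part of the Agora SDK.

```python
from typing import Dict, Tuple

# (audio profile, audio scenario) for each use case described above.
AUDIO_SETTINGS: Dict[str, Tuple[str, str]] = {
    "guest_talk": ("AUDIO_PROFILE_SPEECH_STANDARD", "AUDIO_SCENARIO_DEFAULT"),
    "open_communication": (
        "AUDIO_PROFILE_MUSIC_STANDARD",
        "AUDIO_SCENARIO_CHATROOM_ENTERTAINMENT",
    ),
    "online_concert": (
        "AUDIO_PROFILE_MUSIC_HIGH_QUALITY",
        "AUDIO_SCENARIO_GAME_STREAMING",
    ),
}


def pick_audio_settings(use_case: str) -> Tuple[str, str]:
    """Return the (profile, scenario) pair for a known use case."""
    try:
        return AUDIO_SETTINGS[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None


profile, scenario = pick_audio_settings("online_concert")
print(profile, scenario)
```

Keeping the pairs in one table makes it easy to add a new room type later without touching the code that applies the settings.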

These are just three examples; many more formats are emerging around interactive podcasting. If you have a new idea but are not sure how to implement it with the Agora API, you can share it with us by posting in the RTC community. For more details, please call 400 632 6626.