Online music has long been a hot spot for investment by major capital players. From the early scramble for copyrights to today's “fight on the cloud”, competition among mainstream platforms has shifted from securing copyright resources to expanding through innovation. Online karaoke is now becoming an important lever for capturing the “cloud music” market.

According to statistics, as of 2019 the number of online karaoke users had reached nearly 300 million, and the Internet penetration rate had reached 67.9%. Within the overall user base of online music, the share of online karaoke users is gradually increasing. Because of the epidemic, long-accumulated offline consumer demand has moved online, adding fuel to an already hot online karaoke industry.

Current mainstream technical solutions for online karaoke

Let's first review the common chorus modes in today's online karaoke industry, and the technical difficulties that a true “real-time chorus” faces.

Users who have tried online karaoke know that almost all of it is implemented as either recorded chorus or serial chorus. Take lead singer A, chorus singer B and listener C as examples:

Recorded chorus: Lead singer A sings along with the accompaniment and uploads the recording. Chorus singer B then selects the accompaniment that already contains A's vocals, sings over it, and uploads the result, so the chorus is completed indirectly after recording. Listener C plays back the synthesized song on demand.

Serial chorus: Lead singer A initiates a chorus and mixes the accompaniment locally. A's singing plus the accompaniment is sent to chorus singer B, and B joins in and sings along.

The serial chorus architecture has the following shortcomings in user experience:

  • For the lead singer, the chorus singer's returned audio arrives with a large delay. In general, therefore, the lead singer does not pull the chorus singer's audio stream and cannot hear the chorus singer at all. As a result, the lead singer does not know how the chorus sounds, and the online karaoke atmosphere is very weak.
  • It is difficult to support a chorus of three or more people. Real-time multi-person chorus is very complicated to build on a serial scheme and is hard to implement in practice.
  • Listeners may feel that the chorus voices are not in time with each other, which degrades the listening experience.

Hence the launch of the “real-time” online multi-person chorus solution.

Real-time chorus solutions

To address the technical problems above, an integrated real-time chorus solution was launched. It targets the pain points of the user experience directly, offering ultra-low delay, multi-terminal synchronization, multi-person chorus, and excellent sound quality.

The framework of the real-time chorus scheme is as follows:

In duet mode, the lead singer and the chorus singer can hear each other; in multi-person mode, all chorus singers can hear one another with almost no perceptible delay, achieving real-time chorus in the true sense.

  • The lead singer and each chorus singer load the BGM locally, start playing it at the same time, and sing along with the accompaniment
  • The lead singer publishes two audio streams: local BGM and Mic
  • Chorus singers do not subscribe to the lead singer's BGM audio stream; they subscribe only to the Mic audio streams of the other singers
  • Viewers subscribe to all audio streams to enjoy the singers' “zero delay” chorus
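The publish and subscribe rules in the list above can be sketched as a small routing table. This is only an illustrative sketch, not the solution's actual API: the function names, the role strings, and the `user/kind` stream identifiers are assumptions made for the example.

```python
# Illustrative sketch of the publish/subscribe rules described above.
# All names and the "user/kind" stream-id format are hypothetical.

def published_streams(lead, chorus):
    """Return the streams each singer publishes.

    The lead singer publishes BGM and Mic; chorus singers are assumed
    to publish only their Mic stream.
    """
    streams = {lead: [f"{lead}/bgm", f"{lead}/mic"]}
    for singer in chorus:
        streams[singer] = [f"{singer}/mic"]
    return streams


def subscriptions(role, me, lead, chorus):
    """Return the streams a participant should subscribe to.

    role is "singer" (lead or chorus) or "audience".
    """
    singers = [lead] + list(chorus)
    if role == "audience":
        # Viewers take everything: the lead singer's BGM plus every Mic stream.
        return [f"{lead}/bgm"] + [f"{s}/mic" for s in singers]
    # Singers play BGM locally, so they only pull the other singers' Mic streams.
    return [f"{s}/mic" for s in singers if s != me]
```

For example, with lead singer `A` and chorus singers `B` and `C`, `subscriptions("singer", "B", "A", ["B", "C"])` yields only `A/mic` and `C/mic`, while an audience member receives `A/bgm` and all three Mic streams.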

The real-time chorus scheme mainly solves four problems: high sound quality, ultra-low delay, synchronous accompaniment, and the limit on the number of singers:

In implementing the real-time scheme, with sound quality assured, delay is optimized across the whole link of “capture, pre-processing, encoding, transmission, decoding, rendering”, and the delay is reduced to an ultra-low 66 ms that is barely perceptible.

High quality

The industry-leading voice engine covers everything from low-bit-rate narrowband speech to high-quality stereo music. It supports sampling rates from 8 kHz (narrowband) to 48 kHz (full band) and bit rates of up to 196K. The self-developed 3A algorithms (acoustic echo cancellation, AEC; automatic noise suppression, ANS; automatic gain control, AGC) effectively suppress echo, howling, noise, and other problems that may occur in communication, further ensuring excellent sound quality.

A real-time voice beautification function is built in. Building on the low latency and high quality described above, it uses a link-level joint algorithm framework designed for singing scenes, with multiple modules that adjust the voice along dimensions such as pitch, timbre, rhythm, tempo, space, atmosphere, and even artistic style. This makes the singing more pleasant and a better fit for the accompaniment, while preserving the characteristics of the singer's own voice.

Ultra-low delay

Delay on the device side includes the delay generated by capture, pre-processing, and encoding on the capture side, and the delay generated by receiving, decoding, and rendering on the playback side. Network delay is incurred between the two ends, after encoding and before decoding. We tune the codec algorithm and optimize the delay contributed by each stage of the link, layer by layer.
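One way to reason about this decomposition is as a per-stage delay budget checked against the 66 ms target. The sketch below is only an illustration: the function names are hypothetical, and the numbers in `EXAMPLE_MS` are made-up placeholders chosen to show the arithmetic, not measurements from the solution.

```python
# Per-stage delay budget, assuming end-to-end delay is the sum of the six stages.
STAGES = (
    "capture", "pre_processing", "encoding",   # capture side
    "transmission",                            # network
    "decoding", "rendering",                   # playback side
)

# Placeholder values in milliseconds; NOT measured data.
EXAMPLE_MS = {
    "capture": 10, "pre_processing": 5, "encoding": 10,
    "transmission": 30, "decoding": 5, "rendering": 10,
}


def total_delay_ms(stage_delays_ms):
    """Sum the delay of every stage; fail loudly if a stage is missing."""
    missing = [s for s in STAGES if s not in stage_delays_ms]
    if missing:
        raise ValueError(f"missing stages: {missing}")
    return sum(stage_delays_ms[s] for s in STAGES)


def headroom_ms(stage_delays_ms, target_ms=66.0):
    """Positive result: under budget. Negative: over budget by that amount."""
    return target_ms - total_delay_ms(stage_delays_ms)
```

With the placeholder values, the stages add up to 70 ms, so `headroom_ms(EXAMPLE_MS)` is negative and at least 4 ms would still have to be removed somewhere in the link.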

Network delay deserves particular attention in real-time chorus. End users' networks are complex and carrier quality is uneven, which easily causes network jitter. Service nodes are deployed globally, and a nearest-access policy lets users connect to the nearby node with the best quality. Congestion control algorithms, QoS/QoE optimization strategies, multi-person communication flow control algorithms, and similar techniques effectively reduce stalls and delay during communication.

Synchronous accompaniment

Chorus synchronization: using a precise server time as the shared reference, the lead singer and the chorus singers agree on an exact start time and begin playing the song at that moment.
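A common way to agree on a start time across devices is to estimate each device's offset from the server clock and then convert the agreed server time to local time. The sketch below uses a simple NTP-style estimate that assumes the network path is symmetric; it is not necessarily what the solution itself does, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimeSample:
    t_send_ms: float    # local clock when the time request was sent
    t_server_ms: float  # server clock reported in the reply
    t_recv_ms: float    # local clock when the reply arrived


def estimate_offset_ms(samples):
    """Estimate (server clock - local clock) in milliseconds.

    Uses the sample with the smallest round trip, since it is the least
    affected by queuing, and assumes both directions take equally long.
    """
    best = min(samples, key=lambda s: s.t_recv_ms - s.t_send_ms)
    rtt = best.t_recv_ms - best.t_send_ms
    return best.t_server_ms - (best.t_send_ms + rtt / 2)


def local_start_time_ms(agreed_server_start_ms, offset_ms):
    """Convert the agreed server-time start into this device's local clock."""
    return agreed_server_start_ms - offset_ms
```

Each singer would run this independently and start local BGM playback when its own clock reaches `local_start_time_ms(...)`, so that all devices start at the same server-time instant.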

Audience synchronization: to keep the lyrics in sync on every client and keep what the audience sees in sync with the chorus, the scheme uses SEI (supplemental enhancement information) for lyric synchronization. Lyric information travels in the same media channel as the audio and video, which keeps the lyrics aligned with the audio and video. The lead singer sends the lyric progress; the audience side receives the timestamp and highlights the corresponding lyrics accordingly, so the lyrics are displayed in sync.
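On the audience side, turning a received progress timestamp into a highlighted lyric line is a simple lookup. The sketch below assumes the lead singer packs the playback position into a small JSON payload; that payload format and every function name are assumptions for illustration, and the code does not build or parse real SEI NAL units.

```python
import bisect
import json


def encode_progress(song_id, position_ms):
    """Lead singer side: pack the current playback position (hypothetical format)."""
    return json.dumps({"song": song_id, "pos_ms": position_ms}).encode("utf-8")


def decode_progress(payload):
    """Audience side: recover the playback position in milliseconds."""
    return int(json.loads(payload.decode("utf-8"))["pos_ms"])


def current_line_index(line_starts_ms, position_ms):
    """Return the index of the lyric line to highlight, or -1 before the first line.

    line_starts_ms must be sorted ascending (start time of each lyric line).
    """
    return bisect.bisect_right(line_starts_ms, position_ms) - 1
```

Because the payload rides in the same channel as the media, the audience decodes it at the same point in playback where it was produced, so the highlighted line tracks what the viewer is actually hearing.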

Limit on the number of singers

In traditional online karaoke, the maximum number of singers is 2. As more people join, uncontrollable factors multiply and the experience degrades. To make multi-person real-time chorus practical, the real-time chorus solution has each chorus user play the accompaniment locally and forcibly aligns the time differences between them, so multi-person real-time chorus performs the same as two-person chorus. Currently, up to 50 users can sing simultaneously, and there is no limit on the number of viewers.

The real-time chorus solution supports multi-person real-time chorus with ultra-low-delay communication while maintaining high sound quality. The end-to-end delay is as low as 66 ms, and the accompaniment and voices at every end can be accurately synchronized. It scales beyond the two-singer limit of traditional schemes (currently up to 50 simultaneous singers), has a low integration cost, and is easy to extend.