Author: Liao Xiangli | Planning: Wang Chen | Review & proofreading: Xiao Hang | Editing & typesetting: Wen Yan

At NetEase Cloud Music, most audio and video technology was originally applied to data processing for the music library. To offer audio and video algorithms as services, the music library team worked with the audio and video algorithm team to build the NetEase Cloud Music audio and video algorithm processing platform. This platform provides unified audio and video algorithm processing for all of Cloud Music. This article shares how we used Serverless technology to optimize it.

This article has three parts:

  1. Status quo: how NetEase Cloud Music applies audio and video technology, and the problems we ran into before introducing Serverless;

  2. Selection: what we considered when evaluating Serverless solutions;

  3. Implementation and outlook: the changes we made, the results, and our plans for the future.

The status quo

As a music company, NetEase Cloud Music applies audio and video technology across many business scenarios. Five common scenarios make this more concrete:

  1. By default, users hear audio at a standard bit rate that we pre-transcode with an audio transcoding algorithm. Users who want to save mobile data, or who want higher fidelity, can switch to a lower or higher sound quality.

  2. The song recognition feature in the Cloud Music app lets users identify music playing around them. It relies on audio fingerprint extraction and matching.

  3. For some VIP songs, we run chorus (refrain) detection to give users a better preview experience. This lets the preview jump straight to the climax of the song.

  4. In Cloud Music's karaoke (K-song) feature, we need to display the pitch of the audio and score the user's singing. A pitch generation algorithm produces this basic karaoke data.

  5. To better serve listeners of less widely spoken languages, we provide transliterated lyrics for Japanese and Cantonese songs. An automatic romanization algorithm generates these lyrics.

As these scenarios show, audio and video technology is used widely across Cloud Music and plays an important role.

Our audio and video technology falls roughly into three categories: analysis and understanding, processing, and creation and production. Some of it runs on the client through SDKs. The larger part is offered as general-purpose audio and video services, built through algorithm engineering and back-end cluster deployment and management. That part is the focus of today's talk.

To deploy audio and video algorithms as services, we need to understand many of their characteristics. These include the deployment environment, the execution time, and whether the algorithm supports concurrent processing. As more algorithms went into production, we identified the following patterns:

  1. Algorithms run for a long time: execution time is proportional to the length of the source audio. Audio and video durations vary widely across Cloud Music scenarios, so we usually design the execution unit to work asynchronously.

  2. Audio and video algorithms are written in multiple languages: Cloud Music's algorithms use C++, Python, and other languages, which makes integrating their runtime environments a major burden. To solve this, we adopted standardized interface conventions and image-based delivery, which decouple each environment from the rest of the system. As a result, support for image deployment became one of the key criteria in our technology selection.

  3. Demand for elasticity keeps growing: the Cloud Music library has grown from 5 million songs when I joined to more than 60 million today. The gap between the existing catalog (the stock) and newly added songs (the increment) therefore keeps widening. When we roll out a new algorithm quickly, we must handle new songs and also process the stock rapidly. Our system design therefore separates out the smallest possible execution unit, which makes rapid scaling easy.
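Execution time is proportional to audio length, so a back-of-the-envelope calculation shows how far apart the stock and incremental workloads are. The sketch below estimates how many execution units must run in parallel to finish a batch before a deadline. Only the 60-million-song figure comes from the text. The average duration, the real-time factor (compute seconds per second of audio), the daily increment, the deadlines and the function name `units_needed` are all hypothetical.

```python
import math


def units_needed(num_tracks: int, avg_duration_s: float,
                 real_time_factor: float, deadline_s: float) -> int:
    """Estimate how many execution units must run in parallel to finish a
    batch of tracks before a deadline.

    Assumes processing time = audio duration * real_time_factor (execution
    time proportional to audio length) and that each unit processes one
    track at a time with no startup overhead.
    """
    total_compute_s = num_tracks * avg_duration_s * real_time_factor
    return math.ceil(total_compute_s / deadline_s)


DAY_S = 24 * 3600

# Hypothetical figures: 4-minute average track and 0.1 s of compute per
# second of audio. The whole 60-million-song library is reprocessed in 7 days...
stock = units_needed(60_000_000, 240, 0.1, 7 * DAY_S)
# ...versus a hypothetical 10,000 new songs per day, processed within a day.
incremental = units_needed(10_000, 240, 0.1, DAY_S)
```

Under these assumptions, the one-off stock job needs 2,381 units while the daily increment needs only 3. A fixed-size cluster cannot serve both workloads efficiently, which is why small, independently scalable execution units matter.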

Based on our understanding of the engineering and characteristics of audio and video algorithm processing, we designed the overall architecture of the Cloud Music audio and video processing platform as follows:

The parts that all audio and video algorithms share have a unified design. This covers processing visualization, monitoring, quick trials, and processing statistics. Resource allocation also follows a unified, configurable management model. This way, the common parts of the system are abstracted and reused as much as possible.

The most critical part of the platform, and the focus of today's talk, is the design of the execution unit and how it interacts with the rest of the system. Unified integration standards and image-based delivery solved many of our efficiency problems in integration and deployment. Resources were a different matter. We had to serve new algorithms as well as both stock and incremental workloads. Before moving to the cloud, we met this demand by requesting and releasing hosts in our internal private cloud and by containerizing workloads.

The figure below breaks the Cloud Music execution unit down further to show how it operates:

A message queue decouples the execution unit from other systems. Inside the execution unit, we tune how many messages are consumed concurrently to match the concurrency each algorithm can handle. The execution unit's work stays focused on running the algorithm itself. As a result, when system capacity needs to grow, we can scale out one execution unit at a time.
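A minimal sketch of such an execution unit follows. The article does not name the message queue product, so a standard-library `queue.Queue` stands in for it. `run_execution_unit`, `max_concurrency`, the `None` stop sentinel and the toy algorithm are hypothetical. The point it illustrates is that the unit waits for a free slot before pulling the next message. Work it cannot handle yet therefore stays in the queue instead of piling up inside the unit.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_execution_unit(tasks: queue.Queue, results: queue.Queue,
                       algorithm: Callable[[dict], dict],
                       max_concurrency: int) -> None:
    """Consume tasks until a None sentinel arrives, running at most
    max_concurrency algorithm calls at a time."""
    # One slot per concurrent algorithm call, sized to what the algorithm
    # can handle on this unit.
    slots = threading.BoundedSemaphore(max_concurrency)

    def handle(task: dict) -> None:
        try:
            results.put({"id": task["id"], "result": algorithm(task)})
        finally:
            slots.release()  # free the slot even if the algorithm raised

    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        while True:
            slots.acquire()       # wait for a free slot before pulling a message
            task = tasks.get()
            if task is None:      # sentinel: stop consuming
                slots.release()
                break
            pool.submit(handle, task)
    # Leaving the with-block waits for all in-flight tasks to finish.


# Usage with a toy "algorithm" whose output grows with audio duration.
tasks = queue.Queue()
results = queue.Queue()
for i in range(5):
    tasks.put({"id": i, "duration_s": 10 * i})
tasks.put(None)
run_execution_unit(tasks, results, lambda t: {"frames": t["duration_s"] * 100},
                   max_concurrency=2)
collected = sorted((results.get() for _ in range(results.qsize())),
                   key=lambda r: r["id"])
```

With a real message queue, each consumer would typically read a per-algorithm concurrency setting at startup. Scaling out then means starting more identical units.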

With this model, we have put more than 60 audio and video algorithms into production. About half of them were deployed as services in the past year alone. These algorithms serve more than 100 business scenarios at Cloud Music. However, more complex algorithms and more business scenarios raise the bar for service efficiency, operations and deployment, and elasticity. Before moving to the cloud, we were running more than 1,000 internal cloud hosts and physical machines of various specifications.

The selection

Business scenarios and algorithm complexity kept growing. We simplified algorithm integration for internal scenarios in many ways, but the workload became increasingly mixed:

  • stock and incremental processing running side by side;
  • algorithms of different sizes;
  • business scenarios with very different traffic levels;
  • different scenarios reusing the same algorithm.

As a result, we spent far more time managing machine resources than on development.

This prompted us to look for better ways to solve these problems, starting with the three most direct pain points.

First, the gap between stock and incremental volume keeps widening. As more new algorithms go into production, we spend more and more time coordinating resources to process both. Second, as algorithms become more complex, we have to watch machine specifications and utilization whenever we request or purchase machines. Finally, we want to process the stock faster, with enough resources available for it. When processing massive amounts of audio and video data, this would shorten the period during which stock and incremental results are inconsistent. In short, we want enough elasticity that audio and video algorithm services no longer have to deal with machine management.

An actual migration, however, has to weigh not only the resulting service capability but also the return on investment (ROI). Specifically:

  • Cost: this has two aspects, the implementation cost of the migration and the cost of computing resources. The implementation cost can be estimated in person-days for each candidate solution. We also considered how flexible and extensible the migrated system would be in the future. The cost of computing resources can be estimated with the cloud vendor's official pricing model combined with our execution data. Our key cost criterion was that, as long as the migration cost is acceptable, IT costs should not rise significantly in the future.

  • Runtime environment support: as mentioned earlier, Cloud Music's runtime environments are diverse and are delivered as images. The team already has good CI/CD support. The new platform therefore needed to handle upgrades, deployments, and routine tasks such as specification configuration, so that developers think less about machines. After the migration, we wanted to spend less time and energy on such matters and focus more on algorithm execution itself.

  • Elasticity: besides the size of the compute resource pool that the cloud vendor offers, we looked at three questions. How quickly does elastic capacity start up? Can instances be reserved for fixed workloads? Are flexible scaling options available to support business growth?

All of this matches the definition of Serverless: applications are built and run without managing servers, and elasticity is excellent. Based on these considerations, we chose Function Compute (FC) on the public cloud. It maps directly onto our existing computation workflow, and it also accommodates our later plans to orchestrate algorithms through Schema. The rest of this article focuses on how we introduced FC.

The implementation

We completed a quick trial of Function Compute within a week. A complete, highly reliable architecture, however, requires more work. Our migration therefore moved only the compute workers to FC and kept the system's external inputs and outputs unchanged. The system also has traffic-control capability. In special circumstances, traffic can be switched back to the private cloud, which keeps the system highly reliable. The figure below shows the modified architecture:
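The article does not say how the traffic-control switch is implemented. The sketch below shows one possible approach. It assumes a per-task random split controlled by a hypothetical `fc_ratio` knob, plus automatic fallback to the private cloud when an FC call fails. `HybridRouter`, `run_on_fc` and `run_on_private` are illustrative names, not Cloud Music or FC APIs.

```python
import random
from typing import Callable, Optional


class HybridRouter:
    """Route each task to Function Compute or to the private cloud.

    fc_ratio is the fraction of traffic sent to FC: 0.0 keeps everything on
    the private cloud, 1.0 sends everything to FC. Lowering it at runtime
    switches traffic back to the private cloud.
    """

    def __init__(self, run_on_fc: Callable[[dict], dict],
                 run_on_private: Callable[[dict], dict],
                 fc_ratio: float = 1.0,
                 rng: Optional[random.Random] = None) -> None:
        self.run_on_fc = run_on_fc
        self.run_on_private = run_on_private
        self.fc_ratio = fc_ratio
        self.rng = rng or random.Random()

    def submit(self, task: dict) -> dict:
        # random() is in [0.0, 1.0), so fc_ratio=1.0 always picks FC
        # and fc_ratio=0.0 never does.
        if self.rng.random() < self.fc_ratio:
            try:
                return self.run_on_fc(task)
            except Exception:
                # The FC call failed: retry this task on the private cloud.
                return self.run_on_private(task)
        return self.run_on_private(task)


# Usage: a hypothetical FC backend that is down, so the task falls back.
def flaky_fc(task: dict) -> dict:
    raise TimeoutError("FC unavailable")  # simulate an FC outage


router = HybridRouter(flaky_fc, lambda t: {"where": "private", **t}, fc_ratio=1.0)
fallback = router.submit({"id": 1})
```

Because the external inputs and outputs stay the same, the router is the only component that needs to know which backend ran a task.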

Adapting Cloud Music's development environment to Function Compute was a key part of the migration. We focused on deployment, monitoring, and hybrid cloud support:

  • Deployment: we made full use of FC's CI/CD and image deployment support to pull images automatically.
  • Monitoring: we used the cloud's monitoring and alerting, and we also fed the relevant metrics into our existing internal monitoring system. This keeps the whole development and operations process consistent.
  • Hybrid cloud: we designed the code to be compatible with a hybrid cloud deployment.

With these steps, we completed the Serverless migration of our audio and video processing platform.

FC's billing model has three factors that determine the final cost: memory size (billed together with execution time), the number of invocations, and outbound public network traffic. Judging by the technical architecture alone, we might focus on the first two. In practice, traffic is also a large expense, so we pay close attention to it.
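The three factors can be combined into a rough estimate. The sketch below assumes that compute is billed per GB-second of configured memory, invocations per million, and egress per GB. The `UnitPrices` values are placeholders to fill in from the vendor's current price list, not real FC prices. The function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UnitPrices:
    """Placeholder unit prices; substitute the vendor's current price list."""
    per_gb_second: float            # configured memory x execution time
    per_million_invocations: float
    per_gb_egress: float            # outbound public network traffic


def estimate_cost(prices: UnitPrices, invocations: int, memory_gb: float,
                  avg_duration_s: float, egress_bytes_per_call: int) -> dict:
    """Split the bill for a batch of invocations into its three components."""
    compute = invocations * memory_gb * avg_duration_s * prices.per_gb_second
    requests = invocations / 1_000_000 * prices.per_million_invocations
    egress = invocations * egress_bytes_per_call / 1024**3 * prices.per_gb_egress
    return {"compute": compute, "invocations": requests, "egress": egress,
            "total": compute + requests + egress}


# All prices set to 1.0, so each component reads directly as its billable
# quantity: GB-seconds, millions of invocations, GB of egress.
prices = UnitPrices(per_gb_second=1.0, per_million_invocations=1.0,
                    per_gb_egress=1.0)
# Hypothetical workload: 1M calls, 2 GB memory, 10 s each, 10 MiB returned per call.
estimate = estimate_cost(prices, invocations=1_000_000, memory_gb=2.0,
                         avg_duration_s=10.0, egress_bytes_per_call=10 * 1024**2)
```

Multiplying each billable quantity by a real unit price shows which factor dominates. For an algorithm that returns whole audio files, the egress term grows with every byte returned.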

Given these cost characteristics, and because our storage system still ran on the NetEase private cloud, the first phase targeted audio and video algorithms that produce little outbound public network traffic. Audio feature extraction is a good example. Given an input audio file, the algorithm extracts a 256-dimensional array, and that array is its only output. It is far smaller than the audio itself, so outbound traffic costs stay low.
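To put the traffic difference in numbers, the sketch below compares the size of a 256-dimensional feature vector with the audio it was computed from. It assumes float32 values and a 4-minute track encoded at a constant 320 kbps. Both are assumptions, not figures from the article.

```python
def feature_vector_bytes(dims: int = 256, bytes_per_value: int = 4) -> int:
    """Size of a feature vector, assuming float32 values (4 bytes each)."""
    return dims * bytes_per_value


def audio_bytes(duration_s: float, bitrate_kbps: int) -> float:
    """Approximate size of a compressed audio file at a constant bit rate."""
    return duration_s * bitrate_kbps * 1000 / 8


vec = feature_vector_bytes()          # 1,024 bytes
track = audio_bytes(240, 320)         # 9,600,000 bytes for 4 minutes at 320 kbps
ratio = track / vec                   # how much smaller the returned result is
```

Under these assumptions, the returned vector is 9,375 times smaller than the track. That is why algorithms like feature extraction suited the first phase, while the audio itself stayed in private cloud storage.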

In the first phase of introducing Function Compute, feature extraction algorithms improved 10-fold. For sparse algorithms, meaning algorithms with low daily usage, we achieved large cost savings. In addition, FC's image cache acceleration sped up node startup, so every service can be brought up in seconds. This work removed much of the operations burden of running our algorithms and let us focus more on the algorithms and the business itself.

The figure on the upper right shows how one of Cloud Music's algorithms runs in practice. Its scaling range is very large, and Function Compute handles this requirement well.

Looking ahead, we have three goals. First, we hope Serverless technology will further reduce the manpower we spend on operations. We will also try to solve the public network traffic problem so that more audio and video algorithm scenarios can move to Serverless naturally. Second, as algorithms grow more complex, their use of computing resources grows more complex too, and we hope to optimize computation with GPU instances. Finally, Cloud Music has more and more real-time audio and video processing scenarios, and they also show clear peaks and troughs in load. We hope to build up more experience with Serverless services and ultimately contribute to the development of Cloud Music's real-time audio and video technology.

Author: Liao Xiangli joined NetEase Cloud Music in 2015 and heads music library research and development at Cloud Music.
