Abstract: Why do we need a media network? How is Huawei's cloud native media network architected? And how do we improve the real-time audio and video experience in practice?
This article is shared from the Huawei Cloud community post "How Huawei's cloud native media network guarantees real-time audio and video service quality", original author: audio and video manager.
Hello, everyone. I am Huang Ting from Huawei Cloud, currently responsible for the architecture design of Huawei Cloud Video. Today I will share how Huawei's cloud native media network guarantees the real-time audio and video service experience in practice.
The talk is organized in three parts. First, why we need a media network; second, the overall architecture of Huawei's cloud native media network; and finally, our practices for improving the real-time audio and video experience.
1. Why a media network
1.1 Content expression is shifting to video, and all industries need video distribution
Why do we need a media network? I see three main reasons. The first is a clear trend toward video as the medium of content expression, and with it a strong demand for video distribution across many industries. Take a small example I experienced personally. During this year's Spring Festival, a family member wanted to take off a ring worn for many years; the finger had thickened over time and the ring would not come off. Our first reaction was to go to a shopping mall and ask a salesperson for help. Then I searched "remove a ring" on Douyin, hoping to give it a try, and found a very simple solution in the results. The video was short, and I quickly got the ring off without damaging it or hurting the finger. If you are interested, you can search for it yourself. This is the video-based expression of knowledge content in action, and the same trend has appeared in many fields: beyond short video, industries such as e-commerce live streaming, online education, and cloud gaming are all moving toward video-based content expression.
1.2 New forms of media expression place ever higher demands on audio and video technology
The second reason is that many new forms of media expression are emerging. New forms such as VR and the recently popular free-viewpoint video bring users a more immersive experience, but they demand an across-the-board improvement in audio and video technology, mainly in bandwidth, latency, and rendering complexity. As the figure on the left shows, taking VR as an example, watching video in a VR headset at the ultimate "retinal" level of quality requires a very high bit rate; a simple calculation puts it at roughly 2 Gbps. Moreover, more factors affect the VR experience than flat video: refresh rate, field of view, resolution, low motion-to-photon (MTP) delay, posture tracking, eye tracking, and so on.
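The "simple calculation" behind the 2 Gbps figure can be sketched as follows. All the constants below are illustrative assumptions, not figures from the talk: roughly 60 pixels per degree of visual acuity, a full 360°×180° panorama, 30 fps, 12 bits per pixel raw (YUV 4:2:0), and an assumed 40:1 codec compression ratio.

```python
# Back-of-envelope estimate of the bit rate for "retinal" VR video.
PIXELS_PER_DEGREE = 60           # approximate human visual acuity
FOV_H_DEG, FOV_V_DEG = 360, 180  # full panoramic sphere
FPS = 30
BITS_PER_PIXEL_RAW = 12          # YUV 4:2:0
COMPRESSION_RATIO = 40           # assumed codec efficiency

def retinal_vr_bitrate_bps():
    width = FOV_H_DEG * PIXELS_PER_DEGREE    # 21600 px
    height = FOV_V_DEG * PIXELS_PER_DEGREE   # 10800 px
    raw_bps = width * height * FPS * BITS_PER_PIXEL_RAW
    return raw_bps / COMPRESSION_RATIO

print(f"{retinal_vr_bitrate_bps() / 1e9:.1f} Gbps")  # about 2.1 Gbps
```

With different assumptions (higher frame rate, per-eye rendering, better codecs) the number shifts, but it stays on the order of gigabits per second, which is the point of the argument.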
1.3 The Internet makes no quality-of-service promise to users
We generally analyze a product from two dimensions: the demand side and the supply side. The first two reasons were demand-side analyses; now let's look at the supply side. A very important supply-side element of real-time audio and video services is the Internet infrastructure, and as we all know, the Internet makes essentially no quality-of-service promise to users. How should we understand that? First, building the Internet is very expensive: submarine cables are costly to lay in both labor and materials, and wireless spectrum for 3G, 4G, and 5G is costly too. Internet construction therefore has to rely on sharing, and sharing requires multiplexing and switching technology. How should we understand switching? Look at the simple diagram below. Suppose we want to interconnect four network nodes A, B, C, and D. Without switching, pairwise interconnection needs six links; with switching, only four links are needed. So from a cost perspective, switching is necessary. There are generally two kinds of switching technology: circuit switching and packet switching. Circuit switching reserves capacity but wastes resources, because once a reservation is made the bandwidth is occupied even when no data is being transmitted. Packet switching shares link resources, so it achieves lower cost. Considering cost, the Internet's designers chose packet switching. Because packet switching was chosen, combined with best-effort forwarding, it brings a series of problems: packet loss, duplicate packets, delay, and reordering. We can therefore conclude that packet loss, duplication, delay, and reordering are inherent attributes of this generation of the Internet.
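The link-count comparison generalizes beyond four nodes; a quick sketch of the arithmetic:

```python
# Links needed to interconnect n nodes:
# full mesh (no switching) vs. a shared switch (one link per node).

def full_mesh_links(n: int) -> int:
    return n * (n - 1) // 2   # every pair gets its own link

def switched_links(n: int) -> int:
    return n                  # each node connects once to the switch

print(full_mesh_links(4), switched_links(4))    # 6 vs 4
print(full_mesh_links(100), switched_links(100))  # 4950 vs 100
```

The gap grows quadratically, which is why sharing via switching was economically unavoidable.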
Here you can think about why the Internet was not designed to solve these problems at the network layer. Or a bigger question: if we were to redesign the Internet today, what would we do? Would we try to solve these problems? A second question to consider is how we handle packet loss, duplication, delay, and reordering in our day-to-day application development.
1.4 What this means for us
The preceding analysis gives us several takeaways. First, we need to build a media network to bridge the gap between the supply side, the Internet infrastructure, and the demand side, the rapidly growing audio and video services. Second, this network must meet the strong demand for audio and video distribution across different industries. Third, this network must meet the challenges of the new media technologies that will emerge in the future.
2. Introduction to the Huawei cloud native media network architecture
That explains why we need a media network. Next, I will introduce the architecture of Huawei's cloud native media network.
2.1 Huawei Cloud Native Media Network
You can think of the Huawei cloud native media network as the technology base of our cloud native video services. On top of this media network we build a series of cloud native video services covering production, processing, distribution, and playback, such as CDN, Live, and RTC, and through these services we support customers across a wide range of industries. The cloud native media network has seven main characteristics: flat, meshed, intelligent, low-latency, flexible, diverse, and device-edge-cloud synergy.
2.2 Wide coverage: support multiple access modes for global connectivity
Next, I will introduce the three important architectural design goals of the Huawei cloud native media network. Because we serve users all over the world, we first need a globally deployed network. This network mainly has to solve three problems: first, supporting multiple access modes; second, interconnecting the nodes; and third, high-availability design with redundant coverage.
First of all, because we are a PaaS service, we have many customers from different industries. Take cloud conferencing as an example: many customers have very high requirements on the security and quality of their conferences, so they want to access the network from their enterprise campus through a dedicated line. Other customers, such as Internet companies, want their users to be able to access the network anytime, anywhere, which requires Internet access support. In addition, because a large amount of our traffic terminates at the edge, we mainly use single-line access from China Telecom, China Unicom, and China Mobile to save service bandwidth cost. Within China, cross-carrier interconnection is solved through tri-line data centers or BGP resources. Overseas, we prefer to access at IXP nodes with rich network resources, and we use Huawei Cloud infrastructure or high-quality Internet resources to achieve cross-border connectivity. Finally, deployment planning must consider high availability, and the common means of achieving it is redundancy. We plan both site redundancy and bandwidth redundancy: we ensure that users in a coverage area have at least 3 sites that can provide service at the required quality, and we provision twice the bandwidth the service needs to cope with emergencies.
2.3 Industry-wide: meet the different business requirements of entertainment, communication, industry video, and more
Because we are a PaaS service, meeting one customer's needs must not affect the features other customers use, and we should satisfy different customers' needs as quickly as possible. This places three technical requirements on us. First, since we need to meet business needs across industries, agility of application development is essential: new functionality must be able to launch quickly on any edge node worldwide, and, to reduce the risk of bringing new features online, we must support gray (canary) release of new features on different edges. We call this development approach Living on the Edge.
The second technical requirement, and a very important design principle for us, is that Edge Services are autonomous. Edge Services are the set of microservices we deploy on the network nodes of the media network. Each Edge Service must be independent and autonomous: since this is a distributed media network, we do not want the failure of a single node (such as a network failure) to affect the services of the whole network. What does autonomy mean? When the network between an edge and the control center suffers a temporary failure, the Edge Services must remain internally autonomous, that is, local service can still be provided. The figure on the left lists four of the microservices; among them, local scheduling reduces the dependence on global scheduling, so that the edge can keep serving users when the link to the control center temporarily fails. In addition, the architecture within Edge Services is divided into microservices, the core purpose of which is to launch features quickly and flexibly. For example, Edge Service contains a protocol-adaptation microservice: when we need to support new terminals and adapt new protocols, we can quickly launch a new protocol-adaptation microservice without affecting the terminals already supported online.
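The local-scheduling fallback described above can be sketched as follows. The class and method names are invented for illustration; they are not Huawei's actual microservice interfaces. The point is only the control flow: prefer the global scheduler, and degrade gracefully to a local decision when the control plane is unreachable.

```python
# Hypothetical sketch of edge autonomy: the edge node prefers the global
# scheduler, but falls back to local scheduling when the control plane
# is unreachable, so the edge keeps serving users.

class GlobalSchedulerUnreachable(Exception):
    pass

class EdgeNode:
    def __init__(self, local_servers):
        self.local_servers = local_servers
        self.control_plane_up = True

    def global_schedule(self, user):
        if not self.control_plane_up:
            raise GlobalSchedulerUnreachable()
        return f"global:{user}"

    def local_schedule(self, user):
        # Trivial local policy: least-loaded server within this edge.
        server = min(self.local_servers, key=lambda s: s["load"])
        server["load"] += 1
        return f"local:{server['name']}"

    def schedule(self, user):
        try:
            return self.global_schedule(user)
        except GlobalSchedulerUnreachable:
            return self.local_schedule(user)

edge = EdgeNode([{"name": "a", "load": 2}, {"name": "b", "load": 1}])
edge.control_plane_up = False          # simulate a control-plane outage
print(edge.schedule("user1"))          # local:b -- service continues
```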
The third technical requirement is an overlay network whose routes can be flexibly defined. For example, Huawei Cloud Meeting needs to support a large number of high-level government conferences with very high security and quality requirements; we must ensure that all packets of such a conference entering our media network travel over the Huawei Cloud backbone and avoid the public Internet. Other customers are price-sensitive, and for them we try to forward packets over cost-effective network resources. This requires a programmable overlay network for flexible routing and forwarding.
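One way to picture policy-driven overlay routing: the same topology yields different paths depending on whether the customer's policy optimizes quality (latency) or cost. The sketch below uses Dijkstra with a pluggable edge-weight function; the topology and numbers are invented.

```python
import heapq

# Invented topology: (u, v) -> link attributes.
EDGES = {
    ("src", "backbone"): {"latency_ms": 10, "cost": 5},
    ("backbone", "dst"): {"latency_ms": 10, "cost": 5},
    ("src", "internet"): {"latency_ms": 40, "cost": 1},
    ("internet", "dst"): {"latency_ms": 40, "cost": 1},
}

def shortest_path(src, dst, weight):
    """Dijkstra with a customer-specific edge-weight function."""
    adj = {}
    for (u, v), attrs in EDGES.items():
        adj.setdefault(u, []).append((v, weight(attrs)))
    pq = [(0, src, [src])]
    seen = set()
    while pq:
        d, node, path = heapq.heappop(pq)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in adj.get(node, []):
            heapq.heappush(pq, (d + w, nxt, path + [nxt]))
    return None

# Quality-first customer: route over the backbone.
print(shortest_path("src", "dst", lambda a: a["latency_ms"]))
# Cost-first customer: route over the cheaper Internet path.
print(shortest_path("src", "dst", lambda a: a["cost"]))
```

Making the weight function a parameter is the "programmable" part: the same forwarding fabric serves both the government conference and the price-sensitive customer.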
2.4 Whole process: provide end-to-end services for media production, processing, distribution, and playback
The third and most important design goal is that our architecture must provide end-to-end services from production to processing to distribution to playback. Our customers fall into two main categories. One is cloud native: many Internet customers are born on the cloud, so they can easily use our cloud services. The other needs to move from traditional offline systems online. To serve such customers, our production and processing system is built on Huawei Cloud Stack and supports flexible, rapid deployment both online and offline. We also provide a convenient SDK that is cross-terminal and low-power, helping customers cover more devices. The last technical requirement is that the entire real-time media processing pipeline can be flexibly orchestrated and dynamically managed. For example, in last year's joint innovation project with Douyu, we helped Douyu move its on-device special-effects algorithms to Edge Services. This brought Douyu three direct benefits. First, less development work: the original special-effects algorithms had to be adapted to different terminals and chips. Second, faster iteration: customers can experience a new effects algorithm as soon as it is deployed in Edge Services. Third, broader device coverage: many effects traditionally developed on the device side cannot run on low-end phones, but running them in Edge Services quickly brings them to many low-end models.
2.5 Layered architecture design: adapting to the characteristics of the Internet
Finally, let me share a very important architectural layering idea, borrowed from the design of computer network systems. Imagine what application development would be like without the layered computer network. I might need to enumerate the nodes of the entire network topology and find the optimal path to send my message from A to destination B, while also handling all kinds of network anomalies such as packet loss, retransmission, and reordering. That is obviously very unfriendly to application development.
Computer network system design solves these problems, first through layering. At the bottom is the link layer, which hides the differences among the transmission technologies of different links; for example, upper-layer applications work over 5G without modification. Above it is the network layer, with two main functions, forwarding and routing, so that applications need not define their own forwarding paths. At the top is the end-to-end layer, a collective term for the transport, presentation, and application layers above. The purpose of layering is modularization: it reduces coupling, and each layer focuses on solving its own problems.
Our cloud native media network borrows this layering. We enhance the network layer to improve the latency and arrival rate of packet forwarding. At the end-to-end layer, we use a self-developed real-time transport protocol to make upper-layer real-time audio and video application development easier, which lets application development focus on business logic. At the same time, we abstract out the media processing module so that codec technology and pre- and post-processing technology can evolve independently and innovate rapidly.
2.6 Architecture layer design: the network layer
Before introducing our key designs at the network layer and the end-to-end layer, let's first look at what is wrong with the network layer. At the beginning of the Internet's design, a very important quality attribute was the high availability of interconnection. As we know, the Internet is composed of tens of thousands of ISPs; if any ISP fails, the network can still communicate. BGP is a very important part of that design, but it mainly considers connectivity and has no awareness of quality of service. In the picture on the left, user A wants to send a message to user B across carriers. The message is likely to cross the Internet and many ISPs, which brings many problems, such as aggravated packet loss and retransmission. Many of the key problems are non-technical: for example, an operator's routing policy for a given network may optimize not for quality but for cost, as in cold-potato or hot-potato routing.
The second class of problems is operational: an operator may upgrade devices overnight, requiring operations staff to make configuration changes, and human error during those changes may cause link failures; or a hot event in some region may cause congestion.
To address these problems, we decided to enhance the network layer, mainly through two technical means: underlay and overlay.
1) First, the underlay. We use Huawei Cloud's global network infrastructure to improve access and interconnection quality. Once traffic enters our underlay network, it no longer competes with other Internet traffic for bandwidth, which improves both quality and security.
2) Second, the overlay. In addition to the self-built backbone, we deploy overlay nodes to optimize packet transmission paths and forward efficiently according to different QoS objectives, rather than forwarding packets arbitrarily. At the network layer we also follow the classic design of separating the control plane from the data plane: simply put, the control plane is responsible for routing and for controlling the operation of the whole network, while the data plane is responsible for forwarding.
To make data forwarding simpler, we adopt another classic networking idea: source routing, whose core purpose is likewise to reduce the complexity of forwarding devices. Specifically, when a packet enters the first forwarding node of our network, the system encapsulates in the packet header the complete list of nodes the packet must traverse, including the destination. Each subsequent forwarding node only needs to parse the header to know where to send the packet next, which greatly reduces forwarding-device complexity.
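The mechanism can be illustrated in a few lines. The packet structure and node names below are invented; the point is that the ingress node writes the whole route into the header, and each hop only pops the next entry, with no per-node routing lookup.

```python
# Minimal illustration of source routing.

def encapsulate(payload, path):
    """Ingress node: embed the full ordered path (ending at the
    destination) in the packet header."""
    return {"route": list(path), "payload": payload}

def forward(packet):
    """Any forwarding node: pop and return the next hop from the header."""
    return packet["route"].pop(0)

pkt = encapsulate(b"media frame", ["edge-2", "core-1", "edge-7"])
hops = []
while pkt["route"]:
    hops.append(forward(pkt))
print(hops)  # ['edge-2', 'core-1', 'edge-7']
```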
Another very important design principle is that we make no reliability commitment at the network layer. Although we do not guarantee reliability, we still use techniques such as redundant error correction and multipath transmission to improve the latency and arrival rate of packet forwarding. That is why we call this layer the network layer: it remains focused on routing and forwarding, with just a few enhancements.
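As one concrete example of redundant error correction, a single XOR parity packet over a block of data packets lets a receiver recover any one lost packet without retransmission, trading a little extra bandwidth for lower effective delay. This is a generic FEC textbook scheme, not necessarily the one the network uses.

```python
# One-parity-per-block XOR FEC sketch.

def xor_parity(packets):
    """XOR all packets byte-wise into one parity packet."""
    size = max(len(p) for p in packets)
    parity = bytearray(size)
    for p in packets:
        for i, b in enumerate(p):
            parity[i] ^= b
    return bytes(parity)

def recover(received, parity):
    """received: the block's packets with exactly one missing (None).
    Returns (index_of_missing, recovered_bytes)."""
    missing = received.index(None)
    rest = [p for p in received if p is not None]
    return missing, xor_parity(rest + [parity])

block = [b"aaaa", b"bbbb", b"cccc"]
parity = xor_parity(block)
idx, data = recover([b"aaaa", None, b"cccc"], parity)
print(idx, data)  # 1 b'bbbb'
```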
2.7 Architecture layer design: the end-to-end layer
The network-layer enhancements give us lower forwarding latency and a higher arrival rate. Next is our end-to-end layer. Here you can first think about a question: as mentioned above, the Internet has inherent properties such as packet loss, reordering, and retransmission that seem very unfriendly to developers, yet the Internet has flourished, producing generation after generation of applications: email, the Web, IM, audio, video, and more. What is the reason?
Here is my thinking: a very important factor is the protocol. The end-to-end layer contains many important protocols, such as TCP, HTTP, and QUIC, that greatly lower the technical threshold for application developers; every generation of Internet applications has been accompanied by the emergence of a protocol. The core design goal of the end-to-end layer is therefore to define good protocols and development frameworks that make application development easy.
How do we do that? In the diagram on the left, the middle part is a general functional diagram of our self-developed real-time transport protocol. On its northbound side we provide a unified interface, which lets us develop both real-time audio and video services and reliable messaging services. On its southbound side, the protocol stack shields the differences of the underlying UDP or ADNP protocols, making application development easier.
The protocol stack is designed to make application development easy. We therefore abstract two modules, NQE and QoS, which provide callback methods that quickly feed network information back to upper-layer applications such as the coding module, so the coding module can rapidly adapt its coding parameters to network conditions.
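The shape of that callback path can be sketched as follows. The interfaces and the back-off constants are invented for illustration; the idea is simply that the transport publishes network estimates and the encoder reacts by adjusting its target bit rate.

```python
# Illustrative sketch of the network-feedback-to-encoder callback idea.

class Encoder:
    def __init__(self, target_bps):
        self.target_bps = target_bps

    def on_network_update(self, est_bandwidth_bps, loss_rate):
        # Leave headroom below the estimate; back off further under loss.
        headroom = 0.85 if loss_rate < 0.05 else 0.6
        self.target_bps = int(est_bandwidth_bps * headroom)

class Transport:
    def __init__(self):
        self.callbacks = []

    def register(self, cb):
        self.callbacks.append(cb)

    def publish(self, est_bandwidth_bps, loss_rate):
        for cb in self.callbacks:
            cb(est_bandwidth_bps, loss_rate)

enc = Encoder(target_bps=2_000_000)
net = Transport()
net.register(enc.on_network_update)
net.publish(est_bandwidth_bps=1_000_000, loss_rate=0.10)
print(enc.target_bps)  # 600000 -- encoder backed off under loss
```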
Another very important design principle is efficiency. As mentioned above, there will be many IoT terminals in the future, and a defining characteristic of IoT terminals is their strict power-consumption constraints. We wanted to account for this from the very beginning of the protocol stack's design, so we avoid adding unnecessary extra data copies in this layer. Here we follow the classic ALF (application level framing) design principle; RTP was designed with the same principle in mind.
In addition, the design of our protocol stack borrows ideas from QUIC: it supports multiplexing, network multipath, Huawei LinkTurbo, and priority management. One small experience to share: when developing free-viewpoint and VR services, which have very high bandwidth requirements, enabling the multipath function yields a substantial improvement in experience.
2.8 Target architecture of the Huawei cloud native media network
Finally, a brief summary of the target architecture of the entire media network.
1) Simply put, we simplify complex problems and divide and conquer: through layered design, the layers are decoupled from each other and can evolve rapidly.
2) Each Edge Service is independent and autonomous, improving the availability of the service as a whole;
3) By splitting Edge Services into microservices, we can adapt to customers' needs more flexibly and bring features online quickly at the microservice level.
3. Real-time audio and video service quality assurance practice
In this third part, I will share some of our practices in real-time audio and video service quality assurance. The previous part was mainly about architecture; this part is mainly about algorithm design.
3.1 Video, audio, and network are the key systematic factors affecting experience
As shown above, we analyzed the dimensions that affect experience, mapping objective indicators to subjective indicators and then to QoE. The analysis shows that the three core systematic factors affecting real-time audio and video experience quality are video, audio, and network. Next, I will introduce our algorithm practice in each of these three areas.
3.2 Video coding technology
First, video encoding. We can classify video coding technologies by design objective. The first category aims to scientifically reduce redundancy: lowering the bit rate while limiting the impact of coding distortion on subjective human perception. Because our real-time audio and video services mainly serve people, a classic optimization approach starts from the viewer: analyze the visual characteristics of the human eye and optimize the coding algorithm around them. The figure lists several categories of human visual characteristics that correlate strongly with coding.
Another optimization approach starts from the source, that is, from the content: we analyze the characteristics of the content in different scenes to optimize the coding algorithm. For example, computer-generated images have low noise and large flat areas.
The second design objective is to scientifically add redundancy to resist the impact of weak-network transmission on subjective perception. Several encoding structures add redundancy: at the extreme, all-I-frame encoding; intra refresh; long-term reference frames; and SVC. In some spatial video services, to reduce spatial-positioning delay, we combine all-I-frame streams with ordinary streams. In cloud gaming, to avoid large I-frame bursts, we use intra-refresh encoding. In real-time audio and video services, long-term reference frames and SVC are the common encoding methods.
3.3 PVC perceptual coding
Here are some of our specific coding techniques. Our cloud video team, together with the media technology institute of Huawei's 2012 Laboratories, improved the PVC perceptual coding algorithm starting from an analysis of the human visual system. The algorithm has gone through several iterations; the latest perceptual coding 2.0 delivers a 1080p 30 fps HD experience at a 1 Mbps bit rate. The main improvements are as follows. First, scenes and regions are distinguished through pre-analysis and coding feedback. In real-time call scenes, the highly sensitive regions are mainly the face region and static regions, and different encoding parameters and bit-rate allocation strategies are used for different scenes and regions; for example, non-sensitive regions are allocated a lower bit rate. On top of 1.0, the 2.0 algorithm adds AI to rate control: instead of the previous fixed combinations of bit rate and resolution, AI-based perceptual rate control finds the optimal combination of bit rate and resolution for each scenario, achieving better subjective quality under low bandwidth.
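The region-weighted bit allocation idea can be sketched in a toy form. The regions and sensitivity weights below are invented; the real PVC algorithm is far more involved, but the principle is the same: sensitive regions such as faces get a disproportionate share of the frame's bit budget relative to their pixel area.

```python
# Toy region-weighted bit allocation.

def allocate_bits(total_bits, regions):
    """regions: {name: (pixel_share, sensitivity_weight)}.
    Bits are split in proportion to pixel_share * weight."""
    weighted = {n: share * w for n, (share, w) in regions.items()}
    norm = sum(weighted.values())
    return {n: int(total_bits * v / norm) for n, v in weighted.items()}

alloc = allocate_bits(
    1_000_000,
    {"face": (0.10, 8.0), "static_bg": (0.60, 1.0), "other": (0.30, 2.0)},
)
print(alloc)  # face gets 40% of the bits for 10% of the pixels
```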
3.4 SCC coding
The second coding technique is SCC coding, which is mainly used for computer-generated images, such as screen sharing in education or conference scenarios. Our algorithm delivers a 65% improvement in compression performance compared with x265 ultrafast, and encoding speed has increased by 50%. For the screen-sharing scenario, we also addressed some of its unique issues. What we share is often static text and graphics, such as Word or PPT; for such content the encoder generally uses a low frame rate and tries to maximize image quality. But sharing frequently switches from documents to video, and if the encoder does not perceive this switch well, the viewer sees a discontinuous sequence of images, similar to a GIF.
To solve this problem, we adapt the video encoding frame rate based on spatio-temporal complexity analysis. This preserves high image quality for static graphics while ensuring smoothness when the content switches to video.
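The idea can be sketched as follows: estimate temporal complexity from frame-to-frame differences and pick the encoding frame rate accordingly, low fps with high quality for static slides, high fps when the content turns into video. The thresholds and frames below are invented for illustration.

```python
# Frame-rate adaptation driven by temporal complexity.

def temporal_complexity(prev_frame, cur_frame):
    """Mean absolute pixel difference (frames as flat lists of ints)."""
    diffs = [abs(a - b) for a, b in zip(prev_frame, cur_frame)]
    return sum(diffs) / len(diffs)

def choose_frame_rate(complexity):
    if complexity < 1.0:    # nearly static: slides, documents
        return 5
    if complexity < 10.0:   # moderate motion: scrolling, animation
        return 15
    return 30               # full-motion video

slide_a = [100] * 64
slide_b = [100] * 63 + [102]                      # tiny change (cursor)
video_frame = [100 + (i % 30) for i in range(64)]  # large differences
print(choose_frame_rate(temporal_complexity(slide_a, slide_b)))      # 5
print(choose_frame_rate(temporal_complexity(slide_a, video_frame)))  # 30
```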
The second problem we solved is the chroma distortion caused by downsampling from YUV444 to YUV420. Most screen sharing shows static text and graphics, which have relatively high chroma requirements, but downsampling from YUV444 to YUV420 strongly attenuates the signal in the chroma (UV) domain. The left picture shows the effect before the new algorithm, and the right picture after: the font on the right is clearly sharper, with less chroma distortion. The core is a low-complexity chroma correction algorithm.
3.5 Adaptive long-term reference frame coding
The first two coding techniques reduce redundancy, while adaptive long-term reference frame coding scientifically adds it. To understand it better, let's simplify and first look at fixed long-term reference frames. In the picture on the top left, red is the I frame, green are the long-term reference frames, and blue are ordinary P frames. This reference structure breaks the ordinary IPPPP chain of forward references, so that if P2 or P3 is lost, decoding of the subsequent P5 is unaffected and can continue, improving fluency. But disadvantages remain: if the green long-term reference frame P5 itself is lost, the subsequent P frames that depend on it cannot be decoded. The second problem is that a fixed long-term reference interval carries a fixed amount of redundancy, which degrades quality at the same bandwidth. We would like to reduce redundancy as much as possible when the network is good, to improve picture quality, so we proposed the adaptive long-term reference frame method.
The core idea of the adaptive long-term reference frame has two parts. First, a feedback mechanism is added at the decoder to tell the encoder "I have received this long-term reference frame"; once the encoder knows the frame was received, it encodes with reference to it. Second, long-term reference frames are marked dynamically: the encoder optimizes the long-term reference interval according to the network's QoS. When the network is good, the interval gets shorter; when the network is bad, it gets longer.
However, the feedback mechanism introduces a problem. Under network conditions with long RTT, the feedback cycle is long, and feedback messages may themselves be lost and need retransmitting, so the long-term reference interval can become very long. Once the interval is too long, coding quality drops, possibly to a level the business cannot accept. Our optimized algorithm accounts for this: when the reference interval has stayed too long, we force the P frame to reference its nearest long-term reference frame rather than relying entirely on feedback. This brings two benefits: under sudden packet loss, picture fluency improves; and the scheme adapts well to the network, balancing fluency and picture quality.
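A much-simplified model of this encoder-side logic, with invented constants and structure: the encoder normally references the newest acknowledged LTR frame, but when no acknowledgement arrives for too long (long RTT, lost feedback), it is forced to reference the most recent LTR it marked rather than letting the reference distance, and hence quality, degrade indefinitely.

```python
# Simplified adaptive long-term reference (LTR) selection.

MAX_REF_DISTANCE = 8  # force a closer reference beyond this step length

class LtrEncoder:
    def __init__(self):
        self.frame_no = 0
        self.last_ltr = 0    # newest LTR frame the encoder marked
        self.acked_ltr = 0   # newest LTR the decoder confirmed

    def on_decoder_ack(self, ltr_frame_no):
        self.acked_ltr = max(self.acked_ltr, ltr_frame_no)

    def encode_next(self):
        """Returns the frame number used as reference for the new frame."""
        self.frame_no += 1
        ref = self.acked_ltr
        if self.frame_no - ref > MAX_REF_DISTANCE:
            # Feedback is stale: reference the newest LTR without waiting.
            ref = self.last_ltr
        if self.frame_no % 4 == 0:   # periodically mark a new LTR
            self.last_ltr = self.frame_no
        return ref

enc = LtrEncoder()
refs = [enc.encode_next() for _ in range(12)]  # simulate: no acks arrive
print(refs)  # reference jumps forward once the distance cap is hit
```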
3.6 Network transmission: seeking the optimal balance of interactivity and quality
That was the video coding side; now let's look at our practice in network transmission. The core goal we set for network transmission is to find the optimal balance between interactivity and quality. Network transmission technology mainly resists packet loss, delay, and jitter, using common techniques such as ARQ, FEC, unequal protection, jitter estimation, and buffering. Besides resisting jitter and packet loss, congestion control is also needed. Its core purpose is to keep the sending rate as close as possible to the available rate while maintaining low latency: if the sending rate and the available network bandwidth do not match, the result is packet loss, jitter, or low bandwidth utilization. Another very important element is source-channel coordination. The adaptive long-term reference frame described above is one example: coding parameters are adjusted dynamically based on channel information. This kind of coordination further improves the experience.
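In the spirit of "keep the sending rate close to the available rate", a toy controller might back off multiplicatively on loss or delay and probe additively when the network looks clean (classic AIMD behavior; the constants below are invented, not the production algorithm):

```python
# Toy AIMD-style congestion controller.

def update_send_rate(rate_bps, loss_rate, queuing_delay_ms):
    if loss_rate > 0.02 or queuing_delay_ms > 100:
        return int(rate_bps * 0.85)    # multiplicative decrease
    return rate_bps + 50_000           # additive increase (probe upward)

rate = 1_000_000
rate = update_send_rate(rate, loss_rate=0.0, queuing_delay_ms=20)   # probe
rate = update_send_rate(rate, loss_rate=0.05, queuing_delay_ms=20)  # back off
print(rate)  # 892500
```

The mismatch cases in the text map directly onto this loop: sending above the available rate shows up as loss or queuing delay (decrease branch), while sending below it wastes bandwidth until probing catches up.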
3.7 Reinforcement learning: improving bandwidth prediction accuracy and QoE
The bandwidth prediction algorithm is central to both congestion control and source-channel coordination. The traditional approach uses hand-tuned rules and decision-tree algorithms to predict bandwidth under different network models, but this does not perform well in complex scenarios, so we turned to reinforcement learning to improve it.
The main idea is to base the prediction on network QoS feedback from the receiver, which reports four signals: receiving rate, sending rate, packet loss rate, and delay jitter. Using these signals, reinforcement learning improves the accuracy of bandwidth prediction. After this optimization, our HD ratio improved by 30% and the stall rate dropped by 20%.
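A minimal sketch of how the four feedback signals can form the state of such a learner is shown below. The normalization constants and the "policy" (here a placeholder linear scorer standing in for a trained model) are assumptions, not the production design.

```python
# The four receiver-side QoS signals become a normalized state vector; a
# learned policy maps the state to a scale factor on the previous estimate.

def build_state(recv_kbps, send_kbps, loss_rate, jitter_ms, max_kbps=5000.0):
    """Normalize the four QoS feedback signals into a state vector."""
    return [recv_kbps / max_kbps, send_kbps / max_kbps,
            loss_rate, min(jitter_ms, 200.0) / 200.0]

def predict_bandwidth(state, prev_estimate_kbps,
                      weights=(0.5, 0.2, -0.6, -0.3)):
    # Placeholder for the trained policy: reward observed throughput,
    # penalize loss and jitter; clamp the adjustment to +/-50% per step.
    score = sum(w * s for w, s in zip(weights, state))
    scale = 1.0 + max(-0.5, min(0.5, score))
    return prev_estimate_kbps * scale
```

In an actual RL setup the weights would be replaced by a trained network, with a reward built from QoE proxies such as the HD ratio and stall rate mentioned above.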
3.8 Audio 3A technology: improving audio clarity
Finally, I'll share some of our audio practices. A good 3A algorithm (acoustic echo cancellation, noise suppression, and automatic gain control) is essential for clear speech, and we applied AI throughout the 3A pipeline to improve the voice experience.
First, we applied AI to echo cancellation, a critical step in 3A. Traditional algorithms handle echo cancellation well under steady-state conditions, but they face many challenges when the acoustic environment changes; for example, when I walk from a room to the balcony during a hands-free call at home, the environment changes and echo cancellation struggles. AI handles such cases better. In particular, for double-talk scenarios, our new algorithm solves the problems of residual echo and clipped words.
Next is noise suppression. Traditional algorithms handle steady-state noise, such as fans and air conditioners, reasonably well. AI-based noise suppression not only handles steady noise more smoothly but can also quickly suppress sudden, transient noises such as keyboard and mouse clicks, running water, or coughing.
Another important part of 3A is automatic gain control. During a call, automatic gain is driven mainly by recognizing the human voice, so voice activity detection (VAD) is critical. We use AI to improve the accuracy of voice detection and thus the effectiveness of automatic gain control.
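To see why VAD matters for AGC, consider a deliberately simple energy-based sketch (the article's point is precisely that AI models now make this decision far more accurately). All names and thresholds here are illustrative assumptions.

```python
# Gain is adapted only on frames the VAD classifies as speech, so silence
# and background noise do not drag the gain upward between utterances.

def frame_energy(samples):
    return sum(s * s for s in samples) / len(samples)

def agc_step(samples, gain, target=0.1, vad_threshold=0.01):
    """Apply gain to one frame; adapt it only when the frame looks like speech."""
    if frame_energy(samples) > vad_threshold:  # crude energy-based VAD
        measured = frame_energy([s * gain for s in samples])
        if measured > 0:
            # Move the gain so the output frame energy matches the target.
            gain *= (target / measured) ** 0.5
    return [s * gain for s in samples], gain
```

A misfiring VAD would amplify noise-only frames toward the speech target, which is exactly the failure mode that better AI-based voice detection avoids.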
3.9 Audio packet loss recovery technology: reducing the impact of packet loss on the audio experience
Another technology that differs somewhat from the video side is audio packet loss recovery. The figure on the left is a classic technology map of packet loss recovery, divided into two categories: active recovery and passive recovery.
Active recovery mainly includes familiar techniques such as FEC and ARQ. Passive recovery has three main methods: interpolation, insertion, and regeneration. As with video, our optimization starts from studying people: video studies the characteristics of human eyes and vision, while audio studies the mechanism of the human voice. Fundamental frequency information reflects, to some extent, the vibration frequency of the vocal cords, and envelope information reflects, to some extent, the mouth shape. Combining these two signals with an AI vocoder, we can recover about 100 milliseconds of lost audio. The sound of a Chinese character generally lasts 150 to 200 milliseconds; traditional signal-based PLC methods can usually recover about 50 ms of audio, whereas our AI-based method can recover about 100 ms.
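For contrast with the AI-vocoder approach, here is a toy sketch of the classical signal-based PLC baseline mentioned above: conceal a lost frame by repeating the last pitch period with a decaying amplitude. The function and its parameters are illustrative, not any specific codec's implementation.

```python
# Classical pitch-repetition PLC: fill the gap by cycling through the last
# received pitch period, fading each repetition to avoid a buzzing artifact.
# The AI approach instead *generates* the missing waveform with a vocoder
# conditioned on fundamental-frequency (F0) and spectral-envelope features.

def conceal_frame(history, pitch_period, frame_len, decay=0.9):
    """Fill a lost frame by cyclically repeating the last pitch period."""
    last_period = history[-pitch_period:]
    out = []
    amp = decay
    for i in range(frame_len):
        out.append(last_period[i % pitch_period] * amp)
        if (i + 1) % pitch_period == 0:
            amp *= decay  # fade each repeated period
    return out
```

Because the repeated waveform drifts away from what the speaker actually said, this style of concealment degrades quickly, which is why it tops out around 50 ms while the generative approach reaches about 100 ms.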
3.10 Case 1: Huawei Changlian, the world’s first full-scene audio and video call product
Finally, I will share two cases. Our products serve not only external customers but also many other Huawei products and services internally. I often joke that supporting internal customers is actually harder, because Huawei's internal customers have very high requirements. We now support the Changlian (MeeTime) service on Huawei phones, the world's first full-scenario real-time audio and video calling product: besides phones, it also runs on Huawei smart screens, tablets, notebooks, watches, and bands. We help Changlian achieve high-quality 1080p, 30 fps calls at a bit rate of 1 Mbps.
3.11 Case 2: Webinar: integrating conferencing and live broadcast to make large events easier to hold
Supporting two internal Huawei customers is harder than supporting one. The second is Huawei Cloud Meeting, whose webinar scenarios are also built on our real-time audio and video service. A single webinar can now support three thousand attendees, one hundred of whom can interact. In the second half of this year, our cloud meeting product will support ten thousand attendees and five hundred interactive participants in a single webinar.
4. Summary
Finally, let me summarize what I've shared today. First, we can clearly see that video services are driving the development of the entire Internet technology stack, including audio and video coding and transmission as well as edge computing and edge networking. So we need a service or system to bridge the gap between the Internet infrastructure (supply side) and the fast-growing video business (demand side).
Second, today's sharing is just a beginning. As real-time audio and video technology finds ever more application scenarios, our cloud native media network architecture and its algorithms will continue to be optimized, driven by data.
Finally, I hope Huawei cloud native video services can walk into the "new era" of video together with all of you.
Thank you.