This article was first published on Tencent Cloud+ Community and may not be reproduced without permission.

Author: evergreen | Terminal technology lead, Tencent Video Cloud

In the fourth quarter of 2017, the Tencent Cloud terminal team worked with WeChat to bring the audio and video capabilities that Tencent Cloud had built up over many years into WeChat in the form of an SDK, opening those capabilities to mini programs. Today I will mainly talk about audio and video in mini programs. First, a brief self-introduction: I work at Tencent Cloud, where we have a very good video cloud product that implements live streaming, on-demand video, and real-time calling. I am mainly responsible for the terminal technology of the video cloud, and today's talk stays in that line of work: how we brought audio and video technology to mini programs.

Today we will focus on a few parts. First, why are we doing this? Some people may say they are not interested in what Tencent Cloud is doing here and will not listen, because they see no demand for it. In fact, if we take a look, there are many market segments, with a lot of commercial value and room to explore.

Note that today's topic is mainly about principles and the technical route. It is meant as sharing, without much advertising. If you have the time to listen patiently, I think you will find it worthwhile; even if you have never heard this material before, I expect you can come away as a semi-expert on audio and video. Finally, I will talk about getting started quickly. I have been working toward one goal, which Teacher Huang also touched on just now with WebRTC: letting a single person finish a good product alone. Before that, it took at least two people, one on the front end and one on the back end.

Everybody knows the advantage of mini programs: there is no installation cost. For example, when you browse Baidu Tieba and tap an ad, you have no desire to go on and download the app. Of all the people who see these ads, how many actually install? It is a numbers game.

There are also low-frequency scenarios that are nonetheless essential, and here a mini program really solves the app installation problem. Take Mobike, which Teacher Huang mentioned just now: you may not have the app installed beforehand, and a phone loaded with too many apps starts to struggle. That is where mini programs come in handy. Another fact mentioned just now is that the advertising effect is very good. Many of us, while browsing Moments, have long-pressed a QR code and landed on an introduction page or a mini program; a lot of mini program promotion spreads this way. First, spreading within Moments is easy, and second, the information gets across better.

Since mini programs have so many advantages, I naturally ask how to combine them with audio and video, and which scenarios that would serve well. Consider live streaming, on-demand video, and audio/video calling apps: live streaming such as Yingke, Huajiao, and Douyu; on-demand video such as Youku, Tudou, and iQiyi; and video calls, such as the video conferences we hold on WeChat. Combined with WeChat mini programs, these scenarios have great market prospects.

Two-way audio and video

On two-way audio and video: if the two people talking are friends or otherwise close, it is called socializing. If strangers are chatting, it used to be associated with nude chat, which is an unhealthy industry. But put it in an enterprise, in a customer service system, and it is completely different. An online video customer service system, for example, has a big advantage over a traditional telephone system. Picture this scene: you are driving to work in the morning, already late, the boss is waiting, and you are anxious. Then your car gets scraped. If you wait for the claims adjuster to arrive, you block traffic and may be fined 1,000 or 2,000. At that moment you want the matter settled over online video. Many of our customers have tried to solve this with apps, but many users strongly resist installing one. If instead the user taps a link, or searches for the insurer in mini programs, this simple process can be completed right there: connect a video call, take a few pictures, and the person on the other end assesses liability. Both the insurance company and the user like this very much.

Another scenario is remote trials. We have worked with courts before, and there are cases like this: the court is in a big city, and the plaintiff and the defendant are two young men from the countryside who quarreled over something trivial. For such a small matter, having to travel to the big city to litigate is not easy, and a lawsuit takes several trips back and forth. In fact the court only needs to handle it remotely. Rural 4G and Wi-Fi coverage is good now, so courts can resolve such matters remotely, which leaves a lot of commercial room and profit.

These scenarios offer many points where mini programs and audio and video can be combined. We provide many functions, including live streaming and on-demand video, at very low cost.

Principles

What I want to focus on today is the principles; this is actually the main part.

Before getting into the principles, let me tell a story about our cooperation with the WeChat team. At the time, we hoped Tencent Cloud's technology could be put directly into the WeChat app. Our colleagues at WeChat set requirements for us, much like the US Department of Defense setting requirements for a contractor: these are the technical indicators you must meet. The bar was very high. WeChat had several standards: first, it must be easy to use; second, it must be extensible and customizable, so that developers can build all kinds of scenarios; third, it must support live audio and video; and then a fourth, fifth, sixth, seventh, eighth, and ninth. I thought the requirements were rather extreme.

I like to be challenged. You know, Kalashnikov of the former Soviet Union designed a great gun, which became a mainstay weapon in Vietnam at the time. First, its design concept is very simple: a small workshop in Afghanistan with a couple of old craftsmen can make it. Second, it is reliable: it never jams or stops at the critical moment; pull the trigger and it fires, unlike Indian-made guns that do not fire when they should. Third, it is very powerful. The design philosophy of this gun is something we should carry on.

So we asked ourselves how to design a solution with that philosophy and those advantages. The final result is quite satisfying: we put real effort into this direction and have something to show for it.

Let us set the technical architecture aside for a moment and start with the audio and video components embedded in WeChat. The SDK is free; we have been polishing it for more than two years, and it is now updated once or twice a month. It has two parts: audio and video uplink, and audio and video downlink.

What problem does uplink solve? Uplink is also called pushing a stream: the local picture and sound are captured and then preprocessed. What preprocessing? Beautification, for example, is a very down-to-earth need. Noise reduction is another: the audio may have a bad background that is uncomfortable to listen to. Then comes encoding, which has to shrink the data by a factor of roughly 10 to 20. Finally, the network module sends the result to the cloud. Audio and video development basically all depends on this step. Quality and stability used to be hard to guarantee, but now that network costs have started to come down, there is no need to build it yourself; you can use the cloud directly.

Then comes downlink, commonly known as pulling a stream. What went up now comes down; this is playback. Data flows from the cloud down to the player, and when network speed fluctuates you will find playback stuttering again and again, which is a poor experience. So you must add a buffer, like a reservoir, tune it appropriately, and then decode and render.
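
To make the reservoir idea concrete, here is a minimal sketch of a playback buffer in Python. It is not the SDK's implementation: the `JitterBuffer` class, the `start_threshold` parameter, and the per-tick arrival pattern are all assumptions made up for illustration. Playback starts only after a few frames have accumulated, and the player rebuffers if the buffer runs dry.

```python
from collections import deque
from typing import List, Optional


class JitterBuffer:
    """Hypothetical playback buffer: wait for a few frames, then release one per tick."""

    def __init__(self, start_threshold: int = 3) -> None:
        self.start_threshold = start_threshold
        self.frames: deque = deque()
        self.playing = False

    def push(self, frame: int) -> None:
        """Store a frame that arrived from the network."""
        self.frames.append(frame)
        if len(self.frames) >= self.start_threshold:
            self.playing = True

    def pop(self) -> Optional[int]:
        """Called once per render tick; returns None while buffering or stalled."""
        if not self.playing:
            return None
        if not self.frames:
            self.playing = False  # ran dry: go back to buffering
            return None
        return self.frames.popleft()


def simulate(arrivals_per_tick: List[int], start_threshold: int) -> List[Optional[int]]:
    """Feed a bursty arrival pattern into the buffer and record what gets rendered."""
    buffer = JitterBuffer(start_threshold)
    next_frame = 0
    rendered: List[Optional[int]] = []
    for count in arrivals_per_tick:
        for _ in range(count):
            buffer.push(next_frame)
            next_frame += 1
        rendered.append(buffer.pop())
    return rendered


# Assumed network pattern: steady at first, then frames arrive in bursts.
ARRIVALS = [1, 1, 1, 0, 2, 0, 2, 1]
shallow = simulate(ARRIVALS, start_threshold=1)
deep = simulate(ARRIVALS, start_threshold=3)
```

With this arrival pattern, a threshold of 1 starts at once but stalls at the fourth tick, while a threshold of 3 starts two ticks later and then plays without a gap. That is the basic trade between start-up delay and smoothness that the buffer has to balance.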

You have uplink, you have downlink and playback: that is the architecture. From the tags in WeChat, down to the SDK underneath, across the network, and on to the other end, the link is connected. With this link, the two basic building blocks can be combined into a rich variety of scenarios.

Now for the technical evolution. The first stage is live streaming. With the two tags, plus a cloud in the middle, we get the live streaming function that everyone has seen and can experience in apps such as Huajiao. It can do basically everything those apps do, and latency and the other aspects of the experience are good, full-screen playback included.

Why do I add a cloud to this design? The microphone in my hand is part of the audio system of this venue. It picks up the sound, which corresponds to the push side I just described, and the sound is then processed: I expect there is a digital processing circuit here that cleans up and mixes the sound, then hands it to the downstream system for stage-by-stage amplification. What does stage-by-stage amplification mean for us? Tencent Cloud has tens of thousands of machines across the country, and a stream can be expanded out to ten thousand of them. You can think of the cloud as a signal amplifier: a single source stream is copied without limit, so that everyone can pull a high-quality audio and video stream from the nearest machine room. That solves the stuttering and smoothness problems.
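
The amplifier picture can be put into rough numbers. The sketch below is only an illustration under assumed values: the fan-out of 10 per machine, the function names, and the machine room names with their round-trip times are hypothetical, not Tencent Cloud's actual topology. It counts how many copy stages one source stream needs to reach a given number of machines, and picks the nearest machine room by measured round-trip time.

```python
from typing import Dict


def relay_stages(target_machines: int, fan_out: int) -> int:
    """Count copy stages until a single stage holds at least `target_machines` copies.

    Stage 0 is the single source stream; every machine in a stage forwards
    the stream to `fan_out` machines in the next stage.
    """
    if target_machines < 1 or fan_out < 2:
        raise ValueError("need target_machines >= 1 and fan_out >= 2")
    stages, copies = 0, 1
    while copies < target_machines:
        copies *= fan_out
        stages += 1
    return stages


def nearest_node(rtt_ms: Dict[str, float]) -> str:
    """Pick the machine room with the lowest measured round-trip time."""
    return min(rtt_ms, key=lambda name: rtt_ms[name])


# Assumed values for illustration only.
stages_needed = relay_stages(target_machines=10_000, fan_out=10)
chosen = nearest_node({"guangzhou": 18.0, "shanghai": 35.0, "beijing": 52.0})
```

Under these assumptions, four copy stages are enough to reach ten thousand machines, which is why fan-out adds little delay compared with what it buys in concurrency.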

Once you have such an amplifier, add uplink and downlink and you have a high-concurrency solution. One advantage is that it is relatively cheap; you can look up Tencent Cloud's prices, which are very low. Quality is also very good, and it supports switching between multiple resolutions. But because of the large amount of data involved, the delay is at least 2 seconds.

Then we upgrade and add capability. The live streaming scenario is covered, but the low-latency scenario still has to be done. When is low latency needed? When I need remote control. The online claw machine scenario of 2017 is extremely demanding on latency: if you can reach 100 milliseconds, that is really impressive. So we had to go further technically, and we came up with two schemes: UDP acceleration and delay control. A claw machine is operated by remote control; normal live streaming latency is 2 to 5 seconds, while a real machine requires the picture to reach the player within 500 milliseconds. What additional techniques do we need? TCP was designed in a spirit of fairness to everyone on the network: if there is a little congestion, you yield and I yield, and the sender slows down. That hurts in low-latency scenarios, where sometimes you just want to be tough. So what to do? We switch to UDP, so that even when the network is bad I can keep sending. The second scheme, delay control, is what made Tencent Cloud's solution for the risk conference held at the end of last year stand out from the crowd. The delay control did not need to rely on timestamps, and with it we kept the live delay to the audience within 3 seconds.
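
One common way to keep player-side delay bounded is to play slightly faster whenever too much media has piled up in the buffer. The sketch below assumes that approach; the talk does not say how Tencent Cloud's delay control works, and the function names, the 500 ms target, the 1.2x speed cap, and the gain are all illustrative assumptions.

```python
def playback_rate(buffered_ms: float, target_ms: float = 500.0,
                  max_rate: float = 1.2, gain: float = 0.0005) -> float:
    """Return a playback speed: 1.0 at or below the target, faster when a backlog exists."""
    excess_ms = buffered_ms - target_ms
    if excess_ms <= 0:
        return 1.0
    # Speed up in proportion to the backlog, but never beyond max_rate.
    return min(max_rate, 1.0 + gain * excess_ms)


def simulate_drain(buffered_ms: float, ticks: int, tick_ms: float = 100.0) -> float:
    """Simulate a player that receives media in real time and plays at the controlled rate."""
    for _ in range(ticks):
        rate = playback_rate(buffered_ms)
        # Each tick, tick_ms of media arrives and tick_ms * rate of media is played.
        buffered_ms += tick_ms - tick_ms * rate
    return buffered_ms


# Assumed starting point: a 3 second backlog, simulated for 30 seconds of wall time.
final_backlog_ms = simulate_drain(3000.0, ticks=300)
```

In this simulation the backlog drains back toward the 500 ms target instead of staying at 3 seconds for the rest of the session. The cap on the rate keeps the speed-up small enough that viewers are unlikely to notice it.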

With such a link, remote control and remote interaction become possible, but it is still one-way.

One-way latency is now very low. If I send you one low-latency stream and you send me one, is the matter settled? Do professional audio and video calls simply fall out of that? It is not that simple. We still have to solve many technical points: noise cancellation, echo suppression, and so on.

Start with the streams. One stream up and one stream down is one-way; for two-way we use RTC mode. With RTC mode selected, the delay on each side is 500 milliseconds, and a two-way call becomes workable. Behind this there is real technical work to do. Suppose latency has crept up a little: can I not just drop some data to bring it back under control? But losing data is very bad. At university the teacher would say that for audio and video you just use UDP and tolerate loss, but once the data has been compressed, if encoded data is really lost the decoder cannot reconstruct it, so it really cannot be lost. So what to do? We take time back where you will not notice it. I am giving a 40 to 50 minute speech, and leaving out a word or two does no harm. What we do is remove the extra gaps in time; a speech has plenty of room to work with. We can work on each pause and cut out the data we judge unimportant. The sound is not distorted; it feels slightly different from the original, but the content is intact. That is how we win the time back.
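
The idea of winning time back from pauses can be sketched as follows. This is a simplified illustration, not the SDK's algorithm: it assumes audio is already decoded into short frames of samples, treats a frame as silence when its mean energy falls below a made-up threshold, and uses hypothetical names throughout. Quiet frames are dropped only while the backlog exceeds the target, and speech frames are never dropped.

```python
from typing import List

Frame = List[float]


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of one audio frame."""
    return sum(sample * sample for sample in frame) / len(frame)


def trim_silence(frames: List[Frame], target_len: int,
                 silence_threshold: float = 1e-4) -> List[Frame]:
    """Drop quiet frames, oldest first, until at most `target_len` frames remain.

    Frames at or above the threshold are treated as speech and always kept,
    so the result can stay longer than `target_len` if there is little silence.
    """
    surplus = len(frames) - target_len
    kept: List[Frame] = []
    for frame in frames:
        if surplus > 0 and frame_energy(frame) < silence_threshold:
            surplus -= 1  # reclaim this pause
            continue
        kept.append(frame)
    return kept


# Toy backlog: three speech frames separated by pauses.
word_a, word_b, word_c = [0.5, -0.5], [0.4, -0.4], [0.3, -0.3]
pause = [0.0, 0.0]
backlog = [word_a, pause, pause, word_b, pause, word_c]
caught_up = trim_silence(backlog, target_len=4)
```

Here the six-frame backlog shrinks to four frames by removing two pauses, while all three speech frames survive in order. A production system would also smooth the joins so that the cuts are not audible.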

Two-way audio and video sometimes suffers from broken sound and howling. From the signal-processing angle we learned at school, the fix is fairly simple: do echo suppression. For example, while I am speaking now, you do not hear the sound looping back through the speakers; that is echo cancellation done by electronic components. We have to do it in software instead: take the original sound that is being played out and cancel it from what the microphone captures. That achieves echo suppression; otherwise, two people on a call would hear endless echo.
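
A textbook way to cancel the played-out sound from the microphone signal is an adaptive filter such as normalized least mean squares (NLMS). The sketch below uses that technique purely as an illustration; the talk does not say which algorithm the SDK uses, and the signal, the echo delay of 3 samples, the echo gain of 0.6, and the filter settings are assumptions. Real echo cancellers also have to handle both parties speaking at once, which this sketch ignores.

```python
import random
from typing import List


def nlms_echo_cancel(far_end: List[float], mic: List[float],
                     taps: int = 8, step: float = 0.5, eps: float = 1e-8) -> List[float]:
    """Subtract an adaptively estimated echo of `far_end` from `mic`.

    Returns the residual signal, which is what would be sent to the other party.
    """
    weights = [0.0] * taps   # current estimate of the echo path
    recent = [0.0] * taps    # most recent far-end samples, newest first
    residual: List[float] = []
    for played, captured in zip(far_end, mic):
        recent = [played] + recent[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, recent))
        error = captured - echo_estimate
        power = sum(x * x for x in recent) + eps
        # NLMS update: move the weights toward explaining the remaining error.
        weights = [w + step * error * x / power for w, x in zip(weights, recent)]
        residual.append(error)
    return residual


def energy(samples: List[float]) -> float:
    return sum(s * s for s in samples)


# Assumed test signal: the loudspeaker plays noise, and the microphone hears
# only a delayed, attenuated copy of it (no local speech).
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
delay, gain = 3, 0.6
mic = [0.0] * delay + [gain * s for s in far_end[:-delay]]

residual = nlms_echo_cancel(far_end, mic)
echo_energy = energy(mic[-200:])
residual_energy = energy(residual[-200:])
```

After the filter has adapted, the residual energy over the last 200 samples is a tiny fraction of the echo energy, so almost none of the played-out sound travels back to the other side.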

The acoustic processing above cannot be done in a day or two. To build it yourself you would have to fund a research and development team with many acoustics and audio/video experts. Now mini programs have RTC capability out of the box, and I think what we have built is fairly general-purpose.

With two-way audio and video in place, we pushed the technology further, into rooms and IM. With two parties, one stream from you and one from me, things are simple and clear. With many people it becomes troublesome and cannot work the same way. I need a central control system to coordinate the state and output of every party, including who is speaking and who is not. That calls for the concept of a room and for room management. Add IM on top, and you can solve the multi-party scenario.

In a multi-party solution the server side has to do more, and it needs a concept like room management to synchronize the state of A, B, and C. We did not build all of this into the mini program itself, one reason being that WeChat wants to stay simple rather than become complex. So we put forward the Rtcroom solution, which comes with this additional logic. You can find the corresponding material in Tencent Cloud's mobile live streaming solution, or in the mini program audio and video documentation.
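
The room concept can be sketched as a small piece of server-side state. This is not the Rtcroom API: the `Room` class, its methods, the member limit, and the example stream URLs are hypothetical and only show the kind of bookkeeping a room service does, namely tracking who is in the room and telling each member which streams to pull.

```python
from typing import Dict, List


class Room:
    """Hypothetical room state: members, their push URLs, and a change log."""

    def __init__(self, room_id: str, max_members: int = 4) -> None:
        self.room_id = room_id
        self.max_members = max_members
        self.members: Dict[str, str] = {}  # user id -> push stream URL
        self.events: List[str] = []        # what an IM channel would broadcast

    def join(self, user_id: str, push_url: str) -> List[str]:
        """Add a member and return the streams that member should start pulling."""
        if user_id in self.members:
            raise ValueError(f"{user_id} is already in the room")
        if len(self.members) >= self.max_members:
            raise RuntimeError("room is full")
        others = list(self.members.values())
        self.members[user_id] = push_url
        self.events.append(f"{user_id} joined")
        return others

    def leave(self, user_id: str) -> None:
        """Remove a member; the remaining members stop pulling that stream."""
        if self.members.pop(user_id, None) is not None:
            self.events.append(f"{user_id} left")

    def streams_for(self, user_id: str) -> List[str]:
        """Every stream in the room except the member's own."""
        return [url for uid, url in self.members.items() if uid != user_id]


# Example URLs are placeholders, not real endpoints.
room = Room("demo")
a_pulls = room.join("A", "rtmp://example.invalid/live/A")
b_pulls = room.join("B", "rtmp://example.invalid/live/B")
a_now_pulls = room.streams_for("A")
room.leave("B")
```

Keeping this state on the server means each client only needs to react to join and leave notifications, which fits the goal of keeping the mini program side simple.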


With this part finished, we have walked the whole technical route, from simple live streaming to low latency and then two-way, which covers most audio and video scenarios. But then someone came up to me and said: we use WebRTC, and Apple supports it too. So here I want to talk about the differences and about how we optimize for WebRTC.

On the differences: if you use WebRTC directly inside WeChat today, there are still many restrictions. First, the browser kernel is not consistent across phones, and fragmentation is serious. On Apple devices the kernel is embedded in the system, so if you want to use WebRTC there, a mini program cannot do it. So far, every piece of WebRTC support has had to be agreed on by Apple and Google, which is a long process. So on the mini program side we build down-to-earth capabilities of our own, without having to wait on the goodwill of Apple and Google.

There are also differences in design philosophy. Many WebRTC concepts are built on the assumption of unreliable links, whereas our mini program solution can rely on a relatively cheap cloud. That is how it compares with WebRTC.

We are not in competition; we can actually team up with WebRTC. The Tencent Cloud backend plans to connect the two systems soon. After the WeChat release in April, you will be able to talk to Chrome from a mini program. This part is somewhat difficult; those interested can look at the solution and its technical documentation.

Interoperating involves another set of solutions, mini program + WebRTC, which builds on the original protocol.

Getting started quickly used to be very painful. Now you can find the corresponding package directly on Teacher Huang's system and upload it there, and debugging is particularly convenient. I will stop here and see whether you have any questions.

Q/A:

Q: I am an individual developer. Looking at what is open to individual developers, I sometimes see that audio and video capabilities are not open to us. Developing such products seems to require applying for qualifications, which costs a lot. Will mini programs become more open on the audio and video side in the future, so that the cost is a little lower?

A: Some of the simpler capabilities do seem to be open; live streaming is the hard one for you to get. In fact, I really want it to be open, and I have taken the initiative to raise this with the WeChat leadership several times. My WeChat colleagues told me about a very real concern. In the past, if an app carried pornographic or politically sensitive content, it had nothing to do with WeChat; inside a mini program it does, so they are very cautious about this. The core of submitting qualifications is accountability: if something goes wrong, there is a control measure in place. WeChat simply wants a way to protect itself. Individual developers can still debug on their own and try things out, which is relatively easy to allow.

Q: Is there caching in between for on-demand and live streaming?

A: Actually, the cache is relatively small. Youku and Tudou, for example, are on-demand: the videos are uploaded in advance, and you can watch from the beginning or from the middle. Live streaming means I am uploading in real time with the camera on. If you cache too much, the latency goes up with it. Tencent's internal network is smooth, so there is no need for much caching inside it. The real cache is mainly the part you just saw on the playback side: there is a buffer area there, with a series of optimization algorithms that decide how large the buffer should be. In fact, that is the only cache in the entire system.

Q: I asked about caching because I assumed it was the reason for the higher latency. Separately, Teacher Chang mentioned pornography and politics. In your talk you said the sound can be processed at fine intervals, and that beautification is applied when producing video, so you have a high degree of control over the content. In that case, could pornographic or politically sensitive content be detected and screened out directly at that intermediate layer?

A: There are systems for this now, but they require human intervention. My current attitude is that it cannot yet be left fully automatic, because there are false positives: a host may simply look suggestive to the detector and get flagged as pornographic. That needs a human in the loop, which introduces delay. Content is first reported to the detection system, and the more viewers there are online, the higher the cost of a false alarm.

Q: When we do live streaming, we can do some pornography detection and video processing through AI and monitoring. Is it also possible to record the stream? If something happens, we need to provide evidence. The capability you mentioned is something our platform directly requires.

A: You mean recording?

Q: Yes, is it possible to record a live stream?

A: That is certainly no problem. You can record everything simply by turning on a switch, or record only at certain times. In the cloud solution the recording is done for you on the server side, which is the technically demanding part, since the audio and video have to be processed once more. Financial account opening and court sessions, for example, are all recorded. The only issue is that recording has a cost and must be paid for. Accordingly, recording is turned on for government-related use and left off by default for commercial use. In fact, for many of our large customers, bandwidth already takes up a good part of the annual cost, and storing a month of recordings adds a very large amount.


Published on Tencent Cloud+ Community with the author's authorization. Original link: https://cloud.tencent.com/developer/article/1084503?fromSource=waitui