
Author: Evergreen, terminal technical director of Tencent Video Cloud. He joined Tencent after graduating in 2008 and has been doing client-side R&D ever since, working on PC QQ, mobile QQ, QQ groups, and other products. He is now responsible for optimizing and delivering the audio/video terminal solutions of the Tencent Video Cloud team, helping customers obtain industry-leading audio and video solutions at a controlled R&D cost. The current product line includes interactive live streaming, video on demand, short video, real-time video calls, image processing, AI, and more.

Mind map for this article

Let me introduce each of these in turn.

What is Mini Program audio/video?

In 2017, the Tencent Video Cloud team and the WeChat team worked together to integrate the video cloud SDK into WeChat Mini Programs, exposing its capabilities through the <live-pusher> and <live-player> tags. With these two tags, developers can build live streaming, low-latency surveillance, one-to-one video calls, and multi-party video conferencing.
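To make the tag-based interface concrete, here is a minimal WXML sketch of the two tags. The URLs and attribute values are illustrative placeholders, not real endpoints; see the official Mini Program documentation for the full attribute list.

```xml
<!-- Minimal sketch: push the local camera and microphone in low-latency (RTC) mode.
     The URL below is a placeholder, not a real endpoint. -->
<live-pusher url="rtmp://example.com/live/demo-stream" mode="RTC" autopush="true" />

<!-- Minimal sketch: play a remote stream over HTTP-FLV (placeholder URL). -->
<live-player src="http://example.com/live/demo-stream.flv" mode="live" autoplay="true" />
```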

So what is WebRTC?

WebRTC (Web Real-Time Communication) is a technology for real-time voice and video communication in web browsers. It grew out of Google's acquisition of GIPS: no plug-in is required in Chrome, and real-time audio and video calls can be programmed directly in JavaScript.
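As a rough illustration of what "programmed directly in JavaScript" means, the sketch below captures the local camera and microphone and hands them to an RTCPeerConnection. Signaling (exchanging the offer/answer and ICE candidates between the two peers) is omitted because it is application-specific, and the #remoteVideo element id is an assumption for this example.

```javascript
// Minimal browser-side sketch: local capture plus an RTCPeerConnection.
// Signaling is omitted; #remoteVideo is a placeholder <video> element.
async function startWebRTC() {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  const pc = new RTCPeerConnection({ iceServers: [{ urls: 'stun:stun.l.google.com:19302' }] });

  // Send the local tracks to the remote peer.
  stream.getTracks().forEach(track => pc.addTrack(track, stream));

  // Render the remote peer's media when it arrives.
  pc.ontrack = (event) => {
    document.querySelector('#remoteVideo').srcObject = event.streams[0];
  };

  // Create an SDP offer; a real app sends this to the peer over its signaling channel.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
}
```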

What’s the difference?

If you are a pragmatist like me, here is the short, pragmatic conclusion: Mini Programs have won the phone, and WebRTC has won the PC.

If you care about the technology itself, here is a point-by-point comparison from several technical angles:

  • Implementation principle: Mini Program audio/video embeds Tencent Video Cloud's LiteAVSDK inside WeChat and exposes the SDK's capabilities through the <live-pusher> and <live-player> tags. The tags serve as the developer-facing API, while the SDK inside WeChat actually does the audio/video work.

WebRTC came out of Google's acquisition of GIPS (I have to mention that the first team I joined at Tencent was the QQ team; QQ's audio/video engine at the time was licensed from GIPS, but after various reliability problems the team switched to in-house development). Google kept the technology largely intact and built it into the core of the Chrome browser. Apple has also recently begun supporting WebRTC in Safari.

  • Main protocols: Mini Program audio/video uses RTMP, the most common push protocol in live streaming, for the uplink, and HTTP-FLV for playback. Both protocols have been stable for many years, and material about them is abundant online.

Under the hood, WebRTC uses two protocols: RTP, which mainly carries the audio/video data, and RTCP, which is generally used for control.

  • Fragmentation on mobile: Because Mini Program audio/video is implemented uniformly inside WeChat, and the WeChat team insists on feature parity in every release (it would rather delay a release than ship without it), there is essentially no fragmentation problem.

WebRTC has a harder time here. On one hand, Android's own fragmentation means WebRTC behaves differently from device to device; on the other hand, the embedded WebView on iOS (the web views opened inside WeChat and other apps) still does not support WebRTC, which is a real headache.

  • Extensibility: Mini Program audio/video ships with WeChat's release cycle. Issues are generally fixed in the current code line and released with the next version, so a new feature (say, adding beauty filters to the pusher) or a bug (say, pinch-to-zoom not being supported) usually takes only about a month from kickoff to delivery, and new WeChat versions reach users very quickly.

WebRTC, by contrast, is not governed by a single team or company. It takes the standards route: every new feature has to be standardized first and then pushed to browser vendors (including Apple) to implement. That means more back-and-forth and much more time.

  • Desktop browsers: You have probably noticed that in the comparisons above my view leans toward Mini Program audio/video. Indeed, in the current domestic mobile market, neither Google nor Apple has the final say; WeChat does.

When it comes to desktop browsers, however, Chrome's dominance of the PC market gives WebRTC a huge advantage: developers can do real-time audio/video without asking users to install any plug-in.

By contrast, Chrome has no native support for Mini Program audio/video, so if you want to bring it to the PC, you need either a browser plug-in or a pseudo-protocol such as wxlite://start that launches a local EXE application (similar to the way a web page used to open a QQ chat window).
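For illustration only, a web page usually triggers such a pseudo-protocol simply by navigating to it; the wxlite:// scheme comes from the example above, and the room query parameter is a hypothetical placeholder.

```javascript
// Illustrative sketch: ask the OS to launch whatever local application is
// registered for the "wxlite://" scheme. The "room" parameter is hypothetical.
function launchLocalClient(roomId) {
  window.location.href = `wxlite://start?room=${encodeURIComponent(roomId)}`;
}
```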

It’s not a zero-sum game

Mini Program audio/video and WebRTC are not locked in a zero-sum game; each side has its own strengths and weaknesses. So, in the spirit of "if you can't beat them, join them," the Tencent Video Cloud team began working on interoperability between Mini Program audio/video and WebRTC right after the 2018 Spring Festival.

The news to report to developers is that in the latest version of WeChat, Mini Program audio/video can already interoperate with WebRTC: a Mini Program can now hold a real-time audio/video call with the Chrome browser on a PC.


Of course, if you want to know how this works under the hood, read on.

Getting to know WebRTC

It is like marriage: if you decide to spend the rest of your life with someone, you first need to learn a lot about them, such as their personality, temperament, and hobbies.

Similarly, if we want to connect Mini Program audio/video with WebRTC, we have to understand WebRTC well. Here is my read on the "personality" of WebRTC.

First of all, she’s not very pretty, but she has a lot of substance.

Saying WebRTC is not pretty is just a metaphor; what I mean is that learning WebRTC is not cheap. Google has produced plenty of approachable slide decks to help you get started, but you still need to sit down and study it thoroughly before you can gradually come to accept her as your chosen partner. If this is your first love, though (that is, your first exposure to real-time audio and video), you will find that learning WebRTC is itself a process of learning the technical details of live audio and video.

Second, she is very accommodating and can support all kinds of architectures.

Saying WebRTC is accommodating is also a metaphor. WebRTC supports many back-end architectures (such as Mixer, Mesh, and Router), but Google considers these back ends relatively simple to build, so it has neither open-sourced a back-end implementation nor provided a unified back-end solution. This open design philosophy is admirable, but its side effect is a high implementation cost: in real projects, small companies and individual developers are easily blocked by this barrier. In particular, hard requirements such as recording and archiving demand a lot of custom development before WebRTC can truly serve an enterprise solution.

Settling on a solution

Once we understood these characteristics of WebRTC, our interoperability scheme became fairly clear:

First, Mini Program audio/video has a simple interface and is quick to pick up; that is the Mini Program's strength and WebRTC's weakness. So rather than exposing WebRTC's dozen or so interface classes on the Mini Program side, we keep using the <live-pusher> and <live-player> tags for Mini Program audio/video.

Second, WebRTC has no official back-end implementation, which leaves plenty of room to maneuver: Tencent Video Cloud can build its own WebRTC back end and connect it to the RTMP back end that Mini Program audio/video already uses. Put simply, Tencent Video Cloud plays the matchmaker (more precisely, the translator) between Mini Program audio/video and WebRTC.

But anyone who has watched state leaders talk through interpreters on the evening news knows that this kind of translation slows communication down. If we put a translator between Mini Program audio/video and WebRTC, won't the latency go up?

Actually, no. In typical scenarios, Mini Program audio/video and WebRTC use the same video coding standard, H.264; only the audio formats differ (typically AAC on the RTMP side versus Opus in WebRTC). That leaves the translator with very little to do: both sides can essentially understand what the other is saying, so the delay does not increase much.

A successful handshake

The figure below shows the scheme Tencent Video Cloud adopted to interconnect Mini Program audio/video and WebRTC:

(1) First, the Mini Program on the WeChat side pushes its audio/video stream to a Tencent Cloud RTMP server through the Tencent Video Cloud SDK.

(2) Second, the Tencent Cloud RTMP server performs a preliminary conversion of the audio/video data and passes it on to Tencent Video Cloud's real-time audio/video back-end cluster.

(3) Next, the real-time audio/video back end hands the data to a module called webrtc-proxy, which translates the Mini Program's audio/video data into a "language" that WebRTC understands.

(4) Finally, the Chrome browser on the PC talks to webrtc-proxy through its built-in WebRTC module and can then see the Mini Program's video.

(5) Run the four steps above in the reverse direction as well and you get a two-way video call. With Tencent Video Cloud as the central node of a star topology, multiple endpoints (whether Mini Programs or Chrome browsers) can connect to it, forming a multi-party audio/video solution.

Bridging the room logic

Completing the handshake between Mini Programs and WebRTC is not enough. Behind a successful audio/video call there is more than simply moving audio/video data from one end to the other; there is also state synchronization and coordination among the participants.

For example, a multi-party video call involves dialing and connecting. If one party hangs up, the others should be notified; if a new participant joins, the others should be notified as well. WebRTC has components such as RTCPeerConnection to handle this kind of logic, but the WebRTC API introduces a lot of new terminology and still has a noticeable learning curve for beginners. To simplify things, we introduce the concept of a "room."

A room is simply the set of people taking part in a video call. In a one-to-one call, the two participants A and B can be considered to be in the same room; in a multi-party call, the five participants A, B, C, D, and E can likewise be considered to be in the same room.

With the concept of a room, state can be described by two simple actions: when someone joins a video call, we say they enter the room (EnterRoom); when someone quits, we say they leave the room (LeaveRoom). And posted on the door of the room is always the current list of who is inside.

With the room concept in place, we can functionally align the Mini Program's two simple <live-pusher> and <live-player> tags with WebRTC's complex set of APIs, without even changing the interface we defined in the first release:

(1) The url attribute no longer takes an rtmp:// address but a room:// address. For details on how to use the room:// protocol, please refer to our documentation.

(2) The <live-pusher> tag reports room events to your Mini Program code through its event callback, which carries the current member list (ROOM_USERLIST). During a video call, members entering and leaving the room are also delivered to your code through this event.

(3) Each entry in ROOM_USERLIST is a pair (in a 1v1 call there is only one entry): userID and playURL. userID identifies the remote user, and playURL is the playback address of that user's remote stream. All you have to do is play these remote streams, picture and sound, with the <live-player> tag, as shown in the sketch after this list.

(4) On the WebRTC side, you can use our WebRTC API, which is friendlier to beginners than WebRTC's native API.
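To make items (2) and (3) concrete, here is a minimal Mini Program sketch written under stated assumptions: the event binding name (onRoomEvent), the payload shape, and the WXML shown in the comments are illustrative, not the official interface; the real names are in the documentation mentioned above.

```javascript
// Minimal Mini Program page sketch; names are illustrative assumptions.
// WXML (sketch):
//   <live-pusher url="room://your-room-address" bindstatechange="onRoomEvent" />
//   <live-player wx:for="{{members}}" wx:key="userID"
//                src="{{item.playURL}}" mode="RTC" autoplay />
Page({
  data: { members: [] },

  // Called whenever the pusher reports a room event. The assumed payload
  // carries ROOM_USERLIST: one { userID, playURL } pair per remote member.
  onRoomEvent(event) {
    const userList = event.detail && event.detail.ROOM_USERLIST;
    if (userList) {
      // Re-render one <live-player> per remote member.
      this.setData({ members: userList });
    }
  }
});
```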

How do I integrate quickly?

If you want to connect WebRTC and Mini Program audio/video within a day, I recommend not starting from scratch; stepping into every pitfall and fixing every bug yourself will cost far too much time. Instead, use the <webrtc-room> component we have packaged. It gets you integrated quickly and still leaves room for a fair amount of customization; a usage sketch follows.
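Purely as a hypothetical illustration of what using the packaged component could look like: the attribute names below are assumptions, not the component's documented interface, so check the official <webrtc-room> documentation for the real property list.

```xml
<!-- Hypothetical usage sketch of the packaged <webrtc-room> component.
     Attribute names here are illustrative assumptions; see the official docs. -->
<webrtc-room room-id="demo-room-001" user-id="user-42" bindRoomEvent="onRoomEvent" />
```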

Finally, don't forget WeChat => Discover => Mini Programs => Tencent Cloud Video Cloud, where you can try the WebRTC interoperability in Tencent Cloud's official demo.





This article has been published on the Tencent Cloud+ Community with the author's authorization. Original link: https://cloud.tencent.com/developer/article/1115258?fromSource=waitui