How do you make a live video call between two people? The first pipeline that comes to mind for live streaming is capture -> push stream -> pull stream. But if the front end had to implement that whole pipeline itself, most of us could only look on from a distance. With the advent of WebRTC, the situation has been reversed.

With WebRTC, the front end no longer needs to care about the "capture -> push stream -> pull stream" pipeline, and it becomes easy to implement live streaming, or even real-time audio and video calls.

What is WebRTC? What are its underlying principles? How does it work? Let barbecue brother walk you through it!

What is WebRTC?

WebRTC stands for Web Real-Time Communication, a real-time communication technology for the web initiated by Google. It is called a solution rather than a protocol because it covers the complete set of pieces needed for audio/video communication: capture, connection establishment, data transport, and playback. The project makes it possible to build an audio/video communication application quickly.

If you are a web developer, the WebRTC APIs provided by the browser make it easy to capture and play audio and video, establish end-to-end communication channels, and share audio/video data over those channels.

Although WebRTC looks like a browser-only technology, thanks to Google's open-source spirit its C++ codebase can be compiled for virtually any platform. So if you want to remotely control a Windows PC from the web, your C++ colleagues can put WebRTC to work; it even supports real-time desktop capture!

How does WebRTC enable end-to-end audio and video sharing?

Traditional resource sharing usually goes through a relay server: upload the resource to a fixed public server in advance, then access it through the server's IP address. The advantage of this approach is strong reliability: because the resource server is fixed, it is not affected by either the sender's or the user's network. Very flexible and reliable!

However, its real-time performance is poor: you can only download a file after the other party has uploaded it. Of course, this process can be made real-time by relaying a file stream on the server, forwarding data to the pulling side as soon as it arrives. But then, why not just cut out the server entirely?

P2P connection

P2P (peer-to-peer) is a networking technology and network topology. Devices that have established a P2P connection can exchange information directly, one to one, without being relayed by a third-party service.

So how do you create a P2P connection? First, let's look at how communication actually works in the real-world Internet.

The real-world network

The network in the real world looks like this:

At present, most of the deployed Internet runs Internet Protocol version 4 (IPv4). IPv4 uses 32-bit binary addresses, which yields about 4.3 billion possible IP addresses. If every user terminal accessed the Internet with its own public IP address, 4.3 billion addresses would not be enough.

Therefore, in the current network structure, device terminals mostly access the Internet through one or more layers of NAT, that is, via a LAN.

What is NAT?

NAT (Network Address Translation) is a technique for connecting devices on a private network to the public network.

So how does NAT work?

  • If device A (172.20.98.44:7777) wants to send a request to the server 8.8.8.8:23456, the request first reaches the NAT. The NAT rewrites the packet's source address, source port, and checksum, and then forwards the request to the server. A flow mapping relationship is formed:

```
172.20.98.44:7777 => 6.6.6.6:12345 => 8.8.8.8:23456
```

  • After the server processes the request, the response data returns to the NAT. The NAT rewrites the destination address, destination port, and checksum according to the mapping, and then forwards the response to device A:

```
8.8.8.8:23456 => 6.6.6.6:12345 => 172.20.98.44:7777
```

This is how NAT enables intranet devices to access public network servers.
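
To make the mapping concrete, here is a toy model of a NAT's translation table, a sketch rather than real NAT code: outbound packets get their source rewritten and recorded; inbound packets are looked up and have their destination restored. The addresses mirror the example above.

```ts
type Endpoint = { ip: string; port: number };

// Toy NAT model: maps an external port back to the internal endpoint
const natTable = new Map<number, Endpoint>();
const PUBLIC_IP = "6.6.6.6";
let nextPort = 12345;

// Outbound: rewrite the source to the NAT's public address and record the flow
function translateOutbound(src: Endpoint): Endpoint {
  const externalPort = nextPort++;
  natTable.set(externalPort, src); // 172.20.98.44:7777 <=> 6.6.6.6:12345
  return { ip: PUBLIC_IP, port: externalPort };
}

// Inbound: look up the mapping and restore the internal destination
function translateInbound(dstPort: number): Endpoint | undefined {
  return natTable.get(dstPort);
}
```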

A device on an intranet can reach a public server through NAT. In a P2P connection, however, the two devices may each sit behind a different intranet. How can they establish a P2P connection with each other?

NAT traversal technology

Establishing a P2P connection between two devices on two different intranets is referred to as NAT traversal. In general, P2P can run over either UDP or TCP, so the mechanism is also known as UDP hole punching or TCP hole punching. Since WebRTC uses UDP as its transport-layer protocol, I will focus on the UDP hole punching principle here.

  • First, introduce a go-between server. Its job is to discover and record the public IP address and port that the NAT has mapped for an intranet device. Such servers are known as STUN servers. Given how NAT works, when device A sends a request to the STUN server, a mapping is formed: 172.20.98.44:7777 => 6.6.6.6:12345 => 3.3.3.3:34567. The STUN server then responds to device A with 6.6.6.6:12345. Device B obtains its own mapping the same way.

  • Exchange the mappings. Devices A and B need to exchange their NAT mappings to prepare for the connection. This exchange has to happen over some other channel.
  • After the exchange, device A sends a request packet to device B's public address 8.8.8.8:23456. Since device B never solicited this request, NAT-B discards the packet for security reasons instead of forwarding it to device B. The attempt, however, teaches NAT-A that subsequent packets from 8.8.8.8:23456 should be forwarded to device A. Similarly, device B sends the same kind of request to 6.6.6.6:12345, so that NAT-B learns that subsequent packets from 6.6.6.6:12345 should be forwarded to device B.
  • After that, device A and device B can establish the P2P connection and exchange messages happily. Of course, the connection must be kept alive with heartbeat packets to prevent the NAT mappings from being closed.

That is the complete UDP hole punching process. Once the holes are punched, the devices can communicate P2P across NATs.
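
To make the sequence concrete, here is a minimal client-side sketch of UDP hole punching in Node.js. The STUN-style discovery and the mapping exchange are abstracted away as hypothetical helpers (`discoverPublicAddress`, `exchangeWithPeer`); a real implementation would talk to a STUN server and a signaling channel.

```ts
import * as dgram from "dgram";

// A minimal hole-punching sketch; the two helper functions are assumptions,
// standing in for a STUN lookup and a signaling exchange respectively.
async function punchHole(
  discoverPublicAddress: (socket: dgram.Socket) => Promise<{ ip: string; port: number }>,
  exchangeWithPeer: (mine: { ip: string; port: number }) => Promise<{ ip: string; port: number }>
) {
  const socket = dgram.createSocket("udp4");
  socket.bind(); // bind an ephemeral local port; the NAT will map it

  // 1. Ask a STUN-like server what our public ip:port mapping is.
  const myMapping = await discoverPublicAddress(socket);

  // 2. Swap mappings with the peer over an out-of-band channel.
  const peer = await exchangeWithPeer(myMapping);

  // 3. Send a packet to the peer's public address. The peer's NAT will likely
  //    drop it, but it opens a hole in *our* NAT for the peer's address.
  socket.send("punch", peer.port, peer.ip);

  // 4. The peer does the same; once both sides have sent, packets flow through.
  socket.on("message", (msg, rinfo) => {
    console.log(`P2P message from ${rinfo.address}:${rinfo.port}: ${msg}`);
  });

  // 5. Keep the NAT mapping alive with periodic heartbeats.
  setInterval(() => socket.send("heartbeat", peer.port, peer.ip), 15000);
}
```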

WebRTC establishes end-to-end connections over UDP

Based on what we have learned so far, WebRTC's end-to-end audio/video transmission must also deal with the UDP hole punching process. Fortunately, we do not implement that process ourselves when creating an end-to-end connection with WebRTC (if we did, it would defeat WebRTC's purpose of making audio/video instant messaging easy). What we do need to do is call the APIs in the right order to complete the WebRTC connection.

Learning the WebRTC API by creating a WebRTC connection

Let’s take a look at the process of creating a connection:

Signaling server: a service used for information exchange. Its role in the WebRTC process is to relay between the two ends the information needed to establish the connection. The signaling server implementation is outside the scope of WebRTC itself, because it sits closer to the business: different business scenarios call for different implementations.
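
As a sketch, the envelopes relayed by the signaling service in this article's demo boil down to shapes like these; the field names (target, originId, data) follow the demo code later in the article, not any standard:

```ts
// Client -> server: something to relay, optionally addressed to one peer
interface ClientToServerMessage {
  target?: string | null; // peer id to deliver to; omitted = broadcast to the room
  data: any;              // an offer/answer SDP or a candidate payload
}

// Server -> client: the relayed payload plus who it came from
interface ServerToClientMessage {
  originId: string;
  data: any;
}
```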

Although WebRTC compatibility is good (with the exception of the ever-arrogant Internet Explorer), each browser's top-level API differs slightly, so we need a shim to smooth over the differences: webrtcHacks/adapter.

Break it down:

  1. Create an RTC instance. WebRTC provides the RTCPeerConnection API for instantiating a connection. The configuration argument is optional and is used to configure STUN/TURN server information. The STUN/TURN server also needs to be deployed by ourselves; if you need one, see the deployment of STUN and TURN servers.

```js
let connection = new RTCPeerConnection(configuration); // configuration is optional
```

If configuration is not provided, the connection can only be established within the intranet.
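
For illustration, a minimal configuration sketch pointing at Google's public STUN server; in production you would list your own STUN/TURN deployment here:

```ts
const configuration: RTCConfiguration = {
  iceServers: [
    // Public STUN server, used here purely for illustration
    { urls: "stun:stun.l.google.com:19302" },
  ],
};
const connection = new RTCPeerConnection(configuration);
```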

  2. Access the camera and microphone. The following code turns on the camera and microphone (at which point the browser asks the user for authorization) and obtains the media stream, without relying on any plug-in. constraints specifies the requested media types and their parameters. mediaStream is the resulting media stream; you can assign it to the srcObject property of a video element to play the media in real time. More information about mediaDevices can be found here.

```js
navigator.mediaDevices.getUserMedia(constraints)
    .then(function(mediaStream) { ... })
    .catch(function(error) { ... })
```

getUserMedia can obtain media streams only when the page is served from localhost or a trusted origin (HTTPS); otherwise it throws an error.
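
As a usage sketch (the video element id is an assumption), requesting the camera and playing the stream locally might look like this:

```ts
const constraints = { video: true, audio: true };

navigator.mediaDevices.getUserMedia(constraints)
  .then((mediaStream) => {
    // Assumes a <video id="local" autoplay playsinline> element exists in the page
    const video = document.getElementById("local") as HTMLVideoElement;
    video.srcObject = mediaStream;
  })
  .catch((error) => {
    console.error("getUserMedia failed:", error);
  });
```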

  3. Add the media stream's tracks to the RTCPeerConnection instance.

```js
connection.addTrack(mediaStream.getVideoTracks()[0], mediaStream);
```
  4. Exchange SDPs. WebRTC uses an offer-answer model to exchange session descriptions. The initiator creates an Offer SDP via createOffer and sends it to the receiver through the signaling service; the receiver creates an Answer SDP via createAnswer and sends it back to the initiator through the signaling server. Each side must set its own SDP with setLocalDescription and the peer's SDP with setRemoteDescription on the connection.

The Session Description Protocol (SDP) is a text-based session description format; it is not a transport protocol. It relies on other protocols (such as SIP and HTTP) to exchange the necessary media information and is used for media negotiation between the two session endpoints. An SDP carries the media information, network information, security attributes, and transport policies required to establish the session.
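
To get a feel for the format, here is a heavily truncated, illustrative fragment of a WebRTC SDP; each line is a field=value pair (session version, origin, timing, a media description, and ICE/security attributes):

```
v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
c=IN IP4 0.0.0.0
a=ice-ufrag:EsAw
a=fingerprint:sha-256 0F:74:31:...
a=sendrecv
```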

Note: if you are going to transmit audio/video streams, you must add the media tracks to the connection via addTrack before generating the Offer SDP, because the stream information is collected while the SDP is generated. If you skip this, the generated SDP will carry no media description.

```js
// Initiator: add the stream information first
connection.addTrack(stream.getVideoTracks()[0], stream);

// Initiator: create the Offer SDP
connection.createOffer()
  .then((sessionDescription) => {
    console.log("send offer");
    if (connection) {
      console.log("set local description");
      connection.setLocalDescription(sessionDescription);
    }
    sendMessage(sessionDescription, targetId);
  })
  .catch(() => {
    console.log("offer create error");
  });

// Receiver: set the remote description
connection.setRemoteDescription(new RTCSessionDescription(sessionDescription));

// Receiver: create the Answer SDP
connection.createAnswer()
  .then((sessionDescription) => {
    console.log("send answer");
    if (connection) {
      console.log("set local description");
      connection.setLocalDescription(sessionDescription);
    }
    sendMessage(sessionDescription, targetId);
  })
  .catch(() => {
    console.log("answer create error");
  });

// Initiator: set the remote description
connection.setRemoteDescription(new RTCSessionDescription(sessionDescription));
```

  5. Exchange candidate information. Each side obtains its own candidates by listening to the RTCPeerConnection's icecandidate event, then transmits them to the peer through the signaling service. Exchanging candidates is a key step in establishing the WebRTC connection. Once you receive a candidate from the peer, instantiate an RTCIceCandidate object and add it to the RTC instance via the addIceCandidate method of RTCPeerConnection.

A candidate contains the NAT mapping information of the current device: IP address, port, and protocol.
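
For example, a serialized candidate looks roughly like this (illustrative values): you can read off the transport, the public IP and port discovered via STUN, and the candidate type (srflx = server reflexive):

```
candidate:842163049 1 udp 1677729535 203.0.113.1 47998 typ srflx raddr 192.168.1.100 rport 47998
```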

The icecandidate event starts firing after setLocalDescription has been executed.

```js
// Listen for the icecandidate event
connection.addEventListener("icecandidate", (event) => {
  if (event.candidate) {
    console.log("send candidate", event.candidate.candidate);
    sendMessage(
      {
        type: "candidate",
        label: event.candidate.sdpMLineIndex,
        id: event.candidate.sdpMid,
        candidate: event.candidate.candidate,
      },
      targetId
    );
  } else {
    console.log("End of candidates.");
  }
});

// On receiving a candidate from the peer, add it to the connection
const candidate = new RTCIceCandidate({
  sdpMLineIndex: message.label,
  candidate: message.candidate,
});
connection.addIceCandidate(candidate).catch((error) => {
  console.log(error);
});
```

These are all the APIs required to complete a WebRTC connection. Beyond them there are many more: APIs for monitoring the connection, for negotiating audio/video codecs while the connection is created, for creating a WebRTC connection as a pure data channel, and so on. If you are interested, see the WebRTC API documentation.

Hands-on practice

Now that we understand the main APIs, let's put WebRTC to work and complete an example: the real-time communication between two people promised at the beginning, except here we will implement simultaneous video communication among multiple people. The example is intranet-based, with the services accessed through an intranet IP address.

The implementation is split into two parts, following the idea of front-end/back-end separation: one part is the signaling service, the other is the interactive front-end page.

First, create the signaling service: Express + Socket.IO

As mentioned earlier, the primary role of the signaling service is to relay information while the connection is being created. For simplicity, we use Node's Express framework together with Socket.IO, which supports two-way communication.

Creating an HTTPS service

If we want multiple people to join the call, we need to share our site with them, and as mentioned earlier, the browser only exposes media devices to localhost or a trusted (HTTPS) origin. To avoid cross-protocol problems, we therefore create an HTTPS service.

```js
const app = require("express")();
const fs = require("fs");

// Read the certificate
const key = fs.readFileSync("/Users/XXX/Documents/study/https/172.20.210.160.key", "utf8");
const cert = fs.readFileSync("/Users/XXX/Documents/study/https/172.20.210.160.crt", "utf8");

const http = require("https").Server(
  {
    key,
    cert,
  },
  app
);

http.listen(3005, function () {
  console.log("listening on *:3005");
});
```

In fact, you could also terminate HTTPS by configuring nginx, but I don't want to bring nginx into this, so I'll do it in code.

Cross-origin setup

Because the page and the signaling service are two different services, there will be cross-origin problems, so the entry file needs the following code:

```js
const allowCors = function (req, res, next) {
    res.header("Access-Control-Allow-Origin", req.headers.origin);
    res.header("Access-Control-Allow-Methods", "GET,PUT,POST,DELETE,OPTIONS");
    res.header("Access-Control-Allow-Headers", "Content-Type");
    res.header("Access-Control-Allow-Credentials", "true");
    next();
};
app.use(allowCors);
```

Socket.IO information relay implementation

The implementation is very simple: a small amount of code completes the whole relay mechanism. You have to hand it to the great open-source world.

```js
const socketIo = require("socket.io");

function createSocketIo(httpInstance) {
  const io = socketIo(httpInstance, {
    cors: {
      origin: "*",
      allowedHeaders: ["Content-Type"],
      methods: ["GET,PUT,POST,DELETE,OPTIONS"],
    },
  });

  io.on("connection", function (socket) {
    // A new end has connected: put it into the room and notify
    // the other members so that they can initiate connections to it
    socket.join("demo");
    socket.broadcast.to("demo").emit("new", socket.id);

    // Relay messages: to a specific target if one is given,
    // otherwise broadcast to everyone else in the room
    socket.on("message", (message) => {
      if (message.target) {
        socket.to(message.target).emit("message", {
          originId: socket.id,
          data: message.data,
        });
      } else {
        socket.broadcast.to("demo").emit("message", {
          originId: socket.id,
          data: message.data,
        });
      }
    });
  });
}

module.exports = createSocketIo;
```

Second, the front-end interface implementation

The main job of the front-end interface is to establish, via the signaling service, a WebRTC connection with every other end that joins the room.

Here the user interface uses create-react-app to bootstrap the project quickly.

The user interface consists of two parts: the WebRTC connection-creation logic, and the Socket.IO interaction logic.

Encapsulating the WebRTC connection to support multiple peer connections

```ts
// Smooth over browser API differences
import "webrtc-adapter";

class ConnectWebrtc {
  protected connection: RTCPeerConnection | null;

  constructor() {
    this.connection = null;
  }

  // Create the RTCPeerConnection instance and listen for icecandidate/track events
  create(
    onAddStream: EventListenerOrEventListenerObject,
    onRemoveStream: EventListenerOrEventListenerObject,
    onCandidate: (candidate: RTCIceCandidate) => void
  ) {
    this.connection = new RTCPeerConnection(undefined);
    this.connection.addEventListener("icecandidate", (event) => {
      if (event.candidate) {
        onCandidate(event.candidate);
      } else {
        console.log("End of candidates.");
      }
    });
    this.connection.addEventListener("track", onAddStream);
    this.connection.addEventListener("removeTrack", onRemoveStream);
  }

  // Create the Offer SDP (as the initiator)
  createOffer(
    onSessionDescription: (sessionDescription: RTCSessionDescriptionInit) => void
  ) {
    if (this.connection) {
      this.connection
        .createOffer()
        .then((sessionDescription) => {
          if (this.connection) {
            this.connection.setLocalDescription(sessionDescription);
          }
          onSessionDescription(sessionDescription);
        })
        .catch(() => {
          console.log("offer create error");
        });
    }
  }

  // Create the Answer SDP (as the receiver)
  createAnswer(
    onSessionDescription: (sessionDescription: RTCSessionDescriptionInit) => void
  ) {
    if (this.connection) {
      this.connection
        .createAnswer()
        .then((sessionDescription) => {
          if (this.connection) {
            this.connection.setLocalDescription(sessionDescription);
          }
          onSessionDescription(sessionDescription);
        })
        .catch(() => {
          console.log("answer create error");
        });
    }
  }

  // Set the remote description
  setRemoteDescription(sessionDescription: RTCSessionDescriptionInit | undefined) {
    this.connection?.setRemoteDescription(
      new RTCSessionDescription(sessionDescription)
    );
  }

  // Add a candidate received from the peer
  setCandidate(message: any) {
    if (this.connection) {
      const candidate = new RTCIceCandidate({
        sdpMLineIndex: message.label,
        candidate: message.candidate,
      });
      this.connection.addIceCandidate(candidate).catch((error) => {
        console.log(error);
      });
    }
  }

  // Add the local media stream's tracks to the connection
  addTrack(stream: MediaStream) {
    if (this.connection) {
      this.connection.addTrack(stream.getVideoTracks()[0], stream);
      this.connection.addTrack(stream.getAudioTracks()[0], stream);
    }
  }

  // Remove the media stream from the connection
  removeTrack() {
    if (this.connection) {
      this.connection.removeTrack(this.connection.getSenders()[0]);
    }
  }
}

export default ConnectWebrtc;
```

With the key connection-creation steps encapsulated, we can create as many WebRTC connections as we need, one per peer.

The page interaction component, connected through Socket.IO

```tsx
import { useEffect, useRef, useState } from "react";
import { io, Socket } from "socket.io-client";
import { server } from "./config";
import ConnectWebrtc from "./webrtc";

// Constraints for the local media stream
const mediaStreamConstraints = {
  video: { width: 400, height: 400 },
  audio: true,
};

const Room = () => {
  // The local media stream
  const localStream = useRef<MediaStream>();
  // The <video> element that plays the local stream
  const localVideoRef = useRef<HTMLVideoElement>(null);
  // One WebRTC connection per peer, keyed by peer id
  const connectList = useRef<{ [target: string]: any }>({});
  // Ids of the peers currently in the room
  const [userList, setUserList] = useState<string[]>([]);
  // The Socket.IO instance
  const socket = useRef<Socket>();

  // Send data through the signaling service, optionally addressed to one peer
  const sendMessage = (data: any, targetId?: string | null) => {
    socket.current?.emit("message", { target: targetId, data });
  };

  // When a remote stream arrives, play it in that peer's <video> element
  const handleStreamAdd = (originId: string) => (event: any) => {
    const video = document.getElementById(originId) as HTMLVideoElement;
    if (video) {
      video.srcObject = event.streams[0];
    }
  };

  // Record a peer id if we have not seen it yet
  const addUser = (id: string) => {
    setUserList((list) => (list.includes(id) ? list : [...list, id]));
  };

  // Get (or lazily create) the connection for a given peer
  const getConnection = (originId: string) => {
    let connection = connectList.current?.[originId];
    if (!connection) {
      connection = new ConnectWebrtc();
      connection.create(
        handleStreamAdd(originId),
        () => {},
        (candidate: RTCIceCandidate) => {
          sendMessage(
            {
              type: "candidate",
              label: candidate.sdpMLineIndex,
              id: candidate.sdpMid,
              candidate: candidate.candidate,
            },
            originId
          );
        }
      );
      // Add the local media stream before any SDP is generated
      connection.addTrack(localStream.current);
      connectList.current[originId] = connection;
    }
    return connection;
  };

  // Connect to the signaling service
  const handleConnectIo = () => {
    socket.current = io(server);

    socket.current.on("connect", () => {
      console.log("signaling connected");
    });

    socket.current.on("message", (message: any) => {
      addUser(message.originId);
      const connection = getConnection(message.originId);
      if (message.data.type === "offer") {
        // As the receiver: set the remote description and create the Answer SDP
        connection.setRemoteDescription(message.data);
        connection.createAnswer((sdp: RTCSessionDescriptionInit) => {
          sendMessage(sdp, message.originId);
        });
      } else if (message.data.type === "answer") {
        // As the initiator: set the received Answer SDP as the remote description
        connection.setRemoteDescription(message.data);
      } else if (message.data.type === "candidate") {
        connection.setCandidate(message.data);
      }
    });

    // A new end joined the room: initiate a connection to it
    socket.current.on("new", (newId: string) => {
      const connection = getConnection(newId);
      connection.createOffer((sdp: RTCSessionDescriptionInit) => {
        sendMessage(sdp, newId);
      });
      addUser(newId);
    });
  };

  // Get the local media stream and play it in the local <video>
  const handleGetLocalStream = (callback: () => void) => {
    navigator.mediaDevices
      .getUserMedia(mediaStreamConstraints)
      .then((mediaStream) => {
        localStream.current = mediaStream;
        if (localVideoRef.current) {
          localVideoRef.current.srcObject = mediaStream;
        }
        callback();
      })
      .catch((error) => {
        console.log(error);
      });
  };

  useEffect(() => {
    // Get the local stream first, then connect to the signaling service
    handleGetLocalStream(() => {
      handleConnectIo();
    });
  }, []);

  return (
    <div>
      <div style={{ marginTop: 10 }}>
        <video ref={localVideoRef} autoPlay playsInline muted></video>
        {userList.map((user) => {
          return <video id={user} key={user} autoPlay playsInline></video>;
        })}
      </div>
    </div>
  );
};

export default Room;
```

The component code above clearly shows the signaling service's important role throughout the WebRTC connection process; you could say it leads the whole dance. Plenty of details are still left unhandled, but that is not the point here. What matters is that it shows the general flow of establishing multi-party real-time WebRTC connections.

The result looks much like the screenshot at the beginning, so I won't show it again here.

Extension

Let's use a diagram to show the relationships between users in the practice above:

The relationships are already intricate. In a full mesh, n participants need n * (n - 1) / 2 connections in total (just 10 people already means 45), and each device must upload its stream to the other n - 1 peers; add hundreds or thousands of users and it simply explodes.

Therefore, P2P is a decentralized connection mode suited to small numbers of connections. For connection needs at scale, we ultimately have to return to the centralized idea and let central services provide the "decentralized" capability (as a CDN does).

At first glance, doesn't this just take us back to traditional audio/video live streaming? So why did I choose WebRTC?

From the perspective of technology selection, WebRTC has better compatibility and extensibility than other audio/video protocols (because it is open source). In addition, WebRTC's connection latency is lower, which makes it a very good fit for a live-streaming industry that demands ever more real-time interaction.

| Protocol | Latency | Data segmentation | HTML5 playback | Application scenario | Front-end plug-in |
| --- | --- | --- | --- | --- | --- |
| HLS | 10~30s | Sliced | Supported | HTML5 live, game live streaming | hls.js |
| RTMP | 2~5s | Continuous stream | Not supported | Interactive live streaming | - |
| HTTP-FLV | 2~5s, better than RTMP | Continuous stream | Supported | Interactive live streaming | flv.js |
| RTSP | Generally under 500ms | Continuous stream | Not supported | Interactive live streaming | - |
| WebRTC | Within 1s | Continuous stream | Supported (with compatibility caveats) | Interactive entertainment | native |

WebRTC solutions are already being used in industries with high real-time interaction requirements. The future, you could say, is bright.

Conclusion

This article has briefly analyzed WebRTC from principles through hands-on practice. WebRTC is a very complex solution, certainly more than I can do justice to in one article. If anything here is wrong, please correct me.

Finally

If you think barbecue brother wrote well, give it a like and a follow; if anything falls short, please feel free to point it out.