WebRTC provides a set of standard APIs that let web applications offer real-time audio and video communication directly. Most browsers and operating systems support WebRTC, so a real-time audio and video call can be started right in the browser. This article builds a one-to-one (1v1) real-time audio and video call on the web from the perspective of a WebRTC beginner.

To build an audio and video call, you need to understand four modules: audio and video capture, the STUN/TURN server, the signaling server, and the end-to-end P2P connection. WebRTC's API handles audio and video capture, and a 1v1 call can be built from a signaling server plus WebRTC's RTCPeerConnection interface. The simple process is shown as follows:

Each module and its core API are explained in turn.

Audio and video capture

WebRTC uses getUserMedia to obtain a MediaStream object corresponding to the camera and microphone. Media streams can be transmitted through WebRTC and shared among multiple peers. Assign the stream object to the srcObject property of a video element to play the audio and video locally.

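Attribute      Meaning
width          Video width
height         Video height
aspectRatio    Aspect ratio (width divided by height)
frameRate      Frame rate
facingMode     Which way the camera faces (for example "user" for the front camera, "environment" for the rear camera)
resizeMode     Whether the browser may crop or scale the video ("none" or "crop-and-scale")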
API: navigator.mediaDevices.getUserMedia
Parameter: constraints
Returns: a Promise that resolves with a MediaStream object on success.

const localVideo = document.querySelector("video");
// Optionally set this to the deviceId of a specific camera.
const deviceId = undefined;

function gotLocalMediaStream(mediaStream) {
  localVideo.srcObject = mediaStream;
}

navigator.mediaDevices
  .getUserMedia({
    video: {
      width: 640,
      height: 480,
      frameRate: 15,
      facingMode: "environment", // prefer the rear camera
      deviceId: deviceId ? { exact: deviceId } : undefined,
    },
    audio: false,
  })
  .then(gotLocalMediaStream)
  .catch((error) => console.log("navigator.mediaDevices.getUserMedia error: ", error));
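
The choice between pinning a specific camera with deviceId and stating a facingMode preference can be isolated in a small helper. The following is a minimal sketch; buildVideoConstraints and its option names are hypothetical and not part of WebRTC, and the default values simply mirror the numbers used above.

```javascript
// Hypothetical helper: builds the `constraints` argument for getUserMedia.
// If a specific camera is requested, pin it with deviceId; otherwise fall
// back to a facingMode preference.
function buildVideoConstraints({
  deviceId,
  facingMode = "environment",
  width = 640,
  height = 480,
  frameRate = 15,
} = {}) {
  const video = { width, height, frameRate };
  if (deviceId) {
    video.deviceId = { exact: deviceId };
  } else {
    video.facingMode = facingMode;
  }
  return { video, audio: false };
}

// In a browser, the result would be passed straight to getUserMedia:
// navigator.mediaDevices.getUserMedia(buildVideoConstraints({ facingMode: "user" }));
```

Keeping the constraints in a plain object makes it easy to log or adjust them before the permission prompt appears.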

Connection management

Now that you know how to capture local audio and video, the next step is to connect to the other end and transmit the audio and video data.

RTCPeerConnection is WebRTC's unified interface for network connection, media management, and data management. Establishing a P2P connection involves several important concepts behind RTCPeerConnection: SDP, ICE, and STUN/TURN.

  1. Session description: RTCSessionDescription (SDP)

SDP describes the capabilities of each side, including audio and video codec types, transport protocols, and so on. This information is necessary to establish a connection. Through SDP, both parties learn whether the other side supports audio and video and which encodings are available.

For example, if I encode video with H.264 and the other side can only decode H.265, the two of us cannot communicate.

An SDP description consists of a session level and a media level. For details, see RFC 4566. Fields marked with an asterisk (*) are optional. The common contents are as follows:

Session description (session level)
    v=  (protocol version)
    o=  (originator and session identifier)
    s=  (session name)
    c=* (connection information -- not required if included in all media)
    One or more time descriptions ("t=" and "r=" lines; see below)
    a=* (zero or more session attribute lines)
    Zero or more media descriptions

Time description
    t=  (time the session is active)

Media description, if present
    m=  (media name and transport address)
    c=* (connection information -- optional if included at session level)
    a=* (zero or more media attribute lines)

When SDP is parsed, every line has the form key=value. For attribute lines the key is a, and RFC 4566 defines two forms:

a=<attribute> 
a=<attribute>:<value>

Note that only the first colon separates <attribute> from <value>. The value itself can also contain colons, for example:

a=fingerprint:sha-256 7C:93:85:40:01:07:91:BE 
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset 
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time 
a=ssrc:2527104241 msid:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS
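
Because the value may contain colons, a parser should split an attribute line on the first colon only. A minimal sketch follows; parseAttributeLine is a hypothetical helper name, not a WebRTC API.

```javascript
// Splits an SDP attribute line ("a=...") into its name and optional value.
// Only the first colon is treated as the separator.
function parseAttributeLine(line) {
  const trimmed = line.trim();
  if (!trimmed.startsWith("a=")) {
    throw new Error("Not an attribute line: " + line);
  }
  const body = trimmed.slice(2);
  const colon = body.indexOf(":");
  if (colon === -1) {
    // Flag form: a=<attribute>
    return { attribute: body, value: null };
  }
  // Value form: a=<attribute>:<value>
  return { attribute: body.slice(0, colon), value: body.slice(colon + 1) };
}

console.log(parseAttributeLine("a=fingerprint:sha-256 7C:93:85:40:01:07:91:BE"));
console.log(parseAttributeLine("a=sendrecv"));
```

Splitting with split(":") instead would break the fingerprint and extmap examples above into many pieces.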

Take a look at a concrete example:

alert(pc.remoteDescription.sdp);

v=0
o=alice 2890844526 2890844526 IN IP4 host.anywhere.com
s=
c=IN IP4 host.anywhere.com
t=0 0
m=audio 49170 RTP/AVP 0
a=fmtp:111 minptime=10;useinbandfec=1   // description of format parameters
a=rtpmap:0 PCMU/8000                    // description of RTP data
...
m=video 51372 RTP/AVP 31
a=rtpmap:31 H261/90000
...
m=video 53000 RTP/AVP 32
a=rtpmap:32 MPV/90000

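
The split between session level and media level can be made concrete by grouping the lines of an SDP string: everything before the first m= line belongs to the session, and each m= line opens a new media section. This is a minimal sketch; splitSdpSections is a hypothetical helper, and real applications usually rely on the browser or a dedicated SDP library instead.

```javascript
// Groups SDP lines into one session-level section and zero or more media sections.
function splitSdpSections(sdp) {
  const lines = sdp
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  const session = [];
  const media = [];
  let current = session;

  for (const line of lines) {
    if (line.startsWith("m=")) {
      // m=<media> <port> <proto> <fmt> ...
      const [kind, port, proto, ...formats] = line.slice(2).split(" ");
      const section = { kind, port: Number(port), proto, formats, lines: [] };
      media.push(section);
      current = section.lines;
    } else {
      current.push(line);
    }
  }
  return { session, media };
}

const exampleSdp = [
  "v=0",
  "o=alice 2890844526 2890844526 IN IP4 host.anywhere.com",
  "s=",
  "c=IN IP4 host.anywhere.com",
  "t=0 0",
  "m=audio 49170 RTP/AVP 0",
  "a=rtpmap:0 PCMU/8000",
  "m=video 51372 RTP/AVP 31",
  "a=rtpmap:31 H261/90000",
].join("\r\n");

console.log(splitSdpSections(exampleSdp).media.map((m) => m.kind));
```
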
  2. ICE candidate: RTCIceCandidate

The most convenient way to set up a WebRTC point-to-point connection is a direct IP connection between the two parties. In practice, however, the two parties are usually separated by NAT devices, which makes it hard to obtain each other's address.

WebRTC shields developers from complicated technical details by using the ICE framework to determine the best way to establish a network connection between the two ends.

(NAT and the ICE framework are a black box for developers using WebRTC, so that material is placed at the end as a supplement to keep this section readable.)

Developers need to know:

  1. The principle

The two nodes exchange ICE candidates to negotiate how they want to connect, and once both sides agree on a compatible candidate, that candidate’s SDP is used to create and open a connection through which the media stream starts running.

  2. Two APIs

onicecandidate: fired after the local agent creates the SDP offer and calls setLocalDescription(offer). In the event handler, pass the candidate information to the remote end through the signaling server.

addIceCandidate: called after receiving candidate information from the signaling server, to hand the remote candidate to the local ICE agent.

API: pc.onicecandidate = eventHandler

pc.onicecandidate = function (event) {
  if (event.candidate) {
    // Send the candidate to the remote peer
  } else {
    // All ICE candidates have been sent
  }
};

API: pc.addIceCandidate

pc.addIceCandidate(candidate)
  .then(() => {
    // Do stuff when the candidate is successfully passed to the ICE agent
  })
  .catch((e) => {
    console.log("Error: Failure during addIceCandidate()");
  });

Signaling server

WebRTC depends on a signaling server to transmit and exchange SDP and ICE information. Only after that exchange can the P2P connection be established and audio, video, and text messages be transmitted. Without a signaling channel, WebRTC peers cannot set up a connection.

Signaling servers are often built on the real-time communication capabilities of Socket.IO. Socket.IO is cross-platform, cross-terminal, and cross-language, so each client can implement its side of the signaling and connect to the same server.

This diagram illustrates the role of the signaling server in the entire call process.

How to set up a Socket.IO signaling server

const express = require("express");
const http = require("http");
const { Server } = require("socket.io");

const app = express();
const httpServer = http.createServer(app);
const io = new Server(httpServer);

io.on("connection", (socket) => {
  console.log("a user connected");

  // Relay a signaling message to everyone else in the room.
  socket.on("message", (room, data) => {
    console.log("message, room: " + room + ", data type: " + data.type);
    socket.to(room).emit("message", room, data);
  });

  socket.on("join", (room) => {
    socket.join(room);
  });
});

// The port is an assumption; the original listing does not start the server.
httpServer.listen(3000);

End-to-end P2P connection

  1. The connection process

The process of establishing network connection between A and B is shown as follows:

  • User A initiates a WebRTC call to user B
  • A creates an RTCPeerConnection object, specifying the TURN/STUN server addresses in the configuration argument
var pcConfig = {
  iceServers: [
    {
      urls: "turn:stun.al.learningrtc.cn:3478",
      credential: "mypasswd",
      username: "garrylea",
    },
    {
      urls:[
        "stun:stun.example.com",
        "stun:stun-1.example.com"
      ]
    }
  ],
};

pc = new RTCPeerConnection(pcConfig);
  • A calls the createOffer method to create a local session description (SDP offer). The offer contains information about any MediaStreamTrack objects already attached to the WebRTC session, the codecs and options supported by the browser, and any candidates already gathered by the ICE agent. It is sent over the signaling channel to a potential remote endpoint to request a connection or to update the configuration of an existing connection.
  • A calls the setLocalDescription method to set the offer as its local session description and pass it to the ICE layer. The session description is then sent to B through the signaling server.
API: pc.createOffer
Parameters: none
Returns: a Promise that resolves with the SDP offer

API: pc.setLocalDescription
Parameter: offer
Returns: a Promise that resolves with no value

function sendMessage(roomid, data) {
  if (!socket) {
    console.log("socket is null");
    return;
  }
  socket.emit("message", roomid, data);
}

async function startCall() {
  try {
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    console.log("Sending the caller's local SDP");
    sendMessage(roomid, offer);
  } catch (err) {
    console.error("Failed to create or set the offer:", err);
  }
}
  • After A calls pc.setLocalDescription(offer), icecandidate events are delivered to the RTCPeerConnection and the RTCPeerConnection.onicecandidate handler fires. When B receives a new ICE candidate from the remote peer over the signaling channel, it calls RTCPeerConnection.addIceCandidate() to hand the candidate to its ICE agent.
// Sending side: forward each local candidate through the signaling server.
pc.onicecandidate = (event) => {
  if (!event.candidate) return;
  sendMessage(roomid, {
    type: "candidate",
    label: event.candidate.sdpMLineIndex,
    id: event.candidate.sdpMid,
    candidate: event.candidate.candidate,
  });
};

// Receiving side: the server relays messages as emit("message", room, data).
socket.on("message", (room, data) => {
  if (data && data.type === "candidate") {
    const candidate = new RTCIceCandidate({
      sdpMLineIndex: data.label,
      sdpMid: data.id,
      candidate: data.candidate,
    });
    pc.addIceCandidate(candidate)
      .then(() => {
        console.log("Succeeded in adding the ICE candidate");
      })
      .catch((err) => {
        console.error(err);
      });
  }
});
  • As the caller, user A obtains the local media stream and calls the addTrack method to add the audio and video tracks to the RTCPeerConnection object so that they are sent to the other end. When the tracks arrive, the ontrack event fires on the other end.
API: stream.getTracks
Returns: an array of media track objects

const pc = new RTCPeerConnection();

// Add every local audio and video track to the connection.
stream.getTracks().forEach((track) => {
  pc.addTrack(track, stream);
});

const remoteVideo = document.querySelector("#remote-video");

pc.ontrack = (e) => {
  if (e && e.streams) {
    console.log("Received audio/video stream data from the other party...");
    remoteVideo.srcObject = e.streams[0];
  }
};
  • User B, as the callee, receives the session description from user A, calls the setRemoteDescription method to pass the offer to the ICE layer, and adds its own tracks to the RTCPeerConnection with the addTrack method.
  • B calls the createAnswer method to create the answer, then calls the setLocalDescription method to set the answer as its local session description and pass it to the ICE layer.
// The handler must be async because it awaits the WebRTC calls.
socket.on("message", async (room, data) => {
  if (!data || data.type !== "offer") return;

  console.log("Received SDP from the caller");
  await pc.setRemoteDescription(new RTCSessionDescription(data));

  console.log("Creating the callee (answer) SDP");
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);

  console.log("Sending the callee (answer) SDP");
  sendMessage(roomid, answer);
});
  • After A receives B's answer and applies it with setRemoteDescription, A and B each hold their own SDP and the other party's SDP, and have agreed on how to exchange media. The gathered ICE candidates go through connectivity checks and the best connection path is selected. The P2P connection is established and each side receives the other party's audio and video streams.
pc.ontrack = (e) => {
  if (e && e.streams) {
    console.log("Received audio/video stream data from the other party...");
    remoteVideo.srcObject = e.streams[0];
  }
};

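
The order of the setLocalDescription and setRemoteDescription calls in the steps above matters, because RTCPeerConnection tracks the negotiation in its signalingState property. The following sketch models only the basic offer/answer path; the action labels such as "setLocal:offer" and the helper nextSignalingState are hypothetical, and provisional answers, rollback, and repeated offers are deliberately left out.

```javascript
// Simplified signalingState transitions for a basic offer/answer exchange.
const TRANSITIONS = {
  "stable": {
    "setLocal:offer": "have-local-offer",
    "setRemote:offer": "have-remote-offer",
  },
  "have-local-offer": { "setRemote:answer": "stable" },
  "have-remote-offer": { "setLocal:answer": "stable" },
};

function nextSignalingState(state, action) {
  const next = TRANSITIONS[state] && TRANSITIONS[state][action];
  if (!next) {
    throw new Error(`"${action}" is not valid in state "${state}"`);
  }
  return next;
}

// Caller (A): create the offer, then apply B's answer.
let callerState = "stable";
callerState = nextSignalingState(callerState, "setLocal:offer");
callerState = nextSignalingState(callerState, "setRemote:answer");

// Callee (B): apply A's offer, then create the answer.
let calleeState = "stable";
calleeState = nextSignalingState(calleeState, "setRemote:offer");
calleeState = nextSignalingState(calleeState, "setLocal:answer");

console.log(callerState, calleeState);
```

Both sides end in the stable state, which is the point at which media negotiation is complete.
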
  2. Two-way data channel connection

RTCDataChannel carries arbitrary data over the point-to-point connection established through the RTCPeerConnection API, without an intermediary server and with lower latency.

One end creates a data channel, and the other end obtains the RTCDataChannel object through the ondatachannel event.

API: pc.createDataChannel
Parameters: label (channel name), options (optional)
Returns: RTCDataChannel

function receivemsg(e) {
  const msg = e.data;
  if (msg) {
    console.log("-> " + msg + "\r\n");
  } else {
    console.error("received msg is null");
  }
}

function dataChannelStateChange() {
  console.log("datachannel state: " + dc.readyState);
}

let dc;

// Creating end
dc = pc.createDataChannel("chat");
dc.onmessage = receivemsg;
dc.onopen = function () {
  console.log("datachannel open");
};
dc.onclose = function () {
  console.log("datachannel close");
};

// Receiving end: called when the remote peer creates a data channel.
pc.ondatachannel = (e) => {
  if (!dc) {
    dc = e.channel;
    dc.onmessage = receivemsg;
    dc.onopen = dataChannelStateChange;
    dc.onclose = dataChannelStateChange;
  }
};

NAT and ICE framework

As mentioned above, ICE combines several NAT traversal technologies, such as STUN and TURN, to traverse NATs and discover a P2P transmission path between hosts. The following is a brief introduction to NAT, STUN, and TURN.

  1. Network Address Translation (NAT)

NAT is often deployed at the exit of an organization's network. The network is divided into two parts: the private network and the public network. The NAT gateway sits at the egress of the route from the private network to the public network, and all traffic between the two must pass through it. With NAT, a large number of devices in an organization can share one public IP address, which eases the IPv4 address shortage.

As shown in the following figure, there are two organizations. Each organization's NAT is allocated a public IP address, 1.2.3.4 and 1.2.3.5 respectively. Each intranet device has its intranet address translated to the public address by the NAT before reaching the Internet.

For UDP, NAT behavior is traditionally classified into four types: full cone, address-restricted cone, port-restricted cone, and symmetric.
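
The four types differ mainly in which inbound packets the NAT forwards to the internal host. The sketch below models only that filtering rule; natAllowsInbound, the type labels, and the mapping shape are hypothetical names for illustration, and the addresses come from documentation ranges.

```javascript
// Decides whether a NAT forwards an inbound UDP packet to the internal host.
// mapping.sentTo lists the external endpoints the internal host has already
// sent packets to through this mapping.
function natAllowsInbound(natType, mapping, from) {
  const sentToAddress = mapping.sentTo.some((d) => d.ip === from.ip);
  const sentToEndpoint = mapping.sentTo.some(
    (d) => d.ip === from.ip && d.port === from.port
  );

  switch (natType) {
    case "full-cone":
      return true; // any external host may use the mapping
    case "address-restricted":
      return sentToAddress; // source IP must have been contacted before
    case "port-restricted":
      return sentToEndpoint; // source IP and port must both match
    case "symmetric":
      // Same filter as port-restricted. A symmetric NAT also allocates a
      // different public port per destination, which this sketch does not model.
      return sentToEndpoint;
    default:
      throw new Error("Unknown NAT type: " + natType);
  }
}

const mapping = { sentTo: [{ ip: "203.0.113.7", port: 3478 }] };
console.log(natAllowsInbound("address-restricted", mapping, { ip: "203.0.113.7", port: 5000 }));
console.log(natAllowsInbound("port-restricted", mapping, { ip: "203.0.113.7", port: 5000 }));
```
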

  2. Session Traversal Utilities for NAT (STUN)

STUN allows a client that is behind a NAT (or multiple NATs) to find out its public address, which type of NAT it is behind, and the public port that the NAT has bound to a local port.

STUN is a client/server protocol and also a request/response protocol. The client sends a STUN request, and the STUN server's response tells the client the IP address and port number that the NAT assigned to it.

To let an intranet host learn its external IP address, set up a STUN server on the public network and have the host send a request to it. The server returns the host's public IP address as seen from outside.

Below is a pair of STUN binding requests and responses that are captured. First the client sends a Binding Request to the STUN server at address 216.93.246.18.

The server returns a Binding Response containing the client's public IP address:
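
At the byte level, a Binding Request without attributes is just a 20-byte header, and the public address in the response is usually carried in an XOR-MAPPED-ADDRESS attribute. The sketch below follows the layout defined in RFC 5389 and handles IPv4 only; the function names are hypothetical, and sending the bytes over UDP and locating the attribute inside the response are left out.

```javascript
const MAGIC_COOKIE = 0x2112a442;

// Builds the 20-byte header of a STUN Binding Request (no attributes).
function buildBindingRequest(transactionId) {
  if (transactionId.length !== 12) {
    throw new Error("transaction ID must be 12 bytes");
  }
  const bytes = new Uint8Array(20);
  const view = new DataView(bytes.buffer);
  view.setUint16(0, 0x0001); // message type: Binding Request
  view.setUint16(2, 0); // message length: no attributes follow
  view.setUint32(4, MAGIC_COOKIE); // fixed magic cookie
  bytes.set(transactionId, 8); // 96-bit transaction ID
  return bytes;
}

// Decodes the 8-byte value of an IPv4 XOR-MAPPED-ADDRESS attribute.
function decodeXorMappedAddress(value) {
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  if (view.getUint8(1) !== 0x01) {
    throw new Error("only IPv4 is handled in this sketch");
  }
  // The port is XORed with the top 16 bits of the magic cookie,
  // the address with the whole cookie.
  const port = view.getUint16(2) ^ (MAGIC_COOKIE >>> 16);
  const addr = (view.getUint32(4) ^ MAGIC_COOKIE) >>> 0;
  const ip = [addr >>> 24, (addr >>> 16) & 0xff, (addr >>> 8) & 0xff, addr & 0xff].join(".");
  return { ip, port };
}

const request = buildBindingRequest(new Uint8Array(12).fill(7));
console.log(request.length);
```
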

  3. Traversal Using Relays around NAT (TURN)

TURN is a relay protocol. It can traverse a NAT or firewall over TCP or UDP. TURN is a client/server protocol. Like STUN, it obtains a public address at the application layer to traverse the NAT, but the address belongs to the relay server, which forwards the data between the peers.

  4. ICE candidate gathering

Neither ICE endpoint knows its network location or the type of NAT it is behind, so ICE discovers the best transmission path dynamically. ICE gathers local addresses directly, uses STUN to gather the NAT's external addresses, and uses TURN to gather relay addresses. Therefore, there are three types of candidate addresses:

The host type is the IP address and port on the local intranet.

The srflx (server reflexive) type is the external IP address and port after NAT mapping.

The relay type is the IP address and port of the relay server.

{ 
    IP: xxx.xxx.xxx.xxx, 
    port: number, 
    type: host/srflx/relay, 
    priority: number, 
    protocol: UDP/TCP, 
    usernameFragment: string 
    ...
 }
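
In practice these fields arrive as a single candidate string (the candidate property sent through signaling). The following sketch extracts the fields listed above from such a string, assuming the standard field order foundation, component, protocol, priority, address, port, "typ", type. parseCandidate is a hypothetical helper and the sample string is made up for illustration.

```javascript
// Parses an ICE candidate string into the fields described above.
function parseCandidate(line) {
  const parts = line
    .trim()
    .replace(/^a=/, "")
    .replace(/^candidate:/, "")
    .split(/\s+/);
  if (parts.length < 8 || parts[6] !== "typ") {
    throw new Error("Malformed candidate: " + line);
  }
  const result = {
    foundation: parts[0],
    component: Number(parts[1]),
    protocol: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    ip: parts[4],
    port: Number(parts[5]),
    type: parts[7], // host, srflx, or relay
  };
  // Remaining tokens are name/value pairs such as raddr and rport.
  for (let i = 8; i + 1 < parts.length; i += 2) {
    if (parts[i] === "raddr") result.relatedAddress = parts[i + 1];
    if (parts[i] === "rport") result.relatedPort = Number(parts[i + 1]);
  }
  return result;
}

console.log(
  parseCandidate("candidate:1 1 udp 1694498815 1.2.3.4 46154 typ srflx raddr 192.168.1.10 rport 46154")
);
```
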

In the following figure, Alice and Bob collect three types of candidates through STUN and TURN servers.

After collecting candidates, ICE conducts connectivity detection to determine the best P2P transmission path between hosts.
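
One input to that decision is the priority of each candidate. The ICE specification (RFC 8445) recommends the formula priority = 2^24 × type preference + 2^8 × local preference + (256 − component ID), with suggested type preferences of 126 for host, 110 for peer reflexive (prflx, a type discovered during the checks themselves), 100 for srflx, and 0 for relay. The sketch below applies that formula; the helper name and the default local preference of 65535 are assumptions, and a real ICE agent may choose different values.

```javascript
// Recommended type preferences from the ICE specification.
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// Computes a candidate priority. Multiplication is used instead of bit
// shifts because the result can exceed the 32-bit signed integer range.
function candidatePriority(type, componentId = 1, localPreference = 65535) {
  const typePreference = TYPE_PREFERENCE[type];
  if (typePreference === undefined) {
    throw new Error("Unknown candidate type: " + type);
  }
  return typePreference * 2 ** 24 + localPreference * 2 ** 8 + (256 - componentId);
}

const ranked = ["relay", "host", "srflx"].sort(
  (a, b) => candidatePriority(b) - candidatePriority(a)
);
console.log(ranked);
```

Direct host paths are therefore tried ahead of NAT-mapped paths, and relayed paths come last because they add an extra hop.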

The effect