
What is WebRTC?

WebRTC (Web Real-Time Communications) is a technology for real-time communication on the Web. Most well-known live-streaming, teleconferencing, and video-calling software relies on a dedicated client to stream audio and video data; WebRTC instead establishes peer-to-peer (P2P) connections between browsers to transfer video and/or audio streams, or any other data. WebRTC comprises standards that make peer-to-peer data sharing and teleconferencing possible without installing any plug-ins or third-party software. Open-sourced on June 1, 2011, and supported by Google, Mozilla, and Opera, it has been incorporated into a W3C recommendation. It provides real-time communication (RTC) functionality to browsers and mobile applications through a simple API.

The principle behind the WebRTC project is that its APIs are open source, free, standardized, built into browsers, and more efficient than existing technologies. Although titled "Web," WebRTC is not limited to browsers or the runtime environments of traditional Internet applications. Whether the endpoint is a browser, a desktop application, a mobile device (Android or iOS), or an IoT device, it can communicate with any other endpoint as long as an IP connection is reachable and the WebRTC specifications are followed. This unlocks real-time communication for a large number of smart terminals (and the apps running on them) and opens up many application scenarios that demand highly interactive, low-latency communication. Typical applications include audio and video conferencing, online education, cameras, music players, remote desktop sharing, recording, instant messaging, P2P network acceleration, file transfer tools, games, and real-time face recognition.

Overall WebRTC architecture

Web application: Web developers can build real-time audio and video applications on top of the Web API, such as video conferencing, distance education, video calls, live video streaming, game streaming, remote collaboration, interactive games, and real-time face recognition.

  1. Web API: the standard WebRTC API (JavaScript) exposed to third-party developers. The most commonly used interfaces are:

    • MediaStream: a stream of media data, such as an audio or video stream
    • RTCPeerConnection: the core class that provides the peer-to-peer connection interface to the application layer
    • RTCDataChannel: transmits non-audio/video data, such as text and images

  2. C++ API: a layer written in C++ that makes it easy for browser vendors to implement the standard Web API and that abstracts the digital signal processing. For example, RTCPeerConnection is the core of the point-to-point connection between browsers, the WebRTC component that handles stable and efficient transmission of streaming data between peers.

  3. Session Management: an abstract session layer that provides session establishment and management. The protocol at this layer is left to the application developer to implement; for Web applications, WebSocket is the recommended transport for signaling sessions. Signaling forwards the media and network information of both parties, and this is the layer back-end developers need to focus on.

  4. Transport: the WebRTC transport layer, covering sending and receiving audio/video data, NAT hole punching, and related concerns. It can establish call connections across different types of networks using the STUN and ICE components.

  5. VideoEngine: the WebRTC video processing engine, a complete pipeline from camera capture, through network transmission, to video display.

    • VP8 is the video codec and the default codec of the WebRTC video engine. Designed primarily for low latency, it is well suited to real-time communication. The VPx codecs were open-sourced after Google acquired On2 and are now part of the WebM project, one of the HTML5 efforts Google is pushing.
    • Video Jitter Buffer: reduces the adverse effects of video jitter and video packet loss.
    • Image Enhancements (image quality enhancement module): processes the video captured by the webcam, including brightness detection, color enhancement, and noise reduction, to improve video quality.

  6. VoiceEngine (audio engine): a framework covering the entire audio pipeline from capture to network transmission. VoiceEngine is one of WebRTC's most valuable technologies; it was open-sourced after Google acquired GIPS and remains industry-leading VoIP technology.

    • iSAC: a wideband and super-wideband audio codec for VoIP and audio streaming, the default codec of the WebRTC audio engine
    • iLBC: a narrowband speech codec for VoIP audio streams
    • NetEQ for Voice: a speech signal processing component that effectively mitigates the impact of network jitter and packet loss on speech quality
    • Acoustic Echo Canceler (AEC): a software-based signal processing component that removes, in real time, the echo picked up by the microphone
    • Noise Reduction (NR): also a software-based signal processing component, used to eliminate certain types of background noise associated with VoIP, such as hiss and fan noise
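The Web API pieces above can be wired together in a short browser sketch. Here, `buildRtcConfig` and the STUN hostname are our own illustrative choices, not part of WebRTC; the browser-only calls live inside `startCall` and are not executed outside a browser:

```javascript
// buildRtcConfig is our own small helper (not a WebRTC API): it turns a
// list of STUN URLs into an RTCConfiguration object.
function buildRtcConfig(stunUrls) {
  return { iceServers: stunUrls.map((url) => ({ urls: url })) };
}

async function startCall() {
  // RTCPeerConnection: the core peer-to-peer connection object
  const pc = new RTCPeerConnection(buildRtcConfig(['stun:stun.example.org:3478']));

  // MediaStream: capture local audio/video and feed it into the connection
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  // RTCDataChannel: transmit non-media data such as text
  const channel = pc.createDataChannel('chat');
  channel.onmessage = (event) => console.log('received:', event.data);

  return pc;
}
```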

WebRTC Call mechanism


A call can be divided roughly into three steps (assume the two parties are Alice and Bob; to establish a session, the main steps are as follows):

One: Media negotiation. Alice and Bob negotiate media parameters through the signaling server, such as the audio and video encoding formats both parties will use. The media capabilities exchanged are described with the Session Description Protocol (SDP).

Two: Network negotiation. Alice and Bob obtain their own network information, such as IP address and port, from a STUN server, then exchange it through the signaling server so that each side learns the other's IP address and port number. This process involves NAT and the ICE protocol.

Three: Relay. If Alice and Bob cannot establish a direct connection, a TURN relay server forwards the audio and video data, completing the call.

1. Media negotiation

Media negotiation means that, before establishing the connection, both parties must tell each other which media formats they support; only by knowing the formats the other party supports can subsequent encoding and decoding work correctly. For example, if Alice's endpoint supports the VP8 and H264 codecs while Bob's supports VP9 and H264, media negotiation takes their intersection, H264, as the video codec.
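The intersection step can be sketched as a tiny helper. `negotiateCodec` is our own illustrative function, not a WebRTC API:

```javascript
// Pick the first codec both sides support; null if the sets are disjoint.
function negotiateCodec(localCodecs, remoteCodecs) {
  return localCodecs.find((codec) => remoteCodecs.includes(codec)) ?? null;
}

// Alice supports VP8 and H264, Bob supports VP9 and H264:
negotiateCodec(['VP8', 'H264'], ['VP9', 'H264']); // → 'H264'
```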

The protocol that describes the content of a media connection, such as resolution, formats, codecs, and encryption algorithms, is the Session Description Protocol (SDP); it lets both ends understand each other's data during transfer. SDP is not really a protocol but a data format describing the metadata of the connection over which devices share media. So where does the SDP come from? In general, before establishing a connection, the RTCPeerConnection API is used to specify what data is to be transferred (audio, video, DataChannel), and the createOffer() and createAnswer() methods create the SDP.
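A minimal sketch of creating the offer SDP in the browser. `parseMediaKinds` is our own helper for illustration; the rest uses the standard API:

```javascript
// Browser-side sketch: declare what to transfer, then create the offer SDP.
async function makeOffer() {
  const pc = new RTCPeerConnection();
  pc.createDataChannel('data');         // we want a DataChannel
  pc.addTransceiver('audio');           // ...and audio
  pc.addTransceiver('video');           // ...and video
  const offer = await pc.createOffer(); // offer.sdp is the SDP text
  await pc.setLocalDescription(offer);
  return offer;
}

// Our own helper: list the media sections ("m=" lines) declared in an SDP.
function parseMediaKinds(sdp) {
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.startsWith('m='))
    .map((line) => line.slice(2).split(' ')[0]);
}
```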

Exchanging SDP requires a signaling server to relay each party's SDP to the other; usually a socket connection is created for this interaction. You can use Node.js, Golang, or any other technology, as long as it can exchange the SDP data of both parties.
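The exchange itself can be as simple as JSON messages over a WebSocket. A sketch, where the message shape and the `wss://signal.example.org` endpoint are our own conventions, not anything WebRTC mandates:

```javascript
// Our own minimal signaling envelope: type + room + payload as JSON.
function makeSignal(type, room, payload) {
  return JSON.stringify({ type, room, payload });
}

// Browser-side wiring (sketch; runs only where WebSocket exists):
// const ws = new WebSocket('wss://signal.example.org');
// ws.onopen = () => ws.send(makeSignal('offer', 'room-1', offer.sdp));
// ws.onmessage = (e) => {
//   const msg = JSON.parse(e.data);
//   // dispatch on msg.type: 'offer' | 'answer' | 'candidate'
// };
```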

Here is a sample SDP:

v=0
o=- 7524998691693353763 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE 0 1
a=msid-semantic: WMS kFj1r3NdWzKanYt530Rbg0QQk8DbMwv2eXuJ
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:hwkT
a=ice-pwd:tXV1yDOgQpS9bBHqY5w+/oGf
a=ice-options:trickle
a=fingerprint:sha-256 54:9D:F1:8C:46:89:61:24:FC:B1:5C:F6:6E:BF:18:AF:22:CD:A0:37:37:64:37:61:D6:FF:4F:0D:C2:70:7B:A4
a=setup:actpass
a=mid:0
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:6 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=sendrecv
a=msid:kFj1r3NdWzKanYt530Rbg0QQk8DbMwv2eXuJ 6426930d-bb60-4633-b8b5-bb91d19d8430
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:5150036 cname:+RCf3A8Ya1BflCDM
a=ssrc:5150036 msid:kFj1r3NdWzKanYt530Rbg0QQk8DbMwv2eXuJ 6426930d-bb60-4633-b8b5-bb91d19d8430
a=ssrc:5150036 mslabel:kFj1r3NdWzKanYt530Rbg0QQk8DbMwv2eXuJ
a=ssrc:5150036 label:6426930d-bb60-4633-b8b5-bb91d19d8430
m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 121 127 120 125 107 108 109 124 119 123 118 114 115 116
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:hwkT
a=ice-pwd:tXV1yDOgQpS9bBHqY5w+/oGf
a=ice-options:trickle
a=fingerprint:sha-256 54:9D:F1:8C:46:89:61:24:FC:B1:5C:F6:6E:BF:18:AF:22:CD:A0:37:37:64:37:61:D6:FF:4F:0D:C2:70:7B:A4
a=setup:actpass
a=mid:1
a=extmap:14 urn:ietf:params:rtp-hdrext:toffset
a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:13 urn:3gpp:video-orientation
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:12 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
a=extmap:11 http://www.webrtc.org/experiments/rtp-hdrext/video-content-type
a=extmap:7 http://www.webrtc.org/experiments/rtp-hdrext/video-timing
a=extmap:8 http://www.webrtc.org/experiments/rtp-hdrext/color-space
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:6 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=sendrecv
a=msid:kFj1r3NdWzKanYt530Rbg0QQk8DbMwv2eXuJ 1cfb3b88-4ed0-4267-9c1e-c861e2a323cb
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
a=rtpmap:98 VP9/90000
a=rtcp-fb:98 goog-remb
a=rtcp-fb:98 transport-cc
a=rtcp-fb:98 ccm fir
a=rtcp-fb:98 nack
a=rtcp-fb:98 nack pli
a=fmtp:98 profile-id=0
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=98
a=rtpmap:100 VP9/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 profile-id=2
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100
a=rtpmap:102 H264/90000
a=rtcp-fb:102 goog-remb
a=rtcp-fb:102 transport-cc
a=rtcp-fb:102 ccm fir
a=rtcp-fb:102 nack
a=rtcp-fb:102 nack pli
a=fmtp:102 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f
a=rtpmap:121 rtx/90000
a=fmtp:121 apt=102
a=rtpmap:127 H264/90000
a=rtcp-fb:127 goog-remb
a=rtcp-fb:127 transport-cc
a=rtcp-fb:127 ccm fir
a=rtcp-fb:127 nack
a=rtcp-fb:127 nack pli
a=fmtp:127 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42001f
a=rtpmap:120 rtx/90000
a=fmtp:120 apt=127
a=rtpmap:125 H264/90000
a=rtcp-fb:125 goog-remb
a=rtcp-fb:125 transport-cc
a=rtcp-fb:125 ccm fir
a=rtcp-fb:125 nack
a=rtcp-fb:125 nack pli
a=fmtp:125 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
a=rtpmap:107 rtx/90000
a=fmtp:107 apt=125
a=rtpmap:108 H264/90000
a=rtcp-fb:108 goog-remb
a=rtcp-fb:108 transport-cc
a=rtcp-fb:108 ccm fir
a=rtcp-fb:108 nack
a=rtcp-fb:108 nack pli
a=fmtp:108 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42e01f
a=rtpmap:109 rtx/90000
a=fmtp:109 apt=108
a=rtpmap:124 H264/90000
a=rtcp-fb:124 goog-remb
a=rtcp-fb:124 transport-cc
a=rtcp-fb:124 ccm fir
a=rtcp-fb:124 nack
a=rtcp-fb:124 nack pli
a=fmtp:124 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=4d001f
a=rtpmap:119 rtx/90000
a=fmtp:119 apt=124
a=rtpmap:123 H264/90000
a=rtcp-fb:123 goog-remb
a=rtcp-fb:123 transport-cc
a=rtcp-fb:123 ccm fir
a=rtcp-fb:123 nack
a=rtcp-fb:123 nack pli
a=fmtp:123 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=64001f
a=rtpmap:118 rtx/90000
a=fmtp:118 apt=123
a=rtpmap:114 red/90000
a=rtpmap:115 rtx/90000
a=fmtp:115 apt=114
a=rtpmap:116 ulpfec/90000
a=ssrc-group:FID 3517908871 1250619161
a=ssrc:3517908871 cname:+RCf3A8Ya1BflCDM
a=ssrc:3517908871 msid:kFj1r3NdWzKanYt530Rbg0QQk8DbMwv2eXuJ 1cfb3b88-4ed0-4267-9c1e-c861e2a323cb
a=ssrc:3517908871 mslabel:kFj1r3NdWzKanYt530Rbg0QQk8DbMwv2eXuJ
a=ssrc:3517908871 label:1cfb3b88-4ed0-4267-9c1e-c861e2a323cb
a=ssrc:1250619161 cname:+RCf3A8Ya1BflCDM
a=ssrc:1250619161 msid:kFj1r3NdWzKanYt530Rbg0QQk8DbMwv2eXuJ 1cfb3b88-4ed0-4267-9c1e-c861e2a323cb
a=ssrc:1250619161 mslabel:kFj1r3NdWzKanYt530Rbg0QQk8DbMwv2eXuJ
a=ssrc:1250619161 label:1cfb3b88-4ed0-4267-9c1e-c861e2a323cb
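In the dump above, lines like `a=rtpmap:111 opus/48000/2` map RTP payload types to codecs and clock rates. A small parser sketch (our own helper, not a WebRTC API):

```javascript
// Extract payload-type → { codec, clockRate } from a=rtpmap lines.
function parseRtpMap(sdp) {
  const map = {};
  for (const line of sdp.split(/\r?\n/)) {
    const m = line.match(/^a=rtpmap:(\d+) ([^\/]+)\/(\d+)/);
    if (m) map[m[1]] = { codec: m[2], clockRate: Number(m[3]) };
  }
  return map;
}

parseRtpMap('a=rtpmap:111 opus/48000/2\na=rtpmap:96 VP8/90000');
// → { '111': { codec: 'opus', clockRate: 48000 }, '96': { codec: 'VP8', clockRate: 90000 } }
```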

2. Network negotiation

Network negotiation requires the communicating parties to understand each other's network conditions so that a viable communication path can be found. Two steps are required:

  • 1. Obtain the mapping to an external (public) IP address
  • 2. Exchange the network information through the signaling server

Ideally, every browser's machine would have a public IP address, and peers could connect directly point to point; in reality, almost all of our computers sit on a LAN behind a firewall, with Network Address Translation (NAT) in between. To handle this in WebRTC, we need the concepts of NAT, STUN, and TURN, introduced below.

NAT

Simply put, NAT is a technique developed to cope with the shortage of IPv4 addresses. For example, behind a home router, the WAN port has one public IP address, while every device on the LAN side is assigned a private address such as 192.168.1.1, 192.168.1.2, and so on up to 192.168.1.n for n devices. Mapping one public address to many private addresses in this way slows the exhaustion of the IPv4 address space, and NAT also shields internal addresses, improving security. The downside is that when we want a P2P connection, the NAT blocks unsolicited traffic from outside, so we have to use NAT traversal techniques.

With the help of a server with a public IP address, both Alice and Bob send packets to the server's IP and port, so the server learns Alice's and Bob's mapped IP/port pairs. Moreover, because Alice and Bob actively sent those packets, the server can send packets back through NAT-Alice and NAT-Bob to reach them. So the server simply sends Bob's IP/port to Alice and Alice's IP/port to Bob; the next time Alice and Bob send packets to each other, they will not be blocked by NAT. WebRTC's firewall traversal is based on this idea: it adopts the ICE framework to ensure that RTCPeerConnection can achieve NAT traversal.
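The rendezvous idea can be simulated with a toy server object; the class name, addresses, and method names here are purely illustrative:

```javascript
// Toy rendezvous server: remembers each peer's NAT-mapped address as
// packets arrive, then tells each side where the other one is.
class Rendezvous {
  constructor() { this.peers = new Map(); }
  // Called when a packet arrives; the server sees the mapped address.
  register(name, mappedAddress) { this.peers.set(name, mappedAddress); }
  // The address another peer should use to reach `name` through its NAT.
  addressOf(name) { return this.peers.get(name) ?? null; }
}

const server = new Rendezvous();
server.register('alice', '203.0.113.5:40000');
server.register('bob', '198.51.100.7:52000');
server.addressOf('bob');   // → '198.51.100.7:52000' (sent to Alice)
server.addressOf('alice'); // → '203.0.113.5:40000' (sent to Bob)
```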

ICE

Interactive Connectivity Establishment (ICE) is a protocol framework that allows your browser to establish a connection with a peer's browser. ICE accomplishes this by combining several techniques:

STUN (Session Traversal Utilities for NAT)

STUN is a network protocol that uses UDP to traverse NAT in a simple way. It allows a client (a computer on the LAN) behind one or more NATs to discover its public address, which type of NAT it is behind, and which Internet-side port the NAT has bound to a given local port. This information is used to set up UDP communication between two hosts that are both behind NAT routers. However, obtaining a public address from a STUN server does not guarantee that a connection can be established, because different NAT types handle incoming UDP packets differently. Three of the four main types can be traversed with STUN: full cone NAT, restricted cone NAT, and port restricted cone NAT. Symmetric NAT (also known as bidirectional NAT), commonly used on large corporate networks, cannot: such routers only accept connections from nodes they have previously connected to.

  • Full Cone NAT:

Full cone NAT: all requests sent from the same internal IP address and port are mapped to the same external IP address and port, and any external host can send packets to the internal host via that mapped external address and port.

  • Restricted Cone NAT:

Restricted cone NAT: like a full cone NAT, it maps all requests from the same internal IP address and port to the same external IP address and port. Unlike a full cone NAT, an external host can only send packets to an internal host that has previously sent packets to that external host.

  • Port Restricted Cone NAT:

Port restricted cone NAT: similar to a restricted cone NAT, except that the restriction also includes the port number. That is, an external host with IP address X and port P can send packets to an internal host only if the internal host has previously sent packets to X:P.

  • Symmetric NAT:

Symmetric NAT: all requests sent from the same internal IP address and port to a specific destination IP address and port are mapped to the same external IP address and port; if the same host sends packets from the same source address and port to a different destination, the NAT uses a different mapping. In addition, only an external host that has received data from the internal host can send packets back to it.
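The distinguishing observation, the same internal socket being mapped to different external ports for different destinations, can be sketched as a toy check. `looksSymmetric` and the observation shape are our own illustration, not a real STUN procedure:

```javascript
// observations: external mappings reported by different destinations for
// ONE internal (ip, port). Cone NATs reuse a single mapping; symmetric
// NATs allocate a new one per destination.
function looksSymmetric(observations) {
  return new Set(observations.map((o) => o.external)).size > 1;
}

looksSymmetric([
  { dest: 'stun1.example.org', external: '203.0.113.5:40000' },
  { dest: 'stun2.example.org', external: '203.0.113.5:40001' },
]); // → true (a different mapping per destination: symmetric behavior)
```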

TURN (Traversal Using Relays around NAT)

TURN traverses NAT by relaying. If, even after NAT traversal is attempted, a terminal cannot communicate directly with other terminals, a server on the public network acts as a relay and forwards the data; the protocol used for this forwarding is TURN.

The STUN and TURN servers can be built from the open-source coturn project at github.com/coturn/cotu… . Alternatively, a server implemented in Golang is available at github.com/pion/turn.
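On the client side, the STUN and TURN servers are handed to RTCPeerConnection in its configuration. A sketch, where the hostnames and credentials are placeholders:

```javascript
// Illustrative RTCConfiguration: one STUN entry and one TURN entry.
const rtcConfig = {
  iceServers: [
    { urls: 'stun:stun.example.org:3478' },
    { urls: 'turn:turn.example.org:3478', username: 'demo', credential: 'secret' },
  ],
};

// In the browser: const pc = new RTCPeerConnection(rtcConfig);
```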

Signaling server

Besides exchanging SDP and candidates, the signaling server often takes on other functions, such as room management, user lists, user join and leave events, and other IM features.

3. Connection establishment

1) The two parties exchange SDP data through a third-party (signaling) server.
2) Each party obtains its own NAT structure, subnet IP address, public IP address, and port (its candidate information) from the STUN server via the STUN protocol.
3) The two parties exchange candidates through the signaling server. If both are behind the same NAT, they can connect using intranet candidates alone; if they are behind different NATs, they communicate via the public candidates discovered through the STUN server.
4) If a connection still cannot be made through the public candidates discovered via the STUN server, the forwarding service of a TURN server is needed, and the relay candidate is shared with the other party.
5) The two parties send packets to the destination IP and port and establish an encrypted long-lived connection based on the keys carried in the SDP and the content expected to be transmitted.
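Exchanging candidates looks roughly like this in the browser; `candidateMessage` and the `signaling` object are our own conventions over the standard `onicecandidate`/`addIceCandidate` APIs:

```javascript
// Our own wire format for a candidate message.
function candidateMessage(candidate) {
  return JSON.stringify({ type: 'candidate', candidate });
}

// Browser wiring (sketch):
// pc.onicecandidate = (e) => {
//   if (e.candidate) signaling.send(candidateMessage(e.candidate.toJSON()));
// };
// signaling.onmessage = async (e) => {
//   const msg = JSON.parse(e.data);
//   if (msg.type === 'candidate') await pc.addIceCandidate(msg.candidate);
// };
```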

The scenario pictured is Alice initiating a chat request to Bob:

1. Alice first creates a PeerConnection object and opens the local audio and video devices.
2. Alice calls the PeerConnection's CreateOffer method to create an offer SDP object, which stores the parameters of the current audio and video session.
3. Alice saves the offer SDP via the PeerConnection's SetLocalDescription method and sends it to Bob through the signaling server.
4. Bob receives Alice's offer SDP through the signaling server and saves it via the PeerConnection's SetRemoteDescription method.
5. Bob calls the PeerConnection's CreateAnswer method to create an answer SDP object.
6. Bob saves the answer SDP via SetLocalDescription and sends it to Alice through the signaling server.
7. Alice receives Bob's answer SDP through the signaling server and saves it via the PeerConnection's SetRemoteDescription method.
8. During this SDP offer/answer exchange, Alice and Bob set up the corresponding audio and video channels according to the SDP, and candidate data is gathered through NAT; it contains each side's IP address information (local IP, public IP) and port information.
9. When the PeerConnection gathers a candidate, it notifies Alice through the OnIceCandidate callback, and Alice sends the received candidate information to Bob through the signaling server.
10. Bob saves it via the PeerConnection's AddIceCandidate method (and Bob's candidates reach Alice the same way).
11. Alice and Bob have now established a P2P channel for audio and video transmission. When Bob receives the stream sent by Alice, the PeerConnection's OnAddStream callback returns a MediaStream object identifying Alice's audio and video stream. Both sides receive each other's media streams and play them.
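Bob's answering side (steps 4 through 6 above) can be sketched as follows; `answerEnvelope` and the `send` callback are our own conventions around the standard API:

```javascript
// Our own helper: wrap an answer SDP for the signaling channel.
function answerEnvelope(answer) {
  return JSON.stringify({ type: 'answer', sdp: answer.sdp });
}

// Browser-side sketch of the answering flow:
async function handleOffer(pc, offerSdp, send) {
  await pc.setRemoteDescription({ type: 'offer', sdp: offerSdp }); // step 4
  const answer = await pc.createAnswer();                          // step 5
  await pc.setLocalDescription(answer);
  send(answerEnvelope(answer));                                    // step 6
}
```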