From: villainhr
WebRTC stands for Web Real-Time Communication. It addresses the Web's inability to capture audio and video, and provides peer-to-peer (that is, browser-to-browser) video interaction. Broken down, it consists of three parts:
- MediaStream: captures audio and video streams
- RTCPeerConnection: transmits audio and video streams (generally used in peer-to-peer scenarios)
- RTCDataChannel: transmits binary data (usually used for streaming uploads)
In practice, though, the pure peer-to-peer scenario is not used all that much; compared with that, the live-streaming business that exploded last year is where WebRTC sees far more use. A Web live broadcast usually involves two ends:
- Host (anchor) end: records and uploads the video
- Audience end: downloads and watches the video
I won't talk about the audience end here; a later article will introduce it (there is simply too much to cover in one piece). Here we mainly discuss using WebRTC on the host side. Simplified, the host side's job comes down to two tasks: recording the video and uploading it. Keep these two goals in mind; we'll use WebRTC to achieve them below.
A basic understanding of WebRTC
WebRTC is primarily developed by two standards bodies:
- Web Real-Time Communications (WEBRTC), a W3C working group: defines the browser APIs
- Real-Time Communication in Web-Browsers (RTCWEB), an IETF working group: defines the protocols, data formats, and security mechanisms required
Our primary goal, then, is to learn what the browser-defined APIs are and how to use them; the follow-up goal is to learn the protocols and data formats underneath. Taking it step by step like this suits our study better.
WebRTC's audio and video processing is mainly handled by its audio/video engines. The pipeline is as follows:
- Audio: captured by a physical device, then put through noise reduction, echo cancellation, jitter/packet-loss concealment, and encoding.
- Video: captured by a physical device, then put through image enhancement, synchronization, jitter/packet-loss concealment, and encoding.
Finally, the result is exposed to the upper-level APIs as a MediaStream object. That is, MediaStream is the middle layer connecting the WebRTC API to the underlying physical streams. For a better understanding, here is a brief introduction to MediaStream.
MediaStream
MediaStream (MS for short) serves as an auxiliary object. It carries things like constraining the audio/video streams and obtaining recording permission. The model consists of two parts: MediaStreamTrack and MediaStream.
- MediaStreamTrack represents a single type of media stream. If you've ever edited video, you're familiar with the word "track"; in layman's terms, you can think of the two as equivalent.
- MediaStream is a complete audio/video stream. It can contain zero or more MediaStreamTracks. Its main job is to keep its tracks playing simultaneously; for example, the sound needs to stay synchronized with the video picture.
We won't go too deep here; let's just cover the basic MediaStream object. In general, we get one by instantiating MediaStream.
```js
// You can also pass tracks or another stream as arguments.
let ms = new MediaStream();
```
We can look at the properties attached to an ms instance:
- active [Boolean]: whether the current MS is active (that is, playable)
- id [String]: uniquely identifies the current MS, e.g. "f61641ec-ee78-4317-9415-58acac066a4d"
- onactive: fired when active becomes true
- onaddtrack: fired when a new track is added
- oninactive: fired when active becomes false
- onremovetrack: fired when a track is removed
There are more methods on the prototype chain; I'll pick out an important one (see the sketch below):
- clone(): returns a copy of the current MS stream. This method is often used when you need to operate on the stream.
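For instance, a minimal sketch of these members in use (assuming `stream` was obtained via getUserMedia as in the snippets below):

```js
// Inspect the basic properties of a MediaStream instance.
console.log(stream.id);     // e.g. "f61641ec-ee78-4317-9415-58acac066a4d"
console.log(stream.active); // true while the stream is playable

// React to track changes.
stream.onaddtrack = function (e) {
  console.log('track added:', e.track.kind); // "audio" or "video"
};
stream.onremovetrack = function (e) {
  console.log('track removed:', e.track.kind);
};

// Work on a copy so the original stream is left untouched.
var copy = stream.clone();
copy.getTracks().forEach(function (track) {
  console.log(track.kind, track.label);
});
```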
As mentioned earlier, MS can constrain the audio and video being captured; how does it do that? In MS there is another important concept called Constraints. It regulates whether the captured data meets our needs, because different devices support different capture settings. A common example:
{" audio ": true, / / whether to capture audio" video ": {/ / video related to set the" width ": {" min" : "381", / / the minimum width of the current video "Max" : "640"}, "height" : {" min ", "200", / / minimum height "Max" : "480"}, "frameRate" : {" min ":" 28 ", / / minimum frame rate "Max" : "10"}}}Copy the code
How do you know which properties your device supports tuning? You can call navigator.mediaDevices.getSupportedConstraints() directly to get the set of constrainable properties. In practice, these settings mostly matter for video. Now that we understand MS, it's time to really get into the WebRTC API. Let's take a look at the basics.
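For example, a quick check (the exact output varies per browser and device):

```js
// Returns an object whose keys are the constrainable property names,
// e.g. { width: true, height: true, frameRate: true, ... }
var supported = navigator.mediaDevices.getSupportedConstraints();
if (!supported.frameRate) {
  console.log('frameRate cannot be constrained on this browser');
}
```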
The common WebRTC APIs are listed below, but for browser-compatibility reasons the corresponding vendor prefix needs to be added:
```
W3C Standard            Chrome                      Firefox
---------------------------------------------------------------------
getUserMedia            webkitGetUserMedia          mozGetUserMedia
RTCPeerConnection       webkitRTCPeerConnection     RTCPeerConnection
RTCSessionDescription   RTCSessionDescription       RTCSessionDescription
RTCIceCandidate         RTCIceCandidate             RTCIceCandidate
```
You can simply shim this yourself as shown below, or use adapter.js to make up for the inconvenience:
```js
navigator.getUserMedia = navigator.getUserMedia ||
                         navigator.webkitGetUserMedia ||
                         navigator.mozGetUserMedia;
```
Let's take it step by step. To interact with video, you must first capture audio and video.
Capture audio and video
To capture audio and video in WebRTC, you only need to use one API, getUserMedia(). The code is simple:
```js
navigator.getUserMedia = navigator.getUserMedia ||
                         navigator.webkitGetUserMedia ||
                         navigator.mozGetUserMedia;

// Capture settings
var constraints = {
  audio: false,
  video: true
};

var video = document.querySelector('video');

function successCallback(stream) {
  // `stream` is the MediaStream instance mentioned above
  window.stream = stream;
  if (window.URL) {
    video.src = window.URL.createObjectURL(stream);
  } else {
    video.src = stream;
  }
}

function errorCallback(error) {
  console.log('navigator.getUserMedia error: ', error);
}

// The basic call format of getUserMedia:
navigator.getUserMedia(constraints, successCallback, errorCallback);
```
For a detailed demo, see WebRTC. If you use the Promise form, getUserMedia is written as:
```js
navigator.mediaDevices.getUserMedia(constraints)
  .then(successCallback)
  .catch(errorCallback);
```
The comments above should make the basics clear. One caveat: when you capture video, pay close attention to the constraint parameters you actually need.
Once you have your own video, how do you share it with others? (You can think of this as the broadcasting step.) WebRTC provides RTCPeerConnection to help us establish a connection quickly. However, this is only the middle step of establishing a peer-to-peer link; there are some complicated procedures and additional protocols involved, so let's go through them step by step.
The basics of a WebRTC connection
WebRTC transmits video packets over UDP. The advantage is low latency and not being overly concerned with packet order. However, UDP is only a transport-layer protocol; WebRTC still has a lot of work to do on top of it:
- Traverse NAT layers to find the specified peer
- Negotiate basic information between the two parties so that both can play the video correctly
- Keep the information secure during transmission
The overall architecture is as follows:
The protocols involved, such as ICE/STUN/TURN, will be covered later. First, let's look at how the two ends negotiate information; this is usually what we call signaling.
Signaling task
Signaling is really a negotiation process: before the two ends can enter WebRTC video communication, they need to know some basic information about each other.
- Instructions to open/close the connection
- Video information, such as codecs, codec settings, bandwidth, and video format
- Key data, equivalent to the master secret in HTTPS, used to secure the connection
- Network information, such as the IP address and port of each party
However, the signaling process is not written into the WebRTC standard, meaning it doesn't matter which protocol you use, as long as it is secure. Why? Because different applications have their own negotiation methods that suit them best. For example:
- Single-gateway protocols (SIP, Jingle, ISUP), suited to calling mechanisms (VoIP, Voice over IP)
- Custom protocols
- Multi-gateway protocols
We can simulate a signaling channel ourselves. In principle it only needs to relay messages; for convenience, we can use Socket.IO rooms to provide a channel for exchanging information, as sketched below.
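A minimal sketch of such a relay, assuming a Node.js server with socket.io installed; the 'join' and 'message' event names and the room id are purely illustrative:

```js
// server.js — relay signaling messages between the two peers in a room.
var io = require('socket.io')(8080);

io.on('connection', function (socket) {
  socket.on('join', function (room) {
    socket.join(room); // both peers join the same room
    socket.room = room;
  });
  socket.on('message', function (msg) {
    // Forward SDP/candidate payloads to the other peer in the room.
    socket.to(socket.room).emit('message', msg);
  });
});
```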
Establishing the PeerConnection
Let's say we've now set up a communication channel through Socket.IO. Then we can enter the RTCPeerConnection part and establish the connection. First we should use signaling to exchange the basic information. What is this information, exactly? WebRTC has already defined it for us at the bottom layer: the Session Description Protocol (SDP). We use signaling to deliver the relevant SDP and ensure both parties match correctly. The underlying engine parses the SDP automatically (thanks to JSEP), so we don't have to parse it by hand. Suddenly the world feels great… So let's see how we do that.
```js
// Make use of the channel already created.
var signalingChannel = new SignalingChannel();
// Enter the RTC connection. This is equivalent to creating a peer end.
var pc = new RTCPeerConnection({});

navigator.mediaDevices.getUserMedia({ "audio": true })
  .then(gotStream)
  .catch(logError);

function gotStream(stream) {
  pc.addStream(stream);
  // Generate the local SDP offer and send it over the signaling channel.
  pc.createOffer(function (offer) {
    pc.setLocalDescription(offer);
    signalingChannel.send(offer.sdp);
  });
}

function logError() { ... }
```
What does the SDP format look like? Here's a sample; you don't need to understand it in depth:
```
v=0
o=- 1029325693179593971 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
a=msid-semantic: WMS
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:nHtT
a=ice-pwd:cuwglAha5fBmGljFXWntH1VN
a=fingerprint:sha-256 24:63:EB:DD:18:1B:BB:5E:B3:E8:C5:D7:92:F7:0B:44:EC:22:96:63:64:76:1A:56:64:DE:6B:CE:85:C6:64:78
a=setup:active
a=mid:audio
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=inactive
a=rtcp-mux
...
```
The preceding procedure is the peer-to-peer negotiation process. There are two basic concepts: offer and answer.
- Offer: the host provides the basic information of its own live video to other users
- Answer: the audience side replies to the host, confirming whether it can play the stream normally
The specific process is as follows (a sketch of the user side follows the list):
- The host generates an SDP description via createOffer
- The host sets the local description via setLocalDescription
- The host sends the offer SDP to the user
- The user sets the remote description via setRemoteDescription
- The user creates its own SDP description via createAnswer
- The user sets the local description via setLocalDescription
- The user sends the answer SDP to the host
- The host sets the remote description via setRemoteDescription
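To complement the host-side code above, here is a minimal sketch of the user (answer) side; SignalingChannel and its message format are illustrative, mirroring the earlier snippet:

```js
var signalingChannel = new SignalingChannel();
var pc = new RTCPeerConnection({});

signalingChannel.onmessage = function (offerSdp) {
  // Set the host's offer as the remote description...
  pc.setRemoteDescription(new RTCSessionDescription({
    type: 'offer',
    sdp: offerSdp
  })).then(function () {
    // ...generate our own answer SDP...
    return pc.createAnswer();
  }).then(function (answer) {
    // ...set it as the local description...
    return pc.setLocalDescription(answer);
  }).then(function () {
    // ...and send it back to the host.
    signalingChannel.send(pc.localDescription.sdp);
  });
};
```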
However, the above only establishes the connection information at both ends; it does not yet involve transmitting the video itself, that is, the UDP transmission. UDP transmission is painful work: in a client-server model it would be fine, just transmit directly, but this is a peer-to-peer model. Think about it: you now have to use your own computer as the server, so how do you break through the firewall, how do you find a usable port, and how do you cross network segments? So we need additional protocols, namely STUN/TURN/ICE, to help us complete this transmission task.
NAT/STUN/TURN/ICE
During UDP transmission, packets inevitably pass through a Network Address Translator (NAT) server. Its main job is to pass messages from other network segments to the machines in the segment it is responsible for. However, our UDP packets generally only reach the NAT host: if the NAT holds no mapping entry for the destination machine, the UDP packet will not be forwarded. In a client-server model you don't encounter this problem, but since we transmit peer-to-peer, we can't avoid it.
To solve it, we need to establish an end-to-end connection. The solution is simple: set up a server in the middle that keeps the destination machine's mapping entry alive in the NAT. The common protocols are STUN, TURN, and ICE. What's the difference between them?
- STUN: the most basic NAT traversal server; it keeps the mapping entry for a specified machine alive
- TURN: a relay server that takes over when STUN fails
- ICE: chooses the most efficient transmission path among the many STUN and TURN candidates
So, the above three are usually used in combination. Their roles in PeerConnection are shown below:
If ICE is involved, we also need to pre-set the specified STUN/TURN servers when instantiating the peer connection.
```js
// You usually need to define your own TURN servers as well.
var ice = {
  "iceServers": [
    { "url": "stun:stun.l.google.com:19302" },
    {
      "url": "turn:192.158.29.39:3478?transport=udp",
      "credential": "JZEOEt2V3Qb0y27GRntt2u2PAYA=",
      "username": "28224511:1379330808"
    },
    {
      "url": "turn:192.158.29.39:3478?transport=tcp",
      "credential": "JZEOEt2V3Qb0y27GRntt2u2PAYA=",
      "username": "28224511:1379330808"
    }
  ]
};

var signalingChannel = new SignalingChannel();
// Pass the ICE config when instantiating the peer connection.
var pc = new RTCPeerConnection(ice);

navigator.getUserMedia({ "audio": true }, gotStream, logError);

function gotStream(stream) {
  pc.addStream(stream); // Add the stream to the connection.
  pc.createOffer(function (offer) {
    pc.setLocalDescription(offer);
  });
}

// Through ICE, watch for local candidate gathering to complete.
pc.onicecandidate = function (evt) {
  if (evt.target.iceGatheringState === "complete") {
    pc.createOffer(function (offer) {
      console.log("Offer with ICE candidates: " + offer.sdp);
      signalingChannel.send(offer.sdp);
    });
  }
};
// ...
```
In ICE processing, there are also two states, iceGatheringState and iceConnectionState. Reflected in code:

```js
pc.onicecandidate = function (e) {
  e.target.iceGatheringState; // same value as pc.iceGatheringState
};
pc.oniceconnectionstatechange = function (e) {
  e.target.iceConnectionState; // same value as pc.iceConnectionState
};
```
Of course, the main player is onicecandidate.
- iceGatheringState: the state of local candidate gathering. It has three values:
  - new: the gatherer has just been created
  - gathering: ICE is collecting local candidates
  - complete: ICE has finished collecting local candidates
- iceConnectionState: the state of the connection to the remote candidates. It is more complex, with seven values in total: new/checking/connected/completed/failed/disconnected/closed (see the sketch below)
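For reference, a small sketch of reacting to those connection states (which branches an app takes here is my own assumption, for illustration only):

```js
pc.oniceconnectionstatechange = function () {
  switch (pc.iceConnectionState) {
    case 'checking':
      console.log('testing candidate pairs...');
      break;
    case 'connected':
    case 'completed':
      console.log('media can flow');
      break;
    case 'failed':
    case 'disconnected':
    case 'closed':
      console.log('connection lost or closed');
      break;
  }
};
```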
However, to better explain the basic flow of establishing a WebRTC connection, let's simulate it within a single page. Suppose there are two users, pc1 and pc2: pc1 captures the video, and then pc2 establishes a connection with pc1, producing a pseudo-live-broadcast effect. Straight to the code:
```js
var servers = null;

// Add pc1 to global scope so it's accessible from the browser console
window.pc1 = pc1 = new RTCPeerConnection(servers);
pc1.onicecandidate = function (e) {
  onIceCandidate(pc1, e);
};

// Add pc2 to global scope so it's accessible from the browser console
window.pc2 = pc2 = new RTCPeerConnection(servers);
pc2.onicecandidate = function (e) {
  onIceCandidate(pc2, e);
};

pc1.oniceconnectionstatechange = function (e) {
  onIceStateChange(pc1, e);
};
pc2.oniceconnectionstatechange = function (e) {
  onIceStateChange(pc2, e);
};

// Once candidates have been added, the remote stream starts playing.
pc2.onaddstream = gotRemoteStream;

// pc1 adds the stream to the connection.
pc1.addStream(localStream);

pc1.createOffer(offerOptions).then(onCreateOfferSuccess, error);

function onCreateOfferSuccess(desc) {
  pc1.setLocalDescription(desc).then(function () {
    onSetLocalSuccess(pc1);
  }, onSetSessionDescriptionError);

  trace('pc2 setRemoteDescription start');
  pc2.setRemoteDescription(desc).then(function () {
    onSetRemoteSuccess(pc2);
  }, onSetSessionDescriptionError);

  trace('pc2 createAnswer start');
  pc2.createAnswer().then(onCreateAnswerSuccess, onCreateSessionDescriptionError);
}
```
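The snippet references an onIceCandidate helper without showing it; here is a minimal sketch (names assumed) that forwards each gathered candidate straight to the other in-page peer:

```js
function onIceCandidate(pc, event) {
  // In-page "signaling": hand the candidate to the other peer directly.
  var otherPc = (pc === pc1) ? pc2 : pc1;
  if (event.candidate) {
    otherPc.addIceCandidate(new RTCIceCandidate(event.candidate)).then(
      function () { trace('addIceCandidate success'); },
      function (err) { trace('addIceCandidate error: ' + err.toString()); }
    );
  }
}
```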
The code above may look a little confusing; for something real, refer to the single-page live demo. While viewing the page, open the console to see the process in action. One thing you'll notice is that onaddstream fires before the SDP negotiation is complete; this is one of the design flaws of that API, and the W3C has removed it from the standard. For demonstration purposes, though, it's not a problem here. Let's go through the whole process step by step.
- pc1 createOffer start
- pc1 setLocalDescription start // pc1's SDP
- pc2 setRemoteDescription start // pc1's SDP
- pc2 createAnswer start
- pc1 setLocalDescription complete // pc1's SDP
- pc2 setRemoteDescription complete // pc1's SDP
- pc2 setLocalDescription start // pc2's SDP
- pc1 setRemoteDescription start // pc2's SDP
- pc2 received remote stream: at this point the receiving end can play the video. pc2's onaddstream listener fires and gets the remote video stream. Note that pc2's SDP negotiation has not completed yet.
- The state of pc1's local candidates changes, triggering pc1's onicecandidate; the candidates are then added to pc2 via pc2.addIceCandidate.
- pc2 setLocalDescription complete // pc2's SDP
- pc1 setRemoteDescription complete // pc2's SDP
- pc1 addIceCandidate success: the candidate is added to pc1 successfully
- oniceconnectionstatechange fires to check the state of pc1's remote candidates. When the state is completed, pc2's onicecandidate event fires.
- pc2 addIceCandidate success.
In addition, there is another concept, RTCDataChannel, which I won't touch on here. If you are interested, see WebRTC and Web performance optimization for further study.