This article is shared under a Creative Commons Attribution 4.0 International license, by Troland.

This is chapter 18 of the How JavaScript Works series.

An overview of WebRTC

What is WebRTC? First of all, and quite literally, a lot of the information is in the name itself: RTC stands for Real-Time Communications.

WebRTC fills an important gap in the web platform. In the past, real-time communication was only possible with peer-to-peer (P2P) technologies such as desktop chat applications, not on the web. WebRTC has changed that.

WebRTC essentially allows web applications to create peer-to-peer communication, which we’ll cover in the sections below. To give developers a full overview of WebRTC internals, we’ll cover the following topics:

  • Peer-to-peer communication
  • Firewall and NAT traversal
  • Signaling, sessions, and protocols
  • The WebRTC APIs

Peer-to-peer communication

To enable peer-to-peer communication through web browsers, each user’s browser must go through the following steps:

  • Agree to begin communicating
  • Know how to locate the other peer
  • Bypass security and firewall protections
  • Transmit all multimedia communication in real time

One of the biggest challenges of browser-based peer-to-peer communication is knowing how to locate another web browser and establish a network socket connection to it for bidirectional data transfer. The difficulties around establishing such a connection are what the rest of this article works through.

Whenever a web application needs data or static resources, it fetches them directly from the corresponding server, and that’s that. But setting up a peer-to-peer video chat directly from the user’s browser is not possible this way, because the other browser is not a known web server and the user has no IP address at which to reach it. Additional technology is therefore needed to establish P2P connections.

Firewall and NAT traversal

Computers are not normally assigned static public IP addresses. The reason is that they sit behind firewalls and Network Address Translation (NAT) devices.

A NAT device translates private IP addresses inside a firewall into public IP addresses. NAT devices are needed for security and because of the limited number of available public IP addresses. This is why a web application cannot treat the current device as if it had a static public IP address.

Let’s take a look at how NAT devices work. If you are on an office Wi-Fi network, your computer is assigned an IP address that only exists behind the NAT, say 172.0.23.4. To the outside world, however, your IP address might look like 164.53.27.98. The outside world then sees all your requests as coming from 164.53.27.98, and the NAT device ensures, via its mapping table, that response data for requests made from your computer finds its way back to the internal address 172.0.23.4. Note that besides IP addresses, network communication also requires ports.

With NAT devices involved, a browser needs to discover the public address of the machine running the target browser in order to communicate with it.

This is where Session Traversal Utilities for NAT (STUN) servers and Traversal Using Relays around NAT (TURN) servers come in. To use WebRTC, you first ask a STUN server for your public IP address. It is as if your computer asked a remote server which IP address it sees the query coming from, and the remote server answered with exactly that address.

Assuming this works, you end up with a public IP address and port that tell other peers how to reach you directly. Likewise, those peers can ask a STUN or TURN server for their own public IP address and then tell you their communication address.
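To make this concrete, here is a minimal sketch of how a browser application might discover its public address through a STUN server using RTCPeerConnection. The Google STUN server URL is just a commonly used public example; any reachable STUN server would do.

```javascript
// Minimal sketch: discover this machine's public address via a STUN server.
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});

// Server-reflexive ("srflx") candidates carry the public IP address and port
// that the STUN server observed for this machine.
pc.onicecandidate = (event) => {
  if (event.candidate && event.candidate.candidate.indexOf('srflx') !== -1) {
    console.log('Public address candidate:', event.candidate.candidate);
  }
};

// Candidate gathering only starts once a local description is set, so create a
// throwaway data channel and an offer to kick it off.
pc.createDataChannel('probe');
pc.createOffer().then((offer) => pc.setLocalDescription(offer));
```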

Signaling, sessions, and protocols

The network-information discovery described above is just one part of the broader topic of signaling, which in WebRTC is based on the JavaScript Session Establishment Protocol (JSEP) standard. Signaling covers network discovery and NAT traversal, session creation and management, communication security, media capability metadata and negotiation, and error handling.

For communication to work, the peers must exchange metadata about their local media conditions (such as resolution and codec capabilities) and gather the available network addresses of the hosts running the application. The WebRTC API deliberately provides no built-in signaling mechanism for passing this critical information back and forth.

Signaling is not specified by the WebRTC standard and is not implemented by its APIs, precisely so that the technologies and protocols used can remain flexible. Signaling, and the server that handles it, are left to the WebRTC developer.

Assuming your browser-based WebRTC application has used the aforementioned STUN server to obtain its public IP address, the next step is to negotiate with the other peer and establish a network session connection.

Session negotiation and the communication connection are initialized using any of the signaling/communication protocols designed for multimedia communication. Such a protocol is responsible for managing the rules of the session and its termination.

Session Initiation Protocol (SIP) is one such protocol. Thanks to the flexibility of WebRTC signaling, SIP is not the only signaling protocol that can be used. The chosen protocol must, however, be compatible with an application-layer protocol called the Session Description Protocol (SDP), which WebRTC uses. All multimedia-specific metadata is transmitted via SDP.

Any peer that tries to communicate with another peer, i.e. the WebRTC application itself, generates a set of Interactive Connectivity Establishment (ICE) candidates. The candidates represent the available combinations of IP address, port, and transport protocol. Note that a computer can have multiple network interfaces (wired, wireless, and so on) and therefore multiple IP addresses, one per interface.

The following diagram from MDN depicts this signaling exchange:

Establish a connection

Each peer first obtains its public IP address as described earlier. Signaling data “channels” are then dynamically created to discover other peers and to support peer-to-peer negotiation and session creation.

These “channels” cannot be discovered or accessed from outside, and they can only be reached via a unique identifier.

Note that, because of WebRTC’s flexibility and the fact that the signaling procedure is not specified by the standard, the concept and use of “channels” may differ somewhat depending on the technology used. In fact, some protocols do not require a “channel” mechanism for communication at all.

This article will assume that “channels” exist.

Once two or more peers are connected to the same “channel”, they can communicate and negotiate session information. This process is somewhat similar to the publish/subscribe pattern. Essentially, the initiating peer sends an “offer” using a signaling protocol such as Session Initiation Protocol (SIP) together with SDP. The initiator then waits for an “answer” from whichever receiver is connected to the given “channel”.

Once the answer is received, the peers select and negotiate the optimal set of Interactive Connectivity Establishment (ICE) candidates gathered by each of them. When the optimal candidates are chosen, all the metadata, network routing information (IP addresses and ports), and media information required for the peers to communicate are agreed upon. The network socket session between the peers is then fully established and active. Next, each peer creates local data streams and data channel endpoints, and finally the multimedia data is transmitted using whatever bidirectional communication technology applies.
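A rough sketch of this offer/answer exchange is shown below. The signalingChannel object is hypothetical: it stands in for whatever signaling transport the application provides (a thin WebSocket wrapper, for instance), since WebRTC itself does not supply one.

```javascript
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});

// Caller: create an offer and push it through the signaling channel.
async function call(localStream) {
  localStream.getTracks().forEach((track) => pc.addTrack(track, localStream));
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signalingChannel.send({ type: 'offer', sdp: pc.localDescription });
}

// Both sides: react to messages arriving on the signaling channel.
signalingChannel.onmessage = async ({ type, sdp, candidate }) => {
  if (type === 'offer') {                    // callee: answer the offer
    await pc.setRemoteDescription(sdp);
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    signalingChannel.send({ type: 'answer', sdp: pc.localDescription });
  } else if (type === 'answer') {            // caller: apply the answer
    await pc.setRemoteDescription(sdp);
  } else if (type === 'candidate') {         // both: add remote ICE candidates
    await pc.addIceCandidate(candidate);
  }
};

// Trickle ICE: forward local candidates to the other peer as they are discovered.
pc.onicecandidate = ({ candidate }) => {
  if (candidate) signalingChannel.send({ type: 'candidate', candidate });
};
```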

If agreeing on the best ICE candidates fails, which often happens because of the firewalls and NAT technologies in use, the fallback is to use a TURN server as a relay. This process uses a server as an intermediary that forwards data between the peers. Note that this is not true peer-to-peer communication, in which the peers transmit data to each other directly.

When TURN is used as the fallback, each peer no longer needs to know how to reach the other and transfer data to it. Instead, each peer only needs to know the public TURN server through which to send and receive multimedia data in real time during the session.

It is important to understand that this is a failsafe and a last resort. TURN servers have to be quite robust, with expensive bandwidth and significant processing power, to handle potentially large amounts of data. Using a TURN server therefore adds considerable overhead and complexity.
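For completeness, here is a small sketch of what configuring both kinds of servers can look like. The TURN URL and credentials are placeholders for whatever relay the application operates or rents, not real endpoints.

```javascript
// STUN for discovering the public address, TURN as the relay of last resort.
const config = {
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: 'turn:turn.example.com:3478',  // placeholder relay server
      username: 'webrtc-user',             // placeholder credentials
      credential: 'secret'
    }
  ]
};

const pc = new RTCPeerConnection(config);
```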

The WebRTC APIs

WebRTC consists of three main types of APIs:

  • **Media Capture and Streams** – gives developers access to input devices such as microphones and webcams, and to the media streams they produce.
  • **RTCPeerConnection** – lets developers transmit captured audio and video streams in real time to another WebRTC endpoint. It is used to connect the local machine to a remote peer, and it provides methods to create a connection to the remote peer, maintain and monitor it, and close it once it is no longer needed.
  • **RTCDataChannel** – lets developers transfer arbitrary data. Each data channel is associated with an RTCPeerConnection.

We will introduce each of these below.

Media Capture and Streams

The Media Capture and Streams API (often referred to as the MediaStream API or the Stream API) supports streams of audio or video data, methods for working with them, constraints associated with the data types, success and error callbacks for retrieving the data asynchronously, and events fired during the process.

The getUserMedia() method of MediaDevices prompts the user for permission to use a media input device and produces a MediaStream containing tracks of the requested media types. The stream can include video tracks (produced by hardware or virtual video sources such as a camera, video recording device, or screen sharing service), audio tracks (similarly produced by physical or virtual audio sources such as a microphone or A/D converter), and possibly other track types.

The method returns a Promise that resolves to a MediaStream object. If the user denies permission, or no matching media is available, the Promise is rejected with PermissionDeniedError or NotFoundError respectively.

The MediaDevices singleton is accessed via the navigator object:

navigator.mediaDevices.getUserMedia(constraints)
    .then(function(stream) { /* use the stream */ })
    .catch(function(err) { /* handle the error */ });

Notice that you need to pass in a constraints object to specify what kind of media stream to return. It allows all sorts of configuration, including which camera to use (front or rear), frame rate, resolution, and so on.
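As an illustration, a constraints object asking for the rear camera at roughly 1280×720 and 30 frames per second might look like the sketch below; whether the browser and device can honour each constraint varies.

```javascript
const constraints = {
  audio: true,
  video: {
    width:     { ideal: 1280 },
    height:    { ideal: 720 },
    frameRate: { ideal: 30 },
    facingMode: 'environment'  // 'user' would request the front camera instead
  }
};

navigator.mediaDevices.getUserMedia(constraints)
  .then(function(stream) { /* attach the stream to a <video> element, etc. */ })
  .catch(function(err) { console.error('getUserMedia failed:', err.name); });
```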

Since version 25, Chromium-based browsers have allowed audio data obtained via getUserMedia() to be attached to an audio or video element (note that the media element will be muted by default).

getUserMedia() can also be used as an input node for the Web Audio API:

function gotStream(stream) {
    window.AudioContext = window.AudioContext || window.webkitAudioContext;
    var audioContext = new AudioContext();

    // Create an audio source node from the media stream
    var mediaStreamSource = audioContext.createMediaStreamSource(stream);

    // Connect it to the destination node to hear it, or to other nodes for further processing
    mediaStreamSource.connect(audioContext.destination);
}

navigator.getUserMedia({ audio: true }, gotStream);

Privacy restrictions

Because this API can raise significant privacy concerns, the specification is very explicit about user notification and permission management for getUserMedia(). getUserMedia() must always obtain the user’s permission before opening a media input device such as a webcam or microphone.

Browsers may offer to remember the permission once per domain, but they must ask at least the first time, and the user must explicitly grant the permission.

The notification rules are equally important. Beyond any hardware indicator lights, the browser must display an indicator showing that the camera or microphone is in use. It must also show an indicator that permission has been granted to use a device as an input, even if the device is not actively recording at that moment.

RTCPeerConnection

RTCPeerConnection represents a WebRTC connection between the local computer and a remote peer. It provides methods to connect to the remote peer, maintain and monitor the connection, and close it once it is no longer needed.
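As a small sketch of the monitoring side of this API, the snippet below watches the connection state and picks up remote media as it arrives. The video#remote element is an assumed placeholder in the page.

```javascript
const pc = new RTCPeerConnection();

// The connection moves through states such as 'new', 'connecting',
// 'connected', 'disconnected', 'failed' and 'closed'.
pc.onconnectionstatechange = () => {
  console.log('Connection state:', pc.connectionState);
  if (pc.connectionState === 'failed') {
    pc.close();  // tear down a connection that is no longer usable
  }
};

// Tracks added by the remote peer arrive through the 'track' event.
pc.ontrack = (event) => {
  document.querySelector('video#remote').srcObject = event.streams[0];
};
```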

The following is a WebRTC diagram showing the role of RTCPeerConnection:

From a JavaScript perspective, the main thing to understand from the diagram is that RTCPeerConnection shields developers from the complexity of the underlying machinery behind a single interface. The codecs and protocols used by WebRTC do a huge amount of work to make real-time communication possible, even over unreliable networks:

  • Packet loss concealment
  • Echo cancellation
  • Bandwidth adaptivity
  • Dynamic jitter buffering
  • Automatic gain control
  • Noise reduction and suppression
  • Image “cleaning”

RTCDataChannel

WebRTC also supports real-time transmission of other types of data, not just audio and video.

The RTCDataChannel interface allows the peer-to-peer exchange of arbitrary data.

This interface has many uses, including:

  • Gaming
  • Real-time text chat
  • File transfer
  • Decentralized networks

The API has several features that make the most of RTCPeerConnection and enable powerful, flexible peer-to-peer communication:

  • Leverages the session set up by RTCPeerConnection
  • Multiple simultaneous channels, with prioritization
  • Reliable and unreliable delivery semantics (sketched below)
  • Built-in security (DTLS) and congestion control
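Here is a brief sketch of those delivery semantics using the standard, unprefixed API: an unordered channel with no retransmissions suits data that goes stale quickly (such as game state), while the default ordered, reliable channel suits things like file transfer. The channel labels are arbitrary examples.

```javascript
const pc = new RTCPeerConnection();

// Unreliable and unordered: stale game-state updates are useless, so never retransmit.
const gameChannel = pc.createDataChannel('game-state', {
  ordered: false,
  maxRetransmits: 0
});

// Default options: ordered and reliable, like TCP. Suitable for file transfer or chat.
const fileChannel = pc.createDataChannel('file-transfer');
```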

The syntax is deliberately similar to that of WebSocket, with a send() method and a message event:

var peerConnection = new webkitRTCPeerConnection(servers,
    {optional: [{RtpDataChannels: true}]});

peerConnection.ondatachannel = function(event) {
    receiveChannel = event.channel;
    receiveChannel.onmessage = function(event) {
        document.querySelector("#receiver").innerHTML = event.data;
    };
};

sendChannel = peerConnection.createDataChannel("sendDataChannel", {reliable: false});

document.querySelector("button#send").onclick = function() {
    var data = document.querySelector("textarea#send").value;
    sendChannel.send(data);
};

Because the communication occurs directly between browsers, RTCDataChannel is faster than WebSocket even when a relay (TURN) server is required.

WebRTC in the real world

In the real world, WebRTC needs servers, however simple, so that the following can happen:

  • Users discover one another and exchange details such as names.
  • WebRTC client applications (peers) exchange network information.
  • Peers exchange media information, such as video format and resolution.
  • WebRTC client applications traverse NAT gateways and firewalls.

In other words, WebRTC needs four types of server-side functionality:

  • User discovery and communication
  • Signaling
  • NAT/firewall traversal
  • Relay servers in case peer-to-peer communication fails

ICE uses the STUN protocol and its extension, TURN, to establish an RTCPeerConnection connection and to cope with NAT traversal and other network vagaries.

As mentioned earlier, ICE is the protocol used to connect peers, such as two video chat clients. Initially, ICE tries to connect the peers directly over UDP, for the lowest possible network latency. In this process, the STUN server has a single task: to enable a peer behind a NAT to find out its public address and port. Developers can check out the available lists of STUN servers (Google runs a bunch of them, too).

Finding connection candidates

If UDP fails, ICE tries TCP: first HTTP, then HTTPS. If the direct connection fails, usually because of enterprise NAT traversal rules and firewalls, ICE falls back to an intermediary (relay) TURN server. In other words, ICE first uses STUN with UDP to connect the peers directly and, if that fails, falls back to a TURN relay server. “Finding connection candidates” refers to the process of finding the network interfaces and ports to try.
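A small sketch of how an application can observe this process, assuming pc is an existing RTCPeerConnection:

```javascript
pc.oniceconnectionstatechange = () => {
  switch (pc.iceConnectionState) {
    case 'checking':
      console.log('Testing candidate pairs...');
      break;
    case 'connected':
    case 'completed':
      console.log('A working candidate pair was found.');
      break;
    case 'failed':
      // No candidate pair worked, including any configured TURN relay candidates.
      console.log('ICE failed; the connection could not be established.');
      break;
  }
};
```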

Security

Real-time communication applications and plugins can pose several security issues. For example:

  • Unencrypted media or data might be intercepted between browsers, or between a browser and a server.
  • An application might record and distribute video or audio without the user’s knowledge and consent.
  • Malware or viruses might be installed alongside a seemingly harmless plugin or application.

WebRTC addresses these concerns in several ways:

  • WebRTC implementations use secure protocols such as DTLS and SRTP.
  • Encryption is mandatory for all WebRTC components, including the signaling mechanism.
  • WebRTC is not a plugin: its components run in the browser sandbox rather than in a separate process, they do not require separate installation, and they are updated whenever the browser is updated.
  • Camera and microphone access must be granted explicitly, and it is clearly indicated in the user interface while the camera or microphone is in use.

WebRTC is an incredibly interesting and powerful technology for any product that needs real-time communication between browsers.

References:

  • www.html5rocks.com/en/tutorial…
  • www.innoarchitech.com/what-is-web…


The GitHub repository with updates to this series can be found here.