It’s no secret that remote working has become more popular since the beginning of the COVID era, and despite the advent of vaccines, many companies and teams have fully embraced the idea of working online and aren’t about to let it go. As a result, the need for online collaboration tools is increasing, video conferencing solutions in particular. When I started, I knew very little about video conferencing. After investigating the matter thoroughly, I came across WebRTC, a technology for real-time communication (hence RTC).

WebRTC doesn’t have to be used for video conferencing, but it’s definitely built with that in mind. By today’s standards, a delay of less than a second is considered real time. WebRTC is among the lowest-latency solutions available today, and most importantly, it’s open source, which makes the technology free. Other solutions have higher latency, but keep in mind that they are built for different purposes, not for real-time performance. Here’s a delay comparison so you can get an idea of how fast WebRTC really is:

When I started working with WebRTC, I realized that it is architecturally complex. It’s not a single technology like REST or WebSockets, but a combination of several.

Summary of WebRTC

WebRTC is designed to support direct communication between browsers. It provides a set of classes and methods that standardize the process, and it has been available since Chrome 23:

In addition to standardizing the communication process, the browser complements WebRTC by giving you easy and secure access to hardware. You can stream the screen, microphone and camera. Without the browser, this usually requires installing external plug-ins or binaries, and can get very complicated because each operating system and piece of hardware needs a different (and complex!) configuration.

Peer connections

WebRTC is peer-to-peer (P2P): the call participants themselves are responsible for transferring data from one end to the other without relying on a middleman. If one participant is disconnected for any reason, the other participants will continue to send data to each other. This is different from traditional client-server communication, where data stops flowing if the connection to the server is lost.

So on a call, we have to instantiate a peer object to represent each of the other participants. Consider a call with three participants: each of them has to instantiate two peer objects, one for each of the other two.
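To make the bookkeeping concrete, here is a minimal sketch of which peers each participant connects to when everyone connects to everyone else directly. The helper names `meshPeers` and `meshLinkCount` and the participant names are made up for illustration; they are not part of WebRTC.

```typescript
// Hypothetical helper: for each participant, list the remote peers it needs
// its own peer object for when everyone connects to everyone else.
function meshPeers(participants: string[]): { [id: string]: string[] } {
  const plan: { [id: string]: string[] } = {};
  for (const current of participants) {
    plan[current] = participants.filter((other) => other !== current);
  }
  return plan;
}

// Each pair of participants shares one link, so n people need n * (n - 1) / 2 links.
function meshLinkCount(participantCount: number): number {
  return (participantCount * (participantCount - 1)) / 2;
}

const aliceConnectsTo = meshPeers(["alice", "bob", "carol"])["alice"]; // ["bob", "carol"]
const linksInThreeWayCall = meshLinkCount(3); // 3
```

The number of links grows quickly with the number of participants, which is worth keeping in mind for larger calls.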

In a three-party call, each of these peer instances is connected over its own channel, as follows:

Signaling Server

As the call goes on, we have to keep track of the people joining and leaving the conversation, and create or dispose of the connections accordingly. To listen for and track these events, we need a signaling server.

A signaling server is dedicated to establishing the initial connection between two or more peers who want to communicate. Once the connection is established, it is no longer needed for the ongoing communication, but you can keep using it if you want to signal other events.

A signaling server can be implemented in many ways; all you need is a bridge between peer A and peer B.

This is usually done through WebSockets, because either side can initiate communication at any time:
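As a rough illustration of what such a bridge does, here is an in-memory sketch with no networking at all. The class `SignalingHub` and its message shapes are hypothetical; a real signaling server would typically do the same routing over WebSocket connections instead of plain callbacks.

```typescript
// Messages a peer can receive from the (hypothetical) signaling hub.
type SignalMessage =
  | { kind: "peer-joined"; peerId: string }
  | { kind: "peer-left"; peerId: string }
  | { kind: "relay"; from: string; payload: unknown };

type SignalListener = (message: SignalMessage) => void;

class SignalingHub {
  private listeners: { [peerId: string]: SignalListener | undefined } = {};

  // Register a peer and tell everyone already present about the newcomer.
  join(peerId: string, listener: SignalListener): void {
    this.broadcast({ kind: "peer-joined", peerId: peerId }, peerId);
    this.listeners[peerId] = listener;
  }

  // Remove a peer and tell the remaining peers so they can dispose of the connection.
  leave(peerId: string): void {
    delete this.listeners[peerId];
    this.broadcast({ kind: "peer-left", peerId: peerId }, peerId);
  }

  // Forward an arbitrary payload from one peer to another.
  // Returns false when the target is not connected.
  relay(from: string, to: string, payload: unknown): boolean {
    const target = this.listeners[to];
    if (target === undefined) return false;
    target({ kind: "relay", from: from, payload: payload });
    return true;
  }

  private broadcast(message: SignalMessage, exceptPeerId: string): void {
    for (const peerId of Object.keys(this.listeners)) {
      const listener = this.listeners[peerId];
      if (peerId !== exceptPeerId && listener !== undefined) listener(message);
    }
  }
}
```

The hub never inspects the payload; it only needs to know who is present and where to forward each message.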

SDP

Once we know that someone has joined the conversation, we need to exchange information about each other’s systems to establish a connection. This information follows a format called SDP (Session Description Protocol) and includes detailed information about its owning peer, such as which codecs it supports, what hardware it supports, what type of media it wants to exchange, and so on. An SDP description is a simple list of key-value lines:
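As a small illustration of that key-value structure, the sketch below splits SDP text into its `key=value` lines and reads the media kinds from the `m=` lines. The helper names are made up, and the sample text is a shortened, illustrative description rather than the output of a real browser.

```typescript
interface SdpLine {
  key: string;
  value: string;
}

// Split SDP text into its `key=value` lines, skipping blank or malformed lines.
function parseSdpLines(sdpText: string): SdpLine[] {
  const parsed: SdpLine[] = [];
  for (const rawLine of sdpText.split(/\r?\n/)) {
    const separator = rawLine.indexOf("=");
    if (separator <= 0) continue;
    parsed.push({
      key: rawLine.slice(0, separator),
      value: rawLine.slice(separator + 1), // the value itself may contain "="
    });
  }
  return parsed;
}

// Media kinds come from the `m=` lines, e.g. "audio 9 UDP/TLS/RTP/SAVPF 111".
function mediaKinds(lines: SdpLine[]): string[] {
  return lines
    .filter((line) => line.key === "m")
    .map((line) => line.value.split(" ")[0]);
}

// Illustrative sample only.
const sampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

const sampleKinds = mediaKinds(parseSdpLines(sampleSdp)); // ["audio", "video"]
```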

An SDP description is sent either as an Offer or as an Answer: for each connection, one peer sends an Offer and the other sends an Answer in return.

However, it is important to keep track of which end the SDP description in question represents: us or the other end. When initializing a peer instance, we need two things: a local description and a remote description. The local description represents us, and the remote description represents the other end. Together they let us build a successful connection:
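The sketch below models which description ends up in which slot on each side. It is a simplified stand-in and not the browser API: `DescriptionSlots` and the placeholder SDP strings are hypothetical. In a browser, the corresponding steps are `createOffer`, `createAnswer`, `setLocalDescription` and `setRemoteDescription` on an `RTCPeerConnection`, all of which are asynchronous.

```typescript
interface SessionDesc {
  type: "offer" | "answer";
  sdp: string;
}

// Hypothetical stand-in for the two description slots of one peer instance.
class DescriptionSlots {
  local: SessionDesc | undefined;
  remote: SessionDesc | undefined;

  // Both slots are filled, and one holds the offer while the other holds its answer.
  isNegotiated(): boolean {
    if (this.local === undefined || this.remote === undefined) return false;
    return this.local.type !== this.remote.type;
  }
}

const callerSlots = new DescriptionSlots();
const calleeSlots = new DescriptionSlots();

// 1. The caller creates an offer. It describes the caller, so it is the caller's local description.
const offerDesc: SessionDesc = { type: "offer", sdp: "<caller SDP text>" };
callerSlots.local = offerDesc;

// 2. The offer travels through the signaling server and becomes the callee's remote description.
calleeSlots.remote = offerDesc;

// 3. The callee creates an answer, which is the callee's local description.
const answerDesc: SessionDesc = { type: "answer", sdp: "<callee SDP text>" };
calleeSlots.local = answerDesc;

// 4. The answer travels back and becomes the caller's remote description.
callerSlots.remote = answerDesc;
```

The same description object is "local" on one side and "remote" on the other, which is exactly why it matters to track which end it represents.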

ICE candidates

A peer may be reachable through many transport routes, not just one. It may have multiple private IP addresses/ports, or multiple public IP addresses/ports, or various protocols, or one or more reverse proxies, and so on. Once we have created the SDP, WebRTC tries to find every possible route for reaching the browser. Each of these routes is called an ICE (Interactive Connectivity Establishment) candidate:
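To show what a single candidate carries, here is a sketch that parses the leading fields of a candidate line. The function name `parseIceCandidate` is made up, the sample line is illustrative (it uses documentation-range addresses), and extension attributes after the candidate type are ignored.

```typescript
interface IceCandidateInfo {
  foundation: string;
  component: number;
  transport: string; // "udp" or "tcp"
  priority: number;
  address: string;
  port: number;
  candidateType: string; // "host", "srflx", "prflx" or "relay"
}

// Parse "candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...".
// Returns undefined when the line does not look like a candidate.
function parseIceCandidate(line: string): IceCandidateInfo | undefined {
  const match = /^(?:a=)?candidate:(\S+) (\d+) (\S+) (\d+) (\S+) (\d+) typ (\S+)/.exec(line);
  if (match === null) return undefined;
  return {
    foundation: match[1],
    component: Number(match[2]),
    transport: match[3].toLowerCase(),
    priority: Number(match[4]),
    address: match[5],
    port: Number(match[6]),
    candidateType: match[7],
  };
}

// Illustrative sample: a public ("server reflexive") UDP candidate.
const sampleCandidate = parseIceCandidate(
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.5 rport 46154"
);
```

Here `sampleCandidate` describes a UDP route to `203.0.113.7:46154`, while the `raddr`/`rport` part of the line records the private address it was derived from.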

The ICE candidate is just another key-value pair that should be added to the SDP. We can either wait for WebRTC to find all possible candidates and send the complete SDP, or we can send each ICE candidate through the signaling server as it is discovered and gradually expand the SDP; both options are valid. WebRTC knows how to go through the ICE candidates and choose the most viable option.

By default, WebRTC prefers ICE candidates based on UDP (User Datagram Protocol). Unlike TCP, where later packets are not delivered until every earlier packet has arrived, UDP keeps the data flowing regardless of the state of the previous packets, which makes communication faster.

NAT traversal

Most machines are not directly connected to the global network; they are likely to sit behind a NAT (network address translation) layer. When traffic passes through the router, your machine’s private IP/port is translated to a different, public IP/port.
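The translation step can be pictured as a lookup table kept by the router. The sketch below is a toy model under simplifying assumptions (one public IP, sequential port allocation); `NatTableSketch` is a made-up name and real routers differ in how they allocate and expire mappings.

```typescript
interface NatMapping {
  privateEndpoint: string; // "ip:port" inside the local network
  publicEndpoint: string; // "ip:port" as seen from the internet
}

// Toy model of the table a NAT router keeps for outbound traffic.
class NatTableSketch {
  private mappings: NatMapping[] = [];
  private publicIp: string;
  private nextPort: number;

  constructor(publicIp: string, firstPort: number) {
    this.publicIp = publicIp;
    this.nextPort = firstPort;
  }

  // Reuse the existing mapping for this private endpoint, or allocate a new public port.
  translateOutbound(privateIp: string, privatePort: number): string {
    const privateEndpoint = privateIp + ":" + privatePort;
    for (const mapping of this.mappings) {
      if (mapping.privateEndpoint === privateEndpoint) return mapping.publicEndpoint;
    }
    const publicEndpoint = this.publicIp + ":" + this.nextPort;
    this.nextPort += 1;
    this.mappings.push({ privateEndpoint: privateEndpoint, publicEndpoint: publicEndpoint });
    return publicEndpoint;
  }
}

const homeRouter = new NatTableSketch("203.0.113.7", 40000);
const seenFromOutside = homeRouter.translateOutbound("192.168.1.5", 5000); // "203.0.113.7:40000"
```

The machine itself only knows `192.168.1.5:5000`, while the rest of the internet only ever sees `203.0.113.7:40000`.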

Since WebRTC strives to establish as direct a connection as possible between the two parties, the fact that either party may sit behind a NAT adds some complexity to the process, so we should be aware of it. Let’s look at the different NAT configurations and see how we can work with them to establish a direct connection.

STUN

If our computer sits behind a NAT layer, we need to know our public IP/port to create the ICE candidate. A STUN server tells us the public address it sees our requests coming from, so WebRTC lets us specify a STUN server URL when initializing a WebRTC connection:
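A minimal sketch of such a configuration is shown below. The `{ iceServers: [{ urls }] }` shape matches what the browser's `RTCPeerConnection` constructor accepts; the interface names, the `stunConfig` helper, and the server URL `stun:stun.example.org:3478` are placeholders for illustration.

```typescript
// Local, simplified versions of the configuration types (assumed names).
interface IceServerEntry {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfigSketch {
  iceServers: IceServerEntry[];
}

// Build a configuration that lists one or more STUN servers,
// rejecting anything that is not a stun: or stuns: URL.
function stunConfig(stunUrls: string[]): PeerConfigSketch {
  for (const url of stunUrls) {
    if (!/^stuns?:/.test(url)) {
      throw new Error("Not a STUN URL: " + url);
    }
  }
  return { iceServers: [{ urls: stunUrls }] };
}

// Placeholder URL; substitute a STUN server you actually have access to.
const sketchConfig = stunConfig(["stun:stun.example.org:3478"]);

// In a browser, the configuration would then be passed to the constructor:
//   const peer = new RTCPeerConnection(sketchConfig);
```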

TURN

The other two kinds of NAT, the restricted ones, connect to their peers in a similar way to full NATs, but with one small restriction. They need to know their public IP/port, and the router also makes sure that the IP/port an incoming request comes from already exists in the NAT table. Unlike a full NAT, where the router basically trusts everyone, a restricted NAT only trusts those that the machine behind it has already tried to connect to.

To overcome this problem, we use a technique called hole punching.
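The NAT-table rule described above can be sketched as a filter on incoming traffic. This is a toy model: `NatFilterSketch` is a made-up name, and the kind names `address-restricted` and `port-restricted` are an assumption about which two restricted variants are meant. It also shows why sending an outbound packet first (commonly called hole punching) lets the other peer's traffic through.

```typescript
type NatKind = "full" | "address-restricted" | "port-restricted";

interface RemoteEndpoint {
  ip: string;
  port: number;
}

// Toy model of how a NAT decides whether to let an incoming packet through.
class NatFilterSketch {
  private kind: NatKind;
  private contacted: RemoteEndpoint[] = []; // endpoints the inside machine has sent to

  constructor(kind: NatKind) {
    this.kind = kind;
  }

  // Called whenever the machine behind the NAT sends a packet out.
  recordOutbound(to: RemoteEndpoint): void {
    this.contacted.push(to);
  }

  allowsInbound(from: RemoteEndpoint): boolean {
    // A full NAT trusts everyone.
    if (this.kind === "full") return true;
    // Restricted NATs only trust endpoints that were contacted first:
    // matching by IP only, or by IP and port, depending on the kind.
    return this.contacted.some(
      (seen) =>
        seen.ip === from.ip &&
        (this.kind === "address-restricted" || seen.port === from.port)
    );
  }
}

const remotePeer: RemoteEndpoint = { ip: "198.51.100.20", port: 50000 };
const restrictedNat = new NatFilterSketch("port-restricted");

const allowedBeforeOutbound = restrictedNat.allowsInbound(remotePeer); // false
restrictedNat.recordOutbound(remotePeer); // our own outbound packet creates the table entry
const allowedAfterOutbound = restrictedNat.allowsInbound(remotePeer); // true
```

When both peers send a packet to each other's public IP/port, each NAT ends up with the entry it needs, and traffic can flow in both directions.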