Let's start with a diagram from WebRTC's official website to understand WebRTC's architecture:

A lot of material online says this diagram comes from the WebRTC website, yet many readers cannot find it there. That is because it lives in the English documentation of the WebRTC architecture: https://webrtc.github.io/webrtc-org/architecture/

The official English documentation on WebRTC's architecture is quite clear, so this article is in large part a translation of it. Let's look at WebRTC's architectural design from top to bottom.

Three-layer architecture

First, we can see from the figure that WebRTC is divided into three parts: the green part, the dark purple part, and the light purple part.

The top layer, in light purple, is the developers' application layer: the applications that developers build on top of the WebRTC specification. Strictly speaking, it is not part of WebRTC's architecture.

The dark purple middle tier, the Web API (labeled "Edited by W3C WG" in the diagram), represents the API that WebRTC exposes for application-layer developers to call; it is described in more detail below.

Finally, the green part is WebRTC's core functional layer, which is itself divided into four sub-layers: the C++ API layer, the session management layer, the engine layer, and the driver layer.

Your Web App layer

The "Your Web App #1…" boxes at the top. As mentioned above, this layer is not strictly part of the WebRTC architecture, so I won't cover it here.

The Web API layer

The Web API layer, labeled "Edited by W3C WG" in dark purple, represents the API that WebRTC exposes to application-layer developers (mainly JavaScript APIs for use on the Web). Developers do not have to worry about the complex underlying technology; they only need to understand WebRTC's general flow and call its API to implement point-to-point communication.
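The "general flow" that developers must understand is essentially the offer/answer handshake. The real API is JavaScript's RTCPeerConnection; the sketch below is only a toy Python model of the handshake order (the Peer class and the description strings are hypothetical, not part of any real API):

```python
# Toy model of the WebRTC offer/answer signaling flow.
# Hypothetical Peer class; the real Web API is JavaScript's RTCPeerConnection.

class Peer:
    def __init__(self, name):
        self.name = name
        self.local_description = None
        self.remote_description = None

    def create_offer(self):
        # Real API: pc.createOffer() produces an SDP offer.
        self.local_description = f"offer from {self.name}"
        return self.local_description

    def create_answer(self):
        # Real API: pc.createAnswer(), valid only after a remote offer is set.
        assert self.remote_description, "answer requires a remote offer"
        self.local_description = f"answer from {self.name}"
        return self.local_description

    def set_remote_description(self, desc):
        # Real API: pc.setRemoteDescription(desc).
        self.remote_description = desc


# The handshake order the Web API expects:
caller, callee = Peer("caller"), Peer("callee")
offer = caller.create_offer()          # 1. caller creates an offer
callee.set_remote_description(offer)   # 2. a signaling server relays it
answer = callee.create_answer()        # 3. callee answers
caller.set_remote_description(answer)  # 4. caller applies the answer
```

Note that the offer and answer travel through your own signaling channel; WebRTC deliberately leaves that part to the application.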

WebRTC C++ API layer

Below that is the WebRTC C++ API (PeerConnection) layer, which is mainly a C++ interface layer. It provides C++ APIs, primarily so that browsers can support the WebRTC specification by calling them; for example, to implement WebRTC on Android you write JNI functions that call this layer's API. The main purpose of this layer is to expose WebRTC's core capabilities, such as device management and audio/video stream capture, so that software vendors, browser vendors for example, can integrate them into their own applications.

PeerConnection is the core module of this layer. Many functions are implemented in it, such as P2P NAT hole punching, communication link establishment and optimization, stream data transmission, non-audio/video data transmission, and transmission quality reports and statistics.

Session management layer

Session Management / Abstract Signaling (Session) is the green session layer. It provides session management functions, such as creating sessions, managing sessions, and managing context. This layer involves various protocols, such as SDP (the Session Description Protocol, exchanged through the signaling server), which is mainly used for signaling interaction and connection-state management in RTCPeerConnection.
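To get a feel for what SDP looks like, here is a minimal sketch that pulls the media lines out of an SDP blob. The SDP text below is a simplified, made-up example; real SDP carries many more fields:

```python
# Minimal SDP media-line parser (a sketch; real SDP has many more fields).

def parse_media_sections(sdp):
    """Return (media type, port, codec payload types) for each m= line."""
    sections = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            # e.g. "m=audio 9 UDP/TLS/RTP/SAVPF 111 103"
            media, port, _proto, *payloads = line[2:].split()
            sections.append((media, int(port), payloads))
    return sections

example_sdp = (
    "v=0\r\n"
    "o=- 46117317 2 IN IP4 127.0.0.1\r\n"
    "s=-\r\n"
    "m=audio 9 UDP/TLS/RTP/SAVPF 111 103\r\n"
    "a=rtpmap:111 opus/48000/2\r\n"
    "m=video 9 UDP/TLS/RTP/SAVPF 96\r\n"
    "a=rtpmap:96 VP8/90000\r\n"
)

print(parse_media_sections(example_sdp))
```

Each `m=` line declares a media section; the `a=rtpmap` lines map the numeric payload types to actual codecs.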

Engine layer

This layer is the heaviest and most complex of WebRTC's core layers. It is divided into three modules: the Voice Engine, the Video Engine, and the Transport module.

The first module is the Voice Engine, a framework containing a series of audio processing functions, such as audio capture, audio encoding/decoding, and audio optimization (including noise suppression, echo cancellation, and so on).

The second module, the Video Engine, is a framework containing a series of video processing functions, such as video capture, video encoding/decoding, dynamic adjustment of video transmission quality according to network jitter, and image processing.

The third module is Transport. In WebRTC, besides audio and video streams, you can also transfer files, text, pictures, and other binary data; those capabilities are provided by this module.

From the figure, we can see that each engine contains several sub-modules. Let's go through the sub-modules under each engine.

iSAC / iLBC Codec

iSAC and iLBC are the audio codecs built into WebRTC. iSAC is a codec for VoIP (Voice over IP) audio streams in wideband and ultra-wideband environments, and is the default codec of WebRTC's audio engine. The technology is mature and widely used in all kinds of real-time communication software. iLBC is a VoIP speech codec for narrowband environments; it maintains good call quality even when network packet loss is severe.

NetEQ for voice

NetEQ is a network speech signal processing component. Its algorithm adapts to changes in the network environment and effectively mitigates the audio quality problems caused by packet loss and network jitter. This technology was a strength of GIPS, the company whose audio technology became part of WebRTC.
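The core idea of a jitter buffer like NetEQ can be sketched in a few lines: packets arriving out of order are held briefly and released in sequence, and a lost packet is concealed rather than blocking playback. This is a heavy simplification; the real NetEQ also adapts its buffering delay and time-stretches audio:

```python
# Sketch of a jitter buffer: reorder by sequence number, conceal gaps.

class JitterBuffer:
    def __init__(self):
        self.buffer = {}       # sequence number -> payload
        self.next_seq = None   # next sequence number to play out

    def insert(self, seq, payload):
        if self.next_seq is None:
            self.next_seq = seq
        self.buffer[seq] = payload

    def pop_frame(self):
        """Return the next frame in order; conceal it if the packet is missing."""
        payload = self.buffer.pop(self.next_seq, "<concealed>")
        self.next_seq += 1
        return payload

jb = JitterBuffer()
for seq, payload in [(1, "A"), (3, "C"), (2, "B"), (5, "E")]:  # packet 4 was lost
    jb.insert(seq, payload)

played = [jb.pop_frame() for _ in range(5)]
print(played)  # frames come out in order, with the lost one concealed
```

The caller hears a short synthesized patch instead of a stall when packet 4 never arrives.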

Echo Canceler/Noise Reduction

The Echo Canceler is the echo cancellation module, which effectively removes the echo picked up during capture. For example, during a real-time audio/video call with the phone's speaker on, only my own voice should be sent to the other side in real time, but without echo cancellation the other party's voice coming out of my speaker would also be recorded and sent back to them. As far as I can tell, some phones on the market already provide echo cancellation for recording, and Android exposes an API for it, but in most cases that API does not seem to work, whether because of vendor compatibility issues or because the feature was removed. So if you want cross-platform echo cancellation when recording, you can use this WebRTC feature. (Recording on iOS does apply echo cancellation.)
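The principle behind echo cancellation can be illustrated with a tiny adaptive-filter (LMS) sketch: the canceller learns the echo path from the far-end signal and subtracts its estimate from the microphone signal. Real acoustic echo cancellers are far more sophisticated; the signals, filter length, and "room response" below are made up:

```python
# LMS adaptive echo cancellation sketch (toy signals, stdlib only).
import random

random.seed(0)
N = 4                                   # echo-path filter length (assumed)
true_echo_path = [0.6, 0.3, 0.1, 0.05]  # unknown room response (made up)

far_end = [random.uniform(-1, 1) for _ in range(5000)]  # other party's voice

w = [0.0] * N   # adaptive filter weights, initially zero
mu = 0.05       # LMS step size
errors = []
for n in range(N, len(far_end)):
    x = far_end[n - N:n][::-1]                               # recent far-end samples
    echo = sum(h * xi for h, xi in zip(true_echo_path, x))   # what the mic picks up
    estimate = sum(wi * xi for wi, xi in zip(w, x))          # filter's echo estimate
    e = echo - estimate                                      # residual sent onward
    w = [wi + mu * e * xi for wi, xi in zip(w, x)]           # LMS weight update
    errors.append(abs(e))

# Early on the echo passes through almost untouched; after adaptation
# the residual echo is tiny and the learned weights match the echo path.
print(sum(errors[:100]) / 100, sum(errors[-100:]) / 100)
```

In a real call the "echo" also mixes with my own voice, which is why production cancellers need double-talk detection on top of the adaptive filter.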

Noise Reduction is the noise suppression module; it effectively suppresses various kinds of noise (hiss, fan noise, and so on).

VP8 Codec

VP8 is the eighth generation of On2's video codecs. It delivers higher-quality video with less data and requires less processing power for playback, providing an ideal solution for Internet TV, IPTV, and video-conferencing companies pursuing product and service differentiation.

Its compression ratio and performance are higher than other codecs on the market, and its characteristics are very well suited to real-time communication; it is the default video codec in WebRTC.

VP9 is a free, open-source video codec from Google. It is the successor to VP8 and was originally developed under the working names Next Gen Open Video and VP-Next.

Development of VP9 began in Q3 2011, aiming to reduce VP8's bit rate by 50% while keeping the same quality, and to achieve better coding efficiency than H.265 (High Efficiency Video Coding).

Video Jitter Buffer

Video Jitter Buffer. In real-time video communication it is inevitable that network conditions cause video jitter or video data loss. The Video Jitter Buffer relies on dedicated algorithms to handle such situations effectively, which has a large impact on the quality of live meetings.

Image enhancements

Image Enhancements is the image enhancement module, providing functions such as image brightness detection, color enhancement, and noise reduction.

SRTP

SRTP belongs to the Transport module. Before looking at SRTP, we need to understand RTP.

The Real-time Transport Protocol (RTP) is an end-to-end data transport protocol with real-time characteristics. RTCP and the other protocols we usually mention alongside it are control protocols for RTP.

Unlike HTTP and FTP, which can download an entire media file, RTP sends data over the network in a fixed packet format: the first few bytes of each RTP packet form a header describing, among other things, whether the payload that follows is audio data or video data.
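Concretely, every RTP packet starts with a fixed 12-byte header (defined in RFC 3550). A minimal sketch of packing and reading its first fields, with made-up field values:

```python
# Pack and parse the fixed 12-byte RTP header (RFC 3550) - a minimal sketch.
import struct

def build_rtp_header(payload_type, seq, timestamp, ssrc, marker=0):
    byte0 = 2 << 6                      # version=2, no padding/extension/CSRC
    byte1 = (marker << 7) | payload_type
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

def parse_rtp_header(data):
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": byte1 >> 7,
        "payload_type": byte1 & 0x7F,   # says which codec the payload uses
        "sequence": seq,                # used by the jitter buffer to reorder
        "timestamp": timestamp,         # media clock, for playout timing
        "ssrc": ssrc,                   # identifies the media stream
    }

header = build_rtp_header(payload_type=111, seq=42, timestamp=160, ssrc=0x1234)
print(parse_rtp_header(header))
```

The sequence number and timestamp in this header are exactly what the jitter buffers described above consume.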

RTP does not address transport security; for example, it has no encryption, so it cannot satisfy applications with high security requirements. SRTP was proposed to solve this problem. The Secure Real-time Transport Protocol (SRTP) is a transport protocol that adds a security mechanism on top of RTP: it provides data encryption, message authentication, integrity protection, and replay protection to secure data transmission as much as possible. So the relationship between RTP and SRTP is roughly like that between HTTP and HTTPS.
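SRTP's message-authentication idea can be sketched with Python's stdlib hmac module: append a truncated HMAC-SHA1 tag over the packet (SRTP's default tag is 80 bits). The key and packet bytes below are made up, and real SRTP additionally encrypts the payload with AES in counter mode:

```python
# Sketch of SRTP-style packet authentication: truncated HMAC-SHA1 tag.
import hmac
import hashlib

AUTH_TAG_LEN = 10  # SRTP's default auth tag is 80 bits (10 bytes)

def protect(packet, auth_key):
    tag = hmac.new(auth_key, packet, hashlib.sha1).digest()[:AUTH_TAG_LEN]
    return packet + tag   # real SRTP also encrypts the payload (AES-CTR)

def verify(protected, auth_key):
    packet, tag = protected[:-AUTH_TAG_LEN], protected[-AUTH_TAG_LEN:]
    expected = hmac.new(auth_key, packet, hashlib.sha1).digest()[:AUTH_TAG_LEN]
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: packet tampered or wrong key")
    return packet

key = b"0123456789abcdef0123"                    # made-up auth key
rtp_packet = b"\x80\x6f\x00\x2a" + b"payload"    # made-up RTP bytes
wire = protect(rtp_packet, key)
assert verify(wire, key) == rtp_packet           # round trip succeeds
```

A receiver with the wrong key, or a packet altered in flight, fails verification instead of being played out.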

Multiplexing

Multiplexing means channel multiplexing: multiple data streams share a single channel in order to improve transmission efficiency.

To be honest, I don't yet understand how this multiplexing works, so I'll set it aside for now…
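For reference, one concrete form of this multiplexing is standardized in RFC 5764/7983: RTP, STUN, and DTLS packets arriving on the same UDP port are told apart by the value of their first byte. A sketch of that demultiplexing rule:

```python
# Demultiplexing packets that share one UDP port, by first byte (per RFC 7983).

def classify(packet):
    b = packet[0]
    if 0 <= b <= 3:
        return "STUN"        # STUN messages start with 0..3
    if 20 <= b <= 63:
        return "DTLS"        # DTLS records start with 20..63
    if 128 <= b <= 191:
        return "RTP/RTCP"    # RTP version 2 puts the first byte in 128..191
    return "unknown"

assert classify(b"\x00\x01rest") == "STUN"      # a STUN Binding Request
assert classify(b"\x16\xferest") == "DTLS"      # a DTLS handshake record
assert classify(b"\x80\x6frest") == "RTP/RTCP"  # an RTP packet
```

This is how a single connection can carry signaling checks, key exchange, and media at once.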

P2P STUN+TURN+ICE

WebRTC is a P2P-based communication technology, and STUN, TURN, and ICE are key technologies for implementing P2P.

STUN, TURN, and ICE are collectively known as NAT traversal techniques. In real networks, hosts behind different NATs cannot communicate directly using their internal addresses: for example, 192.168.2.1 in LAN A and 192.168.2.2 in LAN B cannot send messages to each other directly. So to establish a direct communication channel between two different LANs, you have to rely on STUN, TURN, and ICE.

STUN, TURN, and ICE take different approaches to traversal, which cannot be explained in a few words; we will study them in detail with examples later.
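As a small taste before that deeper dive: a STUN message starts with a fixed 20-byte header (RFC 5389) containing a message type, an attribute length, the magic cookie 0x2112A442, and a 96-bit transaction ID. A sketch of building a Binding Request header:

```python
# Build a STUN Binding Request header (RFC 5389) - a minimal sketch.
import os
import struct

STUN_MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001

def build_binding_request(transaction_id=None):
    if transaction_id is None:
        transaction_id = os.urandom(12)   # random 96-bit transaction ID
    assert len(transaction_id) == 12
    # message type (2 bytes), attribute length (2), magic cookie (4), txn id (12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, STUN_MAGIC_COOKIE) + transaction_id

msg = build_binding_request(b"\x01" * 12)
assert len(msg) == 20
assert msg[4:8] == b"\x21\x12\xa4\x42"  # the magic cookie on the wire
```

A client sends this over UDP to a STUN server and reads its public (server-reflexive) address out of the XOR-MAPPED-ADDRESS attribute in the response; that address is what makes hole punching possible.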

Driver layer

This is the part outlined with the light blue dotted line; it consists of Audio Capture/Render, Video Capture, and Network I/O.

These audio/video capture, rendering, and network I/O functions can be implemented by directly calling the relevant APIs each platform provides. As for how the underlying drivers are implemented, I am not clear on that, so I won't cover it here.

Conclusion

WebRTC's excellent layered architecture teaches us at least this: good architectural design is largely divide and conquer. Each layer is a small functional unit; what we need to do is make each layer good enough, and then assemble those well-built layers into a great project.

In fact, WebRTC covers an enormous amount of ground. If every one of its modules were made good enough and optimized well enough, each could even be spun off and run as a dedicated project in its own right. You can imagine how much energy and time it takes to learn and study WebRTC thoroughly.

Above all, WebRTC is an audio/video project. You need to understand not only networking and communication but also audio and video fundamentals, and fully digesting that knowledge takes a lot of hands-on practice. If you want to know not just how things work but why, you will need strong self-discipline and the endurance to keep learning.

References

“Fix WebRTC Audio and Video Live Communication Technology (Core Technology Introduction)”

Follow me and let's make progress together; life is more than coding!