WebRTC solves the problem of real-time audio and video transmission, and aims to provide efficient and convenient real-time streaming media transmission that is free of installation, plug-ins, and patent fees.
1. Comparison of three real-time streaming technologies
At present, there are three main ways to implement real-time streaming media: WebRTC, HLS, and RTMP. When you look at live-streaming websites, many of them use HLS (HTTP Live Streaming), a technology that splits the stream into separate small files and requests different files according to the playback time. The HLS files are demultiplexed and the audio and video data is then fed to a video element for playback (Safari and Android Chrome can play HLS directly). Its advantages are as follows: it uses the traditional HTTP protocol, so its compatibility and stability are very good, and the server can upload HLS files to a CDN so as to cope with live broadcasts with millions of viewers. The disadvantage is that the delay is relatively long, usually more than 10s, so it suits scenes with no interaction between the audience and the anchor. Because an HLS segment is usually more than 10 seconds long, this, combined with the time it takes to generate the file, leads to a significant delay.
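To make the segmenting concrete, here is a minimal, hypothetical HLS playlist (.m3u8); the player fetches it over HTTP and then downloads the listed segment files one by one (the segment names are made up for illustration):
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
segment0.ts
#EXTINF:10.0,
segment1.ts
#EXT-X-ENDLIST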
HLS is a standard introduced by Apple, while RTMP was introduced by Adobe. RTMP is a complete streaming media transfer protocol using the FLV video container, which browsers do not support natively (it requires the Flash plug-in), although it can be played with WebSocket + MSE; related libraries are relatively few. It tends to be used more for live streaming in native Android/iOS clients. RTMP receives an uninterrupted stream of data over a long-lived connection, so its latency is much lower than HLS, which is acceptable if there is voice or video interaction between the viewer and the host.
The third, WebRTC (Web Real-Time Communication), was launched by Google in 2012 and has been in development for six years now. WebRTC 1.0 was finalized in March 2018 and is supported by all major browsers including Safari (Edge has ORTC). WebRTC is dedicated to efficient real-time audio and video communication, with lower latency and smaller buffering than RTMP. The official project also provides matching native Android/iOS libraries, though the actual implementation may be a WebView that starts WebRTC and then renders the data to the native layer.
Let’s start with the composition of WebRTC.
2. Composition of WebRTC
WebRTC consists of three blocks, as shown below:
(1) getUserMedia is responsible for obtaining the user's local multimedia data, such as turning on the camera to capture video.
(2) RTCPeerConnection is responsible for establishing P2P connection and transmitting multimedia data.
(3) RTCDataChannel provides a data channel for transmitting arbitrary application data. Such a channel is an important element for realizing real-time interaction, for example in games.
3. getUserMedia
getUserMedia is responsible for retrieving the user's local multimedia data, including microphone recording, camera video capture, and screen recording. I have used this API in "How to Achieve the Front-end Recording Function", recording with WebRTC's getUserMedia. Turning on the camera to record video is similar, and the method is very simple, as shown in the following code:
window.navigator.mediaDevices
  .getUserMedia({video: true})
  .then(mediaStream => {
    // Render the stream in a video element
    $('video')[0].srcObject = mediaStream;
  });
If you want to record the screen (screen sharing), you can change the media parameters. The following code changes the source from the default camera to the screen:
navigator.mediaDevices
  .getUserMedia({video: {mediaSource: 'screen'}})
  .then(stream => {
    videoElement.srcObject = stream;
  });
A box will pop up asking which application window to record, as shown below:
For example, you can choose the PPT application and then start your presentation. This is currently only supported by Firefox; Edge has a similar API called getDisplayMedia, and Chrome's implementation is still in development, although it can be enabled with an official browser extension, so you can see this demo.
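For reference, a minimal sketch of screen capture with the getDisplayMedia API mentioned above, in the form that was later standardized on navigator.mediaDevices (videoElement is assumed to be an existing video element):
navigator.mediaDevices.getDisplayMedia({video: true})
  .then(stream => {
    // The browser prompts the user to pick a screen or window to share
    videoElement.srcObject = stream;
  })
  .catch(err => console.error('Screen capture failed:', err));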
getUserMedia returns the stream object mediaStream. The stream can be rendered locally and also sent to the peer via RTCPeerConnection.
4. RTCPeerConnection
RTCPeerConnection does a lot of work to enable point-to-point connections between clients (so data does not need to be forwarded by the server). The first issue that needs to be addressed is NAT traversal.
(1) NAT hole punching
To establish a connection, you need to know the other party's IP address and port number. On a LAN, a router may connect many devices. For example, when a home router connects to a broadband network, the broadband service provider assigns one public IP address that is shared by all devices connected to the router. If two devices use the same local port number to create a socket connecting to the same service, this would cause a conflict because the external IP address is the same. Therefore, the router rewrites the IP address/port number to differentiate them, as shown in the following figure:
When two devices connect using the same port number, the router translates them into different ports: the external network sees the same IP address but different port numbers. When the server sends data back to those two port numbers, the router forwards it to the corresponding host according to its address translation mapping table.
Therefore, if you listen on local port 55020 but the external port number is something else, the other party cannot connect to you using port 55020. There are two solutions. The first is to set up port mapping on the router, as shown in the picture below:
The configuration above is to forward all packets destined for port 8123 to 192.168.123.20.
But we can't ask every user to configure their router, so hole punching is used instead. The basic method is for one party to first establish a connection with a server; the router then builds a mapping between the internal port number and an external port number and saves it. Suppose the application listens on port 55020 and the router maps it to external port 1091: now packets from the outside sent to port 1091 can reach the application on that computer. A hole has been punched. The server then tells the peer the IP address plus the punched port 1091, and the peer connects through that hole. This is the principle of P2P hole punching. It originally came from online games, which often need direct connections, and WebRTC standardized NAT hole punching.
Whether this works depends on the user's network topology. If the router's mapping depends not only on the internal IP + port number but also on the destination server's IP and port number, the hole cannot be used, because the hole punched toward one server cannot be reused by another external application (the router will establish a different mapping for it). Conversely, if the address mapping table depends only on the IP and port number of the machine on the intranet, then hole punching works. For the cases where it fails, WebRTC also provides a solution: using a server to relay the multimedia data.
The hole-punching mechanism is called ICE (Interactive Connectivity Establishment), the server that assists hole punching is called STUN, and the server that relays multimedia data is called TURN. Google provides a public STUN server, although on my home network it could only obtain the LAN address:
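As a concrete illustration, here is a minimal sketch of passing Google's public STUN server to an RTCPeerConnection (the address is the well-known public one; whether it yields a usable public candidate depends on your NAT type):
const pc = new RTCPeerConnection({
  iceServers: [
    // Google's public STUN server, used to discover our external address
    {urls: 'stun:stun.l.google.com:19302'}
  ]
});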
(2) Establish P2P connections
For this purpose, I have written a demo. You can open this link to try P2P chat (using two tabs or two computers); the result is shown below:
In addition to the ICE servers used for hole punching, a WebSocket service is required for the two sides of the connection to exchange information. So we need to write a WebSocket service. I wrote a simple one with Node.js; the code has been uploaded to GitHub: WebrtC-server-client-demo, including the browser code.
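For orientation, here is a minimal sketch of such a signaling service, assuming the ws npm package (this is an illustrative skeleton, not the demo's actual code); it simply relays every message to the other connected clients:
const WebSocket = require('ws');
const wss = new WebSocket.Server({port: 8080});

wss.on('connection', ws => {
  ws.on('message', message => {
    // Relay each signaling message (offer/answer/candidate) to the other peers
    wss.clients.forEach(client => {
      if (client !== ws && client.readyState === WebSocket.OPEN) {
        client.send(message.toString());
      }
    });
  });
});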
This process is shown below:
First, open the camera to obtain the local mediaStream and add it to the RTCPeerConnection object. Then create a local offer, which mainly describes the local machine's network and media information. Its format is SDP, as follows:
v=0
o=- 4809135116782128887 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
a=msid-semantic: WMS 6ReMVBFmnh4JhjzqjNO2AVBc26Ktg0R5jCFB
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
...
Then send the offer to the other party through the WebSocket service. After receiving the offer, the other party creates an answer, which has the same format and function as the offer, and sends it back to inform the caller of the called party's information. When either party receives the other's SDP, it calls setRemoteDescription to record it. After setRemoteDescription, the ICE server generates candidates, which are sent to the other side through the signaling channel and used to initiate the connection. Once the connection succeeds, the onaddstream event is triggered; draw the event.stream from that event onto a video element to get the other party's image.
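To tie the steps together, here is a sketch of the caller side using the legacy but then-current addStream/onaddstream API; signaling stands for the WebSocket wrapper and remoteVideo for a video element, both hypothetical names:
const pc = new RTCPeerConnection({
  iceServers: [{urls: 'stun:stun.l.google.com:19302'}]
});

// Send each ICE candidate to the other side through the signaling channel
pc.onicecandidate = event => {
  if (event.candidate) {
    signaling.send(JSON.stringify({type: 'candidate', candidate: event.candidate}));
  }
};

// Fired once the connection succeeds and remote media arrives
pc.onaddstream = event => {
  remoteVideo.srcObject = event.stream;
};

navigator.mediaDevices.getUserMedia({video: true, audio: true})
  .then(stream => {
    pc.addStream(stream); // Add the local stream before creating the offer
    return pc.createOffer();
  })
  .then(offer => pc.setLocalDescription(offer))
  .then(() => {
    // Deliver the offer (SDP) to the other party via WebSocket
    signaling.send(JSON.stringify({type: 'offer', sdp: pc.localDescription}));
  });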
That’s the whole connection.
If the connection is successful, multimedia data transmission begins, and here WebRTC does a lot of work.
(3) WebRTC P2P transmission
The overall architecture of WebRTC is shown in the figure below (see the official website):
The main work includes:
(1) Audio and video codec (VP8/VP9/AV1)
(2) Anti-packet loss and congestion control
(3) Echo and noise cancellation
This is where WebRTC plays a big role: it provides reliable transmission, high-quality codecs, and echo cancellation. The author once used a library called H323 Plus for a project, which was also a P2P connection. Now this real-time multimedia transmission capability is embedded directly in the browser, which greatly improves development efficiency for developers.
In actual online projects, since P2P connectivity and stability are not very reliable, a P2SP architecture is mostly adopted, where S stands for Server, as shown in the following figure:
On the one hand, this can improve stability; on the other hand, it can solve one-to-many and many-to-many video chat. WebRTC is more suitable for one-to-one: in a one-to-many scenario, streaming from one user to several users may run into performance and upload bandwidth issues.
A compatibility fallback can also be implemented: when P2P does not work, switch to P2SP, as in the sketch below.
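A minimal sketch of such a fallback, assuming switchToRelay() is the application's own (hypothetical) function that renegotiates through a media server:
// Watch the ICE connection state and fall back when P2P fails
pc.oniceconnectionstatechange = () => {
  if (pc.iceConnectionState === 'failed') {
    switchToRelay(); // Hypothetical: renegotiate via the server (P2SP)
  }
};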
RTCDataChannel is not discussed here, as WebSocket is used more often in actual scenarios.
5. WebRTC’s future
WebRTC has been published as a 1.0 standard by the W3C, but it is not yet an RFC standard. WebRTC is still evolving; recent developments include:
(1) Chrome 69 uses a new echo cancellation algorithm called AEC3
(2) VP9 encoding has been improved by 35%, and the new AV1 codec can be used in Chrome
(3) Richer manipulation APIs are included, such as RTCRtpSender
In the future, WebRTC will provide more features:
(1) The ability to directly manipulate media stream data (currently only possible indirectly, via captureStream)
(2) The ability to customize codec parameters via RTCRtpEncodingParameters (Chrome 70); see the sketch after this list.
And so on.
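As an illustration of the RTCRtpSender-style APIs mentioned above, here is a sketch of capping the outgoing video bitrate through encoding parameters (browser support varies; pc is assumed to be an established RTCPeerConnection):
const sender = pc.getSenders().find(s => s.track && s.track.kind === 'video');
const params = sender.getParameters();
if (!params.encodings) params.encodings = [{}];
params.encodings[0].maxBitrate = 500 * 1000; // Cap video at about 500 kbps
sender.setParameters(params); // Returns a promise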
We believe WebRTC has a bright future.
[Renren is hiring middle and senior front-end engineers]
1. Project background: We are developing an enterprise-level overseas SaaS CRM (customer management system) product. The front-end technical challenges are great, such as enabling customers to make Internet phone calls (directly dialing mobile phones) from our website, send emails, and automatically process business according to user scenarios, etc.
2. Technology stack: Popular frameworks such as Vue and Vuex are adopted; communication uses WebRTC; message distribution uses Google's FCM and Apple's APNs. The service is deployed on Amazon or Google Cloud, serving global customers.
3. In addition, because the product serves enterprise-level users, it has high requirements in aspects such as performance, security, and multi-task processing. Therefore, candidates need strong technical skills. If you are particularly interested in technology, our vacancy provides a good opportunity to develop your talents and grow.
Please send your resume to [email protected]