What is WebRTC
WebRTC is a free, open project that provides real-time communication (RTC) functionality for browsers and mobile applications through a simple API. WebRTC components have been optimized to best meet this purpose.
First contact
The official website is a good reference for learning WebRTC. The WebRTC videos and slides from Google I/O 2013 also give a useful overview. To make the rest of this article easier to follow, we first need to understand some of the terms involved.
Basic WebRTC terms for real-time communication
-
NAT Network Address Translation (NAT) is performed by a box (physical or virtual) that connects our local private network to the public Internet. It can be thought of as a firewall, gateway, or router that maintains a mapping table between LAN IP addresses and port numbers on one side and public IP addresses and port numbers on the other. When a mobile phone uses Wi-Fi, its IP address is generally a LAN address such as 192.168.x.x. In this case the phone is said to be behind a NAT rather than directly owning a public address. NAT mainly translates the internal IP address (LAN address) into the corresponding public IP address. Strictly speaking, it is a Network Address and Port Translator, because most of the time NAT devices change both the transport address and the transport port number. Related reading: How do I determine whether I am behind a symmetric NAT?
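To make the mapping table concrete, here is a minimal sketch of how a NAT might translate private endpoints to public ones. The class name `NatTable`, the addresses, and the starting port are all made up for illustration; real NAT devices differ in how they allocate and expire mappings.

```typescript
// Sketch of a NAT (address and port translation) mapping table.
// All names, addresses, and ports here are illustrative assumptions.
interface Endpoint {
  ip: string;
  port: number;
}

class NatTable {
  private readonly publicIp: string;
  private nextPort: number;
  private readonly outbound = new Map<string, number>(); // "ip:port" -> public port
  private readonly inbound = new Map<number, Endpoint>(); // public port -> private endpoint

  constructor(publicIp: string, firstPort = 40000) {
    this.publicIp = publicIp;
    this.nextPort = firstPort;
  }

  // Translate the private source endpoint of an outgoing packet to a public one.
  translateOutbound(src: Endpoint): Endpoint {
    const key = `${src.ip}:${src.port}`;
    let port = this.outbound.get(key);
    if (port === undefined) {
      port = this.nextPort++;
      this.outbound.set(key, port);
      this.inbound.set(port, src);
    }
    return { ip: this.publicIp, port };
  }

  // Look up the private endpoint for a packet arriving on a public port.
  translateInbound(publicPort: number): Endpoint | undefined {
    return this.inbound.get(publicPort);
  }
}

const nat = new NatTable("203.0.113.7");
const phone: Endpoint = { ip: "192.168.1.10", port: 5000 };
const mapped = nat.translateOutbound(phone);
```

The phone only ever sees its LAN address; the outside world only sees `mapped`. This is why a terminal behind a NAT needs outside help to learn its public address.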
-
ICE Interactive Connectivity Establishment (ICE) is a framework that allows real-time peers to discover and connect to each other. WebRTC is designed to create end-to-end connections between web browsers. Because there may be multiple layers of firewalls and NAT devices between the two ends, we need a mechanism to collect the publicly reachable IP addresses of each end; the series of technologies used for this is collectively called ICE. ICE allows terminals located behind certain types of NAT router to establish direct connections. A terminal behind a NAT (on a LAN) does not know its own public IP address, so the first problem in establishing a connection is to find it.
1) If a STUN server is configured, the ICE agent queries the external STUN server to obtain the public IP address and port of the local end.
2) If a TURN server is configured, ICE also treats the TURN server as a candidate, and when the direct end-to-end connection fails, the data is forwarded through that relay.
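In the browser, STUN and TURN servers are handed to the ICE agent through the configuration object passed to `RTCPeerConnection`. The sketch below builds such an object using local types that mirror part of the browser's `RTCConfiguration`, so that it also runs outside a browser. The server URLs and credentials are hypothetical placeholders, and `serverKinds` is a helper invented for this example.

```typescript
// Local types mirroring a subset of the browser's RTCConfiguration shape.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}
interface IceConfig {
  iceServers: IceServer[];
  iceTransportPolicy?: "all" | "relay";
}

// Hypothetical servers: replace with your own STUN/TURN deployment.
const iceConfig: IceConfig = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    { urls: "turn:turn.example.org:3478", username: "demo-user", credential: "demo-pass" },
  ],
  iceTransportPolicy: "all", // "relay" would restrict ICE to TURN relay candidates
};

// Illustrative helper: list the kinds of servers configured, by URL scheme.
function serverKinds(config: IceConfig): string[] {
  const kinds = new Set<string>();
  for (const server of config.iceServers) {
    const urls = Array.isArray(server.urls) ? server.urls : [server.urls];
    for (const url of urls) {
      const colon = url.indexOf(":");
      kinds.add(colon < 0 ? url : url.slice(0, colon));
    }
  }
  return Array.from(kinds).sort();
}

const kinds = serverKinds(iceConfig);
```

In a browser, an object of this shape would be passed as `new RTCPeerConnection(iceConfig)`.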
-
STUN Server Session Traversal Utilities for NAT (STUN). A STUN server is a server on the public network, with a public IP address, that is used for NAT traversal. A terminal behind a NAT can query the STUN server to learn its own public-facing candidate address. As mentioned earlier, a NAT translates the internal IP address used by the terminal into the corresponding public IP address. When the terminal behind the NAT sends a message (a STUN test packet) to the STUN server, the message passes through the NAT. The STUN server on the public network therefore sees the translated network information, such as the NAT's public IP address and port (which becomes a potential candidate address), and sends it back to the terminal. In this way the terminal discovers its own public IP address.
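For a feel of what the STUN test packet looks like on the wire, the sketch below builds the 20-byte header of a STUN Binding Request, following the header layout in RFC 5389. It is only an illustration: the function name and the fixed transaction ID are assumptions, nothing is sent over the network, and the response (which carries the public address) is not parsed. In WebRTC the browser's ICE agent does all of this for you.

```typescript
// Build the 20-byte header of a STUN Binding Request (RFC 5389 layout).
// buildBindingRequest is an illustrative name, not a browser API.
function buildBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) {
    throw new Error("transaction ID must be 12 bytes");
  }
  const packet = new Uint8Array(20);
  const view = new DataView(packet.buffer);
  view.setUint16(0, 0x0001);     // message type: Binding Request
  view.setUint16(2, 0);          // message length: no attributes follow
  view.setUint32(4, 0x2112a442); // magic cookie
  packet.set(transactionId, 8);  // 96-bit transaction ID
  return packet;
}

// A real client would use random bytes; a fixed ID keeps the example reproducible.
const transactionId = new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]);
const request = buildBindingRequest(transactionId);
```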
-
TURN Traversal Using Relays around NAT (TURN). A TURN server helps traverse NAT by relaying traffic. When two terminals want to set up audio and video communication but the network environment is complex, for example both are behind NATs and a direct connection can never be established, a terminal can query the TURN server to obtain a relay address (a public IP address). The TURN server then forwards packets received from the terminal to the other end, and forwards packets received from the other end back to the terminal, for example audio and video streams. Note that a TURN server can be relied on when two peers cannot establish a peer-to-peer media session (connection) simply because of their NAT types.
-
Candidate address The IP address and port pairs at which a terminal can be reached over the network are collected as candidate addresses for each end that will establish a session. A candidate address is a string consisting of an IP address, port number, priority, and network type. Each terminal may have multiple candidate addresses because of its different network environments. For example, a mobile phone may have both a 4G network address and a LAN address allocated over Wi-Fi.
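Since a candidate is just a formatted string, it can be pulled apart with a small parser. The following is a rough sketch rather than a complete implementation: the sample line uses documentation-range addresses, `parseCandidate` and `IceCandidateInfo` are names chosen for this example, and optional trailing fields such as `raddr` and `rport` are ignored.

```typescript
// Fields of interest in an ICE candidate line (illustrative type).
interface IceCandidateInfo {
  foundation: string;
  component: number;
  protocol: string;
  priority: number;
  ip: string;
  port: number;
  type: string; // e.g. "host", "srflx" or "relay"
}

// Parse the leading fields of a candidate line; returns null if it does not match.
function parseCandidate(line: string): IceCandidateInfo | null {
  const m = /^candidate:(\S+) (\d+) (\S+) (\d+) (\S+) (\d+) typ (\S+)/.exec(line);
  if (!m) return null;
  const [, foundation = "", component = "0", protocol = "", priority = "0", ip = "", port = "0", type = ""] = m;
  return {
    foundation,
    component: Number(component),
    protocol,
    priority: Number(priority),
    ip,
    port: Number(port),
    type,
  };
}

// Made-up sample: a server-reflexive candidate discovered through STUN.
const sampleCandidate =
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.10 rport 46154";
const candidateInfo = parseCandidate(sampleCandidate);
```

Here the type `srflx` marks an address learned from a STUN server, while `host` would be a local interface address and `relay` an address allocated on a TURN server.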
-
OFFER/ANSWER negotiation After the reachable candidate addresses of the terminals are collected, the two parties must negotiate in order to establish a media session. The negotiation determines a set of common features for the session. The negotiation method used in WebRTC is called offer/answer. One party initiates a media session by creating a description of the type of media session to be established; this is called making an offer. When the other party receives the offer and responds, this is called answering. WebRTC uses the RTCSessionDescription object to represent offers and answers, and the description itself is written in SDP.
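The offer/answer exchange can be viewed as a small state machine on each side. The sketch below is a deliberately simplified model: it uses three of the signaling state names exposed by `RTCPeerConnection.signalingState` and leaves out provisional answers, rollback, and the closed state. `nextState` is a hypothetical helper, not a browser API.

```typescript
// Simplified model of offer/answer signaling states (subset of the real ones).
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";
type DescriptionType = "offer" | "answer";
type Side = "local" | "remote";

// Return the state reached after applying a local or remote description.
function nextState(state: SignalingState, side: Side, type: DescriptionType): SignalingState {
  if (state === "stable" && type === "offer") {
    return side === "local" ? "have-local-offer" : "have-remote-offer";
  }
  if (state === "have-local-offer" && side === "remote" && type === "answer") {
    return "stable";
  }
  if (state === "have-remote-offer" && side === "local" && type === "answer") {
    return "stable";
  }
  throw new Error(`cannot apply ${side} ${type} in state ${state}`);
}

// Caller: creates an offer, then receives the answer.
const callerAfterOffer = nextState("stable", "local", "offer");
const callerAfterAnswer = nextState(callerAfterOffer, "remote", "answer");

// Callee: receives the offer, then creates an answer.
const calleeAfterOffer = nextState("stable", "remote", "offer");
const calleeAfterAnswer = nextState(calleeAfterOffer, "local", "answer");
```

Both sides end in the stable state once the offer has been answered, which is the point at which the session parameters are agreed.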
-
SDP Session Description Protocol (SDP) describes WebRTC sessions: both the media information and the transport information. In the WebRTC API, the corresponding object is RTCSessionDescription. SDP describes the parameters of an end-to-end connection as a set of data in a specific format. It does not contain any of the media itself; it only describes the session as a set of connection properties: the types of media to be exchanged (audio, video, and application data), the network transport protocol, the codecs used and their settings, bandwidth, and other metadata. SDP is therefore mainly used for fine-grained control of media sessions.
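SDP is plain text made of `key=value` lines, so the media types in a description can be read from its `m=` lines. The sketch below extracts them from a shortened, hand-written sample; real browser-generated SDP is much longer, and `parseMediaSections` is a name made up for this example.

```typescript
// One "m=" line of an SDP description (illustrative type).
interface MediaSection {
  kind: string;           // "audio", "video" or "application"
  port: number;
  protocol: string;       // transport protocol, e.g. "UDP/TLS/RTP/SAVPF"
  payloadTypes: string[]; // format identifiers listed on the m= line
}

// Collect the media sections declared in an SDP string.
function parseMediaSections(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  for (const line of sdp.split(/\r?\n/)) {
    const m = /^m=(\S+) (\d+) (\S+) (.*)$/.exec(line);
    if (!m) continue;
    const [, kind = "", port = "0", protocol = "", formats = ""] = m;
    sections.push({ kind, port: Number(port), protocol, payloadTypes: formats.split(" ") });
  }
  return sections;
}

// Shortened, hand-written sample for illustration only.
const sampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

const mediaSections = parseMediaSections(sampleSdp);
```

The `a=rtpmap` lines are where each payload type number is tied to a codec and its clock rate.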
-
RTP and SRTP Real-time Transport Protocol (RTP) is a protocol for transporting real-time data. Secure RTP (SRTP) is its secure variant. SRTP is used to transmit audio and video media packets between WebRTC clients. A media packet contains digitized audio or video frames generated by a microphone, camera, or application, which are played back through a speaker or display. SRTP carries the information needed to transmit the media: the codec (the encoder/decoder used to sample and compress the audio or video), the media source (the synchronization source, or SSRC, which identifies the sender of the packet), a timestamp (used to determine playout time), a sequence number (used to detect lost packets), and other information needed for playback. Non-audio, non-video data is not sent over SRTP; instead the RTCDataChannel API is used, which opens a data channel between browsers to exchange data in any format.
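The fields listed above live in the fixed 12-byte RTP header, which SRTP leaves unencrypted. As a rough illustration, the sketch below decodes that header from raw bytes. The sample packet is hand-made, `parseRtpHeader` is a hypothetical helper, and header extensions and CSRC lists are ignored.

```typescript
// Fields of the fixed RTP header (illustrative type).
interface RtpHeader {
  version: number;
  marker: boolean;
  payloadType: number;    // maps to a codec through the SDP rtpmap lines
  sequenceNumber: number; // used to detect lost or reordered packets
  timestamp: number;      // used to schedule playout
  ssrc: number;           // identifies the sending source
}

// Decode the first 12 bytes of an RTP packet (network byte order).
function parseRtpHeader(packet: Uint8Array): RtpHeader {
  if (packet.length < 12) {
    throw new Error("RTP header needs at least 12 bytes");
  }
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  const first = view.getUint8(0);
  const second = view.getUint8(1);
  return {
    version: first >> 6,
    marker: (second & 0x80) !== 0,
    payloadType: second & 0x7f,
    sequenceNumber: view.getUint16(2),
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
  };
}

// Hand-made sample packet header.
const samplePacket = new Uint8Array([
  0x80, 0x6f,             // version 2, payload type 111
  0x00, 0x2a,             // sequence number 42
  0x00, 0x00, 0x03, 0xc0, // timestamp 960
  0x12, 0x34, 0x56, 0x78, // SSRC 0x12345678
]);
const rtpHeader = parseRtpHeader(samplePacket);
```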
-
SSRC A media stream, such as an audio stream or a video stream, can be called a sending source. Each source has a unique identifier called an SSRC (synchronization source identifier). For example, a client can receive multiple video streams at the same time, and it uses the SSRC to tell which video source each packet belongs to. The packet is then decoded and rendered in the correct window.
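The demultiplexing step can be pictured as grouping incoming packets by their SSRC. This is only a toy model with made-up packet values; `MediaPacket` and `groupBySsrc` are names invented for the example, and a real receiver would hand each group to its own decoder.

```typescript
// Toy packet: only the fields needed for demultiplexing.
interface MediaPacket {
  ssrc: number;
  sequenceNumber: number;
}

// Group packets by sending source so that each stream can be decoded separately.
function groupBySsrc(packets: MediaPacket[]): Map<number, MediaPacket[]> {
  const streams = new Map<number, MediaPacket[]>();
  for (const packet of packets) {
    const stream = streams.get(packet.ssrc) ?? [];
    stream.push(packet);
    streams.set(packet.ssrc, stream);
  }
  return streams;
}

// Two interleaved video sources (values are made up).
const streamsBySsrc = groupBySsrc([
  { ssrc: 1001, sequenceNumber: 1 },
  { ssrc: 2002, sequenceNumber: 1 },
  { ssrc: 1001, sequenceNumber: 2 },
]);
```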
-
Media track A media track (MediaStreamTrack) is the basic media unit in WebRTC. A track represents a single type of media produced by a device or by recorded content (called a “source”). Each track has a source associated with it. The source cannot be accessed or controlled directly through WebRTC; all control of the source goes through the track. A track can carry not only the source’s original media but also a transformed version provided by the browser. A MediaStream is a collection of MediaStreamTrack objects. There are two ways to obtain MediaStream objects: one is to request access to local media, or to copy tracks from an existing MediaStream; the other is to receive new streams over a peer connection. The method for requesting access to local media is getUserMedia().
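`getUserMedia()` takes a constraints object describing which kinds of tracks are wanted. The sketch below builds such an object with local types that mirror a small subset of the browser's `MediaStreamConstraints`, so that it runs outside a browser as well. `buildConstraints` and the chosen resolution are assumptions made for illustration.

```typescript
// Local types mirroring a small subset of the browser's MediaStreamConstraints.
interface VideoConstraints {
  width?: { ideal: number };
  height?: { ideal: number };
}
interface StreamConstraints {
  audio: boolean;
  video: boolean | VideoConstraints;
}

// Illustrative helper: request audio and, optionally, video near a given size.
function buildConstraints(
  wantAudio: boolean,
  videoSize?: { width: number; height: number },
): StreamConstraints {
  return {
    audio: wantAudio,
    video: videoSize
      ? { width: { ideal: videoSize.width }, height: { ideal: videoSize.height } }
      : false,
  };
}

const constraints = buildConstraints(true, { width: 1280, height: 720 });

// In a browser, the object would be used roughly like this:
//   const stream = await navigator.mediaDevices.getUserMedia(constraints);
//   const tracks = stream.getTracks(); // one MediaStreamTrack per audio/video source
```

The `ideal` values are hints: the browser picks the closest mode the camera supports rather than failing when the exact size is unavailable.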
-
Peer connection in WebRTC In WebRTC terms, a peer connection refers to a direct media connection between two terminals.
-
Signaling In WebRTC, signaling is not standardized; that is, WebRTC does not provide a way to exchange SDP, so developers can choose their own signaling protocol (such as SIP or Jingle) to transport it.
Main functions of signaling
1) Negotiate media capabilities and settings. This is the main function of signaling. The information contained in Session Description Protocol (SDP) objects is exchanged between the two browsers (terminals) that participate in a peer connection. This includes the types of media (audio, video, data), the codecs used (Opus, G.711, etc.), the individual parameters or settings of each codec, and bandwidth information. Candidate addresses are also exchanged (optionally), as is keying material for SRTP.
2) Identify and authenticate the session participants. When a standard signaling protocol (such as SIP or Jingle) is used to initiate real-time communication, the signaling channel provides the participants’ identities and can optionally authenticate them.
3) Control the media session: indicate progress, change the session, and terminate the session. In WebRTC, signaling is needed to initiate or change a media session, but not to indicate status or end the session, because the browser’s ICE state machine can provide that information. For example, while candidate addresses are being checked, the ICE state machine reports the progress of session setup. After a session is established, if the ICE continued-consent checks keep failing, the session is terminated.
4) Resolve glare. Glare occurs when both sides of a communication session attempt to establish or change a session at the same time, which can leave the session in an indeterminate state. Signaling protocols such as SIP have built-in glare resolution. If these solutions are adopted in WebRTC, the handling required for glare is greatly reduced.
Signaling transports
1) HTTP/HTTPS transport. Terminals can initiate new HTTP requests to send signaling messages to the server and receive them from it. Signaling messages can be carried in GET or POST requests or in their responses.
2) WebSocket transport. WebSocket transport allows an endpoint to open a two-way connection to the server. The connection starts as an HTTP request, which is then upgraded to a WebSocket.
3) Data channel transport. The data channel API is modeled on WebSocket, with a simple, configurable send method and an onmessage handler.
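When the signaling protocol has no built-in answer for both sides trying to establish or change a session at the same time (often called glare), the application has to resolve it. One commonly described approach, known as the "perfect negotiation" pattern, is not taken from this article: it assigns one peer a polite role and the other an impolite role. The sketch below captures only the decision rule; `onIncomingOffer` and the action names are assumptions made for this example.

```typescript
// What to do with an offer that arrives from the other peer.
type GlareAction = "accept" | "rollback-then-accept" | "ignore";

// polite: role assigned to this peer in advance (exactly one peer is polite).
// makingOffer: true while this peer is in the middle of creating its own offer.
// signalingState: this peer's current signaling state, e.g. "stable".
function onIncomingOffer(polite: boolean, makingOffer: boolean, signalingState: string): GlareAction {
  const collision = makingOffer || signalingState !== "stable";
  if (!collision) return "accept";
  // On collision the polite peer abandons its own offer; the impolite peer keeps its own.
  return polite ? "rollback-then-accept" : "ignore";
}

const noCollision = onIncomingOffer(true, false, "stable");
const politeCollision = onIncomingOffer(true, true, "have-local-offer");
const impoliteCollision = onIncomingOffer(false, true, "have-local-offer");
```

Because the two peers hold opposite roles, exactly one offer survives a collision and the session leaves the indeterminate state.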
-
Signaling server The signaling server is an important component of WebRTC communication. It is mainly responsible for exchanging the candidates and the media description (SDP) of the two terminals.
When two terminals are behind NATs, neither knows the address of the other, so they cannot connect to each other. Neither terminal knows which media types the other supports, or whether it supports certain network protocols. A server on the public network (the signaling server) is therefore required to exchange this information between the two parties. A signaling server can also act as a room manager: it can keep a long-lived connection with both clients throughout the call to notify them of room status, such as someone leaving or the meeting ending. The signaling server only sends commands and forwards connection and media information; it cannot be used as a TURN server. The TURN server forwards audio and video packets when the two parties cannot establish a direct connection. The TURN server is optional, whereas a signaling server is required.
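The forwarding and room-manager roles can be sketched without any network code. The in-memory `Room` below is a made-up illustration of the relay logic only: the message shapes, member IDs, and class name are all assumptions, and a real server would deliver messages over WebSocket or HTTP instead of calling callbacks.

```typescript
// Hypothetical signaling messages exchanged through the server.
type SignalMessage =
  | { type: "offer" | "answer"; sdp: string }
  | { type: "candidate"; candidate: string }
  | { type: "leave" };

type Deliver = (from: string, message: SignalMessage) => void;

class Room {
  private readonly members = new Map<string, Deliver>();

  join(id: string, deliver: Deliver): void {
    this.members.set(id, deliver);
  }

  // Remove a member and tell everyone else that they left.
  leave(id: string): void {
    this.members.delete(id);
    this.broadcast(id, { type: "leave" });
  }

  // Forward a message from one member to every other member of the room.
  broadcast(from: string, message: SignalMessage): void {
    this.members.forEach((deliver, id) => {
      if (id !== from) deliver(from, message);
    });
  }
}

const room = new Room();
const aliceInbox: string[] = [];
const bobInbox: string[] = [];
room.join("alice", (from, message) => { aliceInbox.push(`${from}:${message.type}`); });
room.join("bob", (from, message) => { bobInbox.push(`${from}:${message.type}`); });

room.broadcast("alice", { type: "offer", sdp: "<offer SDP>" });
room.broadcast("bob", { type: "answer", sdp: "<answer SDP>" });
room.leave("bob");
```

Note that the room never inspects the SDP or candidates it forwards, and no media passes through it; relaying media is the TURN server's job.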
-
Comparison of WebRTC signaling schemes
Scheme | Server requirements | Advantages |
---|---|---|
WebSocket proxy | A WebSocket server plus server code | No signaling infrastructure is required |
XMLHttpRequest | A web server plus server code | No signaling infrastructure is required |
SIP | A SIP registrar/proxy server that supports the SIP WebSocket transport | Easy interoperation with SIP terminals or infrastructure; no server code needed |
Jingle | An XMPP server that supports the XMPP WebSocket transport | Easy interoperation with Jingle terminals or infrastructure; no server code needed |
Data channel | A WebSocket or web server used to establish the data channel | Low signaling latency; signaling privacy is protected |
With these basic concepts in place, we can compare WebRTC with existing signaling protocols.
-
Differences between WebRTC, SIP, and XMPP (Jingle)
1) WebRTC provides a set of APIs for real-time communication. It ultimately produces Session Description Protocol (SDP) lines describing the media session between terminals, but it gives you no way to exchange that SDP with the other party. We can therefore use an existing signaling channel, or implement our own, for the SDP exchange.
2) SIP is an existing standardized signaling protocol that has long been used for telephony. SIP deals directly with SDP, but in practice it has some limitations: one is that it is almost impossible to figure out whether another endpoint can connect back in, and another is that embedding SIP in a web browser is tricky.
3) XMPP (Extensible Messaging and Presence Protocol) is commonly used for instant messaging and presence. Jingle is a signaling protocol that extends XMPP with media signaling capabilities for voice and video. With Jingle, SDP session descriptions can be mapped to XML and transmitted to an XMPP server over TCP or TLS. Google (still) uses it for Hangouts, meetings, etc. The big advantage of XMPP is that it can run over plain TCP as well as WebSockets, and is very easy to tunnel over HTTP. All of this is standardized, so you can choose from multiple servers for deployment.
4) In summary, WebRTC is a standard specification (and an open source project) designed to support real-time voice, video, and data communication across browsers. WebRTC itself has no signaling, but it requires signaling to exchange SDP between the two session terminals. This signaling protocol can be Jingle (an extension of XMPP), SIP, or any protocol of your own that can transport SDP.
-
Steps for establishing a WebRTC session
1) Get local media (getUserMedia())
2) Establish a connection between the terminals (RTCPeerConnection)
3) Associate the media and data channels with this connection (RTCPeerConnection)
4) Exchange session descriptions (RTCSessionDescription)
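The four steps can be sketched from the caller's side as a single function. To keep the example runnable outside a browser, it works against small hand-written interfaces that cover only the methods used here; in a browser these could be backed by `navigator.mediaDevices.getUserMedia()` and `RTCPeerConnection`. The function `startCall` and the `sendToRemote` signaling hook are hypothetical, and handling the remote answer and ICE candidates is left out.

```typescript
// Minimal structural types covering only the methods this sketch needs.
interface StreamLike<T> {
  getTracks(): T[];
}
interface SessionDescription {
  type: string;
  sdp?: string;
}
interface PeerLike<T> {
  addTrack(track: T, stream: StreamLike<T>): unknown;
  createOffer(): Promise<SessionDescription>;
  setLocalDescription(description: SessionDescription): Promise<void>;
}

// Caller side of session setup. All three parameters are injected so that the
// same flow can run against real browser objects or against test doubles.
async function startCall<T>(
  getMedia: () => Promise<StreamLike<T>>,
  createPeer: () => PeerLike<T>,
  sendToRemote: (offer: SessionDescription) => void, // hypothetical signaling hook
): Promise<PeerLike<T>> {
  // 1) Get local media.
  const stream = await getMedia();
  // 2) Create the connection between the terminals.
  const peer = createPeer();
  // 3) Associate the media with this connection.
  for (const track of stream.getTracks()) {
    peer.addTrack(track, stream);
  }
  // 4) Exchange session descriptions: create the offer, apply it locally,
  //    and hand it to the signaling channel for delivery to the other side.
  const offer = await peer.createOffer();
  await peer.setLocalDescription(offer);
  sendToRemote(offer);
  return peer;
}
```

Injecting the dependencies is a design choice for this sketch: it keeps the order of the steps visible and lets the flow be exercised with fake objects before any real camera or network is involved.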
- Introduction to the main API