By Chad Hart

Link to original article: webrtchacks.com/webrtc-toda…

Bernard has had a long and distinguished career in real-time communications. In addition to serving as a co-chair of the W3C WebRTC Working Group, he is an editor of the ORTC, WebRTC-SVC, WebRTC-NV Use Cases, WebRTC-ICE, WebTransport, and WebRTC-QUIC documents. Don't forget that WebRTC is also partially standardized in the IETF, where Bernard co-chairs the WEBTRANS and AVTCORE working groups. At Microsoft, he is the chief architect of the Microsoft Teams media organization, known as IC3, which powers Microsoft Teams and other products built on Teams infrastructure, such as Azure Communication Services (Gustavo has posted about that here).

Status of WebRTC standardization

As one of the chairs of the W3C WebRTC Working Group, Bernard is an authoritative figure on the WebRTC standardization process. I first asked him about the Working Group's current charter.

**Bernard:** As discussed in the April 2020 presentation to the W3C, the WebRTC Working Group charter describes three areas of work:

1. Completing WebRTC Peer Connection (WebRTC-PC), the first priority, along with related specifications such as WebRTC-Stats.

2. Capture, streaming, and output-related specifications, including Media Capture and Streams, Screen Capture, Media Capture from DOM Elements, MediaStream Image Capture, MediaStream Recording, Audio Output Devices, and Content Hints.

3. WebRTC-NV, the "next version" of WebRTC.

WebRTC-NV is the "next version" of WebRTC, meaning whatever comes after the current 1.0 specification.

**Bernard:** WebRTC-NV work falls into four broad categories.

1. The first is extensions to WebRTC Peer Connection. This includes WebRTC Extensions, WebRTC-SVC, and Insertable Streams. I should mention that WebRTC Extensions and all the work that relies on RTCPeerConnection assume "Unified Plan", which is now the default SDP dialect in all browsers. For example, it is not possible to use Insertable Streams to support end-to-end encryption in your application without first supporting Unified Plan (a sketch follows this list).

2. The second category involves features that did not meet the implementation or maturity requirements for the WebRTC-PC Recommendation, such as WebRTC Identity, WebRTC Priority Control, and WebRTC DSCP.

3. The third category is extensions to capture, such as MediaStreamTrack Insertable Streams, the Media Capture and Streams Extensions, and the Media Capture Depth Stream Extensions (recently revived).

4. The fourth category is what I call independent specifications, which do not necessarily rely on RTCPeerConnection or the existing media capture specifications. WebRTC-ICE (now being advanced as a standalone specification) falls into this category, as do API specifications developed outside the W3C WebRTC Working Group, such as WebTransport (W3C WebTransport Working Group), WebRTC-QUIC (ORTC Community Group), and WebCodecs (WICG).
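To make item 1 concrete, here is a minimal sketch of end-to-end encryption via encoded Insertable Streams on top of an RTCPeerConnection. The `encodedInsertableStreams` flag and `createEncodedStreams()` follow Chrome's origin-trial API (the names have shifted over time, so treat them as assumptions), and the XOR transform is a stand-in for real cryptography:

```typescript
// Minimal sketch: end-to-end encryption with encoded Insertable Streams.
// The config flag and createEncodedStreams() are Chrome's origin-trial API
// and are not in the standard typings, hence the casts.
const pc = new RTCPeerConnection({ encodedInsertableStreams: true } as any);

function sendEncrypted(track: MediaStreamTrack, stream: MediaStream) {
  const sender = pc.addTrack(track, stream);
  const { readable, writable } = (sender as any).createEncodedStreams();

  const transform = new TransformStream({
    transform(frame: any, controller) {
      // Toy "cipher" for illustration only: XOR every payload byte in place.
      const bytes = new Uint8Array(frame.data);
      for (let i = 0; i < bytes.length; i++) bytes[i] ^= 0x55;
      controller.enqueue(frame);
    },
  });

  // Every encoded frame flows through the transform before packetization.
  readable.pipeThrough(transform).pipeTo(writable);
}
```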

Given these different categories of work, the term "NV" is somewhat vague and can be confusing. The term originally referred to ORTC, but today it usually refers to multiple specifications rather than a single document. In current usage there is ambiguity, because "NV" may refer either to extensions of RTCPeerConnection and the existing capture APIs, or to APIs unrelated to RTCPeerConnection and the existing capture APIs, such as WebTransport and WebCodecs. So when someone mentions "WebRTC-NV", it is often necessary to ask follow-up questions to understand which meaning they intend.

The path to full Recommendation

The protocols used in WebRTC are defined by the IETF, while the W3C defines the APIs used by browsers. The path to formal standardization at the W3C — and the debate over what should be included — is sometimes a contentious topic.

Bernard gives some background on this process and where it stands.

**Chad:** Can you walk our audience through the W3C specification stages?

**Bernard:** The first standardization stage is CR — Candidate Recommendation. Candidate Recommendation means that the specification has been extensively reviewed, meets the Working Group's requirements, and is believed to be implementable. In CR, the specification may not be fully implemented (there may be "features at risk"), and there may be interoperability issues between browsers.

More details on the W3C Process are here: (www.w3.org/2020/Proces…

**Chad:** You mentioned a "last CR", which I guess hints that there can be multiple CRs, or that the CR process is a multi-stage thing?

**Bernard:** There is also a new W3C Process where you essentially have living specifications. Let's just say that for those two specifications we are on the last CR before we go for Recommendation.

So PR [Proposed Recommendation] is the stage where you try to prove that everything in the specification has been implemented and passes your interoperability criteria. Then comes Recommendation, and beyond. The next step for us is PR, and we are gathering all the data we need. In the case of Peer Connection that is a lot of data, because you need all of your interop test results, including your WPT test results and possibly your KITE test results.

WPT stands for Web Platform Tests, a set of tests the W3C uses to check API implementations. The results are published at wpt.fyi.

KITE is an open-source interoperability testing project used extensively with WebRTC. Dr. Alex Gouaillard discussed it in his Breaking Point: WebRTC SFU Load Testing post.

**Chad:** So WPT, at wpt.fyi, is general-purpose automated feature testing, and KITE is WebRTC-specific interop testing.

**Bernard:** The WPT WebRTC tests run on a single browser. We don't have server tests in WPT for WebRTC, though we do have them for WebTransport. As a result, the WebRTC WPT tests don't demonstrate interoperability between browsers, or between browsers and conferencing servers, whereas KITE tests run between browsers and potentially multiple entities.

**Chad:** That part is specific to WebRTC – you're actually sending media between different browsers.

**Bernard:** To understand the level of WPT test coverage, we have annotated the specification. So, beyond the test results, you want to know how much of the specification the tests actually cover.

COVID-19 has slowed standards work

COVID-19 has had some interesting effects on WebRTC. It has kept everyone in the WebRTC community busy and more focused on the scalability and reliability needed for all the new traffic. However, that change in focus can wreak havoc on existing processes. Did this apply to standardization too?

**Bernard:** The bottom line is that we are trying to gather all of this evidence to submit to the W3C to show that we are ready for the Recommendation stage. That is a very big step, but progress has been slowed by the virus. I mean, we thought we would make more progress on implementations, but the virus has slowed everybody down.

**Chad:** Is that because people are busy supporting their products, or because you can't actually get together as much?

**Bernard:** COVID-19 has disrupted a lot of things. For example, KITE interop testing is usually done in person at IETF events, but we haven't been able to attend IETF in person yet. We've been trying to figure out how to get the testing done, but it's hard without everyone in the same place. When people are spread all over the world, getting everyone organized at the same time is really hard. Imagine it's 3am and you need to do interoperability testing with people in different time zones on the other side of the world.

The global pandemic has not only disrupted testing, it has also affected implementation plans, as the chart below shows. While nearly all of the proposed features are implemented in at least one browser, we initially thought we would have more functionality in two or more browser codebases by fall 2020. So the implementation schedule and testing have not gone as we expected.

Source: TPAC-2020-Meetings

docs.google.com/presentatio…

How important is standardization?

In the past few years, almost every major web browser has implemented WebRTC, and WebRTC now carries a significant portion of the world's voice over IP (VoIP) traffic. At this point, does it matter whether it moves to the next stage of standardization?

Bernard points out that standardization isn’t just about writing specifications — it’s really about interoperability.

**Bernard:** Standardization focuses on testing and stability. One of the biggest challenges with WebRTC Peer Connection is its sheer breadth. We learn from significant bugs every day. We found that our coverage was not what we wanted it to be. We also learned how difficult it is to achieve even what I would call acceptable test coverage. A bunch of issues, for example around multiplexing, have come up recently that actually have a big impact on existing services, and we hadn't tested for them. What we see with these bugs is that they are not the kind of problems WPT catches. They are essentially things that require something like the KITE framework, and we haven't yet achieved full test coverage in KITE.

Overall, one of the biggest differences I have experienced between real-time communications and other parts of the web platform is the sheer size of the test matrix. Suppose I told you, Chad, that I wanted you to develop tests that reach 95% coverage. I think attempting it would be helpful, but it also gives you an idea of the scale of the challenge of really covering everything. It's tough.

WebRTC Extensions

The list of things you can do with WebRTC keeps growing. As Bernard just mentioned, WebRTC 1.0 is going through the standards process, so new features have to live somewhere. As Bernard explains, the WebRTC Extensions document is home to some features that did not make it into WebRTC 1.0.

**Bernard:** There are a number of specifications that rely on RTCPeerConnection, and WebRTC Extensions is one of them. These are specifications that add functionality to WebRTC-PC. There is a lot in there, for example RTP header extension encryption. WebRTC-SVC (Scalable Video Coding) is not in the WebRTC Extensions document, but I think of it as an extension too. I also think of Insertable Streams, the encoded version, as an extension of WebRTC-PC, since it assumes you have an RTCPeerConnection.
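As a rough illustration of the SVC extension Bernard mentions, WebRTC-SVC adds a `scalabilityMode` to the send encodings. A minimal sketch, assuming the browser and codec support the "L1T3" mode (one spatial layer, three temporal layers):

```typescript
// Hedged sketch of WebRTC-SVC: request three temporal layers on a
// send-only video transceiver. scalabilityMode is not yet in the
// standard DOM typings, hence the cast.
async function startSvcSend(pc: RTCPeerConnection) {
  const media = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = media.getVideoTracks();
  pc.addTransceiver(track, {
    direction: 'sendonly',
    sendEncodings: [{ scalabilityMode: 'L1T3' } as RTCRtpEncodingParameters],
  });
}
```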

getCurrentBrowsingContextMedia

As the use of video conferencing has increased, there have been several high-profile stories of webcam mishaps and accidental screen sharing. Meanwhile, getting fast access to the webcam is often an onboarding issue for WebRTC services. Balancing access speed against privacy controls is a challenge. In addition, fingerprinting based on the media device information exposed by enumerateDevices has long been a privacy problem.

The getCurrentBrowsingContextMedia proposal is an attempt to address these challenges.

**Chad:** Can you talk about the getCurrentBrowsingContextMedia proposal?

**Bernard:** It is really an extension; I think of it as an extension of screen capture. Let me give some background on [media] capture. A lot of the focus in capture is on privacy and security. We discovered that Media Capture and Streams is not great for privacy. The model is that you provide the application with all of the device information, whether a device is selected or not, and then let it build its own picker. That is a real fingerprinting problem, because now I know all the devices on your machine. Even if you never intend to use that camera, I know it's there. So it really helps fingerprint you, and Jan-Ivar has been proposing that we move to another model that is more like screen capture.

In screen capture, you can only access the surface the user chose to capture. As an application, I can't enumerate all of your apps, see every window, and then decide what I want to look at. Only once the user has selected a source can you access it. That is the model Jan-Ivar has proposed for media capture and streams. Essentially, the picker becomes part of the browser, and the application can only access information about the device the user selected. That is a big change. It also calls into question some of the foundations of Media Capture and Streams, such as constraints. For example, if the user is going to choose anyway, what is the purpose of constraints?
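For reference, this is the picker model as it already works for screen capture today: the browser draws the picker, and the page only ever learns about the one surface the user chose. getCurrentBrowsingContextMedia itself was still just a proposal, so this sketch uses the shipped getDisplayMedia:

```typescript
// The picker-based model Bernard describes, as screen capture already
// implements it: the app receives exactly one user-chosen surface and
// learns nothing about the others.
async function captureUserChosenSurface(): Promise<MediaStream> {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  const [track] = stream.getVideoTracks();
  console.log('user granted exactly one surface:', track.label);
  return stream;
}
```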

**Chad:** Does this mean more specification work on device pickers?

**Bernard:** That is what it would do. For now, though, we have decided to more or less keep improving the existing model, and Jan-Ivar has created a separate specification for the new model that addresses these issues. The tricky thing is that it is a very different model. How do people transition to the new model when they are used to building pickers into their applications? Given user habits, that may take a long time.

WebRTC NV

One consequence of the standards debate is a reluctance to assign official version names, because everyone has different ideas about what constitutes a major release (i.e. 1.0, 2.0) versus a minor release (i.e. 1.1, 1.2, etc.). There was also an alternative standard called ORTC, sometimes positioned as a successor to WebRTC, which we will get to in a moment. WebRTC 1.0 coalesced around the current specifications we discussed. Still, there is plenty of debate about what happens next. The group eventually decided to label everything that follows with a very mild, imprecise term: "the next version of WebRTC", or WebRTC-NV.

Bernard explains what that means.

**Chad:** We talked a little about what we'll see in the "next version" of WebRTC — I assume we can't call it 2.0 because 1.0 isn't done yet?

**Bernard:** I think maybe it's time we thought about dropping the term NV altogether, because it can refer to two potentially very different things. One is the Peer Connection extensions I mentioned — Insertable Streams, WebRTC Extensions, and WebRTC-SVC. My thinking is that when you put all of those specs together, they add up to the same level of functionality as ORTC. In effect, we have integrated most of the ORTC object model into WebRTC-PC.

The other, very separate track is what I call the independent specifications. That includes things like WebTransport and WebCodecs — completely independent pieces that do not rely on RTCPeerConnection. So there is a real break between the next generation and the present.

Obviously, these are not done yet. WebTransport is in an origin trial, and WebCodecs is in an origin trial in Chrome. And this is very different, because many of the things you used to get as part of the monolithic WebRTC-PC now have to be written in WebAssembly. So it is a very, very different development model.

Some things are not there yet. For example, WebTransport is currently client-server only. We have written a peer-to-peer extension, but the origin trial that just ran is client-server. So you cannot yet write a complete WebRTC-PC-style use case using only WebCodecs and WebTransport.

I would say the other thing that has happened in WebRTC-NV that has become very important is a real focus on machine learning and access to raw media. That is not something ORTC provided. In a sense, the WebTransport and WebCodecs model is even lower-level than ORTC in this respect. ORTC did not give you direct access to the decoder and encoder; that is what you get with WebCodecs. So I think we took the idea of ORTC and applied it at an even lower layer.

What happened to ORTC?

Object Real-Time Communications (ORTC) is an alternative model to WebRTC that provides low-level control without using SDP. Bernard was one of its authors, and Microsoft shipped the original Edge with ORTC support. We don't hear much about ORTC anymore, so what happened to it? As Bernard just explained, most of it was absorbed into the core WebRTC standards. Is that a defeat or a victory for the ORTC vision?

**Chad:** You are one of the authors of the original ORTC specification. Compared to your original ORTC vision, where do you see us now?

**Bernard:** The object model is entirely in Chromium. As a result, we have almost all of the objects from ORTC — the ICE transport, the DTLS transport, the SCTP transport (from the data channel) — all now exposed in WebRTC-PC and in Chromium.

ORTC also had advanced features such as simulcast and SVC, which we have incorporated. Beyond that, we now go further than the original ORTC, with end-to-end encryption supported via Insertable Streams. So with the object model and all of these extensions, we have brought WebRTC-PC to parity with ORTC.
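Those objects are visible from script today. Here is a small sketch walking the ORTC-style transport chain that now hangs off RTCPeerConnection (SCTP to DTLS to ICE), using only standard APIs:

```typescript
// Sketch: the ORTC-style object model exposed on RTCPeerConnection.
const pc = new RTCPeerConnection();
pc.createDataChannel('probe'); // ensures an SCTP transport is negotiated

pc.addEventListener('connectionstatechange', () => {
  if (pc.connectionState !== 'connected') return;
  const sctp = pc.sctp;                        // RTCSctpTransport
  const dtls = sctp ? sctp.transport : null;   // RTCDtlsTransport
  const ice = dtls ? dtls.iceTransport : null; // RTCIceTransport
  console.log(dtls?.state, ice?.state, ice?.getSelectedCandidatePair());
});
```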

The scenarios still missing are things like Internet of Things use cases that focus purely on data transfer. You can see these reflected in the WebRTC-NV Use Cases — scenarios like peer-to-peer data exchange.

WebTransport

WebTransport is another W3C specification, with its own working group. You'll see many names familiar from WebRTC involved there, including Bernard's.

QUIC is an improved transport protocol, something like a "TCP/2", that WebTransport builds on.

**Chad: ** So what is WebTransport, where does it come from, and how does it relate to WebRTC?

**Bernard:** WebTransport is both an API, in the W3C WebTransport Working Group, and a set of protocols in the IETF. The protocols include WebTransport over QUIC, called QuicTransport, as well as WebTransport over HTTP/3 and potentially over HTTP/2. The W3C WebTransport API currently covers only QUIC and HTTP/3; HTTP/2 is considered a fallback transport and may get a separate API. It is a client-server API, and the constructor and everything about it is very WebSocket-like: you give the WebTransport constructor a URL and you get back a WebTransport. But it differs in that you can create both reliable streams and datagrams.

**Chad:** Datagrams, like those used in UDP for fast but unreliable transmission.

**Bernard:** It is bidirectional in the sense that a WebTransport is initiated by the client, but once the connection is established, the server can initiate unidirectional or bidirectional streams back to the client, and datagrams can flow both ways.

**Chad:** Bidirectional, as in two-way communication?

**Bernard:** WebSockets are really client-initiated only; they cannot be started by the server, but WebTransport streams can. In WebTransport over QUIC, the connection is not pooled. In WebTransport over HTTP/3, it can be pooled, which creates a set of very interesting scenarios, some of which came up at the IETF BoF. Consider that you can do both HTTP/3 request-response and WebTransport, including streams and datagrams, on the same QUIC connection.
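Putting Bernard's description together, here is a minimal client-side sketch using the standardized API names (the origin-trial version differed in some details, and the URL is hypothetical):

```typescript
// Minimal WebTransport client sketch: datagrams, a client-opened
// reliable stream, and server-initiated incoming streams.
async function demoWebTransport() {
  const wt = new WebTransport('https://example.com:4433/echo'); // hypothetical
  await wt.ready;

  // Datagrams: fast but unreliable, UDP-style.
  const dgWriter = wt.datagrams.writable.getWriter();
  await dgWriter.write(new TextEncoder().encode('ping'));

  // A reliable bidirectional stream opened by the client...
  const bidi = await wt.createBidirectionalStream();
  const streamWriter = bidi.writable.getWriter();
  await streamWriter.write(new TextEncoder().encode('hello'));

  // ...and, unlike WebSockets, the server can open streams toward us too.
  const incoming = wt.incomingUnidirectionalStreams.getReader();
  const { value: serverStream } = await incoming.read();
  console.log('server opened a stream:', serverStream);
}
```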

Here's a scenario Justin Uberti presented at an IETF BoF called RIPT that intrigued people. In it, you have RPC-style requests and responses going back and forth, but an RPC results in a stream from the server to the client. Think of the client saying "I want to play this movie", or "play this game", or "join this video conference", and then a stream (which could be a reliable QUIC stream, or datagrams) flows from the server.

I think WebTransport has the potential to revolutionize the web. HTTP/3 itself is a revolutionary change to the web, and much of that revolution is in the more complex pooled version of WebTransport over HTTP/3. WebTransport over QUIC is much simpler: it basically gives me a socket I can send things back and forth on.

**Chad:** How far away is WebTransport?

**Bernard:** I would say the WebTransport API is now fairly complete. It has just finished its origin trial, which ended with M88. There are a few bugs, and some things don't work very well, but the API is pretty polished. You can write fairly complex sample code with it. I think that's because we updated the specification alongside actual code, so if you read the specification, you can do those things in code. Hopefully we'll soon have a complete example there that you can try out.

On the server side, there are still some QUIC interoperability issues. The server most people are using is aioquic (the Python library). You can also use quiche as a server, but it isn't integrated into a framework. Unfortunately, we don't have a Node.js server; it would be really nice to have one, but that's probably a ways off.

**Chad:** As Bernard said, WebTransport is client-server, not peer-to-peer (P2P) like WebRTC. However, we have already seen a preview of P2P QUIC. In fact, Fippo wrote an article on the QUIC data channel back in February 2019. How is that different from this new WebTransport?

**Bernard:** That was ORTC-style. It did not support WHATWG/W3C streams, and it was based on the gQUIC protocol, not the IETF's QUIC. WebTransport — the code now in Chrome — is based on WHATWG streams as well as IETF QUIC. So the RTCQuicTransport code is quite outdated: it is both an old API and an old protocol. That code has been removed from Chromium.

**Chad:** So how do we get peer-to-peer WebTransport for low-latency scenarios?

**Bernard:** We have a small extension specification, which is still in the ORTC CG. Basically it is just WebTransport, but running over an RTCIceTransport instead of connecting to a URL. So to construct one, instead of giving it a URL, you give it an ICE transport.

That's how you set it up. There are a few things, essentially pulled from the ORTC RTCDtlsTransport, that you add on top. But the extension specification is only a few pages long. It's very, very small, because roughly 95 percent of it is just the WebTransport specification.
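Since (as Bernard notes below) nothing has shipped, the following is a loudly hypothetical sketch of what the ORTC CG draft describes, namely a WebTransport constructed from an ICE transport instead of a URL. Every name here is an assumption taken from the draft's description, not a shipped API:

```typescript
// HYPOTHETICAL: neither this constructor overload nor a standalone
// RTCIceTransport constructor has shipped in any browser.
declare const ice: RTCIceTransport; // negotiated and gathered elsewhere

const WT = (globalThis as any).WebTransport;
const p2p = new WT(ice); // draft: pass an ICE transport, not a URL
await p2p.ready;
// From here the intent is that it behaves like client-server WebTransport:
const chan = await p2p.createBidirectionalStream();
```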

**Chad:** Has anyone built it?

**Bernard:** We don't yet have a working version with the new API and the new QUIC library, only the old one. One characteristic of RTCQuicTransport is that it is standalone. The code in Chromium today uses a standalone version of WebRTC ICE — think of an ICE transport separated from the WebRTC PC. When you construct an RTCQuicTransport from that RTCIceTransport, it is not multiplexed with your peer connection components.

It runs on a separate port. It had to be that way in the old RTCQuicTransport, because gQUIC cannot be multiplexed with RTP/RTCP, STUN, and TURN. IETF QUIC can be multiplexed.

**Chad:** gQUIC is the original version of QUIC from Google. While a separate port may sound like a minor detail of IP port usage, bundling helps restrict port usage for getting through firewalls.

**Bernard:** Do developers want to use QUIC on the same port as all of their other audio and video traffic? Bundling is very, very popular in WebRTC-PC today. Everyone bundles everything onto the same port — that's well over 99% of all WebRTC usage. You would assume QUIC will have similar needs. If that's what people really want, you don't want to build it with a standalone, ORTC-style transport; you want to be able to build it from the ICE transport of a WebRTC PC.

This is a bit odd, because now you're saying that part of WebTransport depends on RTCPeerConnection.

RPC setup to send media via WebTransport. Source: IETF 107 — Justin Uberti, 107-ript-rtc-implementation-experiences (www.ietf.org/proceedings…

Simulcast

WebTransport looks like a promising new approach, but what about the thorny issues plaguing WebRTC implementations today? Simulcast is part of nearly every major WebRTC videoconferencing service with a large number of participants, and it has long struggled with standardization and interoperability.

**Chad:** How is simulcast working these days?

**Bernard:** In Chromium, simulcast is supposed to be supported with all codecs — or at least all of the codecs that are there today. So, in theory, you should be able to do this with H.264, VP8, and VP9.
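For context, this is the typical three-layer simulcast setup with RIDs that such tests exercise; the codec itself is chosen during SDP negotiation:

```typescript
// Standard three-layer simulcast: three encodings identified by RIDs,
// each downscaled from the captured resolution.
async function startSimulcast(pc: RTCPeerConnection) {
  const media = await navigator.mediaDevices.getUserMedia({ video: true });
  pc.addTransceiver(media.getVideoTracks()[0], {
    direction: 'sendonly',
    sendEncodings: [
      { rid: 'h' },                           // full resolution
      { rid: 'm', scaleResolutionDownBy: 2 }, // half
      { rid: 'l', scaleResolutionDownBy: 4 }, // quarter
    ],
  });
}
```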

We've been finding bugs, and we've had some really horrible ones, like H.264 simulcast not working at all. We have done full KITE testing, but we still needed a simple loopback test of basic operation, where you send a simulcast to yourself. Eventually Fippo wrote that loopback test.

You can try the test in Fippo's "simulcast playground" (webrtchacks.com/a-playgroun…

**Bernard:** That test does not pass in all browsers. The way it works is that you send with RIDs and, via SDP munging, receive them back as MIDs. So, essentially, if you send three streams, you get back three streams, just on different MIDs.

Firefox does not support the MID RTP header extension, so the loopback test doesn't actually work there.

We found that whenever we wrote a test, we discovered something that wasn't clear.

Let me give you another weird example. We've been working on hardware acceleration, and it turns out you can get different bitstreams when you use hardware acceleration. It doesn't just make things faster; it actually changes the codec bitstream, and then you can start breaking interoperability. You run a simulcast test and suddenly the SFU can't handle what it receives. I really hope we can meet in person at one of these IETF meetings and do another simulcast interop test, like the ones Dr. Alex used to run, and see where we are.

You know, if everyone shipped Unified Plan, we would be fine.

**Chad:** Unified Plan is a newer, standardized SDP format that, among other things, specifies how simulcast streams should be handled in SDP. Shouldn't Unified Plan be the standard that saves the day? Why isn't it?

**Bernard:** If everyone ran Unified Plan with all the codecs, and everyone passed [interoperability testing], then you would know everything is fine. We're not there yet. Let me put it this way — we are feature-complete. I think that's true, but things slip through the cracks of the test coverage. I wouldn't say that every browser has all the functionality needed to ship a commercial application. For example, I think it's true that there are plenty of commercial applications that ship across multiple browsers, but I think very few ship across all browsers.

So one way to think about the situation (probably an easier way than poring over all of these test results) is to build a matrix of all the major conferencing services, all the browsers they run on, and all the different modes they support; that might be the best view of where we actually are.

That view is not very encouraging. It's nice that most services support most browsers, but you'll often see varying feature support and slightly different experiences across browsers.