Text / Will Law
Organizer / LiveVideoStack
Hi, my name is Will Law and I currently work in the San Francisco office of Akamai Technologies. I live and work very close to the Golden Gate Bridge, which is where this picture was taken. It is always an honor to communicate with engineers in the United States and China, especially to discuss next-generation Web protocols of global importance. The topic I want to introduce today is WebTransport.
Without further ado, let’s begin. Why do we need a new protocol in the first place? Why don’t HTTP/1, HTTP/2, HTTP/3, or WebSocket meet our needs? I’ll cover the current problems with these protocols, introduce what WebTransport is, what parts it includes, and what its network stack looks like. In addition, I will introduce the draft Web browser APIs proposed at the W3C. Finally, we will discuss how to participate in the development of the protocol.
Before we begin, I would like to thank Jeff Posnick, Victor Vasiliev, Peter Thatcher, Yutaka Hirano, and Bernard Aboba for contributing material and content to this presentation. They have been an integral part of the development of WebTransport, especially in shaping the community draft, so I would like to thank them for the information they provided.
Let’s begin today’s introduction. Suppose we want to design a game and you are the architect; we want to consider as many usage scenarios as possible. The first scenario is Web-based multiplayer games hosted on cloud-based servers, where the client sends commands to the server. Some of those commands are time-sensitive, for example your position and role in the game; others are more state-related, such as your avatar or weapon.
So we can tolerate losing some of the time-sensitive commands, as long as the ones that do arrive get there promptly, while the state-related commands need to be delivered reliably. In both cases the data flow is two-way: you need to send your position to the game, and the game needs to send you the current game state and the positions of all the other players. So we could use RESTful APIs (which in fact is what most of us do most of the time), we could use HTTPS for security, we could use the WebSocket protocol or WebRTC data channels, or we could build our own transport on top of UDP (User Datagram Protocol).
All four are used in different games and scenarios. However, a closer look at these protocols reveals some problems with each approach, so how can we improve it?
The second example is low-latency live streaming. Typical scenarios include one-to-many, one-way live streams of sporting events, news, and entertainment shows. We want the video to support HD, high frame rates, high dynamic range, wide color gamut, and DRM, but many of these are not available today; WebRTC (Web Real-Time Communication) does not support these features. Sometimes we might also want many-to-many video chat, such as web meetings using Zoom, Apple FaceTime, or Google Meet. For one-way broadcasting we can use chunked transfer encoding over HTTP/1 or HTTP/2, which is what most people use these days; we can also use the WebSocket protocol or WebRTC data channels to send our media segments; or we can use our own UDP-based private protocol for transport.
So those are two detailed examples, and we can continue to find similar ones. For example, if we want to do simultaneous interpretation we may use AI technology, which I think is the future of online meetings; for real-time translation we need to upload the audio to the cloud quickly. There is security camera analysis, massively multiplayer online games, and console gaming; the principle of cloud gaming is to render the actual game in real time on an edge encoder and stream the game content, so that a low-end client gets the same visual experience as a high-end console. Google’s Stadia cloud gaming platform is a good example of this, and it also requires two-way game commands. We already have server-based video conferencing systems, but we would like to simplify them; WebRTC exposes a lot of private information during session establishment, which we would like to avoid. There is remote desktop management, which is an enterprise scenario. There are IoT sensors and data analytics, where small, power-sensitive IoT devices need to send small amounts of data to the cloud very efficiently. How does that work today? Are RESTful APIs really the best choice for them? The final case is the Pub/Sub (publish/subscribe) model; we could probably avoid the long-polling code used in many applications today.
So, taking these cases together, we see some core requirements. First of all, we want to inherit the security protections of the modern Web. In other words, we need TLS (Transport Layer Security) encryption, we want some type of congestion control and CORS (Cross-Origin Resource Sharing), and we still want a client-server architecture; we don’t want it built on a P2P model, because establishing a session over a P2P connection architecture is difficult. We also want two-way communication in most applications. We need to send reliable, ordered data, which we call “streams.” Streams follow a first-in, first-out pattern, so nothing is lost along the way.
We also want streams to be delivered with the minimum possible delay. At the same time, we need to send unreliable, unordered datagram messages, very much like UDP packets: they are small packets where speed of delivery is what matters; if delivery is too slow, some of the data can simply be dropped, as long as overall transmission stays fast. We also need to provide constant feedback to the sender, which you can think of as backpressure: we can’t just blindly send data that the receiver can’t handle. And we should use URIs for resource location, because URIs and URLs are the backbone of how we locate content on the Web; we don’t want to change that mechanism, so we want something that fits into the URI scheme.
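To make the backpressure idea concrete, here is a minimal sketch using the WHATWG Streams API, the same model the draft WebTransport API builds on. The slow sink stands in for a congested network path; this is an illustration run in an async context, not code from the talk.

```javascript
// Minimal backpressure sketch: `writer.ready` only resolves once the queue
// has room, so a fast producer is throttled instead of flooding a slow
// receiver or silently dropping data.
const slowSink = new WritableStream(
  {
    async write(chunk) {
      // Simulate a receiver that needs 100 ms to consume each chunk.
      await new Promise(resolve => setTimeout(resolve, 100));
      console.log('consumed', chunk);
    }
  },
  new CountQueuingStrategy({ highWaterMark: 4 }) // allow at most 4 queued chunks
);

const writer = slowSink.getWriter();
for (let i = 0; i < 20; i++) {
  await writer.ready;          // backpressure: wait until the sink can take more
  writer.write(`update ${i}`); // intentionally not awaited; `ready` does the pacing
}
await writer.close();
```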
Now, let’s review the problems with existing Web technologies. RESTful APIs (Representational State Transfer APIs) are relatively slow to establish connections, especially over HTTP/1 or HTTP/2. When using TLS (Transport Layer Security), HTTP/1 is the slowest, HTTP/2 is second, and HTTP/3 is far and away the best. They always provide reliable, lossless delivery, which in many cases means retransmission, and retransmission adds latency. Even with HPACK and QPACK compression, HTTP headers add overhead, and when our data payloads are small the headers still account for a large portion of each message. As we mentioned, in many cases we don’t need that reliability and are willing to sacrifice some of it for speed.
The main problem with WebSockets is head-of-line blocking, which I’ll explain in more detail later; it is a major obstacle to promoting the use of the WebSocket protocol, even though WebSockets always provide reliable delivery. The problem with WebRTC data channels is the high cost of establishing a connection, as we’ll see later.
You could develop your own UDP protocol to address these issues, but that introduces a new problem: you have to install an SDK on every client and every server that needs to speak the protocol. You also lose the interoperability of Web standards, where any standard Web server or client can speak the same language.
In addition, chunked-encoded media segments are not ideal, because the slower connection setup of HTTPS introduces a throughput-related delay (currently on the order of one or two seconds), and there is an additional RTT (round-trip time) of delay for each segment. So existing Web technology is not quite what we need.
Let’s look at head-of-line blocking in more detail, since WebSocket would probably otherwise be our preferred solution. Normally, multiple streams share a single data queue. Compare it to red and blue cars driving on a one-lane road: suppose my red car wants to turn left at the intersection and the blue car behind it wants to keep going straight. As long as the road is not congested, everything works as expected.
But head-of-line blocking occurs when there is a queue of packets waiting to be transmitted and the packet at the head of the queue cannot move forward. Think of the red car trying to turn left but being unable to for some reason: now every other flow in my single-lane stream piles up behind the blocked data, so neither the blue car nor the red car can move, and the lane is blocked.
There are several solutions to head-of-line blocking, one of which is to give each stream its own independent, parallel queue. In the road analogy, that means widening the road to add another lane. Say our red car is now blocking its own lane while the blue car has another lane to drive in: because the packet at the head of the red queue is stuck, the red stream is still blocked, but the blue stream remains independent and unblocked.
So why not use the existing WebRTC protocol? This chart explains why: it is a very complicated protocol. It was originally built for P2P communication and requires Session Description Protocol (SDP) messaging before a connection can be established. In a client-server model we don’t need any of that, because the server is already directly addressable by the client.
ICE, DTLS, and SCTP, which WebRTC also requires, are usually not deployed at scale in CDNs. So WebRTC can be used in some cases, but it was not designed specifically for applications with a server-client model.
Why not just use raw UDP? Because UDP is not secure: it does not inherit the Web security model, it lacks encryption and congestion control, and it has no CORS and no delivery acknowledgements.
If you look at the QUIC protocol, you’ll see that it actually meets many of our requirements. In fact, we now know it is probably the best option. It has fast connection setup: one round trip, or zero round trips if the client and server have successfully completed a handshake before.
At the same time, QUIC is also very secure. It always uses TLS 1.3, it has low-latency congestion control, and it provides both reliable streams and the unordered datagrams we identified in our requirements. If needed, it can also be used in P2P scenarios, and it serves as the basis for HTTP/3. It is widely deployed across the Internet, especially in CDNs. Now that the IETF is about to publish the standardized version, and with HTTP/3 built on top of it, we will have excellent global QUIC support.
So we see that QUIC meets most of our criteria, while the other protocols all have issues of one kind or another. Therefore, engineers set out to build a new transport option for the Web that addresses the specific problems I just described. WebTransport was created by engineers, is used by engineers, and was named by engineers, so we simply call it WebTransport.
WebTransport is a protocol framework that enables clients to communicate with remote servers within the Web security model, using secure, multiplexed transport. Note that WebTransport is a framework; underneath it sit the actual protocols. The framework provides a consistent interface: it consists of a set of protocols that can be safely exposed to untrusted applications, plus a model that allows them to be used interchangeably. It also provides a flexible set of features: reliable unidirectional and bidirectional streams, as well as unreliable datagram delivery.
So what does WebTransport give you? I mentioned unidirectional streams, which are effectively unbounded byte streams in one direction. When the receiver can’t read them fast enough, backpressure is applied to the sender, similar to the live video stream I’m delivering to you right now.
Ordinary ordered messaging follows the first-in, first-out (FIFO) principle: what goes in first comes out first at the other end. For out-of-order messaging you can have multiple concurrent streams by spreading messages across streams. A bidirectional stream is just a pair of unidirectional streams in opposite directions, which is useful when I send information and expect a response; as we’ve seen in the use cases, that requirement is very common. Datagrams are small, unordered, unreliable messages, typically at most one MTU (about 1,500 bytes, depending on the network configuration). They are useful for sending small updates that might get lost, where my application can handle that loss and the transport can focus on getting packets from the server to the client as quickly as possible.
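Since the later examples don’t show a bidirectional stream, here is a minimal request/response sketch, assuming the shape of the draft API at the time; the URL, payload, and exact method names are illustrative and have changed between draft revisions.

```javascript
// Hypothetical request/response over a bidirectional stream.
// Method and attribute names follow one revision of the draft API.
const transport = new WebTransport('https://game.example:4433/session');
await transport.ready;

const stream = await transport.createBidirectionalStream();

// Send a small request on the writable half, then signal we are done.
const writer = stream.writable.getWriter();
await writer.write(new TextEncoder().encode('whoami'));
await writer.close();

// Read the reply on the readable half until the server closes it.
const reader = stream.readable.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { value, done } = await reader.read();
  if (done) break;
  console.log('reply:', decoder.decode(value, { stream: true }));
}
```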
So remember that WebTransport is a transport framework. What are the proposed transport protocols within WebTransport? As I mentioned, QUIC seems like a good candidate, so we start with a dedicated QUIC transport, QUICTransport, plus HTTP3Transport. We also have a fallback mechanism called HTTP2Transport.
Let’s see what the network stack looks like. The simple stacks of HTTP/1 and HTTP/2 over TCP really drove today’s global Internet revolution. In a real case, back in 2001, Akamai’s entire network carried 1 Gbps of traffic, while today the figure is over 160 Tbps. Without changing the protocol, traffic increased by more than 100,000 times. We still deploy HTTP/1 today, and we deploy HTTP/2 today, and this simple protocol stack has supported traffic scaling by roughly 150,000 times, so it is a good design. HTTP/3, which is built on top of QUIC, has now improved on this: it moves away from TCP, which could not be changed flexibly, in favor of using UDP directly. WebSockets are built on top of TCP and form a more complex protocol stack. Together, these four protocols cover 99.9% of the traffic on the Internet today.
So what might WebTransport look like? As we mentioned, it is primarily built on UDP and QUIC. WebTransport is our framework, under which we have several transport types: QUICTransport, HTTP3Transport, and HTTP2Transport. It is worth noting that both HTTP3Transport and QUICTransport provide reliable streams as well as unreliable datagrams.
As of November 2020, these details are still under debate, either because QUICTransport may not continue to be developed in the same direction as the HTTP-based transports, or because both are continuing to be developed in parallel and are therefore not in sync.
Let’s compare HTTP3Transport with QUICTransport. The main difference with HTTP3Transport is that it is pooled with other HTTP/3 traffic. Suppose I have a terminal running a number of applications that are all communicating with the same host; then all of the HTTP/3 traffic shares that connection. HTTP3Transport also inherits many of our favorite HTTP features, such as load balancing, headers, and broad firewall and proxy support. So your firewall will understand HTTP/3 when it sees it, whereas it may not understand QUICTransport. The applications that fit here are regular Web applications, Web chat applications, and the Pub/Sub (publish/subscribe) model.
QUICTransport, on the other hand, is a dedicated connection between a client and a server. It can optimize the transport for the client and expose more statistics to the client, without relying on HTTP/3, and it inherits the extensibility features of HTTP. What we really care about here is speed and performance, in applications such as online video games and real-time media.
Either HTTP3Transport or QUICTransport can satisfy certain Web application scenarios, and these protocols are still evolving, so you can debate whether either one could solve most of these cases. In fact, there is no single best answer, only the best choice for the environment at hand.
TCP is also interesting as a fallback plan: what if QUIC is blocked? We can fall back to WebSocket, which works even on browsers that do not support WebTransport at all, and HTTP2Transport is considered the natural fallback for HTTP3Transport. QUICTransport, by contrast, can either fall back to WebSocket or has no fallback at all.
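From the application’s point of view, a fallback might look something like the sketch below; the endpoints are hypothetical, and WebTransport support is assumed to be feature-detectable in the browser.

```javascript
// Hedged fallback sketch: prefer WebTransport when the browser and the
// network path support it, otherwise drop back to a WebSocket endpoint.
// Both URLs are hypothetical.
async function connect() {
  if ('WebTransport' in self) {
    try {
      const wt = new WebTransport('https://example.com:4433/updates');
      await wt.ready;                 // rejects if QUIC/UDP is blocked on the path
      return { kind: 'webtransport', wt };
    } catch (err) {
      console.warn('WebTransport unavailable, falling back to WebSocket:', err);
    }
  }
  const ws = new WebSocket('wss://example.com/updates');
  await new Promise((resolve, reject) => {
    ws.addEventListener('open', resolve, { once: true });
    ws.addEventListener('error', reject, { once: true });
  });
  return { kind: 'websocket', ws };
}
```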
So what does the QUICTransport URI scheme look like? If you’re familiar with how the Web works, you’ll recognize it. We have the protocol descriptor, “quic-transport”; the host (here, server.test) is sent as part of the SNI along with the port number, and the rest of the URI is the path being requested, which is delivered once the TLS connection is established.
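As an illustration, such a URI might look like this; the host, port, and path here are hypothetical.

```javascript
// Illustrative QUICTransport URI; the host, port, and path are hypothetical.
//
//   quic-transport://server.test:4433/game
//
//   scheme  quic-transport  -> selects the QUICTransport protocol
//   host    server.test     -> sent in the SNI during the TLS handshake
//   port    4433            -> the UDP port the server listens on
//   path    /game           -> delivered to the server once the connection is up
const transport = new WebTransport('quic-transport://server.test:4433/game');
```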
You might want to see an example QUICTransport handshake. We open a QUICTransport connection (this part of the API has since changed slightly, as I’ll describe later) to a port on the server. The browser sends a Client Hello to the server on that port with “wq” in its ALPN list. The server sends back a Server Hello that also contains “wq”. The browser receives the Server Hello and sends a QUIC packet containing a stream plus a FIN (close), so it is a very short stream. The server receives that stream and accepts the sending origin, and now the application can send and receive both streams and datagrams.
HTTP3Transport is a completely new transport that uses the https protocol. When creating it, the browser reuses the existing HTTP/3 connection pool. It sends a connect request with a specific header, which of course is nothing new in HTTP/3. The server responds with 200 OK and sets the “sessionID” header to 1. The client and server can now send streams and datagrams, which are associated in both directions by that ID, and once the associated CONNECT stream is closed, the session is closed.
Now let’s compare QUICTransport with WebSocket. The main difference is that head-of-line blocking always affects WebSocket, whereas in QUICTransport it only affects the individual stream, which gives you better overall performance. QUICTransport also provides reliable streams as well as unreliable datagrams, and you can cancel a stream that is in flight. There is no difference in the trust model: TLS and the origin trust model are the same. To prevent cross-protocol attacks, WebSocket uses its SHA-1-based handshake, while QUICTransport should use ALPN. To prevent middleboxes from getting confused, WebSocket uses an XOR-based masking mechanism, while QUICTransport is always encrypted with TLS 1.3. WebSocket authentication can be carried by cookies, but a QUICTransport application must provide its own means of authentication. So there are significant differences between QUICTransport and WebSocket, the biggest being head-of-line blocking and partial reliability.
So which groups are working on WebTransport? The first, of course, is the IETF, which defined HTTP/1, HTTP/2, and every RFC you might use. The WebTransport working group is new, chartered on March 6, 2020, so this is a new IETF project, and you can read its proposals through the link shown here.
Who is working on the browser-based API? The W3C established a WebTransport Working Group in September 2020; you can find its home page here. I am actually one of the co-chairs of this group, together with Jan-Ivar Bruaroey from Mozilla, under a two-year charter. We have a draft API inherited from the WICG, so we hope the work can move forward quickly and that we can formalize it as a standard two years from now.
So let’s take a look at some code examples from the incubation draft. If you are used to writing JavaScript, the syntax here will not be too unfamiliar. Assume there is a function that returns the current serialized game state. To create our transport, we simply construct a new WebTransport; remember that HTTP3Transport and QUICTransport have been folded into a single new WebTransport layer, which determines the transport type to use. We get the writable object for datagrams from the transport, and then we simply write the serialized game state we obtained into it. All of this can be done very simply.
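A sketch of roughly what that first example looks like is below; getSerializedGameState() and the URL are hypothetical, and the datagram attribute names follow one revision of the draft API.

```javascript
// Sending unreliable, unordered game state as datagrams.
// getSerializedGameState() is a hypothetical application function that
// returns a Uint8Array; attribute names follow one draft revision.
const transport = new WebTransport('https://game.example:4433/state');
await transport.ready;

const writer = transport.datagrams.writable.getWriter();

setInterval(async () => {
  const state = getSerializedGameState();
  await writer.ready;        // respect backpressure from the datagram queue
  writer.write(state);       // a lost datagram is fine; the next one supersedes it
}, 50);                      // send a fresh snapshot every 50 ms
```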
The next example uses a QUIC unidirectional stream to send the game state to the server reliably. We create the transport in exactly the same way, but now we wait until the transport is ready and then create a unidirectional stream from it. We again get a writer from the stream’s writable side, write the information we want to send, and then close the writer. So this is a very simple way to send data along a one-way stream.
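A sketch of that second example might look like this, again with a hypothetical URL and getSerializedGameState(), and with a small guard because the stream object’s shape differs between draft revisions.

```javascript
// Sending the game state reliably on a unidirectional stream.
const transport = new WebTransport('https://game.example:4433/state');
await transport.ready;

const stream = await transport.createUnidirectionalStream();
// Some draft revisions expose a `writable` attribute on the stream;
// in later revisions the stream itself is the WritableStream.
const writable = stream.writable ?? stream;
const writer = writable.getWriter();

await writer.write(getSerializedGameState()); // hypothetical Uint8Array
await writer.close();                         // FIN: tells the server we are done
```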
The third code example essentially sends request data over HTTP and reliably receives media segments, possibly out of order, over the same network connection. Since this is video, we use Media Source Extensions (MSE) here. We set up our MediaSource, attach a SourceBuffer to it (that is, initialize the buffer), then set up a new WebTransport and wait for the incoming unidirectional receive streams. For each one we take its readable side and pipe it directly into the SourceBuffer. This gives a very clean interface for receiving the data stream, which I feed into MSE and render in the video element on the page.
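A sketch of that third example, assuming a hypothetical URL and codec string and the draft’s names for the incoming-stream APIs, might look like this.

```javascript
// Receiving media segments on incoming unidirectional streams and
// appending them to an MSE SourceBuffer. URL, codec string, and the
// incoming-streams attribute name are illustrative.
const video = document.querySelector('video');
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);
await new Promise(r => mediaSource.addEventListener('sourceopen', r, { once: true }));
const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f"');

const transport = new WebTransport('https://media.example:4433/live');
await transport.ready;

// Each incoming unidirectional stream is assumed to carry one media segment.
const streams = transport.incomingUnidirectionalStreams.getReader();
for (;;) {
  const { value: stream, done } = await streams.read();
  if (done) break;

  // Older drafts wrap the bytes in a `readable` attribute; later ones
  // make the stream itself the ReadableStream.
  const readable = stream.readable ?? stream;
  await readable.pipeTo(new WritableStream({
    write(chunk) {
      // appendBuffer is asynchronous; resolve on updateend so pipeTo
      // waits before handing us the next chunk.
      return new Promise(resolve => {
        sourceBuffer.addEventListener('updateend', resolve, { once: true });
        sourceBuffer.appendBuffer(chunk);
      });
    }
  }));
}
```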
The API is still in flux. As I mentioned in the previous examples, the separate QUICTransport and HTTP3Transport constructors changed a while ago; you now construct a new WebTransport, and the transport type you get is determined by the scheme of the URI you pass in when initializing the WebTransport.
Google has already done a lot of work on WebTransport. QUICTransport is currently available for testing in Chrome 84, and there is some Python server code that uses aioquic. If I remember correctly, you can download the QUIC part of the project from GitHub, along with a client you can talk to. Adam, Yutaka, and Victor have also put together a demo that works with Chrome 87+; you can visit this page and test it, as shown in the next screenshot. Note that you must enable the experimental Web platform features flag in Chrome Canary or it won’t work.
This is my Chrome Canary browser, and I am verifying that the experimental Web platform features flag is enabled. We load the page, then I click connect to their publicly hosted server, and it reports that the connection is ready and the datagram writer is ready. In the console I can confirm that the transport object has been created, and you can see its publicly accessible interface. Now I can send datagrams: when we send them, this simple echo server sends back the number of bytes it received. The same works for a unidirectional stream, where the server sends back the bytes received on the stream and then closes it, and likewise for a bidirectional stream. All of my building blocks are right here: unreliable, unordered datagrams and reliable, ordered streams, up and running. From there you can start building an application and really test whether it works, which is very exciting.
How can you get involved? The IETF is a global organization, and you can participate in IETF work for free; you only need to pay for in-person events. There is a public mailing list, which I have listed here, that you can subscribe to for free to keep up with WebTransport developments in the IETF. The W3C requires membership, but it also provides a publicly accessible mailing list online, so please feel free to contact us if you need any information about joining. You can also visit the site shown here, where a small group of people is developing these specifications; if you have a usage scenario, your input is very welcome.
Here are some useful links that provide background information on WebTransport.
Thank you for letting me walk through WebTransport in such a short time. I am very glad to talk with you. If you have any questions about WebTransport or would like to join the W3C WebTransport Working Group, please feel free to contact me. Thank you again for taking time out of your busy schedule to join us. Goodbye!