This article was first published on my blog.

Streaming rendering is different from server-side rendering (SSR) in the traditional front-end sense: frames are rendered in the cloud, the fully rendered output is sent to the client, and the client is only responsible for playback and for uploading user input back to the server. Google’s cloud gaming platform is one use case of this technique, and there are related solutions in the open source community as well. After reading Parsec’s blog post A Look at Game Streaming Tech in the Browser, I took the following notes on the difficulties you might encounter across the overall architecture, especially on the client side (here, the browser).

The overall process

  1. Establish a peer-to-peer (P2P) connection via WebRTC;
  2. Send the client configuration to the server and initialize the stream;
  3. Start receiving video, audio, and control information from the server;
  4. Decode the audio, delivered in the Opus format, and play it through the Web Audio API;
  5. Use Media Source Extensions (MSE) to feed the video content into the `<video>` element;
  6. Capture input events, pack them in binary form, and send them to the server.

Network

P2P connections in the browser can only be made with WebRTC; WebSockets are unsuitable because of their weakness in NAT traversal and their TCP-based congestion control. Parsec’s web client uses RTCDataChannels to communicate with the server: data-channel messages are carried over an SCTP stream, which for security is encapsulated in DTLS and transported over UDP.
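
Since the data channel carries latency-sensitive media, it helps to opt out of WebRTC’s default reliable, ordered delivery. Below is a minimal sketch of how a web client might configure such a channel; the channel label, the STUN server, and the `handlePacket` demultiplexer are illustrative assumptions, not Parsec’s actual code.

```ts
// Minimal sketch: an unreliable, unordered RTCDataChannel suited to
// low-latency streaming (lost packets are dropped, not retransmitted).
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }], // placeholder STUN server
});

const channel = pc.createDataChannel("stream", {
  ordered: false,    // do not stall on out-of-order delivery
  maxRetransmits: 0, // drop lost packets instead of retransmitting
});
channel.binaryType = "arraybuffer";

channel.onmessage = (event) => {
  handlePacket(event.data); // video, audio, or control payloads
};

// Hypothetical demultiplexer: route each packet by its stream type.
function handlePacket(data: ArrayBuffer): void {
  // ...implementation-specific...
}
```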

NAT traversal and the initial P2P connection (which later turned out to be a simple STUN ping/pong as part of UDP hole punching) were technically complex to implement. The initial handshake requires security credentials to be exchanged beforehand, which is done by signaling over a WebSocket.
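
A sketch of that signaling step, reusing the `pc` connection from the snippet above: the SDP offer/answer and ICE candidates travel over a plain WebSocket before any P2P traffic flows. The URL and message shapes here are assumptions, not Parsec’s actual signaling protocol.

```ts
const signaling = new WebSocket("wss://example.com/signal"); // hypothetical endpoint

signaling.onopen = async () => {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send(JSON.stringify({ type: "offer", sdp: offer.sdp }));
};

signaling.onmessage = async (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "answer") {
    await pc.setRemoteDescription({ type: "answer", sdp: msg.sdp });
  } else if (msg.type === "candidate") {
    await pc.addIceCandidate(msg.candidate);
  }
};

// Trickle local ICE candidates to the server as they are gathered.
pc.onicecandidate = (event) => {
  if (event.candidate) {
    signaling.send(JSON.stringify({ type: "candidate", candidate: event.candidate }));
  }
};
```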

Parsec’s native client uses BUD, its own UDP-based protocol. For now, the web client uses the browser’s default DTLS/SCTP stack; while it works under ideal conditions, it is clearly not as robust as BUD, so it may be replaced with BUD at a later stage.

Video

In the browser (in practice, Chrome only), video frames are loaded into the HTML `<video>` element through Media Source Extensions (MSE).
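
A minimal sketch of that MSE path, assuming the stream is H.264 in fragmented MP4 (the codec string is an assumption about the stream format): received segments are queued and appended to a SourceBuffer one at a time, since a SourceBuffer rejects appends while it is still updating.

```ts
const video = document.querySelector("video")!;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener("sourceopen", () => {
  const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E"');
  const queue: ArrayBuffer[] = [];

  sourceBuffer.addEventListener("updateend", () => {
    const next = queue.shift();
    if (next) sourceBuffer.appendBuffer(next);
  });

  // Called for each video payload arriving on the data channel.
  function onVideoSegment(segment: ArrayBuffer): void {
    if (sourceBuffer.updating || queue.length > 0) {
      queue.push(segment); // a SourceBuffer accepts one append at a time
    } else {
      sourceBuffer.appendBuffer(segment);
    }
  }
});
```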

Audio

Audio arrives in its original Opus encoding, is decoded by an Opus library compiled to WebAssembly, and is finally played back through the Web Audio API. From version 70 onward, Chrome supports Opus in MP4 via MSE, which would be a cleaner solution: audio could then be pushed the same way as video, just into an `<audio>` element instead.
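
A sketch of the playback side under the current approach: decoded PCM is scheduled back-to-back through the Web Audio API. The `decodeOpusPacket` function is a hypothetical stand-in for the WebAssembly Opus decoder, and the mono, 48 kHz format is an assumption.

```ts
const ctx = new AudioContext({ sampleRate: 48000 });
let playhead = 0; // next scheduled start time, in AudioContext seconds

// Hypothetical binding to the WASM Opus decoder (not shown).
declare function decodeOpusPacket(packet: ArrayBuffer): Float32Array;

function onAudioPacket(packet: ArrayBuffer): void {
  const pcm = decodeOpusPacket(packet); // mono PCM assumed for brevity
  const buffer = ctx.createBuffer(1, pcm.length, ctx.sampleRate);
  buffer.copyToChannel(pcm, 0);

  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);

  // Schedule packets back-to-back; if we fell behind, restart at "now".
  playhead = Math.max(playhead, ctx.currentTime);
  source.start(playhead);
  playhead += buffer.duration;
}
```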

Input/signals

Input events (keyboard, mouse, gamepad) and auxiliary messages (cursor updates, chat) are each handled in their own channel. Each message is packed into a binary format and sent to the server.
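
As an illustration, here is how a mouse-move event might be packed into a compact binary message and sent on the `channel` from the network section. The layout here (a one-byte type tag plus two 16-bit coordinates) is an assumption for illustration, not Parsec’s actual wire format.

```ts
const MSG_MOUSE_MOVE = 0x01; // hypothetical message type tag

function sendMouseMove(channel: RTCDataChannel, x: number, y: number): void {
  const buf = new ArrayBuffer(5);
  const view = new DataView(buf);
  view.setUint8(0, MSG_MOUSE_MOVE);
  view.setUint16(1, x, true); // little-endian x coordinate
  view.setUint16(3, y, true); // little-endian y coordinate
  channel.send(buf);
}

document.addEventListener("mousemove", (e) => {
  sendMouseMove(channel, Math.round(e.clientX), Math.round(e.clientY));
});
```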

Personal summary

  1. Network

    The network is critical to making the whole application work. To meet the high-throughput, zero-buffering demands of streaming rendering, existing network protocols (chiefly UDP-based ones) may need to be adapted. The NAT traversal problem of the public internet can also be sidestepped early on if only a LAN environment is considered.

  2. Video

    Based on Chrome’s MSE, video is relatively easy to play on the client; you just need to be familiar with the MSE API.

  3. Audio

    Audio can likewise be implemented with Chrome’s MSE (from version 70 onward).

  4. Input/signals

    These can be handled in separate channels; the browser already supports almost all of the common input events (see the gamepad sketch below).
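
One caveat worth sketching: unlike keyboard and mouse events, gamepads are not event-driven in the browser and must be polled via the Gamepad API, typically once per frame. The binary packing below follows the illustrative tag-based format used earlier and is likewise an assumption.

```ts
function pollGamepads(channel: RTCDataChannel): void {
  for (const pad of navigator.getGamepads()) {
    if (!pad) continue;
    const buf = new ArrayBuffer(2 + pad.buttons.length);
    const view = new DataView(buf);
    view.setUint8(0, 0x02);      // hypothetical "gamepad" message type tag
    view.setUint8(1, pad.index); // which gamepad this snapshot came from
    pad.buttons.forEach((b, i) => view.setUint8(2 + i, b.pressed ? 1 : 0));
    channel.send(buf);
  }
  requestAnimationFrame(() => pollGamepads(channel)); // sample once per frame
}
```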

The browser already does a lot of the heavy lifting for a web client. In the early stage, if the main goal is to ship quickly, implementing the client on top of the browser is worth considering.