In recent years, the concept of cloud gaming has gradually become familiar to game enthusiasts. Cloud gaming, as the name suggests, uses powerful cloud servers deployed in data centers to render game graphics. The frames generated in the cloud are streamed to the user’s terminal as real-time video over a high-speed network, while the user’s inputs are sent back to the cloud, so the experience is indistinguishable from playing a game running locally. Cloud gaming services make choosing a game as easy as browsing a video site: there is no time spent downloading and installing, and no worry about whether local hardware meets the game’s requirements. With a relatively inexpensive phone or set-top box, users can enjoy the kind of high-quality gaming experience that otherwise requires high-end graphics cards or consoles.

Figure 1. Overview of cloud game commercialization development

However, cloud gaming is not new: related ideas date back to the beginning of this century, as shown in Figure 1. In 2010, OnLive, an American startup, launched the first commercial cloud gaming service for the public. But the immature cloud services and fragile network environment of the time could not support OnLive’s ambitions. After several attempts to transform its business, OnLive was acquired by Sony in 2015; its cloud gaming technology was folded into Sony’s PlayStation products but never widely promoted. In recent years, with the continued evolution of cloud computing, the spread of fiber to the home, and the rollout of 5G networks, cloud gaming has found a new opportunity. Following the launch of Google’s Stadia cloud gaming service in 2019, cloud computing giants such as Microsoft and Amazon followed suit with their own cloud gaming products. According to forecasts from industry analysts [1], the global cloud gaming market will grow at an annual rate of nearly 50% and reach a scale of 7 billion dollars by 2027.

This paper looks past the various cloud gaming products to survey and summarize the evolution of cloud gaming technology, its current challenges, and future directions for optimization.

History of cloud gaming technology

In essence, a cloud gaming system can be regarded as a thin-client system that uses cloud resources for 3D game rendering, and this kind of architecture — remote computing resources for complex computation, a local machine for display — can be traced back to the 1980s. The X11 protocol adopted by the Unix graphics display system was designed with network transparency from the start: by separating the X server and X client, a user can run an application on a remote machine and display its graphical interface locally. Since graphical interfaces of that period were all 2D, the remote side transmitted 2D drawing instructions, and the local side drew the corresponding interface after receiving them. The later and more widely familiar Microsoft Windows Remote Desktop [11], based on the RDP protocol, and the cross-platform VNC system [10], based on the RFB protocol, adopt a similar design.

With the development of 3D rendering techniques and dedicated graphics card hardware, new requirements were placed on these remote display protocols, which supported only 2D graphics. The GLX extension to the X11 protocol was proposed first: it packaged the OpenGL instructions for 3D rendering in the same format as X11’s 2D drawing instructions and delivered them to the client for 3D rendering. Transferring draw instructions in this way has two problems:

  1. The client needs sufficient computing resources to perform 3D rendering, whereas in the early days, when graphics acceleration hardware was expensive, the whole point of remote rendering was to let multiple users share one server’s graphics card.
  2. The number of instructions required for 3D rendering grows with the complexity of the 3D model, so rendering a very complex model can require transferring a huge number of instructions over the network even when the final rendered image is small.

Figure 2. Architecture of GLX and VirtualGL [2]

To improve on the instruction-streaming scheme, the scheme of transferring rendered images emerged, with VirtualGL [2] as one example. The approach is to perform 3D rendering on the server side and transfer the results to the client as images, while 2D drawing instructions still follow the original X11 path. With this improvement the client no longer needs to perform 3D rendering, and the amount of data the server sends depends only on what the client needs to display, so even when rendering complex 3D scenes the server-to-client data volume can be kept under control.

From the earliest OnLive to today’s commercial cloud gaming systems, the rendered-image transmission scheme has remained the norm. As shown in the system framework below, all 3D rendering takes place on the server side, and the resulting game frames are encoded in real time into a video stream sent to the client. The client only needs to forward the user’s game inputs to the server and decode and display the received video stream.

Figure 3. System Framework of cloud games [3]

Although the basic technical scheme remains the same, the current generation of cloud gaming systems has improved greatly over OnLive a decade ago. Specifically:

  1. Early OnLive only supported PC games running on x86 platforms, while modern cloud gaming systems can also support mobile games developed for non-x86 platforms.
  2. In terms of network transmission, OnLive only proposed placing servers as close to users as possible, ensuring that the physical distance between user and server did not exceed 1600 kilometers. With advances in transmission technology represented by WebRTC, the construction of real-time streaming networks and edge nodes, and the rollout of 5G and other new-generation mobile networks, the network environment facing contemporary cloud gaming has improved greatly.
  3. Video coding has progressed from the H.264 of ten years ago to H.265 and H.266, and the higher coding efficiency allows cloud gaming systems to support 4K game resolution.

Challenges

Despite the current boom in the cloud gaming industry, a huge stumbling block still stands in the way of every player: latency. The more precise term for cloud gaming delay is response delay, or interaction latency, defined as the time from when the user issues a game command on an input device to when the corresponding change appears on the terminal’s display. Different games impose different latency requirements. For board and puzzle games with low demands on control, users can tolerate response delays of 200-300 milliseconds. For more action-oriented games, the response delay should generally stay within 100 ms and never exceed 150 ms in the worst case. Games that demand precision, such as first-person shooters, require a response delay of 60 milliseconds or less. At the extreme end of the spectrum are virtual reality (VR) games played with a head-mounted display: because the display must track every body movement so closely, the response delay is thought to need to stay under 25 milliseconds to reduce the player’s sense of vertigo.

Because current cloud gaming systems generally adopt the rendered-image transmission scheme, the client depends entirely on the server to render game frames, and every game command must travel over the network to the server for processing. OnLive provided a detailed breakdown of the components of response latency in its technical documentation (Figure 4), and this analysis is still relevant today. Measured against the precise definition of response delay given above, however, this breakdown omits input device delay and output device delay. For wired input devices such as USB gamepads or mouse and keyboard, input delay can be kept below 1 millisecond; for a Bluetooth wireless gamepad or a phone’s touch screen, it can exceed 10 milliseconds, depending on the client’s processing hardware. As a result, some cloud gaming providers, such as Google’s Stadia, offer wireless gamepads that communicate with the game server directly over Wi-Fi to avoid the latency of Bluetooth communication. Output device delay is determined by the display’s refresh rate: the most common 60 Hz LCD adds an average of 8 milliseconds to the response delay for the display refresh alone, and raising the refresh rate to 120 Hz halves that average output delay to 4 milliseconds.
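The figures in this paragraph can be combined into a simple latency budget. The sketch below is illustrative only (the component values are the article’s examples, and the function names are our own, not any real measurement API); the display term is half a refresh interval, which is where the ~8 ms (60 Hz) and ~4 ms (120 Hz) averages come from.

```python
def display_latency_ms(refresh_hz: float) -> float:
    """Average wait for the next display refresh: half a refresh interval."""
    return 1000.0 / refresh_hz / 2

def response_latency_ms(input_ms: float, network_ms: float,
                        render_ms: float, codec_ms: float,
                        refresh_hz: float = 60) -> float:
    """Sum the response-delay components named in the text."""
    return input_ms + network_ms + render_ms + codec_ms + display_latency_ms(refresh_hz)

print(round(display_latency_ms(60), 1))   # ~8 ms average at 60 Hz
print(round(display_latency_ms(120), 1))  # halved at 120 Hz

# Example budget: 1 ms wired input, 30 ms network, 8 ms render, 10 ms codec.
print(round(response_latency_ms(1, 30, 8, 10), 1))
```

Even with optimistic rendering and codec numbers, the network term dominates — which is exactly the bottleneck the next section examines.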

Figure 4. Delay analysis of cloud gaming systems [4]

It is not hard to see that image rendering and video encoding/decoding account for a relatively small share of the whole delay path, and their cost keeps shrinking as GPU and codec hardware improve; the biggest bottleneck remains network transmission. The figure above divides network delay by network architecture, but network transmission delay can in fact be subdivided further:

  • Propagation delay: the time for a packet to travel end to end, typically measured as round-trip time (RTT). Propagation delay is affected by the network type, the processing speed and load of routers along the path, and the end-to-end physical distance. In cloud gaming systems physical distance is an important factor: a round trip over 1000 km, even at the speed of light, takes 6.7 milliseconds.
  • Transmission delay: the time required to send all the packets that make up the current content end to end, determined mainly by the size of the content and the actual network bandwidth. Note that for a cloud gaming system, transmission delay should not be estimated from the average bit rate of the transmitted video; the worst case, namely the largest video frame, must be considered. For example, at 1080p resolution and 60 frames per second, the IDR frames generated in complex scenes can exceed 1 MB. To send an IDR frame larger than 1 MB within 16.7 ms, so that it does not delay the next frame, at least 500 Mbps of actual bandwidth is required. Packet loss on the network link is another serious problem: recovering by retransmission adds at least one RTT to the transmission delay, while recovering with redundancy coding inflates the amount of data sent and demands even more bandwidth.
  • Queuing delay: buffer queues are commonly used in network transmission to smooth out jitter, and queuing delay is the time transmitted data spends waiting in these queues along the path. In a cloud gaming system sending video at 60 frames per second, each additional frame sitting in a buffer queue adds 16.7 ms of queuing delay.
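The arithmetic behind these bullet points is worth spelling out. The sketch below reproduces the article’s own figures (1 MiB IDR frame, 60 fps frame interval, 1000 km round trip at the speed of light in vacuum); the function names are illustrative.

```python
FRAME_INTERVAL_MS = 1000 / 60  # one frame interval at 60 fps, ~16.7 ms

def required_bandwidth_mbps(frame_bytes: int, interval_ms: float) -> float:
    """Bandwidth needed to ship frame_bytes within interval_ms."""
    return frame_bytes * 8 / (interval_ms / 1000) / 1e6

def rtt_floor_ms(distance_km: float) -> float:
    """Lower bound on RTT: round trip at the speed of light (3e8 m/s)."""
    return 2 * distance_km * 1000 / 3e8 * 1000

def queuing_delay_ms(frames_buffered: int) -> float:
    """Each frame waiting in a buffer queue adds one frame interval."""
    return frames_buffered * FRAME_INTERVAL_MS

print(round(required_bandwidth_mbps(1 << 20, FRAME_INTERVAL_MS)))  # ~503 Mbps
print(round(rtt_floor_ms(1000), 1))                                # ~6.7 ms
print(round(queuing_delay_ms(2), 1))                               # ~33.3 ms
```

A 1 MiB frame in one 16.7 ms interval really does demand roughly 500 Mbps of *instantaneous* bandwidth — far above the stream’s average bit rate, which is why worst-case sizing matters.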

From the above analysis, the biggest challenge for a modern cloud gaming system is to deliver video frames of varying sizes from server to client with the shortest possible delay, and to keep that delay within the user’s maximum tolerance throughout the entire game session. This need for stability, however, is at odds with the Internet’s “best effort” design. When budgeting for delay we should therefore consider not the best case or the average, but the worst case: only if the maximum delay meets the user’s needs is the experience acceptable. Especially in fast-paced games requiring precise control, one or two brief freezes can derail the game and destroy the user’s goodwill toward cloud gaming. Although 5G networks are widely anticipated for their high bandwidth and low latency, 5G only improves the segment corresponding to the user-premise routing shown in Figure 4, theoretically cutting transmission delay by 20-50 ms compared with 4G; in practice there is no essential difference between a 5G user and a Wi-Fi user on a router with a gigabit fiber connection. A series of network transmission problems remain to be solved, such as delay on Internet trunk lines, and the router congestion, queuing, and packet loss that can occur at any moment.

Another major challenge facing cloud gaming has less to do with individual users and more to do with the industry as a whole: how to reduce the cost of the service without compromising the user experience. Although cloud gaming was originally intended to let users share expensive hardware, in practice latency constraints force cloud servers to be geographically dispersed rather than centrally managed in a single money-saving data center. Latency constraints likewise prevent a cloud server from being multiplexed across users in different time zones to improve utilization. Bandwidth costs also rise steeply as game resolution increases. OnLive, the world’s first commercial cloud gaming provider, lasted only five years because it could not bear the high operating costs, and the new generation of competitors has achieved no qualitative breakthrough in cost control. Meanwhile, some studies [5] point out that compared with the traditional model of downloading a game and rendering it locally, rendering in the cloud and streaming the result to the user as video greatly increases energy consumption: converted into CO2 emissions, video-streamed cloud games emit twice as much CO2 as traditional games. In the context of global energy conservation, emission reduction, and environmental protection, whether energy consumption can be brought down may also be key to the healthy development of the entire cloud gaming industry.

Future development directions

From the challenges described above, it can be concluded that the future development of cloud gaming technology needs to address two issues:

  1. How to ensure low latency and stability of data transmission from the server to the client under the existing network architecture?
  2. How to optimize the technical solution of cloud gaming to increase the reuse of computing resources and reduce the energy consumption of the entire system?

Solving these two problems requires extensive cross-field cooperation and innovation. Here we introduce some academic research results from recent years and how they might be integrated with commercial solutions.

The first direction is architectural innovation. Cloud gaming systems generally adopt the rendered-image transmission scheme and a thin-client design. As the computing power of mobile hardware keeps improving, client devices such as phones and set-top boxes now have enough computing resources and GPUs to perform 3D rendering and relatively complex computation. More and more designers have therefore begun to consider evolving the thin client into a thick client, which uses the client’s computing resources together with data transmitted by the server to generate, locally on the client, the frame that responds to the user’s next game command. The biggest effect of this design is to shrink the game’s response path to a local loop on the client, eliminating the influence of the network. Specific measures include the following three approaches:

  1. Image-based rendering (IBR): unlike existing schemes, the server does not simply transmit the rendered image, but generates image content carrying more information, such as a panorama or a depth image. From this content the client can perform image-based rendering locally and respond directly to game commands that only change the viewing perspective [4]. For example, in first-person shooters or VR games, where much of the action changes the perspective, this optimization can reduce response latency.
  2. Foreground-background separation: at the application level, the game scene is split into foreground objects and background. The server renders the complex background and sends it to the client as a panoramic image, while simultaneously sending the 3D models and rendering instructions for the foreground objects; the client uses its own GPU to render the relatively simple foreground models locally and composites them with the background image into the final frame [6]. Foreground-background separation can support some game commands that do not shift the perspective, such as making a character perform an action or cast a particular skill.
  3. Prefetch and speculative execution: the server predicts the game commands the client may issue within some future window (which can be set to the worst-case network transmission delay), executes them in advance, and sends the resulting frames for each speculated command to the client ahead of time. The client then chooses, according to the command actually issued, whether to display a prefetched frame, so the response delay no longer loops through the network [7]. This optimization suits game commands that the client cannot handle locally.
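The third approach can be illustrated with a toy sketch in the spirit of speculative execution as in Outatime [7]. All names and the string-based “frames” here are illustrative stand-ins, not a real cloud gaming API: the server renders one frame per plausible next input, and the client displays the matching frame on a hit or falls back to a normal round trip on a miss.

```python
def server_speculate(game_state: dict, candidate_inputs: list) -> dict:
    """Server side: render (here, stub out) one frame per predicted input."""
    return {inp: f"frame[{game_state['tick']}+{inp}]" for inp in candidate_inputs}

def client_resolve(speculated: dict, actual_input: str):
    """Client side: use a prefetched frame if the prediction hit."""
    frame = speculated.get(actual_input)
    if frame is not None:
        return frame, "hit: response path avoids the network round trip"
    return None, "miss: fall back to a normal server round trip"

# One speculation window: server guesses three likely inputs in advance.
speculated = server_speculate({"tick": 42}, ["left", "right", "jump"])
print(client_resolve(speculated, "jump"))    # predicted -> instant local display
print(client_resolve(speculated, "crouch"))  # unpredicted -> pay full latency
```

The trade-off is visible even in this sketch: every extra candidate input multiplies the server’s rendering and transmission work, which is the cost side discussed below.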

None of the research directions listed here has yet been applied in a specific commercial cloud gaming system, but they suggest a feasible path for future cloud gaming technology: by exploiting the client’s local computing power, the data transmitted by the server can be rendered and processed locally on the client (see Figure 5) to generate a frame that responds to the user’s command within a delay that meets the game’s QoE requirements. The biggest advantage of this approach is that the response latency experienced by the user is decoupled from network transmission, giving the network more time to cope with unpredictable congestion and fluctuation. Of course, such a scheme comes at a heavy cost: to support local rendering on the client, the server must transmit much more content — panoramic images, depth information, 3D models, speculative execution results, and so on — and must consume more computing resources to generate this auxiliary content.

Figure 5. Impact of client local processing on response latency [4]

The second area of concern is data compression. The video encoders used in cloud gaming systems follow general-purpose coding standards, yet the frames rendered by the game engine are completely determined by the 3D models and game logic, leaving ample room for pre-processing. Redesigning a dedicated cloud gaming encoder that integrates deeply with the game engine is therefore a promising research direction in academia [3]. At the same time, exploiting the client’s local rendering capability can also reduce data transfer. For example, [8] proposes local rendering of simplified models: a simplified 3D model of the game scene is created and rendered locally on the client’s relatively weak GPU, while the server renders both the complete model and the simplified model and transmits their difference to the client. The client can immediately display a low-definition frame to the user, free of network influence, after rendering the simplified model locally, and then merge in the difference image once it arrives from the server to recover the original-quality frame. Since only the difference image is transmitted, the amount of data is far smaller than for the original image.
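A minimal sketch of this delta scheme, in the spirit of [8], using 1-D pixel lists in place of real renders (the function names and sample values are illustrative): the server computes the per-pixel difference between its full-detail and simplified renders, and the client reconstructs full quality by adding that delta to its own local simplified render.

```python
def delta(full: list, simplified: list) -> list:
    """Server side: per-pixel difference between full and simplified renders."""
    return [f - s for f, s in zip(full, simplified)]

def reconstruct(simplified: list, d: list) -> list:
    """Client side: local simplified render + received delta = full quality."""
    return [s + dv for s, dv in zip(simplified, d)]

full_render = [200, 180, 90, 40]   # server's high-detail frame
low_render  = [190, 180, 80, 40]   # client's local simplified frame
d = delta(full_render, low_render) # only this crosses the network
assert reconstruct(low_render, d) == full_render
print(d)  # mostly small values, which a video codec compresses well
```

Because both sides render the same simplified model deterministically, the delta is typically sparse and low-magnitude, which is what makes it much cheaper to encode than the full frame.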

The third direction is improving the existing network architecture to provide guarantees on transmission delay and bit rate. Designs that reserve network resources to guarantee transmission quality (QoS) have long been proposed, from the early ATM network standard to recent work such as [9]. How to optimize hardware and software on top of the existing network architecture to guarantee delay and bit rate, when building a dedicated streaming network for cloud gaming, will have a decisive influence on the next generation of cloud gaming services. Combined with the optimizations mentioned in the first direction, if a new cloud gaming system can tolerate larger server update delays, cloud gaming servers could also serve users in different time zones beyond geographic limits, allowing server resources to be multiplexed more efficiently and greatly reducing the system’s overall operating cost.

This paper has summarized the development history, current state, and future optimization directions of cloud gaming technology. Among all cloud computing applications, cloud gaming may be the most demanding and challenging with respect to latency and the network. Overcoming its difficulties will not only let users play high-quality games conveniently, but also help expand cloud computing into new application scenarios, bringing it deep into every aspect of future society. The further development of cloud gaming technology will require more open and innovative thinking, as well as collaboration and effort across game development, graphics rendering, network transmission, video coding, and beyond.

References

[1] Grand View Research. “Cloud Gaming Market Size, Share & Trends Analysis Report By Type (File Streaming, Video Streaming), By Device, By Gamer Type, By Region, And Segment Forecasts, 2021 – 2027”, www.grandviewresearch.com/industry-an…

[2] virtualgl.virtualgl.org/

[3] W. Cai, et al. “A Survey on Cloud Gaming: Future of Computer Games.” IEEE Access, vol. 4, 2016

[4] S. Shi and C.-H. Hsu. “A Survey of Interactive Remote Rendering Systems.” ACM Computing Surveys (CSUR), 2015

[5] M. Marsden, M. Hazas, and M. Broadbent. “From One Edge to the Other: Exploring Gaming’s Rising Presence on the Network.” Proceedings of the 7th International Conference on ICT for Sustainability, 2020

[6] Z Lai, et al. “Furion: Engineering high-quality immersive virtual reality on today’s mobile devices.” IEEE Transactions on Mobile Computing 19, no. 7 (2019): 1586-1602

[7] K Lee, et al. “Outatime: Using speculation to enable low-latency continuous interaction for mobile cloud gaming.” In Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, 2015

[8] E. Cuervo, et al. “Kahawai: High-quality mobile gaming using GPU offload.” In Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, pp. 121-135, 2015

[9] T. Szymanski. “An Ultra-low-latency Guaranteed-Rate Internet for Cloud Services.” IEEE/ACM Transactions on Networking 24, no. 1 (2014): 123-136

[10] T. Richardson, et al. “Virtual Network Computing.” IEEE Internet Computing 2, no. 1 (1998): 33-38

[11] B. Cumberland, et al. “Microsoft Windows Server 4.0 Terminal Server Edition Technical Reference.” Microsoft Press, 1999