• GCAP 2017: Scaling Multiplayer Games with Open Source
  • Scaling Dedicated Game Servers with Kubernetes: Part 3 — Scaling Up Nodes
  • Example project: paddle-soccer

In the first two articles, we looked at how to host a dedicated game server on Kubernetes and how to measure and limit its memory and CPU resources. In this installment, we’ll look at how we can use the CPU information from the previous article to determine when we need to expand the Kubernetes cluster because, as the number of players increases, we’ve run out of room for more game servers.

Separate Apps from Game Servers

The first step we should take before we start writing code to increase the size of the Kubernetes cluster is to separate our applications (for example, the matchmaker, the game server controller, and the node scaler we are about to write) onto different nodes in the cluster from those where the game servers run.

This has several benefits:

  1. The resource usage of our applications no longer has an impact on the game servers, because they are on different machines. That means that if the matchmaker has a CPU spike for some reason, there is an extra barrier to ensure it can’t unduly affect a dedicated game server that is in play.
  2. It makes it easier to scale capacity up and down for dedicated game servers, because we only need to look at game server usage for a specific set of nodes, not all the potential containers across the entire cluster.
  3. We can use larger machines with more CPU cores and memory for the game server nodes, and smaller machines with fewer cores and less memory for the controller applications, since they need fewer resources. We essentially get to pick the right machine size for the job at hand, which gives us great flexibility while still being cost-effective.

Kubernetes makes it relatively easy to set up a heterogeneous cluster, and gives us the tools to specify where in the cluster Pods are scheduled, via node selectors on our Pods.

It’s worth noting that the more sophisticated Node Affinity feature is also available in beta, but we won’t need it in this example, so we’ll ignore it for now.

First, we need to assign labels (a set of key-value pairs) to the nodes in the cluster. This is exactly the same as what you’ve seen when creating Pods with Deployments and exposing them with Services, except applied to nodes. I’m using Google Cloud Platform’s Container Engine, which uses node pools to apply labels to groups of nodes in the cluster as it builds a heterogeneous cluster, but you can do similar things on other cloud providers, as well as directly through the Kubernetes API or the command line client.

In this case, I add the labels role: apps and role: game-server to the appropriate nodes in the cluster. We can then add a nodeSelector option to our Kubernetes configurations to control which nodes in the cluster Pods are scheduled onto.
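If you aren’t using node pools, a label can also be applied directly through the Kubernetes API. The following is a minimal sketch of what that could look like with the Go client, assuming a pre-1.18 client-go (no context arguments); the labelNode helper and the scaler package name are purely illustrative and not part of the example project.

package scaler

import (
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
)

// labelNode applies a role label to a single node through the Kubernetes API.
// Illustrative sketch only; the node name is whatever your cluster uses.
func labelNode(cs kubernetes.Interface, nodeName, role string) error {
        // Fetch the current state of the node.
        node, err := cs.CoreV1().Nodes().Get(nodeName, metav1.GetOptions{})
        if err != nil {
                return err
        }

        if node.Labels == nil {
                node.Labels = map[string]string{}
        }
        node.Labels["role"] = role // e.g. "apps" or "game-server"

        // Write the updated labels back to the cluster.
        _, err = cs.CoreV1().Nodes().Update(node)
        return err
}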

For example, here is the configuration for the matchmaker application, where you can see that the nodeSelector is set to role: apps to ensure its container instances are only created on the application nodes (those labelled with the “apps” role).

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: matchmaker
spec:
  replicas: 5
  template:
    metadata:
      labels:
        role: matchmaker-server
    spec:
      nodeSelector:
        role: apps # here is the node selector
      containers:
      - name: matchmaker
        image: gcr.io/soccer/matchmaker
        ports:
        - containerPort: 8080

Similarly, we can adjust the configuration from the previous article so that all the dedicated game server Pods are scheduled only on the machines we have specifically designated for them, that is, those labelled role: game-server:

apiVersion: v1
kind: Pod
metadata:
  generateName: "game-"
spec:
  hostNetwork: true
  restartPolicy: Never
  nodeSelector:
    role: game-server # here is the node selector
  containers:
    - name: soccer-server
      image: gcr.io/soccer/soccer-server:0.1
      env:
        - name: SESSION_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      resources:
        limits:
          cpu: "0.1"

Note that the sample code uses the Kubernetes API to provide the same configuration as above, but the YAML version is easier to understand, and it’s the format we’ve been using throughout this series.
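For comparison, here is a rough sketch of what the same Pod definition could look like when built with the Go client (pre-1.18 signatures assumed); the field values simply mirror the YAML above, and this is not the exact code from the example project.

package scaler

import (
        v1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/api/resource"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
)

// createGameServerPod builds the same game server Pod as the YAML above,
// but through the Kubernetes API. Illustrative sketch only.
func createGameServerPod(cs kubernetes.Interface) error {
        pod := &v1.Pod{
                ObjectMeta: metav1.ObjectMeta{GenerateName: "game-"},
                Spec: v1.PodSpec{
                        HostNetwork:   true,
                        RestartPolicy: v1.RestartPolicyNever,
                        // Same node selector as the YAML above.
                        NodeSelector: map[string]string{"role": "game-server"},
                        Containers: []v1.Container{{
                                Name:  "soccer-server",
                                Image: "gcr.io/soccer/soccer-server:0.1",
                                Env: []v1.EnvVar{{
                                        Name: "SESSION_NAME",
                                        ValueFrom: &v1.EnvVarSource{
                                                FieldRef: &v1.ObjectFieldSelector{FieldPath: "metadata.name"},
                                        },
                                }},
                                Resources: v1.ResourceRequirements{
                                        Limits: v1.ResourceList{v1.ResourceCPU: resource.MustParse("0.1")},
                                },
                        }},
                },
        }

        _, err := cs.CoreV1().Pods("default").Create(pod)
        return err
}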

A Strategy for Scaling Up

Kubernetes on cloud providers tends to come with automatic scaling capabilities, such as the Google Cloud Platform Cluster Autoscaler, but since these are generally built for stateless applications, and our dedicated game servers store the game simulation in memory, they won’t work in this case. However, it isn’t particularly difficult to build our own custom Kubernetes cluster autoscaler using the tools that Kubernetes provides!

Scaling the nodes of a Kubernetes cluster up and down probably makes more sense in a cloud environment, since we only want to pay for the resources we need and use. If we were running on our own premises, it may make less sense to change the size of the Kubernetes cluster; we could instead run one large cluster across all the physical machines we own and keep it at a static size, since adding and removing physical machines is far more onerous than it is on the cloud, and since we own or lease those machines for longer, it wouldn’t necessarily save us money.

There are several potential strategies for determining when to expand the number of nodes in a cluster, but in this example, we’ll make things relatively simple:

  • Define a minimum and maximum number of nodes for game servers, and make sure we stay within those limits.
  • Use CPU resource capacity and usage as the metric for tracking how many dedicated game servers can fit on a node in the cluster (in this example, we assume we always have enough memory).
  • Define a buffer of CPU capacity for a set number of game servers in the cluster. That is, if at any point we could not add n game servers to the cluster without running out of CPU resources, add more nodes (see the short sketch after this list).
  • Whenever a new dedicated game server is started, calculate whether a new node needs to be added to the cluster because the CPU capacity across the nodes has dropped below the buffer amount.
  • As a fail-safe, every n seconds also calculate whether a new node needs to be added to the cluster because the measured CPU capacity has dropped below the buffer.
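Reduced to code, the core of that buffer check is just a comparison. Here is a minimal sketch of the policy in Go; the function and parameter names are assumptions for this article, not the names used in the example project.

// needMoreNodes is a minimal sketch of the buffer policy above.
// availableCPU is the unused CPU across the game server nodes and
// cpuRequest is the CPU each game server needs, both in millicores;
// bufferCount is how many game servers' worth of headroom we want.
func needMoreNodes(availableCPU, cpuRequest, bufferCount int64) bool {
        // How many more game servers could we start right now?
        fits := availableCPU / cpuRequest
        return fits < bufferCount
}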

Create a Node Scaler

The node scaler essentially runs an event loop that carries out the strategy outlined above.

This is relatively easy to do using Go in combination with the native Kubernetes Go client library, as you can see below in the Start() function of the node scaler.

Note that I’ve removed most of the error handling and other boilerplate to make the event loop clearer, but here’s the original code if you’re interested.

// Start the HTTP server on the given port
func (s *Server) Start() error {
        
        // Access Kubernetes and return a client
        s.cs, _ = kube.ClientSet()

        // ... there be more code here ... 
        
        // Use the K8s client's watcher channels to see game server events
        gw, _ := s.newGameWatcher()
        gw.start()

        // async loop around either the tick, or the event stream
        // and then scaleNodes() if either occur.
        go func() {
                log.Print("[Info][Start] Starting node scaling...")
                tick := time.Tick(s.tick)

                // ^^^ MAIN EVENT LOOP HERE ^^^
                for {
                        select {
                        case <-gw.events:
                                log.Print("[Info][Scaling] Received Event, Scaling...")
                                s.scaleNodes()                          
                        case <-tick:
                                log.Printf("[Info][Scaling] Tick of %#v, Scaling...", tick)
                                s.scaleNodes()
                        }
                }
        }()
      
        // Start the HTTP server
        return errors.Wrap(s.srv.ListenAndServe(), "Error starting server")
}

For those unfamiliar with Go, let’s break it down:

  1. kube.ClientSet() - We have a small piece of utility code that returns a Kubernetes ClientSet, which gives us access to the Kubernetes API of the cluster we are running in.
  2. gw, _ := s.newGameWatcher() - Kubernetes has APIs that allow you to watch for changes across the cluster. In this particular case, the code returns a data structure containing a Go channel (essentially a blocking queue), specifically gw.events, which returns a value whenever a game Pod is added to or removed from the cluster.
  3. tick := time.Tick(s.tick) - This creates another Go channel that blocks for a given duration (in this case, 10 seconds) and then returns a value.
  4. The main event loop sits under the // ^^^ MAIN EVENT LOOP HERE ^^^ comment. This code block contains a select statement, which means the loop blocks until either the gw.events channel or the tick channel (which fires every 10 seconds) returns a value, and then executes s.scaleNodes(). In other words, scaleNodes runs whenever a game server is added or removed, and also every 10 seconds.
  5. s.scaleNodes() - Runs the node-scaling strategy outlined above.
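To make the watcher step more concrete, here is a minimal sketch of what a game watcher like this could look like using client-go’s watch API (pre-1.18 signatures assumed); the gameWatcher type, the namespace, and the label selector are assumptions for this article and may differ from the example project.

package scaler

import (
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/watch"
        "k8s.io/client-go/kubernetes"
)

// gameWatcher pushes an event onto its channel whenever a game server
// Pod is added to or removed from the cluster. Illustrative sketch only.
type gameWatcher struct {
        cs     kubernetes.Interface
        events chan struct{}
}

func (g *gameWatcher) start() error {
        // Watch only the Pods that are labelled as game servers.
        w, err := g.cs.CoreV1().Pods("default").Watch(metav1.ListOptions{
                LabelSelector: "role=game-server",
        })
        if err != nil {
                return err
        }

        go func() {
                for e := range w.ResultChan() {
                        // Only additions and deletions change how much CPU is in use.
                        if e.Type == watch.Added || e.Type == watch.Deleted {
                                g.events <- struct{}{}
                        }
                }
        }()

        return nil
}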

In s.scaleNodes(), we use the Kubernetes API to query the CPU limit we set on each Pod, as well as the total CPU available on each Kubernetes node in the cluster. We can see the configured CPU limits in the Pod specification through the REST API and the Go client, which lets us track how much CPU each game server takes up, along with any Kubernetes-managed Pods that already exist on the nodes. Through the Node specification, the Go client can also track the CPU capacity available on each node. From there, we sum up the CPU consumed by the Pods, subtract it from the combined capacity of the nodes, and determine whether one or more nodes need to be added to the cluster so that we maintain the buffer of space for creating new game servers.
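Here is a rough sketch of that calculation using the Go client (pre-1.18 signatures assumed). The label selector, the namespace handling, and the helper name are assumptions for this article; the real scaleNodes() in the example project is more involved.

package scaler

import (
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
)

// availableCPU returns how much CPU (in millicores) is left over on the
// game server nodes once the limits of the Pods running on them are
// subtracted. Illustrative sketch only.
func availableCPU(cs kubernetes.Interface) (int64, error) {
        nodes, err := cs.CoreV1().Nodes().List(metav1.ListOptions{
                LabelSelector: "role=game-server",
        })
        if err != nil {
                return 0, err
        }

        var capacity, used int64
        for _, n := range nodes.Items {
                // Total CPU the node can schedule.
                capacity += n.Status.Allocatable.Cpu().MilliValue()

                // Sum the CPU limits of every Pod already on this node.
                pods, err := cs.CoreV1().Pods("").List(metav1.ListOptions{
                        FieldSelector: "spec.nodeName=" + n.Name,
                })
                if err != nil {
                        return 0, err
                }
                for _, p := range pods.Items {
                        for _, c := range p.Spec.Containers {
                                used += c.Resources.Limits.Cpu().MilliValue()
                        }
                }
        }

        return capacity - used, nil
}

Combined with a buffer check like the needMoreNodes() sketch earlier, this is enough to decide whether another node is required.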

If you dig into the code in this example, you will see that we use the APIs on Google Cloud Platform to add new nodes to the cluster. The API provided for Google Compute Engine managed instance groups allows you to add (and remove) instances from the node pool backing the Kubernetes cluster. That being said, any cloud provider will have a similar API that lets you do the same thing, and here you can see the interface we defined to abstract away the implementation details so that it can easily be modified to work with other providers.
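As a rough illustration, such an abstraction could be as small as the interface below; the name NodePool and its methods are assumptions for this article and may not match the interface in the example project exactly.

// NodePool abstracts away the cloud provider that backs the game
// server nodes. A Google Cloud implementation would talk to the
// managed instance group API; other providers would have their own.
// Illustrative sketch only.
type NodePool interface {
        // Size returns the current number of nodes in the pool.
        Size() (int64, error)
        // IncreaseToSize grows the pool to the given number of nodes
        // (and does nothing if it is already at least that large).
        IncreaseToSize(size int64) error
}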

Deploy the Node Scaler

Below, you can see the deployment YAML for the node scaler. As you can see, environment variables are used to set all of the configuration options, including:

  • Which nodes in the cluster should be managed
  • How much CPU each dedicated game server requires
  • The minimum and maximum number of nodes
  • How much buffer capacity should exist at all times
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nodescaler
spec:
  replicas: 1 # only want one, to avoid race conditions
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        role: nodescaler-server
    spec:
      nodeSelector:
        role: apps
      containers:
      - name: nodescaler
        image: gcr.io/soccer/nodescaler
        env:
          - name: NODE_SELECTOR # the nodes to be managed
            value: "role=game-server"
          - name: CPU_REQUEST # how much CPU each server needs
            value: "0.1"
          - name: BUFFER_COUNT # how many servers do we need buffer for
            value: "30"
          - name: TICK # how often to tick over and recheck everything
            value: "10s"
          - name: MIN_NODE # minimum number of nodes for game servers
            value: "1"
          - name: MAX_NODE # maximum number of nodes for game servers
            value: "15"
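A sketch of how those environment variables might be read into the node scaler’s configuration is shown below; the config struct and its fields are assumptions for this article rather than the exact code in the example project.

package scaler

import (
        "os"
        "strconv"
        "time"

        "k8s.io/apimachinery/pkg/api/resource"
)

// config holds the node scaler settings taken from the Deployment's
// environment variables. Illustrative sketch only.
type config struct {
        nodeSelector string
        cpuRequest   resource.Quantity
        bufferCount  int64
        tick         time.Duration
}

func configFromEnv() (config, error) {
        c := config{nodeSelector: os.Getenv("NODE_SELECTOR")}

        var err error
        if c.cpuRequest, err = resource.ParseQuantity(os.Getenv("CPU_REQUEST")); err != nil {
                return c, err
        }
        if c.bufferCount, err = strconv.ParseInt(os.Getenv("BUFFER_COUNT"), 10, 64); err != nil {
                return c, err
        }
        if c.tick, err = time.ParseDuration(os.Getenv("TICK")); err != nil {
                return c, err
        }
        // MIN_NODE and MAX_NODE would be parsed the same way as BUFFER_COUNT.

        return c, nil
}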

You may have noticed that we set the deployment to replicas: 1. We do this because we only ever want one active node scaler instance in the Kubernetes cluster at any given point in time. This ensures that we never have more than one process in the cluster trying to scale our nodes up, and eventually down, which would definitely lead to race conditions and all sorts of weird situations.

Similarly, when the node scaler is updated, we want to make sure the existing one is shut down properly before its replacement is created, so we set strategy.type to Recreate. With that, Kubernetes destroys the currently running node scaler Pod before recreating the updated version, which also avoids any potential race conditions.

Let’s see it in action

After deploying the node scaler, let’s tail the logs and see it in action. In the video below, you can see from the logs that when there is one node in the cluster assigned to game servers, we have capacity for 40 dedicated game servers, and we have configured a required buffer of 30 dedicated game servers. As we fill the available CPU capacity with running dedicated game servers via the matchmaker, watch how the number of game servers that can be created in the remaining space drops, and eventually a new node is added to maintain the buffer!

YouTube video:

  • www.youtube.com/watch?v=UzI…
