- GCAP 2017: Scaling Multiplayer Games with Open Source
- Scaling Dedicated Game Servers with Kubernetes: Part 3 — Scaling Up Nodes
- Example project: paddle-soccer
In the first two articles of this series, we looked at how to host dedicated game servers on Kubernetes and how to measure and limit their memory and CPU resources. In this instalment, we look at how we can use the CPU information from the previous article to determine when we need to scale up the Kubernetes cluster, because as the number of players increases, we no longer have enough room for more game servers.
Separate Apps from Game Servers
The first step we should take, before we start writing code to increase the size of the Kubernetes cluster, is to separate our applications (for example, the matchmaker, the game server controller, and the node scaler we are about to write) onto different nodes in the cluster from where the game servers run.
This has several benefits:
- The resource usage of our applications now has no impact on the game servers, since they are on different machines. This means that if the matchmaker has a CPU spike for some reason, there is an extra barrier to ensure it cannot unduly affect a dedicated game server that is in play.
- It makes it easier to scale the capacity for dedicated game servers up and down, since we only need to look at game server usage across a specific set of nodes, rather than all the potential containers across the entire cluster.
- We can use bigger machines with more CPU cores and memory for the game server nodes, and smaller machines with fewer cores and less memory for the controller applications, since they need fewer resources. We are essentially able to pick the right machine size for the job at hand, which gives us a lot of flexibility while remaining cost-effective.
Kubernetes makes it relatively easy to set up a heterogeneous cluster, and gives us the tools to specify where Pods are scheduled within the cluster through the power of node selectors on Pods.
It's worth noting that the more sophisticated Node Affinity feature is also available in beta, but we won't need it for this example, so we'll ignore it for now.
First, we need to assign labels (a set of key-value pairs) to the nodes in our cluster. This is exactly the same as what you have seen when creating Pods with Deployments and exposing them with Services, just applied to nodes instead. I'm using Google Cloud Platform's Container Engine, which uses node pools to apply labels to groups of nodes as the cluster is created and to build the heterogeneous cluster — but you can do similar things on other cloud providers, as well as directly through the Kubernetes API or the command line client.
In this case, I added the labels role: apps and role: game-server to the appropriate nodes in my cluster. We can then add a nodeSelector option to our Kubernetes configurations to control which nodes in the cluster the Pods are scheduled onto.
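The example project applies these labels via Container Engine node pools, but as mentioned above this can also be done directly through the API. Below is a minimal sketch of doing that with a recent version of the Go client; the node name is a placeholder and this snippet is purely illustrative, not code from the example project.

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Assumes this runs inside the cluster; use clientcmd for out-of-cluster config.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()
	// "my-node-name" is a placeholder for a real node name in your cluster.
	node, err := cs.CoreV1().Nodes().Get(ctx, "my-node-name", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}
	if node.Labels == nil {
		node.Labels = map[string]string{}
	}
	node.Labels["role"] = "apps" // or "game-server" for game server nodes
	if _, err := cs.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{}); err != nil {
		log.Fatal(err)
	}
}
```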
For example, here is the matchmaker application configuration, where you can see the nodeSelector set to role: apps to ensure that its container instances are only created on the application nodes (the nodes tagged with the "apps" role).
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: matchmaker
spec:
  replicas: 5
  template:
    metadata:
      labels:
        role: matchmaker-server
    spec:
      nodeSelector:
        role: apps # here is the node selector
      containers:
      - name: matchmaker
        image: gcr.io/soccer/matchmaker
        ports:
        - containerPort: 8080
```
Similarly, we can adjust the configuration from the previous article so that all the dedicated game server Pods are scheduled only on the machines we have specifically designated for them, i.e. those tagged role: game-server:
```yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: "game-"
spec:
  hostNetwork: true
  restartPolicy: Never
  nodeSelector:
    role: game-server # here is the node selector
  containers:
  - name: soccer-server
    image: gcr.io/soccer/soccer-server:0.1
    env:
    - name: SESSION_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    resources:
      limits:
        cpu: "0.1"
```
Note that the sample code builds this same configuration through the Kubernetes API rather than from YAML, but the YAML version is easier to read, and it's the format we've been using throughout this series.
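For a flavour of what building the same Pod through the API looks like, here is a sketch using a recent version of the Go client. It is illustrative only — the package name, function name, and default namespace are assumptions, and the example project's actual code will differ in its details.

```go
package game

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createGameServerPod creates a dedicated game server Pod equivalent to the
// YAML configuration above.
func createGameServerPod(ctx context.Context, cs kubernetes.Interface) error {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{GenerateName: "game-"},
		Spec: corev1.PodSpec{
			HostNetwork:   true,
			RestartPolicy: corev1.RestartPolicyNever,
			NodeSelector:  map[string]string{"role": "game-server"},
			Containers: []corev1.Container{{
				Name:  "soccer-server",
				Image: "gcr.io/soccer/soccer-server:0.1",
				Env: []corev1.EnvVar{{
					Name: "SESSION_NAME",
					ValueFrom: &corev1.EnvVarSource{
						FieldRef: &corev1.ObjectFieldSelector{FieldPath: "metadata.name"},
					},
				}},
				Resources: corev1.ResourceRequirements{
					Limits: corev1.ResourceList{
						corev1.ResourceCPU: resource.MustParse("0.1"),
					},
				},
			}},
		},
	}
	created, err := cs.CoreV1().Pods(metav1.NamespaceDefault).Create(ctx, pod, metav1.CreateOptions{})
	if err != nil {
		return err
	}
	log.Printf("[Info][CreatePod] Created game server pod %s", created.Name)
	return nil
}
```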
A strategy to scale up
Kubernetes on cloud providers tends to come with automated scaling capabilities, such as the Google Cloud Platform Cluster Autoscaler, but since they are generally built for stateless applications, and our dedicated game servers store the game simulation in memory, they won't work in this case. However, with the tools that Kubernetes provides, it isn't particularly difficult to build our own custom Kubernetes cluster autoscaler!
Scaling the nodes of a Kubernetes cluster up and down probably makes the most sense in a cloud environment, since we only want to pay for the resources we need/use. If we were running on our own premises, it might make less sense to change the size of the Kubernetes cluster; we could simply run one large cluster across all the machines we own and keep it at a static size, since adding and removing physical machines is far more onerous than it is on the cloud, and since we own/lease those machines for longer, it wouldn't necessarily save us money.
There are several potential strategies for determining when to scale up the number of nodes in a cluster, but for this example we'll keep things relatively simple:
- Define a minimum and maximum number of nodes for game servers, and make sure we stay within those bounds.
- Use CPU resource capacity and usage as the metric for tracking how many dedicated game servers can fit on a node in the cluster (in this example, we assume we always have enough memory).
- Define a buffer of CPU capacity for a set number of game servers in the cluster. That is, add more nodes if, at any point, you could not add n game servers to the cluster without running out of CPU resources (a worked example with concrete numbers follows this list).
- Whenever a new dedicated game server is started, calculate whether a new node needs to be added to the cluster because the CPU capacity across the nodes has dropped below the buffer amount.
- As a fail-safe, every n seconds, also calculate whether a new node needs to be added to the cluster because the measured CPU capacity has dropped below the buffer.
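To make the buffer concrete, plugging in the configuration values used by the node scaler deployment later in this article: with a CPU_REQUEST of 0.1 CPU per game server and a BUFFER_COUNT of 30, the scaler wants 30 × 0.1 = 3 CPUs of unallocated capacity across the game-server nodes at all times. The moment the free capacity across those nodes drops below 3 CPUs, it asks the cloud provider for more nodes, as long as it stays within the configured maximum node count.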
Create a Node Scaler
The node scaler essentially runs an event loop to carry out the strategy outlined above.
Using Go in combination with the native Kubernetes Go client library makes this relatively straightforward to implement, as you can see below in the Start() function of the node scaler.
Note that I've removed most of the error handling and other boilerplate to make the event loop clearer, but the original code is available in the example project if you're interested.
```go
// Start the HTTP server on the given port
func (s *Server) Start() error {
	// Access Kubernetes and return a client
	s.cs, _ = kube.ClientSet()

	// ... there be more code here ...

	// Use the K8s client's watcher channels to see game server events
	gw, _ := s.newGameWatcher()
	gw.start()

	// async loop around either the tick, or the event stream
	// and then scaleNodes() if either occur.
	go func() {
		log.Print("[Info][Start] Starting node scaling...")
		tick := time.Tick(s.tick)

		// ^^^ MAIN EVENT LOOP HERE ^^^
		for {
			select {
			case <-gw.events:
				log.Print("[Info][Scaling] Received Event, Scaling...")
				s.scaleNodes()
			case <-tick:
				log.Printf("[Info][Scaling] Tick of %#v, Scaling...", tick)
				s.scaleNodes()
			}
		}
	}()

	// Start the HTTP server
	return errors.Wrap(s.srv.ListenAndServe(), "Error starting server")
}
```
For those of you who are unfamiliar with Go, let's break this down a bit:

- `kube.ClientSet()` — we have a small piece of utility code that returns a Kubernetes ClientSet, which gives us access to the Kubernetes API of the cluster we are running in.
- `gw, _ := s.newGameWatcher()` — Kubernetes has APIs that allow you to watch for changes across the cluster. In this particular case, the code returns a data structure containing a Go channel (essentially a blocking queue), specifically `gw.events`, which returns a value whenever a game Pod is added to or removed from the cluster.
- `tick := time.Tick(s.tick)` — this creates another Go channel, which blocks until the given duration (in this case, 10 seconds) has passed, and then returns a value.
- The main event loop sits under the `// ^^^ MAIN EVENT LOOP HERE ^^^` comment. In this code block is a `select` statement. It essentially declares that the system will block until either the `gw.events` channel or the `tick` channel (which fires every 10 seconds) returns a value, and then execute `s.scaleNodes()`. This means a `scaleNodes` run will fire whenever a game server is added or removed, and also every 10 seconds.
- `s.scaleNodes()` — runs the scale-node strategy outlined above.
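For illustration, here is a minimal sketch of how such a watcher can be built on top of the Go client's watch API. This is not the example project's newGameWatcher — the `app=game-server` Pod label selector, the default namespace, and the recent client-go signatures are all assumptions made for the sake of the sketch.

```go
package nodescaler

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
)

// gameWatcher pushes a value onto events whenever a game server Pod is
// added to or deleted from the cluster.
type gameWatcher struct {
	events chan struct{}
}

// start begins watching game Pods and forwarding add/delete notifications.
func (g *gameWatcher) start(ctx context.Context, cs kubernetes.Interface) error {
	w, err := cs.CoreV1().Pods(metav1.NamespaceDefault).Watch(ctx, metav1.ListOptions{
		LabelSelector: "app=game-server", // hypothetical label for game Pods
	})
	if err != nil {
		return err
	}
	go func() {
		defer w.Stop()
		for event := range w.ResultChan() {
			if event.Type == watch.Added || event.Type == watch.Deleted {
				log.Printf("[Info][gameWatcher] Pod %v, notifying...", event.Type)
				g.events <- struct{}{}
			}
		}
	}()
	return nil
}
```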
In s.calenodes (), we use the Kubernetes API to query the CPU limit we set on each Pod and the total CPU available on each Kubernetes node in the cluster. We can see the configured CPU limits in the Pod Specification via the Rest API and the Go Client, which allows us to track the number of cpus per game server and any Kubernetes-managed pods that exist on the nodes. Through the Node Specification, Go Client can also track the CPU capacity available on each Node. In this case, we need to sum up the number of cpus occupied by Pods, subtract the number of cpus from the capacity of each node, and determine if we need to add one or more nodes to the cluster so that we can keep that buffer space for creating a new game server.
If you dig into the code in this example, you will see that we are using the APIs on Google Cloud Platform to add new nodes to the cluster. The APIs provided for Google Compute Engine managed instance groups allow us to add (and remove) instances from the node pool in the Kubernetes cluster. That being said, any cloud provider will have similar APIs that let you do the same thing, and here you can see the interface we defined to abstract away these implementation details, so that it can be easily modified to work with another provider.
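As an illustration only, such an abstraction might look something like the interface below; the actual interface in the example project may use different names and methods.

```go
package nodescaler

// NodePool abstracts the cloud provider's node group API, so the node scaler
// can grow the set of game server nodes without knowing whether it is talking
// to a Google Compute Engine managed instance group or another provider.
type NodePool interface {
	// IncreaseToSize grows the node pool to at least the given number of nodes.
	// It should do nothing if the pool is already that size or bigger.
	IncreaseToSize(size int64) error
}
```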
Deploy the Node Scaler
Below you can see the deployment YAML for the node scaler. As you can see, environment variables are used to set all the configuration options, including:
- Which nodes in the cluster should be managed
- How much CPU each dedicated game server needs
- The minimum and maximum number of nodes
- How many game servers' worth of buffer should exist at all times
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nodescaler
spec:
  replicas: 1 # only want one, to avoid race conditions
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        role: nodescaler-server
    spec:
      nodeSelector:
        role: apps
      containers:
      - name: nodescaler
        image: gcr.io/soccer/nodescaler
        env:
        - name: NODE_SELECTOR # the nodes to be managed
          value: "role=game-server"
        - name: CPU_REQUEST # how much CPU each server needs
          value: "0.1"
        - name: BUFFER_COUNT # how many servers do we need buffer for
          value: "30"
        - name: TICK # how often to tick over and recheck everything
          value: "10s"
        - name: MIN_NODE # minimum number of nodes for game servers
          value: "1"
        - name: MAX_NODE # maximum number of nodes for game servers
          value: "15"
```
You may have noticed that we set the Deployment to replicas: 1. We do this because we only ever want one active node scaler instance in our Kubernetes cluster at any given point in time. This ensures that we don't have more than one process in the cluster trying to scale our nodes up (and eventually down), which would definitely lead to race conditions and all kinds of strange behaviour.
Similarly, to ensure that the node scaler is properly shut down before a new one is created when we want to update it, we configure strategy.type: Recreate, so that Kubernetes destroys the currently running node scaler Pod before recreating the newer version, also avoiding any potential race conditions.
Let’s see it in action
After deploying the node scaler, let's tail the logs and see it in action. In the video below, you can see from the logs that when there is one node in the cluster assigned to game servers, we have the capacity to start forty dedicated game servers, and have configured a buffer requirement of thirty dedicated game servers. As we fill up the available CPU capacity with running dedicated game servers via the matchmaker, pay attention to how the number of game servers that can be created in the remaining space drops, and eventually a new node is added to maintain the buffer!
YouTube video:
- www.youtube.com/watch?v=UzI…