Preface

Most Rancher users prefer to create custom clusters through Rancher Server. After such a cluster is created, however, Rancher Server may become unable to manage it for a variety of reasons, for example the Rancher Server was deleted or its backup data cannot be restored. The usual remedy is to start a new Rancher Server and import the downstream business cluster into it, but this leaves "after-effects" such as no longer being able to scale the nodes of the business cluster.

To avoid these after-effects, we can instead manage the "custom" clusters created by Rancher Server directly through RKE.

As you may know, when Rancher Server creates a "custom" cluster through the UI, the backend is implemented with RKE, so RKE (docs.rancher.cn/rke/) is fully capable of managing the "custom" clusters created by Rancher Server.

RKE creates and manages a Kubernetes cluster by relying on three files:

  • cluster.yml: the RKE cluster configuration file

  • kube_config_cluster.yml: contains the authentication credentials for full access to the cluster

  • cluster.rkestate: the Kubernetes cluster state file, which also contains the authentication credentials for full access to the cluster

So, as long as these three files can be recovered from the downstream business cluster, you can continue to manage that cluster with the RKE binary alone. The rest of this article walks through managing a "custom" cluster created by Rancher Server with RKE, and then scaling the cluster's nodes with RKE.
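As a quick sketch of what that looks like in practice (assuming a Linux amd64 machine; the download URL follows RKE's standard release naming, and v1.2.4 is the RKE version that appears later in this article), the RKE binary works against cluster.yml and cluster.rkestate in the current directory, while kube_config_cluster.yml is the kubeconfig it generates for kubectl access:

# wget https://github.com/rancher/rke/releases/download/v1.2.4/rke_linux-amd64
# chmod +x rke_linux-amd64 && mv rke_linux-amd64 /usr/local/bin/rke
# rke --version
# ls
cluster.yml  cluster.rkestate  kube_config_cluster.yml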

The demo environment

This article has only been verified against Rancher v2.4.x and v2.5.x; other versions may not apply.

For clarity, this article starts by creating a "custom" cluster from Rancher Server, then takes over management of that cluster with RKE, and finally adds a node through RKE to verify RKE's ability to manage the cluster.

Rancher Server (ip-172-31-8-56) is started with the simplest docker run method, and a "custom" cluster is created through the UI. The cluster consists of two nodes, ip-172-31-2-203 and ip-172-31-1-111; see below for details:

# kubectl get nodes
NAME              STATUS   ROLES                      AGE     VERSION
ip-172-31-1-111   Ready    worker                     2m2s    v1.18.14
ip-172-31-2-203   Ready    controlplane,etcd,worker   3m23s   v1.18.14

RKE manages custom clusters

1. Shut down the Rancher Server node (ip-172-31-8-56) to simulate a Rancher Server failure. From this point on, the downstream cluster can no longer be managed through Rancher Server.

2. Recover the kube_config_cluster.yml file of the downstream business cluster by running the following command on a controlplane node:

# docker run --rm --net=host \
  -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro \
  --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) \
  --format='{{index .RepoTags 0}}' | tail -1) \
  -c 'kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml get configmap \
  -n kube-system full-cluster-state \
  -o json | jq -r .data.\"full-cluster-state\" | jq \
  -r .currentState.certificatesBundle.\"kube-admin\".config | sed \
  -e "/^[[:space:]]*server:/ s_:.*_: \"https://127.0.0.1:6443\"_"' \
  > kubeconfig_admin.yaml

After kubeconfig_admin.yaml has been exported successfully, you can use kubectl to keep operating the downstream business cluster:

# kubectl --kubeconfig kubeconfig_admin.yaml get nodes
NAME              STATUS   ROLES                      AGE   VERSION
ip-172-31-1-111   Ready    worker                     32m   v1.18.14
ip-172-31-2-203   Ready    controlplane,etcd,worker   34m   v1.18.14

3. Recover the cluster.rkestate file of the downstream business cluster by running the following command on a controlplane node:

# docker run --rm --net=host \
  -v $(docker inspect kubelet \
  --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro \
  --entrypoint bash $(docker inspect $(docker images -q --filter=label=org.label-schema.vcs-url=https://github.com/rancher/hyperkube.git) \
  --format='{{index .RepoTags 0}}' | tail -1) \
  -c 'kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml \
  -n kube-system get configmap full-cluster-state \
  -o json | jq -r .data.\"full-cluster-state\" | jq -r .' \
  > cluster.rkestate

4. Recover the cluster.yml file of the downstream business cluster.

At present there is no good way to recover this file automatically, but you can rebuild cluster.yml by hand from the recovered cluster.rkestate, because almost all of the configuration cluster.yml needs is present in cluster.rkestate.

Get the cluster node configuration from cluster.rkestate:

# cat cluster.rkestate | jq -r .desiredState.rkeConfig.nodes
[
  {
    "nodeName": "c-kfbjs:m-d3e75ad7a0ea",
    "address": "172.31.2.203",
    "port": "22",
    "internalAddress": "172.31.2.203",
    "role": [
      "etcd",
      "controlplane",
      "worker"
    ],
    "hostnameOverride": "ip-172-31-2-203",
    "user": "root",
    "sshKeyPath": "~/.ssh/id_rsa"
  }
]

Write cluster.yml by hand according to the node information in cluster.rkestate:

# cat cluster.yml
nodes:
- address: 172.31.2.203
  hostname_override: ip-172-31-2-203
  user: ubuntu
  role:
  - controlplane
  - etcd
  - worker
- address: 172.31.1.111
  hostname_override: ip-172-31-1-111
  user: ubuntu
  role:
  - worker
- address: 172.31.5.186
  hostname_override: ip-172-31-5-186
  user: ubuntu
  role:
  - worker
kubernetes_version: v1.18.14-rancher1-1

There are a few things to note about cluster.yml written manually above:

Only the **controlplane (ip-172-31-2-203)** node can be recovered from the cluster.rkestate file. Because this example cluster also contains a **worker (ip-172-31-1-111)** node, you need to fill in the information for the **worker (ip-172-31-1-111)** node by hand.

ip-172-31-5-186 in cluster.yml is a new worker node, which will be used in the RKE scaling demonstration below.

The node entries recovered from cluster.rkestate use the root user; change them to the SSH user that RKE should actually log in with (ubuntu in this example), according to your environment.

The kubernetes_version parameter must be set to the original cluster's version; otherwise RKE will upgrade the cluster to its default (latest supported) Kubernetes version.
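If you are not sure which version to put there, you can read it back from the recovered state file and cross-check it against the versions your RKE binary supports. A minimal sketch, assuming the state file stores the version under the camelCase key kubernetesVersion (consistent with the camelCase node keys shown above):

# cat cluster.rkestate | jq -r .desiredState.rkeConfig.kubernetesVersion
v1.18.14-rancher1-1
# rke config --list-version --all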

In addition to the method above, you can also use the following script to restore cluster.yml. You still need to revise the points mentioned above, but the advantage of this approach is that it recovers the cluster.yml file more completely.

#!/bin/bash
echo "Building cluster.yml..."
echo "Working on Nodes..."
echo 'nodes:' > cluster.yml
cat cluster.rkestate | grep -v nodeName | jq -r .desiredState.rkeConfig.nodes | yq r - | sed 's/^/  /' | \
  sed -e 's/internalAddress/internal_address/g' | \
  sed -e 's/hostnameOverride/hostname_override/g' | \
  sed -e 's/sshKeyPath/ssh_key_path/g' >> cluster.yml
echo "" >> cluster.yml

echo "Working on services..."
echo 'services:' >> cluster.yml
cat cluster.rkestate | jq -r .desiredState.rkeConfig.services | yq r - | sed 's/^/  /' >> cluster.yml
echo "" >> cluster.yml

echo "Working on network..."
echo 'network:' >> cluster.yml
cat cluster.rkestate | jq -r .desiredState.rkeConfig.network | yq r - | sed 's/^/  /' >> cluster.yml
echo "" >> cluster.yml

echo "Working on authentication..."
echo 'authentication:' >> cluster.yml
cat cluster.rkestate | jq -r .desiredState.rkeConfig.authentication | yq r - | sed 's/^/  /' >> cluster.yml
echo "" >> cluster.yml

echo "Working on systemImages..."
echo 'system_images:' >> cluster.yml
cat cluster.rkestate | jq -r .desiredState.rkeConfig.systemImages | yq r - | sed 's/^/  /' >> cluster.yml
echo "" >> cluster.yml
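Note that this script depends on jq and on the older yq read syntax (yq r -, i.e. yq v2/v3; the current yq v4 uses a different eval syntax), and it expects cluster.rkestate in the current directory. A usage sketch, with a hypothetical file name for the script:

# chmod +x restore-cluster-yml.sh
# ./restore-cluster-yml.sh
Building cluster.yml...
Working on Nodes...
Working on services...
Working on network...
Working on authentication...
Working on systemImages...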

5. Use RKE to add a node to the original cluster.

So far, cluster.yml and cluster.rkestate have both been recovered. Now you can run rke up to add the **worker (ip-172-31-5-186)** node to the cluster.

# rke up
INFO[0000] Running RKE version: v1.2.4
INFO[0000] Initiating Kubernetes cluster
INFO[0000] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
INFO[0000] [certificates] Generating admin certificates and kubeconfig
INFO[0000] Successfully Deployed state file at [./cluster.rkestate]
INFO[0000] Building Kubernetes cluster
INFO[0000] [dialer] Setup tunnel for host [172.31.2.203]
INFO[0000] [dialer] Setup tunnel for host [172.31.1.111]
INFO[0000] [dialer] Setup tunnel for host [172.31.5.186]
...
...
INFO[0090] [addons] no user addons defined
INFO[0090] Finished building Kubernetes cluster successfully

After the cluster is updated, obtain the node information again:

# kubectl --kubeconfig kubeconfig_admin.yaml get nodes
NAME              STATUS   ROLES                      AGE     VERSION
ip-172-31-1-111   Ready    worker                     8m6s    v1.18.14
ip-172-31-2-203   Ready    controlplane,etcd,worker   9m27s   v1.18.14
ip-172-31-5-186   Ready    worker                     29s     v1.18.14

You can see that the new worker (ip-172-31-5-186) node has been added and the cluster version is still v1.18.14.

From here on, you can use RKE to keep managing the custom cluster created by Rancher Server, whether you are adding nodes, taking snapshots, or restoring from them, almost exactly as if the cluster had been created directly with RKE.
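For example, etcd snapshot and restore go through RKE's standard subcommands; a minimal sketch (the snapshot name here is only illustrative):

# rke etcd snapshot-save --config cluster.yml --name before-maintenance
# rke etcd snapshot-restore --config cluster.yml --name before-maintenance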

Afterword

This article has shown how to use RKE to manage a Rancher "custom" cluster, but the operation can be tricky, especially the cluster.yml configuration: a small mistake there can make the whole cluster update incorrectly or fail.

Hailong Wang is the Technical Manager of the Rancher China community and is responsible for maintaining and operating the community. He has 6 years of experience in cloud computing and went through the technology transition from OpenStack to Kubernetes; he has rich hands-on operations and maintenance experience across the underlying Linux operating system, KVM virtualization, and Docker container technology.