Background:
The production Kubernetes environment was built with kubeadm, probably on version 1.15, and it has been running steadily for nearly two years. It went through one major upgrade, from 1.15 to 1.16, plus several patch upgrades, and the current version is 1.16.15. I once tried to go higher, but something went wrong while upgrading the master nodes. Fortunately it is a three-node master cluster, so I rolled back to 1.16 and no further upgrades were attempted. Yesterday I finally started upgrading the cluster again……
The cluster configuration
Hostname | System | IP |
---|---|---|
k8s-vip | slb | 10.0.0.37 |
k8s-master-01 | centos7 | 10.0.0.41 |
k8s-master-02 | centos7 | 10.0.0.34 |
k8s-master-03 | centos7 | 10.0.0.26 |
k8s-node-01 | centos7 | 10.0.0.36 |
k8s-node-02 | centos7 | 10.0.0.83 |
k8s-node-03 | centos7 | 10.0.0.40 |
k8s-node-04 | centos7 | 10.0.0.49 |
k8s-node-05 | centos7 | 10.0.0.45 |
k8s-node-06 | centos7 | 10.0.0.18 |
The Kubernetes upgrade process
1. Refer to official documentation
Reference: https://kubernetes.io/zh/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
https://v1-17.docs.kubernetes.io/zh/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
These documents cover upgrading a kubeadm-created Kubernetes cluster from version 1.16.x to 1.17.x, and from version 1.17.x to 1.17.y, where y > x.
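Before planning anything, it doesn't hurt to confirm where the cluster currently stands; a quick check, for example:
kubeadm version -o short
kubectl version --short
kubectl get nodes -o wide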
2. Confirm the upgradable version and upgrade plan
yum list --showduplicates kubeadm --disableexcludes=kubernetes
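To narrow the listing down to the 1.17 line, the output can simply be filtered, e.g.:
yum list --showduplicates kubeadm --disableexcludes=kubernetes | grep 1.17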
Since my kubeadm version is 1.16.15, I have to upgrade to 1.17.15 first and then from 1.17.15 to 1.17.17 (skipping versions is not supported). The control plane has three nodes: k8s-master-01, k8s-master-02 and k8s-master-03. Personally I don't like touching the first node first, so I started directly from the third node (k8s-master-03)……
3. Upgrade the k8s-master-03 control plane node
yum install -y kubeadm-1.17.15-0 --disableexcludes=kubernetes
sudo kubeadm upgrade plan
Can it be upgraded straight to 1.17.17? Let's give it a try:
kubeadm upgrade apply v1.17.17
So it can't go to 1.17.17, only to 1.17.16? And kubeadm has to be upgraded first. How could that be? Right: upgrades only go from 1.y to 1.y+1, and the target version can't be newer than the installed kubeadm.
kubeadm upgrade apply v1.17.15
yum install -y kubelet-1.17.15-0 kubectl-1.17.15-0 --disableexcludes=kubernetes
systemctl daemon-reload
sudo systemctl restart kubelet
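To double-check what actually landed on this node, the installed packages and the node object can be inspected, for example:
rpm -q kubeadm kubelet kubectl
kubectl get node k8s-master-03 -o wide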
Well, I still can't figure out why this node now reports 1.17.16, but it doesn't matter much. Let's leave it at that for now!
4. Upgrade the other control plane nodes (k8s-master-01, k8s-master-02)
Run the following commands on each of the other two master nodes:
yum install -y kubeadm-1.17.15-0 --disableexcludes=kubernetes
kubeadm upgrade node
yum install -y kubelet-1.17.15-0 kubectl-1.17.15-0 --disableexcludes=kubernetes
systemctl daemon-reload
sudo systemctl restart kubelet
Log in to any master node:
[root@k8s-master-03 ~]# kubectl get nodes
NAME             STATUS                     ROLES    AGE    VERSION
k8s-master-01    Ready                      master   297d   v1.17.15
k8s-master-02    Ready                      master   297d   v1.17.15
k8s-master-03    Ready                      master   297d   v1.17.16-rc.0
k8s-node-01      Ready                      node     549d   v1.16.15
k8s-node-02      Ready                      node     2d5h   v1.16.15
k8s-node-03      Ready                      node     549d   v1.16.15
k8s-node-04      Ready                      node     547d   v1.16.15
k8s-node-05      Ready                      node     547d   v1.16.15
k8s-node-06      Ready                      node     192d   v1.16.15
test-ubuntu-01   Ready,SchedulingDisabled   <none>   47h    v1.16.15
tm-node-002      Ready                      node     154d   v1.16.15
tm-node-003      Ready                      <none>   99d    v1.16.15
5. Continue upgrading to the 1.17.17 patch release
Similarly, repeat the steps above to move from 1.17.15 to 1.17.17.
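For reference, a minimal sketch of the repeated sequence, assuming the same order as the 1.17.15 pass (kubeadm upgrade apply on the first control plane node touched, kubeadm upgrade node on the other two):
yum install -y kubeadm-1.17.17-0 --disableexcludes=kubernetes
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.17.17    # use "kubeadm upgrade node" on the other control plane nodes
yum install -y kubelet-1.17.17-0 kubectl-1.17.17-0 --disableexcludes=kubernetes
systemctl daemon-reload
sudo systemctl restart kubelet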
kubectl get nodes -o wide
Note: the control plane node upgrades above skipped the drain (evict the node) step.
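For completeness, the skipped drain step would look roughly like this on each control plane node (the node name here is just an example):
kubectl drain k8s-master-02 --ignore-daemonsets
# ... perform the upgrade on that node as above ...
kubectl uncordon k8s-master-02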
6. Worker node upgrade
Note: The demonstration is performed on the k8s-node-03 node
yum install kubeadm-1.17.17 kubectl-1.17.17 kubelet-1.17.17 --disableexcludes=kubernetes
Mark the node unschedulable and drain it:
kubectl drain k8s-node-03 --ignore-daemonsets
kubeadm upgrade node
sudo systemctl daemon-reload
sudo systemctl restart kubelet
kubectl uncordon k8s-node-03
kubectl get nodes -o wide
Note: I'll skip the remaining screenshots; just look at the results…… I only upgraded a few nodes for now. The other nodes will be upgraded when there is time, and the process should be much the same. If anything abnormal turns up, I'll sort it out and analyse it separately.
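When the time comes, the remaining workers can be run through the same sequence one at a time; a rough sketch, assuming root SSH access from a master node and that the node list below is whatever is still pending:
for node in k8s-node-01 k8s-node-02 k8s-node-04; do
  kubectl drain "$node" --ignore-daemonsets
  ssh "$node" "yum install -y kubeadm-1.17.17 kubelet-1.17.17 kubectl-1.17.17 --disableexcludes=kubernetes \
    && kubeadm upgrade node && systemctl daemon-reload && systemctl restart kubelet"
  kubectl uncordon "$node"
done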
7. Some minor problems encountered during and after the upgrade
1. clusterrole
There were still a few minor exceptions. For example, the permissions in the clusterrole system:kube-controller-manager used by my controller-manager were no longer sufficient.
I'm not sure whether it was because only two of the master nodes had been upgraded at that point. When the problem occurred, I upgraded the k8s-master-01 node as well, deleted the clusterrole system:kube-controller-manager, and applied the corresponding clusterrole taken from my Kubernetes 1.21 cluster:
kubectl get clusterrole system:kube-controller-manager -o yaml > 1.yaml
kubectl get clusterrole system:kube-controller-manager -o yaml >clusterrole.yaml
kubectl apply -f 1.yaml
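Incidentally, to see exactly which permissions the controller-manager is missing, its logs can be grepped for RBAC denials; a sketch (the pod name depends on the node):
kubectl -n kube-system logs kube-controller-manager-k8s-master-01 | grep -i forbidden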
Anyway, it seems to be solved…. I'll take a proper look at the clusterrole later; I've been a bit muddled lately……
2. Flannel exception
Here's another problem, this time with Flannel:
My cluster has been upgraded step by step since 1.15 and the old nodes never had problems, but pods scheduled to newly added worker nodes ran into all kinds of weird issues: either their probes failed, or they kept restarting….. What was going on?
I suspected Flannel, so I went to the Flannel GitHub repository to take a look:
It turned out my cluster was still running Flannel v0.11. Time to upgrade Flannel……
kubectl delete -f xxx.yaml (the old Flannel manifest)
Download the official kube-flannel.yaml file
Change the network configuration in it to match the cluster (see the check below)
Note: of course, if the cluster were still on 1.16, the apiVersion of the RBAC objects in the manifest would also need to be adjusted.
kubectl apply -f kube-flannel.yaml
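As for the network change mentioned above: the part of kube-flannel.yaml that usually needs attention is the net-conf.json ConfigMap entry, whose Network field must match the cluster's pod CIDR. A quick way to compare the two (assuming a standard kubeadm setup):
# the CIDR flannel will use
grep -n -A 6 'net-conf.json' kube-flannel.yaml
# the CIDR the cluster was initialised with
kubectl -n kube-system get cm kubeadm-config -o yaml | grep -i podSubnet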
After this change, the earlier probe failures and restarts basically stopped.
3. Error from Prometheus
The Kubernetes version has to match the Prometheus version:
Well, mine is on the early 0.4 branch, which still works with Kubernetes 1.17. However, the alerts for the kube-controller-manager and kube-scheduler were abnormal…… Referring to https://duiniwukenaihe.github.io/2021/05/14/Kubernetes-1.20.5-upgrade1.21.0%E5%90%8E%E9%81%97%E7%97%87/ I modified the kube-controller-manager and kube-scheduler configuration files. Of course, if I upgrade to 1.18 or later this will annoy me all over again….. switch branches and redo the configuration, or what?
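Based on that reference, what the change boils down to (as far as I understand it) is letting the controller-manager and scheduler expose their metrics on a non-loopback address so Prometheus can scrape them; a sketch of the manifest edit on each master, assuming the kubeadm-generated static pod manifests contain --bind-address=127.0.0.1:
sed -i 's/--bind-address=127.0.0.1/--bind-address=0.0.0.0/' /etc/kubernetes/manifests/kube-controller-manager.yaml
sed -i 's/--bind-address=127.0.0.1/--bind-address=0.0.0.0/' /etc/kubernetes/manifests/kube-scheduler.yaml
# the kubelet notices the changed manifests and restarts the static pods on its own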
Postscript:
- Try to follow the official documentation
- Remember to upgrade your network components
- If the API changes, remember to update the versions or configuration files of the associated components