This article documents problems encountered while deploying a K8s cluster. Circumstances differ and this is limited to my own experience, so treat it as a reference only. Note: this article is updated from time to time.
Package sources and key problems
Using the University of Science and Technology of China (USTC) mirror:
cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial main
EOF
Update:
apt-get update
But the update errors out:
Ign:7 http://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial/main amd64 Packages
Get:7 http://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial/main amd64 Packages [31.3 kB]
Err:7 http://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial/main amd64 Packages
  Hash Sum mismatch
Fetched 38.9 kB in 1s (20.2 kB/s)
Reading package lists... Done
E: Failed to fetch http://mirrors.ustc.edu.cn/kubernetes/apt/dists/kubernetes-xenial/main/binary-amd64/Packages.gz  Hash Sum mismatch
E: Some index files failed to download. They have been ignored, or old ones used instead.
Cause and attempted solution: add the key:
gpg --keyserver keyserver.ubuntu.com --recv-keys 6A030B21BA07F4FB
gpg --export --armor 6A030B21BA07F4FB | sudo apt-key add -
Result: failure.
Using the official Kubernetes (overseas) source:
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
With this source, apt-get update simply hangs.
Using the Aliyun source:
cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
Add the key:
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
If that does not succeed (for example, the host cannot reach Google), first download https://packages.cloud.google.com/apt/doc/apt-key.gpg by some other means and save it to the current directory, then run:
cat apt-key.gpg | sudo apt-key add -
Then run apt-get update; it succeeds.
Without the key, updating from the Aliyun source fails with:
W: GPG error: https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6A030B21BA07F4FB
W: The repository 'https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial InRelease' is not signed.
N: Data from such a repository can't be authenticated and is therefore potentially dangerous to use.
N: See apt-secure(8) manpage for repository creation and user configuration details.
Querying the Kubernetes configuration images prints warnings:
W1214 08:46:14.303158    8461 version.go:101] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
W1214 08:46:14.303772    8461 version.go:102] falling back to the local client version: v1.17.0
W1214 08:46:14.304223    8461 validation.go:28] Cannot validate kube-proxy config - no validator is available
W1214 ... validation.go:28] Cannot validate kubelet config - no validator is available
Cause and solution: the host cannot reach the internet. This is harmless: kubeadm falls back to the version of the locally installed client.
Script execution
pullk8s.sh: 3: pullk8s.sh: Syntax error: "(" unexpected
Cause and solution: the script must start with the #!/bin/bash shebang. If it does not, run it with bash pullk8s.sh instead.
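For illustration, here is why the error appears: bash array syntax trips up sh/dash, which is what runs a script that lacks a shebang. A minimal sketch (the image list is hypothetical, not the real pullk8s.sh):
#!/bin/bash
# Arrays are bash-only syntax; running this file with plain sh fails at the "(".
images=(kube-apiserver kube-controller-manager kube-scheduler)  # hypothetical list
for img in "${images[@]}"; do
    echo "would pull: $img"
done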
Initialize environment kubeadm init
Error message:
[ERROR Swap]: running with swap on is not supported. Please disable swap
Cause and solution: running with swap on is not supported, so disable swap.
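A typical way to do this (swapoff lasts only until reboot; the sed line, which assumes a standard fstab layout, comments out the swap entry so it stays off):
swapoff -a                            # turn swap off immediately
sed -i '/ swap / s/^/#/' /etc/fstab   # keep swap off across reboots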
Error message:
[ERROR Port-10250]: Port 10250 is in use
Cause and solution: something (usually an already-running kubelet) occupies the port. Stop kubelet:
systemctl stop kubelet
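If it is unclear what is occupying the port, ss can show the owning process before anything is stopped:
ss -tlnp | grep 10250   # list the listener on port 10250 and its PID (usually kubelet)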
WARNING IsDockerSystemdCheck:
[init] Using Kubernetes version: v1.17.0
[preflight] Running pre-flight checks
	[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR NumCPU]: the number of available CPUs 1 is less than the required 2
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Cause and solution: Docker is using the cgroupfs cgroup driver, which does not match the driver Kubernetes recommends. Check it:
# docker info | grep -i cgroup
Cgroup Driver: cgroupfs    // !!! cgroupfs instead of systemd
WARNING: No swap limit support
Stop Docker:
systemctl stop docker
Edit /etc/docker/daemon.json and add:
"exec-opts": ["native.cgroupdriver=systemd"]
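For reference, a minimal complete /etc/docker/daemon.json, written with the same heredoc style used earlier. This assumes the file does not exist yet; if it does, merge the key into the existing JSON instead of overwriting:
cat <<EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF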
Restart Docker:
systemctl start docker
Check the cgroup:
# docker info | grep -i cgroup
Cgroup Driver: systemd
The driver is now changed. (Note: an alternative often suggested online is to modify the kubelet configuration file instead:
vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
changing the Environment line to:
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"
or, also specifying the pause image source:
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause-amd64:3.1"
Then reload and restart:
systemctl daemon-reload
systemctl restart kubelet
In my tests, this method did not work.)
ERROR NumCPU:
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR NumCPU]: the number of available CPUs 1 is less than the required 2
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...
Cause and solution: kubeadm requires at least two CPU cores. Give the VM two or more cores.
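If resizing the VM is not possible on a throwaway test machine, the check can also be skipped using the flag kubeadm itself suggests in the output above (not recommended for real clusters):
kubeadm init --ignore-preflight-errors=NumCPU   # bypass the 2-CPU preflight check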
Runtime problems
View status:
kubectl get pods -n kube-system
Error:
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Cause and solution: the post-init kubeconfig steps were never run:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
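Alternatively, for the root user, kubeadm's init output suggests pointing KUBECONFIG straight at the admin config:
export KUBECONFIG=/etc/kubernetes/admin.conf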
After running these, the command succeeds. A related error, which generally means the API server itself is not running or is unreachable:
The connection to the server 192.168.0.102:6443 was refused - did you specify the right host or port?
# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-6955765f44-j7lvd 0/1 CrashLoopBackOff 14 51m
coredns-6955765f44-kmhfc 0/1 CrashLoopBackOff 14 51m
etcd-ubuntu 1/1 Running 0 52m
kube-apiserver-ubuntu 1/1 Running 0 52m
kube-controller-manager-ubuntu 1/1 Running 0 52m
kube-proxy-qlhfs 1/1 Running 0 51m
kube-scheduler-ubuntu 1/1 Running 0 52m
You can also use kubectl get pod --all-namespaces to view pods in all namespaces.
If no pod network add-on is installed, coredns stays in the Pending state. Deploy flannel:
kubectl apply -f kube-flannel.yml
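The manifest above is assumed to have been downloaded locally beforehand; if the machine can reach GitHub, it can also be applied straight from the repository (URL as of the coreos/flannel repo at the time):
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml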
Error message:
error: unable to recognize "kube-flannel-aliyun-0.11.0.yml": no matches for kind "DaemonSet" in version "extensions/v1beta1"
The same thing happens with Calico. I suspect it is an apiVersion naming problem. Solution: take the manifest from the master branch of the repo: github.com/coreos/flan…
In kube-flannel-aliyun.yml, the master branch and the tagged releases still use extensions/v1beta1. In kube-flannel.yml, the tagged releases also use it, but master has been updated (DaemonSet moved to apps/v1 in Kubernetes 1.16), so use the master version.
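A quick patch for an old manifest, assuming the apiVersion is the only blocker (apps/v1 DaemonSets also require a spec.selector, so a manifest this old may need further edits):
sed -i 's#extensions/v1beta1#apps/v1#g' kube-flannel-aliyun-0.11.0.yml   # bump the DaemonSet apiVersion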
Before flannel was deployed, coredns logged:
[FATAL] plugin/loop: Loop (127.0.0.1:60825 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 7805087528265218508.4857814245207702505."
After deploying flannel, the kube-flannel pod was stuck at Init:ImagePullBackOff:
# kubectl logs kube-flannel-ds-amd64-n55rf -n kube-system
Error from server (BadRequest): container "kube-flannel" in pod "kube-flannel-ds-amd64-n55rf" is waiting to start: PodInitializing
Use kubectl describe pod to view the events:
# kubectl describe pod kube-flannel-ds-amd64-n55rf -n kube-system
...
  Normal   Scheduled  13m                  default-scheduler  Successfully assigned kube-system/kube-flannel-ds-amd64-n55rf to ubuntu
  Normal   Pulling    4m21s (x4 over 13m)  kubelet, ubuntu    Pulling image "quay.io/coreos/flannel:v0.11.0-amd64"
  Warning  Failed     3m6s (x4 over 10m)   kubelet, ubuntu    Failed to pull image "quay.io/coreos/flannel:v0.11.0-amd64": rpc error: code = Unknown desc = context canceled
  Warning  Failed     3m6s (x4 over 10m)   kubelet, ubuntu    Error: ErrImagePull
  Normal   BackOff    2m38s (x7 over 10m)  kubelet, ubuntu    Back-off pulling image "quay.io/coreos/flannel:v0.11.0-amd64"
  Warning  Failed     2m27s (x8 over 10m)  kubelet, ubuntu    Error: ImagePullBackOff
Cause: the image quay.io/coreos/flannel:v0.11.0-amd64 cannot be downloaded from quay.io. Note that the image name and tag must match the manifest exactly.
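One workaround sketch: pull the image through a reachable mirror and retag it to the exact name the manifest expects (the mirror host below is only an example; substitute one that works for you):
docker pull quay-mirror.qiniu.com/coreos/flannel:v0.11.0-amd64   # example mirror, may not exist for you
docker tag quay-mirror.qiniu.com/coreos/flannel:v0.11.0-amd64 quay.io/coreos/flannel:v0.11.0-amd64   # retag to the expected name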
coredns then sits in the ContainerCreating state:
# kubectl logs coredns-6955765f44-4csvn -n kube-system
Error from server (BadRequest): container "coredns" in pod "coredns-6955765f44-r96qk" is waiting to start: ContainerCreating
Coredns becomes CrashLoopBackOff:
# kubectl logs coredns-6955765f44-4csvn -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.5
linux/amd64, go1.13.4, c2fd1b2
[FATAL] plugin/loop: Loop (127.0.0.1:41252 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 1746539958269975925.3391392736060997773."
To view details:
# kubectl describe pod coredns-6955765f44-4csvn -n kube-system
Name:                 coredns-6955765f44-r96qk
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 ubuntu/192.168.0.102
Start Time:           Sun, 15 Dec 2019 22:45:15 +0800
Labels:               k8s-app=kube-dns
                      pod-template-hash=6955765f44
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-6955765f44
Containers:
  coredns:
    Container ID:
    Image:         k8s.gcr.io/coredns:1.6.5
    Image ID:
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-qq7qf (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-qq7qf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-qq7qf
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                    From               Message
  ----     ------                  ----                   ----               -------
  Warning  FailedScheduling        7m21s (x3 over 8m32s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled               6m55s                  default-scheduler  Successfully assigned kube-system/coredns-6955765f44-r96qk to ubuntu
  Warning  FailedCreatePodSandBox  6m52s                  kubelet, ubuntu    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9a2d45536097d22cc6b10f338b47f1789869f45f4b12f8a202aa898295dc80a4" network for pod "coredns-6955765f44-r96qk": networkPlugin cni failed to set up pod "coredns-6955765f44-r96qk_kube-system" network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.0.1/24
After flannel is installed, delete the problematic pod:
kubectl delete pod coredns-6955765f44-4csvn -n kube-system
A new pod is created automatically, but the problem remains. ifconfig shows the stale cni0 interface. A solution circulating online:
kubeadm reset
systemctl stop kubelet
systemctl stop docker
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down
ip link delete cni0
ip link delete flannel.1
systemctl restart kubelet
systemctl restart docker
I tried it: it failed.
Messages from another deployment attempt:
  Warning  FailedScheduling        77s (x5 over 5m53s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled               76s                  default-scheduler  Successfully assigned kube-system/coredns-9d85f5447-4jwf2 to ubuntu
  Warning  FailedCreatePodSandBox  73s                  kubelet, ubuntu    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "5c109baa51b8d97e75c6b35edf108ca4f2f56680b629140c8b477b9a8a03d97c" network for pod "coredns-9d85f5447-4jwf2": networkPlugin cni failed to set up pod "coredns-9d85f5447-4jwf2_kube-system" network: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  71s                  kubelet, ubuntu    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "3f8c5b704fb1dc4584a2903b2ecff329e717e5c2558c9f761501fab909d32133" network for pod "coredns-9d85f5447-4jwf2": networkPlugin cni failed to set up pod "coredns-9d85f5447-4jwf2_kube-system" network: open /run/flannel/subnet.env: no such file or directory
  Normal   SandboxChanged          70s (x2 over 72s)    kubelet, ubuntu    Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled                  29s (x4 over 69s)    kubelet, ubuntu    Container image "registry.aliyuncs.com/google_containers/coredns:1.6.5" is already present on the machine
  Normal   Created                 29s (x4 over 69s)    kubelet, ubuntu    Created container coredns
  Normal   Started                 29s (x4 over 69s)    kubelet, ubuntu    Started container coredns
  Warning  BackOff                 10s (x9 over 67s)    kubelet, ubuntu    Back-off restarting failed container
Cause and solution: kubeadm init must be run with --pod-network-cidr=10.244.0.0/16, otherwise flannel never writes /run/flannel/subnet.env. Note: even then, it takes a moment for this file to appear.
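For reference, an init invocation with the CIDR flag; the --image-repository flag is optional and shown only because the aliyun registry appears elsewhere in this article:
kubeadm init --pod-network-cidr=10.244.0.0/16 --image-repository registry.aliyuncs.com/google_containers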
The same pod's events, viewed later:
  Warning  FailedScheduling        56m (x5 over 60m)    default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled               56m                  default-scheduler  Successfully assigned kube-system/coredns-9d85f5447-4jwf2 to ubuntu
  Warning  FailedCreatePodSandBox  56m                  kubelet, ubuntu    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "5c109baa51b8d97e75c6b35edf108ca4f2f56680b629140c8b477b9a8a03d97c" network for pod "coredns-9d85f5447-4jwf2": networkPlugin cni failed to set up pod "coredns-9d85f5447-4jwf2_kube-system" network: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  55m                  kubelet, ubuntu    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "3f8c5b704fb1dc4584a2903b2ecff329e717e5c2558c9f761501fab909d32133" network for pod "coredns-9d85f5447-4jwf2": networkPlugin cni failed to set up pod "coredns-9d85f5447-4jwf2_kube-system" network: open /run/flannel/subnet.env: no such file or directory
  Normal   SandboxChanged          55m (x2 over 55m)    kubelet, ubuntu    Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled                  55m (x4 over 55m)    kubelet, ubuntu    Container image "registry.aliyuncs.com/google_containers/coredns:1.6.5" is already present on the machine
  Normal   Created                 55m (x4 over 55m)    kubelet, ubuntu    Created container coredns
  Normal   Started                 55m (x4 over 55m)    kubelet, ubuntu    Started container coredns
  Warning  BackOff                 59s (x270 over 55m)  kubelet, ubuntu    Back-off restarting failed container
The coredns log shows:
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.5
linux/amd64, go1.13.4, c2fd1b2
[FATAL] plugin/loop: Loop (127.0.0.1:48100 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 639535139534040434.6569166625322327450."
Cause and solution: the coredns ConfigMap forwards queries to /etc/resolv.conf, and the nameserver there is 127.0.1.1, which sends queries back to the local resolver and creates a forwarding loop. Run:
kubectl edit cm coredns -n kube-system
Delete the loop line, save, and exit (the editor is vim). Then delete every failed coredns pod:
kubectl delete pod coredns-9d85f5447-4jwf2 -n kube-system
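Deleting them one at a time works, but since the pods carry the k8s-app=kube-dns label (visible in the describe output above), they can also be removed in one go:
kubectl delete pod -n kube-system -l k8s-app=kube-dns   # delete all coredns pods; the Deployment recreates them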
The contents of the coredns ConfigMap are as follows:
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-12-21T09:50:31Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "171"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: 62485b55-3de6-4dee-b24a-8440052bdb66
Note: the nameserver in /etc/resolv.conf can be changed to 8.8.8.8, but the file reverts to the 127.x address after a reboot; removing the loop line is what actually resolves the problem.
Failed to join the cluster
[preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.17" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
Cause and solution: my speculation is that the worker's hostname was identical to the master's, but I have no evidence.
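If duplicate hostnames are indeed the cause, giving the worker a unique name before joining is an easy thing to verify (the name worker1 is illustrative):
hostnamectl set-hostname worker1   # each node in the cluster must have a unique hostname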
The TLS timeout
Running kubectl apply -f XXX reports:
Unable to connect to the server: net/http: TLS handshake timeout
Possible cause: the master has too little memory; increase it. (Mine was already raised to 4 GB and still errored; it returned to normal after a reboot.)
Collected online
[WARNING FileExisting-socat]: socat is a networking tool that Kubernetes uses to forward data to and from pods (e.g. port forwarding). Install it:
apt-get install socat
A worker node fails to join the cluster. After kubeadm join is executed on the worker, a timeout error is returned:
root@worker2:~# kubeadm join 192.168.56.11:6443 --token wbryr0.am1n476fgjsno6wa --discovery-token-ca-cert-hash sha256:7640582747efefe7c2d537655e428faa6275dbaff631de37822eb8fd4c054807
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: abort connecting to API servers after timeout of 5m0s
Run kubeadm token create --print-join-command on the master node to generate a fresh join command, then run that command on the worker node.
The master's token expires after 24 hours. To view the existing tokens:
kubeadm token list
To create a token that never expires:
kubeadm token create --ttl 0
Run the following command on the master node to query the discovery-token-ca-cert-hash value:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
Rejoining a Node
kubeadm join 192.168.124.195:6443 --token 8xwg8u.lkj382k9ox58qkw9 \
    --discovery-token-ca-cert-hash sha256:86291bed442dd1dcd6c26f2213208e10cab0f87763f44e0edf01fa670cd9e8b