Problem symptoms

Input:

kubectl logs --tail 200 -f -n xxx pod-name-xxxx

Return result:

error: You must be logged in to the server (the server has asked for the client to provide credentials (get nodes))

Known information

  1. Kubernetes version: 1.17.2
  2. The certificate was recently updated

Troubleshooting process

This error message is quite telling and immediately pointed me toward a certificate problem. I ran kubectl get nodes right away, and node information came back normally. At that point the problem was basically narrowed down to the certificate used for communication between kube-apiserver and kubelet. Had the certificate expired? That seemed unlikely: I had renewed the master certificates not long ago. To be rigorous, I logged on to the master node and ran:

kubeadm alpha certs check-expiration
ls /etc/kubernetes

No problem there: the certificates had not expired, and the kubelet certificate had been rotated automatically. To make sure the certificate itself was fine, I used curl with the apiserver's kubelet client certificate to call the kubelet API directly:

curl -k --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt --key /etc/kubernetes/pki/apiserver-kubelet-client.key https://xxx.xxx.xx.xx:10250/metrics

Still no problem: the metrics came back normally. Since the certificate itself was fine, the problem was probably in how the apiserver was using it. Continuing methodically, I checked the processes:

ps -ef|grep kube-apiserver
ps -ef|grep kubelet
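
What I was looking for in that output was whether kube-apiserver pointed at the renewed kubelet client certificate. A rough illustration of the check (the flag names are standard kube-apiserver flags; the paths are whatever your cluster uses):

ps -ef | grep kube-apiserver | tr ' ' '\n' | grep -E 'kubelet-client-(certificate|key)'
## typically prints something like:
## --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
## --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key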

Still nothing wrong: both kube-apiserver and kubelet were configured with the correct certificate paths. This was getting strange. Since all the configuration looked fine, I began to wonder whether kube-apiserver had simply not picked up the renewed certificate, so I went for the simplest approach: restart it.

## pick one kube-apiserver pod
kubectl delete pod -n kube-system kube-apiserver-xxxxx

Then I ran the command again:

kubectl logs --tail 200 -f -n xxx pod-name-xxxx

Still the same error! I was stunned for a moment, and then it hit me: kube-apiserver runs as a static pod. When I executed kubectl delete pod, did the process really restart? I checked the kube-apiserver process, and sure enough, its start time had not changed. So I restarted it once more, one master node at a time, this time going through Docker directly:

## find the kube-apiserver container id
docker ps | grep kube-apiserver
## restart kube-apiserver
docker restart xxxxx
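
To confirm that this restart actually took effect, the process start time can be compared before and after; a minimal sketch of such a check (any equivalent process or container inspection will do):

## process start time of kube-apiserver (the bracket trick excludes grep itself)
ps -eo pid,lstart,cmd | grep '[k]ube-apiserver'
## or look at the container uptime
docker ps | grep kube-apiserver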

Then I ran the logs command again:

kubectl logs --tail 200 -f -n xxxx pod-name-xxxx

Perfect, the problem was solved! But solving the problem is not the end: we also need to understand the whole chain of events, check it against our expectations, and confirm the real root cause.

Root cause analysis

Looking back over the whole process, a few points are worth questioning.

Question 1

Why, after the certificate update, could I still query node information but not fetch pod logs?

  1. First of all, every kubectl command ultimately talks to the apiserver. That part was fine: the apiserver authenticates the client using the credentials in ~/.kube/config.
  2. Viewing node information only requires the apiserver to read data from etcd. Viewing logs is different: the request goes to the apiserver, which locates the kubelet on the node running the target pod and forwards the request to that kubelet. This apiserver-to-kubelet hop also requires certificate verification. (A raw-API sketch of the log request follows this list.)
  3. In addition, when kubelet handles a logs request it actually forwards the request to the CRI, which starts a temporary streaming server; the apiserver then obtains the log data by talking to that streaming server.
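
Under the hood, kubectl logs is essentially a GET on the pod's log subresource, which the apiserver serves by calling kubelet. The namespace and pod name below are placeholders:

## roughly equivalent to: kubectl logs --tail 200 -n <namespace> <pod>
kubectl get --raw "/api/v1/namespaces/<namespace>/pods/<pod>/log?tailLines=200"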

The detailed implementations of kubectl logs and kubectl exec are similar; you can learn more in this article: cloud.tencent.com/developer/a…

Question 2

Why didn't deleting the pod with kubectl restart the kube-apiserver process?

By default, kubelet periodically scans /etc/kubernetes/manifests and creates or deletes static pods according to the YAML/JSON files in that directory.
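
On a kubeadm-provisioned cluster, the control-plane components are exactly such static pods; the listing below is typical for kubeadm but may differ in other setups:

ls /etc/kubernetes/manifests
## etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml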

Kubelet has full control over static pods. After creating one, kubelet reports the pod information up to the apiserver, and the apiserver merely records it; there is no controller in the controller-manager managing this resource. When you delete such a pod through the apiserver, kubelet detects the change right away, recognizes it as a static pod it maintains, and uploads the information again. The deletion has no real effect on the state of the static pod itself.

The object that kubelet maintains in the apiserver for a static pod is called a mirror pod. Looking at the kubelet source code for the pod update path:

  1. Kubelet receives the delete event and checks whether the pod is a mirror pod.
  2. If it is a mirror pod, kubelet calls DeleteMirrorPod to remove it and then immediately recreates it with CreateMirrorPod.
  3. So deleting a static pod through the apiserver only removes the record in etcd; the mirror pod is re-created immediately, and the underlying containers are never restarted.
// Create Mirror Pod for Static Pod if it doesn't already exist
if kubepod.IsStaticPod(pod) {
    podFullName := kubecontainer.GetPodFullName(pod)
    deleted := false
    if mirrorPod != nil {
        if mirrorPod.DeletionTimestamp != nil || !kl.podManager.IsMirrorPodOf(mirrorPod, pod) {
            // The mirror pod is semantically different from the static pod. Remove
            // it. The mirror pod will get recreated later.
            klog.Warningf("Deleting mirror pod %q because it is outdated", format.Pod(mirrorPod))
            if err := kl.podManager.DeleteMirrorPod(podFullName); err != nil {
                klog.Errorf("Failed deleting mirror pod %q: %v", format.Pod(mirrorPod), err)
            } else {
                deleted = true
            }
        }
    }
    if mirrorPod == nil || deleted {
        node, err := kl.GetNode()
        if err != nil || node.DeletionTimestamp != nil {
            klog.V(4).Infof("No need to create a mirror pod, since node %q has been removed from the cluster", kl.nodeName)
        } else {
            klog.V(4).Infof("Creating a mirror pod for static pod %q", format.Pod(pod))
            if err := kl.podManager.CreateMirrorPod(pod); err != nil {
                klog.Errorf("Failed creating a mirror pod for %q: %v", format.Pod(pod), err)
            }
        }
    }
}

......

func (mc *basicMirrorClient) DeleteMirrorPod(podFullName string) error {
    if mc.apiserverClient == nil {
        return nil
    }
    name, namespace, err := kubecontainer.ParsePodFullName(podFullName)
    if err != nil {
        klog.Errorf("Failed to parse a pod full name %q", podFullName)
        return err
    }
    klog.V(2).Infof("Deleting a mirror pod %q", podFullName)
    // TODO(random-liu): Delete the mirror pod with uid precondition in mirror pod manager
    if err := mc.apiserverClient.CoreV1().Pods(namespace).Delete(name, metav1.NewDeleteOptions(0)); err != nil && !errors.IsNotFound(err) {
        klog.Errorf("Failed deleting a mirror pod %q: %v", podFullName, err)
    }
    return nil
}

......

func (mc *basicMirrorClient) CreateMirrorPod(pod *v1.Pod) error {
    if mc.apiserverClient == nil {
        return nil
    }
    // Make a copy of the pod.
    copyPod := *pod
    copyPod.Annotations = make(map[string]string)

    for k, v := range pod.Annotations {
        copyPod.Annotations[k] = v
    }
    hash := getPodHash(pod)
    // Mark the copy so it can be identified as the mirror pod of a static pod
    copyPod.Annotations[kubetypes.ConfigMirrorAnnotationKey] = hash
    apiPod, err := mc.apiserverClient.CoreV1().Pods(copyPod.Namespace).Create(&copyPod)
    if err != nil && errors.IsAlreadyExists(err) {
        // Check if the existing pod is the same as the pod we want to create.
        if h, ok := apiPod.Annotations[kubetypes.ConfigMirrorAnnotationKey]; ok && h == hash {
            return nil
        }
    }
    return err
}
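
This is easy to verify on a cluster: delete the kube-apiserver mirror pod and compare the container's uptime before and after. The pod name below is a placeholder; a rough sketch:

## container uptime before the delete
docker ps | grep kube-apiserver
## delete the mirror pod object through the apiserver
kubectl delete pod -n kube-system kube-apiserver-xxxxx
## the pod object reappears almost immediately and the container uptime is unchanged
kubectl get pod -n kube-system | grep kube-apiserver
docker ps | grep kube-apiserver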

I also recommend this article on how kubelet creates pods: blog.csdn.net/hahachenche…

Question 3

Why is a restart needed for the new certificate to take effect? If the controller-manager is not restarted, does its certificate take effect?

The certificates had been renewed, the cluster kept scheduling normally, and yet the certificate used for apiserver-to-kubelet communication had not taken effect. Is there a difference between the security mechanisms of these components? With that question in mind, I looked into it.

Both the controller-manager and the scheduler call the apiserver through client-go, and they use the kubeconfig files /etc/kubernetes/controller-manager.conf and /etc/kubernetes/scheduler.conf respectively for secure communication. For background on Kubernetes certificates, see: cloud.tencent.com/developer/a…
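
To inspect the client certificate embedded in one of those kubeconfig files, something along these lines works (client-certificate-data is the standard kubeconfig field; adjust the file path as needed):

## decode the embedded client certificate and print its subject and validity period
grep client-certificate-data /etc/kubernetes/controller-manager.conf \
  | awk '{print $2}' | base64 -d | openssl x509 -noout -subject -dates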

The apiserver, for its part, calls kubelet using the kubelet client certificate it loaded into its configuration at startup, and communicates with kubelet over HTTPS.
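
That client certificate is the one referenced by the --kubelet-client-certificate flag checked earlier; its validity can be confirmed directly (path as on a default kubeadm installation):

openssl x509 -in /etc/kubernetes/pki/apiserver-kubelet-client.crt -noout -dates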

None of these three components reloads certificates automatically, so after a certificate renewal nothing takes effect until the processes are restarted. In this case, the controller-manager and scheduler on both master nodes happened to have been restarted already, so they could still complete leader election through the apiserver and were unaffected. That is why the cluster as a whole looked healthy: viewing resources and updating applications worked fine, and only the path where the apiserver calls the kubelet API was broken.
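
Besides restarting the container, another commonly used way to force a control-plane static pod to restart is to move its manifest out of the watched directory and back, which makes kubelet tear the pod down and recreate it. A rough sketch for kube-apiserver (the pause gives kubelet time to notice the change with its default 20-second file check interval):

## force kubelet to recreate the kube-apiserver static pod
mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
sleep 20
mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/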

Confirming the root cause

Based on the analysis above, the root cause was as expected: the kube-apiserver process had not loaded the renewed kubelet client certificate.

How to avoid

  1. Configure log monitoring for every core Kubernetes component so that problems like this are detected early.
  2. Standardize the certificate renewal procedure. Based on this experience, turn it into a proper renewal script (including the component restarts) and renew the certificates on a regular schedule; a sketch follows this list.
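
A minimal sketch of such a renewal script for a kubeadm 1.17 cluster, assuming the kubeadm alpha certs commands available in that release and Docker as the container runtime; treat it as a starting point rather than a drop-in script:

#!/bin/bash
## renew all kubeadm-managed certificates on this master node
kubeadm alpha certs renew all
## confirm the new expiration dates
kubeadm alpha certs check-expiration
## restart the control-plane containers so they load the renewed certificates
## (containers created by kubelet on Docker are named k8s_<container>_<pod>_...)
for component in kube-apiserver kube-controller-manager kube-scheduler; do
  docker ps | grep "k8s_${component}_" | awk '{print $1}' | xargs -r docker restart
done

Depending on the setup, the refreshed /etc/kubernetes/admin.conf may also need to be copied to ~/.kube/config afterwards.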

Conclusion

This article is bound to have places that are not rigorous, so please bear with me: take whatever is useful and discard the rest. If you are interested, you can follow my WeChat official account: Gungunxi. My WeChat ID is lCOMedy2021.