Problem symptoms

The Kubernetes cluster raises an alarm indicating that kubelet's CPU usage is high. The alarm clears some time later, then fires again, and this keeps repeating.

Known information

  1. Kubernetes 1.17.2
  2. Kubelet 1.17.2
  3. The CPU overutilization is not persistent; the initial suspicion is that it is triggered periodically by some function.

Troubleshooting

To start with the routine checks, log in to the server and run the following:

## find the process pid
ps -ef | grep kubelet
## check the process status
top -p XXX
## check the process status every 1 second, 10 samples in total
pidstat 1 10

Finally, the following results are obtained:

The output shows that kubelet's CPU time is spent mainly in user mode. So what exactly is this kubelet doing?
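To confirm the user-mode vs. kernel-mode split for kubelet specifically, pidstat can break CPU usage down per process. A minimal sketch, assuming the kubelet pid can be found with pgrep (substitute the actual pid if not):

# CPU breakdown (%usr vs %system) of the kubelet process, sampled every second, 10 times
pidstat -u -p $(pgrep -o -x kubelet) 1 10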

Referring to an online diagnosis document (www.cnblogs.com/zerchin/p/k…), I did the following:

strace -cp 7799

futex accounts for most of the time, and epoll_ctl has a large number of errors. Let's focus on the time usage first, since it is more directly related to the CPU usage this time; we will come back to the error counts later.
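When we do come back to those errors, strace can trace just epoll_ctl so the return value of each failing call is visible. A minimal sketch, using the same pid as above:

# trace only epoll_ctl calls (and their return values/errors) of kubelet and its threads
strace -f -e trace=epoll_ctl -p 7799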

Continue with:

strace -tt -p <pid>

This looks like the same problem described in many articles online, but to be rigorous, let's keep digging. Next, let's use perf and FlameGraph to see which functions the CPU time is actually spent in.

perf record -F 99 -p 391280 -g -- sleep 30
perf script -i perf.data &> perf.unfold
git clone https://github.com/brendangregg/FlameGraph.git
cp perf.unfold FlameGraph/
cd FlameGraph/
./stackcollapse-perf.pl perf.unfold &> perf.folded
./flamegraph.pl perf.folded > perf_kubelet.svg

After running the above commands, let's look at the resulting flame graph. Many file-related functions appear to be taking up time, so the initial suspicion is that this is related to the file system on the host. But let's not stop here; next, generate a flame graph from kubelet's own pprof profile:

## proxy the apiserver
kubectl proxy --address='0.0.0.0' --accept-hosts='^*$'
## create a Go environment on any client that can reach the k8s apiserver
docker run -d --name golang-env --net host golang:latest sleep 3600
## collect a 60-second CPU profile of kubelet through the apiserver proxy
go tool pprof -seconds=60 -raw -output=go_kubelet.pprof http://APIserver:8001/api/v1/nodes/<node_name>/proxy/debug/pprof/profile
## turn the profile into a flame graph
./stackcollapse-go.pl go_kubelet.pprof > go_kubelet.out
./flamegraph.pl go_kubelet.out > go_kubelet.svg
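As an alternative to the stackcollapse/flamegraph scripts, newer versions of go tool pprof can fetch the profile and serve an interactive view (including a flame graph) in a browser. A minimal sketch, assuming the kubectl proxy above is still running; port 8081 is arbitrary:

# fetch a 60s CPU profile and open the interactive pprof UI on port 8081
go tool pprof -http=:8081 -seconds=60 http://APIserver:8001/api/v1/nodes/<node_name>/proxy/debug/pprof/profile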

Analysis of this flame graph shows that kubelet's fs.GetDirUsage is the cause, which is consistent with my guess. So why does this function push kubelet's CPU usage so high?

At this point I already had a guess about what was causing this. The background is that we use a tool called dubbo-monitor for our microservices, and it produces a very large number of chart files. Before starting the investigation I had also run kubectl describe node <node-name> to see which pods were running on the current node.

Next I located the path of the volume belonging to the dubbo-monitor container and ran a directory traversal against it. It did indeed take a long time.
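The traversal can be reproduced by simply timing a recursive du over the volume directory, which (as shown below) is also what cAdvisor's implementation boils down to. A minimal sketch; the path layout under /var/lib/kubelet is illustrative, substitute the real path found above:

# time a recursive disk-usage scan of the suspect volume
time du -sh /var/lib/kubelet/pods/<pod-uid>/volumes/<volume-plugin>/<volume-name>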

Remaining questions

Why is cAdvisor’s fs.GetDirUsage CPU intensive and how is it implemented?

I looked at the source code of Kubernetes 1.20 and found that this method is no longer present in the cAdvisor it vendors. I then went straight to the commit history of cAdvisor's fs.go and found that the GetDirUsage code had been modified on October 13, 2016; the actual implementation was the du command. du calls fstat on every file, so it gathers its data file by file. That makes it very flexible: it is not tied to a single partition and can work across partitions. But against a directory containing a huge number of files, du is very slow.
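This per-file behaviour is easy to observe: wrapping du in strace -c shows the syscall counts, where the stat family (one call per file) is expected to dominate for a directory with many files. A minimal sketch; the path is illustrative:

# count the syscalls issued by a recursive du over a directory with many files
strace -c -f du -s /path/with/many/files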

Why does kubelet's CPU usage go up and then back down?

This is where kubelet's mechanism for checking filesystem usage comes in. By default, kubelet's --node-status-update-frequency flag specifies the interval at which kubelet reports node status to the control plane; the default is 10 seconds. This means a filesystem usage check is triggered every 10 seconds. The cAdvisor function also sets a timeout, which defaults to 60s.
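To confirm which interval a particular kubelet is actually using, check either its KubeletConfiguration file or its command line. A minimal sketch, assuming the common config path /var/lib/kubelet/config.yaml (adjust to your deployment):

# KubeletConfiguration field, if kubelet runs with --config
grep -i nodeStatusUpdateFrequency /var/lib/kubelet/config.yaml
# or the command-line flag, if it is passed directly
ps -ef | grep kubelet | grep -o 'node-status-update-frequency=[^ ]*'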

The solution

Now that the cause of the problem has been found, how can we solve it?

I don't actually have a complete fix for this, but here is what I tried. I moved the data onto a local volume. kubelet's imagefs and nodefs calculations do not include the disk usage of such a volume. The volume is still monitored to obtain its disk usage, and that monitoring is again done by cAdvisor. The difference is that the interval for collecting this monitoring data is longer than node-status-update-frequency, so the periods of high CPU usage are spaced further apart.
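To verify that volume usage is still being collected even though it no longer counts toward nodefs/imagefs, kubelet's summary stats can be queried through the apiserver proxy. A minimal sketch; the grep is just a crude way to surface the per-pod volume entries:

# per-pod volume usage as reported by kubelet
kubectl get --raw "/api/v1/nodes/<node_name>/proxy/stats/summary" | grep -A5 '"volume"'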

I also hope readers can share better solutions; let's discuss.

Conclusion

This article is bound to have some places that are not rigorous; please bear with me, absorb whatever is useful (if anything), and discard the rest. If you are interested, you can follow my public account: Gungunxi. My WeChat ID is lCOMedy2021.