How do I troubleshoot Kubernetes components?

From the public account "Operation Development Story". Author: Jock

The basic components of Kubernetes are like the foundation of a house: everything else rests on them. As the maintainer of a Kubernetes cluster, you will often run into component problems. How do you usually locate the cause and fix it?

Here is a brief outline of how I approach the investigation:

Locate the faulty node or component based on the cluster status
Analyze component logs
Use pprof to analyze the performance of the component

Determine the scope

Kubernetes has only a few basic components and a simple deployment layout, so the scope is easy to narrow down. For example, if kubectl get nodes shows a node in the NotReady state, there are two likely causes: (1) a problem with the kubelet on that node, or (2) a problem with the network component on that node.
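
As a starting point for narrowing the scope, the sketch below lists the nodes whose status is not Ready. The function name not_ready_nodes is hypothetical, and the sketch assumes the default column layout of kubectl get nodes (NAME in the first column, STATUS in the second).

```shell
# not_ready_nodes: read `kubectl get nodes` output on stdin and print the
# names of nodes whose STATUS does not start with "Ready".
# Hypothetical helper; assumes the default NAME/STATUS column order.
not_ready_nodes() {
    awk 'NR > 1 {
        # STATUS may carry extra flags, e.g. "Ready,SchedulingDisabled";
        # only the first comma-separated field is compared.
        split($2, status, ",")
        if (status[1] != "Ready") print $1
    }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get nodes | not_ready_nodes
```

For each node it prints, kubectl describe node with that name shows the node conditions and recent events, which helps decide between the kubelet and the network component.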

Now that you have a general direction, proceed by elimination.

Why elimination? When solving problems, we usually form hypotheses first and then verify them: list all the possible causes, then check and rule them out one by one until the problem is solved.

Analyze the logs

Log analysis is the most direct way to troubleshoot faults. Most problems can be found in the log. Kubernetes component logs are usually viewed in one of two ways:

Services started by systemd: use journalctl -l -u xxxx
Services started as static Pods: use kubectl logs -n kube-system $PODNAME --tail 100
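
When the log is long, it can help to keep only the warning, error, and fatal lines. The sketch below assumes the component writes the usual klog text format, where each line starts with a severity letter (I, W, E, or F) followed by the month and day. The function name klog_problems is hypothetical, and the filter would not match components configured for JSON logging.

```shell
# klog_problems: read log lines on stdin and keep only klog lines with
# severity W (warning), E (error) or F (fatal).
# Hypothetical helper; assumes klog text headers such as
#   E0101 10:00:01.000000       1 kubelet.go:20] message
klog_problems() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E '^[WEF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}' || true
}

# Usage (either source of logs):
#   journalctl -u kubelet -o cat --since "10 minutes ago" | klog_problems
#   kubectl logs -n kube-system "$PODNAME" --tail 100 | klog_problems
```

The -o cat option makes journalctl print only the message text, so the klog header sits at the start of each line where the pattern expects it.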

Of course, we often need to look beyond the component itself and check its surroundings as well, such as the CPU, memory, and I/O of the underlying infrastructure.

Performance analysis

Why leave performance analysis to the end?

For most people, analyzing component performance is neither easy nor enjoyable: it takes a relatively long time, it requires some understanding of each performance metric, and the learning cost is relatively high.

Kubernetes iterates quickly, with roughly three to four minor releases a year. At that pace, some versions inevitably ship with bugs or performance issues. So when the other approaches turn up nothing, try analyzing the performance of the components.

Kubernetes is written in Go, and Go's pprof is a performance analysis tool that offers both an interactive command-line interface and a graphical web UI, which makes problems much easier to spot. You can also use go-torch to generate a flame graph from profile data, which is even more intuitive.

All Kubernetes components can use pprof for performance analysis. The interface is host:port/debug/pprof/.
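
As a small illustration of that URL pattern, the sketch below assembles a profile URL from a base address, a profile name, and an optional duration. The function name pprof_url is hypothetical; the path and the seconds query parameter are the standard Go pprof HTTP conventions.

```shell
# pprof_url BASE PROFILE [SECONDS]
# Print the pprof URL for the given profile (heap, profile, block, mutex, ...).
# Hypothetical helper; BASE is the component address, e.g. http://localhost:8001
pprof_url() {
    base=$1
    profile=$2
    seconds=${3:-}
    # Strip one trailing slash from BASE so the path is not doubled.
    url="${base%/}/debug/pprof/${profile}"
    if [ -n "$seconds" ]; then
        url="${url}?seconds=${seconds}"
    fi
    printf '%s\n' "$url"
}

# Usage:
#   go tool pprof "$(pprof_url http://localhost:8001 heap)"
```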

Common ways to use pprof

Using interactive commands

View heap (memory allocation) information

go tool pprof http://localhost:8001/debug/pprof/heap

Collect a 30-second CPU profile

go tool pprof http://localhost:8001/debug/pprof/profile?seconds=30

View goroutine blocking

go tool pprof http://localhost:8001/debug/pprof/block

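Collect a 5-second execution trace (the trace endpoint returns an execution trace, which is read with go tool trace rather than go tool pprof)

curl -s -o trace.out "http://localhost:8001/debug/pprof/trace?seconds=5"
go tool trace trace.out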

View stack traces of the holders of contended mutexes

go tool pprof http://localhost:8001/debug/pprof/mutex

Through the UI

The UI is a little more cumbersome to use: we first export the profile to a file, and then use go tool to start a web service for analysis.

For example, export the heap information of kube-scheduler:

curl -sk -v http://localhost:10251/debug/pprof/heap > heap.out

Then use go tool to start a service as follows:

go tool pprof -http=0.0.0.0:8989 heap.out

You can then see the detailed graph in your browser. Note that this requires Graphviz to be installed on the server side (the machine running go tool pprof); see [3] for installation instructions for the various operating systems.
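
Before starting the web UI, it may be worth checking that the required tools are present. The sketch below prints the names of any commands missing from PATH. The function name missing_cmds is hypothetical; dot is the rendering program that Graphviz installs.

```shell
# missing_cmds CMD...
# Print the name of every listed command that is not found on PATH,
# one per line. Prints nothing when all commands are available.
# Hypothetical helper name.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Usage: go provides "go tool pprof"; dot is the Graphviz renderer.
#   missing_cmds go dot
```

If the call prints dot, the graph views will not render until Graphviz is installed.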

The main UI menus and their functions are briefly introduced as follows:

VIEW: selects the view mode
- Top: list functions sorted by value, from highest to lowest
- Graph: the default mode, shown as a call graph
- Flame Graph: shown as a flame graph
- Peek: sorted listing that also shows each function's callers and callees
- Source: sorted listing annotated with source code
- Disassemble: sorted listing annotated with disassembled instructions
SAMPLE: selects which sample type the view displays
- alloc_objects: the total number of objects allocated (whether freed or not)
- alloc_space: the total amount of memory allocated (whether freed or not)
- inuse_objects: the number of objects that have been allocated but not yet freed
- inuse_space: the amount of memory that has been allocated but not yet freed
REFINE: provides filtering capabilities

That covers the basic use of pprof. The following sections briefly analyze the individual Kubernetes components, capturing and displaying CPU information in each case.

Note: depending on the version, some components have pprof enabled by default and others do not. If it is not enabled, you need to enable it yourself; the setting is generally profiling: true. See the official website [4] for details.

Analysis of kube-apiserver

(1) Start a proxy using kubectl proxy

kubectl proxy

(2) Open another terminal and capture the CPU information

curl -sk -v http://localhost:8001/debug/pprof/profile > apiserver-cpu.out

(3) Use go tool to start the service

go tool pprof -http=0.0.0.0:8989 apiserver-cpu.out

(4) View it in a browser

Analysis of kube-scheduler

(1) Capture the CPU information

curl -sk -v http://localhost:10251/debug/pprof/profile > scheduler-cpu.out

(2) Use go tool to start the service

go tool pprof -http=0.0.0.0:8989 scheduler-cpu.out

(3) View it in a browser
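
The apiserver and scheduler steps above follow the same pattern: download the CPU profile, then open it with go tool pprof. The sketch below wraps the download step in one function. The function name capture_cpu_profile and the DRY_RUN variable are hypothetical; with DRY_RUN=1 the function only prints the curl command it would run, which is a convenient way to check the URL first.

```shell
# capture_cpu_profile BASE NAME [SECONDS]
# Download a CPU profile from BASE into NAME-cpu.out and print the file name.
# Hypothetical helper; SECONDS defaults to 30, the pprof default duration.
# Set DRY_RUN=1 to print the curl command instead of running it.
capture_cpu_profile() {
    base=$1
    name=$2
    seconds=${3:-30}
    out="${name}-cpu.out"
    url="${base%/}/debug/pprof/profile?seconds=${seconds}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'curl -sk -o %s %s\n' "$out" "$url"
        return 0
    fi
    curl -sk -o "$out" "$url" && printf '%s\n' "$out"
}

# Usage:
#   capture_cpu_profile http://localhost:10251 scheduler
#   go tool pprof -http=0.0.0.0:8989 scheduler-cpu.out
```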

Analysis of kube-controller-manager