WeChat official account: Operation and Maintenance Development Story. Author: Mr. Dongzi

Background

One of the best practices in container technology is to build container images that are as minimal as possible. However, this practice causes trouble when troubleshooting: minimal containers usually lack common troubleshooting tools, and some have no shell at all (for example, images built FROM scratch). In those cases we can only troubleshoot through logs, or from the host via the Docker CLI or nsenter, which is very inefficient. After an application is deployed in a Kubernetes environment, we often need to get inside the Pod to troubleshoot. Beyond viewing Pod logs and kubectl describe output, the traditional solution is to pre-install tools such as procps, net-tools, tcpdump, and vim in the business Pod's base image. However, this violates the principle of minimal images and enlarges the Pod's security attack surface.

Kubectl-debug is a simple, easy-to-use, and powerful kubectl plugin that helps you diagnose Pods on Kubernetes with ease. It works by starting a debug container and adding it to the target business container's PID, Network, User, and IPC namespaces. Familiar tools like netstat and tcpdump can then be used directly in the new container to investigate the problem, while the business container stays minimal, with no need to pre-install any extra troubleshooting tools. Kubectl-debug consists of two parts:

  • kubectl-debug: the command-line tool

  • debug-agent: deployed on the K8s node, responsible for starting the debug tool container

How it works

As we know, a container is essentially a set of processes with cgroup resource constraints and namespace isolation. Therefore, to "go inside the container" (note the quotation marks), we simply start a process and add it to the target container's namespaces: the new process then "sees" the same root filesystem, virtual network interfaces, and process space as the processes already in the container. This is exactly how commands like docker exec and kubectl exec work.
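As an illustration, here is a minimal sketch of doing this by hand from the node with nsenter, assuming Docker as the container runtime (the container ID is a placeholder):

# Find the host PID of the target container's main process
TARGET_PID=$(docker inspect --format '{{.State.Pid}}' <container-id>)
# Run netstat inside the target container's network namespace
nsenter --target $TARGET_PID --net netstat -tunlp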

Now we not only want to "get inside the container", we also want to bring a set of tools along to help troubleshoot. The best way to manage a toolset efficiently and portably is to package the tools in a container image. Then we just need to start a container from that tool image and have it join the target container's namespaces, which naturally achieves "carrying a set of tools into the container". In fact, you can already do this with the Docker CLI:

export TARGET_ID=666666666
# Join the target container's Network, PID, and IPC namespaces
docker run -it --network=container:$TARGET_ID --pid=container:$TARGET_ID --ipc=container:$TARGET_ID busybox

This is the starting point of kubectl-debug: use a tool container to diagnose the business container. The design philosophy is consistent with patterns like Sidecar: each container does one thing.

In terms of implementation, a kubectl debug command goes through the following steps:

  1. The plugin checks with the APIServer whether demo-pod exists

  2. The APIServer returns the node where demo-pod is running

  3. The plugin requests that a Debug Agent Pod be created on the target node

  4. Kubelet creates the Debug Agent Pod

  5. The plugin detects that the Debug Agent is Ready and initiates the debug request (a long-lived connection)

  6. On receiving the debug request, the Debug Agent creates a debug container and adds it to each namespace of the target container; once creation completes, it connects to the debug container's TTY

The client can then start debugging over connections 5 and 6. When the session ends, the Debug Agent cleans up the debug container, and the plugin cleans up the Debug Agent.
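For reference, steps 1 and 2 correspond to information you can also query by hand (demo-pod is the example Pod name used in the steps above):

# The NODE column shows the node where demo-pod is running
kubectl get pod demo-pod -o wide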

Installation

GitHub address: github.com/aylei/kubec…

  • On macOS, install directly with Homebrew:
brew install aylei/tap/kubectl-debug

  • Or install by downloading the binary:
export PLUGIN_VERSION=0.1.1
# Linux x86_64
curl -Lo kubectl-debug.tar.gz https://github.com/aylei/kubectl-debug/releases/download/v${PLUGIN_VERSION}/kubectl-debug_${PLUGIN_VERSION}_linux_amd64.tar.gz
# macOS
curl -Lo kubectl-debug.tar.gz https://github.com/aylei/kubectl-debug/releases/download/v${PLUGIN_VERSION}/kubectl-debug_${PLUGIN_VERSION}_darwin_amd64.tar.gz
tar -zxvf kubectl-debug.tar.gz kubectl-debug
sudo mv kubectl-debug /usr/local/bin/

Windows users can download the Windows build from the Releases page and add it to the PATH.

The GitHub repo also provides a manifest to install the debug agent in the cluster as a DaemonSet. In DaemonSet mode, however, an agent Pod is pre-deployed on every node and permanently occupies resources, which is wasteful in environments where debugging is infrequent.

Daily usage

Basic usage

1. With kubectl 1.12.0 or later, the plugin can be used directly:

kubectl debug -h

kubectl has supported automatic discovery of plugins on the PATH since version 1.12. Versions before 1.12 do not support this plugin mechanism, but the plugin can still be invoked directly as kubectl-debug.
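You can verify that kubectl discovers the plugin (kubectl 1.12+; the path shown is just an example):

kubectl plugin list
# The output should include something like:
#   /usr/local/bin/kubectl-debug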

2. If the debug-agent DaemonSet is already deployed in the cluster, you can omit --agentless and point the plugin at the DaemonSet:

kubectl debug POD_NAME --daemonset-ns=default --daemonset-name=debug-agent


As noted in the installation section, DaemonSet mode pre-deploys an agent Pod on every node, which permanently occupies resources and is wasteful when debugging is infrequent. The debug-agent DaemonSet can be deployed with: kubectl apply -f raw.githubusercontent.com/aylei/kubec…

3. In agentless mode, kubectl-debug creates the agent Pod and the debug tool container when the command runs, and deletes both when it exits. Startup is slightly slower than in DaemonSet mode because the agent is pulled up fresh on each run. Use -a / --agentless to enable agentless mode:

kubectl debug POD_NAME --agentless --port-forward


4. If the node has no public IP or cannot be reached directly (because of a firewall, for example), use port-forward mode:

kubectl debug POD_NAME --agentless --port-forward


Advanced usage

1. Debug a failing init container:

kubectl debug POD_NAME --container=init-pod


2. If the Pod is in CrashLoopBackOff state and cannot be attached to, you can fork an identical copy of the Pod for diagnosis:

kubectl debug POD_NAME --fork


Custom image configuration

--image: customize the debug container image; defaults to nicolaka/netshoot:latest
--agent-image: customize the debug-agent image in agentless mode; defaults to aylei/debug-agent:latest. In DaemonSet mode, change the image in the debug-agent DaemonSet pod template instead (for example, to point it at a private registry)
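For example, to pull both images from a private registry in agentless mode (registry.example.com is a hypothetical registry address):

kubectl debug POD_NAME --agentless --port-forward \
  --image registry.example.com/ops/netshoot:latest \
  --agent-image registry.example.com/ops/debug-agent:latest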

Configuration file

Default parameters can be overridden in the configuration file ~/.kube/debug-config, so the corresponding flags do not have to be set every time the command is used:

# debug agent listening port (outside container)
# default to 10027
agentPort: 10027
# whether to use agentless mode
# default to false
agentless: true
# namespace of debug-agent pod, used in agentless mode
# default to 'default'
agentPodNamespace: default
# prefix of debug-agent pod, used in agentless mode
# default to 'debug-agent-pod'
agentPodNamePrefix: debug-agent-pod
# image of debug-agent pod, used in agentless mode
# default to 'aylei/debug-agent:latest'
agentImage: aylei/debug-agent:latest
# daemonset name of the debug-agent, used in port-forward
# default to 'debug-agent'
debugAgentDaemonset: debug-agent
# daemonset namespace of the debug-agent, used in port-forward
# default to 'default'
debugAgentNamespace: kube-system
# whether to use port-forward when connecting to debug-agent
# default to false
portForward: true
# image of the debug container
# default as shown
image: nicolaka/netshoot:latest
# start command of the debug container
# default ['bash']
command:
- '/bin/bash'
- '-l'

Typical cases

Use iftop to view Pod network traffic

POD_NAME = kube-flannel-ds-amd64-2xwqp

 ~  kubectl debug kube-flannel-ds-amd64-2xwqp -n kube-system
Agent Pod info: [Name:debug-agent-pod-b14bd868-61a9-11ec-bc72-acbc328370f3, Namespace:default, Image:registry.cn-hangzhou.aliyuncs.com/querycapimages/kubectl-debug-agent:latest, HostPort:10027, ContainerPort:10027]
Waiting for pod debug-agent-pod-b14bd868-61a9-11ec-bc72-acbc328370f3 to run...
Forwarding from 127.0.0.1:10027 -> 10027
Forwarding from [::1]:10027 -> 10027
Handling connection for 10027
                             set container procfs correct false ..
pulling image registry.cn-hangzhou.aliyuncs.com/querycapimages/netshoot:latest, skip TLS false...
latest: Pulling from querycapimages/netshoot
Digest: sha256:f0eba49c9bf66600788d58779e57c2d7334708e12cb292ff8ccc9414c1b6730c
Status: Image is up to date for registry.cn-hangzhou.aliyuncs.com/querycapimages/netshoot:latest
starting debug container...
container created, open tty...
bash-5.0# iftop -i eth0
interface: eth0
IP address is: 172.17.3.3
MAC address is: 52:54:be:83:3a:e4


Use drill to diagnose DNS resolution

POD_NAME = kube-flannel-ds-amd64-2xwqp

bash-5.0# drill any www.baidu.com
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 3214
;; flags: qr rd ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 5, ADDITIONAL: 3
;; QUESTION SECTION:
;; www.baidu.com.    IN    ANY
;; ANSWER SECTION:
www.baidu.com.    803    IN    CNAME    www.a.shifen.com.
;; AUTHORITY SECTION:
baidu.com.    38993    IN    NS    ns4.baidu.com.
baidu.com.    38993    IN    NS    ns3.baidu.com.
baidu.com.    38993    IN    NS    ns7.baidu.com.
baidu.com.    38993    IN    NS    dns.baidu.com.
baidu.com.    38993    IN    NS    ns2.baidu.com.
;; ADDITIONAL SECTION:
ns2.baidu.com.    19348    IN    A    220.181.33.31
ns3.baidu.com.    23022    IN    A    112.80.248.64
ns7.baidu.com.    ...
;; Query time: 1 msec
;; SERVER: 100.64.9.5
;; WHEN: Mon Dec 20 15:37:35 2021
;; MSG SIZE  rcvd: 196

Appendix: drill command reference: commandnotfound.cn/linux/1/533…

Use tcpdump to capture packets

POD_NAME = kube-flannel-ds-amd64-2xwqp

bash-5.0# tcpdump -i eth0 -c 1 -Xvv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:39:27.577342 IP (tos 0x0, ttl 63, id 41476, offset 0, flags [DF], proto TCP (6), length 89)
    198.19.116.60.16710 > 172.17.3.3.6443: Flags [P.], cksum 0xf831 (correct), seq 677521811:677521848, ack 1388710574, win 1037, options [nop,nop,TS val 2849535414 ecr 1924260089], length 37
        0x0000:  4500 0059 a204 4000 3f06 b036 c613 743c  E..Y..@.?..6..t<
        0x0010:  ac11 0303 4146 192b 2862 2993 52c6 0aae  ....AF.+(b).R...
        0x0020:  8018 040d f831 0000 0101 080a a9d8 75b6  .....1........u.
        0x0030:  72b1 e0f9 1703 0300 2047 49f1 8fbb 2835  r........GI...(5
        0x0040:  059a 5e82 0746 afaf bd2d 5af3 c797 16b5  ..^..F...-Z.....
        0x0050:  8709 4666 7e61 6f5a 0b                   ..Ff~aoZ.
1 packet captured
18 packets received by filter
0 packets dropped by kernel
bash-5.0# tcpdump -n -vvv -w /tmp/kube-flannel-ds-amd64-2xwqp.pcap
tcpdump: listening on veth19416cac, link-type EN10MB (Ethernet), capture size 262144 bytes
50 packets captured
50 packets received by filter
0 packets dropped by kernel

Note that to analyze the file captured with -w in Wireshark, you must first copy it off the host where the Pod runs:

[root@k8s-demo-master-01-2 ~]# docker ps | grep netshoot
58b918b67b3f   registry.cn-hangzhou.aliyuncs.com/querycapimages/netshoot:latest   "bash"   15 minutes ago   Up 15 minutes   unruffled_fermat
[root@k8s-demo-master-01-2 ~]# docker cp 58b918b67b3f:/tmp/kube-flannel-ds-amd64-2xwqp.pcap .
[root@k8s-demo-master-01-2 ~]# ll | grep kube-flannel-ds-amd64-2xwqp.pcap
-rw-r--r-- 1 root root 5404 Dec 20 kube-flannel-ds-amd64-2xwqp.pcap

Diagnose CrashLoopBackOff

CrashLoopBackOff is a very troublesome state to investigate: the Pod may be restarting constantly, so neither kubectl exec nor kubectl debug can attach to it reliably, and often the only hope is that the Pod logs printed something useful. kubectl-debug adds --fork for this situation, making CrashLoopBackOff much easier to investigate. When --fork is specified, the plugin copies the current Pod spec, makes a few small changes (sketched after the list below), and creates a new Pod:

  • All labels of the new Pod are removed, to prevent a Service from routing traffic to the forked Pod

  • The new Pod's ReadinessProbe and LivenessProbe are removed, to prevent kubelet from killing it

  • The startup command of the target container (the one to be debugged) in the new Pod is overwritten, to prevent the new Pod from crashing again
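A rough, illustrative sketch of what these changes amount to in the forked Pod spec; this is not the plugin's literal output, and the idle command shown is an assumption for illustration:

metadata:
  labels: {}                        # all labels stripped, so no Service selects this Pod
spec:
  containers:
  - name: app
    # readinessProbe and livenessProbe removed, so kubelet will not kill the Pod
    command: ["sleep", "infinity"]  # startup command overwritten so the container idles instead of crashing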

Next, we can try to reproduce in the new Pod the problem that made the old Pod crash; the example below uses a Go service Pod named srv-es-driver-7445f6cf48-ff7bq. To ensure consistency, chroot into the target container's root filesystem:

~ kubectl-debug srv-es-driver-7445f6cf48-ff7bq -n devops --agentless --port-forward
Agent Pod info: [Name:debug-agent-pod-177482f4-61ad-11ec-b297-acbc328370f3, Namespace:default, Image:registry.cn-hangzhou.aliyuncs.com/querycapimages/kubectl-debug-agent:latest, HostPort:10027, ContainerPort:10027]
Waiting for pod debug-agent-pod-177482f4-61ad-11ec-b297-acbc328370f3 to run...
Forwarding from 127.0.0.1:10027 -> 10027
Forwarding from [::1]:10027 -> 10027
Handling connection for 10027
set container procfs correct false ..
pulling image registry.cn-hangzhou.aliyuncs.com/querycapimages/netshoot:latest, skip TLS false...
latest: Pulling from querycapimages/netshoot
Digest: sha256:f0eba49c9bf66600788d58779e57c2d7334708e12cb292ff8ccc9414c1b6730c
Status: Image is up to date for registry.cn-hangzhou.aliyuncs.com/querycapimages/netshoot:latest
starting debug container...
container created, open tty...
bash-5.0# ls
bin  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  termshark_2.1.1_linux_x64  tmp  usr  var
bash-5.0# chroot /proc/1/root
root@srv-es-driver-7445f6cf48-ff7bq:/# ls
bin  boot  dev  etc  go  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
root@srv-es-driver-7445f6cf48-ff7bq:/# cd /go/bin/
root@srv-es-driver-7445f6cf48-ff7bq:/go/bin# ls
openapi.json  srv-es-driver
root@srv-es-driver-7445f6cf48-ff7bq:/go/bin# ./srv-es-driver
# Observe the output of the startup command and continue troubleshooting from there

Use a custom image as the sidecar to install command-line debugging tools

If the tool you need is not in the default debug image, you can use a CentOS or Ubuntu image as the sidecar and install the tool with the distribution's package manager, for example installing redis with yum and then running redis-cli:

 ~  kubectl-debug srv-es-driver-7445f6cf48-ff7bq -n devops --agentless --port-forward --image centos
Agent Pod info: [Name:debug-agent-pod-f5077b08-61ad-11ec-8728-acbc328370f3, Namespace:default, Image:registry.cn-hangzhou.aliyuncs.com/querycapimages/kubectl-debug-agent:latest, HostPort:10027, ContainerPort:10027]
Waiting for pod debug-agent-pod-f5077b08-61ad-11ec-8728-acbc328370f3 to run...
Forwarding from 127.0.0.1:10027 -> 10027
Forwarding from [::1]:10027 -> 10027
Handling connection for 10027
                             set container procfs correct false ..
pulling image centos, skip TLS false...
latest: Pulling from library/centos
a1d0c7532777: Pull complete
Digest: sha256:a27fd8080b517143cbbbab9dfb7c8571c40d67d534bbdee55bd6c473f432b177
Status: Downloaded newer image for centos:latest
starting debug container...
container created, open tty...
[root@srv-es-driver-7445f6cf48-ff7bq /]# yum install -y redis
 


Reference links:

aleiwu.com/post/kubect…

Official account: Operation and Maintenance Development Story

GitHub: github.com/orgs/sunsha…

Blog: www.devopstory.cn

Love life, love operations.

I am Mr. Dongzi, a member of the "Operation and Maintenance Development Story" official account team, a front-line operations engineer and cloud-native practitioner. Here you will find not only hard-core technical content but also our thoughts and feelings about technology.

Scan the QR code and follow me for irregular updates of quality content.

Friendly reminder

If my article helps you, please like, comment, and share it. Your support will encourage me to produce higher-quality articles. Thank you very much!

You can also star this official account so that you receive push notifications as soon as new articles are published and never miss an update.

