Why open source KubeEye

Kubernetes is the de facto standard for container orchestration. Despite its elegant architecture and powerful capabilities, running it day to day still exposes cluster administrators and YAML engineers to a number of problems, some of them hidden and hard to catch:

Infrastructure daemon problems: for example, the NTP service is interrupted.
Hardware problems: for example, the CPU, memory, or disk is abnormal.
Kernel problems: kernel deadlock or a corrupted file system.
Container runtime problems: the runtime daemon is unresponsive.
…

There are many more such problems. Because these hidden faults are invisible to the cluster's control plane, Kubernetes keeps scheduling Pods onto the faulty nodes, which puts the stability and security of the cluster and its running applications at risk.

What is KubeEye

KubeEye is an open source, automated cluster inspection tool designed to detect problems on Kubernetes, such as application misconfigurations, unhealthy cluster components, and node issues. Written in Go and based on Polaris and Node-Problem-Detector, KubeEye ships with a set of built-in anomaly detection rules. In addition to these predefined rules, it supports custom rules.

What can KubeEye do

Detect problems in the Kubernetes cluster control plane, including kube-apiserver, kube-controller-manager, etcd, etc.
Detect Kubernetes node problems, including memory, CPU, and disk pressure, unexpected kernel error logs, etc.
Validate your workload YAML specifications against industry best practices to help keep your cluster stable.
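To make the third point concrete, here is a minimal Python sketch of what a best-practice check on a container spec might look like. It is not KubeEye's implementation (KubeEye is written in Go and builds on Polaris). The function names `image_tag` and `check_container` are hypothetical, and the finding names are borrowed from the sample `ke diag` output later in this article.

```python
from typing import Any, Dict, List, Optional


def image_tag(image: str) -> Optional[str]:
    """Return the tag of an image reference, or None when no tag is given."""
    name = image.split("@", 1)[0]      # drop a digest suffix, if any
    last = name.rsplit("/", 1)[-1]     # a ':' before the last '/' is a registry port
    return last.split(":", 1)[1] if ":" in last else None


def check_container(container: Dict[str, Any]) -> List[str]:
    """Return best-practice findings for one container spec (a plain dict)."""
    findings: List[str] = []
    limits = (container.get("resources") or {}).get("limits") or {}
    if "cpu" not in limits:
        findings.append("cpuLimitsMissing")
    if "livenessProbe" not in container:
        findings.append("livenessProbeMissing")
    # No tag, or the floating "latest" tag, both count as unspecified here.
    if image_tag(container.get("image", "")) in (None, "latest"):
        findings.append("tagNotSpecified")
    if (container.get("securityContext") or {}).get("privileged") is True:
        findings.append("runAsPrivileged")
    return findings
```

For example, `check_container({"name": "nginx", "image": "nginx"})` returns `['cpuLimitsMissing', 'livenessProbeMissing', 'tagNotSpecified']`. Returning a list of finding names keeps each check independent, so checks can be added or removed without touching the others.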

Architecture diagram

KubeEye gets cluster diagnostic data by calling the Kubernetes API, then matches key error messages in logs against regular expressions and checks container specifications against rules. See Architecture.

Built-in check items

Supported	Check item	Description
√	ETCDHealthStatus	Whether etcd is up and running
√	ControllerManagerHealthStatus	Whether the Kubernetes kube-controller-manager is up and running
√	SchedulerHealthStatus	Whether the Kubernetes kube-scheduler is up and running
√	NodeMemory	Whether node memory usage exceeds the threshold
√	DockerHealthStatus	Whether Docker is up and running
√	NodeDisk	Whether node disk usage exceeds the threshold
√	KubeletHealthStatus	Whether the kubelet is active and running normally
√	NodeCPU	Whether node CPU usage exceeds the threshold
√	NodeCorruptOverlay2	Overlay2 is unavailable
√	NodeKernelNULLPointer	The node shows NotReady
√	NodeDeadlock	A deadlock: two or more processes wait for each other while competing for resources
√	NodeOOM	Monitors processes that consume too much memory, especially those whose usage grows very quickly; the kernel kills them to keep the node from running out of memory
√	NodeExt4Error	Ext4 mount failed
√	NodeTaskHung	Whether any process has been in state D for more than 120s
√	NodeUnregisterNetDevice	Check the corresponding network
√	NodeCorruptDockerImage	Check the Docker image
√	NodeAUFSUmountHung	Check the storage
√	NodeDockerHung	Docker hangs
√	PodSetLivenessProbe	Whether a livenessProbe is set for every container in the Pod
√	PodSetTagNotSpecified	The image address does not declare a tag, or the tag is latest
√	PodSetRunAsPrivileged	Running a Pod in privileged mode means the Pod can access the host's resources and kernel capabilities
√	PodSetImagePullBackOff	The Pod cannot pull the image; try pulling it manually on the corresponding node
√	PodSetImageRegistry	Whether the image comes from the expected registry
√	PodSetCpuLimitsMissing	No CPU resource limit is declared
√	PodNoSuchFileOrDirectory	Enter the container to check whether the corresponding file exists
√	PodIOError	Usually caused by a file I/O performance bottleneck
√	PodNoSuchDeviceOrAddress	Check the corresponding network
√	PodInvalidArgument	Check the corresponding storage
√	PodDeviceOrResourceBusy	Check the corresponding directory and PID
√	PodFileExists	Check for existing files
√	PodTooManyOpenFiles	The number of open files or socket connections exceeds the system limit
√	PodNoSpaceLeftOnDevice	Check disk and inode usage
√	NodeApiServerExpiredPeriod	Checks whether the API server certificate expires in less than 30 days
√	PodSetCpuRequestsMissing	No CPU resource request is declared
√	PodSetHostIPCSet	hostIPC is set
√	PodSetHostNetworkSet	hostNetwork is set
√	PodHostPIDSet	hostPID is set
√	PodMemoryRequestsMiss	No memory resource request is declared
√	PodSetHostPort	hostPort is set
√	PodSetMemoryLimitsMissing	No memory resource limit is declared
√	PodNotReadOnlyRootFiles	The root file system is not set to read-only
√	PodSetPullPolicyNotAlways	The image pull policy is not Always
√	PodSetRunAsRootAllowed	Running as the root user is allowed
√	PodDangerousCapabilities	Dangerous capabilities such as ALL, SYS_ADMIN, or NET_ADMIN are set
√	PodlivenessProbeMissing	No readinessProbe is declared
√	privilegeEscalationAllowed	Privilege escalation is allowed
	NodeNotReadyAndUseOfClosedNetworkConnection	http2-max-streams-per-connection
	NodeNotReady	Cannot start ContainerManager: Cannot set property TasksAccounting, or unknown property

Note: Unmarked items are under development.
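Several items in the table are threshold checks. As an illustration, the 30-day window described for NodeApiServerExpiredPeriod comes down to a date comparison. The Python sketch below shows only that comparison. How KubeEye actually reads the certificate is not covered in this article, and the names `EXPIRY_THRESHOLD` and `expires_within` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# The 30-day window named in the table above.
EXPIRY_THRESHOLD = timedelta(days=30)


def expires_within(not_after: datetime, now: datetime,
                   threshold: timedelta = EXPIRY_THRESHOLD) -> bool:
    """Return True when the certificate expires in less than `threshold`.

    An already expired certificate also returns True, because its
    remaining lifetime is negative.
    """
    return (not_after - now) < threshold
```

Both arguments should be timezone-aware (or both naive), since Python refuses to subtract one kind from the other. Passing `now` explicitly, instead of calling `datetime.now()` inside the function, keeps the check deterministic and easy to test.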

How to use

Install KubeEye on the machine
- Download the pre-built executable from Releases.
- Or build it from source:
```
git clone https://github.com/kubesphere/kubeeye.git
cd kubeeye 
make install
```
[Optional] Install Node-problem-detector

Note: This command installs NPD on your cluster. It is only needed if you want detailed reports.

```
ke install npd
```

KubeEye performs automatic inspection:

```
root@node1:# ke diag
NODENAME   SEVERITY   HEARTBEATTIME               REASON              MESSAGE
node18     Fatal      2020-11-19T10:32:03+08:00   NodeStatusUnknown   Kubelet stopped posting node status.
node19     Fatal      2020-11-19T10:31:37+08:00   NodeStatusUnknown   Kubelet stopped posting node status.
node2      Fatal      2020-11-19T10:31:14+08:00   NodeStatusUnknown   Kubelet stopped posting node status.
node3      Fatal      2020-11-27T17:36:53+08:00   KubeletNotReady     Container runtime not ready: RuntimeReady=false reason:DockerDaemonNotReady message:docker: failed to get docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

NAME        SEVERITY   TIME                        MESSAGE
scheduler   Fatal      2020-11-27T17:09:59+08:00   Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
etcd        Fatal      2020-11-27T17:0...+08:00    Get https://192.168.13.8:2379/health: dial tcp 192.168.13.8:2379: connect: connection refused

NAMESPACE        SEVERITY   PODNAME                                  EVENTTIME                   REASON                MESSAGE
default          Warning    node3.164b53d23ea79fc7                   2020-11-27T17:37:34+08:00   ContainerGCFailed     rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
default          Warning    node3.164b553ca5740aae                   2020-11-27T18:03:31+08:00   FreeDiskSpaceFailed   failed to garbage collect required amount of images. Wanted to free 5399374233 bytes, but freed 416077545 bytes
default          Warning    nginx-b8ffcf679-q4n9v.16491643e6b68cd7   2020-11-27T17:09:24+08:00   Failed                Error: ImagePullBackOff
default          Warning    node3.164b5861e041a60e                   2020-11-27T19:01:09+08:00   SystemOOM             System OOM encountered, victim process: stress, pid: 16713
default          Warning    node3.164b58660f8d4590                   2020-11-27T19:01:27+08:00   OOMKilling            Out of memory: Kill process 16711 (stress) score 205 or sacrifice child Killed process 16711 (stress), UID 0, total-vm:826516kB, anon-rss:819296kB, file-rss:0kB, shmem-rss:0kB
insights-agent   Warning    workloads-1606467120.164b519ca8c67416    2020-11-27T16:57:05+08:00   DeadlineExceeded      Job was active longer than specified deadline
kube-system      Warning    calico-node-zvl9t.164b3dc50580845d       2020-11-27T17:09:35+08:00   DNSConfigForming      Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 100.64.11.3 114.114.114.114 119.29.29.29
kube-system      Warning    kube-proxy-4bnn7.164b3dc4f4c4125d        2020-11-27T17:09:09+08:00   DNSConfigForming      Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 100.64.11.3 114.114.114.114 119.29.29.29
kube-system      Warning    nodelocaldns-2zbhh.164b3dc4f42d358b      2020-11-27T17:09:14+08:00   DNSConfigForming      Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 100.64.11.3 114.114.114.114 119.29.29.29

NAMESPACE        SEVERITY   NAME                      KIND         TIME                        MESSAGE
kube-system      Warning    node-problem-detector     DaemonSet    2020-11-27T17:09:59+08:00   [livenessProbeMissing runAsPrivileged]
kube-system      Warning    calico-node               DaemonSet    2020-11-27T17:09:59+08:00   [runAsPrivileged cpuLimitsMissing]
kube-system      Warning    nodelocaldns              DaemonSet    2020-11-27T17:09:59+08:00   [cpuLimitsMissing runAsPrivileged]
default          Warning    nginx                     Deployment   2020-11-27T17:09:59+08:00   [cpuLimitsMissing livenessProbeMissing tagNotSpecified]
insights-agent   Warning    workloads                 CronJob      2020-11-27T17:09:59+08:00   [livenessProbeMissing]
insights-agent   Warning    cronjob-executor          Job          2020-11-27T17:09:59+08:00   [livenessProbeMissing]
kube-system      Warning    calico-kube-controllers   Deployment   2020-11-27T17:09:59+08:00   [cpuLimitsMissing livenessProbeMissing]
kube-system      Warning    coredns                   Deployment   2020-11-27T17:09:59+08:00   [cpuLimitsMissing]
```

Refer to the FAQ to optimize your cluster.

Add a custom check rule

In addition to the pre-defined inspection items and rules mentioned above, KubeEye also supports custom inspection rules. Here is an example:

Add a custom NPD check rule

Install NPD with the command ke install npd.
Edit the ConfigMap kube-system/node-problem-detector-config with kubectl:

```
kubectl edit cm -n kube-system node-problem-detector-config
```

Add the exception log patterns you want to catch under the rules in the ConfigMap. The rules are regular expressions.
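To show what "the rules are regular expressions" means in practice, here is a small Python sketch that matches log lines against a list of rules. It is an illustration only, not NPD's code. The `reason` and `pattern` field names, the two example rules, and the `match_rules` function are assumptions made for this sketch, so check the existing entries in your ConfigMap for the exact rule format.

```python
import re
from typing import Dict, List

# Hypothetical rules: each pairs a reason with a regular expression.
RULES: List[Dict[str, str]] = [
    {"reason": "TaskHung",
     "pattern": r"task \S+:\w+ blocked for more than \w+ seconds\."},
    {"reason": "Ext4Error",
     "pattern": r"EXT4-fs error .*"},
]


def match_rules(log_lines: List[str],
                rules: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return one hit per (line, rule) pair whose pattern is found in the line."""
    compiled = [(rule["reason"], re.compile(rule["pattern"])) for rule in rules]
    hits: List[Dict[str, str]] = []
    for line in log_lines:
        for reason, regex in compiled:
            # search() finds the pattern anywhere in the line (an assumption
            # of this sketch; it does not have to match the whole line).
            if regex.search(line):
                hits.append({"reason": reason, "message": line})
    return hits
```

Before adding a new pattern to the ConfigMap, it helps to run it against a few real log lines this way, because an unescaped `.` or a missing anchor can make a rule match far more than intended.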

Customize best practice rules

Prepare a rule YAML file. For example, the following rule validates your Pod specifications to ensure that images only come from authorized registries.

```
checks:
  # Report the custom check defined below at warning severity
  imageFromUnauthorizedRegistry: warning

customChecks:
  imageFromUnauthorizedRegistry:
    promptMessage: When the corresponding rule does not match. Show that image from an unauthorized registry.
    category: Images
    target: Container
    schema:
      '$schema': http://json-schema.org/draft-07/schema
      type: object
      properties:
        image:
          type: string
          not:
            pattern: ^quay.io
```

Save the rule above as a YAML file, for example rule.yaml.
Run KubeEye with rule.yaml:

```
root:# ke diag -f rule.yaml --kubeconfig ~/.kube/config
NAMESPACE     SEVERITY   NAME                      KIND         TIME                        MESSAGE
default       Warning    nginx                     Deployment   2020-11-27T17:18:31+08:00   [imageFromUnauthorizedRegistry]
kube-system   Warning    node-problem-detector     DaemonSet    2020-11-27T17:18:31+08:00   [livenessProbeMissing runAsPrivileged]
kube-system   Warning    calico-node               DaemonSet    2020-11-27T17:18:31+08:00   [cpuLimitsMissing runAsPrivileged]
kube-system   Warning    calico-kube-controllers   Deployment   2020-11-27T17:18:31+08:00   [cpuLimitsMissing livenessProbeMissing]
kube-system   Warning    nodelocaldns              DaemonSet    2020-11-27T17:18:31+08:00   [runAsPrivileged cpuLimitsMissing]
default       Warning    nginx                     Deployment   2020-11-27T17:18:31+08:00   [livenessProbeMissing cpuLimitsMissing]
kube-system   Warning    coredns                   Deployment   2020-11-27T17:18:31+08:00   [cpuLimitsMissing]
```
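The `schema` part of the rule is plain JSON Schema, so its `not` / `pattern` pair can be reasoned about on its own. The Python sketch below evaluates only that fragment for a single image string. How KubeEye turns schema validity into a reported warning is not spelled out in this article, so treat the sketch as an aid for writing patterns. The names `RULE_PATTERN` and `satisfies_not_pattern` are hypothetical.

```python
import re

# Pattern copied from the rule above. The "." is unescaped, so it matches
# any single character, not only a literal dot.
RULE_PATTERN = r"^quay.io"


def satisfies_not_pattern(image: object, pattern: str = RULE_PATTERN) -> bool:
    """Evaluate {type: string, not: {pattern: ...}} against one value.

    JSON Schema `pattern` is an unanchored search and `not` inverts it,
    so the schema holds only for strings the regex does NOT match.
    """
    if not isinstance(image, str):
        return False  # fails `type: string`
    return re.search(pattern, image) is None
```

With the pattern as written, the string `quayXio/app` is treated the same as `quay.io/app`, because the dot is a wildcard. A stricter pattern such as `^quay\.io/` (an assumption about what the rule author intends) avoids that.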

Roadmap

Support finer-grained inspection items, such as slow cluster response.
Generate cluster inspection reports from inspection results.
Export cluster inspection reports to CSV or HTML files.

What other features would you like KubeEye to offer? Please submit your suggestions or requests on GitHub.

GitHub: github.com/kubesphere/…

Reference links

KubeEye Release: github.com/kubesphere/…

KubeEye FAQ documentation: github.com/kubesphere/…

Node-Problem-Detector: github.com/kubernetes/…

About KubeSphere

KubeSphere is a container hybrid cloud platform built on top of Kubernetes that provides full-stack IT automation capabilities and simplifies DevOps workflows for enterprises.

KubeSphere has been adopted by thousands of enterprises at home and abroad, such as Aqara Smart Home, Bentley Life, Sina, PICC Life Insurance, Huaxia Bank, PUDONG Development Silicon Valley Bank, Sichuan Airlines, Sinopharm Group, Webank, Zijininsurance, Radore, and ZaloPay. KubeSphere provides an operations-friendly, wizard-style interface and rich enterprise-grade functionality, including multi-cloud and multi-cluster management, Kubernetes resource management, DevOps (CI/CD), application lifecycle management, service mesh, multi-tenant management, monitoring and logging, alerting and notification, storage and network management, and GPU support. It helps enterprises quickly build a powerful, feature-rich container cloud platform.

KubeSphere GitHub: github.com/kubesphere/…
