Docker health check

Docker health checks are defined mainly through the HEALTHCHECK instruction in a Dockerfile. The instruction has two forms:

1. Disable any health checks inherited from the base image

FROM nginx:latest
ADD test.sh /opt/
HEALTHCHECK NONE

2. Use a health check

FROM nginx:latest
ADD test.sh /opt/
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 CMD [ "executable" ]
  • --interval=30s The first health check runs 30 seconds after the container starts, and each later check runs 30 seconds after the previous check completes.

  • --timeout=30s The maximum time a single run of the check command may take. A run that exceeds the timeout counts as a failure. Because the next check is scheduled only after the previous one completes, a check that times out stretches the cycle to roughly timeout + interval.

  • --start-period=5s Gives a container that needs time to bootstrap an initialization window. Probe failures during this period do not count toward the maximum number of retries. However, if a health check succeeds during the start period, the container is considered started, and all consecutive failures after that count toward the maximum number of retries.

  • --retries=3 The container is considered unhealthy after three consecutive failed checks.
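
The way retries and the start period interact can be easier to follow as a small state model. The Python sketch below is a simplified illustration, not Docker's actual implementation. The function name health_status and the representation of check results as (seconds_since_start, passed) pairs are assumptions made for this example.

```python
def health_status(results, retries=3, start_period=5.0):
    """Simplified model of how a container's health status evolves.

    results: ordered iterable of (seconds_since_start, passed) pairs,
             one pair per completed health check (hypothetical input format).
    Returns "starting", "healthy", or "unhealthy".
    """
    status = "starting"
    consecutive_failures = 0
    started = False  # becomes True after the first successful check

    for elapsed, passed in results:
        if passed:
            # Any success marks the container healthy and resets the counter.
            status = "healthy"
            consecutive_failures = 0
            started = True
            continue

        # Failures inside the start period are ignored, unless a check
        # has already succeeded (the container then counts as started).
        if not started and elapsed < start_period:
            continue

        consecutive_failures += 1
        if consecutive_failures >= retries:
            status = "unhealthy"

    return status
```

In this model, three failures in a row after the start period produce unhealthy, failures inside the start period leave the status at starting, and a single success resets the failure counter.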

The exit status of the check command determines the health status of the container:

  • 0: The container is healthy and available

  • 1: The container is unhealthy and cannot be used

  • 2: reserved; do not use this exit code
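
As an illustration of this exit-code contract, here is a minimal Python sketch of a check that returns 0 or 1 and never 2. It assumes the image ships a Python interpreter (nginx:latest does not by default) and that the service answers HTTP on a URL you choose. The function name check and the opener parameter, which exists only so the function can be exercised without a network, are hypothetical.

```python
import urllib.request


def check(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return 0 if the URL answers with a 2xx/3xx status, otherwise 1.

    Only 0 (healthy) and 1 (unhealthy) are returned; 2 is reserved.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 400 else 1
    except Exception:
        # Connection errors, timeouts, and HTTP error responses
        # (urlopen raises HTTPError for 4xx/5xx) all mean "unhealthy".
        return 1
```

A script used as the HEALTHCHECK command would pass the return value of check to sys.exit so that it becomes the process exit status.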

Health check in K8S

Pod Health Check

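Pod health checks are performed primarily through two types of probes:

  • LivenessProbe: checks whether the container is alive. If the probe fails, the kubelet kills the container and restarts it according to the Pod's restart policy.
  • ReadinessProbe: checks whether the container is ready to serve requests. If the probe fails, the Pod is removed from the endpoints of the Services that match it.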

A probe can be implemented in three ways:

  • ExecAction: runs a command inside the container; the check passes if the command exits with status 0
apiVersion: v1
kind: Pod
metadata:
    name: pod-health
    labels:
        app: health    # the key "app" is an assumption; labels must be key/value pairs
    namespace: test
spec:
    containers:
    - name: nginx-pod
      image: nginx
      livenessProbe:
        exec:
          command:
          - cat
          - /tmp/health
        initialDelaySeconds: 80
        timeoutSeconds: 10
        periodSeconds: 5
      readinessProbe:
        httpGet:
          path: /data/health
          port: 8080
        initialDelaySeconds: 80
        timeoutSeconds: 10
        periodSeconds: 5
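
The exec check above boils down to "run a command and look at its exit status". The following Python sketch models that behaviour outside Kubernetes; it is an illustration, not the kubelet's code, and the function name exec_probe and its parameters are hypothetical.

```python
import subprocess


def exec_probe(command, timeout_seconds=10):
    """Return True if the command exits with status 0 within the timeout.

    command: argument list, for example ["cat", "/tmp/health"].
    A non-zero exit status, a missing executable, or a timeout
    all count as a failed probe.
    """
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return completed.returncode == 0
```

With the liveness probe from the manifest, exec_probe(["cat", "/tmp/health"]) would succeed only while /tmp/health exists and is readable, which is why removing that file is a common way to simulate a failing container.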

  • TCPSocketAction: attempts a TCP connection to the container's IP address on the given port; the check passes if the connection is established
apiVersion: v1
kind: Pod
metadata:
    name: pod-health
    labels:
        app: health    # the key "app" is an assumption; labels must be key/value pairs
    namespace: test
spec:
    containers:
    - name: nginx-pod
      image: nginx
      livenessProbe:
        tcpSocket:
          port: 80
        initialDelaySeconds: 80
        timeoutSeconds: 10
        periodSeconds: 5
      readinessProbe:
        httpGet:
          path: /data/health
          port: 8080
        initialDelaySeconds: 80
        timeoutSeconds: 10
        periodSeconds: 5
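
A TCP check only asks whether a connection can be opened. As a rough illustration (again not the kubelet's implementation), the same test can be sketched in Python; the function name tcp_probe and its parameters are assumptions made for this example.

```python
import socket


def tcp_probe(host, port, timeout_seconds=10):
    """Return True if a TCP connection to host:port can be established.

    The connection is closed immediately; no data is sent or read.
    Refused connections, unreachable hosts, and timeouts count as failures.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False
```

Note that a successful TCP connection only shows that something is listening on the port, not that the application behind it is answering requests correctly.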

  • HTTPGetAction: sends an HTTP GET request to the container's IP address on the given port and path; the check passes if the response status code is at least 200 and below 400
apiVersion: v1
kind: Pod
metadata:
    name: pod-health
    labels:
        app: health    # the key "app" is an assumption; labels must be key/value pairs
    namespace: test
spec:
    containers:
    - name: nginx-pod
      image: nginx
      livenessProbe:
        httpGet:
          path: /data/health
          port: 80
        initialDelaySeconds: 80
        timeoutSeconds: 10
        periodSeconds: 5
      readinessProbe:
        httpGet:
          path: /data/health
          port: 80
        initialDelaySeconds: 80
        timeoutSeconds: 10
        periodSeconds: 5
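
For an HTTP check to pass, the application has to answer on the probed path. The sketch below shows one possible health endpoint using only the Python standard library. It is a minimal example under stated assumptions: the names health_response, HealthHandler, and serve are hypothetical, and the ready flag stands in for whatever readiness logic a real service would have.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

HEALTH_PATH = "/data/health"  # same path as in the probe configuration


def health_response(path, ready=True):
    """Return (status_code, body) for a GET request to the given path."""
    if path != HEALTH_PATH:
        return 404, b"not found\n"
    if ready:
        return 200, b"ok\n"          # 2xx/3xx: the probe succeeds
    return 503, b"not ready\n"       # anything else: the probe fails


class HealthHandler(BaseHTTPRequestHandler):
    """Minimal handler that answers health probes."""

    def do_GET(self):
        status, body = health_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve the health endpoint until interrupted (this call blocks)."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

The port passed to serve must match the port field of the probe; returning 503 while the service is still warming up makes a readiness probe fail without the container being restarted.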


Node Health Check

node-problem-detector (NPD) is a daemon that runs on each node, detects node problems, and reports them to the API server. It can run as a DaemonSet or as a standalone process. NPD reports problems to the API server as Events and NodeConditions, and it can be extended with custom scripts and plugins that it runs periodically.

Reference:

https://github.com/kubernetes/node-problem-detector

https://kubernetes.io/zh/docs/tasks/debug-application-cluster/monitor-node-health/