Docker health check
Docker health checks are configured with the HEALTHCHECK instruction in a Dockerfile. The instruction has two forms:
1. Disable any health checks inherited from the base image
FROM nginx:latest
ADD test.sh /opt/
HEALTHCHECK NONE
2. Use a health check
FROM nginx:latest
ADD test.sh /opt/
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 CMD [ "executable" ]
- --interval=30s: The first health check runs 30 seconds after the container starts, and each later check runs 30 seconds after the previous one completes.
- --timeout=30s: The maximum time a single run of the CMD may take. A run that exceeds the timeout counts as a failure. Since the next check is scheduled after the previous one completes, a check that runs into the timeout pushes the next check out to as much as timeout + interval after it began.
- --start-period=5s: Initialization time granted to the container while it boots. Probe failures during this period do not count toward the maximum number of retries. However, once a health check succeeds during the start period, the container is considered started, and all consecutive failures after that count toward the maximum number of retries.
- --retries=3: After three consecutive failed checks, the container is marked unhealthy.
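The way --start-period and --retries interact can be easier to follow as a small state model. The sketch below is only an illustration of the rules described above, not Docker's actual implementation; the class name HealthState and its fields are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Simplified model of a container's health status (illustrative only)."""
    retries: int = 3            # mirrors --retries
    start_period: float = 5.0   # mirrors --start-period, in seconds
    status: str = "starting"
    failing_streak: int = 0
    started: bool = False       # set once any check has succeeded

    def record(self, elapsed: float, ok: bool) -> str:
        """Record one check result taken `elapsed` seconds after container start."""
        if ok:
            # Any success marks the container as started and resets the streak.
            self.started = True
            self.failing_streak = 0
            self.status = "healthy"
            return self.status

        # Failures are ignored only while the container is still in its
        # start period and has never passed a check.
        in_grace = (not self.started) and elapsed < self.start_period
        if not in_grace:
            self.failing_streak += 1
            if self.failing_streak >= self.retries:
                self.status = "unhealthy"
        return self.status
```

For example, with a hypothetical start period of 60 seconds and three retries, a failure at 30 seconds is ignored, while failures at 60, 90, and 120 seconds add up to an unhealthy status.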
The exit status of the command determines the health status of the container:
- 0: success; the container is healthy and ready for use
- 1: unhealthy; the container is not working correctly
- 2: reserved; do not use this exit code
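A health check command only has to honor these exit codes. As a rough sketch, assuming the image ships a Python interpreter (the stock nginx image does not), a check script could funnel every outcome into 0 or 1 so that the reserved code 2 is never returned. The helper names and the marker file path /tmp/health are hypothetical.

```python
import os
from typing import Callable

HEALTHY = 0     # container is healthy
UNHEALTHY = 1   # container is not working correctly
# Exit code 2 is reserved and is deliberately never returned here.


def exit_code(check: Callable[[], bool]) -> int:
    """Map the outcome of a check function to a HEALTHCHECK exit code."""
    try:
        return HEALTHY if check() else UNHEALTHY
    except Exception:
        # A crashing check is treated as a failed check.
        return UNHEALTHY


def marker_file_exists(path: str = "/tmp/health") -> bool:
    """Hypothetical check: the application maintains a marker file while healthy."""
    return os.path.isfile(path)
```

To use it as the executable in HEALTHCHECK CMD, the script would end by calling sys.exit(exit_code(marker_file_exists)) after importing sys.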
Health check in K8S
Pod Health Check
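Pod health checks are performed primarily through two types of probes:
- LivenessProbe: checks whether the container is still alive. If the probe fails, the kubelet kills the container, which is then handled according to the Pod's restart policy.
- ReadinessProbe: checks whether the container is ready to serve requests. If the probe fails, the Pod is removed from the endpoints of the Services that match it.
A probe can be implemented in one of three ways: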
- ExecAction: performs the health check by running a command inside the container; exit code 0 means success
apiVersion: v1
kind: Pod
metadata:
  name: pod-health
  labels:
    app: health   # labels must be key/value pairs; the key "app" is assumed
  namespace: test
spec:
  containers:
  - name: nginx-pod
    image: nginx
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/health
      initialDelaySeconds: 80
      timeoutSeconds: 10
      periodSeconds: 5
    readinessProbe:
      httpGet:
        path: /data/health
        port: 8080
      initialDelaySeconds: 80
      timeoutSeconds: 10
      periodSeconds: 5
- TCPSocketAction: performs the health check by opening a TCP connection to the container's IP address and port; the check succeeds if the connection can be established
apiVersion: v1
kind: Pod
metadata:
  name: pod-health
  labels:
    app: health   # labels must be key/value pairs; the key "app" is assumed
  namespace: test
spec:
  containers:
  - name: nginx-pod
    image: nginx
    livenessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 80
      timeoutSeconds: 10
      periodSeconds: 5
    readinessProbe:
      httpGet:
        path: /data/health
        port: 8080
      initialDelaySeconds: 80
      timeoutSeconds: 10
      periodSeconds: 5
- HTTPGetAction: performs the health check by sending an HTTP GET request to the container's IP address, port, and path; a response status of at least 200 and below 400 means success
apiVersion: v1
kind: Pod
metadata:
  name: pod-health
  labels:
    app: health   # labels must be key/value pairs; the key "app" is assumed
  namespace: test
spec:
  containers:
  - name: nginx-pod
    image: nginx
    livenessProbe:
      httpGet:
        path: /data/health
        port: 80
      initialDelaySeconds: 80
      timeoutSeconds: 10
      periodSeconds: 5
    readinessProbe:
      httpGet:
        path: /data/health
        port: 80
      initialDelaySeconds: 80
      timeoutSeconds: 10
      periodSeconds: 5
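To make the three probe types concrete, here is a rough Python approximation of what each check boils down to. This is not how the kubelet is implemented; the function names are invented for illustration, and the defaults simply reuse the timeoutSeconds value of 10 from the manifests above. One known simplification: urlopen follows redirects on its own, which the real probe handles differently.

```python
import http.client
import socket
import subprocess
import urllib.error
import urllib.request
from typing import Sequence


def exec_probe(command: Sequence[str], timeout_seconds: float = 10.0) -> bool:
    """ExecAction-style check: the command must exit with status 0 in time."""
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the command could not be started at all.
        return False
    return result.returncode == 0


def tcp_probe(host: str, port: int, timeout_seconds: float = 10.0) -> bool:
    """TCPSocketAction-style check: a TCP connection must be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False


def http_probe(url: str, timeout_seconds: float = 10.0) -> bool:
    """HTTPGetAction-style check: the GET must return a status in [200, 400)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # urlopen raises HTTPError (a URLError) for 4xx and 5xx responses.
        return False
```

For instance, exec_probe(["cat", "/tmp/health"]) mirrors the exec liveness probe shown earlier, and tcp_probe("127.0.0.1", 80) mirrors the tcpSocket probe when run inside the container.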
Node Health Check
node-problem-detector (NPD) is a daemon that runs on each node, detects node problems, and reports them to the API server. It can run as a DaemonSet or as a standalone daemon. Problems are reported to the API server as Events and NodeConditions. You can also add custom scripts and plugins that NPD runs periodically as additional checks.
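A custom check can be quite small. The sketch below assumes NPD's custom plugin convention, in which a plugin exits with 0 for OK, 1 when a problem is found, and 2 when the result is unknown, and its standard output (kept short, 80 characters here) becomes the reported message. The disk space check itself, the function names, and the 10% threshold are hypothetical, and NPD plugins are more commonly written as shell scripts.

```python
import shutil
from typing import Tuple

OK, NON_OK, UNKNOWN = 0, 1, 2   # exit codes assumed from NPD's custom plugin convention
MAX_MESSAGE_LENGTH = 80         # assumed limit on the message length


def classify(free: int, total: int, min_free_fraction: float) -> Tuple[int, str]:
    """Turn raw disk numbers into an (exit code, message) pair."""
    if total <= 0:
        return UNKNOWN, "total disk size is not available"
    free_fraction = free / total
    if free_fraction < min_free_fraction:
        message = f"only {free_fraction:.0%} of disk space is free"
        return NON_OK, message[:MAX_MESSAGE_LENGTH]
    message = f"{free_fraction:.0%} of disk space is free"
    return OK, message[:MAX_MESSAGE_LENGTH]


def check_free_space(path: str = "/", min_free_fraction: float = 0.1) -> Tuple[int, str]:
    """Hypothetical node check: report a problem when free space runs low."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as err:
        return UNKNOWN, f"cannot read disk usage: {err}"[:MAX_MESSAGE_LENGTH]
    return classify(usage.free, usage.total, min_free_fraction)
```

A plugin script built on this would print the message and exit with the returned code.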
Reference:
https://github.com/kubernetes/node-problem-detector
https://kubernetes.io/zh/docs/tasks/debug-application-cluster/monitor-node-health/