Health check

Health checks are used to monitor the condition of a service. For example, Tencent's DNSPod D monitoring requires you to configure an access path, which it uses to decide whether a website is reachable; this is, in effect, a health check. When the health check fails, a notification is sent by email or SMS so that the webmaster can fix the problem.

K8S traffic forwarding

However, in many modern distributed systems, users no longer access a single host but a cluster made up of hundreds of instances. A load balancer distributes user requests across those instances, which relieves the load on any single server and improves the availability of the system. Health checks are used to determine whether a given instance is currently "available": if the system detects that an instance fails its health check, the load balancer stops directing traffic to that instance.

Cloud service vendors such as AWS generally provide health checks for load balancing, while Kubernetes provides two probes to check the state of containers: liveness and readiness. According to the official documentation, the liveness probe checks whether a container is running, and the readiness probe checks whether a container is ready to accept requests. In Kubernetes, a Pod is the smallest deployable computing unit that Kubernetes creates and manages. A Pod consists of one or more containers (Docker, Rocket, etc.) that share storage and network resources, together with a specification for how to run the containers.

In the Kubernetes context, the liveness and readiness probes are referred to as health checks. These container probes are small checks that run periodically, and the result each one returns (success, failure, or unknown) reflects the state of the container as Kubernetes sees it. Based on these results, Kubernetes decides how to treat each container in order to ensure resilience, high availability, and longer uptime.

Readiness probe

The readiness probe lets Kubernetes know whether your application is ready to serve requests. Kubernetes forwards traffic to a Pod only once its readiness probe passes. If the readiness probe fails, Kubernetes stops sending traffic to the Pod until the probe passes again.

Liveness probe

The liveness probe lets Kubernetes know whether your application is alive. If the application is alive, Kubernetes leaves it alone. If the application is dead, Kubernetes kills the container and restarts it, subject to the Pod's restart policy.

How the probes work

Let's look at two scenarios to see how readiness probes and liveness probes can help us build more reliable systems.

Readiness probe

An application usually needs some time to start and warm up. For example, a back-end project may connect to a database and run migrations at startup, and a Spring project has to wait for the Java virtual machine to start. Even after the process has started, the service cannot handle requests until initialization has finished. Applications should not receive traffic until they are fully ready, but by default Kubernetes starts sending traffic as soon as the process inside the container starts. With a readiness probe, Kubernetes does not send traffic to a new replica until the application has fully started.

How the readiness probe works

Liveness probe

Now imagine another situation: after a successful startup, the application goes down for some reason, or hits a deadlock that prevents it from responding to user requests. By default, Kubernetes would keep sending requests to the Pod. With a liveness probe, Kubernetes detects that the service cannot handle requests within the time limit (the request fails or times out) and restarts the affected container.

How the liveness probe works
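The two scenarios above can be sketched as a single Pod manifest. This is a minimal illustration rather than a production configuration: the Pod name, image, port, and the /ready and /live paths are all placeholders assumed for the example.

```yaml
# Hypothetical Pod; name, image, port, and probe paths are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
    - name: demo-app
      image: example.com/demo-app:1.0   # placeholder image
      ports:
        - containerPort: 8080
      readinessProbe:          # while failing, the Pod receives no Service traffic
        httpGet:
          path: /ready         # assumed endpoint
          port: 8080
      livenessProbe:           # when failing, the kubelet restarts the container
        httpGet:
          path: /live          # assumed endpoint
          port: 8080
```

Using separate endpoints is one possible design choice: it lets the application report "alive but not yet ready" during warm-up without triggering a restart.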

Probe types

Probe type refers to the method by which a health check is performed. Kubernetes has three types of probes: HTTP, command, and TCP.

HTTP

HTTP probes are probably the most common type of probe. Even if the application is not an HTTP service, you can create a lightweight HTTP server inside it to respond to probes. Kubernetes requests a URL over HTTP and marks the application as healthy if the status code is at least 200 and below 400; otherwise the application is marked as unhealthy. More information about HTTP probes can be found in the Kubernetes documentation.
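As a rough sketch of an HTTP probe, the fragment below would sit inside a container definition. The /healthz path, port 8080, and the custom header are assumptions made for illustration; the field names (httpGet, path, port, scheme, httpHeaders) are those of the Kubernetes probe API.

```yaml
# Fragment of a container spec; path, port, and header are hypothetical.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    scheme: HTTP               # HTTP is the default; HTTPS is also accepted
    httpHeaders:               # optional extra request headers
      - name: X-Probe
        value: kubelet
```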

Command

For a command probe, Kubernetes runs a command inside the container. If the command exits with code 0, the container is marked as healthy; otherwise, it is marked as unhealthy. More information about command probes can be found in the Kubernetes documentation.
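A command probe might look like the following fragment. It assumes, purely for illustration, that the application creates a marker file at /tmp/ready once it has finished starting; `cat` then exits with 0 only when that file exists.

```yaml
# Fragment of a container spec; the marker file path is hypothetical.
readinessProbe:
  exec:
    command:
      - cat
      - /tmp/ready             # exit code 0 only if the file exists
```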

TCP

The last type is the TCP probe, where Kubernetes attempts to establish a TCP connection to a specified port. If the connection can be established, the container is considered healthy; if not, it is considered unhealthy. This is often used for probing gRPC or FTP services.

More information about TCP probes can be found in the Kubernetes documentation.
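A TCP probe needs only a port. In the sketch below, port 5432 is an arbitrary example value and not something taken from the article.

```yaml
# Fragment of a container spec; the port number is an example value.
livenessProbe:
  tcpSocket:
    port: 5432                 # healthy if a TCP connection can be opened
```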

Initial probe delay

We can configure how often the K8S health check is run, the conditions under which the check succeeds or fails, and the timeout for the response. Refer to the documentation for configuring probes.

A failed liveness probe causes the container to be restarted, so it is important to set the initial probe delay, initialDelaySeconds, so that probing does not begin until the application has had time to start. Otherwise, the application will be restarted over and over!

I recommend using the p99 startup time as initialDelaySeconds, or taking the average startup time plus a buffer. Update this value as the startup time of the application changes.
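The tunable fields mentioned above can be sketched together as follows. The numbers are illustrative assumptions, not recommendations: the example supposes a measured p99 startup time of 35 seconds plus a 10 second buffer, and a hypothetical /healthz endpoint.

```yaml
# Fragment of a container spec; endpoint and timings are assumed values.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 45      # assumed p99 startup (35 s) + 10 s buffer
  periodSeconds: 10            # how often to probe (default 10)
  timeoutSeconds: 2            # per-probe timeout (default 1)
  successThreshold: 1          # must be 1 for liveness probes
  failureThreshold: 3          # consecutive failures before a restart (default 3)
```

With these values, an unresponsive container would be restarted roughly 30 seconds after it stops answering (three failed probes, 10 seconds apart).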

An example

Consider the following Kubernetes configuration:

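  • After the container starts, Kubernetes waits 120 seconds (initialDelaySeconds) and then requests /actuator/health on port 8080 over HTTP. If the response takes longer than 10 seconds (timeoutSeconds) or the status code is outside the 200 to 399 range, the readiness check fails

  • Similarly, once 60 seconds have passed, Kubernetes requests /actuator/health on port 8080 every 5 seconds (periodSeconds) for as long as the Pod runs, as the liveness check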


     

    apiVersion: apps/v1
    kind: Deployment
    # ...
    # ...
          readinessProbe:
            httpGet:
              path: /actuator/health
              port: 8080
            initialDelaySeconds: 120
            timeoutSeconds: 10
          livenessProbe:
            httpGet:
              path: /actuator/health
              port: 8080
            initialDelaySeconds: 60
            timeoutSeconds: 10
            periodSeconds: 5


Resources

  • Kubernetes Best Practices: Setting up Health checks with Readiness and Liveness Probes

  • Best practices for Kubernetes liveness probes and readiness probes

I’m a big guest