Gracefully Shutting Down Pods in a Kubernetes Cluster

Published time: Jan 26, 2019

Source: gruntwork.io/zero-downti…

Article by Yorinasub17

This is the second part of our series on achieving zero-downtime rolling updates in a Kubernetes cluster. In the first part of this series, we outlined the problems and challenges of naively running the `kubectl drain` command to remove Pods from a node. In this article, we’ll cover one of the tools for addressing those problems: gracefully shutting down Pods.

The Pod eviction lifecycle

By default, the kubectl drain command respects the Pod termination lifecycle on a node, which means the following sequence takes place:

  • kubectl drain sends a request to the control plane to evict the Pods on the target node. The control plane then notifies the kubelet on the target node to begin shutting down the Pods.
  • The node’s kubelet invokes the preStop hook defined on the Pod, if there is one.
  • Once the preStop hook finishes, the node’s kubelet sends the TERM signal (SIGTERM) to the main process in each of the Pod’s containers.
  • The node’s kubelet waits up to the specified grace period (set on the Pod, or passed on the command line; the default is 30 seconds), then closes the containers and forcibly terminates any remaining processes (using SIGKILL). Note that this grace period includes the time spent executing the preStop hook.
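A detail in the last step is easy to miss: the preStop hook and the container’s own shutdown share a single budget. This sketch (pure arithmetic, no cluster required; the 10-second hook duration is a hypothetical value) illustrates the accounting:

```shell
#!/bin/sh
# Illustrative arithmetic only: time spent in the preStop hook counts
# against the same grace period as the container's own shutdown.
grace_period=30   # default terminationGracePeriodSeconds
prestop_time=10   # hypothetical time spent in the preStop hook

remaining=$((grace_period - prestop_time))
echo "after the preStop hook, the container has ${remaining}s to exit before SIGKILL"
```

So a slow preStop hook can leave the application very little time to shut down before it is killed.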

The grace period before a Pod is forcibly terminated can be specified in two ways:

  1. In the Pod definition, via the terminationGracePeriodSeconds field of the Pod template spec.
  2. On the command line: kubectl delete pod {podName} --grace-period=60
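For the first option, the field sits directly in the Pod spec alongside the containers. A minimal sketch (the 60-second value is illustrative, not from the article):

```yaml
spec:
  # Budget for preStop hook plus container shutdown before SIGKILL
  terminationGracePeriodSeconds: 60
  containers:
  - name: nginx
    image: nginx:1.15
```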

Based on this process, we can take advantage of the preStop hook and signal handling in the application Pod to shut the application down cleanly, giving it a chance to “clean up” before it is finally terminated. For example, if a worker process reads tasks from a queue and processes them, we can have the application catch the TERM signal as an indication that it should stop accepting new tasks and exit once all in-flight tasks are complete. Alternatively, if the application cannot be modified to catch the TERM signal (a third-party application, for instance), the preStop hook can call a custom API provided by the service to shut the application down gracefully.
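The queue-worker pattern described above can be sketched in a few lines of shell. This is a minimal illustration (not from the article; the task loop and messages are stand-ins): the worker traps TERM, stops picking up new tasks once the flag is set, and exits cleanly after finishing the task in flight.

```shell
#!/bin/sh
# Hypothetical worker entrypoint: catch SIGTERM, drain, then exit.
stop=0
trap 'stop=1' TERM   # set by the kubelet during Pod termination

process_task() {
  echo "processing task $1"   # stand-in for real work
}

i=0
# Stop accepting new tasks once SIGTERM has been received
# (the i < 3 bound just keeps this demo finite).
while [ "$stop" -eq 0 ] && [ "$i" -lt 3 ]; do
  i=$((i + 1))
  process_task "$i"
done
echo "all in-flight tasks finished; exiting cleanly"
```

Because the trap only sets a flag, the current task always runs to completion before the loop checks `stop` again.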

In our example, Nginx does not shut down gracefully on a TERM signal by default, so we will instead rely on the Pod’s preStop hook to stop Nginx gracefully. We will modify the resource definition to add a lifecycle hook to the container’s spec, as follows:

lifecycle:
  preStop:
    exec:
      command: [
        # Gracefully shut down nginx
        "/usr/sbin/nginx", "-s", "quit"
      ]

After this configuration is applied, the kubelet invokes the Pod’s lifecycle hook, which issues the command /usr/sbin/nginx -s quit, before sending the TERM signal to the Nginx process in the container. Note that since this command already stops the Nginx process (and with it the Pod) gracefully, the TERM signal is effectively a no-op in this example.

After the lifecycle hook is added to the definition file, the full Deployment resource definition looks like this:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
        lifecycle:
          preStop:
            exec:
              command: [
                # Gracefully shut down nginx
                "/usr/sbin/nginx", "-s", "quit"
              ]

Subsequent flow after shutdown

Using the preStop hook above to shut down the Pod gracefully ensures that Nginx will not stop until it has finished serving existing traffic. However, you may find that the Nginx container continues to receive new traffic even after the shutdown has begun, resulting in downtime for your service.

To understand what causes this problem, let’s walk through an example diagram. Assume the node has already received traffic from a client, and the application has spawned a worker thread to handle the request. In the Nginx Pod diagram, the circle represents this worker thread.

Suppose that while the worker thread is processing the request, the cluster operator decides to perform maintenance on node-1. After kubectl drain node-1 is run, the kubelet on the node executes the Pod’s preStop hook, and Nginx begins its graceful shutdown.

Since Nginx is still serving the existing request, it does not terminate immediately once the graceful shutdown begins; instead, it stops accepting new connections and returns an error for any new requests.

At this point in time, suppose a new request arrives at the Service that sits in front of the Pod. Since the Pod is still registered as an Endpoint of that Service, the terminating Pod may still receive requests from it. If the Pod does receive such a request, Nginx will refuse to process it and return an error.


Eventually Nginx finishes processing the original requests, the kubelet removes the Pod, and the node is drained.

Why does this happen, and how do we avoid sending client requests to a Pod that is shutting down? In the next part of this series, we’ll look at the Pod lifecycle in more detail and show how to introduce a delay in the preStop hook so the Pod is removed from the Service’s traffic first, mitigating the impact of requests that would otherwise still be routed to it.