Authors | Alibaba Cloud development engineer and Alibaba Cloud technical expert

< Follow the Alibaba Cloud Native official account and reply with the designated keyword to download the e-book >

The Kubernetes e-book brings together 12 technical articles to help you understand 6 core principles, grasp the basic theory, and master elegant solutions to 6 typical problems.

In a Kubernetes cluster, services are usually exposed through a Deployment plus a LoadBalancer-type Service. A typical deployment architecture is shown in Figure 1. This architecture is easy to deploy, operate, and maintain, but a service interruption may occur during application updates or upgrades, causing problems online. In this article we take a closer look at why interruptions happen during updates and how to avoid them.
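For reference in the discussion below, here is a minimal sketch of this pattern (the image, names, and ports are placeholders): an nginx Deployment exposed through a LoadBalancer-type Service.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      run: nginx
  template:
    metadata:
      labels:
        run: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
---
# The LoadBalancer-type Service exposes the Deployment through the cloud
# provider's load balancer (SLB on Alibaba Cloud).
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: LoadBalancer
  selector:
    run: nginx
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80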

Why service interruptions occur

A rolling update first creates a new Pod and waits for it to become Running before deleting the old Pod.

Creating the new Pod

Cause of interruption: after the Pod enters the Running state, it is added to the Endpoints. The container service watches the Endpoints change and adds the node to the SLB backend. At this point requests are forwarded from the SLB to the Pod, but the Pod's business code has not finished initializing and cannot handle requests, so a service interruption occurs, as shown in Figure 2.

Solution: configure a readiness probe for the Pod and wait for the business code to finish initializing before the node is added to the SLB backend.
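For example, a minimal readiness probe might look like the sketch below (the port and timing values are placeholders; the full Pod configuration used in this article appears in the "Pod configuration" section):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    # The Pod is added to the Endpoints, and the node to the SLB backend,
    # only after this probe succeeds.
    readinessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 3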

Deleting the old Pod

When an old Pod is deleted, the state of multiple objects (such as Endpoints, iptables/IPVS rules, and the SLB backend) is synchronized asynchronously. The overall synchronization process is shown in Figure 3.

Pod

  1. Pod state change: the Pod is set to Terminating and removed from the Endpoints list of all Services. At this point the Pod stops receiving new traffic, but the containers running in the Pod are not affected;
  2. preStop hook execution: the preStop hook is triggered when the Pod is deleted; it supports bash scripts, TCP, or HTTP requests;
  3. SIGTERM: a SIGTERM signal is sent to the containers in the Pod;
  4. Wait for the specified time: the terminationGracePeriodSeconds field controls the waiting time, which defaults to 30 seconds. This step runs in parallel with the preStop hook, so terminationGracePeriodSeconds must be greater than the preStop duration; otherwise the Pod may be killed before preStop finishes;
  5. SIGKILL: after the specified time, a SIGKILL signal is sent to the containers in the Pod and the Pod is deleted.

Cause of interruption: steps 1, 2, 3, and 4 above happen at the same time, so the Pod may receive SIGTERM and stop working before it has been removed from the Endpoints. In that case requests are still forwarded from the SLB to the Pod, which has already stopped working, and a service interruption occurs, as shown in Figure 4.

Solution: configure a preStop hook for the Pod so that, upon receiving SIGTERM, it sleeps for a period of time instead of stopping immediately, ensuring that traffic forwarded from the SLB can still be processed by the Pod.
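For example, a minimal preStop hook could look like this (the sleep duration is a placeholder; as noted above, terminationGracePeriodSeconds must be longer than the preStop time):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  # Must be longer than the preStop sleep, otherwise the Pod is killed
  # before preStop finishes.
  terminationGracePeriodSeconds: 60
  containers:
  - name: nginx
    image: nginx
    lifecycle:
      preStop:
        exec:
          # Keep serving in-flight traffic while the SLB and iptables/IPVS
          # entries are being updated.
          command: ["sleep", "30"]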

iptables/ipvs

Cause of interruption: when a Pod becomes Terminating, it is removed from the Endpoints of all Services, and kube-proxy clears the corresponding iptables/IPVS entries. After the container service observes the Endpoints change, it calls the SLB OpenAPI to remove the node from the backend, which takes several seconds. Because these two operations happen at the same time, it is possible that the iptables/IPVS entries on the node have already been cleaned up while the node has not yet been removed from the SLB. At this point traffic arrives from the SLB, there is no corresponding iptables/IPVS rule on the node, and a service interruption occurs, as shown in Figure 5.

Solutions:

  • Cluster mode: in Cluster mode, kube-proxy writes all backend Pods of the Service into every node's iptables/IPVS rules. If a node has no Pod of the Service, the request is forwarded to another node, so there is no service interruption, as shown in Figure 6.

  • Local mode: in Local mode, kube-proxy only writes the Pods running on the node itself into that node's iptables/IPVS rules. When there is only one Pod on the node and it becomes Terminating, its iptables/IPVS record is removed; requests forwarded to the node then find no matching iptables/IPVS record and fail. This can be avoided with an in-place upgrade that guarantees at least one Running Pod on the node throughout the update. An in-place upgrade ensures that the node's iptables/IPVS rules always contain at least one business Pod, so there is no service interruption, as shown in Figure 7.

  • ENI mode: an ENI-mode Service bypasses kube-proxy and mounts the Pods directly to the SLB backend, so the iptables/IPVS issue does not arise and there is no service interruption.

SLB

Cause of interruption: after the container service observes the Endpoints change, it removes the node from the SLB backend. When a node is removed from the SLB backend, the SLB abruptly drops the long-lived connections to that node, resulting in a service interruption.

Solution: configure graceful termination (connection draining) of long-lived connections on the SLB (this depends on the cloud vendor).
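On Alibaba Cloud ACK, this is typically configured through connection-drain annotations on the Service; the annotation names below are an assumption based on the cloud-controller-manager conventions and should be verified against your cloud vendor's documentation:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    # Assumed annotation names -- check your cloud vendor's documentation.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain: "on"
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout: "30"
spec:
  type: LoadBalancer
  selector:
    run: nginx
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80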

How to avoid service interruptions

To avoid service interruptions, we can start from the Pod and the Service resources. The following describes the configuration methods for each of the causes of interruption above.

Pod configuration

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx
    # Liveness probe
    livenessProbe:
      failureThreshold: 3
      initialDelaySeconds: 30
      periodSeconds: 30
      successThreshold: 1
      tcpSocket:
        port: 5084
      timeoutSeconds: 1
    # Readiness probe
    readinessProbe:
      failureThreshold: 3
      initialDelaySeconds: 30
      periodSeconds: 30
      successThreshold: 1
      tcpSocket:
        port: 5084
      timeoutSeconds: 1
    # Graceful shutdown
    lifecycle:
      preStop:
        exec:
          command:
          - sleep
          - "30"
  terminationGracePeriodSeconds: 60

Note: the readinessProbe frequency, delay time, unhealthy threshold, and other parameters must be set appropriately. Some applications take a long time to start; if the window is too short, the Pod will be restarted repeatedly.

  • livenessProbe is the liveness probe: if the number of failures reaches failureThreshold, the Pod is restarted; for details, see the official documentation;
  • readinessProbe is the readiness probe: only after the readiness check passes is the Pod added to the Endpoints, and only after the container service observes the Endpoints change is the node mounted to the SLB backend;
  • the preStop time is recommended to be the time the business needs to finish processing all remaining requests; terminationGracePeriodSeconds is recommended to be the preStop time plus at least 30 seconds.
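For applications with a long startup time, Kubernetes also provides a startupProbe (see reference 2 below), which holds off the liveness probe until the application has started; a minimal sketch with placeholder values:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    # The liveness probe only starts running after the startup probe succeeds,
    # so a slow-starting application is not restarted repeatedly.
    startupProbe:
      tcpSocket:
        port: 80
      periodSeconds: 10
      failureThreshold: 30   # up to 30 * 10s = 300s allowed for startup
    livenessProbe:
      tcpSocket:
        port: 80
      periodSeconds: 30
      failureThreshold: 3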

Service configuration

Cluster mode (externalTrafficPolicy: Cluster)

apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: default
spec:
  externalTrafficPolicy: Cluster
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: nginx
  type: LoadBalancer

The container service mounts all nodes in the cluster to the SLB backend (except nodes configured with the BackendLabel label), which quickly consumes the SLB quota. SLB limits the number of SLB instances a single ECS instance can be mounted to; the default is 50. When the quota is used up, new listeners and SLB instances cannot be created.

In Cluster mode, if the current node has no Pod of the Service, the request is forwarded to another node. Cross-node forwarding requires NAT, so the source IP address is lost.

Local mode (externalTrafficPolicy: Local)

apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: default
spec:
  externalTrafficPolicy: Local
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: nginx
  type: LoadBalancer
# Note: every node must have at least one Running Pod during the update process.
# Keep the rolling update "in place" as much as possible by adjusting the update strategy and using nodeAffinity:
# * set maxUnavailable to 0 in the update strategy so that an old Pod is not stopped until a new Pod has started;
# * label dedicated nodes for scheduling;
# * use nodeAffinity plus a replica count larger than the number of labeled nodes, so that new Pods are created in place as much as possible.
# For example:
apiVersion: apps/v1
kind: Deployment
...
strategy:
  rollingUpdate:
    maxSurge: 50%
    maxUnavailable: 0%
  type: RollingUpdate
...
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
            - key: deploy
              operator: In
              values:
              - nginx

The container service only adds the nodes where the Service's Pods run to the SLB backend by default, so SLB quota is consumed slowly. In Local mode, requests are forwarded directly to the node where the Pod resides, with no cross-node forwarding, so the source IP address is preserved. In Local mode, you can avoid service interruption during updates by performing an in-place upgrade.

ENI mode (unique to Alibaba Cloud)

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/backend-type: "eni"
  name: nginx
spec:
  ports:
  - name: http
    port: 30080
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer

In Terway network mode, setting the service.beta.kubernetes.io/backend-type: "eni" annotation creates an ENI-mode SLB. In ENI mode, Pods are mounted directly to the SLB backend without going through kube-proxy, so there is no service interruption issue. Requests are forwarded directly to the Pod, so the source IP address is preserved.

The three Service modes are compared below:

  • Cluster mode: all nodes in the cluster are mounted to the SLB backend, so SLB quota is consumed quickly; the source IP address is lost; no interruption occurs during updates;
  • Local mode: only the nodes running the Service's Pods are mounted, so SLB quota is consumed slowly; the source IP address is preserved; interruptions are avoided by combining graceful termination with an in-place upgrade;
  • ENI mode (Terway only): Pods are mounted directly to the SLB backend, bypassing kube-proxy; the source IP address is preserved; no interruption occurs during updates.

Conclusion

Terway Network Mode (recommended)

Choose an ENI-mode Service + Pod graceful termination (preStop) + readiness probe.

Flannel network mode

  • If the cluster uses a small number of SLBs and the source IP does not need to be preserved: choose Cluster mode + Pod graceful termination + readiness probe;
  • If the cluster uses a large number of SLBs or the source IP needs to be preserved: choose Local mode + Pod graceful termination + readiness probe + in-place upgrade (ensure that each node has at least one Running Pod throughout the update).

References

  1. Container Lifecycle Hooks
  2. Configure Liveness, Readiness and Startup Probes
  3. Access services through load balancing
  4. Kubernetes best practices: terminating with grace
  5. Graceful Termination for External Traffic Policy Local (ability to do zero-downtime deployments when using externalTrafficPolicy: Local)
  6. Gracefully bring applications online with Container Service for Kubernetes (ACK)
