The gateway came up smoothly, so the next step was acceptance testing. I ran Kong Gateway + Kubernetes through a set of basic operations, including creating new instances, restarting them, and rolling upgrades.

Back-end service background

The back-end service is an nginx + FPM stack, packaged as a single image and deployed in a K8s cluster. The request chain is: Alibaba Cloud SLB → Kong Gateway → back-end Pods (nginx + FPM).

The reason an Alibaba Cloud default SLB sits in front of Kong Gateway was covered in the previous article; it does not affect this round of acceptance.

Stability acceptance

I kept concurrent requests flowing while creating new deployments, restarting instances, and performing rolling upgrades (a sketch of the load generation is shown below). With multiple instances backing each other up, I expected all of these operations to be transparent to clients as far as service stability goes. In the actual acceptance, the following problems were encountered:
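A minimal way to keep sustained concurrent requests running during those operations, using Apache Bench; the address and path are placeholders for the SLB in front of Kong and a cheap health-check route:

ab -c 50 -t 300 http://<slb-address>/ping
# -c 50: 50 concurrent clients; -t 300: run for 300 seconds.
# Any failures during Pod creation/restart/upgrade show up in ab's
# "Failed requests" and "Non-2xx responses" counters.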

5XX errors when an instance is newly created

We run nginx and FPM in the same container, and there is always some gap between when each of them starts. The problem here was that nginx started first and began accepting requests while FPM was not ready yet, so those requests failed.

Fix: start FPM before nginx by changing the Dockerfile CMD:

CMD service php5.6-fpm start && nginx -g "daemon off;"

Kong Gateway reports a DNS error for an instance during an upgrade

During an upgrade, the back-end service briefly became unavailable. The upgrade mode we use is a rolling upgrade: create an instance of the new version, then scale down an instance of the old version, and repeat until the upgrade is complete. In theory this kind of error should not occur during the upgrade.
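For reference, this rolling behavior corresponds to the Deployment's update strategy. A sketch with illustrative values (not necessarily our exact settings):

  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # bring up one new-version Pod first
      maxUnavailable: 0  # only then terminate an old-version Pod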

Cause analysis: Kong's controller has to synchronize Pod start and stop events to Kong, so the two sides can be briefly out of sync: the back-end instances have already been replaced, but the targets of the old instances still exist in Kong. If a request is forwarded to such a target while the old back-end instance is Terminating or has already been cleaned up, the request fails.
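One way to observe this window in practice is to compare the targets Kong holds with the Pods that actually exist while an upgrade is rolling. A hedged sketch: it assumes the Kong Admin API is enabled on port 8001, and the namespace, Deployment name, upstream name, and label below are placeholders:

# Forward the Admin API locally.
kubectl -n kong port-forward deploy/kong 8001:8001 &

# List upstreams, then the targets Kong currently holds for the back-end upstream.
curl -s http://localhost:8001/upstreams
curl -s http://localhost:8001/upstreams/<backend-upstream>/targets

# Compare with the Pods that are actually running.
kubectl get pods -o wide -l app=<backend-app>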

My idea was to extend the back-end instance's effective graceful-shutdown time, so that when this happens the old instance is still Terminating but can still process requests. I could not modify the program's own graceful-shutdown logic directly, so I used the preStop hook supported in the K8s Pod lifecycle:

          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - sleep 10 && nginx -s quit
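One general Kubernetes caveat here: the preStop hook runs inside the Pod's termination grace period, which defaults to 30 seconds, so if the sleep is ever raised much further, terminationGracePeriodSeconds needs to grow with it. A sketch of where that field sits (the container name is a placeholder):

      terminationGracePeriodSeconds: 30   # must cover the preStop sleep plus nginx shutdown
      containers:
        - name: backend
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - sleep 10 && nginx -s quit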

Extension

Whether we hit the same problem when upgrading the Kong Deployment itself actually depends on whether the load balancer sitting above Kong can detect the start and stop of Kong instances in time, but in theory the same problem exists.

And the Kong Deployment already has a preStop hook configured by default:

          image: kong:2.2
          imagePullPolicy: IfNotPresent
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - kong quit

Let's add a certain amount of extra "life time" on top of that: 10 seconds, i.e. sleep 10 && kong quit.
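Applied to the Kong Deployment, the hook would then look roughly like this (an untested sketch, mirroring the back-end's preStop above):

          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - sleep 10 && kong quit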

Conclusion

When combining Kong Gateway with Kubernetes, there are some gaps in how requests are distributed, especially during upgrades. If the problem cannot be solved from the Kong side directly, it is worth trying to combine Kong with Kubernetes features.