This is the 9th day of my participation in Gwen Challenge
Over a system's life cycle, upgrades and redeployments are inevitable, and key services must keep serving requests while they are upgraded. Service SLAs are usually four nines (99.99%) or higher, so shutting services down gracefully is essential.
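As a rough guide to what "four nines" allows, the downtime budget follows directly from the availability target $A$ over a period $T$. This is a simple worked calculation that assumes a 365-day year and a 30-day month:

```latex
\begin{aligned}
\text{downtime budget} &= (1 - A)\,T, \qquad A = 0.9999 \\
\text{per year:} \quad (1 - 0.9999) \times 525\,600\ \text{min} &= 52.56\ \text{min} \\
\text{per 30-day month:} \quad (1 - 0.9999) \times 43\,200\ \text{min} &= 4.32\ \text{min}
\end{aligned}
```

A single release that drops traffic for a few minutes can therefore use up a whole month's budget on its own.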
The original idea
Our services are built on Spring Boot 2.0, Spring Cloud 2.0 and Nacos. Our first idea was to change the SLB configuration: instead of probing every server on the same health check port, each project would get its own health check address, which is forwarded by domain name to each of its servers. The scheme is shown in the figure below:
This scheme has a few problems:
- There are many clusters and servers: currently about 20 production clusters with hundreds of servers. Every server has to be registered in the SLB, and every addition or removal has to be mirrored there. That is a lot of work and a lot of risk.
- If the SLB health check probes a server while a service on it is being released, the quality of service degrades.
For the first problem, we considered updating the SLB periodically with scripts (the SLB provides an API for this). The second problem is harder: releases are frequent, and they cannot be avoided or reduced whenever a feature or a bug fix has to ship. This calls for a graceful shutdown that does not interrupt the API, and it also means each server must be taken offline in the SLB before a release and brought back online afterward.
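The release order described above (offline in the SLB, stop, deploy, start, back online) can be sketched as a rolling loop. This is a minimal sketch under stated assumptions: `LoadBalancer` and `Deployer` are hypothetical interfaces standing in for the vendor's SLB API and your deployment scripts, not real SDK types.

```java
import java.util.ArrayList;
import java.util.List;

public class ReleaseSequence {

    /** Hypothetical abstraction over the SLB vendor API; not a real SDK interface. */
    interface LoadBalancer {
        void setBackendOnline(String server, boolean online);
    }

    /** Hypothetical hooks for the deployment steps. */
    interface Deployer {
        void stopGracefully(String server);
        void deployAndStart(String server);
        boolean isHealthy(String server);
    }

    /** Releases servers one at a time so the rest of the cluster keeps serving traffic. */
    static void rollingRelease(List<String> servers, LoadBalancer slb, Deployer deployer) {
        for (String server : servers) {
            slb.setBackendOnline(server, false);  // 1. stop routing new traffic to this server
            deployer.stopGracefully(server);      // 2. hook: stop the old version
            deployer.deployAndStart(server);      // 3. hook: install and start the new version
            if (!deployer.isHealthy(server)) {    // 4. never re-admit an unhealthy server
                throw new IllegalStateException("health check failed on " + server);
            }
            slb.setBackendOnline(server, true);   // 5. put it back into rotation
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        LoadBalancer slb = (server, online) -> log.add((online ? "online " : "offline ") + server);
        Deployer deployer = new Deployer() {
            public void stopGracefully(String s) { log.add("stop " + s); }
            public void deployAndStart(String s) { log.add("start " + s); }
            public boolean isHealthy(String s) { return true; }
        };
        rollingRelease(List.of("app-1", "app-2"), slb, deployer);
        log.forEach(System.out::println);
    }
}
```

Releasing one server at a time keeps the remaining servers behind the SLB, and the health check in step 4 prevents a broken build from being put back into rotation.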
Every problem now seemed to have a fix. But since we already have a gateway, why keep a separate copy of the server information in the SLB and update the SLB on every release as well? With more than one SLB, or after a later migration, all of this would have to change again. We then considered merging the health check port into the application port, or, if the two ports cannot be merged, exposing a health check interface on the application port. Either way, this means reading through the source of the actuator component; plenty has been written about it online, so it is not repeated here.
The actuator's health output is produced mainly by the following code:
We modeled our custom health check interface on it.
Alternatively, the custom interface can call the existing health check interface and pass its result through.
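The idea of serving the health check on the application port itself can be sketched without Spring, using the JDK's built-in HTTP server. This is an illustration under stated assumptions, not the Spring implementation: the class name and the `/health` path are hypothetical. In the Spring Boot application, the equivalent would typically be a controller on the application port that delegates to the actuator's `HealthEndpoint`.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Framework-free sketch: the health check is served on the same port as the application,
 * so the SLB or gateway probes the port that actually carries traffic.
 * The path "/health" is a hypothetical choice.
 */
public class AppPortHealthCheck {
    private final AtomicBoolean up = new AtomicBoolean(true);
    private final HttpServer server;

    public AppPortHealthCheck(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/health", exchange -> {
            boolean isUp = up.get();
            byte[] body = ("{\"status\":\"" + (isUp ? "UP" : "DOWN") + "\"}")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            // 200 keeps the server in rotation; 503 tells the prober to take it out.
            exchange.sendResponseHeaders(isUp ? 200 : 503, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        // Business endpoints would be registered on this same server (same port).
    }

    public void start() { server.start(); }
    public void stop() { server.stop(0); }
    public int port() { return server.getAddress().getPort(); }

    /** Report DOWN before a release so probes fail and traffic drains away. */
    public void markDown() { up.set(false); }
    public void markUp() { up.set(true); }
}
```

Because the probe and the business traffic share a port, a probe can no longer succeed while the application itself is unreachable.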
This approach works only for the business service modules of the microservices, not for the gateway or the registry. The gateway is managed both by the microservice framework and by the SLB, so releasing the gateway still requires taking it offline in the SLB beforehand and bringing it back online afterward. See the SLB documentation for the specific API.
Final solution: graceful shutdown for microservices
The officially provided way to shut down gracefully is the actuator/shutdown endpoint. Our investigation found two problems with it:
- The application stops, no longer serves requests, and is taken offline in the registry, but the process is still alive (grep still finds it).
- Other microservices keep calling the stopped server through Ribbon until the circuit breaker trips or Ribbon refreshes its service list.
The first problem occurs because the application holds ScheduledExecutor instances that the application context does not close. These keep the JVM alive, so the corresponding ScheduledExecutor must be shut down explicitly in code. Many applications never use a ScheduledExecutor directly; it comes in through internal framework dependencies. For example, Ribbon uses timers to refresh the list of microservices. With the Spring Cloud dependencies removed, running shutdown again does stop the process.
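What "explicitly closing" an executor means can be shown for an executor that your own code owns. This is a minimal sketch, and `RefreshingComponent` is a hypothetical name. Executors created inside framework libraries such as Ribbon are not reachable this way, which is why shutdown alone did not solve the problem for us.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical component that owns a ScheduledExecutorService. Its worker thread is
 * non-daemon, so unless close() is called the JVM cannot exit even after the web
 * server has stopped.
 */
public class RefreshingComponent implements AutoCloseable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger refreshCount = new AtomicInteger();

    public void start(long periodMillis) {
        // Stand-in for a periodic task such as refreshing a server list.
        scheduler.scheduleAtFixedRate(refreshCount::incrementAndGet, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    public int refreshCount() { return refreshCount.get(); }

    /** In a Spring bean this would be called from a @PreDestroy method or DisposableBean.destroy(). */
    @Override
    public void close() throws InterruptedException {
        scheduler.shutdown();                                   // reject new tasks; periodic tasks are cancelled by default
        if (!scheduler.awaitTermination(5, TimeUnit.SECONDS)) { // give a running task time to finish
            scheduler.shutdownNow();                            // then interrupt whatever is left
        }
    }

    public boolean isTerminated() { return scheduler.isTerminated(); }
}
```

Calling `shutdown()` first and falling back to `shutdownNow()` lets an in-progress run complete instead of being interrupted mid-task.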
The second problem arises because Ribbon pulls the service list with a scheduled task instead of being notified by the registry. Since shutdown alone cannot stop a Spring Cloud service gracefully, a different approach is needed. In our current setup, a graceful shutdown has to satisfy two conditions:
- Before the service is shut down, notify the registry that this server is going offline.
- Shut the server down only after the callers' Ribbon clients have refreshed their service lists and no longer invoke it.

The Nacos console has an "offline" action for each microservice instance, and the same thing can be done through the nacos/v1/ns/instance API (the service name, application IP address and port must be specified).
Alternatively, call http://ip:port/service-registry?status=DOWN. We use this method because it is easier to configure in scripts. If the registry is Eureka, the status must also be set back to UP after the application is released; Nacos does not need this step.
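The Nacos form of the offline call can be sketched with the JDK's HTTP client. This assumes a Nacos 1.x server and our reading of its Open API, in which a PUT to `/nacos/v1/ns/instance` with `enabled=false` disables one instance. The server address, service name, IP and port below are placeholders, and `groupName`, `clusterName` and `namespaceId` are left at their defaults.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Builds the Nacos Open API request that takes one instance offline (enabled=false). */
public class NacosOffline {

    static HttpRequest offlineRequest(String nacosBaseUrl, String serviceName, String ip, int port) {
        String query = "serviceName=" + URLEncoder.encode(serviceName, StandardCharsets.UTF_8)
                + "&ip=" + URLEncoder.encode(ip, StandardCharsets.UTF_8)
                + "&port=" + port
                + "&enabled=false"; // disabled instances are no longer returned to callers
        return HttpRequest.newBuilder(URI.create(nacosBaseUrl + "/nacos/v1/ns/instance?" + query))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        // Placeholder values; substitute your Nacos address and the instance being released.
        HttpRequest req = offlineRequest("http://127.0.0.1:8848", "order-service", "10.0.0.12", 8080);
        System.out.println(req.method() + " " + req.uri());
        // To send it: HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```

Building the request separately from sending it makes the exact URL easy to log in a release script before it is executed.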
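A pitfall to note:
Ribbon refreshes its service list every 30 s by default. The interval can be lowered to 10 s with the ribbon.ServerListRefreshInterval property (the value is in milliseconds, so 10000). Every microservice then picks up the latest list of available services within at most 10 seconds, so after marking an instance DOWN, wait at least that long before stopping it.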
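Putting the two conditions in order (deregister, wait out the Ribbon refresh interval, then stop) can be sketched as follows. This is a minimal sketch: the `Steps` hooks are hypothetical stand-ins for the DOWN call, a sleep, and the deployment script's stop command, and the 5-second margin is an assumed value, not one from our setup.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Orders the shutdown: deregister first, wait for callers to refresh, then stop. */
public class GracefulStop {

    /** Hypothetical hooks; in a script these would be HTTP calls and a stop command. */
    interface Steps {
        void markDown();                     // e.g. service-registry?status=DOWN or the Nacos offline call
        void sleep(Duration d) throws InterruptedException;
        void stopProcess();                  // e.g. the deployment script's stop command
    }

    /** Wait = refresh interval + a margin for in-flight requests and timer jitter. */
    static Duration drainTime(Duration refreshInterval, Duration margin) {
        return refreshInterval.plus(margin);
    }

    static void stop(Steps steps, Duration refreshInterval, Duration margin) throws InterruptedException {
        steps.markDown();
        steps.sleep(drainTime(refreshInterval, margin));
        steps.stopProcess();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> log = new ArrayList<>();
        Steps recording = new Steps() {
            public void markDown() { log.add("DOWN"); }
            public void sleep(Duration d) { log.add("wait " + d.getSeconds() + "s"); }
            public void stopProcess() { log.add("stop"); }
        };
        stop(recording, Duration.ofSeconds(10), Duration.ofSeconds(5));
        System.out.println(log); // [DOWN, wait 15s, stop]
    }
}
```

With the 10 s refresh interval, the wait is what guarantees the second condition: by the time the process stops, every caller has had a chance to drop this instance from its list.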