1. Background
-
Problem description
According to feedback from our operations colleagues, requests already received by an old instance are interrupted during a release and never finish processing. To protect service quality, the old instance should finish its in-flight requests before it is taken down. In essence, this is the problem of graceful shutdown for microservices.
-
Why should we solve this problem
The possible impact is as follows:
(1) Interrupted processing leaves data incomplete, which is a critical hazard. For example, a file write is cut off and the file is left incomplete, or a payment succeeds but the local business processing fails. Applications can protect business flows through business compensation, idempotent design, and strict state-machine control, but these demand more care in programming. Even when they are in place they add maintenance cost: problems still have to be investigated, and data may even need manual compensation to stay consistent.
(2) A service that has gone offline keeps receiving traffic. Microservices are usually located through the service discovery mechanism of a registry, and the registry and its clients refresh on a time window (Eureka's default, for example, is 30 s). Within that window the registry has not yet removed the stopped instance, so upstream callers keep invoking it. This produces a large number of errors, triggers alarms or even circuit breakers, and causes many business requests to fail.
No shutdown is absolutely graceful. On the one hand, there are external factors such as a power outage or a pulled plug. On the other hand, after kill -15 an application may fail to finish its work within the normal shutdown window, so kill -9 is triggered and processing is interrupted. Within the scope we control, however, we can shut down gracefully.
kill essentially sends a signal, and signals are an asynchronous notification mechanism between processes. With kill -15 the system sends SIGTERM (15) to the application, which can handle, block, or ignore it. After receiving the signal the application can do many things, and can even decide not to terminate. With kill -9 the system sends SIGKILL (9) and the operating system kernel kills the process. This signal cannot be caught, blocked, or ignored, so the application terminates immediately. During a release, kill -15 is executed first with a reasonable time window for shutdown processing. If the application has not finished within the window, kill -9 kills the process forcibly.
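The "kill -15 first, kill -9 after the window" sequence can be sketched in plain Java with the Process API. This is a minimal illustration, not the release tooling itself: the class name TermThenKill and its helper methods are hypothetical, the child process is the Unix sleep command, and the sketch assumes a Unix-like host, where Process.destroy() requests normal termination (SIGTERM) and destroyForcibly() corresponds to SIGKILL.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.TimeUnit;

public class TermThenKill {

    /** Starts a child process; the checked IOException is wrapped for brevity. */
    static Process spawn(String... command) {
        try {
            return new ProcessBuilder(command).start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Asks the process to terminate, waits up to timeoutMillis, then kills it forcibly.
     * Returns true if the process exited on its own within the window.
     */
    static boolean stop(Process process, long timeoutMillis) {
        // Assumption: on a Unix-like host this delivers SIGTERM (like kill -15).
        process.destroy();
        try {
            if (process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true; // finished inside the shutdown window
            }
            // Window exceeded: forcible kill (like kill -9), then wait for the exit.
            process.destroyForcibly();
            process.waitFor();
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            process.destroyForcibly();
            return false;
        }
    }

    public static void main(String[] args) {
        Process child = spawn("sleep", "60"); // hypothetical stand-in for the old instance
        boolean graceful = stop(child, 5_000);
        System.out.println("exited within window: " + graceful);
    }
}
```

The length of the window is the design decision that matters: it has to be long enough for the slowest in-flight request, which is why the later sections make it configurable.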
2. What graceful shutdown has to solve
-
Processing of accepted requests
(1) The background service logic is executed properly
(2) The caller receives the response once processing has completed normally
-
No new requests are accepted
With the current deployment architecture, releases are rolling updates, and K8s already provides the mechanism that automatically removes an instance from the service list when it goes down.
A K8s pod has a terminationGracePeriodSeconds setting (default: 30 seconds). When an application is updated with RollingUpdate, once a new pod is up and passes its health check, K8s is told to terminate an old pod. The pod is set to the Terminating state and removed from the K8s service/endpoint list, so it stops receiving new traffic. K8s sends SIGTERM (kill -15) to the containers in the pod to notify the application process. If the process is still running when the grace period expires, K8s sends SIGKILL.
This saves a lot of trouble at the application level. Otherwise a shutdown hook (shutdownHook) would have to actively remove the service from the registry on shutdown, which is not that simple: when a container restarts the service, if the IP is unchanged and the interface is exposed externally, the configured script re-adds it to the registered service list, so a complete solution would be needed to guarantee correct behavior.
DiscoveryManager.getInstance().shutdownComponent();
3. Problem verification
-
Verification test
The test environment matches production: Spring Boot v1.5.7.RELEASE with embedded jetty-9.4.6.v20170531. The test command is kill -15.
Test code:
@GetMapping("/test")
public String test1(@RequestBody Map<String, Integer> params) throws InterruptedException {
    // Simulate a slow request: sleep for the number of seconds given in "s".
    TimeUnit.SECONDS.sleep(params.get("s"));
    System.out.println(Thread.currentThread().getName() + " finished");
    return "m1";
}
Jetty shutdown log: the request is processed to completion, but the client never gets a response.
2021-05-30 12:47:26.190 INFO 1235 --- [main] s.b.c.e.j.JettyEmbeddedServletContainer : Jetty started on port(s) 8080 (http/1.1)
2021-05-30 12:47:41.030 INFO 1235 --- [Thread-14] ationConfigEmbeddedWebApplicationContext : Closing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@709ba3fb: startup date [Sun May 30 12:47:25 CST 2021]; root of context hierarchy
2021-05-30 12:47:41.033 INFO 1235 --- [Thread-14] o.s.j.e.a.AnnotationMBeanExporter : Unregistering JMX-exposed beans on shutdown
2021-05-30 12:47:41.047 INFO 1235 --- [Thread-14] o.e.jetty.server.AbstractConnector : Stopped ServerConnector@3185fa6b{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
2021-05-30 12:47:41.047 INFO 1235 --- [Thread-14] org.eclipse.jetty.server.session : Stopped scavenging
2021-05-30 12:47:41.049 INFO 1235 --- [Thread-14] o.e.j.s.h.ContextHandler.application : Destroying Spring FrameworkServlet 'dispatcherServlet'
2021-05-30 12:47:41.049 INFO 1235 --- [Thread-14] o.e.jetty.server.handler.ContextHandler : Stopped o.s.b.c.e.j.JettyEmbeddedWebAppContext@7651218e{/,[file:///private/var/folders/kt/428mmnyd6vj_bgzz5xrfgtfm0000gn/T/jetty-docbase.8898877921167659801.8080/],UNAVAILABLE}
qtp1805164661-19 finished
Process finished with exit code 143 (interrupted by signal 15: SIGTERM)
Client response: org.apache.http.NoHttpResponseException: localhost:8080 failed to respond
Tomcat shutdown log: as soon as the signal is received, in-flight business processing is interrupted.
2021-05-30 12:44:27.042 INFO 1177 --- [main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 8080 (http)
2021-05-30 12:44:27.046 INFO 1177 --- [main] c.e.c.test.BrpcClientTestApplication : Started BrpcClientTestApplication in 1.634 seconds (JVM running for 2.076)
2021-05-30 12:44:39.702 INFO 1177 --- [nio-8080-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring FrameworkServlet 'dispatcherServlet'
2021-05-30 12:44:39.702 INFO 1177 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization started
2021-05-30 12:44:39.733 INFO 1177 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization completed in 31 ms
2021-05-30 12:44:41.519 INFO 1177 --- [Thread-6] ationConfigEmbeddedWebApplicationContext : Closing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@21a947fe: startup date [Sun May 30 12:44:25 CST 2021]; root of context hierarchy
2021-05-30 12:44:41.522 INFO 1177 --- [Thread-6] o.s.j.e.a.AnnotationMBeanExporter : Unregistering JMX-exposed beans on shutdown
Process finished with exit code 143 (interrupted by signal 15: SIGTERM)
Client response: org.apache.http.NoHttpResponseException: localhost:8080 failed to respond
-
Spring Boot's support for graceful shutdown
As the shutdown log shows, the class involved in closing is AnnotationConfigEmbeddedWebApplicationContext. Its parent class AbstractApplicationContext provides the extension points for graceful shutdown. When Spring Boot shuts down while a request has not been fully answered, different containers produce different results; in fact, the closing process is left to the container itself. Here is the key source:
/**
 * Register a shutdown hook with the JVM runtime, closing this context
 * on JVM shutdown unless it has already been closed at that time.
 * <p>Delegates to {@code doClose()} for the actual closing procedure.
 * @see Runtime#addShutdownHook
 * @see #close()
 * @see #doClose()
 */
@Override
public void registerShutdownHook() {
    if (this.shutdownHook == null) {
        // No shutdown hook registered yet.
        this.shutdownHook = new Thread() {
            @Override
            public void run() {
                synchronized (startupShutdownMonitor) {
                    doClose();
                }
            }
        };
        Runtime.getRuntime().addShutdownHook(this.shutdownHook);
    }
}

/**
 * Actually performs context closing: publishes a ContextClosedEvent and
 * destroys the singletons in the bean factory of this application context.
 * <p>Called by both {@code close()} and a JVM shutdown hook, if any.
 * @see org.springframework.context.event.ContextClosedEvent
 * @see #destroyBeans()
 * @see #close()
 * @see #registerShutdownHook()
 */
protected void doClose() {
    if (this.active.get() && this.closed.compareAndSet(false, true)) {
        if (logger.isInfoEnabled()) {
            logger.info("Closing " + this);
        }
        LiveBeansView.unregisterApplicationContext(this);
        try {
            // Publish shutdown event.
            publishEvent(new ContextClosedEvent(this));
        }
        catch (Throwable ex) {
            logger.warn("Exception thrown from ApplicationListener handling ContextClosedEvent", ex);
        }
        // Stop all Lifecycle beans, to avoid delays during individual destruction.
        try {
            getLifecycleProcessor().onClose();
        }
        catch (Throwable ex) {
            logger.warn("Exception thrown from LifecycleProcessor on context close", ex);
        }
        // Destroy all cached singletons in the context's BeanFactory.
        destroyBeans();
        // Close the state of this context itself.
        closeBeanFactory();
        // Let subclasses do some final clean-up if they wish...
        onClose();
        this.active.set(false);
    }
}
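The pattern in the source above, a JVM shutdown hook that delegates to a close routine guarded by compareAndSet, can be reproduced without Spring. The following is a minimal sketch under that assumption; CloseOnceContext and its closeRuns counter are hypothetical names used only to show that the cleanup body runs once, however often close() is called.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class CloseOnceContext {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final AtomicInteger closeRuns = new AtomicInteger();
    private final Object monitor = new Object();
    private Thread shutdownHook;

    /** Registers a JVM shutdown hook once; the hook runs on SIGTERM or normal exit. */
    public void registerShutdownHook() {
        if (shutdownHook == null) {
            shutdownHook = new Thread(this::close, "demo-shutdown-hook");
            Runtime.getRuntime().addShutdownHook(shutdownHook);
        }
    }

    /** Explicit close; safe to call before the hook fires. */
    public void close() {
        synchronized (monitor) {
            doClose();
        }
    }

    private void doClose() {
        // compareAndSet lets exactly one caller perform the cleanup.
        if (closed.compareAndSet(false, true)) {
            closeRuns.incrementAndGet();
            System.out.println("releasing resources");
        }
    }

    /** Number of times the cleanup body actually ran. */
    public int closeRuns() {
        return closeRuns.get();
    }

    public static void main(String[] args) {
        CloseOnceContext context = new CloseOnceContext();
        context.registerShutdownHook();
        context.close(); // the hook that runs at JVM exit then becomes a no-op
    }
}
```

Note that a hook of this kind only runs for signals the JVM can handle, such as SIGTERM; with kill -9 it never gets a chance to run.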
-
Why does Tomcat not shut down gracefully by default?
Tomcat does no special handling for the shutdown event: in-flight processing is interrupted and the caller receives an error.
-
What’s wrong with Jetty’s graceful shutdown?
Jetty’s server.stop() method closes the connections first and then processes the pending requests in the queue. When a request completes, its response cannot be written back because the connection has already been closed.
4. Solutions for graceful shutdown of the Spring Boot embedded container
-
Spring Boot 2.3 or later: supported out of the box
## Enable graceful shutdown. The default value is immediate, which stops the server at once
server.shutdown=graceful
## Grace period allowed for shutdown
spring.lifecycle.timeout-per-shutdown-phase=30s
2021-05-30 13:42:20.211 INFO 3103 --- [extShutdownHook] o.s.b.web.embedded.jetty.JettyWebServer : Commencing graceful shutdown. Waiting for active requests to complete
qtp1662592920-27 finished
2021-05-30 13:42:25.545 INFO 3103 --- [ jetty-shutdown] o.s.b.web.embedded.jetty.JettyWebServer : Graceful shutdown complete
-
Spring Boot versions earlier than 2.3
This amounts to implementing the newer versions' feature by hand. It can be made slightly more complete, with an on/off switch and a configurable shutdown timeout.
(1) Jetty implementation: first look at two key methods of Jetty's core class org.eclipse.jetty.server.Server:
/**
 * Set a graceful stop time.
 * The {@link StatisticsHandler} must be configured so that open connections can
 * be tracked for a graceful shutdown.
 * @see org.eclipse.jetty.util.component.ContainerLifeCycle#setStopTimeout(long)
 */
@Override
public void setStopTimeout(long stopTimeout) {
    super.setStopTimeout(stopTimeout);
}

/**
 * Set stop server at shutdown behaviour.
 * @param stop If true, this server instance will be explicitly stopped when the
 * JVM is shutdown. Otherwise the JVM is stopped with the server running.
 * @see Runtime#addShutdownHook(Thread)
 * @see ShutdownThread
 */
public void setStopAtShutdown(boolean stop)
Solution: Jetty uses StatisticsHandler to track the number of in-flight requests. When the application is stopped, Jetty waits for that number to drop from N to 0.
@Bean
public EmbeddedServletContainerFactory jettyEmbeddedServletContainerFactory() {
    JettyEmbeddedServletContainerFactory factory = new JettyEmbeddedServletContainerFactory();
    factory.addServerCustomizers(server -> {
        // Do not register the server with Jetty's own JVM shutdown hook.
        server.setStopAtShutdown(false);
        // Wrap the existing handler so that in-flight requests are counted.
        StatisticsHandler statisticsHandler = new StatisticsHandler();
        statisticsHandler.setHandler(server.getHandler());
        server.setHandler(statisticsHandler);
        server.setStopTimeout(30000); // hard-coded to 30 seconds for simplicity
    });
    return factory;
}
2021-05-30 12:50:24.410 INFO 1282 --- [main] s.b.c.e.j.JettyEmbeddedServletContainer : Jetty started on port(s) 8080 (http/1.1)
2021-05-30 12:50:51.196 INFO 1282 --- [Thread-13] ationConfigEmbeddedWebApplicationContext : Closing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@709ba3fb: startup date [Sun May 30 12:50:23 CST 2021]; root of context hierarchy
2021-05-30 12:50:54.199 INFO 1282 --- [Thread-13] o.s.j.e.a.AnnotationMBeanExporter : Unregistering JMX-exposed beans on shutdown
qtp548482954-17 finished
2021-05-30 12:51:16.503 INFO 1282 --- [Thread-13] o.e.jetty.server.AbstractConnector : Stopped ServerConnector@545de5a4{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
2021-05-30 12:51:16.503 INFO 1282 --- [Thread-13] org.eclipse.jetty.server.session : Stopped scavenging
2021-05-30 12:51:16.505 INFO 1282 --- [Thread-13] o.e.j.s.h.ContextHandler.application : Destroying Spring FrameworkServlet 'dispatcherServlet'
2021-05-30 12:51:16.505 INFO 1282 --- [Thread-13] o.e.jetty.server.handler.ContextHandler : Stopped o.s.b.c.e.j.JettyEmbeddedWebAppContext@32f0fba8{/,[file:///private/var/folders/kt/428mmnyd6vj_bgzz5xrfgtfm0000gn/T/jetty-docbase.2324546490925478110.8080/],UNAVAILABLE}
Process finished with exit code 143 (interrupted by signal 15: SIGTERM)
Client response:
HTTP/1.1 200 OK
Date: Sun, 30 May 2021 04:50:46 GMT
Content-Type: application/json;charset=utf-8
Content-Length: 2
m1
Response code: 200 (OK); Time: 30145ms; Content length: 2 bytes
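The idea behind the Jetty fix, counting in-flight requests and waiting for the count to reach zero before stopping, can be sketched in plain Java. This is only an illustration of the counting technique, not Jetty's StatisticsHandler; the class InFlightTracker and its method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

public class InFlightTracker {
    private int active;       // requests currently being processed
    private boolean draining; // true once shutdown has begun

    /** Called when a request arrives; returns false if the server is draining. */
    public synchronized boolean tryEnter() {
        if (draining) {
            return false;
        }
        active++;
        return true;
    }

    /** Called when a request has been fully answered. */
    public synchronized void exit() {
        active--;
        if (active == 0) {
            notifyAll(); // wake up a waiting drain() call
        }
    }

    /**
     * Refuses new requests, then waits until the in-flight count reaches zero.
     * Returns false if the timeout elapsed (or the thread was interrupted) first.
     */
    public synchronized boolean drain(long timeoutMillis) {
        draining = true;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (active > 0) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this sketch a request handler would call tryEnter() on arrival and exit() in a finally block, and the shutdown path would call drain(30000) before closing connections, which is the ordering the original Jetty behaviour got wrong.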
(2) Tomcat implementation:
@Bean
public GracefulShutdown gracefulShutdown() {
    // Must be a bean: otherwise Spring never delivers ContextClosedEvent to it.
    return new GracefulShutdown();
}

@Bean
public EmbeddedServletContainerCustomizer tomcatCustomizer() {
    return container -> {
        if (container instanceof TomcatEmbeddedServletContainerFactory) {
            ((TomcatEmbeddedServletContainerFactory) container)
                    .addConnectorCustomizers(gracefulShutdown());
        }
    };
}

private static class GracefulShutdown
        implements TomcatConnectorCustomizer, ApplicationListener<ContextClosedEvent> {
    private static final Logger log = LoggerFactory.getLogger(GracefulShutdown.class);
    private volatile Connector connector;

    @Override
    public void customize(Connector connector) {
        this.connector = connector;
    }

    @Override
    public void onApplicationEvent(ContextClosedEvent event) {
        // Stop accepting new requests, then let the worker pool finish in-flight ones.
        this.connector.pause();
        Executor executor = this.connector.getProtocolHandler().getExecutor();
        if (executor instanceof ThreadPoolExecutor) {
            try {
                ThreadPoolExecutor threadPoolExecutor = (ThreadPoolExecutor) executor;
                threadPoolExecutor.shutdown();
                if (!threadPoolExecutor.awaitTermination(30, TimeUnit.SECONDS)) {
                    log.warn("Force shutdown ...");
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
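The Tomcat approach relies on the semantics of ExecutorService.shutdown(): tasks that are already running finish, while new submissions are rejected. A minimal plain-Java sketch of that behaviour follows; PauseAndDrain and its run method are hypothetical names, and an ordinary fixed thread pool stands in for Tomcat's worker pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class PauseAndDrain {

    /**
     * Starts one "request" lasting taskMillis, shuts the pool down while it is
     * running, and waits up to waitMillis for it to finish.
     * Returns {inFlightCompleted, newTaskRejected, terminatedInTime}.
     */
    static boolean[] run(long taskMillis, long waitMillis) {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        AtomicBoolean completed = new AtomicBoolean(false);
        CountDownLatch started = new CountDownLatch(1);

        workers.execute(() -> {
            started.countDown();
            try {
                Thread.sleep(taskMillis); // simulated business processing
                completed.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        boolean rejected = false;
        boolean terminated = false;
        try {
            started.await();    // make sure the request is in flight
            workers.shutdown(); // no new work; running tasks continue
            try {
                workers.execute(() -> { });
            } catch (RejectedExecutionException e) {
                rejected = true; // new work is refused after shutdown()
            }
            terminated = workers.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] {completed.get(), rejected, terminated};
    }

    public static void main(String[] args) {
        boolean[] result = run(200, 5_000);
        System.out.println("completed=" + result[0]
                + " rejected=" + result[1] + " terminated=" + result[2]);
    }
}
```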
5. Matters needing attention in development
- Assess, for each service, how long it needs during shutdown to finish processing its in-flight requests
- Ensure that within the allotted shutdown window the application finishes its requests and responds to the caller
- Shut down custom thread resources gracefully by hand. Example of releasing business-processing thread resources:
@Bean
public ExecutorService bizExecutorService() {
    ExecutorService executorService = Executors.newFixedThreadPool(10);
    // For shutdownAndAwaitTermination, see Guava's graceful thread pool shutdown.
    Runtime.getRuntime().addShutdownHook(new Thread(() -> shutdownAndAwaitTermination(executorService, 10L, TimeUnit.SECONDS)));
    return executorService;
}
- Monitor production releases for business requests that do not complete within the allotted time, and optimize their performance
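For the custom thread pool item above, a shutdownAndAwaitTermination helper can be written with the JDK alone. The sketch below follows the two-phase pattern described in the ExecutorService Javadoc (orderly shutdown first, shutdownNow() as a fallback); it is not Guava's code, the class name PoolShutdown is hypothetical, and splitting the timeout evenly between the two phases is an assumption made for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolShutdown {

    /**
     * Phase 1: stop accepting tasks and wait half the timeout for running ones.
     * Phase 2: interrupt whatever is left and wait the other half.
     * Returns true if the pool has terminated when the method returns.
     */
    static boolean shutdownAndAwaitTermination(ExecutorService pool, long timeout, TimeUnit unit) {
        long halfNanos = unit.toNanos(timeout) / 2;
        pool.shutdown();
        try {
            if (!pool.awaitTermination(halfNanos, TimeUnit.NANOSECONDS)) {
                pool.shutdownNow(); // interrupts tasks that overran the window
                pool.awaitTermination(halfNanos, TimeUnit.NANOSECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
        }
        return pool.isTerminated();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        pool.execute(() -> System.out.println("business task finished"));
        boolean clean = shutdownAndAwaitTermination(pool, 10L, TimeUnit.SECONDS);
        System.out.println("pool terminated: " + clean);
    }
}
```

The second phase only helps if tasks respond to interruption, so long-running business code should check the interrupt flag or use interruptible blocking calls.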