A simple introduction

In most cases a service needs to be shut down safely, that is, it must finish its outstanding work before exiting: for example, stopping dependent services, writing final logs, and signaling other systems, so that the overall system remains highly available.

This matters especially for the Mercury gateway, which manages a large number of TCP connections and must never be killed abruptly. During shutdown it must finish processing every in-flight task and must not accept any new requests.

Graceful shutdown

Graceful shutdown is usually implemented with the Runtime.addShutdownHook(Thread hook) method provided by the JDK. Through this ShutdownHook mechanism the JVM receives a shutdown notification from the system, runs the registered hooks to perform cleanup, and then exits the application smoothly. The hooks run in the following cases:

  1. The program exits normally
  2. System.exit() is called
  3. The user presses Ctrl+C in the terminal
  4. The operating system shuts down
  5. The JVM goes down because of an OutOfMemoryError
  6. The process is killed with kill PID (kill -9 does not run the hooks)
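A minimal sketch of registering a cleanup hook with the plain JDK API. The class name, the thread name and the body of the cleanup task are placeholders; a real service would stop its own dependencies there.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownHookDemo {
    // Flipped by the hook; only here to make the cleanup observable in this sketch.
    static final AtomicBoolean cleanedUp = new AtomicBoolean(false);

    // Registers a cleanup task that the JVM starts in its own thread during shutdown.
    static Thread registerCleanupHook(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "graceful-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        registerCleanupHook(() -> {
            // Runs on normal exit, System.exit(), Ctrl+C and `kill PID`, but not on `kill -9`.
            cleanedUp.set(true);
            System.out.println("cleanup: stop dependent services, flush logs, notify peers");
        });
        System.out.println("service running");
        // main returns with no other non-daemon threads alive: a normal exit, so the hook runs.
    }
}
```

Running it prints the cleanup line right after main returns; killing the process with kill -9 instead would skip it.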

Frameworks such as Spring and Dubbo build their graceful shutdown on this mechanism.

Spring graceful shutdown

The Spring framework also relies on a shutdown hook for graceful shutdown. Calling AbstractApplicationContext.registerShutdownHook() registers a hook thread that runs the context's doClose method when the JVM shuts down.

In the doClose method, the following things are done:

  1. Publish the ContextClosedEvent (by default Spring publishes events synchronously)
  2. Call LifecycleProcessor.onClose() to stop lifecycle beans (many Spring-related jars rely on it to clean up when the container closes, for example the Kafka listener containers)
  3. Destroy all beans

Because the Mercury business handlers use Spring beans (mainly RocketMQ-related ones), the connections must be closed and the Netty resources cleaned up before the Spring container is closed.

From the doClose sequence above, it is therefore enough to close the Netty resources before destroyBeans runs.

Spring provides the ApplicationListener interface, which developers can implement to listen for the container's ContextClosedEvent, and this is the approach I chose. Since we keep Spring's default event multicaster, SimpleApplicationEventMulticaster, unchanged (no task executor configured), onApplicationEvent runs synchronously, which guarantees that the beans are destroyed only after the Netty resources have been closed.
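The ordering argument can be checked with a small stdlib-only model. This is not Spring's source: DoCloseOrderModel, addListener and the log strings are hypothetical, and the listener loop stands in for SimpleApplicationEventMulticaster invoking listeners in the caller's thread when no task executor is set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class DoCloseOrderModel {
    // Hypothetical listeners; the String argument stands in for the ContextClosedEvent.
    private final List<Consumer<String>> listeners = new ArrayList<>();
    final List<String> log = new ArrayList<>();

    void addListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    // Same order as doClose: publish the close event, stop lifecycle beans, destroy beans.
    void doClose() {
        // Synchronous dispatch: every listener returns before the next line executes.
        for (Consumer<String> listener : listeners) {
            listener.accept("ContextClosedEvent");
        }
        log.add("lifecycle stop");
        log.add("destroy beans");
    }

    public static void main(String[] args) {
        DoCloseOrderModel ctx = new DoCloseOrderModel();
        // The gateway's listener closes Netty while the RocketMQ-related beans still exist.
        ctx.addListener(event -> ctx.log.add("close Netty resources"));
        ctx.doClose();
        System.out.println(ctx.log); // [close Netty resources, lifecycle stop, destroy beans]
    }
}
```

In a real application the listener would be a Spring bean implementing ApplicationListener<ContextClosedEvent>. If a task executor were set on the multicaster, the listener could still be running when destroyBeans starts, and this ordering guarantee would be lost.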

Netty graceful shutdown


The main steps for gracefully shutting down Netty-related resources are:

  1. Close the server channel
  2. Send a reconnect message from the server
  3. Have the server actively close connections that are still open
  4. Wait for all channels to close
  5. Call Netty's shutdownGracefully
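The five steps can be captured as a skeleton that runs them strictly in order. GatewayResources and its method names are hypothetical stand-ins for the gateway's own code, not Netty APIs; each method is assumed to block until its step is finished.

```java
import java.util.ArrayList;
import java.util.List;

public class GatewayShutdownSequence {
    // Hypothetical hooks into the gateway's Netty resources; not Netty APIs.
    interface GatewayResources {
        void closeServerChannel();      // 1. unbind the port, stop accepting connections
        void sendReconnectInBatches();  // 2. ask clients to reconnect, a batch at a time
        void forceCloseRemaining();     // 3. close connections that ignored the request
        void awaitAllChannelsClosed();  // 4. wait until no channel is active
        void shutdownEventLoops();      // 5. EventLoopGroup.shutdownGracefully()
    }

    // Runs the steps strictly in order; each call is assumed to block until its step is done.
    static void shutdown(GatewayResources r) {
        r.closeServerChannel();
        r.sendReconnectInBatches();
        r.forceCloseRemaining();
        r.awaitAllChannelsClosed();
        r.shutdownEventLoops();
    }

    // Records the call order, e.g. for a unit test of the sequence.
    static final class Recorder implements GatewayResources {
        final List<String> calls = new ArrayList<>();
        public void closeServerChannel() { calls.add("closeServerChannel"); }
        public void sendReconnectInBatches() { calls.add("sendReconnectInBatches"); }
        public void forceCloseRemaining() { calls.add("forceCloseRemaining"); }
        public void awaitAllChannelsClosed() { calls.add("awaitAllChannelsClosed"); }
        public void shutdownEventLoops() { calls.add("shutdownEventLoops"); }
    }

    public static void main(String[] args) {
        Recorder recorder = new Recorder();
        shutdown(recorder);
        System.out.println(recorder.calls);
    }
}
```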

Shut down the server channel


The server closes its NioServerSocketChannel, which releases the port binding and stops accepting new connections.

This is done by calling channel.close() directly.

The close() method here is actually implemented in AbstractChannel.

Inside that method, the close() method of the corresponding ChannelPipeline is called to propagate the close event through the pipeline. close is an outbound event, so it travels from the tail node to the head node, and the head node finally closes the channel through Unsafe.

Inside Unsafe, the underlying Java NIO ServerSocketChannel is eventually closed.
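The tail-to-head propagation can be illustrated with a toy doubly linked pipeline. OutboundCloseModel and the handler names are hypothetical; real Netty uses AbstractChannelHandlerContext, skips handlers that do not handle outbound operations, and its head context delegates to the channel's Unsafe.

```java
import java.util.ArrayList;
import java.util.List;

public class OutboundCloseModel {
    // Simplified stand-in for a handler context in the pipeline's doubly linked list.
    static final class Ctx {
        final String name;
        Ctx prev;
        Ctx next;
        Ctx(String name) { this.name = name; }
    }

    final Ctx head = new Ctx("head");
    final Ctx tail = new Ctx("tail");
    final List<String> visited = new ArrayList<>();
    boolean socketClosed = false;

    // Builds head <-> handlers... <-> tail, in the order the handlers are given.
    OutboundCloseModel(String... handlers) {
        Ctx last = head;
        for (String name : handlers) {
            Ctx ctx = new Ctx(name);
            last.next = ctx;
            ctx.prev = last;
            last = ctx;
        }
        last.next = tail;
        tail.prev = last;
    }

    // An outbound event such as close() walks from tail towards head.
    // (Netty only invokes handlers that implement outbound operations; this model visits every node.)
    void close() {
        for (Ctx ctx = tail; ctx != null; ctx = ctx.prev) {
            visited.add(ctx.name);
            if (ctx == head) {
                // In Netty the head context hands the close to Unsafe, which closes
                // the underlying JDK ServerSocketChannel.
                socketClosed = true;
            }
        }
    }

    public static void main(String[] args) {
        OutboundCloseModel pipeline = new OutboundCloseModel("h1", "h2", "h3");
        pipeline.close();
        System.out.println(pipeline.visited + " socketClosed=" + pipeline.socketClosed);
        // [tail, h3, h2, h1, head] socketClosed=true
    }
}
```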

Send a reconnection message


When the client receives the reconnect message from the server, it will disconnect the current connection and re-establish the connection.

A single Mercury gateway machine holds tens of thousands of connections, so the reconnect message is not sent to all of them at once. Otherwise tens of thousands of clients could try to reconnect at the same moment, causing a sudden CPU spike on the other gateway machines. The load therefore needs to be smoothed out.

We send the reconnect message to only a fixed number of connections at a time, then wait a few seconds before sending the next batch.
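A minimal sketch of the batching idea, assuming a hypothetical Conn interface that wraps a client channel and a sendReconnect() method that writes the gateway's reconnect message. Batch size and pause are parameters because the text does not give the values used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class BatchedReconnect {
    // Hypothetical wrapper around a client connection; not a Netty type.
    interface Conn {
        void sendReconnect();
    }

    // Notifies at most batchSize connections, pauses, then moves on to the next batch.
    // Returns the number of batches sent.
    static int sendInBatches(List<? extends Conn> conns, int batchSize, long pauseMillis) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int from = 0; from < conns.size(); from += batchSize) {
            int to = Math.min(from + batchSize, conns.size());
            for (Conn conn : conns.subList(from, to)) {
                conn.sendReconnect();
            }
            batches++;
            if (to < conns.size()) {
                try {
                    TimeUnit.MILLISECONDS.sleep(pauseMillis); // spread reconnects over time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and stop early
                    break;
                }
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Conn> conns = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            int id = i;
            conns.add(() -> System.out.println("reconnect -> client " + id));
        }
        System.out.println("batches = " + sendInBatches(conns, 3, 10)); // batches = 3
    }
}
```

In the gateway the pause would be a few seconds; the demo uses 10 ms so that seven clients with a batch size of three are notified in three batches quickly.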

Actively close remaining connections


Although the server sends the reconnect message, for various network reasons some clients never receive it, and others receive it but fail to disconnect, so those connections are never re-established.

In that case the server must close these connections itself, again smoothing the load by closing a fixed number of channels at a time.

After that, it keeps checking whether any active connection remains. If one does, it waits 250 ms and checks again, for at most 3 seconds in total.
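A sketch of the forced close and the bounded wait, using the 250 ms poll interval and 3-second limit described above. Conn and FakeConn are hypothetical; with Netty, isActive() and close() would map onto the Channel methods of the same name, and Channel.close() is asynchronous.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ForceCloseAndWait {
    // Hypothetical connection handle.
    interface Conn {
        boolean isActive();
        void close();
    }

    // In-memory connection used for the demo and tests.
    static final class FakeConn implements Conn {
        private boolean active = true;
        public boolean isActive() { return active; }
        public void close() { active = false; }
    }

    // Closes connections that are still active, batchSize at a time, pausing between batches.
    static int closeRemaining(List<? extends Conn> conns, int batchSize, long pauseMillis) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int closed = 0;
        for (Conn conn : conns) {
            if (!conn.isActive()) {
                continue;
            }
            conn.close();
            closed++;
            if (closed % batchSize == 0 && !sleepQuietly(pauseMillis)) {
                break;
            }
        }
        return closed;
    }

    // Re-checks every pollMillis and gives up after maxWaitMillis.
    static boolean awaitAllClosed(List<? extends Conn> conns, long pollMillis, long maxWaitMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        while (conns.stream().anyMatch(Conn::isActive)) {
            if (System.nanoTime() - deadline >= 0 || !sleepQuietly(pollMillis)) {
                return false; // still open after the time limit (or interrupted)
            }
        }
        return true;
    }

    private static boolean sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        List<FakeConn> conns = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            conns.add(new FakeConn());
        }
        conns.get(0).close(); // this client already reconnected on its own
        System.out.println("force-closed " + closeRemaining(conns, 2, 100));
        System.out.println("all closed: " + awaitAllClosed(conns, 250, 3000));
    }
}
```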

Netty's shutdownGracefully


Netty itself provides graceful exit through the shutdownGracefully() method of EventExecutorGroup.

NioEventLoopGroup is essentially a group of NioEventLoop threads, so its graceful exit is simple: it iterates over its EventLoop array and calls shutdownGracefully on each one.

Each of those calls finally ends up in SingleThreadEventExecutor.shutdownGracefully.

The core of this method supports concurrent shutdown calls from multiple threads: it uses spin + CAS to update the state of the thread associated with the current NioEventLoop (the volatile variable state).

No actual shutdown happens here; the key step is changing the thread state to ST_SHUTTING_DOWN.

The thread associated with NioEventLoop has a total of five states:


private static final int ST_NOT_STARTED = 1;   // The thread has not been started yet
private static final int ST_STARTED = 2;       // The thread has been started
private static final int ST_SHUTTING_DOWN = 3; // The thread is shutting down
private static final int ST_SHUTDOWN = 4;      // The thread has shut down
private static final int ST_TERMINATED = 5;    // The thread has terminated
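A simplified, stdlib-only model of the spin + CAS transition, using the same five constants. Like Netty, it updates a volatile int through AtomicIntegerFieldUpdater, but the real shutdownGracefully also handles a thread that was never started and returns a termination future; this sketch only reports whether the caller's CAS won.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

public class ShutdownStateModel {
    static final int ST_NOT_STARTED = 1;
    static final int ST_STARTED = 2;
    static final int ST_SHUTTING_DOWN = 3;
    static final int ST_SHUTDOWN = 4;
    static final int ST_TERMINATED = 5;

    private static final AtomicIntegerFieldUpdater<ShutdownStateModel> STATE_UPDATER =
            AtomicIntegerFieldUpdater.newUpdater(ShutdownStateModel.class, "state");

    // volatile so every thread reads the latest state
    private volatile int state = ST_STARTED;

    int state() {
        return state;
    }

    // Safe to call from many threads at once; true only for the caller whose CAS
    // moved the state to ST_SHUTTING_DOWN. Nothing is released here: the event loop
    // notices the new state on its next iteration.
    boolean shutdownGracefully() {
        for (;;) {
            int oldState = state;
            if (oldState >= ST_SHUTTING_DOWN) {
                return false; // another caller already started (or finished) the shutdown
            }
            if (STATE_UPDATER.compareAndSet(this, oldState, ST_SHUTTING_DOWN)) {
                return true;
            }
            // CAS lost a race: re-read the state and spin again
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ShutdownStateModel loop = new ShutdownStateModel();
        AtomicInteger winners = new AtomicInteger();
        Thread[] callers = new Thread[8];
        for (int i = 0; i < callers.length; i++) {
            callers[i] = new Thread(() -> {
                if (loop.shutdownGracefully()) {
                    winners.incrementAndGet();
                }
            });
            callers[i].start();
        }
        for (Thread t : callers) {
            t.join();
        }
        System.out.println("winners=" + winners.get() + ", state=" + loop.state());
    }
}
```

Eight threads race to shut the loop down, but exactly one CAS succeeds, so main prints winners=1, state=3.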

Once the state has been changed, the rest of the work happens mainly inside NioEventLoop.

The most important part of NioEventLoop is its run method, which loops over select, I/O event processing, and task execution.

At the end of each iteration the loop checks the thread state; if it is ST_SHUTTING_DOWN, it calls closeAll.

closeAll closes every channel registered with the selector by calling each Channel's Unsafe close method in turn. Channels that are still sending messages, however, have to be closed later. Closing a channel involves these steps:

  1. Check whether a flush is in progress on the link. If so, wrap the close operation in a task and put it in the EventLoop to run later
  2. Clear the send buffer so that no new messages can be sent
  3. Call the SocketChannel's close method to close the link
  4. Call the pipeline's fireChannelInactive to fire the link-closed notification
  5. Call deregister to cancel the SelectionKey from the multiplexer (selector)
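The deferral in step 1 and the rest of the sequence can be modeled with a plain task queue. ChannelCloseModel, inFlush, runPendingTasks and the step strings are hypothetical names; in Netty this logic lives in AbstractChannel's inner AbstractUnsafe class and the queue belongs to the event loop.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class ChannelCloseModel {
    // Hypothetical stand-in for the event loop's task queue.
    final Queue<Runnable> taskQueue = new ArrayDeque<>();
    final List<String> steps = new ArrayList<>();
    boolean inFlush;                                // true while a flush is writing to the socket
    private List<String> outboundBuffer = new ArrayList<>();

    // Accepts a message only while the channel still has an outbound buffer.
    boolean write(String msg) {
        if (outboundBuffer == null) {
            return false;
        }
        outboundBuffer.add(msg);
        return true;
    }

    void close() {
        if (inFlush) {
            steps.add("deferred");                  // 1. flushing: retry later on the event loop
            taskQueue.add(this::close);
            return;
        }
        outboundBuffer = null;                      // 2. drop the buffer, refuse new messages
        steps.add("buffer cleared");
        steps.add("socket closed");                 // 3. SocketChannel.close()
        steps.add("channelInactive");               // 4. pipeline.fireChannelInactive()
        steps.add("deregistered");                  // 5. cancel the SelectionKey
    }

    // Runs only the tasks queued so far, as the event loop would on its next iteration.
    void runPendingTasks() {
        for (int n = taskQueue.size(); n > 0; n--) {
            taskQueue.poll().run();
        }
    }

    public static void main(String[] args) {
        ChannelCloseModel ch = new ChannelCloseModel();
        ch.inFlush = true;
        ch.close();           // a flush is running, so the close is queued
        ch.inFlush = false;   // the flush finishes
        ch.runPendingTasks(); // the queued close now runs
        System.out.println(ch.steps);
        // [deferred, buffer cleared, socket closed, channelInactive, deregistered]
    }
}
```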

After closeAll() returns, NioEventLoop calls confirmShutdown to decide whether it can really exit:

  1. Cancel all scheduled tasks
  2. Execute all tasks in the taskQueue
  3. Execute the shutdown hooks registered on the NioEventLoop
  4. Check whether the graceful-shutdown timeout has been reached; if it has been reached or exceeded, exit immediately
  5. If the timeout has not been reached, do not exit yet: check every 100 ms whether new tasks have been added and keep executing them, until no new task has arrived for the whole quiet period
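A simplified model of the confirmShutdown decision. It assumes an injectable millisecond clock so the timing can be tested, and includes the quiet period that shutdownGracefully(quietPeriod, timeout, unit) takes as a parameter; the class and field names are hypothetical, and the real method also deals with wake-up tasks.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.LongSupplier;

public class ConfirmShutdownModel {
    final Queue<Runnable> scheduledTasks = new ArrayDeque<>();
    final Queue<Runnable> taskQueue = new ArrayDeque<>();
    final Queue<Runnable> shutdownHooks = new ArrayDeque<>();
    private final LongSupplier clockMillis;   // injectable so the timing can be tested
    private final long quietPeriodMillis;
    private final long timeoutMillis;
    private long startMillis = -1;
    private long lastExecutionMillis;

    ConfirmShutdownModel(LongSupplier clockMillis, long quietPeriodMillis, long timeoutMillis) {
        this.clockMillis = clockMillis;
        this.quietPeriodMillis = quietPeriodMillis;
        this.timeoutMillis = timeoutMillis;
    }

    // Runs everything queued and remembers when work was last done.
    private boolean drain(Queue<Runnable> queue) {
        if (queue.isEmpty()) {
            return false;
        }
        Runnable task;
        while ((task = queue.poll()) != null) {
            task.run();
        }
        lastExecutionMillis = clockMillis.getAsLong();
        return true;
    }

    // true: the loop may exit now; false: run another iteration and ask again.
    boolean confirmShutdown() {
        scheduledTasks.clear();                               // 1. cancel scheduled tasks
        if (startMillis < 0) {
            startMillis = clockMillis.getAsLong();
        }
        if (drain(taskQueue) || drain(shutdownHooks)) {       // 2. tasks, 3. shutdown hooks
            return quietPeriodMillis == 0;                    // work was done: check again later
        }
        long now = clockMillis.getAsLong();
        if (now - startMillis > timeoutMillis) {
            return true;                                      // 4. timeout reached: exit now
        }
        if (now - lastExecutionMillis <= quietPeriodMillis) { // 5. recent work: wait 100 ms
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return false;
        }
        return true;                                          // a full quiet period with no new task
    }

    public static void main(String[] args) {
        ConfirmShutdownModel loop = new ConfirmShutdownModel(System::currentTimeMillis, 300, 2000);
        loop.taskQueue.add(() -> System.out.println("last task before exit"));
        int checks = 1;
        while (!loop.confirmShutdown()) {
            checks++;
        }
        System.out.println("exited after " + checks + " checks");
    }
}
```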

runAllTasks is already called in NioEventLoop's run method, yet confirmShutdown calls it again.

This is because of Netty's ioRatio, 50 by default, which sets the ratio of time the NioEventLoop thread spends on I/O to the time it spends on non-I/O work, so that scheduled tasks and user-defined tasks cannot take up too much of the thread. Because of this time limit, due scheduled tasks and ordinary tasks may not all finish in one iteration and have to wait for the next Selector poll. confirmShutdown therefore calls runAllTasks again, so that tasks that should have run, but did not finish, are cleaned up before the thread exits.
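The time budget behind ioRatio is simple arithmetic: after an I/O phase that took ioTime, NioEventLoop.run() passes ioTime * (100 - ioRatio) / ioRatio as the timeout to runAllTasks. A small helper makes the numbers concrete; the class and method names are hypothetical, and ioRatio == 100 (which Netty special-cases to run all tasks without a limit) is rejected here.

```java
public class IoRatioBudget {
    // Budget for non-I/O tasks after an I/O phase that took ioTimeNanos.
    static long taskBudgetNanos(long ioTimeNanos, int ioRatio) {
        if (ioRatio <= 0 || ioRatio >= 100) {
            // 100 is handled separately in Netty (no time limit), so it is out of scope here.
            throw new IllegalArgumentException("ioRatio must be in (0, 100)");
        }
        return ioTimeNanos * (100 - ioRatio) / ioRatio;
    }

    public static void main(String[] args) {
        long ioTime = 10_000_000L; // 10 ms spent processing I/O events
        System.out.println(taskBudgetNanos(ioTime, 50)); // 10000000: tasks get as much time as I/O
        System.out.println(taskBudgetNanos(ioTime, 80)); // 2500000: tasks get a quarter of the I/O time
    }
}
```

With the default ratio of 50, tasks may run for as long as the preceding I/O phase took; anything left over waits for the next iteration, which is exactly the backlog that confirmShutdown drains.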

At this point, the Netty thread finally exits.