This article introduces the principles of the Netty memory model. There is little information available about Netty off-heap memory leaks caused by improper use of Netty, so this article covers the key concepts, diagnostic tools, and troubleshooting approaches for Netty off-heap memory leaks.

Symptoms

The main symptom of an off-heap memory leak is that the process uses a large amount of memory (check with the top command on Linux) while the Java heap does not (check with the jmap command). This article describes how to troubleshoot Netty off-heap memory leaks.

Low-level implementation of off-heap memory release

1 java.nio off-heap memory release

Netty off-heap memory is built on the native java.nio DirectByteBuffer object, so it is necessary to understand how that object is freed.

A java.nio DirectByteBuffer holds a sun.misc.Cleaner, whose clean() method makes the system call that frees the off-heap memory. There are two ways to trigger clean():

  • (1) The application actively calls
ByteBuffer buf = ByteBuffer.allocateDirect(1);
((DirectBuffer) buf).cleaner().clean();
  • (2) Based on GC collection

The Cleaner class extends java.lang.ref.Reference (through PhantomReference). The GC thread organizes unreachable Reference objects that can be reclaimed into a linked list by setting Reference's internal fields: pending holds the head node of the list, and discovered points to the next node.

Reference's internal daemon thread consumes nodes from the head of the list. If the consumed Reference object is a Cleaner, the thread calls its clean() method (see Reference#tryHandlePending()).

2 Netty noCleaner strategy

Before introducing the noCleaner strategy, you need to understand what a DirectByteBuffer with a Cleaner does when it is initialized:

The Cleaner object is created only in the DirectByteBuffer(int cap) constructor. Before allocating, the constructor checks whether the allocation would exceed the maximum allowed off-heap memory (configurable with -XX:MaxDirectMemorySize).

If it would, the JVM first tries to get unreachable Reference objects processed through the Reference linked list, relying on Reference's internal handling to run the Cleaners associated with reclaimable DirectByteBuffers and free their off-heap memory.

If memory is still insufficient, System.gc() is called to trigger a full GC that reclaims DirectByteBuffer objects in the heap, which in turn frees their off-heap memory. If the limit is still exceeded, a java.lang.OutOfMemoryError is thrown (the code is in java.nio.Bits#reserveMemory()).
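
The reserve, reclaim, retry, fail sequence described above can be modelled in a few lines. This is only a toy sketch, not the JDK's java.nio.Bits code: the class DirectMemoryBudget, its methods, and the reclaimHook callback (a stand-in for "run pending Cleaners, then System.gc()") are all hypothetical names introduced for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Toy model of a direct-memory budget; NOT the JDK's actual java.nio.Bits code. */
final class DirectMemoryBudget {
    private final long maxBytes;                          // plays the role of -XX:MaxDirectMemorySize
    private final AtomicLong reserved = new AtomicLong(); // bytes currently accounted for
    private final Consumer<DirectMemoryBudget> reclaimHook; // hypothetical stand-in for Cleaner processing / System.gc()

    DirectMemoryBudget(long maxBytes, Consumer<DirectMemoryBudget> reclaimHook) {
        this.maxBytes = maxBytes;
        this.reclaimHook = reclaimHook;
    }

    /** Fast path first, then one reclaim attempt, then OutOfMemoryError. */
    void reserve(long bytes) {
        if (tryReserve(bytes)) {
            return;
        }
        reclaimHook.accept(this); // give "GC" a chance to free memory
        if (tryReserve(bytes)) {
            return;
        }
        throw new OutOfMemoryError("Direct buffer memory (toy model)");
    }

    /** Called when a buffer's memory is freed. */
    void unreserve(long bytes) {
        reserved.addAndGet(-bytes);
    }

    long reserved() {
        return reserved.get();
    }

    private boolean tryReserve(long bytes) {
        while (true) {
            long current = reserved.get();
            if (bytes > maxBytes - current) {
                return false; // would exceed the limit
            }
            if (reserved.compareAndSet(current, current + bytes)) {
                return true;
            }
        }
    }

    public static void main(String[] args) {
        // The hook pretends that each reclaim pass frees 512 bytes.
        DirectMemoryBudget budget = new DirectMemoryBudget(1024, b -> b.unreserve(512));
        budget.reserve(1024);
        budget.reserve(256); // succeeds only because the hook freed memory first
        System.out.println("reserved after reclaim: " + budget.reserved());
    }
}
```

The point of the model is the cost: every allocation that hits the limit pays for a reclaim pass before it can succeed or fail, which is the overhead the noCleaner strategy tries to avoid.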

Netty introduced the noCleaner strategy in 4.1: it creates DirectByteBuffer objects that have no Cleaner. This avoids the extra overhead of the Cleaner-based DirectByteBuffer constructor and of Cleaner.clean(), and when off-heap memory runs short it does not trigger System.gc(), which improves performance.

The main differences between hasCleaner DirectByteBuffer and noCleaner DirectByteBuffer are as follows:

  • Different constructors. noCleaner objects are created by reflectively calling the private constructor DirectByteBuffer(long addr, int cap); hasCleaner objects are created with new DirectByteBuffer(int cap).

  • Different ways of freeing memory. noCleaner objects use Unsafe.freeMemory(address); hasCleaner objects use the clean() method of the DirectByteBuffer's Cleaner.

Note: Unsafe is a class in the sun.misc package that provides native methods for memory operations, object operations, thread scheduling, and so on. These methods matter for making Java more efficient and for giving the language access to low-level resources, but a program that uses them is no longer "safe", so the class is not officially recommended and may be removed in a future JDK release.

At startup, Netty checks whether the current environment and configuration parameters allow the noCleaner strategy (the logic is in the static initializer of PlatformDependent). For example, Android has no Unsafe class, so the noCleaner strategy is not allowed there. When it is not allowed, Netty uses the hasCleaner strategy.

Note: call PlatformDependent.useDirectBufferNoCleaner() to check whether the current Netty program uses the noCleaner strategy.

If Netty uses the hasCleaner strategy, GC triggers Cleaner.clean() and the off-heap memory is reclaimed automatically. Does that mean we can skip calling ByteBuf.release() without leaking memory?

Of course not. For one thing, automatic reclamation is not timely: the memory is freed only after the GC thread collects the ByteBuffer object. If the ByteBuffer object survives long enough to be promoted to the old generation, it is not collected until an old-generation GC runs, and those GCs happen infrequently.

For another, Netty relies on ByteBuf.release() to perform other work, such as returning pooled memory to the pool; otherwise the pool keeps marking that memory as in use.

ByteBuf.release() trigger mechanism

There is a common misconception in the industry that a ByteBuf allocated by the Netty framework is released automatically by the framework, and that the application only needs to release the ByteBufs it creates itself. This is not the case.

The Netty framework itself calls ByteBuf.release() only in a few scenarios:

1 Inbound message processing

When an inbound message is processed, Netty creates a ByteBuf, reads the channel's data into it, and invokes the ChannelHandlers on the pipeline. The application-defined ChannelHandler that consumes the ByteBuf is responsible for calling release():

public void channelRead(ChannelHandlerContext ctx, Object msg) {
    ByteBuf buf = (ByteBuf) msg;
    try {
        // ...
    } finally {
        buf.release();
    }
}

If the ByteBuf is not handled by the current ChannelHandler, it is passed to the next pipeline handler:

public void channelRead(ChannelHandlerContext ctx, Object msg) {
    ByteBuf buf = (ByteBuf) msg;
    // ...
    ctx.fireChannelRead(buf);
}

Inbound handlers are commonly defined by extending ChannelInboundHandlerAdapter. In that case, if none of the handlers calls release(), Netty never releases the inbound message, which causes a memory leak.

When an exception is thrown in a pipeline handler, the Netty framework eventually catches it and calls ByteBuf.release(). The complete flow is in AbstractNioByteChannel.NioByteUnsafe#read(); the key section is extracted below:

try {
    do {
        byteBuf = allocHandle.allocate(allocator);
        allocHandle.lastBytesRead(doReadBytes(byteBuf));
        // The inbound message has been fully read
        if (allocHandle.lastBytesRead() <= 0) {
            // ...
            break;
        }
        // Trigger the handlers on the pipeline for processing
        pipeline.fireChannelRead(byteBuf);
        byteBuf = null;
    } while (allocHandle.continueReading());
    // ...
} catch (Throwable t) {
    // The exception is caught here and byteBuf.release() is called
    handleReadException(pipeline, byteBuf, t, close, allocHandle);
}

More commonly, however, inbound handlers are defined by extending SimpleChannelInboundHandler, which ensures that the message is eventually released:

@Override
public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
    boolean release = true;
    try {
        // This message is handled by the current handler
        if (acceptInboundMessage(msg)) {
            I imsg = (I) msg;
            channelRead0(ctx, imsg);
        } else {
            // Not handled by the current handler; pass it to the next handler on the pipeline
            release = false;
            ctx.fireChannelRead(msg);
        }
    } finally {
        // Trigger the release
        if (autoRelease && release) {
            ReferenceCountUtil.release(msg);
        }
    }
}

2 Outbound message processing

Unlike inbound messages, which the Netty framework creates automatically, outbound messages are typically created by the application, which then calls write() or writeAndFlush() on the channel. These methods take care of calling release() on the ByteBuf passed in.

Note: In the netty-4.0.0.CR2 pre-release, write() had a bug and did not call ByteBuf.release().

3 Notes on release()

  • (1) Reference count

A common misconception is that calling ByteBuf.release() or ReferenceCountUtil.release() guarantees that the object's memory is freed. It does not.

Netty manages the life cycle of a ByteBuf through its reference count. ByteBuf implements the ReferenceCounted interface, whose retain() and release() methods increase and decrease the count. The reclaim action is triggered only when a release() call brings the internal count down to 0.

  • (2) Derived ByteBuf

A derived ByteBuf is one created by ByteBuf.duplicate(), ByteBuf.slice(), or ByteBuf.order(ByteOrder). It shares the reference count of the original ByteBuf, so releasing the original ByteBuf also reclaims the derived objects.

By contrast, the objects created by ByteBuf.copy() and ByteBuf.readBytes(int) are not derived ByteBufs and do not share a reference count with the original. Releasing the original ByteBuf does not reclaim them.
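
To make the two points above concrete, here is a small self-contained model of the counting rules. It is a sketch only: ToyRefCountedBuf is a hypothetical class written for this illustration, it is not Netty's ByteBuf, and it ignores thread safety. It shows that one release() after a retain() frees nothing, that a duplicate shares the count, and that a copy does not.

```java
/** Toy buffer that mimics reference-counting semantics; it is NOT Netty's ByteBuf. */
final class ToyRefCountedBuf {
    /** State shared between a buffer and its derived views. */
    private static final class Shared {
        int refCnt = 1;   // a new buffer starts with a count of 1
        boolean freed;    // set once the count reaches 0
    }

    private final Shared shared;
    private final byte[] data;

    ToyRefCountedBuf(int capacity) {
        this(new Shared(), new byte[capacity]);
    }

    private ToyRefCountedBuf(Shared shared, byte[] data) {
        this.shared = shared;
        this.data = data;
    }

    ToyRefCountedBuf retain() {
        if (shared.refCnt == 0) {
            throw new IllegalStateException("buffer already freed");
        }
        shared.refCnt++;
        return this;
    }

    /** Returns true only when this call dropped the count to 0 and "freed" the memory. */
    boolean release() {
        if (shared.refCnt == 0) {
            throw new IllegalStateException("buffer already freed");
        }
        if (--shared.refCnt == 0) {
            shared.freed = true;
            return true;
        }
        return false;
    }

    /** Derived view: shares both the content and the reference count. */
    ToyRefCountedBuf duplicate() {
        return new ToyRefCountedBuf(shared, data);
    }

    /** Independent copy: own content, own reference count. */
    ToyRefCountedBuf copy() {
        return new ToyRefCountedBuf(new Shared(), data.clone());
    }

    int refCnt() {
        return shared.refCnt;
    }

    boolean isFreed() {
        return shared.freed;
    }

    public static void main(String[] args) {
        ToyRefCountedBuf buf = new ToyRefCountedBuf(16);
        ToyRefCountedBuf view = buf.duplicate();
        ToyRefCountedBuf copy = buf.copy();
        buf.retain();                          // count is now 2
        System.out.println(buf.release());     // false: count is 1, nothing freed
        System.out.println(buf.release());     // true: count is 0, memory freed
        System.out.println(view.isFreed());    // true: the view shared the count
        System.out.println(copy.isFreed());    // false: the copy must be released separately
    }
}
```

In this model, forgetting the second release() or forgetting to release the copy leaves memory marked as in use, which is how the corresponding leaks arise.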

Off-heap memory size parameters

Two parameters configure the off-heap memory size: -XX:MaxDirectMemorySize and -Dio.netty.maxDirectMemory. What is the difference between them?

  • -XX:MaxDirectMemorySize limits the off-heap memory of hasCleaner DirectByteBuffers in Netty. The default is the maximum memory that the JVM can request from the operating system; if that is itself unlimited, the value is Long.MAX_VALUE bytes (the default value is the one returned by Runtime.getRuntime().maxMemory()). The code is in java.nio.Bits#reserveMemory().

Note: -XX:MaxDirectMemorySize cannot limit the off-heap memory of noCleaner DirectByteBuffers in Netty.

  • -Dio.netty.maxDirectMemory limits the maximum off-heap memory that Netty allocates for noCleaner DirectByteBuffers. If the value is 0, Netty uses the hasCleaner strategy. The code is in PlatformDependent#incrementMemoryCounter().
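
For contrast with the JVM-side limit, the following sketch models a noCleaner-style counter: an atomic add that is rolled back and rejected when the limit is exceeded, with no GC attempt. It is an assumption-laden illustration, not Netty's source: NoCleanerCounterSketch and its methods are hypothetical names, and IllegalStateException stands in for whatever error Netty actually raises.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Simplified model of a direct-memory counter with a hard limit; NOT Netty's code. */
final class NoCleanerCounterSketch {
    private final AtomicLong used = new AtomicLong(); // bytes currently allocated
    private final long limit;                         // plays the role of -Dio.netty.maxDirectMemory

    NoCleanerCounterSketch(long limit) {
        this.limit = limit;
    }

    /** Accounts for an allocation, or fails fast without triggering any GC. */
    void increment(int capacity) {
        long newUsed = used.addAndGet(capacity);
        if (newUsed > limit) {
            used.addAndGet(-capacity); // roll back the optimistic add
            throw new IllegalStateException("failed to allocate " + capacity
                    + " byte(s) of direct memory (used: " + (newUsed - capacity)
                    + ", max: " + limit + ")");
        }
    }

    /** Accounts for memory that has been freed. */
    void decrement(int capacity) {
        used.addAndGet(-capacity);
    }

    long used() {
        return used.get();
    }

    public static void main(String[] args) {
        NoCleanerCounterSketch counter = new NoCleanerCounterSketch(100);
        counter.increment(60);
        try {
            counter.increment(50); // 110 > 100, rejected immediately
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println("used = " + counter.used());
    }
}
```

Because the counter only goes down when memory is explicitly freed, a buffer that is never released keeps its bytes counted forever in this model, and later allocations start failing once the limit is reached.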

Off-heap memory monitoring

How do we get the usage of off-heap memory?

1 Code tools

  • (1) hasCleaner DirectByteBuffer monitoring

For hasCleaner DirectByteBuffers, the java.nio.Bits class records off-heap memory usage, but the class is package-private and cannot be accessed directly. The figures can be obtained through MXBeans instead.

Note: MXBeans are a family of special beans that Java provides for monitoring and statistics. Different types of MXBean expose JVM metrics such as memory, threads, and class loading.

List<BufferPoolMXBean> bufferPoolMXBeans = ManagementFactoryHelper.getBufferPoolMXBeans();
BufferPoolMXBean directBufferMXBean = bufferPoolMXBeans.get(0);
// The number of hasCleaner directBuffers
long count = directBufferMXBean.getCount();
// The out-of-heap memory footprint of hasCleaner DirectBuffer, in bytes
long memoryUsed = directBufferMXBean.getMemoryUsed();

Note: MappedByteBuffer is another kind of off-heap ByteBuffer. It is created by FileChannelImpl.map() and is backed by an mmap memory mapping (one way to implement zero copy). Its off-heap memory metrics are available through ManagementFactoryHelper.getBufferPoolMXBeans().get(1).
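
ManagementFactoryHelper is an internal sun.management class, so it may not be accessible on newer JDKs. As an alternative sketch, the same BufferPoolMXBean objects can be fetched through the public java.lang.management API and looked up by name instead of by index. The class name DirectPoolMonitor and the helper findPool are made up for this example, and the pool names "direct" and "mapped" are the ones HotSpot uses; other JVMs may differ.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

/** Reads buffer pool metrics through the public java.lang.management API. */
final class DirectPoolMonitor {
    /** Returns the pool with the given name (for example "direct" or "mapped"), or null if absent. */
    static BufferPoolMXBean findPool(String name) {
        List<BufferPoolMXBean> pools = ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if (name.equals(pool.getName())) {
                return pool;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        BufferPoolMXBean direct = findPool("direct");
        if (direct == null) {
            System.out.println("no pool named 'direct' on this JVM");
            return;
        }
        // Hold a 1 MiB direct buffer so that it shows up in the metrics.
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20);
        System.out.println("direct buffer count: " + direct.getCount());
        System.out.println("direct memory used (bytes): " + direct.getMemoryUsed());
        System.out.println("buffer still held, capacity: " + buf.capacity());
    }
}
```

Looking the pool up by name avoids depending on the order of the returned list.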

  • (2) noCleaner DirectByteBuffer monitoring

Monitoring Netty's noCleaner DirectByteBuffers is simpler: read the value directly from PlatformDependent.usedDirectMemory().

2 Netty's built-in memory leak detection tool

Netty also provides a memory leak detection tool. It detects cases where a ByteBuf object has been reclaimed by GC but the memory it manages has not been released. It does not cover leaks where the ByteBuf object is never reclaimed by GC, such as a backlog of tasks.

To help users detect memory leaks, Netty provides four detection levels:

  • disabled: turns memory leak detection off
  • simple: detects leaks at a sampling rate of about 1%; this is the default level
  • advanced: same sampling rate as simple, but prints a detailed leak report
  • paranoid: sampling rate of 100%, with the same detailed report as advanced

Use the command line to set the parameters:

-Dio.netty.leakDetectionLevel=[level]

An example program is as follows to set the detection level to paranoid:

// -Dio.netty.leakDetectionLevel=paranoid
public static void main(String[] args) {
	for (int i = 0; i < 500000; ++i) {
		ByteBuf byteBuf = UnpooledByteBufAllocator.DEFAULT.buffer(1024);
		byteBuf = null;	
	}
	System.gc();
}

You can see the console output leak report:

December 27, 2019 8:37:04 AM io.netty.util.ResourceLeakDetector reportTracedLeak
SEVERE: LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records:
Created at:
	io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:96)
	io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
	io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
	io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:115)
	org.caison.netty.demo.memory.BufferLeaksDemo.main(BufferLeaksDemo.java:15)

Leak detection is built on weak references. Each ByteBuf object is wrapped in a WeakReference, and a refQueue is specified when the WeakReference is created (the code entry is AbstractByteBufAllocator#toLeakAwareBuffer()).

When GC occurs, if the GC thread finds that a ByteBuf object is reachable only through its WeakReference, it adds the WeakReference to the refQueue. When a ByteBuf's memory is released normally, the WeakReference's clear() method is called to drop the reference to the ByteBuf, so later GCs do not add that WeakReference to the refQueue.

Each time Netty creates a ByteBuf and the sampling hits, it polls the WeakReference objects in the refQueue. Any non-null WeakReference returned by the poll corresponds to a ByteBuf whose off-heap memory has leaked (the code entry is ResourceLeakDetector#track()).
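
The mechanism can be reproduced with plain JDK classes. The sketch below is a toy detector written for this article's explanation, not Netty's ResourceLeakDetector: ToyLeakDetector, Tracker, track(), close(), and reportLeaks() are hypothetical names. A tracker that is closed is cleared and never enqueued; a tracker whose resource is garbage-collected without close() shows up in the queue and is reported.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Toy leak detector based on WeakReference + ReferenceQueue; NOT Netty's implementation. */
final class ToyLeakDetector {
    /** Weak reference to a tracked resource, with a hint for the report. */
    static final class Tracker extends WeakReference<Object> {
        private final Set<Tracker> active;
        final String hint;

        Tracker(Object resource, ReferenceQueue<Object> queue, Set<Tracker> active, String hint) {
            super(resource, queue);
            this.active = active;
            this.hint = hint;
            active.add(this);
        }

        /** Normal release path: forget the tracker and clear it so GC never enqueues it. */
        boolean close() {
            boolean wasActive = active.remove(this);
            clear();
            return wasActive;
        }
    }

    private final ReferenceQueue<Object> refQueue = new ReferenceQueue<Object>();
    private final Set<Tracker> active = ConcurrentHashMap.newKeySet();

    Tracker track(Object resource, String hint) {
        return new Tracker(resource, refQueue, active, hint);
    }

    /** Drains the queue; an enqueued tracker that is still active was collected without close(). */
    int reportLeaks() {
        int leaks = 0;
        Reference<?> ref;
        while ((ref = refQueue.poll()) != null) {
            if (active.remove(ref)) {
                leaks++;
                System.out.println("LEAK (toy detector): " + ((Tracker) ref).hint);
            }
        }
        return leaks;
    }

    public static void main(String[] args) throws InterruptedException {
        ToyLeakDetector detector = new ToyLeakDetector();
        detector.track(new Object(), "released resource").close();
        detector.track(new byte[1024], "resource that was never released");
        System.gc();        // only a hint; the JVM may or may not collect right away
        Thread.sleep(100);  // give the GC a moment to enqueue the reference
        System.out.println("leaks reported: " + detector.reportLeaks());
    }
}
```

As in the description above, the detector can only report a leak after GC has collected the resource object, so an object that stays strongly reachable (for example, one stuck in a task queue) is never reported.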

3 Graphical tools

Building on the code-level access to off-heap memory figures, you can feed them into a monitoring tool such as the popular Prometheus or Zabbix, which collects the values periodically and plots them.

The underlying principle is to read the metrics exposed by the MXBean, so only the usage of hasCleaner DirectByteBuffers can be obtained this way.

In addition, off-heap memory allocated through JNI calls can be monitored with google-perftools.

Off-heap memory leak diagnosis

Off-heap memory leaks have many specific causes. This article first covers monitoring for task queue accumulation and then presents a general approach to diagnosing off-heap memory leaks.

1 Task queue accumulation

The task queue here is the Queue taskQueue in NioEventLoop. Tasks are submitted to it in the following scenarios:

  • (1) User-defined common tasks
ctx.channel().eventLoop().execute(runnable);
  • (2) Write to channel
channel.write(...)
channel.writeAndFlush(...)
  • (3) User-defined scheduled tasks
ctx.channel().eventLoop().schedule(runnable, 60, TimeUnit.SECONDS);

When tasks pile up in the queue, messages cannot be written to the channel and then released, and a memory leak occurs.

To diagnose this, monitor the number of tasks in the task queue, the size of the backlogged ByteBufs, and the class information of the tasks. The monitoring program is as follows (code address github.com/caison/cais…):

public void channelActive(ChannelHandlerContext ctx) throws NoSuchFieldException, IllegalAccessException {
    monitorPendingTaskCount(ctx);
    monitorQueueFirstTask(ctx);
    monitorOutboundBufSize(ctx);
}

/** Monitors the number of tasks piled up in the task queues, including I/O read/write tasks and tasks submitted by the application */
public void monitorPendingTaskCount(ChannelHandlerContext ctx) {
    int totalPendingSize = 0;
    for (EventExecutor eventExecutor : ctx.executor().parent()) {
        SingleThreadEventExecutor executor = (SingleThreadEventExecutor) eventExecutor;
        // Note that the pendingTasks() method has a bug that causes thread blocking in Netty 4.1.29
        // See https://github.com/netty/netty/issues/8196
        totalPendingSize += executor.pendingTasks();
    }
    System.out.println("Total number of tasks in task queue = " + totalPendingSize);
}

/** Monitors the class information of the first task in each task queue */
public void monitorQueueFirstTask(ChannelHandlerContext ctx) throws NoSuchFieldException, IllegalAccessException {
    Field singleThreadField = SingleThreadEventExecutor.class.getDeclaredField("taskQueue");
    singleThreadField.setAccessible(true);
    for (EventExecutor eventExecutor : ctx.executor().parent()) {
        SingleThreadEventExecutor executor = (SingleThreadEventExecutor) eventExecutor;
        Runnable task = ((Queue<Runnable>) singleThreadField.get(executor)).peek();
        if (null != task) {
            System.out.println("First task in the task queue: " + task.getClass().getName());
        }
    }
}

/** Monitors the size of the ByteBufs backlogged in the outbound message queue */
public void monitorOutboundBufSize(ChannelHandlerContext ctx) {
    long outBoundBufSize = ((NioSocketChannel) ctx.channel()).unsafe().outboundBuffer().totalPendingWriteBytes();
    System.out.println("Size of backlogged ByteBufs in the outbound message queue = " + outBoundBufSize);
}
  • Note: The program above requires Netty 4.1.29 or later; on earlier versions it causes performance problems.

How should time-consuming business logic be handled in real Netty-based development?

The conclusion first: define a separate business thread pool and submit time-consuming work to it.

A Netty worker thread (NioEventLoop) is not only the NIO thread that reads connection data and runs the ChannelHandler logic on the pipeline; it also consumes the tasks submitted to the taskQueue, including channel write operations.

Submitting time-consuming tasks to the taskQueue therefore delays both the NIO thread's I/O processing and the other tasks in the taskQueue, so it is recommended to isolate such work in a separate business thread pool.
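
The recommendation can be sketched with JDK executors alone. In this illustration a single-threaded executor named "io-thread" stands in for a NioEventLoop, and a fixed pool named "biz-thread" is the business thread pool; BusinessPoolSketch, handle(), and slowBusinessLogic() are hypothetical names, and the 50 ms sleep is a placeholder for real work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Offloads slow work to a business pool and hands the result back to the I/O thread. */
final class BusinessPoolSketch {
    static CompletableFuture<String> handle(String request,
                                            ExecutorService ioThread,
                                            ExecutorService businessPool) {
        return CompletableFuture
                // Slow part runs on the business pool, so the I/O thread stays free.
                .supplyAsync(() -> slowBusinessLogic(request), businessPool)
                // Cheap "write the response" step goes back to the I/O thread.
                .thenApplyAsync(
                        result -> result + " (written by " + Thread.currentThread().getName() + ")",
                        ioThread);
    }

    /** Placeholder for time-consuming business logic. */
    static String slowBusinessLogic(String request) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return request.toUpperCase();
    }

    public static void main(String[] args) {
        ExecutorService ioThread = Executors.newSingleThreadExecutor(r -> new Thread(r, "io-thread"));
        ExecutorService businessPool = Executors.newFixedThreadPool(4, r -> new Thread(r, "biz-thread"));
        try {
            System.out.println(handle("ping", ioThread, businessPool).join());
        } finally {
            ioThread.shutdown();
            businessPool.shutdown();
        }
    }
}
```

With this split, only the short final step ever occupies the I/O thread, so slow requests do not hold up its queue.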

2 General diagnosis approach

Netty off-heap memory leaks have many causes: code that forgets to call release(); a reference count raised with retain() that release() calls never bring back to 0; release() skipped because of an exception; a ByteBuf object reclaimed by GC too early while its associated off-heap memory is never freed; and so on. They cannot all be listed here, so the following is a general diagnostic approach for reference.

First, you need to be able to reproduce the problem. To avoid affecting online services, try to reproduce it in a test or local environment. These environments usually lack production-level concurrency, so simulate the requests with a load-testing tool.

For scenarios that cannot be simulated, Linux traffic replication tools such as Gor, tcpreplay, and tcpcopy can copy online traffic to the test environment without affecting online services.

Once the problem can be reproduced, the next step is to locate it. First try to find it directly with the monitoring methods and log information introduced earlier. If that fails, locate the condition that triggers the leak. Sometimes, though, the application is large and exposes many traffic entry points, and checking them one by one is impractical.

In a non-production environment, comment out half of the traffic entry points, run the program, and check whether the problem still occurs. If it does, comment out half of the remaining entry points and repeat; if it does not, the trigger is among the entry points just disabled. This bisection strategy locates the trigger condition in a few attempts.

After locating the trigger condition, examine the processing logic behind it. If the logic is too complex to inspect directly, keep commenting out parts of the code and bisecting until the specific problematic code block is found.
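
The bisection procedure can be written down as a short routine. This is a sketch under two stated assumptions: exactly one entry point triggers the leak, and a check exists that says whether the leak reproduces with a given set of entry points enabled (in practice that check is a manual test run). EntryPointBisect, findCulprit(), and the example paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Bisects a list of traffic entry points to find the one that triggers a leak. */
final class EntryPointBisect {
    /**
     * leaksWith.test(enabled) must report whether the leak reproduces when only
     * the given entry points are enabled. Assumes exactly one culprit.
     */
    static String findCulprit(List<String> entryPoints, Predicate<List<String>> leaksWith) {
        if (entryPoints.isEmpty()) {
            throw new IllegalArgumentException("no entry points to bisect");
        }
        List<String> suspects = new ArrayList<String>(entryPoints);
        while (suspects.size() > 1) {
            int mid = suspects.size() / 2;
            List<String> firstHalf = new ArrayList<String>(suspects.subList(0, mid));
            List<String> secondHalf = new ArrayList<String>(suspects.subList(mid, suspects.size()));
            // Keep whichever half still reproduces the problem.
            suspects = leaksWith.test(firstHalf) ? firstHalf : secondHalf;
        }
        return suspects.get(0);
    }

    public static void main(String[] args) {
        List<String> entryPoints = Arrays.asList("/login", "/upload", "/search", "/export", "/report");
        // Simulated check: pretend that "/export" is the leaking entry point.
        Predicate<List<String>> leaksWith = enabled -> enabled.contains("/export");
        System.out.println(findCulprit(entryPoints, leaksWith)); // prints /export
    }
}
```

Each round halves the number of suspects, so even a service with dozens of entry points needs only a handful of test runs.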

This approach of reproducing the problem, monitoring, and narrowing down by elimination also works for other problems, such as heap memory leaks, 100% CPU usage, or a crashed service process.

Conclusion

This article focuses on concepts and theory and lacks a hands-on walkthrough. Here are some excellent blog posts:

"Netty off-heap memory leak troubleshooting feast", in which Flash walks through how to debug an off-heap memory leak step by step: www.jianshu.com/p/4e96beb37…

"Netty measures to prevent memory leaks", memory leak knowledge shared by Huawei's Li Linfeng, author of the Netty Definitive Guide: mp.weixin.qq.com/s/IusIvjrth…

"Mystery tracking: Spring Boot memory leak troubleshooting", a case study shared by DuiJi of the Meituan technical team: mp.weixin.qq.com/s/aYwIH0TN3…

"Netty introduction and practice: writing a WeChat-like IM instant messaging system", a paid Juejin booklet by Flash; I personally got started with Netty through this column: juejin.cn/book/684473…
