Netty memory leak problems on a Java server
Memory leaks tend to be rare, but they can be very troublesome. This post summarizes two production accidents in the hope that the experience will be useful later.
Conclusion first: do not hand-roll Netty, and when a leak shows up, first suspect any hand-rolled Netty code.
Two production environment accidents
- Leakage caused by uploading files
Because the platform had high performance requirements, with stress tests exceeding 10,000 concurrent connections that an ordinary Spring Boot + Tomcat container could not reach, we put a hand-rolled Netty layer under Spring Boot (newer versions already support Netty out of the box) to raise the concurrency of the network connection layer.
// the hand-rolled layer dispatched incoming requests to Spring beans via reflection
Object result = ReflectionUtils.invokeMethod(method, bean, paramObjs);
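To make the setup concrete, here is a minimal, hypothetical sketch of what such a hand-rolled Netty HTTP layer looks like: a ServerBootstrap with an HTTP codec pipeline and a handler that would dispatch requests to Spring beans. Class names, the port, and the aggregation limit are illustrative, not the platform's actual code.

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.http.*;

public class NettyHttpServer {

    public static void main(String[] args) throws InterruptedException {
        EventLoopGroup boss = new NioEventLoopGroup(1);
        EventLoopGroup workers = new NioEventLoopGroup();
        try {
            ServerBootstrap bootstrap = new ServerBootstrap()
                    .group(boss, workers)
                    .channel(NioServerSocketChannel.class)
                    .childHandler(new ChannelInitializer<SocketChannel>() {
                        @Override
                        protected void initChannel(SocketChannel ch) {
                            ch.pipeline()
                              .addLast(new HttpServerCodec())
                              // aggregate chunks so the handler sees one FullHttpRequest
                              .addLast(new HttpObjectAggregator(16 * 1024 * 1024))
                              .addLast(new SimpleChannelInboundHandler<FullHttpRequest>() {
                                  @Override
                                  protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest req) {
                                      // this is roughly where a hand-rolled layer would route the
                                      // request to a Spring bean method (e.g. via reflection)
                                      FullHttpResponse resp = new DefaultFullHttpResponse(
                                              HttpVersion.HTTP_1_1, HttpResponseStatus.OK);
                                      resp.headers().set(HttpHeaderNames.CONTENT_LENGTH, 0);
                                      ctx.writeAndFlush(resp).addListener(ChannelFutureListener.CLOSE);
                                  }
                              });
                        }
                    });
            bootstrap.bind(8080).sync().channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            workers.shutdownGracefully();
        }
    }
}
```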
The file upload handling on top of Netty looked roughly like this:
HttpPostRequestDecoder decoder = new HttpPostRequestDecoder(factory, request);
try {
    Map<String, String> attrs = Maps.newHashMap();
    while (decoder.hasNext()) {
        InterfaceHttpData data = decoder.next();
        try {
            switch (data.getHttpDataType()) {
                case FileUpload:
                    FileUpload fileUpload = (FileUpload) data;
                    if (fileUpload.isCompleted()) {
                        // move the upload out of Netty's temporary storage
                        File file = new File("somedir", fileUpload.getFilename());
                        fileUpload.renameTo(file);
                        files.add(file);
                    }
                    break;
                case Attribute:
                    Attribute attribute = (Attribute) data;
                    attrs.put(attribute.getName(), attribute.getValue());
                    break;
                default:
                    break;
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // every InterfaceHttpData is reference-counted and must be released,
            // otherwise its (possibly direct) buffers are never returned
            decoder.removeHttpDataFromClean(data);
            data.release();
        }
    }
} finally {
    decoder.cleanFiles();
    decoder.destroy();
}
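For context on why the release() and cleanup calls above matter: Netty's pooled buffers are reference-counted, and the FileUpload/Attribute objects produced by the decoder are reference-counted as well. If the count never reaches zero, the memory (often direct, off-heap memory) is never returned to the pool. A minimal illustration of the contract, independent of the upload code:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.util.ReferenceCountUtil;

public class RefCountDemo {
    public static void main(String[] args) {
        // allocate a pooled direct (off-heap) buffer; its reference count starts at 1
        ByteBuf buf = PooledByteBufAllocator.DEFAULT.directBuffer(256);
        System.out.println("refCnt after allocate: " + buf.refCnt()); // 1

        buf.retain();   // another owner -> 2
        buf.release();  // that owner is done -> back to 1

        // the last release drops the count to 0 and returns the memory to the pool;
        // forgetting this is exactly what an off-heap leak looks like
        ReferenceCountUtil.release(buf);
        System.out.println("fully released: " + (buf.refCnt() == 0));
    }
}
```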
JMeter stress tests easily reached around 28,000 concurrent sessions.
After going live in production, memory kept growing over time and never came back down. Various memory analysis tools were tried in turn; the heap looked normal, and nothing came of it.
Later, some users reported that image uploads were failing. Checking the logs finally showed many out-of-heap (direct) memory exhaustion errors, so the memory settings were raised, but the problem remained. In the end the upload feature itself was suspected, so file uploads were removed from the application layer and handled by the Nginx upload module instead. After a period of observation, the out-of-heap memory problem was gone.
- Memory leak in spring-cloud-gateway
This was another platform, built uniformly on the spring-cloud technology stack. After running in production for a long time, it also showed slow memory growth.
Based on the earlier experience, the first suspicion was that developers had rewritten Netty-related logic by hand. A code review showed that custom code lived only in org.springframework.cloud.gateway.filter.GlobalFilter implementations and never touched the underlying Netty layer directly. JVM analysis showed the heap was normal; the problem was that out-of-heap memory usage was too large. So Netty's memory leak detection was turned on:
-Dio.netty.leakDetection.level=advanced
Tracking the logs during a stress test turned up this error:
LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records:
#2:
    io.netty.buffer.AdvancedLeakAwareByteBuf.nioBuffer(AdvancedLeakAwareByteBuf.java:712)
    org.springframework.cloud.gateway.filter.NettyWriteResponseFilter.wrap(NettyWriteResponseFilter.java:115)
    org.springframework.cloud.gateway.filter.NettyWriteResponseFilter.lambda$null$1(NettyWriteResponseFilter.java:87)
    reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:100)
    org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90)
    reactor.core.publisher.FluxPeek$PeekSubscriber.onNext(FluxPeek.java:192)
    org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90)
    reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114)
    reactor.netty.channel.FluxReceive.drainReceiver(FluxReceive.java:256)
    reactor.netty.channel.FluxReceive.lambda$request$1(FluxReceive.java:135)
    io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
    io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    java.base/java.lang.Thread.run(Thread.java:834)
Searching these frames one by one led to an official Spring Cloud Gateway issue that does indeed point to a problem in NettyWriteResponseFilter:
"NettyWriteResponseFilter.wrap never releases the original pooled buffer in case of DefaultDataBufferFactory."
Github.com/spring-clou…
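The gist of the bug, as the issue describes it: when the response's buffer factory is a DefaultDataBufferFactory, the bytes are copied out of the pooled ByteBuf but the original buffer is never released, so the direct memory behind it is never returned. A simplified, hypothetical sketch of the pattern (not the library's actual source), with the fix being the explicit release in the copy branch:

```java
import io.netty.buffer.ByteBuf;
import org.springframework.core.io.buffer.DataBuffer;
import org.springframework.core.io.buffer.DataBufferFactory;
import org.springframework.core.io.buffer.NettyDataBufferFactory;

final class WrapSketch {

    static DataBuffer wrap(ByteBuf byteBuf, DataBufferFactory factory) {
        if (factory instanceof NettyDataBufferFactory) {
            // Netty-backed factory: the returned DataBuffer shares the ByteBuf,
            // so releasing the DataBuffer later releases the ByteBuf too.
            return ((NettyDataBufferFactory) factory).wrap(byteBuf);
        }
        // Non-Netty factory: the bytes are COPIED into a new buffer...
        DataBuffer copy = factory.allocateBuffer(byteBuf.readableBytes());
        copy.write(byteBuf.nioBuffer());
        // ...so the original pooled ByteBuf must be released here.
        // The buggy version returned without this call, leaking direct memory.
        byteBuf.release();
        return copy;
    }
}
```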
Spring Cloud Hoxton.SR8 ships the gateway version that contains the buggy code:
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-gateway-core</artifactId>
    <version>2.2.5.RELEASE</version>
</dependency>
To avoid affecting other modules, the major version was left alone and only spring-cloud-gateway-core was upgraded from 2.2.5.RELEASE to 2.2.6.RELEASE. Checking the source of the new version confirmed the problem had been fixed there, and a final stress test verified that the out-of-heap memory issue was resolved.
Accident summary
Both accidents point to the same lesson: these problems could have been caught during testing, as long as the stress test scenarios were well designed and run long enough. At its root, this is a problem of engineering process management.
Accident by-products
Troubleshooting out-of-heap memory problems is completely different from troubleshooting heap memory problems; a heap dump snapshot barely helps at all.
Everyone is familiar with the traditional heap memory structure, but out-of-heap memory has almost nothing to do with that picture.
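A toy program makes the point: the loop below grows off-heap usage while the heap stays flat, so jmap or MAT on a heap dump would show nothing unusual (the buffer wrapper objects themselves are tiny); only OS-level tools or NMT reveal the growth. This is purely illustrative and unrelated to either accident's code.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DirectMemoryDemo {
    public static void main(String[] args) throws InterruptedException {
        List<ByteBuffer> buffers = new ArrayList<>();
        for (int i = 0; i < 512; i++) {
            // 1 MB of direct (off-heap) memory per iteration; the heap only holds
            // tiny ByteBuffer wrapper objects, so a heap dump looks perfectly healthy.
            // Eventually this may fail with "OutOfMemoryError: Direct buffer memory",
            // the same kind of error seen in the file upload accident.
            buffers.add(ByteBuffer.allocateDirect(1024 * 1024));
            Thread.sleep(50);
        }
        System.out.println("allocated " + buffers.size() + " MB off-heap");
    }
}
```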
- Enable NativeMemoryTracking monitoring
# JVM startup flag that enables native memory tracking
-XX:NativeMemoryTracking=[off | summary | detail]
# Take a baseline snapshot (the target JVM here has pid 1)
jcmd 1 VM.native_memory baseline
# Show what has changed since the baseline, in MB
jcmd 1 VM.native_memory summary.diff scale=MB
- Enable io.netty.leakDetection.level monitoring
# JVM startup flag; paranoid is the most thorough (and most expensive) detection level
-Dio.netty.leakDetection.level=paranoid
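If changing JVM flags is inconvenient, the same level can also be set programmatically through Netty's ResourceLeakDetector API, as long as it happens before any buffers are allocated:

```java
import io.netty.util.ResourceLeakDetector;

public class LeakDetectionConfig {
    public static void main(String[] args) {
        // equivalent to -Dio.netty.leakDetection.level=paranoid;
        // PARANOID checks every buffer and is costly, so keep it for stress tests
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
        System.out.println("leak detection level: " + ResourceLeakDetector.getLevel());
    }
}
```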
- JXRay
This third-party tool keeps track of out-of-heap memory.
- Low-level memory debugging tools
gperftools, BTrace, jemalloc, pmap, and the like. These tools analyze memory problems at the operating-system level, so some knowledge of C is needed.
The above is a summary of personal experience; corrections and discussion are welcome.