Envying neither mandarin ducks nor immortals — one line of code can take half a day. GZH ID: Xjjdog

This article is a continuation of "Java Memory Failure? Just Because You're Not Handsome Enough!". That piece focused on theory; this one focuses on practice. With memory troubleshooting, the theory is painful and the practice is just as painful — there is no comfortable part.

Why bother? Because memory overflow is a perennial problem for Java programmers.

"Overflow" can be interpreted in many ways. There are benign overflows, there are buffer overflow attacks, and there is another kind of overflow among leaders. Since few people know what overflow theory is, XjjDog will give a broad-strokes summary here.

The Spillover Theory of Leadership

What is important about memory overflow?

In fact, a memory overflow is like a traffic accident. The cause of the accident is a specific piece of business; the party who handles it is the programmer involved. And one of the most important parts of the whole process is taking photos at the scene.

If there are no photos, no dashcam footage, and no other evidence, all you have is hearsay, and hearsay can't be trusted.

What is the most important thing in memory troubleshooting? Information gathering, of course — leaving behind evidence to support the investigation. Troubleshooting memory with nothing to go on is not interesting; it is self-torture.

There are many tools that can help locate the problem, but only if you leave the evidence behind. XjjDog wrote the following article a long time ago; you may have skipped it because of the title, but the tools it covers can help locate problems quickly.

What does the medical examiner do before putting a Java process on the autopsy table?

# Network connections and socket statistics
ss -antp > $DUMP_DIR/ss.dump 2>&1
netstat -s > $DUMP_DIR/netstat-s.dump 2>&1
# Per-thread CPU usage of the target process, network traffic, open files, disk I/O, memory
top -Hp $PID -b -n 1 -c > $DUMP_DIR/top-$PID.dump 2>&1
sar -n DEV 1 2 > $DUMP_DIR/sar-traffic.dump 2>&1
lsof -p $PID > $DUMP_DIR/lsof-$PID.dump
iostat -x > $DUMP_DIR/iostat.dump 2>&1
free -h > $DUMP_DIR/free.dump 2>&1
# JVM-level evidence: GC stats, thread stacks, object histogram, full heap dump
jstat -gcutil $PID > $DUMP_DIR/jstat-gcutil.dump 2>&1
jstack $PID > $DUMP_DIR/jstack.dump 2>&1
jmap -histo $PID > $DUMP_DIR/jmap-histo.dump 2>&1
jmap -dump:format=b,file=$DUMP_DIR/heap.bin $PID > /dev/null 2>&1

GC Log Configuration

But you won't always be at the machine when something goes wrong, and manual collection can't guarantee real-time capture. So it is strongly recommended that you print detailed GC logs, so that when a problem does occur you have something to fall back on.

In fact, I consider this requirement mandatory.

A lot of people come up and say, "My service ran out of memory." But when you ask them for log information, thread stacks, or on-site snapshots, they have none of it. That is purely a joke.

Here are the GC log parameters for JDK 8 and below; they are quite long.

#! /bin/sh
LOG_DIR="/tmp/logs"
JAVA_OPT_LOG=" -verbose:gc"
JAVA_OPT_LOG="${JAVA_OPT_LOG} -XX:+PrintGCDetails"
JAVA_OPT_LOG="${JAVA_OPT_LOG} -XX:+PrintGCDateStamps"
JAVA_OPT_LOG="${JAVA_OPT_LOG} -XX:+PrintGCApplicationStoppedTime"
JAVA_OPT_LOG="${JAVA_OPT_LOG} -XX:+PrintTenuringDistribution"
JAVA_OPT_LOG="${JAVA_OPT_LOG} -Xloggc:${LOG_DIR}/gc_%p.log"

JAVA_OPT_OOM=" -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${LOG_DIR} -XX:ErrorFile=${LOG_DIR}/hs_error_pid%p.log "

JAVA_OPT="${JAVA_OPT_LOG} ${JAVA_OPT_OOM}"
JAVA_OPT="${JAVA_OPT} -XX:-OmitStackTraceInFastThrow"

Here is the logging configuration for JDK 9 and above. As you can see, the flags have changed completely and are not backward compatible. This kind of change in Java is a real pain.

#! /bin/sh

LOG_DIR="/tmp/logs"
JAVA_OPT_LOG=" -verbose:gc"
JAVA_OPT_LOG="${JAVA_OPT_LOG} -Xlog:gc,gc+ref=debug,gc+heap=debug,gc+age=trace:file=${LOG_DIR}/gc_%p.log:tags,uptime,time,level"
JAVA_OPT_LOG="${JAVA_OPT_LOG} -Xlog:safepoint:file=${LOG_DIR}/safepoint_%p.log:tags,uptime,time,level"

JAVA_OPT_OOM=" -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${LOG_DIR} -XX:ErrorFile=${LOG_DIR}/hs_error_pid%p.log "

JAVA_OPT="${JAVA_OPT_LOG} ${JAVA_OPT_OOM}"
JAVA_OPT="${JAVA_OPT} -XX:-OmitStackTraceInFastThrow"

echo $JAVA_OPT

Once a problem occurs, the GC log can be used to quickly locate issues inside the heap. But don't go through it line by line — that is inefficient, because the log can be extremely long and not every line is meaningful. Instead, use an online tool to analyze it. The one I use most often is GCeasy; here is a screenshot.

http://gceasy.io

The GC log alone is not enough, because it only records changes in heap space, not changes in operating system resources. If you have a monitoring system, it can also help in finding problems. The figure below shows some of the system resource changes you can track.

Overflow samples

Heap overflow

The code.
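The original shows the code only as a screenshot. Below is a minimal sketch that reproduces the same "Java heap space" OOM; the class name OldOOM is taken from the stack trace further down, so treat this as an illustration rather than the original source.

import java.util.ArrayList;
import java.util.List;

public class OldOOM {
    public static void main(String[] args) {
        // Every allocated block stays reachable from this list,
        // so the 20 MB heap (-Xmx20m) fills up and even a full GC cannot help.
        List<byte[]> hold = new ArrayList<>();
        while (true) {
            hold.add(new byte[1024 * 1024]); // 1 MB per iteration, never released
        }
    }
}

Run it with the flags from the log below, e.g. java -Xmx20m -Xmn4m -XX:+HeapDumpOnOutOfMemoryError OldOOM.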

The logs.

java -Xmx20m -Xmn4m -XX:+HeapDumpOnOutOfMemoryError OOMTest
[18.386s][info][gc] GC(10) Concurrent Mark 5.435ms
[18.395s][info][gc] GC(12) Pause Full (Allocation Failure) 18M->18M(19M) 10.572ms
[18.400s][info][gc] GC(13) Pause Full (Allocation Failure) 18M->18M(19M) 5.348ms
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at OldOOM.main(OldOOM.java:20)

Jvisualvm’s response.

Metaspace overflow

The code.
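Again the original code is only a screenshot, and it may well have generated classes with a bytecode library such as cglib. A self-contained sketch that fills Metaspace by defining the same class over and over through throwaway class loaders could look like this:

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class MetaspaceOOMTest {
    static class Victim { }

    // Each loader defines its own copy of Victim, so class metadata accumulates.
    static class OneShotLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes;
        try (InputStream in = MetaspaceOOMTest.class
                .getResourceAsStream("MetaspaceOOMTest$Victim.class")) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            bytes = out.toByteArray();
        }
        List<Class<?>> keep = new ArrayList<>(); // keep classes (and their loaders) reachable
        while (true) {
            keep.add(new OneShotLoader().define("MetaspaceOOMTest$Victim", bytes));
        }
    }
}

With -XX:MaxMetaspaceSize=16M, the class metadata hits the ceiling long before the heap does.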

The logs.

java -Xmx20m -Xmn4m -XX:+HeapDumpOnOutOfMemoryError -XX:MetaspaceSize=16M -XX:MaxMetaspaceSize=16M MetaspaceOOMTest
[6.556s][info][gc] GC(30) Concurrent Cycle 46.668ms
java.lang.OutOfMemoryError: Metaspace
Dumping heap to /tmp/logs/java_pid36723.hprof ...

Jvisualvm’s response.

Direct memory overflow

The code.
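The original code is again a screenshot. A minimal sketch that exhausts -XX:MaxDirectMemorySize=10M from worker threads, matching the "Thread-2" in the trace below, might look like this:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OffHeapOOMTest {
    // Direct buffers live outside the Java heap; keeping references to them
    // exhausts -XX:MaxDirectMemorySize=10M and triggers "Direct buffer memory".
    static final List<ByteBuffer> HOLD =
            Collections.synchronizedList(new ArrayList<>());

    static void oom() {
        while (true) {
            HOLD.add(ByteBuffer.allocateDirect(1024 * 1024)); // 1 MB off-heap per call
        }
    }

    public static void main(String[] args) {
        // The original stack trace reports the error in a worker thread ("Thread-2").
        for (int i = 0; i < 3; i++) {
            new Thread(OffHeapOOMTest::oom).start();
        }
    }
}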

The logs.

java -XX:MaxDirectMemorySize=10M -Xmx10M OffHeapOOMTest
Exception in thread "Thread-2" java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:694)
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
    at OffHeapOOMTest.oom(OffHeapOOMTest.java:27)...

Stack overflow

The code.
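The original code is a screenshot; here is a minimal sketch whose recursion and println calls match the frames in the trace below.

public class StackOverflowTest {
    static int depth = 0;

    static void a() {
        System.out.println(++depth); // matches the PrintStream frames in the log
        a();                         // unbounded recursion blows the 128 KB stack (-Xss128K)
    }

    public static void main(String[] args) {
        a();
    }
}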

The logs.

java -Xss128K StackOverflowTest
Exception in thread "main" java.lang.StackOverflowError
    at java.io.PrintStream.write(PrintStream.java:526)
    at java.io.PrintStream.print(PrintStream.java:597)
    at java.io.PrintStream.println(PrintStream.java:736)
    at StackOverflowTest.a(StackOverflowTest.java:5)

What code is prone to problems

Forgetting to override hashCode and equals

Look at the code below. The Key class does not override hashCode and equals, so entries put into the HashMap can never be looked up again with a new, logically equal key; they just accumulate, cut off from the outside world.
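The original code is shown as a screenshot; here is a hedged reconstruction of the pattern (the class and variable names are made up for illustration).

import java.util.HashMap;
import java.util.Map;

public class KeyLeakDemo {
    // Key overrides neither hashCode() nor equals(), so two Keys carrying
    // the same id are never treated as the same map key.
    static class Key {
        final int id;
        Key(int id) { this.id = id; }
    }

    public static void main(String[] args) {
        Map<Key, String> cache = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            cache.put(new Key(1), "value");        // logically the same key every time
        }
        System.out.println(cache.size());           // 100000, not 1: the map only grows
        System.out.println(cache.get(new Key(1)));  // null: nothing can ever be retrieved
    }
}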

The following article describes the principle in detail: "Bugs written by architects are out of the ordinary."

Result set out of control

Don't laugh at this code. XjjDog has found this kind of painful code more than once in real code reviews — maybe the author was rushed, or was just learning Java. Either way, a query with no limit on its result set has a high probability of blowing up.
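The original snippet is a screenshot; a hedged sketch of the anti-pattern, with made-up table and column names, looks like this.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class UnboundedQuery {
    // The query has no WHERE clause, no LIMIT and no paging, so the entire
    // table is materialized into one List on the heap.
    static List<String> loadAllUsers(Connection conn) throws SQLException {
        List<String> users = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT name FROM user")) {
            while (rs.next()) {
                users.add(rs.getString(1)); // the result set is as big as the table
            }
        }
        return users;
    }
}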

Condition is out of control

The code below has the same flavor: the query condition is out of control. When a condition is not supplied, the result set becomes unbounded. Looking at the code, what happens when both fullname and other are empty?
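A hedged sketch of what such code typically looks like (the original is a screenshot; names are made up).

public class ConditionOutOfControl {
    // The WHERE clause is assembled only from parameters that are present.
    // When both fullname and other are null or empty, the statement degenerates
    // to "SELECT * FROM user" and the result set becomes unbounded.
    static String buildQuery(String fullname, String other) {
        StringBuilder sql = new StringBuilder("SELECT * FROM user WHERE 1=1");
        if (fullname != null && !fullname.isEmpty()) {
            sql.append(" AND fullname = ?");
        }
        if (other != null && !other.isEmpty()) {
            sql.append(" AND other = ?");
        }
        return sql.toString();
    }
}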

Universal parameters

Others pass loosely typed Objects and HashMaps around for information exchange. The code works fine until it doesn't, and when it breaks it is almost impossible to troubleshoot: checking parameters, stacks, and call chains all lead nowhere.
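A hedged sketch of the "universal parameter" anti-pattern (not the original code).

import java.util.HashMap;
import java.util.Map;

public class UniversalParam {
    // Everything travels through an untyped Map: the compiler cannot help,
    // nothing documents what the callee needs, and every caller must guess and cast.
    static Map<String, Object> handle(Map<String, Object> params) {
        Map<String, Object> result = new HashMap<>();
        result.put("user", params.get("user"));     // a String? a DTO? null? who knows
        result.put("data", params.get("payload"));  // same story
        return result;
    }
}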

Some preventive measures

  • Reduce the frequency of creating large objects: for example, when passing byte arrays around
  • Don't cache too much data in the heap: use Guava's weak-reference pattern
  • Keep the scope of every query controllable: sharding (sub-database/sub-table) middleware, ES and the like all have the same problem
  • Close resources once you are done with them: the try-with-resources syntax helps (see the sketch after this list)
  • Use String.intern sparingly: interning long strings that are never reused causes memory leaks
  • Set an appropriate session timeout
  • Use less third-party native code; prefer pure-Java solutions
  • Size your pools reasonably
  • Watch object sizes when parsing XML (SAX/DOM) and JSON
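For the "close your resources" item above, a minimal try-with-resources sketch (the file name is made up):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadFirstLine {
    public static void main(String[] args) throws IOException {
        // The reader is closed automatically even if readLine() throws,
        // so the file handle can never leak.
        try (BufferedReader reader = new BufferedReader(new FileReader("data.txt"))) {
            System.out.println(reader.readLine());
        }
    }
}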

Case Study 1

This is the most common kind of case. Master it and you can handle most memory overflow and leak problems.

Symptoms

  • Environment: CentOS 7, JDK 1.8, SpringBoot
  • G1 garbage collector
  • No problem right after startup, but once traffic slowly ramps up, OOM occurs
  • The system automatically generates a heap dump
  • Temporary fix: restart, but the problem keeps coming back

Information collection

  • Logs: the GC log shows memory rising and falling sharply and changing rapidly
  • Stacks: the thread dump shows most threads blocked on a single method
  • Load test: using wrk with 20 concurrent connections, memory overflows
wrk -t20 -c20 -d300s http://127.0.0.1:8084/api/test

MAT analysis

Getting the heap dump file:

jmap -dump:format=b,file=heap.bin 37340
jhsdb jmap  --binaryheap --pid  37340

MAT is built on the Eclipse platform and is itself a Java program. Analyzing the heap dump shows a large number of report objects being created in memory.

Use MAT's Leak Suspects report to find the chief culprit with one click.

Just follow the instructions and dig down.

Resolution

Analysis results:

  • The system has a service that queries large amounts of data and merges the results in memory
  • Once concurrency reaches a certain level, large amounts of data pile up in memory for computation

Solution:

  • Reconstruct the query service to reduce the query fields
  • Do the heavy lifting in SQL instead of concatenating data in memory, avoiding manipulation of huge result sets
  • Example: finding the intersection of two lists (see the sketch below)
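As a hedged illustration of that last point (table and column names are made up, and Spring's JdbcTemplate is assumed since the case runs on SpringBoot): instead of loading both result sets and intersecting them in the JVM, let the database do the join.

import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class IntersectionExample {
    private final JdbcTemplate jdbc;

    IntersectionExample(JdbcTemplate jdbc) { this.jdbc = jdbc; }

    // In-memory version: both result sets are fully loaded, then intersected in the heap.
    List<Long> inMemory() {
        List<Long> a = jdbc.queryForList("SELECT user_id FROM orders", Long.class);
        List<Long> b = jdbc.queryForList("SELECT user_id FROM vip_users", Long.class);
        a.retainAll(b); // both full lists live in the heap while this runs
        return a;
    }

    // SQL version: only the intersection ever reaches the JVM.
    List<Long> inSql() {
        return jdbc.queryForList(
            "SELECT DISTINCT o.user_id FROM orders o JOIN vip_users v ON o.user_id = v.user_id",
            Long.class);
    }
}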

Case Study 2

Symptoms

  • Environment: CentOS7, JDK1.8, JBoss
  • CMS garbage collector
  • Operating-system CPU resources are exhausted
  • Every interface responds very slowly

Analysis

  • Each individual GC worked well, but GC ran very frequently
  • It turned out an in-heap cache was in use, configured with a very large capacity
  • The cache fills really fast!

Conclusion:

  • A very large cache was enabled; it refills quickly after every GC, so GC runs constantly (see the sketch below for one way to cap such a cache)
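One way to bound such a cache, sketched with Guava (an assumption for illustration; the project's actual cache code is not shown):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;

public class BoundedCache {
    // Bound the entry count and let idle entries expire instead of letting
    // the cache grow until it occupies most of the old generation.
    static final Cache<String, byte[]> CACHE = CacheBuilder.newBuilder()
            .maximumSize(10_000)                      // hard cap on entries
            .expireAfterAccess(10, TimeUnit.MINUTES)  // drop idle entries
            .build();
}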

Case Study 3

Symptoms

  • The Java process exits unexpectedly
  • The Java process disappears
  • There is no dump file left
  • GC logs are normal.
  • When monitoring catches the death, the heap footprint is small and there is plenty of free heap space

Analysis

  • -XX:+HeapDumpOnOutOfMemoryError does not take effect
  • The operating system memory continues to increase

The following situations can cause an application to exit without leaving any trace.

  • Killed by the OS oom-killer (visible via dmesg)
  • System.exit()
  • java com.cn.AA & launched from a terminal that is later closed
  • kill -9

Resolution

Findings:

  • The dmesg output confirms the process was indeed killed by the oom-killer

Solution:

  • Allocate less memory to the JVM to make room for other processes

Case Study 4

For off-heap memory troubleshooting, see the earlier article: "Java Off-heap Memory Troubleshooting Overview."

End

Finally, the figure below serves as a summary. Practice can never escape the support of theory, and without the reinforcement of practice, theory is a body without a soul. Java memory problems never escape this picture, just as computers never escape zeros and ones.

Xiaojiejie Weidao (XjjDog) is a GZH that doesn't let programmers take detours. It focuses on infrastructure and Linux. Ten years of architecture, tens of billions of daily requests — come discuss the world of high concurrency and get a different taste.