Remember a Metaspace OOM problem caused by Arthas

Author: Technology of idle fish — firefly sound

Unless otherwise stated, this article is based on the following context by default:

JDK: OpenJDK 14GA

MacOS 10.15

Arthas 3.3.9

VisualVM 2.0.2

As of Arthas 3.4.2, this issue has been fixed. Thank you to the Arthas team for taking this issue seriously.

background

Arthas by alibaba is an open source Java application of diagnostic tool, it is powerful, and don’t need to make any changes to the original application, can help developers observation Java application in all aspects of running state, especially in the online service is not easy to debug, question repetition probability is low scenario has made great developers debug work, Arthas is very popular with developers both inside and outside the group, and I often use Arthas in my work to help identify problems with services.

In mid-August of this year, Arthas needed trace to count a huge method with a large number of get sets and multiple interface calls. After executing trace, Arthas did not show successful command calls and the terminal connected to Arthas was not responding. Try to reconnect to Arthas and trace again, only to be prompted that trace failed:

Enhance error! exception: java.lang.InternalError
error happens when enhancing class: null, check arthas log: /path/to/server-log/arthas.log
Copy the code

I looked at the Arthas run log on the server and found the following exception stack in the log:

java.lang.InternalError: null at sun.instrument.InstrumentationImpl.retransformClasses0(Native Method) at sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:144) at com.taobao.arthas.core.advisor.Enhancer.enhance(Enhancer.java:368) at com.taobao.arthas.core.command.monitor200.EnhancerCommand.enhance(EnhancerCommand.java:149) at com.taobao.arthas.core.command.monitor200.EnhancerCommand.process(EnhancerCommand.java:96) at com.taobao.arthas.core.shell.command.impl.AnnotatedCommandImpl.process(AnnotatedCommandImpl.java:82) at com.taobao.arthas.core.shell.command.impl.AnnotatedCommandImpl.access$100(AnnotatedCommandImpl.java:18) at com.taobao.arthas.core.shell.command.impl.AnnotatedCommandImpl$ProcessHandler.handle(AnnotatedCommandImpl.java:111) // .Copy the code

At about the same time, I received an alarm from the monitoring platform about the target machine Metaspace OOM. I checked the server monitoring panel and found that the Metaspace of the current JVM was full. Back in the development environment, I tried the same operation a few times, but the Metaspace OOM stably reappeared. So we started to investigate this problem.

Problem analysis

A first look at the Metaspace structure

The target application runs on the AliJDK, which is deeply customized based on OpenJDK 8. Like normal OpenJDK, Metaspace is implemented as out-of-heap memory, so traditional Dump heap analysis before and after the change in the number of objects in the heap is not possible. We can only start with the storage structure of Metaspace.

Metaspace is mainly divided into non-class space and Class space. Their roles are as follows:

Class space

Maps that store Klass objects, vtables, itable, and maps that record the addresses of objects referenced by non-static members of the class.
- Klass objects are runtime data structures for Java classes at the JVM level. When a Class is loaded, an InstanceKlass object describing the current Class is generated. These Klass objects are stored in the Class Space of Metaspace. In the object header of a Java object there are Pointers to Klass objects of the class to which the object belongs.
- Vtable exists to implement virtual dispatch in Java. HotSpot abstracts all Java methods into Method objects, and the member attribute _methods in InstanceKlass holds Method instances corresponding to all the methods of the current class. Instead of explicitly designing the virtual table as a Klass field, HotSpot provides a view of the virtual table. The size of the vtable is calculated as the.class file is parsed, and the vtable is actually generated when the class is connected.
- Itable records the offset of interface methods in the VTable when a class has an implementation interface. The size of the.class file is calculated as the.class file is parsed, and itable is actually generated when the class is connected.

Non-class Space

There are a lot of things in this area, and the following ones take up the most space:
- Constant pool, variable size (note the structured representation of the constant pool in the class file, not the String constant at runtime);
- Metadata of each member method: ConstMethod structure, containing several variable size internal structures, such as method bytecode, local variable table, exception table, parameter information, method signature, etc.
- Runtime data that controls the behavior of the JIT
- Annotated data and so on

View the diagnostic command output

Once you know what data is primarily stored in Metaspace, you can use diagnostic commands to check Metaspace memory usage.

For JDK 8, you can use commands

jstat -gc <pid>
Copy the code

Later versions of the JDK (usually after JDK 12) introduce the vm. metaspace diagnostic command,

jcmd <pid> VM.metaspace
Copy the code

You can output more comprehensive diagnostic information.

So let’s go before TracejstatOutput:

You can see that MU is around 95MB and CCSU is around 14MB. Since MU = non-class Space + class Space, non-class Space is approximately 80 MB.

If you are using a higher JDK, you can use the vm. metaspace command to see more detailed results:

You can see that the data is in line with what was expected. Let’s look at the diagnostic information after trace:

It was found that the size of the non-class section increased dramatically, while the size of the class section and the number of classes loaded did not change significantly. This suggests that the Metaspace OOM was most likely caused by the JVM claiming a large amount of non-class space while parsing Arthas’s enhanced bytecode-like data and putting newly generated methods, constant pools, and other data into the non-class section. Therefore, the next step is to analyze the bytecode differences before and after enhancement.

Analyze Arthas’s command execution process

Since the enhanced bytecode is output by Arthas and injected into the JVM, it is necessary to understand how Arthas generates the enhanced bytecode before analysis. Since Arthas in this example runs as an Agent, look directly at the source code to learn about ArthasAgent’s additional processes:

// arthas-agent-attach/src/main/java/com/taobao/arthas/agent/attach/ArthasAgent.java
public void init(a) throws IllegalStateException {
 	 // ...
   / / by reflection calls ArthasBootstrap bootstrap. = ArthasBootstrap getInstance (inst);Class<? > bootstrapClass = arthasClassLoader.loadClass(ARTHAS_BOOTSTRAP); Object bootstrap = bootstrapClass.getMethod(GET_INSTANCE, Instrumentation.class, Map.class).invoke(null,instrumentation, configMap);
		boolean isBind = (Boolean) bootstrapClass.getMethod(IS_BIND).invoke(bootstrap);
		if(! isBind) { String errorMsg ="Arthas server port binding failed! Please check $HOME/logs/arthas/arthas.log for more details.";
				throw new RuntimeException(errorMsg);
		}
  // ...
}
Copy the code

The ArthasBootstrap constructor is eventually called:

    private ArthasBootstrap(Instrumentation instrumentation, Map<String, String> args) throws Throwable {
        // ...
        shutdown = new Thread("as-shutdown-hooker") {
            @Override
            public void run(a) {
                ArthasBootstrap.this.destroy(); }};// Here we use the instrumentation previously passed in to construct the bytecode class transformerManager.
        transformerManager = new TransformerManager(instrumentation);
        Runtime.getRuntime().addShutdownHook(shutdown);
    }
Copy the code

Following TransformManager, you can see the code to register the bytecode enhancement callback function of the class:

    public TransformerManager(Instrumentation instrumentation) {
        this.instrumentation = instrumentation;
        classFileTransformer = new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className, Class<? > classBeingRedefined, ProtectionDomain protectionDomain,byte[] classfileBuffer) throws IllegalClassFormatException {
				// ...
              	// TraceTransformer
                for (ClassFileTransformer classFileTransformer : traceTransformers) {
                    byte[] transformResult = classFileTransformer.transform(loader, className, classBeingRedefined,protectionDomain, classfileBuffer);
                    if(transformResult ! =null) { classfileBuffer = transformResult; }}returnclassfileBuffer; }}; instrumentation.addTransformer(classFileTransformer,true);
    }
Copy the code

Coincidentally, there’s a traceTransformers. Breakpoint debugging of the Arthas source shows that the trace operation does go to this callback method. Modify Arthas’s code here to determine that if the class to be transformed is the target class that causes OOM, save both the classfileBuffer and the transformResult that the transform completes. In this way, the bytecode before and after enhancement was successfully obtained.

Analyze the bytecode structure before and after enhancement

The newly generated.class file is much larger than the old.class file. Drag two. Class files into IDEA for decompilation to see the corresponding Java code. Because the body of the method being traced is itself very large, with a lot of DTO conversion operations and a lot of get set method calls, Arthas inserted a lot of timing code before and after the method calls while generating the enhanced bytecode

However, if you look closely, you can see that even though it looks like there are a lot of strings in the code, many of the strings are actually the same, just repeated during decompression, which can be inferred from the.class file size: Although there are many more strings in the new class, the number of different strings must be very small, otherwise the.class file would have to consume a lot of space to store these different strings, and the file size would have ballooned greatly. The new.class file is only about 1 Megabyte, a far cry from the 500 Megabyte increase in Metaspace OOM, so it’s not too many constants that cause Metaspace to explode.

Since you can’t get the answer to the problem from the results of the decompilation, you try itjavap -verboseOutput the bytecode-like content before and after enhancement.

Comparing the output from the javap tools before and after reveals two interesting details:

The content structure of the class constant pool area has changed completely. Before the enhancement, the class constant pool started with only method references, and the constant index of the string type was usually around 400 or 1200. The new class constant pool starts with indexes for classes and string constants, with method references and class references interspersed between string constants.
StackMapTable generates a large number of Entries, many of which are full frame.

Common meanings of frame_type:

> > - frame_type = SAME ; / *0-63* / Same as the local variable table at the previous comparison position, and the operand stack is empty, this value is also implied offset_delta > -frame_type = SAME_LOCALS_1_STACK_ITEM; / *64-127* / The current frame has the same local variables as the previous frame, the number of variables in the operand stack is 1, Implicit offset_delta is frame_type -- 64 > -frame_type = SAME_LOCALS_1_STACK_ITEM_EXTENDED; / *247* / > -frame_type = CHOP / * 248-250 * / > -frame_type = SAME_FRAME_EXTENDED / *251* / Local variable information is the same as the last frame, And operand stack is empty > -frame_type = APPEND; / * frame_type = FULL_FRAME; / * FULL_FRAME = FULL_FRAME; / * FULL_FRAME = FULL_FRAME; / *255* / local variable table and operand stack to complete the recordCopy the code

Given that StackMapTable is basically used to validate bytecode validity during bytecode validation, consider turning off the JVM’s bytecode validation first to see if removing StackMapTable can alleviate the symptoms of Metaspace space increase.

As you can see, turning off bytecode verification does alleviate the problem of Metaspace space increase, but turning off bytecode verification on the JVM is not necessarily a safe operation, which makes the application more vulnerable to illegal bytecode: It not only increases the risk of malicious bytecode attacks, but also introduces many dynamic bytecode generation tools to implement AOP in applications. The lack of bytecode verification capability also increases the risk of application stability resulting from illegitimate bytecode due to possible problems with bytecode generation tools. Therefore, simply turning off bytecode verification without understanding the root cause of the problem can do more harm than good. It is necessary to further analyze the causes of Metaspace OOM problems.

Problem orientation

So far, we’ve seen abnormal ConstantPool layouts and huge StackMaptables at the bytecode level, but we don’t have much information to find the problem. So you have to start at the JVM level.

I found that the problems of Metaspace OOM also exist in common JDK (OpenJDK 8 and 14 on macOS, OpenJDK 12 on Ubuntu 18, problems exist in both), so I downloaded a copy of OpenJDK 14 source code. Slowdebug mode is enabled to compile a debuggable JDK. We know that in the process of class loading application Metaspace space will eventually call to share/memory/Metaspace/spaceManager cpp# spaceManager: : get_new_chunk method:

Metachunk* SpaceManager::get_new_chunk(size_t chunk_word_size) {
      // Get a chunk from the chunk freelist
      Metachunk* next = chunk_manager() - >chunk_freelist_allocate(chunk_word_size);
      if (next == NULL) {
        next = vs_list() - >get_new_chunk(chunk_word_size,
                                        medium_chunk_bunch());
      }
      Log(gc, metaspace, alloc) log;
      if (log.is_trace() && next ! =NULL &&
          SpaceManager::is_humongous(next->word_size())) {
        log.trace(" new humongous chunk word size " PTR_FORMAT, next->word_size());
      }
      return next;
}
Copy the code

So you can set the conditional breakpoint chunk_word_size > 8192 under the method header and expect to see the “culprit” consuming Metaspace in the call stack.

A new normal ClassLoader will initially receive chunks of 4KB until a maximum number of applications is reached (currently 4), and then Allocator will “lose patience” and allocate chunks of 64 KB each time to the ClassLoader. Since it is word_size, on my x64 Mac, a Word size is 64, 64 Kbytes = 65536 bytes = 8192 * 64/8, so 8192 is appropriate.

Soon, we found a call stack that applied for a lot of Metaspace:

Following the call stack step by step, there are two methods of note for annotation:

    // We have entries mapped between the new and merged constant pools
    // so we have to rewrite some constant pool references.
	// There are entries that need to be mapped between the new and merged Constant pools, so we have to rewrite some references to Constant pools.
    if (!rewrite_cp_refs(scratch_class, THREAD)) {
      return JVMTI_ERROR_INTERNAL;
    }
Copy the code

// Rewrite constant pool references in the specific method. This code
// was adapted from Rewriter::rewrite_method().
void VM_RedefineClasses::rewrite_cp_refs_in_method(methodHandle method,methodHandle *new_method_p, TRAPS) {
  			// ...
            // the new value needs ldc_w instead of ldc
            u_char inst_buffer[4]; // max instruction size is 4 bytes
            bcp = (address)inst_buffer;
            // construct new instruction sequence
            *bcp = Bytecodes::_ldc_w;
            bcp++;
            Bytes::put_Java_u2(bcp, new_index);
            Relocator rc(method, NULL /* no RelocatorListener needed */);
            methodHandle m;
            {
              PauseNoSafepointVerifier pnsv(&nsv);
              // ldc is 2 bytes and ldc_w is 3 bytes
			  // Enter space allocation
              m = rc.insert_space_at(bci, 3, inst_buffer, CHECK);
            }
            // return the new method so that the caller can update
            // the containing class
            *new_method_p = method = m;
            // switch our bytecode processing loop from the old method
            // to the new method
            // ...
          } // end we need ldc_w instead of ldc
        } // end if there is a mapped index
      } break;
          // ...
Copy the code

This method overwrites the bytecode reference of the specified method in the constant pool. As you can see from the debugging information, the current bytecode instruction to be overridden is LDC. In the old constant pool, LDC’s constant pool reference index is 2. The condition new_index <= max_jubyte (255) is not met and the LDC instruction needs to be extended to LDC_w, so a new bytecode instruction is inserted

While inserting bytecode instructions, the JDK copies the current method’s StackMapTable,

The StackMapTable for this method is large, reaching over 900 KB, so every extension of LDC instructions to LDC_W requires approximately 1MB of space from Metaspace. The Constant Pool layout of the new class is completely different from the Constant Pool layout of the old class, which means that many of the LDC directives need to be extended. Considering that max_jubyte is 255,1054 /2 is about 500 LDC instructions that need to be extended. This led to the scenario at the beginning of this article: Metaspace ballooned by about 500MB.

That leaves one final question: why doesn’t Metaspace proliferate when you turn off the JVM’s bytecode verification? When JVM bytecode verification is turned off, ClassFileParser does not parse the StackMapTable portion of the.class file, and thus does not use if(m->has_stackmap_table()) statements, avoiding StackMapTable duplication. This is also supported by the JVM source code:

// src/hotspot/share/classfile/classFileParser.cpp # parse_stackmap_table
static const u1* parse_stackmap_table(const ClassFileStream* const cfs,
                                      u4 code_attribute_length,
                                      bool need_verify,
                                      TRAPS) {
  // ...
  // check code_attribute_length first
  cfs->skip_u1(code_attribute_length, CHECK_NULL);

  // Focus on this line
  if(! need_verify && ! DumpSharedSpaces) {return NULL;
  }
  return stackmap_table_start;
}
Copy the code

If verify is not required and DumpSharedSpaces is not required, then parse_STACKMAP_table returns NULL.

Looking at the call stack, which is triggered by the VM_RedefineClasses::load_new_class_versions method,

jvmtiError VM_RedefineClasses::load_new_class_versions(TRAPS) {
	// ...
  for (int i = 0; i < _class_count; i++) {
    // Create HandleMark so that any handles created while loading new class
    // versions are deleted. Constant pools are deallocated while merging
    // constant pools
    HandleMark hm(THREAD);
    InstanceKlass* the_class = get_ik(_class_defs[i].klass);
    Symbol*  the_class_sym = the_class->name(a);log_debug(redefine, class, load)
      ("loading name=%s kind=%d (avail_mem=" UINT64_FORMAT "K)",
       the_class->external_name(), _class_load_kind, os::available_memory() > >10);

    // Construct the ClassFileStream object ↓
    ClassFileStream st((u1*)_class_defs[i].class_bytes,
                       _class_defs[i].class_byte_count,
                       "__VM_RedefineClasses__",
                       ClassFileStream::verify); 
    // ...
Copy the code

The ClassFileStream method begins by constructing a ClassFileStream object whose verify_stream property is set to ClassFileStream::verify, which is true by default.

In the ClassFileParser constructor there is code to set _need_verify:

    // Figure out whether we can skip format checking (matching classic VM behavior)
  if (DumpSharedSpaces) { // No startup argument, false
    // verify == true means it's a 'remote' class (i.e., non-boot class)
    // Verification decision is based on BytecodeVerificationRemote flag
    // for those classes.
    _need_verify = (stream->need_verify())? BytecodeVerificationRemote : BytecodeVerificationLocal; }else {
    // Go to the branch
    _need_verify = Verifier::should_verify_for(_loader_data->class_loader(),
                                               stream->need_verify());
  }
Copy the code

bool Verifier::should_verify_for(oop class_loader, bool should_verify_class) {
  return (class_loader == NULL| |! should_verify_class) ? BytecodeVerificationLocal : BytecodeVerificationRemote; }Copy the code

But class_loader! = null, should_verify_class is true, then reached the BytecodeVerificationRemote values, and the value just is decided by — noverify boot parameters. As long as the closed JVM bytecode verifier in the launch parameters, then BytecodeVerificationRemote to false, the final method will not carry StackMapTable information, avoids StackMapTable replication and lead to take up a lot of Metaspace space.

At this point, we’ve finally figured out what causes the Metaspace OOM: When tracing huge methods, the Layout of Arthas’s Constant Pool generated a new class changed, resulting in the rewrite LDC directive. After the new index directive exceeded max_jubyte, the LDC directive needed to be extended to ldc_w. The instruction expansion process required the insertion of new bytecode operators, which in turn required the replication of StackMapTable. The huge StackMapTable and a large number of LDC instructions needed to be extended, resulting in the explosion of Metaspace space and causing problems.

Problem solving

Now that you know that Metaspace OOM is caused by StackMapTable replication, and StackMapTable replication occurs when the old and new Constant Pool indexes need to be mapped, Is there a way to keep the Constant Pool layout as consistent as possible and avoid such remapping? After reading Arthas’s source code and the interface methods of byteBuddy, the bytecode enhancement library it uses, the answer is yes. I tried to modify Arthas’s code to keep the Constant Pool Layout of the new and old classes as consistent as possible.

/ / com/alibaba/repackage – asm / 0.0.7 / com/alibaba/deps/org/objectweb/asm/ClassWriter class

ClassReader: The ClassReader instance is used to read the original class file. It will be used to copy the entire constant pool, Bootstrap Method, and other copy-able bits of bytecode from the original class.

Modify com. Taobao. Arthas. Core. Advisor. Enhancer class two, one for instance ClassReader references:

// src/main/java/com/taobao/arthas/core/advisor/Enhancer.java
// ...
if(matchingClasses ! =null && !matchingClasses.contains(classBeingRedefined)) {
      return null;
}
ClassNode classNode = new ClassNode(Opcodes.ASM8);
// Add a new method in AsmUtils that returns a ClassReader for a ClassNode.
// The Constant Pool of the original class is stored in the ClassReader
// Holds the ClassReader object, which is useful when the bytecode is generated at the end
ClassReader classReader = AsmUtils.toClassReader(classfileBuffer, classNode);
// remove JSR https://github.com/alibaba/arthas/issues/1304
classNode = AsmUtils.removeJSRInstructions(classNode);
// Generate enhanced bytecode
DefaultInterceptorClassParser defaultInterceptorClassParser = new DefaultInterceptorClassParser();

// ...
Copy the code

A previously obtained ClassReader instance is passed into the bytecode generation method for copying the constant pool

// src/main/java/com/taobao/arthas/core/advisor/Enhancer.java
// ...
// https://github.com/alibaba/arthas/issues/1223
if (classNode.version < Opcodes.V1_5) {
     classNode.version = Opcodes.V1_5;
}
byte[] enhanceClassByteArray = AsmUtils.toBytes(classNode, inClassLoader, classReader);
// The enhancement succeeded, recording the class
classBytesCache.put(classBeingRedefined, new Object());
// dump the class
dumpClassIfNecessary(className, enhanceClassByteArray, affect);
// Success count
affect.cCnt(1);
// ...
Copy the code

To modify the class com. Taobao. Arthas. Bytekit. Utils. AsmUtils, accept new ClassReader parameter overloading method, is used to produce the information such as new bytecode copy the constant pool

// src/main/java/com/taobao/arthas/bytekit/utils/AsmUtils.java
// ...
// Add the following method
public static byte[] toBytes(ClassNode classNode, ClassLoader classLoader, ClassReader classReader) {
	int flags = ClassWriter.COMPUTE_FRAMES | ClassWriter.COMPUTE_MAXS;
	ClassWriter writer = new ClassLoaderAwareClassWriter(classReader, flags, classLoader);
    classNode.accept(writer);
    return writer.toByteArray();
}
Copy the code

Compile and package the modified Arthas, trace the target class again, see that Metaspace does not increase again, and AliJDK and OpenJDK 8 tests fine.

By comparing the data output from javap-verbose, it can be seen that the Constant Pool on both sides does not change the index of the common Constant item in the old and new classes.

The counter identifier constants introduced by Arthas for trace statistics are pretty much at the bottom of the constant pool in the new classes, not competing with the constants in the old classes.

At this point, Arthas’ Metaspace OOM problems really came to an end.

thinking

From the discovery of the Metaspace OOM problem mentioned in the article to the real solution, it took the author nearly 2 weeks intermittently. In the daily development work, we usually encounter most of the heap OOM failure, in this case, as long as the heap Dump down the “crime scene” is clear. The Metaspace OOM problem described in this article became an “out-of-heap memory leak” problem after JDK 8; Moreover, there is even a lack of diagnostic commands like vm. metaspace in the JDK 8 environment, which adds up to the fact that out-of-heap memory leaks are more difficult to locate and analyze than in-heap memory leaks.

After the analysis of the whole article, it can be found that the most powerful way to solve this problem is “the number of loaded classes “,” non-class Space Size”, “class Space Size” and other important out-of-heap memory monitoring indicators. At present, these more detailed off-heap memory usage metrics are not clearly reflected in production environment monitoring systems. Private think that, in the later work, can give full play to the advantages of group has since research AliJDK, complement the monitoring indicators, to its high version to have diagnostic commands are integrated into the early AliJDK, convenient to develop students’ running state to the JVM has a more comprehensive grasp, reduce degree of difficulty of screening outside such heap memory leak, Further ensure the safety and stability of the production environment.

Afterword.

The author prepared a Demo that can reproduce the above situation: github.com/LinZong/Hug… Interested readers are welcome to try it.