Reading notes. If you repost them, please credit the author: Yuloran (t.cn/EGU6c76)

Preface

These notes draw on The Java Virtual Machine Specification, Java SE 11 Edition; Java Garbage Collection Basics; and excerpts from Understanding the Java Virtual Machine: JVM Advanced Features and Best Practices, 2nd edition, by Zhou Zhiming.

First, let’s be clear: the Java Virtual Machine Specification is independent of any specific programming language and of any specific virtual machine implementation. Java is only the most familiar JVM language. Other popular JVM languages include:

  • Clojure, a modern, dynamic, and functional dialect of the Lisp programming language
  • Groovy, a dynamic programming and scripting language
  • JRuby, an implementation of Ruby
  • Jython, an implementation of Python
  • Kotlin, a statically-typed language from JetBrains, the developers of IntelliJ IDEA
  • Scala, a statically-typed object-oriented and functional programming language[1]

The cornerstone of this language independence is a platform-independent program storage format: bytecode, stored in binary *.class files.
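As a small illustration of the class file format, the sketch below reads the first eight bytes of the class file for java.lang.Object and prints its magic number and version fields. The class name ClassFileHeader and the method readHeader are made up for this example, and the code assumes that the runtime exposes Object.class as a readable resource, which may not hold on every JVM.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; the name is not from the original notes.
final class ClassFileHeader {

    // Returns {magic, minor_version, major_version} of java/lang/Object.class.
    static int[] readHeader() {
        try (InputStream in = Object.class.getResourceAsStream("Object.class")) {
            if (in == null) {
                // Assumption violated: this runtime does not expose the class file.
                throw new IllegalStateException("Object.class is not readable as a resource");
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();            // u4 magic
            int minor = data.readUnsignedShort();  // u2 minor_version
            int major = data.readUnsignedShort();  // u2 major_version
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader();
        System.out.printf("magic=0x%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

Every valid class file starts with the magic number 0xCAFEBABE, whichever JVM language produced it.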

In addition, the Java Virtual Machine Specification does not specify how GC should be implemented, so any description of GC algorithms or garbage collectors should name the specific virtual machine it applies to. The original text:

This document specifies an abstract machine. It does not describe any particular implementation of the Java Virtual Machine. To implement the Java Virtual Machine correctly, you need only be able to read the class file format and correctly perform the operations specified therein. Implementation details that are not part of the Java Virtual Machine’s specification would unnecessarily constrain the creativity of implementors. For example, the memory layout of run-time data areas, the garbage-collection algorithm used, and any internal optimization of the Java Virtual Machine instructions (for example, translating them into machine code) are left to the discretion of the implementor.

In other words:

The specification describes an abstract machine, not any particular Java virtual machine implementation. To implement the virtual machine correctly, you only need to be able to read the class file format and correctly perform the operations it specifies. Implementation details are not part of the Java Virtual Machine Specification, because they would limit the creativity of implementers. For example, the memory layout of the run-time data areas, the GC algorithm, and any internal optimization of Java virtual machine instructions (for example, translating them into machine code) are all left to the discretion of the implementer.

The acquisition of Sun gave Oracle two main Java Virtual Machine (JVM) implementations: the Java HotSpot VM and the Oracle JRockit JVM. Unless otherwise noted, this article uses the HotSpot JVM as the example implementation of the Java Virtual Machine Specification.

HotSpot JVM Architecture

From the Java SE HotSpot Overview, Java Garbage Collection Basics

The HotSpot virtual machine is a core component of the Java SE platform. It is an implementation of the Java Virtual Machine Specification and is provided as a shared library in the JRE. As the Java bytecode execution engine, it provides Java runtime facilities such as thread and object synchronization on a variety of operating systems and architectures. It includes adaptive dynamic compilers that compile Java bytecode into optimized machine instructions, and it manages the Java heap efficiently using garbage collectors optimized for both low pause times and high throughput.

Depending on the platform configuration, the HotSpot virtual machine can choose a suitable compiler, Java heap configuration, and garbage collector to ensure good performance for most applications. The following is the HotSpot virtual machine architecture:

Its main components include: Class Loader, Runtime Data Areas, and Execution Engine.

Note: in the figure, Java Threads corresponds to the Java Virtual Machine Stacks and Native Internal Threads corresponds to the Native Method Stacks in the Runtime Data Areas.

Runtime Data Areas

For ease of understanding, I have redrawn the Runtime Data Areas from the figure above according to the JVM specification:

The figure above illustrates the JVM specification and is independent of any specific virtual machine. A concrete implementation may differ. For example, separate Native Method Stacks need not exist: the HotSpot JVM combines the JVM stacks with the native method stacks.

The PC Register

The Java virtual machine supports concurrent execution of multiple threads, and each thread has its own program counter (pc) register. At any given moment, a JVM thread executes the code of only one method, called the current method of that thread. If the current method is a Java method, the program counter holds the address of the bytecode instruction currently being executed. If the current method is native, the value of the program counter is undefined. The program counter occupies a small memory space and can be thought of as a line number indicator for the bytecode being executed by the current thread. It is the only area for which the specification defines no OutOfMemoryError condition.

Java Virtual Machine Stacks

Each JVM thread has a private Java virtual machine stack, which is created when the thread is created and destroyed when the thread exits. The Java virtual machine stack describes the memory model of Java method execution and is also called the Java method stack. Each time a Java method is invoked, a stack frame is created. A stack frame is the data structure that supports method invocation and method execution in the virtual machine; it stores the method’s local variable table, operand stack, dynamic linking information, method return address, and some additional information. The lifetime of a method, from invocation to completion, corresponds to a stack frame being pushed onto the virtual machine stack and later popped off it.

When the program is compiled, the size of the local variable table and the maximum depth of the operand stack needed by a stack frame are fully determined and written into the Code attribute of the method table. How much memory a stack frame needs is therefore not affected by the program’s run-time data; it depends only on the specific virtual machine implementation.

Local variable table

The local variable table stores the method parameters and the local variables defined in the method, with the variable slot as its smallest unit. A slot can hold a value of type boolean, byte, char, short, int, float, reference, or returnAddress. The first six can be understood through the Java language types of the same name, although they are not the same in nature. A long or double value occupies two consecutive slots.

  • The reference type: the virtual machine specification specifies neither its length nor its structure. A virtual machine implementation should nevertheless be able to do at least two things with a reference:
    • Directly or indirectly find the starting address of the object’s data in the Java heap.
    • Directly or indirectly find the type information for the object’s data type stored in the method area.
  • The returnAddress type: an address that points to a bytecode instruction. It is rarely used now, but older virtual machines used it to implement exception handling.

When a Java program is compiled into a class file, the max_locals item of the method’s Code attribute records the maximum size of the local variable table that the method needs.

The operand stack

Also known as the operation stack. Its maximum depth is written at compile time to the max_stack item of the Code attribute. Stack elements can be of any Java data type. The stack is empty when a method starts executing. During execution, bytecode instructions push values onto and pop values off the operand stack: for example, arithmetic is performed on the operand stack, and arguments are passed through it when other methods are called.
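To make the interplay of the local variable table and the operand stack concrete, here is a minimal sketch that replays by hand the four instructions javac emits for a static two-argument add method (iload_0, iload_1, iadd, ireturn). The class OperandStackDemo and the method simulateAdd are invented for this illustration; a real JVM does not use an ArrayDeque for its operand stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical demo class; the names are not from the original notes.
final class OperandStackDemo {

    // javac compiles this body to: iload_0, iload_1, iadd, ireturn
    // (max_stack = 2, max_locals = 2).
    static int add(int a, int b) {
        return a + b;
    }

    // Replays those four instructions on a toy frame.
    static int simulateAdd(int a, int b) {
        int[] locals = {a, b};                      // local variable table: slots 0 and 1
        Deque<Integer> stack = new ArrayDeque<>();  // operand stack, empty on method entry

        stack.push(locals[0]);                      // iload_0: push slot 0
        stack.push(locals[1]);                      // iload_1: push slot 1
        int right = stack.pop();                    // iadd: pop two ints ...
        int left = stack.pop();
        stack.push(left + right);                   // ... and push their sum
        return stack.pop();                         // ireturn: pop the result for the caller
    }

    public static void main(String[] args) {
        System.out.println(simulateAdd(2, 3)); // prints 5
    }
}
```

The real instruction sequence can be inspected with javap -c on the compiled class.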

Dynamic linking

Method invocation instructions in bytecode take symbolic references to methods in the constant pool as arguments. Some of these symbolic references are converted to direct references during class loading or on first use; this is called static resolution. The rest are converted to direct references on each run, which is called dynamic linking. Each stack frame contains a reference to the method it belongs to in the runtime constant pool, in order to support dynamic linking.

Method return address

After a method exits, execution must return to the location from which the method was called. When a method exits normally, the caller’s PC value can serve as the return address, and it is likely to be saved in the stack frame. When a method exits abnormally, the return address is determined by the exception handler table, and this information is generally not saved in the stack frame.

Additional information

Information not described in the virtual machine specification, such as debugging information.

Possible exceptions:

  • StackOverflowError: thrown when the stack depth requested by a thread is greater than the depth the virtual machine allows
  • OutOfMemoryError: thrown if the virtual machine stack supports dynamic expansion but not enough memory can be obtained for the expansion, or if there is not enough memory to create the initial virtual machine stack for a new thread
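The first of these errors is easy to provoke. The following sketch recurses without a termination condition and counts how many frames were pushed before the virtual machine gave up. StackDepthDemo and measureDepth are names chosen for this example only; the depth reached is not fixed and depends on the frame size, the JVM, and the thread stack size (on HotSpot, the -Xss option).

```java
// Hypothetical demo class; the names are not from the original notes.
final class StackDepthDemo {

    private static int depth;

    // Each call pushes one more stack frame and never returns normally.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns the number of frames pushed before StackOverflowError was thrown.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError expected) {
            // The stack has been unwound back to this frame; the counter is still valid.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before StackOverflowError: " + measureDepth());
    }
}
```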

Native Method Stacks

The native method stack is very similar to the Java virtual machine stack, except that it describes the memory model for executing native methods. Although the virtual machine specification does not mandate the language, usage, or data structure of the methods in the native method stack, in practice the language is generally C. It is worth noting that, from the point of view of a process, the memory model or memory layout is the same no matter which programming language the program was written in. The following figure shows the memory layout of a C program on Linux:

The Java native method stack is functionally similar to the C stack of a C program. Like the Java virtual machine stack, the native method stack can throw StackOverflowError and OutOfMemoryError.

Method Area

The Java method area is shared by all threads and is functionally similar to the text segment of a C program. It stores data such as the run-time constant pool, information about loaded classes, static variables, field and method information, and code compiled by the JIT compiler.

The method area is created when the virtual machine starts. Although the method area is logically part of the Java heap (the main focus of GC), an implementation may choose not to garbage collect or compact it. The method area may be of fixed size, may be expanded dynamically, and may be contracted when a large method area is no longer needed. Its memory does not need to be contiguous.

In JDK 7 and earlier, the HotSpot virtual machine extended generational GC to the method area; that is, it implemented the method area as the Permanent Generation. This causes some problems: (1) garbage collection of the method area is rarely effective, (2) the memory of the permanent generation is limited, so OutOfMemoryError occurs easily, and (3) a small number of methods behave differently on different virtual machines for this reason, such as String.intern():

public class RuntimeConstantPoolOOM {
    public static void main(String[] args) {
        // "ComputerSoftware" is first seen here, built at run time on the heap
        String str1 = new StringBuilder("Computer").append("Software").toString();
        System.out.println(str1.intern() == str1);

        // "java" is already in the string constant pool before this line runs
        String str2 = new StringBuilder("ja").append("va").toString();
        System.out.println(str2.intern() == str2);
    }
}

This code prints false twice when run on JDK 1.6, and prints true and then false when run on JDK 1.7. The reason: in JDK 1.6, intern() copies the first-encountered string instance into the permanent generation and returns a reference to that copy, whereas StringBuilder creates its string instance on the Java heap, so the two can never be the same reference. In JDK 1.7, which moved the string constant pool out of the permanent generation, intern() no longer copies the instance; it only records a reference to the first-encountered instance in the constant pool, so intern() returns the same reference as the string instance created by StringBuilder. The string “java”, however, had already been added to the constant pool by a JDK Version class when the virtual machine started, so str2 is not the first-encountered instance and the references differ.

OutOfMemoryError is thrown when the method area cannot meet memory allocation requirements.

Run-Time Constant Pool

The runtime constant pool is part of the method area. The constant pool table of a class file stores the various literals (integer literals, string literals, and so on) and symbolic references generated at compile time. It is similar to the symbol table of a C program, but it contains a much wider range of data. After the class is loaded, this content is stored in the runtime constant pool of the method area.

The Java virtual machine has strict rules for every part of a class file: what kind of data each byte stores must conform to the specification before the file is accepted, loaded, and executed by the virtual machine. The runtime constant pool, by contrast, has no such detailed requirements, so it is implemented to be dynamic: constants can also be added at run time, as with the String.intern() method.

An OutOfMemoryError is raised when the runtime constant pool can no longer allocate memory.
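The dynamic nature of the pool can be observed with a string that cannot have come from any class file. In the sketch below, a value is built at run time and interned; interning a second, distinct String object with the same characters returns the very same canonical instance. InternDemo and sameCanonicalInstance are names made up for this example.

```java
// Hypothetical demo class; the names are not from the original notes.
final class InternDemo {

    static boolean sameCanonicalInstance() {
        // Built at run time, so this value is not a compile-time constant of any class.
        String runtimeValue = "id-" + System.nanoTime();

        String first = runtimeValue.intern();
        // A different String object with equal contents ...
        String second = new String(runtimeValue.toCharArray()).intern();
        // ... resolves to the same pooled instance.
        return first == second;
    }

    public static void main(String[] args) {
        System.out.println(sameCanonicalInstance()); // prints true
    }
}
```

This relies only on the documented contract of String.intern(), so it should behave the same way regardless of where a given JDK version keeps the string pool.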

Heap

The Java heap is an area shared by all JVM threads. All class instances and arrays are allocated on the heap.

The Java heap is created when the virtual machine starts. Objects on the heap are automatically managed by the Garbage Collector.

From a memory collection point of view, the Java heap can be divided into the Young Generation and the Old Generation, since most garbage collectors now use generational collection algorithms. The young generation is further divided into Eden, From Survivor, and To Survivor spaces, in a ratio of 8:1:1 by default.

From a memory allocation point of view, the Java heap shared by all threads may also be carved into multiple thread-private Thread Local Allocation Buffers (TLABs).

Java heap memory is not required to be physically contiguous, only logically contiguous. An OutOfMemoryError is thrown if the heap has no memory left to complete an instance allocation and cannot be expanded further.
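The heap limits and the memory pools of a running JVM can be inspected with standard APIs. The sketch below prints the heap sizes reported by Runtime and lists the memory pools exposed through java.lang.management. The class name HeapInfo is invented for this example, and the pool names printed depend on the JVM and on the garbage collector in use, so no particular output is assumed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Hypothetical demo class; the name is not from the original notes.
final class HeapInfo {

    // Upper bound the JVM will try to use for the heap, in bytes.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap  : " + maxHeapBytes() + " bytes");
        System.out.println("committed : " + rt.totalMemory() + " bytes");
        System.out.println("free      : " + rt.freeMemory() + " bytes");

        // One line per memory pool, for example the Eden, Survivor and old generation pools.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println(pool.getType() + "  " + pool.getName());
        }
    }
}
```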

Automatic Garbage Collection

Unlike C or C++, where memory is allocated and freed manually by the developer, memory in the JVM is automatically managed by the garbage collector with the following basic steps:

Step1: Mark

Mark which objects are used and which are no longer used:

Step2: Sweep

Delete the objects that are no longer referenced, keep the live objects, and maintain a list of references to the free memory blocks:

Step2a: Sweep with compaction

To improve performance, some collectors use a “mark-sweep-compact” algorithm to reclaim memory. The remaining live objects are moved together to one side, which makes it easier to find contiguous free memory for later allocations:
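The mark and sweep steps can be sketched on a toy object graph. This is only a model of the idea, not how HotSpot is implemented: the "heap" is a plain list, Obj, mark, sweep and demo are hypothetical names, and compaction is not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model; none of these names come from the original notes.
final class MarkSweepDemo {

    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;

        Obj(String name) {
            this.name = name;
        }
    }

    // Step 1: mark every object reachable from the roots.
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (current.marked) {
                continue; // already visited, also guards against cycles
            }
            current.marked = true;
            pending.addAll(current.refs);
        }
    }

    // Step 2: remove unmarked objects and clear the marks for the next cycle.
    // Returns how many objects were reclaimed.
    static int sweep(List<Obj> heap) {
        int before = heap.size();
        heap.removeIf(o -> !o.marked);
        heap.forEach(o -> o.marked = false);
        return before - heap.size();
    }

    static int demo() {
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        Obj c = new Obj("c");
        Obj d = new Obj("d");
        a.refs.add(b); // a -> b, reachable from the root
        c.refs.add(d); // c -> d -> c, a cycle nothing else points to
        d.refs.add(c);

        List<Obj> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        mark(Collections.singletonList(a));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println("reclaimed: " + demo()); // prints reclaimed: 2
    }
}
```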

Generational Garbage Collection

Collecting objects by mark-and-sweep alone is inefficient: as the number of objects grows, each garbage collection takes longer and longer. This is a serious problem, because the Java program must be paused while a GC runs; otherwise object references could not be traced reliably. Studies have shown that most objects are short-lived, so most JVMs today use generational collection algorithms to improve performance:

Young Generation

This is where all new objects are allocated and aged. A minor garbage collection (Minor GC) is triggered when the young generation fills up. Objects that survive a collection grow older and are eventually promoted to the old generation. The young generation uses the “mark-copy” method for GC.

Stop the World Event: every minor garbage collection is a “Stop the World” event, which means that all application threads are paused until the GC completes.

The Old Generation

The old generation stores long-lived objects. Typically, an age threshold is set for young generation objects, and when an object’s age exceeds this threshold, it is moved to the old generation. Eventually, the old generation also needs to be collected; this is called a major garbage collection (Major GC), and it is also a “Stop the World” event. A Major GC is usually slow because it involves all live objects. For this reason, the HotSpot virtual machine provides multiple garbage collectors to reduce GC time. The old generation uses the “mark-compact” method for GC.

Permanent Generation

For the HotSpot virtual machine, this is the method area. The area is dynamic: constants can be added at run time, constants that are no longer used can be discarded, and classes that are no longer used can be unloaded. The conditions for unloading a class, however, are very strict:

  1. All instances of the class have been reclaimed;
  2. The ClassLoader that loaded the class has been reclaimed;
  3. The java.lang.Class object for the class is not referenced anywhere, so the methods of the class cannot be accessed anywhere through reflection.

Steps for generational collection

Step1. Newly allocated objects go into Eden space, and both Survivor spaces start out empty:

Step2. A Minor GC is triggered when Eden space is full:

Step3. Surviving objects are moved to a Survivor space with an age of 1, objects that are no longer referenced are deleted, and Eden space is cleared:

Step4. The same happens at the next Minor GC. This time, however, the surviving objects are moved to the other Survivor space, and the age of the objects that survived the previous Minor GC is incremented by 1. The original Survivor space and Eden space are then cleared:

Step5. At the following Minor GC, the same operations are repeated: the Survivor spaces switch roles, ages are incremented, and so on:

Step6. After several Minor GCs, some surviving objects exceed the age threshold of the young generation (assumed to be 8 here) and are promoted to the old generation:

Step7. As Minor GCs continue to occur, surviving young generation objects keep being promoted to the old generation:

Step8. The steps above cover almost the entire life cycle of the young generation. Eventually, a Major GC is triggered to collect the old generation as well:
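The aging and promotion steps can be replayed with a toy model. In this sketch the spaces are plain lists, liveness is a boolean flag set by hand, and an object is promoted once its age would exceed the threshold of 8 assumed above. GenerationalDemo and all of its members are hypothetical names, and real collectors decide promotion with more criteria than age alone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; none of these names come from the original notes.
final class GenerationalDemo {

    static final int AGE_THRESHOLD = 8; // the value assumed in Step6

    static final class Obj {
        int age;
        boolean live = true; // stands in for "still reachable"
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();
    List<Obj> to = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    Obj allocate() {
        Obj o = new Obj();
        eden.add(o); // Step1: new objects start in Eden
        return o;
    }

    void minorGc() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!o.live) {
                continue; // dead objects are simply not copied
            }
            o.age++;
            if (o.age > AGE_THRESHOLD) {
                old.add(o); // Step6: promotion to the old generation
            } else {
                to.add(o);  // Step3 and Step4: copy to the empty Survivor space
            }
        }
        eden.clear();
        from.clear();
        List<Obj> swap = from; // Step5: the Survivor spaces switch roles
        from = to;
        to = swap;
    }

    // Returns {eden size, from size, old size, age of the long-lived object}.
    static int[] demo() {
        GenerationalDemo heap = new GenerationalDemo();
        Obj longLived = heap.allocate();
        for (int gc = 1; gc <= 9; gc++) {
            heap.allocate().live = false; // short-lived garbage created between collections
            heap.minorGc();
        }
        return new int[] {heap.eden.size(), heap.from.size(), heap.old.size(), longLived.age};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("eden=" + r[0] + " survivor=" + r[1] + " old=" + r[2] + " age=" + r[3]);
        // prints eden=0 survivor=0 old=1 age=9
    }
}
```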

Object analysis

Object access location

Java programs use references on the stack to manipulate specific objects on the heap. The JVM specification only states that the reference type is a reference to an object; it does not define how the reference should locate and access the object in the heap. There are currently two mainstream access methods, handles and direct pointers:

  • Handle: the reference stores the address of a handle, and the handle holds the addresses of the object’s instance data and type data:

  • Direct pointer: the reference stores the address of the object itself:

The HotSpot virtual machine uses direct pointers. Their biggest benefit is speed, because they save the overhead of one pointer dereference.

Object reference analysis

  • Reference counting: when two objects refer to each other, their counts never drop to zero, so the collector cannot tell that they are no longer in use
  • Reachability analysis: introduces the concept of GC Roots. If no reference chain connects an object to any GC Root, the object can be reclaimed. In Java, the objects that can serve as GC Roots are:
    • Objects referenced from the Java virtual machine stacks
    • Objects referenced by static class fields in the method area
    • Objects referenced by constants in the method area
    • Objects referenced from the native method stacks
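The weakness of reference counting is easy to show with hand-maintained counters. In the sketch below, two nodes point at each other; after both external references are dropped, each count is still 1, so a pure reference-counting collector would never free them, even though neither is reachable from any root. RefCountDemo and Node are names invented for this illustration, and the counters are updated manually rather than by a runtime.

```java
// Hypothetical toy model; none of these names come from the original notes.
final class RefCountDemo {

    static final class Node {
        int refCount; // maintained by hand for the illustration
        Node next;
    }

    // Returns the two reference counts after all external references are gone.
    static int[] countsAfterDroppingRoots() {
        Node a = new Node();
        Node b = new Node();
        a.refCount++; // a local variable refers to a
        b.refCount++; // a local variable refers to b

        a.next = b;
        b.refCount++; // a refers to b
        b.next = a;
        a.refCount++; // b refers to a

        // Simulate the local variables going out of scope.
        a.refCount--;
        b.refCount--;

        return new int[] {a.refCount, b.refCount};
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingRoots();
        System.out.println(counts[0] + " " + counts[1]); // prints 1 1
    }
}
```

Reachability analysis has no such problem: starting from the roots, neither node is ever visited, so both are considered garbage.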

Reference types

  • Strong reference: the kind created by code such as “Object obj = new Object();”. As long as a strong reference exists, the GC never reclaims the referenced object;
  • Soft reference: for objects that are useful but not essential. Before an OutOfMemoryError would be thrown, softly referenced objects are included in a second round of collection; if there is still not enough memory after that, the OutOfMemoryError is thrown. The implementation class is SoftReference;
  • Weak reference: weaker than a soft reference; the referenced object survives only until the next GC. When that GC runs, objects reachable only through weak references are reclaimed whether or not memory is tight. The implementation class is WeakReference;
  • Phantom reference: the weakest kind of reference, through which the referenced object cannot be obtained. Its only purpose is to deliver a system notification when the object is collected. The implementation class is PhantomReference.
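The four kinds of reference map onto classes in java.lang.ref. The sketch below creates one of each for the same object and checks what get() returns while a strong reference still exists. ReferenceTypesDemo is a name chosen for this example. The last lines of main request a collection with System.gc(), which is only a hint to the JVM, so no output is claimed for that part.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.Arrays;

// Hypothetical demo class; the name is not from the original notes.
final class ReferenceTypesDemo {

    // Returns {soft sees target, weak sees target, phantom hides target}.
    static boolean[] demo() {
        Object target = new Object(); // strong reference
        SoftReference<Object> soft = new SoftReference<>(target);
        WeakReference<Object> weak = new WeakReference<>(target);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(target, queue);

        boolean softSees = soft.get() == target;       // strongly held, so still there
        boolean weakSees = weak.get() == target;       // strongly held, so still there
        boolean phantomHides = phantom.get() == null;  // get() on a phantom reference is always null
        return new boolean[] {softSees, weakSees, phantomHides};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [true, true, true]

        // An object reachable only through a weak reference may be cleared by the next GC.
        WeakReference<Object> weakOnly = new WeakReference<>(new Object());
        System.gc(); // a request only; the JVM is free to ignore it
        System.out.println("weak referent after System.gc(): " + weakOnly.get());
    }
}
```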

finalize()

The Java programming guidelines explicitly state that you should not override finalize() unless you know exactly what you are doing. The JVM executes an object’s finalize() method at most once. When an object is found to be unreachable for the first time, it is filtered on whether finalize() needs to run. If it does (that is, the object overrides finalize() and the method has not been called yet), the object is placed in a queue called the F-Queue, where it waits to be processed by a low-priority Finalizer thread that the JVM creates automatically. Later, the GC performs a second, smaller marking pass over the objects in the F-Queue. If an object has not rescued itself by then (for example, by assigning this to a field of some other object inside finalize()), it is really reclaimed.

Garbage Collectors

There are many Java garbage collectors. The HotSpot JVM after JDK 1.7 Update 14 ships with seven of them, and the young generation and the old generation use different collectors. Why so many garbage collectors? To improve the performance of the virtual machine. But however much the collectors are optimized, “Stop The World” pauses are unavoidable; only their length varies. The same is true of the Android Dalvik and ART virtual machines. The practical goals are therefore, first, to reduce GC time and, second, to avoid triggering GC too frequently.

Garbage collectors in the HotSpot virtual machine after JDK 1.7 Update 14:

A line between two collectors indicates that they can be used together.

Serial

The oldest collector. It is single-threaded and uses the “mark-copy” algorithm. During GC, all other worker threads must be paused until the collection is complete.

ParNew

A multithreaded version of the Serial collector. It uses the “mark-copy” algorithm.

Parallel Scavenge

Similar to the ParNew collector, but focused on achieving a controllable throughput. It uses the “mark-copy” algorithm.

Serial Old

The old generation version of the Serial collector. It uses the “mark-compact” algorithm.

Parallel Old

The old generation version of the Parallel Scavenge collector. It uses the “mark-compact” algorithm.

CMS

Concurrent Mark Sweep, a collector whose goal is the shortest possible collection pause time. It uses the “mark-sweep” algorithm.

G1

Garbage First, one of the most advanced results of collector technology at the time of writing, is a collector aimed at server-side applications. Its characteristics are parallelism and concurrency, generational collection, space compaction, and predictable pauses.
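Which of these collectors a running JVM is actually using can be checked through the java.lang.management API. The sketch below lists the names of the active collector beans together with their collection counts and accumulated times. ListCollectors and collectorNames are names made up for this example, and the names printed vary with the JVM version and the collector selected, so no specific output is assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are not from the original notes.
final class ListCollectors {

    // Names of the garbage collector beans registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```

On HotSpot, starting the JVM with a collector selection flag such as -XX:+UseSerialGC or -XX:+UseG1GC should change the names that are listed.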

Appendix

How do I get the JVM specification?

  1. Go to the official Oracle website and navigate as shown in the following figure:

  2. Click Java SE Documentation:

  3. Click Language and VM:

  4. Select the version: