Preface
Understanding the JVM is a basic requirement for Java programmers. Yet how many of us, myself included, are so absorbed in fixing bugs and piling up business code that we neglect the fundamentals and end up with only a fragmentary understanding of the JVM? Studying the JVM systematically may take us further down the road.
Getting to know the Java virtual machine family
The Java programming language, the Java virtual machine, and the Java class library are collectively referred to as the Java Development Kit (JDK), which is the minimum environment needed to support Java program development.
The JVM is part of the JDK. The Java Virtual Machine Specification is a separate specification that stands alongside The Java Language Specification, and different vendors provide their own implementations of it (much as one interface can be implemented by different classes).
VM ancestor: Sun Classic/Exact VM
- The world’s first commercial Java virtual machine
- The only virtual machine in the JDK prior to JDK 1.2
- Only with JDK 1.4 did the Classic VM drop out of the commercial virtual machine scene completely, replaced by HotSpot
Reigning champion: HotSpot VM
- It is the default Java virtual machine in Sun/Oracle JDK and OpenJDK, and the most widely used Java virtual machine in the world today
- The “HotSpot” in its name refers to its hot-spot code detection technology
Small but refined: Mobile/Embedded VM
- Java Virtual Machines in Java ME for mobile and embedded markets
Runners-up: BEA JRockit/IBM J9 VM
- The JRockit virtual machine, once touted as the “fastest Java virtual machine in the world,” is highly optimized for server hardware and server application scenarios. JRockit is no longer being developed now that BEA has been acquired by Oracle
- The IBM J9 virtual machine is positioned similarly to HotSpot: it is a multi-purpose virtual machine designed with server-side, desktop, and embedded applications in mind
Hardware/software combination: BEA Liquid VM/Azul VM
- A dedicated virtual machine that is bound to a specific hardware platform, with hardware and software designed to work together
Challenger: Apache Harmony/Google Android Dalvik VM
- Apache Harmony is an Apache Software Foundation open source Java platform compatible with JDK 5 and JDK 6 under the Apache License. It contains its own virtual machine and Java library API. Users can run common Java programs such as Eclipse, Tomcat, and Maven.
- The Dalvik virtual machine is not a Java virtual machine: it does not follow the Java Virtual Machine Specification, cannot execute Java Class files directly, and uses a register architecture instead of the stack architecture common in Java virtual machines. It is nevertheless closely tied to Java: the DEX (Dalvik Executable) files it executes can be converted from Class files, applications can be written in Java syntax, and most of the Java API can be used directly.
Java memory region partitioning and OutOfMemory
According to the Java Virtual Machine Specification, the memory managed by the Java Virtual Machine will include the following runtime data areas:
Program counter
- It acts as the line-number indicator for the bytecode being executed by the current thread
- Thread private: each thread has its own program counter, and the counters of different threads are stored independently and do not affect one another
- If the thread is executing a Java method, the counter records the address of the virtual machine bytecode instruction being executed. If it is executing a native method, the counter value is undefined.
- It is the only area for which the Java Virtual Machine Specification defines no OutOfMemoryError condition
Java virtual machine stack
- Thread private
- The virtual machine stack describes the thread memory model of Java method execution: each time a method executes, the Java virtual machine creates a Stack Frame to store the local variable table, operand stack, dynamic linking information, method exit, and so on. The life of each method, from invocation to completion, corresponds to a stack frame being pushed onto the virtual machine stack and later popped off it
- The local variable table stores the Java virtual machine basic data types known at compile time (boolean, byte, char, short, int, float, long, double), object references (the reference type, which is not the object itself; it may be a pointer to the start address of the object, a handle representing the object, or some other location associated with the object), and the returnAddress type (which points to the address of a bytecode instruction)
- The Java Virtual Machine Specification defines two kinds of exceptions for this memory area: a StackOverflowError is thrown if the stack depth requested by a thread is greater than the depth the virtual machine allows; if the Java virtual machine stack can expand dynamically, an OutOfMemoryError is thrown when sufficient memory cannot be obtained during expansion
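The StackOverflowError case is easy to reproduce. The following is a minimal sketch, not taken from the original notes: the class and method names are made up for illustration, and the depth reached depends on the JVM, its stack size setting, and the frame size, so no particular number should be expected.

```java
final class StackDepthDemo {
    private static int depth = 0;

    // Unbounded recursion: every call pushes one more stack frame
    // onto the current thread's virtual machine stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Recurses until the stack is exhausted and returns the depth reached. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Thrown once the requested stack depth exceeds what the VM allows.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Stack overflowed at depth " + measureDepth());
    }
}
```

Running it with a different thread stack size should change the reported depth, which is a quick way to see that the limit belongs to the stack and not to the heap.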
Native method stack
- It plays a role similar to the virtual machine stack, except that it serves the native methods used by the virtual machine
The Java heap
- It is the largest chunk of memory managed by a VM
- All object instances and arrays should be allocated on the heap
- The Java heap is an area of memory shared by all threads
- From the allocation point of view, the heap shared by all threads can be carved into multiple thread-private Thread Local Allocation Buffers (TLABs). So when an interviewer asks “Are you sure the heap is completely shared?”, this is the caveat
- The Java heap can be implemented as either fixed size or extensible, but most current Java virtual machines implement it as extensible (controlled by the -Xmx and -Xms parameters). The Java virtual machine throws an OutOfMemoryError if there is no memory left in the heap to complete an instance allocation and the heap can no longer be extended.
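To see the extensible heap from inside a program, the standard Runtime API reports how much heap is currently committed and how far it may grow. This is only a rough sketch: the class name is made up, and mapping totalMemory() to the current heap size and maxMemory() to the -Xmx ceiling is an approximation rather than an exact definition.

```java
final class HeapSizeDemo {
    /** Returns {committed, max} heap sizes in bytes as reported by the running JVM. */
    static long[] heapSizes() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.totalMemory(), rt.maxMemory() };
    }

    public static void main(String[] args) {
        long[] sizes = heapSizes();
        long mb = 1024 * 1024;
        // The committed size can grow over time; the max size is the ceiling.
        System.out.println("committed heap: " + sizes[0] / mb + " MB");
        System.out.println("max heap:       " + sizes[1] / mb + " MB");
    }
}
```

Starting the JVM with different -Xms and -Xmx values should change the two numbers accordingly.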
Method area
- Like the Java heap, it is an area of memory shared by all threads
- It stores type information loaded by the virtual machine, constants, static variables, the code cache produced by the just-in-time compiler, and so on
- In JDK 8, the permanent generation was abandoned entirely and replaced by Metaspace.
- The runtime constant pool is part of the method area. The Constant Pool Table in a Class file holds the literals and symbolic references generated at compile time; this content is placed in the runtime constant pool of the method area after the class is loaded
- New constants can also be added to the pool at runtime, for example through the intern() method of the String class
- An OutOfMemoryError is thrown if the method area cannot satisfy a new memory allocation request.
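The effect of String.intern() can be observed with a reference comparison. The sketch below uses a made-up class name and string value; it builds a string at runtime so that it starts out as a separate heap object, then shows that intern() returns the same instance as the literal.

```java
final class InternDemo {
    /** Returns {sameBeforeIntern, sameAfterIntern} for a string built at runtime. */
    static boolean[] compare() {
        // The literal comes from the class file's constant pool.
        String literal = "jvm-notes";
        // Built at runtime, so it is a distinct object with equal content.
        String built = new StringBuilder("jvm").append("-notes").toString();
        return new boolean[] { built == literal, built.intern() == literal };
    }

    public static void main(String[] args) {
        boolean[] result = compare();
        System.out.println("same object before intern(): " + result[0]); // false
        System.out.println("same object after intern():  " + result[1]); // true
    }
}
```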
Direct memory
- It is not part of the virtual machine runtime data area, nor is it an area of memory as defined in the Java Virtual Machine Specification
- The NIO (New Input/Output) classes introduced in JDK 1.4 brought a Channel- and Buffer-based I/O model that can allocate off-heap memory directly through native libraries and then operate on it through a DirectByteBuffer object stored in the Java heap, which acts as a reference to that memory.
- It is not limited by the Java heap size, but it is still limited by the total native memory (physical memory plus the swap partition or paging file) and by the addressing space of the processor, so an OutOfMemoryError can occur when it is dynamically extended
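A direct buffer is allocated through the standard ByteBuffer.allocateDirect call. The following is a small sketch with a made-up class name and an arbitrary 16-byte size: the buffer object lives in the Java heap, while the bytes it manages are allocated outside it.

```java
import java.nio.ByteBuffer;

final class DirectBufferDemo {
    /** Writes one int into a freshly allocated direct buffer and prepares it for reading. */
    static ByteBuffer roundTrip(int value) {
        ByteBuffer buf = ByteBuffer.allocateDirect(16); // backing memory is off-heap
        buf.putInt(value);
        buf.flip(); // switch from writing to reading
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = roundTrip(42);
        System.out.println("direct: " + buf.isDirect()); // true
        System.out.println("value:  " + buf.getInt());   // 42
    }
}
```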
Objects in the HotSpot virtual machine
Object creation
- When the Java virtual machine encounters a bytecode new instruction, it first checks whether the class has been loaded, resolved, and initialized; if not, it performs the class loading process first
- After the class loading check passes, the virtual machine allocates memory for the new object.
- Allocation must be thread safe, and there are two solutions: 1. CAS with retry on failure, which guarantees the atomicity of the update; 2. TLABs, where each thread allocates from its own separate space
- After the memory is allocated, the virtual machine initializes the allocated space (excluding the object header) to zero
- The Java virtual machine then makes the necessary settings on the object, such as which class it is an instance of, how to find the class metadata, the object’s hash code (whose computation is actually deferred until Object::hashCode() is first called), the object’s GC generational age, and so on. This information is stored in the Object Header. The header is set up differently depending on the current running state of the virtual machine, for example whether biased locking is enabled.
- After the new instruction, the <init>() method (the constructor) runs and initializes the object as the programmer intends; only then is a truly usable object fully constructed
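The “CAS plus retry on failure” idea from the thread-safety step can be sketched in plain Java. This is a toy model, not HotSpot code: the class, the offset-based “heap”, and the sizes are all assumptions made for illustration. Each thread tries to move a shared top pointer forward and simply retries when another thread got there first.

```java
import java.util.concurrent.atomic.AtomicLong;

final class BumpPointerSketch {
    private final AtomicLong top = new AtomicLong(0); // next free offset
    private final long limit;                         // end of the toy heap

    BumpPointerSketch(long limit) {
        this.limit = limit;
    }

    /** Returns the start offset of the allocated block, or -1 if the space is full. */
    long allocate(long size) {
        while (true) {
            long oldTop = top.get();
            long newTop = oldTop + size;
            if (newTop > limit) {
                return -1;
            }
            // CAS: succeeds only if no other thread moved top in the meantime.
            if (top.compareAndSet(oldTop, newTop)) {
                return oldTop;
            }
            // Lost the race: loop and retry with the fresh value.
        }
    }

    long used() {
        return top.get();
    }

    public static void main(String[] args) throws InterruptedException {
        BumpPointerSketch heap = new BumpPointerSketch(1 << 20);
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    heap.allocate(16);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("bytes used: " + heap.used()); // 4 * 1000 * 16 = 64000
    }
}
```

A TLAB avoids most of this contention by giving each thread its own block to bump through, so the shared CAS is needed only when a thread asks for a new buffer.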
Object memory layout
- An object’s storage layout in heap memory can be divided into three parts: the object header (Header), instance data (Instance Data), and alignment padding (Padding).
- The object header contains the object’s own runtime data (the Mark Word), the type pointer to its class metadata, and, if the object is an array, the array length.
Locating and accessing objects
- Java programs manipulate specific objects on the heap using reference data on the stack.
- How an object is accessed is determined by the virtual machine implementation. The two mainstream approaches are handles and direct pointers
Garbage Collection (GC)
Death of the object
Reference counting method
- Add a reference counter to the object. Each time it is referenced somewhere, the counter is incremented by one; when a reference becomes invalid, the counter is decremented by one; an object whose counter is zero can no longer be used.
- Objects that reference each other in a cycle cannot be reclaimed
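The cycle problem can be shown with a hand-rolled counter. The sketch below is purely illustrative: the Node class and its refCount field are hypothetical, since Java exposes no such counter. Two nodes point at each other, both outside references are dropped, and neither count ever reaches zero.

```java
final class RefCountSketch {
    static final class Node {
        int refCount; // hypothetical counter maintained by hand
        Node next;
    }

    // Re-points from.next at 'to', keeping both counters up to date.
    static void setNext(Node from, Node to) {
        if (from.next != null) {
            from.next.refCount--;
        }
        from.next = to;
        if (to != null) {
            to.refCount++;
        }
    }

    /** Builds a <-> b, drops both outside references, and returns the remaining counts. */
    static int[] cycleCounts() {
        Node a = new Node();
        Node b = new Node();
        a.refCount++;   // outside reference to a (for example a local variable)
        b.refCount++;   // outside reference to b
        setNext(a, b);  // b is now referenced twice
        setNext(b, a);  // a is now referenced twice
        a.refCount--;   // outside reference to a goes away
        b.refCount--;   // outside reference to b goes away
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleCounts();
        // Both stay at 1, so a pure reference counter would never free them.
        System.out.println("a=" + counts[0] + ", b=" + counts[1]); // a=1, b=1
    }
}
```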
Reachability analysis
- A set of root objects called “GC Roots” serves as the starting node set. From these nodes, the search follows reference relationships downward, and the path traversed is called a “Reference Chain”. If no reference chain connects an object to the GC Roots, or in graph theory terms the object is unreachable from the GC Roots, the object can no longer be used
- Objects that can serve as GC Roots:
  - Objects referenced in the virtual machine stack (the local variable table in a stack frame)
  - Objects referenced by class static fields in the method area
  - Objects referenced by constants in the method area
  - Objects referenced by JNI (commonly called native methods) in the native method stack
  - References internal to the Java virtual machine, such as the Class objects corresponding to basic data types, resident exception objects (NullPointerException, OutOfMemoryError), and the system class loader
  - All objects locked by the synchronized keyword
  - JMXBeans that reflect the internal state of the Java virtual machine, callbacks registered in JVMTI, native code caches, and so on
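Reachability analysis is essentially a graph traversal from the roots. The following is a minimal sketch under simplifying assumptions: objects are represented by string names, references by a map from each name to the names it points to, and the class and method names are made up. Note that the isolated cycle c <-> d is correctly found unreachable, which a reference counter could not do.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ReachabilitySketch {
    /** Returns every object reachable from the roots by following references. */
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> marked = new HashSet<>(roots);
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.poll();
            for (String target : refs.getOrDefault(obj, List.of())) {
                // add() returns false for objects that were already marked.
                if (marked.add(target)) {
                    pending.add(target);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
                "root", List.of("a"),
                "a", List.of("b"),
                "c", List.of("d"),   // c and d only reference each other
                "d", List.of("c"));
        Set<String> live = reachable(refs, List.of("root"));
        System.out.println(new TreeSet<>(live)); // [a, b, root]
    }
}
```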
Four types of references
- Strong Reference: as long as a strong reference exists, the garbage collector will never reclaim the referenced object
- Soft Reference: objects associated only with soft references are included in a second round of collection just before the system would throw an out-of-memory error; only if that collection still does not free enough memory is the error thrown
- Weak Reference: objects associated only with weak references survive only until the next garbage collection; when the collector runs, they are reclaimed regardless of whether memory is currently sufficient
- Phantom Reference: the only purpose of associating a phantom reference with an object is to receive a system notification when the object is reclaimed by the collector. (A phantom reference must be used together with a ReferenceQueue: when the GC is about to reclaim an object and finds that it has a phantom reference, it adds the phantom reference to the associated ReferenceQueue.)
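The reference types map to classes in java.lang.ref. The sketch below, with a made-up class name, checks only the parts that are deterministic: while a strong reference still exists, soft and weak references return the referent, and a phantom reference never returns it. What happens after System.gc() is printed without any promise, because that call is only a hint to the collector.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

final class ReferenceKindsDemo {
    /** Returns {softSeesReferent, weakSeesReferent, phantomGetIsNull} while a strong reference exists. */
    static boolean[] whileStronglyHeld() {
        Object referent = new Object(); // the strong reference
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        SoftReference<Object> soft = new SoftReference<>(referent);
        WeakReference<Object> weak = new WeakReference<>(referent);
        PhantomReference<Object> phantom = new PhantomReference<>(referent, queue);
        return new boolean[] {
            soft.get() == referent,
            weak.get() == referent,
            phantom.get() == null // get() on a phantom reference always returns null
        };
    }

    public static void main(String[] args) {
        boolean[] r = whileStronglyHeld();
        System.out.println("soft sees referent:  " + r[0]);
        System.out.println("weak sees referent:  " + r[1]);
        System.out.println("phantom get is null: " + r[2]);

        // No strong reference is kept here, so the referent is only weakly reachable.
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc(); // only a hint; the result below is likely null but not guaranteed
        System.out.println("weak referent after gc hint: " + weak.get());
    }
}
```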
Declaration of death
- Declaring an object truly dead takes at least two marking passes
- First, if reachability analysis finds the object unreachable, it is marked for the first time and then filtered; the filter condition is whether the object needs to execute its finalize() method.
- If the object is judged to need finalize(), it is placed in a queue named F-Queue, and its finalize() method is executed later by a low-priority Finalizer thread that the virtual machine creates automatically.
- finalize() is the object’s last chance to save itself: it only needs to re-associate itself with any object on a reference chain
- If the object has still not escaped by the time it is marked a second time, it is essentially certain to be reclaimed
Garbage collection algorithm
Mark-Sweep algorithm
- First mark all objects that need to be reclaimed, and once marking is complete, reclaim all marked objects together. Alternatively, mark the surviving objects and reclaim all unmarked objects together.
- Disadvantages: 1. Unstable execution efficiency; 2. Fragmentation of memory space
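The fragmentation drawback shows up even in a tiny model. In this sketch the heap is an assumed array of one-slot objects named A to H, the mark results are given by hand, and '.' stands for a free slot; none of this reflects how a real collector stores objects. After sweeping, four slots are free but no more than two of them are adjacent.

```java
final class MarkSweepSketch {
    /** Sweep phase: frees every slot that the mark phase did not mark as live. */
    static char[] sweep(char[] heap, boolean[] marked) {
        char[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (!marked[i]) {
                result[i] = '.';
            }
        }
        return result;
    }

    /** Length of the longest run of adjacent free slots. */
    static int largestFreeRun(char[] heap) {
        int best = 0;
        int run = 0;
        for (char slot : heap) {
            run = (slot == '.') ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        char[] heap = "ABCDEFGH".toCharArray();
        boolean[] marked = { true, false, true, false, false, true, false, true };
        char[] swept = sweep(heap, marked);
        System.out.println(new String(swept));    // A.C..F.H
        System.out.println(largestFreeRun(swept)); // 2
    }
}
```

An object needing three contiguous slots could not be placed here even though four slots are free in total.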
Semispace Copying algorithm
- Divide the available memory into two equally sized halves and use only one at a time. When that half is used up, copy the surviving objects to the other half and then clean up the used half in one pass. This solves memory fragmentation.
- Disadvantage: wasted space
Mark-Compact algorithm
- The marking phase is the same as in the Mark-Sweep algorithm, but then all surviving objects are moved toward one end of the memory space, and the memory beyond the boundary is cleaned up directly.
- Moving objects like this requires pausing the user application for the whole operation, a pause the original virtual machine designers vividly described as “Stop The World.”
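The compaction step can be sketched on a toy heap. As an assumption for illustration, the heap is an array of one-slot objects named A to H, the mark results are supplied by hand, and '.' stands for a free slot. Surviving objects slide toward index 0 in their original order, and everything past the boundary becomes one contiguous free area.

```java
import java.util.Arrays;

final class MarkCompactSketch {
    /** Slides marked slots toward index 0 and frees everything past the boundary. */
    static char[] compact(char[] heap, boolean[] marked) {
        char[] result = new char[heap.length];
        int boundary = 0; // first slot after the last surviving object
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) {
                result[boundary++] = heap[i];
            }
        }
        Arrays.fill(result, boundary, result.length, '.');
        return result;
    }

    public static void main(String[] args) {
        char[] heap = "ABCDEFGH".toCharArray();
        boolean[] marked = { true, false, true, false, false, true, false, true };
        System.out.println(new String(compact(heap, marked))); // ACFH....
    }
}
```

Because objects change position, every reference to them has to be updated as well, which is why the move needs the application to be paused.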
Generational collection
Once the Java heap is divided into different regions, the garbage collector can reclaim just one or a few of those regions at a time, hence the collection types “Minor GC”, “Major GC”, and “Full GC”. Each region can also be given a garbage collection algorithm that matches the survival characteristics of the objects it stores.
- Minor GC/Young GC: garbage collection that targets only the young generation.
- Major GC/Old GC: garbage collection that targets only the old generation.
- Mixed GC: garbage collection that targets the entire young generation and part of the old generation. Currently, only the G1 collector has this behavior.
- Full GC: garbage collection of the entire Java heap and the method area.
HotSpot virtual machine heap memory partition
Memory allocation and reclamation policies
- Objects are allocated in Eden first: in most cases, objects are allocated in the Eden area of the young generation. When Eden does not have enough space for an allocation, the virtual machine initiates a Minor GC
- Large objects go straight to the old generation: large objects are Java objects that require a large amount of contiguous memory, typically very long strings or arrays with a large number of elements
- Long-lived objects enter the old generation: the virtual machine defines an age counter for each object, stored in the object header. An object is usually born in Eden; if it survives its first Minor GC and fits in Survivor, it is moved to Survivor and its age is set to 1. Each time it survives a Minor GC in Survivor its age increases by 1, and when it reaches a certain age (15 by default) it is promoted to the old generation
- Dynamic object age determination: if the total size of all objects of the same age in Survivor space is greater than half of the Survivor space, objects of that age or older can enter the old generation directly
- Space allocation guarantee: when Survivor space cannot hold the objects that survive a Minor GC, another memory region (in practice, the old generation) has to guarantee the allocation. Before a Minor GC, the virtual machine first checks whether the maximum contiguous free space in the old generation is greater than the total space occupied by all objects in the young generation. If so, the Minor GC is guaranteed to be safe. If not, the virtual machine checks whether the -XX:HandlePromotionFailure parameter allows guarantee failure (Handle Promotion Failure). If it does, the virtual machine goes on to check whether the maximum contiguous free space in the old generation is greater than the average size of objects promoted to the old generation in previous collections; if so, it attempts a Minor GC, although that Minor GC is risky. If that space is smaller, or -XX:HandlePromotionFailure does not allow the risk, a Full GC is performed instead.
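The last two rules are decision procedures, so they can be written down as plain functions. The sketch below follows the wording of these notes rather than the HotSpot source: all names, parameters, and the enum are hypothetical, and the real collector may compute these values differently in detail.

```java
final class PromotionPolicySketch {
    enum Decision { MINOR_GC, RISKY_MINOR_GC, FULL_GC }

    /**
     * Dynamic age determination as worded above: the first age whose objects
     * occupy more than half of the Survivor space becomes the promotion age;
     * otherwise the configured maximum age is kept.
     * sizeByAge[i] is the total size of objects of age i (index 0 is unused).
     */
    static int tenuringThreshold(long[] sizeByAge, long survivorCapacity, int maxAge) {
        for (int age = 1; age < sizeByAge.length && age < maxAge; age++) {
            if (sizeByAge[age] > survivorCapacity / 2) {
                return age;
            }
        }
        return maxAge;
    }

    /** Space allocation guarantee check performed before a Minor GC. */
    static Decision beforeMinorGc(long oldGenMaxContiguousFree, long youngGenUsed,
                                  long avgPromotedSize, boolean handlePromotionFailure) {
        if (oldGenMaxContiguousFree > youngGenUsed) {
            return Decision.MINOR_GC; // safe: everything would fit even if all objects survive
        }
        if (handlePromotionFailure && oldGenMaxContiguousFree > avgPromotedSize) {
            return Decision.RISKY_MINOR_GC; // bet on past promotion averages
        }
        return Decision.FULL_GC;
    }

    public static void main(String[] args) {
        // Age 2 objects take 600 of a 1000-unit Survivor space, so age 2 is the threshold.
        System.out.println(tenuringThreshold(new long[] { 0, 100, 600, 50 }, 1000, 15)); // 2
        System.out.println(beforeMinorGc(900, 800, 300, false)); // MINOR_GC
        System.out.println(beforeMinorGc(500, 800, 300, true));  // RISKY_MINOR_GC
        System.out.println(beforeMinorGc(500, 800, 300, false)); // FULL_GC
    }
}
```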