Preface:
Every Java developer knows that bytecode is executed by the JRE (Java Runtime Environment). Fewer know that the JRE contains an implementation of the Java Virtual Machine (JVM), which analyzes bytecode, interprets it, and executes it. Understanding the architecture of the JVM matters to developers because it helps us write more efficient code. In this article, we’ll take a closer look at the JVM architecture and its different components.
What is the JVM?
A virtual machine is a software implementation of a physical machine. Java was developed around the concept of WORA (Write Once, Run Anywhere), with programs running on a VM. The compiler compiles a .java file into a .class file, and the .class file is then fed into the JVM, which loads and executes it. The following is an architecture diagram of the JVM.
Java memory area and memory overflow exception
Runtime data area
According to the Java Virtual Machine Specification (Java SE 7 Edition), the following figure shows the memory managed by the Java virtual machine.
Program counter
A small memory space, private to each thread. The bytecode interpreter works by changing the value of this counter to select the next bytecode instruction to execute. Basic functions such as branching, looping, jumping, exception handling, and thread resumption all depend on the counter.
If a thread is executing a Java method, the counter records the address of the virtual machine bytecode instruction being executed. If a native method is being executed, the counter value is undefined. This is the only memory region for which the Java Virtual Machine Specification does not specify any OutOfMemoryError condition.
Java virtual machine stack
Thread-private, with the same life cycle as the thread. It describes the memory model of Java method execution: each method invocation creates a stack frame that stores the local variable table, operand stack, dynamic link, method exit, and other information. The span from a method's invocation to its completion corresponds to a stack frame being pushed onto and then popped off the virtual machine stack.
Local variable table: stores the primitive types known at compile time (boolean, byte, char, short, int, float, long, double), object references (reference type), and the returnAddress type (which points to the address of a bytecode instruction).
StackOverflowError: thrown when the stack depth requested by the thread is greater than the depth allowed by the virtual machine.
OutOfMemoryError: thrown when the virtual machine stack can be dynamically expanded but sufficient memory cannot be obtained during the expansion.
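The StackOverflowError case is easy to observe. The following is a minimal sketch, not a benchmark: the class name StackDepthDemo and its methods are made up for illustration, and the depth it reports depends on the VM, the frame size, and the configured stack size.

```java
class StackDepthDemo {
    private static int depth;

    // Recurses without a base case, so every call pushes one more stack frame.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many frames were pushed before the VM reported StackOverflowError.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack has been unwound by the time we get here, so execution continues normally.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames before StackOverflowError: " + measureDepth());
    }
}
```

Running it with a different thread stack size should change the reported depth, which is one way to see that the limit is per thread.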
Native method stack
It differs from the Java virtual machine stack in that the Java virtual machine stack serves the Java methods (that is, bytecode) executed by the virtual machine, whereas the native method stack serves the native methods used by the virtual machine. It can also throw StackOverflowError and OutOfMemoryError.
The Java heap
For most applications, this area is the largest chunk of memory managed by the JVM. It is shared by all threads and mainly stores object instances and arrays. Internally, it may be divided into multiple thread-private Thread Local Allocation Buffers (TLABs). It can be physically discontiguous as long as it is logically contiguous.
OutOfMemoryError: thrown when there is no memory left in the heap to complete an instance allocation and the heap can no longer be expanded.
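The heap limits that decide when this error is thrown can be inspected from inside a program. A small sketch using java.lang.Runtime; the class name HeapStats is hypothetical and the numbers printed depend entirely on the VM settings.

```java
class HeapStats {
    // Bytes currently in use: committed heap minus the free part of it.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap the VM will try to use: " + rt.maxMemory());
        System.out.println("heap currently committed:        " + rt.totalMemory());
        System.out.println("heap currently used:             " + usedBytes());
    }
}
```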
Method area
A memory area shared by all threads. It stores data such as the class information loaded by the virtual machine, constants, static variables, and code compiled by the just-in-time compiler.
The following diagram illustrates what each area stores.
Run-time constant pool
It is part of the method area and holds the various literals and symbolic references generated at compile time. Constants can enter the pool both at compile time and at run time (for example, through String.intern()). An OutOfMemoryError is thrown when the constant pool can no longer obtain memory.
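The run-time side of the pool can be seen with String.intern(). A minimal sketch; the class and method names are made up for illustration.

```java
class InternDemo {
    // True only when both references point to the same String object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "jvm";            // string literal, shared through the pool
        String built = new String("jvm");  // a distinct object created at run time

        System.out.println(sameObject(literal, built));          // false
        System.out.println(sameObject(literal, built.intern())); // true
    }
}
```

intern() returns the pooled instance with the same contents, which is why the second comparison succeeds even though built itself is a separate object.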
Direct memory
Direct memory is not part of the virtual machine's runtime data area.
JDK 1.4 introduced the NIO (New Input/Output) classes, a channel- and buffer-based I/O approach that can use native libraries to allocate off-heap memory directly and then operate on it through a DirectByteBuffer object stored in the Java heap, which serves as a reference to this memory. This avoids time-consuming copying of data back and forth between the Java heap and the native heap.
OutOfMemoryError: direct memory is limited by the total native memory. The error occurs during dynamic expansion when the sum of all memory regions exceeds the physical memory limit.
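A short sketch of off-heap allocation through the public NIO API. ByteBuffer.allocateDirect is the standard entry point; the class name DirectBufferDemo and the helper roundTrip are invented for this example.

```java
import java.nio.ByteBuffer;

class DirectBufferDemo {
    // Writes a value into off-heap memory and reads it back through the same buffer.
    static int roundTrip(int value) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(Integer.BYTES); // memory outside the Java heap
        buffer.putInt(0, value);
        return buffer.getInt(0);
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);
        ByteBuffer onHeap = ByteBuffer.allocate(1024);

        System.out.println(direct.isDirect()); // true
        System.out.println(onHeap.isDirect()); // false
        System.out.println(roundTrip(1127));   // 1127
    }
}
```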
HotSpot virtual machine object exploration
This section focuses on how objects are created, laid out in memory, and accessed.
Object creation
The creation process is complicated, so reading a book on the subject is recommended. Here is a personal summary.
When the virtual machine encounters a new instruction, it first checks whether the instruction’s argument can locate a symbolic reference to a class in the constant pool, and whether the class represented by that symbolic reference has been loaded, resolved, and initialized. If not, it performs the class loading process first.
After the class loading check passes, memory is allocated for the new object (the required size is fully determined once the class is loaded). A region of free memory in the heap is carved out using either 'bump the pointer' allocation (when heap memory is regular) or 'free list' allocation (when used and free memory are interleaved).
As described above, each thread has a private allocation buffer (TLAB) in the heap, which largely avoids the thread-safety problems caused by frequent object creation under concurrency.
Once allocated, the memory space is initialized to zero (excluding the object header). The object header is then populated with information such as which class the object is an instance of, how to find the class’s metadata, the object’s hash code, and the object’s GC generational age.
After the new instruction, the <init> method is executed. Only then is a usable object actually created.
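The gap between "memory zeroed" and "constructor finished" is observable from plain Java. In this sketch (all names are hypothetical) the superclass constructor calls an overridden method before the subclass's field initializer has run, so it sees the zero value rather than 1127.

```java
class ZeroInitDemo {
    static class Base {
        final int seenByBase;

        Base() {
            // Runs before Child's field initializers: the field still holds its zero value.
            seenByBase = read();
        }

        int read() {
            return -1;
        }
    }

    static class Child extends Base {
        int value = 1127; // assigned as part of Child's constructor, after Base() returns

        @Override
        int read() {
            return value;
        }
    }

    public static void main(String[] args) {
        Child c = new Child();
        System.out.println(c.seenByBase); // 0
        System.out.println(c.value);      // 1127
    }
}
```

Calling overridable methods from a constructor is generally discouraged; it is used here only to make the intermediate state visible.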
Object memory layout
In the HotSpot virtual machine, an object's memory layout has three sections: the object header, instance data, and padding.
Header: contains two parts. The first part stores the object's own runtime data, such as the hash code, GC generational age, lock status flags, locks held by threads, biased thread ID, and biased timestamp. It is 32 bits long on a 32-bit VM and 64 bits long on a 64-bit VM, and is officially called the ‘Mark Word’. The second part is the type pointer, a pointer to the object's class metadata, which the virtual machine uses to determine which class the object is an instance of. In addition, if the object is a Java array, the header must also contain a field recording the array length, because the size of a normal object can be determined from its class metadata, whereas the size of an array cannot.
Instance data: the contents of the fields of all types defined in program code, both those inherited from parent classes and those defined in the class itself.
Padding: not necessarily present. HotSpot requires the object size to be an integer multiple of 8 bytes, so padding fills the gap when the header plus instance data does not reach such a multiple.
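The padding rule is simple rounding arithmetic. The sketch below assumes HotSpot's default 8-byte object alignment and, as a loudly labeled assumption, a 12-byte header (8-byte Mark Word plus a 4-byte compressed class pointer); real layouts also depend on field ordering and VM flags, which this ignores.

```java
class ObjectSizeSketch {
    static final int ALIGNMENT = 8; // default HotSpot object alignment in bytes

    // Rounds size up to the next multiple of ALIGNMENT.
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) & ~(long) (ALIGNMENT - 1);
    }

    public static void main(String[] args) {
        long header = 12; // ASSUMPTION: 8-byte Mark Word + 4-byte compressed class pointer

        System.out.println(alignUp(header + 4)); // header + one int:  16, no padding needed
        System.out.println(alignUp(header + 1)); // header + one byte: 13 is padded up to 16
    }
}
```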
Object access location
When working with objects, a Java program manipulates specific objects on the heap through reference data on the stack.
Access via a handle
A chunk of memory is set aside in the Java heap as a handle pool. The reference stores the address of the object's handle, and the handle holds the addresses of the object's instance data and type data. See figure for details.
Use direct pointer access
The reference stores the object's address directly.
Comparison: The biggest advantage of handles is that the reference stores a stable handle address. When an object is moved (during GC), only the instance data pointer inside the handle changes, and the reference itself does not need to be modified. The biggest advantage of direct pointer access is speed, because it saves the cost of one pointer lookup. Handles work well if objects are frequently moved by GC, and direct pointers work well if objects are frequently accessed.
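The trade-off can be mimicked with two arrays. This is only a toy simulation, not how any VM is implemented: array indexes stand in for addresses, and every name here is invented for illustration.

```java
class HandleAccessSketch {
    static final String[] heap = new String[8]; // simulated heap; the index plays the role of an address
    static final int[] handlePool = new int[4]; // handle -> current address of the instance data

    // Stores the payload at the given address and records that address in the handle.
    static int allocate(int handle, int address, String payload) {
        heap[address] = payload;
        handlePool[handle] = address;
        return handle;
    }

    // Simulates a moving collector: only the handle pool entry is updated.
    static void move(int handle, int newAddress) {
        int oldAddress = handlePool[handle];
        heap[newAddress] = heap[oldAddress];
        heap[oldAddress] = null;
        handlePool[handle] = newAddress;
    }

    // Handle access costs two lookups: the handle pool first, then the heap.
    static String read(int handle) {
        return heap[handlePool[handle]];
    }

    public static void main(String[] args) {
        int reference = allocate(0, 2, "instance data"); // the reference holds the handle, not the address
        move(reference, 5);                               // the object moves during a simulated GC
        System.out.println(read(reference));              // instance data
    }
}
```

With direct pointers, the reference would hold the address 2 itself, so read would need one lookup instead of two, but move would have to update every reference to the object.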
VM class loading mechanism
The virtual machine loads the data describing a class from the Class file into memory, verifies it, resolves and initializes it, and eventually forms Java types that the virtual machine can use directly.
In the Java language, the loading, linking, and initialization of types are all done during program execution.
Class loading timing
Life cycle of a class (7 phases)
The seven phases are loading, verification, preparation, resolution, initialization, use, and unloading. The order of the five stages loading, verification, preparation, initialization, and unloading is fixed. The resolution stage may in some cases start after initialization, which supports Java's run-time binding (also called dynamic binding or late binding).
There are five cases in which a class must be initialized immediately (loading, verification, and preparation naturally have to be done before that):
-
When one of the four bytecode instructions new, getstatic, putstatic, or invokestatic is encountered and the class has not yet been initialized. The typical scenarios are: instantiating an object with the new keyword, reading or setting a static field of a class (except static fields that are final and whose value was put into the constant pool at compile time), and calling a static method of a class.
-
When a reflective call is made on the class using the methods of the java.lang.reflect package and the class has not yet been initialized.
-
When initializing a class, if its parent class has not been initialized, the initialization of the parent class is triggered first.
-
When the virtual machine starts, the user specifies a main class (the one containing the main() method) to execute, and the virtual machine initializes this main class first.
-
When using the dynamic language support of JDK 1.7, if the final resolution result of a java.lang.invoke.MethodHandle instance is a REF_getStatic, REF_putStatic, or REF_invokeStatic method handle and the class corresponding to that method handle has not been initialized, its initialization must be triggered first.
These five behaviors are called active references to a class. All other ways of referencing a class do not trigger initialization and are called passive references. Here are a few examples of passive references:
public class SuperClass {
    static {
        System.out.println("SuperClass init!");
    }
    public static int value = 1127;
}

public class SubClass extends SuperClass {
    static {
        System.out.println("SubClass init!");
    }
}

public class ConstClass {
    static {
        System.out.println("ConstClass init!");
    }
    public static final String HELLOWORLD = "hello world!";
}

public class NotInitialization {
    public static void main(String[] args) {
        System.out.println(SubClass.value);
        /*
         * output: SuperClass init! (followed by the value 1127)
         *
         * Referencing a static field of the parent class through the subclass does not
         * initialize the subclass. Only the class that directly defines the field is initialized.
         */

        SuperClass[] sca = new SuperClass[10];
        /*
         * output: (nothing)
         *
         * Referencing a class through an array definition does not trigger initialization of the class.
         * The virtual machine dynamically creates an array class at run time.
         */

        System.out.println(ConstClass.HELLOWORLD);
        /*
         * output: hello world! (without "ConstClass init!")
         *
         * Constants are stored in the constant pool of the calling class at compile time. The caller
         * does not actually reference the class that defines the constant, so that class is not initialized.
         * "hello world!" is stored in the constant pool of NotInitialization by compile-time
         * constant propagation.
         */
    }
}
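The difference between loading a class and initializing it can also be checked with Class.forName, whose three-argument form takes an explicit initialize flag. In this sketch the nested class names and the log list are made up for illustration. Class.forName belongs to java.lang.Class rather than java.lang.reflect, so treat it as a neighbouring case of the reflection rule above.

```java
import java.util.ArrayList;
import java.util.List;

class InitTriggerDemo {
    static final List<String> log = new ArrayList<>();

    static class LoadedOnly {
        static { log.add("LoadedOnly init!"); }
    }

    static class Initialized {
        static { log.add("Initialized init!"); }
    }

    // Loads one class without initializing it and another with initialization requested.
    static List<String> run() {
        ClassLoader loader = InitTriggerDemo.class.getClassLoader();
        try {
            // A class literal such as LoadedOnly.class does not initialize the class by itself.
            Class.forName(LoadedOnly.class.getName(), false, loader);  // load only
            Class.forName(Initialized.class.getName(), true, loader);  // load and initialize
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [Initialized init!]
    }
}
```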
Class loading process
Loading
-
Obtain the binary byte stream that defines the class through its fully qualified name (sources include ZIP packages, the network, run-time generation, JSP generation, and databases).
-
Transform the static storage structure represented by this byte stream into the runtime data structure of the method area.
-
Generate a java.lang.Class object in memory that represents the class and serves as the access point to the class’s various data in the method area.
Array classes are special: the array class itself is not created by a class loader; it is created directly by the Java virtual machine. However, array classes and class loaders are still closely related, because the element type of an array class is ultimately loaded by a class loader. An array class is created as follows:
-
If the component type of the array is a reference type, that component type is loaded recursively using the loading process described here.
-
If the component type of the array is not a reference type, the Java virtual machine associates the array class with the bootstrap class loader.
-
The visibility of an array class is the same as that of its component type. If the component type is not a reference type, the visibility of the array class defaults to public.
In HotSpot, the java.lang.Class object instantiated in memory is stored in the method area. It serves as the external interface through which the program accesses the type data in the method area.
Parts of the loading and linking phases are interleaved, but the two phases still start in a fixed order.
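The relationship between an array class and the loader of its element type can be checked through Class.getClassLoader(), which returns null for the bootstrap loader. A minimal sketch; Custom is a placeholder class defined only for this example.

```java
class ArrayLoaderDemo {
    static class Custom { }

    public static void main(String[] args) {
        // Primitive element type: no class loader is reported (null).
        System.out.println(int[].class.getClassLoader());    // null

        // String is loaded by the bootstrap loader, so String[] reports null as well.
        System.out.println(String[].class.getClassLoader()); // null

        // An array of an application class reports the loader of its element type.
        System.out.println(Custom[].class.getClassLoader() == Custom.class.getClassLoader()); // true
    }
}
```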
Verification
Verification is the first step of linking. It ensures that the information contained in the byte stream of the Class file meets the requirements of the current virtual machine and does not compromise the security of the virtual machine itself.
File format validation
-
Whether it starts with the magic number 0xCAFEBABE
-
Whether the major and minor version numbers are within the range that the current VM can process
-
Whether the constant pool contains constants of an unsupported type (check the constant tag flag)
-
Whether any index value points to a constant that does not exist or that is not of the expected type
-
Whether any CONSTANT_Utf8_info constant contains data that does not conform to UTF-8 encoding
-
Whether any part of the Class file, or the file itself, has had information deleted or appended
-
…
Only after passing this stage of verification is the byte stream stored in the method area of memory. The following three verification stages all operate on the storage structure in the method area and no longer work on the byte stream directly.
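The first two file format checks can be sketched in a few lines. This is an illustration, not the VM's verifier: the class name ClassFileHeader is hypothetical, and the input is a hand-written 8-byte header rather than a real Class file. It relies on the documented layout in which the file starts with a 4-byte magic number followed by the 2-byte minor and 2-byte major version.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    // Returns the major version, or throws if the bytes do not start with the magic number.
    static int majorVersion(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != MAGIC) {
                throw new IllegalArgumentException("not a class file: bad magic number");
            }
            in.readUnsignedShort();        // minor_version, skipped
            return in.readUnsignedShort(); // major_version
        } catch (IOException e) {
            throw new IllegalArgumentException("truncated class file header", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written header: magic number, minor version 0, major version 52 (Java 8).
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        System.out.println(majorVersion(header)); // 52
    }
}
```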
Metadata validation
-
Whether this class has a parent class (every class except java.lang.Object must have one)
-
Whether the parent of this class is a class that is not allowed to be inherited (a class modified by final)
-
If the class is not abstract, whether it implements all the methods required by its parent class or interfaces
-
Whether a field or method in the class conflicts with the parent class (for example, overriding a final field of the parent class, or defining an improper overload)
This stage mainly performs semantic verification of the class's metadata to ensure that no metadata violates the Java language specification.
Bytecode verification
-
Ensure that the data types on the operand stack and the instruction sequence work together at any time (for example, a value pushed onto the operand stack as an int is never loaded as a long)
-
Ensure that jump instructions do not jump to bytecode instructions outside the method body
-
Ensure that type conversions in the method body are valid (assigning a subclass object to a superclass data type is safe; the reverse is illegal)
-
…
This is the most complex stage of verification. Its main purpose is to determine, through data-flow and control-flow analysis, that the program semantics are legal and logical. In this stage the method bodies of the class are verified and analyzed to ensure that the methods of the verified class do nothing at run time that harms the security of the virtual machine.
Symbolic reference verification
-
Whether the class can be found from the fully qualified name described by the string in the symbolic reference
-
Whether the specified class contains a field or method that matches the simple name and the field descriptor or method descriptor
-
Whether the classes, fields, and methods in the symbolic reference are accessible to the current class given their access modifiers (private, protected, public, default)
-
…
This final stage of verification occurs when the virtual machine converts symbolic references into direct references, which takes place during resolution, the third stage of linking. Symbolic reference verification can be thought of as checking that information outside the class itself (the various symbolic references in the constant pool) matches, which typically covers the checks listed above.
The purpose of symbolic reference verification is to ensure that resolution can execute normally. If a class fails symbolic reference verification, a subclass of java.lang.IncompatibleClassChangeError is thrown, such as java.lang.IllegalAccessError, java.lang.NoSuchFieldError, or java.lang.NoSuchMethodError.
Preparation
This phase formally allocates memory for class variables and sets their initial values. This memory is allocated in the method area. Only class variables (static variables) are included, not instance variables.
public static int value = 1127;
After the preparation phase, value is 0 rather than 1127, because no Java method has been executed yet. The putstatic instruction that assigns 1127 to value is placed in the class constructor <clinit>() when the program is compiled, so the assignment happens during the initialization phase.
Zero values of the basic data types: int 0, long 0L, short (short) 0, char '\u0000', byte (byte) 0, boolean false, float 0.0f, double 0.0d, reference null.
Special case: if the field attribute table of a class field contains a ConstantValue attribute (for example, when the declaration above is changed to public static final int value = 1127;), the virtual machine sets value to 1127 during the preparation phase based on the ConstantValue setting.
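Both behaviors can be observed from Java code by reading a static field before its initializer has run. A small sketch with invented names; note that for the final constant the compiler also inlines the value at each use, so the program sees 1127 either way.

```java
class PreparationDemo {
    // Static initializers run top to bottom, so peek() executes before 'value' is assigned.
    static int seenEarly = peek();
    static int value = 1127;

    // A compile-time constant: its value does not depend on static initialization order.
    static int constSeenEarly = peekConst();
    static final int CONST = 1127;

    static int peek() {
        return value; // still the zero value set during preparation
    }

    static int peekConst() {
        return CONST;
    }

    public static void main(String[] args) {
        System.out.println(seenEarly);      // 0
        System.out.println(value);          // 1127
        System.out.println(constSeenEarly); // 1127
    }
}
```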
Resolution
In this stage the virtual machine replaces the symbolic references in the constant pool with direct references.
-
Symbolic reference
A symbolic reference describes the referenced target with a set of symbols, which can be any form of literal as long as it locates the target unambiguously.
-
Direct reference
A direct reference can be a pointer directly to the target, a relative offset, or a handle that can indirectly locate the target. Direct references depend on the memory layout implemented by the virtual machine.
Resolution mainly targets seven kinds of symbolic references: class or interface, field, class method, interface method, method type, method handle, and call site qualifier, each corresponding to a constant type in the constant pool.
Initialization
Whereas the previous stages are dominated by the virtual machine, the initialization phase starts executing the Java code defined in the class.
Class loader
A class loader obtains the binary byte stream describing a class from its fully qualified name.
Parent delegation model
From the perspective of the Java virtual machine, there are only two types of class loaders: the bootstrap class loader (implemented in C++ as part of the virtual machine), and all other class loaders (implemented in Java, independent of the virtual machine, and all inheriting from java.lang.ClassLoader).
-
Bootstrap class loader
Loads the classes in <JAVA_HOME>/lib or in the path specified by -Xbootclasspath.
-
Extension class loader
Loads the classes in <JAVA_HOME>/lib/ext or in the path specified by the java.ext.dirs system variable.
-
Application class loader
Loads the class libraries specified on the user’s classpath.
Every class loader has a parent class loader, except for the top-level bootstrap class loader.
How it works: when a class loader receives a class loading request, it does not try to load the class itself first but delegates the request to its parent class loader. The child loader attempts the load itself only if the parent cannot complete it.
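The delegation order can be observed with a small custom loader. This sketch relies on the default behavior of java.lang.ClassLoader.loadClass, which asks the parent before calling findClass; the class ParentFirstLoader, its counter, and the name no.such.Type are made up for illustration.

```java
class ParentFirstLoader extends ClassLoader {
    int findClassCalls = 0;

    ParentFirstLoader(ClassLoader parent) {
        super(parent);
    }

    // Reached by the inherited loadClass() only after the parent chain failed to find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        findClassCalls++;
        throw new ClassNotFoundException(name); // this sketch has no class bytes of its own
    }

    public static void main(String[] args) throws Exception {
        ParentFirstLoader loader = new ParentFirstLoader(ClassLoader.getSystemClassLoader());

        Class<?> c = loader.loadClass("java.lang.String");
        System.out.println(c == String.class);     // true: the parents answered the request
        System.out.println(loader.findClassCalls); // 0

        try {
            loader.loadClass("no.such.Type"); // hypothetical name that no loader can resolve
        } catch (ClassNotFoundException expected) {
            System.out.println(loader.findClassCalls); // 1: tried only after the parents gave up
        }
    }
}
```

Overriding findClass instead of loadClass is the usual way to add a loader without disturbing the delegation order.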
Breaking the parent delegation model
Keyword: Thread Context ClassLoader
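For orientation, the thread context class loader is exposed through Thread.getContextClassLoader() and Thread.setContextClassLoader(). The sketch below only shows that a loader installed on a thread is what code running on that thread sees; the class and method names are hypothetical, and how a framework then uses that loader is outside its scope.

```java
class ContextLoaderDemo {
    // Runs a task on a new thread and reports which context class loader that thread observed.
    static ClassLoader loaderSeenByWorker(ClassLoader toInstall) {
        ClassLoader[] seen = new ClassLoader[1];
        Thread worker = new Thread(() -> seen[0] = Thread.currentThread().getContextClassLoader());
        worker.setContextClassLoader(toInstall); // replaces the loader inherited from the creating thread
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        ClassLoader custom = new ClassLoader(ClassLoader.getSystemClassLoader()) { };
        System.out.println(loaderSeenByWorker(custom) == custom); // true
    }
}
```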
Runtime data area
Runtime data is divided into five main components:
Method area – This is where all class-level data is stored, including static variables. There is only one method area per JVM, and it is a shared resource.
Heap area – This is where all objects and their corresponding instance variables and arrays are stored. There is also only one heap area per JVM. Because the method and heap areas are shared by multiple threads, the data stored in them is not thread-safe.
Stack area – For each thread, a separate runtime stack is created. For each method call, an entry is generated in stack memory, called a stack frame. All local variables are created in stack memory. The stack area is thread-safe because it is not a shared resource. The stack frame is divided into three child elements:
-
Local variable array – Related to the method: it records how many local variables are involved and stores their corresponding values.
-
Operand stack – If any intermediate operations need to be performed, the operand stack acts as a runtime workspace to perform the operations.
-
Frame data – All symbols corresponding to the method are stored here. If an exception occurs, the catch block information is kept in the frame data.
PC register – Each thread has a separate PC register that holds the address of the currently executing instruction, and once the instruction executes, the PC register is updated to the next instruction.
Native method stack – The native method stack holds native method information. For each thread, a separate native method stack is created.
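The role of the operand stack as a workspace can be illustrated with a toy stack machine. The instruction names push, add, and mul are invented stand-ins for real bytecodes such as iconst, iadd, and imul; this is a sketch of the idea, not of the JVM's interpreter.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class OperandStackSketch {
    // Evaluates a tiny instruction list using a stack as the only workspace.
    static int run(String... program) {
        Deque<Integer> operands = new ArrayDeque<>();
        for (String instruction : program) {
            String[] parts = instruction.split(" ");
            switch (parts[0]) {
                case "push":
                    operands.push(Integer.parseInt(parts[1]));
                    break;
                case "add":
                    operands.push(operands.pop() + operands.pop());
                    break;
                case "mul":
                    operands.push(operands.pop() * operands.pop());
                    break;
                default:
                    throw new IllegalArgumentException("unknown instruction: " + instruction);
            }
        }
        return operands.pop(); // the result is whatever remains on top of the stack
    }

    public static void main(String[] args) {
        // Computes 1 + 2 * 3.
        System.out.println(run("push 1", "push 2", "push 3", "mul", "add")); // 7
    }
}
```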
Execution engine
The bytecode assigned to the runtime data area is executed by the execution engine. The execution engine reads the bytecode and executes it piece by piece.
Interpreter – The interpreter interprets bytecode quickly but executes it slowly. Its disadvantage is that when a method is called multiple times, it has to be interpreted again each time.
JIT compiler – The JIT compiler removes this disadvantage of the interpreter. The execution engine uses the interpreter to convert bytecode, but when it finds repeated code, it uses the JIT compiler, which compiles the entire bytecode into native code. This native code is then used directly for repeated method calls, improving system performance.
-
Intermediate code generator – generates intermediate code
-
Code optimizer – optimizes the intermediate code generated above
-
Target code generator – generates machine code or native code
-
Profiler – a special component that looks for hot spots, that is, methods that are called many times.
Garbage collector – Collects and removes unreferenced objects. Garbage collection can be requested by calling System.gc(), but execution is not guaranteed. The JVM's garbage collector collects the objects that have been created.
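The hot spot idea described for the JIT compiler can be sketched as a per-method invocation counter. This is a toy policy only: the class name, the strings it returns, and the threshold of 3 are all assumptions made for illustration, and real VMs use far larger, tunable thresholds together with other signals.

```java
import java.util.HashMap;
import java.util.Map;

class HotSpotCounterSketch {
    static final int THRESHOLD = 3; // ASSUMPTION: illustrative value, not a real VM default

    private final Map<String, Integer> invocations = new HashMap<>();

    // Records one call and reports which path a VM following this policy would take.
    String invoke(String method) {
        int count = invocations.merge(method, 1, Integer::sum);
        return count > THRESHOLD ? "compiled" : "interpreted";
    }

    public static void main(String[] args) {
        HotSpotCounterSketch vm = new HotSpotCounterSketch();
        for (int i = 1; i <= 5; i++) {
            System.out.println("call " + i + ": " + vm.invoke("sum"));
        }
        // Calls 1 to 3 print "interpreted"; calls 4 and 5 print "compiled".
    }
}
```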
Conclusion:
After the first two cursory readings, I could understand the content but had trouble remembering the details. Whenever I ran into something I did not know, I searched for it online, so what I learned stayed fragmented. Without a system in mind, it was harder to remember and easier to mix up. Writing things down this way has made everything much clearer to me. Even if I forget it in the future, the cost of picking the knowledge up again will be much lower.