Cross-posted from my Jianshu blog: Understanding the JVM is actually not that hard!
Before reading this article, I highly recommend reading Understanding the Java Virtual Machine in Depth by Zhou Zhiming.
A few days ago, in an interview for an internship at Alibaba, I was asked whether the Dalvik virtual machine can execute class files. I answered that it cannot: it executes dex files converted from class files. When the interviewer went on to ask why it cannot execute class files, I could only mention optimizations inside the Dalvik VM and could not give the real reason. In fact, Zhou Zhiming's book has the answer: Dalvik is not a Java virtual machine. It does not follow the Java Virtual Machine Specification and cannot execute Java class files, and it uses a register-based architecture instead of the stack-based architecture common to JVMs.
I first came across Understanding the Java Virtual Machine in Depth as an undergraduate but never read it carefully, which I now regret! I spent a lot of time studying it in my first year of graduate school, and now that I am getting ready to look for a job, I am writing this article to summarize the key points of the book. Of course, if you want more detail, read the book itself.
JVM memory region
When writing programs we often run into OOM (OutOfMemoryError) problems and memory leaks. To avoid them, we first need to understand how the JVM partitions memory. The JVM's runtime data areas are the method area, the virtual machine stack, the native method stack, the heap, and the program counter, each described below.
Program counter
The program counter is thread-private: each thread has its own counter that records which instruction it is currently executing. It occupies very little memory and can be thought of as a line-number indicator for the bytecode the current thread is executing. If the thread is executing a Java method, the counter holds the address of the bytecode instruction being executed; if it is executing a native method, the counter value is undefined. This is the only memory region for which the Java Virtual Machine Specification does not define any OutOfMemoryError conditions.
Java virtual machine stack
Like the program counter, the Java virtual machine stack is thread-private, and its life cycle is the same as that of its thread. How should we understand the VM stack? It is simply a stack whose elements are called stack frames. A stack frame sounds complicated but is actually simple: it stores the context of one method, that is, the data the method needs while it executes. That data is mainly the local variable table (the method's local variables), the operand stack (the working area the execution engine uses for computation), the method exit (return address) information, and so on.
Each time the execution engine calls a method, it creates a stack frame for that method and pushes it onto the VM stack. Put another way, every method invocation, from the call until it finishes, corresponds to one stack frame being pushed onto and later popped off the stack.
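To make the push/pop behaviour concrete, here is a minimal sketch; the class and method names are hypothetical. It relies on Thread.getStackTrace(), whose documentation allows some VMs to omit frames, so the exact count is an assumption that holds on HotSpot: a call made one level deeper sees exactly one more frame.

```java
// Minimal sketch (hypothetical names): every call pushes one frame onto the
// current thread's VM stack, so a deeper call sees a longer stack trace.
class FrameDepthDemo {
    // Number of frames on this thread's stack at the point of the call
    static int currentDepth() {
        return Thread.currentThread().getStackTrace().length;
    }

    // Call chain: outer -> currentDepth
    static int outer() {
        return currentDepth();
    }

    // One extra frame: inner -> outer -> currentDepth
    static int inner() {
        return outer();
    }

    public static void main(String[] args) {
        System.out.println("extra frames: " + (inner() - outer()));
    }
}
```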
Two exceptions can occur in this area. A StackOverflowError is thrown when a thread requests a stack depth greater than the VM allows; the easiest way to cause one is a function that calls itself recursively without end. If the VM stack can be expanded dynamically (as it can in most VMs), an OutOfMemoryError is thrown when the expansion cannot obtain enough memory. The code below triggers an OutOfMemoryError by creating threads endlessly, since each thread needs its own stack:
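// Each new thread gets its own VM stack; creating threads forever eventually
// exhausts memory and ends in an OutOfMemoryError.
public void stackLeakByThread() {
    while (true) {
        new Thread() {
            @Override
            public void run() {
                while (true) {
                    // keep this thread (and its stack) alive forever
                }
            }
        }.start();
    }
}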
This code is risky and may freeze the operating system, so run it with caution.
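The StackOverflowError case, by contrast, can be reproduced safely with unbounded recursion. This is a minimal illustration with hypothetical names; the depth it reports depends on the thread stack size (set with -Xss) and on the JVM, so treat the number as indicative only.

```java
// Minimal sketch (hypothetical names): recurse until the VM stack overflows,
// then report how many frames fit.
class StackOverflowDemo {
    private int depth = 0;

    // Each call pushes a new stack frame; none is popped until the error.
    private void recurse() {
        depth++;
        recurse();
    }

    // Returns the recursion depth reached before StackOverflowError was thrown
    int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames have unwound by the time we get here, so it is safe to continue.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("stack depth reached: " + new StackOverflowDemo().measureDepth());
    }
}
```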
Native method stack
The native method stack plays a role very similar to that of the virtual machine stack. The difference is that the VM stack serves the Java methods the VM executes, while the native method stack serves native methods. Like the VM stack, the native method stack can throw StackOverflowError and OutOfMemoryError.
The Java heap
The Java heap is usually the largest block of memory managed by the virtual machine. It is shared by all threads, and almost all object instances are allocated there. That said, as JIT compilers have developed (with optimizations such as escape analysis and scalar replacement), the rule that all objects are allocated on the heap has become less "absolute".
The Java heap is the main area managed by the garbage collector. Because most collectors today use generational collection, the heap can be divided into the new generation and the old generation, and the new generation is further divided into the Eden space, the From Survivor space, and the To Survivor space. An OutOfMemoryError is thrown when the heap cannot satisfy an allocation and can no longer expand.
Method area
The method area stores class information, constants, static variables, and so on. It is shared by all threads, which is easy to understand: when we write Java code, every thread can access the static variables of the same class. Because of reflection, it is hard for the VM to infer which class information is no longer in use, so this area is difficult to reclaim; garbage collection here mainly targets the constant pool and the unloading of classes. Note that JDK 1.7 moved the string constant pool into the heap. Likewise, an OutOfMemoryError is thrown when the method area cannot satisfy an allocation request. On JDK 1.6 and earlier, where the method area is implemented as the permanent generation, its size can be limited with -XX:PermSize and -XX:MaxPermSize (the permanent generation was removed in JDK 8 and replaced by Metaspace). The following code exhausts it:
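import java.util.ArrayList;
import java.util.List;
public class RuntimeConstantPoolOOM {
    public static void main(String[] args) {
        // The list keeps every interned string reachable, so none can be reclaimed
        List<String> list = new ArrayList<String>();
        int i = 0;
        while (true) {
            list.add(String.valueOf(i++).intern()); // i++ so every string is new
        }
    }
}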
Running it on JDK 1.6 throws java.lang.OutOfMemoryError: PermGen space. String.intern() adds the string to the constant pool if an equal string is not already there. The loop keeps interning new strings while the list keeps them all reachable, so the constant pool in the method area eventually runs out of memory.
This is why the code above must be run on JDK 1.6 or earlier: as mentioned earlier, JDK 1.7 moved the string constant pool into the heap, which also changes how intern() behaves.
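public class InternTest {
    public static void main(String[] args) {
        String str1 = new StringBuilder("hua").append("chao").toString();
        System.out.println(str1.intern() == str1); // JDK 1.6: false, JDK 1.7: true
        String str2 = new StringBuilder("ja").append("va").toString();
        System.out.println(str2.intern() == str2); // false on both: "java" was already pooled
    }
}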
This code behaves differently on JDK 1.6 and JDK 1.7: JDK 1.6 prints false, false, while JDK 1.7 prints true, false. In JDK 1.6, intern() copies the first-encountered string instance into the constant pool in the permanent generation and returns a reference to that copy, which is never the same object as the one StringBuilder created on the heap. In JDK 1.7, intern() no longer copies the instance; the constant pool just records a reference to the first occurrence, so for "huachao" intern() returns the very instance StringBuilder created. Why does the str2 comparison still print false? Because the string "java" already exists in the constant pool before this code runs (it was created while the JVM was loading classes), so str2 is not the first occurrence and intern() returns the earlier instance.
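Because the exact results above depend on the JDK version, it may help to contrast them with intern() behaviour that should hold on both JDK 1.6 and 1.7. This is a small sketch with a hypothetical class name:

```java
// Sketch (hypothetical class name): intern() behaviour that does not depend
// on where the string constant pool lives.
class InternBasics {
    static boolean[] check() {
        String literal = "hello";            // string literals are interned automatically
        String built = new String("hello");  // always a distinct object on the heap
        return new boolean[] {
            built == literal,                // false: two different objects
            built.intern() == literal,       // true: intern() returns the pooled instance
            built.equals(literal)            // true: same characters
        };
    }

    public static void main(String[] args) {
        boolean[] r = check();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "false true true"
    }
}
```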
Garbage Collection (GC)
The JVM's garbage collector does not decide whether an object is dead by counting how many other objects reference it (reference counting); it uses reachability analysis. The references between objects can be viewed as a graph. Starting from a set of root objects called GC Roots, the collector searches downward, and the path it follows is called a reference chain. When no reference chain connects an object to any GC Root, the object is unreachable and can no longer be used, so it is judged to be reclaimable.
So which objects can be used as GC Roots? There are mainly the following types:
1. Objects referenced from the VM stack (the local variable tables in stack frames).
2. Objects referenced by static fields of classes in the method area.
3. Objects referenced by constants in the method area.
4. Objects referenced by JNI (that is, by native methods) in the native method stack.
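Reachability analysis is essentially a graph traversal from the GC Roots. The sketch below is a toy model under stated assumptions (objects are integer ids, references are stored in a map, the class name is hypothetical), not how HotSpot actually walks the heap. It also shows why two objects that only reference each other are still collectable, a case a simple reference-counting scheme would miss.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of reachability analysis (illustrative only): objects are integer
// ids and refs maps each object to the objects it references.
class Reachability {
    // Returns every object reachable from the roots; all others are reclaimable
    static Set<Integer> reachable(Map<Integer, List<Integer>> refs, Collection<Integer> roots) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            int obj = work.pop();
            if (!seen.add(obj)) continue; // already visited, so cycles terminate
            for (int child : refs.getOrDefault(obj, List.of())) {
                work.push(child);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> refs = new HashMap<>();
        refs.put(1, List.of(2)); // root 1 -> 2
        refs.put(3, List.of(4)); // 3 and 4 reference each other,
        refs.put(4, List.of(3)); // but no chain connects them to a root
        System.out.println(reachable(refs, List.of(1))); // prints [1, 2]
    }
}
```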
In addition, Java provides soft references and weak references, and the VM is allowed to reclaim the objects they point to: a softly reachable object is reclaimed only when memory is running low, while a weakly reachable object is reclaimed at the next garbage collection. Objects that take up a lot of memory but may be needed again later, such as Bitmap objects, can be held through soft or weak references. Note, however, that every time you use such an object you must explicitly check whether the reference returns null, to avoid errors.
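The null-check pattern can be sketched with java.lang.ref.SoftReference. SoftCache is a hypothetical helper, not a JDK or Android class, and a byte[] stands in for something like a Bitmap; the same structure should work with WeakReference.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Hypothetical helper (not a JDK class): keeps a memory-hungry value behind a
// SoftReference and recreates it whenever the GC has cleared it.
class SoftCache<T> {
    private final Supplier<T> loader;
    private SoftReference<T> ref = new SoftReference<>(null);

    SoftCache(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        T value = ref.get();          // null if never loaded or already reclaimed
        if (value == null) {          // the explicit null check before every use
            value = loader.get();     // rebuild the expensive object
            ref = new SoftReference<>(value);
        }
        return value;                 // strongly held by the caller while in use
    }

    public static void main(String[] args) {
        SoftCache<byte[]> cache = new SoftCache<>(() -> new byte[1024 * 1024]);
        System.out.println(cache.get().length); // prints 1048576
    }
}
```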
Three common garbage collection algorithms
1. Mark-sweep algorithm
First, the objects to be reclaimed are identified through reachability analysis and marked, and then all marked objects are reclaimed in one pass; the marking step is exactly the reachability analysis process. This algorithm has two drawbacks. One is efficiency: neither marking nor sweeping is fast. The other is space: sweeping leaves behind a large number of non-contiguous memory fragments.
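A toy version of mark-sweep makes the fragmentation problem visible. This is an illustrative sketch with a hypothetical heap model (an array of slots whose objects hold the slot indices they reference), not a real collector:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy mark-sweep collector (illustrative only): heap[i] is an object or null,
// and an object's refs are the slot indices of the objects it references.
class MarkSweep {
    static final class Obj {
        final int[] refs;
        boolean marked;
        Obj(int... refs) { this.refs = refs; }
    }

    static void collect(Obj[] heap, int[] roots) {
        // Mark phase: reachability analysis starting from the roots
        Deque<Integer> work = new ArrayDeque<>();
        for (int r : roots) work.push(r);
        while (!work.isEmpty()) {
            Obj o = heap[work.pop()];
            if (o == null || o.marked) continue;
            o.marked = true;
            for (int ref : o.refs) work.push(ref);
        }
        // Sweep phase: free unmarked objects in place; survivors do not move,
        // so the freed slots are left as scattered holes (fragmentation)
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) continue;
            if (heap[i].marked) heap[i].marked = false; // reset for the next cycle
            else heap[i] = null;
        }
    }

    public static void main(String[] args) {
        // 0 -> 2 -> 4 are reachable from root 0; slots 1 and 3 are garbage
        Obj[] heap = { new Obj(2), new Obj(), new Obj(4), new Obj(), new Obj() };
        collect(heap, new int[] {0});
        for (int i = 0; i < heap.length; i++) {
            System.out.println("slot " + i + (heap[i] == null ? ": free" : ": live"));
        }
    }
}
```

Slots 1 and 3 are freed, but slots 2 and 4 stay where they were, so the free space is split into non-contiguous holes.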
2. Copying algorithm
To solve the efficiency problem, the copying algorithm divides memory into two halves of equal size and uses only one of them at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is then cleared all at once. Each collection therefore deals with only one half, and allocation never has to worry about fragmentation.
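The half-and-half copying scheme can be sketched in the style of Cheney's algorithm, which scans to-space breadth-first. This is a toy model with hypothetical names (objects are int[] arrays holding the indices they reference), chosen for clarity rather than fidelity to any real JVM:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy Cheney-style copying collector (illustrative only): an object is an
// int[] of indices of the objects it references.
class SemispaceCopy {
    // Copies everything reachable from roots into a fresh to-space and returns it.
    // newRoots receives the roots' new indices; from-space can then be dropped whole.
    static List<int[]> collect(List<int[]> fromSpace, int[] roots, int[] newRoots) {
        List<int[]> toSpace = new ArrayList<>();
        Map<Integer, Integer> forward = new HashMap<>(); // old index -> new index
        for (int k = 0; k < roots.length; k++) {
            newRoots[k] = copy(roots[k], fromSpace, toSpace, forward);
        }
        // Scan to-space in order, copying children and rewriting references
        for (int scan = 0; scan < toSpace.size(); scan++) {
            int[] refs = toSpace.get(scan);
            for (int j = 0; j < refs.length; j++) {
                refs[j] = copy(refs[j], fromSpace, toSpace, forward);
            }
        }
        return toSpace;
    }

    // Copies one object unless it was already copied, and returns its new index
    private static int copy(int old, List<int[]> from, List<int[]> to, Map<Integer, Integer> forward) {
        Integer done = forward.get(old);
        if (done != null) return done;
        int idx = to.size();
        to.add(from.get(old).clone()); // survivors are packed together, no fragmentation
        forward.put(old, idx);
        return idx;
    }

    public static void main(String[] args) {
        // Object 2 -> 3 is reachable from the root; objects 0 -> 1 are garbage
        List<int[]> from = List.of(new int[] {1}, new int[] {}, new int[] {3}, new int[] {});
        int[] newRoots = new int[1];
        List<int[]> to = collect(from, new int[] {2}, newRoots);
        System.out.println("survivors: " + to.size() + ", root now at " + newRoots[0]);
    }
}
```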
The cost, however, is too high: half of the memory is sacrificed. Research shows that most objects die young, so there is no need to split memory 1:1. Instead, memory is divided into one large Eden space and two smaller Survivor spaces, and Eden plus one Survivor space are in use at any time. The default ratio is Eden : Survivor = 8:1, and this is exactly how the new generation is laid out. Objects are allocated in Eden and one Survivor; at collection time the surviving objects are copied into the other Survivor. Only 10% of the memory is wasted, and efficiency stays high. When the other Survivor does not have enough room for the survivors, the old generation acts as an allocation guarantee.
How should we understand the allocation guarantee? The surviving objects that do not fit into the Survivor space are moved directly into the old generation, so the new generation can keep its Eden : Survivor = 8:1 layout. Finally, the two Survivor spaces have their own names, From Survivor and To Survivor, and their roles swap after every collection: sometimes one of them is used together with Eden, sometimes the other, because survivors are copied back and forth between them.
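The 8:1 arithmetic can be checked with a few lines. This sketch assumes HotSpot's -XX:SurvivorRatio meaning (Eden size divided by the size of one Survivor space, default 8) and ignores the alignment HotSpot applies to real sizes; the class name is hypothetical.

```java
// Idealized arithmetic only (hypothetical class): real HotSpot sizes are aligned.
class YoungGenLayout {
    // survivorRatio = Eden size / size of ONE Survivor space
    static long[] split(long youngBytes, int survivorRatio) {
        long survivor = youngBytes / (survivorRatio + 2); // Eden is 'ratio' parts, each Survivor 1
        long eden = youngBytes - 2 * survivor;
        return new long[] {eden, survivor, survivor};     // Eden, From Survivor, To Survivor
    }

    public static void main(String[] args) {
        long young = 10L * 1024 * 1024;  // e.g. a 10 MB new generation
        long[] s = split(young, 8);
        long usable = s[0] + s[1];       // Eden plus the Survivor currently in use
        System.out.printf("Eden=%d From=%d To=%d usable=%d%%%n", s[0], s[1], s[2], usable * 100 / young);
    }
}
```

For a 10 MB new generation this gives an 8 MB Eden and two 1 MB Survivor spaces, so 90% of the new generation is usable at any time.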
3. Mark-compact algorithm
The mark-compact algorithm is simple. Its marking step is the same as in mark-sweep, but instead of reclaiming the marked objects in place, it moves all surviving objects toward one end of memory and then clears the memory beyond that boundary. Its advantage is that it avoids memory fragmentation.
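Compaction can be sketched as three passes: compute new addresses, fix references, then slide objects. This is a toy model with hypothetical names; it assumes a separate mark phase has already produced the live[] flags and that live objects reference only live objects.

```java
// Toy sliding compaction (illustrative only). heap[i] is an object, i.e. an
// int[] of the slot indices it references, or null for a free slot.
class MarkCompact {
    // Slides live objects to the front of the heap and returns the first free slot
    static int compact(int[][] heap, boolean[] live) {
        int[] forward = new int[heap.length];
        int free = 0;
        // Pass 1: compute each live object's new address, preserving order
        for (int i = 0; i < heap.length; i++) {
            if (live[i]) forward[i] = free++;
        }
        // Pass 2: rewrite every reference held by a live object to the new address
        for (int i = 0; i < heap.length; i++) {
            if (!live[i]) continue;
            for (int j = 0; j < heap[i].length; j++) heap[i][j] = forward[heap[i][j]];
        }
        // Pass 3: move objects (forward[i] <= i, so nothing unmoved is overwritten)
        for (int i = 0; i < heap.length; i++) {
            if (live[i]) heap[forward[i]] = heap[i];
        }
        // Everything from 'free' onward is now one contiguous free block
        for (int i = free; i < heap.length; i++) heap[i] = null;
        return free;
    }

    public static void main(String[] args) {
        int[][] heap = { {2}, {}, {4}, {}, {} };
        boolean[] live = { true, false, true, false, true };
        System.out.println("first free slot: " + compact(heap, live)); // prints 3
    }
}
```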
Class loading mechanism
The whole life cycle of a class, from being loaded into VM memory to being unloaded from it, consists of seven stages: loading, verification, preparation, resolution, initialization, use, and unloading.
The five stages of loading, verification, preparation, initialization, and unloading must begin in this fixed order. Resolution does not have to: in some cases it can begin after initialization, in order to support Java's runtime binding (also called late binding).
About initialization: the JVM specification states that there are exactly five situations in which a class must be initialized immediately (loading, verification, and preparation naturally happen before that):
1. When the new, getstatic, putstatic, or invokestatic bytecode instruction is executed and the class has not yet been initialized. These instructions correspond to creating an object with new, reading a static field, setting a static field, and calling a static method (static final fields whose values were put into the constant pool at compile time are an exception).
2. When a reflective call is made on a class through the methods of the java.lang.reflect package and the class has not yet been initialized.
3. When a class is initialized and its parent class has not yet been initialized, the parent class's initialization is triggered first.
4. When the VM starts, the user must specify a main class (the one containing the main method) to execute, and the VM initializes this class first.
5. When the dynamic language support added in JDK 1.7 is used, if a java.lang.invoke.MethodHandle instance resolves to a method handle of kind REF_getStatic, REF_putStatic, or REF_invokeStatic and the corresponding class has not been initialized, its initialization is triggered first.
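Two of these triggers, and the fact that merely loading a class does not initialize it, can be observed by logging from static blocks. This is a sketch with hypothetical class names; it builds the nested classes' binary names from getName(), and it uses Class.forName(String), which lives in java.lang.Class rather than java.lang.reflect but also initializes the class by default.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (hypothetical names): static blocks append to a shared log so we can
// see exactly when each nested class is initialized.
class InitTriggers {
    static final List<String> log = new ArrayList<>();

    static class Lazy {
        static { log.add("Lazy init"); }
        static int counter = 0; // not final, so reading it compiles to getstatic
    }

    static class Reflected {
        static { log.add("Reflected init"); }
    }

    static List<String> run() {
        try {
            // Loading alone does not initialize: the static block has not run yet
            InitTriggers.class.getClassLoader().loadClass(InitTriggers.class.getName() + "$Lazy");
            log.add("Lazy loaded");
            // Case 1: getstatic on a non-constant static field initializes Lazy
            int c = Lazy.counter;
            // Reflection: Class.forName(String) initializes the class by default
            Class.forName(InitTriggers.class.getName() + "$Reflected");
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In a fresh JVM, run() returns [Lazy loaded, Lazy init, Reflected init]: "Lazy init" appears only after the explicit getstatic, not during loadClass.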
Also note that referencing a parent class's static field through a subclass does not initialize the subclass:
// All three classes can live in Test.java
class SuperClass {
    public static int value = 123;
    static {
        System.out.println("SuperClass init!");
    }
}

class SubClass extends SuperClass {
    static {
        System.out.println("SubClass init!");
    }
}

public class Test {
    public static void main(String[] args) {
        // value is declared in SuperClass, so only SuperClass is initialized
        System.out.println(SubClass.value);
    }
}
It prints only "SuperClass init!" (followed by 123); "SubClass init!" never appears. For a static field, only the class that directly declares the field is initialized, so referencing a static field declared in a parent class through a subclass triggers initialization of the parent class only, not of the subclass.
Referencing a class through an array definition does not trigger initialization of the class:
public class Test {
    public static void main(String[] args) {
        // Creates an array of SuperClass references; "SuperClass init!" is not printed
        SuperClass[] sca = new SuperClass[10];
    }
}
Constants are copied into the calling class's constant pool at compile time, so in essence the caller no longer references the class that defines them, and using them does not trigger that class's initialization, as the following code shows:
class ConstClass {
    // A compile-time constant: javac copies "hello world" into Test's constant pool
    public static final String HELLO_WORLD = "hello world";
    static {
        System.out.println("ConstClass init!");
    }
}

public class Test {
    public static void main(String[] args) {
        System.out.print(ConstClass.HELLO_WORLD);
    }
}
The output is just hello world; "ConstClass init!" is never printed.
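The rule applies only to compile-time constants. A static final field of a boxed type such as Integer is not a compile-time constant, so reading it still executes getstatic and triggers initialization. A sketch with hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (hypothetical names): a compile-time constant is inlined into the
// caller, while a boxed "constant" is read with getstatic and triggers init.
class ConstantInlining {
    static final List<String> log = new ArrayList<>();

    static class Holder {
        static final String CONSTANT = "inlined"; // compile-time constant
        static final Integer NOT_CONSTANT = 42;   // boxed type: not a compile-time constant
        static { log.add("Holder init"); }
    }

    static List<String> run() {
        String a = Holder.CONSTANT;      // copied into this class's constant pool: no init
        log.add("read CONSTANT: " + a);
        Integer b = Holder.NOT_CONSTANT; // getstatic: triggers Holder's initialization
        log.add("read NOT_CONSTANT: " + b);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

On the first call in a fresh JVM, "Holder init" is logged only between the two reads, after the constant has already been used.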
Loading
The loading phase does three things:
1. Obtain the binary byte stream that defines a class from its fully qualified name.
2. Convert the static storage structure represented by that byte stream into the runtime data structures of the method area.
3. Generate a java.lang.Class object in memory that represents the class and serves as the access point for the class's various data in the method area.
Verification
The purpose of this phase is to ensure that the information in the Class file's byte stream meets the requirements of the current virtual machine and does not endanger the VM's security.
Preparation
The preparation phase formally allocates memory for class variables and sets their initial values; the memory for these variables is allocated in the method area. Two points deserve attention. First, only class variables (static fields) are allocated at this point, not instance variables; instance variables are allocated in the Java heap together with the object when it is instantiated. Second, the initial value here is "normally" the zero value of the data type. Suppose a class variable is defined as
public static int value = 123;
After the preparation phase, value is 0, not 123, because no Java method has been executed yet: the putstatic instruction that assigns 123 to value is compiled into the class constructor <clinit>() method and only runs during initialization.
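The zero value set during preparation can actually be observed, because a static method called from an earlier initializer may read a field whose own initializer has not run yet. A minimal sketch with a hypothetical class name:

```java
// Sketch (hypothetical class name): peek() runs inside <clinit>() before
// "value = 123" has executed, so it sees the zero value set in preparation.
class PreparationDemo {
    static int seenDuringInit = peek(); // runs first: value is still 0
    static int value = 123;             // assigned only when <clinit> reaches this line

    static int peek() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(seenDuringInit + " " + value); // prints "0 123"
    }
}
```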
Resolution
The resolution phase is the process in which the virtual machine replaces symbolic references in the constant pool with direct references.
Initialization
Class initialization is the last step of class loading. In the earlier steps, apart from the loading stage, in which users can take part through a custom class loader, everything is driven and controlled by the virtual machine. Only in the initialization phase does the Java code defined in the class actually start to execute.
In the preparation phase, variables have already been assigned the zero values the system requires. In the initialization phase, class variables are initialized according to the plan the programmer wrote in code. In other words, initialization is the execution of the class constructor <clinit>() method.
The <clinit>() method is generated by the compiler, which automatically collects all class-variable assignments and static block statements in the class and merges them in the order they appear in the source file. As a result, a static block can only read variables defined before it; variables defined after it can be assigned in the block but not read, as follows:
public class Test {
    static {
        i = 0;                  // assigning a later-declared variable compiles
        System.out.print(i);    // compile error: "illegal forward reference"
    }
    static int i = 1;
}
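Because assignments and static blocks are merged in source order, the textually last assignment wins. A minimal sketch with a hypothetical class name:

```java
// Sketch (hypothetical class name): <clinit>() runs field initializers and
// static blocks strictly in source order.
class ClinitOrder {
    static int x = 1;
    static { x = 2; }   // runs after "x = 1", so x ends as 2

    static { y = 2; }   // assigning a later-declared field is legal...
    static int y = 1;   // ...but this initializer runs afterwards, so y ends as 1

    public static void main(String[] args) {
        System.out.println(x + " " + y); // prints "2 1"
    }
}
```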
Unlike the instance constructor <init>(), the <clinit>() method does not need to call the parent class's constructor explicitly: the virtual machine guarantees that the parent class's <clinit>() method has finished executing before the subclass's <clinit>() method runs.
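The parent-first guarantee can be seen with a minimal sketch (hypothetical class names): the subclass copies a static field that the parent's static block has already overwritten.

```java
// Sketch (hypothetical names): Parent's <clinit>() finishes before Sub's runs.
class ParentFirst {
    static class Parent {
        static int a = 1;
        static { a = 2; }
    }

    static class Sub extends Parent {
        static int b = a; // Parent is fully initialized first, so b == 2
    }

    public static void main(String[] args) {
        System.out.println(Sub.b); // prints 2
    }
}
```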
Class loader
I won't cover custom class loaders and the parent delegation model here; after several hours of writing, it's time for bed.