I. Foreword

1. Introduction to the JVM

JVM stands for Java Virtual Machine. It can be thought of as a virtual computer: it has its own instruction set (the bytecode instruction set) and its own memory management, built around the heap, the stacks, the method area, and so on. The JVM itself is a specification; when people say "the JVM" in the narrow sense, they usually mean the HotSpot virtual machine, because it is the most widely used implementation.

2. Java from compilation to execution

Java source files are first compiled into .class bytecode files by javac. The JVM then loads these files and translates the bytecode into machine code (by interpreting it or JIT-compiling it), which finally runs on the computer.

3. The JVM is cross-platform and language independent

Because JVM implementations exist for many operating systems (Windows, Linux, macOS, etc.), the same bytecode runs on all of them, which makes the JVM cross-platform.

Any language that compiles to .class bytecode can run on the JVM, so the JVM is language independent. For example, Java, Kotlin, and Groovy all compile to class bytecode.

4. Architecture of Java SE

JVM

Java Virtual Machine. The JVM only executes bytecode, converting it into code the machine can run. A running program also needs the standard class libraries, and that is where the JRE comes in.

JRE

Java Runtime Environment. The JRE contains the JVM plus the core Java class libraries needed to run programs. To write and build code, however, we also need the Java language itself and development tools, and that is where the JDK comes in.

JDK

Java Development Kit. In addition to the JRE, the JDK includes the tools needed for development, such as the javac compiler, the javap disassembler, and the jar packaging tool; together these make up the JDK.

So the relationship between JDK, JRE, and JVM is: JDK contains JRE, and JRE contains JVM.

II. JVM runtime data areas

JVM memory management: an analysis of the heap, the stacks, and the other runtime data areas

1. Introduction

One of Java's biggest advantages over C++ is automatic memory management: developers do not have to manage memory by hand as in C++. The memory the JVM manages at run time is generally called the runtime data area; the JVM maps real memory into these areas to make allocation and management easier. The runtime data area is divided into the heap, the method area, the virtual machine stack, the native method stack, and the program counter. The heap and the method area are shared by all threads; the rest are private to each thread.

2. Program counter

The program counter is a small memory area that records the address of the bytecode instruction the current thread is executing. Because the CPU schedules threads in time slices, a thread can lose its time slice in the middle of execution; the program counter records where the thread stopped, so that it knows where to continue when it is scheduled again. The program counter is the only memory area for which the JVM specification defines no OutOfMemoryError.

3. Virtual machine stack

A stack is a last-in, first-out data structure; the virtual machine stack stores the data a thread needs while executing methods. Each time the thread calls a Java method, a stack frame is created and pushed onto the VM stack, so a stack frame represents one method invocation, and it is popped when the method returns. When all frames have been popped, the thread's outermost method has finished, i.e. the thread has finished. Each stack frame contains four parts: the local variable table, the operand stack, the dynamic link, and the return address:

1) Local variable table: as the name implies, a table that stores the method's local variables (including its parameters). Each slot is 32 bits wide and holds a value of one of Java's primitive types; 64-bit types (long and double) occupy two slots. For an object, the table stores only a reference to the object, not the object itself.

2) Operand stack: holds the operands of bytecode instructions during computation. As the method runs, operands are continually pushed onto and popped off this stack.

3) Dynamic link: each frame holds a reference to the runtime constant pool of the current method's class, so that symbolic references can be resolved to concrete methods at run time. This supports Java polymorphism: when a method is called through a superclass type, the method that actually runs can only be determined at run time.

4) Return address: on a normal return, execution continues at the caller's instruction recorded when the method was invoked (the caller's program counter value); on an abrupt return caused by an exception, where execution continues is determined by the exception handler table.
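To make "one frame per call" concrete, here is a minimal sketch (the class and method names are made up for illustration) that recurses without a base case. Every call pushes one more frame onto the thread's VM stack until the stack is exhausted and the JVM throws StackOverflowError. The depth reached depends on the stack size (-Xss) and on how large each frame is, so no particular number should be expected.

```java
// Hypothetical demo class: counts how many frames fit on the current thread's stack.
class StackDepthDemo {
    private static int depth = 0;

    private static void recurse() {
        depth++;   // one more frame has been pushed for this call
        recurse(); // no base case, so this never returns normally
    }

    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // by the time we get here, all frames above measureDepth() have been popped
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames before overflow: " + measureDepth());
    }
}
```

Running it with a larger stack, for example `java -Xss4m StackDepthDemo`, should report a larger depth.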

4. Native method stack

The native method stack has a structure similar to the VM stack, but it serves native methods, which are usually written in C or C++. The JVM specification does not mandate how the native method stack is implemented, so HotSpot combines the VM stack and the native method stack into one.

5. Method area

Many people confuse the permanent generation, the metaspace, and the method area. The method area is a concept defined in the Java Virtual Machine Specification; the permanent generation (up to Java 7) and the metaspace (since Java 8) are HotSpot's implementations of it. The metaspace is stored outside the heap, in native memory.

The method area stores information about classes loaded by the VM, including type information, static variables, constants, the runtime constant pool, and the string constant pool (which HotSpot moved into the heap in Java 7).

The class-file constant pool stores literals and symbolic references generated at compile time. Literals include strings and constant values of primitive types; symbolic references include fully qualified names of classes, field names and descriptors, and method names and descriptors.

6. Heap

The heap is the largest memory area in the JVM. Almost all objects are created there, and it is where most garbage collection happens. The heap is divided into the young generation and the old generation, and the young generation is further divided into Eden, the From survivor space, and the To survivor space.

Whether data is allocated on the heap or on the stack depends on what it is:

First, for an ordinary object, the JVM creates the object on the heap, and everything else refers to it by its address, for example through a reference stored in the local variable table of a frame on the VM stack.

Second, local variables of primitive types declared in a method body are stored directly on the stack, while primitive-type fields of objects live on the heap as part of their object.

III. Virtual machine optimization techniques

1. Compiler optimization: method inlining

Method inlining copies the body of the target method directly into the call site, so no real call is made and the target method's stack frame never has to be pushed and popped.
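The transformation can be pictured at the source level. The sketch below (hypothetical class and method names) shows a loop that calls a small getter and the equivalent loop a JIT compiler could effectively produce after inlining it. We never write the second version by hand; HotSpot decides on its own whether to inline, based on method size and call frequency.

```java
// Hypothetical example: what inlining a trivial getter looks like, expressed as Java source.
class InliningDemo {
    private int value = 21;

    int getValue() { return value; }

    // Before inlining: every iteration is a real call to getValue(),
    // which (in the interpreter) pushes and pops a stack frame.
    long sumWithCalls(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += getValue();
        }
        return sum;
    }

    // After inlining (conceptually): the getter's body is copied into the loop,
    // so no frame is created for it.
    long sumInlined(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += this.value;
        }
        return sum;
    }
}
```

On HotSpot, the inlining decisions for a real run can be printed with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`.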

2. Sharing data between stack frames

In the conceptual model the frames of different methods are completely independent, but most JVM implementations optimize this by letting two frames partially overlap. This is mainly used for passing arguments: the part of the caller's operand stack that holds the arguments overlaps the callee's local variable table. This saves space and, more importantly, avoids copying the arguments on every call, because they can be used in place.

IV. How the VM creates an object

When the JVM encounters a new instruction, it creates the object in the following steps:

1. Class loading check

The virtual machine first checks whether the class has already been loaded into the method area. If not, it loads the class first.

2. Allocate memory

Next, the VM allocates memory for the object on the heap, using one of two methods: pointer collision (bump-the-pointer) or a free list. Pointer collision works when the heap is tidy, with used memory on one side and free memory on the other: allocation just moves a pointer forward by the size of the object. A free list is needed when used and free areas are interleaved, which typically happens when the garbage collector does not compact the heap; the VM keeps a list of free blocks and picks one that is large enough.

Allocation must also be safe when several threads allocate at the same time. Two approaches are common: CAS (compare-and-swap) with retry, and thread-local allocation buffers (TLABs), where each thread reserves a private chunk of the heap in advance and allocates inside it without synchronization, somewhat like ThreadLocal.
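The two allocation methods can be compared on a toy heap modeled as a range of integer addresses. This is a simplified sketch, not HotSpot's allocator: the class names are hypothetical, sizes are abstract units, and the free list uses first-fit, which is only one possible search policy.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy allocators over addresses [0, capacity); returns -1 when allocation fails.
class AllocatorDemo {

    // Pointer collision: used memory is contiguous, so allocation just bumps one pointer.
    static class BumpAllocator {
        private final int capacity;
        private int top = 0; // boundary between used and free memory

        BumpAllocator(int capacity) { this.capacity = capacity; }

        int allocate(int size) {
            if (top + size > capacity) return -1; // a real VM would trigger a GC here
            int address = top;
            top += size;
            return address;
        }
    }

    // Free list: free blocks are scattered, so keep a list and search it (first-fit here).
    static class FreeListAllocator {
        private final List<int[]> freeBlocks = new ArrayList<>(); // each entry: {start, size}

        void addFreeBlock(int start, int size) { freeBlocks.add(new int[] {start, size}); }

        int allocate(int size) {
            for (Iterator<int[]> it = freeBlocks.iterator(); it.hasNext(); ) {
                int[] block = it.next();
                if (block[1] >= size) {
                    int address = block[0];
                    block[0] += size; // shrink the block from the front
                    block[1] -= size;
                    if (block[1] == 0) it.remove();
                    return address;
                }
            }
            return -1; // no single free block is large enough
        }
    }
}
```

Note how the free-list version can fail even when the total free space is large enough, because no single block is big enough; a tidy heap with a bump pointer never has that problem.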
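The first approach, CAS plus retry, can be sketched with `AtomicInteger.compareAndSet` guarding a bump pointer. The class name is hypothetical and the real VM does this on raw memory addresses, but the retry loop has the same shape: read the pointer, compute the new value, and only commit if no other thread moved the pointer in between.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical thread-safe bump-pointer allocator using CAS with retry.
class CasBumpAllocator {
    private final int capacity;
    private final AtomicInteger top = new AtomicInteger(0);

    CasBumpAllocator(int capacity) { this.capacity = capacity; }

    int allocate(int size) {
        while (true) {
            int current = top.get();
            int next = current + size;
            if (next > capacity) return -1;          // out of space
            if (top.compareAndSet(current, next)) {  // succeeds only if no other thread moved 'top'
                return current;
            }
            // another thread won the race: loop, re-read 'top', and retry
        }
    }

    int used() { return top.get(); }
}
```

A TLAB avoids even this CAS on the common path: a thread would use a loop like this only to grab a whole buffer, then bump a private pointer inside that buffer without any synchronization.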

3. Initialize memory space

The virtual machine then initializes the allocated memory (excluding the object header) to zero values: int fields become 0, boolean fields become false, reference fields such as String become null, and so on. This is why instance fields can be used even if we never assign them an initial value.
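A quick way to observe this zeroing is a class whose fields are never assigned. The class name is hypothetical; the values printed are the defaults guaranteed by the Java language. Note that this applies to fields only: local variables are not zeroed, and javac rejects reading one before it is assigned.

```java
// Hypothetical class with no field initializers and no constructor body.
class DefaultValuesDemo {
    int count;
    long total;
    boolean flag;
    double ratio;
    char letter;
    String name;   // any reference type starts as null
    Object ref;

    public static void main(String[] args) {
        DefaultValuesDemo d = new DefaultValuesDemo();
        System.out.println(d.count + " " + d.total + " " + d.flag + " "
                + d.ratio + " " + (int) d.letter + " " + d.name);
        // prints: 0 0 false 0.0 0 null
    }
}
```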

4. Set up the object header

Next, the VM sets up the object header, which records which class the object is an instance of (a pointer to its class metadata), the object's hash code, its GC generational age, and so on.

5. Object initialization

From the VM's point of view the object now exists, but from the program's point of view construction has only just begun, because every field still holds its zero value. In this step the constructor runs and initializes the object as the developer intended, producing a usable object.
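The gap between step 3 (zeroing) and step 5 (running the constructor) is observable in Java. In the sketch below (hypothetical class names), the superclass constructor calls an overridden method before the subclass constructor has assigned its field, so the method sees the zero value. Calling overridable methods from a constructor is a known anti-pattern; it is used here only to expose the intermediate state.

```java
// Hypothetical classes showing that fields are zeroed before the constructor assigns them.
class InitOrderBase {
    int seenBySuper;

    InitOrderBase() {
        // Runs before InitOrderDerived's field initializers, so describe() sees size == 0.
        seenBySuper = describe();
    }

    int describe() { return -1; }
}

class InitOrderDerived extends InitOrderBase {
    int size = 42; // assigned in InitOrderDerived's constructor, after InitOrderBase() returns

    @Override
    int describe() { return size; }
}
```

After `new InitOrderDerived()`, `seenBySuper` is 0 while `size` is 42.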

V. Object memory layout, access, and liveness

1. Memory layout of objects

1) Object header
(1) Runtime data of the object itself (the mark word): hash code, GC generational age, lock state flags, the lock held by a thread, the biased-lock thread ID, etc.
(2) Type pointer: identifies the class the object belongs to.
(3) For an array, the header also records the length of the array.
2) Instance data
3) Alignment padding (optional)

2. Object access location

A reference stored in the local variable table on the stack can locate the object on the heap in two ways: through a handle or through a direct pointer.

Handles

With handles, the heap maintains a handle pool. The reference points to a handle, and the handle holds two pointers: one to the object's instance data in the heap and one to the object's type data in the method area.

Direct pointers

With direct pointers, the reference in the local variable table points directly at the object in the heap, and the object itself holds the pointer to its type data. HotSpot uses direct pointers.

3. Determining whether an object is alive

Reference counting algorithm

Each time an object is referenced, its reference count is incremented by one; each time a reference to it is released, the count is decremented by one. When an object's count drops to 0, the object can be reclaimed. The drawback of reference counting is circular references: two objects that have no external references but refer to each other keep each other's count above zero, so neither can be reclaimed.
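The cycle problem is easy to reproduce with a toy reference-counting scheme. This is an illustration only (HotSpot does not use reference counting); the class and its `retain`/`release`/`setField` methods are hypothetical.

```java
// Hypothetical toy object managed by reference counting; it can hold one outgoing reference.
class RefCounted {
    final String name;
    int refCount = 0;
    RefCounted field;

    RefCounted(String name) { this.name = name; }

    static void retain(RefCounted o) {
        if (o != null) o.refCount++;
    }

    static void release(RefCounted o) {
        if (o == null) return;
        if (--o.refCount == 0) {
            // count reached zero: "free" the object and release what it points to
            RefCounted child = o.field;
            o.field = null;
            release(child);
        }
    }

    void setField(RefCounted target) {
        retain(target);  // retain first so self-assignment is safe
        release(field);
        field = target;
    }
}
```

Usage: create A and B, each retained once by a "local variable", point them at each other, then release both locals. Both counts stay at 1 even though nothing outside the pair can reach them, so neither is ever freed.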

Reachability analysis (root reachability)

The basic idea is to start from a set of objects called GC Roots and search downward along references; the path followed is called a reference chain. An object connected to a GC Root, directly or indirectly, through a reference chain is called reachable and stays alive; an object with no such chain is unreachable and can be reclaimed.

Objects that can act as GC Roots include:
1) objects referenced from local variable tables in VM stacks;
2) objects referenced by static variables;
3) objects referenced by constants;
4) objects referenced from native method stacks;
5) all objects held by synchronization locks (synchronized).
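The marking step can be sketched as a graph traversal over a toy object graph, where object names stand in for objects and the map stands in for their reference fields (all names here are hypothetical). Unlike reference counting, a cycle that is not connected to any root is simply never visited, so it is correctly treated as garbage.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical mark phase: returns every object reachable from the given roots.
class ReachabilityDemo {
    static Set<String> mark(Map<String, List<String>> graph, Collection<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            String obj = stack.pop();
            if (!marked.add(obj)) continue; // already visited
            for (String ref : graph.getOrDefault(obj, Collections.emptyList())) {
                stack.push(ref);            // follow each outgoing reference
            }
        }
        return marked; // everything not in this set is unreachable
    }
}
```

With root -> A -> B and a separate cycle C <-> D, marking from "root" returns {root, A, B}; C and D are unreachable despite referencing each other.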

The finalize method

Is an object always collected as soon as it is no longer reachable from the roots? Not necessarily. Object defines a finalize method, which is called before an object that overrides it is reclaimed. An object can rescue itself inside finalize (for example, by storing this in a static field), but finalize runs at most once per object, so the rescue does not work a second time. In addition, finalize runs on a low-priority thread, so there is no guarantee that it runs promptly. It is unreliable, and we should not rely on it to keep objects alive; it has been deprecated since Java 9.

VI. Four kinds of object references

1. Strong references

Objects created with new and assigned to a variable are held by strong references. As long as an object is reachable from GC Roots through strong references, it will not be reclaimed; the VM will throw an OutOfMemoryError rather than collect it.

2. Soft references (SoftReference)

Objects reachable only through soft references are reclaimed before an OutOfMemoryError occurs: when memory runs low, the GC clears them first, and only if there is still not enough memory after that is an OutOfMemoryError thrown.

A typical use is caching. When loading large images, for example, we can keep them in memory through soft references to speed up later access while still allowing them to be reclaimed when memory is tight.

3. Weak references (WeakReference)

An object reachable only through weak references survives only until the next garbage collection: when a GC occurs, such objects are reclaimed regardless of whether memory is sufficient.

Weak references are widely used. For example, in the implementation of ThreadLocal, the keys of ThreadLocalMap (the ThreadLocal objects) are held through weak references. Another typical use is fixing Handler memory leaks on Android. If we define a Handler as an anonymous inner class inside an Activity, it holds an implicit reference to the enclosing Activity; when the Activity exits while the Handler is still alive, the Activity cannot be reclaimed. The fix is to make the Handler a static nested class that holds only a weak reference to the Activity, so it can still access the Activity's fields and methods without preventing the Activity from being reclaimed.

4. Phantom references (PhantomReference)

Phantom references do not affect an object's lifetime at all, and get() on a phantom reference always returns null, so the object cannot be obtained through it. Their purpose is to receive a notification when the object is reclaimed: a phantom reference registered with a ReferenceQueue is enqueued after its referent is collected, which can be used to track collection and trigger cleanup.
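The java.lang.ref classes can be tried out directly. In the sketch below (class and method names are hypothetical), the two static methods check properties that always hold: a phantom reference never returns its referent, and a weak reference cannot be cleared while the object is still strongly reachable. The `main` method shows the GC-dependent behavior, which is only likely, not guaranteed, because `System.gc()` is merely a request.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceTypesDemo {
    // Always true: a phantom reference never hands back its referent.
    static boolean phantomGetIsNull(Object referent) {
        PhantomReference<Object> phantom = new PhantomReference<>(referent, new ReferenceQueue<>());
        return phantom.get() == null;
    }

    // Always true: while a strong reference is still in use, a GC cannot clear the weak reference.
    static boolean weakSurvivesWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc();
        return weak.get() == strong; // 'strong' is used here, so the object stayed strongly reachable
    }

    public static void main(String[] args) throws InterruptedException {
        SoftReference<Object> soft = new SoftReference<>(new Object()); // cleared only when memory runs low
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> weak = new WeakReference<>(new Object(), queue); // no strong reference at all

        System.gc(); // only a request; the JVM may ignore it
        // If a collection did run, the weak referent is gone and 'weak' has been enqueued.
        System.out.println("weak enqueued: " + (queue.remove(1000) == weak));
        // Usually still set when memory is plentiful, but that is not guaranteed either.
        System.out.println("soft still set: " + (soft.get() != null));
    }
}
```

Each reference gets its own referent on purpose: if one object were held by both a soft and a weak reference, the soft reference would keep it softly reachable and the weak reference would not be cleared.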

VII. Object allocation strategy

1. Are objects allocated on the stack or on the heap?

When reading about object creation, we often see the statement that almost all objects are created on the heap. The exceptions come from escape analysis: if an object created in a method does not escape, it can be allocated on the stack instead. There are two kinds of escape: method escape (the object is passed out of the method, for example returned or stored in a field) and thread escape (the object can be accessed by other threads). If an object neither leaves the method nor can be accessed by other threads, it does not escape. Allocating such objects on the stack is much cheaper and needs no garbage collection, because the stack belongs to the thread and the object's space is freed when the method's frame is popped. (HotSpot does not literally place whole objects on the stack; it achieves the same effect through scalar replacement, which breaks the object into its fields and keeps them as local variables.)
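The difference between escaping and non-escaping objects is visible in ordinary code. In this sketch (hypothetical names), both methods compute the same value, but only the first keeps its `Point` confined to the method. Whether HotSpot actually eliminates that allocation is decided by the JIT at run time, so the comments describe what is possible, not what is guaranteed.

```java
// Hypothetical example contrasting a non-escaping and an escaping allocation.
class EscapeDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static Point lastPoint; // anything stored in a static field escapes

    // No escape: after JIT compilation, HotSpot may replace the Point with two
    // local ints (scalar replacement), so no heap allocation happens.
    static int lengthSquaredNoEscape(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    // Escapes: the object is published through a static field and can be seen
    // by other methods and threads, so it must live on the heap.
    static int lengthSquaredEscapes(int x, int y) {
        Point p = new Point(x, y);
        lastPoint = p;
        return p.x * p.x + p.y * p.y;
    }
}
```

Escape analysis is on by default in modern HotSpot; running a hot loop over the first method with `-XX:-DoEscapeAnalysis` disables it, which is one way to compare allocation rates.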

2. Storage location of large objects

The JVM decides where in the heap to put an object partly based on whether it is a large object: a large object is placed directly in the old generation; otherwise it goes to Eden. Large objects are typically long strings or large arrays. By default, the size ratio between the young generation and the old generation is usually 1:2, and within the young generation Eden : From : To = 8:1:1.
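The ratios above correspond to HotSpot's `-XX:NewRatio` (old/young, default 2) and `-XX:SurvivorRatio` (Eden/one survivor, default 8). The sketch below (hypothetical class, sizes in MB) does the arithmetic for a fixed heap. Real JVMs align sizes and some collectors, such as G1, resize generations adaptively, so these numbers are a simplification.

```java
// Hypothetical helper: splits a heap using NewRatio and SurvivorRatio semantics.
class HeapSplit {
    // Returns {young, old, eden, oneSurvivor}, all in the same unit as heapMb.
    static long[] split(long heapMb, int newRatio, int survivorRatio) {
        long young = heapMb / (newRatio + 1);          // young : old = 1 : newRatio
        long old = heapMb - young;
        long survivor = young / (survivorRatio + 2);   // Eden : From : To = survivorRatio : 1 : 1
        long eden = young - 2 * survivor;
        return new long[] {young, old, eden, survivor};
    }
}
```

For a 300 MB heap with the defaults, `split(300, 2, 8)` gives a 100 MB young generation (80 MB Eden and two 10 MB survivor spaces) and a 200 MB old generation.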

3. Allocation rules

Objects are allocated in Eden first: newly created objects go to Eden.

Space allocation guarantee: if, during a GC, the surviving objects do not fit in the From or To survivor space, they are moved directly into the old generation.

How long-lived objects enter the old generation: a new object is created in Eden. After a Minor GC, the surviving objects in Eden are moved into a survivor space (From or To), and each object's age is recorded in its object header. Every time an object survives another garbage collection, its age increases by 1 and it is copied to the other survivor space, so it moves back and forth between From and To. When its age reaches the configured threshold, it is moved to the old generation. From and To are two areas of equal size, and collection between them uses the copying algorithm, which is efficient but uses only half of the survivor space at any time.
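The aging and promotion process can be simulated in a few lines. This is a toy model with hypothetical names: the caller decides which objects are still live at each Minor GC, and the threshold plays the role of HotSpot's `-XX:MaxTenuringThreshold` (the age is stored in 4 bits of the object header, so it cannot exceed 15).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of survivor aging and promotion to the old generation.
class AgingDemo {
    static final class Obj {
        final String name;
        int age = 0;
        Obj(String name) { this.name = name; }
    }

    List<Obj> survivor = new ArrayList<>(); // the survivor space currently in use
    final List<Obj> old = new ArrayList<>();
    final int tenuringThreshold;

    AgingDemo(int tenuringThreshold) { this.tenuringThreshold = tenuringThreshold; }

    // 'live' = objects from Eden and the current survivor space still reachable at this Minor GC.
    void minorGc(List<Obj> live) {
        List<Obj> to = new ArrayList<>();
        for (Obj o : live) {
            o.age++;                                    // survived one more collection
            if (o.age >= tenuringThreshold) old.add(o); // old enough: promote
            else to.add(o);                             // otherwise copy into the other survivor space
        }
        survivor = to; // the spaces swap roles; the previous one is now empty
    }
}
```

With a threshold of 3, an object that stays live is in a survivor space after the first and second Minor GC and is promoted to the old generation by the third.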

VIII. Introduction to garbage collection

1. Generational collection theory: most modern garbage collectors are based on the following generational hypotheses:

Most objects are short-lived;

The more garbage collections an object has survived, the less likely it is to become garbage;

Based on these observations, the heap is divided into a young generation and an old generation, each with its own garbage collection mechanism.

2. Types of GC

Young generation collection, also called Young GC or Minor GC, collects garbage only in the young generation. Young-generation collectors generally use the copying algorithm.

Old generation collection, also called Old GC or Major GC, has no single agreed definition: some use the term for collecting only the old generation, others for collecting the whole heap. Old-generation collectors usually use the mark-sweep or mark-compact algorithm.

Full heap collection, or Full GC, collects the entire heap and the method area (note that the method area is included).

3. Stop-the-world (STW)

Stop-the-world means that all user threads are suspended while the garbage collection thread works; when the garbage collection thread finishes, user threads resume. STW ensures that garbage collection is effective: if user threads kept running, they would keep creating garbage while it is being collected. The drawback of STW is equally obvious: the pauses cause noticeable lag for users.

On the other hand, if garbage were collected without ever stopping user threads, objects might be created faster than they are reclaimed, eventually causing an OutOfMemoryError.

IX. Garbage collection algorithms

1. Copying algorithm

The copying algorithm divides the space into two equal areas, A and B, and uses only one of them at a time. The garbage collector marks the surviving objects in area A, copies them into area B, and then clears all of area A at once. Copying is very efficient when few objects survive, because only a small amount of data has to be moved and everything else is discarded together; this makes it especially suitable for the young generation. It also has the clear advantage of producing no memory fragmentation. Its drawback is just as obvious: space utilization is low, since only half of the space is usable.
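A semispace collection can be sketched on a toy heap where each slot holds an object name. To keep it short, this hypothetical version takes the set of live objects as input instead of tracing them from roots (a real copying collector, such as one using Cheney's algorithm, discovers them while copying). It shows the two properties described above: survivors end up packed together, and the whole From space is wiped in one step.

```java
import java.util.Arrays;
import java.util.Set;

// Hypothetical semispace copy: copies live objects from 'from' into a fresh 'to' space.
class CopyingGcDemo {
    static String[] collect(String[] from, Set<String> live) {
        String[] to = new String[from.length]; // the other half, same size
        int free = 0;                          // bump pointer in the To space
        for (String obj : from) {
            if (obj != null && live.contains(obj)) {
                to[free++] = obj;              // copy the survivor; survivors end up contiguous
            }
        }
        Arrays.fill(from, null);               // clear the whole From space at once
        return to;                             // the spaces swap roles
    }
}
```

Copying {a, _, b, c, _, d} with live objects {b, d} yields {b, d, _, _, _, _}: no fragmentation, but the To space had to be as large as the From space.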

Optimization with the Eden area:

If the young generation were simply split into two halves A and B, the plain copying algorithm would waste half of it, so the Eden area is introduced and the young generation is divided as Eden : From : To = 8:1:1. New objects are put directly into Eden, and since most of them are short-lived, very few survive a collection. The survivors are copied into a survivor space (From or To), and the standard copying algorithm runs between From and To. With this layout about 90% of the young generation is usable, and only one survivor space (10%) is left idle.

2. Mark-sweep algorithm

Mark-sweep is simple: the garbage collector first marks the objects that need to be collected and then reclaims them in place. It is easy to implement, but it fragments memory. With heavy fragmentation, allocating a large object may fail to find a contiguous free area even when enough total memory is free, which triggers another GC prematurely, so its performance is not stable.
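Fragmentation is easy to see on a toy heap of fixed-size slots (hypothetical names; the marked set is given rather than traced). Sweeping frees dead slots where they are, so the free space is scattered between survivors.

```java
import java.util.Set;

// Hypothetical mark-sweep on a slot array: dead objects become null, survivors do not move.
class MarkSweepDemo {
    static String[] sweep(String[] heap, Set<String> marked) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null && !marked.contains(result[i])) {
                result[i] = null; // reclaim in place, nothing is moved
            }
        }
        return result;
    }

    // Longest run of contiguous free slots: the largest object that could still be placed.
    static int largestFreeRun(String[] heap) {
        int best = 0, run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }
}
```

Sweeping {a, b, c, d, e, f} with b, d, and f marked leaves three free slots, yet the largest contiguous free run is only one slot, so a two-slot object cannot be placed.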

3. Mark-compact algorithm

The garbage collector first marks the live objects, then moves all surviving objects toward one end of the region, and finally clears everything beyond the last surviving object. Because objects are moved, every reference to them has to be updated, which makes this slower and lengthens user-thread pauses; the advantage is that no memory fragmentation is produced.
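On the same kind of toy slot heap, a sliding compaction moves survivors to the low end and records where each one went. The forwarding map returned here (a hypothetical simplification) is exactly the information a real collector needs to update every reference to a moved object, which is where the extra cost comes from.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical mark-compact: slides marked objects to the start of the heap, in place.
class MarkCompactDemo {
    // Returns old index -> new index for every survivor (its forwarding address).
    static Map<Integer, Integer> compact(String[] heap, Set<String> marked) {
        Map<Integer, Integer> forwarding = new HashMap<>();
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked.contains(heap[i])) {
                forwarding.put(i, free);
                heap[free++] = heap[i]; // free <= i, so this never overwrites an unvisited slot
            }
        }
        for (int i = free; i < heap.length; i++) {
            heap[i] = null;             // everything after the survivors is one free block
        }
        return forwarding;
    }
}
```

Compacting {a, b, c, d} with b and d marked gives {b, d, _, _}, with b forwarded from slot 1 to 0 and d from slot 3 to 1.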

X. Common garbage collectors in the JVM

Common young-generation collectors are Serial, Parallel Scavenge, and ParNew.

Common old-generation collectors are Serial Old, Parallel Old, and CMS.

These collectors work in pairs, as their names suggest (Serial with Serial Old, Parallel Scavenge with Parallel Old, and ParNew typically with CMS). There is also the G1 collector, which covers both the young and old generations.
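To see which of these collectors a running JVM is using, the standard `java.lang.management` API exposes one MXBean per collector. The class name below is hypothetical. The bean names depend on the collector selected (for example, G1, the default since JDK 9, reports "G1 Young Generation" and "G1 Old Generation"), so check the output rather than hard-coding names.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical utility: lists the garbage collectors active in this JVM.
class ListCollectors {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections");
        }
    }
}
```

Running it with `-XX:+UseSerialGC`, `-XX:+UseParallelGC`, or `-XX:+UseG1GC` shows how the reported collector pair changes.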

1. Serial and Serial Old

Both are single-threaded collectors suited to single-core CPUs. Serial collects the young generation with the copying algorithm; Serial Old collects the old generation with the mark-compact algorithm.

2. Parallel Scavenge and Parallel Old

These two are parallel multithreaded collectors, where "parallel" means that multiple garbage collection threads work at the same time. Parallel Scavenge, in the young generation, also uses the copying algorithm; Parallel Old, in the old generation, uses the mark-compact algorithm.

3. ParNew

A young-generation collector similar to Parallel Scavenge: it is also a parallel multithreaded collector, and it uses the copying algorithm.

4. CMS

CMS stands for Concurrent Mark Sweep, an old-generation collector that is both parallel and concurrent. Parallel means multiple garbage collection threads work at the same time; concurrent means garbage collection threads run alongside user threads, so some phases of collection do not require suspending user threads. By contrast, the collectors described above, whether single-threaded or multithreaded, suspend user threads while collecting.

CMS workflow:

First comes the initial mark phase, which suspends user threads and marks only the objects directly referenced by GC Roots. Next comes concurrent marking, which traces further down the reference chains while garbage collection threads and user threads run at the same time. This is followed by the remark phase, which suspends user threads again to correct marks that changed while user threads were running. Finally, in the concurrent sweep phase, user threads and garbage collection threads again run at the same time.

CMS characteristics

Its advantages are that mark-sweep is simple and efficient, and because some phases run concurrently with user threads, pauses are short and the user experience is good. Its disadvantages are memory fragmentation and floating garbage: during the final concurrent sweep, user threads keep running and create new garbage that cannot be reclaimed in the current cycle; this is called floating garbage.

5. G1

G1, short for Garbage First, is another parallel and concurrent garbage collector. It covers both the young and old generations and divides the heap into many regions of equal size. Each region may act as Eden, Survivor, Old, or Humongous, where Humongous regions store large objects. Between two regions, G1 works like a copying collector (live objects are evacuated from one region into another); viewed as a whole, it behaves like mark-compact. Neither produces memory fragmentation.

G1 workflow:

Initial mark, concurrent mark, final mark, and finally the live data counting and evacuation phase. The first three phases are largely the same as in CMS; in the last phase G1 selects the regions whose collection yields the most benefit, collects them, and suspends user threads while doing so.

XI. Constant pools and Strings

1. Constant pool

Since JDK 1.8, the runtime constant pool is in the method area while the string constant pool is in the heap; also since JDK 1.8, HotSpot implements the method area with the metaspace instead of the permanent generation.

The constant pools described for the virtual machine generally comprise the static constant pool and the runtime constant pool.

The static constant pool (the constant pool in the class file) stores literals and symbolic references to classes, fields, and methods.

The runtime constant pool is created when a class is loaded: the contents of the static constant pool are loaded into it, and when the class is resolved, symbolic references are replaced with direct references.

Symbolic reference: includes fully qualified names of classes, field names and descriptors, and method names and descriptors;

Direct reference: a pointer, offset, or handle that locates the target directly.

2. Creating a String

The String class is final, so it cannot be subclassed. Its characters are kept in a private final array (char[] value before Java 9, byte[] value since Java 9) that is never modified after construction and never exposed, so String is immutable. A String is typically created in one of the following ways:

1) String str = "aaa"; The JVM looks for "aaa" in the string constant pool. If it is there, the reference to it is returned; if not, "aaa" is created in the pool and its reference is returned.

2) String str = new String("aaa"); The JVM first checks the string constant pool for "aaa" and creates it there if it is missing. Then, because of the new operator, a new String object is always created on the heap with the same contents as the pooled "aaa", and the reference to this heap object is returned. So str == "aaa" is false.

3) String str = new String("aaa").intern(); intern() checks the string constant pool for "aaa" and returns the pooled reference (adding the string to the pool first if it is missing). Because both calls below return the same pooled instance, the == comparison returns true:

    String str1 = new String("abc").intern();
    String str2 = new String("abc").intern();
    System.out.println(str1 == str2); // true
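The three creation styles can be compared side by side in one runnable class (the class name is hypothetical). It contrasts reference equality (`==`) with content equality (`equals`) for a literal, a `new String`, and an interned copy.

```java
import java.util.Arrays;

// Hypothetical demo comparing literals, new String(...), and intern().
class InternDemo {
    // Returns {literal == heapCopy, literal.equals(heapCopy), literal == interned, str1 == str2}.
    static boolean[] compare() {
        String literal = "aaa";               // reference to the pooled "aaa"
        String heapCopy = new String("aaa");  // a distinct object on the heap
        String interned = heapCopy.intern();  // canonical pooled instance

        String str1 = new String("abc").intern();
        String str2 = new String("abc").intern();

        return new boolean[] {
            literal == heapCopy,        // false: different objects
            literal.equals(heapCopy),   // true: same characters
            literal == interned,        // true: intern() returned the pooled "aaa"
            str1 == str2                // true: both are the pooled "abc"
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare())); // [false, true, true, true]
    }
}
```

The practical takeaway is to compare string contents with `equals`; `==` is only reliable between strings known to come from the pool.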