This article was originally a single post, but it exceeded the word limit, so I had to split it into two parts. When I read blogs or study, what I learn tends to be scattered, with no sense of how it fits into a larger body of knowledge, and it is easy to forget afterwards. So I set up my own notes repository (one I maintain long-term; if you are interested, feel free to give it a star, which is a great motivation for me to keep writing). There I classify everything I learn day to day, which also makes it easy to review when I need it.

1. JVM runtime data areas

1.1 Runtime data areas

The areas described below are defined by the Java Virtual Machine Specification. They do not describe any particular virtual machine implementation.

The JVM's memory is divided into several runtime data areas: the program counter, the virtual machine stack, the native method stack, the heap, and the method area.

1.1.1 Program counter

The program counter occupies a small amount of memory and can be seen as a line-number indicator of the bytecode being executed by the current thread. The bytecode interpreter selects the next bytecode instruction to execute by changing the value of this counter. Branches, loops, jumps, exception handling, and so on all rely on the program counter.

  • The Java Virtual Machine Specification defines no OutOfMemoryError condition for the program counter;
  • The program counter is thread-private: each thread has its own, and its lifecycle matches that of the thread;
  • When a thread executes a Java method, the program counter holds the address of the bytecode instruction being executed. When it executes a native method, the counter value is undefined.

1.1.2 VM stack

The virtual machine stack is also thread-private, and its lifecycle matches that of the thread. The Java Virtual Machine Specification defines two exceptional conditions for it:

  • StackOverflowError is thrown when the stack depth requested by a thread exceeds the depth the virtual machine allows
  • OutOfMemoryError is thrown when the stack can be expanded dynamically but the expansion cannot obtain enough memory
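The first condition is easy to observe. In the minimal sketch below, every recursive call pushes another stack frame until the thread's VM stack is exhausted. The class and method names are only illustrative. The depth reached is not a fixed number: it depends on the thread stack size (for example, the `-Xss` setting) and on how large each frame is.

```java
public class StackDepthProbe {
    private static int depth = 0;

    // Each call pushes a new stack frame onto the current thread's VM stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Recurses until the VM stack overflows and returns how deep it got. */
    public static int probe() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // By the time we get here the frames have been unwound, so it is safe to continue.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Stack depth reached: " + probe());
    }
}
```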

The JVM uses a stack-based instruction set, so its interpreter works on an operand stack. Android's Dalvik virtual machine (DVM), by contrast, uses a register-based one.

The stack in this comparison is the virtual machine stack, which describes the memory model of Java method execution: each time a method is executed, the JVM creates a stack frame for it on the virtual machine stack.

1.1.2.1 Stack frame

Stack frames are the data structures that support method invocation and execution in the virtual machine. Each time a thread executes a method, it creates a stack frame for that method, so a thread's stack holds multiple frames. Each frame contains a local variable table, an operand stack, a dynamic link, a return address, and so on.

Local variable table

The local variable table stores variable values: the parameters passed to a method and the local variables created inside it. When Java source is compiled into a class file, the max_locals item in the method's Code attribute records the maximum local variable table capacity the method needs.

Unlike class and instance fields, local variables receive no default value. A local variable must be explicitly assigned before it is read, and javac rejects code that reads it first.

The operand stack

The operand stack, also called the operation stack, is a last-in, first-out (LIFO) stack.

As with the local variable table, the maximum depth of the operand stack is written into the max_stack item of the method's Code attribute at compile time. Stack elements can be of any Java data type: values of type long and double take two units of stack depth, and all other types take one.

While a method executes, its bytecode instructions push values onto the operand stack and pop them off.
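To see how the local variable table and the operand stack cooperate, here is a hypothetical mini-interpreter. It understands only a handful of JVM-style instructions, and it is a teaching sketch, not how HotSpot's interpreter is built. The `locals` array stands in for the local variable table and the `Deque` stands in for the operand stack. The sample program is roughly what javac emits for `static int f(int a, int b) { return (a + b) * 2; }`, whose max_stack would be 2.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class OperandStackSketch {
    /** Executes a tiny, hypothetical instruction subset; locals plays the local variable table. */
    static int execute(String[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>(); // the operand stack
        for (String insn : code) {
            String[] parts = insn.split(" ");
            switch (parts[0]) {
                case "iload":   // push a local variable onto the operand stack
                    stack.push(locals[Integer.parseInt(parts[1])]);
                    break;
                case "iconst":  // push a constant
                    stack.push(Integer.parseInt(parts[1]));
                    break;
                case "iadd": {  // pop two ints, push their sum
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left + right);
                    break;
                }
                case "imul": {  // pop two ints, push their product
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left * right);
                    break;
                }
                case "ireturn": // the return value is whatever is on top of the stack
                    return stack.pop();
                default:
                    throw new IllegalArgumentException("unknown instruction: " + insn);
            }
        }
        throw new IllegalStateException("method ended without ireturn");
    }

    public static void main(String[] args) {
        String[] code = {"iload 0", "iload 1", "iadd", "iconst 2", "imul", "ireturn"};
        System.out.println(execute(code, new int[] {3, 4})); // 14
    }
}
```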

Dynamic link

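Each stack frame holds a reference to the method it belongs to in the run-time constant pool. This reference supports dynamic linking during method calls, that is, converting symbolic references into direct references at run time.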

The return address

Once a method starts executing, there are only two ways for it to exit:

  • Normal exit: the method's code completes normally, or the method reaches a return bytecode instruction (such as return) without throwing an exception
  • Exception exit: an exception is raised during execution and is not handled inside the method body, so the method exits without returning a value to its caller

Either way, after the method exits, control must return to the point where the method was called before the program can continue. The return address in the stack frame helps restore the execution state of the calling method.

1.1.3 Native method stack

The native method stack works much like the virtual machine stack, except that it serves native methods rather than Java methods. Some virtual machines, such as HotSpot, merge the two into one.

1.1.4 heap

The Java heap is the largest block of memory managed by the JVM. Its sole purpose is to hold object instances, and almost all objects are allocated here. That makes it the main area managed by the garbage collector (GC), which is why it is sometimes called the GC heap. It is shared by all threads, so objects allocated here must be made thread-safe if multiple threads access them.

Based on how long objects survive, the heap can be divided into the young generation and the old generation, and the young generation is further divided into Eden and Survivor spaces. Objects in different regions have different lifecycles, so each region can use a garbage collection algorithm suited to it, making collection more targeted and efficient.

1.1.5 method area

Method area: mainly stores data loaded by the JVM, such as class information (version, fields, methods, interfaces), constants, static variables, and code compiled by the just-in-time compiler. Like the heap, it is shared by all threads.

The runtime constant pool is also in the method area and is used to hold the various literal and symbolic references generated at compile time.

2. Objects in the HotSpot virtual machine

The details of how objects are created, laid out, and accessed only make sense when discussed for a specific virtual machine. This section uses the most widely used virtual machine, HotSpot, and the most commonly used memory area, the Java heap, as examples. It walks through how HotSpot allocates, lays out, and accesses objects in the Java heap.

The objects discussed below do not include arrays or Class objects.

2.1 Object Creation

When the Java virtual machine encounters a new bytecode instruction, it first checks whether the instruction's argument can locate a symbolic reference to a class in the constant pool. It also checks whether the class represented by that symbolic reference has been loaded, resolved, and initialized. If not, the corresponding class loading process must be performed first.

Once the class loading check passes, the virtual machine allocates memory for the new object. The amount of memory an object needs is fully determined once its class is loaded, so allocating space for an object amounts to carving a block of that size out of the Java heap.

Suppose the Java heap is perfectly regular: all used memory sits on one side, all free memory on the other, and a pointer marks the boundary. Allocation then simply moves the pointer toward the free side by the size of the object. This is called bump-the-pointer allocation (also known as pointer collision).

If the heap is not regular and used and free memory are interleaved, the virtual machine must maintain a list recording which blocks are free. At allocation time it finds a large enough block in the list, assigns it to the object, and updates the list. This is called free-list allocation.

Whether the Java heap is regular depends on whether the garbage collector in use can compact memory.
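The two allocation schemes can be contrasted with a small simulation. In this minimal sketch, addresses are plain `long` values rather than real memory, and the free list uses a first-fit policy. First fit is an assumption made here for simplicity; real collectors use more refined strategies. The last allocation fails even though 32 bytes are free in total, which is exactly the fragmentation problem of a non-regular heap.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class AllocationSketch {
    /** Bump-the-pointer allocation: used memory on one side, free memory on the other. */
    public static final class BumpAllocator {
        private long top;          // boundary between used and free memory
        private final long limit;  // end of the heap

        public BumpAllocator(long start, long limit) {
            this.top = start;
            this.limit = limit;
        }

        /** Returns the new object's address, or -1 if there is no room left. */
        public long allocate(long size) {
            if (top + size > limit) return -1;
            long address = top;
            top += size; // move the pointer toward free space by the object size
            return address;
        }
    }

    /** Free-list allocation (first fit) for a heap where used and free blocks are interleaved. */
    public static final class FreeListAllocator {
        private final List<long[]> freeBlocks = new ArrayList<>(); // each entry: {start, size}

        public void addFreeBlock(long start, long size) {
            freeBlocks.add(new long[] {start, size});
        }

        public long allocate(long size) {
            Iterator<long[]> it = freeBlocks.iterator();
            while (it.hasNext()) {
                long[] block = it.next();
                if (block[1] >= size) {
                    long address = block[0];
                    block[0] += size; // carve the object off the front of the block
                    block[1] -= size;
                    if (block[1] == 0) it.remove(); // block fully used: drop it from the list
                    return address;
                }
            }
            return -1; // no single block is large enough
        }
    }

    public static void main(String[] args) {
        BumpAllocator bump = new BumpAllocator(0, 64);
        System.out.println(bump.allocate(16) + " " + bump.allocate(24)); // 0 16

        FreeListAllocator list = new FreeListAllocator();
        list.addFreeBlock(0, 8);
        list.addFreeBlock(32, 16);
        list.addFreeBlock(100, 40);
        System.out.println(list.allocate(12)); // 32 (first block that fits)
        System.out.println(list.allocate(20)); // 100
        // 8 + 4 + 20 = 32 bytes are still free in total, yet no single block holds 30
        System.out.println(list.allocate(30)); // -1
    }
}
```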

Object creation happens very frequently in a virtual machine, and under concurrency even merely moving a pointer is not thread-safe. For example, while thread A is allocating, thread B may read the old pointer value and allocate overlapping memory. There are two solutions to this problem:

  1. Synchronize the allocation. In practice, the virtual machine uses CAS (compare-and-swap) with retry on failure to make the pointer update atomic
  2. Alternatively, give each thread a small chunk of the Java heap in advance, called a Thread Local Allocation Buffer (TLAB). Each thread allocates from its own TLAB, and synchronization is needed only when a TLAB is used up and a new one has to be allocated.
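Both solutions can be sketched on top of a simulated heap whose top pointer is an `AtomicLong`. Everything in this sketch is hypothetical: the class names, the fixed `bufferSize`, and the shortcut of allocating oversized objects directly in the shared heap. The part that reflects the text is the structure. Shared allocation loops on `compareAndSet` until it wins, and a TLAB bumps a private pointer with no synchronization, touching shared state only when it refills.

```java
import java.util.concurrent.atomic.AtomicLong;

public class TlabSketch {
    /** Shared heap whose top pointer is updated with CAS plus retry (solution 1). */
    public static final class SharedHeap {
        private final AtomicLong top;
        private final long limit;

        public SharedHeap(long start, long limit) {
            this.top = new AtomicLong(start);
            this.limit = limit;
        }

        public long casAllocate(long size) {
            while (true) {
                long current = top.get();
                long next = current + size;
                if (next > limit) return -1;                        // heap exhausted in this sketch
                if (top.compareAndSet(current, next)) return current;
                // Another thread moved top in between: loop and retry with the fresh value.
            }
        }
    }

    /** A per-thread allocation buffer (solution 2); each instance must be used by one thread only. */
    public static final class Tlab {
        private final SharedHeap heap;
        private final long bufferSize;
        private long top; // private bump pointer: no synchronization needed
        private long end;

        public Tlab(SharedHeap heap, long bufferSize) {
            this.heap = heap;
            this.bufferSize = bufferSize;
        }

        public long allocate(long size) {
            if (size > bufferSize) return heap.casAllocate(size); // too big for a TLAB
            if (top + size > end) {
                // Buffer used up: only now is shared state touched, via CAS.
                // The unused tail of the old buffer is simply abandoned in this sketch.
                long start = heap.casAllocate(bufferSize);
                if (start < 0) return -1;
                top = start;
                end = start + bufferSize;
            }
            long address = top;
            top += size;
            return address;
        }
    }

    public static void main(String[] args) {
        SharedHeap heap = new SharedHeap(0, 1024);
        Tlab tlab = new Tlab(heap, 256);
        System.out.println(tlab.allocate(100) + " " + tlab.allocate(100) + " " + tlab.allocate(100)); // 0 100 256
    }
}
```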

After memory allocation completes, the virtual machine initializes all the allocated memory (excluding the object header) to zero. This is why instance fields can be used without explicit initialization and hold the zero value of their type. Next, the virtual machine fills in the object header with the necessary information, such as which class the object is an instance of, how to find the class's metadata, and the object's GC generational age.

After this work is done, a new object has been created from the virtual machine's perspective. From the Java program's perspective, however, object creation has only just begun: the constructor, that is, the <init>() method in the Class file, has not yet been executed.

2.2 Memory Layout of objects

In the HotSpot virtual machine, an object's storage layout in heap memory can be divided into three parts: the object header, the instance data, and alignment padding.

Object header: contains two kinds of information.

  • The first is the object's own runtime data, such as its hash code, GC generational age, lock state flags, the lock held by a thread, the biased thread ID, and the biased timestamp. This part is officially called the Mark Word.
  • The second is the type pointer, a pointer to the object's type metadata, which the Java virtual machine uses to determine which class the object is an instance of.

In addition, if the object is a Java array, the header must also record the length of the array.

Instance data: the information the object actually stores, that is, the contents of all the fields defined in the program code, whether inherited from a superclass or declared in the subclass.

Alignment padding: not necessarily present and has no special meaning; it only serves as a placeholder. HotSpot's automatic memory management system requires every object's starting address to be a multiple of 8 bytes, which means every object's size must be a multiple of 8 bytes. When the header plus the instance data is not a multiple of 8 bytes, padding fills the gap.
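The padding rule is just a round-up to the next multiple of 8, as the arithmetic sketch below shows. The header size in `main` is an assumption: 8 bytes of Mark Word plus a 4-byte compressed class pointer, typical of a 64-bit HotSpot with compressed class pointers enabled. Actual layouts also depend on field ordering and JVM flags, so treat the numbers as illustrative only.

```java
public class ObjectSizeSketch {
    static final int ALIGNMENT = 8; // HotSpot's required object alignment in bytes

    /** Rounds a raw object size up to the next multiple of 8 bytes. */
    static long alignUp(long rawSize) {
        return (rawSize + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        // Assumed header: 8-byte Mark Word + 4-byte compressed class pointer.
        long header = 8 + 4;
        long instanceData = 4 + 1;        // one int field and one boolean field
        long raw = header + instanceData; // 17 bytes
        long aligned = alignUp(raw);      // 24 bytes
        System.out.println(raw + " bytes + " + (aligned - raw) + " bytes padding = " + aligned + " bytes");
    }
}
```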

2.3 Object access

Java programs manipulate objects on the heap through reference data on the stack. The reference type in the Java Virtual Machine Specification only specifies a reference to an object. It does not define how that reference should locate and access the object in the heap, so the access method depends on the virtual machine implementation. There are currently two main approaches: handles and direct pointers.

Handles: part of the Java heap is set aside as a handle pool. A reference stores the address of the object's handle, and the handle in turn contains the addresses of the object's instance data and its type data.

Direct pointer: the reference stores the address of the object itself.

Each approach has its own advantages:

  • Handles: the reference stores a stable handle address. When an object is moved, which is very common during garbage collection, only the instance data pointer inside the handle changes; the reference itself does not need to change.
  • Direct pointers: they are faster because they save one pointer indirection. Objects are accessed very frequently in Java, so this saving adds up to a significant reduction in execution cost.

HotSpot mainly uses direct pointers for object access, although handles are also quite common across virtual machines and languages in general.
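The trade-off can be made concrete with a simulated heap, modelled here as a `Map` from address to object contents. All names are hypothetical. When a moving collector relocates the object, the handle-based reference keeps working after only the handle is patched. The direct reference goes stale until the collector updates every reference that held the old address.

```java
import java.util.HashMap;
import java.util.Map;

public class AccessSketch {
    /** Simulated heap: address -> object contents. */
    final Map<Integer, String> heap = new HashMap<>();

    /** A handle-pool entry pointing at the object's current instance data. */
    static final class Handle {
        int instanceAddress;

        Handle(int instanceAddress) {
            this.instanceAddress = instanceAddress;
        }
    }

    /** Handle access: reference -> handle -> instance data (two hops). */
    String readViaHandle(Handle reference) {
        return heap.get(reference.instanceAddress);
    }

    /** Direct pointer access: the reference is the object's address (one hop). */
    String readDirect(int reference) {
        return heap.get(reference);
    }

    /** A moving GC relocates the object; with handles, only the handle needs patching. */
    void moveObject(int from, int to, Handle handle) {
        heap.put(to, heap.remove(from));
        handle.instanceAddress = to;
    }

    public static void main(String[] args) {
        AccessSketch vm = new AccessSketch();
        vm.heap.put(100, "object data");
        Handle handleRef = new Handle(100); // what a handle-based VM stores in a reference
        int directRef = 100;                // what a direct-pointer VM stores in a reference

        vm.moveObject(100, 200, handleRef);
        System.out.println(vm.readViaHandle(handleRef)); // object data
        System.out.println(vm.readDirect(directRef));    // null: stale until the GC patches it
        directRef = 200;                                 // a direct-pointer GC must update every such reference
        System.out.println(vm.readDirect(directRef));    // object data
    }
}
```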

2.4 Interview questions

2.4.1 Describing the process of creating an object

Object creation in Java consists of two phases: class initialization and class instantiation. Executing new is only one way of creating an object, at one particular moment. When the new bytecode instruction is executed, the JVM first checks whether the class has been initialized, and initializes it if not.

  • Class initialization: a phase in the class's lifecycle in which class (static) members are assigned their initial values
  • Class instantiation: the process of creating an instance of a class

Before the class is initialized, the JVM ensures that its loading and linking (verification, preparation, resolution) phases are complete.

  • Loading: the Java virtual machine finds the .class file, reads it into a byte stream, and creates a java.lang.Class object from that byte stream
  • Linking: the loaded class is verified, prepared, and resolved so that the JVM can execute it

When is a class loaded? The JVM specification does not mandate an exact moment, and the timing varies somewhat between virtual machine implementations.

  • Implicit loading: when an object is created with new, the system implicitly calls a ClassLoader to load the corresponding class into memory
  • Explicit loading: when code actively calls a method such as Class.forName(), the class is also loaded; this is called explicit loading

Here is the overall flow:

  • When the JVM encounters the new bytecode instruction, it checks whether the class has been initialized. If it has not, the class may not even have been loaded yet, in which case it is loaded, verified, prepared, and resolved first. Then the class is initialized.
  • If the class has already been initialized, instantiation of the object starts directly and the class's constructor is invoked

Initialization order:

  1. Superclass static variables and static initializer blocks
  2. Subclass static variables and static initializer blocks
  3. Superclass instance variables and instance initializer blocks
  4. Superclass constructor
  5. Subclass instance variables and instance initializer blocks
  6. Subclass constructor
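The order above can be checked with a small program that logs each step as it runs. The class names are illustrative. Each step's number in the log message matches the list above, and within each step, fields and blocks run in the order they appear in the source.

```java
import java.util.ArrayList;
import java.util.List;

public class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static String log(String message) {
        LOG.add(message);
        return message;
    }

    static class Parent {
        static String parentStatic = log("1. Parent static field");
        static { log("1. Parent static block"); }

        String parentField = log("3. Parent instance field");
        { log("3. Parent instance block"); }

        Parent() { log("4. Parent constructor"); }
    }

    static class Child extends Parent {
        static String childStatic = log("2. Child static field");
        static { log("2. Child static block"); }

        String childField = log("5. Child instance field");
        { log("5. Child instance block"); }

        Child() { log("6. Child constructor"); } // the implicit super() call runs steps 3-4 first
    }

    public static void main(String[] args) {
        new Child();
        LOG.forEach(System.out::println);
    }
}
```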

2.4.2 Trigger time of class initialization

A type is initialized only once within the same class loader. Initialization is triggered in the following six situations:

  1. When the virtual machine starts, it initializes the main class, the one containing the main method
  2. When an instruction such as new creates an object instance and the target class has not been initialized
  3. When an instruction reads or writes a static field or invokes a static method and the target class has not been initialized. Reading a static final compile-time constant does not count, because its value is inlined at compile time.
  4. When a subclass is initialized and its parent class has not been initialized, the parent class must be initialized first
  5. When a reflective call is made through the reflection API and the class has not been initialized
  6. When a java.lang.invoke.MethodHandle instance is resolved for the first time and refers to a static field, a static method, or a constructor (REF_getStatic, REF_putStatic, REF_invokeStatic, REF_newInvokeSpecial), the class that declares that member must be initialized if it has not been already
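Triggers 3 and 5, and the compile-time-constant exception, can be observed with static initializer blocks that write to a log. The nested class names are hypothetical. Reading `Constants.ANSWER` does not initialize `Constants`, because javac copies the constant 42 into the caller. Reading `Counter.value` does trigger initialization, and so does the later `Class.forName(String)` call, which initializes the class by default.

```java
import java.util.ArrayList;
import java.util.List;

public class InitTriggerDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Constants {
        static final int ANSWER = 42; // compile-time constant: inlined at the use site
        static { LOG.add("Constants initialized"); }
    }

    static class Counter {
        static int value = 1;         // ordinary static field: reading it executes getstatic
        static { LOG.add("Counter initialized"); }
    }

    static List<String> run() {
        LOG.add("read constant " + Constants.ANSWER);  // does NOT initialize Constants
        LOG.add("read static field " + Counter.value); // initializes Counter first
        try {
            // Reflection: Class.forName(String) initializes the class by default.
            Class.forName(InitTriggerDemo.class.getName() + "$Constants");
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return LOG;
    }

    public static void main(String[] args) {
        // Prints: read constant 42, Counter initialized, read static field 1, Constants initialized
        run().forEach(System.out::println);
    }
}
```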

2.4.3 Is there a problem with multithreading class initialization?

No. The JVM locks and synchronizes a class's <clinit>() method. In a multithreaded environment, if several threads try to initialize a class at the same time, only one thread executes that class's <clinit>() method, and all the other threads block until it finishes.
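A quick way to convince yourself is to start several threads that all touch a class whose static initializer is slow, then count how often the initializer actually ran. The sleep and the thread count are arbitrary choices for this sketch. The sleep only widens the window so that other threads arrive while initialization is still in progress. The count stays at 1, even if the method is called again, because a class is initialized only once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentInitDemo {
    static final AtomicInteger CLINIT_RUNS = new AtomicInteger();

    static class Slow {
        static final long CREATED_AT; // assigned in <clinit>, so not a compile-time constant
        static {
            CLINIT_RUNS.incrementAndGet();
            try {
                Thread.sleep(200); // other threads block on the class's initialization lock meanwhile
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            CREATED_AT = System.nanoTime();
        }
    }

    static int initializeFromManyThreads(int threadCount) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                long unused = Slow.CREATED_AT; // every thread triggers (or waits for) initialization
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return CLINIT_RUNS.get();
    }

    public static void main(String[] args) {
        System.out.println("<clinit> ran " + initializeFromManyThreads(8) + " time(s)"); // 1
    }
}
```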

2.4.4 Trigger time of class instantiation

  • Using the new keyword
  • Using Class.newInstance (deprecated since Java 9) or Constructor.newInstance
  • Using the clone method
  • Using the (de)serialization mechanism
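All four routes produce a new object, but they do not all run the class's own constructor. The sketch below counts constructor calls on a hypothetical `Point` class. It uses `Constructor.newInstance` rather than the deprecated `Class.newInstance`. Only `new` and reflection call `Point`'s constructor. `clone` copies the fields directly. Deserialization also skips `Point`'s constructor, because only the no-arg constructor of the first non-serializable superclass (here `Object`) runs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Constructor;

public class InstantiationDemo {
    static class Point implements Cloneable, Serializable {
        private static final long serialVersionUID = 1L;
        static int constructorCalls = 0; // static fields are not serialized
        final int x;

        Point(int x) {
            this.x = x;
            constructorCalls++;
        }

        @Override
        public Point clone() {
            try {
                return (Point) super.clone(); // field-by-field copy; no constructor runs
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);  // unreachable: Point implements Cloneable
            }
        }
    }

    private static Point serializeRoundTrip(Point p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Point) in.readObject(); // Point's constructor is not called
        }
    }

    static Point[] createAll() {
        try {
            Point a = new Point(1);                                          // 1. new
            Constructor<Point> ctor = Point.class.getDeclaredConstructor(int.class);
            Point b = ctor.newInstance(2);                                   // 2. reflection
            Point c = a.clone();                                             // 3. clone
            Point d = serializeRoundTrip(b);                                 // 4. deserialization
            return new Point[] {a, b, c, d};
        } catch (ReflectiveOperationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point[] points = createAll();
        System.out.println(points.length + " objects, " + Point.constructorCalls + " constructor calls"); // 4 objects, 2 constructor calls
    }
}
```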

2.4.5 Differences between the <clinit>() and <init>() methods

  • The <clinit>() method runs during class initialization. It initializes the class's static variables and executes its static initializer blocks, in the order in which they appear in the source file.
  • The <init>() method runs during class instantiation. Each constructor is compiled into an <init>() method, which initializes the instance variables and executes the instance initializer blocks in source order, and then runs the constructor body.

2.4.6 Can a class directly instantiate the corresponding object before initialization is complete?

Yes, if one of the class's static variables holds an instance of the class itself.

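public class Run {
    public static void main(String[] args) {
        new Person2();
    }
}

// Package-private so both classes fit in Run.java; in its own file Person2 can be public.
class Person2 {
    public static int value1 = 100;
    public static final int value2 = 200;    // compile-time constant: set from the ConstantValue attribute

    public static Person2 p = new Person2(); // creates an instance while Person2 is still being initialized
    public int value4 = 400;

    static {
        value1 = 101;
        System.out.println("1");
    }

    {
        value1 = 102;
        System.out.println("2");
    }

    public Person2() {
        value1 = 103;
        System.out.println("3");
    }
}
// Prints 2, 3, 1, 2, 3 (one per line)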

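Initializing Person2 runs its static initializers in source order. When it reaches the static field p, new Person2() runs the instance variable initializers, the instance initializer block (printing 2), and the constructor (printing 3), all while Person2's <clinit>() has not yet finished. Only afterwards does the static block print 1. The new Person2() in main then prints 2 and 3 again. So an instance is created in the middle of class initialization.

Therefore, instantiation does not necessarily start after initialization is complete; it can happen during initialization.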

2.4.7 Similarities and Differences between the initialization and instantiation processes of classes

  • Class initialization is a phase after the class has been loaded and linked. It executes the <clinit>() method, which initializes static variables, executes static initializer blocks, and so on
  • Class instantiation is the process of creating an object once the class is fully loaded into memory. It executes the <init>() method, which initializes instance variables, runs instance initializer blocks, and then runs the constructor body

2.4.8 How many times can an instance variable be assigned during object initialization?

  1. When the object is created, memory allocation sets the instance variable to its default value; this always happens
  2. The instance variable's own initializer assigns it once
  3. An instance initializer block assigns it once more
  4. The constructor assigns it again

So it can be assigned up to four times. See the example code:

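public class Person3 {
    public int value1 = 100;     // 2nd assignment (the 1st is the default value 0)

    {
        value1 = 102;            // 3rd assignment: instance initializer block
        System.out.println("2");
    }

    public Person3() {
        value1 = 103;            // 4th assignment: constructor body
        System.out.println("3");
    }
}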
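To actually observe all four values, including the default 0, each assignment can be routed through a helper that records the field's value just before it is overwritten. The class and helper names are hypothetical. The `history` list is declared before `value1` so that it is already initialized when `value1`'s initializer runs.

```java
import java.util.ArrayList;
import java.util.List;

public class AssignmentTrace {
    // Declared first so it exists before value1's initializer calls trace().
    private final List<Integer> history = new ArrayList<>();

    int value1 = trace(100); // field initializer

    {
        value1 = trace(102); // instance initializer block
    }

    AssignmentTrace() {
        value1 = trace(103); // constructor body
    }

    /** Records the value held before this assignment, then returns the new value. */
    private int trace(int next) {
        history.add(value1);
        return next;
    }

    /** Every value value1 has held, in order. */
    List<Integer> values() {
        List<Integer> all = new ArrayList<>(history);
        all.add(value1);
        return all;
    }

    public static void main(String[] args) {
        System.out.println(new AssignmentTrace().values()); // [0, 100, 102, 103]
    }
}
```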

3. Garbage collector and memory allocation strategy

Between Java and C++ stands a high wall built of dynamic memory allocation and garbage collection. People outside want to get in, and people inside want to get out. – Understanding the Java Virtual Machine

3.1 What is garbage

The program counter, the virtual machine stack, and the native method stack are created and destroyed together with their thread, so their memory is reclaimed naturally when a method or thread ends, and there is little to worry about.

The Java heap and the method area are much less predictable. Different implementation classes of an interface may need different amounts of memory, and different branches of a method may need different amounts of memory. Only at run time do we know exactly which objects, and how many of them, the program will create, so allocating and reclaiming this memory is dynamic. This is the memory the garbage collector is concerned with.

Garbage means objects in memory that are no longer of any use. The Java virtual machine uses reachability analysis to determine which objects are garbage and can be reclaimed.

3.2 Is the Object Dead?

Before the garbage collector can reclaim the heap, it must determine which objects are no longer in use. There are two methods:

1. Reference counting

Reference counting adds a reference counter to each object. Whenever something references the object, the counter is incremented by one; whenever such a reference goes away, it is decremented by one. An object whose counter is zero at any point can no longer be used. Reference counting is simple to implement and efficient at deciding liveness, but Java virtual machines do not use it to manage memory. The main reason is that it cannot easily handle circular references: two objects that reference only each other keep each other's counters above zero forever.
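The circular-reference problem shows up in a few lines. This is a hypothetical reference-counted object model written only to illustrate the idea; the JVM does not manage memory this way. Each object has one outgoing reference slot, which is enough to build a cycle. Without a cycle, both objects reach a count of zero once the local variables go away. With A and B pointing at each other, both counters are stuck at 1 and neither is ever freed.

```java
public class RefCountSketch {
    /** A hypothetical reference-counted object with one outgoing reference slot. */
    static final class RcObject {
        int refCount;
        boolean freed;
        RcObject field;
    }

    static void retain(RcObject o) {
        o.refCount++;
    }

    static void release(RcObject o) {
        if (--o.refCount == 0) {
            o.freed = true;                        // counter hit zero: reclaim immediately
            if (o.field != null) release(o.field); // its outgoing reference disappears too
        }
    }

    /** from.field = to (from.field is assumed empty), with the bookkeeping a reference-counting runtime would do. */
    static void link(RcObject from, RcObject to) {
        retain(to);
        from.field = to;
    }

    /** Builds A -> B (and optionally B -> A), drops both local variables, reports whether both were freed. */
    static boolean bothFreed(boolean makeCycle) {
        RcObject a = new RcObject();
        retain(a);                 // held by local variable a
        RcObject b = new RcObject();
        retain(b);                 // held by local variable b
        link(a, b);
        if (makeCycle) link(b, a);
        release(a);                // the local variables go out of scope
        release(b);
        return a.freed && b.freed;
    }

    public static void main(String[] args) {
        System.out.println("without cycle: " + bothFreed(false)); // true
        System.out.println("with cycle:    " + bothFreed(true));  // false
    }
}
```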

2. Reachability analysis

The basic idea of reachability analysis is to start from a set of root objects called "GC Roots" and search downward along references. The path followed is called a reference chain. If no reference chain connects an object to any GC Root, the object is unreachable, which proves that it can no longer be used.

Objects that can be used as GC Roots include:

  • Objects referenced from the virtual machine stack (the local variable tables in stack frames), such as the parameters, local variables, and temporaries of the methods each thread is currently executing
  • Objects referenced by static fields of classes in the method area, such as reference-type static variables of Java classes
  • Objects referenced by constants in the method area, such as references in the string constant pool (String Table)
  • Objects referenced by JNI (that is, by native methods) in the native method stack
  • References internal to the Java virtual machine, such as the Class objects for primitive types, resident exception objects (such as NullPointerException and OutOfMemoryError), and the system class loader
  • All objects held as locks by synchronized
  • JMXBeans that reflect the Java virtual machine's internal state, callbacks registered in JVMTI, the local code cache, and so on
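At its core, reachability analysis is a graph traversal from the roots. In this sketch, object names and an adjacency `Map` stand in for the real heap, and the root list stands in for GC Roots such as local variables and static fields. The sample data is made up. The C ↔ D cycle is not marked because nothing reachable points into it, which is exactly the case reference counting gets wrong.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReachabilitySketch {
    /** Marks every object reachable from the roots by following references. */
    static Set<String> markReachable(Map<String, List<String>> references, Collection<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!marked.add(current)) continue; // already visited: this also handles cycles
            for (String next : references.getOrDefault(current, Collections.emptyList())) {
                pending.push(next);
            }
        }
        return marked;
    }

    /** A small heap: A -> B hangs off a root; C <-> D is a cycle nothing points to. */
    static Map<String, List<String>> sampleHeap() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("A", Arrays.asList("B"));
        refs.put("C", Arrays.asList("D"));
        refs.put("D", Arrays.asList("C"));
        return refs;
    }

    public static void main(String[] args) {
        // "A" stands in for an object referenced from a GC Root such as a local variable.
        Set<String> live = markReachable(sampleHeap(), Arrays.asList("A"));
        System.out.println("live: " + live); // contains A and B; C and D are garbage despite their cycle
    }
}
```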

3.3 When should garbage be collected

Different virtual machine implementations trigger GC in different ways, but in general a GC is triggered in two situations:

  • Allocation failure: when allocating an object in heap memory fails because there is not enough free space, the system triggers a GC
  • System.gc(): the application can call this API to suggest that the virtual machine perform a GC. It is only a suggestion, and the virtual machine may ignore it.

3.4 More on references

3.4.1 Strong references

As long as an object is reachable through a strong reference, the garbage collector never reclaims it. Strong references are the ordinary reference assignments found everywhere in program code, such as "Object obj = new Object()".

3.4.2 Soft references

Soft references describe objects that are useful but not essential. Objects reachable only through soft references are reclaimed when memory is about to run out, before the virtual machine would throw an OutOfMemoryError. Since JDK 1.2, the SoftReference class has been provided to implement soft references.

3.4.3 Weak references

Weak references also describe non-essential objects, but they are weaker than soft references: objects associated only with weak references survive only until the next garbage collection. When the garbage collector runs, it reclaims objects reachable only through weak references, regardless of whether memory is currently sufficient. Since JDK 1.2, the WeakReference class has been provided to implement weak references.

3.4.4 Phantom references

Whether an object has a phantom reference has no effect on its lifetime, and a phantom reference cannot be used to obtain the object instance. The only purpose of associating a phantom reference with an object is to receive a system notification when the collector reclaims the object. Since JDK 1.2, the PhantomReference class has been provided to implement phantom references.
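The `java.lang.ref` classes make these rules visible. The sketch below separates the deterministic part from the non-deterministic part. While the object is still strongly reachable, `get()` returns it for soft and weak references, and a phantom reference's `get()` always returns null. What happens after `System.gc()` is only shown in `main`, without a guaranteed result, because the call is merely a request.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class ReferenceTypesDemo {
    /** Checks each reference type's behaviour while the object is still strongly reachable. */
    static boolean[] whileStronglyReachable() {
        Object obj = new Object(); // strong reference held by a local variable
        SoftReference<Object> soft = new SoftReference<>(obj);
        WeakReference<Object> weak = new WeakReference<>(obj);
        PhantomReference<Object> phantom = new PhantomReference<>(obj, new ReferenceQueue<Object>());
        return new boolean[] {
            soft.get() == obj,    // not cleared: a strong reference still exists
            weak.get() == obj,    // not cleared: a strong reference still exists
            phantom.get() == null // always null: a phantom reference never returns its referent
        };
    }

    public static void main(String[] args) {
        boolean[] checks = whileStronglyReachable();
        System.out.println(checks[0] + " " + checks[1] + " " + checks[2]); // true true true

        Object obj = new Object();
        WeakReference<Object> weak = new WeakReference<>(obj);
        obj = null;  // drop the only strong reference
        System.gc(); // only a request; the JVM decides whether and when to collect
        System.out.println("weak.get() after the GC request: " + weak.get()); // often null, not guaranteed
    }
}
```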

3.5 Garbage collection algorithm

3.5.1 Mark-sweep algorithm

Sweep in place after marking.

First, all objects that need to be reclaimed are marked. After marking is complete, all marked objects are reclaimed in one pass. The algorithm can also work the other way round: mark the surviving objects and reclaim everything unmarked.

Drawbacks:

  • Efficiency: marking and sweeping both become slow when the heap contains many objects, because their cost grows with the number of objects
  • Space: sweeping leaves behind a large number of non-contiguous memory fragments. A large object may then fail to find enough contiguous space, forcing another garbage collection earlier than necessary.
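The space problem is easy to reproduce on a simulated heap. In this sketch, each array cell holds one object, and null marks a free cell. The marked set holds the survivors, taken as the given result of a marking phase (the "mark the live objects" variant). After the sweep, three cells are free, but no two of them are adjacent, so an object needing two contiguous cells cannot be allocated.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MarkSweepSketch {
    /** Sweep phase: every unmarked object is freed in place; survivors keep their addresses. */
    static String[] sweep(String[] heap, Set<String> marked) {
        String[] result = Arrays.copyOf(heap, heap.length);
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null && !marked.contains(result[i])) {
                result[i] = null; // the cell becomes a hole at the same address
            }
        }
        return result;
    }

    static int freeCells(String[] heap) {
        int free = 0;
        for (String cell : heap) {
            if (cell == null) free++;
        }
        return free;
    }

    /** Longest run of contiguous free cells: the largest object that can still be allocated. */
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String cell : heap) {
            run = (cell == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        Set<String> marked = new HashSet<>(Arrays.asList("A", "C", "E")); // survivors from the marking phase
        String[] after = sweep(heap, marked);
        System.out.println(Arrays.toString(after)); // [A, null, C, null, E, null]
        System.out.println(freeCells(after) + " free, largest run " + largestFreeRun(after)); // 3 free, largest run 1
    }
}
```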

3.5.2 Mark-copy algorithm

Normally only half of the space is in use. During collection, all surviving objects are copied to the other half, and the used half is then cleared entirely.

The mark-copy algorithm is also known as the copying algorithm.

The copying algorithm was introduced to solve the efficiency problem. It divides the available memory into two halves of equal size and uses only one at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is cleaned up in one go. It is simple to implement and efficient to run, but the cost is that usable memory is halved.

Virtual machines today use the copying algorithm for the young generation. Experience shows that about 98% of young-generation objects die young, so there is no need to split memory 1:1. Instead, memory is divided into one large Eden space and two small Survivor spaces, and each allocation uses Eden plus one Survivor space. During collection, the surviving objects in Eden and that Survivor space are copied to the other Survivor space in one go, and then Eden and the just-used Survivor space are cleared. If the other Survivor space cannot hold all the survivors, other memory (usually the old generation) serves as an allocation guarantee. By default, HotSpot uses an 8:1 ratio between Eden and each Survivor space, so only 10% of the young generation is wasted.

The copying algorithm becomes inefficient when the object survival rate is high, because many objects must be copied. More importantly, unless you are willing to waste 50% of the space, you need extra space as an allocation guarantee, so the copying algorithm is generally not used for the old generation.
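A minor GC under the Eden plus two Survivors scheme can be sketched with lists. This is a deliberate simplification: every object takes one unit, and overflow goes straight to a `promoted` list that stands in for the old generation's allocation guarantee. Object ages and HotSpot's actual promotion rules are ignored. The helper `usableFraction` checks the 8:1:1 arithmetic: 9 of every 10 units are usable at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CopyingSketch {
    /**
     * One simplified minor GC: live objects from Eden and the "from" Survivor are copied,
     * packed together, into the empty "to" Survivor; overflow goes to 'promoted'.
     */
    static List<String> minorGc(List<String> eden, List<String> fromSurvivor, Set<String> live,
                                int toCapacity, List<String> promoted) {
        List<String> to = new ArrayList<>();
        List<String> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        for (String obj : candidates) {
            if (!live.contains(obj)) continue;       // dead objects are never touched or copied
            if (to.size() < toCapacity) to.add(obj); // copied contiguously: no fragmentation
            else promoted.add(obj);                  // Survivor full: rely on the old generation
        }
        eden.clear();          // both source spaces are wiped in one go
        fromSurvivor.clear();
        return to;
    }

    /** Fraction of the young generation usable at once with an Eden:Survivor:Survivor split of ratio:1:1. */
    static double usableFraction(int edenToSurvivorRatio) {
        return (edenToSurvivorRatio + 1.0) / (edenToSurvivorRatio + 2.0);
    }

    public static void main(String[] args) {
        List<String> eden = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        List<String> from = new ArrayList<>(Arrays.asList("s1"));
        List<String> promoted = new ArrayList<>();
        Set<String> live = new HashSet<>(Arrays.asList("b", "d", "s1"));
        System.out.println(minorGc(eden, from, live, 2, promoted) + " promoted=" + promoted); // [b, d] promoted=[s1]
        System.out.println(usableFraction(8)); // 0.9
    }
}
```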

3.5.3 Mark-compact algorithm

After marking, all surviving objects are moved toward one end of the space, and all memory beyond the boundary of the occupied region is cleared.

The marking step is the same as in mark-sweep. But instead of cleaning up the reclaimable objects directly, the next step moves all surviving objects toward one end and then clears the memory beyond the boundary.
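Sliding compaction can be sketched on the same kind of cell-per-object heap. This sketch merges the steps of a real compactor into one pass: computing forwarding addresses, updating references, and moving objects. It only records the forwarding table that a real collector would use to fix up references. Moving in place is safe because the destination index never runs ahead of the source index.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MarkCompactSketch {
    /**
     * Slides live objects toward index 0 (keeping their order), clears everything past
     * the boundary, and returns the boundary. forwarding receives oldIndex -> newIndex.
     */
    static int compact(String[] heap, Set<String> marked, Map<Integer, Integer> forwarding) {
        int free = 0; // next destination; ends up as the used/free boundary
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked.contains(heap[i])) {
                forwarding.put(i, free);
                heap[free++] = heap[i]; // safe: free <= i, so no unvisited cell is overwritten
            }
        }
        Arrays.fill(heap, free, heap.length, null); // clean up all memory beyond the boundary
        return free;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        Map<Integer, Integer> forwarding = new HashMap<>();
        int boundary = compact(heap, new HashSet<>(Arrays.asList("A", "C", "E")), forwarding);
        System.out.println(Arrays.toString(heap) + " boundary=" + boundary); // [A, C, E, null, null, null] boundary=3
    }
}
```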

3.6 Algorithm implementation details of HotSpot

3.6.1 Enumerating root Nodes

The nodes that can act as GC Roots are mainly found in global references (such as constants and class static fields) and in execution contexts (such as the local variable tables in stack frames). Although the targets are clear, finding them efficiently is not easy. So far, all collectors must pause user threads during root node enumeration.

Consistent snapshot: the analysis must be performed on a snapshot that guarantees consistency. For the whole duration of the analysis, the system must look frozen at a single point in time. If object references kept changing while the analysis was in progress, the accuracy of its result could not be guaranteed.

Marking object references with OopMaps: HotSpot uses a set of data structures called OopMaps to record where object references are. Once class loading is complete, HotSpot calculates what type of data sits at which offset within the object. During JIT compilation, it also records, at specific locations, which stack slots and registers hold references. With OopMaps, HotSpot can enumerate GC Roots quickly and accurately.

3.6.2 Safepoint

What is a safepoint: many instructions can change the contents of the OopMap. Generating an OopMap for every instruction would take a lot of extra space and make the GC's space cost very high. In practice, HotSpot does not generate an OopMap for every instruction. It records this information only at "specific locations", called safepoints. As a result, user programs cannot stop for GC at an arbitrary point in the instruction stream; they can pause only after reaching a safepoint. Safepoints should be neither so sparse that the GC waits too long, nor so frequent that they excessively increase the runtime load.

How safepoints are chosen: the criterion is whether an instruction "allows the program to keep running for a long time". Each instruction executes very quickly, so a program will not run for a long time merely because its instruction stream is long. The most obvious sign of "running for a long time" is the reuse of instruction sequences, as in method calls, loop back-edges, and exception jumps. That is why instructions with these functions are the ones that produce safepoints.

How threads are stopped at safepoints: there are two approaches, preemptive suspension and voluntary suspension. Almost no virtual machine implementation today uses preemptive suspension to stop threads for GC. With voluntary suspension, the GC does not act on a thread directly when it needs to interrupt it. Instead, it simply sets a flag. Each thread polls this flag as it runs and suspends itself when it finds the flag set to true. The polling points coincide with the safepoints, plus every place where memory is allocated on the heap to create an object.

3.6.3 Safe Region

A thread that is not executing, for example one that is sleeping or blocked, cannot walk to a safepoint, and this case needs a safe region. A safe region is a code fragment in which reference relationships do not change, so it is safe to start a GC anywhere within it. A safe region can also be thought of as an extended safepoint.

When a thread executes code in a safe region, it first marks itself as having entered the safe region, so a GC started during that time does not need to wait for it. When the thread is about to leave the safe region, it checks whether the virtual machine has completed root node enumeration (or the whole GC process). If so, the thread continues; otherwise it must wait until it receives a signal that it may safely leave the safe region.

3.6.4 Remembered set and card table

To deal with cross-generational references, the garbage collector builds a data structure called a remembered set for the young generation. This avoids adding the entire old generation to the GC Roots scan.

Card table: each record is precise to a region of memory (a card), and a card is marked when some object in it contains a cross-generational pointer. Implementing the remembered set this way is currently the most common approach.
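A card table is essentially a byte array indexed by address shifted right. In this sketch, the 512-byte card size (a shift of 9) follows HotSpot's long-standing default. The CLEAN/DIRTY byte values, the class name, and the unconditional marking on every reference store are illustrative assumptions, not HotSpot's exact encoding or barrier code. A young GC then scans only the objects in dirty cards for old-to-young pointers, instead of the whole old generation.

```java
import java.util.ArrayList;
import java.util.List;

public class CardTableSketch {
    static final int CARD_SHIFT = 9;        // 2^9 = 512-byte cards
    static final byte CLEAN = 0, DIRTY = 1; // illustrative encoding for this sketch

    final byte[] cards;

    CardTableSketch(long heapBytes) {
        long cardSize = 1L << CARD_SHIFT;
        cards = new byte[(int) ((heapBytes + cardSize - 1) >>> CARD_SHIFT)];
    }

    /** Post-write barrier: after a reference is stored at fieldAddress, dirty the card covering it. */
    void onReferenceStore(long fieldAddress) {
        cards[(int) (fieldAddress >>> CARD_SHIFT)] = DIRTY;
    }

    /** Cards a young GC must scan for old-to-young pointers. */
    List<Integer> dirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == DIRTY) dirty.add(i);
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(4096); // 8 cards
        table.onReferenceStore(100);  // card 0 covers bytes [0, 512)
        table.onReferenceStore(1500); // card 2 covers bytes [1024, 1536)
        table.onReferenceStore(1024); // same card again
        System.out.println(table.dirtyCards()); // [0, 2]
    }
}
```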

3.7 Garbage Collector

If collection algorithms are the methodology of memory reclamation, garbage collectors are its practice. Why are there so many garbage collectors? Because they target different scenarios.

3.7.1 Serial Collector

The Serial collector is a single-threaded collector: while it collects garbage, all other worker threads must be paused until it finishes. It is still widely used today and is the default young-generation collector in client mode. It is simple and efficient, and in memory-constrained environments it has the smallest extra memory overhead of any collector. This makes it a good choice for virtual machines running in client mode.

3.7.2 ParNew collector

The ParNew collector is a multithreaded version of the Serial collector. It was the preferred young-generation collector for many HotSpot virtual machines running in server mode, especially in legacy systems before JDK 7. Apart from the Serial collector, it is the only collector that can work together with the CMS collector.

The CMS collector was released with JDK 5, and for the first time allowed the garbage collector thread to work (basically) at the same time as the user thread.

3.7.3 Parallel Scavenge

The Parallel Scavenge collector is also a young-generation collector. It is based on the mark-copy algorithm and is a multithreaded collector that collects in parallel. Its goal is to achieve a controllable throughput.

3.7.4 Serial Old Collector

The Serial Old collector is an older version of the Serial collector, which is also a single-threaded collector using a mark-collation algorithm. The main purpose of this collector is also to be used by HotSpot virtual machines in client mode.

3.7.5 Parallel Old Collector

The Parallel Old collector is the old-generation version of the Parallel Scavenge collector. It supports multithreaded parallel collection and is based on the mark-compact algorithm.

3.7.6 CMS Collector

The CMS (Concurrent Mark Sweep) collector aims to achieve the shortest possible collection pause time, which makes it a good fit for services that care about response time. As its name suggests, CMS is based on the mark-sweep algorithm, and its main advantages are concurrent collection and low pauses.

3.7.7 Garbage First Collector

The Garbage First collector, or G1 for short, pioneered the collector design oriented toward partial collection and a Region-based memory layout. G1 is mainly aimed at server-side applications. With the release of JDK 9, G1 replaced the Parallel Scavenge plus Parallel Old combination as the default garbage collector in server mode, while CMS was declared deprecated. Although G1 is still designed around generational collection theory, its heap layout is very different from that of other collectors: instead of fixed-size, fixed-number generational areas, it divides the contiguous Java heap into multiple independent Regions of equal size. Each Region can act as Eden space, Survivor space, or old-generation space as needed, and the collector applies different policies to Regions depending on their role. The name Garbage First comes from the fact that, within the pause time the user allows, G1 gives priority to the Regions whose collection yields the most reclaimable space (the most garbage).

The operation of the G1 collector can be divided into four steps:

  1. Initial marking: mark only the objects directly reachable from GC Roots, and update the TAMS (Top at Mark Start) pointers so that, in the next phase, user threads running concurrently can allocate new objects correctly in the available Regions. This requires pausing the threads, but only briefly.
  2. Concurrent marking: starting from GC Roots, perform reachability analysis on the objects in the heap, recursively scanning the whole object graph to find objects to reclaim. This phase takes a long time but runs concurrently with the user program.
  3. Final marking: pause the user threads again briefly to process the small number of SATB (snapshot-at-the-beginning) records left over after the concurrent phase ends.
  4. Live data counting and evacuation: update the statistics for each Region, rank the Regions by reclamation value and cost, and build a collection plan based on the pause time the user expects. Any number of Regions can be chosen to form the collection set; the surviving objects in those Regions are copied to empty Regions, and the old Regions are then cleared entirely. Because this involves moving live objects, user threads are paused, and multiple collector threads do the work in parallel.
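
The planning part of the fourth step can be sketched as a greedy selection. This is a simplified, hypothetical model: RegionPlanner, Region, and the per-Region cost estimates are assumptions made for illustration, and real G1 uses its own prediction model. In a real JVM the pause budget comes from -XX:MaxGCPauseMillis.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of how a collection set could be chosen; not HotSpot code.
class RegionPlanner {
    static final class Region {
        final int id;
        final long garbageBytes;   // space that collecting this Region would reclaim
        final double costMillis;   // estimated time to copy its surviving objects

        Region(int id, long garbageBytes, double costMillis) {
            this.id = id;
            this.garbageBytes = garbageBytes;
            this.costMillis = costMillis;
        }

        double valuePerMilli() { return garbageBytes / costMillis; }
    }

    // Rank Regions by reclaimable space per millisecond, then add them to the
    // collection set while the estimated total stays within the pause budget.
    static List<Integer> planCollectionSet(List<Region> regions, double pauseBudgetMillis) {
        List<Region> ranked = new ArrayList<>(regions);
        ranked.sort(Comparator.comparingDouble(Region::valuePerMilli).reversed());
        List<Integer> collectionSet = new ArrayList<>();
        double estimated = 0;
        for (Region r : ranked) {
            if (estimated + r.costMillis > pauseBudgetMillis) continue;  // would exceed the budget
            estimated += r.costMillis;
            collectionSet.add(r.id);
        }
        return collectionSet;
    }
}
```

With a 5 ms budget and Regions (id, garbage, cost) of (1, 900, 3), (2, 500, 1), (3, 100, 2), and (4, 800, 4), this picks Regions 2 and 1: they reclaim the most per millisecond, and adding either remaining Region would overrun the budget.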

3.8 Memory Allocation and Reclaiming Policy

Memory for objects is generally allocated on the heap, mainly in the Eden area of the new generation. If thread-local allocation buffers (TLABs) are enabled, objects are allocated first in the TLAB of the allocating thread. In a few cases, objects can also be allocated directly in the old generation.

The Java virtual machine (JVM) divides heap memory into several areas according to object lifetime, usually a new generation and an old generation; this is the JVM's generational memory strategy. HotSpot before JDK 8 also had a permanent generation, which JDK 8 replaced with Metaspace.

The central idea of generational collection is that newly created objects, which usually live only briefly, are allocated in the new generation. Objects that survive several collections are moved to the old generation.

3.8.1 Young Generation

Newly created objects are preferentially placed in the new generation. Most of these objects die young, so their survival rate is very low: in typical applications, a single collection of the new generation can reclaim 70% to 95% of its space, which makes collection very efficient. Because only the few surviving objects need to be copied, the new generation usually uses the copying algorithm.

The new generation is further divided into three parts: Eden, Survivor0, and Survivor1, which by default split the new generation in an 8:1:1 ratio.
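
The 8:1:1 split corresponds to -XX:SurvivorRatio=8 (Eden is 8 times the size of one Survivor space). The hypothetical YoungGenLayout class below just does that arithmetic, ignoring the size alignment a real JVM applies. It also shows why only about 90% of the new generation is usable at once.

```java
// Hypothetical helper that splits a new generation the way -XX:SurvivorRatio does
// (Eden is survivorRatio times one Survivor space); real JVMs also align sizes.
class YoungGenLayout {
    final long eden;
    final long survivor;   // size of each of the two Survivor spaces

    YoungGenLayout(long youngBytes, int survivorRatio) {
        // young = eden + 2 * survivor and eden = survivorRatio * survivor
        this.survivor = youngBytes / (survivorRatio + 2);
        this.eden = youngBytes - 2 * survivor;
    }

    // Eden plus one Survivor space are in use at any time; the other Survivor space
    // stays empty so it can receive the objects copied during the next Minor GC.
    long usableBytes() {
        return eden + survivor;
    }
}
```

For a 10 MiB new generation this gives an 8 MiB Eden, two 1 MiB Survivor spaces, and 9 MiB usable for allocation.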

In most cases, objects are allocated in the Eden area of the new generation, and when the Eden area does not have enough space to allocate, the virtual machine will initiate a Minor GC.

  • Minor GC: garbage collection in the new generation. Because most Java objects are short-lived, Minor GC happens very frequently and is generally fast.
  • Major GC (Full GC): garbage collection in the old generation. It is usually accompanied by at least one Minor GC and is typically about 10 times slower than a Minor GC.

3.8.2 Old Generation

If an object survives long enough in the new generation without being reclaimed, it is copied to the old generation. The old generation is generally larger than the new generation and can hold more objects.

If an object is large (such as a long string or a large array) and the new generation does not have enough free space, it is allocated directly in the old generation. The -XX:PretenureSizeThreshold parameter sets the size above which objects are allocated directly in the old generation (this parameter is only honored by the Serial and ParNew collectors). Because objects in the old generation live long and copying them repeatedly would be wasteful, the old generation generally uses the mark-compact algorithm.

Long-lived objects enter the old generation: since the virtual machine manages memory with generational collection, it must be able to tell which objects belong in the new generation and which in the old. To do this, it keeps an age counter for each object. An object is born in Eden; if it survives its first Minor GC and fits in a Survivor space, it is moved there and its age is set to 1. Each time it survives another Minor GC in a Survivor space, its age increases by 1. When its age reaches a threshold (15 by default, configurable with -XX:MaxTenuringThreshold), it is promoted to the old generation.
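
The aging rule can be simulated in a few lines. AgingSimulator is hypothetical and follows the simplified rule stated above (promotion once the age reaches the threshold). It ignores dynamic age determination and Survivor overflow, which can promote objects earlier in a real JVM.

```java
import java.util.List;

// Hypothetical simulation of object aging; not how HotSpot tracks ages internally.
class AgingSimulator {
    static final class Obj {
        int age = 0;                 // 0 while the object is still in Eden
        boolean tenured = false;     // true once promoted to the old generation
    }

    // One Minor GC in which every object in the list survives.
    static void minorGc(List<Obj> survivors, int maxTenuringThreshold) {
        for (Obj o : survivors) {
            if (o.tenured) continue;               // old-generation objects are not aged here
            o.age++;                               // survived one more Minor GC
            if (o.age >= maxTenuringThreshold) {
                o.tenured = true;                  // promoted to the old generation
            }
        }
    }

    // How many Minor GCs a continuously live object survives before promotion.
    static int gcsUntilPromotion(int maxTenuringThreshold) {
        Obj o = new Obj();
        List<Obj> live = List.of(o);
        int gcs = 0;
        while (!o.tenured) {
            minorGc(live, maxTenuringThreshold);
            gcs++;
        }
        return gcs;
    }
}
```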

Objects in the old generation sometimes reference objects in the new generation. To perform a new-generation GC, the collector would then have to search the whole old generation for such references, which is clearly inefficient. Instead, HotSpot maintains a card table for the old generation, in which each one-byte entry covers a 512-byte card of old-generation memory and records whether that card contains references to new-generation objects. Each new-generation GC only needs to scan the dirty cards, which greatly improves performance.

4. Java bytecode (class file) interpretation

I covered Java bytecode interpretation in an earlier article.

5. Introduction to bytecode instructions

A Java virtual machine instruction consists of a one-byte number (called an opcode) that identifies a particular operation, followed by zero or more operands that supply the parameters the operation needs.

5.1 Bytecode and data types

In the Java virtual machine instruction set, most instructions encode the data type they operate on. For example, the iload instruction loads int data from the local variable table onto the operand stack, while the fload instruction loads float data. Inside the virtual machine these two instructions may be executed by the same code, but they must have separate opcodes in the Class file.

The compiler sign-extends byte and short data to int, and zero-extends boolean and char data to int, at compile time or run time. As a result, most operations on boolean, byte, short, and char data are actually performed using int as the operational type.

5.2 Load and Store instructions

Load and store instructions are used to transfer data back and forth between the local variable table and operand stack in a stack frame. These instructions are as follows:

  • Load a local variable onto the operand stack: iload, iload_<n>, lload, lload_<n>, fload, fload_<n>, dload, dload_<n>, aload, aload_<n>
  • Store a value from the operand stack into the local variable table: istore, istore_<n>, lstore, lstore_<n>, fstore, fstore_<n>, dstore, dstore_<n>, astore, astore_<n>
  • Load a constant onto the operand stack: bipush, sipush, ldc, ldc_w, ldc2_w, aconst_null, iconst_m1, iconst_<i>, lconst_<l>, fconst_<f>, dconst_<d>
  • Extend the index range for accessing the local variable table: wide

5.3 Operation Instruction

Arithmetic instructions perform a specific operation on values on the operand stack (usually two) and push the result back onto the top of the stack.

  • Addition instructions: iadd, ladd, fadd, dadd
  • Subtraction instructions: isub, lsub, fsub, dsub
  • Multiplication instructions: imul, lmul, fmul, dmul
  • Division instructions: idiv, ldiv, fdiv, ddiv
  • Remainder instructions: irem, lrem, frem, drem
  • Negation instructions: ineg, lneg, fneg, dneg
  • Shift instructions: ishl, ishr, iushr, lshl, lshr, lushr
  • Bitwise OR instructions: ior, lor
  • Bitwise AND instructions: iand, land
  • Bitwise XOR instructions: ixor, lxor
  • Local variable increment instruction: iinc
  • Comparison instructions: dcmpg, dcmpl, fcmpg, fcmpl, lcmp
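
The runtime behavior of several of these instructions can be observed from plain Java, since javac compiles each expression below to the instruction named in its comment (the usual mapping, worth confirming with javap -c). ArithmeticDemo is a hypothetical class.

```java
// Each method body compiles to (roughly) the instruction named in its comment.
class ArithmeticDemo {
    static int add(int a, int b) { return a + b; }        // iadd: wraps on overflow, never throws
    static int rem(int a, int b) { return a % b; }        // irem: result has the sign of the dividend
    static int neg(int a) { return -a; }                  // ineg: -Integer.MIN_VALUE is still MIN_VALUE
    static int shr(int a, int n) { return a >> n; }       // ishr: arithmetic shift, keeps the sign bit
    static int ushr(int a, int n) { return a >>> n; }     // iushr: logical shift, fills with zeros
    static int shl(int a, int n) { return a << n; }       // ishl: only the low 5 bits of n are used
    static int inc(int a) { a += 5; return a; }           // iinc: adds 5 to local variable a in place
}
```

For example, add(Integer.MAX_VALUE, 1) returns Integer.MIN_VALUE, rem(-7, 3) returns -1, and shl(1, 33) returns 2 because the shift distance is masked to 33 & 31 = 1.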

5.4 Type Conversion Instructions

Conversion instructions convert between different numeric types. They are generally used to implement explicit conversions in user code, or to work around the fact that the bytecode instruction set does not provide type-specific instructions for every data type.

The Java virtual machine directly supports the following widening numeric conversions:

  • int to long, float, or double
  • long to float or double
  • float to double

In contrast, narrowing conversions must be performed explicitly with conversion instructions: i2b, i2c, i2s, l2i, f2i, f2l, d2i, d2l, and d2f.

When an int or long is narrowed to an integral type T, the conversion simply discards all but the lowest N bits, where N is the number of bits used to represent T. The result may therefore have a different sign from the input, because the original sign bit is discarded.
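
A few explicit casts show the bit-discarding rule. Note that floating-point to integer conversions (f2i, d2i, and so on) follow different rules in the JVM specification: NaN becomes 0, values are rounded toward zero, and out-of-range values saturate to the minimum or maximum of the target type. NarrowingDemo is a hypothetical class.

```java
// Explicit casts and the narrowing instruction javac emits for each.
class NarrowingDemo {
    static byte intToByte(int v) { return (byte) v; }      // i2b: keep the low 8 bits, sign-extend
    static short intToShort(int v) { return (short) v; }   // i2s: keep the low 16 bits, sign-extend
    static char intToChar(int v) { return (char) v; }      // i2c: keep the low 16 bits, zero-extend
    static int longToInt(long v) { return (int) v; }       // l2i: keep the low 32 bits
    static int doubleToInt(double v) { return (int) v; }   // d2i: round toward zero, saturate, NaN -> 0
}
```

For instance, intToByte(200) is -56 because bit 7 of 200 becomes the sign bit, and longToInt(3_000_000_000L) is -1294967296.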

5.5 Object creation and access instructions

Although class instances and arrays are both objects, the Java virtual machine uses different bytecode instructions to create and manipulate class instances and arrays. Once the object is created, you can retrieve fields or array elements from the object instance or array instance through the object access instruction.

  • Instruction to create class instances: new
  • Instructions to create arrays: newarray, anewarray, multianewarray
  • Instructions to access class fields (static fields, or class variables) and instance fields (non-static fields, or instance variables): getfield, putfield, getstatic, putstatic
  • Instructions to load an array element onto the operand stack: baload, caload, saload, iaload, laload, faload, daload, aaload
  • Instructions to store a value from the operand stack into an array element: bastore, castore, sastore, iastore, lastore, fastore, dastore, aastore
  • Instruction to get the length of an array: arraylength
  • Instructions to check the type of a class instance: instanceof, checkcast

5.6 Operand stack management instructions

As with the stack in a normal data structure, the Java virtual machine provides instructions for manipulating the operand stack directly:

  • Pop one or two elements off the top of the operand stack: pop, pop2
  • Duplicate one or two values at the top of the stack and push the copy (or two copies) back onto the top: dup, dup2, dup_x1, dup2_x1, dup_x2, dup2_x2
  • Swap the top two values on the stack: swap

5.7 Control transfer instruction

A control transfer instruction makes the Java virtual machine continue execution, conditionally or unconditionally, from a specified instruction rather than from the instruction that follows the control transfer instruction. Conceptually, a control transfer instruction can be thought of as conditionally or unconditionally modifying the value of the PC register.

  • Conditional branches: ifeq, iflt, ifle, ifne, ifgt, ifge, ifnull, ifnonnull, if_icmpeq, if_icmpne, if_icmplt, if_icmpgt, if_icmple, if_icmpge, if_acmpeq, and if_acmpne
  • Compound conditional branches: tableswitch and lookupswitch
  • Unconditional branches: goto, goto_w, jsr, jsr_w, ret

Conditional branches on boolean, byte, char, and short values are performed directly with int comparisons. For long, float, and double values, a comparison instruction (lcmp, fcmpl, fcmpg, dcmpl, or dcmpg) runs first and pushes an int result onto the operand stack; an int conditional branch then completes the jump. All comparisons therefore end up as int comparisons.
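
The comparison instructions can be modeled in Java to show their results. The sketch below is a hypothetical re-implementation of lcmp, fcmpg, and fcmpl following the JVM specification: the two float variants differ only in what they push when an operand is NaN (fcmpg pushes 1, fcmpl pushes -1). javac typically uses fcmpg for < and <= and fcmpl for > and >=, so that a NaN operand makes the condition false.

```java
// Hypothetical models of the comparison instructions and the branch that follows them.
class CompareDemo {
    static int lcmp(long a, long b) { return a > b ? 1 : (a == b ? 0 : -1); }

    static int fcmpg(float a, float b) {
        if (Float.isNaN(a) || Float.isNaN(b)) return 1;    // "greater" on NaN
        return a > b ? 1 : (a == b ? 0 : -1);
    }

    static int fcmpl(float a, float b) {
        if (Float.isNaN(a) || Float.isNaN(b)) return -1;   // "less" on NaN
        return a > b ? 1 : (a == b ? 0 : -1);
    }

    // Models "if (x < y)": fcmpg pushes an int, then ifge skips the body when it is >= 0.
    static boolean lessThan(float x, float y) { return fcmpg(x, y) < 0; }

    // Models "if (x > y)": fcmpl pushes an int, then ifle skips the body when it is <= 0.
    static boolean greaterThan(float x, float y) { return fcmpl(x, y) > 0; }
}
```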

5.8 Method calls and return instructions

Method invocation involves dispatch (selecting which method to run) and then execution. There are five method invocation instructions:

  • invokevirtual: invokes an instance method of an object, dispatching on the actual type of the object (virtual method dispatch). This is the most common form of method dispatch in the Java language.
  • invokeinterface: invokes an interface method, searching at run time for the method in the receiving object that implements the interface method.
  • invokespecial: invokes instance methods that need special handling, including instance initialization methods, private methods, and superclass methods.
  • invokestatic: invokes static methods of a class.
  • invokedynamic: resolves, at run time, the method referenced by the call site specifier and then executes it. The dispatch logic of the previous four invocation instructions is fixed inside the Java virtual machine and cannot be changed by the user, whereas the dispatch logic of invokedynamic is determined by a user-specified bootstrap method.
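
The following sketch shows Java source constructs that javac commonly compiles to each invocation instruction. The instruction names in the comments are what javap -c usually shows; Speaker, Animal, Dog, and InvokeDemo are hypothetical classes.

```java
import java.util.function.Supplier;

interface Speaker { String speak(); }

class Animal implements Speaker {
    public String speak() { return "..."; }
    static String kind() { return "animal"; }
}

class Dog extends Animal {
    @Override public String speak() { return "Woof"; }
    String parentSpeak() { return super.speak(); }   // invokespecial: superclass method
}

class InvokeDemo {
    static String[] run() {
        Animal a = new Dog();                 // new, then invokespecial Dog.<init>
        Speaker s = a;
        Supplier<String> lazy = a::speak;     // invokedynamic (bootstrap: LambdaMetafactory)
        return new String[] {
            a.speak(),                        // invokevirtual: dispatched on the runtime type Dog
            s.speak(),                        // invokeinterface
            Animal.kind(),                    // invokestatic
            ((Dog) a).parentSpeak(),          // invokevirtual, which then runs invokespecial
            lazy.get()                        // invokeinterface on the generated Supplier
        };
    }
}
```

Note that a.speak() returns "Woof" even though a is declared as Animal: invokevirtual selects the method from the object's actual class at run time.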

Method invocation instructions are independent of data types, while method return instructions are distinguished by the type of the return value: ireturn (used when the return value is boolean, byte, char, short, or int), lreturn, freturn, dreturn, and areturn. There is also a plain return instruction, used by methods declared void, by instance initialization methods, and by the class initialization methods of classes and interfaces.

5.9 Exception Handling Instructions

Besides exceptions thrown explicitly in a Java program (compiled to the athrow instruction), the Java Virtual Machine Specification requires many runtime exceptions to be thrown automatically when other instructions detect an abnormal condition. For example, in integer arithmetic, when the divisor is zero, the virtual machine throws an ArithmeticException from the idiv or ldiv instruction.

In the Java virtual machine, catch clauses are not implemented with bytecode instructions but with exception tables. (Older compilers used the jsr and ret instructions to implement finally blocks, but they are no longer used.)
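
The example below shows both points. idiv throws ArithmeticException for a zero divisor, whereas ddiv follows IEEE 754 and returns Infinity or NaN; and the catch clause is compiled into an exception table entry (visible as "Exception table" in javap -c output) rather than into a dedicated instruction. DivisionDemo is a hypothetical class.

```java
class DivisionDemo {
    static String divide(int a, int b) {
        try {
            return "ok:" + (a / b);            // idiv: throws ArithmeticException if b == 0
        } catch (ArithmeticException e) {      // located through the method's exception table
            return "caught:" + e.getMessage();
        }
    }

    static double divideDouble(double a, double b) {
        return a / b;                          // ddiv: no exception, yields Infinity or NaN
    }
}
```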

5.10 Synchronization Instructions

The Java virtual machine supports both method-level synchronization and synchronization of a sequence of instructions within a method, and both are implemented with monitors.

Method-level synchronization is implicit: it is not controlled by bytecode instructions but implemented as part of method invocation and return. The virtual machine can tell whether a method is declared synchronized from the ACC_SYNCHRONIZED access flag in the method's method table structure. When the method is invoked, the invocation instruction checks whether ACC_SYNCHRONIZED is set. If it is, the executing thread must first acquire the monitor before executing the method, and releases the monitor when the method completes, whether normally or abruptly. While the method is executing, the executing thread holds the monitor, and no other thread can acquire the same monitor. If a synchronized method throws an exception that it does not handle internally, the monitor it holds is released automatically when the exception propagates out of the method.

Synchronization of a sequence of instructions is usually expressed with a synchronized block in the Java language. The Java virtual machine provides the monitorenter and monitorexit instructions to support the semantics of synchronized, and implementing the synchronized keyword correctly requires cooperation between the javac compiler and the Java virtual machine.
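
Both forms of synchronization can be exercised in one hypothetical class, SyncCounter. The method-level form relies on ACC_SYNCHRONIZED, while the block form is compiled into monitorenter and monitorexit (javac also emits an extra monitorexit on the exception path so the monitor is released if the block throws). Without the synchronization, the counts could come out lower than expected.

```java
import java.util.ArrayList;
import java.util.List;

class SyncCounter {
    private int methodCount = 0;
    private int blockCount = 0;
    private final Object lock = new Object();

    // Method-level: javac sets ACC_SYNCHRONIZED; the VM acquires the monitor of this.
    synchronized void incrementMethod() { methodCount++; }

    // Block-level: javac emits monitorenter / monitorexit around the increment.
    void incrementBlock() {
        synchronized (lock) { blockCount++; }
    }

    static int[] run(int threads, int perThread) {
        SyncCounter c = new SyncCounter();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    c.incrementMethod();
                    c.incrementBlock();
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return new int[] { c.methodCount, c.blockCount };
    }
}
```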

6. VM class loading mechanism

The Java virtual machine loads the data describing a class from the Class file into memory, verifies it, transforms and resolves it, and initializes it, finally producing a Java type that the virtual machine can use directly. This process is called the virtual machine's class loading mechanism.

6.1 Class loading timing

A type's whole life cycle, from being loaded into virtual machine memory to being unloaded, goes through seven stages: loading, verification, preparation, parsing (resolution), initialization, using, and unloading. Verification, preparation, and parsing are collectively called linking.

The Java Virtual Machine Specification does not constrain when the first stage of class loading, “loading”, must begin. The initialization stage, however, is strictly regulated: there are exactly six situations in which a type must be “initialized” immediately (loading, verification, and preparation naturally have to begin before that):

  1. When one of the four bytecode instructions new, getstatic, putstatic, or invokestatic is executed and the type has not been initialized yet, its initialization must be triggered first. Typical Java code that produces these four instructions:
    • Instantiating an object with the new keyword
    • Reading or setting a static field of a type (except static fields that are final and whose values were put into the constant pool at compile time)
    • Invoking a static method of a type
  2. When a reflective call is made on a type through the methods of the java.lang.reflect package and the type has not been initialized yet, its initialization must be triggered first
  3. When a class is initialized and its parent class has not been initialized yet, the parent class's initialization must be triggered first
  4. When the virtual machine starts, the user specifies a main class (the one containing the main() method) to execute, and the virtual machine initializes this main class first
  5. When the dynamic language support added in JDK 7 is used, if the final resolution result of a java.lang.invoke.MethodHandle instance is a method handle of kind REF_getStatic, REF_putStatic, REF_invokeStatic, or REF_newInvokeSpecial, and the class corresponding to that method handle has not been initialized yet, its initialization must be triggered first
  6. When an interface defines default methods (a JDK 8 feature: interface methods modified by the default keyword) and one of its implementation classes is initialized, the interface must be initialized before that class
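
Case 1 and its exceptions can be observed directly. In this hypothetical sketch, each class's static initializer appends to a shared log: reading a compile-time constant and creating an array of the class do not trigger initialization, while reading an ordinary static field does, and the parent class is initialized first (case 3). InitLog, InitParent, InitChild, and InitTriggers are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Records the order in which static initializers (<clinit>) run.
class InitLog {
    static final List<String> EVENTS = new ArrayList<>();
}

class InitParent {
    static { InitLog.EVENTS.add("InitParent"); }
}

class InitChild extends InitParent {
    static { InitLog.EVENTS.add("InitChild"); }
    static final int CONSTANT = 42;   // compile-time constant: javac inlines it at use sites
    static int counter = 0;           // ordinary static field: read with getstatic
}

class InitTriggers {
    // Meant to be called once: a class is initialized at most once per class loader.
    static List<String> run() {
        int c = InitChild.CONSTANT;           // inlined as 42: does not initialize InitChild
        InitChild[] arr = new InitChild[3];   // anewarray: does not initialize InitChild
        InitLog.EVENTS.add("before: " + (c + arr.length));
        int n = InitChild.counter;            // getstatic: initializes InitParent, then InitChild
        InitLog.EVENTS.add("after: " + n);
        return new ArrayList<>(InitLog.EVENTS);
    }
}
```

On its first call, run() returns [before: 45, InitParent, InitChild, after: 0].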

6.2 Class loading process

The entire process of class loading in a Java virtual machine: loading, verification, preparation, parsing, and initialization.

6.2.1 load

During the load phase, the Java virtual machine needs to do three things:

  1. Obtain the binary byte stream that defines a class from its fully qualified name
  2. Convert the static storage structure represented by this byte stream into the runtime data structure of the method area
  3. Generate a java.lang.Class object in memory that represents this class and serves as the access point for the class's various data in the method area

After the loading phase completes, the binary byte stream from outside the Java virtual machine is stored in the method area in the format the virtual machine requires; the data storage format of the method area is entirely up to the virtual machine implementation. Once the type data is in place in the method area, a java.lang.Class object is instantiated in the Java heap, and it serves as the program's interface for accessing the type data in the method area.

6.2.2 validation

Validation is the first step of the linking phase. Its purpose is to ensure that the information in the Class file's byte stream meets all the constraints of the Java Virtual Machine Specification and will not endanger the security of the virtual machine when run as code. Pure Java code cannot do things like access data beyond array bounds, cast an object to a type it does not implement, or jump to a nonexistent line of code; if you try, the compiler reports an error and refuses to compile. At the bytecode level, however, all of these are expressible. If the Java virtual machine trusted its input byte stream completely without checking it, loading erroneous or malicious bytecode could well lead to the whole system being attacked or even crashing. Verifying bytecode is therefore a necessary measure for the Java virtual machine to protect itself.

In the verification phase, four verification actions will be roughly completed: file format verification, metadata verification, bytecode verification and symbol reference verification.

6.2.2.1 File format verification

The first stage verifies that the byte stream conforms to the Class file format specification and can be processed by the current version of the virtual machine. Its main purpose is to ensure that the input byte stream is parsed correctly and stored in the method area in a form that meets the requirements for describing a Java type. Only after passing this stage is the byte stream stored in the method area of the Java virtual machine's memory; the three later verification stages therefore work on the method area's storage structures and no longer read or manipulate the byte stream directly.

  • Does it start with the magic number 0xCAFEBABE?
  • Are the major and minor version numbers acceptable to the current Java virtual machine?
  • Are there any unsupported constant types in the constant pool (checked via the constant tag flag)?
  • Does any index value point to a constant that does not exist or is of the wrong type?
  • Does any CONSTANT_Utf8_info constant contain data that is not valid UTF-8?
  • Has any part of the Class file, or the file itself, had information deleted or added?
  • ...

6.2.2.2 Metadata verification

The second stage performs semantic analysis of the information described by the bytecode to ensure that it conforms to the requirements of the Java Language Specification. Its main purpose is to semantically verify the class's metadata.

  • Does this class have a parent class? (Every class except java.lang.Object should have one.)
  • Does the parent class of this class inherit from a class that may not be inherited from (a class modified by final)?
  • If this class is not abstract, does it implement all the methods required by its parent class or interfaces?
  • Do fields or methods in the class conflict with the parent class (e.g., overriding a final member of the parent class, or an overload that breaks the rules, such as methods with identical parameters but different return types)?
  • ...

6.2.2.3 Bytecode verification

The main purpose of the third stage is to determine, through data-flow and control-flow analysis, that the program semantics are legal and logical. This stage verifies and analyzes the method bodies of the class (the Code attributes in the Class file) to ensure that the methods of the verified class will not harm the virtual machine when they run. Even a method body that passes bytecode verification, however, is not guaranteed to be safe.

After JDK 6, the javac compiler and the Java virtual machine were jointly optimized to move as much of the verification work as possible into javac. To do this, a new attribute named StackMapTable was added to the attribute table of the method body's Code attribute. It describes the state of the local variable table and the operand stack at the start of every basic block of the method body. During bytecode verification, the Java virtual machine no longer needs to infer these states from the program; it only needs to check that the records in the StackMapTable attribute are valid. This turns bytecode verification from type inference into type checking and saves a lot of verification time.

  • Ensure that the data types on the operand stack and the instruction sequence always work together correctly. For example, an int should never be placed on the operand stack and then loaded into the local variable table as a long
  • Ensure that no jump instruction jumps to a bytecode instruction outside the method body
  • ...

6.2.2.4 Symbolic reference verification

The last stage of verification happens when the virtual machine converts symbolic references into direct references, which takes place during the third phase of linking, the parsing (resolution) phase. Symbolic reference verification can be seen as a matching check of information outside the class itself (the various symbolic references in the constant pool): in plain terms, whether the class is missing, or is denied access to, some external class, method, field, or other resource it depends on. Its main purpose is to ensure that resolution can execute normally. If symbolic reference verification fails, the Java virtual machine throws a subclass of java.lang.IncompatibleClassChangeError, typically java.lang.IllegalAccessError, java.lang.NoSuchFieldError, java.lang.NoSuchMethodError, and so on.

  • Can a class be found for the fully qualified name described by the string in the symbolic reference?
  • Does the specified class contain a method or field that matches the given field descriptor and simple name?
  • Can the current class access the classes, fields, and methods in the symbolic reference, given their accessibility (private, protected, public, etc.)?
  • ...

6.2.3 preparation

The preparation phase formally allocates memory for the variables defined in a class (that is, static variables, the ones modified by static) and sets their initial values.

public static int value = 123;

After the preparation phase, the variable value holds the initial value 0, not 123, because no Java method has been executed yet. The putstatic instruction that assigns 123 to value is compiled into the class constructor <clinit>() method, so the assignment of 123 is not performed until the class's initialization phase.

If a class field has a ConstantValue attribute in its field attribute table, the variable is initialized during the preparation phase to the value specified by that attribute. Suppose the definition of the class variable value above is changed to:

public static final int value = 123;

javac then generates a ConstantValue attribute for value at compile time, and during preparation the virtual machine sets value to 123 according to that ConstantValue attribute.
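
The zero value left by the preparation phase can be observed if code runs inside <clinit>() before the assignment. In the hypothetical class PrepDemo, the first static initializer calls a method that reads fields declared later: the plain static int still holds 0, and a static final Integer (not a constant variable, so it has no ConstantValue attribute) is still null. The static final int constant reads as 123, both because javac inlines it and because its field is set from ConstantValue during preparation.

```java
// Hypothetical class showing field values while <clinit>() is still running.
class PrepDemo {
    // Runs first in <clinit>(), before the assignments declared below it.
    static final int[] SEEN = snapshot();

    static int value = 123;               // assigned by putstatic in <clinit>()
    static final Integer BOXED = 123;     // not a constant variable: also assigned in <clinit>()
    static final int CONSTANT = 123;      // constant variable: has a ConstantValue attribute

    static int[] snapshot() {
        return new int[] {
            value,                        // still the zero value set during preparation
            BOXED == null ? -1 : BOXED,   // still null, reported as -1
            CONSTANT                      // javac inlines 123 here
        };
    }
}
```

After initialization finishes, PrepDemo.SEEN is {0, -1, 123} while PrepDemo.value is 123.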

6.2.4 parsing

The parsing phase is the process in which the Java virtual machine replaces symbolic references in the constant pool with direct references. Symbolic references appear in Class files as constants of types such as CONSTANT_Class_info, CONSTANT_Fieldref_info, and CONSTANT_Methodref_info.

  • Symbolic reference: a symbolic reference describes the referenced target with a set of symbols, which can be any form of literal as long as it identifies the target unambiguously. Symbolic references are independent of the memory layout of the virtual machine implementation, and the referenced target is not necessarily loaded into the virtual machine's memory yet. Memory layouts differ between virtual machine implementations, but the symbolic references they accept must be the same, because the literal form of symbolic references is explicitly defined in the Class file format of the Java Virtual Machine Specification
  • Direct reference: a direct reference is a pointer that points directly to the target, a relative offset, or a handle that can locate the target indirectly. Direct references depend on the memory layout of the virtual machine implementation, and the same symbolic reference generally translates to different direct references on different virtual machine instances. If a direct reference exists, its target must already be present in the virtual machine's memory.

It is common for the same symbolic reference to be resolved multiple times. Except for invokedynamic instructions, a virtual machine implementation can cache the result of the first resolution, for example by recording the direct reference in the runtime constant pool and marking the constant as resolved, so that the resolution does not have to be repeated.

For invokedynamic instructions, however, this rule does not hold. A symbolic reference that has already been resolved by one invokedynamic instruction is not necessarily resolved for other invokedynamic instructions. Because invokedynamic exists to support dynamic languages, the reference it uses is called a “dynamic call site specifier”, and its resolution can only take place when the program actually executes that instruction. By contrast, the other instructions that trigger resolution are “static”, and resolution for them may be performed right after the loading phase completes, before the program starts executing.
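
The caching that ordinary resolution permits can be modeled with reflection, treating a class name and method name as the symbolic reference and a java.lang.reflect.Method as the direct reference. ResolutionCache is a hypothetical illustration only: a real VM caches resolution results in its runtime constant pool, and such a cache would not be valid for invokedynamic call sites.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: "symbolic reference" = class name + method name + parameter types,
// "direct reference" = the java.lang.reflect.Method it resolves to.
class ResolutionCache {
    private final Map<String, Method> resolved = new HashMap<>();
    private int resolutionCount = 0;   // how many times resolution work was actually done

    Method resolve(String className, String methodName, Class<?>... paramTypes) {
        String key = className + "." + methodName + Arrays.toString(paramTypes);
        Method direct = resolved.get(key);
        if (direct == null) {
            try {
                direct = Class.forName(className).getMethod(methodName, paramTypes);
            } catch (ReflectiveOperationException e) {
                // A real VM would throw an error such as NoClassDefFoundError or NoSuchMethodError.
                throw new IllegalStateException("cannot resolve " + key, e);
            }
            resolutionCount++;
            resolved.put(key, direct);   // like marking the constant pool entry as resolved
        }
        return direct;
    }

    int resolutionCount() { return resolutionCount; }

    static int demo() {
        ResolutionCache cache = new ResolutionCache();
        Method first = cache.resolve("java.lang.String", "length");
        Method second = cache.resolve("java.lang.String", "length");   // served from the cache
        return first == second ? cache.resolutionCount() : -1;         // 1 when caching works
    }
}
```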

Resolution mainly targets seven kinds of symbolic references: classes or interfaces, fields, class methods, interface methods, method types, method handles, and call site specifiers.

6.2.5 initialization

Class initialization is the final step of class loading. Only at this stage does the Java virtual machine actually start executing the Java code written in the class, handing control over to the application.

During the preparation phase, variables have already been assigned the zero values the system requires; during the initialization phase, class variables and other resources are initialized according to the plan the programmer expresses in code. Put simply, the initialization phase is the execution of the class constructor <clinit>() method. <clinit>() is not written directly by programmers in Java code; it is generated automatically by the javac compiler.

The <clinit>() method is produced by the compiler automatically collecting all class variable assignments in the class and the statements in static{} blocks, and merging them. The compiler collects them in the order in which they appear in the source file. A static block can only access variables defined before it; variables defined after it can be assigned by the static block but not accessed.

public class Test {
    static {
        i = 0;                 // Assigning a variable defined later compiles normally
        System.out.print(i);   // Compile error: "illegal forward reference"
    }
    static int i = 1;
}

Unlike the class's constructor (the instance constructor <init>() method from the virtual machine's perspective), the <clinit>() method does not need to explicitly call the parent class's <clinit>(); the Java virtual machine guarantees that the parent class's <clinit>() has finished executing before the subclass's <clinit>() runs. Therefore the first <clinit>() method executed in the Java virtual machine always belongs to java.lang.Object.

Since the parent class's <clinit>() runs first, static blocks defined in the parent class take effect before the assignments to class variables in the subclass.
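A small example of this ordering, with hypothetical class names: reading Sub.B triggers Sub's initialization, which first runs Parent's <clinit>(), so B sees the value assigned by the parent's static block rather than the field's initializer.

```java
public class ParentFirstInit {
    static class Parent {
        public static int A = 1;

        static {
            A = 2; // part of Parent.<clinit>(), which completes before Sub.<clinit>() starts
        }
    }

    static class Sub extends Parent {
        public static int B = A; // by now Parent's static block has already set A to 2
    }

    public static void main(String[] args) {
        System.out.println(Sub.B); // 2
    }
}
```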

<clinit>() is not mandatory for a class or interface: if a class has no static block and no assignments to its class variables, the compiler may not generate a <clinit>() method for it at all.

Interfaces cannot contain static blocks, but they can still have variable initialization assignments, so an interface generates a <clinit>() method just as a class does. Unlike a class, however, executing an interface's <clinit>() does not require the parent interface's <clinit>() to run first, because a parent interface is initialized only when a variable defined in it is actually used. Likewise, initializing a class that implements an interface does not execute that interface's <clinit>() (since Java 8, the exception is a superinterface that declares default methods, which is initialized along with the implementing class).
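The following sketch, with hypothetical names, logs each type's <clinit>() as it runs. Creating an EnglishGreeter initializes that class but not the Greeter interface (which declares no default methods); Greeter is initialized only when its non-constant field ID is first read.

```java
import java.util.ArrayList;
import java.util.List;

public class InterfaceInitDemo {
    // Records which types have run their <clinit>() so far, in order.
    static final List<String> LOG = new ArrayList<>();

    static int record(String who, int value) {
        LOG.add(who);
        return value;
    }

    interface Greeter {
        // Not a compile-time constant, so the interface needs a <clinit>() to set it.
        int ID = record("Greeter", 42);
    }

    static class EnglishGreeter implements Greeter {
        static {
            record("EnglishGreeter", 0);
        }
    }

    public static void main(String[] args) {
        new EnglishGreeter();
        System.out.println(LOG);        // [EnglishGreeter]: Greeter is still uninitialized
        System.out.println(Greeter.ID); // first read of the interface's field...
        System.out.println(LOG);        // [EnglishGreeter, Greeter]: ...runs its <clinit>()
    }
}
```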

The Java virtual machine must ensure that a class's <clinit>() method is correctly locked and synchronized in a multithreaded environment. If several threads try to initialize a class at the same time, only one of them executes the class's <clinit>(); all the others block and wait until that thread has finished. This property can be used to implement singletons. Conversely, if a class's <clinit>() contains a long-running operation, it can block many threads.
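The singleton use mentioned above is commonly written as the lazy initialization holder idiom. In this sketch the class names are hypothetical, and the constructor counter exists only to make the behavior observable: the nested Holder class is initialized on the first call to getInstance(), and the JVM's initialization lock ensures the constructor runs only once even when several threads race.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class HolderSingleton {
    static final AtomicInteger CONSTRUCTOR_CALLS = new AtomicInteger();

    private HolderSingleton() {
        CONSTRUCTOR_CALLS.incrementAndGet();
    }

    // Holder is not initialized until getInstance() first touches Holder.INSTANCE.
    // The JVM runs Holder.<clinit>() exactly once under its initialization lock,
    // so no explicit synchronization is needed here.
    private static class Holder {
        static final HolderSingleton INSTANCE = new HolderSingleton();
    }

    public static HolderSingleton getInstance() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<HolderSingleton> seen = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> seen.add(getInstance()));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(seen.size());             // 1: every thread got the same object
        System.out.println(CONSTRUCTOR_CALLS.get()); // 1: the constructor ran once
    }
}
```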

6.3 Class loaders

The Java virtual machine design team deliberately implemented the class-loading action "obtain the binary byte stream that describes a class from its fully qualified name" outside the Java virtual machine, so that applications can decide for themselves how to obtain the classes they need. The code that performs this action is called a class loader.

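A class's identity within the Java virtual machine is determined by the class together with the class loader that loaded it. If the same JVM loads the same class file through two different class loaders, the two resulting classes are necessarily unequal. "Equal" here covers the results of the Class object's equals(), isAssignableFrom(), and isInstance() methods, as well as the instanceof keyword.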
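The following sketch shows class identity in practice under a few assumptions: rather than depending on a .class file on disk, it hand-assembles the bytes of a minimal class named Demo (superclass Object, no fields, no methods) and defines it twice through two instances of a hypothetical BytesLoader. Both Class objects report the same name, yet the JVM treats them as different types.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ClassIdentityDemo {

    // Hand-assembles a minimal class file: one class whose superclass is
    // java.lang.Object, with no fields and no methods (not even a constructor).
    static byte[] minimalClassFile(String internalName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(0xCAFEBABE);                            // magic
            out.writeShort(0);                                   // minor_version
            out.writeShort(52);                                  // major_version (Java 8)
            out.writeShort(5);                                   // constant_pool_count: entries #1..#4
            out.writeByte(1); out.writeUTF(internalName);        // #1 CONSTANT_Utf8
            out.writeByte(7); out.writeShort(1);                 // #2 CONSTANT_Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 CONSTANT_Utf8
            out.writeByte(7); out.writeShort(3);                 // #4 CONSTANT_Class -> #3
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                   // this_class
            out.writeShort(4);                                   // super_class
            out.writeShort(0);                                   // interfaces_count
            out.writeShort(0);                                   // fields_count
            out.writeShort(0);                                   // methods_count
            out.writeShort(0);                                   // attributes_count
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Defines "Demo" from the given bytes. The parent is null (the bootstrap loader),
    // which can resolve java.lang.Object but knows nothing about "Demo".
    static class BytesLoader extends ClassLoader {
        private final byte[] classBytes;

        BytesLoader(byte[] classBytes) {
            super(null);
            this.classBytes = classBytes;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!name.equals("Demo")) {
                throw new ClassNotFoundException(name);
            }
            return defineClass(name, classBytes, 0, classBytes.length);
        }
    }

    // Loads "Demo" through a brand-new loader every time.
    static Class<?> loadWithFreshLoader(byte[] classBytes) {
        try {
            return new BytesLoader(classBytes).loadClass("Demo");
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] classBytes = minimalClassFile("Demo");
        Class<?> first = loadWithFreshLoader(classBytes);
        Class<?> second = loadWithFreshLoader(classBytes);

        System.out.println(first.getName() + " " + second.getName()); // Demo Demo
        System.out.println(first == second);                           // false
        System.out.println(first.isAssignableFrom(second));            // false
    }
}
```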

6.3.1 Parental delegation model

This section introduces the three-tier class loaders and the parent delegation model as they exist in JDK 8 and earlier versions of Java.

From the perspective of the Java virtual machine, there are only two kinds of class loaders. One is the Bootstrap ClassLoader, which is part of the virtual machine itself (in HotSpot it is implemented in C++). The other is every other class loader: these are implemented in Java, exist independently of the virtual machine, and all inherit from the abstract class java.lang.ClassLoader.

Since JDK 1.2, Java has maintained a three-tier, parent-delegated class loading architecture.

  • Bootstrap Class Loader: loads the class libraries stored in the <JAVA_HOME>\lib directory, or in the path specified by the -Xbootclasspath parameter, that the Java virtual machine recognizes (by file name, such as rt.jar and tools.jar), into virtual machine memory
  • Extension Class Loader: implemented in Java code as sun.misc.Launcher$ExtClassLoader. It loads all class libraries in the <JAVA_HOME>\lib\ext directory, or in the paths specified by the java.ext.dirs system variable
  • Application Class Loader: implemented by sun.misc.Launcher$AppClassLoader. Because it is the return value of the getSystemClassLoader() method of the ClassLoader class, it is also called the system class loader. It loads all class libraries on the user's ClassPath, and developers can use it directly in their code. If an application does not define its own class loader, this is generally the application's default class loader
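A quick way to see the hierarchy on a running JVM is to walk getParent() upward from the system class loader. In this sketch chainFrom is a hypothetical helper, and the printed class names depend on the JDK version: the sun.misc.Launcher names apply to JDK 8 and earlier, while JDK 9 and later use different internal loader classes.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderHierarchy {
    // Collects loader class names from `loader` upward. The bootstrap loader has no
    // Java object, so getParent() returning null marks the top of the chain.
    static List<String> chainFrom(ClassLoader loader) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader l = loader; l != null; l = l.getParent()) {
            chain.add(l.getClass().getName());
        }
        chain.add("<bootstrap>");
        return chain;
    }

    public static void main(String[] args) {
        // On JDK 8 this typically prints
        // [sun.misc.Launcher$AppClassLoader, sun.misc.Launcher$ExtClassLoader, <bootstrap>]
        System.out.println(chainFrom(ClassLoader.getSystemClassLoader()));

        // Core classes come from the bootstrap loader, which Java code sees as null.
        System.out.println(String.class.getClassLoader()); // null
    }
}
```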

The parent delegation model of class loaders:

How the parent delegation model works: when a class loader receives a class-loading request, it does not first try to load the class itself. Instead, it delegates the request to its parent class loader. Every level of class loader does the same, so every load request is eventually passed up to the top-level bootstrap class loader. Only when the parent reports that it cannot complete the request (it did not find the required class within its search scope) does the child loader try to load the class itself.

Why do we need the parental delegation model? The benefit is that Java classes acquire a prioritized hierarchy along with their class loaders. For example, java.lang.Object, stored in rt.jar, is ultimately delegated to the bootstrap class loader at the top of the model no matter which loader receives the request, so Object is guaranteed to be the same class in every class-loader environment of the program. Conversely, without the parent delegation model, if each class loader loaded classes on its own and a user wrote a class named java.lang.Object and put it on the program's ClassPath, the system would end up with several different Object classes. The most basic behavior of the Java type system could no longer be guaranteed, and the application would become chaotic.

The parental delegation model is very important for the stable operation of Java programs, yet its implementation is very simple: only about 10 lines of code, all in the loadClass() method of java.lang.ClassLoader.

protected synchronized Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException
{
    // First, check whether the requested class has already been loaded
    Class<?> c = findLoadedClass(name);
    if (c == null) {
        try {
            if (parent != null) {
                c = parent.loadClass(name, false);
            } else {
                c = findBootstrapClassOrNull(name);
            }
        } catch (ClassNotFoundException e) {
            // A ClassNotFoundException from the parent means the parent cannot complete the load request
        }

        if (c == null) {
            // Only when the parent fails to load the class, call findClass() to load it ourselves
            c = findClass(name);
        }
    }
    if (resolve) {
        resolveClass(c);
    }
    return c;
}

Core logic: first check whether the requested type has already been loaded. If not, call the parent loader's loadClass() method; if the parent is null, the bootstrap class loader acts as the parent. If the parent fails to load the class and throws a ClassNotFoundException, call the loader's own findClass() method to try to load it.
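Because the delegation logic lives in loadClass(), a custom loader that only overrides findClass() keeps the parent delegation model intact. In this sketch NothingLoader and the class name com.example.DoesNotExist are hypothetical: a request for java.lang.String is answered by the bootstrap loader, and only a name that nobody in the chain knows falls through to findClass().

```java
public class DelegatingLoaderDemo {
    // Overrides only findClass(), so ClassLoader.loadClass() still delegates to the
    // parent first. This findClass() is a stand-in that never finds anything.
    static class NothingLoader extends ClassLoader {
        NothingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            throw new ClassNotFoundException("NothingLoader cannot load " + name);
        }
    }

    // Returns the loaded class, or null if the whole chain failed.
    static Class<?> load(ClassLoader loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        NothingLoader loader = new NothingLoader(ClassLoader.getSystemClassLoader());
        // Delegated up to the bootstrap loader, so this is the one and only String class.
        System.out.println(load(loader, "java.lang.String") == String.class); // true
        // No loader in the chain has this class, so the request reaches findClass().
        System.out.println(load(loader, "com.example.DoesNotExist"));          // null
    }
}
```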

7. Virtual machine bytecode execution engine

7.1 Overview

The execution engine is one of the core parts of the Java virtual machine. "Virtual machine" is a concept defined relative to "physical machine": both can execute code, but a physical machine's execution engine is built directly on the processor, cache, instruction set, and operating system, whereas a virtual machine's execution engine is implemented in software. The virtual machine can therefore define its own instruction set and execution engine architecture without being bound by physical constraints, and it can execute instruction set formats that the hardware does not directly support.

When executing bytecode, the execution engines of different virtual machine implementations may use interpreted execution (through an interpreter), compiled execution (running native code produced by a just-in-time compiler), or both.

7.2 Runtime stack frame structure

The Java virtual machine uses methods as its most basic unit of execution. The stack frame is the data structure that supports method invocation and method execution in the virtual machine, and it is also the element of the virtual machine stack in the runtime data area. A stack frame stores a method's local variable table, operand stack, dynamic linking information, and method return address. The process from a method's invocation to its completion corresponds to one stack frame being pushed onto and then popped off the virtual machine stack.

When Java source code is compiled, the size of the local variable table and the maximum depth of the operand stack needed by each stack frame are already analyzed and computed, and they are written into the Code attribute of the method table. How much memory a stack frame needs is therefore not affected by runtime data; it depends only on the source code and on the specific virtual machine's stack memory layout.

From the execution engine's point of view, in an active thread only the method at the top of the stack is running and only the stack frame at the top is valid. That frame is called the current stack frame, and the method associated with it is called the current method. All bytecode instructions run by the execution engine operate only on the current stack frame. In the conceptual model, a typical stack frame consists of the local variable table, the operand stack, dynamic linking, and the method return address.

7.2.1 Local variable table

The local variable table is a storage area for a set of variable values, used to hold method parameters and the local variables defined inside a method. When a Java program is compiled into a class file, the max_locals item of the method's Code attribute determines the maximum capacity of the local variable table that the method needs. The capacity of the local variable table is measured in variable slots, its smallest unit.

For 64-bit data types, the Java virtual machine allocates two consecutive variable slots in a high-order-aligned manner. The only explicit 64-bit data types in the Java language are long and double.

When an instance method (a method not declared static) is executed, the slot at index 0 of the local variable table holds, by default, a reference to the object instance the method belongs to; inside the method this implicit parameter is accessed through the keyword this. The remaining parameters are laid out in parameter-list order, occupying slots starting from 1. Once the parameters are allocated, the remaining slots are assigned according to the order and scope of the variables defined in the method body.

To save as much stack frame memory as possible, slots in the local variable table can be reused. A variable defined in the method body does not necessarily have a scope that covers the whole method body; if the current bytecode PC counter value has moved beyond a variable's scope, that variable's slot can be reused by other variables.

7.2.2 Operand stack

The operand stack, often called the operation stack, is a last-in, first-out (LIFO) stack. As with the local variable table, its maximum depth is written into the max_stack item of the Code attribute at compile time. The javac compiler's data-flow analysis guarantees that at no point during method execution does the operand stack grow deeper than the value set in max_stack.

When a method starts executing, its operand stack is empty. During execution, bytecode instructions write values to and read values from the operand stack, that is, they push and pop. For example, arithmetic is performed by pushing the operands onto the top of the stack and then executing an arithmetic instruction; likewise, when calling another method, the arguments are passed through the operand stack.

The data types of the elements on the operand stack must strictly match the sequence of bytecode instructions. The compiler must guarantee this when compiling the code, and it is verified again by data-flow analysis during the class verification phase.

The Java virtual machine's interpreted execution engine is called a stack-based execution engine, and the "stack" in that name is the operand stack.

7.2.3 Dynamic linking

Converting symbolic references into direct references at run time is called dynamic linking. When a method calls another method, or a class uses a member variable of another class, it needs to know the target's name. Symbolic references play the role of these names, and they are stored in the Java bytecode file. At run time, the program must rely on these names (symbolic references) to find the corresponding classes and methods, which means resolving them into direct references that locate the targets precisely.

Each stack frame contains a reference to the method it belongs to in the runtime constant pool; this reference is held to support dynamic linking during method invocation. The constant pool of a class file contains a large number of symbolic references, and the method invocation instructions in bytecode take symbolic references to methods in the constant pool as their arguments. Some of these symbolic references are converted into direct references during class loading or on first use, which is called static resolution. The rest are converted into direct references during each run, which is called dynamic linking.

7.2.4 Method return address

Once a method starts executing, there are only two ways to exit it. The first is when the execution engine encounters any of the method-return bytecode instructions. A return value may then be passed to the upper-level caller (the method that called the current method is called the caller or the calling method); whether there is a return value, and of what type, depends on which return instruction is encountered. This way of exiting is called normal method invocation completion.

The other way is when an exception occurs during execution and is not handled within the method body. Whether the exception is generated internally by the Java virtual machine or raised in code with the athrow bytecode instruction, if no matching handler is found in the method's exception table, the method exits. This is called abrupt method invocation completion ("exception call completion"), and a method that exits this way provides no return value to its caller.

Either way, after the method exits, execution must return to the point where the method was originally called before the program can continue. When the method returns, some information may need to be saved in the stack frame to help restore the execution state of its calling method. Generally, when a method exits normally, the caller's PC counter value can serve as the return address, and the stack frame is likely to store this value. When a method exits abruptly, the return address is determined through the exception handler table, and this information is generally not stored in the stack frame.

Exiting a method is effectively popping the current stack frame, so the operations performed on exit may include: restoring the caller's local variable table and operand stack, pushing the return value (if any) onto the operand stack of the caller's frame, and adjusting the PC counter to point to the instruction that follows the method invocation instruction.

7.3 Method Invocation

Method invocation is not the same as executing the code in a method: the only task of the method invocation stage is to determine which version of the method is being called (that is, which method to call); the actual execution inside the method is not involved yet. Method calls are among the most common and frequent operations while a program runs. However, as mentioned earlier, compiling a class file does not include the linking step of traditional programming languages, so every method call in a class file is stored as a symbolic reference rather than as the method's entry address in the actual runtime memory layout (a direct reference). This gives Java powerful dynamic extension capabilities, but it also makes Java method calls relatively complex: for some calls, the direct reference to the target method can only be determined during class loading or even at run time.

7.3.1 Resolution

The target method of every method call is a symbolic reference in the class file's constant pool. During the resolution phase of class loading, some of these symbolic references are converted into direct references. This works only under one condition: the method has a determinable invocation version before the program actually runs, and that version cannot change at run time. In other words, the call target is fixed as soon as the code is written and compiled. Calls to such methods are called resolution.

In Java, the methods that satisfy the requirement "knowable at compile time, immutable at run time" are mainly static methods and private methods. The former are tied directly to the type, and the latter cannot be accessed from outside. Both characteristics mean that neither kind can be overridden into another version through inheritance or any other means, so they are suitable for resolution during class loading.

The bytecode instruction set provides different instructions for calling different kinds of methods. The Java virtual machine supports the following five method invocation bytecode instructions:

  • invokestatic: invokes static methods
  • invokespecial: invokes instance constructors (<init>() methods), private methods, and methods in the parent class
  • invokevirtual: invokes all virtual methods
  • invokeinterface: invokes interface methods; an object implementing the interface is determined at run time
  • invokedynamic: first dynamically resolves, at run time, the method referenced by the call site specifier, and then executes it. The dispatch logic of the first four invocation instructions is fixed inside the Java virtual machine, whereas the dispatch logic of invokedynamic is decided by a user-specified bootstrap method

Any method that can be invoked by the invokestatic and invokespecial instructions has a unique invocation version that can be determined in the resolution phase. In Java, the methods that meet this condition are static methods, private methods, instance constructors, and parent class methods. Together with final methods (even though they are invoked with invokevirtual), these five kinds of method calls have their symbolic references resolved into direct references when the class is loaded. They are collectively called non-virtual methods; all other methods are called virtual methods.

A resolution call is always a static process: it is fully determined at compile time, and all symbolic references involved are converted into explicit direct references during the resolution phase of class loading, without deferring anything to run time.

7.3.2 Dispatch

The dispatch process described in this section reveals some of the most basic manifestations of polymorphism, such as how overloading and overriding are implemented in the Java virtual machine. "Implemented" here refers not to how they are written syntactically, but to how the virtual machine determines the correct target method.

7.3.2.1 Static Dispatch

Let’s start with some code:

/**
 * Method static dispatch demo
 */
public class StaticDispatch {
    static abstract class Human {
    }

    static class Man extends Human {
    }

    static class Woman extends Human {
    }

    public void sayHello(Human guy) {
        System.out.println("hello,guy!");
    }

    public void sayHello(Man guy) {
        System.out.println("hello,gentleman!");
    }

    public void sayHello(Woman guy) {
        System.out.println("hello,lady!");
    }

    public static void main(String[] args) {
        Human man = new Man();
        Human woman = new Woman();
        StaticDispatch sr = new StaticDispatch();
        sr.sayHello(man);   // static type Human, so sayHello(Human) is chosen
        sr.sayHello(woman); // static type Human, so sayHello(Human) is chosen
    }
}
// Output:
// hello,guy!
// hello,guy!

In the code above, Human is the variable's static type (or apparent type), and Man is its actual type (or runtime type). The difference is that a static type changes only at the point of use (for example through a cast); the variable's own static type never changes, and the final static type is known at compile time. A change of actual type is only determined at run time; when compiling the program, the compiler does not know what the actual type of an object will be.

// The actual type changes: whether it is Man or Woman is only known at run time
Human human = (new Random()).nextBoolean() ? new Man() : new Woman();

// The static type changes: the cast makes it known at compile time whether it is Man or Woman
sr.sayHello((Man) human);
sr.sayHello((Woman) human);

All dispatch actions that rely on static types to determine the version of a method’s execution are called static dispatch. The most typical application of static dispatch is method overloading. Static dispatch occurs at compile time, so the action to determine static dispatch is not actually performed by the virtual machine.

Overload matching priority:

import java.io.Serializable;

public class OverLoad {
    public static void main(String[] args) {
        sayHello('c'); // prints "hello char"
    }

    public static void sayHello(char c) {
        System.out.println("hello char");
    }

    public static void sayHello(int i) {
        System.out.println("hello int");
    }

    public static void sayHello(long l) {
        System.out.println("hello long");
    }

    public static void sayHello(float f) {
        System.out.println("hello float");
    }

    public static void sayHello(double d) {
        System.out.println("hello double");
    }

    public static void sayHello(Serializable s) {
        System.out.println("hello serializable");
    }

    public static void sayHello(Object o) {
        System.out.println("hello object");
    }

    public static void sayHello(char... chars) {
        System.out.println("hello chars");
    }
}

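Matching priority for the argument 'c', from highest to lowest (each overload is chosen only when all those before it are removed): char > int > long > float > double > Serializable > Object > variable-length parameters (char...). Serializable and Object only become applicable after 'c' is autoboxed to Character, so a sayHello(Character) overload, if present, would rank between double and Serializable.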

char does not match overloads taking byte or short, because converting char to byte or short is a narrowing conversion and therefore unsafe. Overloads with variable-length parameters have the lowest priority of all.
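One way to observe this ordering without deleting overloads one at a time is to give each hypothetical holder class only two candidates and see which one a char argument picks. The results follow the three overload-resolution phases of the Java Language Specification: widening first, then boxing, then varargs.

```java
import java.io.Serializable;

public class OverloadPriority {
    static class IntOrLong {
        static String pick(int i) { return "int"; }
        static String pick(long l) { return "long"; }
    }

    static class DoubleOrBoxed {
        static String pick(double d) { return "double"; }      // widening wins over boxing
        static String pick(Character c) { return "Character"; }
    }

    static class SerializableOrObject {
        static String pick(Serializable s) { return "Serializable"; } // both need boxing;
        static String pick(Object o) { return "Object"; }             // the more specific wins
    }

    static class ObjectOrVarargs {
        static String pick(Object o) { return "Object"; }     // boxing wins over varargs
        static String pick(char... cs) { return "char..."; }
    }

    public static void main(String[] args) {
        System.out.println(IntOrLong.pick('c'));            // int
        System.out.println(DoubleOrBoxed.pick('c'));        // double
        System.out.println(SerializableOrObject.pick('c')); // Serializable
        System.out.println(ObjectOrVarargs.pick('c'));      // Object
    }
}
```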

7.3.2.2 Dynamic Dispatch

Dynamic dispatch is closely related to overriding, another important manifestation of polymorphism.

public class DynamicDispatch {
    static abstract class Human {
        protected abstract void sayHello();
    }

    static class Man extends Human {
        @Override
        protected void sayHello() {
            System.out.println("man say hello");
        }
    }

    static class Woman extends Human {
        @Override
        protected void sayHello() {
            System.out.println("woman say hello");
        }
    }

    public static void main(String[] args) {
        Human man = new Man();
        Human woman = new Woman();
        man.sayHello();
        woman.sayHello();
        man = new Woman();
        man.sayHello();
    }
}
// Output:
// man say hello
// woman say hello
// woman say hello

The runtime resolution of the invokevirtual instruction, which is its polymorphic lookup process, roughly consists of the following steps:

  1. Find the actual type of the object referenced by the element at the top of the operand stack, and call it C
  2. If a method in type C matches both the descriptor and the simple name in the constant, check access permissions. If access is allowed, return a direct reference to this method and end the lookup; if not, throw java.lang.IllegalAccessError
  3. Otherwise, repeat the search and verification of step 2 on each superclass of C in turn, from bottom to top along the inheritance hierarchy
  4. If no suitable method is found, throw java.lang.AbstractMethodError
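The four steps above can be approximated in plain Java with reflection. This is only a simplified model under stated assumptions: selectVirtual is a hypothetical helper, getDeclaredMethod matches the name and parameter types but not the return type, and access checks, interfaces, and default methods are ignored.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class VirtualLookupSketch {
    static abstract class Human {
        protected abstract String sayHello();
    }

    static class Man extends Human {
        @Override
        protected String sayHello() {
            return "man say hello";
        }
    }

    static class Boy extends Man {
        // Does not override sayHello(), so the lookup must continue up to Man.
    }

    // Simplified invokevirtual selection: start at the receiver's actual class C (step 1),
    // look for a declared method with the same name and parameter types (step 2), walk up
    // the superclasses (step 3), and fail with AbstractMethodError (step 4).
    static Method selectVirtual(Object receiver, String name, Class<?>... parameterTypes) {
        for (Class<?> c = receiver.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Method m = c.getDeclaredMethod(name, parameterTypes);
                if (Modifier.isAbstract(m.getModifiers())) {
                    throw new AbstractMethodError(c.getName() + "." + name);
                }
                return m;
            } catch (NoSuchMethodException notDeclaredHere) {
                // Not declared in this class: keep searching the next superclass.
            }
        }
        throw new AbstractMethodError(name);
    }

    public static void main(String[] args) {
        System.out.println(selectVirtual(new Man(), "sayHello").getDeclaringClass().getSimpleName()); // Man
        System.out.println(selectVirtual(new Boy(), "sayHello").getDeclaringClass().getSimpleName()); // Man
    }
}
```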

Take a look at the bytecode of the main() method in the code above:

public static void main(java.lang.String[]);
    Code:
        Stack=2, Locals=3, Args_size=1
        0: new #16; //class org/fenixsoft/polymorphic/DynamicDispatch$Man
        3: dup
        4: invokespecial #18; //Method org/fenixsoft/polymorphic/DynamicDispatch$Man."<init>":()V
        7: astore_1
        8: new #19; //class org/fenixsoft/polymorphic/DynamicDispatch$Woman
        11: dup
        12: invokespecial #21; //Method org/fenixsoft/polymorphic/DynamicDispatch$Woman."<init>":()V
        15: astore_2
        16: aload_1
        17: invokevirtual #22; //Method org/fenixsoft/polymorphic/DynamicDispatch$Human.sayHello:()V
        20: aload_2
        21: invokevirtual #22; //Method org/fenixsoft/polymorphic/DynamicDispatch$Human.sayHello:()V
        24: new #19; //class org/fenixsoft/polymorphic/DynamicDispatch$Woman
        27: dup
        28: invokespecial #21; //Method org/fenixsoft/polymorphic/DynamicDispatch$Woman."<init>":()V
        31: astore_1
        32: aload_1
        33: invokevirtual #22; //Method org/fenixsoft/polymorphic/DynamicDispatch$Human.sayHello:()V
        36: return

Because the first step of executing invokevirtual is to determine the actual type of the receiver at run time, the invokevirtual instructions in these calls resolve the same class-method symbolic reference in the constant pool into different direct references. This is the essence of method overriding in the Java language. Dispatch that determines the executed method version at run time based on the actual type is called dynamic dispatch.

Fields never participate in polymorphism: which field an access refers to is fixed by the static type at compile time. When a subclass declares a field with the same name as one in its parent, the subclass's field hides the parent's field, even though both fields exist in the subclass object's memory.
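A short sketch of field hiding, with hypothetical class names: the same Son object yields different money values depending on the static type used to read the field, while the overridden method always follows the actual type.

```java
public class FieldHiding {
    static class Father {
        int money = 1;

        int getMoney() {
            return money; // inside Father, "money" means Father.money
        }
    }

    static class Son extends Father {
        int money = 2; // hides Father.money; a Son object holds both fields

        @Override
        int getMoney() {
            return money; // inside Son, "money" means Son.money
        }
    }

    public static void main(String[] args) {
        Father guy = new Son();
        System.out.println(guy.money);         // 1: field chosen by the static type Father
        System.out.println(guy.getMoney());    // 2: method chosen by the actual type Son
        System.out.println(((Son) guy).money); // 2: same object, static type Son
    }
}
```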

7.3.2.3 Single dispatch versus Multiple Dispatch

A method's receiver and its parameters are collectively called the method's arguments (the criteria a dispatch can be based on). Depending on how many of these criteria a dispatch uses, it is classified as single dispatch or multiple dispatch: single dispatch selects the target method based on one criterion, while multiple dispatch selects it based on more than one.

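The Java language is statically multiple-dispatch and dynamically single-dispatch. At compile time, the choice of method depends on two criteria: the static type of the receiver and the static types of the parameters. At run time, invokevirtual considers only one criterion: the actual type of the receiver.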
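The classic illustration (the class names here are hypothetical) uses two overloads of hardChoice in a parent class, both overridden in a subclass. The compiler uses the receiver's static type and the argument type to pick the symbolic reference; the JVM then uses only the receiver's actual type to pick the override.

```java
public class MultiDispatch {
    static class QQ {}
    static class _360 {}

    static class Father {
        String hardChoice(QQ arg) { return "father choose qq"; }
        String hardChoice(_360 arg) { return "father choose 360"; }
    }

    static class Son extends Father {
        @Override
        String hardChoice(QQ arg) { return "son choose qq"; }

        @Override
        String hardChoice(_360 arg) { return "son choose 360"; }
    }

    public static void main(String[] args) {
        Father father = new Father();
        Father son = new Son();
        // Compile time: static type Father + argument type _360 select Father.hardChoice(_360).
        System.out.println(father.hardChoice(new _360())); // father choose 360
        // Compile time selects Father.hardChoice(QQ); run time looks only at the receiver (Son).
        System.out.println(son.hardChoice(new QQ()));      // son choose qq
    }
}
```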

7.3.2.4 How the virtual machine implements dynamic dispatch

Dynamic dispatch is performed very frequently, and selecting the method version requires searching the receiver type's method metadata at run time for a suitable target. For performance reasons, Java virtual machine implementations generally do not search type metadata that often. A basic and common optimization is to build a virtual method table (vtable) for the type in the method area and use vtable indexes instead of metadata lookups.

The virtual method table stores the actual entry address of each method. If a method is not overridden in a subclass, the address entry in the virtual method table of the subclass is the same as the address entry of the same method in the parent class, pointing to the implementation entry of the parent class. If a subclass overrides this method, the address in the subclass’s virtual method table is replaced with the entry address pointing to the subclass’s implementation version.

The virtual method table is initialized during the linking phase of class loading: after the initial values of the class variables have been prepared, the virtual machine also initializes the class's virtual method table.
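The idea can be modeled with a toy table. This is an illustrative sketch, not how HotSpot actually lays out its vtables: ToyClass, the slot constants, and the string labels standing in for entry addresses are all hypothetical. Each class copies its parent's table and overwrites only the slots of the methods it overrides, so a call becomes a lookup by index.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;

public class VTableSketch {
    // Slot indexes are fixed across the hierarchy, so a call site can use an index
    // instead of searching method metadata by name and descriptor.
    static final int SLOT_TO_STRING = 0;
    static final int SLOT_SAY_HELLO = 1;

    static final class ToyClass {
        final String name;
        final String[] vtable; // each entry is a label standing in for a method's entry address

        ToyClass(String name, ToyClass parent, int slots, Map<Integer, String> overrides) {
            this.name = name;
            // Start from a copy of the parent's table: entries for methods that are not
            // overridden keep pointing at the parent's implementation.
            this.vtable = parent == null ? new String[slots] : Arrays.copyOf(parent.vtable, slots);
            // Methods this class declares or overrides replace the entry in their slot.
            for (Map.Entry<Integer, String> e : overrides.entrySet()) {
                vtable[e.getKey()] = e.getValue();
            }
        }

        // A "virtual call" becomes a single array lookup in the receiver's own table.
        String entryFor(int slot) {
            return vtable[slot];
        }
    }

    static ToyClass[] demoHierarchy() {
        ToyClass object = new ToyClass("Object", null, 1,
                Collections.singletonMap(SLOT_TO_STRING, "Object.toString"));
        ToyClass human = new ToyClass("Human", object, 2,
                Collections.singletonMap(SLOT_SAY_HELLO, "Human.sayHello"));
        ToyClass man = new ToyClass("Man", human, 2,
                Collections.singletonMap(SLOT_SAY_HELLO, "Man.sayHello"));
        return new ToyClass[] {object, human, man};
    }

    public static void main(String[] args) {
        ToyClass man = demoHierarchy()[2];
        System.out.println(man.entryFor(SLOT_TO_STRING)); // Object.toString (inherited entry)
        System.out.println(man.entryFor(SLOT_SAY_HELLO)); // Man.sayHello (overridden entry)
    }
}
```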

7.4 Dynamically typed language support

JDK 7 added a new bytecode instruction, invokedynamic, as one of the improvements made to support dynamically typed languages.

7.4.1 Dynamically typed languages

What is a dynamically typed language? The key feature of dynamically typed languages is that the main process of type checking takes place at run time rather than compile time.

Languages that perform type checking at compile time, such as C++ and Java, are the most commonly used statically typed languages.

Statically typed languages determine variable types at compile time. Their most significant benefit is that the compiler can provide comprehensive and rigorous type checking, so potential type-related problems are caught while the code is being written and compiled; this benefits stability and makes it easier for projects to grow large. Dynamically typed languages determine types at run time, which gives developers great flexibility: functionality that takes a lot of bloated code in a statically typed language can be clear and concise in a dynamic one, which means higher development efficiency.

7.4.2 Java and dynamic typing

In the JDK 7 bytecode instruction set, the first parameter of the four existing method invocation instructions (invokevirtual, invokespecial, invokestatic, invokeinterface) is a symbolic reference to the invoked method. But symbolic references to methods are produced at compile time, whereas dynamically typed languages can only determine the receiver of a method at run time. How can this gap be bridged?

The answer is the invokedynamic instruction and the java.lang.invoke package.

7.4.3 The java.lang.invoke package

The java.lang.invoke package, added in JDK 7, provides a new mechanism for dynamically determining target methods, called method handles. A method handle is somewhat like a function pointer in C++.

With method handles, the Java language also gains a tool similar to function pointers: a kind of alias for a method.

In Java code this is done with MethodHandle, which resembles Reflection in both usage and effect, with a few differences:

  • Both Reflection and MethodHandle essentially simulate method calls, but Reflection simulates them at the Java code level, while MethodHandle simulates them at the bytecode level
  • A java.lang.reflect.Method object in Reflection contains far more information than a java.lang.invoke.MethodHandle object. The former is a complete Java-side image of the method (signature, descriptor, attributes, access permissions, and so on), while the latter holds only the information needed to execute the method
  • Because MethodHandle simulates bytecode method invocation instructions, the virtual machine can in theory apply similar optimizations to it (such as method inlining), although that work is still in progress. When methods are called through reflection, it is almost impossible to apply call-site optimizations directly
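A minimal MethodHandle sketch, loosely modeled on the usual textbook example: ClassA and ClassB are hypothetical, unrelated classes that both happen to have a describe(String) method. MethodHandles.lookup().findVirtual locates the method on whatever class the receiver actually has, and bindTo fixes the receiver, so ordinary Java code decides which target method is called.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class MethodHandleDemo {
    // Two unrelated classes with a method of the same name and type.
    static class ClassA {
        public String describe(String s) { return "ClassA: " + s; }
    }

    static class ClassB {
        public String describe(String s) { return "ClassB: " + s; }
    }

    // Finds describe(String)String on the receiver's actual class and binds the receiver.
    static String callDescribe(Object receiver, String arg) {
        MethodType type = MethodType.methodType(String.class, String.class);
        try {
            MethodHandle handle = MethodHandles.lookup()
                    .findVirtual(receiver.getClass(), "describe", type)
                    .bindTo(receiver);
            // invokeExact requires the call-site type to match (String)String exactly.
            return (String) handle.invokeExact(arg);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callDescribe(new ClassA(), "hello")); // ClassA: hello
        System.out.println(callDescribe(new ClassB(), "hello")); // ClassB: hello
    }
}
```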

7.4.4 invokedynamic instruction

In a sense, invokedynamic and MethodHandle serve the same purpose. Both solve the problem that the method dispatch rules of the original four invoke* instructions are completely fixed inside the virtual machine, moving the decision of how to find the target method from the virtual machine into concrete user code and giving users more freedom.

Each position containing an invokedynamic instruction is called a dynamically-computed call site. The first argument of this instruction is no longer a CONSTANT_Methodref_info constant representing a method symbolic reference, but the CONSTANT_InvokeDynamic_info constant added in JDK 7. From this new constant three pieces of information can be obtained: the bootstrap method, the method type, and the name.
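Java source cannot emit an arbitrary invokedynamic instruction directly (javac uses it for constructs such as lambdas, with LambdaMetafactory as the bootstrap method), so this sketch calls a hypothetical bootstrap method by hand to mimic what the JVM does the first time it reaches a dynamic call site: pass in a Lookup, the name, and the method type, receive a CallSite, and invoke its target.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class BootstrapSketch {
    static String greet(String name) {
        return "hello " + name;
    }

    // Same shape as an invokedynamic bootstrap method: it receives a Lookup plus the
    // name and method type recorded for the call site, and returns the CallSite to link.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        return new ConstantCallSite(lookup.findStatic(BootstrapSketch.class, name, type));
    }

    // Done by hand here; for a real invokedynamic the JVM runs the bootstrap method
    // on first execution and then invokes the linked CallSite's target.
    static String invokeLikeIndy(String arg) {
        try {
            CallSite site = bootstrap(MethodHandles.lookup(), "greet",
                    MethodType.methodType(String.class, String.class));
            return (String) site.dynamicInvoker().invokeExact(arg);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeLikeIndy("world")); // hello world
    }
}
```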

7.5 Stack-based bytecode interpreter execution engine

How does the virtual machine execute the bytecode instructions in the method?

When executing Java code, many virtual machine execution engines can choose between interpreted execution (via the interpreter) and compiled execution (via a just-in-time compiler that produces native code). This section examines how the execution engine of a conceptual Java virtual machine interprets and executes bytecode.

7.5.1 Interpreted execution

Whether a program is interpreted or compiled, and whether it runs on a physical or a virtual machine, most program code has to go through a common series of steps (lexical analysis, syntax analysis, building an abstract syntax tree) before it becomes object code for a physical machine or an instruction set a virtual machine can execute. After those steps the paths diverge: one leads to interpretation, and the other is the traditional compilation process that generates target machine code from the program code.

Code execution in Java broadly follows this approach, in line with modern classic compilation principles: before execution, the program source code undergoes lexical analysis and syntax analysis and is converted into an abstract syntax tree.

In the Java language, the Javac compiler performs lexical analysis and syntax analysis to build the abstract syntax tree. It then traverses the syntax tree to generate a linear stream of bytecode instructions. This part happens outside the Java virtual machine, while the interpreter lives inside it, so the compilation of a Java program is a semi-independent implementation.
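To make the pipeline concrete, here is a toy sketch of the same idea for a hypothetical language containing only non-negative integer literals and `+`:

- `tokenize` performs lexical analysis.
- `parse` builds an abstract syntax tree.
- `emit` walks the tree in post order to produce a linear, stack-based instruction stream, which `compileAssignment` ends with `istore_0`.

All names (`MiniJavac`, `Num`, `Add`, `compileAssignment`) are invented for illustration. This is not how javac is structured. Unlike javac, it also does no constant folding: javac would compile `int x = 1 + 1;` straight to `iconst_2`.

```java
import java.util.ArrayList;
import java.util.List;

class MiniJavac {

    // AST node: each node emits stack-machine code for itself.
    interface Node {
        void emit(List<String> out);
    }

    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }

        // Pick the shortest push form, as javac does for non-negative int constants.
        public void emit(List<String> out) {
            if (value <= 5) out.add("iconst_" + value);
            else if (value <= Byte.MAX_VALUE) out.add("bipush " + value);
            else if (value <= Short.MAX_VALUE) out.add("sipush " + value);
            else out.add("ldc " + value);
        }
    }

    static final class Add implements Node {
        final Node left, right;
        Add(Node left, Node right) { this.left = left; this.right = right; }

        // Post-order traversal: push both operands first, then the operator.
        public void emit(List<String> out) {
            left.emit(out);
            right.emit(out);
            out.add("iadd");
        }
    }

    // Lexical analysis: split the source into number tokens and "+" tokens.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '+') {
                tokens.add("+");
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    // Syntax analysis: expr := NUMBER ('+' NUMBER)*, left-associative.
    static Node parse(List<String> tokens) {
        int pos = 0;
        Node tree = new Num(Integer.parseInt(tokens.get(pos++)));
        while (pos < tokens.size()) {
            if (!tokens.get(pos++).equals("+")) throw new IllegalArgumentException("expected '+'");
            tree = new Add(tree, new Num(Integer.parseInt(tokens.get(pos++))));
        }
        return tree;
    }

    // Whole pipeline for "x = <expr>" where x lives in local variable slot 0.
    static List<String> compileAssignment(String expr) {
        List<String> code = new ArrayList<>();
        parse(tokenize(expr)).emit(code);
        code.add("istore_0");
        return code;
    }

    public static void main(String[] args) {
        System.out.println(compileAssignment("1 + 1"));
    }
}
```

For `"1 + 1"` this produces `iconst_1, iconst_1, iadd, istore_0`, the same shape as the instruction sequence discussed in 7.5.2.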

7.5.2 Stack-based instruction set versus register-based instruction set

The bytecode instruction stream output by the Javac compiler is, in essence, a stack-based instruction set architecture. Most instructions in the stream are zero-address instructions that rely on the operand stack to work. The other commonly used instruction set architecture is register-based, and its most typical example is the x86 two-address instruction set. x86 is the instruction set directly supported by the hardware of today's mainstream PCs, and its instructions rely on registers to work.

iconst_1
iconst_1
iadd
istore_0

The two iconst_1 instructions each push the constant 1 onto the stack. The iadd instruction then pops the top two values off the stack, adds them, and pushes the result back onto the top of the stack. Finally, istore_0 pops the top value and stores it in variable slot 0 of the local variable table.

Instructions in a Java bytecode stream usually take no explicit operands. They use data on the operand stack as their input, and they store their results back on the operand stack.
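The four-instruction example can be run on a minimal interpreter sketch. It decodes the real JVM opcode values for `iconst_1` (0x04), `iadd` (0x60) and `istore_0` (0x3b). It keeps an operand stack and a local variable table, and it advances a program counter one byte at a time. That last step works only because all three instructions are one-byte, zero-operand instructions.

The class name `TinyStackInterpreter` and its `run` method are hypothetical. A real JVM interpreter also handles operands, types, frames and verification, all of which this sketch ignores.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TinyStackInterpreter {

    // Real JVM opcode values for the three instructions in the example.
    static final int ICONST_1 = 0x04;
    static final int IADD = 0x60;
    static final int ISTORE_0 = 0x3b;

    // Interprets a stream of one-byte, zero-operand instructions and returns
    // the local variable table after execution.
    static int[] run(byte[] code, int maxLocals) {
        Deque<Integer> operandStack = new ArrayDeque<>();
        int[] locals = new int[maxLocals];
        int pc = 0; // program counter: index of the next opcode byte
        while (pc < code.length) {
            int opcode = code[pc++] & 0xff; // opcodes are unsigned bytes
            switch (opcode) {
                case ICONST_1:
                    operandStack.push(1); // push the int constant 1
                    break;
                case IADD: {
                    int right = operandStack.pop(); // top of stack is the second operand
                    int left = operandStack.pop();
                    operandStack.push(left + right);
                    break;
                }
                case ISTORE_0:
                    locals[0] = operandStack.pop(); // pop into local variable slot 0
                    break;
                default:
                    throw new IllegalStateException("unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        byte[] code = {ICONST_1, ICONST_1, IADD, ISTORE_0}; // 4 instructions in 4 bytes
        int[] locals = run(code, 1);
        System.out.println("local[0] = " + locals[0]);
    }
}
```

The whole program occupies four bytes, because no instruction needs extra bytes to name its operands.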

The main advantage of a stack-based instruction set is portability. Registers are provided directly by the hardware, so programs that rely on specific hardware registers are inevitably constrained by that hardware. With a stack-based instruction set, user programs do not use registers directly. The virtual machine implementation decides whether to place the most frequently accessed data (the program counter, a top-of-stack cache, and so on) in registers for better performance, which also keeps the implementation simpler.

Stack-based instruction sets have other advantages as well:

  • The code is relatively compact: each instruction's opcode occupies a single byte, whereas a multi-address instruction set must also encode its operands.
  • The compiler is simpler to implement: there is no need to consider space allocation, because all required space is handled on the stack.

The main disadvantage of a stack-based instruction set is that, in theory, it executes relatively slowly. The fact that the instruction sets of all mainstream physical machines are register-based indirectly confirms this. However, the speed concern applies only to interpreted execution. Once the just-in-time compiler outputs a stream of physical-machine assembly instructions, the instruction set architecture of the virtual machine no longer matters.

In interpreted execution, stack-based code is compact, but it generally needs more instructions than a register architecture to perform the same function, because push and pop operations themselves generate a large number of instructions. More importantly, the stack lives in memory, and frequent stack access means frequent memory access, which is always a bottleneck for execution speed relative to the processor. A virtual machine can use top-of-stack caching, mapping the most frequently used operations onto registers to avoid direct memory access. This is an optimization, however, rather than a fix for the underlying problem. Because of the number of instructions and the memory accesses, a stack-based instruction set executes relatively slowly.
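The trade-off can be made visible with a small sketch. It runs the same four-byte program (`iconst_1`, `iconst_1`, `iadd`, `istore_0`) on two hypothetical interpreters and counts instructions and operand-stack array accesses:

- `runPlain` keeps every stack slot in an array.
- `runCached` keeps the top of the stack in the local variable `tos` and spills only deeper slots to the array. A JIT compiler may place `tos` in a register, but this is not guaranteed.

The names and the counting scheme are assumptions for illustration. They are not how any particular JVM's interpreter is written.

```java
class StackCacheSketch {

    static final int ICONST_1 = 0x04;
    static final int IADD = 0x60;
    static final int ISTORE_0 = 0x3b;

    // Result of one run: value stored into local slot 0, instructions
    // dispatched, and accesses to the operand-stack array (memory).
    static final class Stats {
        final int local0, instructions, stackAccesses;
        Stats(int local0, int instructions, int stackAccesses) {
            this.local0 = local0;
            this.instructions = instructions;
            this.stackAccesses = stackAccesses;
        }
    }

    // Plain interpreter: every push and pop goes through the stack array.
    static Stats runPlain(byte[] code) {
        int[] stack = new int[16];
        int sp = 0, local0 = 0, executed = 0, accesses = 0;
        for (byte op : code) {
            executed++;
            switch (op & 0xff) {
                case ICONST_1: stack[sp++] = 1; accesses++; break;
                case IADD: {
                    int y = stack[--sp];
                    int x = stack[--sp];
                    stack[sp++] = x + y;
                    accesses += 3; // two reads, one write
                    break;
                }
                case ISTORE_0: local0 = stack[--sp]; accesses++; break;
                default: throw new IllegalStateException("unsupported opcode");
            }
        }
        return new Stats(local0, executed, accesses);
    }

    // Top-of-stack cached interpreter: the top value lives in `tos`;
    // only values below it are spilled to the array.
    static Stats runCached(byte[] code) {
        int[] stack = new int[16];
        int sp = 0;    // number of values spilled to the array
        int depth = 0; // logical stack depth, including the cached top
        int tos = 0, local0 = 0, executed = 0, accesses = 0;
        for (byte op : code) {
            executed++;
            switch (op & 0xff) {
                case ICONST_1:
                    if (depth > 0) { stack[sp++] = tos; accesses++; } // spill old top
                    tos = 1;
                    depth++;
                    break;
                case IADD:
                    tos = stack[--sp] + tos; // only the second operand comes from memory
                    accesses++;
                    depth--;
                    break;
                case ISTORE_0:
                    local0 = tos;
                    depth--;
                    if (depth > 0) { tos = stack[--sp]; accesses++; } // refill top
                    break;
                default: throw new IllegalStateException("unsupported opcode");
            }
        }
        return new Stats(local0, executed, accesses);
    }

    public static void main(String[] args) {
        byte[] code = {ICONST_1, ICONST_1, IADD, ISTORE_0};
        Stats plain = runPlain(code);
        Stats cached = runCached(code);
        System.out.println("plain:  " + plain.instructions + " instructions, " + plain.stackAccesses + " stack accesses");
        System.out.println("cached: " + cached.instructions + " instructions, " + cached.stackAccesses + " stack accesses");
    }
}
```

For this program the plain version makes 6 stack-array accesses and the cached version makes 2, yet both still dispatch 4 instructions. By comparison, a two-address register form such as x86 `mov eax, 1` followed by `add eax, 1` computes the same sum in 2 instructions. Caching reduces memory traffic but leaves the instruction count unchanged, which is why it is an optimization rather than a fix for the underlying problem.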

References

  • Understanding the Java Virtual Machine in Depth: Advanced JVM Features and Best Practices (3rd edition)
  • Class initialization and instantiation: all the key points in one place
  • Syntax sugar in Java
  • JVM Knowledge