Java Virtual Machine series 1: Learn about the JVM architecture and runtime data areas

Java Virtual Machine series 2: Garbage collection mechanism details, GIF to help you understand

Java Virtual Machine series 3: Garbage collectors, and have you heard of the new ZGC and Shenandoah?

Preface

I used to blog casually, and my topics were just as casual: I wrote about whatever came to mind and whatever interested me. That freedom had a cost. After finishing each article I struggled to decide what to write next (too many choices, it turns out, is not a good thing), and, more importantly, unsystematic content is not friendly to readers. So from now on I will try to write in series, occasionally interspersed with other content. For the near future this blog will focus on the JVM (Java Virtual Machine) and JUC (java.util.concurrent), with one series introducing the Java Virtual Machine and another on Java concurrent programming.

Knowledge of the JVM is a basic requirement for Java developers, and it is a staple of Java interviews today. Many of you, like me, have been buried in CRUD work for so long that you have neglected the underlying fundamentals, which are what really determine how far you can go on this path. So let’s take a closer look at the seemingly mysterious JVM. The JVM is a broad subject and I will cover only the most important parts; if you have the time and energy, I recommend reading Understanding the Java Virtual Machine. This series quotes heavily from that book and adds my own understanding. If you stick with it, I believe you will get a lot out of it.

A quick introduction: the JVM ships as part of the JDK. The Java Virtual Machine Specification is a separate specification that stands alongside The Java Language Specification, and different companies implement it differently (much as one interface can be implemented by different classes). HotSpot, JRockit, and J9 are among the best-known Java virtual machine implementations.

This article has two parts: the overall architecture of the JVM, and the runtime data areas. Both are based on the Java Virtual Machine Specification rather than on any particular JVM implementation.

1. Java Virtual Machine (JVM) Architecture

In my opinion, whatever knowledge or technology you are learning, the first step is to understand it as a whole. Otherwise you end up like the blind men feeling the elephant, putting in twice the effort for half the result. Learning about the JVM therefore starts with its overall architecture, so I have drawn a diagram of the JVM architecture to help.

Java Virtual Machine architecture

If you are new to the JVM, this dense diagram may leave you bewildered at first. Don’t worry: you only need to understand and master a few key parts (which are also the ones interviews focus on), such as the runtime data areas, the garbage collectors, the memory allocation strategy, and the class loading mechanism. If you have energy to spare you can also study the Class file structure; a passing familiarity with the rest is enough. Since the goal here is an overview of the JVM architecture, let’s go through the parts of the diagram one by one.

1.1 Class files (bytecode files)

Java is known as “write once, run anywhere” thanks to the combination of the virtual machine and Class files. Programmers do not have to adapt their code to different operating systems. Java source code must be compiled into Class files before it can run, and a Class file can be executed by a JVM on any operating system, which is what makes Java “platform independent.” Here is a simple HelloWorld program and its corresponding Class file.

HelloWorld program and its compiled Class file
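A minimal version of such a program might look like the sketch below. The class name HelloWorld and the helper method greeting() are illustrative choices, not taken from the original figure; javac, java, and javap are the standard JDK command-line tools.

```java
// HelloWorld.java
// Compile: javac HelloWorld.java   (produces HelloWorld.class)
// Run:     java HelloWorld
// Inspect: javap -c HelloWorld     (prints the bytecode stored in the Class file)
class HelloWorld {
    // Hypothetical helper so the message can be checked without capturing stdout.
    static String greeting() {
        return "Hello, World!";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

The same HelloWorld.class file can be copied to a machine with a different operating system and run there unchanged, as long as a JVM is installed.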

Thanks to Class files, the JVM is also “language independent”: Java is not the only language that can run on it. Kotlin, which has recently become popular among Android developers, as well as Scala, Groovy, and others, are built on the JVM platform. Their code is compiled into Class files and runs on the JVM.

The platform independence and language independence provided by the JVM

1.2 ClassLoader Subsystem

To execute a Class file, you first need to load it into memory. This work is done by the ClassLoader. The system provides three class loaders: the Bootstrap ClassLoader, the Extension ClassLoader (replaced by the Platform ClassLoader since JDK 9), and the Application ClassLoader. If necessary, we can also add custom class loaders. The class loading process is as follows:

Class loading process

The class loading process is divided into three phases: loading, linking, and initialization. The linking phase is itself divided into three steps: verification, preparation, and resolution (the class loading mechanism will be described in detail in a subsequent article).
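To see the loaders for yourself, a small sketch like the following can help. ClassLoaderDemo and loaderOf are hypothetical names; the calls used (Class.getClassLoader, ClassLoader.getSystemClassLoader, ClassLoader.getParent) are standard API. Note that the API represents the Bootstrap ClassLoader as null, and the concrete loader class names printed depend on the JDK version.

```java
class ClassLoaderDemo {
    // Describes the loader that defined the given class.
    static String loaderOf(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        // Core classes such as java.lang.String are loaded by the Bootstrap
        // ClassLoader, which the API reports as null.
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("String     -> " + loaderOf(String.class));
        System.out.println("this class -> " + loaderOf(ClassLoaderDemo.class));

        // Walk the parent chain starting from the application (system) class loader.
        for (ClassLoader l = ClassLoader.getSystemClassLoader(); l != null; l = l.getParent()) {
            System.out.println("loader chain: " + l.getClass().getName());
        }
        System.out.println("loader chain: bootstrap (reported as null)");
    }
}
```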

1.3 Java VM Runtime Data Area

This part has many contents and will be introduced separately in the second part of this article.

1.4 Execution Engine

After the bytecode is loaded into the runtime data area, it is read and executed by the execution engine, which consists of the following modules:

  • Interpreter: You have probably long heard that “computers only understand 0s and 1s.” That is still true today, so code in any programming language must eventually become machine code (binary) in order to run, and Java is no exception. The interpreter’s job is to read bytecode instructions one at a time and carry out each one with the corresponding machine code. This is why Java is often described as an interpreted language, and why purely interpreted Java code runs more slowly than code from ahead-of-time compiled languages such as C++.
  • Just-in-time Compiler (JIT Compiler): To compensate for the speed disadvantage of interpreted execution, the JVM introduced the just-in-time compiler. It compiles hot code, such as frequently called methods and loops, into machine code and stores it in the code cache, so that later executions can use the compiled version instead of interpreting the bytecode again. This improves the efficiency of the program.
  • Garbage Collector: Java programmers do not have to free memory manually because the garbage collector does it for them. It is an especially important part of the JVM and will be covered in several later articles.

1.5 Java Native Interface (JNI)

If you often read the JDK source code, you will have noticed the native keyword. A method marked native has no method body in Java, because its implementation lives in the platform’s native method library (usually C or C++ code). Many methods in JDK classes, especially those that need to work with the hardware or the operating system, call into the native method library, since C and C++ are more convenient for that kind of work. For example:

// This is the currentThread method of the Thread class, which returns the currently executing thread
public static native Thread currentThread();

// This is the open0 method of the FileInputStream class, which opens the specified file
private native void open0(String name) throws FileNotFoundException;
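You can also check for the native modifier with reflection instead of reading the source. The sketch below is only an illustration: NativeMethodDemo and hasNativeMethod are hypothetical names, and which JDK methods are native is an implementation detail. The examples chosen here are native in the OpenJDK builds I am aware of, but other runtimes may differ.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class NativeMethodDemo {
    // Returns true if the class exposes a public method with this name
    // that carries the native modifier.
    static boolean hasNativeMethod(Class<?> type, String name) {
        for (Method m : type.getMethods()) {
            if (m.getName().equals(name) && Modifier.isNative(m.getModifiers())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Object.hashCode native?       " + hasNativeMethod(Object.class, "hashCode"));
        System.out.println("System.arraycopy native?      " + hasNativeMethod(System.class, "arraycopy"));
        System.out.println("Thread.currentThread native?  " + hasNativeMethod(Thread.class, "currentThread"));
        System.out.println("String.length native?         " + hasNativeMethod(String.class, "length"));
    }
}
```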

1.6 Native Method Library

The native method library is what the native interface calls into: typically C or C++ code that is native to the machine.

2. Java Virtual Machine Runtime Data Areas

The runtime data areas of the Java virtual machine are among the most important things to be familiar with, because they are closely tied to the programs you write. StackOverflowError and OutOfMemoryError almost always originate here. I say “almost” because OutOfMemoryError is also thrown when native direct memory runs out. As shown in the figure below, the program counter, the Java virtual machine stack, and the native method stack are thread-private, while the heap and the method area (which in turn contains the runtime constant pool) are shared by all threads. Let’s take a closer look (note: the quoted descriptions in this section are from Understanding the Java Virtual Machine).

Java Virtual Machine runtime data areas

2.1 Program Counter Register

A note before we start: the descriptive paragraphs in each subsection below are quotations (set as Markdown block quotes in the original post), which is the usual way to cite someone else’s article or other content. My own commentary follows each quotation.

The Program Counter Register is a small memory space that can be thought of as a line-number indicator for the bytecode being executed by the current thread. In the conceptual model of the Java virtual machine, the bytecode interpreter works by changing the value of this counter to select the next bytecode instruction to execute. It is the indicator of program control flow: basic functions such as branching, looping, jumping, exception handling, and thread recovery all depend on this counter.

Because multithreading in the Java virtual machine is implemented by threads taking turns and sharing processor execution time, at any given moment one processor (or one core of a multi-core processor) executes instructions from only one thread. Therefore, in order to resume at the correct position after a thread switch, each thread needs its own independent program counter. The counters of different threads do not affect each other and are stored independently. We call this kind of memory area “thread-private” memory.

If the thread is executing a Java method, this counter records the address of the virtual machine bytecode instruction being executed. If a native method is being executed, the counter value is empty (Undefined). This memory area is the only one for which the Java Virtual Machine Specification does not specify any OutOfMemoryError condition.

The passage above is quoted from Understanding the Java Virtual Machine, and it is actually easy to follow: the role of the program counter is to preserve a thread’s execution state. The third quoted paragraph says that “if the thread is executing a Java method, this counter records the address of the virtual machine bytecode instruction being executed”; that address is simply the position execution has reached in the bytecode. Java’s multithreaded context switching relies on the program counter: when the CPU switches from one thread to another, the saved counter lets the thread resume where it left off. If a native method is executing, the counter is Undefined, because native methods are C/C++ code running directly on the native platform. The concepts of the Java virtual machine do not apply there, so there is no bytecode instruction address to store, and only the native CPU’s PC register can track the state of the running code.

2.2 Java Virtual Machine Stacks

Like the program counter, the Java Virtual Machine Stack is thread-private, and its life cycle matches that of its thread. The virtual machine stack describes the thread memory model of Java method execution: each time a method is executed, the Java virtual machine creates a Stack Frame to store the local variable table, the operand stack, dynamic linking information, method exit information, and so on. The process of each method being called and then completing corresponds to a stack frame being pushed onto and then popped off the virtual machine stack.

The local variable table stores the Java virtual machine primitive data types known at compile time (boolean, byte, char, short, int, float, long, double), object references (the reference type, which is not the object itself; it may be a pointer to the object’s starting address, or a handle or other location associated with the object), and the returnAddress type (which points to the address of a bytecode instruction).

The storage space these data types take up in the local variable table is expressed in local variable slots: the 64-bit long and double types occupy two slots, and every other type occupies one. The space required for the local variable table is determined at compile time. When a method is entered, the amount of local variable space it needs in its stack frame is fully known, and the size of the local variable table does not change while the method runs. Note that “size” here means the number of variable slots; how much memory the virtual machine actually uses to implement one slot (for example, 32 bits, 64 bits, or more) is entirely up to the individual virtual machine implementation.

The Java Virtual Machine Specification specifies two exceptional conditions for this memory area: a StackOverflowError is thrown if the stack depth requested by a thread exceeds the depth allowed by the virtual machine, and, if the Java virtual machine stack can expand dynamically, an OutOfMemoryError is thrown when not enough memory can be obtained during expansion.
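The first of these conditions is easy to provoke with unbounded recursion. The following is a minimal sketch (StackDepthDemo and measureDepth are hypothetical names): every call pushes a new stack frame until the thread’s stack is exhausted. The depth reached varies by JVM, platform, and stack size setting (on HotSpot, the -Xss option), so the program prints it rather than assuming a value.

```java
class StackDepthDemo {
    private static int depth;

    private static void recurse() {
        depth++;      // each call pushes one more stack frame
        recurse();    // never returns normally
    }

    // Recurses until the thread's stack is exhausted and returns the depth reached.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // By the time the error is caught here, the deep frames have been
            // popped, so the thread can continue running.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("StackOverflowError after " + measureDepth() + " nested calls");
    }
}
```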

The internal structure of the Java virtual machine stack is shown below:

Java virtual machine stack

2.2.1 Local variable table

The local variable table is the area where method parameters and local variables are stored. Unlike class fields, local variables have no preparation phase that assigns default values, so they must be explicitly initialized before use. If the method is non-static, index 0 holds a reference to the object the method belongs to (this), which occupies one variable slot, followed by the parameters and then the local variables.
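The slot rules can be turned into a small calculation. The sketch below uses reflection to count how many slots a method’s receiver and parameters occupy at the start of its local variable table: one for this if the method is non-static, two for each long or double, and one for everything else. LocalSlotDemo, its sample methods, and the helper names are all hypothetical; the count covers only this and the parameters, not locals declared in the method body.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class LocalSlotDemo {
    // Sample methods whose parameter layouts we inspect.
    static void staticSample(int a, long b, double c) { }
    void instanceSample(int a, Object b) { }

    // Slots taken by `this` (if any) plus the parameters.
    static int argumentSlots(Method m) {
        int slots = Modifier.isStatic(m.getModifiers()) ? 0 : 1; // slot 0 holds `this`
        for (Class<?> p : m.getParameterTypes()) {
            slots += (p == long.class || p == double.class) ? 2 : 1;
        }
        return slots;
    }

    // Looks up one of the sample methods above by name.
    static int slotsOf(String methodName) {
        for (Method m : LocalSlotDemo.class.getDeclaredMethods()) {
            if (m.getName().equals(methodName)) {
                return argumentSlots(m);
            }
        }
        throw new IllegalArgumentException("no such method: " + methodName);
    }

    public static void main(String[] args) {
        System.out.println("staticSample(int, long, double): " + slotsOf("staticSample") + " slots");
        System.out.println("instanceSample(int, Object):     " + slotsOf("instanceSample") + " slots");
    }
}
```

For staticSample the layout is int (1) + long (2) + double (2); for instanceSample it is this (1) + int (1) + reference (1).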

2.2.2 Operand stack

The operand stack is a last-in, first-out stack that starts out empty. During method execution, instructions push values onto it and pop values off it. The JVM’s execution engine is stack-based, and the stack in question is the operand stack. The bytecode instruction set is defined around this stack, and the maximum depth of the stack is recorded in the method’s metadata (the max_stack item of its Code attribute). The difference between i++ and ++i helps illustrate how the operand stack works:

i++ and ++i:

  1. i++: load i from the local variable table and push it onto the operand stack, then increment i in the local variable table by 1. The value on top of the operand stack is then popped and used (for example, stored into the variable being assigned), so the thread reads the value from before the increment.
  2. ++i: first increment i in the local variable table by 1, then load it and push it onto the operand stack. The value on top of the operand stack is then popped and used, so the thread reads the value from after the increment.

i++ is not atomic, and declaring the variable volatile does not make it thread-safe. The increment takes three steps: read the value from memory and push it onto the operand stack, increment it on the stack, and write the top of the stack back to memory. volatile guarantees visibility, so every read sees the latest value, but the three steps of one thread can still be interleaved with the three steps of another thread. Updates then overwrite each other, and the final value of i ends up smaller than expected.
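Both points can be tried out with a short sketch. IncrementDemo and its method names are hypothetical; AtomicInteger from java.util.concurrent.atomic is shown as one common thread-safe alternative, not as the only fix. The result of the volatile counter is deliberately not asserted anywhere, because lost updates depend on thread timing.

```java
import java.util.concurrent.atomic.AtomicInteger;

class IncrementDemo {
    static volatile int racy;                                 // visible to all threads, but racy++ is still three steps
    static final AtomicInteger safe = new AtomicInteger();    // atomic read-modify-write

    static int postIncrement() { int i = 5; int used = i++; return used; }  // uses the old value
    static int preIncrement()  { int i = 5; int used = ++i; return used; }  // uses the new value

    // Runs the task in the given number of threads and waits for all of them.
    static void runInThreads(int threads, Runnable task) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(task);
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    static int atomicCount(int threads, int perThread) {
        safe.set(0);
        runInThreads(threads, () -> { for (int n = 0; n < perThread; n++) safe.incrementAndGet(); });
        return safe.get();
    }

    static int racyCount(int threads, int perThread) {
        racy = 0;
        runInThreads(threads, () -> { for (int n = 0; n < perThread; n++) racy++; });
        return racy;   // may be less than threads * perThread because of lost updates
    }

    public static void main(String[] args) {
        System.out.println("i++ yields " + postIncrement() + ", ++i yields " + preIncrement());
        System.out.println("volatile counter: " + racyCount(4, 100_000) + " (expected 400000, often less)");
        System.out.println("atomic counter:   " + atomicCount(4, 100_000));
    }
}
```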

2.2.3 Dynamic linking

Each stack frame contains a reference, in the runtime constant pool, to the method the frame belongs to, in order to support dynamic linking during method calls.

2.2.4 Method exit

There are two exits when a method executes:

  1. Normal exit: the method executes one of the return bytecode instructions, such as RETURN, IRETURN, or ARETURN.
  2. Abnormal exit.

In either case, control returns to the point where the method was called. Exiting a method is equivalent to popping the current stack frame, after which one of three things may happen:

  1. The return value is pushed into the upper call stack frame.
  2. The exception message is thrown to a stack frame that can handle it.
  3. The program counter points to the next instruction after the method call.
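The two kinds of exit can be observed from ordinary Java code. In this sketch (MethodExitDemo and its method names are hypothetical), normalExit returns a value to its caller’s frame, while the exception thrown in abruptExit pops both its own frame and that of middle, which has no handler, before reaching the frame of caller, which can handle it.

```java
class MethodExitDemo {
    // Normal exit: compiled to an IRETURN, the value goes to the caller's frame.
    static int normalExit() {
        return 42;
    }

    // Abnormal exit: the exception leaves this frame without a return value.
    static int abruptExit() {
        throw new IllegalStateException("boom");
    }

    // No handler here, so this frame is popped as the exception passes through.
    static int middle() {
        return abruptExit() + 1;
    }

    // This frame has a matching handler, so the exception stops here.
    static String caller() {
        try {
            middle();
            return "not reached";
        } catch (IllegalStateException e) {
            return "handled: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(normalExit());
        System.out.println(caller());
    }
}
```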

2.3 Native Method Stacks

The native method stack plays a role very similar to that of the virtual machine stack. The difference is that the virtual machine stack serves the Java methods (that is, bytecode) the virtual machine executes, while the native method stack serves the native methods the virtual machine uses.

The Java Virtual Machine Specification does not mandate any language, usage, or data structure for the methods in the native method stack, so individual virtual machines are free to implement it as they see fit. Some Java virtual machines (such as HotSpot) simply combine the native method stack with the virtual machine stack. Like the virtual machine stack, the native method stack throws StackOverflowError when the stack depth overflows and OutOfMemoryError when stack expansion fails.

This part is easier to understand, so I won’t do the analysis.

2.4 Java Heap

For Java applications, the Java Heap is the largest chunk of memory managed by the virtual machine. The Java heap is an area of memory that is shared by all threads and is created when the virtual machine is started. The sole purpose of this memory area is to hold object instances, and “almost” all object instances in the Java world are allocated memory here. The Java heap is an area of memory managed by the garbage collector, and therefore is often referred to as the “GC heap.”

According to the Java Virtual Machine Specification, the Java heap may occupy physically discontiguous memory, but logically it should be regarded as contiguous, just as we store files on disk without requiring each file to be stored contiguously. For large objects (typically arrays), however, most virtual machine implementations will probably require contiguous memory, for the sake of simple implementation and efficient storage.

The Java heap can be implemented as either fixed-size or expandable, but most current Java virtual machines implement it as expandable (controlled by the -Xmx and -Xms parameters). The Java virtual machine throws an OutOfMemoryError if there is no memory left in the Java heap to complete an instance allocation and the heap can no longer be expanded.

As noted, the only purpose of the Java heap is to hold object instances. It is also the memory area the garbage collector cares about most: most object instances are short-lived (an instance created inside a method, for example, is usually useless once the method returns), so this is where garbage collection pays off best. See a follow-up article for more on garbage collection.
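The effect of these settings can be observed at run time through java.lang.Runtime. This is a rough sketch (HeapSizeDemo and its method names are hypothetical): maxMemory() reports roughly the upper bound that -Xmx controls, and totalMemory() reports the heap currently committed, which starts near -Xms and may grow. The exact figures depend on the JVM and the collector in use.

```java
class HeapSizeDemo {
    static long maxHeapBytes()       { return Runtime.getRuntime().maxMemory(); }   // upper bound (see -Xmx)
    static long committedHeapBytes() { return Runtime.getRuntime().totalMemory(); } // currently reserved for the heap
    static long freeHeapBytes()      { return Runtime.getRuntime().freeMemory(); }  // unused part of the committed heap

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("max heap:       " + maxHeapBytes() / mb + " MB");
        System.out.println("committed heap: " + committedHeapBytes() / mb + " MB");
        System.out.println("free in heap:   " + freeHeapBytes() / mb + " MB");
    }
}
```

Running it as, for example, java -Xms64m -Xmx256m HeapSizeDemo and then with different values should show the numbers move accordingly.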

2.5 Method Area

The Method Area, like the Java heap, is a memory area shared by all threads. It stores data such as type information loaded by the virtual machine, constants, static variables, and the code cache produced by the just-in-time compiler. Although the Java Virtual Machine Specification describes the method area as a logical part of the heap, it has the alias “Non-Heap” to distinguish it from the Java heap.

Speaking of the method area, the concept of the “Permanent Generation” has to be mentioned. Before JDK 8 in particular, many Java programmers were used to developing and deploying on the HotSpot virtual machine, and many of them preferred to call the method area the “permanent generation”, or simply confused the two. The two are not equivalent. It was only that the HotSpot design team chose to extend the collector’s generational design to the method area, in other words to implement the method area with a permanent generation, so that HotSpot’s garbage collector could manage this memory the same way it manages the Java heap and the team could avoid writing memory-management code specifically for the method area. Other virtual machine implementations, such as BEA JRockit and IBM J9, have no concept of a permanent generation. In principle, how the method area is implemented is an implementation detail of the virtual machine; the Java Virtual Machine Specification does not govern it or require uniformity.

In retrospect, however, implementing the method area with a permanent generation was not a good idea. The design made Java applications more prone to memory overflow (the permanent generation has an upper limit set by -XX:MaxPermSize, with a default size if it is not set, whereas J9 and JRockit are fine as long as they do not hit the limit of memory available to the process, such as the 4 GB limit on 32-bit systems), and a very small number of methods (such as String::intern()) behave differently on different virtual machines because of the permanent generation. After Oracle acquired BEA, and with it ownership of JRockit, it wanted to port JRockit’s best features, such as the Java Mission Control management tool, to HotSpot, but ran into many difficulties because the two differ in how they implement the method area.

With HotSpot’s future in mind, the HotSpot team had plans as early as JDK 6 to abandon the permanent generation and gradually move to implementing the method area in native memory. In JDK 7, HotSpot moved the string constant pool and static variables out of the permanent generation. In JDK 8, the concept of the permanent generation was abandoned completely and replaced by Metaspace, which, like JRockit and J9, is implemented in native memory; everything that remained in the permanent generation in JDK 7 (mainly type information) was moved into the metaspace.

The Java Virtual Machine Specification places very loose constraints on the method area. Like the Java heap, it does not require contiguous memory and may be either fixed-size or expandable; an implementation may even choose not to garbage-collect it. Garbage collection is indeed relatively rare in this area, but that does not mean data that enters the method area stays there “permanently”, as the name permanent generation might suggest. Memory reclamation here mainly targets constant pool entries and the unloading of types. Generally speaking, the results of reclamation in this area are unsatisfying, especially for type unloading, but reclaiming this area is sometimes genuinely necessary.

According to the Java Virtual Machine Specification, OutOfMemoryError is thrown if the method area cannot meet the new memory allocation requirements.

The quoted passage gives a comprehensive introduction to the method area. The key point is not to confuse the method area with the permanent generation, which HotSpot has not used since JDK 8.
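If you want to see how a running JVM divides its memory into heap and non-heap pools, the standard java.lang.management API can list them. The sketch below uses hypothetical names (MemoryPoolDemo, poolNames); the pool names themselves are implementation-specific. On a JDK 8 or later HotSpot I would expect the non-heap list to include an entry named Metaspace, but that is an assumption about the JVM you run it on.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class MemoryPoolDemo {
    // Names of all memory pools of the given type (HEAP or NON_HEAP).
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```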

2.6 Runtime Constant Pool

The Runtime Constant Pool is part of the method area. A Class file contains a Constant Pool Table, which stores the literals and symbolic references generated at compile time; after the class is loaded, this content is placed in the runtime constant pool of the method area.

Since the runtime constant pool is part of the method area and is naturally limited by the method area memory, OutOfMemoryError is thrown when the constant pool can no longer claim memory.

The constant pool exists to avoid creating and destroying objects so frequently that system performance suffers; it does so by letting objects be shared.
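String literals are the most familiar example of this sharing. In the sketch below (StringPoolDemo and its method names are hypothetical), reference comparison with == is used on purpose to show which expressions yield the same pooled object; in real code, strings should be compared with equals.

```java
class StringPoolDemo {
    // Two identical literals refer to the same pooled String object.
    static boolean literalsShared() {
        String a = "jvm";
        String b = "jvm";
        return a == b;
    }

    // `new String(...)` always creates a separate object on the heap.
    static boolean newStringIsDistinct() {
        String a = "jvm";
        String b = new String("jvm");
        return a != b && a.equals(b);
    }

    // intern() returns the pooled instance with the same contents.
    static boolean internReturnsPooled() {
        String b = new String("jvm");
        return b.intern() == "jvm";
    }

    public static void main(String[] args) {
        System.out.println("literals shared:        " + literalsShared());
        System.out.println("new String is distinct: " + newStringIsDistinct());
        System.out.println("intern returns pooled:  " + internReturnsPooled());
    }
}
```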

Conclusion

This article, the first in the Java Virtual Machine series, introduced the overall architecture and the runtime data areas of the Java Virtual Machine, and I hope it has given you an overall picture of the JVM. But that is not enough: there is much more to the JVM and many more details to explore, so stay tuned for future articles.

Finally, reference articles and literature:

  • Understanding the Java Virtual Machine: Advanced JVM Features and Best Practices by Zhou Zhiming (highly recommended)
  • Java Virtual Machine specification
  • Java memory regions (runtime data regions) and memory models (JMM)
  • Architecture of JVM Java Virtual Machine
  • The difference between compiled and interpreted
  • The JVM consists of three – bytecode, bytecode instructions, and JIT compiled execution
  • Understand constant pools inside Java