Don't say you don't know the JVM. After reading this article, you can talk about it with your interviewer for half an hour.

Preface

Of course, if you read this article and get something out of it, please give it a thumbs up; that is my biggest encouragement. You can also follow me so you can find your way back. I update the blog from time to time ~ ~

After reading Zhou Zhiming's book “Understanding the Java Virtual Machine” over and over, I finally found the courage to write a blog post about the JVM! I want to record my understanding here and share it with all of you!

I’m sure those of you who clicked on this article know what the JVM is, right? What, you don’t? Well, I think you’ll get the idea from Wikipedia: The Java Virtual Machine – Wikipedia, the free encyclopedia

However, as a thoughtful college student, I also summed up the following three points:

It is a virtual machine that runs bytecode.
It hides the details of the underlying operating system.
These two things together are what let Java programs be compiled once and run anywhere.

So much for what the JVM is. As usual, let’s first look at the structure of this article:

Runtime data area

What is the runtime data area?

Runtime data area

In the figure above there are two color-coded groups: red marks the areas shared by all threads, and green marks the areas private to each thread. We’re going to go through them one by one, but as you read this part, it’s worth thinking about why each area exists. Is it simply a case of “whatever exists has its reason”?

Heap

Many developers pay special attention to the heap and the stack, which itself says something about how important they are. So let’s start with the areas everyone cares about most. (Thoughtful of me, isn’t it? Are your eyes getting misty again?)

First of all, the Java heap has the following characteristics:

It stores the objects we create with new; local variables of primitive types and object references live on the stack instead.
Because so many objects are created here, this is the main area the garbage collector works on.
It is shared by all threads, so access to it is not inherently thread-safe.
OutOfMemoryError can occur.
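To get a feel for the heap as a single, bounded area shared by the whole process, here is a minimal sketch that asks the running JVM about its heap. It uses only the standard Runtime API; the class name HeapInfo is made up for illustration, and the numbers you see depend on your JVM and its settings (for example -Xmx).

```java
class HeapInfo {
    /** Heap memory currently in use, in bytes (reserved heap minus free heap). */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.println("max heap the JVM may use: " + rt.maxMemory() / mb + " MB");
        System.out.println("heap currently reserved : " + rt.totalMemory() / mb + " MB");
        System.out.println("heap in use             : " + usedBytes() / mb + " MB");

        // The array object lives on the heap; the variable 'block' is only a
        // reference held in this method's stack frame.
        byte[] block = new byte[16 * 1024 * 1024];
        System.out.println("in use after allocating " + block.length / mb + " MB: "
                + usedBytes() / mb + " MB");
    }
}
```

The values are snapshots, so treat them as rough indications rather than exact accounting.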

In fact, the Java heap can be further divided into the new (young) generation and the old generation, and the new generation is in turn divided into Eden and two Survivor spaces, Survivor 1 and Survivor 2. For the specific size ratios, see the figure below.

I think the figure is clear enough that no more words are needed, right? I’ll probably write a later article on how objects are created in the Java heap and when memory leaks occur; this one is just a theoretical introduction.

VM Stack

The Java virtual machine stack is another area developers focus on. Again, the key points first:

It is a thread-private area: each thread has its own virtual machine stack, so it is thread-safe.
It stores local variables of primitive types and references to objects.
Each method call creates a corresponding stack frame on the virtual machine stack, which is destroyed when the method returns. Stack frames are pushed and popped in first-in, last-out order.
Each stack frame is divided into a local variable table, an operand stack, dynamic linking information, the method return address, and some additional information.
Two exceptions can occur in this area: if a thread requests a stack depth greater than the virtual machine allows, a StackOverflowError is thrown (usually caused by runaway recursion); if the stack can be expanded dynamically but not enough memory can be obtained for the expansion, an OutOfMemoryError is thrown.
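The StackOverflowError case is easy to reproduce. The sketch below recurses without a base case until the thread's stack is exhausted and then reports how many frames were pushed. The class and method names are invented for this example, and catching StackOverflowError is done here only for demonstration; the depth you get depends on the JVM, the frame size, and settings such as -Xss.

```java
class StackDepthDemo {
    private static int depth;

    // Recursion with no base case: every call pushes one more stack frame.
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Recurse until the thread's stack is exhausted and report how deep we got. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Expected: the requested stack depth exceeded what the JVM allows.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("StackOverflowError after about " + measureDepth() + " frames");
    }
}
```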

Likewise, if this article is well received, I’ll follow up with a separate hands-on article.

Native Method Stack

The native method stack is very similar to the Java virtual machine stack, except that it holds the stack frames created when a Java program calls a native method. As with the virtual machine stack, StackOverflowError and OutOfMemoryError can be thrown in this area.

Method Area

The method area is also worth attention. Its main characteristics are as follows:

It is shared by all threads, so access to it is not inherently thread-safe.
OutOfMemoryError can occur here as well.
It stores data loaded from Class files: class information, static variables, constant pools, and code compiled by the just-in-time compiler.

For the method area, the constant pool deserves a mention. It comes in two forms: the Class file constant pool and the runtime constant pool. When a class is loaded, the constant pool from its Class file is placed in the method area, where it becomes the runtime constant pool.

In addition, when talking about the method area, some people confuse it with the permanent generation and Metaspace. So what is the difference? The method area is a concept defined in the Java Virtual Machine Specification, while the permanent generation was HotSpot’s implementation of it: one is a specification, the other an implementation. Since Java 8, however, the permanent generation no longer exists; HotSpot replaced it with Metaspace.
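If you are curious which memory areas your own JVM actually exposes, the standard java.lang.management API can list them. The following is a small sketch (the class name MemoryPools is hypothetical); pool names are implementation-specific, so a pool called "Metaspace" is something you would expect on HotSpot with Java 8 or later, not something the specification guarantees.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

class MemoryPools {
    /** One line per memory pool: its type (heap or non-heap) and its name. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            lines.add(pool.getType() + "  " + pool.getName());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Output varies by JVM vendor, version, and garbage collector.
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```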

Program Counter Register

The program counter is very simple. Presumably none of you are Java beginners, so you already know a little about threads and processes? (Be honest, do you?) It’s okay if you don’t. Let me explain.

A process is the smallest unit of resource allocation. A thread is the smallest unit of CPU scheduling. A process can contain multiple threads. Now consider the following scenario.

At some point, thread A is given the CPU and starts executing its code. Before thread A has finished, the CPU is handed over to thread B. Later, after much patient waiting, thread A gets the CPU back. Does thread A have to start from scratch again?

This is where the program counter comes in. Its job is to record the position the current thread has reached. When a thread gets the CPU back, it resumes directly from the recorded position. Branches, loops, jumps, and exception handling all depend on the program counter as well. In addition, the program counter has the following characteristics:

It is thread-private: each thread has its own program counter, so it is thread-safe.
It is the only area where OutOfMemoryError cannot occur; presumably the designers felt it was unnecessary.
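The thread A and thread B scenario can be mimicked with a toy model. This is only an illustration, not how the JVM is implemented: ToyThread, its "program" of numbers to add, and the round-robin schedule method are all invented here. The point is that each toy thread keeps its own pc, so after being switched out it resumes exactly where it stopped.

```java
class ToyThread {
    final int[] program;   // each "instruction" adds its value to the accumulator
    int pc = 0;            // program counter: index of the next instruction to run
    int acc = 0;

    ToyThread(int... program) {
        this.program = program;
    }

    boolean finished() {
        return pc >= program.length;
    }

    /** Run for at most 'slice' instructions, then give up the "CPU". */
    void run(int slice) {
        for (int i = 0; i < slice && !finished(); i++) {
            acc += program[pc];
            pc++;   // remember where to resume next time
        }
    }
}

class ProgramCounterDemo {
    /** Round-robin scheduler: each unfinished thread gets two instructions per turn. */
    static void schedule(ToyThread... threads) {
        boolean anyRan = true;
        while (anyRan) {
            anyRan = false;
            for (ToyThread t : threads) {
                if (!t.finished()) {
                    t.run(2);
                    anyRan = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        ToyThread a = new ToyThread(1, 2, 3, 4, 5);
        ToyThread b = new ToyThread(10, 20, 30);
        schedule(a, b);
        System.out.println("A = " + a.acc + ", B = " + b.acc);   // prints "A = 15, B = 60"
    }
}
```

Even though the two threads are interleaved, neither starts over and neither loses work, because the position is stored per thread.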

Object creation and access

Object creation

As we said earlier, objects are created in the heap, and usually all you need is the new keyword. Is it really that simple? Not quite. Inside the Java virtual machine, a whole series of clever operations happens just for that one new keyword.

When the virtual machine encounters a new bytecode instruction, it first checks the constant pool to find the class being instantiated and whether that class has already been loaded, resolved, and initialized. If not, the class is loaded first. Once the class is ready, memory is allocated for the new object.

There are two ways to allocate memory:

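Bump the pointer: if the heap is regular, with used memory on one side and free memory on the other, a pointer marks the boundary, and allocating simply moves the pointer toward the free side by the size of the object.
Free list: if the heap is irregular, with used and free memory interleaved, the virtual machine keeps a list of which blocks are free, picks a large enough block for the object, and updates the list.

Which method is used depends on whether the heap is regular, which in turn depends on the garbage collector the virtual machine uses.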
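The two allocation strategies can be sketched over a pretend heap addressed by plain integer offsets. Both classes below are simplified illustrations with made-up names; a real virtual machine also has to deal with thread safety, alignment, and merging of free blocks, none of which is modeled here. The free-list version uses a simple first-fit policy, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class BumpAllocator {
    private final int capacity;
    private int top = 0;   // everything below 'top' is used, everything above is free

    BumpAllocator(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the start offset of the new block, or -1 if the heap is full. */
    int allocate(int size) {
        if (top + size > capacity) {
            return -1;
        }
        int start = top;
        top += size;       // "bump" the pointer past the new object
        return start;
    }
}

class FreeListAllocator {
    // Each entry is {start, length} of one free block.
    private final List<int[]> free = new ArrayList<>();

    FreeListAllocator(int[]... freeBlocks) {
        for (int[] block : freeBlocks) {
            free.add(block.clone());
        }
    }

    /** First fit: carve the object out of the first free block that is large enough. */
    int allocate(int size) {
        for (int i = 0; i < free.size(); i++) {
            int[] block = free.get(i);
            if (block[1] >= size) {
                int start = block[0];
                block[0] += size;
                block[1] -= size;
                if (block[1] == 0) {
                    free.remove(i);   // block fully consumed: drop it from the list
                }
                return start;
            }
        }
        return -1;   // no single free block is large enough
    }
}
```

For example, a BumpAllocator of capacity 100 hands out offsets 0 and 40 for two 40-unit requests and then refuses a third, while a FreeListAllocator built from the free blocks {0, 8} and {32, 64} places a 16-unit object at offset 32 because the first block is too small.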

Object memory layout

The storage layout of objects in the heap can be divided into three parts:

Object header
- The first kind of information: the object’s own runtime data, such as its hash code, GC generational age, lock status flags, and so on.
- The second kind of information: the type pointer, which the Java virtual machine uses to determine which class the object is an instance of.
Instance data: the actual field contents that the object stores.
Alignment padding: carries no meaning of its own and serves only as a placeholder so that the object’s size meets the virtual machine’s alignment requirement.
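The effect of alignment padding is just rounding up. The sketch below assumes a 12-byte header (an 8-byte mark word plus a 4-byte compressed class pointer) and 8-byte alignment, which is a common configuration for 64-bit HotSpot but is an assumption here, not something stated above. It also ignores field ordering and per-field alignment, so treat it as arithmetic practice rather than a size calculator.

```java
class ObjectSizeEstimate {
    static final int ALIGNMENT = 8;      // assumption: object sizes are multiples of 8 bytes
    static final int HEADER_BYTES = 12;  // assumption: 8-byte mark word + 4-byte class pointer

    /** Round 'size' up to the next multiple of 'alignment'. */
    static long alignUp(long size, int alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    /** Rough object size: header plus instance data, rounded up by the padding. */
    static long estimate(int instanceDataBytes) {
        return alignUp(HEADER_BYTES + instanceDataBytes, ALIGNMENT);
    }

    public static void main(String[] args) {
        System.out.println(estimate(0));   // 12 bytes of header + 4 bytes of padding = 16
        System.out.println(estimate(4));   // 12 + 4 = 16, no padding needed
        System.out.println(estimate(5));   // 12 + 5 = 17, padded up to 24
    }
}
```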

Object access

As we mentioned earlier, the Java virtual machine stack stores basic data types and object references. We already know the basic data types, so what is this object reference?

Well, the object instance is stored in the Java heap, and through the reference we can find where the object is in the heap. However, different Java virtual machines locate the object in different ways.

Usually, there are two methods:

Handle access: a handle pool is set aside in the Java heap. The reference on the stack stores the address of the object’s handle, and the handle holds the addresses of the object’s instance data and type data.
Direct pointers: the reference on the Java virtual machine stack stores the address of the object in the heap directly.

Both ways of accessing objects have their advantages. With direct pointers, the object is located in a single step, saving one pointer lookup (handle access needs two), so the biggest benefit is speed. With handles, when an object is moved, for example during garbage collection, only the instance data pointer in the handle needs to change; the reference stored on the stack stays the same.

Garbage collection algorithm

Is the object dead?

In the previous section we talked about how objects are created. So when is an object destroyed? In general, there are two ways to determine whether an object is dead:

Reference counting: add a reference counter to each object. The counter is incremented by 1 every time a new reference to the object is created and decremented by 1 every time a reference becomes invalid. When the counter reaches 0, the object is no longer referenced.
Reachability analysis: starting from a set of root nodes called “GC Roots”, search along the chains of references. Objects on a reference chain will not be reclaimed; objects that cannot be reached from any GC Root are judged reclaimable.

As shown in the figure above, the green objects are all on the GC Roots’ chain of references and will not be collected by the garbage collector, while the gray objects are not on the chain of references and are naturally determined to be recyclable.
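The difference between the two approaches shows up with circular references. The following sketch models objects as nodes in a graph (Obj, pointTo, and reachableFrom are names invented for this example). Two objects that only reference each other keep a non-zero reference count forever, yet a search from the roots never reaches them, so reachability analysis correctly treats them as garbage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Obj {
    final String name;
    final List<Obj> refs = new ArrayList<>();
    int refCount = 0;   // what a reference-counting collector would track

    Obj(String name) {
        this.name = name;
    }

    void pointTo(Obj target) {
        refs.add(target);
        target.refCount++;
    }
}

class Reachability {
    /** Every object that can be reached by following references from the roots. */
    static Set<Obj> reachableFrom(List<Obj> roots) {
        Set<Obj> marked = new HashSet<>();
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (marked.add(current)) {          // first visit: follow its references
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.pointTo(b);                 // a is a root and keeps b alive
        c.pointTo(d);
        d.pointTo(c);                 // c and d form a cycle nobody else points to

        Set<Obj> live = reachableFrom(Arrays.asList(a));
        for (Obj o : Arrays.asList(a, b, c, d)) {
            System.out.println(o.name + ": refCount=" + o.refCount
                    + ", reachable=" + live.contains(o));
        }
    }
}
```

Here c and d each have a reference count of 1 but are not reachable from the root a.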

So, the question is, what are these GC Roots? Here are the objects that can be used as GC Roots:

Objects referenced from the Java virtual machine stack, such as parameters, local variables, and temporary variables.
Objects referenced by static fields of classes in the method area, such as static variables of reference types.
Objects referenced by constants in the method area.
Objects referenced from the native method stack.
References held inside the Java virtual machine itself, such as the Class objects for primitive types and some resident exception objects.
Objects held as locks by the synchronized keyword.

Now we know which objects can be reclaimed. So how are they reclaimed? There are three main garbage collection algorithms: mark-sweep, mark-copy, and mark-compact. All three are fairly easy to understand, so let me introduce them one by one and then summarize.

Mark-sweep algorithm

The mark-sweep algorithm first marks the objects to be reclaimed and then sweeps them away. The diagram below:

With the mark-sweep algorithm, note that after garbage collection the free space in the heap is heavily fragmented. When memory must be allocated for a large object, the allocator may fail to find a large enough contiguous block and has to trigger another garbage collection. In addition, if the Java heap contains a large number of garbage objects, the collector must do a lot of marking and sweeping, so collection efficiency drops as the number of objects grows.
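The fragmentation problem can be seen in a tiny simulation. Here the heap is modeled as an array of equally sized slots, which is a simplification chosen for this sketch (all names are hypothetical). In this variant the mark bits identify the surviving objects; sweeping frees every unmarked slot in place and moves nothing.

```java
class MarkSweepDemo {
    /** Sweep: a slot stays occupied only if it was occupied and marked live. */
    static boolean[] sweep(boolean[] occupied, boolean[] marked) {
        boolean[] after = new boolean[occupied.length];
        for (int i = 0; i < occupied.length; i++) {
            after[i] = occupied[i] && marked[i];
        }
        return after;
    }

    static int freeSlots(boolean[] occupied) {
        int free = 0;
        for (boolean used : occupied) {
            if (!used) free++;
        }
        return free;
    }

    /** Length of the longest run of adjacent free slots. */
    static int largestFreeRun(boolean[] occupied) {
        int best = 0, run = 0;
        for (boolean used : occupied) {
            run = used ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[] occupied = {true, true, true, true, true, true, true, true};
        boolean[] marked   = {true, false, true, false, true, false, true, false};
        boolean[] after = sweep(occupied, marked);
        // Half the heap is free, but no two free slots are adjacent.
        System.out.println("free slots: " + freeSlots(after));             // prints 4
        System.out.println("largest free run: " + largestFreeRun(after));  // prints 1
    }
}
```

Four slots are free after the sweep, yet an object needing two adjacent slots cannot be placed.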

Mark-copy algorithm

The mark-copy algorithm divides the Java heap into two halves and uses only one of them at a time. At each garbage collection, all surviving objects are copied to the other half, and the used half is then cleared in one go. The diagram below:

The mark-copy algorithm has an obvious disadvantage: only half of the heap is usable at any time, which lowers heap space utilization.
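The simple two-half scheme described above can be sketched as follows. The heap halves are modeled as String arrays where null means an empty slot, and the set of live objects is handed in directly instead of being computed by marking; both are simplifications made for this illustration, and the names are invented.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SemiSpaceDemo {
    /** Copy the live objects from fromSpace into a fresh toSpace, packed from index 0. */
    static String[] copyLive(String[] fromSpace, Set<String> live) {
        String[] toSpace = new String[fromSpace.length];
        int next = 0;
        for (String obj : fromSpace) {
            if (obj != null && live.contains(obj)) {
                toSpace[next++] = obj;   // survivors end up side by side
            }
        }
        return toSpace;                  // fromSpace can now be wiped as a whole
    }

    public static void main(String[] args) {
        String[] fromSpace = {"a", "b", "c", "d"};
        Set<String> live = new HashSet<>(Arrays.asList("a", "c"));
        System.out.println(Arrays.toString(copyLive(fromSpace, live)));   // prints [a, c, null, null]
    }
}
```

Because survivors are packed together, there is no fragmentation, at the price of keeping one half empty.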

Most Java virtual machine garbage collectors today use the mark-copy algorithm for the new generation, but when it comes to dividing the Java heap, it’s not as simple as splitting it in two.

Remember this picture?

When we talked about the Java memory structure, we mentioned the specific division of the Java heap, so let’s talk about it now.

Let’s start with two generational hypotheses:

Weak generational hypothesis: most objects die young.
Strong generational hypothesis: the more garbage collections an object has survived, the less likely it is to die.

It is these two generational assumptions that allow designers to divide the Java heap more rationally. Next, here’s the classification of GC:

Minor GC/Young GC: garbage collection of the new generation.
Major GC/Old GC: garbage collection of the old generation.
Full GC: garbage collection of the entire Java heap and the method area.

Ok, now that you know the GC classification, it’s time to know the GC process.

Normally, newly created objects are placed in the Eden space of the new generation. When the first Minor GC is triggered, objects that survive in Eden are moved to one of the Survivor spaces. At the next Minor GC, the surviving objects in Eden, together with those in that Survivor space, are moved to the other Survivor space. As you can see, only one of the two Survivor spaces is in use at any time, so the only space “wasted” is one Survivor space.

Each time an object survives a garbage collection, its generational age increases by one. When the age reaches the threshold, 15 by default, the object is promoted to the old generation.

There is also the case where the Eden space does not have enough room for a large object. What then? In this case, the large object goes straight into the old generation.
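The Eden, Survivor, and promotion flow can be walked through with a toy model. Everything here is hypothetical (GenerationalDemo, Tracked, and the alive flag that stands in for reachability), space sizes are not modeled, and the threshold of 15 simply mirrors the number mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

class GenerationalDemo {
    static final int MAX_TENURING_THRESHOLD = 15;   // age at which an object is promoted

    static class Tracked {
        final String name;
        int age = 0;
        boolean alive = true;   // stand-in for "still reachable from GC Roots"

        Tracked(String name) {
            this.name = name;
        }
    }

    final List<Tracked> eden = new ArrayList<>();
    List<Tracked> from = new ArrayList<>();   // the Survivor space currently in use
    List<Tracked> to = new ArrayList<>();     // the empty Survivor space
    final List<Tracked> old = new ArrayList<>();

    Tracked allocate(String name) {
        Tracked t = new Tracked(name);
        eden.add(t);             // new objects start in Eden
        return t;
    }

    /** One Minor GC: copy survivors of Eden and 'from' into 'to', or promote them. */
    void minorGc() {
        List<Tracked> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Tracked t : candidates) {
            if (!t.alive) {
                continue;        // garbage is simply not copied
            }
            t.age++;
            if (t.age >= MAX_TENURING_THRESHOLD) {
                old.add(t);      // old enough: promote to the old generation
            } else {
                to.add(t);
            }
        }
        eden.clear();
        from.clear();
        List<Tracked> tmp = from;   // swap the roles of the two Survivor spaces
        from = to;
        to = tmp;
    }
}
```

An object that stays alive moves back and forth between the two Survivor spaces for 14 collections and is promoted on the 15th, while an object that dies early never leaves the new generation.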

Mark-compact algorithm

The mark-compact algorithm is a compromise. Its marking phase is the same as in the previous two algorithms. After marking, however, the live objects are moved toward one end of the heap, and everything beyond the boundary of the live objects is cleared directly. This avoids memory fragmentation and wastes no heap space. The drawback is that all user threads must be paused while objects are moved, and the old generation, where most objects survive, takes especially long to collect, which is bad for the user experience. The diagram below:
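The compaction step can be sketched as a single in-place pass. As before, the heap is modeled as an array of slots and the mark bits are given as input, both simplifications for this illustration; a real collector would also have to update every reference that points at a moved object.

```java
import java.util.Arrays;

class MarkCompactDemo {
    /**
     * Slide the marked objects toward index 0, keeping their order, and clear
     * everything behind them. Returns the number of live objects.
     */
    static int compact(String[] heap, boolean[] marked) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked[i]) {
                heap[next++] = heap[i];   // next <= i, so nothing live is overwritten
            }
        }
        Arrays.fill(heap, next, heap.length, null);   // the rest is now free space
        return next;
    }

    public static void main(String[] args) {
        String[] heap = {"a", "b", "c", "d", "e"};
        boolean[] marked = {true, false, true, false, true};
        int live = compact(heap, marked);
        System.out.println(live + " live: " + Arrays.toString(heap));   // prints 3 live: [a, c, e, null, null]
    }
}
```

After compaction the free space is one contiguous run, so bump-the-pointer allocation becomes possible again.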

HotSpot algorithm details

Root node enumeration

Root node enumeration essentially means finding the objects that can serve as GC Roots, and all user threads must be paused while it happens. So far, almost no virtual machine can enumerate GC Roots concurrently with user threads, although the most time-consuming part of reachability analysis, tracing the reference chains, can already run concurrently with user threads. So why must user threads be stopped during root node enumeration?

The reason is not hard to see: if user threads are not paused during GC Roots enumeration, the references held in the root set keep changing, and the result cannot be accurate. So, does the Java virtual machine really need to do a global scan to find GC Roots?

No. The HotSpot virtual machine uses a data structure called OopMap to record where object references are stored, so it does not have to scan everything. This greatly reduces the time spent enumerating GC Roots.

Safe point

A safe point is a position in the code where a thread can be paused. During GC Roots enumeration the user threads must be stopped, but can a thread stop just anywhere? No: it can only stop at a safe point. There are two approaches to getting every thread to its nearest safe point when garbage collection is about to start:

Preemptive suspension: suspend all user threads first; if any thread is found not to be at a safe point, resume it and let it run until it reaches one, then suspend it again. Almost no Java virtual machine takes this approach today.
Voluntary suspension: the virtual machine does not act on the threads directly. It simply sets a flag, and each thread polls the flag as it runs. When a thread finds the flag set, it suspends itself at the nearest safe point.
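The flag-polling idea can be imitated in ordinary Java code. This is an analogy only: inside a real JVM the polls are inserted by the virtual machine itself and the thread is suspended, not ended. In this sketch the names are invented, the poll sits at the end of each loop iteration, and the worker simply returns when it sees the flag.

```java
import java.util.concurrent.atomic.AtomicLong;

class SafepointPollDemo {
    // Hypothetical flag that the "collector" sets to ask threads to stop.
    static volatile boolean stopRequested = false;

    static class Worker extends Thread {
        final AtomicLong iterations = new AtomicLong();

        @Override
        public void run() {
            while (true) {
                iterations.incrementAndGet();   // stand-in for user code
                if (stopRequested) {            // poll the flag at the end of each iteration
                    return;                     // a real JVM would suspend the thread here
                }
            }
        }
    }

    /** Starts a worker, requests a stop, and reports whether the worker noticed and stopped. */
    static boolean runDemo() {
        stopRequested = false;
        Worker worker = new Worker();
        worker.start();
        try {
            Thread.sleep(50);       // let the worker run for a moment
            stopRequested = true;   // nothing is done to the thread itself
            worker.join(2000);      // wait for it to reach its next poll
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            stopRequested = true;
        }
        return !worker.isAlive() && worker.iterations.get() > 0;
    }

    public static void main(String[] args) {
        System.out.println("worker stopped at its poll: " + runDemo());
    }
}
```

The worker is never interrupted from outside; it stops only because it checks the flag itself, which is the essence of voluntary suspension.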

Safe region

A safe region extends the idea of a safe point. Safe points solve the problem of how to stop a running thread, but not the case where a thread is not running at all (for example, it is sleeping or blocked) and so cannot respond to the virtual machine’s request by running to a safe point.

A safe region is a stretch of code within which reference relationships are guaranteed not to change. Once a thread has entered a safe region, the virtual machine can ignore it when starting garbage collection. When the thread is about to leave the safe region, it checks whether the virtual machine has finished root node enumeration; if not, it waits until it is allowed to leave.

Remembered set and card table

I don’t know whether you have considered this question: since the Java heap is divided into a new generation and an old generation, can object references cross generations? If they can, how do we avoid scanning the whole old generation for GC Roots when collecting the new generation?

First, cross-generational references do exist. For this reason, the garbage collector builds a data structure called a remembered set in the new generation, to avoid adding the entire old generation to the GC Roots scan.

The remembered set is an abstract data structure, and the card table is one concrete implementation of it. The relationship is similar to that between the method area and Metaspace.

Write barriers

The purpose of the write barrier is simply to maintain the card table: whenever a reference field is assigned, the barrier marks the card covering that field as dirty.
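A card table and its write barrier can be sketched with a byte array. The 512-byte card size (a shift of 9) follows the figure commonly cited for HotSpot; the class name, the CLEAN and DIRTY values, and the use of plain int addresses are assumptions made for readability and do not reflect HotSpot's actual encoding.

```java
import java.util.ArrayList;
import java.util.List;

class CardTableDemo {
    static final int CARD_SHIFT = 9;   // 2^9 = 512 bytes of heap per card
    static final byte CLEAN = 0;       // values chosen for readability only
    static final byte DIRTY = 1;

    final byte[] cards;

    CardTableDemo(int heapBytes) {
        // One card per 512 bytes, rounding up.
        cards = new byte[(heapBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT];
    }

    /** Write barrier: run after a reference field at 'fieldAddress' has been assigned. */
    void postWriteBarrier(int fieldAddress) {
        cards[fieldAddress >>> CARD_SHIFT] = DIRTY;
    }

    /** Indexes of the cards the collector would have to scan. */
    List<Integer> dirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == DIRTY) {
                dirty.add(i);
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardTableDemo table = new CardTableDemo(4096);   // 8 cards
        table.postWriteBarrier(0);      // card 0
        table.postWriteBarrier(511);    // still card 0
        table.postWriteBarrier(1024);   // card 2
        System.out.println(table.dirtyCards());   // prints [0, 2]
    }
}
```

During a Minor GC, only the memory behind dirty cards needs to be scanned for references into the new generation, instead of the whole old generation.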

Concurrent reachability analysis

Earlier we asked why all user threads must be paused (also known as Stop The World). The reason is to prevent user threads from changing object references while the collector is tracing from the GC Roots. If they could, an object that is already dead might be wrongly marked as alive or, worse, a live object might be wrongly marked as dead, which would cause unexpected errors in the program.

Classic garbage collector

We’ve now covered a lot of garbage collection theory, but concrete garbage collectors do not all implement it in the same way. Here are some common garbage collectors.

Serial collector

The Serial collector, the most basic and oldest, pauses all worker threads during garbage collection until the garbage collection process is complete. Here is a schematic of the Serial garbage collector in action:

ParNew collector

The ParNew collector is essentially a multi-threaded version of the Serial collector: it uses multiple threads to perform garbage collection.

Parallel Scavenge

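It is also a new-generation collector, likewise based on the mark-copy algorithm. Its most distinctive feature is that it lets you control throughput.

So what is throughput? It is the ratio of the time spent running user code to the total time spent running user code plus garbage collection.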
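Throughput is usually defined as the time spent running user code divided by the total of user-code time and garbage collection time. The sketch below computes that ratio and, in main, makes a rough estimate for the current JVM using the standard java.lang.management beans. The class name is made up, and the estimate is only approximate: the reported collection time is not necessarily pure pause time, especially for concurrent collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcThroughput {
    /** throughput = user time / (user time + GC time), both in the same unit. */
    static double throughput(long userMillis, long gcMillis) {
        return (double) userMillis / (userMillis + gcMillis);
    }

    public static void main(String[] args) {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections");
            gcMillis += Math.max(0, gc.getCollectionTime());   // -1 means "not available"
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        long userMillis = Math.max(0, uptimeMillis - gcMillis);
        System.out.printf("approximate throughput: %.4f%n", throughput(userMillis, gcMillis));
    }
}
```

For example, if a program runs for 100 minutes in total and 1 minute of that is garbage collection, throughput(99, 1) gives 0.99, that is, 99%. The collector names printed by main also show which collectors your JVM is actually using.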

Serial Old collector

The Serial Old collector is the old-generation version of the Serial collector. It works the same way as the Serial collector: a single thread collects while all user threads are paused.

Parallel Old collector

The Parallel Old collector is the old-generation version of Parallel Scavenge. It collects with multiple threads in parallel. Here’s how it works:

CMS collector

Parallel Scavenge, mentioned earlier, is a collector that aims at controllable throughput. The CMS (Concurrent Mark Sweep) collector, by contrast, aims at the shortest pause time and is based on the mark-sweep algorithm. Its operation is more complex than that of the previous collectors, and the whole process can be divided into four phases:

Initial mark: requires Stop The World. Only the objects directly reachable from GC Roots are marked, so it is fast.
Concurrent mark: starting from those directly reachable objects, the whole object graph is traversed. This takes the longest, but it runs concurrently with user threads.
Remark: corrects the marks that changed during concurrent marking because user threads kept running. It also requires Stop The World.
Concurrent sweep: clears the objects judged dead, concurrently with user threads.

Garbage First

The Garbage First (G1) collector is a landmark achievement in the history of garbage collectors and is aimed primarily at server-side applications. Although G1 still keeps the concepts of the new generation and the old generation, they are no longer fixed areas; each is a dynamic set of regions.

OK, so much for the garbage collectors. There is still a lot worth knowing about the G1 collector, and you can refer to the relevant material. In the next article, we’ll talk about the class loading mechanism, OK?

It’s 2:30 in the morning, so at least I didn’t see the 4 a.m. sunrise. Surely heaven won’t take its favorite child away? As always: original writing isn’t easy, so if you enjoyed this, give it a like, and follow me for regular, high-quality original articles.

And finally, finally, something useful. (With a girlfriend like that, would I be staying up late?)
