The runtime data areas of the JVM

Let me start with a brief schematic of the JVM's structure.

Here we focus on the JVM's runtime data areas. As you can see, there are roughly five of them.

1. Method area

The method area does not just store “methods”; it stores information about the entire class file. When the JVM is running, the class loader subsystem extracts the class information from the class file and stores it in the method area. This includes the name of the class, the kind of type (enumeration, class, or interface), its fields, its methods, and so on.
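To get a feel for the kind of class information described above, here is a minimal sketch that reads it back through Java's standard reflection API. The types Color, Shape, and Circle and the helper class ClassInfoDemo are hypothetical names made up for this illustration; reflection only shows the information, it does not reveal where a particular JVM physically stores it.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical types, defined only for this illustration.
enum Color { RED, GREEN }

interface Shape { double area(); }

class Circle implements Shape {
    double radius = 1.0;
    public double area() { return Math.PI * radius * radius; }
}

class ClassInfoDemo {
    // Classify a type the way the text lists them: enumeration, interface, or class.
    static String kindOf(Class<?> type) {
        if (type.isEnum()) return "enumeration";
        if (type.isInterface()) return "interface";
        return "class";
    }

    // Names of the fields declared directly in the type.
    static List<String> fieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) names.add(f.getName());
        return names;
    }

    // Names of the methods declared directly in the type.
    static List<String> methodNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) names.add(m.getName());
        return names;
    }

    public static void main(String[] args) {
        Class<?>[] types = { Color.class, Shape.class, Circle.class };
        for (Class<?> type : types) {
            System.out.println(type.getName() + " is a " + kindOf(type)
                    + ", fields=" + fieldNames(type)
                    + ", methods=" + methodNames(type));
        }
    }
}
```

The enum will also list a few compiler-generated members, which is a nice reminder that the class file holds more than what you typed.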

2. Heap

If you are familiar with C/C++ programming, the heap will be nothing new. In Java, each application corresponds to a single JVM instance, and each JVM instance has a single heap. The heap mainly holds object instances created with the new keyword (the objects that this points to) and arrays; all of them live in the heap and are shared by all of the application's threads. The heap is managed by the JVM's automatic memory management mechanism, known as garbage collection (GC).
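If you want to peek at the heap of a running JVM, the standard Runtime class exposes a few rough numbers. The sketch below is only an illustration: the class name HeapInfo and the 8 MB allocation are arbitrary choices, and the exact figures depend on the JVM, its flags, and when the GC happens to run.

```java
class HeapInfo {
    // Upper bound on the memory the JVM will try to use.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Memory the JVM currently has reserved.
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // Rough estimate of the memory occupied by objects right now.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("max:       " + maxHeapBytes() / mb + " MB");
        System.out.println("committed: " + committedHeapBytes() / mb + " MB");
        System.out.println("used:      " + usedHeapBytes() / mb + " MB");

        // Objects created with new (arrays included) are allocated in the heap.
        byte[][] blocks = new byte[8][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[(int) mb];
        }
        System.out.println("used after allocating " + blocks.length + " MB: "
                + usedHeapBytes() / mb + " MB");
    }
}
```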

3. Stack

A stack is a storage area that the operating system kernel sets up for a process or thread. In the JVM, each thread gets its own Java stack, which stores the state of that thread's method calls and works in last-in, first-out order. The size and lifetime of the data on the stack are strictly deterministic. For example, an int variable declared in a function is stored on the stack, its size is fixed, and its lifetime ends when the function exits. On the stack, each method call corresponds to a stack frame, and the JVM performs only two operations on the Java stack: push and pop. Both operate on whole stack frames. A small amount of other data, such as code compiled by the just-in-time compiler, is kept outside the Java stack.
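The "one stack frame per method call" idea is easy to observe. The following sketch, with the made-up class name StackFrames, counts the frames on the current thread's stack using the standard Thread.getStackTrace() method, first with no extra nesting and then with a few recursive calls on top. It assumes a JVM that reports every frame in the stack trace, which the specification allows a JVM not to do.

```java
class StackFrames {
    // Returns how many frames are on the current thread's stack after
    // nesting `extraCalls` additional calls of this method.
    static int frameDepth(int extraCalls) {
        if (extraCalls == 0) {
            return Thread.currentThread().getStackTrace().length;
        }
        // Each recursive call pushes one more frame; it is popped on return.
        return frameDepth(extraCalls - 1);
    }

    public static void main(String[] args) {
        int base = frameDepth(0);
        int nested = frameDepth(5);
        System.out.println("frames added by 5 nested calls: " + (nested - base));
    }
}
```

By the time main prints the result, all of those frames have already been popped again, which is exactly the lifetime rule described above.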

4. PC register

The PC (program counter) register stores the address of the instruction being executed. Each thread has its own PC register.

5. Native method stack

Native methods, which call code written in other languages such as C/C++, are executed on the native method stack, not on the Java stack.

Getting to know the GC

Put simply, automatic garbage collection is the search for objects in the Java heap that are no longer needed. Here is an analogy: your room is the JVM's memory, you create garbage and mess in your room, and your mom is the GC (it sounds like an insult, I know). Your mom thinks your room is always a mess and has to send you out so she can clean it. If your mom were cleaning your room all the time, you could not play video games or eat instant noodles while she did. But if you stayed in your room all the time, sooner or later it would turn into an uninhabitable pigsty.

So what kind of collection works best? Roughly, we can think of the following ideas.

Marking

First, all objects in the heap are scanned to work out what is garbage and what is still useful. Rather than throwing out your favorite clothes or your ID card, your mom takes everything out and labels it so she can come back and deal with it when the time is right.




Normal Deletion


The garbage collector removes the marked objects: your mom has already sorted out some (or all) of the stuff, and now she takes it out and throws it away. You are just glad the room can be trashed again.

Deletion with Compacting

Deletion with compacting: memory being free does not mean that we can use it. For example, an array needs one contiguous block of space, and if the free memory is split into many fragments the allocation will fail. The room may need a new bed, but the spot freed up by throwing out the old wardrobe is too small for it, so you need to compact the space by pushing the remaining furniture and objects together to make more room.

Interestingly, the JVM does not count references to objects the way Objective-C's ARC (Automatic Reference Counting) does. Instead it uses what is called the root search algorithm (GC Roots). The basic idea is to select a small number of objects as GC Roots, forming a set of root objects, and then to search along the reference chains starting from those roots. If a target object is connected to the GC Roots, it is said to be reachable. If it is unreachable, it can be recycled.

The root search algorithm is quite complex, and you do not have to remember all the details. But you should know that there are four main kinds of objects that can serve as GC Roots.

  1. Objects referenced from the JVM stack;

  2. Objects referenced by static fields in the method area;

  3. Objects referenced by constants in the method area;

  4. Objects referenced by JNI (native methods) in the native method stack.
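The root search idea can be sketched as a plain graph traversal. The code below is a toy model rather than how any real JVM is implemented: Node is a hypothetical stand-in for a heap object, and reachableFrom walks the reference chains from a given set of roots. Note that c and d reference each other, which would keep them alive forever under naive reference counting, yet the search from the roots never reaches them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Reachability {
    // Hypothetical stand-in for a heap object: a name plus outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Follow the reference chains from the roots and return every node reached.
    static Set<Node> reachableFrom(Collection<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (marked.add(node)) {          // first visit: mark it and follow its references
                pending.addAll(node.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);   // root a -> b
        c.refs.add(d);   // c and d only reference each other
        d.refs.add(c);

        Set<Node> live = reachableFrom(Arrays.asList(a));
        for (Node n : Arrays.asList(a, b, c, d)) {
            System.out.println(n.name + (live.contains(n) ? " is reachable" : " can be recycled"));
        }
    }
}
```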

Since JDK 1.2, Java has divided references into four types: strong references, soft references, weak references, and phantom references. The strength of these four kinds of references decreases in that order.
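The four reference types map onto classes in the standard java.lang.ref package. Here is a small sketch with the made-up class name ReferenceKinds. Keep in mind that System.gc() is only a request, so what the last two lines print depends on the JVM and on timing; the two helper methods show only the parts that are guaranteed.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKinds {
    // While a strong reference still exists, a weak reference returns the same object.
    static boolean weakSeesReferentWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        return weak.get() == strong;
    }

    // A phantom reference never hands its referent back: get() always returns null.
    static Object phantomGet() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(new Object(), queue);
        return phantom.get();
    }

    public static void main(String[] args) {
        System.out.println("weak sees referent while strongly held: "
                + weakSeesReferentWhileStronglyHeld());
        System.out.println("phantom.get(): " + phantomGet());

        // Referents with no strong reference left.
        SoftReference<Object> soft = new SoftReference<>(new Object());
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc(); // only a hint; the JVM may or may not collect right now
        System.out.println("soft after GC request: " + soft.get());
        System.out.println("weak after GC request: " + weak.get());
    }
}
```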

Generations and the GC mechanism

Sounds good so far, doesn't it? Unfortunately, in practice the vast majority of objects in the JVM die young, and most of the memory in the heap is allocated for temporary use, so GC as described above often cannot meet our needs in terms of either efficiency or overhead. Moreover, the time and overhead of a GC grow as more objects are allocated. For this reason, the JVM's memory is divided into three main parts: the new generation, the old generation, and the permanent generation.

The new generation

All newly created objects start out in the new generation. The Eden space holds the newest objects, and there are two survivor spaces, S0 and S1; the ratio of the three areas is roughly 8:1:1. When the Eden space of the new generation is full, a GC is triggered, which we call a minor garbage collection. A minor garbage collection is a stop-the-world event: when your mom cleans up, she kicks you out of the room rather than cleaning while you keep throwing garbage around.

Let's take a look at how objects are allocated in the heap. When a new object enters the heap, it is placed in the Eden space of the new generation by default, and the survivor spaces start out empty. The number on each object in the figure shows how many GCs it has survived, that is, the age of the object.

When the Eden space is full and a minor garbage collection is triggered, all referenced objects are moved to S0 and the remaining unreferenced objects are cleaned up.

During the next GC, the referenced objects in Eden are moved to S1 together with the objects in S0 that are still alive. The unreferenced objects in Eden and S0 are cleaned up.

This cycle then repeats, and any object in the new generation that survives beyond a certain age is promoted to the Tenured space of the old generation. This age can be set with the parameter MaxTenuringThreshold. The default value is 15, and the example in the figure uses 8.
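The Eden, survivor, and Tenured steps above can be played through with a toy model. Everything in this sketch is a simplification made up for illustration: the class GenerationalToy, the referenced flag that stands in for reachability, and the rule that an object is promoted as soon as its age reaches the threshold. A real JVM decides promotion with more nuance.

```java
import java.util.ArrayList;
import java.util.List;

class GenerationalToy {
    static final class Obj {
        final String name;
        int age = 0;                 // number of minor GCs survived
        boolean referenced = true;   // stand-in for "reachable from the GC Roots"
        Obj(String name) { this.name = name; }
    }

    final int maxTenuringThreshold;
    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();      // survivor space currently holding objects
    List<Obj> to = new ArrayList<>();        // survivor space currently empty
    final List<Obj> tenured = new ArrayList<>();

    GenerationalToy(int maxTenuringThreshold) {
        this.maxTenuringThreshold = maxTenuringThreshold;
    }

    // New objects always start in Eden.
    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    // One minor GC: live objects from Eden and the occupied survivor space
    // move to the empty survivor space or get promoted, then the spaces swap.
    void minorGc() {
        copySurvivors(eden);
        copySurvivors(from);
        eden.clear();
        from.clear();
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }

    private void copySurvivors(List<Obj> space) {
        for (Obj o : space) {
            if (!o.referenced) continue;     // unreferenced objects are simply dropped
            o.age++;
            if (o.age >= maxTenuringThreshold) {
                tenured.add(o);              // old enough: promote
            } else {
                to.add(o);
            }
        }
    }

    public static void main(String[] args) {
        GenerationalToy heap = new GenerationalToy(3);
        Obj cache = heap.allocate("cache");
        Obj temp = heap.allocate("temp");
        temp.referenced = false;             // dies young

        for (int gc = 1; gc <= 3; gc++) {
            heap.minorGc();
            System.out.println("after GC " + gc + ": cache age=" + cache.age
                    + ", tenured=" + heap.tenured.contains(cache));
        }
    }
}
```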

The memory management algorithm of the new generation is copying GC, also known as the mark-copy method. The idea is to divide memory into two spaces: a From space and a To space. Objects are initially allocated only in the From space, and the To space is left free. During a GC, live objects are copied from the From space into the To space; the To space then becomes the new From space, and the original From space becomes the To space.

First, mark the unreachable objects.

Then move the surviving objects to the To space and keep them contiguous in memory.

Finally, clean up the garbage.

As you can see, memory is almost entirely contiguous after the operations above, so the algorithm is very efficient. The price is also high, though: splitting the memory in half leaves nearly half of the available memory unused. A rough pseudocode implementation looks like this.

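<code>
void copying() {
    // $free points to the next free slot in the To space;
    // for each object successfully copied, $free moves forward by size(obj)
    $free = $to_start
    for (r : $roots)
        *r = copy(*r)
    // swap the roles of the From space and the To space
    swap($from_start, $to_start)
}
</code>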
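For readers who would rather run something, here is one way the copying idea might look as a runnable Java toy. It is a sketch under several assumptions, not the JVM's implementation: the spaces are plain lists, Obj and CopyingToy are hypothetical names, and an IdentityHashMap plays the role of the forwarding pointers so that an object referenced twice (or part of a cycle) is copied only once.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class CopyingToy {
    static final class Obj {
        final String name;
        final List<Obj> fields = new ArrayList<>();   // references to other objects
        Obj(String name) { this.name = name; }
    }

    List<Obj> fromSpace = new ArrayList<>();
    List<Obj> toSpace = new ArrayList<>();
    final List<Obj> roots = new ArrayList<>();

    // Objects are only ever allocated in the From space.
    Obj allocate(String name) {
        Obj o = new Obj(name);
        fromSpace.add(o);
        return o;
    }

    void copying() {
        Map<Obj, Obj> forwarding = new IdentityHashMap<>();
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), forwarding));   // *r = copy(*r)
        }
        fromSpace.clear();                                  // whatever was not copied is garbage
        List<Obj> tmp = fromSpace;                          // swap the two spaces
        fromSpace = toSpace;
        toSpace = tmp;
    }

    private Obj copy(Obj obj, Map<Obj, Obj> forwarding) {
        Obj copied = forwarding.get(obj);
        if (copied != null) return copied;                  // already moved: reuse the copy
        copied = new Obj(obj.name);
        forwarding.put(obj, copied);
        toSpace.add(copied);                                // appended, so survivors stay contiguous
        for (Obj child : obj.fields) {
            copied.fields.add(copy(child, forwarding));
        }
        return copied;
    }

    public static void main(String[] args) {
        CopyingToy heap = new CopyingToy();
        Obj a = heap.allocate("a");
        heap.allocate("garbage");
        Obj b = heap.allocate("b");
        a.fields.add(b);
        heap.roots.add(a);

        heap.copying();
        for (Obj o : heap.fromSpace) {
            System.out.println("survived: " + o.name);
        }
    }
}
```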



The old generation

The old generation is used to store long-lived objects. A GC of the old generation is called a major garbage collection, and it is triggered when the old generation runs out of memory. This is also a stop-the-world event, and as the name suggests, it is considerably slower, because it involves collecting objects in both the new generation and the old generation. It is also known as a Full GC.

In the old generation, the first algorithm used for memory management was the mark-sweep algorithm. It is easy to understand: using the definition of GC Roots, all unreachable objects are marked and then swept away.

Before cleaning, the yellow objects are unreachable.

After cleaning, all remaining objects are reachable.

The disadvantage of this algorithm is just as easy to understand: mark-sweep leaves a lot of memory fragmentation behind, and because Java usually allocates memory in contiguous blocks, a lot of memory goes to waste. As a result, modern JVM GCs use a mark-compact algorithm in the old generation, which cleans up and then compacts the memory shown in the figure above so that it stays contiguous, even though this algorithm is the least efficient of the three.
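The difference between sweeping and compacting shows up even in a tiny simulation. In this sketch the heap is just an array of slots (null means free) and CompactToy is a made-up name; it compares the largest contiguous free block after a plain sweep with the one after compaction.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class CompactToy {
    // Mark-sweep: free the slots of objects that are not live, leaving holes in place.
    static String[] sweep(String[] heap, Set<String> live) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null && !live.contains(result[i])) {
                result[i] = null;
            }
        }
        return result;
    }

    // Compaction: slide the surviving objects to the front so free space is contiguous.
    static String[] compact(String[] heap) {
        String[] result = new String[heap.length];
        int next = 0;
        for (String slot : heap) {
            if (slot != null) result[next++] = slot;
        }
        return result;
    }

    // Length of the longest run of free slots, i.e. the largest array we could allocate.
    static int largestFreeRun(String[] heap) {
        int best = 0, run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "x", "b", "y", "c", "z" };
        Set<String> live = new HashSet<>(Arrays.asList("a", "b", "c"));

        String[] swept = sweep(heap, live);
        String[] compacted = compact(swept);
        System.out.println("after sweep:   " + Arrays.toString(swept)
                + " largest free block = " + largestFreeRun(swept));
        System.out.println("after compact: " + Arrays.toString(compacted)
                + " largest free block = " + largestFreeRun(compacted));
    }
}
```

Both heaps have three free slots, but only the compacted one can hold a three-slot array.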

The permanent generation

The permanent generation is located in the method area and mainly stores metadata, such as class and method metadata. It has little to do with the objects the GC reclaims, so we can almost ignore its impact on GC. That said, newer virtual machines such as Java HotSpot do reclaim unused constants and classes, so that the method area does not overflow in scenarios that make heavy use of reflection or custom ClassLoaders.

GC collectors and tuning

In general, GC should not be a bottleneck affecting system performance, and we generally consider the following when evaluating the merits of a GC collector:

  1. Throughput

  2. GC overhead

  3. Pause time

  4. GC frequency

  5. Heap space

  6. Object life cycle

So different GC collectors need to be selected and tuned for our application scenarios. Looking back at the history of GC, there are four main GC collectors: Serial, Parallel, CMS, and G1.
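Before tuning anything, it helps to know which collectors your JVM is actually running and how busy they have been. The standard java.lang.management API exposes this; the sketch below (the class name GcStats is made up) prints the name, collection count, and accumulated collection time of each collector. The names you see depend on the JVM and on the flags it was started with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

class GcStats {
    // One bean per collector the running JVM uses.
    static List<GarbageCollectorMXBean> collectors() {
        return ManagementFactory.getGarbageCollectorMXBeans();
    }

    // Total number of collections so far, a rough measure of GC frequency.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : collectors()) {
            long count = gc.getCollectionCount();   // -1 if the collector does not report it
            if (count > 0) total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : collectors()) {
            System.out.println(gc.getName() + ": "
                    + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms in total");
        }
        System.out.println("total collections: " + totalCollections());
    }
}
```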

Serial

The Serial collector uses the mark-copy algorithm for the new generation. It is single-threaded and can be enabled with -XX:+UseSerialGC. However, the program is paused for a long time during a GC, so it is generally not recommended.

Parallel

-XX:+UseParallelGC -XX:+UseParallelOldGC

Parallel also uses the mark-copy algorithm for the new generation, but we call it a throughput-first collector, because its main advantage is that it uses multiple threads to collect garbage. This takes full advantage of multi-core machines and significantly reduces GC time. When you have a high-throughput scenario, such as a message queue, that needs to use CPU resources efficiently and can tolerate some pause time, this collector is a good first choice.

CMS (Concurrent Mark Sweep)

-XX:+UseParNewGC -XX:+UseConcMarkSweepGC

CMS uses the mark-sweep algorithm. If the application cares a great deal about server response time (an API server, for example) and wants the shortest possible pauses for a better customer experience, CMS is a reasonable choice. The CMS collector suspends all application threads during a minor GC and collects garbage with multiple threads. For the old generation, it does not suspend the application threads for most of its work; instead it uses several background threads to periodically scan the old generation and reclaim objects that are no longer used.

G1 (Garbage First)

-XX:+UseG1GC

If the heap is large, a Full GC causes long pauses, and callers may block, time out, or even trigger an avalanche of failures, so it is necessary to reduce the frequency and duration of Full GCs. G1 was created to reduce the number of Full GCs. Unlike CMS, G1 uses a mark-compact algorithm, which greatly reduces the memory fragmentation produced by GCs on large heaps (4 GB and above).

G1 provides two GC modes, Young GC and Mixed GC, both of which are stop-the-world (STW). Young GC mainly collects the Eden space. Mixed GC not only performs normal garbage collection on the new generation, but also reclaims some of the old-generation regions that the background scanning threads have marked.

Another interesting point is that G1 does away with the physical separation of the new generation and the old generation. Instead, G1 divides the heap into a number of regions, up to about 2000 of them, all of the same size, with the size being a power of 2. In addition, G1 has Humongous regions dedicated to large objects that are more than 50% of the size of a region. During a normal collection, objects are copied from one region to another, and the heap is compacted along the way.
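A little arithmetic makes the region idea more concrete. The sketch below simply encodes the two rules stated above, that regions are equal-sized powers of 2 and that an object larger than half a region counts as humongous. The class name G1RegionMath and the 4 GB heap with 2 MB regions are hypothetical numbers chosen for illustration, not values read from a real JVM.

```java
class G1RegionMath {
    // True if n is a power of 2 (exactly one bit set).
    static boolean isPowerOfTwo(long n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Number of equal-sized regions the heap is divided into.
    static long regionCount(long heapBytes, long regionBytes) {
        return heapBytes / regionBytes;
    }

    // An object is treated as humongous if it is larger than half a region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long heap = 4096 * mb;     // hypothetical 4 GB heap
        long region = 2 * mb;      // hypothetical 2 MB region size

        System.out.println("region size is a power of 2: " + isPowerOfTwo(region));
        System.out.println("number of regions: " + regionCount(heap, region));
        System.out.println("512 KB object humongous? " + isHumongous(512 * 1024, region));
        System.out.println("1.5 MB object humongous? " + isHumongous(3 * mb / 2, region));
    }
}
```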


-XX:+UseSerialGC: serial collector in the new generation and the old generation
-XX:+UseParNewGC: parallel collector in the new generation
-XX:+UseParallelGC: enable the Parallel collector
-XX:ParallelGCThreads: set the number of threads used for garbage collection
-XX:+UseConcMarkSweepGC: enable the CMS collector
-XX:ParallelCMSThreads: set the number of threads used by CMS
-XX:+UseG1GC: enable the G1 garbage collector



Accessing and storing objects in a Java virtual machine

Student stu=new Student();

In this code, Student stu is a reference variable and is therefore stored on the Java virtual machine stack, while new Student() is an instance object stored in the Java heap. In addition, the Java heap must contain address information for locating the object's type data (the object type, parent class, implemented interfaces, methods, and so on), and that type data is stored in the method area.
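The split between "reference on the stack" and "instance on the heap" is easiest to see when two references point at the same object. In this small sketch the Student class, its name field, and the helper renameViaAlias are made up for illustration.

```java
// Hypothetical class for this illustration.
class Student {
    String name = "anonymous";
}

class ObjectAccessDemo {
    // `alias` is a second reference (a local variable in this frame) to the same heap object.
    static Student renameViaAlias(Student stu, String newName) {
        Student alias = stu;      // copies the reference, not the object
        alias.name = newName;     // modifies the single instance on the heap
        return alias;
    }

    public static void main(String[] args) {
        Student stu = new Student();               // reference on the stack, instance on the heap
        Student alias = renameViaAlias(stu, "Li");

        System.out.println(stu == alias);          // both references point at the same instance
        System.out.println(stu.name);              // the change is visible through either reference
        // The type data is reachable from the instance itself.
        System.out.println(stu.getClass().getSimpleName());
    }
}
```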

In the Java Virtual Machine specification, the reference type is only specified as a reference to an object; the specification does not define how the reference should locate the object or where exactly the object sits in the Java heap. Different virtual machines therefore implement object access differently. There are two main approaches: handles and direct pointers. If handle access is used, a chunk of the Java heap is set aside as a handle pool. The reference stores the address of the object's handle, and the handle contains the addresses of the object's instance data and type data, as shown in the figure below.


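Direct pointer access

If direct pointer access is used, the layout of objects in the Java heap must take into account how to store the information needed to access the type data, and the reference stores the object's address directly.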


Each of the two approaches has its own advantages. The biggest advantage of handle access is that the reference stores a stable handle address: when the object is moved (which happens all the time during garbage collection), only the instance data pointer in the handle changes, and the reference itself does not need to be modified. The biggest advantage of direct pointer access is speed. It saves the cost of one pointer lookup, and because objects are accessed so frequently in Java, that saving adds up to a very significant difference in execution cost.
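The trade-off can be mimicked with two arrays. This is purely a toy model under made-up assumptions: the heap is an array whose indices stand in for addresses, handlePool maps a handle to the object's current address, and HandleToy is a hypothetical name. It shows that a handle keeps working after the object is moved, while a raw address must be updated by whoever moved the object.

```java
class HandleToy {
    final String[] heap = new String[8];   // simulated heap: index = address
    final int[] handlePool = new int[4];   // handle -> current address of the object

    // Handle access: two lookups (handle pool, then heap).
    String readViaHandle(int handle) {
        return heap[handlePool[handle]];
    }

    // Direct pointer access: a single lookup.
    String readDirect(int address) {
        return heap[address];
    }

    // Move an object, as a compacting GC would. Only the handle is updated;
    // anything holding the old raw address is now stale.
    void move(int handle, int newAddress) {
        int oldAddress = handlePool[handle];
        heap[newAddress] = heap[oldAddress];
        heap[oldAddress] = null;
        handlePool[handle] = newAddress;
    }

    public static void main(String[] args) {
        HandleToy vm = new HandleToy();
        vm.heap[5] = "student";
        vm.handlePool[0] = 5;

        int handleRef = 0;   // a reference that stores a handle
        int directRef = 5;   // a reference that stores the address itself

        vm.move(0, 1);       // the GC relocates the object

        System.out.println("via handle:             " + vm.readViaHandle(handleRef));
        System.out.println("via stale direct ref:   " + vm.readDirect(directRef));
        directRef = 1;       // with direct pointers, the GC must fix every reference
        System.out.println("via updated direct ref: " + vm.readDirect(directRef));
    }
}
```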

