Preface

The second part introduced the Java runtime memory areas. Three of them, the program counter, the virtual machine stack, and the native method stack, are created with a thread and destroyed with it. Stack frames are pushed onto and popped off the stack in an orderly way as methods are entered and exited. How much memory each stack frame needs is essentially known once the class structure is determined, so allocation and reclamation in these areas are deterministic. There is little need to think about reclamation here: when a method returns or a thread ends, the memory is reclaimed along with it.

The Java heap and the method area are different. The implementation classes of one interface may need different amounts of memory, and so may the different branches of one method. Which objects get created is known only while the program is running, so this memory is allocated and reclaimed dynamically, and it is this memory that the garbage collector is concerned with.

Main text

(1). Determining whether an object is alive or dead

The first thing the garbage collector has to do is decide whether each object in the heap is still “alive” or already “dead”.

1. Reference counting algorithm

Every object in the Java heap (the object itself, not a reference to it) has a reference counter. When an object is created and assigned to a variable, its counter is set to 1. Each time another reference to it is stored somewhere, the counter is incremented by 1. When a reference becomes invalid, that is, when it goes out of scope or is assigned a new value, the counter is decremented by 1. Any object whose counter is zero can be garbage collected, and when it is collected, the counter of every object it references is decremented by 1 as well.

  • Advantages:

    A reference counting collector is simple to implement and quick to decide whether an object is dead, and its work is interleaved with normal program execution. This suits real-time environments in which the program must not be paused for long.

  • Disadvantages:

    Circular references between objects are hard to detect, and maintaining the counters adds overhead to program execution. For these reasons mainstream Java virtual machines do not use this algorithm for garbage collection.
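
The circular reference problem can be seen in a small simulation. The sketch below is a toy model with hand-maintained counters, not how any JVM manages memory; the class RefCountDemo and its methods are names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of reference counting; names and structure are hypothetical.
final class RefCountDemo {

    // A simulated heap object that carries its own reference counter.
    static final class Obj {
        final String name;
        int refCount = 0;
        boolean freed = false;
        final List<Obj> fields = new ArrayList<>();

        Obj(String name) {
            this.name = name;
        }
    }

    // Called whenever a new reference to target is stored somewhere.
    static void addRef(Obj target) {
        target.refCount++;
    }

    // Called whenever a reference to target is cleared or overwritten.
    static void release(Obj target) {
        target.refCount--;
        if (target.refCount == 0 && !target.freed) {
            target.freed = true;
            // Collecting an object also drops the references it holds.
            for (Obj child : target.fields) {
                release(child);
            }
        }
    }

    // Stores a reference from owner to target in one of owner's fields.
    static void link(Obj owner, Obj target) {
        owner.fields.add(target);
        addRef(target);
    }

    // a -> b without a cycle: dropping the root reference frees both.
    static boolean chainIsFreed() {
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        addRef(a);   // a root, such as a local variable, refers to a
        link(a, b);  // a.field = b
        release(a);  // the root reference goes out of scope
        return a.freed && b.freed;
    }

    // a <-> b: each object keeps the other's counter above zero.
    static boolean cycleIsFreed() {
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        addRef(a);
        addRef(b);
        link(a, b);
        link(b, a);
        release(a);  // drop both root references
        release(b);
        return a.freed || b.freed;
    }

    public static void main(String[] args) {
        System.out.println("chain freed: " + chainIsFreed()); // prints "chain freed: true"
        System.out.println("cycle freed: " + cycleIsFreed()); // prints "cycle freed: false"
    }
}
```

In the second case no root refers to either object any more, yet both counters stay at 1, so a pure reference counting collector would never reclaim them.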

2. Reachability analysis algorithm

The reachability analysis algorithm, also known as the root search algorithm, starts from a set of objects called GC Roots and searches downward along their references. The path the search follows is called a reference chain. When no reference chain connects an object to any GC Root, the object is unreachable and is considered no longer in use.

As shown in the figure below, Object5, Object6 and Object7 reference one another, but none of them is reachable from GC Roots, so all three are judged to be collectable.
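
Setting the figure aside, the same reasoning can be sketched as a graph search. This is a minimal illustration that assumes objects are identified by name and that references are given as an adjacency map; the edges in sampleGraph() are invented for the example and are not taken from the figure.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy reachability analysis over an object graph described by names.
final class ReachabilityDemo {

    // Returns every object reachable from the roots by following references.
    static Set<String> reachable(List<String> roots, Map<String, List<String>> refs) {
        Set<String> marked = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.poll();
            if (marked.add(current)) {
                // First visit: queue everything this object refers to.
                pending.addAll(refs.getOrDefault(current, Collections.<String>emptyList()));
            }
        }
        return marked;
    }

    // Hypothetical graph: Object1 is referenced by a GC Root,
    // while Object5, Object6 and Object7 only reference one another.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("Object1", Arrays.asList("Object2", "Object3"));
        refs.put("Object3", Arrays.asList("Object4"));
        refs.put("Object5", Arrays.asList("Object6"));
        refs.put("Object6", Arrays.asList("Object7"));
        refs.put("Object7", Arrays.asList("Object5"));
        return refs;
    }

    public static void main(String[] args) {
        Set<String> live = reachable(Arrays.asList("Object1"), sampleGraph());
        System.out.println("reachable from GC Roots: " + live);
        System.out.println("Object5 reachable: " + live.contains("Object5"));
    }
}
```

Unlike reference counting, the cycle among Object5, Object6 and Object7 causes no trouble here: the search never reaches them, so they are never marked.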

GC Roots objects

In Java, there are four types of objects that can be GC Roots:

  • Objects referenced in the virtual machine stack (local variable table in the stack frame)

  • Objects referenced by JNI (native methods) in the native method stack

  • Objects referenced by static fields of classes in the method area

  • Objects referenced by constants in the method area

    All modern GC algorithms used in the JVM begin by finding all surviving objects before they collect anything. Reachability analysis comes from graph theory in discrete mathematics: the program treats the objects as nodes of a graph and the references between them as its edges. This concept is best illustrated by the memory layout in the JVM shown below:

(2). Object reference classification

1. Strong Reference

A strong reference is the kind found everywhere in code, such as Object obj = new Object(). As long as the strong reference exists, the garbage collector will never reclaim the referenced object.

2. Soft Reference

Soft references describe objects that are useful but not essential; they are created with the SoftReference class. Before the system is about to throw an out-of-memory error, these objects are included in the scope of a second collection. Only if that collection still does not free enough memory is the error thrown.

3. Weak Reference

Weak references also describe non-essential objects, but they are weaker than soft references. An object that is only weakly referenced survives only until the next garbage collection: when the collector runs, the object is reclaimed regardless of whether memory is currently sufficient. The JDK provides the WeakReference class to implement weak references.

4. Phantom Reference

Phantom references, also called ghost references, are the weakest kind of reference. The JDK provides the PhantomReference class to implement them. A phantom reference cannot be used to obtain the object it refers to; the only purpose of setting one for an object is to receive a system notification when the object is collected by the garbage collector.
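
The reference classes above all live in java.lang.ref. The sketch below shows the parts of their behavior that do not depend on GC timing; ReferenceKindsDemo and its method names are made up for this example. What the weak reference returns after System.gc() is up to the virtual machine, so main only prints it.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Hypothetical demo class for the java.lang.ref reference types.
final class ReferenceKindsDemo {

    // While a strong reference exists, soft and weak references still
    // return the object.
    static boolean visibleWhileStronglyHeld() {
        Object strong = new Object();
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        return soft.get() == strong && weak.get() == strong;
    }

    // A phantom reference never hands out its referent: get() returns null
    // even though the object is still strongly reachable.
    static boolean phantomGetIsNull() {
        Object strong = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);
        return phantom.get() == null && strong != null;
    }

    public static void main(String[] args) {
        System.out.println("visible while strongly held: " + visibleWhileStronglyHeld());
        System.out.println("phantom get() is null: " + phantomGetIsNull());

        // No strong reference is kept to this object. System.gc() is only
        // a request, so the referent may or may not be cleared afterwards.
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc();
        System.out.println("weak referent after System.gc(): " + weak.get());
    }
}
```

A PhantomReference is always created with a ReferenceQueue; the notification mentioned above arrives by the reference being placed on that queue once the collector has determined the object can be reclaimed.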

(3). finalize() and the second marking

An object is marked at least twice before the garbage collector actually reclaims it.

The first marking uses reachability analysis to determine whether the object is reachable from GC Roots. Objects found unreachable in this first pass are then filtered and marked a second time.

The second marking decides whether the finalize() method of an unreachable object needs to be executed. It is executed only if the object overrides finalize() and the virtual machine has not already called it on that object. If so, the object is placed in a queue called the F-Queue to wait for execution.

Note: finalize() is run by a low-priority Finalizer thread, so the method is not guaranteed to run, and even if it does, there is no guarantee that it will complete. If the object wants to save itself inside finalize(), it only has to re-attach itself to any object on a reference chain.
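
The self-rescue described in the note can be tried with a small experiment. This is a sketch for study only: finalize() has been deprecated since JDK 9, System.gc() is only a request, and the 500 ms wait is an arbitrary assumption, so the printed results depend on the virtual machine. FinalizeEscapeDemo and saveHook are names chosen for this example.

```java
// Study-only sketch of an object saving itself in finalize().
final class FinalizeEscapeDemo {

    // A static field is reachable from GC Roots, so storing "this" here
    // puts the object back on a reference chain.
    static volatile FinalizeEscapeDemo saveHook;

    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        super.finalize();
        saveHook = this; // the object rescues itself
    }

    // Drops the only strong reference, requests a GC, and reports whether
    // finalize() brought the object back.
    static boolean attemptRescue() {
        saveHook = null;
        System.gc();
        try {
            // The Finalizer thread has low priority, so wait briefly for it.
            Thread.sleep(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return saveHook != null;
    }

    public static void main(String[] args) {
        saveHook = new FinalizeEscapeDemo();
        System.out.println("rescued on first attempt:  " + attemptRescue());
        System.out.println("rescued on second attempt: " + attemptRescue());
    }
}
```

Because the virtual machine calls finalize() at most once per object, a rescue can succeed only once: if the first attempt revives the object, the second attempt has no finalize() call left to rely on.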

(4). Garbage collection algorithm

This section describes the ideas behind the main garbage collection algorithms:

1. Mark-sweep algorithm

The mark-sweep algorithm starts from the root set and marks the surviving objects. Once marking is complete, it scans the entire space and reclaims every unmarked object.

  • Advantages:

    It is simple to implement and does not need to move objects.

  • Disadvantages:

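    Both marking and sweeping are relatively inefficient, and sweeping leaves behind a large number of non-contiguous memory fragments. When a large object cannot find a big enough contiguous block, another collection has to be triggered early, which increases the frequency of garbage collection.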
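
The fragmentation problem is easy to see in a toy heap. The sketch below assumes the heap is an array of equally sized slots (null means free) and that the mark phase has already produced one mark bit per slot; MarkSweepDemo and its methods are invented for this illustration.

```java
import java.util.Arrays;

// Toy heap: each array slot holds one object name, null means free.
final class MarkSweepDemo {

    // Sweep phase: frees every slot whose mark bit is not set.
    // Surviving objects stay exactly where they were.
    static String[] sweep(String[] heap, boolean[] marked) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (!marked[i]) {
                result[i] = null;
            }
        }
        return result;
    }

    // Longest run of adjacent free slots, i.e. the largest object
    // that could still be allocated without moving anything.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        // Assume the mark phase found A, C and E reachable.
        boolean[] marked = {true, false, true, false, true, false};
        String[] after = sweep(heap, marked);
        System.out.println(Arrays.toString(after));       // [A, null, C, null, E, null]
        System.out.println(largestFreeRun(after));        // 1
    }
}
```

Three slots are free after the sweep, but no two of them are adjacent, so an object needing two slots could not be allocated.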

2. Copying algorithm

The copying algorithm addresses the efficiency problem of the mark-sweep algorithm. It divides the memory region into two blocks of equal size and uses only one of them at a time; new objects created by the JVM are placed in the half currently in use. When that half is used up, a GC copies the reachable objects into the other half and then clears the used half in one go.

  • Advantages:

    Memory can be allocated sequentially, which is simple and efficient to implement, and memory fragmentation does not have to be considered.

  • Disadvantages:

    The usable memory is cut to half of its original size, and when the survival rate of objects is high, a lot of copying is needed.
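
A minimal sketch of the copy step, under the same toy assumptions as before: each half is an array of slots and the set of reachable objects is already known as mark bits. Real copying collectors discover live objects while they copy; that part is left out here. CopyingDemo is a hypothetical name.

```java
import java.util.Arrays;

// Toy semi-space copy: live objects move from from-space to to-space.
final class CopyingDemo {

    // Copies every marked object to the start of a fresh to-space.
    // The "bump" index is where the next object will be placed.
    static String[] copyLive(String[] fromSpace, boolean[] marked) {
        String[] toSpace = new String[fromSpace.length];
        int bump = 0;
        for (int i = 0; i < fromSpace.length; i++) {
            if (marked[i]) {
                toSpace[bump++] = fromSpace[i];
            }
        }
        return toSpace;
    }

    public static void main(String[] args) {
        String[] fromSpace = {"A", "B", "C", "D"};
        // Assume only A and D are reachable.
        boolean[] marked = {true, false, false, true};
        String[] toSpace = copyLive(fromSpace, marked);
        // The whole from-space is then released in one step.
        Arrays.fill(fromSpace, null);
        System.out.println(Arrays.toString(toSpace)); // [A, D, null, null]
    }
}
```

The survivors end up packed at the start of the new half, so later allocation only has to advance one index. The cost is visible too: the work grows with the number of survivors, and only half of the total space is ever in use.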

3. Mark-compact algorithm

The mark-compact algorithm marks objects in the same way as the mark-sweep algorithm. Instead of reclaiming the dead objects directly, it then moves all surviving objects toward one end of the space and clears the memory beyond the end boundary.

  • Advantages:

    It solves the memory fragmentation problem of the mark-sweep algorithm.

  • Disadvantages:

    Objects still have to be moved, which reduces efficiency to some extent.
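
The compaction step can be sketched on the same kind of toy heap, again assuming equally sized slots and precomputed mark bits. MarkCompactDemo is a name invented for this example; real collectors also have to update every reference to a moved object, which this sketch does not model.

```java
import java.util.Arrays;

// Toy in-place compaction: live objects slide toward index 0.
final class MarkCompactDemo {

    // Moves every marked object to the lowest free position, clears the
    // rest of the heap, and returns the index of the first free slot.
    static int compact(String[] heap, boolean[] marked) {
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) {
                heap[free++] = heap[i];
            }
        }
        // Everything beyond the last survivor is reclaimed at once.
        Arrays.fill(heap, free, heap.length, null);
        return free;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        // Assume the mark phase found A, C and E reachable.
        boolean[] marked = {true, false, true, false, true, false};
        int firstFree = compact(heap, marked);
        System.out.println(Arrays.toString(heap)); // [A, C, E, null, null, null]
        System.out.println(firstFree);             // 3
    }
}
```

With the same survivors as a sweep-only pass, the free space is now one contiguous run of three slots instead of three isolated ones.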

4. Generational collection algorithm

Current commercial virtual machines use generational collection. As the name suggests, the generational collection algorithm divides memory into several regions according to how long objects live. These generally include the young generation, the old generation and the permanent generation, as shown in the figure below:

Young Generation

Most newly created objects are allocated here. Because most objects become unreachable soon after they are created, many objects are created in the young generation and then disappear. The process by which objects disappear from this region is called minor GC.

The young generation contains one Eden space and two Survivor spaces. New objects are first allocated in Eden (an object that is too large is allocated directly in the old generation). During a GC, surviving objects in Eden are moved to a Survivor space; once an object reaches a certain age (defined as the number of GCs it has survived), it is moved to the old generation.
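
The aging rule can be modeled in a few lines. The sketch below is a simplification: it tracks only an age counter per object, ignores the Eden and Survivor copying, and takes the promotion threshold as a constructor argument (in HotSpot the upper limit is set with -XX:MaxTenuringThreshold, and the real promotion policy is more involved). TenuringDemo and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of object aging and promotion to the old generation.
final class TenuringDemo {

    static final class Obj {
        final String name;
        int age = 0; // number of minor GCs survived so far

        Obj(String name) {
            this.name = name;
        }
    }

    final int threshold; // assumed promotion age
    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    TenuringDemo(int threshold) {
        this.threshold = threshold;
    }

    // New objects start in the young generation.
    Obj allocate(String name) {
        Obj obj = new Obj(name);
        young.add(obj);
        return obj;
    }

    // One minor GC: unreachable objects vanish, survivors grow one
    // GC older, and those reaching the threshold are promoted.
    void minorGc(List<Obj> reachable) {
        List<Obj> survivors = new ArrayList<>();
        for (Obj obj : young) {
            if (!reachable.contains(obj)) {
                continue; // reclaimed
            }
            obj.age++;
            if (obj.age >= threshold) {
                old.add(obj);
            } else {
                survivors.add(obj);
            }
        }
        young.clear();
        young.addAll(survivors);
    }
}
```

For example, with a threshold of 2, an object that stays reachable remains in the young generation after the first minorGc call and is moved to old by the second, while an object that is no longer reachable disappears at the first call.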

You can set the relative size of the young generation and the old generation. The advantage of this approach is that the young generation grows and shrinks with the overall heap size. The parameter -XX:NewRatio sets the ratio of the old generation to the young generation. For example, -XX:NewRatio=8 makes old generation : young generation = 8 : 1, so the old generation takes 8/9 of the heap and the young generation 1/9 (on HotSpot the default ratio is typically 2, which gives the young generation 1/3 of the heap).

For example:

-XX:NewSize=64m -XX:MaxNewSize=1024m -XX:NewRatio=8
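
The arithmetic behind -XX:NewRatio can be checked with a short calculation. The helper below is an illustration only: it ignores the alignment a real JVM applies and the bounds set by -XX:NewSize and -XX:MaxNewSize, and the 900 MB heap is an assumed figure.

```java
// Hypothetical helper showing how NewRatio splits the heap.
final class NewRatioDemo {

    // NewRatio=N means old : young = N : 1, so young = heap / (N + 1).
    static long youngBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    // The old generation takes whatever the young generation leaves.
    static long oldBytes(long heapBytes, int newRatio) {
        return heapBytes - youngBytes(heapBytes, newRatio);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long heap = 900 * mb; // assumed heap size
        System.out.println("young: " + youngBytes(heap, 8) / mb + " MB"); // young: 100 MB
        System.out.println("old:   " + oldBytes(heap, 8) / mb + " MB");   // old:   800 MB
    }
}
```

With a ratio of 8 the young generation is one ninth of the heap, not one eighth, because the ratio compares the two generations with each other rather than the young generation with the whole heap.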

Old Generation

Objects that survive in the young generation without becoming unreachable are copied here. The old generation takes up more space than the young generation, and because of its size, GC happens here much less often than in the young generation. The process by which objects disappear from the old generation is called major GC (or full GC).

Permanent Generation

The permanent generation holds things like class hierarchy information, method data and method information (such as bytecode, stack and variable sizes), the runtime constant pool (the string constant pool was moved out of the permanent generation to the heap in JDK 7), resolved symbolic references and virtual method tables. This data is almost always static and is rarely unloaded or collected. In HotSpot virtual machines before JDK 8, this “permanent” class data is stored in an area called the permanent generation.

The permanent generation is a contiguous memory space whose maximum size can be set with -XX:MaxPermSize when the JVM is started. JDK 8 removed the permanent generation and moved class metadata to Metaspace, an area of native memory that is not part of the heap.
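
Which memory areas a running JVM actually has can be listed through the standard java.lang.management API. The sketch below only prints what the virtual machine reports; pool names differ between collectors and JDK versions, so no particular output should be assumed. MemoryPoolsDemo is a name chosen for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

// Lists the memory pools of the JVM this program runs on.
final class MemoryPoolsDemo {

    // One line per pool: its type (heap or non-heap) and its name.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            lines.add(pool.getType() + ": " + pool.getName());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```

On a HotSpot JDK 8 or later, the list is expected to include a non-heap pool for Metaspace, while a JDK 7 HotSpot would report a permanent generation pool instead.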

Summary

From JDK 8 on, heap memory is generally divided into a young generation and an old generation, and each generation uses the garbage collection algorithm that suits its characteristics.

In the young generation, a large number of objects die in each GC and only a few survive. Because copying the few survivors is cheap, the copying algorithm is a good fit, which is why there are From Survivor and To Survivor spaces.

In the old generation, objects have a high survival rate and there is no extra memory space to act as an allocation guarantee. The mark-sweep or mark-compact algorithm is therefore used for collection.

Reference

Zhiming Zhou, Understanding the Java Virtual Machine: JVM Advanced Features and Best Practices, China Machine Press

