1 overview

1.1 What is Garbage?

  • Garbage is an object that has no pointer to it in the running program. This object is garbage that needs to be collected
  • If garbage in memory is not cleaned up in time, the space occupied by garbage objects will remain until the end of the application, and the reserved space cannot be used by other objects. It may even cause memory overflow

Garbage collection is not a companion product of the Java language. Lisp, the first language to use dynamic memory allocation and garbage collection, has three classic problems with garbage collection:

  • What memory needs to be reclaimed?
  • When is it recycled?
  • How to recycle?

Garbage collection is Java’s signature capability and greatly improves development efficiency. Nowadays, garbage collection is almost standard in modern languages

1.2 Why is GC needed

  • For high-level languages, the basic understanding is that without garbage collection, memory will run out sooner or later, because constantly allocating memory without recycling is like constantly producing garbage without ever cleaning it up
  • In addition to releasing useless objects, garbage collection can also clear memory of record fragments. Defragmentation moves occupied heap memory to one end of the heap so that the JVM can allocate it to new objects
  • As applications deal with larger and more complex businesses and more users, they cannot be kept running without GC. The GC that often causes STW can’t keep up with the actual requirements, so there are constant attempts to optimize GC

In the early days of C/C++, garbage collection was basic. The work is done by hand. Developers can use the new keyword for memory allocation and the delete keyword for memory release

This method can flexibly control the time of memory release, but it will bring management burden to the developer to frequently apply for and release memory. If an area of memory is forgotten to be collected due to programmer coding problems, a memory leak can occur, and garbage objects can never be cleaned up. As the system runs longer, garbage objects can continue to consume more memory until they overflow and cause the application to crash

Now, in addition to Java, C#, Python, Ruby, and other languages use the idea of automatic garbage collection, which is the trend of the future. It can be said that this way of automatic memory allocation and garbage collection has become a necessary standard in modern development languages

1.3 Java garbage Collection Mechanism

  • Automatic memory management reduces the risk of memory leaks and spills by eliminating the need for developers to manually allocate and reclaim memory
  • The automatic memory management mechanism frees programmers from the heavy memory management and allows them to concentrate more on business development
  • Automatic memory management is a black box for Java developers, and relying too heavily on “automatic” can be a disaster, at its worst weakening the Java developer’s ability to locate and resolve problems when a program runs out of memory

advice

  • It is important to understand the JVM’s automatic memory allocation and memory recycling principles. Only after we really understand how the JVM manages memory can we quickly locate and resolve outofMemoryErrors based on the error exception log
  • These “automated” technologies need to be monitored and tuned when overflow, leak, and garbage collection becomes a bottleneck for higher concurrency
  • The garbage collector can collect from the young generation, the old generation, and even the full heap and method area, where the Java heap is the focus of the garbage collector’s work
  • Young area was frequently collected, old area was rarely collected, and Perm area was basically not moved

2 Garbage marking phase

2.1 Overview of garbage tags

Object survival judgment

  • Almost all Java object instances are stored in the heap, and before the GC can perform garbage collection, it first needs to distinguish between alive and dead objects in memory. Only objects marked as dead are freed by GC during garbage collection, so this process can be referred to as the garbage marking phase
  • So how exactly do you mark a dead object in the JVM? Simply put, an object is declared dead when it is no longer referenced by any living object
  • There are two ways to judge the survival of objects: reference counting algorithm and reachability analysis algorithm

2.2 Reference Counting (Not used in Java)

  • The Reference Counting algorithm is relatively simple and stores a Reference counter attribute of an integer for each object. Used to record when an object is referenced
  • For an object A, if any object references A, then A’s reference counter is incremented by 1. When a reference is invalid, the reference counter is decayed by one. As long as the reference counter of object A has A value of 0, it indicates that object A can no longer be used and can be reclaimed
  • Advantages: simple implementation, garbage object easy to identify; High judgment efficiency and no delay in recovery
  • Disadvantages:
    • It requires separate fields to store counters, which adds to the storage overhead
    • Each assignment requires updating the counter, along with addition and subtraction, which adds time overhead
    • A serious problem with reference counters is that they cannot handle circular references. This is a fatal flaw that prevents this type of algorithm from being used in the Java garbage collector

The code demo proves that Java does not use reference counting

/** * -xx :+PrintGCDetails Public class RefCountGC {private byte[] bigSize = new byte[5 * 1024 * 1024]; private byte[] bigSize = new byte[5 * 1024 * 1024]; //5MB Object reference = null; public static void main(String[] args) { RefCountGC obj1 = new RefCountGC(); RefCountGC obj2 = new RefCountGC(); obj1.reference = obj2; obj2.reference = obj1; obj1 = null; obj2 = null; Can obj1 and obj2 be collected? System.gc(); try { Thread.sleep(1000000); } catch (InterruptedException e) { e.printStackTrace(); }}}Copy the code

Set 0bj1 – reference and 0bj2 – reference to null. The two pieces of memory in the Java heap remain references to each other and cannot be reclaimed

summary

  • Reference counting algorithms are the recycle of choice for many languages, such as Python, which has become more popular due to artificial intelligence, and supports both reference counting and garbage collection.
  • Which is optimal depends on the scenario, and there are large-scale attempts to retain only reference counting mechanisms in practice to improve throughput.
  • Java did not choose reference counting because of the basic difficulty of handling circular reference relationships.
  • How does Python solve circular references?
    • Manual dereference: It is well understood that when appropriate, the dereference relationship.
    • Use weakreference WeakRef, a standard library provided by Python that is designed to address circular references.

2.3 Algorithm of accessibility analysis

  • Reachability analysis is Java, C# selection, also known as Tracing GarbageCollection
  • Starting from the root object (GCRoots), search from top to bottom whether the target object connected by the root object collection is reachable
  • After using the reachabability analysis algorithm, all the living objects in memory are directly or indirectly connected by the root object set. The path searched is called the reference chain
  • If the target object is not connected by any reference chain, it is unreachable, meaning that the object is dead and can be marked as a garbage object
  • In the reachability analysis algorithm, only objects that can be connected directly or indirectly by the root object set are viable objects

GC Roots

  • Objects referenced in the virtual stack, such as parameters, local variables, etc. used in methods called by individual threads
  • Objects referenced by JNI (commonly referred to as local methods) in the local method stack
  • The object referenced by the class static property in the method area
    • Example: Java class reference type static variable
  • The object referenced by the constant in the method area
    • A reference in the string constant pool
  • All objects held by synchronized
  • Internal references of the Java VIRTUAL machine
    • Basic data types correspond to class objects, some resident exception objects, such as nullpointerException, OOMerror, system class loaders
  • Jmxbeans that reflect Java virtual machine internals, callbacks registered in JVMTI, native code caches, and so on
  • In addition to these fixed collections of GCRoots, other objects can be added “temporarily” to form a complete collection of GCRoots, depending on the garbage collector selected by the user and the memory region previously reclaimed. For example: generational collection and local collection
    • If garbage collection is performed for only one area of the Java heap (for example, typically only for the new generation), it must be considered that the memory area is an implementation detail of the virtual machine itself, and not isolated. Objects in this area can be referenced by objects in other areas. The accuracy of accessibility analysis can only be ensured by adding related regional objects into GC Roots collection
  • Tip: Since Root stores variables and Pointers on a stack, a pointer that holds objects in the heap but is not stored in the heap itself is Root

Pay special attention to

  • If a reachability analysis algorithm is used to determine whether memory is retrievable, the analysis must be performed in a consistent snapshot. If this is not satisfied, the accuracy of the analysis results cannot be guaranteed
  • This is also an important reason why GC must be “StopTheWorld”, even when enumerating root nodes in the supposedly (almost) non-pause CMS collector

3 Object finalization mechanism

  • The Java language provides an object termination finaliztion mechanism that allows developers to provide custom processing logic for objects before they are destroyed
  • When the garbage collector finds that there is no reference to an object, that is, garbage collector will call the Finalize () method of the object first
  • The Finalize () method allows to be overridden in subclasses for resource release when objects are reclaimed. This method usually does some resource freeing and cleaning, such as closing files, sockets, and database connections
  • Do not actively call the Finalize () method of an object. Instead, let the garbage collector call it

3.1 Whether the Object is “Dead”

  • Objects ina virtual machine generally have three possible states, thanks to finalize ()
  • If an object is unreachable from all root nodes, it is no longer in use. In general, this object needs to be reclaimed. But in fact, they are not necessarily dead. They are on probation.An unreachable object may “resurrect” itself under certain conditionsIf so, then it is not reasonable to reclaim it. For this purpose, define three possible states of the object in the virtual machine. As follows:
    • Accessible: This object can be reached from the root node.
    • Resurrectible: All references to objects are released, but objects may be resurrected in Finalize ().
    • Untouchable: Objects will enter the untouchable state when Finalize () is called and not resurrected. An untouchable object cannot be resurrected, because Finalize () will only be called once. Objects can only be reclaimed if they are no longer reachable

3.2 Determine whether the specific process can be recycled

Determining whether an object ObjA can be reclaimed requires at least two tagging sessions

  • 1. If the object has no chain of references to GCRoots, the first mark is made
  • 2. Filter to determine whether it is necessary to execute finalize () method on this object
    • If Object A does not overwrite the Finalize method, or if the Finalize method has already been called by A virtual machine, then the virtual machine is deemed not necessary to execute and Object A is judged to be untouchable
    • If object A overrides Finalize () and has not been executed yet, then A will be inserted into the F-Queue, and A virtual machine automatically creates A low-priority Finalizer thread that triggers finalize ()
    • Finalize method is the last chance for objects to escape death. Later, GC will mark the objects in the F-queue for the second time. If A connects with any object in the reference chain in Finalize method, then A will be removed and the collection will be collected at the second time. After that, objects will appear again when there is no reference, finalize method will not be called again, and objects will directly become unreachable state
/** * Tests the Finalize () method of the Object class, the finalization mechanism for objects. * */ public class CanReliveObj { public static CanReliveObj obj; This method can only be called once @override protected void finalize() throws Throwable {super.finalize(); System.out.println(" Call finalize() method overridden by the current class "); obj = this; } public static void main(String[] args) {try {obj = new CanReliveObj(); Obj = null; System.gc(); // Call garbage collector system.out.println (" 1st gc"); // Because Finalizer Thread has a low priority, pause for 2 seconds to wait for thread.sleep (2000); if (obj == null) { System.out.println("obj is dead"); } else { System.out.println("obj is still alive"); } system.out. println(" 2nd gc"); Obj = null; obj = null; System.gc(); // Because Finalizer Thread has a low priority, pause for 2 seconds to wait for thread.sleep (2000); if (obj == null) { System.out.println("obj is dead"); } else { System.out.println("obj is still alive"); } } catch (InterruptedException e) { e.printStackTrace(); }}}Copy the code

Console output

The first gc calls the finalize() method obj is still alive and the second GC obj is deadCopy the code

3.3 Obtaining the Dump File

  • Method 1: Jmap is used for command lines
    • jps
    • Jmap -dump:format=b,live,file=test1.bin {process id}
  • Approach 2: Export using JVisualVM

3.4 GC Roots analysis

public class GCRootsTest { public static void main(String[] args) { List<Object> numList = new ArrayList<>(); Date birth = new Date(); for (int i = 0; i < 100; i++) { numList.add(String.valueOf(i)); try { Thread.sleep(10); } catch (InterruptedException e) { e.printStackTrace(); }} system.out. println(" data added, please do: "); new Scanner(System.in).next(); numList = null; birth = null; System.out.println("numList, birth "); system.out. println("numList, birth "); new Scanner(System.in).next(); System. The out. Println (" end "); }}Copy the code

4 Cleanup Phase

After successfully separating live and dead objects in memory, the next task of the GC is to perform garbage collection to free up memory space occupied by useless objects, In order to have enough free memory space to allocate memory for new objects.Three garbage collection algorithms that are common in JVMS today are Mark Sweep, Copying, and Mark Compact

4.1 Mark Sweep Algorithm

The Mark Sweep algorithm is a very basic and common garbage collection algorithm proposed by J. McCarthy et al in 1960 and applied to Lisp.

4.1.1 Execution process

  • When available memory in the heap is exhausted, the entire program (also known as stop the world) is stopped and two tasks are performed, the first is marked and the second is cleared
  • Tag: Collector traverses from the reference root node, marking all referenced objects. It is usually recorded as reachable in the Header of an object
  • Cleanup: The Collector traverses the heap memory linearly from beginning to end and reclaims an object if it is not marked as reachable in its headers

4.1.2 defects

  • Not very efficient
  • During GC, the entire application needs to be stopped, resulting in a poor user experience
  • The free memory cleared in this way is discontinuous, resulting in memory fragmentation. A free list needs to be maintained
  • Note: This does not mean that the address of the object to be cleared is actually empty. Instead, the address of the object to be cleared is stored in the free address list. The next time a new object needs to be loaded, determine whether there is enough space for garbage, and if so, store it

4.2 Replication Algorithm

In order to address the shortcomings of the tag-one cleanup algorithm in garbage collection efficiency, M.L. Insky published a famous paper in 1963, CALISP Garbage Collector Algorithm Using SerialSecondary Storage). The algorithms described by M.L. Minsky in this paper are known as Copying algorithms, and were successfully introduced by M.L. insky himself into an implementation of Lisp

4.2.1 Core Idea

Dividing the living memory space into two pieces and using only one at a time will result in garbage collection. Live objects in the used memory are copied to unused memory blocks, all objects in the used memory block are cleared, the roles of the two memory blocks are swapped, and garbage collection is completed

4.2.2 advantages

  • No marking and cleaning process, simple implementation, efficient operation
  • After copying the past to ensure the continuity of space, there will be no “fragmentation” problem

Holdings shortcomings

  • The disadvantage of this algorithm is also obvious, that is, it requires twice the memory space.
  • For GC with a large number of regions, such as G1, replication rather than movement means that the GC needs to maintain object reference relationships between regions, which is not a small amount of memory or time
  • If the system has a large number of live objects, the replication algorithm is not ideal, because the number of live objects that the replication algorithm needs to copy is not very large, or very low

4.3 Mark-compact (consolidation) algorithm

The high efficiency of the replication algorithm is based on the premise of fewer live objects and more garbage objects. This happens a lot in the new generation, but in the old age, it’s more common for most objects to be living objects. If the replication algorithm is still used, the cost of replication will be high due to the large number of living objects.

It is true that the tag-one cleanup algorithm can be used in the old days, but it is not only inefficient, but also generates memory fragmentation after a collection is performed, so JVM designers need to improve on this. The Mark – Compact algorithm was born

4.3.1 Core Idea

  • The first stage marks all referenced objects from the root node, just like the mark-one cleanup algorithm
  • The second phase compresses all living objects to one end of memory and discharges them in sequence
  • After that, clean up all the space outside the boundary

  • The net effect is the same as another defragmentation after the token clearing algorithm has been executed.
  • Fundamentally different from the tag clearing algorithm, the tag clearing algorithm is a non-mobile algorithm, while the tag compression algorithm is mobile
  • The essential difference between the two algorithms is that the marker-clearance algorithm is a non-mobile recovery algorithm. Shrinkage is mobile. Whether or not to move a reclaimed live object is a risky decision with both advantages and disadvantages
  • As you can see, marked living objects are sorted by memory address, while unmarked memory is cleaned up. This way, when we need to allocate memory for new objects, the JVM only needs to hold the starting address of memory, which is significantly less overhead than maintaining a free list

When allocating memory for a new object is done in a neat and ordered way, where used and unused memory are on one side and a marker Pointer is attached to each other to record the starting point of the next allocation. You just change the offset of the Pointer to allocate the new object to the first free memory location. This is called Bump the Pointer.

4.3.2 advantages

  • Eliminating the problem of scattered memory in the mark-clean algorithm, the JVM only needs to hold the starting address of memory when allocating memory to a new object
  • Eliminates the high cost of halving memory in the replication algorithm

4.3.3 shortcomings

  • From the point of view of efficiency, the mark – one sorting algorithm is lower than the copy algorithm.
  • When you move an object, you need to adjust the address of the reference if it is referenced by another object. You need to pause the user application throughout the movement. Namely: STW

5 subtotal

  • In terms of efficiency, the replication algorithm is the leader, but it wastes too much memory.
  • In order to take into account the three indicators mentioned above, the mark collation algorithm is relatively smoother, but its efficiency is not satisfactory. It has one more mark stage than the copy algorithm and one more memory collation stage than the mark clearing algorithm

6 generation collection algorithm

There is no best algorithm, only a more suitable algorithm

The generational collection algorithm is based on the fact that different objects have different life cycles. Therefore, objects with different life cycles can be collected in different ways to improve collection efficiency. Typically, the Java heap is divided into the new generation and the old generation, so that different collection algorithms can be used according to the characteristics of each generation to improve the efficiency of garbage collection.

In HotSpot, based on the concept of generation, the memory reclamation algorithm used by GC must combine the characteristics of both the young generation and the old generation.

  • Young Gen
    • Characteristics of the young generation: the area is smaller than that of the old generation, the object life cycle is short, the survival rate is low, and the recycling is frequent
    • In this case, the recovery of the replication algorithm is the fastest. The efficiency of the replication algorithm depends only on the size of the current living object, so it is suitable for young generation recycling. The problem of low memory utilization of the replication algorithm is alleviated by the two survivor designs in hotspot
  • Tenured Gen
    • Characteristics of the old age: large area, long life cycle of objects, high survival rate, recycling less frequent than the young generation
    • In this case, there are a large number of objects with high survival rates, and the replication algorithm becomes obviously unsuitable. It is usually implemented by tag – cleanup or a mixture of tag – cleanup and tag – cleanup

Take the CMS collector in HotSpot as an example. CMS is implemented based on Mark-sweep and has a high recycling efficiency for objects. For the fragmentation problem, CMS uses Serial Old collector based on Mark-Compact algorithm as a compensation measure: Serial 0LD is used to perform Full GC in the event of poor memory collection (due to a Concurrent Mode Failure caused by fragmentation) to defragment old memory

7 Incremental collection algorithm and partitioning algorithm

7.1 Incremental Collection Algorithm

The basic idea is to alternate garbage collection threads and application threads if processing all the garbage at once requires a long system pause. Each time, the garbage collection thread collects only a small area of memory space, and then switches to the application thread. Repeat until garbage collection is complete

In general, the incremental collection algorithm is still based on the traditional tag – one cleanup and copy algorithm. Incremental collection algorithms allow garbage collection threads to mark, clean, or copy in a phased manner by properly handling conflicts between threads

Disadvantages: Using this approach reduces system downtime due to intermittent application code execution during garbage collection. However, the overall cost of garbage collection increases due to the consumption of thread switches and context transitions, resulting in a decrease in system throughput

7.2 Partitioning Algorithm

  • To control the pauses generated by GC, a large memory area is divided into smaller chunks, and a reasonable number of cells are reclaimed each time, rather than the entire heap space, depending on the destination pause time, thereby reducing the time generated by a GC
  • The generation algorithm divides the object into two parts according to the life cycle length, and the partitioning algorithm divides the whole heap into continuous different cells
  • Each cell is used independently and recycled independently. The advantage of this algorithm is that it can control how many cells are recycled at a time

Note that these are just the basic algorithm ideas, the actual GC implementation process is much more complex, currently under development frontier GC are composite algorithms, and both parallel and concurrent.


JVM full directory

Class loading mechanism 3. Runtime data area [PC register, vm stack, local method stack] 4. Runtime data area [heap] 5. Runtime data area [method area] 6. Temporary absence 7. Runtime data area [instantiated memory layout and access location of objects, direct memory] 8. String constant pool 10. Garbage collection [overview, related algorithms] 11. Garbage collection [related concepts] 12. Common OOM 14. JDK command line tools