Instead of managing memory manually, Java uses a garbage collector, which, in either case, does two things at its core:

  1. Find which objects are alive (still in use)
  2. Clears dead (no longer used) objects

Mark the survivable object:

Reference counting method

The most straightforward and easy to think of method is reference counting, which records the number of references to each object, if 0, it is dead. This method is simple to implement and efficient to judge, but it is difficult to solve the problem of circular reference between objects.

Accessibility analysis and calculation

The JVM uses a reachability analysis calculation to mark survivable objects, and GC defines some special objects as GC Roots:

  • Local variables and parameters in the stack frame
  • Active threads
  • A loaded static variable
  • JNI references

With GC Roots as the starting point, search continuously along the reference path, and mark the searched object as alive.

Note that during the tagging phase, you need to Stop the application thread (Stop the World), because there is no way to tag while the application keeps changing references. The pause time depends on the number of live objects, and the more live objects, the longer it takes to mark.

Clear dead Objects

Sweep and Compact

Academically, the tag clearing algorithm is the most representative one:

Marking: Start searching for marked reachable objects with GC Roots

Sweeping: Makes memory Spaces occupied by unmarked objects available for later use

But there are two problems with this direct Sweep:

  • Writing operations require finding available blocks, which can be more time-consuming
  • Too much space fragmentation can result in insufficient contiguous memory for allocating large objects

To avoid this problem, one more defragment is needed

Copy

A simpler approach is to split the memory in two, use only one region at a time, and when GC occurs, copy the surviving objects to the other region without fragmentation problems. Furthermore, replication can be done simultaneously with markup, which is more efficient. The disadvantages are also obvious, requiring more memory. This algorithm is called the mark-copy algorithm.

Generational hypothesis

The researchers observed that most assignments within an application fall into two categories:

  • Most objects are not used soon after they are created
  • Some objects live for a long time

Based on this assumption, the virtual machines split their memory into two generations, the Young Generation and the Old Generation or Tenured Generation.

In view of the characteristics of different generations, algorithm optimization can be targeted. Generally speaking, algorithms can be divided into Mionr GC (only recycling new generation objects) and Full GC (global recycling). There are also two problems with this assumption:

  • There may be references to objects between two generations. Even if only Mionr GC is performed, it is necessary to scan old age objects to check whether there are old age objects referencing new generation objects, which violates the original intention of generation
  • The generational assumption may not work for some applications. Because the GC algorithm is optimized for objects that die quickly and live for long periods of time, the JVM does not perform well for objects with “medium” life

Memory division

Typically Eden is the area allocated when the object is created. Eden is allocated one or more Thread Local Allocation Buffers (TLabs) because it involves multiple threads creating objects at the same time. In short, each Thread is allocated an area for its own Thread to allocate objects (avoiding Thread synchronization costs). If the allocated memory is insufficient, the shared portion is used (applying for a new TLAB), and if it is insufficient, a Minor GC is triggered. If the cleared memory is still insufficient, the object is allocated to the old age.

During Mionr GC, all surviving objects are scanned and marked by GC Roots first. It should be noted that as mentioned before, old objects may also reference new generation objects. For this problem, the JVM uses card-marking to avoid old-time scans. HotSpot uses the Card Table technique to divide the heap into 512-byte cards, which are dirty if objects in the Card may refer to a new generation of objects, and the JVM maintains a Table of cards, each with a corresponding identifier to indicate whether the Card is dirty. Instead of scanning the entire age, you can simply insert objects from dirty cards into GC Roots for Minor GC.

After the tagging is complete, all surviving objects are copied to one of the Survivor zones, after which the entire Eden zone memory can be reused. This algorithm is also called the “Mark and Copy” algorithm. Survivor is divided into from and TO regions (identity swap after each GC). The TO region is always empty. After GC completes the marking, the surviving objects of Eden and FROM region are copied to the TO region, the FROM region is cleared, and the identities of the two regions are swapped.

Objects may replicate back and forth between two survivors, and when they replicate a certain number of times (15 by default), they are considered old enough to be promoted to the old age. In addition, if the Survivor region is not large enough to hold all the surviving objects, the older objects are promoted to the older age.

In the old days there was much more memory, and most objects were not garbage, and GC occurred much less frequently, so the replication algorithm was not suitable. In general, the old age is recycled using a “mark-purge-tidy” algorithm.

For an example of how a JVM implements a specific collector using these algorithms, see this easy-to-understand GC implementation in a JVM

Plumbr Handbook Java Garbage Collection