After looking at how a virtual machine determines that an object is recyclable, let’s look at some of the theories and algorithms for Java virtual machine garbage collection.

1. Generational collection theory

Generation collection theory is a set of rules of thumb based on the relationship between the number of program running objects alive and the age of objects.

It is based on two generational hypotheses:

  • The Weak Generational Hypothesis: the overwhelming majority of subjects were born and died.
  • The Strong Generational Hypothesis: the more times an object survives a garbage collection, the less likely it is to die.

Summary in popular words: most stains are easy to wipe clean, repeatedly wipe did not wipe clean responsibility more and more difficult to wipe clean.

Based on this theory, the collector divides the Java heap into different regions and then allocates the reclaimed objects to different regions for storage by age.

Specifically, the Java heap is divided into two regions: Young Generation and Old Generation. The Young Generation stores objects with short lifetimes, and the small number of objects that survive after each collection are gradually promoted to the Old Generation.

For the new generation of objects, you can focus on keeping a few alive rather than marking the large number of objects that will be recycled.

For older generations, you can reduce the frequency of garbage collection and pay more attention to dying objects.

In order to reduce the cost of garbage collection, different garbage collection algorithms are adopted in the new generation and the old generation.

Based on generation, some types of garbage collection are generated:

  • Partial GC: Garbage collection that does not aim to collect the entire Java heap.
    • Minor /Young GC: Garbage collection that targets only the new generation.
    • Major GC: Garbage collection that targets only Old GC. Currently, only the CMS collector has the behavior of collecting older generations separately.
    • Mixed GC: Garbage collection that aims to collect the entire new generation and parts of the old generation. Currently, only the G1 collector has this behavior.
  • Full GC: Collects the entire Java heap and method area garbage collection.

Garbage collection algorithms

2.1. Mark-clear algorithm

As the name suggests, the Mark-sweep algorithm is divided into two phases:

  • Mark: Marks all objects that need to be reclaimed
  • Clear: Reclaims all marked objects

The mark-clear algorithm is relatively basic, but it has two major shortcomings:

  • The execution efficiency is not stable. If the Java heap contains a large number of objects, and most of them need to be recycled, then a large number of marking and cleaning actions must be performed, resulting in the execution efficiency of both marking and cleaning process decreases as the number of objects increases.
  • Fragmentation of memory space. After marking and clearing, a large number of discontinuous memory fragments will be generated. Too much space fragmentation may cause that when large objects need to be allocated during the program running, enough contiguous memory cannot be found and another garbage collection action has to be triggered in advance.

The mark-sweep algorithm is mainly used for older generations, where there are fewer objects to recycle.

2.2. Mark-copy algorithm

The mark-copy algorithm solves the problem that the mark-clear algorithm is inefficient when it faces a large number of recyclable objects.

The process is also simple: divide the available memory into two equally sized pieces by capacity and use only one piece at a time. When this area of memory is used up, the surviving objects are copied to the other area, and the used memory space is cleaned up again.

This algorithm has an obvious disadvantage: part of the space is not used, there is a waste of space.

This algorithm is mainly used for Cenozoic garbage collection because there are fewer living objects in Cenozoic garbage collection and only a few living objects are copied at a time.

Generally, virtual machines do not use 1:1 partitioning. Take HotSpot as an example. The HotSpot VIRTUAL machine divides its memory into one large Eden space and two small Survivor Spaces. When garbage collection occurs, the surviving objects in Eden and Survivor are copied to another Survivor space at once, and Eden and the used Survivor space are cleaned up directly. The default Eden to Survivor size ratio is 8:1.

2.3. Mark-finishing algorithm

In order to reduce the memory consumption, a targeted algorithm is introduced: Mark-compact algorithm.

The marking process is still the same as the mark-clean algorithm, but instead of cleaning up the recyclable objects directly, the next step is to move all surviving objects to one end of the memory space, and then clean up the memory directly beyond the boundary.

The mark-unscramble algorithm is mainly used in The old age, where moving objects is a huge burden in areas where a large number of objects live, and such object moving operations must be stopped The World at all times.


Reference:

[1] : Zhou Zhipeng, In Depth Understanding the Java Virtual Machine: Advanced JVM Features and Best Practices

[2] : Java Virtual Machine series 2: Garbage collection mechanism details, GIF to help you understand

[3] : Geek Time in-depth Dismantling of The Java Virtual Machine