Various GC algorithms

preface

  • Following up, I’ve already introduced the names of some common GC algorithms.

Mark-sweep algorithm

The algorithm is divided into two phases

  • The algorithm is divided into two phases of “Mark” and “Sweep”. First, all objects that need to be reclaimed are marked, and then all objects that need to be reclaimed are reclaimed.
  • disadvantages
    • Efficiency problem. Both marking and cleaning processes are inefficient and require scanning all objects.
    • Space problem: a large number of discontinuous memory fragments will be generated after mark clearing. Too much space fragment may lead to failure to find enough continuous memory in subsequent use and trigger another garbage collection action in advance. The larger the heap, the more GC times, the more severe the fragmentation.

Copying collection algorithms

Divide the memory into two pieces

  • Divide the available memory into two pieces and use only one of them at a time. When half memory is used up, only the surviving objects are copied onto the other piece and then the entire memory space is cleaned up at once.
  • In this way, each memory reclamation is for the whole half of the reclamation, memory allocation does not have to consider the complexity of memory fragmentation, as long as the heap top pointer is moved, in order to allocate memory, simple implementation, efficient operation.
  • Disadvantages:
    • But the cost of this algorithm is to reduce memory by half, which is expensive.
    • The efficiency of the copy-collection algorithm decreases when the object survival rate is high.
  • This collection algorithm is now used in commercial virtual machines.
  • Divide the memory into a large Eden space and two small survivor Spaces, use Eden and one survivor each time, and copy Eden and survivor surviving objects to another survivor space at a time when recycling. Then clean up Eden and used Survivor.
  • By default, the Oracle Hotspot virtual machine has an 8:1 Eden to survivor ratio, meaning that only 10% of memory is “wasted” at any one time.
  • If you don’t want to waste 50% of the space, you need to allocate extra space for the extreme case where all objects in half memory are 100% alive, so in the old days, you can’t use this algorithm directly.

  • You only need to scan live objects, which is more efficient and does not generate fragmentation.
  • Additional memory needs to be wasted as replication area.
  • The replication algorithm is ideal for objects with short life cycles because the GC always recycles most of the objects and the replication overhead is low
  • According to IBM’s research, 98% of Java objects last only one GC cycle (disposable objects) and are good candidates for replication algorithms. And there is no 1:1 division of workspace and replication space.

Mark-compact algorithm

Move live Object

  • The copy-collection algorithm becomes less efficient when the object survival rate is high. More importantly, if you do not want to waste 50% of the space, you need to have extra space for allocation guarantee, so the old age generally can not directly choose this algorithm.
  • Typical of the old days, a mark-compact algorithm was proposed, in which the marking process remained the same as the mark-clean algorithm, but instead of cleaning up the recyclable objects directly, the next step was to move all surviving objects to one end and then clean up the memory beyond the end boundary.
  • No memory fragmentation.
  • It takes more time to compact than Mark-sweep.
  • After moving.

Generational Collecting algorithms

Mainstream GC algorithms

  • Current garbage collection on commercial virtual machines is done using a “Generational collection” algorithm
  • Memory is divided into blocks based on the lifetime of objects.
  • Generally, the Java heap is divided intoCenozoic and old ageSubstitute, that will doUse the most appropriate collection algorithm according to the characteristics of each generationFor example, if a large number of objects die and only a few survive in each GC of the new generation, the replication algorithm can be used to complete the collection only by paying a small amount of the replication cost of the surviving objects.
  • Based on the advantages and disadvantages of the previous GC algorithms, different GC algorithms are adopted for objects with different life cycles.
    • Hotspot JVM divides into three generations: Young Generation Old Generation and Mate Space


  • Young generation (Young Generation)
    • Newly generated objects are placed in the new generation. The young generation uses a replication algorithm for GC(theoretically young objects have a very short lifetime, so are suitable for replication algorithms)
    • The younger generation is divided into three sections. One Eden zone and two Survivor zones (generally called S0 and S1, the number of survivors can be set by parameter). Objects are generated in the Eden area. When Eden is full, surviving objects are copied to a Survivor (s0) zone, and when a Survivor zone is full, surviving objects in this zone are copied to another Survivor (S1) zone.
    • When the second Survivor zone is also full, the surviving objects copied from the first Survivor zone are copied to the old age.
    • The 2 survivors are perfectly symmetrical, and the default ratio for alternate replacements of Eden and 2 survivors is 8:1:1, which means 10% of the space will be wasted. You can adjust the size ratio based on GC log information.
  • (the old sOld Generation)
    • Holds objects that survived one or more GC.
    • Generally, THE MARk-sweep or Mark-Compact algorithm is used for GC.
    • If you have multiple garbage collectors to choose from. Each garbage collector can be considered a concrete implementation of a GC algorithm. You can choose the appropriate garbage collector based on your application’s needs (throughput? Looking for the shortest response time? .

Memory recovery

Target to recycle

  • What the GC does is reclaim memory from dead objects.
  • Hotspot considers objects that are not referenced to be dead.
  • Hotspot divides references into four categories :Strong, Soft, Weak, and Phantom:
    • Strong is a reference to which Object o=new Object() is assigned by default.
    • Soft, Weak and Phantom all inherit Reference.
  • References of the Reference type are treated specially in Full GC:
    • Soft: Memory insufficient will be GC, long-term use will also be GC.
    • Weak: Must be GC and will be notified in ReferenceQueue when marked as dead.
    • Phantom: unreferenced, notified when released from JVM heap.

The timing of the GC

When GC is triggered

  • Based on the generation model, GC is split into two exploiture types: Scavenge GC and Full GC
  • Scavenge GC (Minor GC)
    • Trigger time:
      • When a new object is generated,
      • The Eden space is full
    • Theoretically, most objects in the Eden exploiture will be recycled on Scavenge, and the replication algorithm can be executed efficiently and Scavenge in a relatively short time.
  • Full GC
    • The entire JVM memory is collated, including Young, Old, and Perm (Matespace).
    • Main trigger times:
      • Old full
      • Perm is full, there is no permanent generation. In theory, Matespace has no upper limit. If a threshold is set, it will be triggered when the threshold is reached.
      • system.gc()
      • Very inefficient, minimize Full GC.