Daily sentence

A hard-won opportunity makes all the animals do something they didn’t like to do before — Natsume Natsume

If the profile

The advent of GC frees programmers from the need to manually reclaim memory, but we also need to understand GC, know yourself, know your enemy, a hundred battles will not be dangerous.

background

Common GC collection algorithms mainly include reference counting algorithm, reachability analysis method, tag clearing algorithm, tag copy algorithm, tag compression algorithm, generation algorithm and partition algorithm.

Among them, reference counting method and reachability analysis method are used to determine whether an object can be reclaimed, and other algorithms are used to perform GC.

Talk about the tag clearing algorithm, the copy algorithm, the tag compression algorithm, the generation algorithm, mainly introduces the generation algorithm. For reference counting method and accessibility analysis method, go to: Reference counting method and accessibility analysis method


Mark copy algorithm

The tag copy algorithm splits the memory space into two pieces, using only one piece at a time. The replication algorithm also uses reachability analysis to mark garbage removal objects. When GC is executed, the surviving objects are copied to another memory space with continuity, and then the previously used memory space is directly emptied. And so on and so forth. Let’s just call these two areas of memory the FROM and to areas.

Mark phase

As shown in the figure below, R1 and R2 serve as GC Root objects and mark the objects except yellow as garbage objects after reachability analysis.

Copy the stage

The replication process is as follows: GC copies the five live objects to the TO region and ensures continuity in the memory space of the TO region.

Finally, junk objects in the FROM section are removed.

In summary, the algorithm is very efficient in the case of less inventory objects and more garbage objects. The advantage is that there is no memory fragmentation, but the disadvantage is obvious: you lose half of your available memory. (JVM algorithms only waste 10 percent)

Mark clearing algorithm

Tag cleanup is the basis of the current GC algorithm, and no GC seems to use it anymore. Because this algorithm will generate a lot of memory fragmentation.

The execution process of the mark clearing algorithm is divided into two stages: mark stage and clear stage.Copy the code
  • The marking phase marks active objects through reachability analysis.
  • The cleanup phase removes the garbage objects marked by the markup phase.

The marking stage is shown in the figure:

  • In the Java heap, yellow objects are unreachable and are marked during the marking phase.

The recycling algorithm is implemented below, as shown in the figure after execution:

From the figure above, we can clearly see the defect of this algorithm. After reclamation, a large number of discontinuous memory space will be generated, namely memory fragmentation. Since Java typically allocates memory on a continuous basis, memory is wasted when there is not enough fragmented space to allocate to new objects.

Mark compression algorithm

The tag compression algorithm can solve the memory fragmentation problem of the tag clearing algorithm.Copy the code

Its algorithm can be regarded as three steps:

  1. Mark garbage objects
  2. Clean up garbage objects
  3. Memory defragmentation

The process is as follows:

First mark the garbage removal object (yellow)

Clean up garbage objects

Memory defragmentation


Generational algorithm

Generation algorithm is based on copy algorithm and mark compression algorithm.

  • First of all, the tag clearing algorithm, the copy algorithm and the tag compression algorithm all have their own disadvantages, and if one of them is used for GC alone, it will be a big problem.
  • For example, the tag clearing algorithm will generate a large amount of memory fragmentation, the copy algorithm will lose half of the memory, and the defragmentation of the tag compression algorithm will cause a large consumption.
  • Secondly, both copy algorithm and tag compression algorithm have their own suitable application scenarios. The replication algorithm is applicable to the scenario where there are few viable objects in each reclamation, which reduces the amount of replication. The tag compression algorithm is suitable for scenarios where there are many living objects during the collection, which will reduce the generation of memory fragmentation and the cost of defragmentation will be much smaller.

The generation algorithm divides the memory region into two parts: new generation and old generation.

Different GC algorithms are used according to the different characteristics of objects in Cenozoic and old age.Copy the code
  • New generation objects can be recycled as soon as they are created (for example, objects created in the virtual machine stack are destroyed when methods are out of the stack). That is, most garbage objects are collected each time, so the new generation is suitable for copying algorithms.
  • The characteristics of the old age are: after many GC, they still survive. That is, most of the objects are alive each time GC is performed, so the old days are good for tag compression algorithms.

The new generation is divided into Eden area, FROM area and to area, while the old generation is a whole memory space, as shown below:

Generation algorithm execution process

First, a brief description of the whole process of the New generation GC (the old GC will be introduced below) : New objects are always born in Eden, and when Eden (reaches the threshold), Minor GC is triggered, which copies live objects in Eden to an unused space in from and to, assuming to (live objects in from being used are also copied to TO).

There are several situations in which objects can be promoted to the old age:

  1. (large object directly into the old s parameters by the virtual machine – XX: PretenureSizeThreshold parameter, the default value is 0, that is not open, the unit is Byte, for example: 3145728 = 3 m, less than 3 m’s object, will direct promotion old s)
  2. If the to area is full, the extra object will be promoted directly to the old age
  3. After 15 copies (15 years old), the surviving object will also enter the old age

In this case, Eden and FROM are garbage objects and can be cleared directly.

PS: Why is it that after being copied 15 times (15 years old), it is judged to be old and promoted to the old age? Since the age of each object is stored in the object header, the object header stores this age in 4bits, which can represent a maximum of 15 in decimal notation, so it is 15.

The following describes the GC process, starting with JVM startup.

After the JVM has just started and initialized, several blocks of memory have been allocated and the state is as shown in the figure above.Copy the code

A newly created object will always be born in Eden

When Eden is full, a Minor GC will be triggered, which will find an unused space from and to, and copy the surviving objects in Eden (the first time from and to are empty, use from, and then use TO each time to store the transferred data objects). The age of the copied object is +1, and garbage objects in Eden are cleared.

The program continues to run, creating a new object in Eden, and a large object.

When Eden fills up again, a Minor GC will be triggered again. This time, objects in Eden and from will be copied to to, and objects with an age of +1 will be promoted directly to the old age, and objects in to will be promoted directly to the old age.

The program continues, assuming that after 15 copies an object is still alive, it will go straight to the old age.

Old age Full GC

Before performing a Minor GC, the JVM also checks whether the total memory used by all objects in the new generation is less than the maximum remaining continuous memory in the old generation. If this condition is true, then the Minor GC is safe, because even if all objects in the new generation enter the old age, the old age will not run out of memory.

If this condition is not true, the JVM checks whether HandlePromotionFailure[1] is enabled (by default after JDK1.6). If this is not enabled, it indicates that there is a risk of memory overflow after Minor GC, and a Full GC is performed. The JVM also checks to see if the average size of the generation objects is smaller than the maximum contiguous memory space of the generation, and if so, attempts Minor GC directly, and if not, Full GC is performed.

The surviving objects in Eden and FROM will be copied to to, and the super-large objects and objects that cannot fit into to will be promoted to the old age. When Eden is full, the Minor GC is triggered, and it determines that the remaining continuous memory of the old generation is less than the sum of memory occupied by all objects of the new generation. Assuming HandlePromotionFailure is enabled, the JVM continues to determine whether the remaining continuous memory of the old generation is greater than the average size of objects of the previous generation. As shown in the figure, there are currently two Spaces left in the old age. If the remaining space is less than the average of three objects promoted to the old age, Full GC will be triggered.

Old age Recycling – Mark:

Old age recycling – removal:

Old age Recycling – Defragmentation:

Problems with the Minor GC

The problem with Minor GC is that new generation objects can be referenced by older generations in a way that reachable analysis cannot, but new generation objects in this case should not be collected.

The HotSpot VIRTUAL machine provides a solution: card tables.Copy the code

This method divides the old memory evenly into a number of cards, each containing a subset of objects, and then maintains a card table. The card table is an array. Each element points to a card and identifies whether there are objects in the card that point to the new generation. In this way, the Minor GC only needs to scan cards identified as 1 in the card table, greatly increasing efficiency.

The card table is as follows: