Garbage collection needs to accomplish three things:

  1. What memory needs to be reclaimed?

  2. When is it recycled?

  3. How to recycle?

Which memory needs to be reclaimed

What is a quote

Prior to JDK1.2, the Java definition of a reference was that if the data stored in a reference type represented the starting address of another chunk of memory, the chunk represented a reference. Disadvantages: the definition is too narrow, for how to describe some “tasteless to eat, pity to abandon” the object is powerless.

After JDK1.2, Java divides references into strong, soft, weak, and virtual references.

  • Strong references A strong reference is a reference to something like “Object obj = new Object()” that is common in program code. As long as a strong reference exists, the garbage collector will never reclaim the referenced Object.

  • Soft references Soft references are used to describe objects that are useful but not necessary. Objects associated with soft references are listed in the collection scope for a second collection before the system is about to run out of memory.

  • Weak references Weak references are also used to describe non-essential objects, but they are weaker than soft references, and objects associated with weak references only survive until the next garbage collection occurs.

  • Phantom reference

    Virtual references, also known as ghost references or phantom references, are the weakest type of reference relationship. The existence of a virtual reference does not affect the lifetime of an object, nor can it be used to obtain an instance of an object. The sole purpose of setting a virtual reference association for an object is to receive a system notification when the object is reclaimed by the collector.

Reference counting method

Reference counting is simply adding a reference counter to an object. Whenever a reference is made to it, the counter is incremented by 1. When the reference is invalid, the value of the counter is decreased by 1. ** Advantages: ** simple implementation, high judgment efficiency, in most cases when a good algorithm. ** Disadvantages: ** is difficult to solve the problem of objects referring to each other circularly. Therefore, reference counting is not used to manage memory in mainstream Java virtual machines.

Accessibility analysis algorithm

The basic idea of the algorithm is to search down from a series of objects called “GC Roots” as the starting point. The path of the search is called the reference chain. When an object is not connected to GC Roots by any reference chain, it is proved that the object is not available. The diagram below:

Object 5, Object 6, and Object 7 are related to each other, but they are not reachable to GC Roots, so they will be judged as recyclable objects.

In Java, objects that can be used as GC Roots include the following:

  • Objects referenced in the virtual machine stack (the local variable table in the stack frame);

  • The object referenced by the class static attribute in the method area;

  • The object referenced by the constant in the method area;

  • Objects referenced by JNI (Native methods in general) in the Native method stack.

Recovery method area

Reclaim the method area, or persistent generation, in the HotSpot VIRTUAL machine. The garbage collection of the permanent generation mainly recycles two parts: discarded constants and useless classes. Recycling deprecated constants is very similar to recycling objects in the Java heap. For example, a String “ABC” is already in the constant pool, but there is no String named “ABC” in the system. That is, no String refers to the “ABC” constant in the constant pool, and there is no other reference to the literal. If memory reclamation occurs, and if necessary, The “ABC” constant is cleaned out of the constant pool by the system. To determine if a class is “useless”, three conditions need to be met:

  1. All instances of the class have been reclaimed, that is, there are no instances of the class in the Java heap;

  2. The ClassLoader that loaded the class has been reclaimed;

  3. The java.lang.Class object corresponding to this Class is not referenced anywhere, and the methods of this Class cannot be accessed anywhere through reflection.

How to recycle

Mark-clean algorithm

The algorithm is divided into two stages: “mark” and “clear”. All marked objects are marked first, and all marked objects are recycled after marking is complete. The schematic diagram is as follows:

The algorithm has two main shortcomings:

  1. Efficiency problem, the efficiency of both marking and clearing is not high;

  2. Space problem. A large number of discontinuous memory fragments will be generated after the mark is cleared. Too much space fragment may cause that when the program needs to allocate large objects in the future, it cannot find enough contiguous memory and has to trigger another garbage collection action in advance.

Replication algorithm

The replication algorithm divides the available memory into two equally sized pieces by capacity and uses only one piece at a time. When this area of memory is used up, the surviving objects are copied to the other area, and the used memory space is cleaned up again. The schematic diagram is as follows:

The replication algorithm performs memory reclamation for the whole half region every time, and does not consider the complicated situation of memory fragmentation when allocating memory. As long as the heap top pointer is moved, the memory can be allocated in order, which is simple to implement and efficient to run. However, when the object survival rate is high, more replication operations need to be performed, and the efficiency will be low. In addition, this algorithm is reduced to half of the original memory, which is a bit too high a price.

Mark-collation algorithm

The marking process is still the same as the mark-clean algorithm, but instead of cleaning up the reclaimable objects directly, the next step is to move all surviving objects towards one end and then clean up memory directly beyond the end boundary. The schematic diagram is as follows:

Generational collection algorithm

The generation collection algorithm generally divides the Java heap into the new generation and the old generation, and then adopts the most appropriate collection algorithm according to the characteristics of each generation. In the new generation, a large number of objects are found dead and only a few survive in garbage collection, so the replication algorithm is selected, and only a small amount of the replication cost of the surviving objects can be collected. In the old age, because of the high survival rate of the object, there is no extra space for it to allocate guarantee, so it must use “mark-clean” or “mark-tidy” algorithm to recycle.