1. Reference counting and accessibility analysis

In heap memory, focus on recycling useless objects.

Reference Counting is a very simple method to determine whether an object is being used. It adds a reference counter to each object, which is incremented by one when the object is in external use. Similarly, the counter is subtracted by one when dereferenced. As long as the counter value is greater than zero, the object is available and cannot be reclaimed.

However, reference counting does not solve the problem of circular dependencies between objects, as follows:

UselessA a = new UselessA(); UselessB b = new UselessB(); a.setB(b); b.setA(a); a = null; b = null; // At this point, objects A and B are effectively useless, but they both have a counter value of 1 and cannot be marked as "recyclable"Copy the code

Therefore, reference counting is not used to manage memory in mainstream Java virtual machines. Instead, Reachability Analysis is used.

The reachability analysis refers to the search from a series of objects defined as GC Root as the starting point, and the search path is called the Reference Chain. If an object is not connected by any Reference Chain to the GC Root, the object is useless.

From the Reachability Analysis, it is theoretically important to define the GC Root.

Objects that can be fixed as GC Root in Java include:

  • Objects referenced in the virtual machine stack;
  • Objects referenced by static properties in the method area;
  • Objects referenced by constants in the method area;
  • Objects referenced by JNI in local methods;
  • Internal Java VIRTUAL machine references, such as Class objects corresponding to basic data and resident exception objects.
  • All objects held by the synchronization lock;
  • Jmxbeans, callbacks registered in JVMTI, local code caches that reflect the internal conditions of the Java virtual machine;

In addition to these fixed GC roots, it is possible to temporarily add objects. For example, when garbage collection is performed in the younger generation (see 3.1 generational collection theory for the concept), objects from the younger generation may be referenced by the older generation. At this point, the older objects need to be added to the GC Root (not necessarily in full).

2. Constant recovery and type uninstallation

The Java Virtual Machine Specification does not require that garbage collection be implemented in the method area. However, many applications now use frameworks such as dynamic proxy and CGLIB bytecode generation, which generate classes dynamically during the running of the application and send them to the virtual machine to load and use. In such a scenario, the virtual machine needs to be capable of type uninstallation to avoid excessive pressure on the method area.

There are only two types of objects that can be reclaimed from the method area: constants that are useless and types that are no longer used.

The process of constant recycling is basically the same as the process and conditions of recycling objects in the heap. However, the conditions for recycling classes (i.e. unloading classes) are more demanding. The following three conditions must be met:

  • All instances of the class (including derived instances) have been reclaimed;
  • The classloader that loaded the class has been reclaimed;
  • Which of theClassThe object is not referenced anywhere, and the methods of the class are not accessed anywhere through reflection;

In general, the cost performance of method area garbage collection is very low.

3. Reference type

Using “referenced” and “unreferenced” states is too absolute a basis for garbage collection. For example, we expect to keep the cache when there is enough available memory and reclaim the cache when there is not enough. It is not appropriate to use the “referenced” and “unreferenced” states as the conditions for cache reclamation.

Java extended the concept of reference after 1.2, which is divided into:

  • Strongly Reference: is the most traditional definition of a reference. When usingnewA strong reference is created by the keyword.The object is in a strong reference relationship六四事件GC Root六四事件If the object is reachable, the garbage collector will never reclaim it;
  • Soft Reference: Describes useful but not necessary objects, throughSoftReferenceClass to use.The system is about to happen六四事件OutOfMemoryError, all soft reference objects will be reclaimed. This occurs only if the memory space is still insufficient after the reclamationOutOfMemoryError* * * * * *);
  • Weak Reference: its strength ratioSoft referencesWeaker, passWeakReferenceClass to use.All weak reference objects are collected in the next garbage collection;
  • Phantom Reference: The weakest reference relationship, throughPhantomReferenceClass to use; Virtual references do not affect the lifetime of the object. The only thing it does isYou can receive a system notification when the object is reclaimed.

4. “dead” object declaration process

After the reachability analysis, some of the objects are declared dead, but during the declaration, the “dead” object has one more chance to rescue it.

Declaring an object “dead” goes through two tokens. An object is flagged for the first time when it is found to be unreachable from GC Root. After that, the virtual machine will determine whether the object has finalize() override method and the method has not been executed by the virtual machine. If the condition is met, the object will be added to a Queue called F-queue by the virtual machine, and finalize() method of the object will be executed with Finalizer thread of very low priority, but complete execution of the method is not guaranteed (to avoid the existence of very slow code or infinite loop in Finalize () method). Later, the virtual machine will mark the object in the F-queue a second time, and if the object fails to save itself (such as assigning this to a variable) in the Finalize () method, then it will actually be declared dead.

The Finalize () method is only called once by the virtual machine, that is, the object has only one chance to save itself.

Note: Finalize () method is officially deprecated, so don’t use it anymore.

5. Generational collection theory

Most modern garbage collectors are designed according to the “generational collection” theory. The theory of “generational collection” mainly contains three hypotheses:

  • If the generational hypothesis: the vast majority of objects are born and die overnight;
  • Strong generational hypothesis: the vast majority of objects are ephemeral;
  • Intergenerational citation hypothesis: intergenerational citation is very rare compared to intergenerational citation;

The Java virtual machine designed according to the “generational collection” theory generally divides the heap memory into two regions: Young Generation and Old Generation. A large number of objects die during each garbage collection, and the small number of objects that survive after each collection are gradually promoted to old age storage.

At the same time, the division of the heap led to the emergence of “Partical GC”. Partical GC contains the following categories:

  • Minor GC/Young GC: refers to garbage collection targeting only the new generation;
  • Major GC/Old GC: refers to the target is only the old garbage collection, currently onlyCMSThe garbage collector has a separate vintage collection behavior;
  • Mixed GC: refers to garbage collection aimed at collecting the entire new generation as well as parts of the old generationG1The collector has this behavior;
  • Full GC: Collect the entire Java heap and method area garbage collection;

6. Mark-clear algorithm

Is the earliest and most basic garbage collection algorithm. The algorithm is divided into two stages: “marking” and “clearing” : first, all the objects that need to be reclaimed are marked, and after the completion of marking, all the marked objects are reclaimed uniformly.

This algorithm has two major disadvantages:

  • Unstable performance: If the Java heap contains a large number of objects, most of which need to be collected, then a large number of marking and clearing actions must be performed, causing the efficiency of both marking and clearing processes to decrease as the number of objects increases.

  • Memory fragmentation will be generated: After marking and clearing, a large number of discontinuous memory fragments will be generated. Too much space fragmentation may cause that when large objects need to be allocated in the process of program running, enough continuous memory cannot be found, as follows:

3.3 Mark-copy algorithm

It is a kind of “half copy” garbage collection algorithm. It divides the available memory into two equal sized chunks by capacity and uses only one chunk at a time. When the memory of this piece is used up, the objects that are still alive are copied to another piece, and then the used memory space is cleaned up once again.

It produces a “neat” memory space, which can be allocated to new objects using a simple and efficient “Bump The Pointer” method. But it cuts the available memory in half, wasting a lot of storage space.

Most modern Java virtual machines use this algorithm to implement the new generation of garbage collection. According to IBM, 98% of those in the new generation don’t make it past the first round of collection. Therefore, the storage space of the new generation is not allocated in a 1:1 ratio.

The garbage collectors in the HotSpot virtual machine, such as Serial and ParNew, divide the new generation into a larger Eden zone and two smaller Survivor zones in an 8:1:1 ratio. Only Eden and one Survivor space are used in each allocation. In case of garbage collection, all surviving objects in Eden and Survivor space are copied to another Survivor space, and Eden and the used Survivor space are cleaned up directly.

HotSpot also has a mechanism known as the “escape door” : when the Survivor area is not large enough to hold objects that survive after a Minor GC, it relies on other (actually mostly older) memory areas for allocation guarantees.

3.4 Mark-collation algorithm

Its working process is consistent with the “mark-clear” algorithm, which is also divided into two steps: “mark” and “clear”. However, the next step is not to clean up the recyclable objects directly, but to make all the living objects move to one end of the memory space, and then directly clean up the memory outside the boundary, as shown in the figure below:

Moving living objects can produce a “neat” area of memory, but when a large number of objects are alive (such as in the old days of garbage collection), moving them and updating their corresponding references is a very burdensome operation.