Background

An important idea in Java is that everything is an object. An object is an abstraction of something in the real world, and through inheritance, implementation, and composition, objects can model just about anything. That is why understanding objects is crucial when learning Java (or any other object-oriented language).

When a program needs an object, it treats the object like royalty and will create it by any means, even reflection. When the object is no longer needed, it becomes garbage: even if it escapes collection in the new generation, the collector will still hunt it down in the old generation.

Today, we’re going to look at how the JVM handles garbage (objects). Before we do that, let’s ask ourselves a few questions:

  • What are the phases of an object’s life cycle?

  • How do we tell whether an object has become garbage?

  • How is garbage marked?

  • How is garbage (objects) collected, and what strategies are used?

1. The lifetime of the object

Normally, as Java developers, we can create objects with new as freely as we like without managing their lifetimes, because the JVM cleans up after us. However, as advanced coders, we should still understand this properly. Let's look at the life cycle of an object.

1.1 Macro Perspective

From a macro point of view, the life cycle of an object consists of three stages: object creation, object use, and object collection.

  • Object creation

Objects can be created with the new keyword, through deserialization, reflection, cloning, and so on. This step mainly allocates memory for the object and initializes it.

  • Use of objects

An object is used through a reference held in the JVM stack (for example, in a local variable), which locates and accesses the object in the heap. The two common ways to implement this access are handles and direct pointers.

  • Object collection

Object collection is garbage collection, which is explained below.

These three stages are a bit too coarse to be clear, so let's look at the life cycle of an object from a microscopic point of view.

1.2 Microscopic Perspective

From a microscopic perspective, the life cycle of an object can be roughly divided into seven stages: the creation stage, application stage, invisible stage, unreachable stage, collectable stage, end stage, and object space reallocation stage.

  • Creation stage

In the creation stage, memory is allocated for the object, its construction begins, and static members are initialized. Once the object has been created and assigned to a variable, it moves to the application stage.

  • Application stage

In the application stage, the object is held by at least one strong reference. (Don't panic, strong references are explained below.)

  • Invisible stage

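When an object is in the invisible stage, the program no longer holds any strong reference to it that it can use, although such references may still exist (for example, in a local variable slot that is no longer in scope). Typically, this happens once execution has moved beyond the scope of the variable.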

boolean flag = true;
if (flag) {
    int num = 0; // num is only in scope inside this block
    num++;
}
// System.out.println(num); // would not compile: num is out of scope here

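In the above program, the local variable num becomes invisible as soon as execution leaves the if block: it is out of scope, so a statement such as System.out.println(num) placed after the block can no longer see it.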

  • Unreachable stage

An object in the unreachable stage is no longer held by any strong reference from the program itself, but it may still be held by special strong references, such as static variables of loaded classes, running threads, or JNI references. These special strong references are called "GC roots". If a GC root still holds an object the program no longer needs, that object leaks memory and cannot be collected.
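To make the leak scenario concrete, here is a minimal sketch using a hypothetical `LeakyRegistry` class: a static field, which is one of the GC root types listed later in this article, keeps every buffer reachable even though the method that created it has long since returned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: a static collection acting as a path from a GC root.
class LeakyRegistry {
    // Static fields of a loaded class are reachable from GC roots.
    private static final List<byte[]> CACHE = new ArrayList<>();

    static void handleRequest() {
        byte[] buffer = new byte[1024]; // local reference: gone when the method returns
        CACHE.add(buffer);              // static reference: keeps the buffer reachable
    }

    static int retained() {
        return CACHE.size();
    }

    static void clear() {
        CACHE.clear(); // dropping the references makes the buffers collectable again
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            handleRequest();
        }
        System.out.println("retained buffers: " + retained());
        clear();
        System.out.println("retained after clear: " + retained());
    }
}
```

Clearing or bounding such collections is what turns the objects back into garbage that the collector can reclaim.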

  • Collectable stage

An object enters the collectable stage when the garbage collector finds that it is unreachable and is ready to reallocate its memory.

  • End stage

If the object is still unreachable after its finalize() method has run, it enters the end stage, where it waits for the garbage collector to reclaim its space.

  • Object space reallocation stage

In the object space reallocation stage, the garbage collector reclaims or reallocates the memory the object occupied, and the object disappears completely.

That is a lot of stages. What you probably really want to know is when an object becomes garbage and when that garbage is removed.

In fact, these stages trace the whole process: the object is created, used, becomes unused, is marked as garbage, and is collected. To answer that question, we will focus on garbage (objects). But first, we need to understand object references, because the kind of reference that points to an object largely determines when it becomes garbage.

1.3 Object Reference

Since JDK 1.2, Java has divided object references into four categories: strong, soft, weak, and phantom (sometimes called virtual) references.

  • Strong reference

A strong reference indicates that an object is useful and necessary, and it is the most commonly used kind of reference. As long as an object is reachable through strong references, the garbage collector will never reclaim it. Even when the Java virtual machine runs out of memory, it would rather throw an OutOfMemoryError and abort the program than reclaim strongly referenced objects to free memory.

Student student = new Student(); // This is a strong reference
  • Soft references

Soft references indicate that an object is in a useful, but not required, state. If there is enough memory, the garbage collector will not reclaim an object if it has only soft references, but if there is not enough memory, the garbage collector will reclaim the object (before the OutOfMemoryError). As long as the garbage collector does not collect it, the object can be used by the program.

Soft references are used to implement memory sensitive caches, such as web page caches, image caches, etc.

Student student = new Student();
SoftReference<Student> softReference = new SoftReference<>(student);
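As a hedged illustration of the memory-sensitive cache idea, the sketch below wraps each cached value in a `SoftReference`. `SoftCache` and its `loader` function are hypothetical names for this example, not a JDK class; only `java.lang.ref.SoftReference` is a real API.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal memory-sensitive cache sketch (hypothetical class, not a JDK API).
// Values are held only through SoftReferences, so the GC may clear them
// under memory pressure; a cleared entry is simply reloaded on the next get().
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();
    private final Function<K, V> loader;

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get(); // null if never cached or already cleared
        if (value == null) {
            value = loader.apply(key);               // (re)load the value
            entries.put(key, new SoftReference<>(value));
        }
        return value; // the caller's strong reference keeps it alive while in use
    }

    public static void main(String[] args) {
        SoftCache<String, byte[]> images = new SoftCache<>(name -> new byte[64 * 1024]);
        byte[] logo = images.get("logo.png");
        System.out.println("loaded " + logo.length + " bytes");
    }
}
```

Because the collector may clear a soft reference whenever memory runs low, `get()` must always be prepared to reload the value.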
  • Weak reference

Weak references indicate that an object may be useful but is not necessary. They are similar to soft references but weaker: objects that have only weak references live shorter lives. As soon as the garbage collector, while scanning the memory area it manages, finds an object that has only weak references, it reclaims that object, regardless of whether memory is currently scarce. However, since garbage collection does not run continuously, objects that have only weak references are not necessarily discovered right away.

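A classic use case is java.lang.ThreadLocal: each entry of its internal ThreadLocalMap holds the ThreadLocal key through a weak reference. java.util.WeakHashMap is another example.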
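The sketch below shows the basic `WeakReference` behavior described above. `WeakRefDemo` and its nested `Student` class are hypothetical. `System.gc()` is only a request, so whether the last line prints `null` depends on the JVM and is not guaranteed.

```java
import java.lang.ref.WeakReference;

// Sketch: a WeakReference alone does not keep its referent alive.
class WeakRefDemo {
    static final class Student {
        final String name;
        Student(String name) { this.name = name; }
    }

    public static void main(String[] args) throws InterruptedException {
        Student student = new Student("Alice");              // strong reference
        WeakReference<Student> weak = new WeakReference<>(student);

        // While the strong reference exists, get() returns the object.
        System.out.println("before: " + weak.get().name);

        student = null;   // drop the only strong reference
        System.gc();      // only a hint; the JVM may ignore it
        Thread.sleep(100);

        // Often prints null on HotSpot after a GC, but this is not guaranteed.
        System.out.println("after gc: " + weak.get());
    }
}
```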

  • Phantom reference

A phantom reference does not affect an object's lifetime at all: an object that has only phantom references can be reclaimed by the GC at any time, and PhantomReference.get() always returns null. Phantom references exist so that the program receives a notification (the reference is placed on its ReferenceQueue) when the garbage collector processes the object, which makes it possible to track objects being collected.

Object object = new Object();
ReferenceQueue<Object> referenceQueue = new ReferenceQueue<>();
PhantomReference<Object> phantomReference = new PhantomReference<>(object, referenceQueue);

2. Garbage (object) judgment

The first step in garbage collection is deciding whether an object is garbage. This is not decided by us programmers; more precisely, it is not something Java developers need to care about. If you are a neat freak, you can call System.gc(), but that is only a suggestion that the JVM is free to ignore.

There are two main algorithms for deciding whether an object is garbage: the reference counting algorithm and the reachability analysis algorithm.

2.1 Reference Counting Algorithm

Reference counting is a very old algorithm, and mainstream Java virtual machines do not use it. Still, as learners, it is worth understanding.

Definition: the reference counting algorithm adds a reference counter to each object. Whenever a new reference to the object is created, the counter is incremented by one; whenever a reference becomes invalid, it is decremented by one. When the count reaches zero, the object can no longer be used, so it becomes garbage and can be reclaimed.
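The definition above can be modeled in a few lines of Java. This is a toy model with a hypothetical `RcObject` class (HotSpot does not count references); `retain()` and `release()` stand for creating and dropping a reference.

```java
// Toy model of reference counting (hypothetical class, for illustration only).
class RcObject {
    final String name;
    private int refCount = 0;
    boolean freed = false;

    RcObject(String name) { this.name = name; }

    void retain() {           // a new reference to this object was created
        refCount++;
    }

    void release() {          // a reference to this object became invalid
        refCount--;
        if (refCount == 0) {  // no references left: reclaim immediately
            freed = true;
            System.out.println(name + " reclaimed");
        }
    }

    int refCount() { return refCount; }

    public static void main(String[] args) {
        RcObject obj = new RcObject("obj");
        obj.retain();   // a = obj    -> count 1
        obj.retain();   // b = obj    -> count 2
        obj.release();  // a = null   -> count 1
        obj.release();  // b = null   -> count 0, reclaimed at once
        System.out.println("freed = " + obj.freed);
    }
}
```

The object is reclaimed the moment its last reference disappears, which is the real-time advantage listed below.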

The reference counting algorithm is easy to understand by definition, and it is an algorithm with obvious advantages and disadvantages, so let’s move on.

  • advantages

① The principle is simple and it works in real time: as soon as a counter drops to 0, the object can be reclaimed immediately.

② Each counter concerns only a single object: when a counter changes, only that object needs to be checked, rather than scanning all objects along the reference chains.

  • disadvantages

① Every time a reference is created or destroyed, time must be spent updating the counter.

② Circular references: if object A refers to object B and object B refers to object A, their counters never drop to zero, even when no other part of the program references them any more. As a result, they are never reclaimed.
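The circular-reference problem can be reproduced with the same kind of toy model, extended so that freeing an object also releases the objects it points to. `RcNode` and `pointTo()` are hypothetical names for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy reference counting with outgoing references (hypothetical classes).
class RcNode {
    final String name;
    int refCount = 0;
    boolean freed = false;
    final List<RcNode> fields = new ArrayList<>(); // objects this node points to

    RcNode(String name) { this.name = name; }

    // Models "this.field = target".
    void pointTo(RcNode target) {
        fields.add(target);
        target.refCount++;
    }

    // Models one reference to this node disappearing.
    void release() {
        if (--refCount == 0) {
            freed = true;
            for (RcNode child : fields) {
                child.release(); // a freed node drops its own references
            }
            fields.clear();
        }
    }

    public static void main(String[] args) {
        RcNode a = new RcNode("A");
        RcNode b = new RcNode("B");
        a.refCount++;   // local variable a -> A
        b.refCount++;   // local variable b -> B
        a.pointTo(b);   // A.field -> B (B count = 2)
        b.pointTo(a);   // B.field -> A (A count = 2)

        // Simulate "a = null; b = null" (the Java variables are kept only for printing).
        a.release();    // A count = 1
        b.release();    // B count = 1

        System.out.println("A freed=" + a.freed + " count=" + a.refCount);
        System.out.println("B freed=" + b.freed + " count=" + b.refCount);
    }
}
```

Both nodes keep a count of 1 forever, because each is held only by the other; nothing in the program can reach them, yet reference counting never reclaims them.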

To solve this problem, mainstream Java virtual machines use a different algorithm: reachability analysis.

2.2 Reachability Analysis Algorithm

The idea of reachability analysis is to start from a set of objects called “GC Roots” and search along their references. If there is no reference path (reference chain) from any GC Root to an object, the object is unreachable and can be reclaimed.

The reachability analysis algorithm is shown in the figure below:

Unlike reference counting, which determines whether an object is dead, reachability analysis determines whether an object is alive. It also solves the circular-reference problem of reference counting: a cycle that no GC Root can reach is simply unreachable. Reachability analysis is the main algorithm for deciding whether an object is garbage.
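A minimal sketch of reachability analysis as a breadth-first graph search, assuming the object graph is given as adjacency lists of object names. This representation is hypothetical; a real collector walks actual heap pointers. Note that the C and D cycle is reported as garbage.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy reachability analysis over a hypothetical object graph.
class Reachability {
    // Returns every object reachable from the GC roots.
    static Set<String> reachable(Map<String, List<String>> graph, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.poll();
            if (marked.add(obj)) { // first visit: mark it and scan its references
                for (String ref : graph.getOrDefault(obj, List.of())) {
                    work.add(ref);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("root", List.of("A"));   // "root" stands for a GC root, e.g. a local variable
        graph.put("A", List.of("B"));
        graph.put("C", List.of("D"));      // C and D reference each other...
        graph.put("D", List.of("C"));      // ...but no GC root reaches them

        Set<String> live = reachable(graph, List.of("root"));
        for (String obj : List.of("A", "B", "C", "D")) {
            System.out.println(obj + (live.contains(obj) ? " is alive" : " is garbage"));
        }
    }
}
```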

3. Garbage marking

We have already shown how reference counting and reachability analysis determine whether an object is garbage. You may have noticed that both algorithms, while deciding whether an object is garbage, also mark it.

To be more precise: if an object is no longer referenced, it is garbage. The counter in reference counting and the reference chain in reachability analysis are both ways of marking objects.

We also mentioned above that due to the problem of circular reference in reference counting algorithm, the existing mainstream marking algorithm is reachability analysis algorithm. Next, we will analyze in detail the process of reachability analysis algorithm marking garbage objects.

In Java, objects that can be used as GC Roots usually include the following:

  • Objects referenced from the virtual machine stack (local variables in stack frames)
  • Objects referenced by static fields in the method area
  • Objects referenced by constants in the method area
  • Objects referenced by JNI (Native methods) in the native method stack

In reachability analysis, an unreachable object is not necessarily dead yet: before an object truly dies, it goes through two marking passes.

Marking process analysis:

When reachability analysis finds that some objects are not reachable from any GC Root, they are marked for the first time and then filtered. The filter checks whether it is necessary to execute the object's finalize() method (every object inherits one from Object). If the object does not override finalize(), or its finalize() method has already been called once by the virtual machine, executing it is considered "not necessary", and the garbage collector can reclaim the object directly.

If it is deemed necessary to execute finalize(), the virtual machine puts the object into a queue (the F-Queue), and a dedicated Finalizer thread later runs the object's finalize() method. If, during finalize(), the object re-associates itself with any object on a reference chain (for example, by assigning this to a static field), it is removed from the "to be collected" set during the second marking; otherwise, it is reclaimed.
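The two marking passes can be simulated as follows. This is a deliberately simplified, single-threaded toy model with hypothetical names (`TwoMarkDemo`, `reachableSet`, `resurrectInFinalize`); in a real JVM the Finalizer runs on its own thread, and `finalize()` has been deprecated since Java 9 in favor of `Cleaner` and try-with-resources.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model of "marked twice before dying" (hypothetical, heavily simplified).
class TwoMarkDemo {
    static final class Obj {
        final String name;
        final boolean overridesFinalize;      // does the class override finalize()?
        boolean finalizeCalled = false;       // the VM calls finalize() at most once
        boolean resurrectInFinalize = false;  // does finalize() store `this` somewhere reachable?

        Obj(String name, boolean overridesFinalize) {
            this.name = name;
            this.overridesFinalize = overridesFinalize;
        }
    }

    // Stands in for "reachable from GC Roots", e.g. a static field.
    static final List<Obj> reachableSet = new ArrayList<>();

    // Processes objects found unreachable by reachability analysis; returns those reclaimed.
    static List<Obj> collect(List<Obj> unreachable) {
        List<Obj> reclaimed = new ArrayList<>();
        Queue<Obj> fQueue = new ArrayDeque<>(); // plays the role of the F-Queue

        // First marking + filtering: is it necessary to execute finalize()?
        for (Obj o : unreachable) {
            if (!o.overridesFinalize || o.finalizeCalled) {
                reclaimed.add(o);   // no: reclaim directly
            } else {
                fQueue.add(o);      // yes: hand it to the Finalizer
            }
        }
        // The Finalizer runs finalize(); the object may re-attach itself.
        for (Obj o : fQueue) {
            o.finalizeCalled = true;
            if (o.resurrectInFinalize) {
                reachableSet.add(o);
            }
        }
        // Second marking: only objects that are still unreachable are reclaimed.
        for (Obj o : fQueue) {
            if (!reachableSet.contains(o)) {
                reclaimed.add(o);
            }
        }
        return reclaimed;
    }

    static List<String> names(List<Obj> objs) {
        List<String> result = new ArrayList<>();
        for (Obj o : objs) {
            result.add(o.name);
        }
        return result;
    }

    public static void main(String[] args) {
        Obj plain = new Obj("plain", false);
        Obj escaper = new Obj("escaper", true);
        escaper.resurrectInFinalize = true;

        System.out.println("1st GC reclaimed: " + names(collect(List.of(plain, escaper))));
        reachableSet.remove(escaper); // the escaped reference is dropped again
        System.out.println("2nd GC reclaimed: " + names(collect(List.of(escaper))));
    }
}
```

The escaping object survives the first collection but not the second, because its finalize() is never run twice.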

4. Recycling

Memory is a valuable resource, and garbage objects that are no longer needed should be reclaimed as soon as possible. The steps explained above, determining and marking garbage, pave the way for garbage collection.

For readers who are not yet familiar with the JVM runtime data areas, they are analyzed in detail in the earlier article "JVM Memory".

4.1 Analysis of heap layout structure

Garbage collection mainly happens in the heap of the runtime data area. Objects have life cycles, and an object that is no longer referenced is considered garbage. Since objects live for different lengths of time, we can reduce how often and how much the GC threads have to scan by placing long-lived objects in a separate area. This determines the layout of the heap: in general, it is divided into two parts, the new generation and the old generation.

By default, the ratio of the new generation to the old generation is 1:2, but this is not fixed: it can be adjusted for a specific scenario with the -XX:NewRatio parameter. At a finer granularity, the new generation is divided into Eden and Survivor space, and the Survivor space is further divided into From Survivor and To Survivor. The default Eden:From:To ratio is 8:1:1.
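As a worked example of these ratios, the sketch below splits a hypothetical 300 MB heap, assuming `-XX:NewRatio=2` (old:new = 2:1) and `-XX:SurvivorRatio=8` (Eden:one Survivor = 8:1). Real JVMs round sizes to alignment boundaries, and some collectors resize generations adaptively, so treat this as back-of-the-envelope arithmetic.

```java
// Back-of-the-envelope generation sizing (illustrative arithmetic only).
class HeapLayout {
    // Returns {young, old, eden, oneSurvivor} in MB.
    static long[] split(long heapMb, int newRatio, int survivorRatio) {
        long young = heapMb / (newRatio + 1);        // old:young = newRatio:1
        long old = heapMb - young;                   // the rest is the old generation
        long survivor = young / (survivorRatio + 2); // Eden + 2 survivors = survivorRatio + 2 parts
        long eden = young - 2 * survivor;
        return new long[] {young, old, eden, survivor};
    }

    public static void main(String[] args) {
        long[] s = split(300, 2, 8);
        System.out.printf("young=%dMB old=%dMB eden=%dMB from=%dMB to=%dMB%n",
                s[0], s[1], s[2], s[3], s[3]);
    }
}
```

For a 300 MB heap this gives a 100 MB new generation and a 200 MB old generation, with 80 MB of Eden and 10 MB for each Survivor space.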

The question then arises: why split the Survivor space into two equally sized parts? The short answer is to avoid memory fragmentation. If fragmentation is severe, meaning the free memory is scattered across non-contiguous pieces, there may not be enough contiguous memory for a new object even though the total free memory is sufficient, and garbage collection (GC) will be triggered. With two Survivor spaces, live objects are always copied into the empty one and packed contiguously, so the Survivor space in use never becomes fragmented.

4.2 Garbage collection algorithm

Above, we analyzed the layout of the heap in the runtime data area. Based on object lifetimes, the heap is divided into the new generation and the old generation, and the new generation is further subdivided into Eden, From Survivor, and To Survivor, in a ratio of 8:1:1. Understanding this layout helps a lot in understanding the JVM's garbage collection algorithms: garbage objects live mainly in the heap, and because the heap is divided into partitions with different characteristics, each partition uses a different collection algorithm.

In the following study, we will first learn the commonly used garbage collection algorithm, and finally choose the appropriate garbage collection algorithm according to the characteristics of different partitioned memory in the heap.

The commonly used garbage collection algorithms are the mark-sweep algorithm, the mark-copy algorithm, the mark-compact algorithm, and the generational collection algorithm.

Mark-sweep algorithm

The algorithm has two phases, marking and sweeping: objects are marked first, and once marking is finished, all garbage is reclaimed in one pass. Its advantage is efficiency; its disadvantage is that it easily produces memory fragmentation.

In the marking phase, the collector starts from the GC Roots, follows the reference chains, and marks the surviving objects. In the sweeping phase, it scans the entire heap and reclaims all unmarked objects.
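A toy sketch of the sweep phase and the fragmentation it leaves behind, assuming a hypothetical heap made of equal-sized slots and a `marked` set already produced by the marking phase.

```java
import java.util.Arrays;
import java.util.Set;

// Toy mark-sweep: the heap is an array of equal-sized slots holding object names.
class MarkSweep {
    // Sweep phase: free every slot whose object was not marked live.
    static String[] sweep(String[] heap, Set<String> marked) {
        String[] result = Arrays.copyOf(heap, heap.length);
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null && !marked.contains(result[i])) {
                result[i] = null; // reclaimed in place; live objects do not move
            }
        }
        return result;
    }

    static int freeSlots(String[] heap) {
        int n = 0;
        for (String slot : heap) {
            if (slot == null) {
                n++;
            }
        }
        return n;
    }

    // Length of the longest run of contiguous free slots.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        String[] swept = sweep(heap, Set.of("A", "C", "E"));
        System.out.println(Arrays.toString(swept));
        System.out.println("free=" + freeSlots(swept) + " largest contiguous=" + largestFreeRun(swept));
    }
}
```

Half of the slots are free, but no two are adjacent, so an object that needs two contiguous slots could not be allocated.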

The marking and sweeping phases are shown below:

Mark-copy algorithm

The mark-copy algorithm divides memory into two spaces. At any time, objects are allocated in only one of them, called the active space, while the other stays free. When the active space runs out, the JVM pauses the program and starts the copying GC thread. The GC thread copies all live objects from the active space into the free space, placing them contiguously in memory-address order, and updates every reference to a live object to its new address.

At this point, the roles of the two spaces are swapped: the former free space now holds the live objects and becomes the active space, while all garbage objects are left behind in the former active space, which becomes the new free space. In effect, all garbage is reclaimed at once when the active space is turned into free space.
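The copying step can be sketched with Cheney's breadth-first algorithm, a well-known way to implement a copying collector. It is used here as an illustration and is not necessarily what any particular JVM does; unlike the description above, it places objects in breadth-first order rather than original address order. Objects are modeled as a hypothetical `Obj` with integer addresses, and `forward` records where each object was copied so it is copied only once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy semispace copying collector (Cheney-style breadth-first copy).
class SemiSpace {
    static final class Obj {
        final String name;
        final int[] refs;   // addresses of referenced objects in the current space
        int forward = -1;   // forwarding address in to-space once copied

        Obj(String name, int... refs) {
            this.name = name;
            this.refs = refs;
        }
    }

    // Copies everything reachable from roots into a fresh to-space; roots are updated in place.
    static List<Obj> collect(List<Obj> from, int[] roots) {
        List<Obj> to = new ArrayList<>();
        for (int i = 0; i < roots.length; i++) {
            roots[i] = evacuate(from, to, roots[i]);   // copy the root targets first
        }
        for (int scan = 0; scan < to.size(); scan++) { // to.size() grows while we copy
            int[] refs = to.get(scan).refs;
            for (int j = 0; j < refs.length; j++) {
                refs[j] = evacuate(from, to, refs[j]); // copy the child, rewrite the pointer
            }
        }
        return to; // the whole from-space, garbage included, can now be discarded
    }

    private static int evacuate(List<Obj> from, List<Obj> to, int addr) {
        Obj old = from.get(addr);
        if (old.forward < 0) {                           // not copied yet
            old.forward = to.size();                     // next free address in to-space
            to.add(new Obj(old.name, old.refs.clone())); // refs still hold from-space addresses
        }
        return old.forward;
    }

    public static void main(String[] args) {
        // from-space: A -> B, B -> A; X and Y are garbage
        List<Obj> from = List.of(new Obj("A", 2), new Obj("X"), new Obj("B", 0), new Obj("Y", 1));
        int[] roots = {0};
        List<Obj> to = collect(from, roots);
        for (int addr = 0; addr < to.size(); addr++) {
            Obj o = to.get(addr);
            System.out.println(addr + ": " + o.name + " -> " + Arrays.toString(o.refs));
        }
        System.out.println("root now points to " + roots[0]);
    }
}
```

Only A and B are copied; X and Y are never touched, which is why copying cost depends on the number of live objects rather than on the amount of garbage.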

Mark-compact algorithm

Like mark-sweep, the mark-compact algorithm has two phases: marking and compacting.

Marking: this phase is identical to the marking phase of mark-sweep; it traverses from the GC Roots and marks the surviving objects.

Compacting: move all surviving objects toward one end of memory, keeping their address order, and then reclaim all memory beyond the last surviving object. This is why the second phase is called the compacting (defragmentation) phase.
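A toy sketch of sliding compaction on a hypothetical slot-based heap: live objects keep their relative order and slide toward address 0, and a forwarding table records their new addresses. A production collector also has to use that table to rewrite every reference, which usually costs extra passes over the heap.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy mark-compact (sliding compaction) over a slot-based heap.
class MarkCompact {
    // Compute forwarding addresses: live objects slide toward address 0, keeping their order.
    static Map<Integer, Integer> forwardingTable(String[] heap, Set<String> marked) {
        Map<Integer, Integer> forward = new LinkedHashMap<>();
        int next = 0;
        for (int addr = 0; addr < heap.length; addr++) {
            if (heap[addr] != null && marked.contains(heap[addr])) {
                forward.put(addr, next++);
            }
        }
        return forward;
    }

    // Move each live object to its forwarding address; everything after them is free.
    static String[] compact(String[] heap, Map<Integer, Integer> forward) {
        String[] result = new String[heap.length];
        for (Map.Entry<Integer, Integer> e : forward.entrySet()) {
            result[e.getValue()] = heap[e.getKey()];
        }
        return result;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        Map<Integer, Integer> forward = forwardingTable(heap, Set.of("A", "C", "E"));
        System.out.println("forwarding: " + forward);
        // A real collector would also rewrite pointers, e.g. a root that held
        // address 4 (E) must now hold forward.get(4).
        System.out.println("after: " + Arrays.toString(compact(heap, forward)));
    }
}
```

Unlike the mark-sweep sketch, the free space ends up as one contiguous block at the end of the heap.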

Generational collection algorithm

Most JVM garbage collectors today use generational collection. The idea is to divide memory into three regions according to object lifetimes: the new generation, the old generation, and the permanent generation. The new and old generations correspond to the heap partitions described above, while the permanent generation was replaced by Metaspace in Java 8.

  • New generation collection algorithm

The new generation mainly stores short-lived objects. Newly created objects are normally allocated in Eden. During a collection, the surviving objects in Eden and in the From Survivor space are copied to the To Survivor space (and their age increases by one), after which Eden and the From Survivor space are emptied. The From and To Survivor spaces then swap roles, so the To Survivor space is empty again before the next collection, and the process repeats.

If the To Survivor space is not large enough to hold all the surviving objects from Eden and the From Survivor space, the remaining survivors are moved directly to the old generation. If the old generation is also full, a Full GC is triggered, which collects both the new generation and the old generation.

A new-generation GC is generally called a Minor GC. Minor GCs occur frequently and do not necessarily wait until Eden is completely full.
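The Minor GC flow described above can be simulated as follows. This is a simplified model with hypothetical names: capacities are counted in objects, and an object is promoted when it reaches `tenureAge` (similar in spirit to HotSpot's `-XX:MaxTenuringThreshold`) or when the To Survivor space is full. Real promotion decisions involve more factors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy simulation of Minor GCs in the new generation (hypothetical model).
class MinorGc {
    static final class Obj {
        final String name;
        int age = 0;   // number of Minor GCs survived
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();   // Survivor space currently in use
    List<Obj> to = new ArrayList<>();     // Survivor space kept empty
    final List<Obj> old = new ArrayList<>();
    final int survivorCapacity;           // objects that fit in one Survivor space
    final int tenureAge;                  // promotion age

    MinorGc(int survivorCapacity, int tenureAge) {
        this.survivorCapacity = survivorCapacity;
        this.tenureAge = tenureAge;
    }

    // One Minor GC; `live` names the objects still reachable from GC Roots.
    void minorGc(Set<String> live) {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!live.contains(o.name)) {
                continue;                   // garbage: simply not copied
            }
            o.age++;
            if (o.age >= tenureAge || to.size() >= survivorCapacity) {
                old.add(o);                 // old enough, or To is full: promote
            } else {
                to.add(o);                  // copy into the To Survivor space
            }
        }
        eden.clear();                       // Eden and From are now entirely free
        from.clear();
        List<Obj> tmp = from;               // swap roles so To is empty again
        from = to;
        to = tmp;
    }

    static List<String> names(List<Obj> objs) {
        List<String> result = new ArrayList<>();
        for (Obj o : objs) {
            result.add(o.name);
        }
        return result;
    }

    public static void main(String[] args) {
        MinorGc gc = new MinorGc(2, 2);
        for (String n : new String[] {"A", "B", "C", "D"}) {
            gc.eden.add(new Obj(n));
        }
        gc.minorGc(Set.of("A", "B", "C"));  // D is garbage; C overflows the Survivor space
        System.out.println("from=" + names(gc.from) + " old=" + names(gc.old));

        gc.eden.add(new Obj("E"));
        gc.minorGc(Set.of("A", "E"));       // B dies; A reaches age 2 and is promoted
        System.out.println("from=" + names(gc.from) + " old=" + names(gc.old));
    }
}
```

In the first collection C is promoted early because the To Survivor space is full; in the second, A is promoted because it has survived two collections.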

  • Old generation collection algorithm

Objects that survive N garbage collections in the new generation are moved to the old generation, so the old generation can be thought of as holding long-lived objects. The old generation is also larger than the new generation (by default, the new:old ratio is about 1:2). When the old generation fills up, a Major GC (often used interchangeably with Full GC) is triggered. It happens much less frequently, and because the objects there live long, their survival rate is high.

  • Permanent generation collection algorithm

The permanent generation stores class metadata, such as Java classes and methods. It has little impact on garbage collection, but some applications generate or load classes dynamically (Hibernate, for example); in such cases, a larger permanent generation should be configured to hold the classes added at runtime. Before JDK 1.8, the permanent generation implemented the method area; since JDK 1.8, it has been replaced by Metaspace.

Algorithm selection in different partitions in the heap structure

In fact, the JVM uses different algorithms for different regions of the heap (new generation and old generation).

The new generation is better suited to the copying algorithm. It consists of Eden, From Survivor, and To Survivor: objects in Eden are copied into a Survivor space, and the two Survivor spaces swap roles at every collection, so the copying algorithm is used. Since most new objects die young, only a few survivors need to be copied.

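Objects in the old generation live relatively long, so the copying algorithm is a poor fit: most objects survive, so copying them all would be expensive, and part of the space would always have to be kept empty. The old generation therefore generally uses the mark-compact or mark-sweep algorithm.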
