1. An overview of the
Garbage Collection (GC), as its name implies, frees up space occupied by Garbage and prevents memory leaks. Efficient use of usable memory, cleaning and recycling dead or unused objects in the memory heap.
2. Garbage detection algorithm
2.1 Reference counting method
Add a counter to each object that is incremented by one when the object is referenced somewhere and decayed by one when the reference fails. The object counter is 0 to determine whether the object can be reclaimed. Disadvantages: Cannot solve the problem of circular references.
I’m going to create a String, String m = new String(” Jack “); “Jack” has a reference to m. Then set m to null, and the number of references to “jack” equals zero, which in the reference-counting algorithm means that this piece of content needs to be reclaimed.
The reference counting algorithm spreads garbage collection over the entire application, rather than suspending the entire application until all objects in the heap have been processed. Therefore, garbage collection with reference counting is not strictly stop-the-world garbage collection.
It seems nice, but we know that JVM garbage collection is stop-the-world, so what led us to finally abandon The reference counting algorithm? Look at the following example.
public class ReferenceCountingGC { public Object instance; public ReferenceCountingGC(String name) { } public static void testGC(){ ReferenceCountingGC a = new ReferenceCountingGC("objA"); ReferenceCountingGC b = new ReferenceCountingGC("objB"); a.instance = b; b.instance = a; a = null; b = null; }}Copy the code
As you can see, the last two objects are no longer accessible, but because they refer to each other, their reference counts will never be zero, and the reference counting algorithm will never tell the GC collector to reclaim them.
2.2 Accessibility analysis algorithm
The path through GC ROOT’s object as the starting point for the search and down through the reference is called the reference chain. Determine whether an object can be reclaimed by whether it has a path to the reference chain (objects that can be GC ROOT: objects referenced in the virtual machine stack, objects referenced by class static attributes in the method area, objects referenced by constants in the method area, objects referenced by JNI in the local method stack)The reachable algorithm successfully solves the problem of circular dependencies that reference counting cannot solve, as long as you cannot establish a direct or indirect connection to the GC Root, you are considered recyclable. This raises the question of which ones belong to GC Root.
Objects in the Java memory region that can be used as GC ROOT:
Object referenced in the virtual machine stack
public class StackLocalParameter { public StackLocalParameter(String name) {} public static void testGC() { StackLocalParameter s = new StackLocalParameter("localParameter"); s = null; }}Copy the code
When s is null, the localParameter object breaks the reference chain with GC Root and will be reclaimed. The object referenced by the class static property in the method area
public class MethodAreaStaicProperties { public static MethodAreaStaicProperties m; public MethodAreaStaicProperties(String name) {} public static void testGC(){ MethodAreaStaicProperties s = new MethodAreaStaicProperties("properties"); s.m = new MethodAreaStaicProperties("parameter"); s = null; }}Copy the code
In this case, s is GC Root, and s is set to null. After GC, the properties object pointed to by S will be reclaimed because it cannot establish a relationship with GC Root. M, as a static property of the class, also belongs to GC Root. The parameter object is still connected to GC Root, so the parameter object will not be reclaimed.
The object referenced by the constant in the method area
public class MethodAreaStaicProperties { public static final MethodAreaStaicProperties m = MethodAreaStaicProperties("final"); public MethodAreaStaicProperties(String name) {} public static void testGC() { MethodAreaStaicProperties s = new MethodAreaStaicProperties("staticProperties"); s = null; }}Copy the code
M is either a constant reference in the method area or GC Root. If S is set to null, final objects will not be reclaimed because there is no contact with GC Root.
Objects referenced in the local method stack
Any native interface will use some kind of local method stack. If the implemented local method interface uses the C connection model, then its local method stack is C stack. When a thread calls a Java method, the virtual machine creates a new stack frame and pushes it onto the Java stack. However, when it calls a local method, the virtual machine keeps the Java stack unchanged and no longer pushes new frames into the thread’s Java stack. The virtual machine simply connects dynamically and calls the specified local method directly.
Garbage collection algorithm
After determining what garbage can be collected, all the garbage collector has to do is start garbage collection, but there is a problem: how to do garbage collection efficiently. Here we discuss the core ideas of several common garbage collection algorithms.
3.1 Mark-clear algorithm
Mark-sweep is one of the most basic garbage collection algorithms. It is divided into two parts. First, it marks the objects in the memory area, marks which objects are recyclable, and then picks up the garbage and cleans it up. As shown above, the garbage that is cleared becomes unused memory area, waiting to be used again. But it has a big problem, and that is memory fragmentation.
The medium squares above are assumed to be 2M, the smaller ones 1M, and the larger ones 4M. By the time we’re done recycling, the memory will be cut into multiple segments. We know that to create memory, we need contiguous memory areas, so we need a 2M area of memory, and two 1M areas are unusable. As a result, we actually have so much memory, but we can’t use it.
3.2 Replication Algorithm
Copying algorithms are evolved on the basis of marker removal algorithms to solve the memory fragmentation problem of marker removal algorithms. It divides the available memory by capacity into two equally sized pieces, one of which is used at a time. When this area of memory is used up, the surviving objects are copied to the other area, and the used memory space is cleaned up again. The continuous availability of memory is guaranteed, and there is no need to consider the complexity of memory fragmentation when allocating memory. The copying algorithm exposes another problem, such as the high cost of using only 200 gigabytes of a 500 GB hard drive.
3.3 Mark-collation algorithm
Mark-clean algorithm the marking process is still the same as mark-clean algorithm, but instead of cleaning up the recyclable objects directly, the next step is to move all surviving objects to one end and clean up the memory area beyond the end boundary.
The mark collation algorithm solves the problem of memory fragmentation and avoids the disadvantage that the copy algorithm can only use half of the memory region. The mark collation algorithm changes the memory more frequently and needs to collate the reference addresses of all living objects, which is much less efficient than the copy algorithm. Typically, the Java heap is divided into the new generation and the old generation, so that the most appropriate collection algorithm can be used for each generation.
3.4 Generational collection algorithm
Generation collection algorithm The generation collection algorithm is not strictly an idea or theory, but a combination of the above three basic algorithm ideas and different algorithms for different situations. According to the different life cycle of objects, the memory is divided into several blocks.
In the new generation, a large number of objects are found dead and only a few survive in garbage collection, so the replication algorithm is selected, and only a small amount of the replication cost of the surviving objects can be collected.
In the old days, because the object had a high survival rate and there was no extra space to allocate it, it had to be recycled using mark-clean or mark-clean algorithms.
4. Memory area and reclamation policy
Objects are allocated, in general, on the heap (but can also be JIT compiled to split into scalar types and indirectly allocated on the stack). Objects are allocated mainly on the Eden region of the new generation. If local thread allocation buffers are enabled, they will be allocated on TLAB first. In rare cases, it is possible to allocate directly to the old age (large objects are allocated directly to the old age). The rules for allocation are not 100% fixed, depending on which combination of garbage collectors is being used and how memory-related parameters are set in the virtual machine.
4.1 Objects are preferentially allocated in Eden
In most cases, objects are allocated in the Eden region of the new generation. When the Eden area does not have enough space to allocate, the virtual machine initiates a Minor GC. Minor GC is more frequent and faster than Major GC. After Minor GC, most of the objects in Eden will be reclaimed, and the surviving objects will be sent to the Survivor From zone (or to the Old zone if there is not enough space in the From zone).
4.2 Survivor area
The Survivor zone serves as a buffer between the Eden zone and the Old zone, similar to the yellow light in our traffic lights. Survivor is divided into two zones, one is From zone and one is To zone. Each time a Minor GC is executed, the surviving objects in the Eden region are placed in the Survivor From region, where the surviving objects are decided based on their age. (The logical relation between From and To Survivor will be reversed: From changes To, To changes From, in order To ensure that there is continuous space To store each other and avoid fragmentation)
4.2.1 Significance of Survivor zone
If there are no Survivor zones, every time a Minor GC is performed in Eden, the surviving objects are sent to the old age, which quickly fills up. There are a lot of objects that don’t get killed in one Minor GC, but don’t really last long, and may need to be wiped out in the second or third. Moving to the elderly area at this time is clearly not a wise decision. Thus, Survivor exists to reduce the number of objects sent to the old age, thus reducing the occurrence of Major GC. Survivor pre-screening guarantees that only objects that survive 16 Minor GC’s will be sent to the old age.
4.3 Large objects directly enter the old age
Large objects are Java objects that require a large amount of contiguous memory. The most typical large objects are long strings and arrays. Large objects are bad news for virtual machine memory allocation, as the frequent occurrence of large objects tends to trigger garbage collection in advance to obtain enough contiguous space to “place” them when there is not enough space in memory.
Virtual machine provides a XX: PretenureSizeThreshold parameters, make more than the set value of object directly in old age distribution, the aim is to avoid happen between Eden area and two Survivor area a lot of memory copy (new generation USES a replication algorithm).
4.4 Objects of long-term survival will enter the old age
The virtual machine defines an Age counter for each object, and if the object survives after Eden’s birth and the first Minor GC and can be held by Survivor, Will be moved into the Survivor space (normally objects are constantly moved between the From and To sections of Survivor), and the object’s age is set To 1. Each time an object undergoes a Minor GC in a Survivor zone, its age increases by one year, and when it reaches a certain age (15 by default), it will be promoted to the old age. Object the promotion of the old s age threshold, can pass parameter XX: MaxPretenuringThreshold Settings.
4.5 Dynamic object age determination
In order to better accommodate different levels of memory, the VIRTUAL machine does not always require that the object’s age reach maxpreteningThreshold to promote to the old age. If the sum of all object sizes of the same age in the Survivor space is greater than half of the Survivor space, Objects older than or equal to their current age can go straight to the old age, without having to wait until the required age in maxPreteningThreshold.
This is a bit like load balancing, and polling is a type of load balancing that ensures that each machine gets the same requests. It looks balanced, but each machine has different hardware and health conditions. We can also adjust our load balancing algorithm based on the number of requests received by each machine, or the response time of each machine, etc.