The body of the
The garbage collector focuses on the Java heap and method areas because allocation and collection of this portion of memory is dynamic. You don’t know what objects will be created or how much memory will be required until your program is running.
The virtual machine stack and the local method stack do not need to worry too much about reclamation, because the amount of memory allocated for each stack frame in the stack is basically known when the class structure is determined, so memory allocation and reclamation in these areas are deterministic.
One, the object has died
The first thing the garbage collector does before collecting the heap is determine which objects in the heap are “alive” and which are “dead” (that is, objects that can no longer be used in any way).
1. Reference counting algorithm
Add a reference counter to the object, incrementing the counter by 1 each time a reference is made to it; When a reference is invalid, the counter value decreases by 1; An object whose counter is 0 at any point in time cannot be used again.
Advantages: simple implementation, high judgment efficiency.
Disadvantages: It is difficult to solve the problem of objects referring to each other circularly.
2. Accessibility analysis algorithm
Starting with a series of objects called “GC Roots”, the search starts from these nodes and goes down a path called the reference chain. When an object is not connected to GC Roots by any reference chain, the object is unavailable.
In Java, objects that can be used as GC Roots:
- The object referenced in the virtual machine stack (the local variable table in the stack frame).
- The object referenced by the class static property in the method area.
- A constant reference object in a method area.
- Objects referenced by JNI (or Native methods) in the Native method stack.
Unreachable objects in the reachability analysis algorithm must undergo at least two marking processes before they are reclaimed.
- The first mark is made when no reference chain connected to GC Roots is found.
- When an object overrides the Finalize () method and is not called, it will be put into a Queue called f-queue, and later the GC will mark the object in the F-queue a second time. If an object is not re-associated with an object on the reference chain in the Finalize () method, it will be recycled.
3. Four kinds of quotations
Whether reference counting algorithm or reachability analysis algorithm, determining whether an object is alive or not is related to “reference”. In Java, there are four types of references, in descending order of strength: strong reference, soft reference, weak reference, and virtual reference.
- Strong reference: a reference similar to Object obj = new Object(). As long as strong references exist, objects are never recycled.
- Soft references: Used to describe objects that are useful but not necessary. When memory is insufficient, objects may be reclaimed. You can use the SoftReference class to implement a SoftReference.
- Weak references: Used to describe objects that are not required, but are weaker than soft references. During GC, objects are reclaimed regardless of whether there is enough memory. Weak references can be implemented through the WeakReference class.
- Virtual references: Also known as ghost references or phantom references, virtual references do not affect the lifetime of an object. The only purpose of a virtual reference is to receive a system notification when an object is reclaimed. Virtual references can be implemented through the PhantomReference class.
4. Recycling method area
The garbage collection of the permanent generation mainly recycles two parts: discarded constants and useless classes.
How to determine discarded constants:
- Constants in the constant pool (literals, symbolic references) are not referenced anywhere.
How to identify useless classes:
- All instances of this class have been reclaimed.
- The ClassLoader that loaded the class has been reclaimed.
- The java.lang.Class object corresponding to this Class is not referenced anywhere, and the methods of this Class cannot be accessed anywhere through reflection.
Second, garbage collection algorithm
1. Mark-clear algorithm
There are two stages: “mark” and “clear”. All objects that need to be reclaimed are marked first, and then all marked objects are uniformly reclaimed.
This algorithm produces a large number of discrete memory fragments and may have to trigger a GC prematurely when allocating large objects because sufficient contiguous memory cannot be found.
2. Replication algorithm
Divide the available memory into two equally sized pieces by capacity and use only one piece at a time. When a block of memory is used up, the surviving objects are copied to another block, and the used memory space is cleaned up again.
The cost of this algorithm is that a chunk of memory is always “wasted”.
Since 98% of the new generation of objects are “live and die”, there is no need for a 1:1 partition of memory space. In today’s commercial virtual machines, memory is divided into one large Eden space and two smaller Survivor Spaces, using Eden and one Survivor at a time. When recycling is done, the surviving objects in Eden and Survivor are copied to another Survivor, and Eden and used Survivor are finally cleaned up.
The default size ratio of HotSpot VIRTUAL machine Eden to Survivor is 8:1.
Allocation guarantee mechanism: When another Survivor does not have enough space to hold the surviving object, another memory (old age) is required to allocate guarantee and move the object to another memory (old age).
3. Mark-tidy algorithm
All objects that need to be reclaimed are marked, all living objects are moved to one end, and memory beyond the end boundary is cleaned up.
4. Generational collection algorithm
The Java heap is divided into new generation and old generation according to the different life cycles of objects, and then the most appropriate collection algorithm is adopted according to the characteristics of each generation.
- New generation: Uses replication algorithms. Because a large number of objects die in each new generation GC, the GC can be completed with a small cost of copying the surviving objects.
- In the old days: “mark-clean” or “mark-tidy” algorithms were used. Because of the high survival rate of objects in the old age, and there is no extra space for allocation guarantees.
Third, HotSpot algorithm implementation
1. Enumerate the root node
For reacability analysis, you need to enumerate GC Roots nodes to mark all unavailable objects.
Nodes that can be used as GC Roots are mainly in global references (such as constants or class static attributes) and execution contexts (such as local variable tables in stack frames). It would take a lot of time to go through the references one by one. Therefore, the current mainstream Java virtual machines use exact GC to complete GC Roots enumeration.
Stop The World (STW) : Object reference relationships must not change during reachability analysis. Therefore, when GC is performed, all Java execution threads must be paused, and the entire execution system appears to be frozen at some point in time.
Exact GC: The virtual machine knows directly where object references are stored, so STW does not require a reference location that does not leak through all execution contexts and globally.
An implementation of precise GC in HotSpot: HotSpot uses a set of data structures called OopMap to record the location of references to objects. In this way, the GC can directly know the reference location information of the object during the scan.
When the class is loaded, HotSpot calculates and records what type of data is on what offsets within the object into OopMap. During JIT compilation, OopMap also records which locations in the stack and registers are references.
2. Be safe
HotSpot only records OopMap at certain locations, called safe points.
During program execution, only a safe point can be reached before it pauses for GC. Because OopMap records cannot be accessed until the safe point is reached.
How to make a thread run to the nearest safe point and then pause during GC:
- Preemptive interrupt: During GC, all threads are interrupted first, and if it is found that the thread is interrupted at a place other than the safe point, the thread is recovered and allowed to run to the safe point.
- Active interrupt: During GC, an interrupt flag is set, and each thread takes the initiative to poll this flag while executing. If the interrupt flag is true, it interrupts and suspends itself.
3. Safe areas
A security zone is a code fragment in which reference relationships do not change. It is safe to start GC anywhere in this region. Think of a security zone as an extended security point.
Why you need a safe zone: When a thread does not allocate CPU time, it cannot respond to interrupt requests from the JVM, runs to the safe point of interrupt suspension, and the JVM is less likely to wait for the thread to reallocate CPU time. This situation requires a safe zone.
Use of safe areas:
- When a thread executes code to a security zone, it identifies itself as entering a security zone.
- When the JVM initiates a GC, it does not care about the thread entering the safe zone.
- Before a thread leaves the safe zone, it must check that the system has completed the root node enumeration (or the entire GC process). If it completes, the thread continues, otherwise it must wait until it receives a signal that it can leave the safe zone.
Garbage collector
1. Serial collector
- The most basic and oldest collector.
- Single-threaded collector: Uses one CPU or one thread for garbage collection.
- The new generation collector is the default collector for VMS running in Client mode.
- Simple and efficient, with no overhead of thread interaction on a single CPU.
ParNew collector
- Multithreaded version of Serial collector.
- The Cenozoic collector is the preferred Cenozoic collector for many virtual machines running in Server mode.
- In addition to the Serial collector, it is currently the only one that works with the CMS collector.
- The number of collection threads enabled by default is the same as the number of cpus.
Parallel avenge
- Multithreaded collector.
- The New generation collector.
- Focus on throughput, which is the ratio of the time the CPU spends running user code to the total CPU consumption. High throughput can make efficient use of CPU time and complete the program’s computing tasks as soon as possible, which is suitable for tasks that do not require too much interaction in the background.
- You can enable the adaptive adjustment policy to enable the VM to perform memory management tuning.
Adaptive adjustment policy: Collects performance monitoring information about VMS based on the current system running status and dynamically adjusts VM parameters to provide the most appropriate pause time or maximum throughput.
Serial Old collector
- An older version of the Serial collector.
- Single-threaded collector.
- Use a mark-and-tidy algorithm.
- This parameter is used by VMS in Client mode.
Parallel Old collector
- An older version of the Parallel Exploiter.
- Multithreaded collector.
- Use a mark-and-tidy algorithm.
- The Parallel Avenge and Parallel Old collector can be used as a priority for throughput and CPU-sensitive applications.
6. CMS collector
- CMS: Concurrent Mark Sweep.
- Concurrent collector: The garbage collector thread works (basically) at the same time as the user thread.
- Use a “mark-clean” algorithm.
- The focus is on how to shorten the pause time of user threads during garbage collection. Short pause times mean fast response times, making it suitable for applications that require user interaction.
CMS Operation process:
- Initial tag: Marks objects to which GC Roots can be directly associated, requiring STW.
- Concurrent marking: The process of GC Roots Tracing, accessibility analysis.
- Relabeling: To correct the marking record of the part of the object whose reference relationship changes during concurrent marking, STW is required.
- Concurrent cleanup: Removes garbage objects.
Disadvantages of CMS:
- Very sensitive to CPU resources. While the concurrent phase does not cause user threads to pause, it does slow down the application because it consumes a portion of the threads (or CPU resources), reducing overall throughput.
- Unable to handle floating garbage. Garbage generated during the concurrent cleanup phase is called “floating garbage” and can only be removed by the next GC.
- A lot of memory fragmentation is generated. Full GC is triggered early when there is too much memory fragmentation. By default, the CMS collector starts the defragmentation process during Full GC.
G1 collector
- G1: Garbage – First.
- Is a server – oriented application garbage collector.
G1 features:
- Parallelism and concurrency
- Generational collection
- Spatial consolidation: G1 is based on a mark-and-tidy algorithm as a whole and a copy algorithm locally (between two regions). Therefore, there is no memory space fragmentation.
- Predictable pauses: G1 allows users to explicitly specify that no more than N milliseconds should be spent on garbage collection within a time segment of M milliseconds by modeling predictable pauses.
Region: G1 divides the entire Java heap into independent regions of equal size. Although the concept of new generation and old generation is retained, new generation and old generation are no longer physically isolated, but rather a collection of several regions (which do not need to be continuous).
Predictable time-lapse model: G1 makes predictable time-lapse models possible because it systematically avoids all-area garbage collection across the entire Java heap.
G1 tracks the Garbage accumulation value of each Region (the amount of space and time required for Garbage collection) and maintains a priority list in the background. Each time, the Region with the highest Garbage collection value (garbage-first name) is preferentially collected based on the allowed collection time.
G1 Operation process:
- Initial tag: Mark objects that GC Roots can directly associate with, and modify the value of TAMS (Next Top at Mark Start) so that the Next phase of user programs running concurrently can create new objects in the correct Region available. Need to STW.
- Concurrent marking: Perform reachability analysis.
- Final markup: Revises the markup record of the part of the object whose reference relationship changed during concurrent markup. Need to STW.
- Filter collection: Sort the value and cost of collection for each Region, and make a collection plan based on the expected GC downtime of the user.
5. Memory allocation and reclamation strategy
1. Objects are allocated in Eden first
- In most cases, objects are allocated in the Eden region of the new generation.
- When the Eden area does not have enough space to allocate, the virtual machine will initiate a Minor GC.
2, big object directly into the old age
- Large objects are Java objects that require a large amount of contiguous memory space.
- The GC is triggered early to get enough contiguous space to place large objects when there is a tendency to leave a lot of space in memory.
- Since the new generation uses the replication algorithm to collect memory, large objects will go directly to the old age in order to avoid massive memory replication between Eden and two Survivor regions.
3, the object of long-term survival into the old age
- The virtual machine defines an object age counter for each object.
- If the object is still alive after Eden is born and after a Minor GC and can be accommodated by Survivor, it is moved to Survivor and the object age is set to 1.
- Each time an object “survives” a Minor GC in Survivor, the age is increased by 1, and when the object age reaches a certain point (15 by default), it is promoted to the old age.
4, dynamic object age determination
- Virtual machines do not require objects to reach a certain age in order to better accommodate the memory conditions of different programs.
- If the total size of the objects of the same age in Survivor is greater than half of the Survivor space, the objects greater than or equal to this age directly enter the old age.
5. Guarantee of space allocation
- When a large number of objects survive the Minor GC, an allocation guarantee from the old age is required to allow the objects that are unable to be held by Survivor to go directly to the old age.