Preface
Different GC options came up in an interview. I had read about them before but never really summarized them, so I am writing them down here. The content is divided into the following parts:
- Reachability analysis
- Method area recovery
- Garbage collection algorithm
- Stop the world & safe point
- Comparison of different collectors
Reachability analysis
- Reference counting is another algorithm for judging whether an object is still in use, but it cannot handle circular references between objects.
Reachability analysis algorithm
Starting from a set of objects called GC Roots, the collector searches downward along object references; the paths it walks are called reference chains. Objects that are never reached are marked as unreachable and become candidates for collection. GC Roots include the following:
- Objects referenced from the virtual machine stack (local variables in stack frames)
- Objects referenced by constants in the method area
- Objects referenced by static fields of classes in the method area
- Objects referenced by native methods (JNI) in the native method stack
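The difference from reference counting is easiest to see on a toy object graph. The sketch below is only an illustration: the class name `ReachabilitySketch` and the string "objects" are made up, and a real JVM walks actual heap references rather than a map. Two objects that reference each other keep a non-zero reference count, yet they are not reachable from any root, so a tracing collector treats them as garbage.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model: object names stand in for real heap objects (hypothetical, for illustration only).
class ReachabilitySketch {
    // Returns every object that can be reached from the roots by following references.
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (marked.add(obj)) { // first visit: mark it and queue what it references
                pending.addAll(refs.getOrDefault(obj, List.of()));
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
            "root", List.of("a"),
            "a", List.of("b"),
            "x", List.of("y"),   // x and y reference each other...
            "y", List.of("x"));  // ...but no root leads to them
        Set<String> live = reachable(refs, List.of("root"));
        System.out.println(live.contains("b")); // true
        System.out.println(live.contains("x")); // false
    }
}
```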
PS: Do objects found unreachable during reachability analysis have to be deleted?
- Not necessarily. For an object found unreachable, the JVM first checks whether its finalize method needs to be executed. If the object rescues itself inside finalize by re-establishing a reference to itself, it is not deleted. However, the virtual machine runs finalize at most once per object, and the method is unreliable, so most programs do not override it.
Method area recovery
Reclaiming the method area has a low payoff, because it holds a lot of class metadata and static variables that are rarely reclaimed. Only two kinds of content can be reclaimed: obsolete constants and useless classes.
Obsolete constants
If the string literal "ABC" is in the constant pool but nothing in the entire JVM references it, a collection can reclaim that constant.
Useless classes
A class is considered useless only if all three of the following conditions hold:
- All instances of the class have been reclaimed
- The ClassLoader that loaded the class has been reclaimed
- The java.lang.Class object of the class is not referenced anywhere
PS: Applications that make heavy use of reflection, dynamic proxies, bytecode frameworks such as CGLib and ASM, dynamically generated JSPs, or custom class loaders such as OSGi generally need class unloading so that the permanent generation (replaced by Metaspace since Java 8) does not overflow.
Garbage collection algorithm
Mark-sweep
The most basic collection algorithm: mark the objects to be reclaimed, then sweep them away in place.
- Advantages: simple and straightforward
- Disadvantages: inefficient, and leaves fragmented free space.
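The fragmentation problem can be seen on a toy heap. This is a minimal sketch, assuming a heap modeled as an array of object ids and a set of ids already found live by the mark phase; `MarkSweepSketch` is a made-up name and nothing here reflects how HotSpot lays out memory.

```java
import java.util.Arrays;
import java.util.Set;

// Toy heap: each slot holds an object id, or null when the slot is free.
class MarkSweepSketch {
    // Sweep phase: free every slot whose object was not marked as live.
    static Integer[] sweep(Integer[] heap, Set<Integer> marked) {
        Integer[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null && !marked.contains(result[i])) {
                result[i] = null; // reclaimed in place; survivors do not move
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Integer[] heap = {1, 2, 3, 4, 5, 6};
        Integer[] after = sweep(heap, Set.of(1, 3, 5));
        // Three slots are free, but no two of them are adjacent: that is the fragmentation.
        System.out.println(Arrays.toString(after)); // [1, null, 3, null, 5, null]
    }
}
```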
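Copying algorithm
Splits memory into two halves and uses only one at a time; live objects are copied to the other half and the used half is cleared in one go, which removes the problems above. Suitable for the young generation, where most objects die quickly.
- Advantages: simple and efficient
- Disadvantages: needs double the space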
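A minimal sketch of the copying idea, using the same kind of toy heap (an array of object ids plus a set of live ids). The name `CopyingSketch` is made up for illustration. Live objects end up packed at the start of the second space, so allocation afterwards is just a pointer bump, at the price of keeping a second space around.

```java
import java.util.Arrays;
import java.util.Set;

// Toy semispace copy: from-space and to-space are arrays of object ids (null = empty slot).
class CopyingSketch {
    // Copies live objects into a fresh to-space, packed from index 0.
    static Integer[] copyLive(Integer[] fromSpace, Set<Integer> live) {
        Integer[] toSpace = new Integer[fromSpace.length]; // the "double space" cost
        int next = 0; // bump pointer into to-space
        for (Integer obj : fromSpace) {
            if (obj != null && live.contains(obj)) {
                toSpace[next++] = obj;
            }
        }
        return toSpace; // from-space can now be discarded as a whole
    }

    public static void main(String[] args) {
        Integer[] toSpace = copyLive(new Integer[]{1, 2, 3, 4}, Set.of(2, 4));
        System.out.println(Arrays.toString(toSpace)); // [2, 4, null, null]
    }
}
```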
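Mark-compact
- Avoids the shortcomings of the two algorithms above by moving live objects to one end of memory after marking. Suitable for the old generation, where objects are long-lived.
- Advantages: no fragmentation, no wasted memory
- Disadvantages: longer processing time, more complex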
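For comparison, here is a rough sketch of the compaction step on a toy heap (an array of object ids plus the set of marked ids). `MarkCompactSketch` is a hypothetical name, and a real collector also has to update every reference to a moved object, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.Set;

// Toy in-place compaction: no second space is needed, but every survivor may move.
class MarkCompactSketch {
    // Slides marked objects toward index 0 and returns the index of the first free slot.
    static int compact(Integer[] heap, Set<Integer> marked) {
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            Integer obj = heap[i];
            heap[i] = null; // clear the old slot first
            if (obj != null && marked.contains(obj)) {
                heap[free++] = obj; // survivors are packed at the low end
            }
        }
        return free;
    }

    public static void main(String[] args) {
        Integer[] heap = {1, 2, 3, 4, 5, 6};
        int free = compact(heap, Set.of(1, 3, 5));
        System.out.println(Arrays.toString(heap)); // [1, 3, 5, null, null, null]
        System.out.println(free);                  // 3
    }
}
```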
Generational collection
Different algorithms are chosen according to how long objects tend to survive. The young generation uses copying, while the old generation uses mark-sweep or mark-compact.
Stop the world & safe point
The JVM suspends all executing Java threads during GC for consistency: if object relationships kept changing, reachability analysis could produce errors or misjudgments. This pause is called Stop the World (STW).
Before a GC can start, every Java thread in the JVM must reach a GC safepoint. The JVM places safepoints only at specific locations, such as:
- Where memory is allocated
- Where a long-running stretch of code can end (method calls, loop jumps, etc.)
Safepoints also raise the question of how to suspend every thread at a safepoint. There are two designs: preemptive suspension and voluntary (active) suspension.
- Preemptive. The GC first interrupts all threads; if it finds a thread that is not at a safepoint, it resumes that thread and lets it run to one. Almost no virtual machine does this anymore.
- Active. The GC sets a flag that every thread polls "when appropriate" (usually at a safepoint); a thread that finds the flag set suspends itself.
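The active design can be imitated in plain Java. The following is a hypothetical sketch, not how HotSpot implements safepoints: the `pending` field plays the role of the interrupt flag, `poll()` stands for the check that the JVM would insert at safepoint locations, and all names are made up. The "GC" (here the calling thread) sets the flag, waits until every worker has parked itself, does its work, and then releases them.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of cooperative ("active") suspension with a polled flag.
class SafepointPollSketch {
    // One stop-the-world request: workers check in, then wait to be released.
    static final class Pause {
        final CountDownLatch arrived;
        final CountDownLatch released = new CountDownLatch(1);
        Pause(int workers) { arrived = new CountDownLatch(workers); }
    }

    private volatile Pause pending;          // non-null means "GC wants the world stopped"
    private volatile boolean running = true;
    private final AtomicLong work = new AtomicLong();

    // Workers call this at safepoint locations (here: every loop iteration).
    private void poll() throws InterruptedException {
        Pause p = pending;
        if (p == null) return;               // no GC requested, keep running
        p.arrived.countDown();               // report "I am at a safepoint"
        p.released.await();                  // suspend until the GC is finished
    }

    private void workerLoop() {
        try {
            while (running) {
                work.incrementAndGet();      // stands in for mutating the object graph
                poll();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns true if no worker made progress while the world was stopped.
    static boolean demo(int workers) {
        SafepointPollSketch vm = new SafepointPollSketch();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(vm::workerLoop);
            threads[i].start();
        }
        try {
            Pause p = new Pause(workers);
            vm.pending = p;                  // set the flag for all threads
            p.arrived.await();               // wait until every worker has parked
            long before = vm.work.get();
            Thread.sleep(50);                // the "GC work" would happen here
            long after = vm.work.get();
            vm.running = false;
            vm.pending = null;
            p.released.countDown();          // resume the world
            for (Thread t : threads) t.join();
            return before == after;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(2)); // true
    }
}
```

Note that a worker that never calls `poll()` would keep the requester waiting forever, which is exactly why safepoints are placed at loop jumps and method calls.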
What if a thread is not getting any CPU time (for example, it is sleeping or blocked) and therefore cannot reach a safepoint?
- The JVM extends safepoints with the concept of a safe region: a stretch of code in which references do not change. Once a thread has entered a safe region, the GC can start at any time without waiting for it. When the thread is about to leave the safe region, it checks whether root enumeration has finished and, if not, waits until it has.
PS: Stop the World does not strictly mean that every user thread is suspended; threads inside a safe region can keep running. This is a fine point, though, and not worth nitpicking.
Comparison of different collectors
No collector is perfect.
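Before comparing collectors, it can help to check which ones a given JVM is actually running. A small sketch using the standard `java.lang.management` API follows; the class name `ActiveCollectors` is made up, and the names printed depend on the JVM version and the GC flags it was started with, so no output is shown.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the collectors registered by the running JVM, typically one per generation.
class ActiveCollectors {
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName() + " (collections so far: " + gc.getCollectionCount() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        names().forEach(System.out::println);
    }
}
```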
Serial/Serial Old collector
The most basic and simplest collector. It is single-threaded: when the JVM needs a GC it suspends all worker threads, then uses copying for the young generation and mark-compact for the old generation.
- Advantages: simple and efficient; suitable for single-core CPUs and programs running in Client mode
- Disadvantages: Stop the World pauses are noticeable; not well suited to server use.
ParNew collector
The multithreaded version of the Serial collector. It brings little innovation of its own; it is favored mainly because it can work together with the CMS collector.
- Advantages: performs better than the Serial collector in multi-CPU environments
Parallel Scavenge collector
The goal of this collector is a controllable throughput. Throughput = time spent running user code / (time spent running user code + time spent in garbage collection). For example, if the virtual machine runs for 100 minutes in total and garbage collection takes 1 minute of that, the throughput is 99%.
Two parameters control throughput:
- -XX:MaxGCPauseMillis sets the maximum GC pause time. The shorter the pause target, the lower the throughput
- -XX:GCTimeRatio sets the throughput target directly: a value of N asks the collector to spend at most 1/(1+N) of total time in GC
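The arithmetic behind these two numbers is simple enough to write down. The sketch below uses made-up names (`ThroughputSketch`, `throughput`, `maxGcFraction`); the second method assumes the documented meaning of GCTimeRatio, where a value of N targets a GC share of 1/(1+N) of total time.

```java
// Worked arithmetic for the throughput formula and the GCTimeRatio target.
class ThroughputSketch {
    // Throughput = user time / (user time + GC time).
    static double throughput(double userTime, double gcTime) {
        return userTime / (userTime + gcTime);
    }

    // GCTimeRatio=N means GC should take at most 1/(1+N) of total time.
    static double maxGcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // 100 minutes in total, 1 of them in GC: 99 minutes of user code.
        System.out.println(throughput(99, 1));   // 0.99
        // GCTimeRatio=19 allows up to 5% of total time in GC.
        System.out.println(maxGcFraction(19));   // 0.05
    }
}
```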
Parallel Old collector
The old-generation version of the Parallel Scavenge collector. Parallel Scavenge is an excellent young-generation collector, but it originally worked only with Serial Old, so Parallel Old was created to pair with it. The combination is a throughput-first configuration, suitable for applications that are sensitive to CPU resources.
CMS collector
The Concurrent Mark Sweep (CMS) collector aims for the shortest possible collection pause. It is a good fit for server applications that care about response time and want short pauses. The process is divided into four steps:
- Initial mark (CMS initial mark)
- Concurrent mark (CMS concurrent mark)
- Remark (CMS remark)
- Concurrent sweep (CMS concurrent sweep)
It has the following characteristics:
- Initial mark and remark require Stop the World, but only for a short time.
- Concurrent mark and concurrent sweep take the most time, and they run alongside the user threads.
- Concurrent mark is the process of tracing from GC Roots. Remark corrects the marks of objects whose status changed because the user program kept running during concurrent mark.
Advantages:
- The GC threads run concurrently with the user threads, so Stop the World pauses are shorter.
Disadvantages:
- Sensitive to CPU resources. The concurrent phases take CPU time away from user threads, which slows the application and lowers total throughput. It is generally better suited to multi-core hosts.
- Unable to handle floating garbage. Because CMS sweeps concurrently, garbage produced by user threads during that phase is not cleared in the current GC and is left for the next one. Enough memory must also be reserved for user threads to keep running, so unlike other collectors CMS cannot wait until the old generation is almost full; it must start early to leave room during concurrent collection. -XX:CMSInitiatingOccupancyFraction=70 sets that trigger: a CMS GC starts when old-generation occupancy reaches 70%. If the value is set too high, it can cause a Concurrent Mode Failure, in which case the JVM falls back to its backup plan and collects with Serial Old.
- Because it is based on mark-sweep, it leaves fragmented space, which may eventually have to be dealt with by a Full GC.
PS: Full GC does not equal CMS GC.
G1 collector
The G1 collector is an advanced one. Here are the details.
Heap structure
G1 divides the entire heap into equal-sized regions. The region size is fixed for a given JVM and ranges from 1 MB to 32 MB.
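As a rough illustration of where a concrete region size comes from, the sketch below assumes the commonly described heuristic of aiming for about 2048 regions, rounding down to a power of two, and clamping to the 1 MB to 32 MB range. This is an assumption for illustration only: the exact rule differs between JDK versions, and the class and method names are made up.

```java
// Assumed heuristic, not the exact HotSpot logic: target roughly 2048 regions.
class G1RegionSizeSketch {
    static final long MB = 1024L * 1024L;

    static long regionSize(long heapBytes) {
        long size = Math.max(heapBytes / 2048, MB); // never below 1 MB
        size = Long.highestOneBit(size);            // round down to a power of two
        return Math.min(size, 32 * MB);             // never above 32 MB
    }

    public static void main(String[] args) {
        System.out.println(regionSize(512 * MB) / MB);          // 1
        System.out.println(regionSize(4096 * MB) / MB);         // 2
        System.out.println(regionSize(256L * 1024 * MB) / MB);  // 32
    }
}
```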
Memory allocation
Each region is labeled as Eden, Survivor, or Old; the role is only a label, not a fixed address range. Regions are collected in parallel by multiple GC threads, and during the concurrent phases the application threads keep working as usual.
Young GC
Live objects are moved from Eden regions to one or more Survivor regions; once an object's age reaches a certain threshold, it is promoted to an Old region.
Old GC
- Initial mark (STW; basically the same principle as in CMS)
- Root region scanning
- Concurrent marking (finds live objects across the heap; this phase may be interrupted by a young GC)
- Remark (STW; completes the marking of live objects in the heap, using the snapshot-at-the-beginning (SATB) algorithm)
- Cleanup
G1 characteristics
- Parallelism and concurrency
- Generational collection
- Space compaction (mark-compact algorithm)
- Predictable pauses (a target pause time can be specified with a parameter)
Where G1 fits
- Like CMS, applications with demanding response-time requirements and less strict throughput requirements
- Multi-core, large-memory JVM applications; Oracle says G1 is suitable for server applications with heap sizes greater than 6GB.
References
- Chapter 3: Garbage Collector and Memory Allocation Policies. By Zhiming Zhou
- Probably the most comprehensive Java G1 study notes
- Oracle has proposed G1 as the default garbage collector for Java 9