Preface

Different GC options came up in an interview. Although I had read about them before, I had never really summarized them, so I am jotting them down here. The content is divided into the following parts:

  • Reachability analysis
  • Method area recovery
  • Garbage collection algorithm
  • Stop the world & safe point
  • Comparison of different collectors

Reachability analysis

  • Reference counting is another algorithm for judging whether objects are still alive, but it cannot handle circular references between objects.

Reachability analysis algorithm

Starting from a set of objects called GC Roots, the collector searches downward through object references, following the reference chains. Objects that are never reached are marked as unreachable and are reclaimed during cleanup. GC Roots include the following:

  • Objects referenced from the virtual machine stack
  • Objects referenced by constants in the method area
  • Objects referenced by static fields of classes in the method area
  • Objects referenced by native methods in the native method stack
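
To make the contrast with reference counting concrete, here is a minimal sketch on a toy object graph. The Node class, its refCount field, and the mark method are hypothetical names invented for illustration, not JVM internals. Two objects that reference only each other keep a nonzero reference count, yet a traversal from the root never reaches them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class ReachabilityDemo {
    /** A toy heap object: a name plus its outgoing references. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        int refCount; // what a reference-counting collector would track

        Node(String name) { this.name = name; }

        void pointTo(Node target) {
            refs.add(target);
            target.refCount++;
        }
    }

    /** Returns every node reachable from the given roots. */
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (marked.add(node)) {        // first visit: follow its references
                pending.addAll(node.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node live = new Node("live");
        Node a = new Node("a");
        Node b = new Node("b");
        root.pointTo(live);
        a.pointTo(b); // a and b form a cycle that nothing else references
        b.pointTo(a);

        Set<Node> marked = mark(Collections.singletonList(root));
        for (Node n : new Node[] {live, a, b}) {
            System.out.println(n.name + ": refCount=" + n.refCount
                    + ", reachable=" + marked.contains(n));
        }
        // a and b still have refCount 1, but neither is marked reachable.
    }
}
```

A pure reference-counting collector would never free a and b because their counts never drop to zero, while the traversal classifies them as garbage.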

PS: Does an object found unreachable during reachability analysis have to be deleted?

  • Not necessarily. An object found unreachable is first checked to see whether its finalize method needs to run. If the object redeems itself inside finalize by re-establishing a reference to itself, it is not deleted. However, the virtual machine runs finalize at most once per object, and the method is unreliable. Most programs do not override it.

Method area recovery

Reclaiming the method area is inefficient because it holds many class objects and static variables that are rarely collected. Only two kinds of content can be reclaimed: obsolete constants and unused classes.

Obsolete constants

If a string literal such as “ABC” sits in the constant pool but nothing in the entire JVM references it, the constant is reclaimed when memory reclamation runs.

Useless classes

A class is considered useless only if all three of the following conditions are met:

  • All instances of the class have been reclaimed
  • The ClassLoader that loaded the class has been reclaimed
  • The Class object of the class is not referenced anywhere

PS: Scenarios that make heavy use of reflection, dynamic proxies, bytecode frameworks such as CGLib and ASM, dynamically generated JSPs, and frequent custom class loaders such as OSGi generally need class unloading so that the permanent generation does not overflow.

Garbage collection algorithm

Mark-sweep

The basic collection algorithm.

  • Advantages: simple and straightforward
  • Disadvantages: inefficient, and it leaves fragmented space

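Copying algorithm

Uses two equal-sized spaces, only one of which is in use at a time, to eliminate the problems above. Suitable for the young generation, where most objects die soon after they are created.

  • Advantages: simple and efficient
  • Disadvantages: needs double the space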

Mark-compact

  • Overcomes the shortcomings of the two algorithms above. Suitable for the stable old generation
  • Advantages: no fragmented space, no wasted memory
  • Disadvantages: long processing time, complex

Generational collection

Different algorithms are chosen according to how long objects tend to survive. The young generation uses copying, while the old generation uses mark-sweep or mark-compact.
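
The fragmentation difference between mark-sweep and mark-compact can be sketched on a toy heap. This is only an illustration under simplifying assumptions: the heap is an int array where each slot holds one object id, 0 means free, and the set of live ids stands in for the result of the mark phase. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SweepVsCompactDemo {
    static final int FREE = 0;

    /** Mark-sweep: free every slot whose object is not live; survivors stay in place. */
    static int[] sweep(int[] heap, Set<Integer> live) {
        int[] out = heap.clone();
        for (int i = 0; i < out.length; i++) {
            if (!live.contains(out[i])) {
                out[i] = FREE;
            }
        }
        return out;
    }

    /** Mark-compact: slide survivors toward the start, then free the tail. */
    static int[] compact(int[] heap, Set<Integer> live) {
        int[] out = heap.clone();
        int next = 0; // next slot a survivor will be moved into
        for (int i = 0; i < out.length; i++) {
            if (live.contains(out[i])) {
                out[next++] = out[i];
            }
        }
        Arrays.fill(out, next, out.length, FREE);
        return out;
    }

    /** Size of the largest contiguous free run, i.e. the biggest object that still fits. */
    static int largestFreeRun(int[] heap) {
        int best = 0;
        int run = 0;
        for (int slot : heap) {
            run = (slot == FREE) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] heap = {1, 2, 3, 4, 5, 6, 7, 8};
        Set<Integer> live = new HashSet<>(Arrays.asList(1, 3, 5, 7));

        int[] swept = sweep(heap, live);
        int[] compacted = compact(heap, live);
        System.out.println("swept:     " + Arrays.toString(swept)
                + " largest free run = " + largestFreeRun(swept));
        System.out.println("compacted: " + Arrays.toString(compacted)
                + " largest free run = " + largestFreeRun(compacted));
    }
}
```

Both variants free the same four slots, but after sweeping the largest free run is a single slot, while after compacting all four free slots are contiguous. The price is the extra work of moving every survivor.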

Stop the world & safe point

During GC the JVM suspends all executing Java threads. This is a consistency measure: it prevents errors or misjudgments in reachability analysis caused by object relationships changing while the analysis runs. This is called Stop the World.

Before a GC can start, all Java threads in the JVM must reach a GC safe point. The JVM only places safe points at specific locations, such as:

  • Where memory is allocated
  • Where a long-running stretch of code ends (method calls, loop jumps, and so on)

Safe points also require deciding how threads are suspended at safe point locations. There are two designs: preemptive suspension and voluntary suspension.

  • Preemptive. The GC first interrupts all threads. If it finds a thread that is not at a safe point, it resumes that thread and lets it run until it reaches one. Hardly any virtual machine does this anymore.
  • Voluntary. The GC does not interrupt threads directly. It sets a flag, and each thread polls that flag “when appropriate” (usually at a safe point) and suspends itself if the flag is true.
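
The voluntary design can be sketched with ordinary Java threads. This is a toy model, not how a real JVM implements safe points: the class SafepointPollDemo and its methods poll, stopTheWorld, and runDemo are hypothetical, and the "collector" is just a caller that wants every worker parked before it runs some action.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

class SafepointPollDemo {
    private final Object lock = new Object();
    private volatile boolean pauseRequested = false; // the flag the "GC" sets
    private int parkedThreads = 0;                   // guarded by lock

    /** Called by worker threads at safe point locations, e.g. a loop back-edge. */
    void poll() throws InterruptedException {
        if (!pauseRequested) {
            return; // fast path: no pause requested, keep running
        }
        synchronized (lock) {
            parkedThreads++;
            lock.notifyAll(); // tell the collector this thread has stopped
            try {
                while (pauseRequested) {
                    lock.wait(); // suspend ourselves until the pause ends
                }
            } finally {
                parkedThreads--;
            }
        }
    }

    /** Called by the "collector": wait until every worker is parked, run the action, resume. */
    void stopTheWorld(int workerCount, Runnable action) throws InterruptedException {
        synchronized (lock) {
            pauseRequested = true;
            try {
                while (parkedThreads < workerCount) {
                    lock.wait();
                }
                action.run(); // every worker is parked at this point
            } finally {
                pauseRequested = false;
                lock.notifyAll();
            }
        }
    }

    /** Returns true if the shared counter did not move while the world was stopped. */
    static boolean runDemo() {
        SafepointPollDemo safepoint = new SafepointPollDemo();
        AtomicLong counter = new AtomicLong();
        AtomicBoolean running = new AtomicBoolean(true);
        AtomicBoolean frozen = new AtomicBoolean(false);
        int workerCount = 3;
        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (running.get()) {
                        counter.incrementAndGet(); // stands in for user code
                        safepoint.poll();          // safe point at the loop back-edge
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].setDaemon(true);
            workers[i].start();
        }
        try {
            safepoint.stopTheWorld(workerCount, () -> {
                long before = counter.get();
                try {
                    Thread.sleep(50); // give workers a chance to misbehave
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                frozen.set(before == counter.get());
            });
            running.set(false);
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            running.set(false);
        }
        return frozen.get();
    }

    public static void main(String[] args) {
        System.out.println("counter frozen during pause: " + runDemo());
    }
}
```

The fast path in poll is a single volatile read, which is the point of the voluntary design: threads pay almost nothing when no pause is pending. A worker that never calls poll would keep the collector waiting indefinitely in this sketch.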

What if a thread is starved of CPU time and therefore cannot reach a safe point?

  • The JVM extends the safe point idea with the concept of a Safe Region. Once a thread has entered a Safe Region, the GC can run at any time without waiting for that thread. When the thread is about to leave the Safe Region, it checks whether root enumeration has finished and waits if it has not.

PS: Stop the World does not necessarily mean suspending every user thread. User threads that are inside a safe region can stay active. Of course, there is no need to split hairs over this.

Comparison of different collectors

No collector is perfect.
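
Before comparing collectors, it can help to see which ones a running JVM is actually using. A minimal sketch using the standard java.lang.management API follows. The class name ListCollectors and the helper collectorNames are hypothetical, and the reported names depend on the JVM vendor, version, and the GC flags in effect.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ListCollectors {
    /** Names of the collectors registered in this JVM (JVM-specific strings). */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means "undefined".
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
    }
}
```

Running it with different GC selection flags shows how the young-generation and old-generation collectors described below are paired.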

Serial/Serial Old collector

The most basic and simplest collector. It is single-threaded: when the JVM needs to collect, it suspends all worker threads, then uses copying for the young generation and mark-compact for the old generation.

  • Advantages: simple and efficient, suitable for single-core CPUs and programs running in Client mode
  • Disadvantages: Stop the World pauses are noticeable, so it is not suitable for server use

ParNew collector

The multithreaded version of the Serial collector. It offers little innovation of its own, but it is favored because it can work with the CMS collector.

  • Advantages: performs better than the Serial collector in multi-CPU environments

Parallel Scavenge collector

The goal of this collector is to achieve a controllable throughput. Throughput = time spent running user code / (time spent running user code + time spent in garbage collection). For example, if the virtual machine runs for 100 minutes in total and garbage collection takes 1 minute of that, the throughput is 99%.

Two parameters control throughput:

  • -XX:MaxGCPauseMillis sets a target for the maximum GC pause time. The shorter the target, the lower the throughput tends to be
  • -XX:GCTimeRatio sets the throughput target directly
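
The arithmetic behind these two ideas can be checked with a few lines of code. The sketch below recomputes the 99% example and shows what a given -XX:GCTimeRatio value implies, based on the HotSpot documentation's statement that a value of N targets a GC share of 1 / (1 + N) of total time. The class and method names are hypothetical.

```java
class ThroughputDemo {
    /** Throughput = user time / (user time + GC time). */
    static double throughput(double userTime, double gcTime) {
        return userTime / (userTime + gcTime);
    }

    /** Target share of total time spent in GC implied by -XX:GCTimeRatio=N. */
    static double gcShare(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // 100 minutes in total, 1 of them in GC, so 99 minutes of user code.
        System.out.println("throughput = " + throughput(99, 1));
        // N = 99 targets at most 1% of time in GC; N = 19 targets at most 5%.
        System.out.println("GCTimeRatio=99 -> GC share " + gcShare(99));
        System.out.println("GCTimeRatio=19 -> GC share " + gcShare(19));
    }
}
```

In other words, asking for a 99% throughput target corresponds to -XX:GCTimeRatio=99, and a 95% target to -XX:GCTimeRatio=19.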

Parallel Old collector

The old-generation version of the Parallel Scavenge collector. Parallel Scavenge is an excellent young-generation collector, but it originally could only work with Serial Old, so Parallel Old was created to pair with it. The combination is a throughput-first configuration, suitable for applications that are sensitive to CPU resources.

CMS collector

The Concurrent Mark Sweep (CMS) collector aims for the shortest possible collection pause time. It is a good fit for applications that care about server response time and want short pauses. The process is divided into four steps:

  • Initial mark (CMS initial mark)
  • Concurrent mark (CMS concurrent mark)
  • Remark (CMS remark)
  • Concurrent sweep (CMS concurrent sweep)

It has the following characteristics:

  • Initial mark and remark require Stop the World, but only for a short time.
  • Concurrent marking and concurrent sweeping take the longest, but they run alongside the user threads.
  • Concurrent marking is the process of tracing from the GC Roots. The remark phase corrects the mark records of objects whose marks changed because the user program kept running during concurrent marking.

Advantages:

  • For most of the cycle the GC threads run together with the user threads, so Stop the World pauses are shorter.

Disadvantages:

  • Sensitive to CPU resources. The GC threads take up part of the CPU, which slows the user threads down and lowers total throughput. It is generally better suited to multi-core hosts.
  • Cannot handle floating garbage. Because CMS sweeps concurrently, garbage that user threads produce during that phase is not cleared in the current GC and is left for the next one. Enough memory must also be reserved for user threads to keep running during collection, so unlike other collectors CMS cannot wait until the old generation is nearly full before it starts. The parameter -XX:CMSInitiatingOccupancyFraction=70 controls this threshold: with this value, CMS GC is triggered when old generation occupancy reaches 70%. If the value is set too high, a Concurrent Mode Failure can occur, and the JVM falls back to its backup plan, Serial Old.
  • Because it is based on the mark-sweep algorithm, it leaves fragmented space, which may require a Full GC to resolve.

PS: Full GC does not equal CMS GC.

G1 collector

The G1 collector is an advanced one. Here are the details.

Heap structure

G1 divides the entire heap into fixed-size regions, each between 1 MB and 32 MB.

Memory allocation

Each region is labeled Eden, Survivor, or Old, and this is only a label. Regions are collected in parallel, and during the concurrent phases the other threads keep working as usual.

Young GC

Surviving objects are moved from Eden to one or more Survivor regions, and once their age reaches a certain threshold they are promoted to Old regions.

Old GC

  • Initial mark (STW, basically the same principle as in CMS)
  • Root region scanning
  • Concurrent marking (finds live objects in the heap. This phase may be interrupted by a young GC)
  • Remark (STW, completes the marking of live objects in the heap, using the snapshot-at-the-beginning (SATB) algorithm)
  • Cleanup

G1 characteristics

  • Parallelism and concurrency
  • Generational collection
  • Space compaction (mark-compact algorithm)
  • Predictable pauses (the program can specify a target pause time with parameters)

Where G1 fits

  • Like CMS, applications with strict response-time requirements and looser throughput requirements
  • Multi-core, large-memory JVM applications. Oracle says G1 is suitable for server applications with heaps larger than 6 GB.
