In Java interviews, candidates are often asked to explain garbage collection in Java and how its collectors are commonly classified. JVM garbage collection matters to every Java developer, and not just for interviews: understanding it is very helpful for diagnosing and solving performance problems in Java applications.
The garbage collection algorithms used for Java virtual machine memory were introduced in the previous article. This article introduces the common garbage collectors in Java and their features.
Main contents of this article:
- Basic concepts
  - Serial, parallel, and concurrent
  - Serial, parallel, and concurrent in JVM garbage collection
- Serial garbage collector
  - Serial
  - Serial Old
- Parallel garbage collector
  - ParNew
  - Parallel
  - Parallel Old
- CMS garbage collector
- G1 (Garbage First)
- Summary
Basic concepts
Before introducing specific garbage collectors, let’s look at a few basic concepts.
Serial, parallel, and concurrent
Computer systems exchange information in two ways: parallel data transmission and serial data transmission. Task execution can be described in similar terms:
- Serial: Tasks A and B run on the same CPU thread, and task B cannot start until task A completes. There is only one running context (one call stack and one heap) for the whole run of the program, and the program executes each instruction in sequence.
- Parallel: Two or more events or activities occur at the same instant. In a multiprogramming environment, parallelism lets multiple programs run simultaneously on different CPUs. For example, tasks A and B can run on different CPU threads at the same time, which is efficient but limited by the number of CPU threads: if there are more tasks than CPU threads, the tasks on each thread are still executed one after another.
- Concurrent: Multiple threads appear to execute simultaneously at the macro level (over a relatively long time interval), but actually take turns executing. In essence, concurrency multiplexes one physical CPU among several programs in order to use limited physical resources more efficiently. Concurrency is not mutually exclusive with serial or parallel execution: concurrent tasks on a single CPU thread are inherently executed serially, whereas concurrent tasks spread over multiple CPU threads can also execute in parallel.
A simple example is writing code and listening to music:
- We do nothing else while coding so that we stay focused, and listen to music during tea breaks.
- While coding, we write code and listen to music at the same time.
- We often pause coding to listen to music and then switch back, so the two activities are interleaved.
The reader can work out which of these corresponds to serial, parallel, and concurrent execution as described above.
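The difference between serial and parallel execution can be sketched in Java. This is only an illustration: the class ExecutionModes and the task sumTo are hypothetical names, and whether the two-worker run really executes in parallel depends on how many CPU threads the machine has free.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutionModes {
    // Hypothetical stand-in for "task A" and "task B": sums the integers 1..n.
    static long sumTo(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // Serial: task B starts only after task A has finished, on the calling thread.
    static long runSerial(int a, int b) {
        long first = sumTo(a);
        long second = sumTo(b);
        return first + second;
    }

    // Submits both tasks to a pool with the given number of worker threads.
    // With one worker the tasks are queued and run one after the other.
    // With two workers they run concurrently, and in parallel if two CPU
    // threads are available.
    static long runOnPool(int workers, int a, int b) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            CompletableFuture<Long> first = CompletableFuture.supplyAsync(() -> sumTo(a), pool);
            CompletableFuture<Long> second = CompletableFuture.supplyAsync(() -> sumTo(b), pool);
            return first.join() + second.join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("serial:      " + runSerial(1000, 2000));
        System.out.println("one worker:  " + runOnPool(1, 1000, 2000));
        System.out.println("two workers: " + runOnPool(2, 1000, 2000));
    }
}
```

All three calls compute the same result; only the scheduling of the two tasks differs.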
Serial, parallel, and concurrent in JVM garbage collection
These three concepts are also involved in the JVM garbage collector.
- Serial: The collector uses a single thread for garbage collection.
- Parallel: Multiple garbage collection threads work in parallel while the user threads remain in a waiting state.
- Concurrent: User threads and garbage collection threads execute at the same time (not necessarily in parallel; they may alternate). The user program keeps running while the garbage collector runs on another CPU.
With these concepts in mind, let’s start detailing common garbage collectors.
Serial garbage collector
As mentioned above, a serial collector uses a single thread for garbage collection. Only one worker thread is active during a collection, and because it works with full focus and exclusivity, it often performs better on machines with little parallel hardware. Serial collectors can be used in both the young generation and the old generation, and are accordingly divided into the young generation serial collector and the old generation serial collector.
Serial
The Serial collector is the oldest collector. Its disadvantage is that when it performs garbage collection it must suspend all user threads, known as STW (stop-the-world). It is still the default young generation collector for virtual machines running in client mode.
Parameter control: -XX:+UseSerialGC uses the serial collector.
Serial Old
Serial Old is the old generation version of the Serial collector and is also single-threaded. It has two main uses: as a companion to the Parallel Scavenge collector in JDK 1.5 and earlier releases, and as a fallback for the CMS collector.
Parameter control: -XX:+UseSerialGC enables the Serial + Serial Old combination (the default in client mode).
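To check which collectors a running JVM is actually using, the standard java.lang.management API can be queried. The sketch below uses a hypothetical class name, ActiveCollectors. The names printed depend on the JVM and the flags in use; on HotSpot started with -XX:+UseSerialGC they are typically Copy (young generation) and MarkSweepCompact (old generation), but treat that as an observation rather than a guarantee.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    // Returns the names of the collectors the running JVM reports through JMX.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints one line per collector with its collection count and total time so far.
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(bean.getName()
                    + ": collections=" + bean.getCollectionCount()
                    + ", timeMs=" + bean.getCollectionTime());
        }
    }
}
```

Running the same class with a different collector flag is a quick way to confirm that the flag took effect.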
Parallel garbage collector
The parallel collector improves on the serial collector by using multiple threads for garbage collection at the same time. On machines with strong computing power, this can effectively shorten the actual time a collection takes.
ParNew
The ParNew collector is a garbage collector that works in the young generation. It is simply a multi-threaded version of the serial collector, and its collection strategy and algorithm are the same as the serial collector’s. The young generation is collected in parallel and the old generation serially; the young generation uses the copying algorithm and the old generation uses mark-compact.
Parameter control: -XX:+UseParNewGC uses the ParNew collector; -XX:ParallelGCThreads limits the number of GC threads. Apart from the Serial collector, ParNew is the only young generation collector that can work with the CMS collector (a truly concurrent collector, described below).
Parallel
Parallel Scavenge (called Parallel here) is a multi-threaded young generation collector that uses the copying algorithm. It focuses on system throughput, which is the ratio of CPU time spent running user code to total CPU time consumed: throughput = user code run time / (user code run time + garbage collection time).
Shorter pause times suit programs that need to interact with users, where good response speed improves the user experience.
High throughput makes the most efficient use of CPU time and completes the program’s computing tasks as soon as possible, which mainly suits background tasks that require little interaction.
You can set parameters to enable the adaptive adjustment policy, under which the VM collects performance monitoring information based on the current state of the system and dynamically adjusts its settings to provide the most appropriate pause time or maximum throughput. You can also use parameters to cap GC pauses in milliseconds or GC time as a percentage. The young generation uses the copying algorithm and the old generation uses mark-compact.
Parameter control:
- -XX:MaxGCPauseMillis sets the maximum garbage collection pause time
- -XX:GCTimeRatio sets the throughput target (default: 99)
- -XX:+UseAdaptiveSizePolicy enables the adaptive mode. Once it is enabled, details such as the size of the young generation, the ratio of Eden to Survivor space, and the age at which objects are promoted to the old generation no longer need to be specified manually. The VM collects performance monitoring information based on the current state of the system and adjusts these parameters dynamically to provide the most appropriate pause time or maximum throughput
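The throughput formula and the meaning of -XX:GCTimeRatio can be written out as a small calculation. This is a sketch with a hypothetical class name, ThroughputMath; it relies on the documented meaning of GCTimeRatio=N as a target ratio of 1:N between GC time and application time, so the GC share of total time is 1 / (1 + N).

```java
public class ThroughputMath {
    // Throughput = user code time / (user code time + garbage collection time).
    static double throughput(double userMillis, double gcMillis) {
        return userMillis / (userMillis + gcMillis);
    }

    // With -XX:GCTimeRatio=N the target is GC time : user time = 1 : N,
    // so the fraction of total time allowed for GC is 1 / (1 + N).
    static double gcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1.0 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // 99 ms of user code and 1 ms of GC.
        System.out.println("throughput: " + throughput(99.0, 1.0));
        // The default ratio of 99 allows about 1% of total time for GC.
        System.out.println("GC share at ratio 99: " + gcTimeFraction(99));
        // A ratio of 19 allows about 5% of total time for GC.
        System.out.println("GC share at ratio 19: " + gcTimeFraction(19));
    }
}
```

A larger GCTimeRatio therefore asks for higher throughput, usually at the cost of larger generations and longer individual pauses.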
Parallel Old
The Parallel Old collector is the old generation version of the Parallel Scavenge collector. It uses multiple threads and the mark-compact algorithm, and also focuses on throughput. Where throughput matters and CPU resources are a sensitive concern, the Parallel plus Parallel Old combination can be given priority.
Parameter control: -XX:+UseParallelOldGC uses the Parallel Old collector; -XX:ParallelGCThreads limits the number of GC threads.
CMS garbage collector
CMS (Concurrent Mark Sweep) uses the mark-sweep algorithm, works in the old generation, and focuses on system pause times.
CMS is not an exclusive collector: the application keeps working during a CMS collection and keeps generating new garbage. For that reason, CMS must leave enough memory available for the application while it runs, and it does not wait until the heap is full before collecting. Instead, a collection starts at a certain threshold (68 by default in older JDK releases), meaning that CMS runs when old generation space utilization reaches 68%. If memory usage grows quickly and memory runs out while a CMS collection is still in progress, the CMS collection fails and the virtual machine falls back to the Serial Old collector. That suspends the application until the collection finishes, and the resulting GC pause may be long, so the threshold should be set according to the actual situation.
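The threshold rule can be expressed as a simple check. The sketch below is not HotSpot code: the class CmsTrigger and its methods are hypothetical, and the byte counts in main are invented to show how the headroom left for the application shrinks as the threshold rises.

```java
public class CmsTrigger {
    // True once old generation occupancy reaches the initiating threshold (a percentage).
    static boolean shouldStartCycle(long usedBytes, long capacityBytes, int thresholdPercent) {
        return usedBytes * 100 >= capacityBytes * (long) thresholdPercent;
    }

    // Space still free for objects promoted while the concurrent phases run.
    // If the application fills it before the cycle ends, the stop-the-world
    // fallback collection takes over.
    static long headroomBytes(long usedBytes, long capacityBytes) {
        return capacityBytes - usedBytes;
    }

    public static void main(String[] args) {
        long capacity = 1_000L * 1024 * 1024; // invented old generation size
        long used = 680L * 1024 * 1024;       // invented current occupancy

        System.out.println("start at 68%? " + shouldStartCycle(used, capacity, 68));
        System.out.println("start at 92%? " + shouldStartCycle(used, capacity, 92));
        System.out.println("headroom bytes: " + headroomBytes(used, capacity));
    }
}
```

A low threshold starts collections earlier and more often; a high threshold collects less often but leaves less room for allocation during the concurrent phases.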
A CMS collection runs in four phases:
- Initial mark: Suspends all other threads and records the objects directly reachable from the roots, which is fast;
- Concurrent mark: GC and user threads run together while a closure structure records reachable objects. At the end of this phase the closure structure is not guaranteed to contain every currently reachable object: user threads may keep updating reference fields, so the GC threads cannot keep the reachability analysis up to date in real time. The algorithm therefore keeps track of where these reference updates happen;
- Remark: Corrects the mark records of objects whose marks changed because the user program kept running during the concurrent mark phase. The pause in this phase is usually slightly longer than in the initial mark phase and much shorter than the concurrent mark phase;
- Concurrent sweep: User threads resume and the GC threads clear the unmarked objects.
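The phases above can be followed on a toy object graph. This is a single-threaded simulation under loud assumptions: the class MarkSweepPhases, its heap map, and the dirty list are all hypothetical, the "concurrent" update by the application is simply performed between two method calls, and a real collector tracks such updates with write barriers rather than an explicit list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MarkSweepPhases {
    // Toy heap: object name -> names of the objects it references.
    final Map<String, List<String>> heap = new HashMap<>();
    final Set<String> marked = new HashSet<>();

    // Initial mark (pause): mark only the objects the roots reference directly.
    void initialMark(List<String> rootTargets) {
        marked.addAll(rootTargets);
    }

    // Follows references from the given objects and marks everything reachable.
    void traceFrom(Collection<String> starts) {
        Deque<String> pending = new ArrayDeque<>(starts);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            for (String ref : heap.getOrDefault(current, List.of())) {
                if (marked.add(ref)) {
                    pending.push(ref);
                }
            }
        }
    }

    // Remark (pause): re-trace from live objects whose references changed
    // while marking was running.
    void remark(List<String> dirty) {
        List<String> liveDirty = new ArrayList<>();
        for (String object : dirty) {
            if (marked.contains(object)) {
                liveDirty.add(object);
            }
        }
        traceFrom(liveDirty);
    }

    // Sweep: everything left unmarked is garbage and is removed from the heap.
    Set<String> sweep() {
        Set<String> garbage = new HashSet<>(heap.keySet());
        garbage.removeAll(marked);
        heap.keySet().removeAll(garbage);
        return garbage;
    }

    // Runs the four phases on a four-object heap and returns what was swept.
    static Set<String> demo() {
        MarkSweepPhases gc = new MarkSweepPhases();
        gc.heap.put("A", List.of("B"));
        gc.heap.put("B", List.of());
        gc.heap.put("C", List.of());
        gc.heap.put("D", List.of());

        gc.initialMark(List.of("A"));     // A is referenced by a root
        gc.traceFrom(gc.marked);          // concurrent mark reaches B
        gc.heap.put("B", List.of("C"));   // simulated application update: B now references C
        gc.remark(List.of("B"));          // remark reaches C through the updated B
        return gc.sweep();                // only D is unreachable
    }

    public static void main(String[] args) {
        System.out.println("swept: " + demo());
    }
}
```

Without the remark step, C would have been swept even though the application had just made it reachable, which is why that phase needs a pause.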
Main advantages: Concurrent collection, low pauses. But it has the following three obvious disadvantages:
- Sensitive to CPU resources;
- Unable to handle floating garbage;
- The mark-sweep algorithm leaves a large amount of space fragmentation at the end of a collection.
CMS provides some tuning settings for this: you can have it defragment after a CMS collection completes, and you can set how many CMS collections take place before a defragmentation is performed.
Parameter control:
- -XX:+UseConcMarkSweepGC uses the CMS garbage collector
- -XX:CMSInitiatingOccupancyFraction sets the old generation occupancy threshold
- -XX:ConcGCThreads limits the number of concurrent GC threads
- -XX:+UseCMSCompactAtFullCollection performs a defragmentation after a CMS collection
- -XX:CMSFullGCsBeforeCompaction sets the number of CMS collections after which a defragmentation is performed
G1 (Garbage First)
The G1 (Garbage First) collector is one of the most advanced garbage collection technologies available today. It joined the JVM collector family as early as JDK 7 and has become the focus of HotSpot’s garbage collection work.
G1 divides the heap into Regions of several types: Eden, Survivor, Old, and Humongous. Behind the scenes it maintains a priority list and, each time, chooses the Regions with the greatest collection value within the allowed collection time.
Humongous is a special kind of Old Region used for large objects, and idle humongous Regions are reclaimed. This partitioning means that objects no longer need to be managed in one contiguous memory space. G1 divides the space into multiple Regions and gives priority to those containing the most garbage. An object may not be in the same Region as the objects it references. So when garbage collection occurs, does the entire heap need to be scanned for a complete reachability analysis?
Of course not. Each Region has a Remembered Set that records references from objects in other Regions to objects in this Region, so during reachability analysis, adding the Remembered Set to the GC Roots avoids traversing the whole heap.
Like the CMS garbage collector, G1 is a low-latency collector. It is also suited to large heaps, and it is officially recommended as a replacement for CMS. The biggest feature of G1 is that it introduces the idea of partitioning and weakens the concept of generations, making rational use of resources in each garbage collection cycle and resolving many defects of the other collectors, CMS included.
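The idea of a Remembered Set can be sketched with a few maps. This is a conceptual model only, not G1's actual data structure: the class RememberedSets, the numbered Regions, and the writeReference method (standing in for the write barrier) are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RememberedSets {
    // Hypothetical model: every object lives in exactly one numbered Region.
    final Map<String, Integer> regionOf = new HashMap<>();
    // Per Region: the objects in OTHER Regions that hold a reference into it.
    final Map<Integer, Set<String>> rememberedSet = new HashMap<>();

    void allocate(String object, int region) {
        regionOf.put(object, region);
    }

    // Stand-in for the write barrier: only cross-Region references are recorded.
    void writeReference(String from, String to) {
        int source = regionOf.get(from);
        int target = regionOf.get(to);
        if (source != target) {
            rememberedSet.computeIfAbsent(target, r -> new HashSet<>()).add(from);
        }
    }

    // Extra starting points for collecting one Region, in addition to the GC Roots.
    Set<String> extraRoots(int region) {
        return rememberedSet.getOrDefault(region, Set.of());
    }

    // Object a (Region 0) references b (Region 1); b references c inside Region 1.
    static Set<String> demo() {
        RememberedSets heap = new RememberedSets();
        heap.allocate("a", 0);
        heap.allocate("b", 1);
        heap.allocate("c", 1);
        heap.writeReference("a", "b"); // crosses Regions, so it is recorded for Region 1
        heap.writeReference("b", "c"); // stays inside Region 1, so nothing is recorded
        return heap.extraRoots(1);
    }

    public static void main(String[] args) {
        System.out.println("extra roots for Region 1: " + demo());
    }
}
```

To collect Region 1 alone, the collector starts from the GC Roots plus the object a, and never has to scan Region 0 in full.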
The G1 collector operates in the following steps:
- Initial mark: Marks only the objects directly reachable from GC Roots and updates the value of TAMS (Top at Mark Start), so that when the user program runs concurrently in the next phase, new objects are created in the correct available Regions. This phase pauses threads, but only very briefly.
- Concurrent mark: Starting from GC Roots, analyzes the reachability of objects in the heap to find live objects. This phase is time-consuming but runs concurrently with the user program.
- Final mark: Corrects the marks that changed because the user program kept running during concurrent marking. The virtual machine records object changes from this period in each thread’s Remembered Set Logs, and the final mark phase merges the Remembered Set Logs data into the Remembered Set. This phase pauses threads, but can be executed in parallel.
- Screening and collection: Sorts the Regions by their collection value and cost, then draws up a collection plan according to the GC pause time the user expects.
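The screening step can be pictured as a greedy selection under a pause budget. This is a simplified sketch, not G1's real heuristic: the class CollectionSetChooser, the Region estimates, and the numbers in demo are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectionSetChooser {
    // Hypothetical per-Region estimate: bytes expected to be reclaimed and
    // the predicted pause cost of collecting the Region.
    static final class Region {
        final String name;
        final long reclaimableBytes;
        final double costMillis;

        Region(String name, long reclaimableBytes, double costMillis) {
            this.name = name;
            this.reclaimableBytes = reclaimableBytes;
            this.costMillis = costMillis;
        }

        // Collection value: bytes reclaimed per millisecond of pause.
        double efficiency() {
            return reclaimableBytes / costMillis;
        }
    }

    // Takes Regions in order of decreasing efficiency, skipping any Region
    // whose cost would push the total past the pause budget.
    static List<String> choose(List<Region> candidates, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort((x, y) -> Double.compare(y.efficiency(), x.efficiency()));

        List<String> chosen = new ArrayList<>();
        double spent = 0;
        for (Region region : sorted) {
            if (spent + region.costMillis <= pauseBudgetMillis) {
                spent += region.costMillis;
                chosen.add(region.name);
            }
        }
        return chosen;
    }

    // Four invented Regions and a 10 ms pause budget.
    static List<String> demo() {
        List<Region> regions = List.of(
                new Region("R1", 800, 4.0),
                new Region("R2", 900, 9.0),
                new Region("R3", 300, 2.0),
                new Region("R4", 100, 5.0));
        return choose(regions, 10.0);
    }

    public static void main(String[] args) {
        System.out.println("collection set: " + demo());
    }
}
```

R2 holds the most garbage in absolute terms, yet it is left out because collecting it would not fit in the budget; this is the sense in which the plan follows the expected pause time rather than the raw amount of garbage.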
Parallelism and concurrency: G1 takes full advantage of multi-CPU, multi-core hardware, using multiple CPUs (or CPU cores) to shorten stop-the-world pauses. Some GC actions that other collectors must pause Java threads to perform can be carried out by G1 concurrently, so the Java program keeps executing.
Generational collection: As with other collectors, the concept of generations is retained in G1. Although G1 can manage the entire GC heap on its own without cooperating with other collectors, it treats newly created objects differently from old objects that have been around for a while and survived multiple GCs, in order to get better collection results.
Spatial integration: Unlike CMS’s mark-sweep algorithm, G1 is based on mark-compact when viewed as a whole and on the copying algorithm when viewed locally (between two Regions). Either way, G1 does not produce memory fragmentation during operation, and a collection leaves regular, contiguous free memory. This helps programs run for a long time: allocating a large object does not trigger the next GC prematurely because no contiguous memory can be found.
Predictable pauses: This is another advantage of G1 over CMS. Reducing pause time is a goal shared by G1 and CMS, but G1 also builds a predictable pause time model that lets users specify that, within any time slice of M milliseconds, no more than N milliseconds are spent on garbage collection. This is almost a feature of real-time Java (RTSJ) garbage collectors.
Parameter control: -XX:+UseG1GC.
Summary
This article described seven common collectors: Serial, ParNew, Parallel Scavenge, Serial Old, Parallel Old, CMS, and G1. They are grouped by whether they work in the young generation or the old generation:
- Young generation collectors: Serial, ParNew, Parallel Scavenge;
- Old generation collectors: Serial Old, Parallel Old, CMS;
- Whole heap collector: G1;
Depending on the collection region (young or old) and the characteristics of each collector, the following combinations are possible: Serial/Serial Old, Serial/CMS, ParNew/Serial Old, ParNew/CMS, Parallel/Serial Old, Parallel/Parallel Old, and G1.
ZGC has arrived. ZGC is the newest garbage collector, released in JDK 11, and it has no concept of generations at all. Officially, its advantages are freedom from fragmentation, controllable pause times, and support for very large heaps. The reader can try to understand and use ZGC.