Garbage collection for JVM – garbage collector
If the collection algorithms described in the previous article (garbage collection for the JVM) are the abstract strategy for reclaiming memory, then garbage collectors are the concrete implementation of it.
The JVM specification does not prescribe how a garbage collector must be implemented, so the collectors on offer differ between vendors and between versions of a virtual machine. This article looks only at the HotSpot virtual machine.
Just as there is no best algorithm, there is no best garbage collector, only the most suitable one. All we can do is choose the collector that best fits the specific application scenario.
Serial collector
The Serial collector is the most basic and oldest garbage collector (the young generation uses the copying algorithm, the old generation uses the mark-compact algorithm). As the name suggests, it is a single-threaded collector.
Not only does it use a single garbage collection thread, it must also suspend all other worker threads ("Stop The World": every thread doing the user's work is paused) until the collection is complete. In the run diagram:
- The young generation uses the copying algorithm, with a Stop-The-World pause;
- The old generation uses the mark-compact algorithm, with a Stop-The-World pause.
Although it stops the world while it collects, the Serial collector exists for a reason, just like every algorithm: it is simple and efficient (compared with the single-threaded performance of other collectors). In an environment limited to a single CPU it has no thread-interaction overhead, so by concentrating on garbage collection it achieves the highest single-threaded collection efficiency. That makes the Serial collector a good choice for applications running in Client mode (it is still the default young-generation collector for virtual machines running in Client mode).
The drawback of the Serial collector is obvious, and the virtual machine developers are well aware of it, so they have kept working to shorten Stop-The-World pauses. Pause times get shorter in later collector designs (but pauses still exist, and the search for a better garbage collector continues).
To recap what we have learned about the Serial collector:
Features
- A collector for the young generation;
- Uses the copying algorithm;
- Single-threaded collection;
- While it collects, all worker threads must be paused until it finishes ("Stop The World");
Application scenarios
- It is still the default young-generation collector for HotSpot in Client mode.
- It also has advantages over other collectors:
- Simple and efficient (compared with the single-threaded performance of other collectors);
- In an environment limited to a single CPU, the Serial collector has no thread interaction (switching) overhead and achieves the highest single-threaded collection efficiency;
- In a desktop application scenario, the memory in use is generally small (tens of megabytes to one or two hundred megabytes), so a collection finishes in a relatively short time (tens of milliseconds to a little over one hundred milliseconds). As long as collections are not frequent, this is acceptable.
Set the parameters
Add this parameter to explicitly use the serial garbage collector: "-XX:+UseSerialGC"
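To check which collectors a running JVM actually selected after a flag such as the one above was passed, the standard java.lang.management API can list them. The following is a minimal sketch; the class name GcInfo is made up for illustration, and the names a JVM reports depend on the vendor and version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (not from the original article): lists the collectors
// the current JVM registered with the standard management API.
final class GcInfo {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed launch command: java -XX:+UseSerialGC GcInfo.java
        System.out.println(collectorNames());
    }
}
```

Running it once per candidate flag is a quick way to confirm that the flag took effect before comparing collectors.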
ParNew collector
The ParNew collector is essentially a multithreaded version of the Serial collector. Apart from using multiple threads for garbage collection, it behaves exactly like the Serial collector (control parameters, collection algorithm, collection strategy, and so on). It is the first choice for many virtual machines running in Server mode, and besides the Serial collector it is currently the only one that can work with the CMS collector. CMS is regarded as an epoch-making concurrent collector, so a collector that can be paired with it and make it work better is an important part of the picture. The running process of the collector is shown below:
Features
- Apart from multithreading, its behavior and characteristics are the same as those of the Serial collector;
- For example, the control parameters available to the Serial collector, the collection algorithm, Stop The World, the object allocation rules, and the reclamation strategy;
- The two collectors share a considerable amount of code;
Application scenarios
In Server mode, the ParNew collector is very important because, besides Serial, it is currently the only collector that can work with the CMS collector. In a single-CPU environment, however, it is no better than the Serial collector because of the thread interaction overhead.
Set the parameters
- Specifying CMS makes ParNew the default young-generation collector: "-XX:+UseConcMarkSweepGC"
- Force the use of ParNew: "-XX:+UseParNewGC"
- Specify the number of garbage collection threads (by default ParNew chooses the number of collection threads it enables itself): "-XX:ParallelGCThreads"
Why does only ParNew work with the CMS collector?
- CMS, introduced in JDK 1.5, is the first truly concurrent collector in HotSpot, the first to let garbage collection threads work (basically) at the same time as user threads;
- CMS is an old-generation collector, but it cannot work with Parallel Scavenge, the young-generation collector that already existed in JDK 1.4;
- Parallel Scavenge (and G1) do not use the traditional GC collector code framework and are implemented independently; the other collectors share part of the framework code;
Parallel Scavenge collector
The Parallel Scavenge collector is a young-generation collector that uses the copying algorithm, and it is also a parallel, multithreaded collector.
The Parallel Scavenge collector focuses on throughput (using the CPU efficiently), whereas collectors such as CMS focus more on the pause times of user threads (improving the user experience). Throughput is the ratio of the CPU time spent running user code to the total CPU time consumed, that is, throughput = time spent running user code / (time spent running user code + garbage collection time). For example, if the virtual machine runs for 100 minutes in total and garbage collection takes 1 minute, the throughput is 99%.
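The throughput definition can be written down directly. The sketch below first reproduces the 99% example and then estimates throughput for the running JVM from the collection times reported by the management API. The class and method names are illustrative, and the live estimate uses wall-clock uptime rather than CPU time, so it only approximates the definition.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch: throughput = user time / (user time + GC time).
final class ThroughputEstimate {
    static double throughput(double userTimeMs, double gcTimeMs) {
        return userTimeMs / (userTimeMs + gcTimeMs);
    }

    // Sums the accumulated collection time reported by every collector.
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // The article's example: 99 minutes of user code, 1 minute of GC.
        System.out.println(throughput(99, 1));

        // Rough estimate for this JVM (assumption: uptime stands in for total time).
        long uptimeMs = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        long gcMs = Math.min(totalGcTimeMs(), uptimeMs);
        System.out.printf("rough throughput so far: %.4f%n", throughput(uptimeMs - gcMs, gcMs));
    }
}
```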
Operation diagram:
The Parallel Scavenge collector provides a number of parameters for finding the most suitable pause time or the maximum throughput. If you do not know much about how the collector operates and manual tuning is difficult, you can leave the memory management tuning to the virtual machine.
Features
- A young-generation collector;
- Uses the copying algorithm;
- Multithreaded collection;
- Collectors such as CMS focus on keeping the pause time of user threads during garbage collection as short as possible; the goal of Parallel Scavenge is to reach a controllable throughput.
Application scenarios
- The goal of high throughput is to reduce garbage collection time and allow user code to run longer;
- The application runs on multiple CPUs and has no particularly strict pause time requirement, that is, the program mainly computes in the background and has little interaction with the user;
- For example, applications that perform batch processing, order processing (reconciliation, etc.), payroll, scientific calculations;
Set the parameters
The Parallel Scavenge collector provides two parameters for precise throughput control:
Controls the maximum garbage collection pause time
"-XX:MaxGCPauseMillis"
- The maximum garbage collection pause time, a number of milliseconds greater than 0;
- Setting MaxGCPauseMillis to a smaller value may shorten pauses, but it may also reduce throughput, because garbage collection may then occur more frequently;
Set the ratio of garbage collection time to total time
"-XX:GCTimeRatio"
Sets the share of total time spent on garbage collection, as an integer n with 0 < n < 100; GCTimeRatio is in effect a throughput setting. The share of total time allowed for garbage collection is 1 / (1 + n). For example, -XX:GCTimeRatio=19 allows garbage collection 5% of the total time, = 1 / (1 + 19); the default is 1% = 1 / (1 + 99), that is, n = 99;
The garbage collection time is the sum of the time spent collecting the young generation and the old generation. If the throughput goal is not being met, the generation sizes are increased so that the user program can run for as long as possible;
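The arithmetic behind GCTimeRatio is easy to check with a few lines of code. This is only a sketch of the 1 / (1 + n) formula stated above; the class and method names are illustrative, and the range check mirrors the 0 < n < 100 range given in the text.

```java
// Illustrative sketch of the GCTimeRatio arithmetic described in the text.
final class GcTimeRatio {
    // Share of total time allowed for GC when -XX:GCTimeRatio=n, i.e. 1 / (1 + n).
    static double gcShare(int n) {
        if (n <= 0 || n >= 100) {
            throw new IllegalArgumentException("expected 0 < n < 100, got " + n);
        }
        return 1.0 / (1 + n);
    }

    // The matching throughput goal is whatever time is left for user code.
    static double throughputGoal(int n) {
        return 1.0 - gcShare(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {19, 99}) {
            System.out.printf("GCTimeRatio=%d -> GC share %.1f%%, throughput goal %.1f%%%n",
                    n, 100 * gcShare(n), 100 * throughputGoal(n));
        }
    }
}
```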
Adaptive GC tuning strategy (GC Ergonomics)
There is another parameter:
"-XX:+UseAdaptiveSizePolicy"
With this parameter enabled, there is no need to specify details such as the following by hand:
- The size of the young generation (-Xmn), the ratio of Eden to Survivor space (-XX:SurvivorRatio), and the size threshold above which objects are allocated directly in the old generation (-XX:PretenureSizeThreshold);
- The JVM collects performance monitoring information about the running system and adjusts these parameters dynamically to provide the most suitable pause time or the maximum throughput. This approach is called GC adaptive tuning (GC Ergonomics).
- A recommended approach:
- (1) Set only the basic memory sizes (for example "-Xmx" for the maximum heap);
- (2) Then use "-XX:MaxGCPauseMillis" or "-XX:GCTimeRatio" to give the JVM an optimization target;
- (3) Leave the tuning of the detailed parameters to the JVM's adaptive policy.
This is an important difference between the Parallel Scavenge collector and the ParNew collector.
Serial Old collector
The old-generation version of the Serial collector; it is also a single-threaded collector. It has two main uses: paired with the Parallel Scavenge collector in JDK 1.5 and earlier releases, and as a fallback for the CMS collector.
Features
- For the old generation;
- Uses the "mark-compact" algorithm (Mark-Sweep-Compact);
- Single-threaded collection; the Serial/Serial Old run diagram is the one shown for the Serial collector above.
Application scenarios
- It is mainly used in Client mode.
- In Server mode there are two main uses:
- (A) Paired with the Parallel Scavenge collector in JDK 1.5 and earlier releases;
- (B) As a fallback for the CMS collector, used when a Concurrent Mode Failure occurs during concurrent collection;
Parallel Old collector
The old-generation version of the Parallel Scavenge collector. It uses multiple threads and the mark-compact algorithm. Where throughput matters and CPU resources are a concern, the combination of Parallel Scavenge and Parallel Old is the preferred choice. It has been available only since JDK 1.6.
Features
- For the old generation;
- Uses the "mark-compact" algorithm;
- Multithreaded collection; the Parallel Scavenge/Parallel Old run is illustrated as follows:
Application scenarios
- Introduced in JDK 1.6 to replace the Serial Old collector in the old generation;
- Especially in Server mode with multiple CPUs;
- In scenarios that emphasize throughput and are sensitive to CPU resources, the combination of Parallel Scavenge and Parallel Old is the one to consider first.
Set the parameters
Specify the Parallel Old collector: "-XX:+UseParallelOldGC"
CMS collector
The CMS (Concurrent Mark Sweep) collector aims for the shortest possible collection pause. It is well suited to applications that focus on user experience.
Features
- For the old generation;
- Based on the "mark-sweep" algorithm (no compaction, so memory fragmentation occurs);
- Aims for the shortest collection pause time;
- Concurrent collection, low pauses;
- Needs more memory;
CMS, introduced in JDK 1.5, is the first truly concurrent collector in HotSpot: for the first time, garbage collection threads work (basically) at the same time as user threads;
Application scenarios
- Scenarios with a lot of user interaction (such as typical web applications and the server side of B/S, browser/server, systems);
- The system's pause time should be as short as possible and service response speed matters;
- To give users a better experience;
CMS collector process
As the words "Mark Sweep" in its name imply, the CMS collector is based on the mark-sweep algorithm, and its operation is somewhat more complex than that of the collectors described so far. The whole process has four steps:
- Initial mark: pauses all other threads ("Stop The World"). It marks only the objects that GC Roots reference directly, so it is fast;
- Concurrent mark:
- Concurrent marking is the process of GC Roots tracing;
- The GC threads and user threads run at the same time, and a closure structure records the reachable objects. At the end of this phase, however, the closure structure is not guaranteed to contain every object that is currently reachable: user threads may keep updating reference fields, so the GC threads cannot keep the reachability analysis up to date in real time. The algorithm therefore tracks where these reference updates happen;
- Remark: corrects the mark records of objects whose marks changed because user programs kept running during concurrent marking (it runs on multiple threads in parallel to improve efficiency). It needs a "Stop The World" pause that is slightly longer than the initial mark but much shorter than concurrent marking;
- Concurrent sweep: user threads keep running while the GC threads sweep the areas that were not marked as live and reclaim all garbage objects.
Because the collector threads work alongside user threads during the two longest phases, concurrent mark and concurrent sweep, CMS memory reclamation as a whole can be said to run "concurrently" with user threads.
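The marking and sweeping logic behind these phases can be illustrated with a toy model. The sketch below is single-threaded and leaves out everything that makes CMS concurrent (no remark phase, no write tracking); it only shows "mark what the roots reference directly", "trace the rest", and "sweep what was never marked". All class and method names are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy, single-threaded model of mark-sweep; not CMS's actual implementation.
final class ToyMarkSweep {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    // "Initial mark": mark only the objects the GC roots reference directly.
    static Deque<Obj> initialMark(List<Obj> directlyReferencedByRoots) {
        Deque<Obj> worklist = new ArrayDeque<>();
        for (Obj o : directlyReferencedByRoots) {
            if (!o.marked) {
                o.marked = true;
                worklist.push(o);
            }
        }
        return worklist;
    }

    // Tracing: follow references out of already-marked objects.
    static void trace(Deque<Obj> worklist) {
        while (!worklist.isEmpty()) {
            Obj o = worklist.pop();
            for (Obj child : o.refs) {
                if (!child.marked) {
                    child.marked = true;
                    worklist.push(child);
                }
            }
        }
    }

    // Sweep: remove unmarked objects and reset marks for the next cycle.
    static List<String> sweep(List<Obj> heap) {
        List<String> reclaimed = new ArrayList<>();
        Iterator<Obj> it = heap.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;
            } else {
                reclaimed.add(o.name);
                it.remove();
            }
        }
        return reclaimed;
    }

    static List<String> demo() {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b); // a -> b, and a is referenced by a root
        c.refs.add(d); // c -> d, but nothing reaches c
        List<Obj> heap = new ArrayList<>(List.of(a, b, c, d));
        trace(initialMark(List.of(a)));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [c, d]
    }
}
```

Note that the sweep leaves the survivors where they are, which is exactly why a mark-sweep collector ends up with fragmented free space.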
The CMS collector runs as follows:
Set the parameters
Specify the CMS collector: "-XX:+UseConcMarkSweepGC"
Disadvantages
(1) Sensitive to CPU resources
Programs designed for concurrency are sensitive to CPU resources (a trait of concurrent programs in general). During the concurrent phases CMS does not pause user threads, but it takes up some of the threads (or CPU resources), which slows the application down and lowers overall throughput. (For a reconciliation system, for example, the CMS collector is not a good fit.)
By default CMS starts (number of CPUs + 3) / 4 collection threads. With 4 or more CPUs, the collector threads take no less than 25% of the CPU resources during concurrent collection, and that share tends to fall as the number of CPUs grows. With fewer than 4 CPUs the impact on user programs is larger and may be unacceptable. (For example, with 2 CPUs one collection thread is started and takes 50% of the CPU resources; a collection thread occupies CPU for as long as the collection lasts.)
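A few lines of code make the effect of this formula visible. The sketch assumes the division is an integer division, as in Java, and takes the formula exactly as stated in the text; the class and method names are illustrative.

```java
// Illustrative sketch of the default thread count formula quoted in the text.
final class CmsThreads {
    // (number of CPUs + 3) / 4, with integer division assumed.
    static int defaultCollectorThreads(int cpus) {
        return (cpus + 3) / 4;
    }

    // Share of the CPUs kept busy by the collector threads while they run.
    static double cpuShare(int cpus) {
        return (double) defaultCollectorThreads(cpus) / cpus;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {1, 2, 4, 5, 8, 16}) {
            System.out.printf("%2d CPUs -> %d thread(s), %.0f%% of CPU%n",
                    cpus, defaultCollectorThreads(cpus), 100 * cpuShare(cpus));
        }
    }
}
```

With 2 CPUs this gives 1 thread and a 50% share, matching the example above; with 4, 8, or 16 CPUs the share is 25%.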
To address this, the "Incremental Concurrent Mark Sweep" collector (i-CMS) was introduced. Much like using preemption to simulate multitasking, it lets collector threads and user threads run alternately, reducing the time the collector threads run on their own. The results were not ideal, however, and since JDK 1.6 its use is officially no longer recommended.
(2) Floating garbage cannot be processed
Floating garbage is the garbage that user threads keep producing while the concurrent sweep is running, after marking has finished. CMS cannot reclaim it in the current cycle, and it may lead to a "Concurrent Mode Failure".
The solution
CMS therefore has to keep some memory in reserve for use during concurrent collection, unlike other collectors, which can wait until the old generation is almost full before collecting. Put differently, CMS needs more space than other garbage collectors. Use "-XX:CMSInitiatingOccupancyFraction" to set the old-generation occupancy percentage at which a CMS collection starts, which determines how much old-generation space is kept in reserve. (See noun explanation for details)
(3) Produces a large amount of memory fragmentation
Because CMS reclaims old-generation objects with the "mark-sweep" algorithm, a lot of fragmentation builds up after it has run for a long time, which can cause the promotion of young-generation objects to the old generation to fail.
With too much fragmentation, allocating large objects becomes a problem: the old generation may have plenty of free space in total, yet no contiguous block large enough for the current object, so a Full GC has to be triggered early.
The solution
Use "-XX:+UseCMSCompactAtFullCollection" and "-XX:CMSFullGCsBeforeCompaction"; the two need to be combined.
UseCMSCompactAtFullCollection
"-XX:+UseCMSCompactAtFullCollection"
To deal with fragmentation, the CMS collector provides the -XX:+UseCMSCompactAtFullCollection flag: when CMS has to fall back to a Full GC in the situation above, it also runs a pass that merges and compacts the memory fragments;
- However, the compaction pass cannot run concurrently, so the pause becomes longer;
- It is enabled by default and works together with CMSFullGCsBeforeCompaction;
CMSFullGCsBeforeCompaction
Because compaction cannot run concurrently, it removes fragmentation at the price of longer pauses. So there is another parameter, -XX:CMSFullGCsBeforeCompaction, which sets how many Full GCs without compaction are performed before one with compaction.
This reduces the pause time spent on compaction. The default is 0, which means compaction is performed on every Full GC;
Since free space is no longer contiguous, CMS has to allocate memory from a "free list", which is more expensive than simple "bump-the-pointer" allocation.
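The difference between the two allocation styles can be sketched with two toy allocators over an address range of plain integers. This is an illustration under simplifying assumptions (first-fit search, no coalescing of free blocks), not how HotSpot implements either scheme; all names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy allocators contrasting bump-the-pointer with a first-fit free list.
final class ToyAllocators {
    // Bump-the-pointer: free space is one contiguous range [top, end).
    static final class BumpAllocator {
        private int top;
        private final int end;
        BumpAllocator(int size) { this.end = size; }

        int allocate(int size) {
            if (top + size > end) {
                return -1; // out of space
            }
            int address = top;
            top += size;   // allocation is a single pointer increment
            return address;
        }
    }

    // Free list: free space is scattered, so every allocation has to search.
    static final class FreeListAllocator {
        private final List<int[]> freeBlocks = new ArrayList<>(); // {start, length}

        void addFreeBlock(int start, int length) {
            freeBlocks.add(new int[] {start, length});
        }

        int totalFree() {
            int total = 0;
            for (int[] block : freeBlocks) {
                total += block[1];
            }
            return total;
        }

        int allocate(int size) {
            for (int i = 0; i < freeBlocks.size(); i++) {
                int[] block = freeBlocks.get(i);
                if (block[1] >= size) {
                    int address = block[0];
                    block[0] += size;
                    block[1] -= size;
                    if (block[1] == 0) {
                        freeBlocks.remove(i);
                    }
                    return address;
                }
            }
            return -1; // no single block is large enough (fragmentation)
        }
    }

    public static void main(String[] args) {
        BumpAllocator bump = new BumpAllocator(100);
        System.out.println(bump.allocate(30) + " " + bump.allocate(30)); // 0 30

        FreeListAllocator fragmented = new FreeListAllocator();
        fragmented.addFreeBlock(0, 20);
        fragmented.addFreeBlock(40, 20);
        fragmented.addFreeBlock(80, 20);
        System.out.println(fragmented.totalFree());  // 60 units free in total
        System.out.println(fragmented.allocate(30)); // -1: no contiguous block of 30
        System.out.println(fragmented.allocate(10)); // 0: fits in the first block
    }
}
```

The failed 30-unit request with 60 units free is the fragmentation situation described in point (3) above.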
CMS vs. Parallel Old
Overall, compared with the Parallel Old collector, CMS reduces application pause times during old-generation garbage collection.
However, it increases application pause times during young-generation collection, lowers throughput, and takes up more heap space. The reason: CMS saves time by not compacting memory, but the free space is then no longer contiguous, so the collector can no longer allocate an object simply by advancing a pointer to the next available address. Instead it has to use a free list: a list of the unallocated areas is kept, and each time an object needs memory, an area of suitable size is looked up in the list. As a result, allocation in the old generation is more expensive than simple bump-the-pointer allocation. This also puts an extra burden on young-generation collection, because most objects in the old generation get there by being promoted during a young-generation collection. (Large objects that the young generation cannot accommodate are also allocated in the old generation.)
G1 collector
The earlier collectors (Serial, Parallel, and CMS) all divide the heap into three fixed-size parts: the young generation, the old generation, and the permanent generation.
Note: All objects in heap memory can be considered Java objects.
G1 (Garbage-First) became commercially available in JDK 7u4.
G1 is a server-style garbage collector aimed at machines with multiple processors and large memory. It meets GC pause time goals with very high probability while still delivering high throughput. It is seen as an important evolutionary feature of the HotSpot virtual machine in JDK 1.7.
G1 is intended to replace CMS in the long run, and it became the default collector in JDK 9.
Features
Parallelism and concurrency
G1 takes full advantage of multi-CPU, multi-core hardware, using multiple CPUs (CPUs or CPU cores) to shorten Stop-The-World pauses. Some GC work that would require pausing Java threads in other collectors can be carried out by G1 concurrently, so the Java program keeps running.
Generational collection
Although G1 can manage the entire GC heap on its own, without cooperating with other collectors, it retains the concept of generations.
- It manages the entire GC heap (young and old generations) on its own and does not need to be paired with another collector;
- It can treat objects of different ages in different ways;
- The generational concept remains, but the memory layout of the Java heap is very different;
- The whole heap is divided into independent Regions of equal size;
- The young and old generations are no longer physically separate; each is a collection of Regions (which need not be contiguous);
Spatial integration
Unlike CMS's "mark-sweep" algorithm, G1 as a whole is a collector based on the "mark-compact" algorithm; locally, it is based on the "copying" algorithm.
- As a whole, it is based on the mark-compact algorithm;
- Locally (between two Regions), it is based on the copying algorithm;
- This is an implementation similar to the train algorithm;
- It does not produce memory fragmentation, which benefits long-running applications. (The train algorithm is used by generational collectors to provide time-bounded incremental collection of the mature object space. It will be covered in a later article.)
Predictable pauses
This is another major advantage of G1 over CMS. Reducing pause times is a shared goal of G1 and CMS, but in addition to pursuing low pauses, G1 builds a predictable pause-time model. You can explicitly specify that, within any time slice of M milliseconds, no more than N milliseconds are spent on garbage collection. This gives high throughput together with low pauses.
Questions
Why can G1 achieve predictable pauses?
- It can systematically avoid collecting garbage across the entire Java heap;
- The G1 collector divides memory into independent Regions of equal size. The young and old generations are kept as concepts but are no longer physically isolated;
- G1 tracks the value of the garbage accumulated in each Region (how much space a collection would free and, from experience, how long it would take) and maintains a priority list in the background;
- Each time, within the allowed collection time, the Region with the highest value is reclaimed first (hence the name Garbage-First);
This ensures the highest possible collection efficiency within a limited time.
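The "highest value first, within the allowed time" idea can be sketched as a greedy selection. The example below is a toy: it assumes each Region already comes with an estimate of reclaimable space and collection cost, whereas G1 derives such estimates from its own measurements. Region data, names, and the value formula (space freed per millisecond) are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy version of garbage-first selection: pick the most valuable Regions
// that still fit into the pause budget.
final class ToyCollectionSet {
    static final class Region {
        final String id;
        final double garbageMb; // space a collection would free (assumed known)
        final double costMs;    // estimated time to collect it (assumed known)
        Region(String id, double garbageMb, double costMs) {
            this.id = id;
            this.garbageMb = garbageMb;
            this.costMs = costMs;
        }
        double value() { return garbageMb / costMs; } // MB freed per millisecond
    }

    static List<String> choose(List<Region> regions, double pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(regions);
        Comparator<Region> lowestValueFirst = Comparator.comparingDouble(Region::value);
        sorted.sort(lowestValueFirst.reversed()); // highest value first

        List<String> chosen = new ArrayList<>();
        double usedMs = 0;
        for (Region r : sorted) {
            if (usedMs + r.costMs <= pauseBudgetMs) {
                chosen.add(r.id);
                usedMs += r.costMs;
            }
        }
        return chosen;
    }

    static List<String> demo() {
        List<Region> regions = List.of(
                new Region("r1", 800, 10),  // value 80
                new Region("r2", 300, 2),   // value 150
                new Region("r3", 500, 10),  // value 50
                new Region("r4", 100, 4));  // value 25
        return choose(regions, 15);         // pause budget: 15 ms
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [r2, r1]
    }
}
```

With a 15 ms budget, r2 and r1 are taken (12 ms in total) and the two lower-value Regions are left for a later collection.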
The problem of objects being referenced from other Regions
A Region cannot be isolated: an object in one Region can be referenced by objects in any other Region. Does the entire Java heap have to be scanned to determine whether an object is alive?
Other generational collectors have the same problem (it is just more pronounced in G1): does collecting the young generation mean the old generation has to be scanned too? That would reduce the efficiency of a Minor GC;
Solution:
In G1 and in the other generational collectors alike, the JVM uses Remembered Sets to avoid a global scan:
Each Region has its own Remembered Set.
Every write to a reference-typed field triggers a Write Barrier, which interrupts the write.
The barrier then checks whether the reference being written points to an object in a different Region from the one holding the reference (in other collectors: whether an old-generation object references a young-generation object).
If so, the reference is recorded, via the CardTable, in the Remembered Set of the Region that contains the referenced object.
During garbage collection, the Remembered Sets are added to the enumeration scope of the GC roots.
This guarantees that nothing is missed even though no global scan is done.
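The write barrier and Remembered Set bookkeeping can be modeled in a few lines. This toy version records the names of referencing objects directly instead of going through a card table, and it never removes stale entries when a field is overwritten; both are simplifications, and all names are invented for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model: every reference store goes through writeField(), which plays
// the role of the write barrier and records cross-Region references.
final class ToyRememberedSet {
    static final class Obj {
        final String name;
        final int region;
        Obj field;
        Obj(String name, int region) {
            this.name = name;
            this.region = region;
        }
    }

    // Region id -> names of objects in OTHER Regions that point into it.
    private final Map<Integer, Set<String>> rememberedSets = new HashMap<>();

    void writeField(Obj holder, Obj target) {
        holder.field = target;
        // Barrier check: does the reference cross a Region boundary?
        if (target != null && holder.region != target.region) {
            rememberedSets.computeIfAbsent(target.region, k -> new HashSet<>())
                    .add(holder.name);
        }
    }

    Set<String> rememberedSetOf(int region) {
        return rememberedSets.getOrDefault(region, Set.of());
    }

    public static void main(String[] args) {
        ToyRememberedSet heap = new ToyRememberedSet();
        Obj a = new Obj("a", 0);
        Obj b = new Obj("b", 1);
        Obj c = new Obj("c", 1);
        heap.writeField(a, b); // crosses Regions 0 -> 1: recorded
        heap.writeField(c, b); // stays inside Region 1: not recorded
        System.out.println(heap.rememberedSetOf(1)); // [a]
        System.out.println(heap.rememberedSetOf(0)); // []
    }
}
```

When Region 1 is collected, only its Remembered Set has to be consulted to find incoming references, rather than every other Region.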
Application scenarios
- For server-side applications on machines with large memory and multiple processors;
- Its main use is as a solution for applications that need low GC latency and have a large heap;
- For example, with a heap of about 6GB or larger, predictable pause times can be below 0.5 seconds; (In practice: switching the reconciliation system from the CMS collector to G1 shortened reconciliation time by more than 20 seconds.)
G1 is likely to be a better choice than CMS in the following situations (though not absolutely):
- More than 50% of the Java heap is occupied by live data;
- The object allocation rate or promotion rate varies significantly;
- GC pauses are too long (longer than 0.5 to 1 second);
Advice:
- If the current collector causes no problems, there is no rush to switch to G1;
- If the application is after low pauses, G1 is worth trying;
- Whether G1 should replace CMS can only be decided by testing in the real scenario. (If G1 turns out to perform worse than CMS, stay with CMS.)
Set the parameters
You can use the following parameters to configure G1.
- Use the G1 collector: "-XX:+UseG1GC"
- Start the concurrent marking phase when the occupancy of the entire Java heap reaches this value (default 45): "-XX:InitiatingHeapOccupancyPercent"
- Set the pause time target for G1 (default 200 milliseconds): "-XX:MaxGCPauseMillis"
- Set the size of each Region (range 1 MB to 32 MB): "-XX:G1HeapRegionSize"
- Minimum size of the young generation (default 5%): "-XX:G1NewSizePercent"
- Maximum size of the young generation (default 60%): "-XX:G1MaxNewSizePercent"
- Number of parallel GC threads used during STW phases: "-XX:ParallelGCThreads"
- Number of concurrent GC threads used during the marking phase: "-XX:ConcGCThreads"
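To see which values a running HotSpot JVM ended up with for some of these options, the com.sun.management.HotSpotDiagnosticMXBean interface can be queried. The sketch below assumes a HotSpot-based JVM; on other JVMs, or for options a given build does not know, it falls back to a placeholder string. The class name G1Flags and the selection of options are illustrative.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch (HotSpot-specific): prints the effective values of a
// few of the options listed in the text.
final class G1Flags {
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return "(no HotSpot diagnostic bean on this JVM)";
        }
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return "(option not known to this JVM)";
        }
    }

    public static void main(String[] args) {
        // Assumed launch command: java -XX:+UseG1GC -XX:MaxGCPauseMillis=100 G1Flags.java
        String[] names = {
            "UseG1GC", "MaxGCPauseMillis", "InitiatingHeapOccupancyPercent",
            "G1HeapRegionSize", "ParallelGCThreads", "ConcGCThreads"
        };
        for (String name : names) {
            System.out.println(name + " = " + flagValue(name));
        }
    }
}
```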
Operation process
Not counting the operations that maintain the Remembered Sets, the process can be divided into four steps (similar to CMS).
1. Initial Marking
Marks only the objects that GC Roots reference directly;
The value of TAMS (Next Top at Mark Start) is also updated so that, when user programs run concurrently in the next phase, they create new objects in the correct available Regions;
It needs a "Stop The World" pause, but a very short one;
2. Concurrent Marking
Performs reachability analysis starting from GC Roots to find the live objects, which takes a long time. It runs concurrently with user threads, so it cannot guarantee that every live object gets marked (new live objects appear while the analysis runs);
3. Final Marking
Corrects the mark records of objects whose marks changed because user threads kept running during concurrent marking;
Object changes made during the previous phase are recorded in each thread's Remembered Set Log;
The data in the Remembered Set Logs is merged into the Remembered Sets;
It needs a "Stop The World" pause that is slightly longer than initial marking but much shorter than concurrent marking;
It runs on multiple threads in parallel to improve efficiency, and it uses the Snapshot-At-The-Beginning (SATB) algorithm, which is faster than the approach used by CMS;
4. Live Data Counting and Evacuation
First, the Regions are sorted by their reclamation value and cost;
Then a collection plan is made according to the GC pause time the user expects;
Finally, the garbage in some of the high-value Regions is reclaimed according to the plan;
Reclamation uses the "copying" algorithm: live objects are copied from one or more Regions into an empty Region on the heap, and memory is compacted and freed in the process;
This phase can be carried out concurrently, which reduces pause times and increases throughput;
Diagram of the G1 collector's run
During marking, G1 computes how much live data each Region holds. When collecting, it picks Regions with little live data according to the pause time the user has set, which keeps both the amount of garbage reclaimed and the pause time under control without lowering throughput too much. The new algorithm in the remark phase and the compaction done during collection make up for the shortcomings of CMS.
To quote the Oracle website: "G1 is planned as the long term replacement for the Concurrent Mark-Sweep Collector (CMS)".