CMS is an old-generation garbage collector
Concurrent Mark Sweep
- The CMS and other garbage collectors have been covered briefly.
CMS Theory
How garbage is identified
- Garbage is generally identified by reachability from GC Roots rather than by reference counting, which cannot reclaim objects that reference each other in a cycle.
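- As a minimal sketch of the idea (a toy object graph, not HotSpot's implementation; `Node` and `reachableFrom` are names made up for this example), the following marks everything reachable from a root. Two objects that only reference each other are never reached, so they count as garbage even though a reference counter would never drop to zero for them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model of reachability analysis; all names are illustrative.
class ReachabilityDemo {

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Returns every node that can be reached from the roots by following references.
    static Set<Node> reachableFrom(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {          // first visit: follow its references
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.refs.add(b);                          // root -> a -> b

        Node x = new Node("x");
        Node y = new Node("y");
        x.refs.add(y);                          // x and y only reference each other,
        y.refs.add(x);                          // so their reference counts stay at 1

        Set<Node> live = reachableFrom(Collections.singletonList(a));
        for (Node n : new Node[] { a, b, x, y }) {
            System.out.println(n.name + " reachable: " + live.contains(n));
        }
    }
}
```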
Root node enumeration
- When the execution system pauses, the virtual machine does not need to check every execution context and every global reference location one by one; it should have a way to know directly where object references are stored. In HotSpot’s implementation, a set of data structures called OopMaps is used for this purpose.
Safepoints
- With the help of OopMaps, HotSpot can enumerate GC Roots quickly and accurately, but there is a real problem: a great many instructions can change reference relationships, and therefore the content of the OopMap. Generating an OopMap for every such instruction would require a lot of extra space and make the space cost of GC even higher.
- In fact, HotSpot does not generate an OopMap for every instruction. It records this information only at “specific locations”, called safepoints. A program cannot pause for GC at an arbitrary point in its execution; it pauses only when it reaches a safepoint.
- Safepoints should be neither so sparse that the GC has to wait too long, nor so frequent that they noticeably increase the runtime overhead. Safepoint locations are therefore chosen by asking whether the code at that location “lets the program run for a long time”. Each individual instruction executes very quickly, so a program is unlikely to run for a long time merely because its instruction stream is long. The most obvious sign of long-running execution is the reuse of an instruction sequence, for example method calls, loop jumps, and exception jumps, which is why only instructions with these characteristics produce safepoints.
- Another consideration for safepoints is how to make all threads (excluding those executing JNI calls) run to the nearest safepoint and pause there when a GC occurs. There are two options: preemptive suspension and voluntary suspension.
Preemptive suspension
- It does not require the thread’s code to cooperate. When a GC occurs, all threads are interrupted first; if a thread turns out to have been interrupted somewhere other than a safepoint, it is resumed and allowed to run to a safepoint.
Voluntary suspension
- When the GC needs to suspend threads, it does not act on them directly; it simply sets a flag. Each thread polls this flag as it executes and suspends itself when it finds the flag set. The polling locations coincide with the safepoints, plus the places where objects are created and memory is allocated, because allocation is what usually triggers a GC. In short, with voluntary suspension a thread stops on its own when it reaches a polling point and sees the flag.
Few virtual machines now use preemptive suspension to pause threads in response to GC events.
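- Voluntary suspension can be sketched in plain Java. This is only a toy model under stated assumptions: `SafepointPollDemo`, `poll`, `pauseWorkers`, and `resumeWorkers` are hypothetical names, and a real JVM uses a much cheaper polling mechanism than a monitor. The worker checks the flag on every loop iteration (its "safepoint") and parks itself until the flag is cleared.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of voluntary suspension; all names are illustrative.
class SafepointPollDemo {
    private final Object lock = new Object();
    private volatile boolean pauseRequested = false; // the flag set by the "GC"
    private int parkedThreads = 0;                   // guarded by lock

    // Worker side: called at "safepoints" such as loop back-edges.
    void poll() throws InterruptedException {
        if (!pauseRequested) {
            return;                                  // fast path: flag not set
        }
        synchronized (lock) {
            parkedThreads++;
            lock.notifyAll();                        // tell the GC this thread has stopped
            while (pauseRequested) {
                lock.wait();                         // suspended until the GC is done
            }
            parkedThreads--;
        }
    }

    // GC side: set the flag, then wait until the expected threads have parked.
    void pauseWorkers(int expected) throws InterruptedException {
        synchronized (lock) {
            pauseRequested = true;
            while (parkedThreads < expected) {
                lock.wait();
            }
        }
    }

    // GC side: clear the flag and wake the parked threads.
    void resumeWorkers() {
        synchronized (lock) {
            pauseRequested = false;
            lock.notifyAll();
        }
    }

    // Returns true if the worker made no progress while it was paused.
    static boolean runDemo() throws InterruptedException {
        SafepointPollDemo safepoint = new SafepointPollDemo();
        AtomicLong iterations = new AtomicLong();
        Thread worker = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    iterations.incrementAndGet();    // "useful work"
                    safepoint.poll();                // poll on the loop back-edge
                }
            } catch (InterruptedException e) {
                // interrupted while parked: exit the thread
            }
        });
        worker.start();
        safepoint.pauseWorkers(1);
        long before = iterations.get();
        Thread.sleep(50);                            // the "GC" would run here
        long after = iterations.get();
        safepoint.resumeWorkers();
        worker.interrupt();
        worker.join();
        return before == after;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker stayed paused: " + runDemo());
    }
}
```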
Safe regions
- Safepoints seem to solve the problem of how to enter GC perfectly, but that is not always the case. The safepoint mechanism guarantees that an executing program will reach a safepoint, and so be ready for GC, within a short time. But what if the program is “not executing”? A program that is not executing has not been allocated CPU time, typically because a thread is in the Sleep or Blocked state. Such a thread cannot respond to the JVM’s interrupt request, and the JVM obviously cannot wait for it to be given CPU time again. This case calls for a safe region.
- A safe region is a stretch of code in which reference relationships do not change. When a thread executes code in a safe region, it first marks itself as having entered the safe region, so that a GC started by the JVM during that time can ignore threads that have marked themselves this way. When a thread is about to leave the safe region, it checks whether the system has completed root node enumeration (or the entire GC). If so, the thread simply continues; otherwise it must wait until it receives a signal that it is safe to leave. In short, the GC does not need to worry about threads inside a safe region, but a thread may leave the region only once the GC no longer depends on it being there.
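- The enter/leave protocol described above can be modelled roughly as follows. This is a sketch with hypothetical names (`SafeRegionDemo`, `enter`, `leave`, `beginGc`, `endGc`), not the JVM's actual mechanism: the "GC" starts without waiting for the thread in the safe region, and the thread blocks in `leave` until the GC has finished.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of a safe region; all names are illustrative.
class SafeRegionDemo {
    private final Object lock = new Object();
    private boolean gcInProgress = false;   // guarded by lock
    private int threadsInRegion = 0;        // guarded by lock

    // Thread side: announce that no references will change from here on.
    void enter() {
        synchronized (lock) {
            threadsInRegion++;
        }
    }

    // Thread side: leaving is only allowed while no GC is in progress.
    void leave() throws InterruptedException {
        synchronized (lock) {
            while (gcInProgress) {
                lock.wait();
            }
            threadsInRegion--;
        }
    }

    // GC side: threads inside a safe region do not have to be waited for.
    void beginGc() {
        synchronized (lock) {
            gcInProgress = true;
        }
    }

    void endGc() {
        synchronized (lock) {
            gcInProgress = false;
            lock.notifyAll();               // signal that it is safe to leave
        }
    }

    // Returns { left while the GC was running, left after the GC finished }.
    static boolean[] runDemo() throws InterruptedException {
        SafeRegionDemo region = new SafeRegionDemo();
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch wakeUp = new CountDownLatch(1);
        AtomicBoolean left = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            try {
                region.enter();
                entered.countDown();
                wakeUp.await();             // stands in for sleep or a blocking call
                region.leave();             // blocks while the GC is in progress
                left.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        entered.await();
        region.beginGc();                   // starts without waiting for the worker
        wakeUp.countDown();                 // the worker now tries to leave mid-GC
        Thread.sleep(100);
        boolean leftDuringGc = left.get();
        region.endGc();
        worker.join();
        return new boolean[] { leftDuringGc, left.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = runDemo();
        System.out.println("left during GC: " + result[0] + ", left after GC: " + result[1]);
    }
}
```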
Overview of the CMS garbage collector
Main points of CMS
- The Concurrent Mark Sweep (CMS) collector aims for the shortest possible collection pause times. It is mostly used on the server side of Internet sites and B/S (browser/server) systems.
- CMS is based on the mark-sweep algorithm. The whole process is divided into four steps:
- Initial mark (CMS Initial Mark)
- Concurrent mark (CMS Concurrent Mark)
- Remark (CMS Remark)
- Concurrent sweep (CMS Concurrent Sweep)
- The initial mark and remark steps still need to “Stop The World”;
- The initial mark only marks the objects that GC Roots reference directly, which is fast;
- The concurrent mark phase is the process of GC Roots tracing.
- It traverses the reference chains and marks the objects that are not garbage.
- Because it runs concurrently, meaning that the marking threads and the user threads execute together, what has already been marked can change: user threads may modify references while marking is in progress, so some marking records become stale. Objects that turn into garbage after being marked as live are left as floating garbage for the next cycle, while references that were added or changed must be examined again, which is why the remark phase below exists.
- The remark phase corrects the marking records of the objects whose marks changed during concurrent marking because the user program kept running. The pause in this phase is generally slightly longer than in the initial mark phase, but much shorter than the duration of concurrent marking.
- advantages
- Concurrent collection with low pauses, which is why CMS is also called the Concurrent Low Pause Collector
- disadvantages
- The CMS collector is very sensitive to CPU resources.
- The CMS collector cannot handle floating garbage (garbage produced by user threads after marking, which can only be reclaimed in the next cycle), and a “Concurrent Mode Failure” may occur, resulting in another Full GC. Because user threads keep running during collection, CMS has to start while the old generation still has room to spare. If the old generation does not grow quickly in your application, you can raise the value of -XX:CMSInitiatingOccupancyFraction appropriately to raise the trigger percentage, which reduces the number of collections and gives better performance. If the memory set aside during a CMS run does not meet the program’s needs, a “Concurrent Mode Failure” occurs and the virtual machine falls back to its backup plan: it temporarily enables the Serial Old collector to redo the old-generation collection, which results in a long pause. Setting -XX:CMSInitiatingOccupancyFraction too high can therefore easily lead to many “Concurrent Mode Failure” events and reduce performance instead.
- At the end of a collection, a large amount of memory fragmentation is left behind. Too much fragmentation makes it hard to allocate large objects: there is often plenty of free space left in the old generation, but no contiguous block large enough for the current object, so a Full GC has to be triggered early.
- The CMS collector provides the -XX:+UseCMSCompactAtFullCollection switch (enabled by default), which turns on a compaction pass that merges memory fragments when CMS can no longer avoid a Full GC. Compaction cannot run concurrently, so the fragmentation problem goes away but the pause becomes longer.
- For applications with large heap sizes, GC times are difficult to predict.
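- The fragmentation problem in the list above can be illustrated with a toy free list. The chunk sizes are made up for the example and `FragmentationDemo` is a hypothetical name; the point is only that total free space can exceed the request while no single contiguous chunk does.

```java
// Toy model of a fragmented old generation; sizes are in MB and purely illustrative.
class FragmentationDemo {

    // Sum of all free chunks, contiguous or not.
    static long totalFree(long[] freeChunksMb) {
        long sum = 0;
        for (long chunk : freeChunksMb) {
            sum += chunk;
        }
        return sum;
    }

    // Without compaction, the request must fit into one existing free chunk.
    static boolean fitsWithoutCompaction(long[] freeChunksMb, long requestMb) {
        for (long chunk : freeChunksMb) {
            if (chunk >= requestMb) {
                return true;
            }
        }
        return false;
    }

    // After compaction, all free space forms one contiguous block.
    static boolean fitsAfterCompaction(long[] freeChunksMb, long requestMb) {
        return totalFree(freeChunksMb) >= requestMb;
    }

    public static void main(String[] args) {
        long[] freeChunksMb = { 2, 1, 3, 2 };   // free chunks left behind by sweeping
        long requestMb = 4;                     // a "large object"
        System.out.println("total free MB: " + totalFree(freeChunksMb));
        System.out.println("fits without compaction: "
                + fitsWithoutCompaction(freeChunksMb, requestMb));
        System.out.println("fits after compaction: "
                + fitsAfterCompaction(freeChunksMb, requestMb));
    }
}
```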
Space allocation guarantee
- Before a Minor GC occurs, the virtual machine checks whether the largest available contiguous space in the old generation is greater than the total space of all objects in the young generation. If it is, the Minor GC is guaranteed to be safe. When a large number of objects are still alive after the Minor GC, the old generation has to act as the allocation guarantee: objects that do not fit into Survivor go directly into the old generation. If the old generation judges that it does not have enough space left (using the average size of the objects promoted to the old generation in previous collections as a rule of thumb), a Full GC is performed. In short, the old generation must have contiguous space large enough to hold the promoted objects, otherwise a Full GC follows.
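- One way to read the check described above is as a three-way decision. The sketch below is an interpretation, not HotSpot code: `AllocationGuarantee`, `decide`, and the three outcome names are invented for illustration, and the middle outcome assumes that a Minor GC is still attempted when the old generation can hold the average promoted size.

```java
// Sketch of the space allocation guarantee as a decision function (illustrative names).
class AllocationGuarantee {

    enum Decision { MINOR_GC_SAFE, MINOR_GC_AT_RISK, FULL_GC }

    static Decision decide(long oldMaxContiguousFree, long youngUsed, long avgPromoted) {
        if (oldMaxContiguousFree > youngUsed) {
            // Even if every young object survives, the old generation can take them all.
            return Decision.MINOR_GC_SAFE;
        }
        if (oldMaxContiguousFree > avgPromoted) {
            // Rule of thumb: past promotions would fit, so a Minor GC is attempted.
            return Decision.MINOR_GC_AT_RISK;
        }
        // Not even the average promotion would fit: collect the whole heap first.
        return Decision.FULL_GC;
    }

    public static void main(String[] args) {
        // Sizes in MB, chosen only for the example.
        System.out.println(decide(10, 8, 3));
        System.out.println(decide(6, 8, 3));
        System.out.println(decide(2, 8, 3));
    }
}
```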
Detailed steps of CMS
In total, the process is subdivided into seven phases
- Phase 1: Initial Mark
- Phase 2: Concurrent Mark
- Phase 3: Concurrent Preclean
- Phase 4: Concurrent Abortable Preclean
- Phase 5: Final Remark
- Phase 6: Concurrent Sweep
- Phase 7: Concurrent Reset
Phases whose names begin with Concurrent do not stop the world; the other two (Initial Mark and Final Remark) do.
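- The seven phases and their stop-the-world behaviour can be written down as a small enum. `CmsPhases` and `stopsTheWorld` are names chosen for this sketch; the phase list itself is the one given above.

```java
// The seven CMS phases with a flag for whether application threads are paused.
class CmsPhases {

    enum Phase {
        INITIAL_MARK(true),
        CONCURRENT_MARK(false),
        CONCURRENT_PRECLEAN(false),
        CONCURRENT_ABORTABLE_PRECLEAN(false),
        FINAL_REMARK(true),
        CONCURRENT_SWEEP(false),
        CONCURRENT_RESET(false);

        final boolean stopsTheWorld;

        Phase(boolean stopsTheWorld) {
            this.stopsTheWorld = stopsTheWorld;
        }
    }

    // Counts the phases that pause the application.
    static int countStopTheWorldPhases() {
        int count = 0;
        for (Phase phase : Phase.values()) {
            if (phase.stopsTheWorld) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (Phase phase : Phase.values()) {
            System.out.println(phase + (phase.stopsTheWorld ? " (STW)" : " (concurrent)"));
        }
    }
}
```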
Phase 1: Initial Mark
- This is one of CMS’s two stop-the-world events.
- The goal of this phase is to mark all old-generation objects that are referenced directly by GC Roots or by live objects in the young generation.
Don’t forget that CMS is an old-generation garbage collector.
Phase 2: Concurrent Mark
- In this phase the garbage collector traverses the old generation and marks all live objects, starting from the roots found in the previous phase. The concurrent mark phase runs concurrently with the user’s application. Not every live object in the old generation is necessarily marked, because the application may change references while marking is in progress.
In the diagram, the object reference marked with a black circle has changed compared with the diagram for phase one.
Phase 3: Concurrent Preclean
- This is also a concurrent phase: it runs alongside the application threads without stopping them. Some object references may change while the previous phase runs concurrently. Whenever that happens, the JVM marks the area of the heap (called a card) that contains the changed object as dirty, which is known as card marking.
- In the Preclean phase, objects that can be reached from the dirty objects are also marked, after which the dirty card marks are cleared.
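- Card marking can be pictured as a byte array with one entry per fixed-size slice of the heap. The sketch below is a simplified model with hypothetical names (`CardTableDemo`, `recordWrite`, `drainDirtyCards`); the 512-byte card size is assumed here for illustration. A reference write dirties the card that covers the written address, and the preclean step scans the dirty cards and clears them.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified card table; names and the card size are assumptions for this example.
class CardTableDemo {
    static final int CARD_SIZE = 512;   // bytes of heap covered by one card (assumed)
    private final byte[] cards;         // 0 = clean, 1 = dirty

    CardTableDemo(int heapBytes) {
        cards = new byte[(heapBytes + CARD_SIZE - 1) / CARD_SIZE];
    }

    // "Write barrier": called whenever a reference field at this address is updated.
    void recordWrite(int address) {
        cards[address / CARD_SIZE] = 1;
    }

    // "Preclean": collect the dirty cards for rescanning, then mark them clean again.
    List<Integer> drainDirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == 1) {
                dirty.add(i);
                cards[i] = 0;
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardTableDemo table = new CardTableDemo(4096);   // 8 cards
        table.recordWrite(0);                            // card 0
        table.recordWrite(600);                          // card 1
        table.recordWrite(700);                          // card 1 again
        System.out.println("dirty cards: " + table.drainDirtyCards());
        System.out.println("after preclean: " + table.drainDirtyCards());
    }
}
```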
Phase 4: Concurrent Abortable Preclean
- This is also a concurrent phase and likewise does not stop the user’s application threads. It tries to take as much work as possible off the STW (stop-the-world) Final Remark phase. The duration of this phase depends on a number of factors, since it repeats the same work until an abort condition is met (for example, number of iterations, amount of work completed, or elapsed wall-clock time).
Phase 5: Final Remark
- This is the second STW phase, and the last one in CMS. The goal of this phase is to finish marking all live objects in the old generation. Because the preceding phases ran concurrently, the GC threads may not have kept up with the application’s changes, so an STW pause is needed to complete the marking.
- Usually, CMS tries to run the Final Remark phase when the young generation is as empty as possible, in order to reduce the chance of several STW pauses happening back to back (the more live objects the young generation holds, the more references into the old generation have to be scanned). This phase is somewhat more complex than the previous ones.
The above five stages are the marking stage, and the following is the clearing stage.
Phase 6: Concurrent Sweep
- No STW is required here; this phase runs concurrently with the user’s application.
- This phase is to clear out objects that are no longer used and reclaim their space for future use.
Phase 7: Concurrent Reset
- This phase, also executed concurrently, resets the data structures inside the CMS in preparation for the next GC.
Summary
- CMS does an excellent job of reducing STW time by moving a lot of work into the concurrent phases, but it also has problems of its own, most notably the potential for heavy memory fragmentation.
- The whole idea of the collector is to do most of the marking concurrently and to use short STW pauses to correct the marking precisely.
Illustrating CMS with code
Code and VM parameters
- The following are VM parameters
-Xms20M // Initial (minimum) heap size
-Xmx20M // Maximum heap size
-Xmn10M // Young generation size
-XX:+PrintGCDetails // Print detailed GC logs
-XX:SurvivorRatio=8 // Eden : Survivor = 8 : 1, so Eden is 8/10 of the young generation
-XX:+UseConcMarkSweepGC // Use the CMS collector for the old generation
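- As a rough worked example of what these parameters imply, assuming the young generation consists of one Eden space and two Survivor spaces and ignoring any alignment the JVM applies, the sizes work out as below. `YoungGenLayout` and `layout` are hypothetical names for this sketch.

```java
// Back-of-the-envelope heap layout for the parameters above (sizes in MB).
class YoungGenLayout {

    // Returns { eden, each survivor, old generation }.
    static long[] layout(long heapMb, long youngMb, int survivorRatio) {
        // SurvivorRatio = Eden / one Survivor, and there are two Survivor spaces,
        // so the young generation is split into (survivorRatio + 2) equal parts.
        long survivor = youngMb / (survivorRatio + 2);
        long eden = youngMb - 2 * survivor;
        long old = heapMb - youngMb;
        return new long[] { eden, survivor, old };
    }

    public static void main(String[] args) {
        long[] sizes = layout(20, 10, 8);   // -Xmx20M -Xmn10M -XX:SurvivorRatio=8
        System.out.println("Eden=" + sizes[0] + "M, each Survivor=" + sizes[1]
                + "M, old generation=" + sizes[2] + "M");
    }
}
```

With an Eden of about 8M, the 4M arrays allocated by the test program below fill it quickly, which is what provokes the collections seen in the log.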
- code
public class CMSTest {
    public static void main(String[] args) throws InterruptedException {
        int size = 1024 * 1024; // 1M

        // Allocate three 4M arrays and one 3M array in a 20M heap to provoke GC activity.
        byte[] bytes = new byte[4 * size];
        System.out.println("Create 4M for the first time");
        byte[] bytes1 = new byte[4 * size];
        System.out.println("Create 4M for the second time");
        byte[] bytes2 = new byte[4 * size];
        System.out.println("Create 4M for the third time");
        //Thread.sleep(2500);
        byte[] bytes3 = new byte[3 * size];
        System.out.println("Create 3M for the fourth time");
    }
}
You can also observe that CMS is concurrent: the CMS log lines are interleaved with the program’s own output in no fixed order, because most CMS steps do not stop the program. The print statements and the CMS log entries therefore do not appear in a fixed relative order.
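- To confirm which collectors a run is actually using, the standard `java.lang.management` API can list them. This is a small helper sketch (`ShowCollectors` and `collectorNames` are names made up here). On a HotSpot JVM that still ships CMS and is started with -XX:+UseConcMarkSweepGC, the reported names are typically ParNew and ConcurrentMarkSweep, though that depends on the JDK in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the garbage collectors of the running JVM with their collection counts.
class ShowCollectors {

    // Names of the collectors reported by the JVM's management interface.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time=" + gc.getCollectionTime() + "ms");
        }
    }
}
```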