There is a "high wall" between Java and C++, built of dynamic memory allocation and garbage collection: those on the outside want to get in, while those on the inside want to get out.
3.1 Overview
When it comes to Garbage Collection (GC), most people think of the technology as a companion of the Java language. In fact, GC is considerably older than Java: Lisp, born at MIT in 1960, was the first language to make real use of dynamic memory allocation and garbage collection. While Lisp was still in its infancy, its designers were already pondering the three questions GC must answer:
Which memory needs to be reclaimed?
When should it be reclaimed?
How should it be reclaimed?
After more than half a century of development, dynamic memory allocation and reclamation techniques have become quite mature, and everything seems to have entered an era of "automation". So why do we still need to understand GC and memory allocation? The answer is simple: when we have to troubleshoot memory overflows and leaks, or when garbage collection becomes the bottleneck to achieving higher concurrency, we need to be able to monitor and tune these "automated" technologies.
Let us return from half a century ago to the familiar Java language. Chapter 2 introduced the various runtime data areas of Java memory. The program counter, the virtual machine stack, and the native method stack are created and destroyed together with their thread; the stack frames in the stack are methodically pushed and popped as methods are entered and exited. How much memory each stack frame allocates is essentially fixed once the class structure is determined (the JIT compiler performs some optimizations at runtime, but for the conceptual model used in this chapter we can treat it as knowable at compile time), so allocation and reclamation in these areas are deterministic: there is no need to think much about reclaiming them, because the memory naturally goes away when the method or the thread terminates. The Java heap and the method area are different. The various implementation classes of one interface may need different amounts of memory, and the different branches of one method may too; we only learn which objects are created while the program is running. The allocation and reclamation of this memory are dynamic, and this is the portion the garbage collector is concerned with. The "memory" allocation and reclamation discussed in the rest of this chapter refer only to this portion.
3.2 Is the Object Dead?
The heap holds almost every object instance in the Java world. Before collecting the heap, the first thing the garbage collector must determine is which of these objects are still "alive" and which are "dead" (objects that can no longer be used in any way).
3.2.1 Reference Counting Algorithm
Many textbooks describe the algorithm for determining whether an object is alive like this: attach a reference counter to the object; increment it by one every time a reference to it is created; decrement it by one when a reference becomes invalid; an object whose counter is zero at any point in time can no longer be used. The author has interviewed many recent graduates, and some developers with years of experience, who give exactly this answer to the question.
Objectively speaking, the Reference Counting algorithm is simple to implement and efficient at making its judgment, and it is a good algorithm in most cases. There are some famous applications: Microsoft's Component Object Model (COM) technology, FlashPlayer (which uses ActionScript 3), the Python language, and Squirrel, widely used in game scripting, all use reference counting for memory management. However, reference counting is not used to manage memory in mainstream Java virtual machines, mainly because it is very difficult for it to handle circular references between objects.
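The cycle problem can be made concrete with a toy simulation. The RcObject class and simulate() method below are purely illustrative inventions (no mainstream JVM manages heap objects this way): each object carries an explicit counter, and after the mutual references are set up and the external references dropped, both counters are stuck at 1 even though the objects are unreachable.

```java
public class RefCountCycleDemo {
    // Toy object with an explicit reference counter (illustrative only --
    // no mainstream JVM manages heap objects this way).
    static class RcObject {
        int refCount = 0;
        RcObject instance;

        void assignInstance(RcObject target) {
            if (instance != null) instance.refCount--;  // old referent loses a reference
            instance = target;
            if (target != null) target.refCount++;      // new referent gains one
        }
    }

    // Replays the scenario of Listing 3-1 with manual counting.
    static int[] simulate() {
        RcObject objA = new RcObject(); objA.refCount++;   // held by local variable objA
        RcObject objB = new RcObject(); objB.refCount++;   // held by local variable objB

        objA.assignInstance(objB);   // objB.refCount -> 2
        objB.assignInstance(objA);   // objA.refCount -> 2

        objA.refCount--;             // objA = null
        objB.refCount--;             // objB = null

        // Both objects are now unreachable, but the cycle keeps each count at 1,
        // so a pure reference-counting collector could never reclaim them.
        return new int[]{objA.refCount, objB.refCount};
    }

    public static void main(String[] args) {
        int[] counts = simulate();
        System.out.println(counts[0] + " " + counts[1]);   // 1 1
    }
}
```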
Consider the simple example of the testGC() method in Listing 3-1: the objects objA and objB both have an instance field. We set objA.instance = objB and objB.instance = objA, and apart from this there are no other references to either object. In fact, neither object can ever be accessed again, yet because they reference each other their reference counts never drop to zero, so a reference counting collector could never tell the GC to reclaim them.
Listing 3-1 A defect of the reference counting algorithm
/**
 * testGC(): will objA and objB be collected after this method executes?
 * @author zzm
 */
public class ReferenceCountingGC {

    public Object instance = null;

    private static final int _1MB = 1024 * 1024;

    /**
     * The only purpose of this member is to occupy some memory,
     * so that its reclamation shows up clearly in the GC log.
     */
    private byte[] bigSize = new byte[2 * _1MB];

    public static void testGC() {
        ReferenceCountingGC objA = new ReferenceCountingGC();
        ReferenceCountingGC objB = new ReferenceCountingGC();
        objA.instance = objB;
        objB.instance = objA;

        objA = null;
        objB = null;

        // Can objA and objB be reclaimed if a GC occurs on this line?
        System.gc();
    }
}
Running results:
[Full GC (System) [Tenured: 0K->210K(10240K), 0.0149142 secs] 4603K->210K(19456K), [Perm : 2999K->2999K(21248K)], 0.0150007 secs] [Times: user=0.01 sys=0.00, real=0.02 secs]
Heap
 def new generation   total 9216K, used 82K [0x00000000055e0000, 0x0000000005fe0000, 0x0000000005fe0000)
  eden space 8192K,   1% used [0x00000000055e0000, 0x00000000055f4850, 0x0000000005de0000)
  from space 1024K,   0% used [0x0000000005de0000, 0x0000000005de0000, 0x0000000005ee0000)
  to   space 1024K,   0% used [0x0000000005ee0000, 0x0000000005ee0000, 0x0000000005fe0000)
 tenured generation   total 10240K, used 210K [0x0000000005fe0000, 0x00000000069e0000, 0x00000000069e0000)
   the space 10240K,  2% used [0x0000000005fe0000, 0x0000000006014a18, 0x0000000006014c00, 0x00000000069e0000)
 compacting perm gen  total 21248K, used 3016K [0x00000000069e0000, 0x0000000007ea0000, 0x000000000bde0000)
   the space 21248K, 14% used [0x00000000069e0000, 0x0000000006cd2398, 0x0000000006cd2400, 0x0000000007ea0000)
No shared spaces configured.
From the run result we can see that the GC log contains "4603K->210K", meaning the virtual machine did not decline to reclaim the two objects merely because they reference each other. This indirectly shows that the virtual machine does not determine object liveness by reference counting.
3.2.2 Reachability Analysis Algorithm
The mainstream implementations of the major commercial programming languages (Java, C#, and even the venerable Lisp mentioned earlier) all use Reachability Analysis to determine whether objects are alive. The basic idea of this algorithm is to start from a set of objects called "GC Roots" and search downward; the path traversed is called the Reference Chain. When an object is connected to GC Roots by no reference chain whatsoever (in graph-theoretic terms, when the object is unreachable from GC Roots), the object is proved unusable. As shown in Figure 3-1, object 5, object 6, and object 7 reference one another, but they are unreachable from GC Roots, so they will be judged recyclable.
In the Java language, objects that can be used as GC Roots include the following:
The object referenced in the virtual machine stack (the local variable table in the stack frame).
The object referenced by the class static property in the method area.
The object referenced by the constant in the method area.
Objects referenced by JNI (commonly referred to as Native methods) in the Native method stack.
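To make the reachability idea concrete, here is a minimal sketch (the integer-node model and names are invented for illustration; HotSpot traverses the real object graph, not a Map). It replays Figure 3-1: objects 5, 6, and 7 form a cycle but are never reached from the roots, so they do not end up in the live set.

```java
import java.util.*;

public class ReachabilityDemo {
    // Conceptual model of Figure 3-1: nodes stand for objects,
    // edges for references from one object to another.
    static Set<Integer> reachable(Map<Integer, List<Integer>> refs, List<Integer> gcRoots) {
        Set<Integer> live = new HashSet<>();
        Deque<Integer> worklist = new ArrayDeque<>(gcRoots);
        while (!worklist.isEmpty()) {
            int obj = worklist.pop();
            if (live.add(obj)) {                              // mark each object once
                worklist.addAll(refs.getOrDefault(obj, List.of()));
            }
        }
        return live;                                          // everything on a reference chain
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> refs = new HashMap<>();
        refs.put(1, List.of(2));        // root object 1 -> object 2
        refs.put(2, List.of(3, 4));     // object 2 -> objects 3 and 4
        refs.put(5, List.of(6));        // 5, 6, 7 reference each other...
        refs.put(6, List.of(7));
        refs.put(7, List.of(5));        // ...but nothing reaches them from a root

        Set<Integer> live = reachable(refs, List.of(1));
        System.out.println(live);       // contains exactly 1, 2, 3, 4 -- 5, 6, 7 are collectible
    }
}
```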
3.2.3 More on References
Whether liveness is judged by the reference count or by the reachability of an object's reference chain, it is determined in terms of "references". Before JDK 1.2, references in Java had the traditional definition: if the value stored in data of a reference type represents the starting address of another piece of memory, that data is said to be a reference to that memory. This definition is pure but too narrow: under it an object is simply referenced or not referenced, with no way to describe objects that are, as the Chinese idiom puts it, "tasteless to eat yet a pity to throw away". We would like to describe a class of objects that can stay in memory while space is plentiful, but can be discarded if memory is still tight after a garbage collection. Many system caches fit this scenario.
After JDK 1.2, Java expanded the concept of Reference into Strong Reference, Soft Reference, Weak Reference, and Phantom Reference, in decreasing order of strength. Strong references are the ordinary references of program code, such as "Object obj = new Object()"; as long as a strong reference exists, the garbage collector will never reclaim the referenced object.
Soft references describe objects that are useful but not essential. Before the system throws an out-of-memory error, objects associated only with soft references are included in the collection scope for a second collection; if there is still not enough memory after that collection, the out-of-memory error is thrown. After JDK 1.2, the SoftReference class was provided to implement soft references.
Weak references also describe non-essential objects, but they are weaker than soft references: an object associated only with weak references survives only until the next garbage collection. When the garbage collector runs, objects associated only with weak references are reclaimed regardless of whether memory is currently sufficient. After JDK 1.2, the WeakReference class was provided to implement weak references.
Phantom references, also known as ghost or spectral references, are the weakest kind of reference relationship. Whether an object has a phantom reference has no effect whatsoever on its lifetime, and it is impossible to obtain an object instance through a phantom reference. The only purpose of associating a phantom reference with an object is to receive a system notification when the object is reclaimed by the collector. After JDK 1.2, the PhantomReference class was provided to implement phantom references.
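The four reference classes live in java.lang.ref and can be tried directly. The sketch below shows their deterministic behavior: soft and weak references still yield the object while a strong reference to it exists, and PhantomReference.get() always returns null, by design. Whether the weak reference is actually cleared after System.gc() is up to the JVM, so no outcome is claimed for that last step.

```java
import java.lang.ref.*;

public class ReferenceStrengthDemo {
    public static void main(String[] args) {
        Object strong = new Object();                        // strong reference

        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        // While a strong reference exists, soft and weak get() still see the object...
        System.out.println(soft.get() == strong);    // true
        System.out.println(weak.get() == strong);    // true
        // ...but PhantomReference.get() always returns null by specification:
        System.out.println(phantom.get());           // null

        // After dropping the strong reference, a GC *may* clear the weak
        // reference; the spec does not guarantee it, so we only print the result.
        strong = null;
        System.gc();
        System.out.println("weak after GC: " + weak.get());
    }
}
```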
3.2.4 To be or not to be
Even under the reachability analysis algorithm, unreachable objects are not immediately "dead"; at this point they are on "probation". To truly pronounce an object dead, at least two marking passes are required: if an object is found to have no reference chain to GC Roots after reachability analysis, it is marked for the first time and screened, the screening criterion being whether it is necessary to execute its finalize() method. If the object does not override finalize(), or its finalize() has already been invoked by the virtual machine, both cases are treated as "not necessary to execute".
If the object is judged necessary to execute finalize(), it is placed in a queue called F-Queue and later "executed" by a low-priority Finalizer thread that the virtual machine establishes automatically. "Executed" here means only that the virtual machine triggers the method; it does not promise to wait for it to finish. The reason is that if an object's finalize() runs slowly, or in the extreme case enters an infinite loop, it could leave the other objects in F-Queue waiting forever and even bring down the whole memory reclamation system. finalize() is the object's last chance to escape death: later the GC marks the objects in F-Queue a second time, on a smaller scale, and if an object wants to rescue itself in finalize() it need only re-establish an association with any object on a reference chain, for example by assigning itself (the this keyword) to some class variable or to a member variable of another object; it will then be removed from the "about to be reclaimed" set at the second marking. If the object has not escaped by then, it really is about to be collected. Listing 3-2 shows an object whose finalize() is executed and which nonetheless survives.
Listing 3-2 shows a self-saving demonstration of an object
/**
 * This code demonstrates two things:
 * 1. An object can rescue itself from collection.
 * 2. It gets only one such chance, because an object's finalize() method
 *    is invoked automatically by the system at most once.
 * @author zzm
 */
public class FinalizeEscapeGC {

    public static FinalizeEscapeGC SAVE_HOOK = null;

    public void isAlive() {
        System.out.println("yes, i am still alive :)");
    }

    @Override
    protected void finalize() throws Throwable {
        super.finalize();
        System.out.println("finalize method executed!");
        FinalizeEscapeGC.SAVE_HOOK = this;
    }

    public static void main(String[] args) throws Throwable {
        SAVE_HOOK = new FinalizeEscapeGC();

        // First attempt at self-rescue: succeeds
        SAVE_HOOK = null;
        System.gc();
        // The Finalizer thread has low priority, so pause 0.5 seconds to wait for it
        Thread.sleep(500);
        if (SAVE_HOOK != null) {
            SAVE_HOOK.isAlive();
        } else {
            System.out.println("no, i am dead :(");
        }

        // Second attempt: identical code, but this time the rescue fails
        SAVE_HOOK = null;
        System.gc();
        Thread.sleep(500);
        if (SAVE_HOOK != null) {
            SAVE_HOOK.isAlive();
        } else {
            System.out.println("no, i am dead :(");
        }
    }
}
Running results:
finalize method executed!
yes, i am still alive :)
no, i am dead :(
As Listing 3-2 shows, the finalize() method of the SAVE_HOOK object was indeed triggered by the GC, and the object escaped before being collected.
Another point worth noting is that the code contains two identical snippets, yet the first escape attempt succeeds while the second fails. This is because any object's finalize() method is invoked automatically by the system only once; if the object faces collection again, its finalize() will not be executed a second time, so the second self-rescue fails.
It must be stressed that the above description of finalize() may lend object death a tragic, theatrical air, but the author does not encourage using this method to rescue objects. On the contrary, I recommend avoiding it as far as possible: it is not the destructor of C/C++, but a compromise made when Java was born to make the language more palatable to C/C++ programmers. It is expensive and non-deterministic to run, and it cannot guarantee the calling order across objects. Some textbooks describe it as suitable for "closing external resources" and the like, which is pure self-consolation on the part of those who use it. Everything finalize() can do, a try-finally block or some other means can do better and more promptly, so the author suggests you simply forget that this method exists in the Java language.
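As a replacement for finalize()-based cleanup, a try-with-resources (or plain try-finally) block closes a resource deterministically, at a known point, even if the guarded code throws. The Resource class below is a stand-in invented for illustration; any AutoCloseable such as FileInputStream behaves the same way.

```java
public class CleanupDemo {
    // Hypothetical resource standing in for a file, socket, etc.
    static class Resource implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    static Resource useAndClose() {
        Resource r = new Resource();
        try (Resource managed = r) {
            // work with the resource here; close() runs even if this block throws
        }
        return r;
    }

    public static void main(String[] args) {
        // Unlike finalize(), closing happens immediately and unconditionally:
        System.out.println(useAndClose().closed);   // true
    }
}
```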
3.2.5 Reclaiming the Method Area
Many people believe that the method area (or the permanent generation in the HotSpot virtual machine) has no garbage collection. The Java Virtual Machine Specification does say that the virtual machine is not required to implement garbage collection in the method area, and the "cost-performance" of collecting there is generally low: in the heap, and especially in the young generation, a conventional collection can typically reclaim 70% to 95% of the space, whereas collection in the permanent generation is far less productive.
Garbage collection in the permanent generation mainly reclaims two kinds of content: obsolete constants and useless classes. Reclaiming obsolete constants is very similar to reclaiming objects in the Java heap. Take literals in the constant pool as an example: suppose the string "abc" has entered the constant pool, but no String object in the current system references it, and there is no other reference to the literal anywhere else. If a memory collection occurs at this point and it proves necessary, the "abc" constant will be cleaned out of the constant pool by the system. Symbolic references to classes (interfaces), methods, and fields in the constant pool are handled similarly.
Determining whether a constant is "obsolete" is relatively easy, whereas the criteria for judging whether a class is "useless" are much stricter. A class must satisfy all three of the following conditions to count as "useless":
All instances of the class have been reclaimed, meaning that there are no instances of the class in the Java heap.
The ClassLoader that loaded the class has been reclaimed.
The java.lang.Class object corresponding to this Class is not referenced anywhere, and its methods cannot be accessed anywhere through reflection.
A virtual machine may reclaim a useless class that satisfies all three conditions. "May" here means only that it is permitted to, not that it necessarily will, unlike an object, which is bound to be reclaimed once unreferenced. HotSpot provides the -Xnoclassgc parameter to control whether classes are reclaimed, and class loading and unloading information can be viewed with -verbose:class, -XX:+TraceClassLoading, and -XX:+TraceClassUnLoading. The -verbose:class and -XX:+TraceClassLoading parameters can be used with product builds of the virtual machine, while -XX:+TraceClassUnLoading requires a fastdebug build.
In scenarios that make heavy use of reflection, dynamic proxies, bytecode frameworks such as CGLib, dynamically generated JSPs, and frequent custom ClassLoaders such as OSGi, the virtual machine needs the ability to unload classes to ensure that the permanent generation does not overflow.
3.3 Garbage collection algorithm
Since the implementations of garbage collection algorithms involve a great deal of program detail, and virtual machines on different platforms operate on memory differently, this section does not discuss implementations in depth, only the ideas behind several algorithms and their development.
3.3.1 Mark-Sweep Algorithm
The most basic collection algorithm is Mark-Sweep. As its name suggests, it has two phases: first mark all objects that need to be reclaimed, then, once marking is complete, reclaim all marked objects in one pass. The marking process was described in the previous section on determining object liveness. Mark-sweep is called the most basic algorithm because the later algorithms build on this idea and improve on its shortcomings. It has two main shortcomings. One is efficiency: neither the marking nor the sweeping process is particularly efficient. The other is space: after a sweep, a large number of discontinuous memory fragments remain, and too much fragmentation may mean that, when the program later needs to allocate a large object, no sufficiently large block of contiguous memory can be found and another garbage collection has to be triggered early. Figure 3-2 illustrates the execution of the mark-sweep algorithm.
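The two phases can be sketched on a toy heap. This is a conceptual model only; a real collector works on raw memory and object headers, not Java collections, and the names here are invented for illustration.

```java
import java.util.*;

public class MarkSweepDemo {
    // Toy heap object: a name, outgoing references, and a mark bit.
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked = false;
        Obj(String name) { this.name = name; }
    }

    // Mark phase: flag everything reachable from a root.
    static void mark(Obj obj) {
        if (obj.marked) return;
        obj.marked = true;
        for (Obj ref : obj.refs) mark(ref);
    }

    // Sweep phase: reclaim (remove) every unmarked object, then clear marks.
    static void sweep(List<Obj> heap) {
        heap.removeIf(o -> !o.marked);
        for (Obj o : heap) o.marked = false;
    }

    static List<String> collect(List<Obj> heap, List<Obj> roots) {
        for (Obj root : roots) mark(root);
        sweep(heap);
        List<String> survivors = new ArrayList<>();
        for (Obj o : heap) survivors.add(o.name);
        return survivors;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        a.refs.add(b);                                     // a -> b; c is garbage
        List<Obj> heap = new ArrayList<>(List.of(a, b, c));
        System.out.println(collect(heap, List.of(a)));     // [a, b]
    }
}
```

Note that the survivors stay where they were; the holes left by swept objects are exactly the fragmentation problem the text describes.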
3.3.2 Copying Algorithm
To solve the efficiency problem, a collection algorithm called Copying appeared. It divides the available memory into two equal-sized halves and uses only one of them at a time. When that half is used up, the surviving objects are copied onto the other half, and the used half is then cleared in one sweep. Collecting half the memory at a time means the complications of fragmentation never have to be considered during allocation: just move the pointer at the top of the heap and allocate in order. The algorithm is simple to implement and efficient to run, but its cost is shrinking usable memory to half of the original, which is rather steep. Figure 3-3 illustrates the execution of the copying algorithm.
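A toy semispace model of the idea (illustrative only; a real copying collector also installs forwarding pointers and fixes up references, which this sketch omits):

```java
import java.util.*;

public class CopyingGCDemo {
    // "from" and "to" stand for the two equal-sized halves of memory.
    static List<String> fromSpace = new ArrayList<>();
    static List<String> toSpace = new ArrayList<>();

    static void allocate(String obj) {
        fromSpace.add(obj);             // bump-pointer allocation: just append in order
    }

    static void collect(Set<String> live) {
        for (String obj : fromSpace) {
            if (live.contains(obj)) toSpace.add(obj);   // copy survivors compactly
        }
        fromSpace.clear();                              // free the whole half at once
        List<String> tmp = fromSpace;                   // the halves swap roles
        fromSpace = toSpace;
        toSpace = tmp;
    }

    public static void main(String[] args) {
        allocate("a"); allocate("b"); allocate("c");
        collect(Set.of("b"));                // only "b" is still reachable
        System.out.println(fromSpace);       // [b] -- survivors sit contiguously, no fragmentation
    }
}
```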
Today's commercial virtual machines all use this algorithm to collect the young generation. IBM research has shown that 98% of young-generation objects "die young", so memory need not be divided 1:1. Instead it is split into one larger Eden space and two smaller Survivor spaces, and each allocation round uses Eden plus one of the Survivor spaces [1]. At collection time, the surviving objects in Eden and the in-use Survivor space are copied in one pass to the other Survivor space, and then Eden and the just-used Survivor space are cleared. By default, HotSpot sizes Eden and Survivor at a ratio of 8:1, so the usable memory of the young generation is 90% of its capacity (80% + 10%), and only 10% is "wasted". Of course, the 98% figure only describes the general case; there is no guarantee that at most 10% of objects survive each collection, so when Survivor space is insufficient, other memory (here, the old generation) must be relied on for allocation guarantee (Handle Promotion).
Allocation guarantee works like taking out a bank loan: if our credit is good and we repay on time in 98% of cases, the bank may assume by default that we will repay next time too, requiring only a guarantor who promises that, should we fail to repay, the money can be deducted from his account; the bank then considers the loan risk-free. Memory allocation guarantee is the same: if the other Survivor space does not have enough room to hold the objects that survived the previous collection, those objects enter the old generation directly via the allocation guarantee mechanism. More about allocation guarantee for the young generation will be covered later in this chapter, when the execution rules of the garbage collectors are explained.
[1] It should be noted that HotSpot has had this generational layout from the beginning; it has no real connection to IBM's research, which is cited in this book only to illustrate the value of this layout.
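The 90% figure above is simple arithmetic over HotSpot's default -XX:SurvivorRatio=8, which sizes the young generation as Eden : Survivor : Survivor = 8 : 1 : 1; the small helper below (an illustration, not a JVM API) spells it out.

```java
public class SurvivorRatioDemo {
    // Fraction of the young generation usable at one time: Eden plus ONE
    // survivor space, out of Eden plus two survivor spaces.
    static double usableFraction(int survivorRatio) {
        double total = survivorRatio + 2;           // Eden + two Survivor spaces
        return (survivorRatio + 1) / total;         // Eden + one Survivor space
    }

    public static void main(String[] args) {
        System.out.println(usableFraction(8));      // 0.9 -> the 90% quoted in the text
    }
}
```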
3.3.3 Mark-Compact Algorithm
The copying algorithm has to perform more copy operations when object survival rates are high, so its efficiency drops. More importantly, if you do not want to waste 50% of the space, you need extra space for allocation guarantee against the extreme case in which 100% of the objects in the used half survive, so in the old generation this algorithm generally cannot be used directly.
Based on the characteristics of the old generation, the "Mark-Compact" algorithm was proposed. Its marking phase is the same as in mark-sweep, but the subsequent step does not sweep the recyclable objects directly; instead, all surviving objects are moved toward one end, and the memory beyond that boundary is then cleared outright. Figure 3-4 shows a schematic of the mark-compact algorithm.
3.3.4 Generational collection algorithm
Garbage collection in current commercial virtual machines uses "Generational Collection". There is nothing novel in the idea: memory is simply divided into blocks according to object lifetimes. The Java heap is typically divided into a young generation and an old generation, so that the most appropriate algorithm can be used for each. In the young generation, where each collection finds that large numbers of objects die and only a few survive, the copying algorithm is chosen, and a collection costs only the copying of the small number of survivors. In the old generation, where survival rates are high and there is no extra space to provide allocation guarantee, mark-sweep or mark-compact must be used.
3.4 Algorithm Implementation in HotSpot
Sections 3.2 and 3.3 introduced the object-liveness determination and garbage collection algorithms in theory. When these algorithms are implemented in the HotSpot virtual machine, their efficiency must be considered rigorously so that the virtual machine runs efficiently.
3.4.1 Enumerating root nodes
Take the operation of finding reference chains from GC Roots as an example. The nodes that can serve as GC Roots are found mainly in global references (such as constants or class static properties) and in execution contexts (such as the local variable tables of stack frames). Many applications now have method areas that alone run to hundreds of megabytes; checking every reference in them one by one would inevitably consume a great deal of time.
Moreover, the sensitivity of reachability analysis to execution time also shows up in GC pauses, because the analysis must be performed on a snapshot that guarantees consistency. "Consistency" here means that throughout the analysis the entire execution system appears frozen at a point in time; the object reference relationships must not keep changing while the analysis runs, otherwise the accuracy of the result cannot be guaranteed. This is one of the important reasons why all Java execution threads must be stopped during GC (Sun calls this "Stop The World"): even in the CMS collector, which is claimed to (almost) never pause, the threads must still stop while the root nodes are enumerated.
Because mainstream Java virtual machines today all use exact GC (a concept discussed in Chapter 1 when introducing the Exact VM's improvements over the Classic VM), there is no need, once the execution system has stopped, to check every reference location across all execution contexts and global areas; the virtual machine ought to have a way of knowing directly where object references are stored. In HotSpot this is achieved with a set of data structures called OopMaps. When a class is finished loading, HotSpot computes what type of data sits at what offset inside the object, and during JIT compilation it also records which locations on the stack and in registers hold references, so the GC can learn this information directly as it scans. Listing 3-3 below shows native code for the String.hashCode() method generated by the HotSpot Client VM. The call instruction at 0x026eb7a9 carries an OopMap record indicating that the EBX register and the stack location at offset 16 each hold an ordinary object pointer (Oop); the record is valid from the call instruction until 0x026eb730 (the start of the instruction stream) + 142 (the offset recorded in the OopMap) = 0x026eb7be, i.e. the hlt instruction.
Listing 3-3 Native code for the string.hashCode () method compiled
[Verified Entry Point]
0x026eb730: mov    %eax, -0x8000(%esp)
……
;; ImplicitNullCheckStub slow case
0x026eb7a9: call   0x026e83e0        ; OopMap{ebx=Oop [16]=Oop off=142}
                                     ; *caload
                                     ; - java.lang.String::hashCode@48 (line 1489)
                                     ; {runtime_call}
0x026eb7ae: push   $0x83c5c18        ; {external_word}
0x026eb7b3: call   0x026eb7b8
0x026eb7b8: pusha
0x026eb7b9: call   0x……
3.4.2 Safepoints
With OopMaps, HotSpot can complete GC Roots enumeration quickly and accurately, but a very real problem follows: a great many instructions can change the reference relationships, that is, change the contents of the OopMap. If an OopMap were generated for every single instruction, a huge amount of extra space would be needed and the space cost of GC would become very high.
In fact, HotSpot does not generate an OopMap for every instruction; as mentioned above, it records the information only at "particular positions" called safepoints. This means the program cannot pause for GC just anywhere, but only once it has reached a safepoint. Safepoint placement must be neither so sparse that the GC waits too long, nor so dense that it inflates the runtime load excessively. Safepoints are therefore chosen by the criterion of "whether the program has the potential to execute for a long time": since each instruction executes in a very short time, a program is unlikely to run for long merely because an instruction sequence is long; the most obvious sign of "long execution" is the reuse of an instruction sequence, such as a method call, a loop jump, or an exception jump, and it is instructions with these characteristics that generate safepoints.
Another issue with safepoints is how, when a GC occurs, to get all threads (not counting those executing JNI calls) to "run" to the nearest safepoint and then stop. There are two options: Preemptive Suspension and Voluntary Suspension. Preemptive suspension needs no active cooperation from the threads' executing code: when a GC occurs, all threads are first interrupted, and if a thread is found to have stopped somewhere other than a safepoint, it is resumed and allowed to "run" on to one. Hardly any virtual machine implementation today uses preemptive suspension to pause threads in response to a GC event.
The idea of voluntary suspension is that when the GC needs to interrupt the threads, it does not act on the threads directly but merely sets a flag; each thread actively polls this flag as it executes and suspends itself when it finds the flag true. The polling points coincide with the safepoints, plus the places where memory is allocated to create objects. The test instruction in Listing 3-4 is a polling instruction generated by HotSpot. When the threads need to be suspended, the virtual machine makes the memory page at 0x160100 unreadable; when a thread executes the test instruction it raises a trap, and the thread is suspended to wait in the pre-registered exception handler. In this way, a single assembly instruction both performs the safepoint poll and triggers the thread interrupt.
Listing 3-4 polling instructions
0x01b6d627: call   0x01b2b210        ; OopMap{[60]=Oop off=460}
                                     ; *invokeinterface size
                                     ; - Client1::main@113 (line 23)
                                     ; {virtual_call}
0x01b6d62c: nop                      ; OopMap{[60]=Oop off=461}
                                     ; *if_icmplt
                                     ; - Client1::main@118 (line 23)
0x01b6d62d: test   %eax, 0x160100    ; {poll}
0x01b6d633: mov    0x50(%esp), %esi
0x01b6d637: cmp    %eax, %esi
3.4.3 Safe Regions
Safepoints seem to solve perfectly the question of how to enter GC, but the reality is not quite so simple. The safepoint mechanism guarantees that an executing program will not have to wait long before encountering a safepoint at which GC can proceed. But what about a program that is "not executing"? A program that is not executing is one that has been allocated no CPU time: the typical example is a thread in the Sleep or Blocked state. Such a thread cannot respond to the JVM's interrupt request and "walk" to a safe place to suspend, and the JVM obviously cannot wait around for the thread to be scheduled again. For this case, Safe Regions are needed.
A safe region is a section of code within which reference relationships do not change. It is safe to start GC anywhere inside such a region. We can think of a safe region as a safepoint stretched out.
When a thread executes code inside a safe region, it first marks itself as having entered the region, so that if a GC is initiated during that time the JVM need not concern itself with threads so marked. When the thread is about to leave the safe region, it checks whether the system has finished the root enumeration (or the entire GC process); if so, the thread simply continues, otherwise it must wait until it receives the signal that it may safely leave.
Up to this point, the author has briefly introduced how the HotSpot virtual machine initiates a memory collection. How it actually performs the collection has not yet been covered, because that is determined by which GC collector the virtual machine employs, and a virtual machine usually contains more than one. Let us now look at what GC collectors HotSpot provides.
3.5 Garbage Collector
If the collection algorithms are the methodology of memory reclamation, the garbage collectors are its concrete implementations. The Java Virtual Machine Specification makes no rules about how a garbage collector should be implemented, so the collectors provided by different vendors and different virtual machine versions can differ greatly, and they generally provide parameters that let users combine collectors for different generations according to their applications' characteristics and requirements. The collectors discussed here are those of the HotSpot virtual machine as of JDK 1.7 Update 14 (in which the G1 collector became officially commercial; before that, G1 was still experimental). This virtual machine contains all the collectors shown in Figure 3-5.
Figure 3-5 shows seven collectors acting on different generations; a line between two collectors means they can be used together. The region in which a collector sits indicates whether it is a new-generation or an old-generation collector. In the following sections I will introduce the features, rationale, and usage scenarios of each of these collectors, focusing on CMS and G1, the two relatively complex collectors, to understand some details of their operation.
Before we introduce the individual characteristics of these collectors, let's be clear: although we will be comparing collectors, we are not trying to pick out a best one. There is no single best collector to date, let alone a universal collector; we can only choose the collector most appropriate for a specific application. This hardly needs proof: if there were a perfect collector usable in any situation, the HotSpot virtual machine would not need to implement so many different ones.
1. The Serial collector
The Serial collector is the most basic and longest-established collector, and was once (prior to JDK 1.3.1) the only choice for the new generation of the virtual machine. As the name suggests, it is a single-threaded collector, but "single-threaded" does not merely mean that it uses only one CPU or one collection thread to do the garbage collection work; more importantly, while it is collecting garbage, it must suspend all other worker threads until the collection ends. "Stop The World" may sound like a cool name, but this pause is initiated and completed automatically by the virtual machine in the background, stopping the user's normal working threads without the user's awareness or consent, which is unacceptable for many applications. Imagine your computer freezing for five minutes every hour of work. Figure 3-6 illustrates the Serial/Serial Old collectors in operation.
The virtual machine's designers fully understand how bad "Stop The World" is for the user experience, but they also feel somewhat wronged: "When your mother cleans your room, she will surely make you sit still in a chair or stay out of the room; could she ever finish cleaning if you kept throwing confetti around while she worked?" This is a legitimate conflict. Garbage collection may sound like cleaning a room, but it is actually far more complicated.
From JDK 1.3 up to the latest JDK 1.7, the HotSpot virtual machine development team has been working to eliminate or reduce the pauses of worker threads caused by memory reclamation: from the Serial collector to the Parallel collector, then to Concurrent Mark Sweep (CMS) and even the state-of-the-art Garbage First (G1) collector. We have seen ever better (and ever more complex) collectors emerge and user-thread pauses keep shrinking, but there is still no way to eliminate them entirely (not counting collectors in the RTSJ). The search for a better garbage collector continues!
By this point I seem to have described the Serial collector as an old, useless relic, but it is still the default new-generation collector for virtual machines running in Client mode. It also has an advantage over other collectors: it is simple and efficient (compared with the single-threaded operation of the other collectors). For an environment limited to a single CPU, the Serial collector, having no overhead of thread interaction, naturally achieves the highest single-threaded collection efficiency. In a user's desktop application scenario, the memory assigned to the virtual machine is generally not large; collecting a new generation of one or two hundred megabytes, or even a few tens of megabytes (only the new generation is collected here, and a desktop application's heap is rarely large), can be kept to a pause of somewhere between tens of milliseconds and a little over a hundred milliseconds. As long as this does not happen frequently, the pause is acceptable. Therefore, the Serial collector is a good choice for virtual machines running in Client mode.
2. ParNew collector
The ParNew collector is essentially a multi-threaded version of the Serial collector. Apart from using multiple threads for collection, everything else, including all the control parameters available to the Serial collector (e.g. -XX:SurvivorRatio, -XX:PretenureSizeThreshold, -XX:HandlePromotionFailure), the collection algorithm, Stop The World behavior, object allocation rules, and collection strategies, is identical to the Serial collector, and the two collectors also share a great deal of code in their implementation. Figure 3-7 shows the working process of the ParNew collector.
The ParNew collector does not offer much innovation over the Serial collector except for multi-threaded collection, but it is the preferred new-generation collector for many virtual machines running in Server mode, for a reason that is not about performance but is nonetheless important: besides the Serial collector, it is currently the only one that works with the CMS collector. In JDK 1.5, HotSpot introduced a garbage collector that could almost be considered revolutionary for strongly interactive applications: the CMS collector (more on this later in this section). This collector was the first truly concurrent collector in HotSpot, and for the first time it allowed the garbage collector thread to work (basically) at the same time as the user threads; in terms of the earlier example, your mother now cleans the room while you are still throwing paper scraps on the floor.
Unfortunately, CMS, as an old-generation collector, cannot work with the Parallel Scavenge collector, which already existed in JDK 1.4.0 [1], so in JDK 1.5, when CMS is used to collect the old generation, the new generation can only choose between the ParNew and Serial collectors. The ParNew collector is also the default new-generation collector selected by the -XX:+UseConcMarkSweepGC option; it can also be forcibly specified with the -XX:+UseParNewGC option.
Because of the overhead of thread interaction, the ParNew collector is not even guaranteed to outperform the Serial collector in a two-CPU environment implemented through hyper-threading technology. Of course, as the number of available CPUs increases, it becomes very useful for the efficient use of system resources during GC. By default it starts the same number of garbage collection threads as there are CPUs. In environments with very many CPUs (say 32: now that 4-core CPUs with hyper-threading are commonplace, servers with more than 32 logical CPUs are increasingly common), the number of garbage collection threads can be limited with the -XX:ParallelGCThreads parameter.
Note that starting with the ParNew collector, you’ll see several concurrent and parallel collectors later. Before you get confused, it’s worth explaining two terms: concurrency and parallelism. Both of these terms are concepts in concurrent programming, and in the context of talking about the garbage collector, they can be explained as follows.
- Parallel: Multiple garbage collection threads work in parallel, while the user threads remain in a waiting state.
- Concurrent: The execution of both the user thread and the garbage collector thread at the same time (but not necessarily in parallel and may occur alternately), with the user program continuing to run while the garbage collector is running on another CPU.
3. Parallel Scavenge collector
The Parallel Scavenge collector is a new-generation collector that uses the copying algorithm; it is also a parallel, multi-threaded collector. It looks just like ParNew, so what is special about it?
What distinguishes the Parallel Scavenge collector is its focus: whereas the goal of collectors such as CMS is to minimize the pause time of user threads during garbage collection, the goal of the Parallel Scavenge collector is to achieve a controllable Throughput. Throughput is the ratio of CPU time spent running user code to total CPU time consumed, i.e. throughput = time running user code / (time running user code + garbage collection time). If the virtual machine runs for a total of 100 minutes and garbage collection takes 1 minute, the throughput is 99%.
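The throughput definition above is easy to check with a few lines of Java (a trivial sketch; the class and method names are mine):

```java
public class Throughput {
    // throughput = user time / (user time + GC time)
    static double of(double userTime, double gcTime) {
        return userTime / (userTime + gcTime);
    }
}
```

With the book's example values (99 minutes of user code, 1 minute of GC), this yields 0.99, i.e. 99%.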
Shorter pause times suit programs that need to interact with the user, since good responsiveness improves the user experience; high throughput, by contrast, makes the most efficient use of CPU time to finish the program's computing tasks as soon as possible, and is mainly suitable for background tasks that do not require much interaction.
The Parallel Scavenge collector provides two parameters for precise throughput control: -XX:MaxGCPauseMillis, which sets the maximum garbage collection pause time, and -XX:GCTimeRatio, which directly sets the throughput.
The MaxGCPauseMillis parameter accepts any number of milliseconds greater than 0, and the collector will do its best to keep memory reclamation from taking longer than this value. Do not assume, however, that setting this parameter slightly smaller will make the system's garbage collection faster: the shorter GC pauses are bought at the expense of throughput and new-generation space. Shrinking the new generation makes collecting 300MB certainly faster than collecting 500MB, but it also makes garbage collection more frequent: where there used to be a 100ms pause every 10 seconds, there is now a 70ms pause every 5 seconds. Pause times did go down, but so did throughput.
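The arithmetic behind this example can be verified directly. The helper below (names mine) computes the fraction of wall-clock time spent in GC for one pause of a given length in each interval:

```java
public class PauseTradeoff {
    // Fraction of time spent in GC: one pause of pauseMs milliseconds
    // in every intervalSec seconds of wall-clock time.
    static double gcFraction(double pauseMs, double intervalSec) {
        return (pauseMs / 1000.0) / intervalSec;
    }
}
```

A 100ms pause every 10 seconds costs 1% of the time; a 70ms pause every 5 seconds costs 1.4%: each pause is shorter, yet the overall share of time lost to GC grows, which is exactly the throughput drop described above.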
The GCTimeRatio parameter takes an integer greater than 0 and less than 100: the ratio of garbage collection time to total time, effectively the reciprocal of throughput. If you set it to 19, the maximum allowed GC time is 5% of the total (1 / (1 + 19)); the default is 99, allowing at most 1% (1 / (1 + 99)) of the time to be spent on garbage collection.
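The relationship between the GCTimeRatio value and the allowed GC-time fraction is a one-liner (class and method names are mine):

```java
public class GcTimeRatio {
    // With -XX:GCTimeRatio=N, at most 1 / (1 + N) of total time
    // may be spent on garbage collection.
    static double maxGcFraction(int ratio) {
        return 1.0 / (1 + ratio);
    }
}
```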
Because of its close affinity with throughput, the Parallel Scavenge collector is often called the "throughput-first" collector. It has another parameter worth noting: -XX:+UseAdaptiveSizePolicy. This is a switch parameter: when it is turned on, there is no need to manually specify the size of the new generation (-Xmn), the ratio of Eden to Survivor space (-XX:SurvivorRatio), the size threshold for objects promoted directly to the old generation (-XX:PretenureSizeThreshold), and similar details; the virtual machine collects performance monitoring information about the running system and adjusts these parameters dynamically to provide the most appropriate pause time or maximum throughput. This style of regulation is called GC Ergonomics [1]. If the reader is unfamiliar with how the collector operates and finds manual tuning difficult, using the Parallel Scavenge collector together with the adaptive size policy, handing memory management tuning over to the virtual machine, is a good choice. Just set the basic memory parameters (such as -Xmx for the maximum heap), then give the virtual machine an optimization goal using MaxGCPauseMillis (if maximum pause time matters more) or GCTimeRatio (if throughput matters more), and the virtual machine takes care of the details. The adaptive size policy is also an important difference between the Parallel Scavenge collector and the ParNew collector.
4. Serial Old collector
Serial Old is the old-generation version of the Serial collector. It too is a single-threaded collector, using the "mark-compact" algorithm. The main significance of this collector is likewise to be used by virtual machines in Client mode. In Server mode, it has two main purposes: to be paired with the Parallel Scavenge collector in JDK 1.5 and earlier [1], and to serve as the fallback for the CMS collector when a Concurrent Mode Failure occurs during concurrent collection. Both uses are covered in more detail later. Figure 3-8 shows the working process of the Serial Old collector.
[1] To be precise, the Parallel Scavenge collector architecture includes its own PS MarkSweep collector for old-generation collection, rather than using the Serial Old collector directly. However, this PS MarkSweep collector is so close to Serial Old in implementation that many official sources refer to it directly as Serial Old, and this book does the same.
5. Parallel Old collector
Parallel Old is the old-generation version of the Parallel Scavenge collector, using multiple threads and the "mark-compact" algorithm. This collector only appeared in JDK 1.6; before that, the new-generation Parallel Scavenge collector was in a somewhat awkward position. The reason is that if the new generation chose Parallel Scavenge, the old generation had no choice but the Serial Old (PS MarkSweep) collector: because Serial Old "dragged down" performance on server-class applications, the single-threaded old-generation collection could not take full advantage of a server's multi-CPU processing power, so choosing the Parallel Scavenge collector would not necessarily maximize overall application throughput. In environments with large old generations and advanced hardware, the throughput of this combination might not even match that of ParNew plus CMS.
It was not until the Parallel Old collector appeared that the "throughput-first" collector finally had a worthy combination; for throughput-sensitive and CPU-resource-sensitive applications, the Parallel Scavenge plus Parallel Old combination can be given priority. Figure 3-9 shows the working process of the Parallel Old collector.
6. CMS collector
The CMS (Concurrent Mark Sweep) collector is a collector whose goal is to obtain the shortest collection pause time. At present, a large part of Java applications are concentrated on the server side of Internet sites or B/S systems. These applications pay special attention to the response speed of the server and hope that system pauses are as short as possible, so as to give users a better experience. The CMS collector is a good fit for such applications.
As the name (Concurrent Mark Sweep) suggests, the CMS collector is based on the "mark-sweep" algorithm. Its operation is more complex than that of the collectors described so far; the process is divided into four steps:
(1) Initial Mark
(2) Concurrent Mark
(3) Remark
(4) Concurrent Sweep
The initial mark and remark steps still require "Stop The World". The initial mark merely marks the objects that GC Roots can reach directly, and is very fast; the concurrent mark phase is the process of GC Roots Tracing; and the remark phase corrects the mark records of objects whose marks changed because the user program kept running during concurrent marking. The pause in this phase is generally slightly longer than in the initial mark, but far shorter than the concurrent mark.
Because the collector threads in the two longest phases, concurrent mark and concurrent sweep, can work alongside the user threads, the CMS collector's memory reclamation process is, on the whole, executed concurrently with the user threads. Figure 3-10 illustrates the concurrent and paused phases in the CMS collector's operation.
CMS is an excellent collector whose main advantages are already reflected in its name: concurrent collection and low pauses; some official Sun documentation also calls it the Concurrent Low Pause Collector. But CMS is far from perfect, and it has three notable drawbacks:
The CMS collector is very sensitive to CPU resources. In fact, any program designed for concurrency is sensitive to CPU resources. In its concurrent phases, CMS does not pause user threads, but it does slow the application down and reduce overall throughput by occupying some of the threads (or, put another way, CPU resources). By default, the number of collection threads CMS starts is (number of CPUs + 3) / 4; that is, with four or more CPUs, the collection threads occupy no less than 25% of CPU resources during concurrent collection, and this share declines as the number of CPUs grows. But when there are fewer than four CPUs (say, two), the impact of CMS on the user program can become significant: if the CPU load is already high and half the computing power is suddenly diverted to collector threads, the user program's execution speed abruptly drops by 50%, which is hard to accept. To counter this, the virtual machine offers a variant called Incremental Concurrent Mark Sweep (i-CMS), which, in the same way that PC operating systems of the single-CPU era used preemptive scheduling to simulate multitasking, lets the GC threads and user threads run alternately during concurrent marking and sweeping. This reduces the time the GC threads monopolize resources, so that although the whole garbage collection takes longer, the impact on the user program appears smaller; that is, the slowdown is less noticeable. In practice, i-CMS proved rather ineffective, and in current versions it has been declared "deprecated", i.e. no longer recommended for use.
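The default thread-count formula quoted above, (number of CPUs + 3) / 4 with integer division, behaves like this (helper names are mine):

```java
public class CmsThreads {
    // Default number of CMS concurrent collection threads.
    static int defaultConcThreads(int cpus) {
        return (cpus + 3) / 4; // integer division
    }

    // Share of total CPU capacity those threads occupy.
    static double cpuShare(int cpus) {
        return defaultConcThreads(cpus) / (double) cpus;
    }
}
```

With 2 CPUs the single collection thread takes half the machine, matching the 50% slowdown scenario; with 16 CPUs, 4 threads take exactly 25%.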
The CMS collector cannot handle Floating Garbage, and a "Concurrent Mode Failure" may occur, leading to another Full GC. Because the user threads are still running during CMS's concurrent sweep phase, new garbage naturally keeps arising as the program runs; this garbage appears after the marking process, so CMS cannot dispose of it in the current collection and has to leave it for the next GC. Such garbage is called "floating garbage". Also because user threads still need to run during the garbage collection phase, enough memory must be reserved for them to use; the CMS collector therefore cannot, like other collectors, wait until the old generation is almost completely full before collecting, but must set aside part of the space for the program to use during concurrent collection. Under the JDK 1.5 defaults, the CMS collector is activated once 68% of old-generation space is in use. This is a fairly conservative setting: if the old generation does not grow too fast in your application, you can raise the trigger percentage with the -XX:CMSInitiatingOccupancyFraction parameter to reduce the number of memory reclamations and obtain better performance; in JDK 1.6, the CMS start threshold was raised to 92%. If the memory reserved while CMS runs cannot meet the program's needs, a "Concurrent Mode Failure" occurs, and the virtual machine starts its fallback plan: the Serial Old collector is temporarily enabled to redo the old-generation collection, and the pauses become long. So setting -XX:CMSInitiatingOccupancyFraction too high easily causes a large number of "Concurrent Mode Failures", and performance drops instead.
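The occupancy trigger can be sketched as a simple predicate. This is a toy model: the real check happens inside the VM, and the class and method names here are invented:

```java
public class CmsTrigger {
    // True once old-generation usage reaches the
    // CMSInitiatingOccupancyFraction percentage of capacity.
    static boolean shouldStartCms(long usedBytes, long capacityBytes,
                                  int occupancyFraction) {
        return usedBytes * 100 >= capacityBytes * (long) occupancyFraction;
    }
}
```

With a 100-unit old generation at 70% usage, the JDK 1.5 default of 68 would have already started a CMS cycle, while the JDK 1.6 default of 92 would not yet.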
The final drawback, as mentioned at the start of this section, is that CMS is a collector based on the "mark-sweep" algorithm, which, as you may recall from the earlier algorithm introduction, means a great deal of space fragmentation is produced at the end of collection. When there is too much fragmentation, allocating large objects becomes very troublesome: often there is plenty of space left in the old generation, but no contiguous region large enough for the current object can be found, and a Full GC has to be triggered ahead of time. To address this, the CMS collector provides the -XX:+UseCMSCompactAtFullCollection switch parameter (enabled by default), which makes the CMS collector run a defragmentation and compaction process when it has to perform a Full GC. Memory compaction cannot be done concurrently: the space fragmentation problem goes away, but the pauses get longer. The virtual machine designers also provide another parameter, -XX:CMSFullGCsBeforeCompaction, which sets how many uncompacted Full GCs are executed before one with compaction follows (the default value is 0, meaning defragmentation happens on every entry into Full GC).
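The interplay of these two parameters can be modeled as a small counter. This is only a sketch with invented names; the real logic lives inside the collector:

```java
public class CompactionPolicy {
    private final int fullGcsBeforeCompaction; // -XX:CMSFullGCsBeforeCompaction
    private int uncompacted = 0;

    CompactionPolicy(int fullGcsBeforeCompaction) {
        this.fullGcsBeforeCompaction = fullGcsBeforeCompaction;
    }

    // Called once per Full GC; returns whether this one compacts.
    boolean nextFullGcCompacts() {
        if (uncompacted >= fullGcsBeforeCompaction) {
            uncompacted = 0; // compact now, restart the count
            return true;
        }
        uncompacted++;
        return false;
    }
}
```

With the default value 0, every Full GC compacts; with a value of 2, two uncompacted Full GCs are followed by a compacting one.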
7. G1 collector
The G1 (Garbage-First) collector is one of the most advanced achievements in collector technology today. It appeared in Sun's JDK 1.7 roadmap as an important evolutionary feature of the HotSpot virtual machine soon after the project's goals were established. An Early Access version of the G1 collector has been available for developers to experiment with since JDK 6u14, and G1 remained in "Experimental" status for several years until JDK 7u4, when Sun deemed it mature enough for commercial use and removed the "Experimental" label.
G1 is a garbage collector for server-side applications. The HotSpot development team has given it the mission of (in the longer term) replacing the CMS collector released in JDK 1.5. Compared to other GC collectors, G1 has the following characteristics.
Parallelism and concurrency: G1 can take full advantage of the hardware in multi-CPU, multi-core environments, using multiple CPUs (CPUs or CPU cores) to shorten Stop-The-World pauses. For GC actions that other collectors would have to stop Java threads to perform, G1 can still let the Java program continue executing concurrently.
Generational collection: as with other collectors, the generational concept is retained in G1. Although G1 can manage the entire GC heap on its own without the cooperation of other collectors, it handles newly created objects and old objects that have been around for a while and survived multiple GCs in different ways, for better collection results.
Spatial integration: unlike CMS's "mark-sweep" algorithm, G1 is, viewed as a whole, a collector based on the "mark-compact" algorithm, while locally (between two Regions) it is based on the "copying" algorithm. Either way, this means G1 does not produce memory fragmentation while running and leaves regular, contiguous free memory after a collection. This property helps the program run for a long time: allocations of large objects will not trigger the next GC prematurely for lack of contiguous memory.
Predictable pauses: this is another big advantage of G1 over CMS. Reducing pause time is a common concern of both G1 and CMS, but besides pursuing low pauses, G1 can also establish a predictable pause-time model, letting the user specify that, within a time segment of length M milliseconds, no more than N milliseconds may be spent on garbage collection. This is almost a feature of a real-time Java (RTSJ) garbage collector.
Whereas the collectors before G1 collected the entire new generation or the entire old generation, G1 no longer works that way. With the G1 collector, the memory layout of the Java heap is very different from that under other collectors: it divides the whole Java heap into multiple independent Regions of equal size, and although the concepts of new generation and old generation are retained, they are no longer physically separated; each is a collection of Regions (which need not be contiguous).
The reason the G1 collector can establish a predictable pause-time model is that it can systematically avoid whole-heap garbage collection across the entire Java heap. G1 tracks how much garbage has accumulated in each Region (the amount of space a collection would reclaim and the empirically predicted time it would take), maintains a priority list in the background, and, within the allowed collection time, collects the most valuable Region first (hence the name Garbage-First). This use of Regions, together with prioritized Region collection, ensures that the G1 collector achieves the highest possible collection efficiency in a limited time.
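The "most valuable first, within the pause budget" idea can be sketched like this. It is a greatly simplified model: real G1 uses much richer cost prediction, and all names here are invented:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RegionSelection {
    record Region(String name, long reclaimableBytes, long predictedMs) {}

    // Greedily pick the regions with the most reclaimable garbage per
    // millisecond of predicted work, staying within the pause budget.
    static List<Region> choose(List<Region> regions, long pauseBudgetMs) {
        List<Region> byValue = new ArrayList<>(regions);
        byValue.sort(Comparator.comparingDouble(
                (Region r) -> (double) r.reclaimableBytes / r.predictedMs)
                .reversed());
        List<Region> chosen = new ArrayList<>();
        long spent = 0;
        for (Region r : byValue) {
            if (spent + r.predictedMs <= pauseBudgetMs) {
                chosen.add(r);
                spent += r.predictedMs;
            }
        }
        return chosen;
    }
}
```

Given three regions each predicted to take 10 ms and a 20 ms budget, the two with the most reclaimable space are chosen and the third is deferred to a later collection.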
The idea of G1 "chopping up" memory may seem easy to understand, but the implementation details are far from simple; otherwise it would not have taken nearly a decade from the first G1 paper, published by Sun Labs in 2004, to a commercial version. Take one detail as an example: once the Java heap is divided into multiple Regions, can garbage collection really proceed Region by Region? It sounds obvious, but a little more thought exposes the problem: Regions cannot be isolated. An object allocated in one Region is not necessarily referenced only by other objects in that Region; it can be referenced by any object anywhere in the Java heap. So when doing a reachability analysis to determine whether an object is alive, wouldn't the whole Java heap have to be scanned? This problem is not unique to G1, only more pronounced there. Under generational collection, the new generation is generally much smaller than the old generation and is collected much more frequently, and collecting objects in the new generation faces the same problem: if a Minor GC also had to scan the old generation, its efficiency might drop considerably.
In the G1 collector, as in other collectors' handling of references between the new and old generations, the virtual machine uses a Remembered Set to avoid whole-heap scans. Each Region in G1 has a corresponding Remembered Set. When the virtual machine detects that the program is writing to a Reference-typed field, it generates a Write Barrier that briefly interrupts the write and checks whether the object being referenced lives in a different Region from the referencing object (in the generational example: whether an old-generation object references a new-generation object). If so, the relevant reference information is recorded, via a CardTable, in the Remembered Set of the Region to which the referenced object belongs. When a memory collection takes place, adding the Remembered Set to the enumeration scope of the GC Roots guarantees that nothing is missed without a full heap scan.
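The barrier-plus-Remembered-Set bookkeeping can be modeled in a few lines. Regions are reduced to plain integers and every name is invented, so this is only a conceptual sketch of the mechanism described above:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RememberedSets {
    // For each region: which other regions hold references into it.
    static final Map<Integer, Set<Integer>> rsets = new HashMap<>();

    // Conceptual write barrier, run when an object in fromRegion is
    // made to reference an object in toRegion.
    static void writeBarrier(int fromRegion, int toRegion) {
        if (fromRegion != toRegion) { // only cross-region references
            rsets.computeIfAbsent(toRegion, k -> new HashSet<>())
                 .add(fromRegion);
        }
    }

    // When collecting one region, only these regions (plus GC Roots)
    // need scanning -- not the entire heap.
    static Set<Integer> regionsToScan(int region) {
        return rsets.getOrDefault(region, Set.of());
    }
}
```

Note that same-region references are never recorded: the Remembered Set only has to capture references that cross a Region boundary, which is what keeps a per-Region collection from degenerating into a full heap scan.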
Leaving aside the operations that maintain the Remembered Set, the operation of the G1 collector can be roughly divided into the following steps:
- Initial Marking
- Concurrent Marking
- Final Marking
- Live Data Counting and Evacuation
Readers familiar with how the CMS collector works will notice that the first few steps of G1 closely resemble CMS. The initial marking phase merely marks the objects that GC Roots can reach directly and changes the value of TAMS (Next Top at Mark Start), so that new objects can be created in correctly available Regions when the user program runs concurrently in the next phase; this phase requires pausing the threads, but is very short. The concurrent marking phase starts from GC Roots and performs a reachability analysis of the objects in the heap to find the live ones; this phase takes a long time, but can run concurrently with the user program. The final marking phase corrects the part of the marking records that changed because the user program kept running during concurrent marking: the virtual machine records object changes during this period in thread Remembered Set Logs, and the final marking phase merges the data from the Remembered Set Logs into the Remembered Set. This phase requires pausing the threads, but can be executed in parallel. Finally, in the live data counting and evacuation phase, the value and cost of each Region are first sorted, and a collection plan is made according to the pause time the user expects from GC. According to information disclosed by Sun, this phase could also be executed concurrently with the user program, but because only part of the Regions is collected and the time is under the user's control, pausing the user threads greatly improves collection efficiency. Figure 3-11 gives a clear view of the concurrent and paused phases in the G1 collector's operational steps.