One, foreword
Garbage collection (GC) is one of the key features that distinguishes Java from C++. In C++, developers must manage memory and release objects themselves, while Java developers can concentrate on business logic because the JVM performs garbage collection automatically. In this respect Java is more convenient. That does not mean we can ignore how GC works, though: without that understanding it is easy to cause memory leaks, frequent GC pauses, and OutOfMemoryError (OOM) problems. Understanding GC is necessary for writing high-performance applications and for solving performance bottlenecks.
To understand how GC works, we must first understand how the JVM manages memory, so that we know which objects to reclaim, when to reclaim them, and how.
Two, JVM memory management
According to the JVM specification, the JVM divides memory into the following regions:
1. Method Area
2. Heap
3. VM Stack
4. Native Method Stack
5. Program Counter Register
The method area and the heap are shared by all threads; the VM stack, the native method stack, and the program counter are private to each thread.
2.1 Method Area
The method area stores information about loaded classes (such as the class name, modifiers, and so on), static variables, constants declared with final, and the fields and methods (including constructors) of each class. The method area is shared globally and can be garbage-collected under certain conditions. When it exceeds its allowed size, a java.lang.OutOfMemoryError: PermGen space error is thrown (on JDK 8 and later the message is java.lang.OutOfMemoryError: Metaspace).
In the HotSpot virtual machine, this region corresponds to the Permanent Generation (up to JDK 7; JDK 8 replaced it with Metaspace, which lives in native memory). GC rarely runs on the method area, which is one reason it was called the "permanent" generation, but that does not mean it is never collected. Collection there mainly targets the constant pool and the unloading of classes, and because the conditions for unloading a class are strict, little memory is usually reclaimed there.
The Runtime Constant Pool is the part of the method area that stores constants and symbolic references generated by the compiler. Constants are usually determined at compile time, but not always: constants created at run time can be added as well. For example, the String class maintains a string pool that String.intern() works with: if a string equal to "hello" is already in the pool, intern() returns a reference to the pooled instance; otherwise it adds this string to the pool and returns a reference to it.
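A minimal sketch of the intern() behavior described above (the class and method names are made up for illustration). Note that since JDK 7, HotSpot keeps the string pool in the heap rather than in the permanent generation, but the lookup semantics are the same:

```java
class InternDemo {
    // Returns true when intern() maps a string built at run time onto the pooled literal.
    static boolean internReturnsPooledInstance() {
        String literal = "hello";              // string literals are placed in the string pool
        String built = new String("hello");    // a distinct heap object with the same contents
        String interned = built.intern();      // looks "hello" up in the pool
        return built != literal && interned == literal;
    }

    public static void main(String[] args) {
        System.out.println(internReturnsPooledInstance()); // prints true
    }
}
```

The new String(...) call always creates a fresh object, so only the interned reference is identical (==) to the literal.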
2.2 Heap area
The heap is where GC happens most often, so it is the most important area for understanding GC. It is shared by all threads and is created when the virtual machine starts. It stores object instances and arrays; practically every object created with new is allocated here.
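To get a rough feel for the heap from inside a program, the standard Runtime methods totalMemory(), freeMemory() and maxMemory() can be used. The sketch below (the class name HeapInfo is hypothetical) allocates some arrays and prints the figures; the exact numbers depend on the JVM, its flags, and when GC happens to run:

```java
class HeapInfo {
    // Approximate bytes currently occupied on the heap: committed minus free.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long before = usedHeapBytes();
        int[][] blocks = new int[100][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new int[10_000];        // each array is allocated on the heap
        }
        long after = usedHeapBytes();
        System.out.printf("max heap:       %d MB%n", rt.maxMemory() / (1024 * 1024));
        System.out.printf("committed heap: %d MB%n", rt.totalMemory() / (1024 * 1024));
        // A GC may run between the two readings, so "after" is only a rough figure.
        System.out.printf("used: %d KB -> %d KB (%d arrays allocated)%n",
                before / 1024, after / 1024, blocks.length);
    }
}
```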
2.3 VM Stack
The virtual machine stack is private to each thread and has the same lifetime as the thread. Each time a method is executed, a stack frame is created to store the method's local variable table, operand stack, dynamic linking information, and return address. When a method is invoked, its stack frame is pushed onto the stack; when the call returns, the frame is popped.
The local variable table holds the method's local variables: values of primitive types and references to objects. Its size is therefore determined at compile time and does not change while the method runs.
The JVM specification defines two exceptions for the virtual machine stack: StackOverflowError and OutOfMemoryError. A StackOverflowError is thrown if a thread's call depth exceeds the maximum depth the virtual machine allows. Many virtual machines let the stack grow dynamically; if it needs to grow but not enough memory is available, an OutOfMemoryError is thrown.
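The StackOverflowError case is easy to reproduce: every recursive call pushes another stack frame until the thread's stack is exhausted. A minimal sketch (names are illustrative; the depth reached depends on the stack size, which can be set with -Xss, and on JIT compilation):

```java
class StackDepthDemo {
    private static int depth;

    // Each call pushes a new stack frame onto the current thread's VM stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Recurses until the stack overflows and returns how deep the calls went.
    static int measureMaxDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Thrown once the call depth exceeds what this thread's stack can hold.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("max recursion depth: " + measureMaxDepth());
    }
}
```

Catching StackOverflowError is done here only to make the limit visible; in real code it usually signals a bug such as unbounded recursion.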
2.4 Native Method Stack
The native method stack supports the execution of native methods and stores the execution state of each native method call. It works the same way as the virtual machine stack; the only difference is that the VM stack serves Java methods, while the native method stack serves native methods. Some virtual machines (such as HotSpot, the default VM in Sun's JDK) merge the two into a single stack.
2.5 Program Counter Register
The program counter is a small memory area that cannot be manipulated by the program. It indicates which bytecode instruction the current thread is executing: the bytecode interpreter selects the next instruction by changing the counter's value, and basic control flow such as branches, loops, and jumps depends on it.
Each program counter can track the position of only one thread, and every thread must be able to resume at the correct instruction after a thread switch, so each thread has its own program counter; it is thread-private.
If the thread is executing a Java method, the program counter holds the address of the bytecode instruction being executed. If it is executing a native method, the counter's value is undefined. This is the only memory area for which the JVM specification defines no OutOfMemoryError condition.
Three, GC mechanism
As a program runs, objects and variables occupy more and more memory. If that memory is not reclaimed in time, the program can slow down or even fail with errors such as OutOfMemoryError.
Of the five memory areas described above, three do not require garbage collection: the native method stack, the program counter, and the virtual machine stack. Their lifetime matches that of the thread, so their memory is released automatically when the thread ends (and each stack frame is freed as soon as its method returns). Therefore only the method area and the heap need garbage collection, and the objects to be collected are those that are no longer referenced by anything still in use.
3.1 Search Algorithm
The classic approach is reference counting: each object has a reference counter that is incremented whenever a new reference to the object is created and decremented whenever a reference is dropped; when the counter reaches zero, the object can be reclaimed. This algorithm has an obvious flaw: when two objects reference each other but are no longer used by anything else, they should be reclaimed, yet their counters never reach zero, so their memory is never freed. For this reason Sun's JVM does not use reference counting; it uses an approach called root search (reachability analysis), as shown here:
The basic idea is to start from a set of root objects called GC Roots and search downward along references. If an object cannot be reached from any GC Root, it is no longer in use and can be reclaimed. Typical GC Roots include references held in local variables of VM stack frames, static fields and constants in the method area, and references held by native methods (JNI). For example, Object5, Object6, and Object7 in the figure above still reference one another, but none of them is reachable from the roots, so all three can be reclaimed; this fixes the flaw of the reference counting algorithm.
Supplementary concept: since JDK 1.2, Java distinguishes four kinds of references: strong, soft, weak, and phantom. Strong references: an ordinary reference, such as the one obtained from new. The GC never reclaims a strongly reachable object; the JVM would rather throw an OutOfMemoryError. Soft references: the referent is reclaimed only when memory runs low, and always before the JVM throws an OutOfMemoryError. Weak references: the referent is reclaimed at the next GC, regardless of whether memory is sufficient. Phantom references: the weakest kind, which you can think of as mostly rounding out the set of four. They do not affect the object's lifetime at all, and get() always returns null. Their only purpose is to let the program be notified (through a ReferenceQueue) after the object has been collected, which supports cleanup work similar to what finalize() is used for.
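The difference between the two approaches can be sketched on a toy object graph. This is a hypothetical model, not how the JVM represents objects: Object1 to Object4 hang off a root, while Object5, Object6 and Object7 reference one another in a cycle that no root reaches. Reference counting finds no garbage because every counter is still 1, while the root search finds the whole cycle:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy object graph (hypothetical) comparing reference counting with root search.
class ReachabilityDemo {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        int refCount;                        // number of references pointing at this node
        Node(String name) { this.name = name; }
    }

    static void link(Node from, Node to) {
        from.refs.add(to);
        to.refCount++;
    }

    // Builds the heap: a root reaches Object1..Object4; Object5 -> 6 -> 7 -> 5 is an orphaned cycle.
    static List<Node> buildHeap(List<Node> roots) {
        List<Node> heap = new ArrayList<>();
        for (int i = 1; i <= 7; i++) heap.add(new Node("Object" + i));
        Node o1 = heap.get(0), o2 = heap.get(1), o3 = heap.get(2), o4 = heap.get(3);
        Node o5 = heap.get(4), o6 = heap.get(5), o7 = heap.get(6);
        roots.add(o1);
        o1.refCount++;                       // the root's reference counts too
        link(o1, o2); link(o1, o3); link(o3, o4);
        link(o5, o6); link(o6, o7); link(o7, o5);
        return heap;
    }

    // Reference counting: only nodes whose counter is zero are garbage.
    static Set<String> garbageByRefCount(List<Node> heap) {
        Set<String> garbage = new TreeSet<>();
        for (Node n : heap) if (n.refCount == 0) garbage.add(n.name);
        return garbage;
    }

    // Root search: everything not reachable from the roots is garbage.
    static Set<String> garbageByReachability(List<Node> heap, List<Node> roots) {
        Set<Node> reached = new HashSet<>();
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (reached.add(n)) work.addAll(n.refs);
        }
        Set<String> garbage = new TreeSet<>();
        for (Node n : heap) if (!reached.contains(n)) garbage.add(n.name);
        return garbage;
    }

    public static void main(String[] args) {
        List<Node> roots = new ArrayList<>();
        List<Node> heap = buildHeap(roots);
        System.out.println("reference counting: " + garbageByRefCount(heap));        // []
        System.out.println("root search:        " + garbageByReachability(heap, roots)); // [Object5, Object6, Object7]
    }
}
```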
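The java.lang.ref package exposes soft, weak and phantom references directly. The sketch below (class and method names are illustrative) only relies on behavior the specification guarantees: a weak reference is not cleared while its referent is still strongly reachable, and PhantomReference.get() always returns null. Whether a particular System.gc() call clears soft or weak referents is up to the JVM, so those results are printed rather than relied on:

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceTypesDemo {
    // A weak reference is not cleared while the referent is still strongly reachable.
    static boolean weakSurvivesWhileStronglyReferenced() {
        Object target = new Object();                    // strong reference
        WeakReference<Object> weak = new WeakReference<>(target);
        System.gc();                                     // only a request; the JVM may ignore it
        return weak.get() == target;                     // target is still in use here
    }

    // A phantom reference never hands its referent back, even before collection.
    static Object phantomGet() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(new Object(), queue);
        return phantom.get();                            // always null
    }

    public static void main(String[] args) {
        SoftReference<byte[]> soft = new SoftReference<>(new byte[1024]);
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc();
        // Typically the soft referent survives (memory is not low) and the weak one
        // is cleared, but neither outcome is guaranteed for a single System.gc() call.
        System.out.println("soft cleared: " + (soft.get() == null));
        System.out.println("weak cleared: " + (weak.get() == null));
        System.out.println("weak kept while strongly referenced: " + weakSurvivesWhileStronglyReferenced());
        System.out.println("phantom get(): " + phantomGet());
    }
}
```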
Finally, a class can be reclaimed (unloaded) only when all of the following conditions are met:
A. All instances of the class have been reclaimed.
B. The ClassLoader that loaded the class has been reclaimed.
C. The java.lang.Class object corresponding to the class is not referenced anywhere, so the class cannot be accessed through reflection.
3.2 Memory Partitions
Memory is mainly divided into three parts: the Young Generation, the Old Generation, and the Permanent Generation. Because the objects in each generation behave differently, each uses different GC algorithms. The young generation holds short-lived objects that are created and discarded quickly, while the old generation holds long-lived objects. In Sun's HotSpot virtual machine, the permanent generation is the method area (some JVMs do not have a permanent generation at all).
Young Generation: divided into Eden and a Survivor space, and the Survivor space is further split into two halves of equal size, From space and To space. Newly created objects are allocated in Eden. When Eden runs out of space, a young-generation collection, called a Minor GC (also known as a Young GC), is triggered: objects still alive in Eden and in From space are copied into To space, and the roles of the two Survivor halves are then swapped.
Old Generation: stores objects that have survived many young-generation collections, such as cache objects. When the old generation fills up, it must be collected; a collection of the old generation is called a Major GC. It is often also called a Full GC, although strictly speaking a Full GC collects the entire heap.
Permanent Generation: the method area in Sun's HotSpot JVM; most other JVMs do not have this generation.
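The flow from the young to the old generation can be modeled in a few lines. This is a simplified, hypothetical model: each object carries an age that grows with every Minor GC it survives, and it is promoted to the old generation once that age reaches a threshold. In HotSpot the threshold is controlled by -XX:MaxTenuringThreshold, and objects are also promoted early when a Survivor space overflows; that case is omitted here:

```java
import java.util.ArrayList;
import java.util.List;

// Toy generational model; names and thresholds are illustrative, not HotSpot internals.
class GenerationalDemo {
    static final class Obj {
        final String name;
        boolean live = true;   // whether the program still references the object
        int age;               // number of Minor GCs survived
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();   // occupied Survivor half
    List<Obj> to = new ArrayList<>();     // empty Survivor half
    final List<Obj> old = new ArrayList<>();
    final int tenuringThreshold;

    GenerationalDemo(int tenuringThreshold) { this.tenuringThreshold = tenuringThreshold; }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);                      // new objects start in Eden
        return o;
    }

    // Minor GC: copy live objects from Eden and From into To (or promote old enough
    // ones to the old generation), drop the dead ones, then swap From and To.
    void minorGc() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!o.live) continue;        // dead objects are simply not copied
            o.age++;
            if (o.age >= tenuringThreshold) old.add(o);
            else to.add(o);
        }
        eden.clear();
        from.clear();
        List<Obj> tmp = from; from = to; to = tmp;
    }

    public static void main(String[] args) {
        GenerationalDemo heap = new GenerationalDemo(3);
        Obj cache = heap.allocate("cache");           // long-lived object
        for (int gc = 1; gc <= 3; gc++) {
            Obj temp = heap.allocate("temp" + gc);
            temp.live = false;                        // short-lived: dead before the next GC
            heap.minorGc();
            System.out.println("after GC " + gc + ": survivors=" + heap.from.size()
                    + " old=" + heap.old.size());
        }
        System.out.println("cache promoted: " + heap.old.contains(cache));
    }
}
```

Running main shows the long-lived cache object staying in a Survivor space for two Minor GCs and being promoted on the third, while the short-lived temp objects are never copied at all.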
3.3 GC algorithm
Common GC algorithms: copying, mark-sweep, and mark-compact.
Copying: starting from the root set, the collector finds the live objects and copies them into a free area, as shown in the figure:
When there are few live objects, copying is very efficient (the young generation, where most objects die quickly, uses this algorithm). The price is an extra area that must be kept free, plus the cost of moving objects.
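A sketch of the copying idea on a simulated heap, using Cheney's breadth-first algorithm (one common way to implement copying collection, not necessarily what any particular JVM does). Each slot holds the indices of the objects it references, and null marks a free slot; the class name is hypothetical. Live objects are copied into to-space in the order they are found, so they end up packed together, and unreachable ones are never touched:

```java
import java.util.Arrays;

// Cheney-style copying collection over a simulated heap of int[] "objects".
class CopyingCollector {
    private final int[][] from;
    private final int[][] to;
    private final int[] forward;   // forwarding address per from-space slot, -1 = not copied yet
    private int free;              // next free slot in to-space

    private CopyingCollector(int[][] fromSpace) {
        from = fromSpace;
        to = new int[fromSpace.length][];
        forward = new int[fromSpace.length];
        Arrays.fill(forward, -1);
    }

    // Copies an object into to-space the first time it is seen and returns its new address.
    private int evacuate(int addr) {
        if (forward[addr] < 0) {
            to[free] = from[addr].clone();     // fields still hold old addresses for now
            forward[addr] = free++;
        }
        return forward[addr];
    }

    // Returns the new to-space; roots are rewritten in place to their new addresses.
    static int[][] collect(int[][] fromSpace, int[] roots) {
        CopyingCollector gc = new CopyingCollector(fromSpace);
        for (int i = 0; i < roots.length; i++) {
            roots[i] = gc.evacuate(roots[i]);
        }
        // Breadth-first scan: fix up the fields of each copied object, copying whatever
        // they point to. gc.free grows while the scan is running.
        for (int scan = 0; scan < gc.free; scan++) {
            int[] fields = gc.to[scan];
            for (int f = 0; f < fields.length; f++) {
                fields[f] = gc.evacuate(fields[f]);
            }
        }
        return gc.to;   // unreachable objects were never copied
    }

    public static void main(String[] args) {
        // Slots 0 and 3 form an unreachable cycle; root 1 -> 4 -> 5 is live; slot 2 is free.
        int[][] fromSpace = { {3}, {4}, null, {0}, {5}, {} };
        int[] roots = {1};
        int[][] toSpace = collect(fromSpace, roots);
        System.out.println(Arrays.deepToString(toSpace));   // [[1], [2], [], null, null, null]
        System.out.println("root now at slot " + roots[0]);  // root now at slot 0
    }
}
```

Note how the work done is proportional to the number of live objects only, which is why copying pays off when most objects are already dead.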
Mark-sweep: starting from the root set, the collector marks all live objects, then scans the whole space and clears every unmarked object. The marking and sweeping process is as follows:
In the figure above, the blue parts are referenced objects and the brown parts are not. The marking stage must traverse all reachable objects, which is time-consuming.
The sweep stage then clears the unreferenced objects, and the surviving objects are kept where they are.
Mark-sweep does not move objects; it only clears the dead ones, so it is efficient when most objects in the space are still alive. However, because it frees memory in place without rearranging it, it causes memory fragmentation.
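A mark-sweep sketch on a simulated heap in which each slot holds the indices of the objects it references and null marks a free slot (an illustrative model; the names are made up). The mark phase traces from the roots, and the sweep phase frees unmarked slots in place, which is exactly where fragmentation comes from: after the sweep below there are three free slots, but no two of them are adjacent:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Mark-sweep over a simulated heap of int[] "objects". Illustrative only.
class MarkSweepDemo {
    // Marks from the roots, frees unmarked objects in place, and returns the free slots.
    static List<Integer> markSweep(int[][] heap, int[] roots) {
        boolean[] marked = new boolean[heap.length];
        Deque<Integer> stack = new ArrayDeque<>();
        for (int r : roots) stack.push(r);
        while (!stack.isEmpty()) {               // mark phase: trace reachable objects
            int addr = stack.pop();
            if (marked[addr]) continue;
            marked[addr] = true;
            for (int ref : heap[addr]) stack.push(ref);
        }
        List<Integer> freeList = new ArrayList<>();
        for (int i = 0; i < heap.length; i++) {  // sweep phase: scan the whole heap
            if (heap[i] != null && !marked[i]) heap[i] = null;
            if (heap[i] == null) freeList.add(i);
        }
        return freeList;                         // live objects were not moved
    }

    // Size of the largest run of adjacent free slots.
    static int largestFreeRun(int[][] heap) {
        int best = 0, run = 0;
        for (int[] slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        // Root 0 -> 2 -> 4 are live; slots 1, 3 and 5 hold garbage.
        int[][] heap = { {2}, {}, {4}, {}, {}, {} };
        List<Integer> free = markSweep(heap, new int[] {0});
        System.out.println("free slots: " + free);                         // free slots: [1, 3, 5]
        System.out.println("largest free block: " + largestFreeRun(heap)); // largest free block: 1
    }
}
```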
Mark-compact: like mark-sweep, this algorithm first marks the live objects, but instead of only clearing the dead ones, it moves the live objects toward one end (the left) of the space and then updates every reference to the moved objects, as shown in the following figure:
Because the live objects are moved together, this algorithm avoids the fragmentation problem of mark-sweep, but moving them adds cost. (This algorithm suits the old generation.)
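A mark-compact sketch in the classic three-step style: compute forwarding addresses, update references, then slide objects to the left. It uses a simulated heap where each slot holds the indices of the objects it references and null marks a free slot; the model and names are hypothetical. All live objects end up contiguous at the start of the heap, leaving one contiguous free area:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Mark-compact (sliding) collection over a simulated heap of int[] "objects".
class MarkCompactDemo {
    // Mark phase: trace everything reachable from the roots.
    static boolean[] mark(int[][] heap, int[] roots) {
        boolean[] marked = new boolean[heap.length];
        Deque<Integer> stack = new ArrayDeque<>();
        for (int r : roots) stack.push(r);
        while (!stack.isEmpty()) {
            int addr = stack.pop();
            if (marked[addr]) continue;
            marked[addr] = true;
            for (int ref : heap[addr]) stack.push(ref);
        }
        return marked;
    }

    // Slides live objects to the left end of the heap and fixes up all references.
    static void markCompact(int[][] heap, int[] roots) {
        boolean[] marked = mark(heap, roots);
        // 1. Forwarding addresses: live objects get consecutive slots starting at 0.
        int[] forward = new int[heap.length];
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) forward[i] = next++;
        }
        // 2. Update references held by live objects and by the roots.
        for (int i = 0; i < heap.length; i++) {
            if (!marked[i]) continue;
            for (int f = 0; f < heap[i].length; f++) heap[i][f] = forward[heap[i][f]];
        }
        for (int r = 0; r < roots.length; r++) roots[r] = forward[roots[r]];
        // 3. Move: forward[i] <= i, so a left-to-right pass never overwrites
        //    a live object that has not been moved yet.
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) heap[forward[i]] = heap[i];
        }
        Arrays.fill(heap, next, heap.length, null);   // everything after the live data is free
    }

    public static void main(String[] args) {
        // Root 0 -> 2 -> 4 are live; slots 1, 3 and 5 hold garbage.
        int[][] heap = { {2}, {}, {4}, {}, {}, {} };
        int[] roots = {0};
        markCompact(heap, roots);
        System.out.println(Arrays.deepToString(heap)); // [[1], [2], [], null, null, null]
    }
}
```

The extra passes over the heap (forwarding, updating, moving) are the "increased cost" mentioned above; in exchange, the free space is a single block.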
Four, Garbage collectors
In the JVM, GC is performed by the garbage collector, so in a practical application scenario, we need to choose the appropriate garbage collector, which we will describe below.
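To see which collectors a running JVM actually uses, the standard java.lang.management API returns one GarbageCollectorMXBean per collector, together with its collection count and accumulated collection time. A minimal sketch (the class name GcInfo is illustrative):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcInfo {
    // Names of the collectors active in this JVM (usually one young and one old).
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-25s collections=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

On HotSpot the names typically reveal the collector pairing, for example "Copy" and "MarkSweepCompact" with -XX:+UseSerialGC, "PS Scavenge" and "PS MarkSweep" with -XX:+UseParallelGC, and "G1 Young Generation" and "G1 Old Generation" with -XX:+UseG1GC. The exact names vary between JDK versions, and CMS (-XX:+UseConcMarkSweepGC) was removed in JDK 14.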
4.1 Serial GC
The Serial GC is the oldest and most basic collector, but it is still widely used today, and it was the default for client-mode virtual machines in Java SE 5 and Java SE 6. It suits systems with a single processor: both Minor and Major GCs are performed by a single thread. Its biggest characteristic is that all application threads must be paused while it collects ("stop the world"). Some applications find this hard to accept, but if an application's real-time requirements are modest and pauses can be kept within a few tens of milliseconds, most applications tolerate it well, and in practice the Serial GC has not disappointed. It remains the default client-level choice for single-CPU machines, small young generations, and applications that are not latency-sensitive.
4.2 ParNew GC
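ParNew is essentially a multithreaded version of the Serial collector for the young generation. Using multiple threads makes it more efficient on multi-core servers, and it can be paired with the CMS collector for the old generation, which is the main reason it was used on the server side.
4.3 Parallel Scavenge GC
Parallel Scavenge also collects the young generation with multiple threads throughout scanning and copying. Unlike ParNew, its main goal is throughput (the share of CPU time spent running application code) rather than the shortest possible pauses, so it suits multi-CPU applications such as background batch processing that can tolerate somewhat longer pauses. It was the default collector for server-class machines until G1 became the default in JDK 9.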
4.4 Concurrent Mark Sweep (CMS) Collector
The goal of CMS is to minimize pause times, addressing the long stop-the-world pauses of the Serial collector. Typical B/S (browser/server) applications, which need high concurrency and fast responses, suit this collector. CMS is implemented with the mark-sweep algorithm.
Advantages of the CMS collector: concurrent collection and low pauses. It is, however, far from perfect.
Disadvantages of the CMS collector:
A. The CMS collector is very sensitive to CPU resources. During the concurrent phases it does not pause user threads, but it takes CPU time away from them, which slows the application down and reduces total throughput.
B. The CMS collector cannot handle floating garbage (garbage created by user threads while a concurrent collection is running, which can only be collected in the next cycle). If the old generation fills up before a concurrent cycle finishes, a "Concurrent Mode Failure" occurs and the JVM falls back to a stop-the-world Full GC.
C. Because the CMS collector is based on the mark-sweep algorithm, it also produces memory fragmentation.
4.5 G1 Collector
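G1 brings many improvements over the CMS collector. First, it divides the heap into many equally sized regions and compacts live objects as it collects them (mark-compact), so it does not produce memory fragmentation. Second, it can control pause times more precisely, aiming to stay within a pause-time target set by the user.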
4.6 Serial Old Collector
Serial Old is the old-generation version of the Serial collector. It also uses a single thread and is based on the mark-compact algorithm. It is mainly used by VMs running in Client mode.
4.7 Parallel Old Collector
Parallel Old is the old-generation version of the Parallel Scavenge collector. It uses multiple threads and the mark-compact algorithm.
4.8 RTSJ garbage collector
The RTSJ garbage collector is used for real-time Java programming (RTSJ stands for the Real-Time Specification for Java).
Five, the summary
A thorough understanding of the JVM's memory layout and GC mechanism helps us write high-performance code and gives us ideas and directions for code optimization.