We know that the JVM tuning is the main tone of the choice of the garbage collector and parameter setting, so we must grasp knowledge of recycling, otherwise how tuning, then what is garbage, garbage in our analogy life, is not, to clear, so the first step is to find the garbage, In Java we have two approaches.
How to define garbage
Reference counting method
Add a reference counter to the object that is +1 every time it is referenced. When the reference is invalid, the counter value is -1. An object with a counter of 0 at any time is no longer likely to be used and is judged unreachable for GC cleanup. This method is almost dead, because if AB holds references to each other, it can never be recycled, as shown in the following code.
/** * @author jack xu */ public class ReferenceCount { public Object instance = null; public static void main(String[] args) { testGC(); } public static void testGC() { ReferenceCount a = new ReferenceCount(); ReferenceCount b = new ReferenceCount(); a.instance = b; b.instance = a; a = null; b = null; }}Copy the code
As you can see from the figure below, these two objects can no longer be accessed, but because they refer to each other, their reference counts will never be zero, and the reference counting algorithm will never tell the GC collector to reclaim them. This leads to memory leaks, which eventually lead to memory leaks.
Accessibility analysis
The basic idea of the accessibility analysis algorithm is to search down from these nodes through some objects called GC Roots as the starting point, and the search path is called the Reference Chain. When an object is not connected to GC Roots by any reference chain (that is, unreachable from the GC Roots node to the node), the object is proved to be unavailable, as shown in the figure below.
A list of objects that can be used as GC Roots in Java:
- Objects referenced in the Java virtual machine stack (the local variable table in the stack frame)
- Objects referenced by JNI (commonly referred to as Native methods) in the Native method stack.
- The object referenced by the class static property in the method area
- A reference object to a constant in a method area.
- Class loader
- Thread Thread class
Garbage collection algorithm
After determining what garbage can be collected, all the garbage collector has to do is start garbage collection, but there is a question: how can garbage collection be done efficiently? There are three algorithms.
Mark-sweep
This is the most basic garbage collection algorithm. The mark-sweep algorithm is divided into two phases: the mark phase and the sweep phase. The mark stage is to mark all objects that need to be reclaimed, and the clear stage is to reclaim the space occupied by the marked objects.
Disadvantages:
- Discontinuous position, resulting in debris
- Low efficiency, two times of scanning, marking and cleaning are time-consuming
Copying algorithms
In order to solve the defects of the Mark-sweep algorithm a Copying algorithm was proposed. It divides the available memory by capacity into two equally sized pieces, one of which is used at a time. When this area of memory is used up, the surviving objects are copied to the other area, and then the used memory space is cleaned up all at once, so that the problem of memory fragmentation is not likely to occur.
Advantages: No debris, continuous space
Disadvantages: causes 50% of memory space to be always free and wasted!
Mark-compact
In order to solve the defects of Copying algorithm and make full use of memory space, a Mark-Compact algorithm is proposed. The marking phase of this algorithm is the same as Mark-sweep, but after the marking is done, instead of cleaning up the recoverable objects directly, it moves all the living objects to one end and then cleans up memory beyond the end boundary.
Advantages: No debris, continuous space
Disadvantages: low efficiency, two scanning, pointer need to adjust
Generational collection algorithm
Generational Collection is not strictly an idea or theory, but a combination of three basic algorithm ideas and different algorithms for different situations.
We know that most objects are dead, so we divide the heap into the new generation, the old generation, and the immortal generation (called metacreage in JDK8) so that they can do different garbage collection for different generations. The Cenozoic generation was further divided into Eden and Survivor zones, with a ratio of 8:1:1.
Let me illustrate the generational garbage collection process using a diagram:
Step 1: The newly allocated object will be placed in Eden, and when Eden is full, the Minor GC will be triggered, and the Minor GC will clean up all garbage that is not used in the young generation, including S0 and S1.
Step 2: The objects in Eden that have not been cleared are the ones that survived. Place their age +1 in zone S0
Step 3: When Eden is full, minor GC is triggered again, and the surviving objects in Eden are aged +1 in S1, and the surviving objects in S0 are aged +1 in S1, so s0 is empty
Step 4: Repeat
Step 5: When the survivor reaches the age, enter the old age, the default is 15 years old (default is 6 years old in CMS), this age can be adjusted by yourself
Step 6: If the old generation is full, either the Major or full GC is triggered. The so-called Stop the World (STW) phenomenon occurs when full GC is triggered. The larger the memory, the longer the STW, so it’s not just that bigger is better.
After looking at the diagram above, we know that the young generation uses the replication algorithm, because most objects have a short life cycle and are recycled frequently, so the replication algorithm is highly efficient. The old age is replaced by the tag clearing or tag sorting algorithm, because in the old age objects live longer and there is no need to copy and copy.
Garbage collector
The garbage collector is a concrete implementation of the garbage collection algorithm, namely, the landing. We introduce the following common collectors.
Serial/Serial Old
The Serial/Serial Old collector is The most basic and oldest collector. It is a single-threaded collector, and while it is garbage collecting, all user threads must be paused, stopping The World. Serial collectors are collectors targeted at the new generation using a Copying algorithm. Serial Old collectors are collectors for older generations that use the Mark-Compact algorithm. Its advantage is that it is simple and efficient to implement, but its disadvantage is that it brings users to a standstill.
Parameter control: -xx :+UseSerialGC -xx :+ UseSerialOldGC
ParNew
The collector is essentially a multithreaded version of the Serial collector, with multiple threads doing garbage collection in parallel. It is more efficient than Serial on multi-core cpus and comparable to Serial on single-core cpus. In the new generation, use copy algorithm, use with CMS.
Parameter control: -xx :+UseParNewGC
Parallel Scavenge/Parallel Old
The Parallel Scavenge collector is a new generation of multithreaded collectors. It also looks like ParNew, but Parallel Scanvenge is more concerned with system throughput and uses the Copying algorithm.
The Parallel Old is an older version of the Parallel Avenge collector, also a Parallel collector that uses multithreading and the Mark-Compact algorithm, and is also more focused on throughput.
Parameter control: -xx: +UseParallelGC -xx: +UseParallelOldGC
CMS
CMS (Concurrent Mark Sweep) collector is a collector aimed at obtaining the shortest collection pause time. It is a Concurrent collector and adopts the Mark-sweep algorithm.
Note: parallelism refers to garbage collection threads executing in parallel, and refers to user threads executing together with garbage collection threads.
Its operation process is more complex than the previous collectors, the whole process is divided into four steps, including:
- CMS Initial Mark
- CMS Concurrent Mark
- Re-marking (CMS Remark)
- CMS Concurrent sweep
The initial marking and re-marking steps still need to “Stop The World”. Initial marking only marks the objects directly associated with GC Roots, which is very fast. Concurrent marking is the process of GC Roots Tracing, while re-marking is to revise the marking records of those objects whose marks are changed due to the continuous operation of user programs during concurrent marking. The pause time in this phase is generally slightly longer than in the initial tagging phase, but much shorter than in concurrent tagging.
Since the collector thread can work with the user thread during the longest concurrent markup and concurrent cleanup, the CMS collector’s memory reclamation process is generally executed concurrently with the user thread.
Parameter control: -xx :+UseConcMarkSweepGC
G1
At the forefront of today’s collector technology, the G1 collector is a server oriented collector that takes full advantage of a multi-CPU, multi-core environment. So it’s a parallel and concurrent collector, and it can set pause times. Compared to the CMS collector, the G1 collector has the following features:
1, space integration, G1 collector adopts mark-collation algorithm, will not generate memory space fragmentation. Allocating large objects does not trigger the next GC prematurely because contiguous space cannot be found.
2, predictable pauses, this is another big advantage of G1, reduce the pause time is the common concern of G1 and CMS, but G1 in addition to the pursuit of low pause, also can establish predictable pauses model, can let the user specify in a length of N segment within milliseconds, time on garbage collection may not consume more than N milliseconds, This is almost already a feature of the real-time Java (RTSJ) garbage collector.
The garbage collector mentioned above collects the entire Cenozoic or old generation, which is no longer the case with G1. When using the G1 collector, the memory layout of the Java heap is very different from that of the other collectors. It divides the entire Java heap into independent regions of equal size. While the concept of new generation and old generation is retained, the new generation and old generation are no longer physically separated. They are all collections of partial (possibly discontinuous) regions.
The working process is as follows:
- Initial Marking: Marking the objects that GC Roots can associate with and modifying the value of TAMS requires suspending the user thread
- Concurrent Marking: Reachability analysis from GC Roots to identify viable objects to be executed concurrently with the user thread
- Final Marking: Fixes the need to suspend the user thread during the concurrent Marking phase when data changes due to concurrent execution of the user program
- Live Data Counting and Evacuation: Sequence the recovery value and cost of each Region, and develop recovery plan parameter control based on the expected GC downtime of the user: -xx: +UseG1GC
Garbage collector selection
The choice of garbage collector is based on two key metrics, pause times and throughput.
- Pause time: the time it takes for the garbage collector to respond to the execution of the garbage collection terminal application
- Throughput: run user code time /(run user code time + garbage collection time)
The shorter the pause time, the more suitable for the need to interact with the user program, good response speed can improve the user experience; High throughput can make efficient use of CPU time and complete the operation tasks of the program as soon as possible, which is mainly suitable for the operation in the background without too much interaction.
Let’s subclass the garbage collector introduced above
- Concurrent collectors [pause time first] : CMS, G1 —-> are suitable for scenarios where relative time is required, such as the Web
- Parallel collector [throughput first] : Parallel Scanvent and Parallel Old —-> is suitable for scientific computing, background processing and other weak interaction scenarios
- Serial collector: Serial and Serial Old —-> Suitable for small memory, embedded devices
Finally, when JamesGosling designed Java in 1995, he didn’t realize that The language would have more Web development in The future, that The pause times would be smaller, that everything would be serialized at first, that it would have to Stop The World, which is hard to imagine now, Imagine if you are high on Taobao, suddenly the page does not open, this can you tolerate? Java8’s default PS+Parallel Old was introduced, which was throughput first. Later, Java8 and Java9 had higher time requirements, including CMS, G1, and ZGC, which is not covered in this article. So the garbage collector has been constantly updated as The Times have progressed.