Generational garbage collection
Fundamentals of Garbage Collection
As shown in the figure below: Garbage collector mainly collects heap memory, which is divided into: new generation and old generation. For collecting new generation GC: Minor GC or Young GC. Collecting older GC is called “Major” or “Old” GC. Note that the Full GC does not only reclaim heap memory, but also the method area (in JDK1.8, the method area is in metadata; In JDK1.7 method area in permanent generation) generation recycling theory:
- The vast majority (98%) of ephemeral objects were placed in the Cenozoic era
- The harder it is to recycle objects that have survived multiple garbage collections
So after the above theory is divided into the new generation and the old age, then what is the garbage recycling algorithm?
Algorithms in the new generation: replication algorithms
In the old algorithm: mark clearing algorithm, mark sorting algorithm
Garbage collection algorithm
Replication algorithm
Simple to implement, efficient to run, no memory fragmentation, but only half of the space utilization.
What is the principle of the replication algorithm?
- You need to split the memory space in two, with half used and the other half reserved
As shown in the figure below, half of the space is used as reserved space that does not allocate objects before GC, while the other half allocates objects. The circle we added in the figure below is an object, and when half of the space is full, GC garbage collection is performed.
So there’s the next step.
- Copy the surviving objects to the reserved space. At this time, the reserved space will allocate objects. Any surviving objects will be recycled directly, and the other half of the space will be formatted as reserved space.
Note: In the last article I talked about how to determine the survival of objects: reachability analysis and reference counting
We assume that only two of the above four objects survive, and then the two objects will be copied into the reserved space, and the previous space will be directly formatted. After the second step, the result will be as follows:
- The new object is then placed in the current space, and if the space is full again, the second step is continued.
Now let’s look at how the new generation of heap uses the replication algorithm.
Source of Eden District: Appel recycling, improving space utilization rate and space allocation guarantee the source of Eden area, which is mainly the basis to solve the above replication algorithm theory. The problem that the space utilization rate is only half, Eden area accounts for 80% from 10% to 10%. Almost all objects are allocated in Eden area first. It will judge the survival of the object (98% of the objects die overnight), collect the garbage, and enter the From area. After that, Eden will be emptied to prepare for the next object allocation. When Eden area is full again, garbage collection will continue To be triggered. Eden’s surviving objects will be placed in To area, and the surviving objects in From area will also enter To area after reachablity analysis. You may only understand the text, but let me draw a picture to show it.
- Almost all objects in Eden will be allocated here. Generally, large objects will enter the old age directly
- When Eden space is full, garbage collection will be triggered to judge the viable objects according to root reacability and allocate the viable objects to From area. Meanwhile, Eden area will be cleared to prepare for the next allocation of objects
- When Eden space is full again, GC is triggered To judge the viable objects according To the root reachability, and the viable objects are copied To the To area. Meanwhile, the viable objects in From area are copied To the To area after reachability analysis
- When Eden is full again, the same step is triggered, and the surviving objects are copied back and forth between From and To.
- After the above n steps, we know that From area and To area only have 10% space. When Eden area is full, the surviving objects should be moved To To area. However, after n steps, there are too many surviving objects that exceed 10% reserved space, and the space allocation guarantee operation will be triggered at this time, and the area will enter the old age area. The JVM also has an optimized operation, which I mentioned in a few previous articles, which increases the generational age of objects in the new generation without a single GC, and when the generational age reaches 15(not absolute) and objects are still alive, they enter the old age region. (OK step, will be the previous knowledge will be connected)
This illustration illustrates the new generation of replication algorithms. The new generation of replication algorithms use Eden area to reduce the space utilization rate to only 10% reserved, and the rest of the space is used.
Note: if it is a large object it will go straight to the old age.
Mark-sweep algorithm
Discontinuous position, resulting in fragmentation; Slightly lower efficiency, two scans; Space utilization rate 100%
- Scan space according to the reachability analysis to mark recyclable objects, marked.
- The scan space is cleared according to the objects that can be recycled.
The token clearing algorithm causes memory discontinuity and fragmentation.
Suppose an object occupies 5 grids of memory, but the reclaimed space is discontinuous and has no more space to store, so it cannot be allocated.
So you have the mark-collation algorithm, which collates memory Spaces that are not contiguous
Mark-compact Algorithm
No memory fragmentation, low efficiency, two scans, Pointers need to be adjusted.
The efficiency of the mark-up algorithm is low, because the pointer needs to be moved. Suppose that the pointer of an object is at position 1, but the mark-up pointer changes to position 10, then the corresponding reference needs to be changed. If the method is changed, all threads need to pause to update the corresponding reference.
- On the first scan, the recyclable object is marked, which is similar to the first step of the tag removal algorithm
- The second scan clears the recyclable objects and defiles the memory space. (This step is one more step than the mark clearing algorithm to make the space continuous)
A common garbage collector in the JVM
- Single-threaded garbage collector
- Multithreaded parallel garbage collector
- Concurrent garbage collector
collector | Reclaim objects and algorithms | Collector type |
---|---|---|
Serial | New generation, replication algorithm | Single threaded (serial) |
Parallel Scavenge | New generation, replication algorithm | Parallel multithreaded collector |
ParNew | New generation, replication algorithm | Parallel multithreaded collector |
Serial Old | In the olden days, tag sorting algorithms | Single threaded (serial) |
Parallel Old | In the olden days, tag sorting algorithms | Parallel multithreaded collector |
CMS | In the olden days, tag clearing algorithms | Concurrent multithreaded collector |
G1 | Cross Cenozoic and old age: mark finishing + divide whole into parts | Concurrent multithreaded collector |
- Single thread collection Serial/Serial Old:
Parameter: UseSerialGC the oldest single thread, exclusive. Only applicable to tens of megabytes to one or two hundred megabytes, each garbage collection will Stop The World (STW), Stop The business thread, and update The reference of The business thread after garbage collection.
Figure below, single thread collection
All user threads are paused and the GC thread is started during GC
STW(Stop The World) : The focus of garbage collector development is to reduce STW. With the development of garbage collectors, there are multi-threaded garbage collectors and concurrent garbage collectors
- Avenge /Parallel Old
The Parallel Scavenge/Parallel is OldJDK1.8 default garbage collector
Apply the Insane
-sheldon: I’m trying to Parallel Old
Garbage collector is multi-threaded to improve collection efficiency
Here are the relevant parameters:
Throughput = time to run user code /(Time to run user code + garbage collection time)
UseAdaptiveSizePollcy: completes the maximum throughput
The Parallel Insane /Parallel Old garbage collector seeks throughput, but the STW doesn’t decrease much.
What harm will STW bring?
Assuming that STW garbage collection is paused for 10s and the request response takes only 1s then the total request is 1s + 10s= 11s
To reduce STW time, the concurrent garbage collector was created.
- Concurrent garbage collector CMS(Concurrent Mark Sweep)/ParNew
CMS is mainly aimed at older garbage collectors, mainly to reduce STW. The Parallel Exploiture is the next-generation garbage collector, but the Parallel Exploiture does not reduce the STW insane, so ParNew is used to pair CMS with ParNew, which is responsible for the new generation and CMS for the older generation. CMS uses the tag clearing algorithm, so how does CMS concurrent tag clearing? See the following:
- Initial tagging: Tagging objects that GC roots are directly related to is very fast. The business thread is suspended at this point
- Concurrent tags: In the process of GC Roots Tracing, there will be a large amount of tags and a long time to mark recyclable objects. Therefore, the concurrent method is adopted to let the business thread and the concurrent tags run together without suspending the business thread. The GC and the user thread run at the same time, there is bound to be some deviation, some variation and unmarked in between, so there is re-marking
- Relabelling: In order to correct the record of the part of the markup that changed during the concurrent tagging because the user program continued to run, this process is longer than the initial tagging, but far longer than the concurrent tagging end, so all user threads (STW) are suspended and the rest of the tag is marked for a short time.
- Concurrent cleanup: In the above mentioned tag cleanup algorithm, it is necessary to delete the marked recyclable objects one by one. This operation must take a long time, so use concurrent cleanup to let the user and the GC run simultaneously.
- Reset the thread
Concurrent tagging and concurrent cleanup can be very effective in reducing STW.
CMS is an excellent collector, and its main advantages are already evident in its name: concurrent collection, low pauses
Disadvantages of CMS:
- Sensitive to CPU resources: Programs designed for concurrency are sensitive to CPU resources. Although they will not cause user threads to pause in the concurrent stage, they will occupy a part of the threads, which will slow down the application and reduce the total throughput. By default, CMS starts the number of garbage collection threads (NUMBER of cpus +3)/4, that is, when there are more than 4 cpus, concurrent garbage collection threads should not be less than 25% of the CPU resources. When there are less than 4 cpus, CMS’s impact on user programs is likely to become very large.
- Unable to handle floating garbage: Since the CMS concurrent cleanup phase user threads are still running, new garbage is naturally generated as the program runs. This part of the garbage appears after the marking process, and the CMS cannot collect and process it again, leaving it for the next GC to clean up. If the old age 100m, when to 92m trigger garbage collection, and this time the old age has occupied 90m, and this time generated 20m floating garbage how to deal with? Serial Old will be used instead
- When the space fragmentation is too much, it will bring great trouble to the allocation of large objects. There will often be residual old space, but it cannot find enough continuous space to allocate the current object.
The problems with CMS garbage collectors are numerous. It is very cumbersome to maintain, so in JDK1.7, JDK1.8, CMS is not set as the default garbage collector. But CMS is the cornerstone of the concurrent tag garbage collector, and while there are many problems, the concurrent tag idea of CMS is what makes the subsequent G1 good garbage collector
Why doesn’t CMS use a mark-up algorithm? If the use of the tag sorting algorithm, need to clean and tidy at the same time the user thread will run, and the tag sorting involves pointer adjustment, then the CMS has to do one more step, but the use of tag clearing only need to clear the tag. After CMS, the JVM introduced the G1 garbage collector.
- G1(Garbage First) Garbage collector
G1: Pursue pause time; Region area; Screening recovery; Predictable pause; Copy and tag collation algorithms
G1 divides the whole heap into parts, creating Region:
E and S are Cenozoic;
O is for old times
H means big objects are old
The G1 garbage collector operates in the following steps:
- Initial markup: Just Mark objects that GC Roots can be directly associated with, and modify the value of TAMS (Nest Top Mark Start) so that objects can be created in the correct Region when the user program runs concurrently in the next phase, which requires a short pause of the thread.
- Concurrent marking: Reachability analysis of objects in the heap, starting with GC Root, is performed to find viable objects, which is time-consuming but can be performed concurrently with user programs.
- Final mark: In order to correct the part of the tag record that changed during the concurrent tag because the user program continued to operate, the virtual machine recorded the changes in the thread’s Remembered Set Logs during this time. The final marking phase requires the consolidation of data from the Remembered Set Logs into the Remembered Set. This phase requires the thread to be paused, but can be performed in parallel.
- Filter collection: The collection value and cost in each Region are first sorted, and a collection plan is made based on the expected GC pause time of the user. In this phase, concurrent execution can also be performed with user programs. However, since only part of the Region is reclaimed, the time is controlled by the user, and suspending the user thread greatly improves collection efficiency.
Parameters for starting G1 garbage collector:
G1 will be difficult to understand, so I will explain it in a separate article, mainly to understand the design idea and operation mode of G1.