Several algorithms for garbage collection
- Generational collection theory Vm garbage collection adopts generational collection algorithm, that is, memory is divided into logical blocks based on the life cycle of objects. Java divides memory into new generation and old generation, so you can select an appropriate collection algorithm based on the characteristics of objects in different blocks. In Cenozoic, for example, each collect a large number of objects are removed, so you can choose to copy algorithm, only need to pay a small amount of object replication costs can be completed each garbage collection, and the object of the old s survival rate is very high, and there is no extra space allocated guarantee, so you can choose to clear and marking sorting algorithm. Tag clearing and tag collation will be much slower than the copy algorithm, about ten times slower
- Tag – copy memory replication algorithm is divided into two blocks of the same size, use of time can only use one of the blocks (that is, can only use half of memory), garbage collection, will be those marked not garbage, copied to the other half block, and then directly remove the original block of memory. This can be very efficient and there is no memory fragmentation, but the problem is memory waste
- The mark-clear algorithm is divided into two phases: mark/clear. Mark stage: mark the objects that should survive. Clear stage: uniformly reclaim all unmarked memory. This operation will be inefficient, if there are a large number of objects, it will take a long time to mark, and after cleaning up the memory, there will be a large amount of memory fragmentation
- – sorting algorithms will be marking and finishing two stages, mark phase is also will object tags, which are not garbage sorting phase is to those who is marked (not garbage objects) to the mobile memory end, clear and then directly outside the boundary of memory Such operations benefits is no memory fragments
Garbage collector
These three collection algorithms are memory collection methodologies, and the garbage collector is an implementation of these methodologies
There is no best garbage collector, only the best garbage collector. We can’t find the best one, but we can choose the garbage collector that suits our application scenario
Serial collector (-xx :+UseSerialGC -XX:+UseSerialOldGC)
The Serial collector is The most basic and oldest garbage collector. It is “Serial” not only because it has only one thread for garbage collection, but also because while it is performing GC, all user threads Stop The World (STW: Stop The World) until garbage collection is complete.
The new generation adopts copy algorithm and the old generation adopts mark-collation algorithm
Serial’s STW is very user experience intensive, so in subsequent garbage collectors, efforts have been made to reduce STW times, but Serial has some advantages: for example, even a single thread can perform collections efficiently because there is no overhead of thread interaction
The SerialOld collector, an older version of the Serial collector, is also a single-threaded garbage collector. It has two main uses:
- Used with the Parallel Scanvenge collector in jdk1.5 and earlier
- As a fallback to the CMS collector
Avenge collector (-xx :+UseParallelGC, -xx :+UseParallelOldGC)
The PS+PO collector is the default garbage collector for jdk1.8
The PS collector is essentially a multithreaded version of the Serial collector, with the exception of multithreaded garbage collection, its behavior (control parameters, collection algorithms, collection strategies, and so on) is similar to that of the Serial collector. The default number of phone threads is the same as the number of CPU cores. Of course, you can specify the number of threads to collect with the parameter (-xx :ParallelGCThreads), but you generally don’t need to change this.
While the PS collector is concerned with throughput (efficient utilization of CPU), a collector like CMS is more concerned with the pause times of user threads.
The PS collector provides a number of parameters for the user to find the best STW time or maximum throughput. If you are not familiar with how the collector works, you can choose to leave memory management optimization to the virtual machine.
The new generation of PS+PO collector adopts the copy algorithm, and the old generation adopts the mark-collation algorithm.
The PO collector is an older version of the PS collector, using multithreading and mark-collation algorithms, and PS+PO is an option where throughput and CPU resources are important.
ParNew collector (-xx :+UseParNewGC)
The PN collector is similar to the PS collector, except that it can be used in conjunction with the CMS collector
The new generation adopts copy algorithm and the old generation adopts mark-collation algorithm
It is the first choice for many virtual machines running in Server mode, and is the only one that works with the CMS collector other than the Serial collector
CMS collector (-xx :+UseConcMarkSweepGC(old))
CMS(Concurrent Mark Sweep) translates into Concurrent Mark Sweep. The algorithm it implements is mark-clear. Therefore, memory fragmentation occurs.
CMS is a GC collector whose goal is to achieve minimal collection pause times, which is ideal for use in user experience oriented applications. It was the first garbage collector to allow the user thread to work (basically) at the same time as the GC thread.
It has five stages
- The initial tag suspends all worker threads (STW) and records objects that GC ROOTS can reference directly, which is fast
- Concurrent marking The concurrent marking phase is the process of traversing the entire object graph starting with the directly associated objects of GC ROOTS. This process is time consuming, but I don’t need STW, so the user experience is better, but the subsequent question is: when the user thread continues to run, will change the state of each object should be (should be GC thread decide to delete, but cited, ought to have been the GC thread decide to live, and yet be disconnected reference)
- Relabeling The relabeling phase addresses the problem of the second phase. The pause time at this stage is slightly longer than the initial tag pause and much shorter than the concurrent tag pause. The main use of tricolor markup in the incremental update algorithm for re-marking
- Concurrent cleanup starts the user thread, while the GC thread starts to clean the unmarked area. If new objects are marked black at this stage, no processing is done (multiple tags: floating garbage is raised).
- Concurrent reset Reset data for this GC mark (multi-mark)
CMS is a better garbage collector because it collects concurrently and has low pauses, but it has several very obvious drawbacks
- Sensitive to CPU resources (and user threads grabbing resources)
- There is no way to handle floating garbage (garbage generated during the concurrent marking and concurrent cleanup phases), which can only wait for the next GC to clean up
- Mark – clear algorithm can bring a lot of memory fragments, certainly through the parameter (- XX: + UseCMSCompactAtFullCollection) let the JVM the execution of the tag to remove do again after finishing
- There will be some uncertainty in the execution process: For example, the previous garbage collection may not have been completed, and then the garbage collection will be triggered again, especially during the concurrent marking and cleaning phases. The system may run while the garbage collection is being completed, and then the FullGC will be triggered before the garbage collection is completed, i.e. “Concurrent mode failure”. At this point, STW is performed directly and collected using the SerialOld garbage collector
Because CMS uses incremental update algorithm to solve multi-label problems, it also causes some low-probability bugs
Reachability analysis algorithm and three-color marking method
To perform a collection, the GC first marks which objects are alive and which are reclaimed, and then tells the GC thread to reclaim memory through various state markers. So how do you find out which objects are alive/should be reclaimed?
The limit to whether an object is garbage is whether it is also referenced by other objects. Hence reference counting
Reference counting method
Set aside space in the object header to store the number of references to the object. Each time an object is referenced, the count is +1. Each time a reference is broken, the count is -1. But this can be buggy
For example, class A and class B depend on each other, but no one else uses these two objects, but the reference count of these two objects is always 1 because they reference each other, then these two objects will never be cleaned up, resulting in A memory leak
Accessibility analysis algorithm
Reference counting method is the root cause of memory leaks occur, only consider themselves without considering the other object, in fact, in the right way should judge whether the object reference by other objects, and a program running, there must be some of the most fundamental object, will be like a tree has been reference to all objects, this is the principle of accessibility analysis algorithm.
The reachability analysis algorithm starts from GC ROOTS to traverse the object graph to see which objects are reachable. If not, it is garbage.
GC ROOTS are the most fundamental objects
- The object referenced in the virtual machine stack (the local variable table in the stack frame)
- The object referenced by the class static property in the method area
- The object referenced by the constant in the method area
- Objects referenced in the local method stack
Tricolor marking
As the name suggests, three colors mark objects, each representing a different tag state.
- Black indicates that the object has been accessed by the garbage collector and that all references to the object’s member variables have been scanned. If there are other object references to black objects, there is no need to rescan. A black object cannot point directly to a white object (without passing the gray object)
- Gray indicates that the object has been accessed by the garbage collector, but at least one of its member variable references has not been scanned
- White indicates that the object has not been accessed by the garbage collector. Obviously, at the beginning of the reachability analysis, all objects are white, and at the end of the analysis, objects that are still white represent unreachability.
Multi-label problem – Floating garbage
In the process of concurrent marking, the object A is reachable and will be marked as black, but the execution of the user thread may cause the object A to be disconnected from the GC ROOTS. Then the object A should be white, but since it has been marked as black previously, the GC collector will no longer pay attention to the object A. This means that the A object is overmarked, and the garbage collection will not clear the A object. This is the multi-label problem, where an A object is floating garbage that can only be collected by the next GC
Undermarked – deleted by mistake
During concurrent marking, object A is unreachable, but execution by the user thread makes object A reachable, but garbage collection has already started, which is A very serious bug.
Write barriers
In the JVM, assigning a value to a member variable of an object looks something like this
/** * @param field Member variable of an object, for example, A.B.D * @param new_value New value, for example, null */
void oop_field_store(oop* field, oop new_value) {
*field = new_value; // Assign
}
Copy the code
Write barriers include some processing before and after the assignment (similar to AOP)
void oop_field_store(oop* field, oop new_value) {
pre_write_barrier(field); // Write barrier ‐ pre-write operation
*field = new_value;
post_write_barrier(field, value); // Write barrier ‐ post write operation
}
Copy the code
Incremental Update -CMS solves under-indexing problems
When a new reference is inserted into a black object, the black object is changed to gray, and after the concurrent marking is complete, the re-marking phase takes the gray object as root, instead of starting with GC ROOTS and re-scanning it
The write barrier implements incremental update. When the member variable reference of object A changes, for example, the new reference a.d = d, the write barrier can be used to record the new member variable reference of object d
void post_write_barrier(oop* field, oop new_value) {
remark_set.add(new_value); // Record the newly referenced object
}
Copy the code
Incremental update bug
The GC1 thread is marking C in object A, and the object A is gray. Then the user thread points the reference to C to A white object F, so the GC2 thread sets the object A to gray, and the GC1 thread resumes running. The GC1 thread only knows that C is marked, D is marked, and D is marked, making sure ACD is marked successfully, turning A black, and F is missed.
Write Barrier + Snapshot At The Beginning SATB -G1
When the gray object wants to delete the reference relation to the white object, the reference to be deleted is recorded. After the concurrent scan, the gray object in the recorded reference relation is scanned again, so that the white object can be scanned. If other references point to the white object during the rescan process, the white object is marked as black directly (the purpose is to make the object survive the current GC and wait for the next GC to rescan, which may also be floating garbage).
Write barriers implement SATB
void post_write_barrier(oop* field, oop new_value) {
remark_set.add(new_value); // Record the newly referenced object
}
Copy the code
SATB fixes the incremental update missing mark bug at the expense of more floating garbage