Contents
- Garbage collection algorithm
- Mark Sweep
- Copying algorithm
- Mark-compact Algorithm
- Generational collection algorithm
- Generation area size parameter
- Memory allocation and reclamation policies
- Garbage collector
- Serial collector (copying algorithm)
- ParNew collector (copying algorithm)
- Parallel Scavenge collector (copying algorithm)
- Serial Old collector (mark-compact algorithm)
- Parallel Old collector (mark-compact algorithm)
- Concurrent Mark Sweep collector (CMS)
- G1 collector
- JVM parameters
❝
Four garbage collection algorithms, seven garbage collectors
❞
Whether an object is dead can be determined by reference counting or by reachability analysis. The previous article, "JVM garbage collection (1): asked in interviews, explained in plain language, beginners welcome", covered that. This article focuses on how the garbage is actually reclaimed!
Garbage collection algorithm
Zhou Zhiming's Understanding the Java Virtual Machine is excellent introductory material, and I referred to it when writing about the JVM. While writing this article, however, I found a mistake in it. After checking a large amount of material on Google and Wikipedia, I confirmed the problem. Many Chinese blogs repeat the same mistake, probably because they all draw on the same book.
❝
In the second edition of Understanding the Java Virtual Machine, on page 69, Zhou Zhiming writes: "First mark all the objects that need to be reclaimed; after marking is complete, reclaim all the marked objects."
❞
From the garbage collector's point of view, objects fall into two groups: those that need to be reclaimed and those that do not. Zhou says the objects to be reclaimed are marked, but in fact what is marked are the objects that do not need to be reclaimed.
Mark Sweep
The mark-sweep algorithm is the first and arguably the oldest GC algorithm. John McCarthy, the father of Lisp, published it in a paper in 1960. As the name suggests, it has two phases, mark and sweep, and it relies on reachability analysis to tell live objects from dead ones.
Reachability analysis decides whether objects are alive. The collector starts from a set of root objects (the GC Roots) and follows the chains of references. Every object reachable from a root is live, meaning it is still referenced, and must not be collected. Objects that cannot be reached from any root are dead, and the garbage collector reclaims them. In the figure below, Object5, Object7 and Object8 are examples:
(Figure: the mark-sweep algorithm)
- Mark: A garbage collection is triggered periodically, or when the heap runs short of space. Each collection first flags every memory unit allocated on the heap as "unreachable". It then scans from the set of root references and flags every unit reachable from them as "reachable". Whatever is still flagged unreachable is garbage, which is how the two groups are told apart.
- Sweep: reclaim the memory units still flagged as unreachable.
Disadvantages:
- Low efficiency: both marking and sweeping require traversing the heap;
- Fragmentation: as the figure shows, after collection the free memory is scattered in non-contiguous pieces.
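To make the two phases concrete, here is a minimal Java simulation on a toy heap. It is a sketch under stated assumptions, not HotSpot code. The heap is a hypothetical array of slots: `heap[i]` lists the slot indices that object `i` references, or is `null` when the slot is free. The `roots` array stands in for the GC Roots. Note that the mark phase flags the live objects, and that the sweep leaves free slots scattered between live ones.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy mark-sweep collector over a hypothetical slot-based heap (not HotSpot's implementation). */
class MarkSweepHeap {
    // heap[i] == null means slot i is free; otherwise it lists the slots object i references.
    final int[][] heap;

    MarkSweepHeap(int[][] heap) { this.heap = heap; }

    /** Mark: everything starts out "unreachable" (false); flag what the roots can reach. */
    boolean[] mark(int[] roots) {
        boolean[] reachable = new boolean[heap.length];
        Deque<Integer> work = new ArrayDeque<>();
        for (int r : roots) work.push(r);
        while (!work.isEmpty()) {
            int obj = work.pop();
            if (heap[obj] == null || reachable[obj]) continue; // free slot or already marked
            reachable[obj] = true;
            for (int ref : heap[obj]) work.push(ref);
        }
        return reachable;
    }

    /** Sweep: free every allocated slot that was not marked; returns the freed slot indices. */
    List<Integer> sweep(boolean[] reachable) {
        List<Integer> freed = new ArrayList<>();
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !reachable[i]) {
                heap[i] = null;
                freed.add(i);
            }
        }
        return freed;
    }

    /** Length of the longest run of contiguous free slots, a simple measure of fragmentation. */
    int largestFreeRun() {
        int best = 0, run = 0;
        for (int[] slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        // Slots 1 and 3 reference each other but are unreachable; 5 and 7 are unreachable too.
        MarkSweepHeap h = new MarkSweepHeap(new int[][] { {2}, {3}, {4}, {1}, {}, {}, {}, {5} });
        List<Integer> freed = h.sweep(h.mark(new int[] {0, 6}));
        System.out.println("freed " + freed + ", largest free run = " + h.largestFreeRun());
        // prints: freed [1, 3, 5, 7], largest free run = 1
    }
}
```

Four slots are freed, but the largest contiguous free run is a single slot. That is exactly the fragmentation problem. The cycle between slots 1 and 3 is reclaimed as well, because reachability from the roots decides liveness, not reference counts.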
How the problem was discovered:
For sweeping, you only need to know what to reclaim and what to keep. It seems either group could be marked, but only if you can tell the two apart. In the JVM, reachability analysis does separate the objects to reclaim from the objects to keep. But when the original mark-sweep algorithm was published there was no JVM; the Java language did not even exist.
So, thinking about the algorithm apart from the JVM, my doubts were as follows:
- The young generation creates a great many objects, and most of them die soon. If the dead objects were the ones marked, there would be a huge amount of marking and a lot of work.
- In addition, the JVM's reachability analysis traces the reference chains from the GC Roots and marks all the live objects. The JVM can enumerate those roots quickly with the help of OopMap data structures. Teacher Zhou's statement that the dead objects are marked contradicts this. I couldn't convince myself, so I investigated the question thoroughly.
What is actually marked are the live objects.
- I also considered that marking is really a 0/1 question: marking the live objects implicitly marks the dead ones. But that only holds if marking finishes first and a separate traversal follows. A collector such as CMS marks as it traverses, and what it marks along the way are the live objects, so this argument does not hold either.
Copying algorithm
Mark-sweep is the original algorithm; the collection algorithms that follow are all optimizations of it.
The available memory is divided by capacity into two halves of equal size, and only one half is used at a time. When that half is used up, the surviving objects are copied to the other half. The used half is then cleared all at once.
Advantages:
- No memory fragmentation;
- Allocation is simple and efficient: just bump the pointer at the top of the heap and allocate memory sequentially.
Disadvantages:
- Usable memory is cut to half of the original, so space utilization is low;
- If many objects survive, many objects have to be copied, which makes collection slow.
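Below is a minimal Java sketch of a semispace copying collection, written in the breadth-first style usually credited to Cheney. That style is chosen for illustration; it is not necessarily what any particular JVM does. The from-space is a hypothetical array in which `fromSpace[i]` lists the indices that object `i` references. Live objects are copied into the other half in order, and each copy leaves a forwarding address so every reference can be rewritten. Dead objects are never touched, which is why the cost depends on the number of survivors.

```java
import java.util.Arrays;

/** Toy semispace copying collector (Cheney-style breadth-first copy); a sketch, not HotSpot code. */
class SemiSpace {
    private final int[][] fromSpace;
    private final int[][] toSpace;
    private final int[] forward; // forwarding address in to-space, -1 = not copied yet
    private int free = 0;        // allocation pointer in to-space: copying is just a pointer bump

    private SemiSpace(int[][] fromSpace) {
        this.fromSpace = fromSpace;
        this.toSpace = new int[fromSpace.length][]; // the other, equally sized half
        this.forward = new int[fromSpace.length];
        Arrays.fill(forward, -1);
    }

    /** Copy an object on first visit and return its new address. */
    private int evacuate(int old) {
        if (forward[old] < 0) {
            forward[old] = free;
            toSpace[free++] = fromSpace[old].clone();
        }
        return forward[old];
    }

    /** Returns the new to-space; roots[] is rewritten in place to point into it. */
    static int[][] collect(int[][] fromSpace, int[] roots) {
        SemiSpace gc = new SemiSpace(fromSpace);
        for (int i = 0; i < roots.length; i++) roots[i] = gc.evacuate(roots[i]);
        // The scan pointer chases the free pointer until every copied object has been fixed up.
        for (int scan = 0; scan < gc.free; scan++) {
            int[] refs = gc.toSpace[scan];
            for (int k = 0; k < refs.length; k++) refs[k] = gc.evacuate(refs[k]);
        }
        return gc.toSpace; // slots from gc.free onward form one contiguous free block
    }

    public static void main(String[] args) {
        int[] roots = {1, 4};
        int[][] to = collect(new int[][] { {}, {3}, {}, {1}, {} }, roots);
        System.out.println(Arrays.deepToString(to) + " roots=" + Arrays.toString(roots));
        // [[2], [], [0], null, null] roots=[0, 1]
    }
}
```

The three live objects now sit contiguously at the start of to-space, and the rest is one free block, so the next allocation is a simple pointer bump.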
Mark-compact Algorithm
Mark-compact marks objects exactly as mark-sweep does, but it does not sweep the reclaimable objects in place. Instead, the next step moves all surviving objects toward one end of the space ("compact" means packing them together). All memory beyond that end boundary is then cleared at once.
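Below is a minimal Java sketch of sliding compaction using the classic three-pass "Lisp 2" scheme after marking. It is again a toy model with a hypothetical slot array, not HotSpot's implementation. The passes are: compute each live object's new address, rewrite all references, then move the objects down in address order.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Toy sliding mark-compact (Lisp 2 style: mark, then three passes); a sketch, not HotSpot code. */
class MarkCompact {
    /** heap[i] == null is a free slot; otherwise it lists the slots object i references. Roots updated in place. */
    static int compact(int[][] heap, int[] roots) {
        // Mark: identical to mark-sweep.
        boolean[] live = new boolean[heap.length];
        Deque<Integer> work = new ArrayDeque<>();
        for (int r : roots) work.push(r);
        while (!work.isEmpty()) {
            int obj = work.pop();
            if (heap[obj] == null || live[obj]) continue;
            live[obj] = true;
            for (int ref : heap[obj]) work.push(ref);
        }
        // Pass 1: forwarding addresses, sliding live objects toward index 0 in their original order.
        int[] forward = new int[heap.length];
        int next = 0;
        for (int i = 0; i < heap.length; i++) if (live[i]) forward[i] = next++;
        // Pass 2: rewrite every reference (roots and fields of live objects) to the new addresses.
        for (int i = 0; i < roots.length; i++) roots[i] = forward[roots[i]];
        for (int i = 0; i < heap.length; i++) {
            if (!live[i]) continue;
            for (int k = 0; k < heap[i].length; k++) heap[i][k] = forward[heap[i][k]];
        }
        // Pass 3: move objects. forward[i] <= i, so moving in ascending order never overwrites live data.
        for (int i = 0; i < heap.length; i++) if (live[i]) heap[forward[i]] = heap[i];
        // Clear everything beyond the end boundary at once.
        Arrays.fill(heap, next, heap.length, null);
        return next; // heap[next..] is now one contiguous free region
    }

    public static void main(String[] args) {
        int[][] heap = { {}, {3}, null, {}, {}, {1} };
        int[] roots = {1, 5};
        int boundary = compact(heap, roots);
        System.out.println(Arrays.deepToString(heap) + " boundary=" + boundary);
        // [[1], [], [0], null, null, null] boundary=3
    }
}
```

Afterwards the live objects occupy slots 0 to 2 in their original order. The returned boundary marks where one contiguous free region begins, with none of the fragmentation that mark-sweep leaves behind.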
Generational collection algorithm
Where the generational idea comes from
Garbage collection in today's commercial virtual machines uses a "Generational Collection" algorithm. It divides memory into several regions according to object lifetimes. For each region, it chooses whichever of the three algorithms above best suits the lifetime characteristics of the objects in it.
A test by a PhD researcher concluded that most Java objects live only a short time, while the few that survive tend to live for a long time.
This observation gave rise to generational collection in the Java virtual machine. Put simply, the heap is divided into two generations: the young generation and the old generation. The young generation holds newly created objects. When an object has survived long enough, it is moved to the old generation.
Generational collection
The young generation
In the figure, the red blocks are surviving objects, and From and To are the two survivor spaces.
Most newly created objects are allocated in Eden, and most of them die soon after. Eden is a contiguous memory space, so allocation in it is extremely fast. When Eden is full, a Minor GC is performed.
As shown in the figure, the young generation is divided into one large Eden space and two small survivor spaces. (The name comes from the Garden of Eden in Western tradition, which stands for the birth of life.) The default ratio is 8:1:1, so the usable space of the young generation is 90% of its total capacity. At any time, Eden and one of the survivor spaces are in use. During a collection, the objects still alive in Eden and in that survivor space are copied to the other survivor space. Eden and the used survivor space are then cleared. If the other survivor space does not have enough room for all the surviving objects, those objects go directly into the old generation through the allocation guarantee mechanism.
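The Eden/survivor cycle can be sketched as a toy model. It makes several assumptions: sizes are arbitrary units, the caller supplies liveness as a set of names instead of the collector computing it by tracing, and the class and method names are hypothetical. A surviving object is copied into the "to" survivor space. It goes to the old generation instead if it has reached the tenuring age (the age-based promotion covered in the allocation rules below) or if the space is full, mirroring the allocation guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy young generation: Eden plus two survivor spaces in an 8:1:1 split (a sketch, not HotSpot code). */
class YoungGen {
    static final class Obj {
        final String name;
        final int size; // arbitrary units
        int age;        // number of Minor GCs survived
        Obj(String name, int size) { this.name = name; this.size = size; }
    }

    final int survivorCapacity;
    final int tenuringAge;
    List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();      // the survivor space currently in use
    final List<Obj> old = new ArrayList<>(); // stands in for the old generation

    YoungGen(int youngSize, int tenuringAge) {
        this.survivorCapacity = youngSize / 10; // 8:1:1, so each survivor is 1/10 of the young generation
        this.tenuringAge = tenuringAge;
    }

    /** New objects go to Eden (Eden's capacity check is omitted for brevity). */
    void allocate(String name, int size) { eden.add(new Obj(name, size)); }

    /** Minor GC: copy live objects of Eden and "from" into "to"; promote aged or overflowing ones. */
    void minorGc(Set<String> live) {
        List<Obj> to = new ArrayList<>();
        int used = 0;
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!live.contains(o.name)) continue; // dead objects are simply not copied
            o.age++;
            if (o.age >= tenuringAge || used + o.size > survivorCapacity) {
                old.add(o);                       // old enough, or "to" is full (allocation guarantee)
            } else {
                to.add(o);
                used += o.size;
            }
        }
        eden = new ArrayList<>(); // Eden and the old "from" space are cleared in one go
        from = to;                // the two survivor spaces swap roles
    }

    static List<String> names(List<Obj> objs) {
        List<String> out = new ArrayList<>();
        for (Obj o : objs) out.add(o.name);
        return out;
    }

    public static void main(String[] args) {
        YoungGen y = new YoungGen(100, 2);
        y.allocate("a", 3); y.allocate("b", 4); y.allocate("c", 5); y.allocate("d", 2);
        y.minorGc(new HashSet<>(Arrays.asList("a", "b", "c")));
        System.out.println("survivor=" + names(y.from) + " old=" + names(y.old)); // survivor=[a, b] old=[c]
        y.minorGc(new HashSet<>(Collections.singletonList("a")));
        System.out.println("survivor=" + names(y.from) + " old=" + names(y.old)); // survivor=[] old=[c, a]
    }
}
```

Here the young generation is 100 units and the tenuring age is 2. After the first GC, `a` and `b` fit into the 10-unit survivor space, `c` overflows and is promoted, and `d` is simply not copied. After the second GC, `a` reaches age 2 and is promoted.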
Old generation
As shown above, the young generation uses the copying algorithm. In the old generation most objects survive and few die, so the mark-compact algorithm is used.
The permanent generation
Many people refer to the method region as a permanent generation, but essentially the two are not equivalent.
The Java Virtual Machine specification does not require the method area to be garbage collected, because collecting it is not very cost-effective. Some applications, however, can fill the method area rapidly at run time: those that make heavy use of reflection, dynamic proxies and bytecode frameworks such as CGLIB, those that generate JSPs dynamically, and those that frequently define custom ClassLoaders (as OSGi does). Without garbage collection, such applications can easily run out of memory (OOM).
So the HotSpot design team implemented garbage collection for the permanent generation, a concept that exists only in the HotSpot virtual machine. The team extended generational collection to the method area by implementing the method area as the permanent generation. This lets the garbage collector manage method-area memory just like the heap. Other virtual machines have no permanent generation.
Permanent-generation collection reclaims two things: discarded constants in the constant pool and useless classes. Reclaiming constants is simple: a constant that is no longer referenced can be reclaimed. A class counts as useless, and may be reclaimed, only if all three of the following hold:
- All instances of the class have been reclaimed;
- The ClassLoader that loaded the class has been reclaimed;
- The class's java.lang.Class object is no longer referenced anywhere, so the class cannot be accessed through reflection.
Generation area size parameter
- -Xms sets the initial (minimum) heap size
- -Xmx sets the maximum heap size
- -XX:NewSize sets the initial (minimum) size of the young generation
- -XX:MaxNewSize sets the maximum size of the young generation
- -XX:PermSize sets the initial (minimum) size of the permanent generation
- -XX:MaxPermSize sets the maximum size of the permanent generation (both PermSize flags apply to JDK 7 and earlier; JDK 8 replaced the permanent generation with Metaspace, which is sized with -XX:MetaspaceSize and -XX:MaxMetaspaceSize)
- -Xss sets the stack size of each thread
No parameter sets the old generation size directly. It is controlled indirectly through two parameters, the heap size and the young generation size:
Old generation size = heap size - young generation size
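Here is a small worked example of this arithmetic. The sizes are hypothetical (a 1024 MB heap and a 300 MB young generation), and the alignment HotSpot applies to real sizes is ignored. The example also splits the young generation using the default 8:1:1 Eden/Survivor ratio described earlier, which corresponds to -XX:SurvivorRatio=8.

```java
/** Worked generation-size arithmetic; sizes in MB are hypothetical and HotSpot's alignment is ignored. */
class GenerationSizes {
    /** There is no direct old-generation flag: old = heap - young. */
    static long oldGen(long heapMb, long youngMb) { return heapMb - youngMb; }

    /** Eden : Survivor : Survivor = ratio : 1 : 1, so one survivor is young / (ratio + 2). */
    static long survivor(long youngMb, int survivorRatio) { return youngMb / (survivorRatio + 2); }

    static long eden(long youngMb, int survivorRatio) { return youngMb - 2 * survivor(youngMb, survivorRatio); }

    public static void main(String[] args) {
        long heap = 1024, young = 300; // e.g. -Xms1024m -Xmx1024m -XX:NewSize=300m -XX:MaxNewSize=300m
        int ratio = 8;                 // default 8:1:1 split
        System.out.println("old = " + oldGen(heap, young) + " MB");            // old = 724 MB
        System.out.println("eden = " + eden(young, ratio) + " MB, survivor = "
                + survivor(young, ratio) + " MB each");                        // eden = 240 MB, survivor = 30 MB each
        System.out.println("usable young = "
                + (eden(young, ratio) + survivor(young, ratio)) + " MB");      // usable young = 270 MB
    }
}
```

Only Eden plus one survivor space holds objects at a time, so 270 MB, or 90% of the 300 MB young generation, is usable.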
Memory allocation and reclamation policies
- Large objects are Java objects that need a large amount of contiguous memory, typically very long strings or large arrays. -XX:PretenureSizeThreshold makes objects larger than the given size be allocated directly in the old generation (the flag only takes effect with the Serial and ParNew collectors). This avoids a lot of memory copying between Eden and the two Survivor spaces;
- Each time an object in a Survivor space survives a Minor GC, its age increases by one. Once it reaches a certain age (15 by default, set with -XX:MaxTenuringThreshold), it is promoted to the old generation;
- Dynamic age determination: suppose the total size of all objects of the same age in a Survivor space is greater than half of that Survivor space. Then objects of that age or older go directly to the old generation, without waiting for the age required by MaxTenuringThreshold;
- Space allocation guarantee (HandlePromotionFailure): before a Minor GC, the JVM checks whether the largest contiguous free space in the old generation is greater than the total size of all young generation objects. If it is, the Minor GC is safe. Otherwise, if HandlePromotionFailure allows the risk, the JVM checks whether that space is greater than the average size of the objects promoted to the old generation in earlier collections. If so, it attempts a Minor GC. If the space is smaller, or if the setting does not allow the risk, a Full GC is performed instead.
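The two promotion rules can be written as small decision functions. This sketch follows the rules as stated in the text, and the method names are hypothetical. HotSpot's own age table sums sizes cumulatively from age 1 upward rather than looking at a single age. JDKs after 6u24 also no longer honor the HandlePromotionFailure switch. Treat the code as an illustration of the rules, not the JVM's exact logic. The guarantee check also includes the book's first test: whether the old generation could absorb every young object.

```java
/** The dynamic-age and allocation-guarantee rules as stated in the text (hypothetical helper names). */
class PromotionRules {
    /**
     * bytesByAge[a] = total size of survivor objects of age a. If the objects of one age take more
     * than half of the survivor space, that age becomes the threshold; otherwise the maximum applies.
     */
    static int tenuringThreshold(long[] bytesByAge, long survivorCapacity, int maxTenuringThreshold) {
        for (int age = 1; age < bytesByAge.length && age < maxTenuringThreshold; age++) {
            if (2 * bytesByAge[age] > survivorCapacity) return age; // this age and older are promoted
        }
        return maxTenuringThreshold;
    }

    /** Before a Minor GC: true = attempt the Minor GC, false = perform a Full GC instead. */
    static boolean minorGcAllowed(long oldMaxContiguousFree, long youngUsedTotal,
                                  boolean handlePromotionFailure, long avgPromotedSize) {
        if (oldMaxContiguousFree > youngUsedTotal) return true; // safe even if every young object survives
        if (!handlePromotionFailure) return false;              // the risk is not allowed
        return oldMaxContiguousFree > avgPromotedSize;          // risky: rely on past promotion sizes
    }

    public static void main(String[] args) {
        // Survivor space of 10 units; 6 units of age-2 objects exceed half of it.
        System.out.println(tenuringThreshold(new long[] {0, 2, 6, 1}, 10, 15)); // 2
        // Old generation: 60 units contiguous free, young holds 80, about 50 promoted per GC on average.
        System.out.println(minorGcAllowed(60, 80, true, 50));                  // true (risky Minor GC)
        System.out.println(minorGcAllowed(60, 80, false, 50));                 // false (Full GC instead)
    }
}
```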
Garbage collector
In the collector diagram, the young generation collectors are at the top and the old generation collectors at the bottom. A line between two collectors means they can be used together.
There is no best garbage collector, only the most appropriate garbage collector.
Serial collector (copying algorithm)
Single-threaded. Here "single-threaded" means more than using one CPU or one collection thread to do the work. It also means that while it collects garbage, it must suspend all other worker threads ("Stop The World").
The virtual machine starts and ends "Stop The World" automatically in the background, stopping the user's threads without the user seeing it. Imagine an application that has to stop for five minutes of collection after every hour of running. That would be very painful, which is why each new generation of garbage collectors aims to reduce this pause.
Enable parameter: -XX:+UseSerialGC
Application scenario: desktop (client) applications. The collector is simple. Desktop applications usually give the virtual machine only tens to hundreds of megabytes of memory, so collection pauses are tens to a few hundred milliseconds, which is entirely acceptable.
ParNew collector (copying algorithm)
ParNew is the multithreaded version of the Serial collector. Apart from using multiple threads for garbage collection, it behaves exactly like Serial. This includes all of Serial's control parameters, the collection algorithm, Stop The World, object allocation rules and collection policies.
Enable parameter: -XX:+UseParNewGC
Application scenario: ParNew is an important collector in Server mode. Apart from Serial, it is currently the only young generation collector that works with the old generation CMS collector. On a single CPU, however, it does no better than Serial because of thread-interaction overhead.
Parallel Scavenge collector (copying algorithm)
It is otherwise similar to ParNew; the difference is the goal. Collectors such as CMS focus on minimizing how long user threads are paused during garbage collection. Parallel Scavenge instead aims for a controllable throughput, which suits background work that needs little interaction.
Parallel Scavenge focus: controllable throughput, where throughput = time running user code / (time running user code + garbage collection time)
Enable parameter: -XX:+UseParallelGC
Application scenario: background computation that needs little interaction, for example batch processing, order processing, payroll and scientific computing.
Parallel Scavenge provides two parameters for precise throughput control, plus a switch for adaptive sizing:
- -XX:MaxGCPauseMillis // maximum garbage collection pause time (a number of milliseconds greater than 0)
- -XX:GCTimeRatio // throughput setting: an integer N greater than 0 and less than 100; GC may take at most 1/(1+N) of total time
- -XX:+UseAdaptiveSizePolicy // lets the VM tune memory settings such as the young generation size and the Eden/Survivor ratio automatically
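Here is a quick worked example of the throughput formula, throughput = user code time / (user code time + GC time), and of what -XX:GCTimeRatio=N means. GC may take at most 1/(1+N) of total time. The Parallel collector's default of 99 therefore allows 1%, and a value of 19 allows 5%. The class and method names are hypothetical; only the arithmetic matters.

```java
/** Worked arithmetic for the throughput goal of Parallel Scavenge (hypothetical helper, not JVM code). */
class ThroughputGoal {
    /** throughput = user code time / (user code time + GC time) */
    static double throughput(double userTime, double gcTime) {
        return userTime / (userTime + gcTime);
    }

    /** -XX:GCTimeRatio=N: GC may take at most 1 / (1 + N) of the total time. */
    static double maxGcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        System.out.println(throughput(99, 1)); // 99 minutes of user code, 1 minute of GC: 0.99
        System.out.println(maxGcFraction(99)); // default 99: 0.01 (1%)
        System.out.println(maxGcFraction(19)); // 19: 0.05 (5%)
    }
}
```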
Serial Old collector (mark-compact algorithm)
From here on are the old generation collectors.
Serial Old is the old generation version of the Serial collector. It is also single-threaded and uses the mark-compact algorithm. Its main purpose is likewise to serve virtual machines in Client mode, that is, desktop applications.
Parallel Old collector (mark-compact algorithm)
Parallel Old is the old generation version of the Parallel Scavenge collector; it uses multiple threads and the mark-compact algorithm. It has only been available since JDK 1.6, where it replaced Serial Old as the old generation partner of Parallel Scavenge.
Application scenario: especially Server mode with multiple CPUs. For throughput-first and CPU-sensitive scenarios, consider combining Parallel Scavenge with Parallel Old.
Concurrent Mark Sweep collector (CMS)
The CMS (Concurrent Mark Sweep) collector aims for the shortest possible collection pause. It is based on the mark-sweep algorithm, and the whole process has four steps:
- Initial mark (marks only the objects directly reachable from GC Roots; very fast, but requires "Stop The World")
- Concurrent mark (GC Roots tracing: starting from the set produced by the initial mark, live objects are marked while the application keeps running, so there is no guarantee that every live object gets marked)
- Remark (corrects the marks of objects whose state changed during concurrent marking because the user program kept running; requires "Stop The World", with a pause slightly longer than the initial mark but much shorter than concurrent marking; runs with multiple threads in parallel for efficiency)
- Concurrent sweep (reclaims all garbage objects).
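The need for the remark phase can be shown with a single-threaded simulation of the phases. It uses tri-color marking with an "incremental update" write barrier. This is a sketch under assumptions, not HotSpot's implementation. Real CMS records such changes in a card table; here a plain set of "dirty" objects plays that role, and the application's actions are interleaved by hand rather than running on real threads. In the scenario, the application stores a reference to C into the already-scanned object A and removes C from B, so concurrent marking alone misses C.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Single-threaded simulation of CMS-style marking phases with an incremental-update barrier (a sketch). */
class CmsSim {
    final Map<String, List<String>> heap = new HashMap<>(); // object -> objects it references
    final List<String> roots = new ArrayList<>();
    final Set<String> marked = new HashSet<>();  // grey or black
    final Set<String> scanned = new HashSet<>(); // black: fields already traced
    final Deque<String> grey = new ArrayDeque<>();
    final Set<String> dirty = new HashSet<>();   // scanned objects modified during concurrent marking

    void add(String name, String... refs) { heap.put(name, new ArrayList<>(Arrays.asList(refs))); }

    void shade(String obj) { if (marked.add(obj)) grey.push(obj); }

    /** Initial mark (Stop The World): only objects directly referenced by the roots. */
    void initialMark() { for (String r : roots) shade(r); }

    /** Concurrent mark: trace up to `steps` grey objects while the application keeps running. */
    void concurrentMark(int steps) {
        for (int i = 0; i < steps && !grey.isEmpty(); i++) {
            String obj = grey.pop();
            for (String ref : heap.get(obj)) shade(ref);
            scanned.add(obj);
        }
    }

    /** Application store obj.field = target; the write barrier records changes to scanned objects. */
    void write(String obj, String target) {
        heap.get(obj).add(target);
        if (scanned.contains(obj)) dirty.add(obj);
    }

    /** Remark (Stop The World): rescan roots and dirty objects, then finish tracing. */
    void remark() {
        for (String r : roots) shade(r);
        for (String d : dirty) for (String ref : heap.get(d)) shade(ref);
        dirty.clear();
        concurrentMark(Integer.MAX_VALUE);
    }

    /** Concurrent sweep: every unmarked object is garbage. */
    Set<String> sweep() {
        Set<String> garbage = new HashSet<>(heap.keySet());
        garbage.removeAll(marked);
        return garbage;
    }

    public static void main(String[] args) {
        CmsSim gc = new CmsSim();
        gc.add("A", "B"); gc.add("B", "C"); gc.add("C"); gc.add("D");
        gc.roots.add("A");
        gc.initialMark();         // STW: only A
        gc.concurrentMark(1);     // A scanned (black), B shaded grey
        gc.write("A", "C");       // application: A.field = C (barrier records A as dirty)
        gc.heap.get("B").clear(); // application: B drops its reference to C
        gc.concurrentMark(Integer.MAX_VALUE);
        System.out.println("C marked before remark: " + gc.marked.contains("C")); // false
        gc.remark();
        System.out.println("garbage: " + gc.sweep());                            // [D]
    }
}
```

Without the rescan, C would have been swept as garbage even though A still references it. The remark pause catches it, and only the genuinely unreachable D is reclaimed.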
Advantages: Concurrent collection, low pauses;
Disadvantages:
- Cannot handle floating garbage. During the concurrent phases the user threads keep running and keep producing new garbage. Garbage that appears after marking cannot be reclaimed in the current collection and is left for the next GC. Because the program keeps allocating during collection, CMS must start before the old generation is full. If the space left is not enough, a "Concurrent Mode Failure" occurs. The JVM then falls back to a Full GC using the Serial Old collector, which causes a much longer pause.
- Sensitive to CPU resources. The concurrent phases do not suspend user threads, but they consume CPU, which slows the application down and lowers overall throughput. By default CMS starts (number of CPUs + 3) / 4 collection threads. With 4 or more CPUs, the collection threads therefore take at least 25% of CPU resources, which can noticeably affect the user program. With fewer than 4 CPUs the impact is even larger and may be unacceptable.
- Because it uses mark-sweep, it produces a lot of memory fragmentation, which makes allocating large objects difficult. The old generation may have plenty of free space but not enough contiguous space, so a Full GC has to be triggered early. To address this, CMS provides -XX:+UseCMSCompactAtFullCollection (on by default), which compacts memory when a Full GC is about to happen, at the cost of a longer pause. It also provides -XX:CMSFullGCsBeforeCompaction, which sets how many Full GCs without compaction run before one with compaction. The default is 0, meaning memory is defragmented on every Full GC.
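The default thread-count formula from the text, (CPUs + 3) / 4, can be worked through for a few CPU counts. It uses integer division, as in the formula. The class name is hypothetical.

```java
import java.util.Locale;

/** Default CMS concurrent thread count from the text, (CPUs + 3) / 4 with integer division. */
class CmsThreads {
    static int threads(int cpus) { return (cpus + 3) / 4; }

    /** Fraction of the CPUs occupied by collector threads during the concurrent phases. */
    static double cpuShare(int cpus) { return (double) threads(cpus) / cpus; }

    public static void main(String[] args) {
        for (int cpus : new int[] {1, 2, 4, 6, 8, 16}) {
            System.out.printf(Locale.ROOT, "%2d CPUs -> %d thread(s), %.1f%% of CPU%n",
                    cpus, threads(cpus), 100 * cpuShare(cpus));
        }
    }
}
```

From 4 CPUs upward the share never drops below 25%. On 2 CPUs, the collector takes half of the machine during the concurrent phases.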
Enable parameter: -XX:+UseConcMarkSweepGC (deprecated in JDK 9 and removed in JDK 14)
Application scenario: Internet sites and web servers, where short response times matter
G1 collector
See the two separate introductions to G1.
Enable parameter: -XX:+UseG1GC
Application scenario: Server application
JVM parameters
```dockerfile
# JVM parameter settings (JDK 8 flag names).
# ENV rather than RUN: a variable assigned inside a RUN step exists only in that
# step's shell, so ENTRYPOINT would see it as empty.
ENV JVM_ARGS="-server -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 -Djava.io.tmpdir=/tmp -Djava.net.preferIPv6Addresses=false"
ENV JVM_GC="-Xloggc:/tmp/gc.log -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+UseConcMarkSweepGC -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSFullGCsBeforeCompaction=0 -XX:+UseCMSCompactAtFullCollection -XX:CMSInitiatingOccupancyFraction=80"
ENV JVM_HEAP="-XX:SurvivorRatio=8 -XX:+HeapDumpOnOutOfMemoryError -XX:ReservedCodeCacheSize=128m -XX:InitialCodeCacheSize=128m"
# Shell-form ENTRYPOINT so the variables are expanded when the container starts;
# exec lets the JVM replace the shell and receive stop signals directly.
# PROFILE, JVM_XMX, JVM_XMS and MODULE_NAME must be provided, e.g. with docker run -e.
ENTRYPOINT exec java -Djetty.http.port=8080 \
    ${JVM_ARGS} \
    ${JVM_GC} \
    ${JVM_HEAP} \
    -Dspring.profiles.active=${PROFILE} \
    -Xmx${JVM_XMX} \
    -Xms${JVM_XMS} \
    -jar /root/${MODULE_NAME}.jar
```
My company uses the CMS collector.
- -XX:CMSInitiatingOccupancyFraction applies when the CMS collector is in use: once old generation usage reaches the given percentage, a CMS collection is triggered. For example, -XX:CMSInitiatingOccupancyFraction=80 makes CMS start collecting when the old generation is 80% full;
- -XX:SurvivorRatio sets the ratio of Eden to one Survivor space. -XX:SurvivorRatio=8 means Eden : Survivor : Survivor = 8 : 1 : 1. The two Survivor spaces together are to Eden as 2 : 8, and each Survivor takes 1/10 of the young generation.
- An interesting but often overlooked memory area in the JVM is the "code cache", which stores the native code produced by compiling methods. The code cache rarely causes performance problems, but when it does, the effect can be devastating. If the code cache fills up, the JVM prints a warning and disables the JIT compiler: no more bytecode is compiled to machine code, so newly hot code keeps running in the interpreter. The application continues to run, but can be an order of magnitude slower until someone notices the problem. Like other memory areas, the code cache can be sized; the relevant parameters are -XX:InitialCodeCacheSize and -XX:ReservedCodeCacheSize.
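The standard java.lang.management API can list every memory pool of a running JVM. This shows how close the code cache and the heap generations are to their limits. Below is a minimal sketch. Pool names depend on the JDK version and collector. For example, JDK 8 with CMS typically reports pools such as "Par Eden Space", "CMS Old Gen" and "Code Cache", while JDK 9 and later split the code cache into several "CodeHeap" pools.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Lists the heap and non-heap memory pools (generations, code cache, ...) of the running JVM. */
class MemoryPools {
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) continue; // the pool is no longer valid
            // getMax() returns -1 when the pool has no defined maximum.
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() >> 20) + " MB";
            lines.add(String.format("%-16s %-35s used %d MB, max %s",
                    pool.getType(), pool.getName(), usage.getUsed() >> 20, max));
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

Run it inside the service, or attach JConsole, which shows the same MXBeans. A code cache pool whose usage approaches its max is the warning sign described above.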
References: Zhou Zhiming, Understanding the Java Virtual Machine (2nd edition); Google; Wikipedia