This is the 6th day of my participation in the August More Text Challenge
Preface
The previous article, on object allocation and the garbage collection mechanism, followed an object from its creation to its layout in memory. It then covered how the JVM decides whether an object is still alive, how garbage collection builds on that decision, and the four garbage collection algorithms. This article focuses on the garbage collectors in the JVM and on JVM tuning, to give a basic understanding of both.
What you will learn
- Common garbage collectors
- Multithreaded garbage collection and stop-the-world (STW) pauses
- The CMS garbage collector
- JVM tuning
Before we begin, let's look at the types of garbage collector.
Garbage collector classification
- Throughput (throughput-first garbage collectors)
  - Throughput: the JVM performs GC on GC threads. While GC runs, those threads compete with application threads for CPU cycles. Throughput is the proportion of total program run time that is spent in application threads. It can be controlled with the parameter -XX:GCTimeRatio. With the default value of 99 (the Parallel collector's default), GC may take up 1/(99+1) = 1% of total time, so throughput is 99%. If it is set to 19, GC may take up 1/(19+1) = 5% of the time and throughput is 95%. With high throughput the application gets almost all of the CPU time, so the program as a whole finishes its work faster. Short pause times, by contrast, are what improve the user experience, because the application never appears to hang. In game development, for example, if the program regularly stops to collect garbage, the user interface freezes and the game experience is very poor.
- Minimum pause time (response-first garbage collectors)
  - Pause time: a period during which application threads are stopped (stop the world) so that GC threads can run. For example, a 100 ms pause during GC means that no application thread is active for those 100 ms.
However, high throughput and low pause times are at odds. GC needs certain preconditions in order to run safely: while the GC thread determines which objects are reachable, application threads must not modify the state of those objects. The application threads therefore have to be paused so that the GC thread sees a consistent view. GC is also expensive, so every collection adds pause time and lowers throughput. To raise throughput, GC should run as rarely as possible.
However, running GC only occasionally means that each run has a lot of work to do, because many objects accumulate in the heap in the meantime. A single GC then takes longer to complete, which raises both the average and the maximum pause time. For low pause times it is therefore better to run GC frequently so that each run finishes quickly, but this in turn increases overhead and lowers throughput.
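The -XX:GCTimeRatio arithmetic described above can be checked with a few lines of Java. This is only a sketch of the formula 1 / (1 + N); the class and method names are made up for illustration and are not part of any JVM API.

```java
// Sketch: GC time share implied by -XX:GCTimeRatio=N, using the formula 1 / (1 + N).
// GcTimeRatioDemo and its methods are hypothetical names chosen for this example.
class GcTimeRatioDemo {

    /** Fraction of total run time that GC may use for the given ratio. */
    static double gcTimeFraction(int gcTimeRatio) {
        if (gcTimeRatio < 0) {
            throw new IllegalArgumentException("ratio must be >= 0");
        }
        return 1.0 / (1 + gcTimeRatio);
    }

    /** Throughput target: the share of total run time left to application threads. */
    static double throughput(int gcTimeRatio) {
        return 1.0 - gcTimeFraction(gcTimeRatio);
    }

    public static void main(String[] args) {
        // The two values discussed in the text: 99 (default) and 19.
        for (int ratio : new int[] {99, 19}) {
            System.out.printf("GCTimeRatio=%d -> GC %.0f%%, throughput %.0f%%%n",
                    ratio, 100 * gcTimeFraction(ratio), 100 * throughput(ratio));
        }
    }
}
```

Note that the JVM treats this ratio as a goal for the collector, not as a hard guarantee.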
Common garbage collectors
Single-threaded garbage collectors
- Serial (paired with Serial Old; all application threads must be paused)
  - Single-threaded, serial, young generation, copying algorithm
- Serial Old
  - Single-threaded, serial, old generation, mark-compact algorithm
Multithreaded parallel garbage collectors
- Parallel Scavenge (paired with Parallel Old, PS for short; throughput first)
  - Parallel, multithreaded, young generation, copying algorithm
- Parallel Old
  - Parallel, multithreaded, old generation, mark-compact algorithm
- ParNew (paired with CMS)
  - Parallel, multithreaded, young generation, copying algorithm
Concurrent garbage collector
- CMS (Concurrent Mark Sweep): concurrent collector based on the mark-sweep algorithm
- The ParNew collector can be combined with an old-generation collector that is configured separately
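To see which of these collectors a running JVM actually uses, the standard java.lang.management API can list them. The sketch below only prints whatever the current JVM reports; the names depend on the JDK version and on the collector selected at startup (on JDK 8, for example, flags such as -XX:+UseSerialGC, -XX:+UseParallelGC or -XX:+UseConcMarkSweepGC choose the collector, and CMS is no longer available in recent JDKs). The class name ActiveCollectors is made up for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch: list the garbage collectors the current JVM exposes through JMX.
class ActiveCollectors {

    /** Names of the collectors reported by the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Typically one bean for the young generation and one for the old generation.
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(bean.getName()
                    + ": collections=" + bean.getCollectionCount()
                    + ", timeMs=" + bean.getCollectionTime());
        }
    }
}
```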
CMS garbage collector
CMS as a whole consists of four phases: initial mark, concurrent mark, remark, and concurrent sweep.
Initial mark
This phase is short and fast. It only marks the objects that GC Roots reference directly.
Concurrent mark
This phase traces from GC Roots while the user's application keeps running. Marking every object reachable from GC Roots means traversing the whole reachability graph, which takes a long time, so this phase runs concurrently (garbage collector threads and user threads work at the same time).
Remark
This phase is short. It corrects the mark records of objects whose marks changed because the user program kept running during concurrent mark. Its pause is slightly longer than that of the initial mark, but much shorter than the duration of concurrent mark.
Concurrent sweep
The mark-sweep algorithm frees unreachable objects without pausing application threads. This is safe because application threads can only work with objects that are reachable from GC Roots; objects that are not reachable from GC Roots are garbage.
CMS is designed for fast user response, so the user program pauses in only two phases: initial mark and remark. The initial mark is determined directly by the GC Roots, which we can hardly change from application code, so the phase to focus on is remark. If part of the remark work can be done ahead of time during concurrent mark, remark has less left to do, the application pauses for less time, and system response time drops.
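The core idea behind the four phases, marking everything reachable from GC Roots and then sweeping the rest, can be sketched on a toy object graph. This is a deliberately simplified, single-threaded illustration: it has no concurrency, no remark step, and no real memory management, and Node, mark and sweep are hypothetical names, not HotSpot code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy mark-sweep over an object graph (not how CMS is implemented internally).
class ToyMarkSweep {

    /** Hypothetical heap object: a name plus outgoing references. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Mark: collect every object reachable from the roots. */
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {        // first visit only
                pending.addAll(current.refs); // follow outgoing references
            }
        }
        return marked;
    }

    /** Sweep: remove unmarked objects from the heap; returns how many were freed. */
    static int sweep(List<Node> heap, Set<Node> marked) {
        int before = heap.size();
        heap.removeIf(node -> !marked.contains(node));
        return before - heap.size();
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node live = new Node("live");
        Node dead = new Node("dead");
        root.refs.add(live);
        dead.refs.add(live); // a reference from garbage does not keep anything alive

        List<Node> heap = new ArrayList<>(Arrays.asList(root, live, dead));
        Set<Node> marked = mark(Collections.singletonList(root));
        System.out.println("freed objects: " + sweep(heap, marked));
    }
}
```

In CMS the marking step is split across the initial mark, concurrent mark and remark phases precisely because application threads keep changing the graph while it is being traversed.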
The JVM also performs the following steps to optimize concurrent marking.
Concurrent mark: preclean
When it applies
Application threads are not suspended during concurrent mark. If a young-generation object (for example, one in the Eden area) starts to reference an old-generation object that has not been marked and would otherwise be treated as garbage, that old-generation object needs to be marked.
When application threads change references inside the old generation, the affected objects also need to be marked. A structure similar to a card table records the objects whose references changed. The remark phase has to find these changed objects anyway, so handling them here means that remark does not need to traverse every object in the old generation.
Concurrent mark: abortable preclean
Precondition
Eden memory usage has reached 2 MB.
Procedure
The phase runs in a loop and can be interrupted. It preprocesses the Eden area and does the following work:
- It processes objects in the From and To survivor spaces whose references into the old generation changed, in the same way as preclean.
- When references inside the old generation change, it uses the card-table-like structure to process the objects with changed references, in the same way as preclean.
Interrupt conditions
- The number of loops reaches CMSMaxAbortablePrecleanLoops
- The elapsed time reaches CMSMaxAbortablePrecleanTime
- Eden memory usage exceeds CMSScheduleRemarkEdenPenetration
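The three interrupt conditions can be written down as a single predicate. The sketch below is illustrative only: the method and parameter names are made up, the limits passed in main are example values, treating a loop limit of 0 as "no limit" is an assumption of this sketch, and HotSpot's real loop contains more logic than this.

```java
// Sketch of the abortable-preclean exit test; not HotSpot source code.
class AbortablePrecleanSketch {

    /**
     * Returns true when any of the three interrupt conditions holds.
     * maxLoops, maxMillis and edenPenetrationPercent stand in for
     * CMSMaxAbortablePrecleanLoops, CMSMaxAbortablePrecleanTime and
     * CMSScheduleRemarkEdenPenetration.
     */
    static boolean shouldStop(int loopsDone, long elapsedMillis, double edenUsedPercent,
                              int maxLoops, long maxMillis, double edenPenetrationPercent) {
        boolean loopLimitHit = maxLoops > 0 && loopsDone >= maxLoops; // assumption: 0 = no limit
        boolean timeLimitHit = elapsedMillis >= maxMillis;
        boolean edenFullEnough = edenUsedPercent > edenPenetrationPercent;
        return loopLimitHit || timeLimitHit || edenFullEnough;
    }

    public static void main(String[] args) {
        // Simulated loop: each iteration "takes" 100 ms while Eden fills by 5 percentage points.
        int loops = 0;
        long elapsedMillis = 0;
        double edenUsedPercent = 10.0;
        while (!shouldStop(loops, elapsedMillis, edenUsedPercent, 0, 5000, 50.0)) {
            loops++;
            elapsedMillis += 100;
            edenUsedPercent += 5.0;
        }
        System.out.println("preclean stopped after " + loops + " loops, Eden at "
                + edenUsedPercent + "%");
    }
}
```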
CMS log
The following is a section of the output log
[GC (CMS Initial Mark) [1 CMS-initial-mark: 68287K(68288K)] 99007K(99008K), 0.0031153 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
[CMS-concurrent-mark-start]
[CMS-concurrent-mark: 0.017/0.017 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
[CMS-concurrent-preclean-start]
[CMS-concurrent-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
[CMS-concurrent-abortable-preclean-start]
[CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
[GC (CMS Final Remark) [YG occupancy: 30719 K (30720 K)][Rescan (parallel) , 0.0035498 secs][weak refs processing, 0.0000284 secs][class unloading, 0.0003008 secs][scrub symbol table, 0.0003731 secs][scrub string table, 0.0001169 secs][1 CMS-remark: 68287K(68288K)] 99007K(99008K), 0.0044921 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
[CMS-concurrent-sweep-start]
[CMS-concurrent-sweep: 0.011/0.011 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[CMS-concurrent-reset-start]
[CMS-concurrent-reset: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
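When reading such a log, it can help to pull out the duration of each concurrent phase. The following sketch parses lines of the form shown above with a regular expression. The class name CmsLogLine is made up, the pattern is written only against the sample lines in this article, and the interpretation of the two numbers (phase timer versus wall-clock time) is the commonly cited one rather than something this log states itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extract phase name and durations from a CMS concurrent-phase log line.
class CmsLogLine {

    // Matches e.g. "[CMS-concurrent-mark: 0.017/0.017 secs]" but not the "-start" lines.
    private static final Pattern CONCURRENT_PHASE = Pattern.compile(
            "\\[CMS-concurrent-([a-z-]+): (\\d+\\.\\d+)/(\\d+\\.\\d+) secs\\]");

    /** Returns {phase, firstDuration, secondDuration}, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher matcher = CONCURRENT_PHASE.matcher(line);
        if (!matcher.find()) {
            return null;
        }
        return new String[] {matcher.group(1), matcher.group(2), matcher.group(3)};
    }

    public static void main(String[] args) {
        String[] sample = {
            "[CMS-concurrent-mark-start]",
            "[CMS-concurrent-mark: 0.017/0.017 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]",
            "[CMS-concurrent-sweep: 0.011/0.011 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]"
        };
        for (String line : sample) {
            String[] parsed = parse(line);
            if (parsed != null) {
                System.out.println(parsed[0] + ": " + parsed[1] + " / " + parsed[2] + " secs");
            }
        }
    }
}
```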
CMS problems
- CPU sensitivity: the concurrent phases run while user threads are running, so CMS competes with them for CPU. If there are fewer than four CPU cores, CMS has a large impact on the user program.
- Floating garbage: user threads keep running during the concurrent sweep phase, so new garbage is produced continuously. This garbage appears after marking has finished, so CMS cannot deal with it in the current cycle and has to leave it for the next GC. It is called floating garbage. To leave room for floating garbage, CMS cannot wait until the old generation is 100% full before it starts collecting; the old-generation occupancy threshold is generally taken to be 92%.
- Memory fragmentation: the mark-sweep algorithm leaves free space non-contiguous (it produces fragmentation, but it needs no pause for compaction). With non-contiguous free space, the allocator has to walk a free list to find an address that can hold a newly created object, whereas contiguous space can be allocated by pointer bumping. Fragmentation also makes it hard to find contiguous space for large objects, which can lead to a Promotion Failed event. In that case CMS falls back to Serial Old, a single-threaded collector that pauses all application threads and collects the entire heap, so the pause is much longer than a CMS pause.
JVM tuning
GC tuning improves system efficiency and reduces the system pauses caused by GC, which in turn reduces memory jitter.
Generational partitioning of JVM memory
In the generational model, the performance of heap GC depends heavily on the size of each partition. A good starting point for choosing those sizes is the size of the active data: the amount of heap occupied by long-lived objects while the application runs in a steady state, that is, the space the old generation occupies after a Full GC. It can be estimated by averaging the old-generation size after Full GC across the GC log, or by sampling GC data several times once the program has stabilized.
Space | Multiple |
---|---|
Total heap | 3 to 4 times the active data size |
Young generation | 1 to 1.5 times the active data size |
Old generation | 2 to 3 times the active data size |
Permanent generation / Metaspace | 1.2 to 1.5 times the space the permanent generation occupies after Full GC |
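For example, if the GC logs show an active data size of 300M in the old generation, the partitions can be sized as follows:
- Total heap: 300M * 4 = 1200M
- Young generation: 300M * 1.5 = 450M
- Old generation: 1200M - 450M = 750M
- Permanent generation: 300M * 1.5 = 450M (this assumes the permanent generation also occupies about 300M after Full GC)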
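The same calculation can be turned into a small helper that derives heap flags from a measured active data size. This is a sketch: it uses the upper multipliers from the table (4x total, 1.5x young generation), sets -Xms equal to -Xmx as one common choice rather than a requirement, and HeapSizing is a name invented for the example. The flags -Xms, -Xmx and -Xmn are standard HotSpot options for initial heap, maximum heap and young generation size.

```java
// Sketch: derive heap sizes and JVM flags from the active data size (in megabytes).
class HeapSizing {

    /** Total heap: 4 times the active data size (upper end of the 3x to 4x range). */
    static long totalHeapMb(long activeDataMb) {
        return activeDataMb * 4;
    }

    /** Young generation: 1.5 times the active data size. */
    static long youngGenMb(long activeDataMb) {
        return activeDataMb * 3 / 2;
    }

    /** Old generation: whatever remains of the total heap. */
    static long oldGenMb(long activeDataMb) {
        return totalHeapMb(activeDataMb) - youngGenMb(activeDataMb);
    }

    /** Command-line flags for the computed sizes; initial heap is set equal to maximum heap. */
    static String flags(long activeDataMb) {
        long total = totalHeapMb(activeDataMb);
        return String.format("-Xms%dm -Xmx%dm -Xmn%dm", total, total, youngGenMb(activeDataMb));
    }

    public static void main(String[] args) {
        long activeDataMb = 300; // value taken from the worked example in the text
        System.out.println("old generation: " + oldGenMb(activeDataMb) + "M");
        System.out.println(flags(activeDataMb)); // -Xms1200m -Xmx1200m -Xmn450m
    }
}
```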
Enlarging the young generation
When the young generation is collected, the copying algorithm copies live objects from the Eden area to a Survivor (S) area. Before copying, Eden has to be scanned to determine which objects are alive. Let the capacity of Eden be R, the lifetime of object A be 600 ms, and the Minor GC interval be 500 ms. A can still be alive when the next Minor GC runs, so the cost of one Minor GC is T = T1 (scan young generation) + T2 (copy objects), and over 1000 ms (two Minor GCs) the cost is 2T1 + 2T2.
If Eden is enlarged to 2R, the lifetime of A is still 600 ms (it depends on our code), while the Minor GC interval doubles to 1000 ms. Eden now holds twice as many objects, so the scan takes twice as long, but A has already died by the time the Minor GC runs and no longer needs to be copied. The cost over the same 1000 ms is T = 2T1: the two T2 copy costs are saved.
So if young-generation objects do not live very long, enlarging Eden reduces copying and improves GC efficiency. For long-lived objects the improvement is small, but the larger capacity does not put much extra burden on the JVM, and it lengthens the GC interval. Enlarging the Eden area is therefore a reasonable tuning step that improves system efficiency and reduces the pauses caused by GC.
How the JVM avoids scanning the full heap in a Minor GC
When objects in the Eden area are scanned, some of them may be the target of cross-generational references from the old generation. To decide whether such an object is alive, the old-generation objects that reference it have to be treated as GC Roots, so GC Roots information has to be collected from the old generation as well. At that point the collector is effectively scanning the whole heap. To avoid this, the JVM uses a card table: it marks the old-generation regions that contain such references as dirty, and a Minor GC then scans only the young generation and the old-generation regions marked dirty.
Card table: divides the old-generation space into cards of 512 bytes each.
Card | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
Value | True | True | false | True | True | True | True | True | True |
In the table above, card 3 is marked false. That identifies it as dirty, so the objects in that card are scanned together with the young generation during a Minor GC.
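The card table idea can be sketched in a few lines of Java. This is an illustration under stated assumptions: cards are 512 bytes, addresses are modeled as byte offsets into the old generation, the sketch stores true for a dirty card (a choice of this example, not necessarily the JVM's encoding), and CardTableSketch with its methods is a hypothetical name rather than a HotSpot class.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a card table: one flag per 512-byte card of the old generation.
class CardTableSketch {

    static final int CARD_SHIFT = 9; // 2^9 = 512 bytes per card

    private final boolean[] dirty;

    CardTableSketch(long oldGenBytes) {
        // Round up so that a partially filled last card still gets an entry.
        long cards = (oldGenBytes + (1L << CARD_SHIFT) - 1) >>> CARD_SHIFT;
        dirty = new boolean[(int) cards];
    }

    /** Write barrier: record that a reference at this old-generation offset was updated. */
    void recordWrite(long offsetInOldGen) {
        dirty[(int) (offsetInOldGen >>> CARD_SHIFT)] = true;
    }

    /** Indexes of the cards a Minor GC would have to scan for cross-generational references. */
    List<Integer> dirtyCards() {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < dirty.length; i++) {
            if (dirty[i]) {
                result.add(i);
            }
        }
        return result;
    }

    int cardCount() {
        return dirty.length;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(4096); // 8 cards of 512 bytes
        table.recordWrite(0);    // falls into card 0
        table.recordWrite(1030); // falls into card 2 (1030 / 512 = 2)
        System.out.println("cards to scan: " + table.dirtyCards());
    }
}
```

The point of the design is that a Minor GC only visits the listed cards instead of walking every object in the old generation.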