JVM garbage collector and performance tuning

This is the 6th day of my participation in the August More Text Challenge

Preface

The previous article, on object allocation and the garbage collection mechanism, followed an object from its creation through its layout in memory, then covered how the JVM decides whether an object is still alive, and finally the four garbage collection algorithms built on that liveness check. This article focuses on the garbage collectors in the JVM and on JVM tuning, with the aim of giving a basic understanding of both.

What you will learn

Common garbage collectors
Multithreaded garbage collection and STW
The CMS garbage collector
JVM tuning

Before we begin, let's look at the types of garbage collector.

Garbage collector classification

Throughput (throughput-first garbage collectors)
- Throughput: the JVM performs GC on GC threads, which compete with application threads for CPU cycles while a collection is running. Throughput is the proportion of total program time taken up by application threads. It can be controlled with the -XX:GCTimeRatio parameter: with the default value of 99, GC takes 1/(99+1) = 1% of the total time and throughput is 99%; with a value of 19, GC takes 1/(19+1) = 5% of the time and throughput is 95%. With high throughput the application does more useful work in the same amount of time, so the program runs faster overall.
Minimum pause time (response-time-first garbage collectors)
- Pause time: a period during which application threads are stopped (stop the world, STW) so that the GC threads can run, for example 100ms during a GC in which no application thread is active. Short pauses improve the user experience because the application never appears to freeze. In game development, for example, if the program stops for a noticeable time to collect garbage, the interface stutters and the game experience is very poor.
However, high throughput and low pause times are at odds. GC needs certain preconditions in order to run safely: while the GC threads determine which objects are reachable, application threads must not modify the state of those objects. The application threads therefore have to be paused so that the GC threads see a consistent heap. GC is expensive, so every collection adds pause time and reduces throughput. To raise throughput, GC should run as rarely as possible.

However, running GC only occasionally means that each GC has a lot of work to do, because a large number of objects accumulate in the heap in the meantime. A single GC then takes longer to complete, which raises both the average and the maximum pause time. For low pause times it is therefore better to run GC frequently so that each collection finishes quickly, but that increases overhead and lowers throughput.
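
The -XX:GCTimeRatio arithmetic described above can be checked with a few lines of Java. This is only a sketch of the formula 1/(1+n); the class and method names are made up for illustration and are not part of any JVM API.

```java
final class GcTimeRatio {
    // Share of total time the collector is allowed to spend in GC for a
    // given -XX:GCTimeRatio value n, using the formula 1 / (1 + n).
    static double gcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    // Throughput is whatever share is left for application threads.
    static double throughput(int gcTimeRatio) {
        return 1.0 - gcFraction(gcTimeRatio);
    }

    public static void main(String[] args) {
        for (int ratio : new int[] {99, 19}) {
            System.out.printf("GCTimeRatio=%d -> GC %.0f%%, throughput %.0f%%%n",
                    ratio, gcFraction(ratio) * 100, throughput(ratio) * 100);
        }
    }
}
```

The two values in the loop are the ones used in the text, so the printed percentages should line up with the 1% and 5% figures.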

Common garbage collectors

Single-threaded garbage collectors

Serial (paired with Serial Old; all application threads must be paused)
- Single-threaded, serial; young generation; copying algorithm
Serial Old
- Single-threaded, serial; old generation; mark-compact algorithm

Multithreaded parallel garbage collectors

Parallel Scavenge (paired with Parallel Old; PS for short; throughput-first)
- Parallel, multithreaded; young generation; copying algorithm
Parallel Old
- Parallel, multithreaded; old generation; mark-compact algorithm
ParNew (paired with CMS)
- Parallel, multithreaded; young generation; copying algorithm

Concurrent garbage collector

CMS garbage collector (Concurrent Mark Sweep; mark-sweep algorithm)
- Old-generation collector; ParNew can be configured alongside it as the young-generation collector
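
To see which of these collectors a given JVM is actually running, the standard java.lang.management API can be queried. The sketch below only lists whatever the running JVM reports; the class name ActiveCollectors is made up for illustration. On HotSpot the pairings are usually selected with flags such as -XX:+UseSerialGC, -XX:+UseParallelGC or -XX:+UseConcMarkSweepGC (the last one only exists on older JDKs, since CMS was removed in JDK 14), and the names printed depend on the JVM and version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollectors {
    // Returns the collector names reported by the running JVM.
    static List<String> names() {
        List<String> result = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.add(bean.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        // One line per collector: name, number of collections, accumulated time.
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(bean.getName()
                    + " collections=" + bean.getCollectionCount()
                    + " timeMs=" + bean.getCollectionTime());
        }
    }
}
```

Running it with different collector flags is a quick way to confirm which young-generation and old-generation collectors have been paired.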

CMS garbage collector

CMS as a whole consists of four phases: initial mark, concurrent mark, remark, and concurrent sweep.

Initial mark

This phase is short and fast; it only marks the objects that GC Roots reference directly.

Concurrent mark

GC roots tracing runs at the same time as the user's application. Starting from the objects marked in the initial mark, it walks the whole reachability graph and marks every object associated with GC roots. This takes a long time, so it is done concurrently (the garbage collector threads and the user threads work at the same time).

Remark

A short pause that corrects the mark records of objects whose marks changed because the user program kept running during the concurrent mark. The pause in this phase is slightly longer than the initial mark, but much shorter than the concurrent mark.

Concurrent sweep

The mark-sweep algorithm releases the unreachable objects without affecting the business threads. Objects associated with GC Roots stay in use; objects with no path from GC Roots (unreachable objects) are garbage and are cleared.

CMS is designed for fast user response, so the user program is paused in only two phases: the initial mark and the remark. The initial mark depends directly on the GC roots, which we cannot influence from the program, so the phase to focus on is the remark. If part of the remark work can be done ahead of time during the concurrent mark, the remark has less to do, the application pauses for less time, and response time drops. This is where the optimization comes in.
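
The mark and sweep steps themselves can be sketched on a toy object graph. This is a single-threaded illustration of the algorithm only: it does not model the concurrency or the pauses of real CMS, and the Node class and the four objects are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ToyMarkSweep {
    // A hypothetical heap object: a name plus its outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Mark: start from the roots and trace every object reachable from them.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {          // first visit: follow its references
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    // Sweep: every heap object that was not marked is garbage.
    static List<String> sweep(List<Node> heap, Set<Node> marked) {
        List<String> freed = new ArrayList<>();
        for (Node n : heap) {
            if (!marked.contains(n)) freed.add(n.name);
        }
        return freed;
    }

    static List<String> demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);   // root -> a -> b
        c.refs.add(d);   // c and d point at each other,
        d.refs.add(c);   // but nothing reachable from a root points at them
        return sweep(Arrays.asList(a, b, c, d), mark(Arrays.asList(a)));
    }

    public static void main(String[] args) {
        System.out.println("freed: " + demo());
    }
}
```

Objects c and d reference each other but are still swept, because reachability from the roots, not reference counts, decides what survives.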

The JVM also does the following to optimize concurrent marking.

Concurrent mark: preclean

Timing

During the concurrent mark the business threads are not suspended. If a newly created object (in the Eden area of the young generation) comes to reference an old-generation object that has not been marked (and would otherwise be treated as garbage), that old-generation object needs to be marked.

When a business thread changes a reference inside the old generation, that change has to be marked as well. A structure similar to a card table records the objects whose references changed, so that they can be handled before the remark phase (the remark has to find every changed object). The remark then does not need to traverse every object in the old generation, which is the optimization.

Concurrent mark: abortable preclean

Precondition

Eden memory usage has reached 2M.

Procedure

The work is done in a loop that can be interrupted.

Process the Eden area. This concurrent preprocessing can be interrupted so that the following steps can run.

Process objects in the From and To areas whose references into the old generation changed during the concurrent mark, in the same way as preclean.

When references inside the old generation change, use the card-table-like structure to process the objects whose references changed, again in the same way as preclean.

Interrupt conditions
- The number of loops exceeds CMSMaxAbortablePrecleanLoops
- The elapsed time exceeds CMSMaxAbortablePrecleanTime
- Eden memory usage exceeds CMSScheduleRemarkEdenPenetration
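
The loop and its three interrupt conditions can be sketched as follows. This is not HotSpot's code: the method and its parameters are hypothetical stand-ins for the three flags, and the real flags may treat special values (such as zero) differently from this simple version.

```java
import java.util.function.DoubleSupplier;
import java.util.function.LongSupplier;

final class AbortablePrecleanSketch {
    enum StopReason { MAX_LOOPS, MAX_TIME, EDEN_OCCUPANCY }

    // Repeats one unit of preclean work until one of the three conditions holds.
    static StopReason run(Runnable precleanStep,
                          int maxLoops,                 // stand-in for CMSMaxAbortablePrecleanLoops
                          long maxTimeMillis,           // stand-in for CMSMaxAbortablePrecleanTime
                          double edenPenetration,       // stand-in for CMSScheduleRemarkEdenPenetration, as a fraction
                          DoubleSupplier edenOccupancy, // current fraction of Eden in use
                          LongSupplier clockMillis) {
        long start = clockMillis.getAsLong();
        int loops = 0;
        while (true) {
            if (edenOccupancy.getAsDouble() >= edenPenetration) return StopReason.EDEN_OCCUPANCY;
            if (clockMillis.getAsLong() - start >= maxTimeMillis) return StopReason.MAX_TIME;
            if (loops >= maxLoops) return StopReason.MAX_LOOPS;
            precleanStep.run();
            loops++;
        }
    }

    public static void main(String[] args) {
        double[] eden = {0.10};
        // Each step simulates the application filling another 15% of Eden
        // while the preclean work is going on.
        StopReason reason = run(() -> eden[0] += 0.15, 100, 5_000, 0.50,
                () -> eden[0], System::currentTimeMillis);
        System.out.println("stopped because: " + reason);
    }
}
```

Passing the clock and the Eden occupancy in as suppliers keeps the sketch deterministic to test; the numbers in main are arbitrary.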

CMS log

The following is a section of the output log

[GC (CMS Initial Mark) [1 CMS-initial-mark: 68287K(68288K)] 99007K(99008K), 0.0031153 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[CMS-concurrent-mark-start]
[CMS-concurrent-mark: 0.017/0.017 secs] [Times: user=0.03 sys=0.00, real=0.02 secs] 
[CMS-concurrent-preclean-start]
[CMS-concurrent-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[CMS-concurrent-abortable-preclean-start]
[CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[GC (CMS Final Remark) [YG occupancy: 30719 K (30720 K)][Rescan (parallel) , 0.0035498 secs][weak refs processing, 0.0000284 secs][class unloading, 0.0003008 secs] [scrub symbol table, 0.0003731 secs] [scrub string table, 0.0001169 secs][1 CMS-remark: 68287K(68288K)] 99007K(99008K), 0.0044921 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[CMS-concurrent-sweep-start]
[CMS-concurrent-sweep: 0.011/0.011 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
[CMS-concurrent-reset-start]
[CMS-concurrent-reset: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
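
The concurrent-phase lines in a log like the one above can be pulled apart with a regular expression. The sketch below handles only the "[CMS-concurrent-<phase>: a/b secs]" shape seen in this log; the class name is made up, and the two numbers are kept as "first" and "second" rather than interpreted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CmsLogLine {
    // Matches completed concurrent-phase lines such as
    // "[CMS-concurrent-mark: 0.017/0.017 secs] [Times: ...]".
    private static final Pattern CONCURRENT_PHASE = Pattern.compile(
            "\\[CMS-concurrent-([a-z-]+): (\\d+\\.\\d+)/(\\d+\\.\\d+) secs\\]");

    final String phase;       // e.g. "mark", "abortable-preclean", "sweep"
    final double firstSecs;   // number before the slash
    final double secondSecs;  // number after the slash

    private CmsLogLine(String phase, double firstSecs, double secondSecs) {
        this.phase = phase;
        this.firstSecs = firstSecs;
        this.secondSecs = secondSecs;
    }

    // Returns the parsed line, or null for "-start" lines and pause lines.
    static CmsLogLine parse(String line) {
        Matcher m = CONCURRENT_PHASE.matcher(line);
        if (!m.find()) return null;
        return new CmsLogLine(m.group(1),
                Double.parseDouble(m.group(2)), Double.parseDouble(m.group(3)));
    }

    public static void main(String[] args) {
        String[] log = {
            "[CMS-concurrent-mark-start]",
            "[CMS-concurrent-mark: 0.017/0.017 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]",
            "[CMS-concurrent-sweep: 0.011/0.011 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]"
        };
        for (String line : log) {
            CmsLogLine parsed = parse(line);
            if (parsed != null) {
                System.out.println(parsed.phase + ": " + parsed.firstSecs + " / " + parsed.secondSecs);
            }
        }
    }
}
```

The two stop-the-world lines (CMS Initial Mark and CMS Final Remark) have a different shape and are deliberately skipped here.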

CMS problems

CPU sensitivity: concurrent collection runs while user threads are running and competes with them for the CPU. If the machine has fewer than four CPU cores, CMS has a large impact on the user program.
Floating garbage: user threads are still running during the CMS concurrent sweep phase, so new garbage keeps being produced as the program runs. This garbage appears after the marking process, so CMS cannot deal with it in the current cycle and has to leave it for the next GC; it is called floating garbage. To leave room for it, CMS cannot wait until the old generation is 100% full before it starts collecting; the old-generation usage threshold is generally taken as 92%.
Memory fragmentation: the mark-sweep algorithm does not move objects (which is what lets it run without pausing the application), so it leaves discontinuous fragments of free space. Fragmented space has to be tracked with a free list that is searched entry by entry for an address that can hold a newly created object, whereas contiguous space can be allocated by pointer collision (bumping a pointer). Fragmentation also makes it hard to find contiguous space for large objects, and this is what causes Promotion Failed. When that happens, Serial Old takes over: it is a single-threaded collector that pauses all application threads and collects the entire heap, so the pause is longer than a CMS pause.
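
The difference between pointer-collision allocation and free-list allocation can be sketched with two toy allocators. Everything here is hypothetical (sizes are abstract units, and neither class resembles HotSpot's real allocators); the point is only that a fragmented free list can refuse a large request even when the total free space would be enough.

```java
import java.util.ArrayList;
import java.util.List;

final class FragmentationSketch {
    // Pointer-collision (bump-pointer) allocation over contiguous free space:
    // one bounds check and one pointer increment per allocation.
    static final class BumpHeap {
        private final int capacity;
        private int top = 0;
        BumpHeap(int capacity) { this.capacity = capacity; }
        int allocate(int size) {               // start offset, or -1 if full
            if (top + size > capacity) return -1;
            int start = top;
            top += size;
            return start;
        }
    }

    // Free-list allocation over fragmented space: each entry is {offset, length}
    // and every allocation walks the list looking for a block that fits.
    static final class FreeListHeap {
        private final List<int[]> free = new ArrayList<>();
        void addFreeBlock(int offset, int length) { free.add(new int[] {offset, length}); }
        int totalFree() {
            int sum = 0;
            for (int[] block : free) sum += block[1];
            return sum;
        }
        int allocate(int size) {               // first fit; -1 if no single block fits
            for (int[] block : free) {
                if (block[1] >= size) {
                    int start = block[0];
                    block[0] += size;          // shrink the block from the front
                    block[1] -= size;
                    return start;
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        FreeListHeap oldGen = new FreeListHeap();
        oldGen.addFreeBlock(0, 40);
        oldGen.addFreeBlock(100, 40);
        oldGen.addFreeBlock(200, 40);
        System.out.println("total free = " + oldGen.totalFree());
        System.out.println("allocate 30 -> " + oldGen.allocate(30));
        System.out.println("allocate 60 -> " + oldGen.allocate(60));

        BumpHeap compacted = new BumpHeap(120);
        System.out.println("bump allocate 60 -> " + compacted.allocate(60));
    }
}
```

The request for 60 units fails on the free list because no single fragment is large enough, while the same amount of contiguous space satisfies it; a failure of this kind during promotion is the situation the text calls Promotion Failed.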

JVM tuning

GC optimization improves system efficiency and reduces the pauses caused by GC, which in turn reduces memory jitter.

Generational partitioning of JVM memory

In the generational model, heap GC performance depends heavily on the size of each partition. To decide how large each partition should be, analyzing the size of the active data is a good starting point. The active data size is the amount of heap occupied by long-lived objects while the application is running stably, that is, the space occupied by the old generation after a Full GC. It can be calculated by averaging the old-generation size after Full GC in the GC log, or by sampling GC data several times once the program has stabilized.

Space	Multiple
Total heap	3 to 4 times the active data size
Young generation	1 to 1.5 times the active data size
Old generation	2 to 3 times the active data size
Permanent generation / metaspace	1.2 to 1.5 times the space occupied by the permanent generation after Full GC

If the GC logs give an active data size of 300M, the partitions can be sized as follows:
Total heap: 300M * 4 = 1200M
Young generation: 300M * 1.5 = 450M
Old generation: 1200M - 450M = 750M
Permanent generation: 300M * 1.5 = 450M (here the same 300M figure is used as the permanent generation's occupancy after Full GC)
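
The sizing arithmetic can be turned into a small calculator that also prints matching JVM options. This is a sketch under assumptions: it always takes the upper end of each range, it sets the initial and maximum heap to the same value, and the class name is made up. -Xms, -Xmx, -Xmn and -XX:MaxMetaspaceSize are standard HotSpot options; on JDK 7 and earlier the permanent generation is sized with -XX:MaxPermSize instead.

```java
final class HeapSizing {
    final long heapMb;
    final long youngMb;
    final long oldMb;
    final long metaMb;

    // activeMb:   old-generation occupancy after Full GC (the active data size)
    // metaUsedMb: permanent generation / metaspace occupancy after Full GC
    HeapSizing(long activeMb, long metaUsedMb) {
        heapMb = activeMb * 4;                    // upper end of "3 to 4 times"
        youngMb = Math.round(activeMb * 1.5);     // upper end of "1 to 1.5 times"
        oldMb = heapMb - youngMb;                 // whatever the young generation leaves
        metaMb = Math.round(metaUsedMb * 1.5);    // upper end of "1.2 to 1.5 times"
    }

    // Options for a JDK 8+ JVM; initial heap is pinned to the maximum (an assumption).
    String flags() {
        return "-Xms" + heapMb + "m -Xmx" + heapMb + "m -Xmn" + youngMb
                + "m -XX:MaxMetaspaceSize=" + metaMb + "m";
    }

    public static void main(String[] args) {
        HeapSizing sizing = new HeapSizing(300, 300);
        System.out.println("heap=" + sizing.heapMb + "M young=" + sizing.youngMb
                + "M old=" + sizing.oldMb + "M meta=" + sizing.metaMb + "M");
        System.out.println(sizing.flags());
    }
}
```

With 300M of active data this reproduces the 1200M / 450M / 750M / 450M split from the example.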

Enlarging the young generation

When the young generation is collected, the copying algorithm copies live objects from the Eden area to an S (Survivor) area. Before copying, the Eden area has to be scanned to determine which objects are alive, so the cost of one Minor GC is T = T1 (scan the young generation) + T2 (copy objects).

Suppose the Eden area has capacity R, object A lives for 600ms, and the Minor GC interval is 500ms. A is still alive when the Minor GC runs, so it has to be copied, and the cost is T = T1 + T2.

If the Eden area is expanded to 2R, A still lives for 600ms (that depends on our code), but the Minor GC interval doubles to 1000ms because the larger Eden takes twice as long to fill. By the time the Minor GC runs, A is already dead and no longer needs to be copied. Scanning the doubled Eden costs T = 2 × T1, whereas two collections at the original size would have cost 2 × T1 + 2 × T2, so the 2 × T2 of copying is saved.

So if young-generation objects do not live very long, enlarging Eden improves GC efficiency. For long-lived objects the improvement is small, but the extra capacity does not place much burden on the JVM and still lengthens the GC interval. Enlarging the Eden area is therefore a reasonable tuning step: after this optimization, system efficiency improves and the pauses caused by GC are reduced.
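
The cost comparison for object A can be written as a tiny model. It is a simplification with assumptions stated in the comments: scan cost grows linearly with Eden size, the object is allocated right after the previous Minor GC, and the T1 and T2 values in main are arbitrary. All names are made up for illustration.

```java
final class MinorGcCostModel {
    // Cost of one Minor GC. scanCost is T1 for an Eden of capacity R and is
    // assumed to scale linearly with Eden size; copyCost is T2 and is only
    // paid if the object is still alive when the collection runs.
    static double pauseCost(double edenScale, double scanCost, double copyCost,
                            long objectLifetimeMs, long gcIntervalMs) {
        // Assumption: the object is allocated right after the previous GC.
        boolean survives = objectLifetimeMs > gcIntervalMs;
        return edenScale * scanCost + (survives ? copyCost : 0.0);
    }

    // Total GC cost over a time window, given how often Minor GC runs.
    static double costOver(long windowMs, double edenScale, double scanCost, double copyCost,
                           long objectLifetimeMs, long gcIntervalMs) {
        long collections = windowMs / gcIntervalMs;
        return collections * pauseCost(edenScale, scanCost, copyCost, objectLifetimeMs, gcIntervalMs);
    }

    public static void main(String[] args) {
        double t1 = 10.0, t2 = 4.0;   // arbitrary cost units
        double small = costOver(1000, 1.0, t1, t2, 600, 500);   // Eden = R
        double large = costOver(1000, 2.0, t1, t2, 600, 1000);  // Eden = 2R
        System.out.println("Eden R : " + small);
        System.out.println("Eden 2R: " + large);
        System.out.println("saved  : " + (small - large));
    }
}
```

Over the same 1000ms window the difference between the two configurations is exactly the copy cost that the larger Eden avoids.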

How the JVM avoids scanning the full heap in a Minor GC

When objects in the Eden area are scanned, an old-generation object may hold a cross-generation reference into Eden. In that case the old-generation object acts as a GC root for the young generation, so GC Roots information has to be collected from the old generation as well, which in effect means scanning the whole heap. To avoid whole-heap scans, the JVM uses a card table: the parts of the old generation that hold references into the young generation are flagged as dirty, and a Minor GC then scans only the young generation plus the flagged parts of the old generation. Card table: divides the old-generation space into cards of 512 bytes each.

Card	1	2	3	4	5	6	7	8	9
Flag	True	True	False	True	True	True	True	True	True

In the table above, card 3 is marked False. Because that card holds dirty data, it has to be scanned together with the young generation during a Minor GC.
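
A card table can be sketched as an array with one flag per fixed-size slice of the old generation. The sketch assumes 512-byte cards and uses true to mean dirty, which is an arbitrary choice for readability; the class, the recordWrite hook (a stand-in for a write barrier), and the offsets are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CardTableSketch {
    static final int CARD_SHIFT = 9;               // 2^9 = 512 bytes per card
    private final boolean[] dirty;                 // one flag per card; true = dirty

    CardTableSketch(long oldGenBytes) {
        long cardSize = 1L << CARD_SHIFT;
        dirty = new boolean[(int) ((oldGenBytes + cardSize - 1) >> CARD_SHIFT)];
    }

    // Maps a byte offset inside the old generation to its card index.
    static int cardIndex(long oldGenOffset) {
        return (int) (oldGenOffset >> CARD_SHIFT);
    }

    // Stand-in for a write barrier: called when a field of the old-generation
    // object at oldGenOffset is updated to reference a young-generation object.
    void recordWrite(long oldGenOffset) {
        dirty[cardIndex(oldGenOffset)] = true;
    }

    // The only old-generation cards a Minor GC needs to scan for roots.
    List<Integer> dirtyCards() {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < dirty.length; i++) {
            if (dirty[i]) result.add(i);
        }
        return result;
    }

    int cardCount() { return dirty.length; }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(4096);   // 8 cards
        table.recordWrite(1030);                              // falls in card 2
        System.out.println("cards: " + table.cardCount()
                + ", dirty: " + table.dirtyCards());
    }
}
```

Instead of walking every old-generation object, a Minor GC in this model looks only at the cards returned by dirtyCards().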
