sequence

This paper mainly analyzes a case of frequent GC (Allocation Failure) and a long time for young GC.

symptoms

  • Gc throughput Percent gradually decreases from the general 99.96% to below 99% and into 98%, and the lowest point can reach 94%
  • Young gc time increases gradually, from the normal tens of milliseconds to more than 50, and then to more than 100,150,200,250
  • Over a period of 8.5 days, more than 9000 GCS occurred, including 4 full GCS with an average of nearly 8 seconds, most of which were Young GCS (Allocation failure is given priority to), with an average of more than 270 milliseconds and a maximum of nearly 7 seconds
  • The average object creation rate was 10.63 MB/SEC, the average promotion rate was 2 KB/SEC, and CPU usage was normal with no significant spike

The JVM parameter

-XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=4 -XX:+UseAdaptiveSizePolicy -XX:MaxHeapSize=2147483648 -XX:MaxNewSize=1073741824 -XX:NewSize=1073741824 -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps
Copy the code

JDK version

java -version
java version "1.8.0 comes with _66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
Copy the code

full gc

27.066: [Full GC (Metadata GC Threshold) [PSYoungGen: 19211K->0K(917504K)] [ParOldGen: 80K->18440K(1048576K)] 19291K->18440K(1966080K), [Metaspace: 20943K->20943K(1069056K)], 0.5005658 secs] [Times: User =0.24 sys=0.01, real= 0.50secs] 100.675: [Full GC (Metadata GC Threshold) [PSYoungGen: 14699K->0K(917504K)] [ParOldGen: 18464K->23826K(1048576K)] 33164K->23826K(1966080K), [Metaspace: Secs] [Times: user= syssys = 0.04, real= 0.05secs] [Full GC (Metadata GC Threshold) [PSYoungGen: 24843K->0K(1022464K)] [ParOldGen: 30048K->44782K(1048576K)] 3004892K ->44782K(2071040K), [Metaspace: 58220K->58220K(1101824K)], 3.7936515 secs] [Times: User =1.86 sys=0.02, real= 3.79secs] 242605.669: [Full GC (Ergonomics) [PSYoungGen: 67276K->0K(882688K)] [ParOldGen: 1042358K->117634K(1048576K)] 1109635K->117634K(1931264K), [Metaspace: [Times: user=2.50 sys=3.51, real= 2.16secs]Copy the code

The first three full GCS were caused by the Metadata GC Threshold, and only the last one was caused by Ergonomics.

Full GC (Metadata GC Threshold)

Java8 is used here, and the parameter does not explicitly specify the size and upper limit of metaspace. Take a look

Jstat -gcmetacapacity 7 MCMN MCMX MC CCSMN CCSMX CCSC YGC FGC FGCT GCT 0.0 1136640.0 99456.0 0.0 1048576.0 12160.0 38009 16 275.801 14361.992Copy the code
  • Ignore the following FGC, because only a quarter of the logs are parsed
  • MCMX(Maximum metaspace capacity (kB)) has a Full GC Threshold. The Maximum metaspace capacity (kB) has a Full GC Threshold.

Related parameters

  • -xx :MetaspaceSize, the initial size of the space (also the initial threshold, i.e. the initial high-water-mark), which triggers garbage collection for type offloading and is adjusted by GC: if a large amount of space is freed, the value is appropriately lowered; If very little space is freed, increase this value appropriately until MaxMetaspaceSize is exceeded.
  • -xx :MaxMetaspaceSize. The maximum space is unlimited by default, depending on the capacity of the local system.
  • -xx :MinMetaspaceFreeRatio, the minimum percentage of Metaspace free space capacity after GC (That is, the maximum size of metadata allocated to the current size), if the idle ratio is less than this parameter (That is, the maximum occupancy size is exceeded), meta Space will be expanded.
  • -xx :MaxMetaspaceFreeRatio, the maximum percentage of Metaspace free space capacity after GC (That is, the minimum size of metadata allocated to the current size), if the idle ratio is greater than this parameter (That is, less than the minimum occupation size), then meta Space will be scaled down.

Since there is no setting, the default value on the machine is:

java -XX:+PrintFlagsFinal | grep Meta
    uintx InitialBootClassLoaderMetaspaceSize       = 4194304                             {product}
    uintx MaxMetaspaceExpansion                     = 5451776                             {product}
    uintx MaxMetaspaceFreeRatio                     = 70                                  {product}
    uintx MaxMetaspaceSize                          = 18446744073709547520                    {product}
    uintx MetaspaceSize                             = 21807104                            {pd product}
    uintx MinMetaspaceExpansion                     = 339968                              {product}
    uintx MinMetaspaceFreeRatio                     = 40                                  {product}
     bool TraceMetadataHumongousAllocation          = false                               {product}
     bool UseLargePagesInMetaspace                  = false                               {product}
Copy the code

MinMetaspaceFreeRatio = 40, MaxMetaspaceFreeRatio = 70, MetaspaceSize = 20M, and Full Metadata GC Threshold (Metadata GC Threshold) is divided into three times

  • The first time, [Metaspace: 20943K->20943K(1069056K)]
  • Second, [Metaspace: 34777K->34777K(1081344K)]
  • The third time, [Metaspace: 58220K->58220K(1101824K)]

As you can see, the metaspace threshold is constantly adjusted dynamically. As for the logic of the adjustment, it is not mentioned in the official documentation. As long as the Max value is not exceeded, there is no fatal effect, but for low-latency applications, it is important to avoid gc time due to dynamic adjustment, which can be solved by tuning calculations and setting initial thresholds.

Full GC (Ergonomics)

The reason for full GC is Ergonomics, because UseAdaptiveSizePolicy is enabled and the JVM ADAPTS itself to the full GC

GC (Allocation Failure)

After analyzing the full GC, we can look at the young GC and see that 99% of the log is caused by Allocation Failure (GC). Allocation Failure Indicates the minor GC that requests space from young generation(Eden) for a new object, but young generation(Eden) does not have enough suitable space left for the desired size.

-XX:+PrintTenuringDistribution

Desired survivor size 75497472 bytes, new threshold 2 (max 15)
- age   1:   68407384 bytes,   68407384 total
- age   2:   12494576 bytes,   80901960 total
- age   3:      79376 bytes,   80981336 total
- age   4:    2904256 bytes,   83885592 total
Copy the code
  • This Desired survivor size indicates that the maximum size that a survivor region is allowed to hold is 75497472 bytes
  • The following object list is the age distribution of survivor objects after gc. The total size is 83885592 > 75497472, and age1 is 68407384 < 75497472, so the new threshold changes to 2(On the next GC). If the next GC does not release objects, objects exceeding the threshold will be promoted to the old generation.

Age, the list is empty

59.463: [Allocation Failure (GC) Desired survivor size 134217728 bytes, new threshold 7 (Max 15) [PSYoungGen: [Times: user=0.10 sys=0.01, real= 0.20secs] [Times: user=0.10 sys=0.01, real= 0.20secs]Copy the code

The Desired survivor size line does not have a distribution of age objects, which means that after the gc, there are no surviving objects in the survivor region whose age is less than Max threshold. There is no output, indicating that no surviving objects can be copied to a new survivor zone after this GC collection.

Example of survivor having an object after GC

Jstat -gcutil-h10 7 10000 10000 S0 S1 E O M CCS YGC YGCT FGC FGCT GCT 0.00 99.99 90.38 29.82 97.84 96.99 413 158.501 4 14.597 173.098 11.60 0.00 76.00 29.83 97.84 96.99 414 158.511 4 14.597 173.109 11.60 0.00 77.16 29.83 97.84 96.99 414 158.511 4 14.597 173.109 0.00 13.67 60.04 29.83 97.84 96.99 415 158.578 4 14.597 173.176 0.00 13.67 61.05 29.83 97.84 96.99 415 158.578 4 14.597 173.176Copy the code
  • Before YGC young generation = Eden + S1; After yGC, young generation = Eden + S0
  • After YGC, it can be seen that the space of old generation remains unchanged, indicating that no object is promoted to old generation in this YGC.
  • After GC, the surviving object is moved to another survivor region
  • Since sampling is performed every 10 seconds, there is a delay, that is, after GC, new objects are allocated in Eden area immediately, so the Eden area seen here is occupied by objects.

real time > usr time + sys time

722914.974: [Allocation Failure (GC) Desired survivor size 109576192 bytes, new threshold 15 (Max 15) [PSYoungGen: [Times: Times: 32828k -> 32828k] 32828K -> 32828k (32828k)] 32828k -> 32828k (32828k), 328289secs] [Times: 32828k] User =0.03 sys=0.00, real= 0.03secs] 722975.207: [GC (Allocation Failure) Desired survivor size 103284736 bytes, new threshold 15 (max 15) [PSYoungGen: [Times: user= 0.17sys =0.00, real= 0.17secs]Copy the code

There are more than nearly 300 GC real time times greater than usr time + SYS time.

  • Real: refers to the WallClock Time elapsed from the start to the end of the operation.
  • User: indicates the CPU time consumed in user mode.
  • Sys: refers to CPU time consumed by kernel mode.

CPU time does not include all kinds of non-computational wait time, such as waiting for disk I/O and waiting for threads to block, but when a system has multiple cpus or cores, multithreading adds up to this CPU time, so it is perfectly normal to see user or SYS times exceed real time.

User + sys is the actual time spent by the CPU. Note that this value counts the time spent on all cpus. If a process is running in a multithreaded environment, the time spent by multiple threads is added to this value, which would exceed the value recorded by real.

Over 300 real time times greater than usr time + SYS time indicate two possible problems, one is IO intensive and the other is insufficient CPU.

New generation of garbage recycling mechanisms

  • The new object will try to allocate space on the stack. If it fails, TLAB will try to allocate space on the old generation.- XX: PretenureSizeThreshold large object directly into the old generation threshold, when the object size exceeds this value, will be directly allocated in the older generation.). If not, we will finally apply for space in Eden
  • Ask Eden for space to create a new object. Eden does not have the right space, so minor GC is triggered
  • The MINOR GC processes surviving objects from the Eden region and from survivor region
    • If these objects reach the age threshold, they are directly promoted to the old generation
    • If the object to be copied is too large, it is not copied to To survivor, but directly into the aged generation
    • If there is insufficient space in the to survivor region or insufficient space during replication, survivor overflow occurs and the aged generation is directly entered
    • Otherwise, if there is enough space in the to Survivor region, the surviving object is copied to the TO Survivor region
  • In this case, the remaining objects in Eden area and from survivor area are garbage objects, which are directly erased and recycled. The released space becomes the new space that can be allocated
  • After minor GC, if Eden space is sufficient, the new object allocates space in Eden. If Eden space is still insufficient, the new object allocates space directly in the old generation

summary

From the above analysis, it can be seen that young generation seems to be a bit large, with a long YGC time; In addition, the survivor space is basically empty after each YGC, indicating that the new object is generated quickly and its life cycle is short. The survivor space originally designed was not used. Consider reducing the size of Young Generation, or try G1 instead.

About XX: + PrintTenuringDistribution several main points, to clarify:

  • Which region of the object distribution of this print (survivor)
  • Whether to print before or after GC (Print after GC)
  • Whether the age of a newborn object is 0 or 1 when it first reaches survivor

The age of the object is the number of minorgs he has experienced. When the object is first assigned, its age is 0. If the object has not been collected after the first MinorGC, its age is +1. So the object is 1 when it first enters a survivor zone.

  • Promotion threshold (New Threshold) is dynamically adjusted

If the total size of the underlying age is greater than the Desired survivor size, then the survivor space overflows and is filled, and the threshold is recalculated.

doc

  • jstat
  • Size of Huge Objects directly allocated to Old Generation
  • Java object allocation brief flow
  • Remember the JVM optimization process once
  • Survivor Space overflow instance
  • Java garbage collection log, why from and to size different?
  • Useful JVM Flags — Part 5 (Young Generation Garbage Collection)
  • JDK-6453675 : Request for documentation of -XX:+PrintTenuringDistribution output
  • How to read the output of +PrintTenuringDistribution
  • One GC Tuning Notes
  • Metaspace for FullGC in JDK8
  • Where did Java PermGen go?
  • Metaspace
  • The pit of MetaspaceSize
  • Metaspace decryption for JVM source code analysis
  • About G1 Garbage Collector, Permanent Generation and Metaspace
  • Talk about PermGen and Metaspace for the JVM
  • GC LOGGING — USER, SYS, REAL — WHICH TIME TO USE? & GANDHI
  • REAL TIME IS GREATER THAN USER AND SYS TIME
  • GC log time analysis: user + sys < real
  • What is promotion rate?