preface

In the last article, we introduced JVM parameters and the use of some common tools provided by the JDK. With an OOM example, we briefly explained how to use tools to analyze dump files. In this article, we will introduce how to analyze GC logs.

The GC logs generated by different garbage collectors generally follow the same rules, albeit slightly differently, but the GC logs generated by the G1 collector are quite different from those generated by other garbage collectors.

When does garbage collection happen

Let’s start with a question: when does garbage collection happen? In Java, GC is done automatically by the JVM, depending on the JVM system environment, so the timing is uncertain. Of course, you can do garbage collection manually, for example by calling the System.gc() method to tell the JVM to do a garbage collection, but you can’t control when it runs. That is, system.gc () is just a notification to collect, and it’s up to the JVM to decide when to collect. Generally, garbage collection will occur in the following situations: 1. When Eden area or S area is insufficient; 2. When old space is insufficient; 3

Note: there may be some people will think method area is not going to happen, garbage collection method area is actually the garbage collection will happen, but in most cases, the method of area after garbage collection efficiency is not high, most of the memory recovery not to drop, so we usually discuss the recycling also only discuss the recycling in pile

How to get GC logs

After a GC occurs, we need to get the GC log first. As mentioned in the previous article about JVM parameter classification and common parameter analysis, we can print the GC log by using the following command:

-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:D:\\gc.log
Copy the code

Start service after configuration:



Find the gc.log file. Note that if the log is empty at first, you can open it after the gc has occurred:



The Parallel Exploiter +Parallel Old collector is used as the avenge avenge.

-XX:+PrintCommandLineFlags
1
Copy the code

To print:

PS+PO log analysis

You should be able to read the first three lines: the first line prints the current HotSpot VIRTUAL machine and its version number; The second line prints memory information related to the operating system. The third line prints the parameters of the lock configuration after the Java service is started:

CommandLine flags: -XX:-BytecodeVerificationLocal -XX:-BytecodeVerificationRemote -XX:InitialHeapSize=131854336 -XX:+ManagementServer -XX:MaxHeapSize=2109669376 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:TieredStopAtLevel=1 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:-UseLargePagesIndividualAllocation -XX:+UseParallelGC 
Copy the code

This includes heap space printing, as well as the garbage collector used and the parameters we configured for printing GC logs ourselves.

Here is our GC log starting at line 4, let’s copy lines 4 and 9 for analysis:

// Line 4 2020-08-23T15:35:30.747+ 0800:5.486: [GC (Allocation Failure)] [PSYoungGen: [Times: Times] [Times: Times] [Times: Times] [Times: Times] Sys =0.00, real= 0.00 secs] //第9行 2020-08-23t15:35:34.635 + 0800:9.374: [Full GC (Metadata GC Threshold) [PSYoungGen: 5092K->0K(136192K)] [ParOldGen: 12221K->12686K(63488K)] 17314K->12686K(199680K), [Metaspace: 20660K->20660K(1067008K)], 0.0890985 secs] [Times: User sys = = 0.25 0.00, real = 0.09 secs]Copy the code
  • 1. The first time 2020-08-23T15:35:30.747+0800 represents the time when garbage collection takes place.
  • 2. The following number: 5.486 represents the number of seconds since the Java Virtual machine was started.
  • The next GC (Allocation Failure) indicates the cause of GC. In this case, GC occurs because space Allocation fails.
  • The YoungGen collector is the PSYoungGen collector. The YoungGen collector is the same as the ParNew collector.
  • 5, a number in parentheses 32768K->3799K(37888K) this represents: the current memory area used before GC -> the current memory area used after GC (total memory space). From here, you can see that after a GC, most of the space is reclaimed.
  • 6, the number outside the brackets 32768K->3807K(123904K) Java heap used before GC -> Java heap used after GC -> Java heap used after GC -> Java heap used after GC -> Java heap used after GC It is clear that GC occurs only in the Cenozoic era, why there is space to release in the old age? I don’t know if you still remember that I mentioned in the previous two articles that if the space in ZONE S is insufficient, it will use the guarantee mechanism to borrow space from the old era, so the borrowed space may be released. If you want to know more about the guarantee mechanism in zone S, you can click here.
  • 0.1129986 secs this is the time taken by GC, secs is in seconds.
  • 8, [Times: User =0.02 sys=0.00, real= 0.11secs] Not all garbage collectors print this part. User =0.02 indicates the CPU time consumed in user mode, and sys=0.00 indicates the CPU time consumed in kernel mode and the wall clock time elapsed from start to finish.
  • Parallel Old collector: ParOldGen; Parallel Old collector: Metaspace;
  • FullGC=Minor +Major +Metaspace (PSYoungGen, ParOldGen, Metaspace) Moreover, the comparison in line 9 is very obvious. All of the new generation are recycled, while a small part of the old generation is recycled, while none of the method area is recycled, which also reflects the difference of objects stored in the three areas.

Wall clock time and CPU time

Wall Clock Time includes non-operational wait Time, such as waiting for disk I/O and waiting for threads to block, while CPU Time does not include Time that does not consume CPU.

CMS Log Analysis

We switched the garbage collector to CMS

-XX:+UseConcMarkSweepGC
Copy the code

Note that CMS is also an old-generation collector, and with this parameter the new generation will use the ParNew collector by default

Then restart the service, wait for garbage collection, and open the gc.log file.



The first two lines are the same as above, let’s copy the third line to see if the garbage collector successfully switched:

CommandLine flags: -XX:-BytecodeVerificationLocal -XX:-BytecodeVerificationRemote -XX:InitialHeapSize=131854336 -XX:+ManagementServer -XX:MaxHeapSize=2109669376 -XX:MaxNewSize=348966912 -XX:MaxTenuringThreshold=6 -XX:OldPLABSize=16 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:TieredStopAtLevel=1 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:-UseLargePagesIndividualAllocation -XX:+UseParNewGC 
Copy the code

As you can see, both the CMS and ParNew collectors are enabled. Open log line 4:

2020-08-23T17:00:34.728+ 0800:5.259: [GC (Allocation Failure) 2020-08-23T17:00:34.728+ 0800:5.259: [ParNew: 34432K-> 386K (38720K), secs] 34432K-> 386k (124736K), secs] [Times: User sys = = 0.02 0.00, real = 0.02 secs]Copy the code

This is the same information that we have here, which is the new generation is called ParNew. Let’s look at the GC log of the old CMS. Let’s copy a completed GC log of the old CMS:

2020-08-23T17:00:47.650+ 0800:18.182: [GC (CMS Initial Mark) [1 CMS- Initial Mark: Secs] [Times: 30298K] 30298K [122936k] 0.0014342secs] [Times: Sys =0.00, real=0.00 secs] 2020-08-23T17:00:47.651+ 0800-18.183: [CMS-concurrent-mark-start] 2020-08-23T17:00:47.712+ 0800:18.244: [CMS-concurrent-mark: 0.061/0.06secs] [Times: Sys =0.00, real= 0.06secs] 2020-08-23T17:00:47.712+ 0800:18.244: [CMS - concurrent - preclean - start] 2020-08-23 T17:00:47. 714 + 0800:18.245: [CMS - concurrent - preclean: [Times: user=0.00 sys=0.00, real=0.00 secs] 2020-08-23T17:00:47.714+ 0800:18.246: [CMS - concurrent - abortable - preclean - start] 2020-08-23 T17:00:48. 143 + 0800:18.674: [Allocation Failure) 2020-08-23T17:00:48.143+ 0800:18.674: [ParNew: 38720K->4111K(38720K), secs] 389018K ->38573K(124736K), secs] [Times: [cmS-concurrent-abortable-preclean: user=0.06 sys=0.00, real= 0.01secs] 2020-08-23T17:00:48.451+ 0800:18.983: Secs] [Times: user= sys= Times, real= Times] 2020-08-23T17:00:48.451+ 0800:18.983: [GC (CMS Final Remark) [YG occupancy: 23345K (38720K)]2020-08-23T17:00:48.451+ 08000:18.983: [Rescan (Parallel), 0.0046112 SECs]2020-08-23T17:00:48.456+ 0800:18.987: [Weak Refs Processing, 0.0006259 secs]2020-08-23T17:00:48.457+ 08000:18.988: 2, Class class TBM class class [Scrub Symbol Table, 0.0092387 secs]2020-08-23T17:00:48.472+ 08000:19.004: Scrub scrub [Scrub String table, 0.0006408 secs][1 CMS-remark: 34461K(86016K)] 57806K(124736K), 0.0219024 secs][Times: scrub scrub User =0.05 sys=0.01, real=0.02 secs] 2020-08-23T17:00:48.473+ 0800:19.005: [CMS-concurrent-sweep-start] 2020-08-23T17:00:48.489+ 0800:19.020: [CMS-concurrent-sweep: 0.015/0.015secs] [Times: User =0.01 sys=0.00, real=0.02 secs] 2020-08-23T17:00:48.489+ 0800:19.020: [cmS-concurrent-reset-start] 2020-08-23T17:00:48.492+ 0800:19.023: [CMs-concurrent-reset: 0.003/0.003 secs] [Times: User sys = = 0.00 0.00, real = 0.00 secs]Copy the code

If you don’t know how the CMS garbage collector works, you can check it out here, as it will be covered later.

  • 1. CMS Initial Mark corresponds to the first step of CMS working mechanism Initial Mark, which is mainly to find GCRoot objects
  • 2. 10443K(86016K) in the brackets indicates the size of the current area and the total space
  • 3. 13477K(124736K) outside the brackets represents the size of the used space in the heap and the total space in the heap
  • Cms-concurrent-mark-start corresponds to the second step of the CMS working mechanism concurrency flag. This phase is mainly about traversing the entire reference chain based on the GCRoot object
  • Cms-concurrent-mark: 0.052/0.052 secs, where the first time indicates the duration of the CPU and wall clock
  • Cms-concurrent-preclean and CMs-concurrent-abortable-preclean correspond to the preclean and interruptible preclean phases
  • 7. Cms-concurrent-sweep-start corresponds to the final mark, and STW is required at this stage. As can be seen, a Young GC occurs before this stage to reduce STW time.
  • Cms-concurrent-sweep clears concurrent garbage, and cms-concurrent-reset resets threads

G1 Log Analysis

Switch to G1 garbage collector:

-XX:+UseG1GC 
Copy the code

Then restart the service and wait for garbage collection to happen to open the gc.log file:



As you can see, this file is much more complex than other garbage collectors. Let’s first look at line 3 to verify that the G1 collector is in use:

CommandLine flags: -XX:-BytecodeVerificationLocal -XX:-BytecodeVerificationRemote -XX:InitialHeapSize=131854336 -XX:+ManagementServer -XX:MaxHeapSize=2109669376 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:TieredStopAtLevel=1 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC -XX:-UseLargePagesIndividualAllocation 
Copy the code

You can see that the G1 collector is indeed used. Let’s find a complete G1 log and copy it:

2020-08-23T18:44:39.787+ 0800:2.808: [GC Pause (G1 Evacuation Pause) (young), 0.0029103 secs] [Parallel Time: [GC Worker Start (ms): Min: 2807.7, Avg: 2807.8, Max: 2807.8, Diff: 0.1] [Ext Root Scanning (MS): Min: 0.3, Avg: 0.6, Max: 0.8, Diff: 0.5, Sum: 2.2] [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0] [Processed Buffers: Min: 0, Max: 0, Diff: 0, Sum: 0] 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Code Root Scanning (MS): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Code Root Scanning (MS): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] 0.0] [Termination (ms): Min: 0.9, Avg: 1.2, Max: 1.4, Diff: 0.5, Sum: 4.6] [Termination (MS): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Termination Attempts: Min: 1, Avg: 2.5, Max: 4, Diff: 3, Sum: 10] Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2] [GC Worker Total (ms): Min: 1.7, Avg: 1.8, Max: 1.8, Diff: 0.1, Sum: 0.1] 7.1] [GC Worker End (ms): Min: 23995, Avg: 23995, Max: 23995, Diff: 0.0] [Code Root Fixup: 3.95 ms] [Code Root Purge: 3.95] 0.0 ms] [Clear CT: 0.1 ms] [Other: 1.0 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.8ms] [Ref Enq: 0.0 ms] [Redirty Cards: 0.1 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 0.0 ms] 6144.0K(6144.0K)-> 0.0b (12.0m) Survivors: 0.0b ->1024.0K Heap: 6144.0K(126.0m)->1520.0K(126.0m)] [Times: User sys = = 0.00 0.00, real = 0.00 secs]Copy the code

[GC Pause (G1 Evacuation Pause) (young), 0.0029103 secs] Indicates that the area where GC occurs is the Young area, followed by the total elapsed time. Note: although G1 physically removes the partitioning, it still retains it logically, so the logs will still show young and Full GC will be mixed. [Parallel Time: 1.9ms, GC Workers: 4] [GC Worker Start (ms): Min: 2807.7, Avg: 2807.8, Max: 2807.8, Diff: 0.1] This represents the minimum, average and maximum start times of each parallel thread and the time difference (max-min). The following are some specific execution steps, which are not explained here line by line. If you are interested, you can click here to understand. Here is a very detailed explanation, but it is in English, but it should be generally understandable:

Analyze GC logs with tools

Although we can read the GC log from the log, there are two points that we focus on when tuning:

  • Throughput = Running user code time /(running user code time +GC time)
  • 2, GC Pause Time Stop The World Time

It would be hard to see these two metrics directly from the GC log, and I think it would be a nightmare if we had to rely on our own calculations to confirm the problem. So again, we need tools to help us analyze, and here are two common tools.

gceasy

  • 1, open the official website address: gceasy.io/
  • 2. Upload GC logs



    Then you can go to the main page:



    Throughput and GC pause times have been tallies for us, as well as other metrics, so we can compare metrics to determine which collector is right for our system.

GCViewer

  • 1. Download the GCViewer JAR package
  • 2. Run the Java -jar gcviewer -1.36-snapshot. jar command



    Open the main interface:



    Click File – > Open the File



    You can see throughput statistics in the first Summary Summary on the right.

    Switch to the third label Pause:



    You can view statistics of various pause times.

conclusion

This article focuses on how GC logs for common garbage collectors should be analyzed, and introduces two common tools to help you analyze GC logs more intuitively.

In the next article, we will cover some classic interview questions that are common but not mentioned in the previous article.

** Please pay attention to me, learn and progress together **