This is the most comprehensive JVM tuning guide

Q: Do you know the Java memory model?

Which version of the JDK are you referring to?

This article is from mashibing.com, which enriches the original text with technical details and illustrations.

Learn the context first when you learn something new! – Ma Shibing

GC basics

1. What is garbage

Memory allocation in C: malloc / free

C++: new / delete

C/C++: memory is reclaimed manually

Java: new?

Automatic memory reclamation keeps programming simple and makes the system less error-prone. Releasing memory manually easily leads to two kinds of problems:

Forgetting to free memory

Freeing the same memory more than once

Garbage: an object, or a group of objects (for example, objects that reference each other in a cycle), to which there is no reference

2. How to locate garbage

Reference counting (not used in Java)

Add one to the count each time a reference is created and subtract one each time a reference is removed; the object is judged to be garbage when the count reaches 0.
Root searching algorithm, currently used

Starting from the roots, follow the references as if they were strung on a line. Whatever is on the line is not garbage, and whatever is not on the line is garbage.

What can be used as root:
- Classes loaded and objects created by the bootstrap loader;
- Objects referenced in JavaStack (objects referenced in stack memory);
- The object to which the static reference points in the method area;
- The object to which the constant reference in the method area points;
- Objects referenced by JNI in Native methods.
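
The root searching idea can be illustrated with a small graph walk. This is only a sketch on a toy object model: `Node`, `RootSearchSketch` and `reachable` are hypothetical names made up for the example and have nothing to do with how HotSpot represents objects internally.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical toy object: a name plus outgoing references.
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
}

final class RootSearchSketch {
    // Returns every node reachable from the roots by following references.
    static Set<Node> reachable(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {        // first visit: mark it, then scan its references
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node live = new Node("live");
        Node a = new Node("a");
        Node b = new Node("b");
        a.refs.add(b);                  // a and b reference each other,
        b.refs.add(a);                  // but no root leads to them
        Set<Node> marked = reachable(List.of(live));
        System.out.println("live reachable: " + marked.contains(live));
        System.out.println("a reachable: " + marked.contains(a));
    }
}
```

In this toy model `a` and `b` keep a non-zero reference count because of the cycle, yet neither is reachable from a root, so a root search treats both as garbage.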

3. Common garbage collection algorithms

Mark-sweep: the freed locations are not contiguous, so it produces fragmentation; efficiency is low (two scans); used for old generation collection

The cleaning effect is shown in the figure:

Disadvantages: the entire heap has to be scanned, so the time overhead is large. If you’re interested, check out the PPT I’ve uploaded, which covers an improved version of mark-sweep.

Reference: blog.csdn.net/m0_37860933…
Copying: no fragmentation, but wastes space; used for young generation collection

The heap is divided into the young generation (new generation), the old generation and the permanent generation, and the copying algorithm is used in the young generation. Newly created objects are generally allocated in the Eden area of the young generation. When Eden is nearly full, a minor garbage collection is performed and surviving objects are moved to Survivor1 (S1).

When GC occurs again, live objects in S1 are copied to the previously idle S2 and their age is incremented by 1. On each subsequent GC, S1 and S2 swap roles as the area holding live objects and the idle area. Once the age of a live object reaches a certain threshold, it is promoted to the old generation.

Note that JDK8 has no permanent generation; Metaspace is used instead. If you are interested, please refer to JDK8 Metaspace tuning
Mark-compact: no fragmentation, but low efficiency (two scans, and pointers need to be adjusted); used for old generation collection

The general idea is similar to mark-sweep

(1) In the marking phase, all objects reachable (directly or indirectly) from the root nodes are marked, the same as in mark-sweep;

(2) In the compaction phase, the objects that survived marking are moved to one end of memory, and then everything beyond the boundary is cleared. (This reduces memory fragmentation and avoids running out of contiguous space when allocating large objects)

Cleaning effect as shown in figure!

Comparison of several algorithms

Comparison between the copying algorithm and mark-compact:

The copying algorithm has to perform more copy operations when the object survival rate is high, so its efficiency drops. More importantly, if you do not want to waste 50% of the space, you need extra space as an allocation guarantee to cope with the extreme case in which 100% of the objects in the used memory are alive. For this reason the copying algorithm is used only in the young generation; the old generation generally cannot use it directly and uses mark-compact instead.

Comparison between mark-compact and mark-sweep:

The mark-compact algorithm was proposed to suit the characteristics of the old generation. Its marking phase is the same as in mark-sweep, but instead of freeing the recyclable objects in place, it moves all surviving objects to one end and then clears the memory beyond the end boundary, which also reduces memory fragmentation.

4. Common garbage collectors

Garbage collector metric parameters

(A) Throughput

The ratio of CPU time spent running user code to total CPU consumption;

Throughput = run user code time/(run user code time + garbage collection time);

High throughput means less garbage collection time and longer runtime for user code;
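
As a worked example of the throughput formula, here is a minimal sketch. The class name `ThroughputSketch` and the 99 s / 1 s figures are made up for illustration and are not measurements.

```java
final class ThroughputSketch {
    // throughput = user code time / (user code time + garbage collection time)
    static double throughput(double userMillis, double gcMillis) {
        return userMillis / (userMillis + gcMillis);
    }

    public static void main(String[] args) {
        // Hypothetical run: 99 seconds of user code and 1 second of GC
        double t = throughput(99_000, 1_000);
        System.out.printf("throughput = %.2f%n", t);
    }
}
```

With these assumed numbers, 99% of the time goes to user code; doubling the GC time to 2 seconds would lower the ratio to 99 / 101, about 98%.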

(B) Desired target of garbage collector (concerns)

Pause time

Shorter pauses are suitable for applications that need to interact with the user;

Good response speed can improve user experience;

Throughput

High throughput uses CPU time efficiently and completes the computation task as soon as possible;

Mainly suitable for tasks that do not require much interaction in the background;

Footprint

Minimize the memory space of the heap while achieving the first two goals;

Better spatial locality can be obtained;
STW

Stop-the-world: the pause during which business threads are stopped while the garbage collector runs
For the difference between PS and PN, read on: docs.oracle.com/en/java/jav…
Garbage collector versus memory size

Serial – tens of megabytes

PS – hundreds of megabytes to several gigabytes (JDK8)

CMS – 20G

G1 – hundreds of gigabytes

ZGC – 4T to 16T (JDK13)
The difference between concurrent and parallel garbage collection

(A) Parallel

Refers to multiple garbage collection threads working in parallel while the user thread is still in a waiting state.

Such as ParNew, Parallel Scavenge, Parallel Old;

(B) Concurrent

Refers to the simultaneous execution of the user thread and the garbage collection thread (but not necessarily in parallel and may be executed alternately);

The user program continues to run, while the garbage collector thread runs on another CPU, making no distinction between the new generation and the old generation.

Such as CMS, G1 (also parallel);
The difference between Minor and Full GC

(A) Minor GC

Also known as young generation GC; garbage collection that takes place in the young generation.

Because Java objects are mostly short-lived, Minor GC is frequent and generally fast;

(B) Full GC

Also known as Major GC or old generation GC; GC that takes place in the old generation;

A Full GC is often accompanied by at least one Minor GC (not always: the Parallel Scavenge collector lets you choose the Major GC policy);

Major GC is typically 10 times slower than Minor GC.

The evolution of garbage collector

Serial: tens of megabytes. Parallel: several gigabytes. CMS: tens of gigabytes, the start of concurrent collection, based on tri-color marking.

PS was created to improve efficiency, and PN was created to work together with CMS. CMS was introduced late in the 1.4 releases. CMS is a landmark GC that opened the era of concurrent collection, but it has many problems, so no JDK version uses CMS by default. Concurrent garbage collection exists because long STW pauses are intolerable.

Serial: serial collection of the young generation

For the New generation;

Copy algorithm is adopted;

Single-thread collection;

When garbage collection is done, all worker threads must be paused until complete;

“Stop The World”;

SerialOld

The old generation counterpart of Serial

Uses the “mark-compact” algorithm (together with sweeping: mark-sweep-compact);

Single-thread collection;

ParNew: parallel collection of the young generation, usually paired with CMS

The ParNew garbage collector is a multithreaded version of the Serial collector.

The ParNew/Serial Old collector runs as follows:

Characteristics

Apart from using multiple threads, ParNew behaves the same as the Serial collector: the available control parameters, the collection algorithm, Stop The World, the object allocation rules and the reclamation strategy are all the same, and the two collectors share a lot of code. In Server mode ParNew is a very important collector, because besides Serial it is currently the only one that works with the CMS collector. In a single-CPU environment, however, it is no better than the Serial collector because of the thread interaction overhead.

Set parameters:

“-XX:+UseConcMarkSweepGC”: once CMS is specified, ParNew is used as the young generation collector by default.

“-XX:+UseParNewGC”: forces the use of ParNew.

“-XX:ParallelGCThreads”: specifies the number of garbage collection threads. By default ParNew starts as many collection threads as there are CPUs.

Why only ParNew works with the CMS collector:

CMS is HotSpot’s first truly concurrent collector, introduced in JDK1.5; it was the first to let garbage collection threads work (basically) at the same time as user threads.

As an old generation collector, CMS cannot work with Parallel Scavenge, the young generation collector that already existed in JDK1.4.

Parallel Scavenge (and G1) do not use the traditional GC collector code framework and are implemented independently; the other collectors share part of the framework code.

Parallel Scavenge: parallel collection of the young generation

The Parallel Scavenge collector is also called the throughput collector because of its close relationship with throughput.

(A) Some characteristics are similar to those of the ParNew collector

Young generation collector;

Copy algorithm is adopted;

Multithreaded collection;

(B) The main feature is that its focus is different from other collectors

Collectors such as CMS focus on minimizing the pause time of user threads during garbage collection;

The goal of Parallel Scavenge is to reach a controllable throughput.

ParallelOld

Parallel (PS + PO) is the default collector in Java 8

For the old generation;

Uses the “mark-compact” algorithm;

Multithreaded collection;

The Parallel Scavenge / Parallel Old collectors run as illustrated:

Set the parameters

“-XX:+UseParallelOldGC”: specifies the use of the ParallelOld collector;

ConcurrentMarkSweep: concurrent collection of the old generation

Garbage collection runs at the same time as the application, reducing STW time (200ms). CMS has many problems, so no JDK version uses it by default today; you have to specify it manually. Because CMS is mark-sweep, it produces fragmentation. Once fragmentation reaches a certain point and the CMS old generation cannot find space for an object, SerialOld is used to collect the old generation.

Imagine: PS + PO -> add memory and change the garbage collector -> PN + CMS + SerialOld (STW of a few hours to a few days)

Tens of gigabytes of memory, collected by a single thread -> G1 + FGC

Tens of gigabytes -> servers with terabytes of memory: ZGC

Algorithm: tri-color marking + Incremental Update

CMS

A Concurrent Mark Sweep (CMS) Collector is also called a Concurrent Low Pause Collector or low-latency garbage Collector;

Characteristics

For the old generation;

Based on the “mark-sweep” algorithm (no compaction, so it produces memory fragmentation);

Aims for the shortest collection pause time;

Concurrent collection, low pause;

Requires more memory (see disadvantages below);

HotSpot’s first truly concurrent collector, introduced in JDK1.5;

The first collector to let garbage collection threads work (basically) at the same time as user threads;

Cannot process floating garbage, and a “Concurrent Mode Failure” may occur

Steps

(A) Initial mark (CMS initial mark)

Marks only the objects directly reachable from GC Roots;

Very fast;

But requires “Stop The World”;

(B) Concurrent mark (CMS concurrent mark)

The GC Roots tracing process;

Marks live objects starting from the set produced by the initial mark;

The application keeps running;

There is no guarantee that all live objects are marked;

(C) Remark (CMS remark)

Corrects the marks of objects whose marking changed because the user program kept running during concurrent marking;

Requires “Stop The World”; the pause is slightly longer than the initial mark but much shorter than the concurrent mark;

Uses multiple threads in parallel to improve efficiency;

(D) Concurrent sweep (CMS concurrent sweep)

Reclaims all garbage objects;

G1(200ms – 10ms)

Algorithm: three color mark + SATB

G1 (Garbage-First) became a commercially supported collector in JDK 7u4.

Characteristics

(A) Parallelism and concurrency

Can make full use of multi-CPU, multi-core environment hardware advantages;

Can be parallel to shorten The “Stop The World” pause time;

You can also have garbage collection run concurrently with the user program;

(B) Generational collection, covering both the young and the old generation

Can manage the entire GC heap (young and old generation) on its own, without being paired with another collector;

Can handle objects of different generations in different ways;

The generational concept remains, but the memory layout of the Java heap is very different;

The whole heap is divided into independent Regions of equal size;

The young and old generations are no longer physically separate; each is a set of Regions (which need not be contiguous);

(C) Combines several garbage collection algorithms; compacts space and produces no fragmentation

Viewed as a whole, it is based on the mark-compact algorithm;

Viewed locally (between two Regions), it is based on the copying algorithm;

This is an implementation similar to the train algorithm;

It does not produce memory fragmentation, which helps long-running applications;

(D) Predictable pauses: high throughput with low pauses

G1 can not only pursue low pause, but also build a predictable pause time model.

You can explicitly specify that within a time slice of M milliseconds, garbage collection takes no more than N milliseconds.
Application scenarios

Service-oriented applications, for machines with large memory, multi-processor;

The primary application is to provide a solution for applications that require low GC latency and have a large heap;

For example, when the heap size is about 6GB or larger, predictable pause times can be less than 0.5 seconds;

Intended to replace the CMS collector introduced in JDK1.5;

G1 may be better than CMS when:

(1) More than 50% of the Java heap is occupied by live data;

(2) The object allocation rate or promotion rate varies greatly;

(3) GC pause time is too long (longer than 0.5 to 1 second).

If there are no problems with the current collector, don’t rush to G1;

If your application is looking for low pauses, try G1;

Whether or not to replace the CMS requires actual scenario testing.
Set the parameters

“-XX:+UseG1GC”: specifies the use of the G1 collector;

“-XX:InitiatingHeapOccupancyPercent”: when Java heap occupancy reaches this value, the concurrent marking phase starts; the default value is 45.

“-XX:MaxGCPauseMillis”: sets the pause time target for G1; the default value is 200 ms.

“-XX:G1HeapRegionSize”: sets the Region size, ranging from 1MB to 32MB; the goal is to have about 2048 Regions at the minimum Java heap size;
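
To make the "about 2048 Regions, between 1MB and 32MB" rule concrete, here is a rough sketch of how a Region size could be derived from a heap size. It is a simplification written for this article, not HotSpot's actual heuristic: the class `G1RegionSizeSketch`, the method `regionSize` and the rounding down to a power of two are assumptions.

```java
final class G1RegionSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long TARGET_REGIONS = 2048;

    // Simplified rule: heap / 2048, rounded down to a power of two,
    // then clamped to the 1MB..32MB range mentioned above.
    static long regionSize(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGIONS, 1);
        long powerOfTwo = Long.highestOneBit(raw);
        return Math.min(Math.max(powerOfTwo, 1 * MB), 32 * MB);
    }

    public static void main(String[] args) {
        long[] heapsInGb = {1, 6, 8, 128};
        for (long gb : heapsInGb) {
            long size = regionSize(gb * 1024 * MB);
            System.out.println(gb + "G heap -> " + (size / MB) + "MB regions");
        }
    }
}
```

Under these assumptions an 8G heap gives 4MB Regions (8G / 2048), while very small and very large heaps hit the 1MB and 32MB limits.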

Running steps

(A) Initial Marking

Mark only objects to which GC Roots can be directly associated;

Next Top at Mark Start (TAMS) is modified so that when the Next stage is run concurrently, the user program can create new objects in the correct available Region.

You need to “Stop The World”, but very fast;

(B) Concurrent Marking

The GC Roots tracing process;

Marks live objects starting from the set produced by the initial mark;

Takes longer, but the application keeps running;

There is no guarantee that all live objects are marked;

(C) Final Marking

Corrects the marks of objects whose marking changed because the user program kept running during concurrent marking;

Object changes made during the previous phase are recorded in each thread’s Remembered Set Log;

The Remembered Set Logs are merged into the Remembered Set;

Requires “Stop The World”; the pause is slightly longer than the initial mark but much shorter than the concurrent mark;

Uses multiple threads in parallel to improve efficiency;

(D) Live Data Counting and Evacuation

First, the Regions are sorted by reclamation value and cost;

Then a collection plan is made based on the GC pause time the user expects;

Finally, garbage is reclaimed from some high-value Regions according to the plan;

The “copy” algorithm is used: live objects are copied from one or more Regions to an empty Region on the heap, compacting and freeing memory in the process;

This can be done concurrently, reducing pause times and increasing throughput;

ZGC (10ms – 1ms), on a par with C++

Algorithm: ColoredPointers + LoadBarrier

Shenandoah

Algorithm: ColoredPointers + WriteBarrier

Epsilon

5.JVM memory generation model (for generational garbage collection algorithm)

If someone asks you: JVM memory management model? Just say: What kind of garbage collector are you referring to?

The model used by some garbage collectors

All GCs except Epsilon, ZGC and Shenandoah use a logical generational model

G1 is generational logically but not physically

The others are generational both logically and physically
Young generation + old generation + permanent generation (1.7, Perm Generation) / metadata area (1.8, Metaspace)
1. Permanent generation / Metaspace: Class metadata
2. The permanent generation must be given a size limit; Metaspace can be given a limit or left unset, in which case it has no upper bound (limited by physical memory)
3. String constants: 1.7 – permanent generation, 1.8 – heap
4. MethodArea is a logical concept: permanent generation or Metaspace
Young generation = Eden + 2 Survivor regions
1. After a YGC most objects are reclaimed; the survivors enter S0
2. On the next YGC, live objects in Eden + S0 -> S1
3. On the next YGC, Eden + S1 -> S0
4. Old enough -> old generation (15; CMS 6)
5. Does not fit in the Survivor region -> old generation
Old generation
1. Long-lived “diehard” objects
2. When the old generation is full: FGC (Full GC)
GC tuning (generational)
1. Minimize FGC
2. MinorGC = YGC
3. MajorGC = FGC
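
The Eden -> S0 -> S1 -> old generation path can be sketched as a tiny simulation. This is a deliberately simplified model: it assumes the object survives every YGC, always fits into a Survivor region, and is promoted as soon as its age reaches the threshold. `TenuringSketch` and `lifecycle` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class TenuringSketch {
    // Returns the sequence of spaces an always-surviving object passes through.
    static List<String> lifecycle(int promotionAge) {
        List<String> path = new ArrayList<>();
        path.add("eden");
        int age = 0;
        boolean inS0 = false;
        while (true) {
            age++;                          // the object survived one more YGC
            if (age >= promotionAge) {      // old enough: promote
                path.add("old");
                return path;
            }
            inS0 = !inS0;                   // the survivor spaces alternate on every YGC
            path.add(inS0 ? "s0" : "s1");
        }
    }

    public static void main(String[] args) {
        System.out.println(lifecycle(3));
        System.out.println(lifecycle(15).size() - 2 + " survivor copies before promotion at age 15");
    }
}
```

In this model, lowering the promotion age (for example to the value 6 listed for CMS above) simply shortens the number of survivor copies before an object reaches the old generation.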
Object allocation process diagram

For a new object, first check whether escape analysis shows that it does not escape; if it does not escape and can be replaced by scalars, it is allocated on the stack.
Dynamic age: (not important) www.jianshu.com/p/989d3b06a…
Allocation guarantee: (not important) if the survivor space is not large enough during YGC, the space guarantee sends objects directly into the old generation. Reference: cloud.tencent.com/developer/a…

Common garbage collector combination parameter Settings :(1.8)

-XX:+UseSerialGC = Serial New (DefNew) + Serial Old
- For small programs. This is not the default option; HotSpot selects the collector automatically based on calculation, configuration and JDK version
-XX:+UseParNewGC = ParNew + SerialOld
- This combination is rarely used (deprecated in some versions)
- Stackoverflow.com/questions/3…
-XX:+UseConc(urrent)MarkSweepGC = ParNew + CMS + Serial Old
-XX:+UseParallelGC = Parallel Scavenge + Parallel Old
-XX:+UseParallelOldGC = Parallel Scavenge + Parallel Old
-XX:+UseG1GC = G1
No way was found to view the default GC on Linux, while on Windows the command prints UseParallelGC
- java -XX:+PrintCommandLineFlags -version
- Tell from the GC logs
What is the default garbage collector for 1.8 on Linux?
- 1.8.0_181: default (cannot be told) Copy MarkCompact
- 1.8.0_222: default PS + PO
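
Besides the command-line flags and the GC logs, a running JVM can also be asked which collectors it is using through the standard `java.lang.management` API. The sketch below is one way to do that; the class name `ShowCollectors` is made up, and the collector names printed depend on the JVM and its flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class ShowCollectors {
    public static void main(String[] args) {
        // One MXBean per active collector (typically one young and one old collector)
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
    }
}
```

Running it with different options, for example `java -XX:+UseSerialGC ShowCollectors` versus `java -XX:+UseG1GC ShowCollectors`, should show different collector names, which makes it a quick cross-check of which combination is actually in effect.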

The first step in JVM tuning is to learn about common JVM command-line arguments

The JVM command-line parameters reference: docs.oracle.com/javase/8/do…
HotSpot Parameter classification

Standard: start with -, supported by all HotSpot versions

Non-standard: start with -X, supported by specific HotSpot versions

Unstable: start with -XX, may be removed in the next version

java -version

java -X

java -XX:+PrintFlagsWithComments // only works in debug builds

Test procedure:
```
import java.util.LinkedList;
import java.util.List;

public class HelloGC {
    public static void main(String[] args) {
        System.out.println("HelloGC!");
        // Keep every array reachable so the heap fills up and GC activity becomes visible
        List<byte[]> list = new LinkedList<>();
        for (;;) {
            byte[] b = new byte[1024 * 1024]; // allocate 1 MB per iteration
            list.add(b);
        }
    }
}
```
1. Distinguish the concepts: memory leak vs. out of memory
2. java -XX:+PrintCommandLineFlags HelloGC
3. java -Xmn10M -Xms40M -Xmx60M -XX:+PrintCommandLineFlags -XX:+PrintGC HelloGC (related flags: PrintGCDetails, PrintGCTimeStamps, PrintGCCause)
4. java -XX:+UseConcMarkSweepGC -XX:+PrintCommandLineFlags HelloGC
5. java -XX:+PrintFlagsInitial: default parameter values
6. java -XX:+PrintFlagsFinal: final parameter values
7. java -XX:+PrintFlagsFinal | grep xxx: find the corresponding parameter
8. java -XX:+PrintFlagsFinal -version | grep GC
9. java -XX:+PrintFlagsFinal -version | wc -l: 728 parameters in total
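
The final value of a single flag can also be read from inside the process. The sketch below uses `com.sun.management.HotSpotDiagnosticMXBean`, which is HotSpot-specific, so it is assumed here that the program runs on a HotSpot JVM; `ShowFlag` and `flagValue` are hypothetical names.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

final class ShowFlag {
    // Returns the current value of a -XX flag as a string.
    // getVMOption throws IllegalArgumentException if the flag does not exist.
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption option = bean.getVMOption(name);
        return option.getValue();
    }

    public static void main(String[] args) {
        System.out.println("MaxHeapSize = " + flagValue("MaxHeapSize"));
    }
}
```

Started with `java -Xmx60M ShowFlag`, this should report a MaxHeapSize close to 60M in bytes, the same kind of information that `-XX:+PrintFlagsFinal` prints for every flag at once.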

PS GC logs

The log format is different for each garbage collector!

PS Log Format

Heap dump section:

eden space 5632K, 94% used [0x00000000ff980000,0x00000000ffeb3e28,0x00000000fff00000)

The memory addresses at the end are, in order: the start address, the end address of the used space, and the end address of the whole space

Total = Eden + 1 survivor
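
The three addresses in the eden line are enough to recompute the size and usage figures printed in front of them. The following sketch does the arithmetic; only the three hexadecimal addresses come from the log line, while the class and method names are made up.

```java
final class EdenLineSketch {
    // Capacity of the space in KB: (end address - start address) / 1024
    static long capacityK(long start, long end) {
        return (end - start) / 1024;
    }

    // Used share in percent, rounded down: (used end - start) / (end - start)
    static long usedPercent(long start, long usedEnd, long end) {
        return (usedEnd - start) * 100 / (end - start);
    }

    public static void main(String[] args) {
        long start = 0x00000000ff980000L;    // start address
        long usedEnd = 0x00000000ffeb3e28L;  // end of the used part
        long end = 0x00000000fff00000L;      // end of the whole space
        System.out.println(capacityK(start, end) + "K, "
                + usedPercent(start, usedEnd, end) + "% used"); // 5632K, 94% used
    }
}
```

The result matches the `5632K, 94%` at the start of the log line, which confirms the order of the three addresses.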

Basic concepts before tuning:

Throughput: user code time/(user code execution time + garbage collection time)
Response time: The shorter the STW, the better the response time

Tuning starts with deciding what you are after: throughput first or response time first? Or, given a required response time, how much throughput is needed…

Question:

Scientific computing: throughput. Data mining: throughput. Throughput-first workloads generally use PS + PO

Response time first: websites, GUIs, APIs (1.8: G1)

What is tuning?

Plan and pre-tune the JVM according to requirements
Optimize the JVM runtime environment (slow, sluggish)
Resolve the various problems that occur while the JVM is running (OOM)

Tuning starts with planning

Tuning starts from the business scenario; tuning without a business scenario is pointless
No monitoring (load testing, with results you can see), no tuning
Steps:
1. Familiarize yourself with the business scenario (there is no best garbage collector, only the most suitable one)
  1. Response time, pause time [CMS G1 ZGC] (users need a response)
  2. Throughput = user time / (user time + GC time) [PS]
2. Select the collector combination
3. Calculate memory requirements (empirical values: 1.5G, 16G)
4. Select the CPU (the higher the better)
5. Set the generation sizes and the promotion age
6. Set the log parameters
  1. -Xloggc:/opt/xxx/logs/xxx-xxx-gc-%t.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=20M -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCCause
  2. Or one log file per day
7. Observing logs
Case 1: a vertical e-commerce site with at most one million orders per day. What server configuration does the order processing system need?

The question is somewhat amateurish, because many different server configurations can support it (1.5G, 16G)

Orders concentrate in peak periods: 360,000 in 1 hour is 100 orders/sec (then find the peak within that hour: 1000 orders/sec)

Rely on empirical values.

If you must calculate: how much memory does one order take? 512K x 1000 = 500M of memory

The professional way to ask: require a response time of 100ms

Load test!
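
The back-of-the-envelope numbers in Case 1 can be written out as a short calculation. The inputs (360,000 orders in an hour, 512K per order, a peak of 1000 orders per second) are the figures from the text; the class and method names are hypothetical.

```java
final class OrderCapacitySketch {
    // Average orders per second over a period
    static long ordersPerSecond(long orders, long seconds) {
        return orders / seconds;
    }

    // Memory in MB needed for the objects created in one second
    static long memoryMbPerSecond(long kbPerOrder, long ordersPerSecond) {
        return kbPerOrder * ordersPerSecond / 1024;
    }

    public static void main(String[] args) {
        System.out.println(ordersPerSecond(360_000, 3600) + " orders/sec on average in the busy hour");
        System.out.println(memoryMbPerSecond(512, 1000) + "M of new objects per second at the 1000 orders/sec peak");
    }
}
```

The estimate only sizes the allocation rate; whether a given heap and collector can keep up with it within the 100ms response time still has to be confirmed by a load test.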
Case 2: how does 12306 support large-scale ticket grabbing during the Spring Festival?

12306 is probably the flash-sale site with the highest concurrency in China:

Claimed peak concurrency: 100W (1,000,000)

CDN -> LVS -> NGINX -> business systems -> 1W (10,000) concurrent requests per machine (the C10K problem), 100 machines

Ordinary e-commerce order: place order -> order system (IO) reduces inventory -> wait for the user to pay

One possible model for 12306: place order -> reduce inventory and create the order asynchronously at the same time (Redis, Kafka) -> wait for payment

Reducing inventory still ends up putting pressure on a single server

This can be handled with distributed local inventory plus a separate server that balances inventory

How to handle heavy traffic: divide and conquer
How much memory does one transaction take?
1. Take one machine and see how much TPS it can handle. Does it meet the target? Scale out or tune until it does
2. Determine it with load testing

Optimizing the environment

There is a site with 500,000 PV that serves documents (loading documents from disk into memory). The original server was 32-bit with a 1.5G heap, and users reported that the site was slow. The company therefore upgraded to a new 64-bit server with a 16G heap, but users then reported severe stalls and efficiency was even lower than before
1. Why was the original site slow? Many users browse data, a lot of data is loaded into memory, memory is insufficient, GC is frequent, STW is long, so response time is slow
2. Why did it stall even more afterwards? The larger the memory, the longer an FGC takes
3. What to do? PS -> PN + CMS or G1
System CPU is often at 100%. How do you tune? With the CPU at 100%, some threads must be consuming system resources.
1. Find which process has the highest CPU usage (top)
2. Find which thread in that process has the highest CPU usage (top -Hp)
3. Export the stack of that thread (jstack)
4. Find which method (stack frame) consumes the time (jstack)
5. Check whether worker threads or GC threads account for the high usage
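
One practical detail when going from step 2 to step 3: `top -Hp` shows thread IDs in decimal, while jstack prints the native thread ID as a hexadecimal `nid=0x...` value, so the ID has to be converted before searching the dump. A minimal sketch of the conversion follows; the thread ID 4655 is just a sample number and `ThreadIdHex` is a made-up class name.

```java
final class ThreadIdHex {
    // Converts a decimal thread ID (as shown by top -Hp) to the nid form used by jstack.
    static String toNid(long decimalThreadId) {
        return "0x" + Long.toHexString(decimalThreadId);
    }

    public static void main(String[] args) {
        long busyThread = 4655;                    // hypothetical thread ID read from top -Hp
        System.out.println("search the jstack output for nid=" + toNid(busyThread)); // nid=0x122f
    }
}
```

Searching the jstack output for that nid leads to the stack of the busy thread, and the thread name then shows whether it is a worker thread or a GC thread.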
System memory usage is high. How do you find the problem? (asked frequently in interviews)
1. Export the heap (jmap)
2. Analyze it (jhat, jvisualvm, MAT, jprofiler…)
How to monitor the JVM
1. jstat jvisualvm jprofiler arthas top…

Fix problems in JVM running

A common tool for case understanding

Test code:

    package com.mashibing.jvm.gc;

    import java.math.BigDecimal;
    import java.util.ArrayList;
    import java.util.Date;
    import java.util.List;
    import java.util.concurrent.ScheduledThreadPoolExecutor;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    /**
     * Gets credit data from the database and applies a model to it.
     */
    public class T15_FullGC_Problem01 {

        private static class CardInfo {
            BigDecimal price = new BigDecimal(0.0);
            String name = "zhang 3";
            int age = 5;
            Date birthdate = new Date();

            public void m() {}
        }

        private static ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(50,
                new ThreadPoolExecutor.DiscardOldestPolicy());

        public static void main(String[] args) throws Exception {
            executor.setMaximumPoolSize(50);

            for (;;) {
                modelFit();
                Thread.sleep(100);
            }
        }

        private static void modelFit() {
            List<CardInfo> taskList = getAllCardInfo();
            taskList.forEach(info -> {
                // do something
                executor.scheduleWithFixedDelay(() -> {
                    // do sth with info
                    info.m();
                }, 2, 3, TimeUnit.SECONDS);
            });
        }

        private static List<CardInfo> getAllCardInfo() {
            List<CardInfo> taskList = new ArrayList<>();

            for (int i = 0; i < 100; i++) {
                CardInfo ci = new CardInfo();
                taskList.add(ci);
            }

            return taskList;
        }
    }

java -Xms200M -Xmx200M -XX:+PrintGC com.mashibing.jvm.gc.T15_FullGC_Problem01
The operations team is usually the first to receive an alarm (CPU, memory)
The top command shows the problem: memory keeps growing and CPU usage stays high
top -Hp shows the threads in the process, to see which thread has the highest CPU and memory share
jps locates the specific Java process. jstack shows thread states; focus on WAITING and BLOCKED, e.g. waiting on <0x0000000088ca3310> (a java.lang.Object). We have to find out which thread is holding the lock. How do we find that? 1: write a deadlock program and observe it with jstack. 2: write a program in which one thread holds a lock without releasing it while other threads wait
Why do the Alibaba coding guidelines require threads (especially thread pools) to be given meaningful names? (Custom ThreadFactory)
jinfo pid displays process information; the pid is found through jps
Observe with arthas / jconsole / jvisualvm / jprofiler, or with jstat -gc 4655 500 (prints GC statistics every 500 ms). If the interviewer asks how you locate an OOM problem and you answer that you use a graphical interface, that is wrong. 1: What does a system that is already live use instead of a graphical interface? The command line, arthas. 2: What are graphical interfaces actually for? Testing! Monitor while testing! (Observe during load tests)
jmap -histo 4655 | head -20: find how many objects of each class exist
jmap -dump:format=b,file=xxx pid:

On an online system with a very large heap, running jmap has a big impact on the process and can even stall it (unsuitable for e-commerce).
1: set the HeapDump parameter so that a heap dump file is generated automatically on OOM.
2: many servers are backed up (high availability), so stopping this one server does not affect the others.
3: locate the problem online (generally not used by small companies).
4: load test in the test environment (to reproduce a similar memory growth problem and dump while the heap is still small).

java -Xms20M -Xmx20M -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError com.mashibing.jvm.gc.T15_FullGC_Problem01

-XX:+HeapDumpOnOutOfMemoryError: exports a heap dump when OOM occurs, which can then be analyzed

Use MAT, jhat or jvisualvm to analyze the dump file: www.cnblogs.com/baihuitests… Run jhat -J-mx512M xxx.dump, open http://192.168.17.11:7000 and scroll to the bottom: the link there lets you use OQL to find the specific problem
Find the problem in the code
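
The earlier point about meaningful thread names matters here, because jstack output is much easier to read when pool threads are not all called `pool-1-thread-N`. Below is a minimal sketch of a custom ThreadFactory; `NamedThreadFactory` and the prefix `card-model` are illustrative names, not part of the original test code.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory that gives every thread a readable, numbered name.
final class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        return new Thread(task, prefix + "-" + counter.getAndIncrement());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadFactory factory = new NamedThreadFactory("card-model");
        Thread t = factory.newThread(
                () -> System.out.println("running in " + Thread.currentThread().getName()));
        t.start();
        t.join();
    }
}
```

Such a factory can be passed to the pool constructor, for example `new ScheduledThreadPoolExecutor(50, new NamedThreadFactory("card-model"), new ThreadPoolExecutor.DiscardOldestPolicy())`, so that the threads of the test program show up under a recognizable name in a thread dump.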

Jconsole remote connection

Add these parameters when starting the program:

java -Djava.rmi.server.hostname=192.168.17.11 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=11111 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false XXX
If you encounter the error Local host name unknown: XXX, modify the /etc/hosts file and add XXX to it

192.168.17.11 basic localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
Disable the Linux firewall.

service iptables stop
chkconfig iptables off
On Windows, open jconsole and connect remotely to 192.168.17.11:11111

Jvisualvm remote connection

www.cnblogs.com/liugh/p/762… (Simple method)

Jprofiler (paid)

Arthas online troubleshooting tool

Why troubleshoot online? In production we often encounter problems that are hard to troubleshoot, such as thread safety problems, where the simplest thread dump or heap dump does not reveal the cause. To troubleshoot them we sometimes add temporary logging, for example printing the input and output parameters of key functions, and then repackage and release. If the logs still do not reveal the problem, we add more logs and repackage and release again. In companies with a complex release process and strict review, every change goes through many layers from code modification to release, which greatly slows down the investigation.
jvm: observe JVM information
thread: locate thread problems
dashboard: observe the system status
heapdump + jhat: analysis
jad: decompile. Use it to locate problems with classes generated by dynamic proxies, problems with third-party classes (inspect the code), and version problems (confirm that your latest commit is the one in use)
redefine: hot replacement
sc – search class
watch – watch method
What’s not included: JMap, which comes with Linux and is similar to arthas for threaddump analysis

Basic concepts of GC algorithms

Card Table: during YGC the whole old region would otherwise have to be scanned, which is very inefficient, so the JVM uses a Card Table. If a card in the old region contains an object that points into the young region, the card is marked Dirty. On the next scan only the Dirty cards need to be scanned. Structurally, the Card Table is implemented as a bitmap
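
The card idea can be sketched with a bitmap, following the description above. This is a toy model rather than HotSpot's implementation: the class `CardTableSketch`, the 512-byte card size and the offset-based addressing are assumptions made for the example.

```java
import java.util.BitSet;

final class CardTableSketch {
    static final int CARD_SIZE = 512;           // bytes covered by one card (assumed value)
    private final BitSet dirty;

    CardTableSketch(long oldGenBytes) {
        int cards = (int) ((oldGenBytes + CARD_SIZE - 1) / CARD_SIZE);
        this.dirty = new BitSet(cards);
    }

    // Called when a field at this old-generation offset starts pointing at a young object.
    void markDirty(long oldGenOffset) {
        dirty.set((int) (oldGenOffset / CARD_SIZE));
    }

    // Indices of the cards a young collection has to scan instead of the whole old generation.
    int[] dirtyCards() {
        return dirty.stream().toArray();
    }

    // After the young collection has scanned them, the cards are clean again.
    void clear() {
        dirty.clear();
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(4096);   // 8 cards
        table.markDirty(600);                                // falls into card 1
        table.markDirty(3000);                               // falls into card 5
        System.out.println(java.util.Arrays.toString(table.dirtyCards())); // [1, 5]
    }
}
```

With 8 cards and 2 of them dirty, the young collection in this model scans a quarter of the old region instead of all of it.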

CMS

Problems with CMS

Memory Fragmentation

-XX:+UseCMSCompactAtFullCollection
-XX:CMSFullGCsBeforeCompaction: defaults to 0; the number of FGCs after which compaction is performed
Floating Garbage

Concurrent Mode Failure causes: if the concurrent collector is unable to finish reclaiming the unreachable objects before the tenured generation fills up, or if an allocation cannot be satisfied with the available free space blocks in the tenured generation, then the application is paused and the collection is completed with all the application threads stopped

Solution: lower the threshold that triggers CMS

PromotionFailed

The solution is similar: keep enough free space in the old generation

-XX:CMSInitiatingOccupancyFraction: 92%; you can lower this value so that the CMS old generation keeps enough free space

CMS Log Analysis

Run the command: java -Xms20M -Xmx20M -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC com.mashibing.jvm.gc.T15_FullGC_Problem01

[ParNew: 6144K->640K(6144K), 0.0265885 secs] 6585K->2770K(19840K), 0.0268035 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]

ParNew: the young generation collector

6144K->640K: young generation usage before and after the collection

(6144K): capacity of the entire young generation

6585K->2770K: whole-heap usage before and after the collection

(19840K): capacity of the entire heap
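The field-by-field reading above can be turned into a small parser. This is a sketch only: the class and method names are made up for illustration, and the pattern matches just the ParNew line format shown above, not GC logs in general. The promotion estimate relies on the young generation shrinking by (garbage + promoted) while the whole heap shrinks only by the garbage.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParNewLineParser {
    // Matches: [ParNew: 6144K->640K(6144K), 0.0265885 secs] 6585K->2770K(19840K)
    private static final Pattern LINE = Pattern.compile(
            "\\[ParNew: (\\d+)K->(\\d+)K\\((\\d+)K\\), [\\d.]+ secs\\] (\\d+)K->(\\d+)K\\((\\d+)K\\)");

    /** Returns {youngBefore, youngAfter, youngCapacity, heapBefore, heapAfter, heapCapacity} in KB, or null. */
    static long[] parse(String logLine) {
        Matcher m = LINE.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        long[] kb = new long[6];
        for (int i = 0; i < 6; i++) {
            kb[i] = Long.parseLong(m.group(i + 1));
        }
        return kb;
    }

    /** Rough estimate of what moved to the old generation: young shrinkage minus heap shrinkage. */
    static long promotedKb(long[] kb) {
        return (kb[0] - kb[1]) - (kb[3] - kb[4]);
    }

    public static void main(String[] args) {
        long[] kb = parse("[ParNew: 6144K->640K(6144K), 0.0265885 secs] "
                + "6585K->2770K(19840K), 0.0268035 secs]");
        System.out.println("young freed: " + (kb[0] - kb[1]) + "K, heap freed: "
                + (kb[3] - kb[4]) + "K, promoted (estimate): " + promotedKb(kb) + "K");
    }
}
```

For the line above, the young generation shrinks by 5504K while the heap shrinks by 3815K, so roughly 1689K was promoted to the old generation.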

[GC (CMS Initial Mark) [1 CMS-initial-mark: 8511K(13696K)] 9866K(19840K), 0.004032 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
// 8511K (13696K): old generation usage (capacity)
// 9866K (19840K): whole-heap usage (capacity)
[CMS-concurrent-mark-start]
[CMS-concurrent-mark: 0.018/0.018 secs] [Times: user=0.01 sys=0.00, real=0.02 secs]
[CMS-concurrent-preclean-start]
[CMS-concurrent-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
// marks Cards as Dirty
[GC (CMS Final Remark) [YG occupancy: … K (… K)][Rescan (parallel) , 0.0002236 secs][weak refs processing, … secs][class unloading, 0.0005404 secs][scrub symbol table, 0.0006169 secs][scrub string table, 0.0004903 secs][1 CMS-remark: 8511K(13696K)] …, … secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
// STW phase. YG occupancy: young generation occupancy and capacity
// Rescan (parallel): marking of live objects under STW
// weak refs processing: processing of weak references
// class unloading: unloading of classes that are no longer used
// scrub symbol(string) table: cleaning up symbol and string tables which hold class-level metadata and internalized string respectively
// CMS-remark: 8511K(13696K): old generation occupancy and capacity after this phase
[CMS-concurrent-sweep-start]
[CMS-concurrent-sweep: 0.005/0.005 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
// marking is complete; concurrent cleanup
[CMS-concurrent-reset-start]
[CMS-concurrent-reset: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
// resets internal structures in preparation for the next GC

G1

Reference: www.oracle.com/technical-r…

G1 Log details

[GC pause (G1 Evacuation Pause) (young) (initial-mark), 0.0015790 secs]
// young: young generation collection; Evacuation: copying of surviving objects
// initial-mark: the initial-mark phase of mixed collection, here piggybacked on a YGC
   [Parallel Time: …] // one GC thread
      [GC Worker Start (ms): 92635.7]
      [Ext Root Scanning (ms): 1.1]
      [Update RS (ms): 0.0]
         [Processed Buffers: 1]
      [Scan RS (ms): 0.0]
      [Code Root Scanning (ms): 0.0]
      [Object Copy (ms): 0.1]
      [Termination (ms): 0.0]
         [Termination Attempts: 1]
      [GC Worker Other (ms): 0.0]
      [GC Worker Total (ms): 1.2]
      [GC Worker End (ms): 92636.9]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.0 ms]
   [Other: … ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.0 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.0 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.0 ms]
   [Eden: 0.0B(1024.0K)->0.0B(1024.0K) Survivors: 0.0B->0.0B Heap: 18.8M(20.0M)->18.8M(20.0M)]
 [Times: user=0.00 sys=0.00, real=… secs]
[GC concurrent-root-region-scan-start]
[GC concurrent-root-region-scan-end, … secs]
[GC concurrent-mark-start]
// evacuation is not possible, so a Full GC follows
[Full GC (Allocation Failure) 18M->18M(20M), 0.0719656 secs]
   [Eden: 0.0B(1024.0K)->0.0B(1024.0K) Survivors: 0.0B->0.0B Heap: 18.8M(20.0M)->18.8M(20.0M)], [Metaspace: 3876K->3876K(1056768K)]
 [Times: user=0.07 sys=0.00, real=0.07 secs]

Case summary

OOM has many causes. Some programs never actually throw an OOM but run FGC continuously (high CPU, yet very little memory reclaimed) (the case above)

A hardware upgrade that made the system stall more, not less (see above)
Thread pool misuse causing OOM (see above): objects were added to a List over and over (really too LOW)
The smile JIRA problem: in practice the system was simply restarted again and again. Fix: add memory and switch to the G1 garbage collector. What was the real problem? Unknown
Tomcat http-header-size problem (Hector)

Lambda expressions overflowing the method area (MethodArea / Perm / Metaspace): LambdaGC.java, run with -XX:MaxMetaspaceSize=9M -XX:+PrintGCDetails

"C:\Program Files\Java\jdk1.8.0_181\bin\java.exe" -XX:MaxMetaspaceSize=9M -XX:+PrintGCDetails "-javaagent:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2019.1\lib\idea_rt.jar=49316:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2019.1\bin" -Dfile.encoding=UTF-8 -classpath "C:\Program Files\Java\jdk1.8.0_181\jre\lib\charsets.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\deploy.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\access-bridge-64.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\cldrdata.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\jaccess.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\jfxrt.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\nashorn.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\sunpkcs11.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\zipfs.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\javaws.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\jce.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\jfr.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\jfxswt.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\jsse.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\management-agent.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\plugin.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\resources.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\rt.jar;C:\work\ijprojects\JVM\out\production\JVM;C:\work\ijprojects\ObjectSize\out\artifacts\ObjectSize_jar\ObjectSize.jar" com.mashibing.jvm.gc.LambdaGC[GC (Metadata GC Threshold) [PSYoungGen: 11341K->1880K(38400K)] 11341K->1888K(125952K), 0.0022190 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] [Full GC (Metadata GC Threshold) [PSYoungGen: 1880K->0K(38400K)] [ParOldGen: 
8K->1777K(35328K)] 1888K->1777K(73728K), [Metaspace: 8164K->8164K(1056768K)], 0.0100681 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[GC (Last ditch collection) [PSYoungGen: 0K->0K(38400K)] 1777K->1777K(73728K), 0.0005698 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
[Full GC (Last ditch collection) [PSYoungGen: 0K->0K(38400K)] [ParOldGen: 1777K->1629K(67584K)] 1777K->1629K(105984K), [Metaspace: 8164K->8156K(1056768K)], 0.0124299 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:388)
    at sun.instrument.InstrumentationImpl.loadClassAndCallAgentmain(InstrumentationImpl.java:411)
Caused by: java.lang.OutOfMemoryError: Compressed class space
    at sun.misc.Unsafe.defineClass(Native Method)
    at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:63)
    at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399)
    at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:394)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:393)
    at sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:112)
    at sun.reflect.ReflectionFactory.generateConstructor(ReflectionFactory.java:398)
    at sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:360)
    at java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1574)
    at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:79)
    at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:519)
    at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:494)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at javax.management.remote.rmi.RMIConnectorServer.encodeJRMPStub(RMIConnectorServer.java:727)
    at javax.management.remote.rmi.RMIConnectorServer.encodeStub(RMIConnectorServer.java:719)
    at javax.management.remote.rmi.RMIConnectorServer.encodeStubInAddress(RMIConnectorServer.java:690)
    at javax.management.remote.rmi.RMIConnectorServer.start(RMIConnectorServer.java:439)
    at sun.management.jmxremote.ConnectorBootstrap.startLocalConnectorServer(ConnectorBootstrap.java:550)
    at sun.management.Agent.startLocalManagementAgent(Agent.java:137)
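The Metaspace figures in the log above can also be read from inside a running program. A minimal sketch, assuming a HotSpot JVM where the memory pools are named "Metaspace" and "Compressed Class Space"; the class and method names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class MetaspaceUsage {
    /** Used bytes of the first memory pool whose name contains the given text, or -1 if there is none. */
    static long usedBytes(String poolNamePart) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains(poolNamePart)) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1 : usage.getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Pool names are HotSpot-specific (an assumption of this sketch).
        System.out.println("Metaspace used: " + usedBytes("Metaspace") + " bytes");
        System.out.println("Compressed class space used: "
                + usedBytes("Compressed Class Space") + " bytes");
    }
}
```

Logging these two numbers periodically makes a slow class-metadata leak visible long before the "Metadata GC Threshold" collections start.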

Direct memory overflow (rare): see "Understanding the Java Virtual Machine in Depth" (p. 59); caused by allocating direct memory with Unsafe, or by using NIO
Stack overflow: the -Xss setting is too small
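The effect of the stack size can be probed with a deliberately unbounded recursion. This is a sketch with made-up names; it uses the Thread constructor that takes a stack size, which the JVM treats only as a hint, so the absolute numbers vary by platform and JIT state.

```java
class StackDepthProbe {
    private static int depth;

    private static void recurse() {
        depth++;
        recurse();   // never returns normally; ends with StackOverflowError
    }

    /** Recurses on a thread created with the requested stack size and returns the depth reached. */
    static int maxDepth(long stackSizeBytes) {
        int[] reached = new int[1];
        Thread probe = new Thread(null, () -> {
            depth = 0;
            try {
                recurse();
            } catch (StackOverflowError expected) {
                reached[0] = depth;
            }
        }, "stack-probe", stackSizeBytes);
        probe.start();
        try {
            probe.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return reached[0];
    }

    public static void main(String[] args) {
        System.out.println("depth with 256 KB stack: " + maxDepth(256 * 1024));
        System.out.println("depth with 4 MB stack:   " + maxDepth(4 * 1024 * 1024));
    }
}
```

A smaller stack normally overflows at a smaller depth, which is what a too-small -Xss does to every thread in the process.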

Compare the similarities and differences between these two programs and see which is better:

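// Program 1: the variable is declared outside the loop
Object o = null;
for (int i = 0; i < 100; i++) {
    o = new Object();
}

// Program 2: the variable is declared inside the loop
for (int i = 0; i < 100; i++) {
    Object o = new Object();
}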

Xiaomi Cloud, HBase synchronization system: access through nginx raised timeout alarms. The final diagnosis was that a C++ programmer had overridden finalize, which caused frequent GC. Why would a C++ programmer override finalize? (The new/delete habit.) The finalize method took a long time (200 ms)
If a system consistently uses less than 10% of its memory, yet the GC log shows frequent FGC, what is the cause? Someone is calling System.gc() explicitly (this one is LOW)
Disruptor lets you set the length of its ring. If it is too large and objects are not released after being consumed, memory overflows
A temporary 1.6.5 build had a bug in its SQL subquery parsing algorithm: one SQL statement combining 9 EXISTS clauses generated millions of objects (case contributed by a reader)
Creating large numbers of threads leads to a native-thread OOM (LOW); a thread pool should be used. Workaround: reduce the heap size (too LOW) to leave more memory for creating native threads
A recent student case: the SQLite class library loads all results into memory during batch processing. Someone updated several hundred thousand rows at once and memory overflowed. It was located by elimination: without the module there was no problem, with the module there was
Memory overflow caused by decompressing and compressing files online in Java
Using OpenCV from Java causes stalls and slowness
Reporting systems are the most likely to crash
System crashes caused by database and table sharding

GC common parameters

-Xmn -Xms -Xmx -Xss: young generation size, minimum heap, maximum heap, stack size

The minimum heap and maximum heap are generally configured the same in a production environment
-XX:+UseTLAB: use TLAB; enabled by default
-XX:+PrintTLAB: print TLAB usage
-XX:TLABSize: set the TLAB size
-XX:+DisableExplicitGC: System.gc() no longer takes effect (it would otherwise trigger an FGC)
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintHeapAtGC
-XX:+PrintGCTimeStamps
-XX:+PrintGCApplicationConcurrentTime (low importance): print the application run time
-XX:+PrintGCApplicationStoppedTime (low importance): print the pause time
-XX:+PrintReferenceGC (low importance): record how many references of each reference type were reclaimed
-verbose:class: print the class-loading process in detail
-XX:+PrintVMOptions
-XX:+PrintFlagsFinal, -XX:+PrintFlagsInitial: you must know how to use these
-Xloggc:opt/log/gc.log
-XX:MaxTenuringThreshold: the promotion age; the maximum value is 15
-XX:PreBlockSpin (lock spin count), -XX:CompileThreshold (hot code detection), escape analysis, scalar replacement… setting these is not recommended
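Besides -XX:+PrintFlagsFinal, the effective value of a flag can be read from inside the process. A minimal sketch using com.sun.management.HotSpotDiagnosticMXBean, which is HotSpot-specific; the class and method names below are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class VmFlagLookup {
    /** Current value of a HotSpot flag such as "MaxTenuringThreshold"; throws IllegalArgumentException for unknown flags. */
    static String flag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("MaxTenuringThreshold = " + flag("MaxTenuringThreshold"));
        System.out.println("UseTLAB              = " + flag("UseTLAB"));
        System.out.println("MaxHeapSize          = " + flag("MaxHeapSize"));
    }
}
```

This is handy for confirming that a parameter from the list above really took effect in the JVM you are looking at, rather than in the start script you think it was launched from.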

Parallel common parameters

-XX:SurvivorRatio
-XX:PreTenureSizeThreshold: how large an object must be to be allocated directly in the old generation
-XX:MaxTenuringThreshold
-XX:ParallelGCThreads: the number of parallel collector threads; also applies to CMS
-XX:+UseAdaptiveSizePolicy: automatically choose the size ratio of each area

CMS Common Parameters

-XX:+UseConcMarkSweepGC
-XX:ParallelCMSThreads: the number of CMS threads
-XX:CMSInitiatingOccupancyFraction: the old generation occupancy at which CMS starts collecting. The default was about 68% in early JDKs (the CMS problems section earlier cites 92%). If SerialOld stalls occur frequently, lower it (CMS then collects more often)
-XX:+UseCMSCompactAtFullCollection: compact during FGC
-XX:CMSFullGCsBeforeCompaction: after how many FGCs compaction is performed
-XX:+CMSClassUnloadingEnabled
-XX:CMSInitiatingPermOccupancyFraction: the Perm occupancy at which Perm collection starts
-XX:GCTimeRatio: sets the proportion of program run time that GC time may occupy
-XX:MaxGCPauseMillis: the pause time. This is a suggested value that GC tries to reach by various means, such as shrinking the young generation

G1 Common Parameters

-XX:+UseG1GC
-XX:MaxGCPauseMillis: a suggested value. G1 tries to adjust the number of Young regions to reach it
-XX:GCPauseIntervalMillis: ? the GC interval time
-XX:G1HeapRegionSize: the region size. Increasing it gradually is recommended: 1, 2, 4, 8, 16, 32 (MB). As the size grows, garbage survives longer and the interval between GCs grows, but each GC takes longer. ZGC improves on this (dynamic region sizes)
-XX:G1NewSizePercent: the minimum percentage of the young generation; the default is 5%
-XX:G1MaxNewSizePercent: the maximum percentage of the young generation; the default is 60%
-XX:GCTimeRatio: the suggested proportion of GC time; G1 adjusts the heap space based on it
-XX:ConcGCThreads: the number of concurrent GC threads
-XX:InitiatingHeapOccupancyPercent: the heap occupancy at which G1 starts

Practical use of common tools

jmap and heap dumps cannot be used casually in production, because a production heap may be several gigabytes in size. Use them only in test environments or high-availability environments. -XX:+HeapDumpOnOutOfMemoryError exports the heap when an OOM occurs so that it can be analyzed afterwards, although the dump alone may not reveal the problem. The best approach is still to write good logs.

Built-in JDK commands (on Linux)

View process information: jinfo

Used to view process information; often used to query JVM parameters

[root@instance-m33tfvmh ~]# jps
10632 nacos-server.jar
27101 Jps
[root@instance-m33tfvmh ~]# jinfo 10632
Attaching to process ID 10632, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.162-b12
Java System Properties:
java.runtime.name = Java(TM) SE Runtime Environment
java.vm.version = 25.162-b12
sun.boot.library.path = /home/jdk1.8.0_162/jre/lib/amd64
java.protocol.handler.pkgs = org.springframework.boot.loader
java.vendor.url = http://java.oracle.com/
...
VM Flags:
Non-default VM flags: -XX:CICompilerCount=2 -XX:GCLogFileSize=104857600 -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912 -XX:MaxNewSize=268435456 -XX:MinHeapDeltaBytes=196608 -XX:NewSize=268435456 -XX:NumberOfGCLogFiles=10 -XX:OldSize=268435456 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseGCLogFileRotation
Command line: -Xms512m -Xmx512m -Xmn256m -Dnacos.standalone=true -Djava.ext.dirs=/home/jdk1.8.0_162/jre/lib/ext:/home/jdk1.8.0_162/lib/ext -Xloggc:/soft/nacos/logs/nacos_gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Dloader.path=/soft/nacos/plugins/health,/soft/nacos/plugins/cmdb,/soft/nacos/plugins/mysql -Dnacos.home=/soft/nacos

List thread stacks: jstack

jstack lists the threads of a process. Use it to check for deadlocks and to find out which thread is consuming a lot of CPU.

Thread information shown for the current process:

Thread name: Attach Listener, http-nio-8848-exec-28

Number: #252, #220

Priority: prio=9, prio=5

State: RUNNABLE, TIMED_WAITING (parking)

[root@instance-m33tfvmh ~]# jps
10632 nacos-server.jar
f569 Jps
[root@instance-m33tfvmh ~]# jstack 10632 | more
2020-08-17 20:43:37
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.162-b12 mixed mode):

"Attach Listener" #252 daemon prio=9 os_prio=0 tid=0x00007f2fc808a800 nid=0x599 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"http-nio-8848-exec-28" #220 daemon prio=5 os_prio=0 tid=0x00007f2fc4997800 nid=0x3b06 waiting on condition [0x00007f2faa63d000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000f27a7d38> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
    at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
    at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:748)

java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000000f27a7d38>

The lines above mean that the thread is blocked (parked), waiting on the lock object 0x00000000f27a7d38.

The thread names above do not follow any naming convention. According to Alibaba's development guidelines, thread pools and threads must be given meaningful names so that problems can be traced.
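One common way to get meaningful names into a jstack dump is to give every pool its own ThreadFactory. A minimal sketch; the class name and the "order-sync" prefix are made up for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Names pool threads "<prefix>-<n>" so they are easy to spot in a thread dump. */
class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger sequence = new AtomicInteger(1);

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        return new Thread(task, prefix + "-" + sequence.getAndIncrement());
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool =
                Executors.newFixedThreadPool(2, new NamedThreadFactory("order-sync"));
        // The task reports the name of the worker thread it runs on.
        String name = pool.submit(() -> Thread.currentThread().getName()).get();
        System.out.println(name);
        pool.shutdown();
    }
}
```

In a dump, a stack belonging to "order-sync-3" points straight at the responsible module, whereas "pool-7-thread-3" tells you nothing.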

jmap

Its purpose is similar to the heapdump command in arthas: finding the objects that occupy the most memory. The following command prints the first 20 lines of the class histogram, that is, the classes whose instances occupy the most memory.

[root@instance-m33tfvmh tmp]# jmap -histo 10632 | head -20
 num     #instances         #bytes  class name
----------------------------------------------
   1:        168374      106102736  [B
   2:        592794      100008712  [C
   3:        319497       12779880  org.apache.derby.impl.sql.conn.GenericStatementContext$CancelQueryTask
   4:        456634       10959216  java.lang.String
   5:         75096        7996136  [I
   6:        480161        7682576  java.lang.Object
   7:        125709        5383840  [Ljava.util.HashMap$Node;
   8:        105478        4613816  [Ljava.lang.Object;
   9:        135823        4346336  java.util.HashMap$Node
  10:        106824        4272960  java.util.LinkedHashMap$Entry
  11:         56638        3171728  java.util.LinkedHashMap
  12:         82954        2654528  java.util.concurrent.ConcurrentHashMap$Node
  13:         29979        2638152  java.lang.reflect.Method
  14:         16799        2419056  org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper
  15:         17954        2010848  sun.nio.ch.SocketChannelImpl
  16:         41720        2002560  java.util.HashMap
  17:        103232        1835216  [Ljava.lang.Class;

arthas

Overview

There are two types of diagnostic analysis tools:

Graphical tools:

jvisualvm (in the JDK's bin directory)

JProfiler (paid)
Command-line tools (the focus here):

arthas

Graphical tools are suitable in the following scenarios:

In a cluster, one machine can be taken out of rotation (circuit-broken) and analyzed
Traffic is copied to a standby machine with tcpdump and analyzed there
During load testing

Otherwise, use the command-line tools.

Documentation:

github.com/alibaba/art…

Download and run:

Download arthas-boot.jar and start it with java -jar:

curl -O https://arthas.aliyun.com/arthas-boot.jar
java -jar arthas-boot.jar

Print help information:

java -jar arthas-boot.jar -h

If the download is slow, you can use the Aliyun mirror: java -jar arthas-boot.jar --repo-mirror aliyun --use-http

[root@ ~]# java -jar arthas-boot.jar
[INFO] arthas-boot version: 3.3.9
[INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.
* [1]: 10632 /soft/nacos/target/nacos-server.jar

When arthas starts, it lists the Java programs running on the server; here it finds the nacos process 10632. Enter the serial number (1) to attach arthas to the nacos process for inspection.

The complete command and log are as follows:

[root@instance-m33tfvmh ~]# java -jar arthas-boot.jar
[INFO] arthas-boot version: 3.3.9
[INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.
* [1]: 10632 /soft/nacos/target/nacos-server.jar
1
[INFO] Start download arthas from remote server: https://arthas.aliyun.com/download/3.3.9?mirror=aliyun
[INFO] File size: 11.44 MB, downloaded size: 2.05 MB, downloading ...
[INFO] File size: 11.44 MB, downloaded size: 4.26 MB, downloading ...
[INFO] File size: 11.44 MB, downloaded size: 6.05 MB, downloading ...
[INFO] File size: 11.44 MB, downloaded size: 7.44 MB, downloading ...
[INFO] File size: 11.44 MB, downloaded size: 9.32 MB, downloading ...
[INFO] File size: 11.44 MB, downloaded size: 11.20 MB, downloading ...
[INFO] Download arthas success.
[INFO] arthas home: /root/.arthas/lib/3.3.9/arthas
[INFO] Try to attach process 10632
[INFO] Attach process 10632 success.
[INFO] arthas-client connect 127.0.0.1 3658
[Arthas ASCII-art banner]
wiki      https://arthas.aliyun.com/doc
tutorials https://arthas.aliyun.com/doc/arthas-tutorials.html
version   …
pid       10632
time      2020-08-17 21:34:19
[arthas@10632]$

After attaching, the shell prompt changes to [arthas@10632]$.

Arthas commands:

help

Use help to view the commands that arthas supports

[arthas@10632]$ help
 NAME         DESCRIPTION
 help         Display Arthas Help
 keymap       Display all the available keymap for the specified connection.
 sc           Search all the classes loaded by JVM
 sm           Search the method of classes loaded by JVM
 ...

dashboard

Using the dashboard tool, you can view the thread id, name, group, priority, CPU, and running time

The group column shows whether a thread is a system thread or a business thread. If memory or CPU usage keeps increasing, the program has a problem.

jvm

In the GARBAGE-COLLECTORS section you can see that the young generation collector is Copy and the old generation collector is MarkSweepCompact

...
GARBAGE-COLLECTORS
----------------------------------------------------------------------
 Copy                 name : Copy
 [count/time (ms)]    collectionCount : 35340
                      collectionTime : 452285
 MarkSweepCompact     name : MarkSweepCompact
 [count/time (ms)]    collectionCount : 3
                      collectionTime : 374
...
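The same collector names and counters are exposed through the standard java.lang.management API, so a program can report them itself. A minimal sketch; the class name and output format are made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class CollectorReport {
    /** One line per collector: its name, how often it ran, and the accumulated time in ms. */
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + " count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```

The names depend on the collector in use. Copy and MarkSweepCompact, as in the output above, are what the Serial collector reports.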

thread

View the list of threads

Add the thread number to view the stack information of a specific thread

Use thread -b to find deadlocked threads

heapdump

Exports the heap to a file, which is used to analyze the cause of full GC

[arthas@10632]$ heapdump
Dumping heap to /tmp/heapdump2020-08-17-22-06145210892456962923.hprof ...
Heap dump file created
[arthas@10632]$

Download the file to your local machine, open it in an analysis tool, find the objects that occupy the most memory, and then analyze the corresponding code.
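A heap dump can also be triggered from inside the process with HotSpotDiagnosticMXBean.dumpHeap. This is a sketch for HotSpot JVMs only; the class name and the temp-directory location are assumptions made for illustration. The file name ends in .hprof because newer JDKs require that extension, and the target file must not exist yet.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

class HeapDumper {
    /** Writes a heap dump into a fresh temp directory; liveOnly=true dumps only reachable objects. */
    static Path dump(boolean liveOnly) {
        try {
            Path file = Files.createTempDirectory("heapdump-sketch").resolve("heap.hprof");
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(file.toString(), liveOnly);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = dump(true);
        System.out.println("heap dump written to " + file
                + " (" + file.toFile().length() + " bytes)");
    }
}
```

The same caution applies as with jmap: on a large production heap the dump pauses the application, so try this on a test instance first.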

jad

Decompile class online to analyze the contents of a running program

redefine

If jad shows that the deployed version is not the one you expect, you can use redefine to replace the class, fixing the problem without stopping the service.

homework

What does -XX:MaxTenuringThreshold control? A: the age at which an object is promoted to the old generation B: the percentage of garbage in the old generation at which FGC is triggered
In production environments, the maximum and minimum heap sizes are usually set to be (why?): A: the same B: different
The default garbage collector for JDK 1.8 is: A: ParNew + CMS B: G1 C: PS + ParallelOld D: none of the above
What is response time first?
What is throughput first?
What's the difference between ParNew and PS?
What's the difference between ParNew and ParallelOld? (Different generation, different algorithm)
A long-running computation should prioritize: A: pause time B: throughput
A large e-commerce site should prioritize: A: pause time B: throughput
What are the most commonly used garbage collectors for HotSpot?
What are the common combinations of HotSpot garbage collectors?
What is the default garbage collector for JDK 1.7, 1.8 and 1.9? How do you check?
What exactly is tuning?
If you use PS + ParallelOld, what can you do to make the system almost FGC free?
If you use the ParNew + CMS combination, how can you make your system almost FGC free?

1. Increase the JVM memory
2. Increase the proportion of Young
3. …
4. …
5. Avoid memory leaks in the code

Is G1 generational? Does the G1 garbage collector produce FGC?
If G1 produces FGC, what should you do?

1. Expand the memory
2. Improve CPU performance (garbage is collected faster while the business logic generates objects at a fixed rate; the faster the collection, the more free memory there is)
3. Lower the threshold at which MixedGC is triggered so that it happens earlier (default: 45%)

Q: Can you dump casually in a production environment? Small heaps are barely affected; with a large heap the service may pause or stall (the live option can mitigate this), and an FGC runs before the dump
Q: What are the common OOM problems? Stack, heap, MethodArea, direct memory
What if the JVM process exits quietly?

1. A JVM crash: -XX:ErrorFile=/var/log/hs_err_pid<pid>.log. This is a very complex file that includes: crashing-thread information, safepoint information, lock information, native code cache, compilation events, GC-related records, JVM memory mappings, and so on
2. The Linux OOM killer: the log is located in /var/log/messages; search it with egrep -i 'killed process' /var/log/messages
3. Hardware or kernel problems: dmesg | grep java
4. Ask teacher Ma Shibing ^^!

How to check direct memory?

1. Enable native memory tracking: -XX:NativeMemoryTracking=detail
2. …
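For direct memory allocated through NIO, the JDK also exposes a "direct" buffer pool via BufferPoolMXBean, which needs no extra JVM flag. A minimal sketch with made-up class and method names; note that it does not count memory obtained directly through Unsafe.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

class DirectMemoryProbe {
    /** Bytes currently held by NIO direct buffers according to the "direct" buffer pool; -1 if the pool is missing. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directBytesUsed();
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);   // 1 MB off-heap
        long after = directBytesUsed();
        System.out.println("direct memory: " + before + " -> " + after
                + " bytes (buffer capacity " + buffer.capacity() + ")");
    }
}
```

A value that only ever grows is a strong hint that direct buffers are being kept alive somewhere.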

What are the common log analysis tools?

1. gceasy

How to troubleshoot CPU surge?

1. top -Hp <pid> to find the busy thread, then jstack
2. arthas: dashboard, thread, thread XXXX
3. There are two scenarios: (1) a business thread (2) a GC thread, in which case check the GC log
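One detail in step 1: top -Hp shows thread ids in decimal, while jstack prints them as hexadecimal nid values (nid=0x3b06 in the dump shown earlier). The conversion is trivial, but easy to forget under pressure. The class name below is made up for illustration.

```java
class NidConverter {
    /** Decimal thread id from top -Hp to the nid form printed by jstack. */
    static String toNid(int decimalThreadId) {
        return "0x" + Integer.toHexString(decimalThreadId);
    }

    /** jstack nid (with its "0x" prefix) back to the decimal id shown by top. */
    static int fromNid(String nid) {
        return Integer.parseInt(nid.substring(2), 16);
    }

    public static void main(String[] args) {
        System.out.println(toNid(15110));      // the http-nio-8848-exec-28 thread above
        System.out.println(fromNid("0x599"));  // the Attach Listener thread above
    }
}
```

So a thread that top reports as 15110 is the one to search for as nid=0x3b06 in the jstack output.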

How to check deadlocks?

arthas: thread -b
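The JVM can also report deadlocks to the program itself through ThreadMXBean. A minimal sketch with made-up names; it only reports the threads involved and does not try to break the deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class DeadlockCheck {
    /** Names of threads that are currently deadlocked; empty if there is no deadlock. */
    static List<String> deadlockedThreadNames() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads();   // null when nothing is deadlocked
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) {   // a thread may have ended in the meantime
                names.add(info.getThreadName() + " waiting for " + info.getLockName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = deadlockedThreadNames();
        System.out.println(names.isEmpty() ? "no deadlock detected" : names.toString());
    }
}
```

Calling this from a periodic health check gives an early warning without anyone having to attach a tool.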

References

blogs.oracle.com/jonthecollector/our-collectors
docs.oracle.com/javase/8/do…
java.sun.com/javase/tech…
JVM tuning reference documentation: docs.oracle.com/en/java/jav…
www.cnblogs.com/nxlhero/p/1… Online troubleshooting tools
www.jianshu.com/p/507f7e0cc… Common Arthas commands
Arthas handbook:
1. Start Arthas: java -jar arthas-boot.jar
2. Attach to a Java process
3. The dashboard command shows the overall system status
4. help: view help
5. help xx: view the help for a specific command
jmap command reference: www.jianshu.com/p/507f7e0cc…
1. jmap -heap pid
2. jmap -histo pid
3. jmap -clstats pid
blog.csdn.net/chenssy/art… Analyzing the HotSpot error file
www.cnblogs.com/cxxjohnson/… Garbage collectors