This is the third day of my participation in the August More Text Challenge.
The first step in JVM tuning is to understand common JVM command-line arguments
-
The JVM command-line parameters reference: docs.oracle.com/javase/8/do…
-
HotSpot parameter classification
Standard: start with -, supported by all HotSpot versions
Non-standard: start with -X, supported by specific HotSpot versions
Unstable: start with -XX, and may be removed in a later version
java -version
java -X
Test program
import java.util.LinkedList;
import java.util.List;

public class HelloGC {
    public static void main(String[] args) {
        System.out.println("HelloGC!");
        // Every array stays reachable through the list, so the heap eventually fills up
        List<byte[]> list = new LinkedList<>();
        for (;;) {
            byte[] b = new byte[1024 * 1024]; // 1 MB per iteration
            list.add(b);
        }
    }
}
-
Distinguish the two concepts: memory leak vs. out of memory (memory overflow)
-
java -XX:+PrintCommandLineFlags HelloGC
-
java -Xmn10M -Xms40M -Xmx60M -XX:+PrintCommandLineFlags -XX:+PrintGC HelloGC
Related flags: PrintGCDetails PrintGCTimeStamps PrintGCCause
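To check from inside the program that heap flags such as -Xms and -Xmx took effect, the JVM can report its own limits. This is a minimal sketch using the standard Runtime API; the class name HeapLimits is hypothetical, and the reported numbers can differ slightly from the flags because the JVM may round sizes or leave one survivor space out of the maximum.

```java
class HeapLimits {
    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max   (roughly -Xmx): " + toMiB(rt.maxMemory()) + "M");
        System.out.println("total (committed)   : " + toMiB(rt.totalMemory()) + "M");
        System.out.println("free  (of committed): " + toMiB(rt.freeMemory()) + "M");
    }
}
```

Running it with the same flags as above, for example java -Xmn10M -Xms40M -Xmx60M HeapLimits, should print a maximum close to 60M.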
-
java -XX:+UseConcMarkSweepGC -XX:+PrintCommandLineFlags HelloGC
-
java -XX:+PrintFlagsInitial prints the default parameter values
-
java -XX:+PrintFlagsFinal prints the final parameter values
-
java -XX:+PrintFlagsFinal | grep xxx finds the corresponding parameter
-
java -XX:+PrintFlagsFinal -version |grep GC
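The same final flag values can also be read from inside a running HotSpot JVM. The sketch below uses com.sun.management.HotSpotDiagnosticMXBean, which is HotSpot-specific; the class name FlagLookup and the choice of flags are only illustrative assumptions.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

class FlagLookup {
    /** Returns the current value of a HotSpot flag such as "MaxHeapSize". */
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException for an unknown flag name
        VMOption option = bean.getVMOption(name);
        return option.getValue();
    }

    public static void main(String[] args) {
        for (String flag : new String[] {"MaxHeapSize", "UseParallelGC", "UseG1GC"}) {
            System.out.println(flag + " = " + flagValue(flag));
        }
    }
}
```

This is handy when you cannot restart the process with -XX:+PrintFlagsFinal but can run code or a JMX client against it.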
Description of PS GC logs
The log format is different for each garbage collector!
PS log format
Heap dump section:
eden space 5632K, 94% used [0x00000000ff980000,0x00000000ffeb3e28,0x00000000fff00000)
The three memory addresses are the start address of the space, the end address of the used part, and the end address of the whole space.
For the young generation, total = eden + 1 survivor
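The three addresses in the eden line are enough to recompute the size and the usage percentage shown in the log. A small worked example, using only the addresses from the log line above (the class and method names are hypothetical):

```java
class EdenUsage {
    /** Size in bytes of the address range [start, end). */
    static long bytes(long start, long end) {
        return end - start;
    }

    /** Percentage of the space in use, truncated to a whole number. */
    static long usedPercent(long start, long usedEnd, long spaceEnd) {
        return bytes(start, usedEnd) * 100 / bytes(start, spaceEnd);
    }

    public static void main(String[] args) {
        long start = 0x00000000ff980000L;    // start of eden
        long usedEnd = 0x00000000ffeb3e28L;  // end of the used part
        long spaceEnd = 0x00000000fff00000L; // end of eden
        System.out.println("capacity = " + bytes(start, spaceEnd) / 1024 + "K"); // 5632K
        System.out.println("used     = " + usedPercent(start, usedEnd, spaceEnd) + "%"); // 94%
    }
}
```

Both results match the "5632K, 94% used" printed in the log.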
Basic concepts before tuning:
- Throughput: user code execution time / (user code execution time + garbage collection time)
- Response time: the shorter the STW pause, the better the response time
So the first question in tuning is: what are you after? Throughput or response time? Or how much throughput is required while still meeting a given response time…
Question:
Scientific computing and data mining care about throughput. Throughput-first systems generally use PS + PO.
Response-time-first systems: websites, GUIs, APIs (G1 on JDK 1.8)
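The throughput formula can be estimated for a running JVM from the standard management beans. This is only a rough sketch: it treats uptime minus accumulated collection time as "user time", which is an assumption, and for concurrent collectors the reported collection time is not purely STW time. The class name GcThroughput is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcThroughput {
    /** Throughput = user time / (user time + GC time). */
    static double throughput(long userMillis, long gcMillis) {
        return (double) userMillis / (userMillis + gcMillis);
    }

    /** Sums the accumulated collection time reported by every collector. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.printf("GC time %d ms of %d ms uptime, throughput ~ %.4f%n",
                gc, uptime, throughput(uptime - gc, gc));
    }
}
```

For example, 990 ms of user code and 10 ms of GC give a throughput of 0.99.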
What is tuning?
- JVM planning and pre-tuning based on requirements
- Optimize the JVM runtime environment (slowness, stuttering)
- Solve various problems during JVM running (OOM)
Tuning starts with planning
-
Tuning starts from the business scenario. Tuning without a business scenario is meaningless
-
No monitoring (load testing, where results can be observed), no tuning
-
Steps:
- Familiarity with business scenarios (there is no best garbage collector, only the most appropriate garbage collector)
- Response time, pause time [CMS G1 ZGC] (need to respond to user)
- Throughput = User time /(User time + GC time) [PS]
- Select the collector combination
- Calculate memory requirements (a matter of experience: 1.5 GB, 16 GB)
- Select the CPU (the higher the better)
- Set the generation sizes and the promotion age
- Setting log Parameters
- -Xloggc:/opt/xxx/logs/xxx-xxx-gc-%t.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=20M -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCCause
- Or generate a log file every day
- Observing logs
-
Case 1: A vertical e-commerce site takes up to a million orders per day. What server configuration does the order-processing system need?
Asked this way, the question is amateurish, because many different server configurations can support it (1.5 GB, 16 GB)
Suppose 360,000 orders concentrate in one hour: that is 100 orders/second (then find the peak within that hour, say 1,000 orders/second)
Start from experience values.
If you have to calculate: how much memory does creating one order need? 512 KB x 1000 = 500 MB of memory
A more professional way to ask: the response time is required to be 100 ms
Load test!
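The arithmetic in Case 1 can be written down as a quick back-of-the-envelope calculation. The 512 KB per order and the 1,000 orders/second peak are the figures assumed in the text, not measurements; the class and method names are hypothetical.

```java
class OrderMemoryEstimate {
    /** Average orders per second when `orders` arrive within `seconds`. */
    static long ordersPerSecond(long orders, long seconds) {
        return orders / seconds;
    }

    /** Memory allocated per second of peak load, in MB (1 MB = 1024 KB). */
    static long peakMegabytes(long kbPerOrder, long ordersPerSecond) {
        return kbPerOrder * ordersPerSecond / 1024;
    }

    public static void main(String[] args) {
        System.out.println(ordersPerSecond(360_000, 3600) + " orders/s on average"); // 100
        System.out.println(peakMegabytes(512, 1000) + " MB per second at the peak"); // 500
    }
}
```

An estimate like this only gives a starting point for sizing the young generation; the load test decides whether it holds.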
-
Case 2: How should 12306 cope with the large-scale ticket rush during the Spring Festival?
12306 is probably the flash-sale website with the highest concurrency in China:
It claims a peak concurrency of 1,000,000 (100W)
CDN -> LVS -> NGINX -> business system -> 10,000 (1W) concurrent connections per machine (the C10K problem), so 100 machines
Typical e-commerce order flow: place order -> order system (IO) decrements inventory -> wait for user payment
A possible model for 12306: place order -> decrement inventory and create the order (Redis, Kafka) at the same time, asynchronously -> wait for payment
Decrementing inventory still ends up putting pressure on a single server
You can use distributed local inventory plus a separate server for inventory balancing
Handling heavy traffic: divide and conquer
-
How do you find out how much memory one transaction consumes?
-
Take one machine and see how much TPS it can sustain. Does it reach the target? If not, scale out or tune until it does
-
Determine it with a load test
-
Optimizing the environment
-
There is a document website with 500,000 PV (documents are loaded from disk into memory). The original server was 32-bit with a 1.5 GB heap, and users reported that the site was slow. The company therefore upgraded to a new 64-bit server with a 16 GB heap. Users then reported severe stuttering, and the site was even less efficient than before
-
Why was the original site slow?
Many users browsed data, so a lot of data was loaded into memory; memory ran short, GC became frequent, STW pauses grew long, and responses slowed down
-
Why did it get even slower?
The larger the memory, the longer each FGC takes
-
What to do?
PS -> PN + CMS, or G1
-
-
The system CPU often sits at 100%. How do you tune it? (Frequent interview question)
If the CPU is at 100%, some thread must be consuming system resources.
- Find the process with the highest CPU usage (top)
- Find the thread in that process with the highest CPU usage (top -Hp)
- Export that thread's stack (jstack)
- Find which method (stack frame) consumes the time (jstack)
- Compare the share taken by worker threads with the share taken by GC threads
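One practical detail when going from top -Hp to jstack: top prints thread IDs in decimal, while jstack on JDK 8 (the version used here) prints each thread's native ID as a hexadecimal nid=0x... field. A tiny helper for the conversion, where the thread ID 4670 is a made-up example value:

```java
class NativeThreadId {
    /** Formats a decimal thread ID from `top -Hp` as the nid string to search for in a jstack dump. */
    static String toNid(long tid) {
        return "nid=0x" + Long.toHexString(tid);
    }

    public static void main(String[] args) {
        System.out.println(toNid(4670)); // nid=0x123e
    }
}
```

Searching the jstack output for that string leads directly to the stack of the busy thread.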
-
System memory is soaring. How do you find the problem? (Frequent interview question)
- Export the heap memory (jmap)
- Analyze it (jhat, jvisualvm, MAT, JProfiler, …)
-
How to monitor the JVM
- jstat jvisualvm jprofiler arthas top…
Resolve problems with JVM running
A case study to understand common tools
-
Test code:
package com.mashibing.jvm.gc;

import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Read the credit data from the database, apply the model, and record and transmit the results
 */
public class T15_FullGC_Problem01 {

    private static class CardInfo {
        BigDecimal price = new BigDecimal(0.0);
        String name = "Zhang";
        int age = 5;
        Date birthdate = new Date();

        public void m() {}
    }

    private static ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(50,
            new ThreadPoolExecutor.DiscardOldestPolicy());

    public static void main(String[] args) throws Exception {
        executor.setMaximumPoolSize(50);

        for (;;) {
            modelFit();
            Thread.sleep(100);
        }
    }

    private static void modelFit() {
        List<CardInfo> taskList = getAllCardInfo();
        taskList.forEach(info -> {
            // do something
            executor.scheduleWithFixedDelay(() -> {
                // do sth with info
                info.m();
            }, 2, 3, TimeUnit.SECONDS);
        });
    }

    private static List<CardInfo> getAllCardInfo() {
        List<CardInfo> taskList = new ArrayList<>();

        for (int i = 0; i < 100; i++) {
            CardInfo ci = new CardInfo();
            taskList.add(ci);
        }

        return taskList;
    }
}
-
java -Xms200M -Xmx200M -XX:+PrintGC com.mashibing.jvm.gc.T15_FullGC_Problem01
-
The operations team usually receives the alarm first (CPU, memory)
-
The top command shows the problem: memory keeps growing and CPU usage stays high
-
top -Hp shows the threads in the process and which thread has the highest CPU and memory share
-
jps locates the specific Java process
jstack shows the thread states
Focus on: WAITING and BLOCKED
e.g. waiting on <0x0000000088ca3310> (a java.lang.Object)
Find out which thread holds the lock: search the jstack dump for the lock address and look for the RUNNABLE thread that holds it
Homework:
1: Write a deadlock program and observe it with jstack
2: Write a program in which one thread holds a lock and never releases it while the other threads wait
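As one possible starting point for the first homework item, here is a minimal deadlock sketch: two threads take the same two locks in opposite order. All names are hypothetical. Instead of jstack it uses the standard ThreadMXBean.findDeadlockedThreads() to confirm the deadlock from inside the process, and the threads are daemons so that the JVM can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class DeadlockDemo {
    /** Starts a daemon thread that takes `first`, waits for its peer, then tries to take `second`. */
    private static void lockInOrder(String name, Object first, Object second,
                                    CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await(); // make sure the peer already holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) { // never acquired: the peer holds it
                    System.out.println(name + " got both locks");
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit even though the thread is stuck
        t.start();
    }

    /** Creates the deadlock and returns how many threads the JVM reports as deadlocked. */
    static int createAndDetect() throws InterruptedException {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch latch = new CountDownLatch(2);
        lockInOrder("deadlock-demo-1", lockA, lockB, latch);
        lockInOrder("deadlock-demo-2", lockB, lockA, latch);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) { // poll for up to about 5 seconds
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + createAndDetect());
    }
}
```

To look at it with jstack instead, keep the process alive (for example with a long Thread.sleep at the end of main) and run jstack against its PID; the two threads should show up as BLOCKED, each waiting on a lock the other holds.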
-
Why does the Alibaba coding specification require meaningful thread names (especially for thread pools)? How do you name the threads in a pool? (Custom ThreadFactory)
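A custom ThreadFactory is only a few lines. The sketch below is one common way to write it; the class name NamedThreadFactory and the prefix "card-model" are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        // Names such as "card-model-1" are what jstack prints for each thread
        return new Thread(task, prefix + "-" + counter.getAndIncrement());
    }

    public static void main(String[] args) {
        ExecutorService pool =
                Executors.newFixedThreadPool(2, new NamedThreadFactory("card-model"));
        pool.execute(() -> System.out.println(Thread.currentThread().getName()));
        pool.shutdown();
    }
}
```

With names like these, a jstack dump shows at a glance which business pool a stuck thread belongs to, instead of an anonymous pool-1-thread-7.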
-
jinfo pid
-
jstat -gc observes GC dynamically. To find frequent GC you can also read the GC logs or use Arthas / jconsole / jvisualvm / JProfiler (the best of these). jstat -gc 4655 500: print GC statistics for process 4655 every 500 milliseconds
If an interviewer asks how you locate an OOM problem and you answer that you use a graphical tool, that is the wrong answer
- What does a system that is already in production use, if not a graphical tool? (command line, Arthas)
- Where are graphical tools used? In testing! Monitor during testing! (load-test observation)
-
jmap -histo 4655 | head -20 shows how many objects of each class exist. It can be used to investigate OOM problems; it affects the running process, but not much, so it can be used in production
-
jmap -dump:format=b,file=xxx pid
On a production system with a very large heap, jmap has a big impact on the process while the heap dump runs and can even make it freeze (not suitable for e-commerce).
- Set the HeapDumpOnOutOfMemoryError parameter so that a heap dump file is generated automatically on OOM
- Many servers have backups (high availability), so taking this server out of service does not affect the others
- Online diagnosis (generally not used at smaller companies)
-
java -Xms20M -Xmx20M -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError com.mashibing.jvm.gc.T15_FullGC_Problem01
-
Analyze the dump file with MAT / jhat / jvisualvm
www.cnblogs.com/baihuitests…
jhat -J-mx512M xxx.dump
http://192.168.17.11:7000
Scroll to the bottom and find the corresponding link
You can use OQL to find specific problem objects
-
Find the problem with the code
jconsole remote connection
-
Add these parameters when starting the program:
java -Djava.rmi.server.hostname=192.168.17.11 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=11111 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false XXX
-
If you encounter the error "Local host name unknown: XXX", modify the /etc/hosts file and add XXX to it
192.168.17.11 basic localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
-
Turn off the Linux firewall (in a real deployment, open the corresponding port instead)
service iptables stop
chkconfig iptables off
-
Open the JConsole remote connection to 192.168.17.11:11111 on Windows
jvisualvm remote connection
www.cnblogs.com/liugh/p/762… (simple approach)
JProfiler (paid)
Arthas online troubleshooting tool
Look it up on Gitee
-
Why troubleshoot online?
In production we often run into problems that are hard to troubleshoot, such as thread-safety problems, where the simplest thread dump or heap dump rarely reveals the cause. To track these down we sometimes add temporary logging, for example printing the input parameters of a few key functions, and then repackage and release. If the problem is still not found, we add more logging and repackage and release again. In companies with a complex release process and strict reviews, every code change has to pass through several layers before it goes live, which greatly slows down the investigation.
-
jvm: observe JVM information
-
thread: locate thread problems
-
dashboard: observe the system status
-
heapdump + jhat analysis
-
jad: decompile
Locating problems in classes generated by dynamic proxies
Third-party classes (inspect the code)
Version problems (confirm that your latest commit is actually in use)
-
redefine: hot replacement
There are currently some restrictions: you can only change method implementations (of methods that have already run); you cannot change method names or fields
m() -> mm()
-
sc: search class
-
watch: watch method
-
Feature not included: jmap -histo
Case summary
OOM has many causes. Some programs do not necessarily hit OOM but run FGC continuously (CPU soars while very little memory is reclaimed) (the example above)
-
A hardware upgrade that made the system stutter instead (see above)
-
(see above) Continuously adding objects to a List (too low-level a mistake)
-
The Smile Jira problem: the production system kept restarting. The fix was to add memory and switch the garbage collector to G1. Where was the real problem? Unknown
-
Tomcat http-header-size is too large (Hector)
-
java -XX:MaxMetaspaceSize=9M -XX:+PrintGCDetails
public class LambdaGC {
    public LambdaGC() {
    }

    public static void main(String[] args) {
        while (true) {
            LambdaGC.I var1 = LambdaGC.C::n;
        }
    }

    public static class C {
        public C() {
        }

        static void n() {
            System.out.println("hello");
        }
    }

    public interface I {
        void m();
    }
}
The test results
"C:\Program Files\Java\jdk1.8.0_181\bin\java.exe" -XX:MaxMetaspaceSize=9M -XX:+PrintGCDetails "-javaagent:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2019.1\lib\idea_rt.jar=49316:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2019.1\bin" -Dfile.encoding=UTF-8 -classpath "C:\Program Files\Java\jdk1.8.0_181\jre\lib\charsets.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\deploy.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\access-bridge-64.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\cldrdata.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\jaccess.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\jfxrt.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\nashorn.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\sunpkcs11.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\ext\zipfs.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\javaws.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\jce.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\jfr.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\jfxswt.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\jsse.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\management-agent.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\plugin.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\resources.jar;C:\Program Files\Java\jdk1.8.0_181\jre\lib\rt.jar;C:\work\ijprojects\JVM\out\production\JVM;C:\work\ijprojects\ObjectSize\out\artifacts\ObjectSize_jar\ObjectSize.jar" com.mashibing.jvm.gc.LambdaGC
[GC (Metadata GC Threshold) [PSYoungGen: 11341K->1880K(38400K)] 11341K->1888K(125952K), 0.0022190 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
[Full GC (Metadata GC Threshold) [PSYoungGen: 1880K->0K(38400K)] [ParOldGen: 8K->1777K(35328K)] 1888K->1777K(73728K), [Metaspace: 8164K->8164K(1056768K)], 0.0100681 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[GC (Last ditch collection) [PSYoungGen: 0K->0K(38400K)] 1777K->1777K(73728K), 0.0005698 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
[Full GC (Last ditch collection) [PSYoungGen: 0K->0K(38400K)] [ParOldGen: 1777K->1629K(67584K)] 1777K->1629K(105984K), [Metaspace: 8164K->8156K(1056768K)], 0.0124299 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:388)
    at sun.instrument.InstrumentationImpl.loadClassAndCallAgentmain(InstrumentationImpl.java:411)
Caused by: java.lang.OutOfMemoryError: Compressed class space
    at sun.misc.Unsafe.defineClass(Native Method)
    at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:63)
    at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399)
    at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:394)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:393)
    at sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:112)
    at sun.reflect.ReflectionFactory.generateConstructor(ReflectionFactory.java:398)
    at sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:360)
    at java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1574)
    at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:79)
    at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:519)
    at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:494)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at javax.management.remote.rmi.RMIConnectorServer.encodeJRMPStub(RMIConnectorServer.java:727)
    at javax.management.remote.rmi.RMIConnectorServer.encodeStub(RMIConnectorServer.java:719)
    at javax.management.remote.rmi.RMIConnectorServer.encodeStubInAddress(RMIConnectorServer.java:690)
    at javax.management.remote.rmi.RMIConnectorServer.start(RMIConnectorServer.java:439)
    at sun.management.jmxremote.ConnectorBootstrap.startLocalConnectorServer(ConnectorBootstrap.java:550)
    at sun.management.Agent.startLocalManagementAgent(Agent.java:137)
-
Direct memory overflow: see Deep Understanding of the Java Virtual Machine, p. 59: allocating direct memory with Unsafe, or problems when using NIO
-
Stack overflow problem: the -Xss setting is too small
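The effect of -Xss is easy to observe with a small experiment: recurse until the stack is exhausted and print the depth reached. This is only a sketch with hypothetical names; the absolute depth depends on the JVM, the platform and JIT compilation, so only the trend across different -Xss values is meaningful.

```java
class StackDepth {
    private static int depth;

    private static void recurse() {
        depth++;
        recurse();
    }

    /** Recurses until the thread stack is exhausted and returns the depth reached. */
    static int measure() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // expected: the thread stack is full
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("max recursion depth: " + measure());
    }
}
```

Running it with, for example, java -Xss256k StackDepth and then java -Xss2m StackDepth should show a noticeably smaller depth for the smaller stack.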
-
Compare the similarities and differences between these two programs, and analyze which is the better way to write it:
Object o = null;
for (int i = 0; i < 100; i++) {
    o = new Object();
    // business processing
}

for (int i = 0; i < 100; i++) {
    Object o = new Object();
}
-
Overriding finalize causes frequent GC: in the Xiaomi Cloud HBase synchronization system, the system raised alarms because access through nginx timed out. The investigation finally showed that a C++ programmer had overridden finalize, which caused frequent GC
Why would a C++ programmer override finalize? (new / delete) The finalize method took a long time (200 ms)
-
If a system's memory usage never exceeds 10%, but the GC log shows that FGC occurs frequently, what is the cause? Someone is explicitly calling System.gc() (a rather low-level mistake)
-
Disruptor has an option to set the length of its chain. If the length is too large and the objects are large, memory will overflow
-
A temporary 1.6.5 build had a bug in its SQL subquery parsing algorithm: a single SQL statement combining 9 EXISTS clauses generated millions of objects.
-
The JVM memory should be 50% to 80% of the physical memory. If too many threads are created, a native thread OOM will occur
Fibers
Context switching consumes too many resources. The optimization path is: program -> process -> thread -> fiber
Fibers live in user space and threads live in kernel space
Many more fibers can be started, which suits a large number of computations; far fewer threads can be started
Each fiber has its own stack