This article mainly collates JVM teaching materials and “Understanding the Java Virtual Machine”; part of the material was collected from the web, and some of its original sources are unknown.
I. JVM Specification
1.1 bit operations
1.1.1 integer int
- Sign-magnitude: the first bit is the sign bit, 0 for positive and 1 for negative
- Ones' complement: the sign bit stays unchanged; the remaining bits of the sign-magnitude form are inverted
- Two's complement
- For positive numbers: identical to the sign-magnitude form
- For negative numbers: keep the sign bit, invert the remaining bits, then add 1
Example:

```
-6  sign-magnitude: 10000110  ones' complement: 11111001  two's complement: 11111010
```
- Why two's complement
- Zero can be represented unambiguously. Without two's complement, zero has two encodings:

```
positive zero: 0000 0000    negative zero: 1000 0000
```

With two's complement, both collapse to the same bit pattern:

```
-0  sign-magnitude: 1000 0000  ones' complement: 1111 1111  two's complement: 0000 0000 (identical to +0)
```

- Subtraction of signed numbers reduces to plain addition on the two's-complement bit patterns, so the hardware only needs an adder
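These rules can be checked directly in Java, where `int` is stored in two's complement; a minimal sketch:

```java
public class TwosComplementDemo {
    public static void main(String[] args) {
        // Low 8 bits of -6 in two's complement: 11111010
        System.out.println(Integer.toBinaryString(-6 & 0xFF)); // 11111010

        // "Invert then add 1" applied to +6 (00000110) yields the same bits
        int viaInvert = (~6 + 1) & 0xFF;
        System.out.println(Integer.toBinaryString(viaInvert)); // 11111010

        // Subtraction becomes addition: 3 - 6 is just 3 + (-6)
        System.out.println(3 + (-6)); // -3
    }
}
```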
1.1.2 Single-precision Float
- representation
- When the exponent field is:
- all zeros: the implicit leading mantissa bit is 0 (denormal numbers)
- not all zeros: the implicit leading bit is 1, so the mantissa effectively has 24 bits
- Value = (-1)^S × M × 2^(E-127)

Example: -5 in single precision is `1 10000001 01000000000000000000000`. The sign bit S is 1, indicating a negative number. The exponent field E is 10000001, i.e. E = 129. The implicit bit is 1 because the exponent field is not all zeros. The mantissa M = 1 + 2^-2 (the set bit is the second mantissa bit from the left). Result: -1 × (1 + 2^-2) × 2^(129-127) = -5.
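The bit pattern above can be verified with `Float.floatToIntBits`; a small sketch:

```java
public class FloatBitsDemo {
    public static void main(String[] args) {
        int bits = Float.floatToIntBits(-5f);
        // sign(1 bit) | exponent(8 bits) | mantissa(23 bits)
        System.out.println(Integer.toBinaryString(bits));
        // prints 11000000101000000000000000000000

        int sign = bits >>> 31;             // 1 -> negative
        int exponent = (bits >> 23) & 0xFF; // 129
        System.out.println(sign + " " + exponent);
    }
}
```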
II. JVM Operation Mechanism
2.1 JVM Startup Process
2.2 Basic Structure of JVM
Physically, the method area used to live in the heap, specifically in the heap's permanent generation. The method area itself is only a concept defined in the JVM specification, for storing class information, the constant pool, static variables, JIT-compiled code, and so on; different implementations may place it in different locations. The permanent generation is a HotSpot-specific concept, one implementation of the method area that other JVMs do not have.
Starting in Java 7 and completed in Java 8, PermGen was removed in favor of Metaspace.
The differences: PermGen held class metadata, class static variables, and interned strings; Metaspace holds only class metadata. Class static variables and interned strings were moved to the Java heap (so heap usage grows accordingly).
The JVM manages two main kinds of memory: heap and non-heap. Simply put, the heap is the memory accessible to Java code, reserved for developers; non-heap is memory the JVM keeps for its own use: the method area, memory for internal processing or optimization (such as the JIT code cache), per-class structures (such as the runtime constant pool, field and method data), and the code of methods and constructors.
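The heap/non-heap split can be observed at run time through the standard `java.lang.management` API; a small sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapVsNonHeap {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();       // objects live here
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage(); // class metadata, JIT code cache, ...
        System.out.println("heap used:     " + heap.getUsed());
        System.out.println("non-heap used: " + nonHeap.getUsed());
    }
}
```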
2.2.1 PC Register
- Each thread has a PC register
- Created when the thread is created
- Point to the next instruction
- When executing a native method, the PC value is undefined
2.2.2 method area
- Holds the loaded class information: fields, method information, method bytecode
- It was traditionally implemented together with the permanent generation (PermGen)
2.2.3 Java heap
- Objects are stored in the heap
- All threads share the Java heap
- GC workspace
2.2.4 Java stack
- Thread private
- A stack consists of a series of frames (hence it is also called a frame stack)
- A frame holds the method's local variable table, operand stack, constant pool pointer, and return address
- Each method call creates a frame and pushes it onto the stack
- Operand stack: the JVM instruction set has no registers; all arguments are passed via the operand stack

Allocating objects on the stack
- Small objects (tens of bytes) that do not escape can be allocated directly on the stack
- Stack-allocated objects are reclaimed automatically when the frame pops, reducing GC pressure
- Large objects or escaping objects cannot be stack-allocated. An object "escapes" when it is referenced by something outside the current method, so its lifetime extends beyond the current stack frame
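Whether stack allocation (via scalar replacement) actually happens is a JIT decision controlled by flags such as `-XX:+DoEscapeAnalysis` (on by default in modern HotSpot). The sketch below only illustrates what a non-escaping versus an escaping object looks like; class and method names are made up for illustration:

```java
public class EscapeDemo {
    static class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // p never leaves this method (does not escape): after JIT warm-up,
    // escape analysis may scalar-replace it so no heap allocation happens.
    static int lengthSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    static Point leaked; // any object stored here escapes the method

    // The object escapes via a static field, so it must be heap-allocated.
    static int lengthSquaredEscaping(int x, int y) {
        leaked = new Point(x, y);
        return leaked.x * leaked.x + leaked.y * leaked.y;
    }

    public static void main(String[] args) {
        System.out.println(lengthSquared(3, 4)); // 25
    }
}
```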
```java
public class AppMain {
    // At runtime, the JVM puts AppMain's class information into the method area
    public static void main(String[] args) {
        // The main method itself goes into the method area
        Sample test1 = new Sample("Test 1");
        // test1 is a reference stored in the stack frame; the Sample object lives in the heap
        Sample test2 = new Sample("Test 2");
        test1.printName();
        test2.printName();
    }
}

class Sample {
    // At runtime, the JVM puts Sample's class information into the method area
    private String name;

    public Sample(String name) {
        // After new Sample(...), the name reference is in the object; the String it points to is in the heap
        this.name = name;
    }

    // The printName method itself goes into the method area
    public void printName() {
        System.out.println(name);
    }
}
```
III. Memory Model
Each thread has its own working memory in addition to the shared main memory. The working memory holds copies of the main-memory variables the thread uses.
3.1 Memory model features
- Visibility: When one thread changes a variable, other threads know about it immediately
- Ways to ensure visibility:
- volatile
- Synchronized (before unlock, write variable value back to main memory)
- Final (once initialization is complete, other threads are visible)
- Ordering: within a thread, operations appear ordered; observed from another thread, they may appear out of order (due to instruction reordering, or to the delay in synchronizing working memory with main memory)
- Instruction reordering: to improve pipeline efficiency, the compiler and CPU may reorder instructions, but never across a data dependency: read-after-write, write-after-read, and write-after-write pairs are not reordered. The compiler, however, does not consider semantics across multiple threads
- Instruction reordering can therefore break ordering between threads:
```java
class OrderExample {
    int a = 0;
    boolean flag = false;

    public void writer() {
        a = 1;
        flag = true;
    }

    public void reader() {
        if (flag) {
            int i = a + 1;
            // ...
        }
    }
}
```
Thread A executes writer() and thread B then executes reader(). If the writes `a = 1` and `flag = true` are reordered, B may observe flag == true while a is still 0, so `int i = a + 1` yields 1 instead of 2.
- Instruction reordering: ordering between the two methods can be guaranteed by adding the synchronized keyword to both
- Basic principles of instruction rearrangement
- Program sequence principle: guarantee semantic serialization within a thread
- Volatile rule: a write to a volatile variable happens-before subsequent reads of that variable
- Lock rule: An unlock must occur before a subsequent lock
- Transitivity: IF A precedes B and B precedes C, then A must precede C
- Thread start rule: Thread.start() happens-before every action of the started thread
- Thread termination rule: all operations of a thread happen-before its termination is detected (for example, Thread.join() returning)
- Thread interruption rule: a call to interrupt() happens-before the interrupted thread detects the interrupt
- Finalizer rule: an object's constructor completes before its finalize() method starts
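A minimal sketch of the volatile rule: because the write to `flag` happens-before the read that observes it as true, the reader is guaranteed to see `data == 42` (class and field names are illustrative):

```java
public class VolatileDemo {
    static int data = 0;
    static volatile boolean flag = false;
    static volatile int observed = -1;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;   // plain write
            flag = true; // volatile write: also publishes the write to data
        });
        Thread reader = new Thread(() -> {
            while (!flag) { /* spin until the volatile write becomes visible */ }
            observed = data; // happens-before guarantees 42, never 0
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        System.out.println(observed); // 42
    }
}
```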
3.2 Configuring Common JVM Parameters
- Trace parameters
- -XX:+TraceClassLoading: monitors class loading
- -XX:+PrintClassHistogram: press Ctrl+Break to print a class histogram
- Heap allocation parameters
- -XX:+HeapDumpOnOutOfMemoryError: dump the heap to a file on OOM
- -XX:OnOutOfMemoryError: run a script on OOM
- The official recommendation is that the young generation occupy 3/8 of the heap
- Each survivor space occupies 1/10 of the young generation
- Stack allocation parameters
- -Xss: usually only a few hundred KB
- Determines the maximum depth of function calls
- Each thread has its own stack space
- Local variables and parameters are allocated on the stack
IV. GC Algorithms and Types
4.1 GC algorithm
- Reference counting: not used by the JVM
- Mark-sweep: old generation
- Mark-compact: old generation
- Copying: young generation
- Generational idea
- Objects are grouped by lifetime: short-lived objects go to the young generation, long-lived objects to the old generation
- An appropriate collection algorithm is chosen per generation, based on its characteristics
- Few objects survive: suited to the copying algorithm
- Many objects survive: suited to mark-sweep or mark-compact
4.2 Reachability
- Reachable
- The object can be reached from a GC root
- GC roots (related to the method stacks):
- Objects referenced from stack frames
- Objects referenced by static members or constants in the method area (global objects)
- Objects referenced from the JNI (native method) stack
- Resurrectable
- Once all references are released, the object is unreachable but in a resurrectable state
- It may still be resurrected inside finalize()
- Untouchable
- After finalize() has run, the object may enter the untouchable state
- An untouchable object cannot be resurrected
- It can be reclaimed
```java
public class CanReliveObj {
    public static CanReliveObj obj;

    public static void main(String[] args) throws InterruptedException {
        obj = new CanReliveObj();
        obj = null; // resurrectable: finalize() has not run yet
        System.gc();
        Thread.sleep(1000);
        if (obj == null) {
            System.out.println("obj is null");
        } else {
            System.out.println("obj is available");
        }
        System.out.println("Second GC");
        obj = null; // cannot be resurrected: finalize() runs at most once
        System.gc();
        Thread.sleep(1000);
        if (obj == null) {
            System.out.println("obj is null");
        } else {
            System.out.println("obj is available");
        }
    }

    @Override
    protected void finalize() throws Throwable {
        super.finalize();
        System.out.println("CanReliveObj finalize called");
        obj = this; // resurrect the object
    }

    @Override
    public String toString() {
        return "I am CanReliveObj";
    }
}
```
- Avoid using the finalize() method
- It is called at most once per object, and careless code in it can cause errors
- Its priority is low; when it runs is uncertain, because when GC occurs is uncertain
- Use try-catch-finally (or try-with-resources) instead
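A sketch of the recommended alternative: deterministic cleanup with try-with-resources (`AutoCloseable`) instead of finalize(); the `Resource` class here is hypothetical:

```java
public class ResourceDemo {
    static class Resource implements AutoCloseable {
        boolean closed = false;
        int read() { return 42; } // stand-in for real work
        @Override
        public void close() { closed = true; } // runs deterministically, unlike finalize()
    }

    public static void main(String[] args) {
        Resource r = new Resource();
        try (Resource res = r) {
            System.out.println(res.read()); // 42
        } // close() is invoked here, even if an exception was thrown
        System.out.println(r.closed); // true
    }
}
```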
GC does not necessarily reclaim an object the moment reachability analysis fails to find it; a full reclamation needs at least two marking passes. First mark: for an object with no remaining references, the JVM checks whether running finalize() is necessary. If not, the object can be reclaimed directly (the criterion: the object does not override finalize(), or its finalize() has already been executed, since finalize() runs at most once). Second mark: if finalize() does need to run, the object is placed on the F-Queue, and a low-priority Finalizer thread executes it before release. If, before release, the object becomes referenced again by another object, it is removed from the F-Queue (resurrected).
4.3 Stop – The – World
A global pause ("stop-the-world") is the phenomenon in which all Java code stops; native code may continue to execute but cannot interact with the JVM. It is mostly caused by GC, but can also result from a thread dump, deadlock detection, or a heap dump.
4.4 Serial collector
- The oldest, the most stable
- High efficiency
- Long pauses may occur
- It is suitable for small applications with small amount of data and no requirement on response time
- -XX:+UseSerialGC
- The new generation and the old generation use serial recycling
- New generation replication algorithm
- Old age mark – compression
4.5 Parallel Collector
Suitable for medium and large applications with high throughput requirements, multiple CPUs, and no particular demands on response time. Examples: background processing, scientific computing. Throughput := time running user code / (time running user code + GC time)
- Concurrent: "at the same time" in the alternating sense. GC threads and user threads execute alternately (not necessarily in parallel), so the user program need not pause
- Parallel: multiple garbage collection threads work simultaneously, while the application threads wait in a paused state
4.5.1 ParNew
-XX:+UseParNewGC
- Cenozoic parallel
- Old age serial
- ParNew is the parallel version of the Serial young-generation collector
- Replication algorithm
- Multithreading requires multi-core support
-XX:ParallelGCThreads
Limit the number of threads
```
0.834: [GC 0.834: [ParNew: 13184K->1600K(14784K), 0.0092203 secs] 13184K->1921K(63936K), 0.0093401 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
```
4.5.2 Parallel Collector (customizable, flexible)
- Similar ParNew
- New generation replication algorithm
- Old age mark – compression
- Focus more on throughput
-XX:+UseAdaptiveSizePolicy
- The adaptive sizing policy is an important difference between the Parallel Scavenge collector and ParNew
-XX:+UseParallelGC
- The new generation uses Parallel collector + old serial
-XX:+UseParallelOldGC
- The new generation uses Parallel collector + old generation parallelism
They differ only in the old generation
```
1.500: [Full GC [PSYoungGen: 2682K->0K(19136K)] [ParOldGen: 28035K->30437K(43712K)] 30717K->30437K(62848K) [PSPermGen: ...] [Times: user=... sys=0.03, real=0.03 secs]
```
- Special parameters
-XX:MaxGCPauseMillis
- Maximum pause time, in milliseconds
- The GC tries to ensure that the collection time does not exceed the set value
-XX:GCTimeRatio
- The value ranges from 0 to 100
- The ratio of garbage collection time to total time
- The default is 99, which means a maximum of 1% of GC time is allowed
- These two parameters are contradictory. It is impossible to tune pause times and throughput simultaneously
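For GCTimeRatio, a value of n allows at most 1/(1 + n) of total time to be spent in GC; a tiny sketch of the arithmetic (the helper method is illustrative — the JVM reads the flag itself):

```java
public class GcTimeRatioDemo {
    // Maximum fraction of total time allowed in GC for -XX:GCTimeRatio=n
    static double maxGcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        System.out.println(maxGcFraction(99)); // 0.01 -> at most 1% GC time (the default)
        System.out.println(maxGcFraction(19)); // 0.05 -> at most 5% GC time
    }
}
```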
4.6 CMS Concurrent Collector
Suitable for medium and large applications with high response-time requirements and multiple CPUs. Examples: web servers/application servers, telecom switching, integrated development environments.
- Features: Concurrent Mark Sweep uses a mark-sweep algorithm (not mark-compact); its concurrent phases reduce throughput. -XX:+UseConcMarkSweepGC enables it for the old generation only (the young generation uses ParNew or the serial collector)
- Operation process
- Initial mark: marks objects directly reachable from the GC roots; fast, but exclusive (global pause)
- Concurrent mark (runs alongside the user threads): the main pass that marks all reachable objects
- Remark: fixes up marks changed during concurrent marking; exclusive (global pause)
- Concurrent sweep (runs alongside the user threads): based on the marking results, clears dead objects directly
- Pros: keeps pauses as low as possible; no global pause is required during the concurrent marking and sweeping phases
- Cons:
- Overall system throughput and performance are reduced
- For example, if half of the CPUs are running GC while the user threads continue, system performance is halved during GC
- Incomplete cleanup
- During the sweep phase the user threads are still running and generate new garbage ("floating garbage") that cannot be cleaned in the current cycle; because CMS runs alongside the user threads, it cannot wait until the space is almost full to start
- -XX:CMSInitiatingOccupancyFraction sets the occupancy threshold that triggers a collection. If the reserved memory turns out to be insufficient, a concurrent mode failure occurs and the serial collector runs as a backup, usually with a long pause
- Fragmentation
- CMS uses mark-sweep, so after sweeping, the live objects' addresses in the heap are not contiguous and memory fragmentation appears. For performance reasons CMS sticks with mark-sweep in the old generation, but compaction can still be enabled:
- -XX:+UseCMSCompactAtFullCollection: run a compaction after a Full GC; the compaction is exclusive and lengthens the pause
- -XX:CMSFullGCsBeforeCompaction: run a compaction after the given number of Full GCs
- -XX:ParallelCMSThreads: sets the number of CMS threads, generally about the number of CPU cores; the default is (number of CPUs + 3) / 4, i.e. at least 25% of the cores
4.7 Collecting GC Parameters
4.7.1 Memory Allocation
Parameter | Meaning | Notes |
---|---|---|
-Xms | Initial heap size | By default, when free heap memory is less than 40%, the JVM grows the heap up to the -Xmx limit |
-Xmx | Maximum heap size | By default (adjustable via MaxHeapFreeRatio), when free heap memory is greater than 70%, the JVM shrinks the heap down to the -Xms limit |
-Xmn | Young generation size | Eden + 2 Survivor space. Increasing the size of the young generation will reduce the size of the old generation. Sun officially recommends setting it to 3/8 of the entire heap |
-XX:PermSize | Set the initial perm gen value | Persistent generation is an implementation of method areas |
-XX:MaxPermSize | Set the maximum number of persistent generations | |
-Xss | Stack size per thread | After JDK5.0, each thread has a stack size of 1M. The larger the stack, the fewer the threads, and the deeper the stack |
-XX:NewRatio | Ratio of young generation (Eden plus two Survivor zones) to old generation (excluding the permanent generation) | -XX:NewRatio=4 means young:old is 1:4, so the young generation occupies 1/5 of the heap. If Xms equals Xmx and Xmn is set, this parameter need not be set. |
-XX:SurvivorRatio | Size ratio of Eden zone to Survivor zone | If set to 8, the ratio of two Survivor zones to one Eden zone is 2:8, and one Survivor zone accounts for 1/10 of the whole young generation |
-XX:MaxTenuringThreshold | Maximum age of garbage | This parameter is valid only for serial GC |
-XX:PretenureSizeThreshold | Objects larger than this size are allocated directly in the old generation | The unit is bytes; invalid for the Parallel Scavenge collector. Another case allocated directly in the old generation: large array objects whose elements hold no references to other objects. |
4.7.2 Parameters of the parallel collector
Parameter | Meaning | Notes |
---|---|---|
-XX:+UseParallelGC | The new generation uses Parallel collector + old serial | |
-XX:+UseParNewGC | Use parallel collectors in the new generation | |
-XX:ParallelGCThreads | Number of threads for the parallel collector | This value is best configured to equal the number of processors also applies to CMS |
-XX:+UseParallelOldGC | The new generation uses Parallel collector + old generation parallelism | |
-XX:MaxGCPauseMillis | Maximum time (maximum pause time) for each young generation garbage collection | If this time cannot be met, the JVM automatically resizes the young generation to meet this value |
-XX:+UseAdaptiveSizePolicy | Automatically selects the young zone size and the corresponding Survivor zone ratio | When this option is set, the parallel collector automatically selects the size of the young generation region and the corresponding Survivor region ratio to achieve the minimum corresponding time or collection frequency specified by the target system. This value is recommended to keep the parallel collector open when using it. |
4.7.3 CMS Concurrency Parameters
Parameter | Meaning | Notes |
---|---|---|
-XX:+UseConcMarkSweepGC | Use CMS memory collection | The new generation uses the parallel collector ParNew, while the old generation uses the CMS+ serial collector |
-XX:CMSFullGCsBeforeCompaction | How many times after memory compression | Because the concurrent collector does not compress or defragment the memory space, it can become “fragmented” after running for a while, making it less efficient. This value sets the number of GC runs after the memory space is compressed and collated |
-XX:+UseCMSCompactAtFullCollection | Compact the tenured generation during Full GC | CMS does not move objects, so fragmentation builds up easily and memory can run out; enabling compaction here is a good habit. Performance may be affected, but fragmentation is eliminated |
-XX:CMSInitiatingPermOccupancyFraction | When the permanent area occupancy reaches this percentage, CMS reclamation is started |
4.7.4 Auxiliary Information
Parameter | Meaning |
---|---|
-XX:+PrintGC | |
-XX:+PrintGCDetails | |
-XX:+PrintGCTimeStamps | |
-XX:+PrintGCApplicationStoppedTime | Prints the time the program is paused during garbage collection. Can be mixed with the above |
-XX:+PrintHeapAtGC | Prints detailed stack information before and after GC |
-Xloggc:filename | Record GC log information in a file for later analysis |
-XX:+HeapDumpOnOutOfMemoryError | |
-XX:HeapDumpPath | |
-XX:+PrintCommandLineFlags | Prints the names and values of the detailed XX parameters that have been set |
4.8 Tuning Summary
Item | Response time priority | Throughput priority |
---|---|---|
Young generation | Set -Xmn as large as possible, up to the system's minimum response-time target (-XX:MaxGCPauseMillis); this reduces young-generation GC frequency and the number of objects promoted to the old generation | -Xmn as large as possible |
Young generation garbage collector | Concurrent collector | Parallel collector |
Old generation | Set by referring to the young- and old-generation GC times and frequencies. Too small causes memory fragmentation, a high collection rate, and application pauses (falling back to traditional mark-sweep full collections); too large makes each collection take longer | Set -XX:NewRatio so the old generation is relatively small: most short-lived objects are then reclaimed while young, fewer medium-lived objects accumulate, and the old generation stores only long-lived objects |
Old-generation garbage collector | The tenured generation uses a concurrent collector (CMS) | Since there is no response-time requirement, garbage collection can be parallel or serial |
A typical configuration
- Throughput-first parallel collector. The parallel collector mainly targets a given throughput and suits scientific computing and background processing. Here the young generation uses the parallel collector, while the old generation still uses serial collection:
```
-Xmx3800m -Xms3800m -Xmn2g -Xss128k -XX:+UseParallelGC -XX:ParallelGCThreads=20
```
Old generation collected in parallel as well:
```
-Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC
```
Set the maximum time for each young generation garbage collection. If this time cannot be met, the JVM automatically resizes the young generation to meet this value
```
-Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:+UseParallelGC -XX:MaxGCPauseMillis=100
```
- Response-time-first concurrent collector. The concurrent collector aims to guarantee system response time and reduce pauses during garbage collection. Suitable for application servers and telecom systems:
```
-Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:ParallelGCThreads=20 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
```
-XX:+UseConcMarkSweepGC: enables concurrent (CMS) collection for the old generation. -XX:+UseParNewGC: enables parallel collection for the young generation.
```
-Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:+UseConcMarkSweepGC -XX:CMSFullGCsBeforeCompaction=5 -XX:+UseCMSCompactAtFullCollection
```
-XX:CMSFullGCsBeforeCompaction: because the concurrent collector does not compact memory, fragmentation builds up over time and lowers efficiency; this value sets after how many GC runs the memory is compacted. -XX:+UseCMSCompactAtFullCollection: enables compaction of the old generation. Performance may be affected, but fragmentation is eliminated.
4.9 the GC log
```
5.617: [GC 5.617: [ParNew: 43296K->7006K(47808K), ... secs] 44992K->8702K(252608K), 0.0137904 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
```
explain
5.617 (timestamp): [GC (young GC) 5.617 (timestamp): [ParNew (ParNew used as the young-generation collector): 43296K (young generation before collection) -> 7006K (young generation after collection) (47808K) (total young-generation capacity), ... secs] 44992K (whole heap before collection) -> 8702K (whole heap after collection) (252608K) (total heap capacity), 0.0137904 secs (total GC time)] [Times: user=0.03 (user-mode CPU time) sys=0.00 (kernel-mode CPU time), real=0.02 secs (wall-clock time)]
```
[GC [DefNew: 3468K->150K(9216K), 0.0028638 secs][Tenured: 1562K->1712K(10240K), 0.0084220 secs] 3468K->1712K(19456K), [Perm: 377K->377K(12288K)], 0.0113816 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
```
The collector names in the log reveal the configuration. DefNew + Tenured: serial collectors in both generations (-XX:+UseSerialGC). ParNew: parallel young-generation collector, from -XX:+UseParNewGC (old generation serial) or -XX:+UseConcMarkSweepGC (old generation CMS). PSYoungGen: Parallel Scavenge, from -XX:+UseParallelGC or -XX:+UseParallelOldGC. garbage-first heap: the G1 collector (-XX:+UseG1GC).
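As a sketch, the young-generation numbers in such log lines can be extracted with a regular expression (the class is illustrative, and the per-phase times in the sample line are made up; the overall format matches the ParNew sample above):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLogParser {
    // Matches "[<collector>: <before>K-><after>K(<capacity>K)"
    private static final Pattern YOUNG =
            Pattern.compile("\\[(ParNew|DefNew|PSYoungGen): (\\d+)K->(\\d+)K\\((\\d+)K\\)");

    public static long[] parseYoung(String line) {
        Matcher m = YOUNG.matcher(line);
        if (!m.find()) throw new IllegalArgumentException("no young-gen record: " + line);
        return new long[] { Long.parseLong(m.group(2)),   // used before GC
                            Long.parseLong(m.group(3)),   // used after GC
                            Long.parseLong(m.group(4)) }; // total capacity
    }

    public static void main(String[] args) {
        String line = "5.617: [GC 5.617: [ParNew: 43296K->7006K(47808K), 0.0136826 secs] "
                    + "44992K->8702K(252608K), 0.0137904 secs]";
        long[] r = parseYoung(line);
        System.out.println(r[0] + " -> " + r[1] + " of " + r[2]); // 43296 -> 7006 of 47808
    }
}
```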
4.10 GC trigger conditions
The trigger condition is that the region a GC algorithm manages is full, or is predicted to be about to fill up (for example, the region's usage reaches a given ratio for parallel/concurrent collectors, or a promotion is expected not to fit).
4.10.1 GC classification
For the HotSpot VM implementation, there are really only two precise GC types:
- Partial GC: does not collect the entire GC heap
- Young GC: collects only the young generation
- Old GC: collects only the old generation; only CMS's concurrent collection runs in this mode
- Mixed GC: collects all of the young generation plus part of the old generation; only G1 has this mode
- Full GC: collects the entire heap, including the young generation, old generation, permanent generation (if any), and so on. It is a single whole-heap collection; there is no question of collecting old or young "first" — marking covers the whole heap, and compaction then processes the old generation followed by the young generation
4.10.2 HotSpot VM Serial GC
A Major GC is usually equivalent to a Full GC, collecting the entire GC heap. For the simplest generational strategy, HotSpot's serial GC triggers as follows:
- Young GC: triggered when the young generation is full. Note that some surviving objects are promoted to the old generation during a young GC, so old-generation usage usually increases afterwards.
- Full GC
- When a young GC is about to be triggered, if statistics show that the average size promoted by previous young GCs is larger than the space currently left in the old generation, the young GC is skipped and a full GC is triggered instead (in HotSpot, apart from CMS's concurrent collection, any GC that collects the old generation collects the whole GC heap, young generation included, so there is no need to trigger a separate young GC first)
- If there is a permanent generation, a full GC is also triggered when an allocation in it fails for lack of space
- System.gc(), a heap dump with GC, and similar requests trigger a full GC by default
4.10.3 HotSpot VM Parallel GC(Parallel GC)
Trigger conditions are more complex, but the general principles match serial GC. One exception: the Parallel Scavenge (-XX:+UseParallelGC) framework by default executes a young GC before triggering a Full GC, letting the application run briefly in between, in order to reduce the Full GC pause (the young GC cleans up as many dead young-generation objects as possible, shrinking the Full GC's workload). The VM parameter controlling this behavior is -XX:+ScavengeBeforeFullGC.
4.10.4 HotSpot VM Concurrent GC
The trigger conditions for concurrent GC differ. Taking CMS GC as the example: it periodically checks old-generation usage, and when usage exceeds the trigger ratio, a CMS cycle starts to collect the old generation concurrently:

```
-XX:CMSInitiatingOccupancyFraction=80   // start a CMS collection when the old generation is 80% full
```

Alternatively, if a concurrent mode failure occurs during the cycle because the reserved memory is insufficient, Serial Old is temporarily used for a Full GC.
4.10.5 HotSpot VM G1 Collection
G1's initial marking is triggered when heap usage exceeds a certain ratio; regions are then collected in order of reclamation value ("garbage first"), not strictly by young/old zones.
G1's operation modes: young GC + mixed GC + (fallback) full GC
V. Class Loaders
5.1 Class load verification process
5.1.1 load
Loading converts the class file into the method area's internal data structures and generates the corresponding java.lang.Class object in the Java heap
- ClassLoader ClassLoader
- ClassLoader is an abstract class
- An instance of the ClassLoader will read Java bytecode to load the class into the JVM
- Classloaders can be customized for different bytecode stream retrieval methods (such as networks)
Tomcat and OSGi have changed the default model
Example: compile A.java in the project directory to produce A.class, then use -Xbootclasspath/a:path to append the directory containing A.class to the bootstrap classpath, so that class files placed under that directory are loaded by the bootstrap loader
Note: parent delegation is the JDK's default class-loading model, but Tomcat and OSGi have their own variations. Tomcat's WebappClassLoader loads its own classes first and only then delegates to the parent; OSGi class loaders form a network and load classes on demand from the appropriate bundle
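A minimal custom ClassLoader sketch (the class name and directory layout are assumptions): because only findClass is overridden, loadClass still delegates to the parent first, i.e. the default parent-delegation model described above; Tomcat-style loaders instead override loadClass itself to reverse that order:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Loads .class files from a base directory on disk.
public class DirClassLoader extends ClassLoader {
    private final Path base;

    public DirClassLoader(Path base, ClassLoader parent) {
        super(parent);
        this.base = base;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // Invoked only after the parent loaders fail (parent delegation).
        Path file = base.resolve(name.replace('.', '/') + ".class");
        try {
            byte[] bytes = Files.readAllBytes(file);
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```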
5.1.2 link
- Verification purpose: To ensure that the Class stream is formatted correctly
- File format validation
- Does it start with 0xCAFEBABE
- Reasonable version: What version of the JDK is the class file compiled from, and is it compatible with the JDK that executes the class
- Metadata validation (Basic information validation)
- If there is a parent class: Check if there is a parent class specified in class
- Inheriting the Final class?
- Non-abstract classes implement all abstract methods
- Bytecode validation (complex)
- Simulates execution to check:
- Stack data types match the opcode's operand types
- Jump instructions point to reasonable locations
- Symbolic reference verification
- The presence or absence of a description class in the constant pool: the referenced class must exist
- Whether the access method or field exists and has sufficient permissions: private…
- Preparation
- Allocates memory for class (static) variables and sets zero/default values (in the method area)
- public static int v = 1;
- In the preparation phase, v is set to 0
- It is set to 1 during initialization
- For static final constants, the correct value is assigned already in the preparation phase, before initialization:
- public static final int v = 1; // v is 1 right after preparation
- Resolution
- Symbolic references are replaced with direct references: class names and the like are replaced directly with memory-address pointers
5.1.3 initialization
- Executes the class constructor <clinit>
- Static variable assignment statements (note: static final constants were already assigned in the preparation phase)
- static {} blocks
- The parent class's <clinit> is guaranteed to execute before the subclass's
- <clinit> is thread-safe: the JVM ensures it executes in only a single thread
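The thread-safety of class initialization can be demonstrated: even when several threads race to trigger it, the static initializer runs exactly once (a sketch; the counter exists only to observe the behavior):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ClinitDemo {
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    static class Lazy {
        static { INIT_COUNT.incrementAndGet(); } // runs once, under the JVM's initialization lock
        static int touch() { return 1; }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[8];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> Lazy.touch()); // all race to initialize Lazy
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(INIT_COUNT.get()); // 1
    }
}
```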
VI. Performance Analysis
6.1 Java provides performance analysis tools
Enter the commands directly in a console; use -help to see their usage
6.1.1 jps
jps lists Java processes, similar to the ps command; it is usually the first step in troubleshooting. -q outputs only process IDs, suppressing the class short names; -m outputs the arguments passed to the main method; -l outputs the full package name of the main class (or the jar path); -v shows the arguments passed to the JVM.
6.1.2 jinfo
jinfo inspects the parameters of a running Java application, and can even modify some of them at run time. -flag <name> <pid>: print the value of the named JVM flag. -flag [+|-]<name> <pid>: set a boolean JVM flag. -flag <name>=<value> <pid>: set the value of a JVM flag.
6.1.3 jmap
Generate heap snapshots and object statistics for the Java application
```
 num     #instances         #bytes  class name
----------------------------------------------
   1:        370469       32727816  [C
   2:        223476       26486384  <constMethodKlass>
   3:        260199       20815920  java.lang.reflect.Method
...
8067:             1              8  sun.reflect.GeneratedMethodAccessor35
Total       4431459      255496024
```
6.1.4 jstack
jstack prints a thread dump:
- -l: also print lock information
- -m: print both Java and native frames
- -F: force a dump when jstack does not respond
(on JDK 1.6, only the -l option is available)
6.1.5 JConsole
A graphical monitoring tool that gives a running overview of a Java application: heap usage, permanent generation usage, class loading, and more
6.1.6 Visual VM
Visual VM is a powerful all-in-one Visual tool for fault diagnosis and performance monitoring
6.1.7 MAT
6.2 Java heap Analysis
- Causes of memory overflow (OOM)
JVM memory regions: heap, permanent generation, thread stacks, and direct memory
Heap + thread stacks + direct memory <= space available from the operating system
- Heap overflow: allocating a large amount of heap space overflows the heap directly
```java
public static void main(String[] args) {
    ArrayList<byte[]> list = new ArrayList<byte[]>();
    for (int i = 0; i < 1024; i++) {
        list.add(new byte[1024 * 1024]);
    }
}
```
```
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at geym.jvm.ch8.oom.SimpleHeapOOM.main(SimpleHeapOOM.java:14)
```
Solutions: increase heap space, release memory promptly, process data in batches
- Permanent generation overflow

```java
// Generate a large number of classes
public static void main(String[] args) {
    for (int i = 0; i < 100000; i++) {
        CglibBean bean = new CglibBean("geym.jvm.ch3.perm.bean" + i, new HashMap());
    }
}
```
```
Caused by: java.lang.OutOfMemoryError: PermGen space
[Full GC [Tenured: 2523K->2523K(10944K), secs] 2523K->2523K(15936K),
 [Perm : 4095K->4095K(4096K)], secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
Heap
 def new generation   total 4992K, used 89K [0x28280000, 0x287e0000, 0x2d7d0000)
  eden space 4480K,   2% used [0x28280000, 0x282966d0, 0x286e0000)
  from space 512K,    0% used [0x286e0000, 0x286e0000, 0x28760000)
  to   space 512K,    0% used [0x28760000, 0x28760000, 0x287e0000)
 tenured generation   total 10944K, used 2523K [0x2d7d0000, 0x2e280000, 0x38280000)
   the space 10944K, 23% used [0x2d7d0000, 0x2da46cf0, 0x2da46e00, 0x2e280000)
 compacting perm gen  total 4096K, used 4095K [0x38280000, 0x38680000, 0x38680000)
   the space 4096K,  99% used [0x38280000, 0x3867fff0, 0x38680000, 0x38680000)
    ro space 10240K, 44% used [0x38680000, 0x38af73f0, 0x38af7400, 0x39080000)
    rw space 12288K, 52% used [0x39080000, 0x396cdd28, 0x396cde00, 0x39c80000)
```
Solutions: avoid dynamically generating classes, increase the Perm size, and allow class unloading
- Java stack overflow
-Xmx1g -Xss1m

```java
public static class SleepThread implements Runnable {
    public void run() {
        try {
            Thread.sleep(10000000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

public static void main(String[] args) {
    for (int i = 0; i < 1000; i++) {
        new Thread(new SleepThread(), "Thread" + i).start();
        System.out.println("Thread" + i + " created");
    }
}
```
```
Exception in thread "main" java.lang.OutOfMemoryError:
unable to create new native thread
```
When a thread is created, stack space must be allocated for it, and that space is requested from the operating system. If the operating system cannot provide enough memory for the stack, it throws OOM. e.g. with a 1 GB heap and 1 MB of stack per thread, the number of threads is limited by the remaining address space.
Note: heap + thread stack + direct memory <= operating system available space
- Direct memory overflow
`ByteBuffer.allocateDirect()` requests direct memory outside the heap. Direct memory can also be reclaimed by GC.

-Xmx1g -XX:+PrintGCDetails

```java
// Throws OOM even though there is enough heap memory
for (int i = 0; i < 1024; i++) {
    ByteBuffer.allocateDirect(1024 * 1024);
    System.out.println(i);
    System.gc();
}
```
Vii. Locks
7.1 Thread Security
```java
public static List<Integer> numberList = new ArrayList<Integer>();

public static class AddToList implements Runnable {
    int startnum = 0;

    public AddToList(int startnumber) {
        startnum = startnumber;
    }

    @Override
    public void run() {
        int count = 0;
        while (count < 1000000) {
            numberList.add(startnum);
            startnum += 2;
            count++;
        }
    }
}

public static void main(String[] args) throws InterruptedException {
    Thread t1 = new Thread(new AddToList(0));
    Thread t2 = new Thread(new AddToList(1));
    t1.start();
    t2.start();
    while (t1.isAlive() || t2.isAlive()) {
        Thread.sleep(1);
    }
    System.out.println(numberList.size());
}
```
```
Exception in thread "Thread-0" java.lang.ArrayIndexOutOfBoundsException: 73
	at java.util.ArrayList.add(Unknown Source)
	at simpleTest.TestSome$AddToList.run(TestSome.java:27)
	at java.lang.Thread.run(Unknown Source)
1000005
```
ArrayList is not a thread-safe collection. When two threads add elements concurrently and the backing array fills up, one thread may trigger automatic resizing while the other is still adding; because the backing array has a fixed capacity at any given moment, the out-of-bounds exception above is thrown, and element counts can also come out wrong.
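One minimal fix (of several possible) is to serialize the `add` calls, e.g. by wrapping the list with `Collections.synchronizedList`; the size then always comes out exact and no exception is thrown:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SafeAdd {
    public static int addConcurrently() throws InterruptedException {
        // The wrapper synchronizes every call, so concurrent resizing is safe
        final List<Integer> list = Collections.synchronizedList(new ArrayList<Integer>());
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                list.add(i);
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join();  t2.join();
        return list.size();   // always 200000: no lost updates, no AIOOBE
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(addConcurrently());
    }
}
```

`CopyOnWriteArrayList` or an explicit `synchronized (list)` block around each `add` would work equally well; the common point is that resizing can never race with another writer.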
7.2 Object Header Mark
In the HotSpot virtual machine, the layout of objects stored in memory can be divided into three areas: object headers, Instance Data, and Padding.
7.3 biased locking
Under contention, a biased lock can be upgraded to a lightweight lock and then to a heavyweight lock (lock upgrading is one-way: only from low to high, never downgraded). In most cases a lock is not only free of multi-threaded contention but is also acquired repeatedly by the same thread; biased locking was introduced to make acquisition cheaper for that thread. A biased lock only helps while a single thread uses the lock. The lock object's header contains a ThreadId field; if it is empty, the first acquisition writes the acquiring thread's id into it and sets the biased-lock bit to 1. On later entries the thread simply checks whether the ThreadId matches its own; if so, it is considered to already hold the lock and skips the acquire/release work of lightweight and heavyweight locks, improving efficiency.
- In most cases there is no competition, so you can use bias to improve performance
- "Biased" means exactly that: the lock is biased in favor of the thread that currently holds it
- Sets the Mark of the object header to bias and writes the thread ID to the object header Mark
- As long as there is no contention, the thread that acquired the bias lock will enter the synchronized block in the future and need not do synchronization
- The biased mode ends when another thread requests the same lock: the biased lock is revoked at a global safepoint (a point at which no bytecode is executing) and another lock mode is adopted
-XX:+UseBiasedLocking
- Enabled by default
- In competitive situations, biased locking can increase the burden of the system
To enable biased locking immediately: -XX:+UseBiasedLocking -XX:BiasedLockingStartupDelay=0
Biased locking is not enabled immediately after the JVM starts but only after a delay; the delay can be set to 0
7.4 Lightweight Lock
Ordinary (heavyweight) lock handling does not perform well, so the lightweight lock provides a fast locking path. It is designed to improve performance when threads execute synchronized blocks alternately rather than truly concurrently.
- If the object is not locked: the JVM stores the object header's Mark Word into a lock record and sets the object header to point to that lock record (in the thread's stack space), so the object and the lock hold references to each other
- Before a thread executes a synchronized block, the JVM creates space for a lock record in the thread's current stack frame and copies the object header's Mark Word into it (the displaced Mark Word). The thread then uses CAS to try to replace the Mark Word in the object header with a pointer to the lock record. If this succeeds, the thread holds the lock; if it fails, other threads are competing for the lock and the current thread tries to acquire it by spinning.
- To unlock, the thread uses an atomic CAS to copy the displaced Mark Word back into the object header. Success means there was no contention; failure means the lock is contended, and it inflates to a heavyweight lock.
- Whether a thread holds a lightweight lock can be determined by checking whether the pointer in the object's header points into that thread's stack space

Features:
- If lightweight locking fails, there is contention and the lock is upgraded to a heavyweight lock (a conventional OS-level lock)
- Without lock contention, it avoids the performance cost of traditional locks built on OS mutexes
- Under heavy contention, lightweight locking does a lot of extra work (CAS, spinning), degrading performance
The Lock Record pointer in the Mark Word refers to the most recent lock record on a thread's stack; the lightweight lock works first-come-first-served
7.5 Spin Lock
- Minimizes thread suspension at the OS level
- When contention exists but the lock can be acquired quickly, the thread avoids being suspended at the OS level by performing a few empty loops (spins) while waiting for the lock
- Enabled in JDK 1.6 with -XX:+UseSpinning
- In JDK 1.7 this parameter was removed and replaced with a built-in adaptive implementation
- If the synchronized block runs long, spinning fails: the empty loops waste CPU and the thread ends up suspended at the OS level anyway, wasting resources
- If the synchronized block is very short, spinning succeeds and saves the cost of suspending and switching threads, improving performance
When contention occurs, if the Owner thread can release the lock within a very short time, the contending threads can wait a little (spin) without blocking; they may acquire the lock as soon as the Owner releases it, avoiding thread suspension
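The idea can be sketched at the Java level with a user-space spin lock built on CAS. The JVM's actual spinning happens inside the VM; this is only an illustration of busy-waiting instead of OS-level suspension:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Busy-wait (spin) instead of suspending the thread at the OS level
        while (!locked.compareAndSet(false, true)) {
            // empty loop: cheap if the owner releases quickly, wasteful otherwise
        }
    }

    public void unlock() {
        locked.set(false);
    }

    // Demo: two threads increment a shared counter under the spin lock
    static int counter = 0;

    public static int demo() throws InterruptedException {
        counter = 0;
        SpinLock lock = new SpinLock();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                lock.lock();
                try { counter++; } finally { lock.unlock(); }
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join();  t2.join();
        return counter;   // 200000: the spin lock provides mutual exclusion
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```

This matches the trade-off described above: with short critical sections the spin almost always wins quickly; with long ones the empty loop just burns CPU.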
7.6 Biased locks vs. Lightweight locks vs. spin locks
- These are not Java-language-level lock optimizations
- They are methods and steps built into the JVM for acquiring locks
- Biased locking is tried first when enabled
- Failing that, the lightweight lock is tried
- Failing that, spinning is tried
- Failing again, a normal (heavyweight) lock is used: the thread is suspended at the operating-system level via an OS mutex
Lock | Advantages | Disadvantages | Applicable scenario |
---|---|---|---|
Biased lock | Locking and unlocking need no extra cost; only a nanosecond-level gap compared with executing an unsynchronized method | If threads contend for the lock, there is the extra cost of revoking the bias | Only one thread ever accesses the synchronized block |
Lightweight lock | Competing threads do not block, improving response time | A thread that cannot obtain the contended lock spins, consuming CPU | Response time matters; synchronized blocks execute very quickly |
Heavyweight lock | Contending threads do not spin and consume no CPU | Threads block; response time is slow | Throughput matters; synchronized blocks take a long time |
The conceptual difference between biased and lightweight locks: a lightweight lock uses CAS to avoid the OS mutex in the uncontended case; a biased lock eliminates the entire synchronization path in the uncontended case, not even performing the CAS operation.
7.7 Java Language Locking optimization
7.7.1 Reducing the lock holding time
Shrink the synchronized region to the minimum necessary
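A sketch of the idea (the class and method names are illustrative): move per-thread preparatory work out of the synchronized block so the lock is held only for the shared-state update.

```java
public class HoldTimeDemo {
    private final StringBuilder log = new StringBuilder();

    // Bad shape: the whole method is locked, including thread-local work
    public synchronized void handleLocked(String raw) {
        String processed = raw.trim().toUpperCase();
        log.append(processed).append(';');
    }

    // Better shape: the expensive, thread-local work runs unlocked;
    // only the shared-state mutation sits inside the critical section
    public void handleShort(String raw) {
        String processed = raw.trim().toUpperCase(); // touches no shared state
        synchronized (this) {
            log.append(processed).append(';');       // short critical section
        }
    }

    public String contents() {
        synchronized (this) {
            return log.toString();
        }
    }

    public static void main(String[] args) {
        HoldTimeDemo d = new HoldTimeDemo();
        d.handleShort("  jvm ");
        System.out.println(d.contents()); // JVM;
    }
}
```

Both shapes are correct; the second simply holds the lock for a fraction of the time, which is the whole optimization.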
7.7.2 Reducing the lock Granularity
Splitting one large lock-protected object into several small ones increases parallelism and reduces contention, which lowers the chance of bias revocation and raises the success rate of lightweight locking. With coarse granularity and fierce competition, lightweight locking is likely to fail.
- ConcurrentHashMap
  - Is divided into several segments: Segment<K,V>[] segments
  - Each Segment maintains its own HashEntry<K,V> table
  - The put operation first locates the Segment, locks only that Segment, and then executes put
  - With the lock granularity reduced, ConcurrentHashMap allows several writing threads to proceed simultaneously
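The same idea can be sketched with simple lock striping, in the spirit of the segmented design (a toy illustration, not ConcurrentHashMap's actual implementation):

```java
public class StripedCounter {
    private static final int STRIPES = 8;
    private final Object[] locks = new Object[STRIPES];
    private final long[] counts = new long[STRIPES];

    public StripedCounter() {
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new Object();
        }
    }

    public void increment(int key) {
        int stripe = (key & 0x7fffffff) % STRIPES; // locate the "segment"
        synchronized (locks[stripe]) {             // lock only that segment
            counts[stripe]++;
        }
    }

    public long total() {
        long sum = 0;
        for (int i = 0; i < STRIPES; i++) {
            synchronized (locks[i]) {
                sum += counts[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        StripedCounter c = new StripedCounter();
        Runnable task = () -> { for (int i = 0; i < 100_000; i++) c.increment(i); };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println(c.total()); // 200000
    }
}
```

Writers whose keys land in different stripes never contend; only same-stripe writers serialize, which is exactly the granularity reduction described above.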
7.7.3 lock separation
- Read-write lock ReadWriteLock
Lock held | Read lock requested | Write lock requested |
---|---|---|
Read lock | Can be acquired | Blocked |
Write lock | Blocked | Blocked |
- LinkedBlockingQueue: take and put operate on opposite ends of the queue and use two separate locks; locks can be separated whenever the operations do not affect each other
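A minimal `ReentrantReadWriteLock` sketch matching the table above — readers share the read lock, writers take the exclusive write lock:

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RwCache {
    private final ReadWriteLock rw = new ReentrantReadWriteLock();
    private int value;

    public int read() {
        rw.readLock().lock();       // many readers may hold the read lock at once
        try {
            return value;
        } finally {
            rw.readLock().unlock();
        }
    }

    public void write(int v) {
        rw.writeLock().lock();      // exclusive: blocks both readers and writers
        try {
            value = v;
        } finally {
            rw.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        RwCache cache = new RwCache();
        cache.write(42);
        System.out.println(cache.read()); // 42
    }
}
```

With read-heavy workloads this beats a plain `synchronized` block because concurrent reads never serialize each other.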
7.7.4 lock coarsening
Repeatedly requesting, synchronizing on, and releasing the same lock itself consumes valuable system resources and hurts performance; merging adjacent lock regions (coarsening) avoids this
- Example1:
```java
public void demoMethod() {
    synchronized (lock) {
        // do sth.
    }
    // do other work that needs no synchronization but executes quickly
    synchronized (lock) {
        // do sth.
    }
}
```
Coarsened directly into one lock request:
```java
public void demoMethod() {
    // consolidated into a single lock request
    synchronized (lock) {
        // do sth.
        // do other work that needs no synchronization but executes quickly
    }
}
```
- Example2
```java
for (int i = 0; i < CIRCLE; i++) {
    synchronized (lock) {
        // ...
    }
}

// After lock coarsening:
synchronized (lock) {
    for (int i = 0; i < CIRCLE; i++) {
        // ...
    }
}
```
7.7.5 lock elimination
In the just-in-time compiler, if an object is found never to be shared with other threads, lock operations on it can be eliminated. Such locks are often not introduced by the programmer: some JDK library classes (e.g. StringBuffer) have built-in locks, and when an instance lives only on the stack it cannot be accessed globally, so locking it is unnecessary
- Example
```java
public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    for (int i = 0; i < CIRCLE; i++) {
        craeteStringBuffer("JVM", "Diagnosis");
    }
    long bufferCost = System.currentTimeMillis() - start;
    System.out.println("craeteStringBuffer: " + bufferCost + " ms");
}

public static String craeteStringBuffer(String s1, String s2) {
    // StringBuffer is a thread-safe object with a built-in lock
    StringBuffer sb = new StringBuffer();
    sb.append(s1);
    sb.append(s2);
    return sb.toString();
}
```
-server -XX:+DoEscapeAnalysis -XX:+EliminateLocks
- Objects on the stack (method local variables) are not globally accessible and do not need to be locked
7.7.6 Lock-free
Lock-free, non-blocking synchronization is implemented with CAS:

```
CAS(V, E, N): if V == E then V = N
```
The CAS algorithm: CAS(V, E, N), where V is the variable to update, E is the expected value, and N is the new value. V is set to N only if V equals E; if they differ, another thread has already updated V and the current thread does nothing. CAS finally returns the actual current value of V. CAS operates optimistically, always assuming it can complete the operation. When multiple threads use CAS on one variable simultaneously, exactly one wins and updates successfully; the rest fail. Failing threads are not suspended — they are simply notified of the failure and may retry or abort. On this basis, CAS can detect interference from other threads without any locking and react accordingly.
The `java.util.concurrent.atomic` package uses lock-free implementations and generally performs better than the equivalent locked operations
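For example, `AtomicInteger.compareAndSet` exposes CAS(V, E, N) directly; a lock-free counter simply retries on interference instead of blocking:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    public static int countWithCas() throws InterruptedException {
        final AtomicInteger v = new AtomicInteger(0);
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                int expected;
                do {
                    expected = v.get();                            // E: the value we expect
                } while (!v.compareAndSet(expected, expected + 1)); // retry if another thread won
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join();  t2.join();
        return v.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countWithCas()); // 200000: no update is ever lost
    }
}
```

The failing thread is never suspended; it just observes the fresh value and tries again, which is exactly the optimistic behavior described above (in practice `incrementAndGet()` wraps this loop for you).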
7.8 Thread States and Lock Inflation
When multiple threads request an object monitor at the same time, the object monitor sets several states to distinguish between the requesting threads:
- Contention List: All threads requesting locks will be placed first on the Contention queue
- Entry List: The threads in the Contention List that qualify as candidates are moved to the Entry List
- Wait Set: Threads that block calling the Wait method are placed in the Wait Set
- OnDeck: At most one thread is competing for a lock at any time, called OnDeck
- Owner: The thread that acquires the lock is called Owner
- !Owner: the thread that has released the lock
Threads in the ContentionList, EntryList, and WaitSet are all blocked, and the blocking is implemented by the operating system (pthread_mutex_lock on Linux)
A blocked thread enters the kernel (Linux) scheduling state, causing the system to switch back and forth between user mode and kernel mode, which seriously hurts lock performance.

synchronized proceeds as follows as each thread prepares to acquire the shared resource:
- Check whether the Mark Word contains the current thread's ThreadId. If so, the thread already holds the biased lock.
- If the Mark Word holds a different ThreadId, the lock is upgraded, using CAS to perform the switch: the new thread suspends the previous owner (identified by the ThreadId in the Mark Word), and the previous owner clears the Mark Word contents.
- Both threads copy the object's Mark Word (including the hash code) into the lock-record space they have newly created on their own stacks, then compete by trying to CAS the shared object's Mark Word to point to their own lock record.
- The thread whose CAS in the previous step succeeds acquires the resource; the one that fails starts spinning.
- If a spinning thread acquires the resource while spinning (i.e. the owner finishes and releases it), the lock stays in the lightweight state; if spinning fails,
- the lock enters the heavyweight state: the spinning thread blocks, waiting for the owner to finish executing and wake it up.
Viii. Class file structure
u4: an unsigned 4-byte integer; u2: an unsigned 2-byte integer
Type | Name | Count | Note |
---|---|---|---|
u4 | magic | 1 | 0xCAFEBABE: identifies the Java class file format |
u2 | minor_version | 1 | Minor version of the compiling JDK |
u2 | major_version | 1 | Major version of the compiling JDK |
u2 | constant_pool_count | 1 | |
cp_info | constant_pool | constant_pool_count - 1 | Entries are referenced everywhere by index; index 0 is reserved, hence the count is reduced by 1 |
u2 | access_flags | 1 | Access modifiers & class type |
u2 | this_class | 1 | Constant-pool index of this class's CONSTANT_Class |
u2 | super_class | 1 | Constant-pool index of the superclass's CONSTANT_Class |
u2 | interfaces_count | 1 | |
u2 | interfaces | interfaces_count | Each entry is a constant-pool index of a CONSTANT_Class |
u2 | fields_count | 1 | |
field_info | fields | fields_count | access_flags, name_index, descriptor_index, attributes_count, attribute_info attributes[attributes_count] |
u2 | methods_count | 1 | |
method_info | methods | methods_count | |
u2 | attributes_count | 1 | |
attribute_info | attributes | attributes_count | |
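The first three fields of the table (the u4 magic and the two u2 version fields) can be parsed in a few lines; this sketch feeds in a hand-built byte array rather than a real class file:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassHeader {
    /** Returns {magic, minor_version, major_version} from a class-file stream. */
    public static long[] readHeader(InputStream in) throws IOException {
        DataInputStream d = new DataInputStream(in);
        long magic = d.readInt() & 0xFFFFFFFFL; // u4 magic (big-endian, per the spec)
        int minor = d.readUnsignedShort();      // u2 minor_version
        int major = d.readUnsignedShort();      // u2 major_version
        return new long[] { magic, minor, major };
    }

    public static void main(String[] args) throws IOException {
        // 0xCAFEBABE, minor 0, major 52 (major 52 corresponds to JDK 8)
        byte[] header = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
                          0, 0, 0, 52 };
        long[] h = readHeader(new ByteArrayInputStream(header));
        System.out.printf("magic=0x%X minor=%d major=%d%n", h[0], h[1], h[2]);
    }
}
```

Pointing `readHeader` at a real `.class` file on disk gives the same three fields, followed by the constant pool described in the table.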
Ix. JVM bytecode execution
9.1 javap
Per-thread frame data in the stack:
- Program counter: one per thread, pointing to the instruction the thread is currently executing
- Local variable table
- Operand stack
9.2 JIT and related parameters
- JIT (just-in-time compilation): interpreting bytecode performs poorly, so Hot Spot Code can be compiled into machine code. When the virtual machine finds that a method or code block runs especially frequently, it marks it as "Hot Spot Code" and compiles it at runtime into machine code specific to the local platform, making execution far more efficient.
- Identifying hot code: two counters are used, a method invocation counter and a back-edge (loop) counter; a hot loop body can be replaced with compiled machine code directly on the stack (on-stack replacement)
- Compiler Settings
- -XX:CompileThreshold=1000: code counts as hot once it has been executed a thousand times
- -XX:+PrintCompilation: print methods as they are compiled to machine code
- -Xint: interpreted execution only
- -Xcomp: compile everything before execution
- -Xmixed: the default mixed mode