First, memory management mechanism
1.1 Runtime data area
1. Program counter
The PC register stores the address of the next instruction, that is, the instruction code about to be executed. The execution engine reads the next instruction from this address.
2. Java VM stack
Function: Manages the running of Java programs. It saves the local variables and partial results of methods, and takes part in method calls and returns.
Q: What is stored in the stack?
- Each thread has its own stack, and the data in the stack is in the form of stack frames
- Each method being executed on this thread corresponds to a stack frame
- A stack frame is a block of memory, a data set that holds information about the execution of a method.
Q: What is the internal structure of stack frames?
The stack frame stores:
- Local variable table (an array of slots used primarily to store method parameters and local variables defined in the method body)
- Operand stack (during method execution, data is pushed onto or popped from the stack according to bytecode instructions)
- Dynamic linking (a reference to the current method in the runtime constant pool)
- Method return address
- Some additional information
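To make the local variable table and operand stack more concrete, here is a minimal sketch. The class and method names are hypothetical, and the bytecode in the comments is what `javap -c` typically prints for such a static method; exact output may vary with the compiler version.

```java
// Illustrative sketch; StackFrameDemo is a hypothetical name.
class StackFrameDemo {
    // Local variable table slots for this static method: 0 = a, 1 = b, 2 = c.
    static int add(int a, int b) {
        // iload_0, iload_1 : push a and b onto the operand stack
        // iadd             : pop both, push their sum
        // istore_2         : pop the sum into slot 2 (c)
        int c = a + b;
        // iload_2, ireturn : push c, then return it to the caller's frame
        return c;
    }

    public static void main(String[] args) {
        // Calling add() pushes a new frame onto this thread's stack;
        // returning from it pops the frame again.
        System.out.println(add(2, 3));
    }
}
```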
3. Native method stack
The Java virtual machine stack manages the invocation of Java methods, and the native method stack manages the invocation of native methods.
4. The Java heap
Object allocation procedure
- A newly created object is first placed in Eden, which has a limited size.
- When Eden fills up and the program needs to create another object, the JVM's garbage collector performs a Minor GC on Eden, destroys the objects in Eden that are no longer referenced by other objects, and then places the new object in Eden.
- Surviving objects in Eden are moved to the Survivor 0 zone.
- If garbage collection is triggered again, objects that survived the previous collection in Survivor 0 are moved from Survivor 0 to Survivor 1.
- On the next garbage collection they move from 1 back to 0, and so on.
- By default, once an object has been copied between the Survivor zones 15 times, it is promoted to the old generation.
- When the old generation runs out of memory, a Major GC is triggered.
- If there is still not enough memory after the Major GC, an OOM error (OutOfMemoryError) is thrown.
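The aging and promotion steps above can be sketched with a toy model. This is not how HotSpot is implemented: the class `AgingDemo`, its fields, and the exact cutoff (promote on the collection after the 15th copy) are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of object aging; all names here are hypothetical.
class AgingDemo {
    static final int MAX_TENURING_THRESHOLD = 15; // the default mentioned above

    static final class Obj {
        int age = 0;         // number of times the object has been copied
        boolean live = true; // stand-in for "still reachable"
    }

    final List<Obj> eden = new ArrayList<>();
    final List<Obj> survivor = new ArrayList<>(); // the Survivor zone currently in use
    final List<Obj> old = new ArrayList<>();

    Obj allocate() {
        Obj o = new Obj();
        eden.add(o);
        return o;
    }

    // One Minor GC: dead objects vanish, live ones are copied or promoted.
    void minorGc() {
        List<Obj> from = new ArrayList<>(eden);
        from.addAll(survivor);
        List<Obj> to = new ArrayList<>(); // the empty Survivor zone
        for (Obj o : from) {
            if (!o.live) continue;        // unreachable: reclaimed
            if (o.age < MAX_TENURING_THRESHOLD) {
                o.age++;                  // copied once more
                to.add(o);
            } else {
                old.add(o);               // promoted to the old generation
            }
        }
        eden.clear();
        survivor.clear();
        survivor.addAll(to);              // the two Survivor zones swap roles
    }

    public static void main(String[] args) {
        AgingDemo heap = new AgingDemo();
        Obj o = heap.allocate();
        for (int i = 0; i < 16; i++) heap.minorGc();
        System.out.println("promoted: " + heap.old.contains(o));
    }
}
```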
Memory allocation policy
- Eden is assigned priority
- Large objects are allocated directly in the old generation
- Long-lived objects are moved to the old generation
- Dynamic object age determination
- Space allocation guarantee
Note: The JVM allocates a thread-private buffer (TLAB, thread-local allocation buffer) for each thread inside Eden, so that multiple threads do not operate on the same address when allocating. Using TLAB avoids a number of thread-safety issues during allocation.
Q: Is the heap the only option for allocating objects?
Not necessarily. If escape analysis shows that an object does not escape the method that creates it, the object can be allocated on the stack, and the stack space is released when the method finishes executing.
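As a rough illustration of what "escaping" means, consider the sketch below. The class and method names are hypothetical, and whether a particular JIT compiler actually removes the heap allocation is up to the JVM; the code only shows which object is a candidate.

```java
// Illustrative sketch; EscapeDemo and Point are hypothetical names.
class EscapeDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The Point never leaves this method, so it does not escape.
    // A JIT compiler with escape analysis may avoid the heap allocation.
    static int sumNoEscape(int x, int y) {
        Point p = new Point(x, y);
        return p.x + p.y;
    }

    // The Point is returned to the caller, so it escapes this method
    // and is generally allocated on the heap.
    static Point escape(int x, int y) {
        return new Point(x, y);
    }

    public static void main(String[] args) {
        System.out.println(sumNoEscape(1, 2));
        System.out.println(escape(1, 2).x);
    }
}
```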
5. Method area
Concept: The method area can be regarded as a piece of memory separate from the Java heap.
Internal structure
- Type information
- Runtime constant pool (the constant pool table stores the literals and symbolic references generated at compile time, which are placed in the runtime constant pool in the method area after the class is loaded)
- Static variables
- JIT code cache
- Field information
- Method information
Evolution of the method area
Q: Why replace PermGen with Metaspace?
- The permanent generation has a maximum size fixed by the JVM at startup, which cannot grow at run time, while metaspace uses native memory and is limited only by the native memory available. Metaspace can still overflow, but this happens less often than before.
- Class metadata is stored in metaspace, so how much class metadata can be loaded is no longer governed by MaxPermSize but by the memory actually available on the system, so that more classes can be loaded.
- JRockit never had a permanent generation. When the HotSpot and JRockit code was merged in JDK 8, there was no need to keep a permanent generation after the merge.
Second, GC (garbage collection) mechanism
Q1: What is garbage?
Garbage is an object that is not pointed to by any pointer in the running program. Such objects need to be collected.
Q2: Why is garbage collected?
If garbage objects are not cleaned up in time, the memory they occupy stays reserved until the application ends. That space cannot be used by other objects, and memory overflow may even occur.
Q3: Which areas are garbage collected?
- Method area
- Heap (the focus of collection)
- The young generation is collected frequently
- The old generation is collected less often
- Metaspace is mostly left untouched
2.1 Algorithms related to garbage collection
1. Marking stage (determining whether objects are alive)
A. Reference counting algorithm
Principle: Each object holds an integer reference counter that records how many references currently point to it.
Advantages: Simple to implement, garbage objects are easy to identify, the check is efficient, and collection is not delayed.
Disadvantages:
- A separate field is required to store the counter, which adds storage overhead
- Every assignment requires updating the counter, and the additions and subtractions add time overhead.
- Cannot handle circular references
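The circular reference problem can be shown with a toy counter. The JVM does not use reference counting; the `Node` class and `setNext` helper below are hypothetical and only mimic what a reference-counting runtime would do.

```java
// Toy reference counting; RefCountDemo and Node are hypothetical names.
class RefCountDemo {
    static final class Node {
        int refCount = 0; // the per-object counter
        Node next;        // one outgoing reference
    }

    // Points from.next at `to`, updating counters on every assignment.
    static void setNext(Node from, Node to) {
        if (from.next != null) from.next.refCount--;
        from.next = to;
        if (to != null) to.refCount++;
    }

    // Builds a two-node cycle, drops the outside references,
    // and returns the counters that remain.
    static int[] cycleCounts() {
        Node a = new Node();
        a.refCount++;   // referenced by local variable a
        Node b = new Node();
        b.refCount++;   // referenced by local variable b
        setNext(a, b);  // a -> b
        setNext(b, a);  // b -> a, closing the cycle
        a.refCount--;   // simulate the local variables going away
        b.refCount--;
        // Neither counter reaches zero, so neither object would ever be freed.
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleCounts();
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```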
B. Reachability analysis algorithm
Principle:
- Starting from the GC Roots, search downward to determine whether the target object is reachable from the root object set.
- Live objects in memory are all connected, directly or indirectly, to the root object set. The path followed by the search is called a reference chain.
- If the target object is not connected by any reference chain, it is unreachable, which means that the object is dead and can be marked as garbage.
- Only objects connected directly or indirectly to the root object set are live objects.
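A minimal sketch of the reachability search follows, using a map from object names to the names they reference. The names and the sample graph are hypothetical; a real collector walks actual object references, not strings. Note that the cycle between C and D does not keep them alive, which is the case reference counting cannot handle.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy reachability analysis; ReachabilityDemo is a hypothetical name.
class ReachabilityDemo {
    // Returns every object reachable from the roots by following references.
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> live = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (!live.add(obj)) continue; // already visited
            for (String target : refs.getOrDefault(obj, Collections.<String>emptyList())) {
                pending.push(target);
            }
        }
        return live;
    }

    // root -> A -> B is a reference chain; C and D only reference each other.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("root", Arrays.asList("A"));
        refs.put("A", Arrays.asList("B"));
        refs.put("C", Arrays.asList("D"));
        refs.put("D", Arrays.asList("C"));
        return refs;
    }

    public static void main(String[] args) {
        System.out.println(reachable(sampleGraph(), Arrays.asList("root")));
    }
}
```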
Q: What can serve as GC Roots?
- Objects referenced in the virtual machine stack
- Objects referenced by JNI (native methods) in the native method stack
- Objects referenced by class static fields in the method area, such as static variables of reference type in a Java class
- Objects referenced by constants in the method area, such as references in the string constant pool
- All objects held as locks by synchronized
- Internal references of the Java virtual machine
- JMXBeans that reflect Java virtual machine internals, callbacks registered in JVMTI, native code caches, and so on.
Addendum: In addition to this fixed set of GC Roots, other objects can be added “temporarily” to form the complete set of GC Roots, depending on the garbage collector selected by the user and the memory region currently being reclaimed.
2. Cleanup phase (once live objects have been distinguished from dead ones, the GC's next task is to reclaim the memory occupied by the dead objects)
A. Mark-sweep algorithm
Process:
- Mark: The collector traverses from the root references and marks all referenced objects.
- Sweep: The collector traverses the heap linearly from beginning to end and reclaims objects that were not marked as reachable.
Disadvantages:
- Not very efficient
- The entire application must be stopped (STW, stop-the-world) while GC is performed, resulting in a poor user experience
- The memory freed this way is not contiguous, which causes memory fragmentation and requires a free list to be maintained
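The sweep pass and the resulting free list can be sketched on a toy heap made of array slots. The representation (a `String[]` heap plus a `boolean[]` of mark bits) is an assumption for illustration, not how a real collector stores objects.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sweep phase; MarkSweepDemo is a hypothetical name.
class MarkSweepDemo {
    // heap[i] == null means slot i is free; marked[i] comes from the mark phase.
    // Returns the free list: the indexes of all free slots after sweeping.
    static List<Integer> sweep(String[] heap, boolean[] marked) {
        List<Integer> freeList = new ArrayList<>();
        for (int i = 0; i < heap.length; i++) {   // linear pass over the heap
            if (heap[i] != null && !marked[i]) {
                heap[i] = null;                   // reclaim the unmarked object
            }
            if (heap[i] == null) freeList.add(i);
            marked[i] = false;                    // reset marks for the next cycle
        }
        return freeList;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "B", "C", "D", "E" };
        boolean[] marked = { true, false, true, false, true };
        // The free slots are scattered between live objects: fragmentation.
        System.out.println(sweep(heap, marked));
    }
}
```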
B. Copying algorithm
Idea: Divide the available memory into two blocks and use only one at a time. During garbage collection, copy the live objects from the block in use to the unused block, then clear everything in the block that was in use and swap the roles of the two blocks, which completes the collection.
Advantages:
- No separate marking and sweeping passes, simple to implement, efficient to run
- Objects are contiguous after copying, so there is no “fragmentation” problem
Disadvantages:
- It needs twice as much memory
- Copying means that the GC needs to maintain object reference relationships between regions, which costs considerable memory and time
Note: The young generation suits the copying algorithm because it generally has few live objects, so little has to be copied.
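A toy version of the copy step is sketched below, again with array slots standing in for memory. The names are hypothetical and reference updating is left out; the point is only that the survivors end up contiguous and the old block is released as a whole.

```java
// Toy copying collection; CopyingDemo is a hypothetical name.
class CopyingDemo {
    // Copies live objects from `from` to the start of `to`.
    // Returns the next free index in `to` (everything after it is free).
    static int copyLive(String[] from, boolean[] live, String[] to) {
        int next = 0;
        for (int i = 0; i < from.length; i++) {
            if (from[i] != null && live[i]) {
                to[next++] = from[i]; // survivors are packed together
            }
            from[i] = null;           // the whole from-block is released
        }
        return next;
    }

    public static void main(String[] args) {
        String[] from = { "A", "B", "C", "D" };
        boolean[] live = { true, false, false, true };
        String[] to = new String[from.length];
        int next = copyLive(from, live, to);
        System.out.println(next + " objects copied");
    }
}
```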
C. Mark-compact algorithm
Mark: The collector traverses from the root references and marks all referenced objects.
Compact: Move all live objects to one end of memory, arranged in order, and then clear all space beyond the boundary.
Also known as the mark-sweep-compact algorithm.
Advantages:
- Eliminates the scattered free memory left by the mark-sweep algorithm.
- Eliminates the high cost of halving memory in the copying algorithm.
Disadvantages:
- Less efficient than the copying algorithm.
- When an object is moved, every reference to it from other objects must be updated with the new address.
- The user application must be paused (STW) for the whole move.
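The compaction step, including the address adjustment it forces, can be sketched on the same kind of toy heap. The forwarding table returned here is a common way to describe the old-to-new address mapping; the names and the array representation are assumptions for illustration.

```java
// Toy compaction; CompactDemo is a hypothetical name.
class CompactDemo {
    // Slides marked objects toward index 0 and clears everything else.
    // Returns a forwarding table: forward[old] = new index, or -1 if reclaimed.
    static int[] compact(String[] heap, boolean[] marked) {
        int[] forward = new int[heap.length];
        int next = 0; // boundary between compacted objects and free space
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked[i]) {
                forward[i] = next;
                heap[next] = heap[i];
                if (next != i) heap[i] = null; // old location becomes free
                next++;
            } else {
                forward[i] = -1;
                heap[i] = null;
            }
        }
        return forward;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "B", "C", "D", "E" };
        boolean[] marked = { false, true, false, true, true };
        int[] forward = compact(heap, marked);
        // Any reference that pointed at old index 3 must now point at forward[3].
        System.out.println(java.util.Arrays.toString(heap));
        System.out.println(java.util.Arrays.toString(forward));
    }
}
```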
D. Generational collection algorithm
Almost all current GCs use a generational collection algorithm to perform garbage collection.
In HotSpot:
- Young generation (copying algorithm)
- Old generation (mark-sweep, or a hybrid of mark-sweep and mark-compact)
E. Incremental collection algorithm
Idea: The garbage collector thread collects only a small area of memory at a time and then switches to the application thread. This repeats until garbage collection is complete.
Disadvantages: The overall cost of garbage collection increases and system throughput decreases because of the cost of thread switching and context switching.
F. Partition algorithm
Idea: Divide the whole heap into separate contiguous regions, each of which can be used and collected independently. This makes it possible to control how many regions are collected at a time.
2.2 Memory overflow (OOM) and memory leaks
Q1: What causes memory overflow?
- The heap size configured for the Java virtual machine is too small
- The code creates many large objects that the garbage collector cannot collect for a long time (because they are still referenced)
Q2: What is a memory leak?
Strictly speaking, a memory leak occurs when objects are no longer used by the program but the GC cannot reclaim them.
In many cases, however, poor practices give objects needlessly long lifetimes and can even lead to OOM, which can be called a “memory leak” in the broad sense.
For example:
- The singleton pattern
A singleton lives as long as the application, so any external object it holds a reference to cannot be collected, resulting in a memory leak.
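The singleton case can be sketched as follows. `LeakySingleton` and its methods are hypothetical names; the point is only that anything registered with the singleton stays strongly reachable until it is explicitly removed.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; LeakySingleton is a hypothetical class.
class LeakySingleton {
    private static final LeakySingleton INSTANCE = new LeakySingleton();

    // Everything in this list is reachable from a static field (a GC Root),
    // so it cannot be collected for as long as the application runs.
    private final List<Object> held = new ArrayList<>();

    private LeakySingleton() { }

    static LeakySingleton getInstance() { return INSTANCE; }

    void register(Object o) { held.add(o); }

    // Without a matching unregister call, the object is leaked.
    void unregister(Object o) { held.remove(o); }

    int size() { return held.size(); }

    public static void main(String[] args) {
        LeakySingleton s = getInstance();
        Object big = new byte[1024];
        s.register(big);
        System.out.println("held objects: " + s.size());
        s.unregister(big);
        System.out.println("held objects: " + s.size());
    }
}
```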
2.3 References
Supplement:
Strong references allow direct access to the target object.
Objects that are strongly referenced are never collected (the JVM throws OOM rather than reclaim them).
Strong references can cause memory leaks.
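The behaviour of a strong reference can be observed with a small sketch. It uses `java.lang.ref.WeakReference` only as an observer that does not itself keep the object alive; the class and method names are hypothetical, and `System.gc()` is merely a request that the JVM may ignore.

```java
import java.lang.ref.WeakReference;

// Illustrative sketch; StrongRefDemo is a hypothetical name.
class StrongRefDemo {
    // Returns true if the object is still present after a GC request
    // made while a strong reference to it exists.
    static boolean survivesWhileStronglyHeld() {
        Object strong = new Object();                 // strong reference
        WeakReference<Object> observer = new WeakReference<>(strong);
        System.gc();                                  // only a request
        // `strong` is still in use here, so the object cannot have been collected.
        return observer.get() == strong;
    }

    public static void main(String[] args) {
        System.out.println(survivesWhileStronglyHeld());
    }
}
```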
2.4 Garbage collector
- Serial collector: serial collection
Features: Uses the copying algorithm, serial collection and the STW mechanism; used for young generation garbage collection.
Serial Old is used for old generation collection, with serial collection, the STW mechanism and the mark-compact algorithm.
Advantages: Simple and efficient, suitable for single-CPU environments.
- ParNew collector: parallel collection
Features: The difference from the Serial collector is that ParNew collects in parallel while Serial collects serially. It also uses the copying algorithm and the STW mechanism.
The old generation is collected with Serial Old or CMS.
- Parallel collector: throughput first
Features: Copying algorithm, parallel collection and the STW mechanism. Controllable throughput is the important difference from ParNew.
Parallel Old uses the mark-compact algorithm, parallel collection and the STW mechanism.
- CMS collector: low latency
Features: An old generation collector that uses the mark-sweep algorithm and still has STW pauses. It cannot work with the Parallel collector; for the young generation only ParNew or Serial can be chosen.
- Initial mark: STW. It only marks the objects directly associated with GC Roots, so it is very fast.
- Concurrent mark: Traverses the entire object graph starting from the objects directly associated with GC Roots. It takes a long time but does not pause user threads; the garbage collector thread runs concurrently with them.
- Remark: Corrects the marks of objects whose state changed because the user program kept running during concurrent marking. The pause is slightly longer than the initial mark, but much shorter than the concurrent marking phase.
- Concurrent sweep: Clears the objects judged dead by the marking phases and frees their memory, concurrently with user threads.
Q: Why doesn’t CMS switch from mark-sweep to mark-compact to eliminate the effects of memory fragmentation?
Mark-compact moves objects and changes their addresses, which would affect the user threads that are running concurrently and still using those objects.
CMS drawbacks:
- It produces memory fragmentation
- It is very sensitive to CPU resources
- It cannot handle floating garbage (garbage produced by user threads during the concurrent phases, which is only reclaimed in the next collection)
- G1 collector: region-based and generational
G1 tracks the value of the garbage accumulated in each Region (the amount of space a collection would reclaim and the empirically estimated time it would take), maintains a priority list in the background, and each time collects the Regions with the highest value first, within the allowed collection time.
G1 uses a new partitioning algorithm with the following features:
- Parallelism and concurrency
- Generational collection
G1's work mainly consists of the following three stages:
- Young generation GC
- Old generation concurrent marking
- Mixed collection
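The idea of a value-ordered priority list with a time budget can be sketched with a simple greedy selection. This is not G1's actual algorithm: the `Region` fields, the value formula (reclaimable space divided by estimated cost) and the sample numbers are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy region selection; RegionSelectDemo and Region are hypothetical names.
class RegionSelectDemo {
    static final class Region {
        final String name;
        final int reclaimableKb; // space a collection would free
        final int costMs;        // estimated time to collect the region

        Region(String name, int reclaimableKb, int costMs) {
            this.name = name;
            this.reclaimableKb = reclaimableKb;
            this.costMs = costMs;
        }

        double value() { return (double) reclaimableKb / costMs; }
    }

    // Picks regions in descending value order while they fit the pause budget.
    static List<String> select(List<Region> regions, int pauseBudgetMs) {
        List<Region> byValue = new ArrayList<>(regions);
        byValue.sort((x, y) -> Double.compare(y.value(), x.value()));
        List<String> chosen = new ArrayList<>();
        int spentMs = 0;
        for (Region r : byValue) {
            if (spentMs + r.costMs > pauseBudgetMs) continue; // does not fit
            spentMs += r.costMs;
            chosen.add(r.name);
        }
        return chosen;
    }

    static List<Region> sample() {
        return Arrays.asList(
            new Region("A", 100, 10),  // value 10
            new Region("B", 90, 5),    // value 18
            new Region("C", 30, 6));   // value 5
    }

    public static void main(String[] args) {
        System.out.println(select(sample(), 15));
    }
}
```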
Third, class loading mechanism
Class loading process
1. Loading
- Obtain the binary byte stream that defines the class by its fully qualified name. (The stream can come from a variety of sources.)
- Convert the static storage structure represented by this byte stream into the runtime storage structure of the method area.
- Generate a Class object representing the class in memory, as the access point for the class's data in the method area.
2. Verification
Ensures that the byte stream in the Class file meets the requirements of the current VM and does not compromise VM security.
3. Preparation
Class variables are static variables. The preparation phase allocates memory for class variables and sets their initial values, using memory in the method area.
Instance variables are not allocated memory at this stage; they are allocated in the heap along with the object when it is instantiated. Note that instantiation is not part of class loading: class loading takes place before any instantiation and happens only once, while instantiation can happen many times.
4. Resolution
The process of replacing symbolic references in the constant pool with direct references.
In some cases, resolution can begin after the initialization phase, to support Java’s dynamic binding.
5. Initialization
The initialization phase is where the Java program code defined in the class actually starts executing. It is the phase in which the virtual machine executes the class constructor <clinit>() method. In the preparation phase, class variables were already assigned the initial values required by the system. In the initialization phase, class variables and other resources are initialized according to the plan the programmer expressed in the code.
<clinit>() is generated by the compiler, which automatically collects all class variable assignments and the statements in static blocks, in the order in which they appear in the source file. In particular, a static block can only read class variables defined before it; class variables defined after it can be assigned in the block but not read.
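The ordering rule for static blocks and class variable assignments can be seen in a small sketch. The class name is hypothetical; the commented-out line is the kind of forward read that the compiler rejects.

```java
// Illustrative sketch; InitOrderDemo is a hypothetical name.
class InitOrderDemo {
    static int a = 1;

    static {
        a = 2;  // a is declared above, so it can be read and written here
        b = 5;  // b is declared below: assigning to it is allowed...
        // System.out.println(b); // ...but reading it here would not compile
    }

    // This assignment appears after the static block in the source file,
    // so it runs later and overwrites the 5 written above.
    static int b = 10;

    public static void main(String[] args) {
        System.out.println(a + " " + b);
    }
}
```

During the preparation phase both variables hold the default value 0; the values assigned in the source only take effect when initialization runs.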
Reference:
Silicon Valley JVM complete tutorial, millions of playback, the peak of the entire network (Song Hongkang details Java virtual machine)
In-depth Understanding of the Java Virtual Machine