Brief analysis of Java memory management mechanism

Memory management is an important problem in computer programming. Generally speaking, it consists of two parts: memory allocation and memory reclamation. Different programming languages manage memory differently. Starting from a comparison of the C++ and Java memory management mechanisms, this article analyzes how Java allocates and reclaims memory, including Java object initialization and memory allocation, memory reclamation methods, and points that need attention…

Java vs. C++ memory management mechanisms

In C++, every object must eventually be destroyed. A local object is destroyed automatically at the end of its scope, which is marked by the closing curly brace, while for an object created with new the program must call the delete operator to reclaim the object’s memory. However, manipulating memory directly in C++ carries a high risk of memory leaks, and managing memory by hand is complex and difficult.

In Java, memory management is handled by the JVM. The Java “garbage collector” automatically reclaims the memory occupied by objects that are no longer in use, which greatly reduces the time spent on memory management and lets developers focus on business logic and feature implementation. But having a garbage collector does not mean you can forget about memory management. On the one hand, there are cases in Java where the garbage collector cannot reclaim memory that was allocated in a “special way” (described in more detail below). On the other hand, garbage collection is not guaranteed to happen at any particular time, and it may not run at all until the JVM is running low on memory. Therefore, some resources in Java still need to be released explicitly by the program, and managing certain objects sensibly can reduce memory and resource consumption.
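As a rough illustration of explicit release in Java, the sketch below uses try-with-resources, which closes a resource when the block exits instead of waiting for the garbage collector. The class names ManualReleaseDemo and TrackedResource are made up for this example; TrackedResource merely stands in for a file handle, socket, or native buffer.

```java
import java.util.ArrayList;
import java.util.List;

class ManualReleaseDemo {
    /** Opens two hypothetical resources, uses them, and returns the recorded events. */
    static List<String> run() {
        TrackedResource.EVENTS.clear();
        try (TrackedResource a = new TrackedResource("a");
             TrackedResource b = new TrackedResource("b")) {
            TrackedResource.EVENTS.add("use");
        } // close() is called here, in reverse order of opening
        return new ArrayList<>(TrackedResource.EVENTS);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

/** Hypothetical resource; not part of any library. */
class TrackedResource implements AutoCloseable {
    static final List<String> EVENTS = new ArrayList<>();
    private final String name;

    TrackedResource(String name) {
        this.name = name;
        EVENTS.add("open " + name);
    }

    @Override
    public void close() {
        // Deterministic release: runs when the try block exits, not when a GC happens.
        EVENTS.add("close " + name);
    }
}
```

The point of the design is that the release time is tied to program structure rather than to GC timing.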

Java memory allocation

Java program execution process

Java source files (.java suffix) are compiled by the Java compiler into bytecode files (.class suffix), and the bytecode of each class is loaded by the JVM class loader. After loading, the JVM execution engine executes it (the execution process also includes compiling bytecode into machine code). Before the bytecode runs, the JVM verifies the class file in four stages to ensure that the types it defines are safe; at run time it also checks for null references and out-of-bounds accesses, performs automatic garbage collection, and so on. The Runtime Data Area is used by the JVM to store the data and information needed while the program executes.

Class loaders are divided into three kinds. The bootstrap class loader does not inherit from ClassLoader and is part of the virtual machine itself; it is implemented in native code and is responsible for loading the Java core libraries, including all classes in jre/lib/rt.jar under JAVA_HOME. The extension class loader is responsible for finding and loading Java extension libraries from the JVM extension directory, including the jar packages in jre/lib/ext under JAVA_HOME or in the directory specified by -Djava.ext.dirs. The application class loader (returned by ClassLoader.getSystemClassLoader()) is responsible for loading the classes on the Java classpath.

The class loading mechanism includes five stages: loading, linking (verification, preparation, resolution), and initialization.

Loading: Locates and loads the binary data. The JVM obtains the binary byte stream of a class by its fully qualified name and converts the static storage structure represented by the byte stream into the runtime data structure of the method area; a java.lang.Class object representing the class is generated in the Java heap as the access point to the data in the method area.

Verification: Ensures that the byte stream in the class file contains information that meets the requirements of the current virtual machine. It is completed in four stages: file format verification, metadata verification, bytecode verification, and symbolic reference verification.

Preparation: The phase in which memory is formally allocated for class variables and their initial values are set; this memory is allocated in the method area.

Resolution: The phase in which the virtual machine converts symbolic references in the constant pool into direct references.

Initialization: The phase in which the class constructor <clinit>() method is executed, initializing class variables and other resources according to the plan the programmer has written into the program.
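The stages above can be observed from ordinary code. The following is a minimal sketch, assuming a standard HotSpot-style JVM: it walks the class loader chain of a class and shows that initialization runs at the first active use of a class, not earlier. LoadingDemo and Lazy are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LoadingDemo {
    static final StringBuilder LOG = new StringBuilder();

    /** Hypothetical class used only to observe when initialization runs. */
    static class Lazy {
        static int value = 42;            // 0 after preparation, 42 after <clinit>()
        static { LOG.append("init;"); }   // static blocks are also part of <clinit>()
    }

    /** Lists the loaders from the given class up to the bootstrap loader. */
    static List<String> loaderChain(Class<?> type) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = type.getClassLoader(); cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("bootstrap (null)");    // the bootstrap loader is represented by null
        return chain;
    }

    /** Records what happens around the first read of Lazy.value. */
    static String initOrder() {
        LOG.setLength(0);
        LOG.append("before;");
        int v = Lazy.value;               // first active use triggers initialization
        LOG.append("after;value=").append(v);
        return LOG.toString();
    }

    public static void main(String[] args) {
        System.out.println(loaderChain(String.class));
        System.out.println(loaderChain(LoadingDemo.class));
        System.out.println(initOrder());
    }
}
```

On the first call, the "init;" entry appears between "before;" and "after;"; later calls skip it because a class is initialized only once.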

Modern hardware memory architecture

1. A modern computer with two or more CPUs can run multiple threads at the same time. If your Java program is multithreaded, one thread per CPU may be executing simultaneously.

2. The CPU accesses its registers somewhat faster than its cache layers, and much faster than main memory.

3. At the hardware level, the stack of the Java memory model is spread across the CPU registers, the CPU cache layers, and main memory, with most of it residing in main memory.

Java memory model partitioning

In general, we divide Java memory into the following areas, as shown in the figure below:

Notes on GC:

1. Young objects are stored in the young generation, and a Minor GC reclaims memory from the young generation space (the Eden and Survivor regions). Long-lived objects are promoted to the old generation and large objects are allocated there directly; the old generation is reclaimed by a Full GC (treated here as the same thing as a Major GC), which is slow. The JVM tracks the age of each object in order to move it from Eden to the Survivor regions and eventually to the old generation.

2. The young generation consists of one Eden region and two Survivor regions, from and to, in a ratio of 8:1:1, and is responsible for reclaiming young objects. Eden holds a large number of newly created objects and is reclaimed frequently, so it is the largest region. The Survivor regions hold the objects that survive each garbage collection.

3. The member variables of an object are stored in the heap along with the object itself.

4. Estimating the size of an object: a reference takes 4 bytes, an empty Object itself takes 8 bytes, and each field takes the size of its data type (e.g. a char takes 2 bytes). However, since the system allocates memory in units of 8 bytes, every object occupies a multiple of 8 bytes. For example, an empty Object together with its reference takes 4 + 8 = 12 bytes, which is rounded up to 16 bytes. The exact figures depend on the JVM and its settings.
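The 8:1:1 split and the 8-byte rounding from the notes above are simple arithmetic, sketched below. This follows the article’s rule of thumb only; it is not how a real JVM reports object sizes, and the class and method names are invented for the example.

```java
class SizingSketch {
    /** Rounds a byte count up to the next multiple of 8. */
    static long alignTo8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** Rule of thumb from the text: 4-byte reference + 8-byte empty object + field bytes. */
    static long estimatedSize(long fieldBytes) {
        return alignTo8(4 + 8 + fieldBytes);
    }

    /** Eden size when the young generation is split Eden:Survivor:Survivor = ratio:1:1. */
    static long edenBytes(long youngBytes, int ratio) {
        return youngBytes * ratio / (ratio + 2);
    }

    /** Size of each of the two Survivor regions under the same split. */
    static long survivorBytes(long youngBytes, int ratio) {
        return youngBytes / (ratio + 2);
    }

    public static void main(String[] args) {
        long young = 100L * 1024 * 1024; // assume a 100 MiB young generation
        System.out.println("empty object:  " + estimatedSize(0) + " bytes");
        System.out.println("with one long: " + estimatedSize(8) + " bytes");
        System.out.println("Eden:          " + edenBytes(young, 8) + " bytes");
        System.out.println("each Survivor: " + survivorBytes(young, 8) + " bytes");
    }
}
```

With a 100 MiB young generation and a ratio of 8, Eden gets 80 MiB and each Survivor region gets 10 MiB.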

The memory allocation and reclamation described below mainly refers to the release and reclamation of heap memory occupied by objects.

Java object creation and initialization

Once a Java object is created, it has its own area of heap memory, and the next step is to initialize it. Objects are typically initialized by a constructor, a special method that has the same name as the class and no return type. If a class defines no constructor, a default constructor that takes no arguments is generated automatically. But once any constructor has been defined (with or without arguments), the compiler no longer creates the default constructor. We can overload the constructor several times (with argument lists that differ in number, type, or order), and we can call another constructor from within a constructor, but only once, and the call must be the first statement or the compiler will report an error.
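Constructor overloading and chaining can be sketched as follows. LabeledPoint is a made-up class for illustration; each shorter constructor forwards to a longer one with this(...), which has to be the first statement.

```java
class LabeledPoint {
    final int x;
    final int y;
    final String label;

    LabeledPoint() {
        this(0, 0);                 // must be the first statement in the constructor
    }

    LabeledPoint(int x, int y) {
        this(x, y, "unnamed");      // only one such call is allowed per constructor
    }

    LabeledPoint(int x, int y, String label) {
        this.x = x;
        this.y = y;
        this.label = label;
    }

    @Override
    public String toString() {
        return label + "(" + x + "," + y + ")";
    }

    public static void main(String[] args) {
        System.out.println(new LabeledPoint());
        System.out.println(new LabeledPoint(1, 2, "p"));
    }
}
```

Because LabeledPoint declares its own constructors, the compiler does not generate a default one; the no-argument constructor here exists only because it is written out.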

What about class member initialization, and in what order does it happen? All variables in Java must be properly initialized before use. A local variable of a method that is read without being initialized causes a compilation error. A member variable of a class, however, is given an initial value even if you do not initialize it yourself: for example, char and int fields are both initialized to 0, and object references default to null.
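The default values of fields can be checked directly. A small sketch, with DefaultsDemo as an invented class name:

```java
class DefaultsDemo {
    // Fields are deliberately left without initializers.
    char c;
    int i;
    boolean b;
    double d;
    String s;

    /** Returns the default values the JVM assigned to the fields of a fresh instance. */
    static String describe() {
        DefaultsDemo x = new DefaultsDemo();
        // The char is cast to int because its default, '\u0000', is not printable.
        return "c=" + (int) x.c + " i=" + x.i + " b=" + x.b + " d=" + x.d + " s=" + x.s;
    }

    public static void main(String[] args) {
        // int local;
        // System.out.println(local);   // would not compile: local is not initialized
        System.out.println(describe());
    }
}
```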

Summary of class member initialization order: static members first, then ordinary members, then constructors; parent class first, then subclass; within the same level, in the order written.

1. Superclass static variables and static code blocks run first, followed by subclass static variables and static code blocks.
2. Parent class ordinary variables and code blocks run next, followed by the parent class constructor.
3. Subclass ordinary variables and code blocks run next, followed by the subclass constructor.
4. Static initialization comes before ordinary (instance) initialization, and it occurs only once, when the class is first needed.

Note: a subclass constructor, whether or not it takes arguments, by default first looks for the parameterless constructor of the parent class. If the parent class has no parameterless constructor, the subclass must explicitly call one of the parent’s constructors that takes arguments using the super keyword, or compilation will fail.
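The ordering rules above can be verified with a small experiment. In this sketch, Parent and Child are hypothetical classes whose static blocks, instance blocks, and constructors each append a line to a shared log.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("parent static"); }
        { LOG.add("parent instance"); }
        Parent() { LOG.add("parent constructor"); }
    }

    static class Child extends Parent {
        static { LOG.add("child static"); }
        { LOG.add("child instance"); }
        Child() { LOG.add("child constructor"); }   // implicitly calls super() first
    }

    /** Creates two Child objects and returns everything logged so far. */
    static List<String> run() {
        new Child();   // triggers static initialization of Parent, then Child
        new Child();   // static blocks do not run again
        return LOG;
    }

    public static void main(String[] args) {
        for (String line : run()) {
            System.out.println(line);
        }
    }
}
```

The first object produces all six steps in the order listed above; the second object produces only the four instance-level steps.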

Java memory reclamation

Garbage collectors (4 types) and the finalize() method

The garbage collector in Java automatically reclaims the memory occupied by objects that are no longer in use, but it is only responsible for releasing memory occupied by objects created in Java. Memory allocated for an object by some means other than object creation cannot be reclaimed by the garbage collector. Garbage collection also has its own overhead and the GC runs at low priority, so as long as the JVM is not running short of memory it will not spend resources on collecting garbage. As a result, if the program never comes close to running out of storage, the space occupied by its objects may never be freed. We can call System.gc() to request a garbage collection (although the JVM will not necessarily collect right away), and finalize() can be used to release space that was allocated by means other than new before the object itself is reclaimed.

1. Serial collector: a single-threaded generational collector that must suspend all other worker threads until it finishes collecting garbage. It is simple and efficient.

2. Parallel collector: the default collector in JDK 8 and earlier; its greatest advantage is that it uses multiple threads to scan and compact the heap. Like the serial collector, it stops all other worker threads (stop-the-world) during GC. It makes the fullest use of the CPU, so it suits applications that need high throughput. However, its pauses on a large heap are relatively long, which means users have to wait longer, so it is less suitable for web applications. The parallel collector can be understood as a multi-threaded version of serial collection: by using multiple GC threads it makes up for the shortcomings of serial collection and greatly shortens the pause time. For small regions such as the young generation, its pauses are therefore very short and collection is efficient, which suits frequent execution.

3. CMS collector: based on the “mark-sweep” algorithm, it uses multiple threads to scan the old generation heap (mark) and reclaim the objects to be collected (sweep). It tends to produce a lot of memory fragmentation, so that large objects cannot be allocated and a Full GC has to be triggered early. It is also CPU-intensive, and floating garbage is easily produced after marking, which can only be reclaimed in the next GC.

4. G1 collector: overall, G1 is based on the mark-compact algorithm, so it does not produce memory fragmentation. G1 is a server-side garbage collector for multiprocessor machines with large memory. Its goal is to achieve high throughput while meeting garbage collection pause-time targets as far as possible. It can control pauses quite precisely, allowing the user to specify that no more than N milliseconds should be spent on garbage collection within any time slice of M milliseconds, which gives it some characteristics of the Real-Time Specification for Java (RTSJ) garbage collector.

The finalize() method works like this: once the garbage collector is ready to release the storage occupied by an object, it first calls the object’s finalize() method, and does so only once; the memory occupied by the object is actually reclaimed only when the next garbage collection occurs (a collection can be requested with System.gc()). So if we override finalize(), we can do some important cleanup at garbage collection time, or rescue the object once (by re-attaching it to any object on a reference chain inside finalize()). The finalize() method is mainly used to free memory allocated in a special way, because Java code may call non-Java code to allocate memory, as with the NDK in Android development. When storage is allocated with malloc() in C, it can only be released with free(), so free() needs to be called from a native method inside finalize().
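Since finalize() has been deprecated since JDK 9, the sketch below shows the same idea with java.lang.ref.Cleaner instead (this swaps in a different mechanism from the one the text describes, and needs JDK 9 or later). NativeBuffer is a hypothetical wrapper; a counter stands in for real malloc()/free() calls so the example runs without native code.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicLong;

class NativeBufferDemo {
    public static void main(String[] args) {
        try (NativeBuffer buffer = new NativeBuffer(1024)) {
            System.out.println("live bytes while open: " + NativeBuffer.LIVE_BYTES.get());
        }
        System.out.println("live bytes after close: " + NativeBuffer.LIVE_BYTES.get());
    }
}

/** Hypothetical wrapper around memory obtained outside the Java heap. */
class NativeBuffer implements AutoCloseable {
    /** Stands in for the amount of native memory currently allocated. */
    static final AtomicLong LIVE_BYTES = new AtomicLong();
    private static final Cleaner CLEANER = Cleaner.create();

    /** Cleanup action; it must not reference the NativeBuffer, or the buffer could never become unreachable. */
    private static final class Release implements Runnable {
        private final long size;

        Release(long size) {
            this.size = size;
        }

        @Override
        public void run() {
            LIVE_BYTES.addAndGet(-size);   // a real implementation would call a native free() here
        }
    }

    private final Cleaner.Cleanable cleanable;

    NativeBuffer(long size) {
        LIVE_BYTES.addAndGet(size);        // stands in for a native malloc()
        // Safety net: the action also runs if the buffer is garbage collected without close().
        this.cleanable = CLEANER.register(this, new Release(size));
    }

    @Override
    public void close() {
        cleanable.clean();                 // runs the action at most once
    }
}
```

Calling close() explicitly remains the primary path; the Cleaner registration only covers callers that forget to close.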

Object memory states, reference types, and collection timing

Java object memory state transition diagram

How do I determine if a Java object needs to be reclaimed? GC determination method

1. Reference counting: this method records how many references other objects hold to each object. The count is incremented each time the object is referenced and decremented each time a reference becomes invalid. A count of 0 means the object is no longer usable. When an object is reclaimed, the counts of the objects it references must be decremented accordingly. This method has difficulty handling cyclic references between objects.

2. Reachability analysis: the path searched downward from a GC Root is called a reference chain. When no reference chain connects an object to any GC Root, the object is no longer usable. GC Roots include objects referenced by constants and static variables in the method area, objects referenced by variables on the virtual machine stack, and objects referenced from the native method stack. Cyclic references are not a problem here, because GC Roots are usually a specially managed set of pointers that serve as the starting points of a tracing GC. They are not objects in the object graph, and objects cannot refer to these “external” pointers.

3. A system that uses reference counting only has to maintain a counter of references for each object from the moment the instance is created. Reachability analysis, on the other hand, has to traverse the object graph from all GC Roots to decide what to reclaim.
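The difference between the two methods shows up on a cycle. The sketch below models a tiny heap as a map from object name to the names it references (a toy model, not how a JVM represents objects) and compares incoming reference counts with reachability from the roots. All names are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachabilitySketch {
    /** Returns every object reachable from the roots by following references breadth-first. */
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> marked = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(roots);
        while (!queue.isEmpty()) {
            String obj = queue.poll();
            if (marked.add(obj)) {   // visit each object once, so cycles terminate
                queue.addAll(refs.getOrDefault(obj, Collections.<String>emptyList()));
            }
        }
        return marked;
    }

    /** Counts incoming references per object, including those held by roots. */
    static Map<String, Integer> referenceCounts(Map<String, List<String>> refs, Collection<String> roots) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String obj : refs.keySet()) counts.put(obj, 0);
        for (String root : roots) counts.merge(root, 1, Integer::sum);
        for (List<String> targets : refs.values()) {
            for (String target : targets) counts.merge(target, 1, Integer::sum);
        }
        return counts;
    }

    /** Toy heap: a -> b is held by a root; x and y only reference each other. */
    static Map<String, List<String>> sampleHeap() {
        Map<String, List<String>> refs = new LinkedHashMap<>();
        refs.put("a", Arrays.asList("b"));
        refs.put("b", Collections.<String>emptyList());
        refs.put("x", Arrays.asList("y"));
        refs.put("y", Arrays.asList("x"));
        return refs;
    }

    public static void main(String[] args) {
        List<String> roots = Arrays.asList("a");
        System.out.println("counts:    " + referenceCounts(sampleHeap(), roots));
        System.out.println("reachable: " + reachable(sampleHeap(), roots));
    }
}
```

Here x and y each keep a count of 1 and would never be freed by reference counting, while reachability analysis leaves both unmarked.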

1. Strong reference: create an object and assign it directly to a variable, e.g. Person person = new Person("sunny"); No matter how short of resources the system is, a strongly referenced object is never reclaimed, even if it will never be used again.

2. Soft reference: created through the SoftReference class, e.g. SoftReference<Person> p = new SoftReference<>(new Person("Rain")); The object is reclaimed only when memory is very tight and not otherwise, so check the result of get() for null before using it to see whether it has been collected.

3. Weak reference: created through the WeakReference class, e.g. WeakReference<Person> p = new WeakReference<>(new Person("Rain")); The object is reclaimed at the next garbage collection whether or not memory is tight.

4. Phantom reference: cannot be used alone. It is mainly used to track the status of an object being garbage collected. The only purpose of associating a phantom reference with an object is to receive a system notification when the object is garbage collected. It is implemented through the PhantomReference class together with the ReferenceQueue class.
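The four reference types can be tried out with the classes in java.lang.ref. This is a minimal sketch with ReferenceKindsDemo as an invented name; what happens after System.gc() depends on the JVM, so that part only prints what it observes.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKindsDemo {
    /** Checks what each reference type returns while a strong reference still exists. */
    static String whileStronglyHeld() {
        Object strong = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        boolean softOk = soft.get() == strong;        // referent still available
        boolean weakOk = weak.get() == strong;        // referent still available
        boolean phantomNull = phantom.get() == null;  // a phantom reference never hands out its referent
        return softOk + " " + weakOk + " " + phantomNull;
    }

    public static void main(String[] args) {
        System.out.println(whileStronglyHeld());

        Object target = new Object();
        WeakReference<Object> weak = new WeakReference<>(target);
        target = null;   // drop the only strong reference
        System.gc();     // a request only; the JVM may or may not collect now
        System.out.println("weak referent after gc request: " + weak.get());
    }
}
```

The last line usually prints null on common JVMs, but that is not guaranteed, which is why code using soft or weak references has to check get() for null.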

Reference graph of common garbage collection algorithms

(https://yq.aliyun.com/articles/14411)

Stop-and-copy algorithm: this is a non-background collection algorithm. The available memory is divided into two halves of equal capacity and only one half is used at a time, which wastes a lot of memory. The collector first suspends the program and then copies all live objects from the current half to the other one; the dead objects that are not copied are all garbage. The live objects arrive in the new half packed tightly together, so new space can be allocated right after them. This approach costs space and becomes inefficient when many objects survive, so it suits cases where few objects are alive.

Mark-sweep algorithm: also a non-background collection algorithm. Starting from the stack and static storage, it traverses every reference to find all live objects and marks each one it finds. When marking is finished, the sweep begins and all unmarked objects are released. If contiguous heap space is required, the surviving objects then have to be compacted; otherwise a large amount of memory fragmentation results.

Mark-compact algorithm: it marks objects in the same way as mark-sweep, but instead of freeing the reclaimable objects in place, it moves all live objects toward one end of the memory region and then clears the memory beyond that boundary. It suits cases where many objects survive.
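The fragmentation difference between sweeping and compacting can be shown on a toy heap. In the sketch below the heap is an array of slots and the mark phase is assumed to have already produced the live flags; this is an illustration of the idea, not of any real collector, and all names are invented.

```java
import java.util.Arrays;

class CompactionSketch {
    /** Sweep: dead slots are freed in place (null), so holes stay between live objects. */
    static String[] sweep(String[] heap, boolean[] live) {
        String[] out = new String[heap.length];
        for (int i = 0; i < heap.length; i++) {
            if (live[i]) out[i] = heap[i];
        }
        return out;
    }

    /** Compact: live objects slide to the low end; everything after them is free. */
    static String[] compact(String[] heap, boolean[] live) {
        String[] out = new String[heap.length];
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (live[i]) out[next++] = heap[i];
        }
        return out;
    }

    /** Longest run of free slots, i.e. the largest allocation that still fits. */
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int current = 0;
        for (String slot : heap) {
            current = (slot == null) ? current + 1 : 0;
            best = Math.max(best, current);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        boolean[] live = {true, false, true, false, true, false};
        String[] swept = sweep(heap, live);
        String[] compacted = compact(heap, live);
        System.out.println(Arrays.toString(swept) + " largest free run: " + largestFreeRun(swept));
        System.out.println(Arrays.toString(compacted) + " largest free run: " + largestFreeRun(compacted));
    }
}
```

Both results have three free slots, but only the compacted heap can satisfy an allocation that needs three contiguous slots.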

Generational algorithm: in the young generation, each garbage collection finds that a large number of objects have died and only a few survive, so the stop-and-copy algorithm is chosen to do the collection. In the old generation, objects have a high survival rate and there is no extra space to guarantee their allocation, so the mark-sweep or mark-compact algorithm must be used.

JVM performance tuning

1. Giving the JVM a large heap (provided the physical machine has enough memory) can improve server response times, but only if the application’s Full GC frequency is known to be low enough, because a Full GC pauses for a long time. The key to keeping Full GC frequency down is to make sure most objects in the application do not live too long, and in particular that large, long-lived objects are not produced in batches, so that the old generation stays stable.

2. If NIO is used to allocate a large amount of direct memory, an OutOfMemoryError may occur in direct memory. The direct memory size can be adjusted with the -XX:MaxDirectMemorySize parameter.

3. Besides the heap, take into account the memory occupied by thread stacks, socket buffers, JNI, and the virtual machine and GC themselves, and adjust them where needed.

4. -Xms and -Xmx (or -XX:InitialHeapSize and -XX:MaxHeapSize): -Xms sets the initial (and usually minimum) heap size and -Xmx sets the maximum. The JVM can adjust the heap size dynamically at run time; setting -Xms equal to -Xmx fixes the heap size. For example, “java -Xms128m -Xmx2g MyApp” starts a Java application named “MyApp” with an initial heap of 128 MB and a maximum heap of 2 GB. An inappropriate -Xmx easily leads to an OutOfMemoryError in the heap; by setting -XX:+HeapDumpOnOutOfMemoryError we can have the JVM write a heap snapshot automatically when the error occurs. By default it is stored in a file named java_pid<pid>.hprof in the directory the JVM was started from, and analyzing it is a good way to locate the overflow.
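To check what heap limits a running JVM actually ended up with, the standard Runtime API can be queried from inside the application. A minimal sketch; HeapSettingsDemo is an invented name, and maxMemory() only roughly corresponds to the -Xmx value.

```java
class HeapSettingsDemo {
    /** Converts bytes to whole mebibytes. */
    static long mib(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Roughly the -Xmx limit (the JVM may report slightly less).
        System.out.println("max heap:          " + mib(rt.maxMemory()) + " MiB");
        // Heap currently reserved by the JVM; starts near -Xms and may grow toward the maximum.
        System.out.println("committed heap:    " + mib(rt.totalMemory()) + " MiB");
        // Unused part of the committed heap.
        System.out.println("free in committed: " + mib(rt.freeMemory()) + " MiB");
    }
}
```

Running it as "java -Xms128m -Xmx2g HeapSettingsDemo.java" on JDK 11 or later is one way to see how the flags change the reported values.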

Commands for viewing JVM performance information on Linux

jstat: used to view JVM heap information. It shows the capacity and utilization of Eden, Survivor, old, perm, and other heap areas, and is useful for checking whether the system has a memory leak and whether the parameters are set properly. For example, “jstat -gc 12538 5000” displays the GC status of process 12538 every 5 seconds.

jstack: used to view the JVM’s current thread dump, which shows the status of the threads in the JVM. This is useful for finding blocked threads.

jmap: used to view the JVM’s current heap dump, showing the number of objects of each kind in the JVM, the amount of space they occupy, and so on. In particular, this command can export a binary heap dump file that can be analyzed directly with Eclipse Memory Analyzer to find potential memory leaks.

netstat (not a JVM command): used to view the current connection status of the Linux system on its various ports, such as database connections.
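Some of the same information is also available from inside the process through the standard java.lang.management API. The sketch below prints memory pool usage, GC counts, and the live thread count; it is a rough in-process counterpart to the tools above, not a replacement for them, and JvmStatsDemo is an invented name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class JvmStatsDemo {
    /** Number of live threads in this JVM, including daemon threads. */
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        // Per-pool usage (Eden, Survivor, old generation, ...); pool names depend on the collector.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();   // may be null if the pool is no longer valid
            long used = (usage == null) ? -1 : usage.getUsed();
            System.out.println(pool.getName() + " [" + pool.getType() + "] used=" + used);
        }
        // Collection counts and accumulated collection time per collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + " count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        System.out.println("live threads: " + liveThreads());
    }
}
```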

Memory related problems

A memory leak is when allocated memory is not reclaimed and is wasted because the program has lost control of that memory region (e.g., it has lost its address). Memory leaks generally don’t happen in Java because the garbage collector reclaims garbage automatically, but that’s not always the case. Memory leaks can still happen in the Java heap: when we create an object with new and keep a reference to it but never use it again, the garbage collector cannot collect it, and the memory leaks.

Memory overflow is when a program needs more memory than the system can provide (including through dynamic expansion).
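A typical form of this kind of leak is a long-lived collection that only ever grows. The sketch below contrasts such a collection with a size-bounded map built on LinkedHashMap.removeEldestEntry; LeakDemo, HISTORY, and boundedCache are names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LeakDemo {
    /** Leaky pattern: a static list keeps every entry strongly reachable for the life of the class. */
    static final List<byte[]> HISTORY = new ArrayList<>();

    /** Bounded alternative: evicts the eldest entry once the cap is exceeded. */
    static <K, V> Map<K, V> boundedCache(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            HISTORY.add(new byte[1024]);   // never removed, so the GC can never reclaim it
        }
        Map<Integer, byte[]> cache = boundedCache(100);
        for (int i = 0; i < 1000; i++) {
            cache.put(i, new byte[1024]);  // older entries become unreachable and collectable
        }
        System.out.println("history entries: " + HISTORY.size());
        System.out.println("cache entries:   " + cache.size());
    }
}
```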

Symbolic reference: A symbolic reference describes the referenced object as a set of symbols, which can be any literal, as long as it is used to unambiguously locate the object. Symbolic references are independent of the memory layout implemented by the virtual machine, and the target of the reference is not necessarily already loaded into memory.

Direct reference: A direct reference can be a pointer to the target, a relative offset, or a handle through which the target can be located indirectly. A direct reference depends on the memory layout implemented by the virtual machine, so the direct references that the same symbolic reference is translated into on different virtual machine instances are generally not the same. If a direct reference exists, the target of the reference must already be present in memory.

Parent delegation model: describes the top-down loading hierarchy between class loaders. The parent-child relationship between loaders is generally implemented through composition rather than inheritance. The model prevents multiple copies of the same bytecode from appearing in memory and guarantees a consistent loading order.

The parent delegation model works as follows. In the loadClass method, the first step is to check whether the class has already been loaded. If it has, the next step is resolution; otherwise loading proceeds through parent delegation. When a class loader receives a class loading request, it does not try to load the class itself first but delegates the request to its parent class loader. This is true at every level, so all load requests are eventually passed up to the bootstrap class loader at the top. Only when the parent class loader reports that it cannot complete the request (it cannot find the class within its search scope) does the child loader try to load the class itself.
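The delegation order can be observed with a small custom loader. In this sketch, RecordingLoader is a hypothetical ClassLoader subclass that keeps the inherited loadClass logic and only records which class names fall through to its own findClass().

```java
import java.util.ArrayList;
import java.util.List;

class DelegationDemo {
    /** Hypothetical loader: it cannot load anything itself, it only records the attempts. */
    static class RecordingLoader extends ClassLoader {
        final List<String> findCalls = new ArrayList<>();

        RecordingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            findCalls.add(name);   // reached only after every parent has failed
            throw new ClassNotFoundException(name);
        }
    }

    /** Returns the class names that the parents could not load. */
    static List<String> run() {
        RecordingLoader loader = new RecordingLoader(DelegationDemo.class.getClassLoader());
        try {
            // Satisfied by the bootstrap loader at the top of the chain; findClass is never called.
            Class<?> string = loader.loadClass("java.lang.String");
            if (string != String.class) {
                throw new IllegalStateException("unexpected class instance");
            }
            // No parent can find this name, so the request comes back down to findClass.
            loader.loadClass("no.such.Type");
        } catch (ClassNotFoundException expected) {
            // thrown by RecordingLoader.findClass for no.such.Type
        }
        return loader.findCalls;
    }

    public static void main(String[] args) {
        System.out.println("fell through to the child loader: " + run());
    }
}
```

Because java.lang.String is always resolved by a parent, a custom loader written this way cannot replace core classes with its own copies.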

Static dispatch and dynamic dispatch: static dispatch happens at compile time and selects the version of a method to execute based on static types (the types with which variables are declared). Method overloading is an example: the overload is chosen from the declared types of the arguments. Dynamic dispatch happens at run time and selects the version of a method based on the actual type of the object a variable refers to; method overriding is an example. As of JDK 1.6, the Java language is a static multi-dispatch, dynamic single-dispatch language.

The Java virtual machine implements dynamic dispatch by creating a virtual method table in the method area and improves performance by using an index into that table instead of a metadata lookup. The virtual method table stores the actual entry address of each method. If a subclass does not override a method of the parent class, the entry in the subclass’s virtual method table has the same address as the parent’s. If the subclass overrides the method, the address in the subclass’s table is replaced by the address of the subclass’s own implementation. The method table is initialized during the linking phase (verification, preparation, resolution) of class loading: after the initial values of the class’s variables have been prepared, the virtual machine initializes the virtual method table of that class.
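Both kinds of dispatch can be seen in a few lines. In this sketch, Animal and Dog are made-up classes: the describe overloads are chosen by the compiler from the declared type, while sound() is chosen at run time from the actual object.

```java
class DispatchDemo {
    static class Animal {
        String sound() { return "..."; }
    }

    static class Dog extends Animal {
        @Override
        String sound() { return "woof"; }
    }

    // Overloads: selected at compile time from the static type of the argument.
    static String describe(Animal a) { return "animal"; }
    static String describe(Dog d) { return "dog"; }

    static String run() {
        Animal pet = new Dog();                  // static type Animal, actual type Dog
        return describe(pet)                     // static dispatch picks describe(Animal)
                + "/" + pet.sound()              // dynamic dispatch picks Dog.sound()
                + "/" + describe((Dog) pet);     // the cast changes the static type
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```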

In JDK 7, the string constant pool was moved from the permanent generation to the heap, and the intern() method is used to make sure the same string is not created in the heap repeatedly. The G1 collector became available in JDK 7 as an alternative to the CMS collector. JDK 8 replaces the permanent generation (the original implementation of the method area) with Metaspace and, with G1, provides string deduplication, meaning that the G1 collector can identify duplicate strings in the heap and point them to the same internal char[] array rather than keeping multiple copies in the heap.
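The effect of intern() can be checked by comparing references. A small sketch, with InternDemo as an invented name:

```java
class InternDemo {
    /** Compares a literal, a freshly constructed copy, and the interned copy by reference. */
    static String run() {
        String literal = "gc";               // taken from the string constant pool
        String built = new String("gc");     // a distinct object on the heap
        String interned = built.intern();    // the canonical pooled instance
        return (literal == built) + " " + (literal == interned) + " " + literal.equals(built);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The three values are reference equality with the heap copy, reference equality with the interned copy, and content equality.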

∑ edit | Gemini

Source | candyguy242 blog

