The Java virtual machine
1. Introduction to the JVM
The JVM is an abstract computer that runs Java code. It includes a bytecode instruction set, a set of registers, a stack, a garbage collector, a heap, and a method area. The JVM runs on top of the operating system and does not interact with the hardware directly.
Java source files are compiled by the compiler into .class files, known as bytecode files. The interpreter in the Java virtual machine then translates the bytecode into machine code for the particular machine. That is:
Java source file –> compiler –> bytecode file
Bytecode files –> JVM –> machine code
The interpreter differs from platform to platform, but every platform implements the same virtual machine specification, which is why Java is cross-platform. Starting a program creates a virtual machine instance, so running several programs creates several instances. When a program exits or is stopped, its VM instance disappears. Data cannot be shared between VM instances.
2. JVM memory area
- Division by function
- Partition based on whether the memory is shared
The JVM memory area is divided into thread-private areas (program counter, virtual machine stack, native method stack), thread-shared areas (Java heap, method area), and direct memory.
Thread-private data areas have the same life cycle as their thread: they are created when the user thread starts and destroyed when it ends. (In the HotSpot VM, each Java thread maps directly to a native operating system thread, so these areas live and die with the native thread.)
Thread-shared areas are created when the virtual machine starts and destroyed when it shuts down.
Direct memory is not part of the JVM runtime data area, but it is used frequently. NIO (the Java I/O extension introduced in JDK 1.4) provides Channel- and Buffer-based I/O. It can use native libraries to allocate off-heap memory directly and then use a DirectByteBuffer object as a reference to that memory. This avoids copying data back and forth between the Java heap and the native heap, and so can significantly improve performance in some scenarios.
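As a small illustration of the difference between heap and direct buffers, the sketch below allocates one of each through the standard java.nio.ByteBuffer API. The class name DirectBufferDemo and its helper methods are made up for this example.

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    // Allocates either a heap-backed buffer or a direct (off-heap) buffer.
    static ByteBuffer allocate(int capacity, boolean direct) {
        return direct ? ByteBuffer.allocateDirect(capacity)
                      : ByteBuffer.allocate(capacity);
    }

    // Writes an int into the buffer and reads it back.
    static int roundTrip(ByteBuffer buffer, int value) {
        buffer.clear();
        buffer.putInt(value);
        buffer.flip();
        return buffer.getInt();
    }

    public static void main(String[] args) {
        ByteBuffer heap = allocate(1024, false);
        ByteBuffer direct = allocate(1024, true);
        System.out.println("heap.isDirect()   = " + heap.isDirect());
        System.out.println("direct.isDirect() = " + direct.isDirect());
        System.out.println("round trip        = " + roundTrip(direct, 42));
    }
}
```

Both buffers are used through the same API; only the location of the backing memory differs.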
2.1 Program counter (thread private)
A small area of memory that acts as a line number indicator for the bytecode being executed by the current thread. Each thread has its own program counter, which is why this kind of memory is called thread-private. While a Java method is executing, the counter holds the address of the current virtual machine bytecode instruction. While a native method is executing, its value is undefined. This is the only memory region for which the virtual machine specification does not define any OutOfMemoryError condition.
2.2 Virtual machine stack (thread private)
The virtual machine stack is the memory model for the execution of Java methods. Each method call creates a stack frame that stores the local variable table, the operand stack, dynamic linking information, the method exit, and so on. The span from a method’s invocation to its completion corresponds to its stack frame being pushed onto and popped off the virtual machine stack.
Stack frames are data structures that store data and partial results. They are also used for dynamic linking, method return values, and exception dispatch. A frame is created when a method is called and destroyed when the method completes, whether it completes normally or abruptly (by throwing an exception that is not caught within the method).
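To see that every call really pushes a frame, the following sketch recurses until the thread’s stack is exhausted and reports how many frames were pushed. StackDepthDemo is a name invented for this example. The depth reached depends on the JVM and on the thread stack size (for example the -Xss option), so no particular number should be expected.

```java
public class StackDepthDemo {
    private static int depth = 0;

    // Each call pushes one more frame onto the current thread's stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Recurses until the stack is exhausted and returns the depth reached.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames of recurse() were popped while the error propagated here.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames pushed before StackOverflowError: " + measureDepth());
    }
}
```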
Stack frame structure diagram:
2.3. Native method stack (thread private)
The native method stack is similar to the virtual machine stack, except that the virtual machine stack serves the execution of Java methods, while the native method stack serves the execution of native methods. If a VM implementation uses the C-linkage model to support native calls, the native method stack is a C stack. The HotSpot VM simply combines the native method stack with the virtual machine stack.
2.4. Heap (thread shared)
The heap is a runtime data area shared by all threads. Objects and arrays are created and stored in Java heap memory, which makes it the most important area for the garbage collector.
Because modern VMs use generational collection algorithms, the Java heap can be further subdivided from a GC perspective into the new generation (Eden, SurvivorFrom, and SurvivorTo) and the old generation.
- The new generation
It stores newly created objects and takes up about one third of the heap. Because objects are created frequently, the new generation triggers MinorGC frequently. It is divided into the Eden area, the SurvivorFrom area, and the SurvivorTo area.
- **Eden:** The birthplace of new Java objects (an object that needs a lot of memory is allocated directly in the old generation). When Eden runs out of space, a MinorGC (light GC) is triggered to collect garbage in the new generation.
- **SurvivorFrom (survivor0):** Holds the survivors of the previous GC, which are scanned by the current GC.
- **SurvivorTo:** Holds the survivors of the current MinorGC.
- MinorGC process (copy -> empty -> swap)
- **Copy Eden and SurvivorFrom to SurvivorTo, age +1:** First, copy the live objects in Eden and SurvivorFrom to the SurvivorTo area and increase the age of each by 1. Objects that have reached the tenuring age are moved to the old generation instead, and so are objects that do not fit when SurvivorTo runs out of space.
- **Empty Eden and SurvivorFrom:** Then clear the objects in Eden and SurvivorFrom.
- **Swap SurvivorTo and SurvivorFrom:** Finally, swap the roles of SurvivorTo and SurvivorFrom, so that the current SurvivorTo becomes the SurvivorFrom area of the next GC.
- The old generation
It mainly stores the long-lived objects of the application.
Objects in the old generation are relatively stable, so MajorGC (heavy GC) does not run often. A MajorGC is usually preceded by a MinorGC that promotes objects into the old generation; the MajorGC is triggered when the old generation then runs short of space. It is also triggered early when no contiguous space large enough can be found for a newly created large object.
MajorGC uses the mark-sweep algorithm: it first scans the whole old generation and marks the live objects, then reclaims the unmarked ones. MajorGC takes a long time because it both scans and reclaims. It also produces memory fragmentation; to reduce the waste, the free space usually has to be merged, or recorded so that it can be used directly by later allocations. When the old generation is too full to hold any more objects, an OOM (Out of Memory) error is thrown.
- Test heap memory
```java
public class TestJVM {
    public static void main(String[] args) {
        // Returns the maximum amount of memory the virtual machine will try to use
        long max = Runtime.getRuntime().maxMemory();
        // Returns the total amount of memory currently held by the virtual machine
        long total = Runtime.getRuntime().totalMemory();
        System.out.println("max " + max + " bytes\t" + (max / (double) 1024 / 1024) + "MB");
        System.out.println("total " + total + " bytes\t" + (total / (double) 1024 / 1024) + "MB");
        // VM options to add: -Xms1024m -Xmx1024m -XX:+PrintGCDetails
    }
}
```

```
max 1873805312 bytes	1787.0MB
total 126877696 bytes	121.0MB
```
Add the VM options -Xms1024m -Xmx1024m -XX:+PrintGCDetails, then execute the program again:
2.5. Method area / permanent generation (thread shared)
The method area is simply a concept defined in the JVM specification. It stores data about the classes loaded by the JVM, constants, static variables, code compiled by the just-in-time compiler, and so on. Where exactly it lives is up to each implementation. The HotSpot VM extended generational GC to the method area, implementing it as a permanent generation of the Java heap, so that HotSpot’s garbage collector can manage this memory the same way it manages the Java heap and no dedicated memory manager has to be written for the method area. (Memory reclamation in the permanent generation mainly targets constant pool collection and type unloading, so the gain is generally small.)
- Constant pool
The runtime constant pool is part of the method area. The constant pool table in a Class file stores the literals and symbolic references generated at compile time; after the class is loaded, this content is placed in the runtime constant pool of the method area. The Java virtual machine has strict rules on the format of every part of a Class file (including the constant pool): each byte must hold the kind of data the specification requires before the file is accepted, loaded, and executed by the virtual machine.
Constant pools avoid the performance cost of frequently creating and destroying objects by sharing them. The string constant pool, for example, holds one shared instance of each string literal that the compiler recorded in the class files. Most wrapper classes for primitive types also use pooling: Byte, Short, Integer, Long, Character, and Boolean. By default, Byte, Short, Integer, and Long cache the values in [-128, 127] and Character caches the values in [0, 127]; values outside these ranges still get a new object each time. Boolean caches its two values. The two floating-point wrapper classes, Float and Double, do not implement pooling.
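The pooling behaviour can be observed with identity comparisons. The sketch below is only an illustration (ConstantPoolDemo and its method names are invented here); real code should compare wrapper values with equals, not ==.

```java
public class ConstantPoolDemo {
    // True when two boxings of the same value yield the same cached object.
    static boolean sameBoxedInstance(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b; // deliberate identity comparison
    }

    // True when a string built at run time is a different object from the
    // literal, but interning it returns the literal's pooled object.
    static boolean internedMatchesLiteral() {
        String literal = "jvm";
        String built = new String(new char[] {'j', 'v', 'm'});
        return built != literal && built.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println("127 cached: " + sameBoxedInstance(127));
        System.out.println("128 cached: " + sameBoxedInstance(128));
        System.out.println("intern hit: " + internedMatchesLiteral());
    }
}
```

The value 127 is always served from the cache. With HotSpot defaults 128 is normally not, but the upper bound of the Integer cache can be raised by the JVM, so that second line should be treated as implementation-dependent.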
- The permanent generation
The permanent generation is a concept specific to the HotSpot virtual machine. It is the area of memory that holds classes and metadata: a class is placed in the permanent generation when it is loaded. As a result, the permanent generation fills up as more classes are loaded, which can eventually cause an OOM error. Other JVMs do not implement the method area this way. Although the Java Virtual Machine specification describes the method area as a logical part of the heap, it has the alias Non-Heap, which serves to distinguish it from the Java heap.
In Java 8, the permanent generation was removed and replaced by an area called Metaspace. Metaspace is similar in nature to the permanent generation. The biggest difference is that Metaspace is not located inside the virtual machine’s memory but uses native memory, so by default its size is limited only by the available native memory. Class metadata goes into native memory, while the string pool and the static variables of classes go into the Java heap. How much class metadata can be loaded is therefore no longer controlled by MaxPermSize but by the memory actually available on the system.
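One way to look at the heap and non-heap areas of a running JVM is the standard java.lang.management API. The sketch below lists the memory pools by type; MemoryPoolDemo is a name chosen for this example, and the pool names printed depend on the JVM version and the garbage collector in use (on a Java 8 or later HotSpot VM a pool named Metaspace typically appears among the NON_HEAP pools).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

public class MemoryPoolDemo {
    // Returns the names of all memory pools of the given type.
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("HEAP pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("NON_HEAP pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```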
3. JVM class loaders
The virtual machine design team placed the loading action outside the JVM implementation so that the application can decide how to obtain the classes it needs. The JVM provides three class loaders:
- **Bootstrap ClassLoader:** Loads the classes in the JAVA_HOME\lib directory, or in the path specified by the -Xbootclasspath parameter, that the virtual machine recognizes (by file name, such as rt.jar).
- **Extension ClassLoader:** Loads the class libraries in the JAVA_HOME\lib\ext directory, or in the path specified by the java.ext.dirs system property.
- **Application ClassLoader:** Loads the class libraries on the user’s classpath.
The JVM loads classes using the parent delegation model. We can also implement a custom class loader by extending java.lang.ClassLoader.
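The loader hierarchy can be inspected at run time by walking getParent() from a class’s loader. This is a minimal sketch; ClassLoaderChainDemo is an invented name, and the concrete loader class names printed differ between Java versions (from Java 9 on, for instance, a platform class loader takes the place of the extension class loader).

```java
import java.util.ArrayList;
import java.util.List;

public class ClassLoaderChainDemo {
    // Walks from the loader of the given class up to the bootstrap loader.
    static List<String> loaderChain(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        // HotSpot represents the bootstrap loader as null in the Java API.
        chain.add("bootstrap");
        return chain;
    }

    public static void main(String[] args) {
        System.out.println("String:    " + loaderChain(String.class));
        System.out.println("this demo: " + loaderChain(ClassLoaderChainDemo.class));
    }
}
```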
4. Parent delegation mechanism
When a class loader receives a class loading request, it does not try to load the class itself first. Instead, it delegates the request to its parent loader. Every loader in the hierarchy does the same, so all load requests eventually reach the top-level bootstrap class loader. A child loader tries to load the class itself only when its parent reports that it cannot complete the request (the class is not found in the parent’s search path).
One advantage of parent delegation: when java.lang.Object in rt.jar is loaded, whichever loader receives the request ends up delegating to the top-level bootstrap class loader, which ensures that different class loaders all end up with the same Object class.
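The delegation described above can be observed with a custom loader that has no classes of its own. In this sketch, EmptyLoader, loadThroughChild, and the class name no.such.Type are all made up for the illustration; the delegation itself is performed by the inherited ClassLoader.loadClass method.

```java
public class DelegationDemo {
    // A loader with no classes of its own: anything it returns came from a parent.
    static class EmptyLoader extends ClassLoader {
        EmptyLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            // loadClass calls this only after the parent chain has failed.
            throw new ClassNotFoundException(name);
        }
    }

    // Loads a class through a fresh EmptyLoader; returns null if no loader finds it.
    static Class<?> loadThroughChild(String className) {
        ClassLoader child = new EmptyLoader(DelegationDemo.class.getClassLoader());
        try {
            return child.loadClass(className);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Prints true: the request was delegated upward, so we get the one Object class.
        System.out.println(loadThroughChild("java.lang.Object") == Object.class);
        // Prints null: every parent failed, then findClass failed too.
        System.out.println(loadThroughChild("no.such.Type"));
    }
}
```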
5. Sandbox security mechanism
The core of the Java security model is the Java sandbox. A sandbox is an environment that restricts what a program can do. The sandbox mechanism confines Java code to a specific scope inside the virtual machine (JVM) and strictly limits its access to local system resources, which keeps code effectively isolated. The system resources in question are the CPU, memory, the file system, and the network. Sandboxes of different levels can restrict access to these resources differently.
Every Java program can be run with a specified sandbox and a customized security policy.
In Java, executable code is divided into local code and remote code. Local code is trusted by default and remote code is untrusted. Trusted local code has access to all local resources. For untrusted remote code, security in early Java implementations relied on the sandbox mechanism. The JDK 1.0 security model is shown below:
Such a strict mechanism also got in the way of extending programs: remote code could not access files on the local system even when the user wanted it to. Java 1.1 therefore improved the mechanism by adding security policies, which let users grant specific code access to local resources. The JDK 1.1 security model is shown below:
Java 1.2 improved the security mechanism again and added code signing. Whether code is local or remote, the class loader loads it into a running space in the virtual machine with the permissions given by the user’s security policy, which provides differentiated control over code execution permissions. The JDK 1.2 security model is shown below:
The latest implementation of the security mechanism introduces the concept of a domain. The virtual machine loads all code into different system domains and application domains. The system domain is responsible for interacting with key resources, and each application domain accesses the resources it needs through the system domain acting as a proxy. Different protection domains in the virtual machine have different permissions, and a class file in a given domain has all the permissions of that domain. The latest security model (JDK 1.6) is shown below:
The basic components of a sandbox:
- **Bytecode verifier:** Ensures that Java class files conform to the Java language specification, which helps Java programs achieve memory protection. Not all class files go through bytecode verification, however; core classes, for example, do not.
- **Class loaders:** Class loaders contribute to the Java sandbox in three ways:
- They prevent malicious code from interfering with well-behaved code (through parent delegation)
- They guard the boundaries of the trusted class libraries
- They place code into protection domains, which determine what the code can do
The virtual machine provides a separate namespace for the classes loaded by each class loader. A namespace consists of a set of unique names, one for each loaded class. The Java virtual machine maintains one namespace per class loader, and the namespaces are not even visible to one another. The mechanism used by class loaders is parent delegation:
- Loading starts from the innermost JVM class loader, so a malicious outer class with the same name as an inner class is never loaded and cannot be used
- Because access is strictly separated by package, a malicious outer class cannot gain access to inner classes through built-in code, so the damaging code has no effect
- **Access controller:** Controls the core API’s access to the operating system. The policy behind this control can be set by the user.
- **Security manager:** The main interface between the core API and the operating system. It implements permission control and takes priority over the access controller.
- The classes under java.security and the extension classes allow users to add new security features to their applications, including:
- Security providers
- Message digests
- Digital signatures
- Encryption
- Authentication
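As one example of the java.security features listed above, the sketch below computes a message digest with the standard MessageDigest class. DigestDemo and sha256Hex are names invented for this example, and SHA-256 is simply one algorithm chosen for the illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestDemo {
    // Returns the SHA-256 digest of the text as lowercase hex.
    static String sha256Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
        System.out.println(sha256Hex("abc"));
    }
}
```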
6. GC garbage collection algorithm
6.1. How to identify garbage
- Reference counting method
In Java, objects are manipulated through references, so a simple way to decide whether an object can be reclaimed is reference counting. An object whose reference count is zero has no references pointing to it, which means it cannot be used again and is therefore reclaimable. The weakness of this approach is circular references: two objects that refer only to each other never reach a count of zero, even when nothing else can reach them.
- Reachability analysis
To solve the circular reference problem of reference counting, Java uses reachability analysis. The search starts from a set of objects called GC Roots. If there is no path from the GC Roots to an object, the object is unreachable. Note that an unreachable object is not immediately reclaimable: it must go through at least two marking passes. If it is still unreachable after the second mark, it will be collected.
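The idea of reachability analysis can be sketched on a toy object graph. This is not how a real collector is implemented; the graph, the node names, and the class ReachabilityDemo are all hypothetical. The point is that the cycle c <-> d is found to be garbage even though each of the two objects is still referenced by the other.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReachabilityDemo {
    // Returns every node reachable from the roots by following references.
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> seen = new HashSet<>(roots);
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.poll();
            for (String target : refs.getOrDefault(current, Collections.emptyList())) {
                if (seen.add(target)) {
                    pending.add(target);
                }
            }
        }
        return seen;
    }

    // Hypothetical graph: root -> a -> b, plus a cycle c <-> d with no path from root.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("root", Arrays.asList("a"));
        refs.put("a", Arrays.asList("b"));
        refs.put("c", Arrays.asList("d"));
        refs.put("d", Arrays.asList("c"));
        return refs;
    }

    public static void main(String[] args) {
        Set<String> live = reachable(sampleGraph(), Arrays.asList("root"));
        System.out.println("b reachable: " + live.contains("b")); // true
        System.out.println("c reachable: " + live.contains("c")); // false
    }
}
```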
6.2. Mark-sweep Algorithm
This is the most basic garbage collection algorithm. It has two phases, mark and sweep. The mark phase marks all objects that need to be reclaimed, and the sweep phase reclaims the space occupied by the marked objects. As shown in the figure:
As the figure shows, the biggest problem with this algorithm is severe memory fragmentation, which can later leave large objects unable to find enough contiguous space.
6.3 Copying algorithm
This algorithm was proposed to fix the memory fragmentation of the Mark-Sweep algorithm. Memory is divided into two blocks of equal size, and only one block is used at a time. When that block is full, the surviving objects are copied to the other block and the used block is cleared, as shown in the figure:
The algorithm is simple to implement, efficient, and not prone to fragmentation. Its biggest problem is that usable memory is cut to half of the original. In addition, the more objects survive, the more copying is needed, so the efficiency of the Copying algorithm drops sharply.
6.4 Mark-Compact Algorithm
This algorithm combines the two algorithms above to avoid their defects. The mark phase is the same as in the Mark-Sweep algorithm. After marking, instead of sweeping the objects in place, it moves the live objects to one end of memory and then clears everything beyond the end boundary. As shown in the figure:
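The effect of compaction on fragmentation can be shown with a toy heap of equally sized slots, where true means a slot holds a live (marked) object. This is only a sketch under that simplifying assumption; CompactDemo and its methods are invented for the example.

```java
public class CompactDemo {
    // Slides the live slots (true) to the front; everything after them is free.
    static boolean[] compact(boolean[] marked) {
        boolean[] result = new boolean[marked.length];
        int next = 0;
        for (boolean live : marked) {
            if (live) {
                result[next++] = true;
            }
        }
        return result;
    }

    // Length of the longest run of free slots (false).
    static int largestFreeRun(boolean[] heap) {
        int best = 0;
        int run = 0;
        for (boolean live : heap) {
            run = live ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical heap after marking: live and dead slots alternate.
        boolean[] marked = {true, false, true, false, true, false, true, false};
        // Sweeping in place leaves the holes where they are: prints 1.
        System.out.println("sweep only: largest free run = " + largestFreeRun(marked));
        // Compacting first gathers the free space at one end: prints 4.
        System.out.println("compacted:  largest free run = " + largestFreeRun(compact(marked)));
    }
}
```

Half of this toy heap is free in both cases, but only after compaction could it hold an object that needs four contiguous slots.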
6.5 Generational collection algorithm
Generational collection is currently used by most JVMs. The core idea is to divide memory into regions according to object lifetime. The GC heap is typically divided into the old generation (Tenured/Old Generation) and the new generation (Young Generation). In the old generation, only a small number of objects need to be reclaimed in each collection; in the new generation, a large amount of garbage needs to be reclaimed in each collection. Different algorithms can therefore be chosen for different regions.
- New generation and the copying algorithm
Most JVM GCs currently use the Copying algorithm for the new generation, because most of its objects are reclaimed in each collection, so few objects need to be copied. The new generation is usually not split 1:1, however. It is generally divided into one large Eden Space and two small Survivor Spaces (From Space and To Space). Eden Space and one Survivor Space are in use at any time; during collection, the surviving objects in these two spaces are copied to the other Survivor Space.
- Old generation and the mark-compact algorithm
The old generation uses the Mark-Compact algorithm, because only a few objects are reclaimed in each collection.
- The Permanent Generation that the Java virtual machine places in the method area stores classes, constants, method descriptions, and so on. Collection in the permanent generation mainly reclaims discarded constants and unused classes.
- Memory for objects is allocated mainly in the Eden Space of the new generation and the From Space of the Survivor Spaces. In a few cases, objects are allocated directly in the old generation.
- When the Eden Space and From Space of the new generation run short of space, a GC occurs. After the GC, the surviving objects in Eden Space and From Space are moved to To Space, and Eden Space and From Space are then cleared.
- If To Space cannot hold an object, the object is stored in the old generation.
- After the GC, Eden Space and To Space are in use, and the cycle repeats.
- Each time an object survives a GC in a Survivor space, its age increases by 1. By default, objects that reach age 15 are moved to the old generation.
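The allocation and promotion rules listed above can be sketched as a toy model. This is a simplified simulation, not the HotSpot implementation: MinorGcDemo, the Obj class, the survivor capacity of two objects, and the isLive predicate standing in for reachability analysis are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class MinorGcDemo {
    // A toy object: just a name and a GC age.
    static class Obj {
        final String name;
        int age;

        Obj(String name) {
            this.name = name;
        }
    }

    final List<Obj> eden = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    List<Obj> from = new ArrayList<>();
    List<Obj> to = new ArrayList<>();
    final int survivorCapacity;  // hypothetical size of one Survivor space, in objects
    final int tenuringThreshold; // age at which an object is promoted

    MinorGcDemo(int survivorCapacity, int tenuringThreshold) {
        this.survivorCapacity = survivorCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    void allocate(String name) {
        eden.add(new Obj(name));
    }

    // One minor GC: copy survivors, empty Eden and From, swap From and To.
    void minorGc(Predicate<Obj> isLive) {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj obj : candidates) {
            if (!isLive.test(obj)) {
                continue; // dead objects are simply not copied
            }
            obj.age++;
            if (obj.age >= tenuringThreshold || to.size() >= survivorCapacity) {
                old.add(obj); // old enough, or To Space is full
            } else {
                to.add(obj);
            }
        }
        eden.clear();
        from.clear();
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }

    public static void main(String[] args) {
        MinorGcDemo heap = new MinorGcDemo(2, 15);
        heap.allocate("a");
        heap.allocate("b");
        heap.allocate("c");
        heap.minorGc(obj -> true);
        // c did not fit into To Space: prints survivors=2 old=1
        System.out.println("after 1 GC:   survivors=" + heap.from.size() + " old=" + heap.old.size());
        for (int i = 0; i < 14; i++) {
            heap.minorGc(obj -> true);
        }
        // a and b reached age 15: prints survivors=0 old=3
        System.out.println("after 15 GCs: survivors=" + heap.from.size() + " old=" + heap.old.size());
    }
}
```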