In this first technical article, LAN is going to talk about the class loading mechanism in Java. You can write Java programs every day without knowing what the class loading mechanism is, where it comes from, or where it is going. This knowledge rarely matters in daily coding, but it might help in an interview.

The first thing every Java programmer has to understand when learning Java is the soul of Java: the JVM. Your Java bugs, sorry, your Java code, must eventually be compiled into class files by the Java compiler and handed to the JVM for execution.

You may wonder, then, how Java classes interact with the JVM. Let's take a look at some things you will forget soon after learning them.

Note: Everything described in this article is based on JDK 1.8.

Tip: The JVM only recognizes class files, which means that code in any language that ultimately compiles to class files can be executed by the JVM. This is true of Kotlin, Groovy, JRuby, Jython, Scala, and more.

What is a Class file

1. A class file is the bytecode file that javac generates when it compiles Java source code, as shown in the following figure.

2. The class file contains the magic number, the class file version number, the constant pool counter, the constant pool, the access flags, and other information, as shown below (taken from the Java Virtual Machine Specification).

  • magic: magic number, 4 bytes. Determines whether the file is a class file that the virtual machine can accept.
  • minor_version: minor version number of the class file format, 2 bytes.
  • major_version: major version number of the class file format, 2 bytes. Together with minor_version it identifies the class file format version; a JVM refuses to load a class file whose version is newer than the versions it supports.
  • constant_pool_count: constant pool counter. Its value is the number of entries in the constant pool table plus 1. An index into the constant pool table is valid only if it is greater than 0 and less than constant_pool_count.
  • constant_pool[]: constant pool, containing all string constants, class and interface names, field names, and other constants referenced in the class file structure and its substructures.
  • access_flags: access permissions and properties of the class or interface.
  • this_class: class index.
  • super_class: superclass index.
  • interfaces_count: interface counter.
  • interfaces[]: interface table.
  • fields_count: field counter.
  • fields[]: field table.
  • methods_count: method counter.
  • methods[]: method table.
  • attributes_count: attribute counter.
  • attributes[]: attribute table.
3. For each Java class it loads, the JVM creates a C++ object called a Klass instance (the InstanceKlass class in the HotSpot source code). The fields of InstanceKlass are designed to hold the data from the Java class file, as can be seen in instanceKlass.hpp in the HotSpot source code.

Tip: When creating an InstanceKlass object, the JVM allocates far more memory than InstanceKlass itself needs, because the same allocation also holds the Java class's virtual method table (vtable), interface table (itable), and table of reference-typed fields (oop map).
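To make the first few class file items listed above concrete, here is a minimal sketch that reads magic, minor_version, major_version, and constant_pool_count from the start of a class file. The names ClassHeaderReader, Header, and parse are made up for this illustration; the sketch assumes the compiled class file is reachable as a resource on the classpath and stops before the constant pool itself.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of the JDK or HotSpot.
class ClassHeaderReader {

    /** The first four items of a class file, which occupy its first 10 bytes. */
    static final class Header {
        final int magic;
        final int minorVersion;
        final int majorVersion;
        final int constantPoolCount;

        Header(int magic, int minorVersion, int majorVersion, int constantPoolCount) {
            this.magic = magic;
            this.minorVersion = minorVersion;
            this.majorVersion = majorVersion;
            this.constantPoolCount = constantPoolCount;
        }
    }

    /** Decodes the first 10 bytes of a class file. */
    static Header parse(byte[] firstTenBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer buf = ByteBuffer.wrap(firstTenBytes);
        int magic = buf.getInt();                 // u4 magic
        int minor = buf.getShort() & 0xFFFF;      // u2 minor_version
        int major = buf.getShort() & 0xFFFF;      // u2 major_version
        int cpCount = buf.getShort() & 0xFFFF;    // u2 constant_pool_count
        return new Header(magic, minor, major, cpCount);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: this class's own .class file can be found on the classpath.
        String resource = "/" + ClassHeaderReader.class.getName().replace('.', '/') + ".class";
        try (InputStream in = ClassHeaderReader.class.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found: " + resource);
            }
            byte[] head = new byte[10];
            new DataInputStream(in).readFully(head);
            Header h = parse(head);
            System.out.printf("magic=0x%08X minor=%d major=%d constant_pool_count=%d%n",
                    h.magic, h.minorVersion, h.majorVersion, h.constantPoolCount);
        }
    }
}
```

The magic number of every valid class file is 0xCAFEBABE. The major version depends on the compiler target; class files compiled for JDK 1.8 carry major version 52.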

The process of class loading

1. Loading phase
  1. Locate the compiled class file of the Java class by its fully qualified name, load it into the JVM, and parse it.
  2. After parsing, the JVM internally creates an InstanceKlass instance, the template object for the Java class (a C++ object that holds the Java class's constant pool, methods, fields, and so on).

Tip: Readers interested in JVM internals such as parsing the constant pool, fields, and methods of a Java class can follow the WeChat official account Cloud Xia Fenglan, where related articles will be posted. The HotSpot source code that parses the constant pool, fields, and methods and builds the corresponding Klass object is all in the parseClassFile method in classFileParser.cpp, for those who want to read it themselves. The following is a snippet of the main code:

    instanceKlassHandle ClassFileParser::parseClassFile(Symbol* name,
                                                        ClassLoaderData* loader_data,
                                                        Handle protection_domain,
                                                        KlassHandle host_klass,
                                                        GrowableArray<Handle>* cp_patches,
                                                        TempNewSymbol& parsed_name,
                                                        TRAPS) {
      // Parse the constant pool
      constantPoolHandle cp = parse_constant_pool(CHECK_(nullHandle));
      // ...
      // Parse the fields
      Array<u2>* fields = parse_fields(class_name,
                                       access_flags.is_interface(),
                                       &fac, &java_fields_count,
                                       CHECK_(nullHandle));
      // ...
      // Parse the methods
      Array<Method*>* methods = parse_methods(access_flags.is_interface(),
                                              &promoted_flags,
                                              &has_final_method,
                                              &declares_default_methods,
                                              CHECK_(nullHandle));
      // ...
      // Allocate the InstanceKlass that represents the class inside the JVM
      _klass = InstanceKlass::allocate_instance_klass(loader_data,
                                                      vtable_size,
                                                      itable_size,
                                                      info.static_field_size,
                                                      total_oop_map_size2,
                                                      rt,
                                                      access_flags,
                                                      name,
                                                      super_klass(),
                                                      !host_klass.is_null(),
                                                      CHECK_(nullHandle));
    }
  3. Generate a mirror of the InstanceKlass and place it on the heap. The mirror is the java.lang.Class object of the class (it corresponds to the InstanceMirrorKlass class in the HotSpot source code). InstanceKlass is for internal use by the JVM. LAN's guess is that the extra mirror exists for safety reasons: the internal structure should not be exposed directly, so a mirror instance is created for external programs to call.
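From Java code, the visible result of the loading steps above is the java.lang.Class mirror. The following is a small sketch, with the made-up names MirrorDemo and load, showing that loading a class by its fully qualified name yields the same mirror object as the class literal and as getClass() on an instance.

```java
import java.util.ArrayList;

// Hypothetical demo class for illustration only.
class MirrorDemo {

    /** Loads a class by its fully qualified name and returns its java.lang.Class mirror. */
    static Class<?> load(String fullyQualifiedName) {
        try {
            return Class.forName(fullyQualifiedName);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("no such class: " + fullyQualifiedName, e);
        }
    }

    public static void main(String[] args) {
        Class<?> byName = load("java.util.ArrayList");
        Class<?> byLiteral = ArrayList.class;
        Class<?> byInstance = new ArrayList<String>().getClass();

        // A class loaded by one class loader has exactly one mirror object.
        System.out.println(byName == byLiteral);   // true
        System.out.println(byName == byInstance);  // true
        System.out.println(byName.getName());      // java.util.ArrayList
    }
}
```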

Tip:

  1. The static variables of a Java class are stored in the mirror. Compared with InstanceKlass, InstanceMirrorKlass defines an extra static-field offset property, which is used to locate the static variables.

  2. For Java array classes, the instances generated in the method area of the runtime data area are TypeArrayKlass and ObjArrayKlass; see typeArrayKlass.cpp and objArrayKlass.cpp respectively.

When is the class loader loaded? JavaMain calls a function named LoadMainClass, and the answer to all your confusion is in LoadMainClass. Its logic is to use the bootstrap class loader to load the class sun.launcher.LauncherHelper and invoke that class's checkAndLoadMain method, which loads the class containing the main method. The extension class loader and the application class loader are also created at this point.
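The class loaders mentioned above can be observed from Java code. This is a minimal sketch (LoaderChain and describe are made-up names) that walks from the application class loader up through its parents. The Java API represents the bootstrap class loader as null, so the walk ends there.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for illustration only.
class LoaderChain {

    /** Lists the class names of the given loader and all of its parents. */
    static List<String> describe(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        // The bootstrap class loader has no Java object; the API reports it as null.
        names.add("bootstrap (represented as null)");
        return names;
    }

    public static void main(String[] args) {
        for (String name : describe(ClassLoader.getSystemClassLoader())) {
            System.out.println(name);
        }
        // Core classes such as String are loaded by the bootstrap class loader.
        System.out.println(String.class.getClassLoader() == null); // true
    }
}
```

On JDK 1.8 the first two lines are typically sun.misc.Launcher$AppClassLoader and sun.misc.Launcher$ExtClassLoader; later JDK versions use different loader classes.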

2. Verification phase

Verification checks that the class file satisfies the constraints defined by the Java Virtual Machine Specification, and throws a VerifyError if a check fails.

  1. Static constraints

  2. Structural constraints

    Verifying these two kinds of constraints deserves an article of its own, so it is not described here. Readers who want to dig deeper can leave the author a message or follow the WeChat official account Cloud Xia Fenglan.

3. Preparation phase
  1. Create the static fields of the class or interface and initialize them with default values. This phase does not execute any virtual machine bytecode instructions.
    Data type    Default value
    byte         (byte)0
    short        (short)0
    long         0L
    char         '\u0000'
    int          0
    float        0.0f
    double       0.0d
    boolean      false

Static variables declared final with a constant value never change, so they are assigned their actual value directly in the preparation phase instead of a default value.
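The default values in the table can be checked with a short sketch. PreparationDefaults is a made-up class name; none of its static fields has an initializer, so each one still holds the default value when it is read.

```java
// Hypothetical demo class for illustration only.
class PreparationDefaults {
    // No initializers: every field keeps the default value for its type.
    static byte b;
    static short s;
    static int i;
    static long l;
    static char c;
    static float f;
    static double d;
    static boolean flag;

    public static void main(String[] args) {
        System.out.println("byte    = " + b);
        System.out.println("short   = " + s);
        System.out.println("int     = " + i);
        System.out.println("long    = " + l);
        // Print the char as a number, because '\u0000' is not a visible character.
        System.out.println("char    = " + (int) c);
        System.out.println("float   = " + f);
        System.out.println("double  = " + d);
        System.out.println("boolean = " + flag);
    }
}
```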

4. Resolution phase
  1. As many articles explain, the main job of resolution is to replace symbolic references with direct references. So what is a symbolic reference? The following figure, produced with the jclasslib plug-in, shows the class file information generated by compiling a Java class. The symbols such as #1 that follow the bytecode instructions in a method are what we call symbolic references.

A symbolic reference is actually an index into the constant pool (for example, #1 points to entry 1 of the constant pool, which describes a class or method), as shown below.

During resolution, the JVM replaces these index symbols with direct memory addresses so that subsequent JVM instructions can use them.

  2. Which virtual machine instructions require symbolic reference resolution?

anewarray, checkcast, getfield, getstatic, instanceof, invokedynamic, invokeinterface, invokespecial, invokestatic, invokevirtual, ldc, ldc_w, multianewarray, new, putfield, and putstatic. Executing any of these instructions requires resolving its symbolic reference.

If an error occurs while resolving a symbolic reference, an IncompatibleClassChangeError or one of its subclasses is thrown at the point in the program that uses the symbolic reference.
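As a small illustration of "or one of its subclasses", the sketch below checks a few familiar JDK error classes against IncompatibleClassChangeError. The class name ResolutionErrors and the selection of errors are choices made for this example, not an exhaustive list.

```java
// Hypothetical demo class for illustration only.
class ResolutionErrors {

    // Errors that failed field, method, or class resolution commonly surfaces as.
    static final Class<?>[] ERRORS = {
        NoSuchFieldError.class,
        NoSuchMethodError.class,
        IllegalAccessError.class
    };

    /** Returns true if the given type is IncompatibleClassChangeError or a subclass of it. */
    static boolean isIncompatibleClassChange(Class<?> type) {
        return IncompatibleClassChangeError.class.isAssignableFrom(type);
    }

    public static void main(String[] args) {
        for (Class<?> error : ERRORS) {
            System.out.println(error.getSimpleName() + " -> " + isIncompatibleClassChange(error));
        }
    }
}
```

Each of the three lines prints true, which is why code that catches IncompatibleClassChangeError also catches these more specific errors.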

5. Initialization phase
  1. Initialization executes the class's static code blocks and assigns its static variables. As the following figure shows, if our code declares static variables and assigns values to them, the generated bytecode file contains a <clinit> method. <clinit> holds the instructions that perform the static variable assignments, and the order of the statements in this method follows the order in which the code was written.

Since variables can be assigned during initialization, could we skip the preparation phase and assign them only during initialization? After all, the preparation phase mainly assigns default values, and what we want are the values we wrote, not the defaults. The answer is no, and here is why.

The initialization phase relies on the instructions in the <clinit> method to perform the assignments, but if we declare a static variable without assigning it a value, <clinit> contains no assignment code for that variable. As shown in the figure below, such a static variable has to be given its default value in the preparation phase; otherwise it would never receive a value at all.
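Both points, the source-order execution of static initializers and the default values that exist before any of them run, can be observed in a short sketch. InitOrder and its field names are made up for this illustration.

```java
// Hypothetical demo class for illustration only.
class InitOrder {
    // Runs first: `late` has not been assigned yet, so peek() sees its default value 0.
    static int early = peek();

    // Assigned by the static initializer after `early`.
    static int late = 5;

    static int counter = 1;

    // No initializer, so nothing assigns this field during initialization: it stays 0.
    static int untouched;

    static {
        // Runs after the `counter = 1` assignment above because it appears later in the source.
        counter += 10;
    }

    // Runs after the static block, so it sees counter == 11.
    static int doubled = counter * 2;

    static int peek() {
        return late;
    }

    public static void main(String[] args) {
        System.out.println("early     = " + early);      // 0
        System.out.println("late      = " + late);       // 5
        System.out.println("counter   = " + counter);    // 11
        System.out.println("untouched = " + untouched);  // 0
        System.out.println("doubled   = " + doubled);    // 22
    }
}
```

The value 0 in early is only possible because late already held a default value before the static initializers ran.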

After initialization, the JVM's execution engine takes over and executes the instructions. The execution engine is rather complex, so I will look at it in a later article.

Conclusion:

  1. What the JVM executes is class files, so code in any language that ultimately compiles to class files can be handed to the JVM for execution. This is true of Kotlin, Groovy, JRuby, Jython, Scala, and more.
  2. Since the JVM is written in C/C++, each Java class loaded into the JVM gets a corresponding C++ object, an InstanceKlass instance, which is stored in the method area (metaspace). A mirror of it, the java.lang.Class object (InstanceMirrorKlass in the HotSpot source code), is generated and placed on the heap.
  3. The JVM class loading mechanism is divided into five phases: loading, verification, preparation, resolution, and initialization.
  4. Loading phase:
    • Locate the class file of the class by its fully qualified name and parse it.
    • Create the corresponding C++ template object, an InstanceKlass instance, from the parsed data and store it in metaspace for internal use by the JVM.
    • Generate the Class object of the class, the mirror, on the heap for other systems or programs to call.
  5. The verification phase mainly consists of static constraint checks and structural constraint checks.
  6. The preparation phase mainly assigns default values to static variables.
  7. The resolution phase replaces symbolic references (constant pool indexes) with direct references (memory addresses).
  8. Initialization executes the class's static code blocks and completes the assignment of static variables. The assignment instructions are generated in the <clinit> method, and an assignment appears there only when the Java code actually assigns a value to the static variable.

Further questions:

  • Are there thread safety issues during class initialization?
  • Does the class loading phase use synchronized locking?
  • Why is biased locking delayed during class loading?
  • If you implement your own class loader, can you choose not to follow the standard class loader mechanism?

These questions will be answered in a follow-up article.