Dynamic Class Loading in the Java Virtual Machine

Original paper: citeseerx.ist.psu.edu/viewdoc/dow…

0 Abstract

Class loaders are a powerful mechanism for dynamically loading software components on the Java platform. They are unusual in supporting all of the following features:

laziness
type-safe linkage
user-defined extensibility
multiple communicating namespaces

This article introduces the concept of class loaders and demonstrates some of their interesting uses. In addition, it discusses how to maintain type safety in the presence of user-defined dynamic class loading.

1 Introduction

In this article, we examine an important feature of the Java virtual machine: dynamic class loading. This is the underlying mechanism that gives the Java platform much of its power: the ability to install software components at run time. A typical example is an applet that is dynamically downloaded into a web browser. While many other systems also support some form of dynamic loading and linking, the Java platform is the only one of these systems that combines all of the following features:

Lazy loading: classes are loaded on demand, and class loading is delayed as long as possible to reduce memory usage and improve system response time.
Type-safe linkage: dynamic class loading must not violate the type safety of the Java virtual machine. Dynamic loading must not require additional run-time checks to guarantee type safety; additional link-time checks are acceptable because they are performed only once.
User-definable class loading policy: class loaders are first-class Java objects, so developers have complete control over the behavior of dynamic class loading. For example, a user-defined class loader can specify the remote location from which classes are loaded, or assign appropriate security attributes to classes loaded from a particular source.
Multiple namespaces: class loaders provide separate namespaces for different software components. For example, the HotJava browser loads applets from different sources into separate class loaders. These applets may contain classes of the same name, but the Java virtual machine treats them as distinct types because they are defined by different class loaders.

In contrast, existing dynamic linking mechanisms do not support all of these features. Although most operating systems support some form of dynamically linked library, such mechanisms target C/C++ code and are not type-safe. Dynamic languages such as Lisp, Smalltalk, and Self achieve type safety through additional run-time checks rather than link-time checks.

The main contribution of this article is the first in-depth description of class loaders, a concept introduced by the Java platform. Class loaders have been available since JDK 1.0, where they were originally used to dynamically load applets in the HotJava browser. Since then, their use has expanded to a wider range of components and scenarios, including server-side components (servlets), the extension mechanism of the Java platform, and JavaBeans components. Despite the growing importance of class loaders, the underlying mechanism has not been adequately described in the literature.

Another focus of this article is a solution to a long-standing type safety problem with class loaders. Earlier versions of the JDK (1.0 and 1.1) contained a serious flaw in the class loader implementation: improperly written class loaders could defeat the type safety guarantees of the Java virtual machine. Note that this type safety problem does not directly pose a security risk, because untrusted code (such as a downloaded applet) is not allowed to create class loaders. However, application developers who need to write custom class loaders may inadvertently compromise type safety.

Although this problem has been known for some time, no generally accepted solution has emerged. For example, earlier discussions focused on whether the lack of type safety is a fundamental limitation of user-definable class loaders, and whether we must restrict the power of class loaders, give up lazy class loading, or introduce additional dynamic type checks at run time. The solution we propose in this article, already implemented in JDK 1.2, solves the type safety problem while preserving all the other desirable features of class loaders.

We assume that the reader has a basic knowledge of the Java programming language. The rest of this article is organized as follows. Section 2 introduces class loaders in more detail. Section 3 discusses applications of class loaders. Section 4 describes the type safety problems that can arise with class loaders and their solution. Finally, we conclude.

2 Class Loaders

The main purpose of class loaders is to support dynamic loading of software components on the Java platform. The basic unit of software distribution is the class. Classes are distributed in a platform-independent, standard binary format, and the representation of a single class is called a class file. Class files are produced by Java compilers and can be loaded into any Java virtual machine. A class file need not be stored in an actual file; it can be held in a memory buffer or obtained from a network stream.

The Java virtual machine executes the bytecode stored in class files. However, bytecode sequences are only part of what the virtual machine needs to execute a program. Class files also contain symbolic references to fields, methods, and the names of other classes. For example, consider the following class C:

class C {
    void f() {
        D d = new D();
        // ...
    }
}

The class file representing C contains a symbolic reference to class D. Symbolic references are resolved to actual class types at link time. Class types are reified as first-class objects in the Java virtual machine, and a class type is represented in user code as an instance of java.lang.Class. To resolve a symbolic reference to a class, the Java virtual machine must load the class file and create the class type.
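To make this concrete, here is a minimal sketch in which the `new D()` expression inside `C` requires the reference to `D` to be resolved, after which the resulting class type is inspected as an ordinary `java.lang.Class` object. The names `ClassObjectDemo`, `C`, and `D` are hypothetical, and exactly when a given JVM resolves the reference is an implementation detail.

```java
public class ClassObjectDemo {
    static class D { }

    static class C {
        // Executing `new D()` requires the symbolic reference to D to be resolved first.
        Object f() {
            return new D();
        }
    }

    public static void main(String[] args) {
        Object d = new C().f();
        Class<?> type = d.getClass();          // the class type, reified as a java.lang.Class object
        System.out.println(type.getName());
        System.out.println(type == D.class);   // true: the same Class object for the same (name, loader)
        // C and D come from the same source and are loaded by the same loader here.
        System.out.println(type.getClassLoader() == C.class.getClassLoader()); // true
    }
}
```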

First-class: in a nutshell, this means there are no restrictions on how such objects can be used; they behave like any other object. First-class objects are entities that can be created dynamically, destroyed, passed to functions, returned as values, and have all the rights of other variables in the programming language. Stackoverflow.com/questions/2…


Class Loading Overview

The Java virtual machine uses class loaders to load class files and create Class objects. Class loaders are ordinary objects that can be defined in Java code, but they must be instances of subclasses of the abstract class ClassLoader, which looks like this (other members omitted):

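public abstract class ClassLoader {
    // Outline only: method bodies and unrelated members are omitted.
    public Class<?> loadClass(String name) throws ClassNotFoundException;
    protected final Class<?> defineClass(String name, byte[] buf, int off, int len);
    protected final Class<?> findLoadedClass(String name);
    protected final Class<?> findSystemClass(String name) throws ClassNotFoundException;
    // ...
}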

The ClassLoader.loadClass method takes a class name as its argument and returns a Class object, the run-time representation of the class type. The methods defineClass, findLoadedClass, and findSystemClass are described below. In the example above, suppose class C is defined by class loader L. Then L is the defining loader of C, and the Java virtual machine uses L to load the classes referenced by C. Before the virtual machine allocates an object of class D, it must resolve the reference to D. If D has not yet been loaded, the virtual machine calls the loadClass method of C's class loader L to load D:

L.loadClass("D")

Once D is loaded, the virtual machine can resolve the reference and create an object of class D.
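Defining C with a custom loader would require a class file, so the following sketch uses `Class.forName(name, false, loader)` as a stand-in for resolution: it asks the virtual machine to load a class with `loader` as the initiating loader, and the virtual machine in turn calls that loader's `loadClass`. `LoggingLoader` is a hypothetical helper that records each requested name before delegating to its parent as usual.

```java
import java.util.ArrayList;
import java.util.List;

public class ResolutionDemo {
    /** Records every name it is asked to load, then delegates to the parent loader as usual. */
    static class LoggingLoader extends ClassLoader {
        final List<String> requested = new ArrayList<>();

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            requested.add(name);
            return super.loadClass(name, resolve);
        }
    }

    public static void main(String[] args) throws Exception {
        LoggingLoader loader = new LoggingLoader();
        // Ask the virtual machine to load a class with `loader` as the initiating loader.
        Class<?> d = Class.forName("java.util.ArrayList", false, loader);
        System.out.println(d == ArrayList.class);                              // true: the parent supplied it
        System.out.println(loader.requested.contains("java.util.ArrayList")); // true: loadClass was called
    }
}
```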

Multiple Class Loaders

Java applications can use several different kinds of class loaders to manage various software components. The following figure shows how a web browser written in Java might use class loaders.

This example uses two kinds of class loaders: user-defined class loaders and the system class loader supplied by the Java virtual machine. User-defined class loaders can create classes from user-defined sources; for example, the browser application creates a class loader for each downloaded applet. A separate class loader is used for the web browser application itself. All system classes (such as java.lang.String) are loaded into the system class loader, which is supported directly by the Java virtual machine.

The arrows in the figure represent delegation relationships between class loaders. A class loader L1 can ask another class loader L2 to load a class C on its behalf; in this case, L1 delegates C to L2. For example, the applet and application class loaders delegate all system classes to the system class loader, so all system classes are shared among the applets and the application. This is desirable, because type safety would be violated if, for example, applet code and system code had different notions of what the type java.lang.String is. Delegating class loaders allow us to maintain namespace isolation while sharing a common set of classes. In the Java virtual machine, a class type is uniquely determined by the combination of its class name and its class loader. Because the applet and application class loaders delegate to the system class loader, all system types, such as java.lang.String, are unique. In contrast, a class named C loaded in Applet 1 is a different type from a class named C loaded in Applet 2: although the two classes have the same name, they are defined by different class loaders. In fact, the two classes can be completely unrelated; for example, they may have different methods or fields.
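The following sketch illustrates that a class type is the pair (class name, defining loader). To stay self-contained it assembles a minimal class file in memory (a public class with only a default constructor, in the Java 8 class file format) instead of reading one from disk; `classFileWithConstructor` and `AppletLoader` are hypothetical helpers standing in for applet class files and applet class loaders. Two loaders that define a class named `C` from identical bytes still produce two distinct types, while `java.lang.String` reached through either loader is the same shared class.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class NamespaceDemo {
    /** Hand-assembled class file: public class <internalName> extends Object { public <init>() }. */
    static byte[] classFileWithConstructor(String internalName) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(0xCAFEBABE);                                // magic
        out.writeShort(0);                                       // minor_version
        out.writeShort(52);                                      // major_version (Java 8)
        out.writeShort(10);                                      // constant_pool_count: entries #1..#9
        out.writeByte(7);  out.writeShort(2);                    // #1 Class -> #2
        out.writeByte(1);  out.writeUTF(internalName);           // #2 Utf8: this class
        out.writeByte(7);  out.writeShort(4);                    // #3 Class -> #4
        out.writeByte(1);  out.writeUTF("java/lang/Object");     // #4 Utf8: superclass
        out.writeByte(1);  out.writeUTF("<init>");               // #5 Utf8
        out.writeByte(1);  out.writeUTF("()V");                  // #6 Utf8
        out.writeByte(1);  out.writeUTF("Code");                 // #7 Utf8
        out.writeByte(12); out.writeShort(5); out.writeShort(6); // #8 NameAndType <init>:()V
        out.writeByte(10); out.writeShort(3); out.writeShort(8); // #9 Methodref Object.<init>
        out.writeShort(0x0021);                                  // ACC_PUBLIC | ACC_SUPER
        out.writeShort(1);                                       // this_class = #1
        out.writeShort(3);                                       // super_class = #3
        out.writeShort(0);                                       // interfaces_count
        out.writeShort(0);                                       // fields_count
        out.writeShort(1);                                       // methods_count
        out.writeShort(0x0001);                                  // method access: ACC_PUBLIC
        out.writeShort(5);                                       // name = <init>
        out.writeShort(6);                                       // descriptor = ()V
        out.writeShort(1);                                       // attributes_count
        out.writeShort(7);                                       // attribute name = Code
        out.writeInt(17);                                        // attribute_length
        out.writeShort(1);                                       // max_stack
        out.writeShort(1);                                       // max_locals
        out.writeInt(5);                                         // code_length
        out.write(new byte[] {0x2A, (byte) 0xB7, 0x00, 0x09, (byte) 0xB1}); // aload_0; invokespecial #9; return
        out.writeShort(0);                                       // exception_table_length
        out.writeShort(0);                                       // Code attributes_count
        out.writeShort(0);                                       // class attributes_count
        return buf.toByteArray();
    }

    /** Hypothetical stand-in for an applet class loader: defines one class from in-memory bytes. */
    static class AppletLoader extends ClassLoader {
        private final String className;
        private final byte[] classBytes;

        AppletLoader(String className, byte[] classBytes) {
            this.className = className;
            this.classBytes = classBytes;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!name.equals(className)) {
                throw new ClassNotFoundException(name);
            }
            return defineClass(name, classBytes, 0, classBytes.length);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = classFileWithConstructor("C");
        ClassLoader applet1 = new AppletLoader("C", bytes);
        ClassLoader applet2 = new AppletLoader("C", bytes);

        Class<?> c1 = applet1.loadClass("C");
        Class<?> c2 = applet2.loadClass("C");
        System.out.println(c1.getName().equals(c2.getName())); // true: same name
        System.out.println(c1 == c2);                          // false: different defining loaders

        // System classes are delegated, so both applets see the same String type.
        System.out.println(applet1.loadClass("java.lang.String") == applet2.loadClass("java.lang.String")); // true

        Object fromApplet2 = c2.getDeclaredConstructor().newInstance();
        try {
            c1.cast(fromApplet2);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: same name, different type");
        }
    }
}
```

The `ClassCastException` is the run-time form of the statement that the two classes named C are unrelated types.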

Classes in one applet cannot interfere with classes in another applet, because the applets are loaded by separate class loaders; this isolation is critical to Java platform security. Likewise, because the browser resides in its own class loader, applets cannot access the classes used to implement the browser. Applets can access only the standard Java APIs exposed in the system classes.

The Java virtual machine starts up by creating the application class loader and using it to load the initial browser class. Execution of the application begins in the public static method void main(String[]) of the initial class, and the invocation of this method drives all further execution. Executing instructions may cause additional classes to be loaded. In this application, the browser also creates additional class loaders for downloaded applets.

The garbage collector unloads applet classes that are no longer referenced. Each Class object holds a reference to its defining class loader, and each class loader refers to all the classes it defines. From the garbage collector's point of view, this means that classes are strongly connected with their defining loaders: a class is unloaded when its defining class loader is garbage collected.

An Example

Here is a simple class loader implementation. As mentioned earlier, all user-defined class loaders are subclasses of ClassLoader, and a subclass can override the loadClass method to provide a user-defined loading policy. The following custom class loader looks up classes in a given directory:

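import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * @author: guolei.sgl (guolei.sgl@antfin.com) 2020/12/4 7:42 PM
 */
public class FileClassLoader extends ClassLoader {

    private String directory;

    public FileClassLoader(String directory) {
        this.directory = directory;
    }

    @Override
    public synchronized Class<?> loadClass(String name) throws ClassNotFoundException {
        // 1. Return the class if this loader has already loaded it.
        Class<?> c = findLoadedClass(name);
        if (c != null) {
            return c;
        }

        // 2. Delegate to the system class loader first.
        try {
            return findSystemClass(name);
        } catch (ClassNotFoundException e) {
            // not a system class: keep looking
        }

        // 3. Read the class file from the directory and define the class in this loader.
        try {
            byte[] data = getClassData(directory, name);
            return defineClass(name, data, 0, data.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    // The original omits this method; reading <directory>/<name with '.' replaced by '/'>.class
    // is an assumed implementation.
    private byte[] getClassData(String directory, String name) throws IOException {
        return Files.readAllBytes(Paths.get(directory, name.replace('.', '/') + ".class"));
    }
}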

The public constructor FileClassLoader() simply records the directory name. In loadClass, we first call findLoadedClass to check whether the class has already been loaded; a null result means it has not. We then delegate to the system class loader by calling findSystemClass. If the class we are trying to load is not a system class, we call getClassData to read the class file.

After reading the class file, we pass it to defineClass, which constructs the run-time representation of the class (the Class object) from the class file. Note that loadClass is declared synchronized, because multiple threads may try to load the same class at the same time.
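Since JDK 1.2, `ClassLoader.loadClass` itself performs the `findLoadedClass` check, delegates to a parent loader, and takes care of locking; it calls the protected `findClass` method only when the parent cannot find the class. A variant of FileClassLoader written in that style therefore only needs to override `findClass`. This is a sketch assuming the same directory layout as above (`<directory>/com/example/Foo.class` for `com.example.Foo`); the name `DirectoryClassLoader` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical findClass-based variant of FileClassLoader. */
public class DirectoryClassLoader extends ClassLoader {
    private final Path directory;

    public DirectoryClassLoader(Path directory, ClassLoader parent) {
        super(parent);                 // the inherited loadClass consults the parent first
        this.directory = directory;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // Called only after the parent failed: map "com.example.Foo" to <directory>/com/example/Foo.class.
        Path file = directory.resolve(name.replace('.', '/') + ".class");
        try {
            byte[] data = Files.readAllBytes(file);
            return defineClass(name, data, 0, data.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    public static void main(String[] args) throws Exception {
        Path empty = Files.createTempDirectory("classes");
        DirectoryClassLoader loader = new DirectoryClassLoader(empty, ClassLoader.getSystemClassLoader());
        // java.lang.String is found through the parent chain; findClass is never called for it.
        System.out.println(loader.loadClass("java.lang.String") == String.class); // true
        try {
            loader.loadClass("com.example.Missing");
        } catch (ClassNotFoundException e) {
            System.out.println("not found: " + e.getMessage());
        }
    }
}
```

Overriding `findClass` rather than `loadClass` keeps the parent-first delegation order intact, which is what prevents a user-defined loader from accidentally defining its own copy of a system class.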

Initiating and Defining Loaders

When one class loader delegates to another, the loader that initiates the loading is not necessarily the loader that completes the loading and defines the class, as the following code fragment shows:

try {
    FileClassLoader cl = new FileClassLoader("/foo/bar");
    Class<?> stringClass = cl.loadClass("java.lang.String");
} catch (ClassNotFoundException e) {
    // ...
}

An instance of FileClassLoader delegates the loading of java.lang.String to the system class loader. Therefore java.lang.String is loaded and defined by the system class loader, even though the loading was initiated by FileClassLoader.
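The distinction can be observed from Java code, because `Class.getClassLoader()` reports the defining loader regardless of which loader initiated the load. A minimal sketch, using an empty anonymous `ClassLoader` subclass (which inherits the default delegation to the system class loader) in place of FileClassLoader; note that on current JVMs `java.lang.String` is defined by the bootstrap loader, which `getClassLoader()` represents as `null`.

```java
public class InitiatingVsDefining {
    public static void main(String[] args) throws ClassNotFoundException {
        // An empty anonymous subclass: it inherits the default parent-first loadClass,
        // so it initiates the load but leaves the definition to another loader.
        ClassLoader initiating = new ClassLoader() { };
        Class<?> stringClass = initiating.loadClass("java.lang.String");
        System.out.println(stringClass == String.class);                // true
        // getClassLoader() returns the defining loader; null denotes the bootstrap loader.
        System.out.println(stringClass.getClassLoader() == null);       // true
        System.out.println(stringClass.getClassLoader() == initiating); // false
    }
}
```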

Definition 2.1: If C is the result of L.defineClass(), then L is the defining loader of C; equivalently, L defines C.

Definition 2.2: If C is the result of L.loadClass(), then L is an initiating loader of C; equivalently, L initiates the loading of C.

In the Java virtual machine, each class C is permanently associated with its defining loader, and it is this defining loader that initiates the loading of any class referenced by C.

The loader that initiates the loading of a class is therefore not necessarily the loader that defines it.

Suppose the delegation hierarchy is L -> Lp -> Lq and the class is defined in Lp. In this case:

L delegates the class loading to Lp.

Lp delegates the class loading to Lq.

Lq does not find the class, and the call returns to Lp.

Lp loads the class, because it is defined in Lp, and the call returns to L.

Here, both L and Lp are initiating class loaders, and Lp is the defining class loader.

Similarly, if the delegation hierarchy is L -> Lp and the class is defined in L:

L is both the defining loader and an initiating loader.

Lp is not an initiating loader.

Simply put, a class loader is an initiating loader of a class if it can return a reference to that class somewhere along the delegation chain. A class may have several initiating loaders, but it has exactly one defining loader.

3 Applications of Class Loaders

In this section, we demonstrate the power of class loaders through several examples.

Reloading Classes

Software components often need to be upgraded in long-running applications such as servers, and the upgrade must not require shutting down and restarting the application. On the Java platform, this translates into reloading a subset of the classes already loaded in a running virtual machine. This is an instance of the schema evolution problem, which is often difficult to solve. Some of the difficulties are:

There may be live objects that are instances of a class we want to reload, and these objects must be migrated to the schema of the new class. For example, if the new version of a class has a different set of instance fields, we must somehow map the values of the existing instance fields to the fields of the new version.
Likewise, static field values must be mapped to the (possibly different) set of static fields in the reloaded version of the class.
The application may be in the middle of executing a method of a class we want to reload.

We do not address these problems in this article. Instead, we show how class loaders can be used to sidestep them. By organizing software components in separate class loaders, developers can often avoid dealing with schema evolution altogether: the new version of a class is simply loaded by a new, separate loader.

Figure 3 illustrates how a Server class can dynamically redirect service requests to a new version of the service. The key technique is to load the Server class, the old Service, and the new Service into separate class loaders. For example, Server can use the FileClassLoader class described in the previous section to load each version of Service:

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class Server {

    private Object service;

    public void updateService(String location) {
        try {
            // A fresh loader for each version, so the new Service is a new class type.
            FileClassLoader cl = new FileClassLoader(location);
            Class<?> c = cl.loadClass("Service");
            service = c.newInstance();
        } catch (ClassNotFoundException e) {
            // ignore
        } catch (IllegalAccessException e) {
            // ignore
        } catch (InstantiationException e) {
            // ignore
        }
    }

    public void processRequest(String args) {
        try {
            // Look up run(String) reflectively on whatever class the current service has.
            Class<?> c = service.getClass();
            Method m = c.getMethod("run", String.class);
            m.invoke(service, args);
        } catch (NoSuchMethodException e) {
            // ignore
        } catch (InvocationTargetException e) {
            // ignore
        } catch (IllegalAccessException e) {
            // ignore
        }
    }
}

The Server.processRequest method redirects all incoming requests to the Service object stored in a private field. It uses the Java core reflection API to invoke the "run" method on that object. The Server.updateService method allows a new version of the Service class to be loaded dynamically to replace the existing Service object; its caller supplies the location of the new class files. Subsequent requests are redirected to the new object referenced by service. To see why this indirection is needed, consider a version of Server that refers to the Service class directly:

public class Server {
    private Service service;

    public void updateService(String location) {
        try {
            FileClassLoader cl = new FileClassLoader(location);
            Class<?> c = cl.loadClass("Service");
            // Fails with ClassCastException: the Service type here is the one Server
            // was linked against, not the new class defined by cl.
            service = (Service) c.newInstance();
        } catch (ClassNotFoundException e) {
            // ignore
        } catch (IllegalAccessException e) {
            // ignore
        } catch (InstantiationException e) {
            // ignore
        }
    }
    // ...
}

Once the Server class has resolved its symbolic reference to the Service class, it holds a hard link to that class type and cannot change the resolved reference. The Service class defined by a new class loader is therefore a different type from the Service that Server is linked against, and the cast in Server.updateService fails with a ClassCastException.

Reflection lets the Server class use the Service class without referring to it directly. Alternatively, Server and Service can share a common interface or superclass:

public interface ServiceInterface {
    void run(String args);
}

public class Server {
    private ServiceInterface service;

    public void updateService(String location) {
        try {
            FileClassLoader cl = new FileClassLoader(location);
            Class<?> c = cl.loadClass("Service");
            // Succeeds as long as every Service version implements the one shared ServiceInterface.
            service = (ServiceInterface) c.newInstance();
        } catch (ClassNotFoundException e) {
            // ignore
        } catch (IllegalAccessException e) {
            // ignore
        } catch (InstantiationException e) {
            // ignore
        }
    }

    public void processRequest(String args) {
        service.run(args);
    }
}

public class Service implements ServiceInterface {
    @Override
    public void run(String args) {
        System.out.println(args);
    }
}

Dispatching through an interface is usually more efficient than reflection. The interface type itself cannot be reloaded, because the Server class can refer to only one ServiceInterface type: the class returned by each new class loader must implement that same ServiceInterface. For this to work, ServiceInterface must be found through delegation (for example, by findSystemClass) rather than defined again by each new FileClassLoader.

After we invoke updateService, all future requests are handled by the new Service. However, the old Service may not yet have finished processing some earlier requests. The two Service classes may therefore coexist for some time, until all uses of the old class are complete, all references to it are dropped, and the old class is unloaded.
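The swap itself can be sketched in a few lines. In this hypothetical version, `run` returns a string so the effect is observable, and two in-process classes `ServiceV1` and `ServiceV2` stand in for two versions of Service that would, in the scenario above, each be defined by its own FileClassLoader. Reading the service reference once per request means a request that started on the old version finishes on it, which is the coexistence described above.

```java
import java.util.concurrent.atomic.AtomicReference;

public class HotSwapSketch {
    /** The shared interface; in the real setting it is loaded once, by a loader every version delegates to. */
    public interface ServiceInterface {
        String run(String args);
    }

    // Hypothetical stand-ins for two versions of Service (ordinary classes so the sketch runs).
    static class ServiceV1 implements ServiceInterface {
        public String run(String args) { return "v1:" + args; }
    }

    static class ServiceV2 implements ServiceInterface {
        public String run(String args) { return "v2:" + args; }
    }

    static class Server {
        // AtomicReference gives request threads a safely published, fully constructed service.
        private final AtomicReference<ServiceInterface> service = new AtomicReference<>();

        void updateService(ServiceInterface next) {
            service.set(next);
        }

        String processRequest(String args) {
            // Read the reference once: this request completes on whichever version it started with.
            return service.get().run(args);
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        server.updateService(new ServiceV1());
        System.out.println(server.processRequest("a")); // v1:a
        server.updateService(new ServiceV2());
        System.out.println(server.processRequest("b")); // v2:b
    }
}
```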

Instrumenting Class Files

A class loader can instrument a class file before calling defineClass. For example, in the FileClassLoader example, we could insert a call that instruments the contents of the class file:

try {
    byte[] data = getClassData(directory, name);
    // Instrument the class file before defining it; instrumentClassFile stands for
    // whatever transformation the loader applies (not defined here).
    byte[] newdata = instrumentClassFile(data);
    return defineClass(name, newdata, 0, newdata.length);
} catch (IOException e) {
    throw new ClassNotFoundException(name);
}

The instrumented class file must be valid according to the Java Virtual Machine Specification. The virtual machine applies all the usual checks (such as running the bytecode verifier) to the instrumented class file. As long as the class file format is respected, programmers have wide latitude in modifying class files. For example, an instrumented class file may contain new bytecode instructions in existing methods, new fields, or new methods. Existing methods can also be removed, but the resulting class file might then fail to link with other classes.

The instrumented class file must define a class with the same name as the original class file, and the loadClass method should return a Class object whose name matches the name passed in as its argument.
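The virtual machine enforces the name rule at definition time: `defineClass` throws `NoClassDefFoundError` when the requested name differs from the name recorded in the class file. The sketch below assumes a hand-assembled minimal class file (an empty class with no methods, Java 8 format) in place of a real instrumented one, and shows the check accepting the matching name and rejecting a different one. `emptyClassFile` and `ByteLoader` are hypothetical helpers.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class NameCheckDemo {
    /** Hand-assembled class file for: public class <internalName> extends Object {} (no methods). */
    static byte[] emptyClassFile(String internalName) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(0xCAFEBABE);                            // magic
        out.writeShort(0);                                   // minor_version
        out.writeShort(52);                                  // major_version (Java 8)
        out.writeShort(5);                                   // constant_pool_count: entries #1..#4
        out.writeByte(7); out.writeShort(2);                 // #1 Class -> #2
        out.writeByte(1); out.writeUTF(internalName);        // #2 Utf8: this class
        out.writeByte(7); out.writeShort(4);                 // #3 Class -> #4
        out.writeByte(1); out.writeUTF("java/lang/Object");  // #4 Utf8: superclass
        out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
        out.writeShort(1);                                   // this_class = #1
        out.writeShort(3);                                   // super_class = #3
        out.writeShort(0);                                   // interfaces_count
        out.writeShort(0);                                   // fields_count
        out.writeShort(0);                                   // methods_count
        out.writeShort(0);                                   // attributes_count
        return buf.toByteArray();
    }

    /** Exposes the protected defineClass so the sketch can call it directly. */
    static class ByteLoader extends ClassLoader {
        Class<?> define(String name, byte[] b) {
            return defineClass(name, b, 0, b.length);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] helloBytes = emptyClassFile("Hello");
        System.out.println(new ByteLoader().define("Hello", helloBytes).getName()); // Hello
        try {
            // Same bytes, different requested name: the virtual machine rejects the definition.
            new ByteLoader().define("Goodbye", helloBytes);
        } catch (NoClassDefFoundError e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```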

A class loader can instrument only the classes it defines, not classes it delegates to other loaders. Because all user-defined class loaders should first delegate to the system class loader, system classes cannot be instrumented through class loaders. User-defined class loaders cannot get around this restriction by defining system classes themselves. For example, if a class loader defines its own String class, it cannot pass objects of that class to Java APIs that expect standard String objects; the virtual machine will catch and report such type errors.

Class file instrumentation is useful in many situations. For example, an instrumented class file might contain profiling hooks that count how many times particular methods are executed. Resource allocation can be monitored and controlled by replacing references to certain classes with references to resource-conscious versions of those classes. Class loaders can also be used to implement parameterized classes, expanding and customizing the code in a class file for each distinct instantiation of the parameter types.

4 Maintaining Type-Safe Linkage

The examples so far have demonstrated the usefulness of multiple delegating class loaders. However, as we will see, special care is needed to ensure type-safe linkage in the presence of class loaders. The Java programming language relies on name-based static typing: at compile time, each static class type corresponds to a class name. At run time, class loaders introduce multiple namespaces, and a run-time class type is determined not by its name alone but by a pair: its class name and its defining class loader. As a result, namespaces introduced by user-defined class loaders can be inconsistent with the namespace managed by the Java compiler, compromising type safety.

Temporal Namespace Consistency

The loadClass method may return different class types for a given name at different times. To maintain type safety, the virtual machine must always obtain the same class type for a given class name and loader. For example, consider the two references to class X in the following code:

class C {
    void f(X x) { ... }
    ...
    void g() { f(new X()); }
}

If C's class loader mapped the two occurrences of X to different class types, the type safety of the call to f within g would be compromised. The virtual machine cannot trust a user-defined loadClass method to return the same type for a given name consistently, so it maintains an internal loaded class cache. The loaded class cache maps a class name and an initiating loader to a class type. When the virtual machine obtains a class from a loadClass method, it does the following:

The actual name of the returned class is checked against the name passed to loadClass. If loadClass returns a class that does not have the requested name, an error is raised.
If the names match, the resulting class is recorded in the loaded class cache. The virtual machine never calls the loadClass method of the same class loader more than once with the same name.
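The two steps can be modeled in a few lines. This is a simplified sketch, not the virtual machine's actual data structure: `LoadedClassCache` keys its map by (class name, initiating loader), calls the user-defined `loadClass` at most once per key, and rejects a class whose name differs from the requested one. `CountingLoader` and `LyingLoader` are hypothetical loaders used only to exercise the two checks, and the `LinkageError` thrown here is the model's choice of error.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of the loaded class cache; not the VM's real implementation. */
public class LoadedClassCache {
    private final Map<List<Object>, Class<?>> cache = new HashMap<>();

    /** Resolves `name` with `initiating` as the initiating loader, calling loadClass at most once per pair. */
    public Class<?> resolve(String name, ClassLoader initiating) throws ClassNotFoundException {
        List<Object> key = List.of(name, initiating);
        Class<?> cached = cache.get(key);
        if (cached != null) {
            return cached;                           // never ask the same loader twice for the same name
        }
        Class<?> c = initiating.loadClass(name);
        if (!c.getName().equals(name)) {             // check: the returned class must have the requested name
            throw new LinkageError("loadClass(\"" + name + "\") returned " + c.getName());
        }
        cache.put(key, c);                           // record (name, initiating loader) -> class type
        return c;
    }

    /** Counts how often loadClass is actually invoked. */
    static class CountingLoader extends ClassLoader {
        int calls;

        @Override
        public Class<?> loadClass(String name) throws ClassNotFoundException {
            calls++;
            return super.loadClass(name);
        }
    }

    /** A misbehaving loader that answers every request with java.lang.String. */
    static class LyingLoader extends ClassLoader {
        @Override
        public Class<?> loadClass(String name) {
            return String.class;
        }
    }

    public static void main(String[] args) throws Exception {
        LoadedClassCache vm = new LoadedClassCache();
        CountingLoader counting = new CountingLoader();
        Class<?> first = vm.resolve("java.util.ArrayList", counting);
        Class<?> second = vm.resolve("java.util.ArrayList", counting);
        System.out.println(first == second); // true
        System.out.println(counting.calls);  // 1
        try {
            vm.resolve("java.lang.Integer", new LyingLoader());
        } catch (LinkageError e) {
            System.out.println(e.getMessage());
        }
    }
}
```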

Namespace Consistency among Delegating Loaders

We now describe the type safety problems that can arise with delegating class loaders.

Notation 4.1: We write <C, Ld>^Li for a class type, where C is the class name, Ld is the defining loader of the class, and Li is an initiating loader. When the defining loader does not matter, we write C:Li to indicate that Li is an initiating loader of C. When the initiating loader does not matter, we write <C, Ld> to indicate that C is defined by Ld.

If L1 delegates the loading of C to L2, then C:L1 = C:L2. We now give an example that demonstrates the type safety problem. To make it clear which class loaders are involved, we use the notation above where class names normally appear.

C is defined by L1. As a result, L1 is used to initiate the loading of the classes Spoofed and Delegated referenced inside C.f. L1 defines Spoofed. However, L1 delegates the loading of Delegated to L2, and L2 defines Delegated. Because Delegated is defined by L2, Delegated.g uses L2 to initiate the loading of Spoofed. As it happens, L2 defines a different Spoofed type. C expects Delegated.g to return an instance of <Spoofed, L1>. However, Delegated.g actually returns an instance of <Spoofed, L2>, which is a completely different class.

This is an inconsistency between the namespaces of L1 and L2. If such inconsistencies go undetected, delegating class loaders can be used to forge one type as another. To see how this type safety problem leads to undesirable behavior, suppose the two versions of Spoofed declare their fields differently. Class <C, L1> can then reveal a private field of a <Spoofed, L2> instance and forge a pointer from an integer value.

We can access the private field secret_value of the <Spoofed, L2> instance, because that field is declared public in <Spoofed, L1>. We can also treat an integer field of the <Spoofed, L2> instance as the integer array declared in <Spoofed, L1>, and dereference the pointer forged from that integer. The root cause of the type safety problem is that the virtual machine failed to take into account that a class type is determined by both its class name and its defining loader. Instead, it relied on the Java programming language notion of using only the class name as the type during type checking. This problem has since been corrected, as described below.

The solution

A straightforward solution to the type safety problem is to represent class types in the Java virtual machine uniformly by their names together with their defining loaders. However, the only way to determine the defining loader is to actually load the class through the initiating loader. In the example in the previous section, before we can determine whether the call from C.f to Delegated.g is type-safe, we must first load Spoofed in both L1 and L2 to see whether we obtain the same defining loader. The drawback of this approach is that it sacrifices lazy class loading.

Our solution preserves the type safety of this straightforward approach but avoids eager class loading. The key idea is to maintain a set of loader constraints that is updated dynamically as class loading takes place. In the example above, instead of loading Spoofed in L1 and L2, we simply record the constraint Spoofed:L1 = Spoofed:L2. If Spoofed is later loaded by L1 or L2, we must verify that the existing set of loader constraints is not violated. But what if the constraint Spoofed:L1 = Spoofed:L2 is introduced after L1 and L2 have already loaded Spoofed? At that point it is too late to impose the constraint and undo the earlier class loading.

Therefore, we must consider the loaded class cache and the loader constraint set together. We maintain the following invariant: every entry in the loaded class cache satisfies all loader constraints. The invariant is preserved as follows:

Each time a new entry is added to the loaded class cache, we verify that no existing loader constraint is violated. If the new entry cannot be added without violating an existing loader constraint, the class loading fails.
Each time a new loader constraint is added, we verify that all classes in the loaded class cache satisfy it. If the new constraint cannot be satisfied by all loaded classes, the operation that triggered the addition of the constraint fails.

Let's see how these checks apply to the previous example. The first line of C.f causes the virtual machine to generate the constraint Spoofed:L1 = Spoofed:L2. If L1 and L2 have already loaded different Spoofed classes when this constraint is generated, an exception is raised immediately. Otherwise, the constraint is recorded successfully. If Delegated.g then loads Spoofed:L2 first, an exception is raised later, when C.f tries to load Spoofed:L1.

Constraint rules

We now state the rules for generating constraints. They correspond to the situations in which a class type may be referenced through another class. When the two classes involved are defined in different loaders, their namespaces may be inconsistent.

If <C, L1> references a field T fieldName declared in <D, L2>, we generate the constraint T:L1 = T:L2.
If <C, L1> references a method T0 methodName(T1, ..., Tn) declared in <D, L2>, we generate the constraints T0:L1 = T0:L2, ..., Tn:L1 = Tn:L2.
If <C, L1> overrides a method T0 methodName(T1, ..., Tn) declared in <D, L2>, we generate the constraints T0:L1 = T0:L2, ..., Tn:L1 = Tn:L2.
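Because the rules involve only the class types that appear in a field type or a method signature, the constraints can be read straight off the JVM descriptor, in which an object type is written `Lname;` and primitive types such as `I` generate no constraint. The following sketch, with hypothetical names and a hypothetical signature for Delegated.g, lists the constraints a method reference would generate; array types such as `[Ljava/lang/String;` are treated as constraining their element class.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: which loader constraints does a field or method reference generate? */
public class ConstraintRules {
    /** Class names (internal form) appearing in a JVM descriptor; primitive types are skipped. */
    static List<String> referencedClasses(String descriptor) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < descriptor.length(); i++) {
            if (descriptor.charAt(i) == 'L') {              // start of an object type "Lname;"
                int end = descriptor.indexOf(';', i);
                names.add(descriptor.substring(i + 1, end));
                i = end;                                    // skip the name so its letters are not rescanned
            }
        }
        return names;
    }

    /** One constraint T:l1 = T:l2 for each class type T in the descriptor. */
    static List<String> constraints(String descriptor, String l1, String l2) {
        List<String> result = new ArrayList<>();
        for (String t : referencedClasses(descriptor)) {
            result.add(t + ":" + l1 + " = " + t + ":" + l2);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical: <C, L1> calls Spoofed Delegated.g(int[], String), where Delegated is defined by L2.
        System.out.println(constraints("([ILjava/lang/String;)LSpoofed;", "L1", "L2"));
    }
}
```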

The constraint set {T:L1 = T:L2, T:L2 = T:L3} means that T must be the same class type in L1 and L2, and in L2 and L3. Even if T is never loaded by L2 during program execution, L1 and L3 cannot load different versions of T. If a loader constraint is violated, a java.lang.LinkageError is thrown. Loader constraints are removed from the constraint set when the corresponding class loaders are garbage collected.
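The two invariant-preserving rules and the transitivity of constraints can be captured in a small model. In this sketch (not the JDK's implementation), loaders are plain strings, the loaded class cache maps `name:initiatingLoader` to a defining loader, and each new constraint merges two loaders into a group that must agree on the class, which is what makes `{T:L1 = T:L2, T:L2 = T:L3}` constrain L1 and L3 as well. `LinkageError` plays the role it has in the text, and `main` replays the scenarios discussed above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified model of the loaded class cache plus loader constraints; not the JDK implementation. */
public class LoaderConstraintModel {
    // Loaded class cache: "className:initiatingLoader" -> defining loader.
    private final Map<String, String> cache = new HashMap<>();
    // Per class name, groups of loaders that must agree; merging groups makes constraints transitive.
    private final Map<String, List<Set<String>>> groups = new HashMap<>();

    private Set<String> groupOf(String name, String loader) {
        for (Set<String> g : groups.getOrDefault(name, List.of())) {
            if (g.contains(loader)) return g;
        }
        Set<String> single = new HashSet<>();
        single.add(loader);
        return single;
    }

    /** The defining loader already chosen within a group, or null; the invariant keeps all choices equal. */
    private String chosenDefiner(String name, Set<String> group) {
        for (String loader : group) {
            String definer = cache.get(name + ":" + loader);
            if (definer != null) return definer;
        }
        return null;
    }

    /** Rule 1: a new cache entry must not violate any existing constraint. */
    public void addLoaded(String name, String initiating, String definer) {
        String chosen = chosenDefiner(name, groupOf(name, initiating));
        if (chosen != null && !chosen.equals(definer)) {
            throw new LinkageError("loading " + name + " in " + initiating + " violates a loader constraint");
        }
        cache.put(name + ":" + initiating, definer);
    }

    /** Rule 2: a new constraint name:l1 = name:l2 must hold for every class already in the cache. */
    public void addConstraint(String name, String l1, String l2) {
        Set<String> g1 = groupOf(name, l1);
        Set<String> g2 = groupOf(name, l2);
        String d1 = chosenDefiner(name, g1);
        String d2 = chosenDefiner(name, g2);
        if (d1 != null && d2 != null && !d1.equals(d2)) {
            throw new LinkageError("constraint " + name + ":" + l1 + " = " + name + ":" + l2 + " cannot hold");
        }
        List<Set<String>> list = groups.computeIfAbsent(name, k -> new ArrayList<>());
        list.remove(g1);
        list.remove(g2);
        Set<String> merged = new HashSet<>(g1);
        merged.addAll(g2);
        list.add(merged);
    }

    static boolean fails(Runnable action) {
        try {
            action.run();
            return false;
        } catch (LinkageError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Constraint first; L2 then loads its own Spoofed, so L1 may not load a different one.
        LoaderConstraintModel m = new LoaderConstraintModel();
        m.addConstraint("Spoofed", "L1", "L2");
        m.addLoaded("Spoofed", "L2", "L2");
        System.out.println(fails(() -> m.addLoaded("Spoofed", "L1", "L1")));       // true

        // Both loaders already hold different Spoofed classes: the constraint fails at once.
        LoaderConstraintModel late = new LoaderConstraintModel();
        late.addLoaded("Spoofed", "L1", "L1");
        late.addLoaded("Spoofed", "L2", "L2");
        System.out.println(fails(() -> late.addConstraint("Spoofed", "L1", "L2"))); // true

        // {T:L1 = T:L2, T:L2 = T:L3}: L1 and L3 must agree even though L2 never loads T.
        LoaderConstraintModel chain = new LoaderConstraintModel();
        chain.addConstraint("T", "L1", "L2");
        chain.addConstraint("T", "L2", "L3");
        chain.addLoaded("T", "L1", "X");
        System.out.println(fails(() -> chain.addLoaded("T", "L3", "Y")));           // true
    }
}
```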

Alternative solutions

An alternative solution proposes that method overriding also be based on dynamic types rather than static (name-based) types. This approach uses the dynamic notion of type uniformly from link time onward. The following code illustrates the difference between this model and ours:

class <Super, L1> {
    void f(Spoofed x) { ...code1... }
}
class <Sub, L2> extends <Super, L1> {
    void f(Spoofed x) { ...code2... }
}

Assume that L1 and L2 define different versions of Spoofed. Saraswat argues that the methods f in Super and Sub have different type signatures: the parameter type of Super.f is <Spoofed, L1>, whereas the parameter type of Sub.f is <Spoofed, L2>. Under this model, Sub.f therefore does not override Super.f.

In our model, if Main is loaded by L2, a LinkageError is raised when f is called. This behavior is very similar to that of the alternative model, in which a NoSuchMethodError is raised.

The difference between the two models becomes apparent when Main is loaded by L1. In our model, the call to f then invokes code2, and a LinkageError is raised when code2 attempts to access any field or method of Spoofed. In the alternative model, Sub.f does not override Super.f, so the call silently runs code1. In our opinion, failing in this case is better than silently running code that should not be executed. Programmers who write the Super and Sub classes above expect Sub.f to override Super.f, as the semantics of the Java programming language say it does, and the alternative model violates that expectation.

Conclusion

We have introduced the concept of class loaders in the Java platform. Class loaders combine four desirable features: lazy loading, type-safe linkage, multiple namespaces, and user extensibility. Type safety, in particular, requires special attention, and we have shown how to preserve it without restricting the power of class loaders. Class loaders are a simple yet powerful mechanism that has proven very valuable in managing software components.