Summary: Java executes very efficiently, reaching roughly half the speed of C, the fastest language; among the major programming languages it trails only C, Rust, and C++. Behind that high peak performance, however, Java has notoriously poor startup performance, which is the source of most of the impression that Java is clunky and slow. High peak performance and fast startup may seem contradictory, but this article explores whether you can have both.

Author: Liang Xi | Source: Alibaba technology WeChat official account

High performance and a fast startup: can you have both?

As an object-oriented programming language, Java stands out for its excellent performance.

The report "Energy Efficiency Across Programming Languages: How Do Energy, Time, and Memory Relate?" examines the execution efficiency of various programming language implementations, albeit in a limited range of scenarios.

As the table in that report shows, Java performs very efficiently, at roughly half the speed of C, the fastest language; among the major programming languages it trails only C, Rust, and C++.

Java’s excellent performance is due to the very good JIT compilers in HotSpot. The Server Compiler (C2) is the work of Dr. Cliff Click and uses the Sea-of-Nodes model. Over time, this technology has come to represent the state of the art in industry:

  • The well-known TurboFan compiler in V8 (the JavaScript engine) uses the same design, implemented in a more modern way;
  • When HotSpot uses Graal via JVMCI as its JIT, performance is almost the same as C2’s;
  • Azul’s commercial product replaced the C2 compiler in HotSpot with LLVM, and its peak performance merely matched C2.

Behind the high peak performance, Java’s poor startup performance is just as striking, and it is the source of most of the impression that Java is clunky and slow. High performance and fast startup may seem contradictory, but this article will explore whether you can have both.

The root cause of slow Java startup

1. Complex framework

JakartaEE is the new name Oracle gave J2EE after donating it to the Eclipse Foundation. The J2EE specification was released in 1999, and EJB (Enterprise JavaBeans) defined the security, IoC, AOP, transaction, concurrency, and other capabilities required for enterprise development. The design was extremely complex; even the most basic application required a large number of configuration files, making it very inconvenient to use.

With the rise of the Internet, EJB was gradually replaced by the more lightweight and flexible Spring framework, which became the de facto standard for Java enterprise development. While more lightweight, Spring is still heavily influenced by JakartaEE at its core: the heavy use of XML configuration in earlier versions, the heavy use of JakartaEE-related annotations (such as JSR 330 dependency injection), and the use of specifications such as the JSR 340 Servlet API.

Spring remains, at heart, an enterprise-class framework. Let’s look at some of the Spring framework’s design philosophies:

  • Provide choice at every level: Spring lets you defer design decisions as late as possible;
  • Accommodate diverse perspectives: Spring is flexible and not opinionated about how things must be done; it supports a wide range of application needs from different perspectives;
  • Maintain strong backward compatibility.

Under this design philosophy there is inevitably a great deal of configuration and initialization logic, along with complex design patterns to support the flexibility. Here’s an experiment:

Let’s run a spring-boot web Hello World and inspect the classes it loads with -verbose:class:

$ java -verbose:class -jar myapp-1.0-SNAPSHOT.jar | grep spring | head -n 5
[Loaded org.springframework.boot.loader.Launcher from file:/Users/yulei/tmp/myapp-1.0-SNAPSHOT.jar]
[Loaded org.springframework.boot.loader.ExecutableArchiveLauncher from file:/Users/yulei/tmp/myapp-1.0-SNAPSHOT.jar]
[Loaded org.springframework.boot.loader.JarLauncher from file:/Users/yulei/tmp/myapp-1.0-SNAPSHOT.jar]
[Loaded org.springframework.boot.loader.archive.Archive from file:/Users/yulei/tmp/myapp-1.0-SNAPSHOT.jar]
[Loaded org.springframework.boot.loader.LaunchedURLClassLoader from file:/Users/yulei/tmp/myapp-1.0-SNAPSHOT.jar]
$ java -verbose:class -jar myapp-1.0-SNAPSHOT.jar | egrep '^\[Loaded' > classes
$ wc classes
    7404   29638 1175552 classes

The number of classes reached a staggering 7,404.

For contrast with the JavaScript ecosystem, let’s write a basic application using the popular Express:

const express = require('express')
const app = express()
const port = 3000

app.get('/', (req, res) => {
  res.send('Hello World!')
})

app.listen(port, () => {
  console.log(`Example app listening at http://localhost:${port}`)
})

We can use Node’s module-debugging environment variable to analyze the loading:

$ NODE_DEBUG=module node app.js 2>&1 | head -n 5
MODULE 18614: looking for "/Users/yulei/tmp/myapp/app.js" in ["/Users/yulei/.node_modules","/Users/yulei/.node_libraries","/usr/local/Cellar/node/14.4.0/lib/node"]
MODULE 18614: load "/Users/yulei/tmp/myapp/app.js" for module "."
MODULE 18614: Module._load REQUEST express parent: .
MODULE 18614: looking for "express" in ["/Users/yulei/tmp/myapp/node_modules","/Users/yulei/tmp/node_modules","/Users/node_modules","/node_modules","/Users/yulei/.node_modules","/Users/yulei/.node_libraries","/usr/local/Cellar/node/14.4.0/lib/node"]
MODULE 18614: load "/Users/yulei/tmp/myapp/node_modules/express/index.js" for module "/Users/yulei/tmp/myapp/node_modules/express/index.js"
$ NODE_DEBUG=module node app.js 2>&1 | grep ': load "' > js
$ wc js
      55     392    8192 js

This relies on a paltry 55 JS files.

It isn’t entirely fair to compare Spring Boot with Express. In the Java world it is possible to build applications on lighter frameworks such as Vert.x and Netty, but in practice people almost automatically choose Spring Boot to enjoy the benefits of the Java open source ecosystem.

Compile once, run everywhere

Is Java slow to start because the frameworks are complex? Framework complexity is indeed one of the reasons: by combining GraalVM’s Native Image feature with Spring Native, a spring-boot application’s startup time can be shortened by roughly a factor of ten.

Java’s slogan is “Write Once, Run Anywhere” (WORA), which Java achieves through bytecode and virtual machine technology.

WORA enables developers to build and debug applications on macOS and quickly deploy them to Linux servers. The cross-platform story also makes Maven’s central repository easier to maintain, which helped the Java open source ecosystem flourish.

Let’s take a look at what WORA costs Java:

  • Class Loading

Java organizes source code into classes, and classes are packed into JAR packages for modular organization and distribution; a JAR is essentially a ZIP file:

$ jar tf slf4j-api-1.7.25.jar | head
META-INF/
META-INF/MANIFEST.MF
org/slf4j/
org/slf4j/event/
org/slf4j/event/EventConstants.class
org/slf4j/event/EventRecodingLogger.class
org/slf4j/event/Level.class

Each JAR package is a functionally independent module, so a developer can depend on a specific JAR for specific functionality; the JVM learns about these JARs through the class path and loads classes from them.

Class loading is triggered when a new or invokestatic bytecode is executed. The JVM hands control to the class loader; the most common implementation, URLClassLoader, iterates over the JARs to find the matching class file:

for (int i = 0; (loader = getNextLoader(cache, i)) != null; i++) {
    Resource res = loader.getResource(name, check);
    if (res != null) {
        return res;
    }
}

Therefore, the cost of searching for a class is usually proportional to the number of JAR packages. In large applications the class path may contain thousands of JARs, making the search time substantial.

When a class file is found, the JVM needs to verify that the file is valid and parse it into an internally usable data structure called InstanceKlass.

$ javap -p SimpleMessage.class
public class org.apache.logging.log4j.message.SimpleMessage
    implements org.apache.logging.log4j.message.Message,
               org.apache.logging.log4j.util.StringBuilderFormattable,
               java.lang.CharSequence {
  private static final long serialVersionUID;
  private java.lang.String message;
  private transient java.lang.CharSequence charSequence;
  public org.apache.logging.log4j.message.SimpleMessage();
  public org.apache.logging.log4j.message.SimpleMessage(java.lang.String);
  ...
}

This structure contains the interfaces, the base class, static data, the object layout, method bytecode, the constant pool, and so on. Both the interpreter and the JIT compiler need these data structures to execute bytecode.

  • Class initialization

Once a class is loaded, it must be initialized before an object is actually created or a static method is called. Class initialization can be thought of simply as running a static block:

public class A {
    private final static String JAVA_VERSION_STRING = System.getProperty("java.version");
    private final static Set<Integer> idBlackList = new HashSet<>();
    static {
        idBlackList.add(10);
        idBlackList.add(65538);
    }
}

The initialization of the first static field above, JAVA_VERSION_STRING, also becomes part of the static block once compiled to bytecode.

Class initialization has the following features:

  • It executes only once;
  • When multiple threads access a class concurrently, only one thread performs the initialization, and the JVM guarantees that all other threads block until initialization completes.

These properties are ideal for reading configuration, or for building the data structures and caches needed at runtime, so many classes end up with fairly complex initialization logic.
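These guarantees are easy to observe. The sketch below (the class and field names are our own, modeled on the example above) races two threads at a class whose static block counts its own executions; the JVM ensures the block runs exactly once:

```java
import java.util.HashSet;
import java.util.Set;

public class Main {
    static class Config {
        static int initCount = 0; // incremented by the static block below
        static final Set<Integer> ID_BLACK_LIST = new HashSet<>();
        static {
            initCount++;
            ID_BLACK_LIST.add(10);
            ID_BLACK_LIST.add(65538);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Both threads trigger initialization of Config at (almost) the same time.
        Runnable touch = () -> System.out.println("size = " + Config.ID_BLACK_LIST.size());
        Thread t1 = new Thread(touch);
        Thread t2 = new Thread(touch);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // The JVM guarantees the static block ran exactly once.
        System.out.println("initCount = " + Config.initCount);
    }
}
```

However many threads race on Config, initCount is always 1: the losing threads simply block until the winner finishes the static block.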

  • Just-In-Time (JIT) compilation

Once initialized, a Java class can be instantiated and its methods invoked. At first the bytecode is interpreted, which resembles a big loop around a switch..case and performs poorly:

while (true) {
    switch (bytecode[pc]) {
        case AALOAD:
            ...
            break;
        case ATHROW:
            ...
            break;
    }
}
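To make the dispatch loop concrete, here is a runnable toy stack-machine interpreter (the opcodes are invented for illustration, not real JVM bytecodes); note how every single operation pays the cost of the switch dispatch, which is what JIT compilation eliminates:

```java
public class Main {
    // Toy opcodes for a minimal stack machine (illustrative, not JVM bytecodes).
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    // The dispatch loop: fetch an opcode, switch on it, repeat.
    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0, pc = 0;
        while (true) {
            switch (code[pc++]) {
                case PUSH: stack[sp++] = code[pc++]; break;
                case ADD:  stack[sp - 2] += stack[sp - 1]; sp--; break;
                case MUL:  stack[sp - 2] *= stack[sp - 1]; sp--; break;
                case HALT: return stack[sp - 1];
            }
        }
    }

    public static void main(String[] args) {
        // Computes (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // 20
    }
}
```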

Let’s run a Hessian-serialization micro benchmark with JMH:

$ java -jar benchmarks.jar hessianIO
Benchmark                      Mode  Cnt       Score   Error  Units
SerializeBenchmark.hessianIO  thrpt        118194.452         ops/s
$ java -Xint -jar benchmarks.jar hessianIO
Benchmark                      Mode  Cnt       Score   Error  Units
SerializeBenchmark.hessianIO  thrpt          4535.820         ops/s

The -Xint flag on the second run restricts us to the interpreter; the 26-fold gap is the difference between direct machine-code execution and interpreted execution. The gap depends heavily on the workload; in our experience it is commonly around 50x.

Let’s take a closer look at JIT behavior:

$ java -XX:+PrintFlagsFinal -version | grep CompileThreshold
     intx Tier3CompileThreshold                     = 2000                                {product}
     intx Tier4CompileThreshold                     = 15000                               {product}

These are the values of two internal JIT thresholds in the JDK. We will not go into the principle of tiered compilation here; see Stack Overflow for details. Tier 3 can be read loosely as the client compiler (C1) and Tier 4 as C2: a method is compiled by C1 after roughly 2,000 interpreted invocations, and recompiled by C2 after roughly 15,000 more, finally reaching the “half the speed of C” performance cited at the beginning of this article.

In the initial application startup phase, the methods have not been fully JIT compiled, so most of the execution remains interpreted, which affects the speed of application startup.

How to optimize the startup speed of Java applications

We’ve spent a lot of time analyzing the main reasons why Java applications start slowly:

  • Influenced by JakartaEE, the common frameworks are designed for reuse and flexibility, and are therefore complex;
  • To remain cross-platform, code is loaded and compiled dynamically, which costs loading and execution time at startup.

These two things combine to make Java applications slow to start.

Python and JavaScript also parse and load modules dynamically, and CPython doesn’t even have a JIT, so in theory they shouldn’t start much faster than Java; but they are rarely paired with such complex application frameworks, so their overall startup doesn’t feel like a problem.

While we can’t easily change the way users use the framework, we can enhance it at the runtime level so that startup performance is as close to Native Image as possible. The official OpenJDK community has also been working on startup performance issues, so can we, as Java developers, take advantage of the latest OpenJDK features to help us improve our startup performance?

  • Class loading: JarIndex addresses the class-search problem, but the technique is too old to apply directly to modern projects built on Tomcat or a fat JAR; AppCDS addresses the cost of parsing class files;
  • Class initialization: OpenJDK 9 added Heap Archive, which can persist heap data related to class initialization, but it only accelerates a few internal JDK classes (such as IntegerCache) and exposes no public way to use it;
  • JIT warm-up: JEP 295 implements AOT compilation, but it has bugs, improper use can cause correctness problems, and it has never been properly tuned; in most cases it has no effect and can even regress performance.

Alibaba Dragonwell has developed and optimized each of these technologies and integrated them with the cloud product to make it easy for users to optimize startup time without much effort.

1 AppCDS

CDS (Class Data Sharing) first appeared in Oracle JDK 1.5. AppCDS was introduced in Oracle JDK 8u40 with support for classes outside the JDK, but only as a commercial feature. Oracle later contributed AppCDS to the community, and by JDK 10 CDS had gradually improved, including support for user-defined class loaders (also known as AppCDS v2).

Object-oriented languages bind objects (data) to methods (operations on the data) to provide encapsulation and polymorphism. In Java, as in Python, these features depend on type information in the object header. A Java object is laid out in memory as follows:

+-------------+
|  mark       |
+-------------+
|  Klass*     |
+-------------+
|  fields     |
|             |
+-------------+

mark records the object’s state, including lock status, GC age, and so on, while Klass* points to InstanceKlass, the data structure describing the object’s type:

//  InstanceKlass layout:
//    [C++ vtbl pointer           ] Klass
//    [java mirror                ] Klass
//    [super                      ] Klass
//    [access_flags               ] Klass
//    [name                       ] Klass
//    [methods                    ]
//    [fields                     ]
...

With this structure, an expression such as o instanceof String has enough information to be evaluated. Note that InstanceKlass is a complex structure containing all the class’s methods, fields, and so on, with the methods holding the bytecode. It is obtained by parsing the class file at runtime, which also verifies the bytecode for safety (method bytecode not generated by javac can easily crash the JVM).

CDS can dump these parsed data structures to a file and reuse them on the next run. The dump artifact is called a Shared Archive and carries the .jsa suffix (Java Shared Archive).
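Concretely, the classic AppCDS workflow on JDK 10/11 has three steps (myapp.jar and the file names are placeholders; the flags follow the JDK documentation of that era): record the loaded classes on a trial run, dump the archive, then run with the archive:

```shell
# 1. Trial run: record every class the application loads
java -XX:DumpLoadedClassList=myapp.lst -jar myapp.jar

# 2. Dump: parse those classes once and write the shared archive (myapp.jsa)
java -Xshare:dump -XX:SharedClassListFile=myapp.lst \
     -XX:SharedArchiveFile=myapp.jsa --class-path myapp.jar

# 3. Run with the archive: class parsing is replaced by an mmap of myapp.jsa
java -Xshare:on -XX:SharedArchiveFile=myapp.jsa -jar myapp.jar
```

Later JDKs simplify this (JDK 13 added -XX:ArchiveClassesAtExit to merge the first two steps), but the trace-dump-replay shape stays the same.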

To reduce the cost of reading the .jsa dump and avoid deserializing the data back into InstanceKlass, the storage layout of the .jsa file is exactly the same as the in-memory layout of InstanceKlass. Using the archive then only requires mapping the file into memory and pointing the type pointer in the object header at that address, which is very efficient.

Object:
+-------------+
|  mark       |          +----------------------------+
+-------------+          | classes.jsa file           |
|  Klass*     +--------->| java_mirror|super|methods  |
+-------------+          | java_mirror|super|methods  |
|  fields     |          | java_mirror|super|methods  |
|             |          +----------------------------+
+-------------+

Where AppCDS falls short: custom class loaders

The InstanceKlass stored in the .jsa is the product of parsing class files. For the boot class loader (which loads the classes under jre/lib/rt.jar) and the system (app) class loader (which loads the classes on -classpath), CDS has an internal mechanism that skips reading the class file and simply matches the corresponding data structure in the .jsa by class name.

Java also provides a mechanism for custom class loaders: by overriding ClassLoader.loadClass(), users obtain fully customized loading logic, such as fetching classes over the network or generating them dynamically in code. To keep AppCDS safe and avoid handing back unexpected classes from the archive, AppCDS with a custom class loader goes through the following steps:

  1. Call the user-defined ClassLoader.loadClass() to obtain the class byte stream;
  2. Compute a checksum of the byte stream and compare it with the checksum of the same-named entry in the .jsa;
  3. If they match, return the InstanceKlass from the .jsa; otherwise take the slow path and parse the class file.

In many scenarios we’ve seen, step 1 above takes the bulk of the class-loading time, and AppCDS is powerless there. For example:

bar.jar
 +- com/bar/Bar.class
 
baz.jar
 +- com/baz/Baz.class
 
foo.jar
 +- com/foo/Foo.class

The class path contains the three JAR packages above. When loading the class com.foo.Foo, most class loader implementations (including URLClassLoader and those of Tomcat and Spring Boot) choose the simplest strategy (premature optimization is the root of all evil): try to extract the com/foo/Foo.class entry from each JAR, in the order the JARs appear on the class path.

JAR packages are stored in ZIP format, so each class load traverses the JARs on the class path, attempting to extract a single file from each ZIP, to guarantee that an existing class is eventually found. With N JAR packages, an average class load attempts to access N/2 ZIP files.

In one of our real scenarios, with N around 2,000, the JAR-lookup overhead was very large, far exceeding the cost of InstanceKlass parsing. AppCDS alone is inadequate for such scenarios.

JAR Index

According to the JAR file specification, a JAR is a ZIP archive that stores its meta information as text under the META-INF directory. The format’s designers anticipated the lookup scenario above; the mechanism is called JAR Index.

Let’s say we’re looking for a class among the aforementioned bar.jar, baz.jar, and foo.jar. If the class name com.foo.Foo alone could tell us which JAR to open, we could avoid the scanning overhead.

JarIndex-Version: 1.0

foo.jar
com/foo

bar.jar
com/bar

baz.jar
com/baz

The index file INDEX.LIST above can be generated with the JAR Index tooling. Loaded into memory, it becomes a HashMap:

com/bar --> bar.jar
com/baz --> baz.jar
com/foo --> foo.jar

When we see the class name com.foo.Foo, the package name com.foo leads us through the index straight to foo.jar, from which the class file can be extracted immediately.
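A minimal sketch of that lookup (the class and method names here are ours, not the JDK’s implementation): once the index is a package-to-JAR map, finding the right JAR is a single hash lookup instead of an O(N) scan of ZIP files:

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    // In-memory form of INDEX.LIST: package prefix -> JAR that contains it.
    static final Map<String, String> INDEX = new HashMap<>();
    static {
        INDEX.put("com/bar", "bar.jar");
        INDEX.put("com/baz", "baz.jar");
        INDEX.put("com/foo", "foo.jar");
    }

    // Resolve a fully qualified class name to the JAR holding it.
    static String jarFor(String className) {
        String path = className.replace('.', '/');
        String pkg = path.substring(0, path.lastIndexOf('/'));
        return INDEX.get(pkg); // null means fall back to scanning every JAR
    }

    public static void main(String[] args) {
        System.out.println(jarFor("com.foo.Foo")); // foo.jar
    }
}
```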

The Jar Index technology seems to solve our problem, but the technology is too old to be used in modern applications:

  • jar -i generates the index from the Class-Path attribute in META-INF/MANIFEST.MF, which is hardly maintained in modern projects;
  • Only URLClassLoader supports JAR Index;
  • The indexed JARs must appear as early as possible on the classpath.

Dragonwell uses agent injection to ensure that INDEX.LIST is generated correctly and appears in the appropriate place on the classpath, helping applications improve startup performance.

2 Pre-initializing classes

The execution of the code in the static block of a class is called class initialization. After the class has been loaded, the initialization code must be executed before it can be used (creating an instance, calling a static method).

The initialization of many classes is essentially the construction of static fields:

class IntegerCache {
    static final Integer cache[];
    static {
        Integer[] c = new Integer[size];
        int j = low;
        for (int k = 0; k < c.length; k++)
            c[k] = new Integer(j++);
        cache = c;
    }
}

The JDK caches the commonly used range of boxed values; to avoid allocating duplicate objects, this data is constructed ahead of use. Because class initializers execute only once, they run purely in the interpreter. If we could persist these static fields and skip the class initializer call altogether, we would get pre-initialized classes and a shorter startup.
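The effect of IntegerCache is easy to observe: boxed values in the cached range (by default -128 to 127) are shared objects, while larger values are freshly allocated, so reference comparison distinguishes them:

```java
public class Main {
    // true if boxing v twice yields the same object, i.e. v hit IntegerCache
    static boolean sameRef(int v) {
        return Integer.valueOf(v) == Integer.valueOf(v);
    }

    public static void main(String[] args) {
        System.out.println(sameRef(100));   // inside the cached range: same object
        System.out.println(sameRef(1000));  // outside the cache: distinct objects
    }
}
```

(The upper bound of the cache can be raised with -XX:AutoBoxCacheMax; the results above assume default settings.)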

The most efficient way to load persistence into memory is memory mapping:

#include <fcntl.h>
#include <sys/mman.h>

int fd = open("archive_file", O_RDONLY);
struct person *persons = mmap(NULL, 100 * sizeof(struct person),
                              PROT_READ, MAP_PRIVATE, fd, 0);
int age = persons[5].age;

C manipulates memory almost directly, while a high-level language like Java abstracts memory into objects, with meta information such as mark and Klass* that changes from run to run, so more sophisticated techniques are needed for efficient object persistence.

Introducing Heap Archive

OpenJDK 9 introduced the Heap Archive capability, and OpenJDK 12 put it to formal use. As the name suggests, Heap Archive persists objects on the heap.

The object graph is built ahead of time and placed into the archive; we call this phase the dump. Consuming the data in an archive is called runtime. Dump and runtime are usually different processes, though in some scenarios they can be the same process.

Recall the memory layout with AppCDS. The Klass* pointer to the object points to the data in the SharedArchive. AppCDS persists the InstanceKlass meta-information. If a persistent object is to be reused, the type pointer to the object header must also point to the persisted meta-information. Therefore, the HeapArchive technology relies on AppCDS.

To accommodate multiple scenarios, OpenJDK’s HeapArchive also provides two levels: Open and Closed:

The allowed references between the two levels and the heap are:

  • Objects in the Closed Archive may not reference objects in the Open Archive or in the heap; they can be referenced but never written to;
  • Objects in the Open Archive may reference any object and are writable.

The reason for this is that for some read-only structures, placing them in a Closed Archive can be completely overhead free to GC.

Why read only? Imagine that object A in the Closed Archive refers to object B in the heap, and when object B moves, the GC needs to correct the field in A that points to B, resulting in GC overhead.

Pre-initializing classes with Heap Archive

With this structure supported, class initialization can be completed by pointing the static variable to the archived object after class loading:

class Foo {
    static Object data;   ----+
}                             |
                              v
                     Open Archive object

Object:
+-------------+
|  mark       |          +----------------------------+
+-------------+          | classes.jsa file           |
|  Klass*     +--------->| java_mirror|super|methods  |
+-------------+          | java_mirror|super|methods  |
|  fields     |          | java_mirror|super|methods  |
|             |          +----------------------------+
+-------------+

3 AOT compilation

Setting class loading aside, bytecode is initially interpreted because a method’s first executions happen before the JIT compiler has compiled it. As analyzed in the first half of this article, interpreted execution is a few dozen times slower than JIT-compiled code, and this slow interpreted phase is another major contributor to slow startup.

Traditional languages such as C/C++ compile directly to native machine code for the target platform. As the warm-up problem of JIT-based languages such as Java and JS became widely recognized, compiling bytecode directly to native code ahead of time (AOT) gradually came into public view.

Wasm, GraalVM, and OpenJDK all support AOT compilation to varying degrees; we focused our startup-speed work on the jaotc tool introduced in JEP 295.

Note the terminology: JEP295 uses AOT to compile the methods in a class file one by one into native code snippets, using the Java virtual Machine to replace the method’s entry into the AOT code after loading a class. GraalVM’s Native Image features more thorough static compilation, with SubstrateVM, a small runtime written in Java code that is statically compiled along with application code into executable files (similar to Go), no longer dependent on the JVM. This practice is also AOT, but for the sake of terminology, AOT here refers solely to the JEP295 way.

First experience with AOT features

Following the introduction in JEP 295, we can get a quick taste of AOT.
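A minimal session, adapted from the example in JEP 295 (the HelloWorld class name is a placeholder; this requires a JDK 9-15 build that still ships jaotc, and later builds also need the experimental flag unlocked):

```shell
$ javac HelloWorld.java
$ jaotc --output libHelloWorld.so HelloWorld.class
$ java -XX:+UnlockExperimentalVMOptions -XX:AOTLibrary=./libHelloWorld.so HelloWorld
```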

The jaotc command invokes the Graal compiler to compile the bytecode, producing libHelloWorld.so. It’s tempting to assume the resulting .so is called into directly, like a JNI library, but the code does not run through the full ld loading mechanism; the .so acts more like a container for native code. The HotSpot runtime performs further dynamic linking after loading the AOT .so; once a class is loaded, HotSpot automatically wires up the AOT code entry, and subsequent calls to the method use the AOT version. AOT-generated code also actively interacts with the HotSpot runtime, jumping among AOT code, the interpreter, and JIT-compiled code.

1) The twists and turns of AOT

It would seem that JEP 295 delivered a complete AOT system, so why hasn’t the technology been used at scale? Among OpenJDK’s newer features, AOT has fared among the worst.

2) Multiple Classloaders

JDK-8206963: bug with multiple class loaders

The design did not take Java’s multi-class-loader scenario into account: when multiple class loaders load same-named classes that use AOT, the classes’ static fields are shared, whereas by the design of the Java language this data must be kept separate.

Since there was no quick fix, OpenJDK simply added the following code:

ClassLoaderData* cld = ik->class_loader_data();
if (!cld->is_builtin_class_loader_data()) {
  log_trace(aot, class, load)("skip class %s for custom classloader %s (%p) tid=" INTPTR_FORMAT,
                              ik->internal_name(), cld->loader_name(), cld, p2i(thread));
  return false;
}

That is, AOT is simply not allowed for user-defined class loaders. Here we can already glimpse the feature’s gradual loss of maintenance at the community level.

Classes specified via -classpath can still use AOT, but common frameworks such as Spring Boot and Tomcat load application code through custom class loaders. It’s fair to say this change cut a big chunk out of AOT.

3) Lacking tuning and maintenance, demoted to an experimental feature

JDK-8227439: Turn off AOT by default

JEP 295 AOT is still experimental, and while it can be useful for startup/warmup when used with custom generated archives tailored for the application, experimental data suggests that generating shared libraries at a module level has overall negative impact to startup, dubious efficacy for warmup and severe static footprint implications.

To enable AOT, an experimental flag must now be added:

java -XX:+UnlockExperimentalVMOptions -XX:AOTLibrary=...

According to the issue, the feature hurts both startup speed and memory footprint when an entire module is compiled. Our analysis of the reasons:

  • The Java language itself is too complex; runtime mechanisms such as dynamic class loading keep AOT code from running as fast as expected;
  • As a phased project, AOT saw little maintenance after Java 9 and lacked the necessary tuning (AppCDS, by contrast, was iteratively optimized).

4) Removed in JDK 16

JDK-8255616: Disable AOT and Graal in Oracle OpenJDK

On the eve of the OpenJDK 16 release, Oracle officially decided to stop maintaining the technology:

We haven’t seen much use of these features, and the effort required to support and enhance them is significant.

The root cause, again, is the lack of necessary optimization and maintenance. As for future AOT plans, we can only infer from the discussion that there are two possible technical directions for Java AOT:

  • AOT built on OpenJDK’s C2;
  • Full Java language support in GraalVM’s Native Image, with AOT users gradually migrating from OpenJDK to Native Image.

Neither direction is expected to bear fruit in the short term, so Dragonwell’s direction is to make the existing JEP 295 machinery work well and give users the best possible startup performance.

5) Quick start on Dragonwell

Dragonwell’s quick-start feature addresses the weaknesses of AppCDS and AOT compilation, and adds class pre-initialization based on the Heap Archive mechanism. Together these features eliminate almost all of the startup time visible to the JVM.

In addition, because they all fit the trace-dump-replay usage pattern, Dragonwell consolidated these startup-acceleration technologies and integrated them into the SAE product.

SAE X Dragonwell: Serverless with Java Startup acceleration Best practices

Good ingredients still need the right recipe, and a master cook.

Combining Dragonwell’s startup-acceleration technology with Serverless, famous for its elasticity, and with full-lifecycle management of microservice applications, helps bring end-to-end startup times down. That is why Dragonwell chose SAE as the landing place for its startup-acceleration technology.

SAE (Serverless Application Engine) is the first PaaS platform for Serverless that can:

  • Java package deployment: enjoy microservice capabilities with zero code changes, reducing R&D cost;
  • Extreme Serverless elasticity: maintenance-free resources and rapid scale-out of application instances, reducing operations and learning costs.

1 Difficulty Analysis

Through our analysis, we found that users of microservices face some challenges at the application startup level:

  • Package size: hundreds of MB, even GB;
  • Dependencies: hundreds of dependency JARs, thousands of classes;
  • Loading time: from reading dependency packages off disk to on-demand class loading, loading can account for up to half of the total startup time.

With Dragonwell’s quick start capability, SAE provides a set of best practices for Serverless Java applications to launch as quickly as possible, enabling developers to focus more on business development:

  • Java environment + JAR/WAR package deployment: Dragonwell 11 is integrated to provide an accelerated startup environment;
  • One-click JVM settings: quick start can be enabled with a single switch, simplifying operation;
  • NAS shared storage: supports cross-instance acceleration, speeding up the startup of new instances and of batch releases after a new package is deployed.

2 Acceleration effect

We selected typical demos and internal applications from microservice and complex-dependency scenarios and measured the startup improvement: application startup time generally dropped by 5% to 45%. If application startup has the following characteristics, the acceleration is most pronounced:

  • Heavy class loading (spring-petclinic loads about 12,000+ classes at startup);
  • Little dependence on external data.

3 Customer Cases

Alibaba search recommendation Serverless platform

Alibaba’s internal search & recommendation Serverless platform deploys multiple services in the same Java virtual machine through a class-loading isolation mechanism. The scheduling system places service code into idle containers on demand, so multiple services share the same resource pool, greatly improving deployment density and overall CPU utilization.

To support many different business teams, the platform itself must provide rich functionality such as caching and RPC invocation, so every JVM on the platform first has to bring up a middleware isolation container such as Pandora Boot, which loads a large number of classes and slows the platform’s own startup. Container startup time becomes especially important when a surge of demand arrives and the scheduling system must bring up more containers for business code.

Based on Dragonwell’s quick-start technology, the platform performs AppCDS, JarIndex, and other optimizations in the pre-release environment and bakes the resulting archive files into the container image, so every container enjoys accelerated startup, cutting startup time by roughly 30%.

Extreme elasticity for a brand retailer on SAE

An external customer rapidly iterated a popular store app using the Dragonwell 11 JAR-deployment environment provided by SAE.

Facing rapid business growth, the customer leaned on SAE’s extreme Serverless elasticity, scaling on application metrics such as QPS and RT, to handle surges of more than tenfold with ease. Turning on Dragonwell’s enhanced one-click AppCDS startup acceleration cut Java application startup time by more than 20%, further speeding up scale-out and keeping the service running smoothly.

Summary

The quick-start direction on Dragonwell is built entirely on the OpenJDK community’s work, with targeted enhancements and bugfixes that make it easier to use. This keeps things standards-compliant, avoids internal customization, and gives back to the open source community.

As base software, Dragonwell by itself can only generate and consume archive files on disk. SAE’s seamless integration with Dragonwell automates the JVM configuration and the distribution of archive files, so customers can enjoy the benefits of application startup acceleration with ease.

About the authors: Liang Xi, of the Alibaba Cloud Java Virtual Machine team, is responsible for the Java runtime direction and has led the development and large-scale rollout of Java coroutines, startup optimization, and other technologies. The co-author, of the Alibaba Cloud SAE team, is responsible for runtime evolution, elasticity, and efficiency, and leads the development of application elasticity, Java acceleration, image acceleration, and other technologies.

The original link

This article is ali Cloud original content, shall not be reproduced without permission.