By Byte Mobile Technology — Yang Hao
Introduction to the Dart VM
The Dart VM (Dart Virtual Machine) is a set of components that allow Dart code to run natively. The core components are as follows:
- Runtime System
- Core Libraries
- Development Experience Components
- JIT (Just-in-Time) and AOT (Ahead-of-Time) compilation pipelines
- Interpreter
- ARM Simulator
This article focuses on several common compilation patterns for Dart code on the Dart VM:
- from source or kernel binary using the JIT;
- from snapshots:
  - from an AppJIT snapshot;
  - from an AppAOT snapshot.
Dart VM Isolate
In the Dart VM, all Dart code runs in an isolate. Each isolate is independent, with its own memory, a main thread, and helper threads; isolates do not affect one another. Multiple isolates may run in the same Dart VM, but they do not share data directly. Instead, they communicate with each other through ports (not to be confused with network ports).
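As a minimal sketch of port-based communication, using the standard dart:isolate API (the function and variable names here are illustrative):

import 'dart:isolate';

// Entry point for the child isolate. It receives a SendPort and
// replies through it: isolates share no memory, only messages.
void childMain(SendPort replyTo) {
  replyTo.send('Hello from another isolate!');
}

Future<void> main() async {
  final receivePort = ReceivePort();
  // Spawn a second isolate and hand it a port to answer on.
  await Isolate.spawn(childMain, receivePort.sendPort);
  print(await receivePort.first); // Hello from another isolate!
}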
From the structure shown in the figure, it is not difficult to see that an isolate mainly includes the following parts:
- Heap: stores all objects created as the Dart code runs; managed by the GC threads.
- Mutator thread: the main thread that executes the Dart code.
- Helper threads: threads that the Dart VM uses to manage and optimize the isolate.
We can also see that the VM has a special VM isolate, which holds globally shared constant data. Although isolates cannot reference each other's data, every isolate can reference data stored in the VM isolate.
Let’s take a closer look at the relationship between isolates and OS threads. The relationship is complex and depends on the platform and on how the VM is embedded into the application, but three things are certain:
- An OS thread can enter only one isolate at a time. To enter another isolate, it must first exit the current one.
- An isolate can have only one mutator thread associated with it at a time. The mutator thread executes Dart code and invokes the VM's public C API.
- An isolate can be associated with multiple helper threads, such as JIT compiler threads and GC threads.
The Dart VM maintains a global thread pool internally to manage OS threads. All requests to create a thread are described as a ThreadPool::Task; for example, the GC posts a SweeperTask when it reclaims memory. The thread pool first checks whether an idle OS thread is available in the pool. If so, it is reused directly; if not, a new thread is created.
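The real pool lives in the VM's C++ sources; the following Dart sketch only models the reuse-or-create policy described above, with all names hypothetical.

class Worker {
  // In the real VM a worker wraps an OS thread; here it simply
  // runs the task synchronously to keep the model small.
  void execute(void Function() task) => task();
}

class SimpleThreadPool {
  final List<Worker> _idle = [];

  void run(void Function() task) {
    // Reuse an idle worker if one is available, otherwise create one.
    final worker = _idle.isNotEmpty ? _idle.removeLast() : Worker();
    worker.execute(task);
    // Return the worker to the pool for later reuse.
    _idle.add(worker);
  }
}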
Running from Source via JIT
dart <filename>.dart is often used to execute a Dart source file directly. A simple example is as follows:

// hello.dart
main() => print('Hello, World!');

// execute it in the command line
$ dart hello.dart
Hello, World!
So how does the Dart VM work in this mode? In fact, since Dart 2 the Dart VM no longer works directly on source code, but on intermediate kernel binaries (.dill files that contain a serialized kernel AST). Translating source code into a kernel binary is done by the Common Front-End (CFE), a tool in the Dart SDK that is also shared by several other tools (the VM, dart2js, the Dart Dev Compiler, etc.).
The figure above briefly shows how Dart code is compiled before it reaches the VM. To make it easy for users to run a Dart file directly from source, the VM starts a helper isolate called kernel service, which uses the CFE to compile the source code into a kernel binary and then sends it to the VM for execution.
This is not the only way to combine the CFE and the VM. In Flutter, for example, they are separated: the CFE runs on the development machine, and the resulting kernel file is handed over to the VM running on the target device for execution.
The following figure shows how Dart code is executed in debug mode. Looking closely at the process, we can see that when flutter_tool is started to run Dart code, the source-to-kernel compilation is not done by flutter_tool itself, but by a persistent process it starts called frontend_server. frontend_server is in fact just a thin wrapper around the CFE, with Flutter's kernel-to-kernel transformations added. The resulting kernel binary is sent via flutter_tool to the Flutter engine on the device for execution.
The persistence of frontend_server plays an important role in hot reload: because it keeps the CFE state from the previous compilation, a hot reload only needs to recompile the changed parts instead of recompiling everything.
The VM loads the kernel binary and deserializes it into its corresponding object model, but the process is lazy (as shown below): at first only basic information about libraries and classes is loaded into entities in the heap. Each entity keeps a pointer back to the binary it was generated from, so that more information can be loaded later when it is needed.
For example, when the runtime needs to instantiate a class or look up one of its members, it uses this pointer to find the corresponding binary and fill in the full class information. At this stage all fields of the class are loaded, but only the signatures of its methods; each function body remains lazy and is fully loaded only when it is first used. By then there is enough information for the runtime to resolve and invoke the class's methods.
Initially each function has no actual executable code. Instead, it holds a placeholder pointing to LazyCompileStub (shared by all functions), which asks the runtime to generate executable code for the current function and link it to the function. Each subsequent call then executes the generated code directly.
The function is first compiled by the unoptimizing pipeline, which applies no optimizations; its purpose is to produce executable code quickly. It consists of two phases:
- The function body (a serialized AST) in the kernel binary is translated into a control flow graph (CFG). The CFG is composed of basic blocks, each consisting of intermediate language (IL) instructions. These IL instructions resemble those of a stack machine: the basic pattern is to pop operands from the stack, perform the operation, and push the result back (see the sketch after this list).
- The IL instructions are translated directly into machine code, without any optimization.
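As a rough illustration only (the real IL is internal to the VM and is not exposed in this form), here is a trivial function and the kind of stack-based pseudo-IL the unoptimizing pipeline might produce for its body:

int add(int a, int b) => a + b;

// Illustrative pseudo-IL for the body, not actual VM output:
//   Push a            // push the first operand onto the stack
//   Push b            // push the second operand
//   InstanceCall '+'  // pop both operands, call '+', push the result
//   Return            // pop the result and return it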
The unoptimizing pipeline does not statically resolve any calls that were left unresolved in the kernel binary. It treats all unresolved calls as fully dynamic and binds them at run time using inline caching.
The implementation of inline caching consists of the following:
- Call-site-specific cache: a RawICData object is created for each unresolved call site. It records the classes seen so far and the methods that should be called for them, as well as some auxiliary information such as call counts.
- Inline cache stub: this stub is shared by all caches because the logic is the same. It first performs a linear lookup in the call site's cache; if a matching class is found, the corresponding method is invoked directly. If not, it calls into the runtime to resolve the call and updates the cache, so the runtime does not have to be called the next time the same class is encountered.
For a simple example, you can see below that a cache is associated with the InlineCacheStub code at the dynamic call site animal.toFace().
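For reference, here is a small self-contained version of the kind of code that produces such a call site (the class names follow the example above; the face strings are made up):

abstract class Animal {
  String toFace();
}

class Cat extends Animal {
  @override
  String toFace() => '=^.^=';
}

class Dog extends Animal {
  @override
  String toFace() => 'U.x.U';
}

void main() {
  final animals = <Animal>[Cat(), Dog(), Cat()];
  for (final animal in animals) {
    // A single dynamic call site that sees both Cat and Dog. Its
    // inline cache accumulates entries like (Cat -> Cat.toFace,
    // Dog -> Dog.toFace), so repeated calls skip the runtime lookup.
    print(animal.toFace());
  }
}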
Since the unoptimizing pipeline performs no optimizations, the generated code compiles quickly but executes inefficiently. For this reason, the VM also provides an optimizing pipeline that adapts to the profile generated as the program runs.
The first thing to know is that unoptimized programs collect the following information at runtime:
- the class information accumulated in the inline cache of each dynamic call site;
- call counters for each function (to track how often a function is called).
When the number of calls to a function reaches a certain threshold, the function is sent to the background optimizing compiler, a helper thread, for optimization.
The optimization process of the function is as follows:
- As in the unoptimizing pipeline, the serialized AST in the kernel binary is converted into a CFG composed of IL instructions. However, the function has already run many times and has built up useful inline caches, so these can be used directly without being rebuilt.
- The IL is converted into static single assignment (SSA) form (each variable is assigned exactly once and must be defined before it is used). IL in this form is convenient for the subsequent optimization analyses.
- The SSA-form IL is optimized, including speculative optimizations based on the collected profile as well as Dart-specific optimizations (e.g. inlining, range analysis, type propagation, representation selection, store-to-load and load-to-load forwarding, global value numbering, allocation sinking, etc.).
- The optimized IL is converted into machine code, using linear scan register allocation. Its advantage is that code is generated quickly, which is why it is commonly used in JIT compilers.
When the optimization is complete, the background compiler asks the main thread to enter a safepoint (so that the background compiler thread can operate safely) and then links the optimized code to the corresponding function. The next call to the function executes the optimized code.
There is also a technique called on-stack replacement (OSR), which simply means replacing an old stack frame with a new one. OSR allows optimized code to be put to use immediately, even while the function is still running, by replacing the function's original stack frame with one for the optimized code.
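For example, a function like the following benefits from OSR: the loop becomes hot during a single invocation, so waiting for the next call to swap in optimized code would be too late (a sketch; the loop bound is arbitrary).

// The loop gets hot while sum() is still running, so the VM uses
// OSR to replace the running frame with one for the optimized
// code instead of waiting for sum() to be called again.
int sum(int n) {
  var total = 0;
  for (var i = 0; i < n; i++) {
    total += i;
  }
  return total;
}

void main() => print(sum(100000000));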
As you can see from the figure above, the optimizing pipeline also records a deoptimization ID at every point where execution might need to fall back to unoptimized code. What is this used for? Let's start with an example:
void printAnimal(obj) {
  print('${obj.toString()}');
}

// Call printAnimal(...) a lot of times with an instance of Cat.
// As a result printAnimal(...) will be optimized under the
// assumption that obj is always a Cat.
for (var i = 0; i < 50000; i++)
  printAnimal(Cat());

// Now call printAnimal(...) with a Dog - the optimized version
// can not handle such an object, because it was
// compiled under the assumption that obj is always a Cat.
// This leads to deoptimization.
printAnimal(Dog());
In this example the loop always passes a Cat object to printAnimal(), so the optimizer makes a speculative assumption based on this profile: obj is always a Cat. Based on this assumption, the dynamic call site obj.toString() is optimized from an inline-cached call into a simple check that obj is a Cat followed by a direct call to Cat.toString().
But this assumption is broken by the final call, when printAnimal() receives a Dog object, which no longer fits the assumption that obj is always a Cat. At this point the optimized code can no longer be used, so we need to deoptimize, that is, fall back to the unoptimized version. This is where the deoptimization ID comes into play: the VM uses it to find the matching position in the unoptimized code and continue execution from there. (This procedure must be done with great care, because the function may be in the middle of running and may already have produced side effects.)
After deoptimization, the stale optimized code is usually discarded, and the function is re-optimized later based on the new information.
From this example, we can see that when an optimization is based on an assumption that can be violated, two things must be guaranteed:
- It must be possible to detect all potential violations. A common practice is to check the assumption the optimization is based on before using the optimized result. In the example above, the method call on obj is optimized, but obj is still checked to be a Cat before the direct call is made (sketched after this list).
- It must be possible to recover when a violation occurs. Once an assumption is found to be invalid, every optimization based on it becomes invalid, so the runtime needs to locate all stale optimized code and discard it. If the execution stack still contains stale code, lazy deoptimization marks it first and deoptimizes it when execution returns to it.
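To make the first point concrete, here is a conceptual Dart sketch of what the speculatively optimized printAnimal() amounts to. The deoptimize() helper is hypothetical; in the real VM the guard failure jumps back into unoptimized code at the recorded deoptimization ID, and the check itself is emitted as machine code.

class Cat {
  @override
  String toString() => 'Cat';
}

// Hypothetical stand-in for the VM's jump back into unoptimized code.
Never deoptimize() => throw UnsupportedError('resume in unoptimized code');

void printAnimalOptimized(Object obj) {
  // Guard: verify the assumption the optimization was based on.
  if (obj is! Cat) deoptimize();
  // Devirtualized call: Cat.toString() is invoked directly,
  // with no inline-cache lookup.
  print(obj.toString());
}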
Running from Snapshots
In addition to starting from Dart source code, the Dart VM can also start from a snapshot file, which is much faster.
This is because the VM can serialize the data in the heap (more precisely, the object graph) into a binary snapshot and quickly restore the original data structures from that snapshot file. When an isolate starts up, it therefore no longer needs to parse and build everything from source, which greatly reduces startup time.
Initially, snapshots did not contain machine code (as shown in the figure above), but this capability was later added for AOT mode, so a snapshot can now provide not only quickly restorable data structures but also executable code. The structure looks something like the figure below; the machine code sections are already in binary form, so they need no special serialization or deserialization.
Now that you know what a snapshot is, what are the common scenarios for using one?
Running from AppJIT snapshots
An AppJIT snapshot is one snapshot type; it can be used to reduce the startup time of common Dart tools. Tools like dartanalyzer and dart2js are sizable programs in their own right, and on small projects it often takes longer to warm up the tool itself than to do the tool's actual work, which is far from ideal.
AppJIT snapshots solve this problem well: run the time-consuming tool on the VM once with some training data, serialize the generated machine code and the internally built data structures into an AppJIT snapshot, and from then on start the tool from the snapshot instead of from source. The tool still runs in JIT mode and keeps optimizing based on the real workload, so there is no need to worry that a tool trained on simulated data is unsuitable for real data.
It is easy to see from the following example that an AppJIT snapshot significantly improves startup performance.
# Run from source.
$ dart pkg/compiler/lib/src/dart2js.dart -o hello.js hello.dart
Compiled 7,359,592 characters Dart to 10,620 characters JavaScript in 2.07 seconds
Dart file (hello.dart) compiled to JavaScript: hello.js

# Training run to generate the AppJIT snapshot.
$ dart --snapshot-kind=app-jit --snapshot=dart2js.snapshot \
    pkg/compiler/lib/src/dart2js.dart -o hello.js hello.dart
Compiled 7,359,592 characters Dart to 10,620 characters JavaScript in 2.05 seconds
Dart file (hello.dart) compiled to JavaScript: hello.js

# Run from the AppJIT snapshot.
$ dart dart2js.snapshot -o hello.js hello.dart
Compiled 7,359,592 characters Dart to 10,620 characters JavaScript in 0.73 seconds
Dart file (hello.dart) compiled to JavaScript: hello.js
The flutter_tools tool in Flutter is stored in a cache directory as such a snapshot. Every time the flutter command is invoked, the data is parsed directly from the snapshot and quickly loaded into the VM, which greatly improves the tool's startup time.
Running from AppAOT snapshots
An AppAOT snapshot is also a snapshot type, but it is very different from an AppJIT snapshot because it is produced in AOT mode, which no longer supports any JIT features. This implies two things:
- An AppAOT snapshot must contain executable code for every function that might be called while the program runs, because no code can be compiled on demand as it can with the JIT.
- None of the executable code may rely on assumptions that could be violated at run time.
To meet these requirements, type flow analysis (TFA) was introduced into the AOT compilation pipeline. It performs a global static analysis of the code: finding all reachable code, identifying which classes are instantiated, tracing how types flow through variables, and so on. All of this analysis is conservative; that is, it favors correctness over the performance-oriented speculation of the JIT (the JIT can always fall back to unoptimized code to preserve correctness, while AOT-compiled code cannot).
The VM statically optimizes the code based on the TFA results, for example discarding unreachable functions and statically resolving method calls using the inferred type information. All reachable functions are then compiled to executable code using the same backend as the JIT, but without any speculative optimizations.
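As a small illustration of what this static resolution enables, here is a hedged sketch: assuming TFA can prove that Cat is the only Animal subtype ever instantiated in the whole program, the dynamic call below can be devirtualized at compile time (the classes are illustrative, reusing names from the earlier example).

abstract class Animal {
  String toFace();
}

class Cat implements Animal {
  @override
  String toFace() => '=^.^=';
}

// The only place an Animal is ever created: TFA sees that every
// Animal in this program is in fact a Cat.
Animal makeAnimal() => Cat();

void main() {
  // With that knowledge, this call can be bound directly to
  // Cat.toFace() ahead of time, with no runtime lookup or guard.
  print(makeAnimal().toFace());
}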
When all functions have been compiled, an AppAOT snapshot of the heap can be generated. The VM provides a special precompiled runtime to run this snapshot; compared with the JIT runtime, it leaves out many components that are unnecessary without a JIT.
Here is an example of using Dart's AOT mode. (Release mode in Flutter compiles Dart code using this same AOT mode of the Dart VM.)
# Need to build normal dart executable and runtime for running AOT code.
$ tool/build.py -m release -a x64 runtime dart_precompiled_runtime

# Now compile an application using AOT compiler
$ pkg/vm/tool/precompiler2 hello.dart hello.aot

# Execute AOT snapshot using runtime for AOT code
$ out/ReleaseX64/dart_precompiled_runtime hello.aot
Hello, World!
Now let’s talk about another problem. We mentioned earlier that method calls can be statically resolved using TFA information, but some dynamic call sites still cannot be resolved statically because the available information is too limited. For these, the precompiled runtime uses a mechanism called switchable calls, which is in fact an extension of the inline caching we discussed earlier.
Recall that the two most important parts of an inline cache are the cache associated with the call site and the stub that executes the lookup logic. In JIT mode only the cache is updated; the stub is fixed and shared. In AOT mode, however, the stub is no longer shared: like the cache, it is associated with the call site, and both can be replaced.
Initially a call site is in the unlinked state, and its cache holds only the method name. When the first call occurs, the stub asks the runtime to look up the corresponding method.
The call site then enters the monomorphic state. Its cache now stores a class C together with the resolved method C.method. On subsequent calls the stub checks whether the receiver is an instance of exactly class C; if so, it calls the method directly, otherwise the call site moves to the next state. There are two cases.
In the first case, the receiver is an instance of a subclass of class C, and that subclass does not override the method, so C.method is still the right target. The call site then enters the single target state: the cache stores the lower and upper bounds of a class-id (cid) range, and the stub checks the receiver's cid against it. (A cid is an integer ID assigned to each class in depth-first order during AOT compilation, so a class and its non-overriding subclasses D0, ..., Dn occupy a contiguous range.) If C.cid <= classId(obj) <= max(D0.cid, ..., Dn.cid), then calling C.method on the object is valid.
In the second case, the method resolved for the receiver is no longer C.method, or a miss occurs in the single target state, and a new target has to be looked up. The call site then enters the IC state, which is very similar to the inline caching described earlier: the cache stores the resolved methods for different classes, and the stub performs a linear lookup over them.
The cache's array grows as more classes are seen, and when it reaches a certain threshold the call site enters the megamorphic state, which replaces the array with a dictionary-like structure.
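As a conceptual sketch of the single target check described above (the helper and parameter names are hypothetical; the real check is emitted directly as machine code):

// Class ids are assigned depth-first during AOT compilation, so a
// class C and its non-overriding subclasses D0..Dn occupy one
// contiguous cid range. The single target stub therefore only
// needs a range check on the receiver's cid.
bool singleTargetHit(int receiverCid, int lowerCid, int upperCid) {
  // In range: C.method can be invoked directly.
  // Out of range: a miss, so the call site switches state.
  return lowerCid <= receiverCid && receiverCid <= upperCid;
}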
The Dart VM is a core component of Flutter. I hope to see more articles about the Dart VM in the future.
Reference
Introduction to Dart VM