Source: mrale.ph/dartvm/

PS: The content is fairly involved, so read it accordingly.

The Dart VM is a collection of components used to execute Dart code locally, including the following:

  • Run-time system
    • The object model
    • Garbage collection
    • The snapshot
  • Native methods for core libraries
  • Components accessible through the service protocol: debugging, profiling, hot reload
  • Just-in-time (JIT) and ahead-of-time (AOT) compilation pipelines
  • Interpreter
  • ARM emulator

The Dart VM is a virtual machine in the sense that it provides an execution environment for a high-level programming language, but that doesn’t mean Dart always needs to be interpreted or JIT compiled when executed on the Dart VM.

For example, Dart code can be compiled into machine code by the Dart VM’s AOT pipeline and then executed in a tailored version of the Dart VM. This is called the precompiled runtime, which does not contain any compiler components and cannot load Dart source code dynamically.

How does the Dart VM run your code?

The Dart VM has several ways of executing code, such as:

  • In JIT mode, from source code or from kernel binaries;
  • From snapshots:
    • From an AOT snapshot;
    • From an AppJIT snapshot;

The main difference lies in when and how the VM converts Dart source code into executable code; the runtime environment that supports the execution stays the same.

Any Dart code in the VM runs within some isolate, which can be described as an isolated Dart universe with its own memory (heap) and usually its own thread of control (the mutator thread).

The VM can have many isolates executing Dart code at the same time, but they cannot share any state directly and can only communicate by passing messages through ports (not to be confused with network ports!).
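
As a minimal Dart sketch of this message-passing model (the worker function and message contents are illustrative):

import 'dart:isolate';

// Each isolate has its own heap; the only way to communicate with
// another isolate is to send messages through ports.
void worker(SendPort replyTo) {
  replyTo.send('Hello from another isolate!');
}

Future<void> main() async {
  final inbox = ReceivePort();
  await Isolate.spawn(worker, inbox.sendPort);
  print(await inbox.first); // Prints the message sent by the worker.
}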

The relationship between OS threads and isolates is a little fuzzy and highly dependent on how the VM is embedded into an application, but the main things to keep in mind are the following:

  • An OS thread can enter only one isolate at a time. If it wants to enter another isolate, it must first leave the current one;
  • Only a single mutator thread can be associated with an isolate at a time. The mutator thread is the thread that executes Dart code and uses the VM’s public C API.

However, the same OS thread can enter one isolate to execute Dart code, then leave that isolate and enter another isolate to continue executing. Alternatively, many different OS threads can enter an isolate and execute Dart code in it, just not all at the same time.

Of course, in addition to a single mutator thread, an isolate can also have multiple helper threads associated with it, for example:

  • A background JIT compiler thread;
  • GC sweeper threads;
  • Concurrent GC marker threads;

Internally, the VM uses a thread pool (dart::ThreadPool) to manage OS threads, and the code is structured around the dart::ThreadPool::Task concept rather than around the concept of an OS thread.

For example, instead of spawning a dedicated thread to perform background sweeping after a GC, the VM posts a dart::ConcurrentSweeperTask to the global VM thread pool, and the pool either reuses an idle thread or spawns a new one when no idle threads are available. Similarly, the default implementation of the isolate message-handling event loop does not actually spawn a dedicated event-loop thread; instead it posts a dart::MessageHandlerTask to the thread pool whenever a new message arrives.

dart::Isolate represents an isolate, dart::Heap represents the isolate’s heap, and dart::Thread describes the state associated with a thread attached to an isolate.

Note that the name Thread can be a bit confusing, because all OS threads attached to the same isolate as a mutator reuse the same Thread instance. For the default implementation of isolate message handling, see Dart_RunLoop and dart::MessageHandler.

Running source code through the JIT

This section describes what happens when Dart is executed from the command line:

// hello.dart
main() => print('Hello, World!');

$ dart hello.dart
Hello, World!

The Dart 2 VM no longer has the ability to execute Dart directly from raw source; instead it expects kernel binaries (also known as dill files) that contain a serialized kernel AST. The task of translating Dart source into a kernel AST is handled by the common front-end (CFE), which is written in Dart and shared across different Dart tools (for example, the VM, dart2js, and the Dart Dev Compiler).
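
For illustration, with a recent Dart SDK you can produce and run a kernel binary yourself (treat the exact CLI invocation as an assumption, not something from the original article):

$ dart compile kernel hello.dart   # emits hello.dill, a serialized kernel AST
$ dart hello.dill
Hello, World!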

To keep the convenience of executing Dart directly from source, a helper isolate called the kernel service is hosted, which compiles Dart source into a kernel binary; the VM then runs the resulting kernel binary.

However, this setup is not the only way to arrange the CFE and the VM to run Dart code. For example, Flutter completely separates compilation to kernel from execution from kernel and places them on different devices: compilation happens on the developer machine (the host), while execution happens on the target mobile device, which receives the kernel binaries sent to it by the flutter tool.

Note here that the flutter tool does not handle the parsing of Dart itself. Instead, it spawns another persistent process, frontend_server, which is essentially a thin wrapper around the CFE plus some Flutter-specific kernel-to-kernel transformations.

frontend_server compiles the Dart source code into kernel files, which the flutter tool then sends to the device. frontend_server also comes into play when the developer requests a hot reload: in that case it can reuse the CFE state from the previous compilation and recompile only the libraries that actually changed.

Once a kernel binary is loaded into the VM, it is parsed to create objects representing the various program entities. This is done lazily: at first only basic information about libraries and classes is loaded, and each entity deserialized from a kernel binary keeps a pointer back into the binary so that more information can be loaded later as needed.

We use the Untagged prefix whenever we refer to objects allocated internally by the VM, because this follows the VM’s own naming convention: the layout of internal VM objects is defined by C++ classes whose names start with Untagged, in the header file runtime/vm/raw_object.h. For example, dart::UntaggedClass is the VM object describing a Dart class, and dart::UntaggedField is the VM object describing a Dart field.

Information about a class is fully deserialized only when the runtime needs it (for example, to look up a class member, to allocate an instance, and so on). At that stage the class’s members are read from the kernel binary; however, function bodies are still not deserialized at this stage, only their signatures are.

At this point enough information has been loaded from the kernel binary for methods to be successfully resolved and invoked at runtime; for example, functions in the main library can be looked up and called.

package:kernel/ast.dart defines the classes describing the kernel AST; package:front_end handles parsing Dart source code and building the kernel AST from it. dart::kernel::KernelLoader::LoadEntireProgram is the entry point for deserializing a kernel AST into the corresponding VM objects. The kernel service isolate is implemented in Dart, and runtime/vm/kernel_isolate.cc glues that Dart implementation to the rest of the VM. package:vm hosts most of the kernel-based VM-specific functionality, such as various kernel-to-kernel transformations; some VM-specific transformations still live in package:kernel for historical reasons.

Initially all functions have a placeholder rather than actual executable code for their bodies: they point to the LazyCompileStub, which simply asks the runtime system to generate executable code for the current function and then tail-calls that newly generated code.

The first time a function is compiled, it is done through an unoptimized compiler.

The unoptimized compiler produces machine code in two passes:

  • 1. The serialized AST of the function body is walked to generate a control flow graph (CFG) of the function body, which consists of basic blocks filled with intermediate language (IL) instructions. The IL instructions used at this stage resemble those of a stack-based virtual machine: they take operands from the stack, perform operations, and push the results back onto the same stack.

Not all functions actually have a Dart/kernel AST body, for example native functions defined in C++ or artificial tear-off functions generated by the Dart VM; in these cases the IL is simply created out of thin air rather than generated from a kernel AST.

  • 2. The generated CFG is compiled directly into machine code using one-to-many lowering of IL instructions: each IL instruction expands into several machine instructions.

No optimizations are performed at this stage, and the primary goal of an unoptimized compiler is to generate executable code quickly.

This also means the unoptimized compiler makes no attempt to statically resolve any calls that are not already resolved in the kernel binary. The VM currently does not use dispatch based on virtual tables or interface tables; instead it implements dynamic calls with inline caching.

The original implementation of inline caching actually patched the native code of the function, hence the name inline caching. The idea dates back to Smalltalk-80; see Efficient Implementation of the Smalltalk-80 System.

The core idea behind inline caching is to cache the results of method resolution at each specific call site. The inline caching mechanism used by the VM consists of:

  • A call-site-specific cache (a dart::UntaggedICData instance) that maps the receiver’s class to the method that should be invoked when the receiver is of a matching class. The cache also stores auxiliary information, such as invocation frequency counters that track how often a given class occurs at this call site;

  • A shared lookup stub that implements the fast path of a method call. The stub searches the given cache for an entry matching the receiver’s class. If an entry is found, the stub increments its frequency counter and tail-calls the cached method. Otherwise the stub invokes a runtime helper that implements the method resolution logic; if resolution succeeds, the cache is updated so that subsequent calls do not need to enter the runtime system.

The following illustrates the structure and state of an inline cache associated with an animal.toFace() call site after it has been executed twice with an instance of Dog and once with an instance of Cat.
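
A minimal Dart sketch of that call site (the class and method names follow the description above and are purely illustrative):

class Dog {
  String toFace() => 'woof';
}

class Cat {
  String toFace() => 'meow';
}

void main() {
  final animals = <dynamic>[Dog(), Dog(), Cat()];
  for (final animal in animals) {
    // The inline cache at this toFace() call site ends up with two entries:
    // Dog -> Dog.toFace (count: 2) and Cat -> Cat.toFace (count: 1).
    print(animal.toFace());
  }
}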

The unoptimized compiler by itself is sufficient to execute any Dart code, but the code it produces is rather slow, which is why the VM also implements an adaptive optimizing compilation pipeline. The idea behind adaptive optimization is to use the execution profile of a running program to drive optimization decisions.

When unoptimized code runs, it collects the following information:

  • As mentioned above, inline caches collect information about the receiver types observed at call sites;
  • Execution counters associated with functions, and with basic blocks inside functions, track hot regions of the code;

When the execution counter associated with a function reaches a certain threshold, the function is submitted to the background optimizing compiler for optimization.

Optimized compilation starts the same way as unoptimized compilation: by walking the serialized kernel AST to build unoptimized IL for the function being optimized.

However, instead of lowering that IL directly into machine code, the optimizing compiler first converts the unoptimized IL into static single assignment (SSA) form. The SSA-based IL is then speculatively specialized based on the collected type feedback and passed through a sequence of classical and Dart-specific optimizations, such as:

  • Inlining;
  • Range analysis;
  • Type Propagation;
  • Representation Selection;
  • Store-to-load and load-to-load forwarding;
  • Global value numbering;
  • Allocation sinking, etc.;
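
For instance, a simple loop like the following gives several of the optimizations listed above something to work on (an illustrative sketch, not taken from the original article):

int sum(List<int> values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    // Range analysis can prove that 0 <= i < values.length holds here,
    // which lets the optimizing compiler remove the list bounds check,
    // while inlining and load forwarding clean up the list access itself.
    total += values[i];
  }
  return total;
}

void main() {
  print(sum(List<int>.generate(1000, (i) => i)));
}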

Finally, the optimized IL is lowered into machine code using a linear scan register allocator and a simple one-to-many lowering of IL instructions.

Once compilation finishes, the background compiler requests the mutator thread to enter a safepoint and then attaches the optimized code to the function.

Broadly speaking, a thread in a managed environment (a virtual machine) is considered to be at a safepoint when the state associated with it (such as stack frames and the heap) is consistent and can be accessed or modified without interruption from the thread itself. Usually this means the thread is either paused or executing code outside of the managed environment, such as unmanaged native code.

The next time this function is called, it will use the optimized code. Some functions contain very long-running loops, and for those it makes sense to switch from unoptimized to optimized code while the function is still running.

This process is called on-stack replacement (OSR), and it gets its name because a stack frame for one version of a function is transparently replaced with a stack frame for another version of the same function.
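
A hypothetical function that would benefit from OSR (purely illustrative):

// This function spends essentially all of its time inside a single
// invocation, so waiting for the *next* call to use optimized code would
// be pointless: OSR lets the VM swap the running unoptimized frame for
// an optimized one in the middle of the loop.
int sumUpTo(int n) {
  var sum = 0;
  for (var i = 0; i < n; i++) {
    sum += i;
  }
  return sum;
}

void main() {
  print(sumUpTo(100000000));
}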

The compiler sources live in the runtime/vm/compiler directory; the compilation pipeline entry point is dart::CompileParsedFunctionHelper::Compile; the IL is defined in runtime/vm/compiler/backend/il.h; kernel-to-IL translation starts in dart::kernel::StreamingFlowGraphBuilder::BuildGraph, which also handles IL construction for various artificial functions; dart::compiler::StubCodeCompiler::GenerateNArgsCheckInlineCacheStub generates machine code for inline cache stubs, while InlineCacheMissHandler handles IC misses; runtime/vm/compiler/compiler_pass.cc defines the optimizing compiler passes and their order; dart::JitCallSpecializer performs most of the specializations based on type feedback.

It is important to note that the code generated by the optimizing compiler is specialized under speculative assumptions derived from the application’s execution profile.

For example, a dynamic call site that has only ever observed instances of class C as a receiver will be converted into a direct call, guarded by a check verifying that the receiver has the expected class C. However, such assumptions can be violated later during program execution:

class Cat {}

class Dog {}

void printAnimal(obj) {
  print('Animal {');
  print('  ${obj.toString()}');
  print('}');
}

void main() {
  // Call printAnimal(...) a lot of times with an instance of Cat.
  // As a result printAnimal(...) will be optimized under the
  // assumption that obj is always a Cat.
  for (var i = 0; i < 50000; i++) {
    printAnimal(Cat());
  }

  // Now call printAnimal(...) with a Dog - the optimized version
  // cannot handle such an object, because it was compiled under
  // the assumption that obj is always a Cat.
  // This leads to deoptimization.
  printAnimal(Dog());
}

Whenever optimized code makes optimistic assumptions that might be violated during execution, it needs to guard against such violations and be able to recover the original execution state when they occur.

This recovery process is known as deoptimization: when the optimized version encounters a situation it cannot handle, it simply transfers execution to the matching point in the unoptimized version of the function and continues from there. The unoptimized version makes no assumptions and can handle all possible inputs.

The VM usually discards the optimized version of a function after deoptimization and later re-optimizes it using updated type feedback.

The VM has two ways of protecting speculative assumptions made by the compiler:

  • Inline checks (such as the CheckSmi and CheckClass IL instructions) verify at the use site that an assumption holds. For example, when converting a dynamic call into a direct call, the compiler inserts these checks immediately before the direct call.
  • Global guards instruct the runtime to discard optimized code when something it depends on changes. For example, the optimizing compiler might observe that a class C is never extended and use this information during type propagation. However, subsequent dynamic code loading or class finalization might introduce a subclass of C; at that point the runtime must find and discard all optimized code that was compiled under the assumption that C has no subclasses. The runtime might find some of this now-invalid code on the execution stack, in which case the affected frames are marked for deoptimization and are deoptimized when execution returns to them. This kind of deoptimization is called lazy deoptimization, because it is delayed until control returns to the optimized code.

The deoptimizer mechanism lives in runtime/vm/deopt_instructions.cc; it is essentially a mini-interpreter for deoptimization instructions, which describe how to reconstruct the required state of the unoptimized code from the state of the optimized code. Deoptimization instructions are generated at compile time by dart::CompilerDeoptInfo::CreateDeoptInfo for every potential deoptimization location in the optimized code.

Running from a Snapshot

The VM is able to serialize an isolate’s heap, or more precisely the graph of objects in the heap, into a binary snapshot, which can then be used to recreate the same state when starting the VM.

The format of the snapshot is low-level and optimized for quick startup: it’s essentially a list of objects to create and instructions on how to wire them together.

The original idea behind snapshots: instead of parsing Dart source and gradually creating internal VM data structures, the VM can spin an isolate up quickly by unpacking all the necessary data structures from a snapshot.

The idea of snapshots comes from Smalltalk images, which in turn were inspired by Alan Kay’s master’s thesis. The Dart VM uses a clustered serialization format, similar to the techniques described in the papers Parcels: a Fast and Feature-Rich Binary Deployment Technology and Clustered Serialization with Fuel.

Machine code was not initially part of snapshots; it was added later during the development of the AOT compiler. The motivation for developing the AOT compiler and snapshots-with-code was to allow the VM to be used on platforms where JIT compilation is impossible due to platform-level restrictions.

Snapshots with code work almost the same as normal snapshots, with one slight difference: they contain a code section which, unlike the rest of the snapshot, does not require deserialization; it is laid out in such a way that it can become part of the heap directly after being mapped into memory.

runtime/vm/clustered_snapshot.cc handles serialization and deserialization of snapshots; a family of API functions named Dart_CreateXyzSnapshot[AsAssembly] is responsible for writing out heap snapshots (for example Dart_CreateAppJITSnapshotAsBlobs and Dart_CreateAppAOTSnapshotAsAssembly); Dart_CreateIsolateGroup can take snapshot data and use it to start an isolate.

Running from an AppJIT snapshot

AppJIT snapshots were introduced to reduce JIT warm-up time for large Dart applications, such as the Dart analyzer or dart2js. When these tools are used on small projects, they can spend as much time on JIT compilation as they spend doing actual work.

AppJIT snapshots solve this problem: an application is run on the VM with some training data, and all the generated code and VM internal data structures are then serialized into an AppJIT snapshot. This snapshot, rather than the source (or kernel binary) form of the application, is what gets distributed.

A VM running from this snapshot can still JIT compile.
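
For illustration, the dart binary supports creating and running such snapshots roughly like this (flag names are an assumption based on the SDK’s documented snapshot support; file names are made up):

$ dart --snapshot-kind=app-jit --snapshot=app.snapshot app.dart training-input
$ dart app.snapshot real-input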

Running from an AppAOT snapshot

AOT snapshots were originally introduced for platforms where JIT compilation is impossible, but they are also useful in situations where fast startup and consistent performance are worth a potential peak-performance penalty.

There is a lot of confusion about the performance characteristics of JIT versus AOT:

  • The JIT has access to the local type information and execution profile of the running application, but it must pay the cost of warm-up;
  • AOT can infer and prove various properties globally (paying for this with compilation time) but has no information about how the program will actually execute; in return, AOT-compiled code reaches its predictable peak performance almost immediately, with virtually no warm-up.

The Dart VM’s JIT currently delivers the best peak performance, while its AOT configuration delivers the best startup time.

The inability to JIT implies that:

  • 1. AOT snapshots must contain executable code for every function that might be invoked during the execution of the application;
  • 2. That executable code must not rely on any speculative assumptions that could be violated during execution;

To satisfy these requirements, the AOT compilation process performs a global static analysis (type flow analysis, or TFA) to determine which parts of the application are reachable from the known set of entry points, which instances of which classes are allocated, and how types flow through the program.

All of these analyses are conservative, meaning they err on the side of correctness; this is in stark contrast to the JIT, which can afford to be speculative because it can always deoptimize into unoptimized code to restore correct behavior.

All potentially reachable functions are then compiled to native code without any speculative optimizations, while type flow information is still used to specialize the code (for example, to devirtualize calls).
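
A small illustrative sketch of what TFA-driven devirtualization can exploit (the class names are made up for this example):

import 'dart:math' as math;

abstract class Shape {
  double area();
}

class Circle implements Shape {
  final double radius;
  Circle(this.radius);

  @override
  double area() => math.pi * radius * radius;
}

class Square implements Shape {
  final double side;
  Square(this.side);

  @override
  double area() => side * side;
}

void main() {
  // Only Circle is ever allocated in this program, so a global analysis
  // can prove that shape.area() always resolves to Circle.area: the call
  // can be devirtualized, and Square can be dropped from the snapshot.
  final Shape shape = Circle(2.0);
  print(shape.area());
}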

Once all functions have been compiled, a snapshot of the heap can be taken. The resulting snapshot can then be run using the precompiled runtime, a special variant of the Dart VM that excludes components such as the JIT and the facilities for dynamic code loading.
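
For illustration, with a recent Dart SDK the whole AOT pipeline can be exercised like this (treat the exact commands as an assumption):

$ dart compile aot-snapshot hello.dart   # runs TFA + AOT compilation, emits hello.aot
$ dartaotruntime hello.aot               # the precompiled runtime
Hello, World!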

package:vm/transformations/type_flow/transformer.dart is the entry point for type flow analysis (TFA) and for the transformations based on its results; dart::Precompiler::DoCompileAll is the entry point for the AOT compilation loop in the VM.

Switchable calls

Even with global and local analysis, AOT-compiled code may still contain calls that cannot be devirtualized (meaning they cannot be resolved statically). To compensate, AOT-compiled code and the runtime use an extension of the JIT’s inline caching technique, called switchable calls.

As described in the JIT section, each inline cache associated with a call site consists of two pieces:

  • A cache object (represented by a dart::UntaggedICData instance);
  • A chunk of native code to invoke (for example an InlineCacheStub);

In JIT mode the runtime only ever updates the cache itself; in AOT mode, however, the runtime can choose to replace both the cache and the native code being invoked, depending on the state of the inline cache.

Initially, all dynamic calls start in the unlinked state. When such a call site is reached for the first time, the SwitchableCallMissStub is invoked, which simply calls the runtime helper DRT_SwitchableCallMiss to link that call site.

DRT_SwitchableCallMiss then attempts to transition the call site into a monomorphic state, where the call site becomes a direct call that enters the method through a special entry point verifying that the receiver has the expected class.

Suppose, for example, that when obj.method() is executed for the first time, obj is an instance of C; then obj.method resolves to C.method.

The next time we execute the same call site, it will call C.method directly, bypassing the method lookup process entirely.

However, it enters C.method through a special entry point that verifies obj is still an instance of C; if it is not, DRT_SwitchableCallMiss is invoked again and tries to pick the next call site state.

C.method might still be a valid target for the call; for example, obj could be an instance of class D which extends C but does not override C.method. In this case we check whether the call site can transition into the single target state, implemented by SingleTargetCallStub (see dart::UntaggedSingleTargetCache).
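
A minimal Dart sketch of that situation (class and method names are illustrative):

class C {
  void method() => print('C.method');
}

// D extends C but does not override method, so C.method stays a valid
// target even when the receiver's class changes from C to D.
class D extends C {}

void callIt(C obj) => obj.method();

void main() {
  callIt(C()); // The call site is linked into the monomorphic state for C.
  callIt(D()); // The receiver class check fails, but the resolved target is
               // still C.method: the call site can move to the
               // single target state instead of a generic inline cache.
}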