This article is based on V8 v9.5.172.

Nearly two years have passed since my last V8 article. During this time, I have been pleased to see more and more discussion of V8 in the Chinese-speaking community. The fly in the ointment is that some content that has been available elsewhere for a long time is still missing from the Chinese community. This article has been extensively restructured and updated from the 2019 edition, aiming to bring you content that is as new and comprehensive as possible. Happy reading!

1. Prepare before reading

Before reading, it is recommended that you prepare d8, V8's developer command-line shell, since several experiments in this article are based on d8.

It is recommended to read Building V8 with GN, download the source code, and compile and build V8 yourself (note that you must specify debug mode when building), so that you can complete all the experiments in this article.

gm x64.debug  # build d8 in debug mode

Alternatively, you can download a prebuilt, out-of-the-box d8 directly and complete most of the experiments in this article.

2. About V8

V8 is a high-performance JavaScript and WebAssembly engine written in C++ that supports ten processor architectures, including the familiar ia32, x64, and arm. On the browser side it powers Chrome and many other Chromium-based browsers; on the server side it is the execution environment for Node.js and Deno.

Release cycle

  • Roughly every four weeks, a new version of V8 is released (four weeks for V8 v9.5 and later, six weeks before that).
  • V8 versions correspond to Chrome versions. For example, V8 v9.5 corresponds to Chrome 95.

Independently named core modules

  • Ignition (baseline compiler)
  • Sparkplug (non-optimizing compiler, faster than Ignition)
  • TurboFan (optimizing compiler)
  • Liftoff (WebAssembly baseline compiler)
  • Orinoco (garbage collector)

Other JavaScript Engines

  • Chakra (formerly the Edge JavaScript engine)
  • JavaScriptCore (Safari)
  • SpiderMonkey (Firefox)

The following is a comparison of the compiler pipelines of V8 and other JavaScript engines. Figure source: JavaScript engines: the good parts.

The JavaScript compiler pipeline is much the same across engines. In V8, the parser converts JavaScript source into an AST, and the baseline compiler compiles the AST into bytecode. Later, when certain conditions are met, the bytecode is compiled by the optimizing compiler into optimized machine code.

3. The complete pipeline

The execution of JavaScript can be simplified to "compile, execute, garbage collect". In this article we focus on compilation and garbage collection, with more space devoted to the compiler pipeline.

The following diagram shows the architectural evolution of the V8 compiler pipeline. Figure source: TurboFan: A new code generation architecture for V8.

Early V8 had only a single compiler, Codegen, which compiled the AST generated by the parser directly into machine code; execution was fast, but the optimizations it could perform were limited.

Two years later, a new compiler pipeline appeared, with the baseline compiler Full-codegen and the optimizing compiler Crankshaft. A baseline compiler cares more about compilation speed, while an optimizing compiler cares more about the execution speed of the compiled code. Combining the two gives JavaScript code faster cold starts and faster execution once optimized.

Although V8 now had both a baseline and an optimizing compiler, the architecture still had many problems. For example, Crankshaft could optimize only a subset of JavaScript; the layers of the compilation pipeline were poorly isolated; and in some cases assembly code had to be written by hand for multiple processor architectures at the same time.

V8 then added another optimizing compiler, TurboFan, alongside Crankshaft. TurboFan's layered design reduces the cost of adapting to new processor architectures, and it was built to optimize ES6 and future language features.

At this point, however, the baseline compiler Full-codegen still generated unoptimized machine code, which occupied about 30% of V8's heap space and was not released even if the code was executed only once, a significant memory cost.

So V8 introduced bytecode and developed a corresponding baseline compiler, Ignition. Bytecode is much more concise and compact than machine code, consuming only about 25 to 50 percent of the memory of the equivalent baseline machine code. Ignition's bytecode can be consumed directly by TurboFan to generate optimized machine code, which simplifies the deoptimization mechanism and makes the overall architecture cleaner and more maintainable. And because bytecode is faster to generate than optimized machine code, Ignition also shortens scripts' cold-start time, which in turn speeds up page loads.

In May 2017, V8 v5.9 switched on the JavaScript compiler pipeline of the baseline compiler Ignition plus the optimizing compiler TurboFan by default, and removed the previous Crankshaft and Full-codegen compilers.

In August 2018, Liftoff was released in V8 v6.9, marking the beginning of V8's first-class support for WebAssembly alongside JavaScript.

In May 2021, V8 v9.1 introduced Sparkplug, a non-optimizing compiler, to speed up execution before the optimizing compiler produces optimized code.

Let’s take a look at how JavaScript is handled in V8 with a snippet of code.

function addTwo(a, b) {
  return a + b
}

4. Parser and AST

When V8 gets the JavaScript code, the first thing it needs to do is parse it, as shown below. Figure source: Blazingly fast parsing, part 1: optimizing the scanner.

The Scanner splits the source into Tokens, which the Parser consumes and turns into an AST. The AST describes the structure of the program and is consumed by the baseline compiler Ignition to generate bytecode.

Because parsing code takes time, JavaScript engines try to avoid fully parsing source code. Moreover, during a visit, a lot of the code on a page is never executed, for example code only triggered by user interaction.

To save unnecessary CPU and memory overhead, all major browsers implement lazy parsing: instead of generating an AST for every function up front, the parser decides for each function it encounters whether to pre-parse it or fully parse it.

Pre-parsing checks the syntax of the source and throws syntax errors, but it does not resolve variable scopes inside the function or generate an AST. Full parsing analyzes the function body and generates the AST data structure corresponding to the source code.
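To get a feel for this, here is a minimal sketch (the exact heuristics vary across V8 versions): V8 treats a parenthesized function expression as "possibly invoked" (a so-called PIFE) and parses it eagerly, while a plain declaration is only pre-parsed at first.

// A minimal sketch of lazy vs. eager parsing; heuristics vary by V8 version.
function lazyFn(a, b) {          // pre-parsed now; fully parsed on first call
  return a + b;
}

var eagerFn = (function (a, b) { // the wrapping parentheses mark a PIFE,
  return a + b;                  // hinting V8 to fully parse it immediately
});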

Those interested in lazy parsing can read the V8 source code for themselves.

We can view the AST of our code through d8. (Note: only a debug-mode d8 can be used here; release mode does not support the --print-ast flag.)

// example.js
function addTwo(a, b) {
  return a + b
}
d8 --print-ast example.js

The following figure shows the output. As you can see, the generated information contains no function addTwo; this is because the code hit lazy parsing. We change the code as follows, adding a line that calls the function, and run the d8 command again.

// example.js
function addTwo(a, b) {
  return a + b
}
addTwo(1, 2)

The following figure shows the output information.

From the AST of addTwo, we can draw the following tree diagram. One subtree is for the parameter declarations, and the other is for the actual function body.

Because of variable hoisting, eval, and so on, there is no way to know during parsing which names correspond to which variables. The parser initially creates VAR PROXY nodes; the later scope-resolution step connects each proxy to its declared VAR node, or marks it as a global or dynamic lookup, depending on whether the parser has seen an eval expression in an enclosing scope.
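A small example of why scope resolution must happen after parsing (the names here are purely illustrative):

function outer() {
  function inner() {
    return x;  // while parsing inner, `x` is only a VAR PROXY:
  }            // it could be outer's `x` or a global
  var x = 1;   // scope resolution later links the proxy to this VAR node
  return inner();
}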

It is worth noting that the AST generated by V8 is somewhat different from the ASTs that AST Explorer shows for ESLint/Babel/TS, as I explained in the previous edition of this article.

5. Baseline compiler Ignition

V8 uses just-in-time (JIT) compilation, with the Ignition baseline compiler quickly generating bytecode for execution.

Bytecode is an abstraction of machine code, independent of the system architecture. V8’s bytecodes can be thought of as small building blocks that can be combined to implement arbitrary JavaScript functionality. V8 reduces memory usage, parsing overhead, and compilation complexity by introducing bytecode.

In V8, generating bytecode from the AST is Ignition's job. Ignition is a register-based interpreter with an accumulator. Its registers are virtual, unlike physical machine registers.

Figure source: Ignition: Jump-starting an Interpreter for V8.

The Ignition bytecode pipeline is shown above. The AST is converted into bytecode by the BytecodeGenerator, an AST walker that implements a bytecode-emission rule for each AST node type, as shown in the figure below.

After that, Ignition applies a series of optimizations to the bytecode. The Register Optimizer removes unnecessary register loads and stores; the Peephole Optimizer rewrites groups of instructions into equivalent but better-performing ones; Dead-code Elimination removes code that can never execute.

With the d8 command, we can obtain the bytecode of the function, as shown in the figure below.

d8 --print-bytecode example.js

When we call addTwo(1, 2), the a0 and a1 registers have already been loaded with the small integers 1 and 2 via LdaSmi. Ldar a1 loads 2 into the accumulator; Add a0, [0] then adds the value in a0 to the accumulator, leaving the new value, 3, in the accumulator; finally, Return returns the value in the accumulator.
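For reference, the core of addTwo's printed bytecode looks roughly like the annotated listing below (a simplified sketch based on the figure; exact output varies across V8 versions):

Ldar a1        ; load argument a1 into the accumulator
Add a0, [0]    ; accumulator = a0 + accumulator, using feedback vector slot 0
Return         ; return the value in the accumulator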

It is worth noting that in Add a0, [0], the [0] is an index into the feedback vector. Profiling information gathered while the bytecode is interpreted is stored in the feedback vector, and it provides the optimization hints that the optimizing compiler TurboFan uses later.

The bytecodes used in Ignition are shown below, all of which are available in the V8 source code for those interested.

Figure source: Ignition: Jump-starting an Interpreter for V8.

In practice, the V8 team found that many functions run only during application initialization, yet their bytecode stays in V8's heap. To reduce memory overhead, V8 v7.4 introduced bytecode flushing: V8 tracks each function's usage, incrementing a per-function counter on every garbage collection and resetting it to zero when the function executes; when the counter exceeds a threshold, the bytecode is reclaimed.

6. The optimizing compiler TurboFan

The more generic code has to be, the worse it performs; conversely, the fewer types the compiler has to account for, the smaller and faster the code it generates.

JavaScript is, famously, a weakly typed language. The ECMAScript standard is full of ambiguity and runtime type checks, and the code generated by the baseline compiler Ignition is not efficient enough.

For example, the operands of the + operator can be integers, floating-point numbers, strings, booleans, and reference types, in all sorts of combinations.

function addTwo(a, b) {
  return a + b;
}
addTwo(2, 3);                 // 5
addTwo(8.6, 2.2);             // 10.8
addTwo("hello ", "world");    // "hello world"
addTwo("true or ", false);    // "true or false"
// ... and many more combinations

But that does not mean JavaScript code cannot be optimized. For a given piece of program logic, the argument types it receives are usually fixed. So TurboFan, V8's optimizing compiler, collects type feedback at run time via inline caches and optimizes hot code into more efficient machine code.

Due to space constraints, inline caching is not expanded on here; see JavaScript engine fundamentals: Shapes and Inline Caches, and my other article, How does V8 work – object representation in V8.

To verify the code optimization process, we changed the test code to:

// example.js
function addTwo (a, b) {
  return a + b;
}

for (let j = 0; j < 100000; j++) {
  if (j < 80000) {
    addTwo(10, 10);
  } else {
    addTwo('hello', 'world');
  }
}

Figure source: A Tale Of TurboFan.

TurboFan's execution pipeline is shown above. Code generation relies on a graph-based IR (intermediate representation) called "Sea of Nodes", a graph that combines control flow and data flow.

D8 also provides a tool to view the Sea of Nodes graph. We first need to run our script file with the –trace-turbo parameter.

d8 --trace-turbo example.js

A .cfg file and turbo-xxx-xx.json files are generated in the current directory. You can then visualize the Sea of Nodes graph using V8's online utility service.

Open the link above, find the Turbolizer for V8 v9.5, import the json file generated in the previous step, and you can view the Sea of Nodes graph in your browser.

The bytecode is translated into the intermediate representation by TurboFan's compiler front end; the compiler back end then produces the final machine code through optimization, instruction selection, instruction scheduling, register allocation, and assembly. For register allocation, TurboFan uses a linear-scan algorithm, which performs better here than graph coloring.

Next, we discuss TurboFan's IR, the compiler's code optimization measures, and hot-code optimization and deoptimization.

TurboFan IR

Figure source: TurboFan JIT Design.

TurboFan introduced a layered compiler design that cleanly separates high-level from low-level compiler optimizations through a layered IR: the JavaScript layer, the intermediate layer (called the Simplified layer in parts of the V8 documentation), and the Machine layer.

What ties TurboFan to a concrete architecture is the Machine layer of the IR, which corresponds to TurboFan's back end. Architecture-specific code needs to be written only once, which improves extensibility and reduces both the coupling between modules and the overall complexity of the system.

The layering diagram is as follows:

For example, suppose there are three features A, B, and C that need to be ported to two processor platforms. Before introducing the IR, 3 × 2 = 6 code implementations are required; after introducing it, only 3 + 2 = 5 are. One is a product and the other a sum, so the benefit of the IR grows dramatically once many features must be implemented across multiple processor architectures.

Compiler code optimizations

Inlining

Inlining expands a small function at its call site, saving function-call overhead; it is especially effective for frequently called functions.

// https://docs.google.com/presentation/d/1UXR1H2elTdAYJJ0Eed7lUctCVUserav9sAYSidxp8YE/edit#slide=id.g284582328f_0_43
function add(x, y) {
  return x + y;
}

function three() {
  return add(1, 2);
}

For example, after the above function is inlined, we get the following function:

function three_add_inlined() {
  var x = 1;
  var y = 2;
  var add_return_value = x + y;
  return add_return_value;
}

Inlining not only reduces function-call overhead; it also makes further optimizations more effective, such as constant folding, strength reduction, redundancy elimination, escape analysis, and scalar replacement.

For example, the above function can be further optimized by constant folding:

function three_add_const_folder() {
  return 3;
}

Escape analysis and scalar replacement

Escape analysis determines whether an object's lifetime is confined to the current function; its result decides whether scalar replacement can be applied.

// https://docs.google.com/presentation/d/1UXR1H2elTdAYJJ0Eed7lUctCVUserav9sAYSidxp8YE/edit#slide=id.g2957a3ab8f_0_292
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
  distance(that) {
    return Math.abs(this.x - that.x) + Math.abs(this.y - that.y);
  }
}

function manhattan(x1, y1, x2, y2) {
  const a = new Point(x1, y1);
  const b = new Point(x2, y2);
  return a.distance(b);
}

After inlining, the function above can be converted to:

function manhattan_inl(x1, y1, x2, y2) {
  const a = {x: x1, y: y1};
  const b = {x: x2, y: y2};
  return Math.abs(a.x - b.x) + Math.abs(a.y - b.y);
}

Escape analysis shows that the lifetimes of a and b are confined to manhattan_inl, so scalar replacement can further optimize the function into:

function manhattan_ea(x1, y1, x2, y2) {
  var a_x = x1;
  var a_y = y1;
  var b_x = x2;
  var b_y = y2;
  return Math.abs(a_x - b_x) + Math.abs(a_y - b_y)
}

Scalar replacement removes unnecessary object property accesses, replacing them with cheaper plain variable accesses. Besides speeding up execution, it also relieves garbage collection pressure, since the objects need not be allocated at all.
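Conversely, here is a hedged counterexample (reusing the Point class above; the variable leaked is invented for this illustration): once an object escapes the function, scalar replacement no longer applies to it.

let leaked;
function manhattan_escapes(x1, y1, x2, y2) {
  const a = new Point(x1, y1);
  leaked = a;                    // `a` outlives the call, so it escapes
  const b = new Point(x2, y2);   // `b` still stays function-local
  return a.distance(b);
}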

Optimization and deoptimization of hot code

Note: the optimization discussed in this section is not directly related to the optimizations in the previous section. Here it refers to whether the hot code is optimized as a whole, not to how a specific pattern is optimized.

For code that executes repeatedly, if several executions pass in arguments of the same types, V8 assumes that every subsequent execution will as well and optimizes the code accordingly. The optimized code keeps basic type checks, and as long as the argument types stay the same on each execution, V8 keeps running the optimized code.

If the assumptions no longer hold, the optimized code cannot be run; V8 then "undoes" the earlier optimization, a step known as deoptimization, and subsequent type feedback takes the unexpected result into account by falling back to more generic handling of the types.

In d8, the optimization and deoptimization processes can be traced with the --trace-opt and --trace-deopt flags, respectively.

// example.js
function addTwo (a, b) {
  return a + b;
}

for (let j = 0; j < 100000; j++) {
  if (j < 80000) {
    addTwo(10, 10);
  } else {
    addTwo('hello', 'world');
  }
}
d8 --trace-opt --trace-deopt example.js
[marking 0x0d4e08293421 <JSFunction (sfi = 0xd4e08293291)> for optimized recompilation, reason: hot and stable]
[compiling method 0x0d4e08293421 <JSFunction (sfi = 0xd4e08293291)> (target TURBOFAN) using TurboFan OSR]
[optimizing 0x0d4e08293421 <JSFunction (sfi = 0xd4e08293291)> (target TURBOFAN) - took 5.114, 11.420,
[bailout (kind: deopt-soft, reason: Insufficient type feedback for call): begin. deoptimizing 0x0d4e08293421 <JSFunction (sfi = 0xd4e08293291)>, opt id 0, node id 66, bytecode offset 65, deopt exit 2, FP to SP delta 96, caller SP 0x7ffeec6d2528, pc 0x0d4e0090536e]
[marking 0x0d4e08293465 <JSFunction addTwo (sfi = 0xd4e082932e9)> for optimized recompilation, reason: small function]
[compiling method 0x0d4e08293465 <JSFunction addTwo (sfi = 0xd4e082932e9)> (target TURBOFAN) using TurboFan]
[optimizing 0x0d4e08293465 <JSFunction addTwo (sfi = 0xd4e082932e9)> (target TURBOFAN) - took 1.320, 3.947,
[completed optimizing 0x0d4e08293465 <JSFunction addTwo (sfi = 0xd4e082932e9)> (target TURBOFAN)]
[marking 0x0d4e08293421 <JSFunction (sfi = 0xd4e08293291)> for optimized recompilation, reason: hot and stable]
[compiling method 0x0d4e08293421 <JSFunction (sfi = 0xd4e08293291)> (target TURBOFAN) using TurboFan OSR]
[optimizing 0x0d4e08293421 <JSFunction (sfi = 0xd4e08293291)> (target TURBOFAN) - took 4.136, 12.252, 0.400 ms]

In this code, we perform the + operation 100,000 times: the first 80,000 iterations add two integers, and the last 20,000 concatenate two strings.

Tracing V8's optimization log, we can see from line 4 of the output that line 10 of the code (on its 80,001st execution) triggered deoptimization, because the argument types changed from integers to strings. When the new code became hot again, it was optimized once more.

During deoptimization, V8 iterates over all optimized JavaScript functions and unlinks those that point to the code object being deoptimized; with many optimized functions, this becomes a performance bottleneck. Although V8 applies lazy deoptimization to parts of this process, deoptimization remains expensive, and you should try to avoid triggering it in the functions you write.
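As a hedged sketch of how to avoid this in practice, one option is to keep each hot function monomorphic, so that the type assumption never breaks (the function names below are invented):

function addNumbers(a, b) { return a + b; }     // only ever called with numbers
function concatStrings(a, b) { return a + b; }  // only ever called with strings

for (let j = 0; j < 100000; j++) {
  if (j < 80000) addNumbers(10, 10);
  else concatStrings('hello', 'world');  // addNumbers' optimized code stays valid
}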

The pain of optimization

TurboFan's optimizations bring significant performance gains, but the optimization process depends on type feedback, and its logic is complex and hard to control, which makes it prone to bugs and security vulnerabilities.

In a real production environment, our team ran into a problem where a service's polling URL changed to a different value after the page had been open for a while. LeuisKen found that the bug could be reproduced reliably on V8 v6.7.

The reproduction code is as follows (desensitized):

var i = "abcdefg"
var n = "comments"
var t = "article"

for (let j = 0; j < 10000; j++) {
    var o = "/api/v1/".concat(t, "/").concat(n, "?commentId=").concat(i);
    if (j % 1000 === 0) {
        console.log(o);
    }
}

As the output shows, for the same logic, the 5001st and the 6001st iterations print completely different strings.

In addition, the further reading at the end of this article includes two security teams' analyses of TurboFan vulnerabilities, for those interested.

7. Orinoco and garbage collection

When memory is no longer needed, it is reclaimed by a garbage collector that runs periodically.

V8's garbage collection has three main phases:

  1. Marking: identify live and dead objects
  2. Sweeping: reclaim the memory occupied by dead objects
  3. Compaction: compact memory and reduce fragmentation

The generational hypothesis

The generational hypothesis, also known as the weak generational hypothesis, holds that most objects die young, shortly after allocation, while the objects that survive tend to persist throughout the program's run.

V8's garbage collection is built on this hypothesis: it divides the heap into a new generation and an old generation.
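A small illustration of the hypothesis in code (the names are hypothetical):

const cache = [];                   // long-lived: survives collections, gets promoted
function handleEvent(event) {
  const tmp = { ts: Date.now() };   // typically dead right after this call
  if (event.keep) cache.push(tmp);  // the rare survivor is promoted later
}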

Figure source: Trash talk: the Orinoco garbage collector.

As the figure shows, the new generation is further subdivided into the nursery and intermediate sub-generations (the division is purely logical). New objects are allocated in the nursery. If an object survives its first garbage collection, its flag bits change and it enters the logical intermediate sub-generation, while physically remaining in the new generation. If it survives the next garbage collection as well, it is moved into the old generation. Moving an object from the new generation to the old generation is called promotion.

V8 adopts different garbage collection strategies in the new generation and the old generation, making garbage collection more targeted and efficient. V8 also limits the memory size of the new generation and the old generation.

| Generation | Main algorithm | Maximum capacity |
| --- | --- | --- |
| New generation | Scavenge | 2 × 16 MB (64-bit) / 2 × 8 MB (32-bit) |
| Old generation | Mark-sweep, mark-compact | 4096 MB (64-bit) / 2048 MB (32-bit) |

It is important to note that as memory grows, the number of garbage collections decreases, but the time each collection takes increases, which hurts application performance and responsiveness. So more memory is not always better.
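For example, the old-generation limit can be adjusted with V8's --max-old-space-size flag (in MB), which Node.js passes straight through to V8; the file name below is hypothetical.

node --max-old-space-size=4096 app.js  # allow the old generation to grow to 4 GB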

The new generation

The new generation uses the Scavenge algorithm, which trades space for time.

V8 splits the new generation into two equally sized semi-spaces, called From space and To space. During garbage collection, V8 finds the live objects in From space and copies them into To space. Once all live objects have been moved, V8 releases From space in one go. After each copy, the roles of From space and To space are swapped.

An object that survives copying again is moved to the old generation; as noted above, this process is called promotion.
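A conceptual sketch of Scavenge follows (not V8's actual implementation; isLive stands in for the real liveness check):

function scavenge(state, isLive) {
  const { from, to } = state;
  for (const obj of from) {
    if (isLive(obj)) to.push(obj);  // copy live objects into To space
  }
  from.length = 0;                  // release the whole From space at once
  return { from: to, to: from };    // swap the two semi-spaces
}

// Usage: after one cycle, the survivor lives in the new From space.
let heap = { from: [{ live: true }, { live: false }], to: [] };
heap = scavenge(heap, (o) => o.live);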

The old generation

According to the generational hypothesis, old-generation objects tend to persist throughout the life of a program, that is, they rarely need to be collected. This means a copying algorithm would be inefficient in the old generation. Instead, V8 uses the mark-sweep and mark-compact algorithms there.

Mark-sweep

The principle of mark-sweep is simple: the collector starts from the root nodes, marks the objects directly referenced by the roots, then recursively marks the objects those objects reference. An object's reachability is the criterion for whether it is "alive".

The time mark-sweep takes is proportional to the number of live objects.

Mark-compact

The mark-compact algorithm combines the copying algorithm with mark-sweep.

Mark-sweep alone can leave memory fragmented, which is bad for subsequent memory allocation.

To take an extreme example: in the figure below, the blue object is a new object for which we need to allocate memory. Before compaction, none of the fragmented free regions (shown in light color) can hold the whole object; after compaction, the free space has been consolidated into one large region that can.

The pros and cons of mark-compact are obvious. Its advantage is better heap utilization; its disadvantage is the extra scanning time and object-moving time, which is proportional to the size of the heap.

About marking

Source of this section: Concurrent marking in V8.

V8 uses tri-color marking to identify garbage. The three colors are encoded in two mark bits: white (00), gray (10), and black (11).

Initially, all objects are white. Marking starts from the root nodes, and every node visited is turned gray.

Once all of a gray node's direct children have been traversed, the node turns black.

When no gray nodes remain, marking is finished; the remaining white nodes are unreachable and can be safely reclaimed.
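The process can be summarized with a compact sketch (V8 stores the colors in two mark bits; strings are used here for readability):

function mark(roots) {
  const worklist = [];
  for (const root of roots) {
    root.color = 'gray';              // roots are grayed first
    worklist.push(root);
  }
  while (worklist.length > 0) {
    const obj = worklist.pop();
    for (const child of obj.children) {
      if (child.color === 'white') {  // reachable but not yet visited
        child.color = 'gray';
        worklist.push(child);
      }
    }
    obj.color = 'black';              // all direct children traversed
  }
  // anything still white afterwards is unreachable and can be reclaimed
}

const leaf = { color: 'white', children: [] };
const orphan = { color: 'white', children: [] };  // unreferenced: stays white
const root = { color: 'white', children: [leaf] };
mark([root]);  // root and leaf end up black; orphan remains white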

For performance, V8 applies many optimizations to garbage collection (described in the next section), which means collection can happen while new memory is being allocated. To avoid conflicting accesses, V8 implements write barriers. Their main job is to maintain the invariant that a black node never points to a white node: if a child is attached under a black node, the child is forced from white to gray.
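A minimal sketch of that invariant (again illustrative, not V8's code):

function writeBarrier(parent, child, worklist) {
  parent.children.push(child);
  if (parent.color === 'black' && child.color === 'white') {
    child.color = 'gray';     // re-gray the child so marking will visit it
    worklist.push(child);
  }
}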

Optimization strategies for the garbage collection process

Figure source: Trash talk: the Orinoco garbage collector.

Garbage collection inevitably pauses JavaScript execution. Meanwhile, for a page to run smoothly, we generally want a frame rate of 60 frames per second, which leaves about 16 ms between frames. If garbage collection plus code execution exceeds 16 ms, the user will perceive jank.

Orinoco uses parallel, incremental, and concurrent techniques for garbage collection to free up the main thread and allow more time for normal JavaScript code execution.

Parallel means splitting a garbage collection task into roughly equal chunks that the main thread and helper threads execute simultaneously. Since no JavaScript runs during this, the implementation is simple; only synchronization between the threads must be ensured.

Incremental means the main thread breaks a large, centralized garbage collection task into small pieces and runs them intermittently.

Concurrent means the main thread executes JavaScript without interruption while helper threads perform garbage collection entirely in the background. It is the most complex of the three strategies, because of read/write races between the main thread and the helper threads.

In the new generation, V8 uses the Scavenge algorithm in parallel.

In the old generation, V8 uses concurrent marking, parallel compaction, and concurrent sweeping.


The article from two years ago also mentioned a "mistake" circulating in the community about calculating the maximum reserved space. Since it is rarely discussed and space here is limited, it is not repeated; interested readers can look it up themselves.

8. Sparkplug, a faster non-optimizing compiler

Without sufficient type feedback, TurboFan cannot optimize ahead of time; yet optimizing too early risks optimizing code that never becomes hot, wasting resources. On the other hand, staying on Ignition bytecode the whole time means code does not execute very efficiently. To solve this, V8 v9.1 introduced the non-optimizing compiler Sparkplug, which generates assembly code directly from bytecode, without optimization.

Note the word "optimizing" here: it revolves around whether the compiler speculates based on type feedback. In absolute terms, the assembly code Sparkplug generates does perform better than Ignition bytecode.

Figure source: Sparkplug, the new lightning-fast V8 baseline JavaScript compiler.

Unlike most compilers, Sparkplug generates no intermediate representation; it can be thought of as a "translator" from Ignition bytecode to machine code. Its core is a switch statement nested inside a for loop, dispatching each bytecode to a fixed, per-bytecode code generation function. See the sketch below.
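A hedged sketch of that shape follows; the real compiler is C++ and emits machine code, and the mnemonics and field names below are made up for illustration.

function sparkplugCompile(bytecodes) {
  const asm = [];
  for (const bc of bytecodes) {        // a single pass, no IR built
    switch (bc.op) {                   // one fixed generator per bytecode kind
      case 'Ldar':   asm.push(`mov acc, ${bc.reg}`); break;
      case 'Add':    asm.push(`add acc, ${bc.reg}`); break;
      case 'Return': asm.push('ret');                break;
      default:       asm.push(`call Builtin_${bc.op}`);  // lean on builtins
    }
  }
  return asm;
}

sparkplugCompile([{ op: 'Ldar', reg: 'a1' }, { op: 'Add', reg: 'a0' }, { op: 'Return' }]);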

Sparkplug maintains a stack frame compatible with the Ignition interpreter: whenever the interpreter stores a register value, Sparkplug stores it as well, directly mirroring the interpreter's behavior. This not only simplifies and speeds up Sparkplug's compilation, it also makes integration with the rest of the system almost free, and it makes OSR (on-stack replacement, a technique for replacing the stack frame of a currently running function) easy to implement.

Sparkplug was designed to reuse as many existing mechanisms as possible (built-in functions, the macro assembler, stack frames) and to minimize architecture-specific code. And because Sparkplug's output is context-independent, the code can be cached and shared across pages.

In addition, because Sparkplug uses the same type feedback as Ignition, the optimized code that TurboFan eventually generates is equivalent.

9. WebAssembly baseline compiler Liftoff

Liftoff is the WebAssembly (WASM) baseline compiler; it was enabled in V8 v6.9 and has covered all platforms since V8 v8.5.

Liftoff’s goal is to reduce WASM application startup time by generating code as quickly as possible.

Liftoff iterates over a piece of WASM code only once, generating machine code for each WASM instruction as it is decoded and validated. (With the streaming API, WASM code can also be downloaded and compiled in parallel, like JavaScript code.) While Liftoff is very fast (about 10 megabits per second), it leaves little room for optimization.

Figure source: Liftoff: a new baseline compiler for WebAssembly in V8.

As the figure above shows, Liftoff generates code without building an IR, which also limits its opportunities for optimization.

Because WASM is statically typed, there is no need to collect type feedback to generate optimized code. So once Liftoff has finished, V8 recompiles all functions with TurboFan, which significantly speeds up execution: TurboFan optimizes the code and applies a better register allocation strategy.

Whenever TurboFan finishes compiling a function, it immediately replaces the Liftoff-compiled version of that function, and all subsequent calls use the TurboFan code (unlike Sparkplug replacing Ignition, this process does not use OSR). For a large module, V8 can take anywhere from 30 seconds to a minute to compile it fully.

Figure source: CovalenceConf 2019: Bytecode Adventures with WebAssembly and V8.

If the WASM module is loaded with WebAssembly.compileStreaming, the machine code TurboFan generates is cached. When the same WASM module is fetched again from the same URL (and the server returns 304 Not Modified), it is not compiled again but loaded from the cache.
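A hedged usage sketch (the URL is hypothetical; run inside an async context or a module with top-level await):

const module = await WebAssembly.compileStreaming(fetch('/app/module.wasm'));
const instance = await WebAssembly.instantiate(module);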

As an aside, a WASM module in V8 can currently use up to 4 GB of memory.

10. Code caching

A number of Chrome features affect JavaScript execution to some extent. One of them is code caching, which was enabled in V8 v6.6.

Code caching makes JavaScript load and execute faster when a user revisits a page whose associated script files are unchanged.

Figure source: Code Caching for JavaScript Developers

The code cache is classified as cold, warm, and hot, and is stored both in memory and on disk. The on-disk code cache is managed by Chrome, which allows it to be shared across multiple V8 instances.

  1. The first time a user requests a JS file (the cold run), Chrome downloads it and hands it to V8 to compile, and caches the file itself to disk.

  2. When the user requests this file a second time (the warm run), Chrome takes it from the browser cache and hands it to V8 to compile again. After compilation in the warm run, the compiled code is serialized and attached to the cached script file as metadata.

  3. When the user requests the JS file a third time (the hot run), Chrome takes both the file and the metadata from the cache and hands them to V8. V8 skips compilation and deserializes the metadata directly.

Links

References

  • Blazingly fast parsing, part 1: optimizing the scanner
  • An Introduction to Speculative Optimization in V8
  • Understanding V8's Bytecode
  • Ignition: Jump-starting an Interpreter for V8
  • Sneak peek into Javascript V8 Engine
  • Launching Ignition and TurboFan
  • Celebrating 10 years of V8
  • TurboFan: A new code generation architecture for V8
  • TurboFan JIT Design
  • Introduction to TurboFan
  • A Tale Of TurboFan
  • Sea of Nodes
  • Deoptimization in V8
  • TurboFan V8 engine backend code analysis
  • Trash talk: the Orinoco garbage collector
  • Sparkplug — a non-Optimizing JavaScript Compiler
  • Sparkplug
  • Mid-Tier Compiler Investigation
  • Sparkplug, the new lightning-fast V8 baseline JavaScript compiler
  • V8 release v7.4
  • Liftoff: a new baseline compiler for WebAssembly in V8
  • WebAssembly compilation pipeline
  • Code caching for WebAssembly developers
  • V8 release v6.6
  • Code caching for JavaScript developers
  • Concurrent marking in V8

Further reading

  • How does V8 work – object representation in V8
  • JavaScript engine basics: Shapes and Inline Caches
  • plctlab/v8-internals
  • Chrome browser framework, V8 exploit vulnerabilities from 0 to 1
  • Chrome V8 engine from -1 length array to remote code execution
  • Ignition: Design Doc
  • V8 TurboFan Register Allocation Design
  • V8 / Chrome Architecture Reading List – For Vulnerability Researchers