ThornWu is a front-end engineer at HIGO.

A few words from the author

The “How V8 Runs” series is a summary of what I have learned about V8. Since officially becoming a front-end engineer a year ago, I have been deliberately studying V8. I also found that fresh, original Chinese material on the subject was scarce in the technical community, so I began to share summaries of my learning process.

Due to a busy stretch at work, I had not updated my blog for half a year. The series began in April with How V8 Runs — Object Representations in V8, which introduced object representations in V8, verified with Chrome DevTools.

This is the first real article in this series. It is intended as an outline of the series: it walks you through each step of how JavaScript is executed in V8 and clarifies a common “mistake” in the community. It won’t delve too deeply into the details (follow-up chapters will), but feel free to leave what you want to know about the V8 engine in the comments, and perhaps that topic will be picked up and introduced first.

Enjoy your reading.

1. Why V8

Any application that can be written in JavaScript, will eventually be written in JavaScript.

Many of you will have heard of Atwood’s Law, a well-known law in the front-end world. In 2007, Jeff Atwood argued that “any application that can be written in JavaScript, will eventually be written in JavaScript.” Twelve years later, we do see JavaScript playing a role in the browser, on the server, on the desktop, on mobile, and in the IoT.

Chrome, meanwhile, holds a 64.92% market share across all platforms as of this writing (source: StatCounter). As Chrome’s JavaScript engine, V8 has played a key role in expanding that market share.

As one of the most powerful JavaScript engines, V8 is ubiquitous. In the browser, it powers Chrome and the many other browsers built on Chromium. On the server, it is the execution environment for Node.js and Deno. There is also a place for V8 on the desktop and in the IoT.

2. Knowledge about V8

V8 is a high-performance JavaScript and WebAssembly engine written in C++. It supports eight processor architectures, including the familiar ia32, x64, and arm.

The V8 release cycle

  • A new V8 version comes out roughly every six weeks
  • V8 versions correspond to Chrome versions; for example, V8 v7.8 corresponds to Chrome 78 (a quick way to check the V8 version of a runtime is sketched below)
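If you want to confirm a pairing yourself, here is a quick sketch; the output string is only illustrative and depends on your build:

// In Node.js, the bundled V8 version is exposed on process.versions.
// (In Chrome, you can open about:version instead.)
console.log(process.versions.v8); // e.g. "7.8.279.23-node.38" on Node 13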

V8’s competitors

  • Chakra (the former Edge JavaScript engine)
  • JavaScriptCore (Safari)
  • SpiderMonkey (Firefox)

Important parts of V8

  • Ignition (baseline compiler)
  • TurboFan (optimizing compiler)
  • Orinoco (garbage collector)
  • Liftoff (WebAssembly baseline compiler)

Liftoff is a baseline compiler for WebAssembly that has been enabled since V8 6.8. Although version 6.8 was released in August 2018, a number of community articles introducing V8 published after that date still do not mention Liftoff. Whether an article mentions Liftoff can thus be one sign of whether its content is stale.

Because WebAssembly is outside the scope of this article, Liftoff will be omitted.

3. V8 JavaScript execution pipeline

The early V8 execution pipeline consisted of the baseline compiler Full-codegen and the optimizing compiler Crankshaft. (The V8 execution pipeline has been adjusted many times; this article only picks one key stage of the early pipeline. Readers interested in the evolution of the pipeline can learn about it from V8 talks.)

The baseline compiler pays more attention to compilation speed, while the optimized compiler pays more attention to the execution speed of compiled code. A combination of baseline and optimized compilers gives JavaScript code faster cold startup and faster execution after optimization.

There were a number of problems with this architecture: for example, Crankshaft could only optimize a subset of JavaScript; the compilation pipeline lacked isolation between its layers; and in some cases the same assembly code had to be written by hand for multiple processor architectures.

Over the years, to address this architectural clutter and the difficulty of scaling it, V8 has evolved its JavaScript execution pipeline into the parser, the baseline compiler Ignition, and the optimizing compiler TurboFan.

The parser converts the JavaScript source code into an AST, the baseline compiler compiles the AST into bytecode, and when the code meets certain conditions, the optimizing compiler recompiles the bytecode to generate optimized machine code.

Here we should mention the idea of layering. As the pipeline was improved, an intermediate representation (IR) was introduced, which effectively improves the system’s scalability while reducing the coupling between related modules and the overall complexity of the system.

For example, suppose three features A, B, and C need to be ported to two processor platforms. Without an IR, 3 × 2 = 6 implementations are required; with an IR, only 3 + 2 = 5 are needed. One cost grows multiplicatively, the other additively, so the advantage of introducing an IR grows dramatically when many features must be implemented and adapted to multiple processor architectures.

Here’s a snippet of code we’ll use to look at how JavaScript is handled in V8.

// example1.js
function addTwo(a, b) {
  return a + b
}

4. Parsers and AST

Parsing code takes time, so JavaScript engines try to avoid parsing source files in full. Besides, during a user’s visit, a lot of the code on a page is never executed, for example handlers that would only be triggered by user interaction.

Because of this, all major browsers implement lazy parsing: instead of generating an AST (abstract syntax tree) for every function up front, the parser decides whether to “pre-parse” or “fully parse” each function it encounters.

Pre-parsing checks the syntax of the source code and throws syntax errors, but it neither resolves the scopes of variables inside the function nor generates an AST. Full parsing analyzes the function body and generates the AST data structure corresponding to the source code. Pre-parsing is roughly twice as fast as full parsing.
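As a sketch of how this plays out in practice (a heuristic described on the V8 blog; engines may change it), wrapping a function expression in parentheses signals that it will probably be invoked immediately, so V8 fully parses it right away instead of pre-parsing it:

// Likely pre-parsed only; fully parsed later, the first time it is called.
function add(a, b) {
  return a + b
}

// A parenthesized function expression (a "PIFE") hints at immediate
// invocation, so V8 fully parses it eagerly.
(function init() {
  // ... startup work ...
})()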

Generating the AST mainly involves two stages: tokenization (lexical analysis) and syntactic analysis. An AST describes the concrete syntactic composition of the source code as a structured tree, and is often used for syntax checking (static code analysis), code obfuscation, code optimization, and so on.

We can use the AST Explorer tool to generate an AST of our JavaScript code.

// example1.js
function addTwo(a, b) {
  return a + b
}
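For reference, a JavaScript parser in the ESTree family (the kind AST Explorer uses by default) produces roughly the following tree for this function; this is a simplified sketch with location information omitted:

{
  "type": "Program",
  "body": [{
    "type": "FunctionDeclaration",
    "id": { "type": "Identifier", "name": "addTwo" },
    "params": [
      { "type": "Identifier", "name": "a" },
      { "type": "Identifier", "name": "b" }
    ],
    "body": {
      "type": "BlockStatement",
      "body": [{
        "type": "ReturnStatement",
        "argument": {
          "type": "BinaryExpression",
          "operator": "+",
          "left": { "type": "Identifier", "name": "a" },
          "right": { "type": "Identifier", "name": "b" }
        }
      }]
    }
  }]
}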

It is important to note that the structure above only describes the rough shape of the AST. V8 has its own AST representation and generates a different AST structure.

5. Baseline compiler Ignition and bytecode

V8 introduces just-in-time (JIT) technology and uses the Ignition baseline compiler to quickly generate bytecode for execution.

Bytecode is an abstraction of machine code. If the bytecode is designed with the same computational model as the physical CPU, it is much easier to compile bytecode into machine code. This is why interpreters are usually register machines or stack machines; Ignition is a register machine with an accumulator register. (See Understanding V8’s Bytecode.)

The bytecode Ignition generates is much smaller than the machine code produced by the previous baseline compiler, Full-codegen. The bytecode can be used directly by the optimizing compiler TurboFan to build graphs (TurboFan’s code optimizations work on graphs), which saves the optimizing compiler from having to re-parse the JavaScript source code when it optimizes.

Using the d8 tool (the V8 developer shell, available by compiling the V8 source code as described in Building V8 with GN), you can view the bytecode generated by Ignition.

d8 --print-bytecode example1.js
[generated bytecode for function:  (0x2d5c6af1efe9 <SharedFunctionInfo>)]
Parameter count 1
Register count 3
Frame size 24
         0x2d5c6af1f0fe @    0 : 12 00             LdaConstant [0]
         0x2d5c6af1f100 @    2 : 26 fb             Star r0
         0x2d5c6af1f102 @    4 : 0b                LdaZero 
         0x2d5c6af1f103 @    5 : 26 fa             Star r1
         0x2d5c6af1f105 @    7 : 27 fe f9          Mov <closure>, r2
         0x2d5c6af1f108 @   10 : 61 2c 01 fb 03    CallRuntime [DeclareGlobals], r0-r2
         0x2d5c6af1f10d @   15 : a7                StackCheck 
         0x2d5c6af1f10e @   16 : 0d                LdaUndefined 
         0x2d5c6af1f10f @   17 : ab                Return 
Constant pool (size = 1)
0x2d5c6af1f0b1: [FixedArray] in OldSpace
 - map: 0x2d5c38940729 <Map>
 - length: 1
           0: 0x2d5c6af1f021 <FixedArray[4]>
Handler Table (size = 0)

All of Ignition’s bytecode operators are listed in the V8 source code, for those who are interested.
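Incidentally, the listing above is the bytecode for the top-level script: because addTwo is never called in example1.js, lazy compilation means its body hasn’t been compiled at all. When it is compiled (say, after adding a call to it), its body boils down to just a few instructions, roughly the following sketch (based on the Understanding V8’s Bytecode article; exact opcodes and encodings vary across V8 versions):

[generated bytecode for function: addTwo]
Parameter count 3          ; a, b, plus the implicit receiver
         StackCheck
         Ldar a1           ; load argument b into the accumulator
         Add a0, [0]       ; accumulator = a + accumulator, using feedback slot 0
         Return            ; return the accumulator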

6. The optimizing compiler TurboFan: optimization and deoptimization

The less variation the compiler needs to take into account in the type of function input, the smaller and faster the generated code will be.

JavaScript is famously a weakly typed language. There is a great deal of ambiguity and type conversion in the ECMAScript standard, so code generated by the baseline compiler alone executes inefficiently.

For example, an operand of the + operator could be an integer, a floating-point number, a string, a Boolean, or a reference type, not to mention the various combinations of them (see how + is defined in the ECMAScript standard).

function addTwo(a, b) {
  return a + b;
}
addTwo(2, 3);                // 5
addTwo(8.6, 2.2);            // 10.8
addTwo("hello ", "world");   // "hello world"
addTwo("true or ", false);   // "true or false"
// There are many more combinations...

But that doesn’t mean JavaScript code can’t be optimized. For a given piece of program logic, the parameters it receives tend to have fixed types. For this reason, V8 introduced the type feedback technique: as it performs computations, V8 dynamically records the types of all the parameters it sees.

Simply put, for code that executes repeatedly, if arguments of the same type are passed in on each execution, V8 assumes subsequent executions will see the same types and optimizes the code accordingly. Basic type checks are retained in the optimized code, and V8 keeps executing the optimized code as long as the argument types never change. When the type of an argument does change in a later execution, V8 “undoes” the previous optimization, a step called “deoptimization.”

Let’s take a look at the above code and see how it is optimized in V8.

// example2.js
function addTwo(a, b) {
  return a + b;
}

for (let j = 0; j < 100000; j++) {
  if (j < 80000) {
    addTwo(10, 10);
  } else {
    addTwo('hello', 'world');
  }
}
d8 --trace-opt --trace-deopt example2.js
[marking 0x2ecfb2a5f229 <JSFunction (sfi = 0x2ecfb2a5f049)> for optimized recompilation, reason: hot and stable]
[compiling method 0x2ecfb2a5f229 <JSFunction (sfi = 0x2ecfb2a5f049)> using TurboFan OSR]
[optimizing 0x2ecfb2a5f229 <JSFunction (sfi = 0x2ecfb2a5f049)> - took 5.268, 5.305, 0.023 ms]
[deoptimizing (soft): begin 0x2ecfb2a5f229 <JSFunction (sfi = 0x2ecfb2a5f049)> (opt #0) @2, FP to SP delta: 96, caller sp: 0x7ffee48218c8]
            ;;; deoptimize at <example2.js:10:5>, Insufficient type feedback for call
  reading input frame  => bytecode_offset=80, args=1, height=5, retval=0(# 0); inputs:
      0: 0x2ecfb2a5f229 ;  [fp -  16]  0x2ecfb2a5f229 <JSFunction (sfi = 0x2ecfb2a5f049)>
      1: 0x2ecfbcf815c1 ;  [fp +  16]  0x2ecfbcf815c1 <JSGlobal Object>
      2: 0x2ecfb2a418c9 ;  [fp -  80]  0x2ecfb2a418c9 <NativeContext[253]>
      3: 0x2ecf2a140d09 ; (literal  4) 0x2ecf2a140d09 <Odd Oddball: optimized_out>
      4: 0x000000027100 ; rcx 80000
      5: 0x2ecfb2a5f299 ; (literal  6) 0x2ecfb2a5f299 <JSFunction addTwo (sfi = 0x2ecfb2a5f0b1)>
      6: 0x2ecfb2a5efd1 ; (literal  7) 0x2ecfb2a5efd1 <String[#5]: hello>
      7: 0x2ecfb2a5efe9 ; (literal  8) 0x2ecfb2a5efe9 <String[#5]: world>
      8: 0x2ecf2a140d09 ; (literal  4) 0x2ecf2a140d09 <Odd Oddball: optimized_out>
  translating interpreted frame  => bytecode_offset=80, variable_frame_size=48, frame_size=104
    0x7ffee48218c0: [top +  96] <- 0x2ecfbcf815c1 <JSGlobal Object> ;  stack parameter (input # 1)
    -------------------------
    0x7ffee48218b8: [top +  88] <- 0x00010bd36b5a ;  caller's pc
    0x7ffee48218b0: [top +  80] <- 0x7ffee48218d8 ;  caller's fp
    0x7ffee48218a8: [top +  72] <- 0x2ecfb2a418c9 <NativeContext[253]> ;  context (input # 2)
    0x7ffee48218a0: [top +  64] <- 0x2ecfb2a5f229 <JSFunction (sfi = 0x2ecfb2a5f049)> ;  function (input # 0)
    0x7ffee4821898: [top +  56] <- 0x2ecfb2a5f141 <BytecodeArray[99]> ;  bytecode array
    0x7ffee4821890: [top +  48] <- 0x00000000010a <Smi 133> ;  bytecode offset
    -------------------------
    0x7ffee4821888: [top +  40] <- 0x2ecf2a140d09 <Odd Oddball: optimized_out> ;  stack parameter (input # 3)
    0x7ffee4821880: [top +  32] <- 0x000000027100 <Smi 80000> ;  stack parameter (input # 4)
    0x7ffee4821878: [top +  24] <- 0x2ecfb2a5f299 <JSFunction addTwo (sfi = 0x2ecfb2a5f0b1)> ;  stack parameter (input # 5)
    0x7ffee4821870: [top +  16] <- 0x2ecfb2a5efd1 <String[#5]: hello> ; stack parameter (input #6)
    0x7ffee4821868: [top +   8] <- 0x2ecfb2a5efe9 <String[#5]: world> ; stack parameter (input #7)
    0x7ffee4821860: [top +   0] <- 0x2ecf2a140d09 <Odd Oddball: optimized_out> ;  accumulator (input # 8)
[deoptimizing (soft): end 0x2ecfb2a5f229 <JSFunction (sfi = 0x2ecfb2a5f049)> @2 => node=80, pc=0x00010bd394e0, caller sp=0x7ffee48218c8, took 0.339 ms]
[marking 0x2ecfb2a5f229 <JSFunction (sfi = 0x2ecfb2a5f049)> for optimized recompilation, reason: hot and stable]
[marking 0x2ecfb2a5f299 <JSFunction addTwo (sfi = 0x2ecfb2a5f0b1)> for optimized recompilation, reason: small function]
[compiling method 0x2ecfb2a5f299 <JSFunction addTwo (sfi = 0x2ecfb2a5f0b1)> using TurboFan]
[compiling method 0x2ecfb2a5f229 <JSFunction (sfi = 0x2ecfb2a5f049)> using TurboFan OSR]
[optimizing 0x2ecfb2a5f229 <JSFunction (sfi = 0x2ecfb2a5f049)> - took ...]
[optimizing 0x2ecfb2a5f299 <JSFunction addTwo (sfi = 0x2ecfb2a5f0b1)> - took ...]
[completed optimizing 0x2ecfb2a5f299 <JSFunction addTwo (sfi = 0x2ecfb2a5f0b1)>]

In this code, we perform the + operation 100,000 times, with the first 80,000 being the addition of two integers and the last 20,000 being the addition of two strings.

Following V8’s optimization trace, we can see that line 10 of the code (the 80,001st call) triggers deoptimization because the argument types change from integers to strings.

Note that deoptimization is expensive, and it should be avoided when writing functions in practice.
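A minimal sketch of one way to avoid the deoptimization above (assuming you control the call sites): keep each function monomorphic, so that every function only ever sees one argument-type combination.

// Hypothetical rewrite of example2.js with monomorphic call sites:
function addTwoNumbers(a, b) {
  return a + b; // only ever called with numbers
}
function concatTwoStrings(a, b) {
  return a + b; // only ever called with strings
}

for (let j = 0; j < 100000; j++) {
  if (j < 80000) {
    addTwoNumbers(10, 10);
  } else {
    concatTwoStrings('hello', 'world');
  }
}

Each function’s type feedback now stays stable, so optimized code for both can survive the whole loop.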

7. Garbage collection

When memory is no longer needed, it is reclaimed by a garbage collector that runs periodically.

Any garbage collector has some basic tasks that must be done regularly.

  1. Identify live and dead objects
  2. Reclaim/reuse the memory occupied by dead objects
  3. Compact/defragment memory (optional)

V8’s garbage collection has three main stages: marking, sweeping, and compacting.

The generational hypothesis

The generational hypothesis, also known as the weak generational hypothesis, holds that most objects die young, shortly after they are allocated, while objects that survive tend to stay alive for a long time.

V8’s garbage collection is based on the generational hypothesis: it divides memory into a new generation and an old generation.

(Figure: the generational heap layout. Source: V8 blog)

As the figure shows, the new generation is further subdivided into two subgenerations, Nursery and Intermediate (the division is purely logical). New objects are allocated in the Nursery subgeneration. If an object survives its first garbage collection, its flag bits change and it logically enters the Intermediate subgeneration, while physically remaining in the new generation. If the object survives the next garbage collection as well, it moves to the old generation. The process of moving objects from the new generation to the old generation is called promotion.

V8 adopts different garbage collection strategies in the new generation and the old generation, making garbage collection more targeted and efficient. V8 also limits the memory size of the new generation and the old generation.

Generation          Algorithm                      Size limit
New generation      Parallel Scavenge              32 MB (64-bit) / 16 MB (32-bit)
Old generation      Mark-sweep and mark-compact    1400 MB (64-bit) / 700 MB (32-bit)

It is important to note that as memory increases, the number of garbage collections decreases, but the time required for each collection increases, which negatively affects application performance and responsiveness. Therefore, more memory is not always better.

The new generation

V8 uses the Parallel Scavenge algorithm, which is similar to Halstead’s algorithm (a Cheney-style algorithm was used prior to V8 v6.2) and is based on the copying algorithm.

The copying algorithm is a way of trading space for time.

V8 splits the new generation into two equally sized semispaces, called From space and To space. During garbage collection, V8 finds the live objects in the From space and copies them into the To space, then frees the remaining space wholesale. After each copy, the roles of From and To are swapped.
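Here is a minimal JavaScript sketch of the copying idea (a model, not V8’s actual implementation, which works on raw memory and uses a non-recursive Cheney-style scan): live objects reachable from the roots are copied From → To, everything left behind is garbage, and the two spaces swap roles.

// Model the semispaces as arrays of objects.
function scavenge(roots, fromSpace, toSpace) {
  const forwarded = new Map(); // old object -> its copy in To space
  function copy(value) {
    if (value === null || typeof value !== 'object') return value; // primitives stay as-is
    if (forwarded.has(value)) return forwarded.get(value);         // already evacuated
    const clone = {};            // "allocate" in To space
    forwarded.set(value, clone); // record the forwarding pointer first (handles cycles)
    toSpace.push(clone);
    for (const key of Object.keys(value)) clone[key] = copy(value[key]); // copy references
    return clone;
  }
  const newRoots = roots.map(copy);
  fromSpace.length = 0; // everything left in From space is dead; free it wholesale
  return { roots: newRoots, fromSpace: toSpace, toSpace: fromSpace }; // swap roles
}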

When an object survives another copy, it is moved to the old generation; as noted above, this process is called promotion.

The old generation

According to the generational hypothesis, objects in the old generation tend to live a long time, i.e. they rarely need to be reclaimed, which means a copying algorithm is not practical there. In the old generation, V8 uses the mark-sweep and mark-compact algorithms for garbage collection.

Mark-sweep

Mark-sweep has been around for more than half a century, and its principle is simple: the garbage collector starts from the root, marks the objects directly referenced by the root, then recursively marks the objects those objects reference. An object’s reachability is the basis for deciding whether it is “alive.” Unmarked objects are then swept and their memory reclaimed.

The time taken by the marking phase is proportional to the number of live objects.
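A minimal sketch of the marking phase (again a model, not V8’s implementation, which uses mark bits and work lists over heap pages):

function mark(roots) {
  const worklist = [...roots];
  const marked = new Set();          // stands in for per-object mark bits
  while (worklist.length > 0) {
    const obj = worklist.pop();
    if (obj === null || typeof obj !== 'object' || marked.has(obj)) continue;
    marked.add(obj);                 // reachable => alive
    for (const key of Object.keys(obj)) worklist.push(obj[key]); // follow references
  }
  return marked;                     // anything not in this set can be swept
}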

Mark-compact

The mark-compact algorithm is a combination of the copying algorithm and the mark-sweep algorithm.

Mark-sweep can leave memory fragmented, which is bad for the program’s subsequent memory allocation.

To take an extreme example: suppose we need to allocate memory for a new object (shown in blue in the original figure). Before compaction, no single stretch of free space can hold the whole object; after compaction, the free space is merged into one large region that can accommodate it.

The advantages and disadvantages of the mark-compact algorithm are both obvious. The advantage is that it makes heap utilization more efficient. The disadvantage is that it requires extra scanning time and object-moving time, and the time taken is proportional to the size of the heap.

Maximum reserved space: a long-standing community “mistake”

V8 reserves space for the heap in memory, which gives rise to the concept of maximum reserved space. max_old_generation_size_ (the maximum size of the old generation) and max_semi_space_size_ (the maximum size of a new-generation semispace) are the main factors that determine the maximum reserved space. In Node.js, the former can be specified with --max-old-space-size.
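For example (the 4096 MB value here is only an illustration):

node --max-old-space-size=4096 app.js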

The calculation that has circulated in the community for a long time is “maximum reserved space = 4 × the new generation’s maximum semispace + the old generation’s maximum space,” which appears to derive from Piao Ling’s book Simple In-Depth Node.js. But since the book was published in December 2013, the calculation of the maximum reserved space has actually changed twice.

5.1.277 and earlier (the versions the book corresponds to)

// Returns the maximum amount of memory reserved for the heap. For
// the young generation, we reserve 4 times the amount needed for a
// semi space. The young generation consists of two semi spaces and
// we reserve twice the amount needed for those in order to ensure
// that new space can be aligned to its size.
intptr_t MaxReserved() {
  return 4 * reserved_semispace_size_ + max_old_generation_size_;
}

Version 5.1.278

// Returns the maximum amount of memory reserved for the heap.
intptr_t MaxReserved() {
  return 2 * max_semi_space_size_ + max_old_generation_size_;
}

Version 7.4.137

size_t Heap::MaxReserved() {
  const size_t kMaxNewLargeObjectSpaceSize = max_semi_space_size_;
  return static_cast<size_t>(2 * max_semi_space_size_ +
                             kMaxNewLargeObjectSpaceSize +
                             max_old_generation_size_);
}

In short, these two adjustments changed the coefficient on the “new generation’s maximum semispace” from 4, to 2, to (effectively) 3.
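Plugging in the 64-bit defaults from the table in section 7 (a 16 MB semispace, i.e. half of the 32 MB new generation, and a 1400 MB old generation, and assuming reserved_semispace_size_ equals the maximum semispace size), the three formulas work out to:

4 * 16 + 1400 = 1464 MB        (5.1.277 and earlier)
2 * 16 + 1400 = 1432 MB        (5.1.278)
2 * 16 + 16 + 1400 = 1448 MB   (7.4.137, where the new large object space adds one more semispace-sized term)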

According to the Node.js release records, the correspondence between V8 versions and Node.js versions is as follows:

V8 version             Node.js version
5.1.277 and earlier    6.4.0 and earlier
5.1.278 - 7.4.136      after 6.4.0, before 12.0.0
7.4.137 and later      12.0.0 and later

Considering that Node.js 6.4.0 was released in August 2016 and that its LTS line is no longer maintained, it is reasonable to infer that the second and third calculations cover the vast majority of today’s users. Community material, however, scarcely mentions these two changes (I found only one Zhihu column article that mentions the second calculation). Meanwhile, plenty of new posts still use the first formula without stating which Node.js version it applies to, which easily leads readers to believe the maximum reserved space calculation has never changed. A lot of outdated information has clearly propagated this “mistake.”

Code caching

There are many features in Chrome that affect JavaScript execution to a greater or lesser extent. One of them is Code Caching.

Code caching makes JavaScript load and execute faster when the user accesses the same page and the script file associated with that page is unchanged.

(Figure: cold, warm, and hot runs in code caching. Source: V8 blog)

Code-cache runs are classified as cold, warm, and hot.

  1. The first time a user requests a JS file (the cold run), Chrome downloads it and hands it to V8 to compile, then caches the file to disk.

  2. When the user requests the same JS file a second time (the warm run), Chrome takes the file from the browser cache and hands it to V8 to compile again. This time, the compiled code is serialized and attached to the cached script file as metadata.

  3. When the user requests the file a third time (the hot run), Chrome retrieves both the file and the metadata from the cache and hands them to V8. V8 skips the compile phase and directly deserializes the metadata.

References

  • V8 blog
  • V8 source

We’re hiring

HIGO is a well-known Chinese global fashion shopping platform, led by Xu Yirong, founder of Meilishuo, and was the exclusive title sponsor of the third season of Running Man. Our dream is for China to have the best beauty and design in the world.

We celebrate people who love to create new things; they are enthusiastic, critical, and fun, and we believe they will make the world a better place. We are such a group of people. If you are too, you are welcome to join us!

Please send your resume to [email protected].