JavaScript in Plain English series:

  • Lesson 1: What is this in the arrow function?
  • Lesson 2: What does it mean that a function is a first-class citizen?
  • Lesson 3: What is garbage collection?
  • Lesson 4: How does the V8 engine work?
  • Lesson 5: How Chrome works

Recently, the JavaScript ecosystem has been joined by two very hardcore projects.

Fabrice Bellard has released QuickJS, a new JS engine that can convert JavaScript source code into C code and build executables from it with a system compiler (GCC or Clang).

Facebook has developed Hermes, a new JS engine for React Native that optimizes performance on Android. It compiles JavaScript source code into bytecode at app build time, which reduces APK size, lowers memory usage, and speeds up app startup.

As JavaScript programmers, very few of us have the opportunity or ability to implement a JS engine, but understanding how one works is still worthwhile. This article introduces the principles behind the V8 engine; I hope it helps.

JavaScript engine

The JavaScript code we hand to the browser or Node means nothing to the underlying CPU, which cannot execute it directly. The CPU only understands its own instruction set, which corresponds to assembly code, and writing assembly code is painful. For example, computing N factorial in JavaScript takes only a 7-line recursive function:

function factorial(N) {
    if (N === 1) {
        return 1;
    } else {
        return N * factorial(N - 1);
    }
}

The logic of the code is so clear that it fits perfectly into the mathematical definition of factorial that even people who can’t write code can understand it.

However, writing N factorial in assembly language takes 300+ lines (n-factorial.s).

I wrote this N-factorial assembly code in college, N years ago. It had to handle conversions between base 10 and base 2 and store large integers across multiple bytes, and it could compute factorials only up to around N = 500.

Also, different types of CPUs have different instruction sets, so the assembly code has to be rewritten for each type of CPU, which is very frustrating…

Fortunately, a JavaScript engine can compile JS code into assembly code for different CPUs (Intel, ARM, MIPS, etc.), so we don’t have to consult the instruction set manual for each CPU. Of course, a JavaScript engine does more than compile code; it also executes code, allocates memory, and collects garbage.

Although there are many browsers, there are only a few mainstream JavaScript engines. After all, developing a JavaScript engine is very complicated. Some of the best-known JS engines are:

  • V8 (Google)
  • SpiderMonkey (Mozilla)
  • JavaScriptCore (Apple)
  • Chakra (Microsoft)
  • IoT: Duktape, JerryScript

The recently released QuickJS and Hermes are also JS engines, and both go beyond the browser, proving Atwood’s Law once again:

Any application that can be written in JavaScript, will eventually be written in JavaScript.

V8: Powerful JavaScript engine

Of the few JavaScript engines, V8 is by far the most popular. Chrome has about a 60% market share, and Node.js is the de facto standard for back-end JavaScript programming. Many Chinese browsers are in fact built on Chromium, which is essentially the open source version of Chrome and is naturally based on the V8 engine. Remarkably, even Microsoft, long the maverick of the browser world, has joined the Chromium camp. In addition, Electron, which builds desktop applications on Node.js and Chromium, is also based on V8.

The V8 engine was released in 2008, and its name was inspired by the V8 engines of high-performance cars. It takes confidence to choose a name like that, and V8’s performance has been improving steadily ever since.

Image source: v8.dev/

V8 has been very successful in the industry, and it has also been recognized by the academic community and received the ACM SIGPLAN Programming Languages Software Award:

V8’s success is in large part due to the efficient machine code it generates. Because JavaScript is a highly dynamic object-oriented language, many experts believed that this level of performance could not be achieved. V8’s performance breakthrough has had a major impact on the adoption of JavaScript, which is nowadays used on the browser, the server, and probably tomorrow on the small devices of the internet-of-things.

JavaScript is a dynamically typed language, which makes it hard for compilers to generate fast code, so experts doubted that its performance could be improved much. V8 managed to do just that: it generates very efficient machine code (shown in this article in its readable form, assembly code), which lets JS be used in a wide range of settings, such as the web, apps, desktop, servers, and IoT.

A note on terminology: V8’s documentation and blog posts call the generated code machine code, while this article often says assembly code, because assembly is the human-readable form in which that code is printed. Assembly instructions and machine instructions largely correspond one-to-one, which is also what makes disassembly possible, so the two terms are used interchangeably here, even though that is not strictly precise.

The internals of the V8 engine

V8 is a very complex project, with over a million lines of C++ code as counted by cloc.

V8 is made up of many sub-modules, of which these four are the most important:

  • Parser: converts JavaScript source code into an Abstract Syntax Tree (AST);
  • Ignition: the interpreter, which converts the AST into bytecode and then interprets and executes it; it also gathers the information TurboFan needs for optimizing compilation, such as the types of function arguments;
  • TurboFan: the compiler, which converts bytecode into optimized assembly code using the type information gathered by Ignition;
  • Orinoco: the garbage collector, which reclaims memory that the program no longer needs.

Together, Parser, Ignition, and TurboFan compile JS source code into assembly code. The flow is as follows:

In a nutshell, Parser converts JS source code to an AST, Ignition converts the AST to bytecode, and finally TurboFan converts the bytecode to optimized machine code (referred to as assembly code in this article).

  • If the function is not called, V8 does not compile it.
  • If the function is called only once, Ignition compiles it to bytecode and interprets the bytecode directly. TurboFan does not produce optimized code, because it relies on the type information that Ignition gathers while the function executes; the function must therefore run at least once before TurboFan can optimize it.
  • If the function is called many times, it may be recognized as a hot function. Once Ignition has collected type information that justifies optimization, TurboFan compiles the bytecode into optimized machine code to improve execution performance.
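The three cases above can be mimicked with a toy model. This is only a sketch and not V8's implementation: `TieredFunction`, `HOT_THRESHOLD`, and the tier names are hypothetical, and the real engine uses richer heuristics than a simple call counter.

```javascript
// Toy model of lazy compilation and tier-up; not V8's real heuristics.
const HOT_THRESHOLD = 3; // hypothetical: calls before a function counts as "hot"

class TieredFunction {
    constructor(fn) {
        this.fn = fn;
        this.tier = "uncompiled";   // never called: nothing is compiled
        this.calls = 0;
        this.seenTypes = new Set(); // stands in for Ignition's type feedback
    }

    call(...args) {
        if (this.tier === "uncompiled") {
            this.tier = "bytecode"; // first call: compile to bytecode only
        }
        this.calls += 1;
        this.seenTypes.add(args.map((a) => typeof a).join(","));
        // Hot and type-stable: hand over to the optimizing compiler.
        if (this.tier === "bytecode" &&
            this.calls >= HOT_THRESHOLD &&
            this.seenTypes.size === 1) {
            this.tier = "optimized";
        }
        return this.fn(...args);
    }
}

const add = new TieredFunction((x, y) => x + y);
console.log(add.tier); // uncompiled
add.call(1, 2);
console.log(add.tier); // bytecode
add.call(3, 4);
add.call(5, 6);
console.log(add.tier); // optimized
```

The counter and the set of observed argument types play the role of the feedback Ignition collects; only when both say "hot" and "stable" does the model move to the optimized tier.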

Optimized machine code can also be reverted to bytecode, a process called deoptimization. This happens when the type information Ignition collected turns out to be wrong for a later call: for example, the arguments to the add function were integers at first and later become strings. The optimized machine code was generated on the assumption that the arguments to add are integers; once that assumption no longer holds, V8 has to deoptimize.

function add(x, y) {
    return x + y;
}

add(1, 2);
add("1", "2");

Programs in C, C++, Java, and similar languages must be compiled before they run; the source code cannot be executed directly. JavaScript source code, however, can be run directly (for example, node server.js), because it is compiled and executed at runtime, an approach known as just-in-time (JIT) compilation. V8 is therefore also a JIT compiler.

Ignition: Interpreter

Node.js is built on the V8 engine, so the node command exposes many V8 options. With node’s --print-bytecode option, you can print the bytecode generated by Ignition.

factorial.js is shown below. Since V8 does not compile a function that is never called, the factorial function must be called on the last line.

function factorial(N) {
    if (N === 1) {
        return 1;
    } else {
        return N * factorial(N - 1);
    }
}

factorial(10); // V8 does not compile functions that are never called, so this line cannot be omitted

Use the --print-bytecode option of the node command (version 12.6.0) to print the bytecode generated by Ignition:

node --print-bytecode factorial.js

The console prints a lot of output; the last part is the bytecode of factorial:

[generated bytecode for function: factorial]
Parameter count 2
Register count 3
Frame size 24
   18 E> 0x3541c2da112e @    0 : a5                StackCheck
   28 S> 0x3541c2da112f @    1 : 0c 01             LdaSmi [1]
   34 E> 0x3541c2da1131 @    3 : 68 02 00          TestEqualStrict a0, [0]
         0x3541c2da1134 @    6 : 99 05             JumpIfFalse [5] (0x3541c2da1139 @ 11)
   51 S> 0x3541c2da1136 @    8 : 0c 01             LdaSmi [1]
   60 S> 0x3541c2da1138 @   10 : a9                Return
   82 S> 0x3541c2da1139 @   11 : 1b 04             LdaImmutableCurrentContextSlot [4]
         0x3541c2da113b @   13 : 26 fa             Star r1
         0x3541c2da113d @   15 : 25 02             Ldar a0
  105 E> 0x3541c2da113f @   17 : 41 01 02          SubSmi [1], [2]
         0x3541c2da1142 @   20 : 26 f9             Star r2
   93 E> 0x3541c2da1144 @   22 : 5d fa f9 03       CallUndefinedReceiver1 r1, r2, [3]
   91 E> 0x3541c2da1148 @   26 : 36 02 01          Mul a0, [1]
  110 S> 0x3541c2da114b @   29 : a9                Return
Constant pool (size = 0)
Handler Table (size = 0)

The resulting Bytecode is actually quite simple:

  • The LdaSmi instruction loads the small integer 1 into the accumulator register;
  • The TestEqualStrict instruction compares parameter a0 with the accumulator (1) for strict equality;
  • If a0 is equal to 1, the JumpIfFalse instruction does not jump, and execution continues with the next instruction;
  • If a0 is not equal to 1, the JumpIfFalse instruction jumps to memory address 0x3541c2da1139 (offset 11).
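To make the walkthrough above concrete, here is a minimal sketch of an accumulator-based interpreter that runs a simplified version of the factorial listing. The opcode names are borrowed from the listing, but the array encoding, the use of instruction indexes instead of byte offsets, the dropped feedback-slot operands, and the `LdaSelf` opcode (a stand-in for the context-slot load that fetches the factorial function) are all assumptions made for illustration; this is not how Ignition is implemented.

```javascript
// Simplified factorial bytecode; comments show the effect of each instruction.
const factorialBytecode = [
    ["LdaSmi", 1],                          // 0: acc = 1
    ["TestEqualStrict", "a0"],              // 1: acc = (a0 === acc)
    ["JumpIfFalse", 5],                     // 2: if acc is false, go to 5
    ["LdaSmi", 1],                          // 3: acc = 1
    ["Return"],                             // 4: return acc
    ["LdaSelf"],                            // 5: acc = the factorial function (hypothetical opcode)
    ["Star", "r1"],                         // 6: r1 = acc
    ["Ldar", "a0"],                         // 7: acc = a0
    ["SubSmi", 1],                          // 8: acc = acc - 1
    ["Star", "r2"],                         // 9: r2 = acc
    ["CallUndefinedReceiver1", "r1", "r2"], // 10: acc = r1(r2)
    ["Mul", "a0"],                          // 11: acc = a0 * acc
    ["Return"],                             // 12: return acc
];

function run(bytecode, a0) {
    const reg = { a0, r1: undefined, r2: undefined };
    let acc;    // accumulator: implicit input and output of most instructions
    let pc = 0; // index of the next instruction to execute
    while (true) {
        const [op, x, y] = bytecode[pc++];
        switch (op) {
            case "LdaSmi": acc = x; break;
            case "Ldar": acc = reg[x]; break;
            case "Star": reg[x] = acc; break;
            case "TestEqualStrict": acc = reg[x] === acc; break;
            case "JumpIfFalse": if (acc === false) pc = x; break;
            case "SubSmi": acc = acc - x; break;
            case "Mul": acc = reg[x] * acc; break;
            case "LdaSelf": acc = (n) => run(bytecode, n); break;
            case "CallUndefinedReceiver1": acc = reg[x](reg[y]); break;
            case "Return": return acc;
            default: throw new Error(`unknown opcode: ${op}`);
        }
    }
}

console.log(run(factorialBytecode, 5)); // 120
```

Nothing in the interpreter loop depends on the host CPU, which is the property that lets one bytecode format serve every architecture the engine supports.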

As you can see, bytecode is a kind of assembly language, except that it does not target a specific CPU; put another way, it targets a virtual CPU. That makes bytecode much easier to generate, because there is no need to produce different code for different CPUs. Remember that V8 supports nine different CPUs; introducing bytecode as an intermediate layer simplifies V8’s compilation pipeline and makes it easier to extend.

If we generate bytecode on different hardware, we’ll find that the generated instructions are the same:

Image source: Ross McIlroy

TurboFan: Compiler

Use the --print-code and --print-opt-code options of the node command to print the assembly code generated by TurboFan:

node --print-code --print-opt-code factorial.js

I ran it on a Mac and it looks like this:

Real assembly code is much less readable than Bytecode. Moreover, the generated assembly code varies depending on the CPU type of the machine.
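One caveat when trying this yourself: TurboFan only compiles functions that V8 considers hot, so a script that runs factorial just once may print no optimized code for it. A minimal sketch that calls the function in a loop is shown below; the iteration count is an arbitrary assumption, not a documented threshold, and whether and when optimization happens is up to V8's heuristics.

```javascript
function factorial(N) {
    if (N === 1) {
        return 1;
    }
    return N * factorial(N - 1);
}

// Call the function many times so that V8 has a reason to treat it as hot.
// The iteration count is an arbitrary guess, not a documented threshold.
let checksum = 0;
for (let i = 0; i < 100000; i++) {
    checksum += factorial(10);
}

// Use the result so the loop cannot be considered dead code.
console.log(checksum); // 362880000000
```

Saved under a name such as hot-factorial.js (hypothetical), it can be run with the same options as above.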

We don’t need to read the assembly code itself; what matters is understanding how TurboFan optimizes the code it generates. We can walk through the whole optimization process with the add function.

function add(x, y) {
    return x + y;
}

add(1, 2);
add(3, 4);
add(5, 6);
add("7", "8");

Since JS variables are untyped, the arguments to add can be of any type: Number, String, Boolean, and so on. This means that add may perform numeric addition (V8 also distinguishes integers from floating-point numbers), string concatenation, or something more complex. If the function were compiled directly, the generated code would contain many if…else branches, as in this pseudocode:

if (isInteger(x) && isInteger(y)) {
    // Add integers
} else if (isFloat(x) && isFloat(y)) {
    // Add floating point numbers
} else if (isString(x) && isString(y)) {
    // String concatenation
} else {
    // Various other situations
}

I wrote only four branches, but there are more in practice, for example for converting arguments between types. See how ECMAScript defines addition: 12.8.3 The Addition Operator (+).
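A few concrete cases show how many different things the single `+` operator can do depending on its operand types. This is a small illustrative sample, not a complete list of the cases in the specification:

```javascript
// Each entry exercises a different path through the + operator.
const results = [
    1 + 2,        // 3: integer addition
    1.5 + 2,      // 3.5: floating-point addition
    "1" + "2",    // "12": string concatenation
    1 + "2",      // "12": the number is converted to a string first
    true + 1,     // 2: the boolean is converted to the number 1
    [1, 2] + [3], // "1,23": both arrays are converted to strings first
];

for (const value of results) {
    console.log(typeof value, value);
}
```

A compiler that knows nothing about the operand types has to generate code that can handle every one of these paths.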

If assembly code were generated directly from this pseudocode, it would be very verbose and take up a lot of memory.

Instead, TurboFan can compile the bytecode on the assumption that the arguments to add are integers, as in add(1, 2). This greatly simplifies the assembly code generated by TurboFan:

if (isInteger(x) && isInteger(y)) {
    // Add integers
} else {
    // Deoptimization
}

Of course, this is risky: if an argument to add is not an integer, the generated assembly code cannot be used, and V8 has to deoptimize and fall back to executing the bytecode.

That is, once TurboFan has optimized the add function, add(3, 4) and add(5, 6) can run the optimized assembly code, but add("7", "8") forces a deoptimization back to bytecode.
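The guard-then-fall-back pattern can be sketched in plain JavaScript. This is a toy model under stated assumptions: V8 performs the check in generated machine code rather than in JavaScript, and `optimizedAdd`, `genericAdd`, and `deoptCount` are hypothetical names used only for illustration.

```javascript
// Counts how often the speculative fast path had to be abandoned.
let deoptCount = 0;

// Slow, fully general path: stands in for executing the bytecode in Ignition.
function genericAdd(x, y) {
    return x + y;
}

// Fast path specialized under the assumption "both arguments are integers".
function optimizedAdd(x, y) {
    if (Number.isInteger(x) && Number.isInteger(y)) {
        return x + y; // in real machine code this would be a plain integer add
    }
    // Guard failed: the assumption was wrong, so bail out to the general path.
    deoptCount += 1;
    return genericAdd(x, y);
}

console.log(optimizedAdd(3, 4));     // 7, via the fast path
console.log(optimizedAdd(5, 6));     // 11, via the fast path
console.log(optimizedAdd("7", "8")); // 78 (the string "78"), after one bailout
console.log(deoptCount);             // 1
```

The fast path stays small because everything unusual is pushed into the bailout, which is the trade-off the simplified pseudocode above expresses.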

Of course, TurboFan does more than simplify code based on type information. It also performs other optimizations, such as eliminating redundant code.

The lesson from this simple example: changing the type of a variable in JS code creates a lot of extra work for the V8 engine, so to improve performance we should try not to change a variable’s type.
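One way to follow this advice is to convert values once at the boundary of the program, so that inner functions always receive the same types. The sketch below assumes a hypothetical form input; `area`, `parseSize`, and `fromForm` are illustrative names, and whether the change produces a measurable speedup depends on the workload.

```javascript
// Inner function: meant to be called with numbers only.
function area(width, height) {
    return width * height;
}

// Boundary helper: convert external input to a number exactly once.
function parseSize(text) {
    const n = Number(text);
    if (Number.isNaN(n)) {
        throw new TypeError(`not a number: ${text}`);
    }
    return n;
}

// Hypothetical input, e.g. values read from a form, which arrive as strings.
const fromForm = { width: "3", height: "4" };

// Avoid: area(fromForm.width, fromForm.height) also yields 12 through implicit
// conversion, but it feeds strings into a function that otherwise sees numbers.
console.log(area(parseSize(fromForm.width), parseSize(fromForm.height))); // 12
```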

For projects with high performance requirements, using TypeScript is also a good choice. TypeScript’s types are erased before the code reaches V8, so the benefit is indirect: if the typing rules are strictly followed, values keep stable types at runtime, which in theory helps V8 generate better-optimized assembly code, though test data is needed to prove this.

Orinoco: Garbage collection

Powerful garbage collection is one of the keys to V8’s performance: it reclaims memory and improves memory utilization while keeping its impact on JS code execution small.

Lesson 3: What is garbage collection? covers garbage collection algorithms in detail, so I won’t repeat that here.

The future of JS engines

The V8 engine is certainly powerful, but it’s not all-powerful, and a simple analysis reveals some areas that can be improved.

I have a new idea, tentatively called Optimized TypeScript Engine (I haven’t settled on a name yet):

  • Program in TypeScript, following strict typing rules rather than writing “AnyScript”;
  • Build TypeScript directly to bytecode instead of generating JS files, so that parsing and bytecode generation no longer happen at runtime;
  • At runtime, the bytecode is compiled into assembly code for the target CPU;
  • Because the code is typed, the compiler can better optimize the generated assembly code and skip a lot of extra operations.

This idea can actually be implemented with the V8 engine and should be technically feasible:

  • Move Parser and Ignition’s bytecode generation into the build phase;
  • Remove the code in TurboFan that handles dynamic JS features.

This simplifies the JS engine in two ways: parsing and bytecode generation no longer happen at runtime, and the compiler no longer has to do a lot of extra work to cope with JavaScript’s dynamic nature. That should reduce CPU, memory, and power usage; the only drawback is that you may have to program in a strict subset of TS.

Why do this? Not every smart home appliance can ship with a Snapdragon 855. To apply JS to IoT, it is necessary to optimize at the level of the JS engine; working only on the frameworks above it is not enough.

That’s pretty much what Facebook’s Hermes does, except it doesn’t require TS programming.

This should be the future of JS engines, and you’ll see it more and more.

I plan to spend a year writing a series of blog posts about JS called JavaScript in Plain English. Is there anything else you’d like to understand? Leave a comment so I can study it and share it with you. You are welcome to add me on WeChat (KiwenLau); I am the technical director of Fundebug, a programmer who both loves and hates JS.

References

  • Celebrating 10 years of V8
  • Launching Ignition and TurboFan
  • JavaScript engines – how do they even?
  • An Introduction to Speculative Optimization in V8
  • Understanding V8’s Bytecode
  • What happened to JavaScript in 2018?
  • Lesson 3: What is garbage collection?
  • What level of programmer is Fabrice Bellard?
  • How do you evaluate Fabrice Bellard’s QuickJS engine?

Copyright statement

When reprinting, please credit the author, Fundebug, and include this article’s address: blog.fundebug.com/2019/07/16/…