A Peek into JS Compilation Principles (V8)

How JavaScript is compiled (V8)

Is JS really a purely interpreted language?

V8 parses the source code into an abstract syntax tree (AST) and generates an execution context. Its interpreter, Ignition, then generates bytecode from the AST and interprets and executes that bytecode. (Earlier versions of V8 had no bytecode and compiled the AST directly into machine code; bytecode was introduced to reduce the memory footprint.)

Key concepts: compiler, interpreter, abstract syntax tree (AST), bytecode, and just-in-time (JIT) compiler

Compiler and Interpreter

Programs in compiled languages must be compiled before they can run. After compilation, the machine-readable binary is kept, so each later run can execute it directly without recompiling. C/C++, Go, and similar languages are compiled languages.

Programs written in interpreted languages must be interpreted and executed dynamically by an interpreter every time they run. Python, JavaScript, and similar languages are interpreted languages.

The execution process is roughly:

1. When a compiled language is compiled, the compiler first performs lexical analysis and syntax analysis on the source code to generate an abstract syntax tree (AST). It then optimizes the code and finally generates machine code that the processor can understand.

If compilation succeeds, an executable file is produced. If a syntax error or other error occurs during compilation, the compiler reports it and no binary is produced.

2. When an interpreted language runs, the interpreter also performs lexical analysis and syntax analysis on the source code and generates an abstract syntax tree (AST). However, instead of machine code it generates bytecode from the AST, and then executes the program and produces its results according to that bytecode.

"Our conclusion there is that JS is most accurately portrayed as a compiled language." (Kyle Simpson, author of YDKJS)

Compilation first, then execution

The separation of a parsing/compilation phase from the subsequent execution phase is observable fact, not theory or opinion. While the JS specification does not require “compilation” explicitly, it requires behavior that is essentially only practical with a compile-then-execute approach.

There are three program characteristics you can observe to prove this to yourself: syntax errors, early errors, and hoisting.

Syntax Errors from the Start

var greeting = "Hello";
console.log(greeting);
greeting = ."Hi";
// SyntaxError: unexpected token .

Instead of printing “Hello”, the program throws a SyntaxError. The syntax error comes after the console.log(..) call. If JS simply ran from top to bottom, “Hello” would be printed before the error was thrown, but that does not happen.

In fact, the only way the JS engine can know about the syntax error on the third line before executing the first and second lines is to parse the entire program before executing any of the lines.
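
You can also observe this from inside JavaScript. The Function constructor parses its entire body before it returns a callable function, so a syntax error on a later line keeps the earlier lines from ever running. The sketch below records side effects in a `log` array (a name chosen only for this illustration) instead of calling console.log, so you can check whether anything executed:

```javascript
// Record side effects in an array so we can check whether anything ran.
const log = [];

// Same shape as the example above: a statement with a side effect, then a syntax error.
const source = `
  log.push("Hello");
  greeting = ."Hi";
`;

let caught = null;
try {
  // The Function constructor parses the whole body first, so the
  // SyntaxError is thrown here, before any statement executes.
  const program = new Function("log", source);
  program(log);
} catch (err) {
  caught = err;
}

console.log(caught instanceof SyntaxError); // true
console.log(log.length); // 0: "Hello" was never recorded
```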

Early Errors

console.log("Howdy");

saySomething("Hello", "Hi");
// Uncaught SyntaxError: Duplicate parameter name not
// allowed in this context

function saySomething(greeting, greeting) {
    "use strict";
    console.log(greeting);
}

The error comes from strict mode, which only the saySomething(..) function opts into here. Strict mode disallows functions with duplicate parameter names, although non-strict mode has always allowed them.

But how does the JS engine know that the greeting parameter has been duplicated? And how does it know that saySomething(..) is in strict mode while it is still processing the parameter list, given that the “use strict” pragma appears only later, in the function body?

Again, the only logical explanation is that the code must first be fully parsed before any execution can occur.
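
The Function constructor makes this early error observable too. In the sketch below, side effects again go into an illustrative `log` array. The body that declares a strict-mode function with duplicate parameters is rejected before “Howdy” is recorded. The same function without "use strict" is accepted:

```javascript
const log = [];

// A strict-mode function with a duplicate parameter name is an early error.
const strictSource = `
  log.push("Howdy");
  function saySomething(greeting, greeting) {
    "use strict";
    log.push(greeting);
  }
  saySomething("Hello", "Hi");
`;

let strictError = null;
try {
  // Parsing the body fails, so nothing in it runs.
  new Function("log", strictSource)(log);
} catch (err) {
  strictError = err;
}

// Without "use strict" the same parameter list is legal;
// the last parameter with a given name wins.
const sloppySource = `
  function saySomething(greeting, greeting) {
    return greeting;
  }
  return saySomething("Hello", "Hi");
`;
const sloppyResult = new Function(sloppySource)();

console.log(strictError instanceof SyntaxError); // true
console.log(log.length); // 0: "Howdy" was never recorded
console.log(sloppyResult); // Hi
```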

Hoisting

function saySomething() {
    var greeting = "Hello";
    {
        greeting = "Howdy";  // error comes from here
        let greeting = "Hi";
        console.log(greeting);
    }
}

saySomething();
// ReferenceError: Cannot access 'greeting' before
// initialization

Here, the greeting variable in the greeting = “Howdy” statement belongs to the declaration let greeting = “Hi” on the next line, not to the var greeting = “Hello” statement.

The JS engine can know that the next statement declares a block-scoped variable with the same name only if it has already processed this code in an earlier pass and set up all scopes and their associated variables. This scope and declaration processing can only be done accurately by parsing the program before execution.

The ReferenceError here comes from greeting = “Howdy” accessing the block-scoped greeting variable too early, a conflict known as the temporal dead zone (TDZ).
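
A small variation of the example makes the TDZ observable without crashing. It wraps the early assignment in try/catch and checks which greeting each statement touched. The function name `saySomethingSafely` is made up for this sketch:

```javascript
function saySomethingSafely() {
  var greeting = "Hello";
  let tdzError = null;
  let inner;
  {
    try {
      // Resolves to the block's `let greeting` below, which is still in its TDZ.
      greeting = "Howdy";
    } catch (err) {
      tdzError = err;
    }
    let greeting = "Hi";
    inner = greeting;
  }
  // Outside the block, `greeting` is the function-scoped var, never modified.
  return { outer: greeting, inner, tdzError };
}

const result = saySomethingSafely();
console.log(result.tdzError instanceof ReferenceError); // true
console.log(result.outer); // Hello
console.log(result.inner); // Hi
```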

Generate abstract syntax tree (AST) and execution context

Online AST parser: resources.jointjs.com/demos/javas…

Take a piece of code and its AST as an example:

var myName = "Geek Time"
function foo(){
  return 23;
}
myName = "geektime"
foo()
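
V8's internal AST classes are not exposed to JavaScript. Parsers such as Acorn and Esprima, however, emit the standard ESTree format. The object below is a hand-written, abbreviated sketch of roughly what an ESTree parser produces for the snippet above. Source positions and some fields, such as `raw`, `async`, and `generator`, are omitted. A one-line helper then lists the top-level statement types:

```javascript
// Hand-written ESTree-style AST for the snippet above (abbreviated).
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "myName" },
          init: { type: "Literal", value: "Geek Time" },
        },
      ],
    },
    {
      type: "FunctionDeclaration",
      id: { type: "Identifier", name: "foo" },
      params: [],
      body: {
        type: "BlockStatement",
        body: [{ type: "ReturnStatement", argument: { type: "Literal", value: 23 } }],
      },
    },
    {
      type: "ExpressionStatement",
      expression: {
        type: "AssignmentExpression",
        operator: "=",
        left: { type: "Identifier", name: "myName" },
        right: { type: "Literal", value: "geektime" },
      },
    },
    {
      type: "ExpressionStatement",
      expression: {
        type: "CallExpression",
        callee: { type: "Identifier", name: "foo" },
        arguments: [],
      },
    },
  ],
};

// List the type of each top-level statement.
const topLevelTypes = ast.body.map((node) => node.type);
console.log(topLevelTypes.join(", "));
// VariableDeclaration, FunctionDeclaration, ExpressionStatement, ExpressionStatement
```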

The specific steps are:

1. Load the byte stream from the network, cache, or server and decode it

2. Tokenize (lexical analysis)

The byte-stream decoder performs lexical analysis by breaking the source code up into tokens. A token is the smallest unit of source text, a single character or a string, that cannot be divided further syntactically.

As can be seen from the picture, var myName = “Geek Time” simply defines a variable. In it, the keyword var, the identifier myName, the assignment operator =, and the string literal “Geek Time” are all tokens, each of a different type.
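
A toy tokenizer makes the idea concrete. The `tokenize` function below is hypothetical and handles only a tiny subset of JavaScript: a few keywords, identifiers, integers, double-quoted strings without escapes, and a handful of punctuators. V8's real scanner is far more complete:

```javascript
// A tiny, illustrative subset of JavaScript keywords and punctuators.
const KEYWORDS = new Set(["var", "let", "const", "function", "return"]);
const PUNCTUATORS = new Set(["=", ";", "(", ")", "{", "}", ","]);

function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) {
      i++; // whitespace separates tokens but is not a token itself
    } else if (/[A-Za-z_$]/.test(ch)) {
      // Identifier or keyword: consume letters, digits, `_` and `$`.
      let j = i + 1;
      while (j < source.length && /[\w$]/.test(source[j])) j++;
      const word = source.slice(i, j);
      tokens.push({ type: KEYWORDS.has(word) ? "Keyword" : "Identifier", value: word });
      i = j;
    } else if (/[0-9]/.test(ch)) {
      // Integer literal (no decimals or exponents in this toy).
      let j = i + 1;
      while (j < source.length && /[0-9]/.test(source[j])) j++;
      tokens.push({ type: "Numeric", value: source.slice(i, j) });
      i = j;
    } else if (ch === '"') {
      // Double-quoted string literal without escape sequences.
      const end = source.indexOf('"', i + 1);
      if (end === -1) throw new SyntaxError("Unterminated string literal");
      tokens.push({ type: "String", value: source.slice(i + 1, end) });
      i = end + 1;
    } else if (PUNCTUATORS.has(ch)) {
      tokens.push({ type: "Punctuator", value: ch });
      i++;
    } else {
      throw new SyntaxError(`Unexpected character: ${ch}`);
    }
  }
  return tokens;
}

const tokens = tokenize('var myName = "Geek Time"');
console.log(tokens.map((t) => `${t.type}(${t.value})`).join(" "));
// Keyword(var) Identifier(myName) Punctuator(=) String(Geek Time)
```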

Each token is sent to the parser as soon as it is created, and the rest of the byte stream is tokenized and sent along in the same way, as shown below:

For example, 0066 is decoded as f, 0075 as u, 006e as n, 0063 as c, 0074 as t, 0069 as i, 006f as o, and 006e as n, followed by a space. Together these form the keyword function.
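
The values in that example are hexadecimal character codes. The sketch below shows the decoding in JavaScript in two ways. `String.fromCharCode` turns the listed codes into text. The standard `TextDecoder` turns raw UTF-8 bytes, as they might arrive over the network, into the same string. For ASCII characters like these, the UTF-8 byte values happen to equal the character codes:

```javascript
// Character codes from the example above, written in hexadecimal.
const codes = [0x66, 0x75, 0x6e, 0x63, 0x74, 0x69, 0x6f, 0x6e];

// Map each code directly to its character.
const fromCodes = String.fromCharCode(...codes);

// The same characters as raw UTF-8 bytes, e.g. as received from the network.
const bytes = Uint8Array.from(codes);
const fromBytes = new TextDecoder("utf-8").decode(bytes);

console.log(fromCodes); // function
console.log(fromBytes); // function
```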

3. Parse

The tokens generated in the previous step are converted into an AST according to the grammar rules. If the source code is syntactically correct, this step completes smoothly. If it contains a syntax error, parsing stops and a SyntaxError is thrown.
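
To continue the toy example, here is a minimal hand-written parser for a single statement shape, `var <identifier> = <string>`. It shows how tokens become an AST node and how a syntax error stops parsing. `parseVarDeclaration` and its token format are hypothetical, and the token list is written out by hand so that the block stands alone:

```javascript
// Parse tokens of the form: var <Identifier> = <String> [;]
function parseVarDeclaration(tokens) {
  let pos = 0;

  // Consume the next token if it matches; otherwise report a syntax error.
  function expect(type, value) {
    const tok = tokens[pos];
    if (!tok || tok.type !== type || (value !== undefined && tok.value !== value)) {
      const found = tok ? tok.value : "end of input";
      throw new SyntaxError(`Unexpected token: ${found}`);
    }
    pos++;
    return tok;
  }

  expect("Keyword", "var");
  const id = expect("Identifier");
  expect("Punctuator", "=");
  const init = expect("String");
  // Optional trailing semicolon; anything else left over is an error.
  if (tokens[pos] && tokens[pos].type === "Punctuator" && tokens[pos].value === ";") pos++;
  if (pos !== tokens.length) throw new SyntaxError(`Unexpected token: ${tokens[pos].value}`);

  // Build an ESTree-style node from the consumed tokens.
  return {
    type: "VariableDeclaration",
    kind: "var",
    declarations: [
      {
        type: "VariableDeclarator",
        id: { type: "Identifier", name: id.value },
        init: { type: "Literal", value: init.value },
      },
    ],
  };
}

const good = [
  { type: "Keyword", value: "var" },
  { type: "Identifier", value: "myName" },
  { type: "Punctuator", value: "=" },
  { type: "String", value: "Geek Time" },
];
const node = parseVarDeclaration(good);
console.log(node.declarations[0].id.name); // myName

// Dropping the "=" token makes the input ungrammatical.
const bad = good.filter((t) => t.value !== "=");
try {
  parseVarDeclaration(bad);
} catch (err) {
  console.log(err.name, err.message); // SyntaxError Unexpected token: Geek Time
}
```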

The engine uses two parsers: a pre-parser and a full parser. To reduce page load time, the engine tries to avoid fully parsing code that is not needed immediately.

The pre-parser handles the code you might need later, and the parser handles the code you need right away!

If a function is called only when the user clicks a button, it does not need to be compiled immediately just to load the site.

If the user eventually clicks the button and that code is needed, it is then sent to the parser.
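
The sketch below is only a rough analogy, not how V8 is implemented. It defers compiling a function body until the first call and then reuses the result. `lazyFunction`, its `compileCount` counter, and the `onButtonClick` handler are hypothetical names. Unlike V8's pre-parser, this toy does not even syntax-check the body up front:

```javascript
// Wrap function source text so it is compiled only when first called.
function lazyFunction(paramNames, body) {
  let compiled = null;
  let compileCount = 0;

  const wrapper = (...args) => {
    if (compiled === null) {
      // First call: do the expensive work now (full parse + compile).
      compiled = new Function(...paramNames, body);
      compileCount++;
    }
    return compiled(...args);
  };
  wrapper.compileCount = () => compileCount;
  return wrapper;
}

// Hypothetical click handler that may never run.
const onButtonClick = lazyFunction(["label"], 'return "clicked " + label;');
console.log(onButtonClick.compileCount()); // 0: nothing compiled at load time
console.log(onButtonClick("OK")); // clicked OK
console.log(onButtonClick("OK")); // clicked OK
console.log(onButtonClick.compileCount()); // 1: compiled once, then reused
```

If the handler is never called, its body is never compiled, which is the saving the paragraph above describes.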

Babel works by converting ES6 source code into an AST, transforming that ES6 AST into an ES5 AST, and finally generating JavaScript source code from the ES5 AST. ESLint uses ASTs as well.

ESLint is a tool that checks JavaScript code against coding conventions. It converts the source code into an AST and then uses the AST to find code that violates its rules.
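
The core of that idea fits in a few lines: walk every node of the AST and report the ones that break a rule. The sketch below implements a hypothetical rule in the spirit of ESLint's `no-var`. It is not ESLint's API, and the ESTree AST for `var a = 1; let b = 2;` is written out by hand:

```javascript
// Visit every AST node depth-first, calling `visit` on each one.
function walk(node, visit) {
  if (node === null || typeof node !== "object") return;
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
    return;
  }
  if (typeof node.type === "string") visit(node);
  for (const value of Object.values(node)) walk(value, visit);
}

// Hypothetical rule: report every `var` declaration.
function findVarDeclarations(ast) {
  const reports = [];
  walk(ast, (node) => {
    if (node.type === "VariableDeclaration" && node.kind === "var") {
      const names = node.declarations.map((d) => d.id.name);
      reports.push(`var declaration: ${names.join(", ")} (prefer let or const)`);
    }
  });
  return reports;
}

// Hand-written ESTree AST for: var a = 1; let b = 2;
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        { type: "VariableDeclarator", id: { type: "Identifier", name: "a" }, init: { type: "Literal", value: 1 } },
      ],
    },
    {
      type: "VariableDeclaration",
      kind: "let",
      declarations: [
        { type: "VariableDeclarator", id: { type: "Identifier", name: "b" }, init: { type: "Literal", value: 2 } },
      ],
    },
  ],
};

for (const message of findVarDeclarations(ast)) console.log(message);
// var declaration: a (prefer let or const)
```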

Generate Bytecode

Next comes the interpreter, Ignition, which generates bytecode from the AST and interprets (executes) that bytecode. Once the bytecode has been fully generated, the AST is discarded, freeing its memory.

Bytecode is a form of code that sits between the AST and machine code. It is not tied to any particular type of machine code, so the interpreter has to “translate” it into machine operations before it can be executed.

As you can see from the figure, machine code takes up much more space than bytecode, so using bytecode can reduce the memory usage of the system.
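
The toy below makes "bytecode" less abstract. It compiles a small expression AST into a flat instruction list, and an interpreter loop then executes those instructions. This is a substitution: the toy is a simple stack machine, whereas Ignition is a register machine with an accumulator. The opcode names (`PUSH`, `LOAD`, `ADD`, `MUL`) are invented for illustration. Once `compileToBytecode` has run, the AST is no longer needed:

```javascript
// Map source operators to (invented) opcode names.
const OPCODES = { "+": "ADD", "*": "MUL" };

// Compile an expression AST into a flat array of [opcode, operand] pairs.
function compileToBytecode(node, code = []) {
  switch (node.type) {
    case "Literal":
      code.push(["PUSH", node.value]);
      break;
    case "Identifier":
      code.push(["LOAD", node.name]);
      break;
    case "BinaryExpression": {
      const opcode = OPCODES[node.operator];
      if (!opcode) throw new Error(`Unsupported operator: ${node.operator}`);
      // Operands first, then the operator (postfix order).
      compileToBytecode(node.left, code);
      compileToBytecode(node.right, code);
      code.push([opcode]);
      break;
    }
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
  return code;
}

// Execute the instructions using an operand stack.
function runBytecode(code, env) {
  const stack = [];
  for (const [op, arg] of code) {
    switch (op) {
      case "PUSH": stack.push(arg); break;
      case "LOAD": stack.push(env[arg]); break;
      case "ADD": { const b = stack.pop(); const a = stack.pop(); stack.push(a + b); break; }
      case "MUL": { const b = stack.pop(); const a = stack.pop(); stack.push(a * b); break; }
      default: throw new Error(`Unknown opcode: ${op}`);
    }
  }
  return stack.pop();
}

// AST for: a + b * 2
const expr = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Identifier", name: "a" },
  right: {
    type: "BinaryExpression",
    operator: "*",
    left: { type: "Identifier", name: "b" },
    right: { type: "Literal", value: 2 },
  },
};

const bytecode = compileToBytecode(expr);
console.log(bytecode.map((ins) => ins.join(" ")).join("; "));
// LOAD a; LOAD b; PUSH 2; MUL; ADD
console.log(runBytecode(bytecode, { a: 1, b: 3 })); // 7
```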

Executing code with the just-in-time (JIT) compiler

While Ignition is executing bytecode, it may find hot code, for example a piece of code that has been executed many times. TurboFan, the optimizing compiler running in the background, then compiles that bytecode into efficient machine code. The next time the optimized code runs, only the compiled machine code needs to be executed, which greatly improves execution efficiency.

With Ignition and TurboFan working together, code starts out running relatively slowly. Once TurboFan kicks in, execution becomes more and more efficient over time. Because TurboFan has already compiled the hot code to machine code, that machine code can be executed directly, and the bytecode no longer needs to be “translated” into machine code.

This technique of combining bytecode with an interpreter and a compiler is called just-in-time compilation (JIT). In V8, Ignition collects information about the code while it interprets and executes the bytecode. When it sees a section of code getting hot, TurboFan springs into action, converts the hot bytecode into machine code, and saves it for future use.
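
The sketch below imitates that tiering in plain JavaScript under loudly stated assumptions. A fixed call-count threshold, `HOT_THRESHOLD`, decides when code is "hot"; its value is made up, and V8 uses its own feedback-driven heuristics. Generating a JavaScript function with `new Function` stands in for TurboFan emitting machine code. All names are hypothetical:

```javascript
// Hypothetical hotness threshold; V8 decides with its own heuristics.
const HOT_THRESHOLD = 3;

// Slow tier: walk the AST on every call.
function interpret(node, env) {
  switch (node.type) {
    case "Literal":
      return node.value;
    case "Identifier":
      return env[node.name];
    case "BinaryExpression": {
      const left = interpret(node.left, env);
      const right = interpret(node.right, env);
      if (node.operator === "+") return left + right;
      if (node.operator === "*") return left * right;
      throw new Error(`Unsupported operator: ${node.operator}`);
    }
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// Fast tier: translate the AST once into a JS function
// (a stand-in for TurboFan emitting machine code).
function compile(node) {
  const emit = (n) => {
    switch (n.type) {
      case "Literal":
        return JSON.stringify(n.value);
      case "Identifier":
        return `env[${JSON.stringify(n.name)}]`;
      case "BinaryExpression":
        if (n.operator !== "+" && n.operator !== "*") {
          throw new Error(`Unsupported operator: ${n.operator}`);
        }
        return `(${emit(n.left)} ${n.operator} ${emit(n.right)})`;
      default:
        throw new Error(`Unsupported node type: ${n.type}`);
    }
  };
  return new Function("env", `return ${emit(node)};`);
}

// Start in the interpreter and switch tiers once the code becomes hot.
function makeTieredFunction(ast) {
  let calls = 0;
  let optimized = null;
  const fn = (env) => {
    if (optimized) return optimized(env);
    calls++;
    if (calls >= HOT_THRESHOLD) optimized = compile(ast); // used from the next call on
    return interpret(ast, env);
  };
  fn.tier = () => (optimized ? "optimized" : "interpreter");
  return fn;
}

// AST for: a + b * 2
const expr = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Identifier", name: "a" },
  right: {
    type: "BinaryExpression",
    operator: "*",
    left: { type: "Identifier", name: "b" },
    right: { type: "Literal", value: 2 },
  },
};

const run = makeTieredFunction(expr);
for (let i = 1; i <= 4; i++) {
  const tier = run.tier(); // the tier used for this call
  console.log(`call ${i}: ${run({ a: 1, b: 3 })} (${tier})`);
}
// call 1: 7 (interpreter)
// call 2: 7 (interpreter)
// call 3: 7 (interpreter)
// call 4: 7 (optimized)
```

With the interpreter as the first tier, code that runs only a few times never pays the compilation cost. Only code that proves hot gets compiled, which is the trade-off described above.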

REF:

You Don't Know JS Yet, 2nd edition (Kyle Simpson): compiler and interpreter

dev.to/lydiahallie… (GIF of the V8 engine)

time.geekbang.org/column/arti… (How does V8 execute a piece of JavaScript code?)