How does V8 execute and recycle JavaScript code

It’s a new semester (two months after graduation hahaha), September rush!

Let’s take a look at how the V8 engine executes JavaScript and how garbage collection works.

V8 executes the code process

Before getting into V8’s mechanism for executing JavaScript code, let’s look at the differences between compiled and interpreted languages.

Compiled and interpreted languages

We know that machines cannot understand code directly. Therefore, before executing the program, the code needs to be translated into a machine language that the machine can understand. According to the execution process of language, computer language can be divided into compiled language and interpreted language:

Compiled language: before the code is run, the compiler directly converts the corresponding code into machine code. The compiled results can be used directly without re – translation at run time.
Interpreted languages: Require conversion of code to machine code, which differs from compiled languages in that conversion is required at runtime. Interpreted languages are slower to execute than compiled languages because each execution requires a conversion of the source code.

While languages like Java and C++ are compiled languages, JavaScript is interpreted and generally performs slightly slower than compiled languages. V8 is one of the most powerful javascript engines in the browser world, and it is the core of Chrome. Node.js is based on V8.

The specific flow of compiled and interpreter language code is as follows:

The execution process of the two is as follows:

In the process of compiling a compiled language, the compiler first conducts lexical analysis and syntax analysis of the source code to generate an abstract syntax tree (AST), then optimizes the code, and finally generates machine code that the processor can understand. If the compilation succeeds, an executable file is generated. But if a syntax or other error occurs during compilation, the compiler will throw an exception and the resulting binary will not be generated successfully.
In interpreted languages, the interpreter also performs lexical analysis and syntax analysis of the source code, and generates an abstract syntax tree (AST). However, it generates bytecodes based on the abstract syntax tree, and finally executes programs and outputs results based on the bytecodes.

2. V8 Executes the code process

V8 uses interpreters and compilers during execution. Its execution process is as follows:

Parse phase: THE V8 engine converts JS code into an AST (abstract syntax tree);
Ignition phase: The interpreter converts the AST to bytecode, and parsing the bytecode execution also provides the information needed for the next phase of optimized compilation;
TurboFan stage: The compiler uses the information gathered in the previous stage to optimize bytecode into executable machine code;
Orinoco stage: Garbage collection stage, where memory space that is no longer used in the program is reclaimed.

The first three steps are JavaScript execution, and the last step is garbage collection. Let’s take a look at how V8 executes JavaScript.

(1) Generate abstract syntax tree

This process converts the source code into an abstract syntax tree (AST) and generates an execution context, which is information about the environment in which the code is being executed.

Parsing JS code into an AST has two main stages:

Lexical analysis: This stage breaks down the source code into the smallest, non-divisible lexical units called tokens. For example, var a = 1; Is usually decomposed into var, a, =, 1,; These five lexical units. Whitespace in code is ignored directly in JavaScript, which simply means parsing JavaScript code into tokens.
Syntax analysis: This process is to convert the token data generated in the previous step into the AST according to the syntax rules. If the source code is syntactically correct, this step is done smoothly. If there is a syntax error in the source code, this step terminates and throws a syntax error, which simply means that the token is assembled into an abstract syntax tree (AST).

The lexical analysis will parse the code character by character to generate tokens with different types such as keywords, identifiers, symbols, numbers, and so on. Var a = 1; Will be converted to the following token:

Keyword(var)
Identifier(name)
Punctuator(=)
Number(1)
Copy the code

In the syntax analysis phase, tokens are used to generate an abstract syntax tree. In the process of generating the tree, unnecessary symbolic tokens are removed and generated according to the syntax rules. Let’s look at two pieces of code:

// The first code
var a = 1;
// Second code
function sum (a,b) {
  return a + b;
}
Copy the code

Converting these two pieces of code into the AST abstract syntax tree, respectively, returns the following JSON:

The first piece of code, compiled as follows:

{
  "type": "Program"."start": 0."end": 10."body": [{"type": "VariableDeclaration"."start": 0."end": 10."declarations": [{"type": "VariableDeclarator"."start": 4."end": 9."id": {
            "type": "Identifier"."start": 4."end": 5."name": "a"
          },
          "init": {
            "type": "Literal"."start": 8."end": 9."value": 1."raw": "1"}}]."kind": "var"}]."sourceType": "module"
}
Copy the code

It looks like this:

The second piece of code, compiled as follows:

{
  "type": "Program"."start": 0."end": 38."body": [{"type": "FunctionDeclaration"."start": 0."end": 38."id": {
        "type": "Identifier"."start": 9."end": 12."name": "sum"
      },
      "expression": false."generator": false."async": false."params": [{"type": "Identifier"."start": 14."end": 15."name": "a"
        },
        {
          "type": "Identifier"."start": 16."end": 17."name": "b"}]."body": {
        "type": "BlockStatement"."start": 19."end": 38."body": [{"type": "ReturnStatement"."start": 23."end": 36."argument": {
              "type": "BinaryExpression"."start": 30."end": 35."left": {
                "type": "Identifier"."start": 30."end": 31."name": "a"
              },
              "operator": "+"."right": {
                "type": "Identifier"."start": 34."end": 35."name": "b"}}}]}}],"sourceType": "module"
}
Copy the code

It looks like this:

As you can see, AST is just an abstract representation of the syntax structure of the source code, and the computer will not directly recognize THE JS code. Converting into an abstract syntax tree is only the first step in the process of recognizing the JS code. The structure of the AST is very similar to the structure of code, and you can also think of the AST as a structured representation of code that the compiler or interpreter depends on for subsequent work.

Application scenarios of the AST:

AST is an important data structure and is used in many places. In Babel, for example, Babel is a code transcoder that converts ES6 code into ES5 code. Babel works by converting ES6 source code into an AST, then converting the AST of ES6 syntax into an AST of ES5 syntax, and finally using the AST of ES5 syntax to generate JavaScript source code.

In addition to Babel, ESLint also uses AST. ESLint is a plug-in that checks for JavaScript writing conventions, and the process involves converting source code to an AST and then using the AST to check for code normalization issues.

In addition to the above application scenarios, there are many other application scenarios of AST:

JS decompression, syntax parsing;
Code highlighting;
Keyword matching;
Code compression.

(2) Generate bytecode

Once you have an abstract syntax tree AST and an execution context, it’s the interpreter’s turn to generate bytecode from the AST and interpret the execution of the bytecode.

In earlier versions of V8, conversion to machine code was done directly through the AST. Converting the AST directly to machine code presents some problems:

Direct conversions create a memory footprint problem because the abstract syntax tree is entirely generated in machine code, which takes up much more memory than bytecode.
For some JavaScript usage scenarios it is more appropriate to use an interpreter, parsed into bytecode, and for some code there is no need to generate machine code, thereby minimizing the memory footprint.

To address the memory footprint, bytecode was introduced in the V8 engine. So what is bytecode? Why should the introduction of bytecode solve the memory footprint problem?

Bytecode is a type of code between AST and machine code. It needs to be converted to machine code before it can be executed. Bytecode is an abstract description of machine code and has a smaller amount of code than machine code, thus reducing memory consumption. In addition to quickly generating unoptimized bytecodes, the interpreter can also execute partial bytecodes.

(3) Generate machine code

Once the bytecode is generated, it is time to go to the execution phase, which is essentially generating the bytecode into machine code.

Typically, if the bytecode is executed for the first time, the interpreter interprets the execution item by item. In the process of execution bytecode, if found to have hot code (repeat code, run more than a certain threshold is marked as hot code), then the background of the compiler will compile this hot bytecode for efficient machine code, and then when executed again this is optimized code, you just need to execute the compiled machine code, This improves the efficiency of code execution.

The technology by which bytecode works with an interpreter and compiler is just-in-time compilation (JIT). In V8, this means that the interpreter collects code information as it interprets the execution of the bytecode, and when it sees a section of code getting hot, the compiler pops in, converts the hot bytecode into machine code, and saves the converted machine code for future use.

Because the V8 engine is multithreaded, the compiler’s compilation thread and bytecode generation thread are not on the same thread, so they can be used in conjunction with the interpreter, independent of the other side. Here’s how JIT works:

Once the interpreter gets the AST, it interprets and executes as needed. That is, if a function is not called, it is not interpreted to execute. In this process, the interpreter collects repeated optimizable operations to generate analysis data. The resulting bytecode and analysis data are then passed to the compiler, which generates highly optimized machine code from the analysis data.

The optimized machine code works much like a cache, and can be executed directly when the interpreter encounters the same content again. Of course, sometimes the optimized code will not work (for example, if the function parameter type changes), then it will be de-optimized again and passed to the interpreter as bytecode.

The whole process is shown below:

3. Perform process optimization

If JavaScript code has to be fully parsed before it can be executed, you may face the following problems:

Longer code execution time: Parsing all code at once increases the code’s runtime.
More memory consumption: The AST parsed and the bytecode compiled from the AST are stored in memory, which takes up more memory.
Disk space: Compiled code is cached on disk and occupies disk space.

Therefore, V8 uses deferred parsing: during parsing, functions that are not immediately executed are pre-parsed; A function is fully parsed only when it is called.

Pre-parser is a pre-parser that verifies function syntax, parses function declarations, determines function scope, and does not generate AST.

Take the following code for example:

function sum(a, b) {
    return a + b;
}
const a = Awesome!;
const c = 996;
foo(1.1);
Copy the code

The V8 Parser parses code from the top down. When the Parser encounters a function declaration foo, it finds that it is not executed immediately, so it is pre-parsed with the pre-Parser. Only the function declaration is parsed, not the internal code of the function, and no AST is generated for the internal code of the function.

The interpreter then compiles the AST into bytecode and executes it. The interpreter executes the code in top-down order, starting with const a = 666; And const c = 996; Sum (1, 1) is then executed. Then the Parser continues to parse the code inside the function and generates the AST, which is then compiled and executed by the interpreter.

V8 garbage collection mechanism

1. JavaScript memory management mechanism

Computer programming languages run on corresponding code engines, and the process of using memory can be divided into the following three steps:

Allocate the required system memory space;
Read or write operations using allocated memory;
Free or return memory when it is no longer needed.

In JavaScript, when a variable is created, the system automatically allocates the corresponding memory to the object. Consider the following example:

var a = 123; // Allocate stack memory for numeric variables
var etf = "ARK"; // Allocate stack memory for strings
// Allocate heap memory for objects and their contained values
var obj = {
  name: 'tom'.age: 13
}; 
// Allocate memory for arrays and their values
var a = [1.null."str"]; 
// Allocate memory to the function
function sum(a, b){
  return a + b;
}
Copy the code

Data in JavaScript falls into two categories:

Basic types: These types occupy a fixed amount of memory, and their values are stored in stack space, which can be accessed directly by values;
Reference type: Because the size of the reference type value is not fixed, the storage address in stack memory refers to the object in heap memory, which is accessed by reference.

Basic types in stack memory that can be handled directly by the operating system; The reference types in heap memory, because they can change frequently and not be fixed in size, need to be handled by the JavaScript engine through the garbage collection mechanism. Garbage collection is when JavaScript code runs and needs to allocate memory to store variables and values. When variables are no longer participating in the run, the system needs to reclaim occupied memory space. Javascript has an automatic garbage collection mechanism, which periodically frees the memory occupied by variables and objects that are no longer used. The principle is to find variables that are no longer used, and then release the memory occupied by them.

There are two types of variables in JavaScript: local variables and global variables. The lifetime of global variables continues to require page unloading; Local variables are declared in functions, and their life cycle starts from the execution of the function until the end of the function execution. During this process, local variables store their values in the heap or stack. When the function execution ends, these local variables are no longer used and the space they occupy is freed. However, when a local variable is used by an external function, one of the situations is a closure. After the function is finished, a variable outside the function still points to a local variable inside the function. The local variable is still in use, so it is not recycled.

2. V8 garbage collection process

Let’s take a look at Chrome’s garbage collection process:

(1) Mark active and inactive objects in space by GC Root.

The accessibility algorithm currently used in V8 determines whether an object in the heap is an active object. This algorithm takes some GC Roots as an initial collection of living objects, starts with GC Roots, and traverses all objects in GC Root:

Objects traversed by GC Root are accessible and must be kept in memory. Accessible objects are called live objects.
Objects that are not traversed by GC Roots are not accessible, and these inaccessible objects can be reclaimed. Unreachable objects are called inactive objects.

(2) Reclaim memory occupied by inactive objects.

All objects marked as recyclable in memory are cleaned uniformly after all tags are completed.

(3) Memory consolidation.

Generally speaking, after frequent object reclamation, there will be a large number of discontinuous memory space, which is called memory fragmentation. When a large number of memory fragments appear in memory, it is possible to run out of memory if large contiguous memory needs to be allocated, so the last step is to defragment these memory fragments. This step is actually optional, because some garbage collectors do not generate memory fragmentation.

This is the general garbage collection process. V8 currently uses two garbage collectors: the primary and secondary garbage collectors. Here’s how V8 implements garbage collection.

In V8,The heap will be divided into Cenozoic and old generation regions.In the new generation, objects with a short survival time are stored, and in the old generation, objects with a long survival time are stored: The new generation usually supports only 1-8m capacity, while the old generation supports much more capacity. V8 uses two different garbage collectors for each of these areas to implement garbage collection more efficiently:

Secondary garbage collector: Responsible for the new generation of garbage collection.
Main garbage collector: Responsible for old generation garbage collection.

(1) Secondary garbage collector (New generation)

The secondary garbage collector is mainly responsible for the new generation of garbage collection. Most objects are initially allocated in the new generation, which has a relatively small storage space divided into two Spaces: from space (object area) and to space (free area).

The newly added objects are stored in the object area. When the object area is nearly full, a garbage cleaning operation is needed: first, the garbage in the object area is marked, and then the garbage cleaning phase is entered. The para-garbage collector copies the living objects into the free area, and it also arranges them in an orderly fashion. There is no memory defragmentation in the free area after copying:

Recycle garbage objects by flipping the roles of the object area and the free area after replication. The algorithm is called the Scavenge algorithm. Also, this role flipping allows these two areas of the new generation to be reused indefinitely:

Vice garbage collector, however, every time they perform cleanup operations, all need to survive the object from the object region is copied to the free region, copy operation need time cost, if the new district space set too big, so every time cleaning time is too long, so in order to execute efficiency, generally relatively small space can be set up in the newborn. Because there is not much space in the newborn area, it is easy to fill the entire area with living objects, and the sub-garbage collector performs garbage collection once the monitoring objects are full. At the same time, the garbage collector also uses object promotion, which moves objects that have survived two garbage collections into the old generation.

(2) Main garbage collector (old generation)

The main garbage collector is mainly responsible for garbage collection in the old generation. In addition to the new generation of promoted objects, some of the big objects will be directly assigned to the old generation. Thus, objects in the old generation have two characteristics:

Objects take up large space;
The object has a long lifetime.

The Scavenge algorithm takes a lot of time to copy large objects to be recycled, resulting in inefficient recycling execution and a waste of space. Therefore, the main garbage collector uses a tag sweep algorithm for garbage collection.

This method is divided into two stages: mark and clear:

Marking phase: Starting from a set of root elements, recursively traverses the set of root elements. In this traversal process, the elements that can be reached are called active objects, and the elements that cannot be reached can be judged as junk data.
Cleanup phase: The main garbage collector directly cleans up the data marked as garbage.

The two phases are shown below:

Garbage data is marked and then cleaned up, which is a marker removal algorithm, but multiple executions of the marker removal algorithm on a piece of memory will result in a large number of discrete memory fragments. However, too much fragmentation would cause large objects to be unable to allocate enough contiguous memory, so another algorithm — tag defragmentation was introduced.

In this algorithm, the marking process is still the same as in the mark clearing algorithm, which marks the recyclable first, but instead of cleaning up the recyclable directly, the next step is to move all surviving objects towards one end, and then directly clean up the memory outside this end:

(3) Complete pause

As we know, JavaScript is a single-line language that runs on the main thread. Once the garbage collection algorithm is executed, you need to pause the JavaScript script that is being executed and resume the script execution after garbage collection is complete. This behavior is called total pause.

The main garbage collector performs a complete garbage collection process as shown below:

In the V8 new generation garbage collection, due to its small space and fewer surviving objects, total pause has little impact. However, in the old generation, if the main thread is occupied too long in the garbage collection process, the main thread cannot do other things, need to wait for the completion of the garbage collection operation to do other things, which may cause the page lag phenomenon.

In order to reduce the lag caused by garbage collection in the old generation, V8 divides the marking process into sub-marking processes, and alternates the garbage collection marking and JavaScript application logic until the marking stage is complete. This algorithm is called incremental marking algorithm. As shown below:

Incremental markup algorithms can be used to break up a complete garbage collection task into smaller tasks, which can be executed in the middle of other JavaScript tasks, so that when the code is executed, the user doesn’t feel the page is stuck because of the garbage collection task.

3. Reduce recycling

The browser can do automatic garbage collection, but garbage collection can be expensive when the code is complex, so it should be minimized:

Array optimization: The easiest way to empty an array is to assign it a value of [], but at the same time create a new empty object. You can set the length of the array to 0 to clear the array.
Object optimization: Reuse objects as much as possible. For objects that are no longer used, set them to NULL and recycle them as soon as possible.
Optimize the function: if the function expression in the loop can be reused, try to put it outside the function.