A bird's-eye view of how V8 executes JavaScript

Most developers know how to write JavaScript, but they are less clear about how it works under the hood. In many cases, only by understanding the underlying principles can you better understand and apply a language such as JavaScript. In today's article we are going "down the stack", looking at how JavaScript code is executed from the perspective of the JavaScript engine, V8.

Front-end tools and frameworks are updated at a dizzying pace, and new ones appear all the time. If you want to keep up with them, you need to grasp the essential underlying knowledge so that you can understand the applications built on top more easily. The way V8 works, for example, helps you understand JavaScript from the ground up, as well as some of the underlying machinery of Babel, ESLint and other syntax checkers, Vue, and React. Understanding the V8 compilation pipeline will therefore give you a better feel for the language and its related tools. (To be honest, I haven't fully gotten there myself; after all, I'm still a rookie, but a rookie can become an eagle one day, so keep at it!)

To fully understand how V8 works, you need to be familiar with concepts and principles such as compilers, interpreters, the abstract syntax tree (AST), bytecode, and just-in-time compilation (JIT). These are the terms to focus on; if some of them are new to you, come back to this list as you read on.

1. A diagram of V8 executing JavaScript

Let's start with a flow chart of V8's execution pipeline to get an overview of the whole process:

It doesn't matter if you can't fully understand this flow chart yet; just form a general impression first and keep reading, dear reader.

2. What are compilers and interpreters

Compilers and interpreters exist because machines cannot directly understand the code we write, so the code has to be "translated" into machine language the machine can understand before a program can run. Based on how they are executed, languages can be divided into compiled languages and interpreted languages.

Compiled languages require a compiler to compile the program before it can be executed. After compilation, the machine-readable binary is kept, so the program can be run again and again without recompiling. C/C++, Go, etc. are compiled languages.

Programs written in interpreted languages need to be dynamically interpreted and executed by an interpreter every time they run. Python, JavaScript, etc. are interpreted languages.

How a compiler and an interpreter "translate" code:

From the figure, you can see the execution process of the two, which can be roughly described as follows:

  1. In a compiled language, the compiler first performs lexical analysis of the source code, then syntax analysis to generate an abstract syntax tree (AST), then optimizes the code, and finally generates machine code that the processor can understand. If compilation succeeds, an executable file is produced; if a syntax or other error occurs during compilation, the compiler throws an error and no binary is produced.
  2. In an interpreted language, the interpreter likewise performs lexical analysis and syntax analysis of the source code and generates an abstract syntax tree (AST). It then generates bytecode from the abstract syntax tree, and finally executes the program and produces its result based on that bytecode.

To be clear, today’s V8 engine uses an interpreter and a compiler together, a technique called just-in-time compilation (JIT). More on that later.

3. The V8 JavaScript execution process in detail

3.1. The parser generates the abstract syntax tree (AST) and the execution context

As the initial flow chart shows, the key task of this step is to convert the source code into an abstract syntax tree and to generate the execution context. The execution context, which we have covered at length in previous articles, is mainly the information about the environment in which the code runs.

Some of you might ask: what is an AST? To be honest, I didn't know either at first. I thought it was something like a DOM tree, but it isn't.

Let me explain the AST from a few angles: what it is, why it is needed, and how it is generated.

1. What is an AST

A high-level language is something developers can understand, but a compiler or interpreter cannot work with it directly. For a compiler or interpreter, the AST is what they can understand. So whether a language is interpreted or compiled, an AST is generated during compilation. This is similar to how a rendering engine converts an HTML file into a DOM tree that the computer can work with.

Here is a sample of JavaScript code and the AST structure it corresponds to; you can also explore ASTs interactively at the demo below:

(resources.jointjs.com/demos/javas…
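
To make this concrete, here is a tiny hand-written sketch: one line of JavaScript, and a simplified AST for it written out as a plain object in the style of the ESTree format that tools like the demo above display. The node and field names are only illustrative; V8's internal AST differs in detail.

```javascript
// Source code to be parsed:
//   const answer = 40 + 2;
//
// A simplified ESTree-style AST for that single statement:
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "const",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "answer" },
          init: {
            type: "BinaryExpression",
            operator: "+",
            left: { type: "Literal", value: 40 },
            right: { type: "Literal", value: 2 },
          },
        },
      ],
    },
  ],
};
```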

The AST is such an important data structure that the subsequent work of the compiler or interpreter depends on the AST, not the source code.

For example:

One of the most famous projects is Babel. Babel is a widely used code transcoder that converts ES6 code to ES5 code, which means you can write in ES6 now without worrying about whether your existing environment supports ES6. Babel works by converting ES6 source code into an AST, then converting the AST of ES6 syntax into an AST of ES5 syntax, and finally using the AST of ES5 syntax to generate JavaScript source code.
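
As a rough illustration of that pipeline, here is a minimal sketch assuming @babel/core and @babel/preset-env have been installed; transformSync is Babel's standard synchronous API, but the exact output depends on your Babel version and configuration.

```javascript
// Minimal Babel sketch (assumes `npm install @babel/core @babel/preset-env`).
const babel = require("@babel/core");

const es6Source = "const double = (n) => n * 2;";

// Internally: parse -> ES6 AST -> transform -> ES5 AST -> generate ES5 code.
const { code } = babel.transformSync(es6Source, {
  presets: ["@babel/preset-env"],
});

console.log(code);
// Typically prints something along the lines of:
//   "use strict";
//   var double = function double(n) { return n * 2; };
```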

2. The AST is generated in two stages

1. The first stage is tokenize, also known as lexical analysis, which breaks the source code down into tokens. A token is the smallest single character or string that cannot be divided any further syntactically. You can refer to the following figure to better understand what tokens are.

2. The second stage is parse, also known as syntax analysis, which converts the tokens produced in the previous step into an AST according to the grammar rules. If the source code is syntactically correct, this step completes smoothly; but if the source code contains a syntax error, this step terminates and a "syntax error" is thrown.

This is how the AST is generated: first tokenizing, then parsing (see the sketch below). With the AST in place, V8 then generates the execution context for the code. For details on the execution context, you can refer to the previous articles.
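
To make the two stages concrete, here is a hand-written sketch of the kind of token stream lexical analysis might produce for the line used in the earlier AST example. The token names are illustrative, not V8's actual internals.

```javascript
// Source line:  const answer = 40 + 2;
//
// Stage 1, tokenize (lexical analysis): split the line into the
// smallest units that cannot be divided any further. A hypothetical
// token stream might look like this:
const tokens = [
  { type: "Keyword",    value: "const"  },
  { type: "Identifier", value: "answer" },
  { type: "Punctuator", value: "="      },
  { type: "Numeric",    value: "40"     },
  { type: "Punctuator", value: "+"      },
  { type: "Numeric",    value: "2"      },
  { type: "Punctuator", value: ";"      },
];

// Stage 2, parse (syntax analysis): the parser consumes this token
// stream and, following the grammar rules, assembles the AST shown in
// the earlier example. A token sequence that violates the grammar
// (e.g. `const = 40 +`) makes this stage throw a syntax error.
```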

Further reading on lexical analysis, syntax analysis, and semantic analysis in compiler theory:

1. blog.csdn.net/lzj_lzj2014…

2. juejin.cn/post/697158…

3. zhuanlan.zhihu.com/p/96502646

3.2. The interpreter Ignition generates bytecode from the AST

Getting to know bytecode, and how the interpreter Ignition generates it

Now that you have the AST and the execution context, the next actor, the interpreter Ignition, comes into play: it generates bytecode from the AST and then interprets and executes that bytecode.

When we talk about efficiency in computer science, we cannot escape the two dimensions of time and space; most optimizations trade space for time or time for space. How to strike the balance between the two and get the highest efficiency is a problem worth studying in depth.

V8 did not actually have bytecode at first; instead it converted the AST directly into machine code. This worked well for a while after release because executing machine code is very efficient. However, as Chrome became more widely used on mobile phones, especially ones with only 512 MB of memory, the memory-footprint problem was exposed: V8 needed a lot of memory to store the converted machine code. To address this, the V8 team substantially rebuilt the engine architecture, introducing bytecode and abandoning the previous compiler, and over the following four years arrived at the current architecture. So what is bytecode, and why does introducing it solve the memory-footprint problem? Bytecode is a form of code that sits between the AST and machine code.

However, bytecode is not tied to any particular kind of machine code; it still has to be translated into machine code by the interpreter before it can be executed.

Bytecode is an abstraction of machine code: the various bytecode instructions combine with one another to implement everything JavaScript needs. Crucially, bytecode occupies far less memory than machine code, roughly tens of times or even a hundred times smaller than the equivalent machine code, so the memory consumed by caching bytecode is acceptable.

Bytecode and machine code space comparison:

As you can see from the figure, machine code takes up much more space than bytecode, so using bytecode can reduce the memory usage of the system.
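
If you want to peek at Ignition's bytecode yourself, Node.js passes V8 flags through to the engine. The sketch below assumes a Node version where the --print-bytecode and --print-bytecode-filter flags are available; flag names and output format vary across V8 versions.

```javascript
// add.js — run with, for example:
//   node --print-bytecode --print-bytecode-filter=add add.js
// to dump the Ignition bytecode generated for `add`.
function add(a, b) {
  return a + b;
}

console.log(add(1, 2));

// The (abbreviated) output for `add` typically looks roughly like:
//   Ldar a1        ; load argument b into the accumulator
//   Add a0, [0]    ; add argument a, using feedback slot 0
//   Return         ; return the accumulator
```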

3.3. Execute the code

Once the bytecode has been generated, it is time for execution. Normally, when a piece of bytecode is executed for the first time, the interpreter Ignition interprets and executes it line by line. If a hot spot is found while executing the bytecode, for example a piece of code that is run over and over, the TurboFan compiler compiles that bytecode into efficient machine code; when the optimized code runs again, only the compiled machine code needs to be executed, which greatly improves execution efficiency.

The CPU cannot read bytecode directly, so bytecode still has to be converted into machine code before the hardware can run it.

After the AST has been converted into bytecode, the interpreter converts the bytecode into machine code as it executes. This is certainly slower than executing machine code directly, so pure execution speed suffers. However, converting JS source code into an AST with the parser and then into bytecode with the interpreter is much faster than having a compiler translate the JS source code straight into machine code. Over the whole process the total time is not very different, but a lot of memory is saved, so why not.

4. Related terms explained

Hot code

In a program, the same piece of code is often called many times. If that code has to be re-interpreted into binary every time it runs, efficiency is wasted. V8 therefore has a dedicated monitoring module that watches whether the same code is called repeatedly; if it is, the code is marked as hot code. What is that good for? See the next term.

Optimizing compiler

TurboFan ("turbocharging") is a term you may have heard in recent product launches from brands such as Huawei and Xiaomi, where it means using software to optimize a range of functions and make them more efficient. In V8, TurboFan is the name of the optimizing compiler.

Continuing from hot code: when hot code appears, V8 uses TurboFan to compile that hot code's bytecode into machine code and caches it. As a result, the next time the hot code is called, the bytecode does not need to be translated into machine code again. Hot code is a relatively small fraction of all code, so the cache does not take up too much memory, yet execution efficiency improves; this is another case of sacrificing space in exchange for time.
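
As a rough way to watch this in action, V8 exposes tracing flags through Node (availability and output differ by version). The sketch below simply makes a small function hot by calling it many times so that TurboFan picks it up.

```javascript
// hot.js — run with, for example, `node --trace-opt hot.js`
// (a V8 flag; availability and messages vary by Node/V8 version).
function sum(n) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    total += i;
  }
  return total;
}

// Call the function repeatedly so V8's monitoring module marks it as
// hot code; after enough calls the trace typically prints something
// like "[marking <JSFunction sum> for optimized recompilation ...]".
for (let i = 0; i < 100000; i++) {
  sum(100);
}
```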

De-optimization

JavaScript is a dynamic and very flexible language: an object's structure and properties can change at runtime. Now imagine a problem: if, during some execution of a piece of hot code, one of those properties is suddenly modified, can the hot code that was compiled into machine code continue to execute? The answer is definitely no. This is where the optimizing compiler's de-optimization comes in: it drops the hot code back to bytecode, the interpreter re-interprets and executes the modified code, and if the code is marked as hot again, the optimizing compiler repeats its work.
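
Here is a small sketch of that scenario, assuming Node with the --trace-deopt V8 flag available (flag names and messages vary by version); the object-shape change in the last call is what invalidates the optimized machine code.

```javascript
// deopt.js — run with, for example, `node --trace-deopt deopt.js`.
function getX(point) {
  return point.x;
}

// Phase 1: every call sees the same object shape { x: number },
// so once getX is hot, TurboFan compiles it into fast machine code
// that assumes exactly that shape.
for (let i = 0; i < 100000; i++) {
  getX({ x: i });
}

// Phase 2: a string `x` plus an extra property breaks the assumptions
// the optimized code was built on, so V8 de-optimizes getX back to
// bytecode and lets Ignition interpret it again until it becomes hot
// once more.
getX({ x: "not a number", y: 1 });
```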

5. Summary

Looking back over the whole analysis, V8's execution of JavaScript goes roughly as follows.

First, the JavaScript source code is parsed by the parser, which generates the abstract syntax tree (AST) and the execution context. Second, the interpreter Ignition generates bytecode from the AST and then interprets and executes that bytecode line by line.

Finally, if a hot spot is found while executing the bytecode, for example a piece of code that is run over and over, the TurboFan compiler compiles that bytecode into efficient machine code; when the optimized code runs again, only the compiled machine code needs to be executed, which greatly improves execution efficiency.

The whole process uses not only the interpreter but also the optimizing compiler. This way of combining the two is known in the industry as JIT (just-in-time compilation). Most JavaScript is handled through bytecode, whose memory footprint is small, while hot code runs fast because it has been compiled by the optimizing compiler. Combining the two plays to their respective strengths and squeezes out the most efficiency.

Just-in-time Compilation (JIT) technology:

If any description in this article is unclear, leave me a message and I will put together follow-up articles to fill it in. If you have better views or ideas, your guidance is very welcome; thank you very much. If we get the chance, let's meet up in Hangzhou, wander around West Lake, and grab some barbecue.

Reference articles:

1. How V8 executes JavaScript: juejin.cn/post/697158…

2. Compilers and interpreters: How does V8 execute a piece of JavaScript code? time.geekbang.org/column/arti…

3. Lots of substance! A detailed look at JavaScript execution: mp.weixin.qq.com/s/NkFJVY_HL…

After-class questions:

1. The longer V8 runs, the more hot code gets compiled into machine code, so overall execution efficiency keeps improving. If that is the case, V8's memory usage also keeps growing; so how is that different from compiling everything to machine code in the first place?

2. Where are the bytecode V8 generates and the machine code for hot code actually stored?