As a front-end programmer, the first thing I do at work every day is turn on my computer and open Chrome almost without thinking, then either slack off for a while or jump straight into work. The browser window then keeps you company for the rest of the day: until 7 or 8 in the evening normally, 9 or 10 on a late day, or all night on a very late one, watching over your work. Ask yourself: do you really understand how this loyal companion works? Have you ever looked inside it?
If you have ever wondered, read on. This installment of Inside Chrome looks at how the V8 engine works.
What is V8
You have to know what a thing is before you go into it.
V8 is Google's open-source, high-performance JavaScript and WebAssembly engine, written in C++ and used in Chrome, Node.js, and other projects. It implements ECMAScript and WebAssembly and runs on Windows 7 and above, macOS 10.12+, and Linux systems using x64, IA-32, ARM, or MIPS processors. V8 can run on its own or be embedded in any C++ application.
V8 origin
Let’s take a look at how it came to be and why it’s called that.
V8 was originally developed by Lars Bak's team. It is named after the V8 car engine (an eight-cylinder engine in a V configuration), signaling that it is meant to be a powerful JavaScript engine. It was released as open source together with Chrome on September 2, 2008.
Why V8 is needed
The JavaScript code we write ultimately has to be executed by the machine, but the machine does not understand high-level languages directly. A series of steps is needed to convert the high-level code into instructions the machine can execute, that is, binary machine code. This conversion is what V8 does.
Let’s look at it in detail.
V8 composition
First, take a look at the internals of V8. There are many modules built into V8, and the four most important ones are as follows:
- Parser: parses source code into an AST (abstract syntax tree)
- Ignition: the interpreter, which converts the AST to bytecode, executes it, and marks hotspot code
- TurboFan: the compiler, which compiles hotspot code into machine code and executes it
- Orinoco: the garbage collector, responsible for reclaiming memory
V8 Workflow
The following is a workflow diagram of several important modules in V8. Let's go through them one by one.
Parser
The Parser converts the source code into the abstract syntax tree AST. There are two important stages in the transformation: Lexical Analysis and Syntax Analysis.
Lexical analysis
Also known as tokenization, this is the process of converting the source code string into a sequence of tokens. A token is a string that forms the smallest meaningful unit of source code, similar to a word in English, so lexical analysis can be thought of as combining letters into words. Lexical analysis does not concern itself with the relationships between words. For example, parentheses are emitted as tokens during lexical analysis, but whether they match is not checked.
Tokens in JavaScript mainly include the following types:
- Keywords: var, let, const, etc.
- Identifier: a contiguous run of characters not enclosed in quotes. It may be a variable name, a keyword such as if or else, or a built-in constant such as true or false
- Operators: +, -, *, /, etc.
- Numbers: hexadecimal, decimal, octal, and scientific notation
- String: string literals, such as the value assigned to a variable
- Whitespace: consecutive spaces, line feeds, indentation, etc.
- Comment: a line comment or a block comment, treated as a single unit that cannot be split
- Punctuation: braces, parentheses, semicolons, colons, etc.
For example, these are the tokens esprima generates for const a = 'hello world':
[
{
"type": "Keyword",
"value": "const"
},
{
"type": "Identifier",
"value": "a"
},
{
"type": "Punctuator",
"value": "="
},
{
"type": "String",
"value": "'hello world'"
}
]
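To make the idea concrete, here is a minimal sketch of a lexer. It is not V8's scanner or esprima's implementation; the rule table, the keyword list, and the names `TOKEN_RULES` and `tokenize` are assumptions made for illustration, and it only covers enough syntax to reproduce the token list above.

```javascript
// Toy lexer: a sketch of lexical analysis, not V8's actual scanner.
// The keyword list is deliberately incomplete (assumption for this example).
const KEYWORDS = new Set(["var", "let", "const", "if", "else", "function", "return"]);

// Each rule is tried at the current position; the first one that matches wins.
// The sticky flag (y) makes a pattern match exactly at lastIndex.
const TOKEN_RULES = [
  { type: null, pattern: /\s+/y }, // whitespace is recognized but not emitted
  { type: "Numeric", pattern: /\d+(?:\.\d+)?/y },
  { type: "String", pattern: /'[^'\n]*'|"[^"\n]*"/y },
  { type: "Identifier", pattern: /[A-Za-z_$][\w$]*/y },
  { type: "Punctuator", pattern: /[=+\-*\/;(){}]/y },
];

function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    let matched = false;
    for (const { type, pattern } of TOKEN_RULES) {
      pattern.lastIndex = pos;
      const m = pattern.exec(source);
      if (m === null) continue;
      pos += m[0].length;
      matched = true;
      if (type !== null) {
        // Keywords look like identifiers, so they are reclassified after matching.
        const isKeyword = type === "Identifier" && KEYWORDS.has(m[0]);
        tokens.push({ type: isKeyword ? "Keyword" : type, value: m[0] });
      }
      break;
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character at position ${pos}: ${source[pos]}`);
    }
  }
  return tokens;
}

console.log(tokenize("const a = 'hello world'"));
```

Note that `tokenize("(a")` succeeds and returns two tokens: as described above, the lexer emits the parenthesis without checking that it is ever closed. That check belongs to syntax analysis.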
Syntax analysis
Syntax analysis is the process of converting the tokens produced by lexical analysis into an AST according to a given formal grammar. It is like putting words together into sentences. The syntax is validated during the conversion, and a syntax error is thrown if anything is wrong.
After syntax analysis, const a = 'hello world' yields the following AST:
{
"type": "Program",
"body": [
{
"type": "VariableDeclaration",
"declarations": [
{
"type": "VariableDeclarator",
"id": {
"type": "Identifier",
"name": "a"
},
"init": {
"type": "Literal",
"value": "hello world",
"raw": "'hello world'"
}
}
],
"kind": "const"
}
],
"sourceType": "script"
}
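The step from tokens to a tree can be sketched with a tiny recursive-descent parser. This is an illustration only, not how V8's parser is written: the function `parseProgram` and its helpers are hypothetical, and the grammar is limited to declarations of the form `const|let|var <identifier> = <literal>`.

```javascript
// Toy parser: turns a token list into an ESTree-style AST.
// Only handles `const|let|var <identifier> = <literal>` (assumption for this sketch).
function parseProgram(tokens) {
  let pos = 0;
  const peek = () => tokens[pos];
  const describe = (token) => (token ? token.value : "end of input");

  // Consumes the next token if it has the expected type (and value, when given).
  function expect(type, value) {
    const token = peek();
    if (!token || token.type !== type || (value !== undefined && token.value !== value)) {
      const wanted = value !== undefined ? value : type;
      throw new SyntaxError(`Expected ${wanted} but found ${describe(token)}`);
    }
    pos += 1;
    return token;
  }

  function parseLiteral() {
    const token = peek();
    if (token && token.type === "String") {
      pos += 1;
      // Strip the surrounding quotes to get the value; keep the raw text too.
      return { type: "Literal", value: token.value.slice(1, -1), raw: token.value };
    }
    if (token && token.type === "Numeric") {
      pos += 1;
      return { type: "Literal", value: Number(token.value), raw: token.value };
    }
    throw new SyntaxError(`Expected a literal but found ${describe(token)}`);
  }

  function parseVariableDeclaration() {
    const kind = expect("Keyword").value;
    if (!["var", "let", "const"].includes(kind)) {
      throw new SyntaxError(`Unsupported keyword ${kind}`);
    }
    const id = { type: "Identifier", name: expect("Identifier").value };
    expect("Punctuator", "=");
    const init = parseLiteral();
    if (peek() && peek().value === ";") pos += 1; // optional semicolon
    return {
      type: "VariableDeclaration",
      declarations: [{ type: "VariableDeclarator", id, init }],
      kind,
    };
  }

  const body = [];
  while (pos < tokens.length) body.push(parseVariableDeclaration());
  return { type: "Program", body, sourceType: "script" };
}

// The token list from the lexical analysis example.
const tokens = [
  { type: "Keyword", value: "const" },
  { type: "Identifier", value: "a" },
  { type: "Punctuator", value: "=" },
  { type: "String", value: "'hello world'" },
];
const ast = parseProgram(tokens);
console.log(JSON.stringify(ast, null, 2));
```

If a token is missing or out of place, for example when the `=` is dropped, `expect` throws a `SyntaxError`. This is the validation step that lexical analysis does not perform.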
The AST generated by the Parser is handed over to the Ignition interpreter.
Ignition interpreter
The Ignition interpreter is responsible for converting the AST to bytecode and executing it. Bytecode is an intermediate form of code between the AST and machine code. It is independent of any particular machine's instruction set and has to be translated into machine code by the interpreter before it can be executed.
Since bytecode also needs to be converted to machine code to run, why not just convert the AST to machine code to run directly in the first place? Converting to machine code is definitely faster, so why add an intermediate process?
In fact, before version 5.9 of V8, there was no bytecode. Instead, JS code was compiled directly into machine code, and that machine code was kept in memory, which took up a large amount of memory. Moreover, compiling directly to machine code meant long compilation times and slow startup. Furthermore, converting JS code directly into machine code requires handling a different instruction set for each CPU architecture, which is very complicated.
After version 5.9, bytecode was introduced to solve the problems of high memory footprint, long startup time, and high code complexity.
Let’s take a look at how Ignition converts AST into bytecode.
The following is a workflow diagram of the Ignition interpreter. The AST needs to pass through a bytecode generator and then go through a series of optimizations to generate bytecode.
The optimizations include:
- Register Optimizer: The main purpose is to avoid unnecessary loading and storage of registers
- Peephole Optimizer: looks for short sequences of bytecode that can be merged or replaced with a more compact equivalent
- Dead-code Elimination: Eliminate unnecessary code and reduce the size of bytecode
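The following sketch shows what two of these passes might look like on a toy bytecode. Everything here is an assumption made for illustration: the opcode names are borrowed loosely from accumulator-style bytecode, their semantics are defined by the sketch's own `run` function, and the toy instruction set has no jumps, which is what makes both passes safe in this simplified form.

```javascript
// Toy bytecode: each instruction is [opcode, operand].
// There is one implicit accumulator plus named registers (r0, r1, ...).

// Register optimization: `Ldar rX` right after `Star rX` is redundant,
// because the accumulator already holds the value that was just stored.
function removeRedundantLoads(code) {
  return code.filter((instr, i) => {
    const prev = code[i - 1];
    return !(instr[0] === "Ldar" && prev && prev[0] === "Star" && prev[1] === instr[1]);
  });
}

// Dead-code elimination: without jumps, nothing after the first Return can run.
function removeDeadCode(code) {
  const end = code.findIndex(([op]) => op === "Return");
  return end === -1 ? code : code.slice(0, end + 1);
}

// A tiny interpreter that defines what the toy opcodes mean.
function run(code) {
  let acc;
  const regs = {};
  for (const [op, arg] of code) {
    switch (op) {
      case "LdaSmi": acc = arg; break;             // load a small integer
      case "Star": regs[arg] = acc; break;         // store accumulator to register
      case "Ldar": acc = regs[arg]; break;         // load register into accumulator
      case "Add": acc = regs[arg] + acc; break;    // register + accumulator
      case "Return": return acc;
      default: throw new Error(`Unknown opcode ${op}`);
    }
  }
  return undefined;
}

const bytecode = [
  ["LdaSmi", 1],
  ["Star", "r0"],
  ["Ldar", "r0"],   // redundant load
  ["Star", "r1"],
  ["LdaSmi", 2],
  ["Add", "r1"],
  ["Return"],
  ["LdaSmi", 99],   // unreachable
  ["Return"],       // unreachable
];

const optimized = removeDeadCode(removeRedundantLoads(bytecode));
console.log(optimized.length, run(optimized));
```

Both versions compute the same result, but the optimized sequence is shorter, which is the point of these passes: less bytecode to store and fewer instructions to interpret.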
After converting the code to bytecode, it can be executed through the interpreter. While Ignition is executing, it monitors code execution and logs execution information, such as the number of times a function is executed and the parameters passed each time it is executed.
When the same code is executed many times, it is marked as hot code. The hot code is handed over to the TurboFan compiler for processing.
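As a rough illustration of this bookkeeping, the sketch below wraps a function so that every call is counted and the argument types are recorded. It is not how Ignition stores its feedback; `createProfiler`, the `HOT_THRESHOLD` value of 3, and the shape of the recorded data are all hypothetical.

```javascript
// Hypothetical threshold: real engines use more elaborate heuristics.
const HOT_THRESHOLD = 3;

function createProfiler(threshold = HOT_THRESHOLD) {
  const stats = new Map();
  return {
    // Returns a wrapped function that records execution info on every call.
    wrap(name, fn) {
      stats.set(name, { calls: 0, argTypes: new Set(), hot: false });
      return (...args) => {
        const entry = stats.get(name);
        entry.calls += 1;
        // Record the combination of argument types seen on this call.
        entry.argTypes.add(args.map((a) => typeof a).join(","));
        if (entry.calls >= threshold) entry.hot = true;
        return fn(...args);
      };
    },
    report(name) {
      return stats.get(name);
    },
  };
}

const profiler = createProfiler();
const add = profiler.wrap("add", (a, b) => a + b);

add(1, 2);
add(3, 4);
add(5, 6);
console.log(profiler.report("add"));
```

After three calls with numbers, the entry for `add` is marked hot and holds a single observed type combination. That kind of information is what an optimizing compiler can use to decide which types to specialize for.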
TurboFan compiler
TurboFan takes the hot code flagged by Ignition, optimizes it, and compiles the optimized bytecode into more efficient machine code, which it stores. The next time the same code runs, the corresponding machine code is executed directly, which greatly improves execution efficiency.
When a piece of optimized code no longer qualifies, TurboFan deoptimizes it: the optimized machine code is discarded, execution falls back to bytecode, and control is handed back to Ignition.
Now let’s look at the implementation process.
Take sum += arr[i] as an example. Since JS is a dynamically typed language, sum and arr[i] can have different types on each execution. When executing this code, Ignition checks the data types of sum and arr[i] every time. When the same code has been executed many times, it is marked as hot code and handed to TurboFan.
It would be a waste of time for the compiled code to determine the data types of sum and arr[i] on every execution. Therefore, during optimization, TurboFan assumes the data types of sum and arr[i] based on previous executions and compiles machine code for those types. The next time the code runs, the type determination step is skipped.
But if the data type of arr[i] changes later on, the generated machine code no longer fits. TurboFan then throws it away and hands execution back to Ignition, which completes the deoptimization process.
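The optimize-then-deoptimize cycle for sum += arr[i] can be imitated in plain JavaScript. This is a simulation under stated assumptions, not TurboFan's mechanism: `createAdaptiveSum`, the threshold of 2, and the two "tiers" are hypothetical, and the fast path restarts the whole computation on a failed guard, whereas a real engine resumes in the interpreter at the point of failure.

```javascript
function createAdaptiveSum() {
  const HOT_THRESHOLD = 2;     // hypothetical: calls needed before "optimizing"
  let mode = "interpreted";
  let calls = 0;
  let onlyNumbersSeen = true;  // type feedback collected on the slow path
  let deopts = 0;

  // Slow path: the generic `+` re-examines operand types on every iteration.
  function interpretedSum(arr) {
    let sum = 0;
    for (let i = 0; i < arr.length; i++) sum = sum + arr[i];
    return sum;
  }

  // Fast path: specialized for numbers. The guard stands in for the type
  // check that optimized code keeps so it can detect a broken assumption.
  function optimizedSum(arr) {
    let sum = 0;
    for (let i = 0; i < arr.length; i++) {
      const value = arr[i];
      if (typeof value !== "number") return null; // assumption violated: bail out
      sum += value;
    }
    return sum;
  }

  function sum(arr) {
    if (mode === "optimized") {
      const result = optimizedSum(arr);
      if (result !== null) return result;
      // Deoptimize: discard the specialized path and reset the feedback.
      mode = "interpreted";
      deopts += 1;
      calls = 0;
      onlyNumbersSeen = true;
      return interpretedSum(arr);
    }
    calls += 1;
    if (!arr.every((x) => typeof x === "number")) onlyNumbersSeen = false;
    if (calls >= HOT_THRESHOLD && onlyNumbersSeen) mode = "optimized";
    return interpretedSum(arr);
  }

  return { sum, state: () => ({ mode, deopts }) };
}

const adaptive = createAdaptiveSum();
adaptive.sum([1, 2, 3]);           // slow path, collects feedback
adaptive.sum([4, 5]);              // hot now: switches to the fast path
console.log(adaptive.state());     // mode is "optimized"
console.log(adaptive.sum([1, "2"])); // guard fails, falls back to the slow path
console.log(adaptive.state());     // mode is "interpreted", one deopt recorded
```

The last call mixes a number with a string, so the number-only assumption breaks and the result comes from the generic path, where `+` performs string concatenation.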
[Figures: hotspot code; before optimization; after optimization]
Conclusion
Now let's summarize the V8 execution process:
- The source code passes through the Parser, which generates the AST through lexical analysis and syntax analysis
- The AST passes through the Ignition interpreter, which generates bytecode and executes it
- During execution, if hotspot code is found, it is handed over to the TurboFan compiler, which generates machine code and executes it
- If the hot code no longer meets the requirements, it is deoptimized
This technique of combining bytecode with interpreter and compiler is commonly known as just-in-time compilation (JIT).
The garbage collector Orinoco was not covered in this article. V8's garbage collection mechanism deserves a separate article, so see you next time.