What is V8?

First, let’s see what V8 is. V8 is an open-source JavaScript engine developed by Google and currently used in the Chrome browser and Node.js. Its core job is to execute human-readable JavaScript code.

How does V8 execute JavaScript code?

At its core, the process has two steps: compilation and execution. The JavaScript code is first converted into low-level intermediate code or machine code that the machine can understand. The converted code is then executed and the results are output.

You can think of V8 as an imaginary computer, also known as a virtual machine. It executes code by emulating the components of a real computer, such as the CPU, the stack, and registers, and it has its own instruction set.

To JavaScript code, V8 is the whole world. When V8 executes JavaScript, you don’t have to worry about differences between operating systems or between computers with different hardware architectures. You only have to write code that follows the specification of the virtual machine.
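To make the "imaginary computer" idea concrete, here is a minimal sketch of a toy virtual machine in TypeScript, a typed superset of JavaScript. Its four instructions (`push`, `add`, `mul`, `print`) and the `ToyVM` class are invented for illustration and are not V8's instruction set. The point is that the program targets the VM's own emulated stack and program counter, so it runs unchanged on any host that can run the VM.

```typescript
// Hypothetical instruction set for a toy stack machine (not V8's real one).
type Instruction =
  | { op: "push"; value: number } // push a constant onto the stack
  | { op: "add" }                 // pop two values, push their sum
  | { op: "mul" }                 // pop two values, push their product
  | { op: "print" };              // pop a value and append it to the output

class ToyVM {
  private stack: number[] = []; // emulated stack
  private pc = 0;               // emulated program-counter register
  readonly output: number[] = [];

  run(program: Instruction[]): number[] {
    this.pc = 0;
    while (this.pc < program.length) {
      const instr = program[this.pc++]; // fetch, then advance the program counter
      switch (instr.op) {
        case "push":
          this.stack.push(instr.value);
          break;
        case "add": {
          const b = this.pop();
          const a = this.pop();
          this.stack.push(a + b);
          break;
        }
        case "mul": {
          const b = this.pop();
          const a = this.pop();
          this.stack.push(a * b);
          break;
        }
        case "print":
          this.output.push(this.pop());
          break;
      }
    }
    return this.output;
  }

  private pop(): number {
    const v = this.stack.pop();
    if (v === undefined) throw new Error("stack underflow");
    return v;
  }
}

// (2 + 3) * 4, written for the VM rather than for any particular real CPU.
const program: Instruction[] = [
  { op: "push", value: 2 },
  { op: "push", value: 3 },
  { op: "add" },
  { op: "push", value: 4 },
  { op: "mul" },
  { op: "print" },
];
console.log(new ToyVM().run(program)); // → [ 20 ]
```

A real engine's instruction set is far richer, but the division of labor is the same. The host only needs to run the VM, and the program only needs to follow the VM's rules.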

Since V8 is a virtual computer that compiles and executes JavaScript code, let’s first look at why a computer needs to compile a high-level language like JavaScript, and how the compiled code is then executed.

Why should high-level code be compiled before it is executed?

Let’s start with how the CPU executes machine code. If you think of the CPU as a very small computing machine, we communicate with it through binary instructions. To let the CPU perform complex tasks, engineers give it a set of instructions, each performing a particular function. We call this set of instructions machine language.

Note that the CPU can only recognize binary instructions, but binary code is hard for programmers to read and remember. So the binary instructions are mapped to symbols that humans can recognize and remember. This is the assembly instruction set. For example:

    1000100111011000    machine instruction (x86, 16-bit mode)
    MOV AX, BX          assembly instruction
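The correspondence between an assembly mnemonic and its binary machine instruction can be sketched with a toy assembler. This minimal illustration handles only one x86 instruction form, a 16-bit register-to-register `MOV` (opcode `0x89` followed by a ModRM byte). It assumes 16-bit mode, so no operand-size prefix is needed. A real assembler handles far more, and the helper names `assembleMov` and `toBinary` are hypothetical.

```typescript
// Register numbers used in the x86 ModRM byte for 16-bit registers.
const REG16: { [name: string]: number | undefined } = {
  AX: 0b000, CX: 0b001, DX: 0b010, BX: 0b011,
  SP: 0b100, BP: 0b101, SI: 0b110, DI: 0b111,
};

// Toy assembler for a single instruction form: MOV r16, r16 (opcode 0x89 /r).
// Assumes 16-bit mode, so no operand-size prefix is emitted.
function assembleMov(line: string): number[] {
  // Intel syntax: destination first, then source; "MOV AX, BX" copies BX into AX.
  const match = /^\s*MOV\s+(\w+)\s*,\s*(\w+)\s*$/i.exec(line);
  if (match === null) throw new Error(`unsupported instruction: ${line}`);
  const dst = REG16[match[1].toUpperCase()];
  const src = REG16[match[2].toUpperCase()];
  if (dst === undefined || src === undefined) throw new Error(`unknown register in: ${line}`);
  // ModRM byte: mod = 11 (register operand), reg = source, r/m = destination.
  const modrm = (0b11 << 6) | (src << 3) | dst;
  return [0x89, modrm];
}

// Render each byte as 8 binary digits, the form the CPU actually receives.
function toBinary(bytes: number[]): string {
  return bytes.map((b) => ("00000000" + b.toString(2)).slice(-8)).join(" ");
}

console.log(toBinary(assembleMov("MOV AX, BX"))); // 10001001 11011000
```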

You might ask: can the CPU recognize assembly language directly? The answer is “no”. If you write a program in assembly, you also need an assembler, which translates assembly code into machine code. The process looks like this:

Assembly language adds a layer of abstraction over machine language and makes it easier for programmers to understand. Even so, it is still complex and tedious: even a very simple function requires a large amount of assembly code. The difficulty shows up mainly in two ways.

First, different CPUs have different instruction sets. If you want to implement a feature in machine language or assembly language, you have to write separate assembly code for each CPU architecture, which means a huge amount of tedious work.

Second, writing assembly code requires hardware knowledge specific to the processor architecture. For example, you need to know how to use registers and memory, how to operate the CPU, and so on.

So we need languages that hide the details of the computer architecture, adapt to many different CPU architectures, and let programmers focus on business logic. This is how “high-level languages” such as C, C++, Java, C#, Python, and JavaScript came into being.

As with assembly language, the processor cannot directly recognize code written in a high-level language. So how does such code get executed? In general, there are two ways.

The first is interpreted execution. A parser first converts the input source code into intermediate code. An interpreter then executes the intermediate code directly and outputs the result. The process is shown in the figure below:
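As a code-level counterpart to interpreted execution, here is a minimal sketch built on a tiny made-up language of non-negative integers, `+`, and `*`. `parse` plays the parser and produces a tree as the intermediate code. `interpret` walks that tree and computes the result directly, without ever producing machine code. The names and the tree shape are illustrative assumptions, not taken from any real engine.

```typescript
// Intermediate code produced by the parser: a small expression tree.
type Expr =
  | { kind: "num"; value: number }
  | { kind: "bin"; op: "+" | "*"; left: Expr; right: Expr };

// Parser: turns text such as "1 + 2 * 3" into an Expr tree ("*" binds tighter than "+").
function parse(source: string): Expr {
  const tokens: string[] = source.match(/\d+|[+*]/g) ?? [];
  if (tokens.join("") !== source.replace(/\s+/g, "")) throw new Error("unsupported character");
  let pos = 0;

  function parseNumber(): Expr {
    const tok = tokens[pos++];
    if (tok === undefined || !/^\d+$/.test(tok)) throw new Error(`expected a number at token ${pos - 1}`);
    return { kind: "num", value: Number(tok) };
  }
  function parseProduct(): Expr {
    let left = parseNumber();
    while (tokens[pos] === "*") {
      pos++;
      left = { kind: "bin", op: "*", left, right: parseNumber() };
    }
    return left;
  }
  function parseSum(): Expr {
    let left = parseProduct();
    while (tokens[pos] === "+") {
      pos++;
      left = { kind: "bin", op: "+", left, right: parseProduct() };
    }
    return left;
  }

  const tree = parseSum();
  if (pos !== tokens.length) throw new Error("unexpected trailing input");
  return tree;
}

// Interpreter: evaluates the tree directly each time it is run.
function interpret(e: Expr): number {
  if (e.kind === "num") return e.value;
  const l = interpret(e.left);
  const r = interpret(e.right);
  return e.op === "+" ? l + r : l * r;
}

console.log(interpret(parse("1 + 2 * 3"))); // 7
```

Every run walks the tree again. That is why interpretation starts quickly but pays a cost on each execution.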

The second is compiled execution. Here, too, the source code is first converted into intermediate code, which a compiler then compiles into machine code. The compiled machine code is usually stored in a binary file, which you run directly whenever you need to execute the program. Alternatively, a virtual machine can keep the compiled machine code in memory and execute it directly from there.
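Compiled execution can be sketched in a similar toy setting. Here `compile` translates the intermediate code once, and the result is then run many times without revisiting the tree. As an assumption of this sketch, JavaScript closures stand in for machine code, a technique usually called closure compilation. A real compiler would emit CPU instructions into a binary file or into memory.

```typescript
// Intermediate code: an expression tree that may refer to a variable.
type Expr =
  | { kind: "num"; value: number }
  | { kind: "var"; name: string }
  | { kind: "bin"; op: "+" | "*"; left: Expr; right: Expr };

type Env = { [name: string]: number | undefined };
type Compiled = (env: Env) => number;

// "Compiler": translates the tree ONCE into an executable function.
// Closures stand in for machine code here (hypothetical simplification).
function compile(e: Expr): Compiled {
  if (e.kind === "num") {
    const v = e.value;
    return () => v;
  }
  if (e.kind === "var") {
    const name = e.name;
    return (env: Env): number => {
      const v = env[name];
      if (v === undefined) throw new Error(`unbound variable ${name}`);
      return v;
    };
  }
  // Binary node: compile both operands and choose the operation once, up front.
  const left = compile(e.left);
  const right = compile(e.right);
  return e.op === "+"
    ? (env: Env): number => left(env) + right(env)
    : (env: Env): number => left(env) * right(env);
}

// x * x + 1, compiled once and then executed repeatedly with different inputs.
const tree: Expr = {
  kind: "bin",
  op: "+",
  left: { kind: "bin", op: "*", left: { kind: "var", name: "x" }, right: { kind: "var", name: "x" } },
  right: { kind: "num", value: 1 },
};
const squarePlusOne = compile(tree);
console.log([1, 2, 3].map((x) => squarePlusOne({ x }))); // [ 2, 5, 10 ]
```

The up-front `compile` step is the slower start. Each later call skips the tree walk, which is the faster execution.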

These are the two basic ways a computer executes a high-level language: interpreted execution and compiled execution. How they are applied, however, differs considerably between languages. For example, to execute code written in C, you compile it into a binary file and then run that binary directly. Languages like Java and JavaScript instead rely on virtual machines that simulate the computer’s compilation and execution process. Java code is run by a Java virtual machine, and JavaScript code by a JavaScript virtual machine.

Even for a single language like JavaScript, there are several popular virtual machines, each implemented differently. For example, Apple’s Safari uses JavaScriptCore, Firefox uses SpiderMonkey, and Chrome uses V8.

How does V8 execute JavaScript code?

In fact, V8 does not rely on a single technique. It mixes interpretation and compilation, an approach known as JIT (Just-In-Time) compilation.

This is a trade-off, because each approach has its own pros and cons. Interpreted execution starts quickly but runs slowly, whereas compiled execution starts slowly but runs quickly. The full flow of V8 executing JavaScript is shown in the chart below:

Before V8 can start executing JavaScript, it needs to prepare the basic environment that JavaScript execution depends on. This includes the “heap space”, “stack space”, “global execution context”, “global scope”, “message loop system”, “built-in functions”, and so on.
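As a rough mental model, the prepared environment can be sketched as a single object. Everything here is hypothetical: `Runtime`, `postTask`, and `runLoop` are not V8 classes or APIs. Arrays stand in for heap and stack space. The global execution context carries the global scope with one built-in function. The message loop takes tasks off a queue one at a time. Unlike a real message loop, which waits for new events, this one simply returns when the queue is empty.

```typescript
// Hypothetical model of the environment prepared before any script runs.
interface ExecutionContext {
  name: string;                            // e.g. "global" or a function name
  variables: { [name: string]: unknown };  // bindings visible in this context (its scope)
}

class Runtime {
  readonly heap: object[] = [];                       // stand-in for heap space (unused here)
  readonly callStack: ExecutionContext[] = [];        // stand-in for stack space
  readonly log: string[] = [];                        // where the built-in print writes
  readonly globalContext: ExecutionContext;
  private readonly taskQueue: Array<() => void> = []; // message queue

  constructor() {
    // Global execution context whose global scope holds one built-in function.
    this.globalContext = {
      name: "global",
      variables: {
        print: (msg: string): void => {
          this.log.push(msg);
        },
      },
    };
    this.callStack.push(this.globalContext);
  }

  postTask(task: () => void): void {
    this.taskQueue.push(task);
  }

  // Message loop: run queued tasks one at a time until none are left.
  runLoop(): void {
    let task = this.taskQueue.shift();
    while (task !== undefined) {
      task();
      task = this.taskQueue.shift();
    }
  }
}

const rt = new Runtime();
const print = rt.globalContext.variables["print"] as (msg: string) => void;
rt.postTask(() => print("first task"));
rt.postTask(() => {
  print("second task");
  rt.postTask(() => print("task queued by the second task"));
});
rt.runLoop();
console.log(rt.log.join("\n")); // first task, second task, then the task it queued
```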

With the basic environment in place, the JavaScript code can be handed to V8 for execution.

First, V8 receives the JavaScript source code to execute. To V8, however, this is just a string of characters whose meaning it cannot understand directly, so it must first structure the code. Structuring means analyzing information and decomposing it into interrelated components, each with a clear place in a hierarchy. The result is easy to use and maintain and can be operated on according to well-defined rules.

Structuring the source code produces an abstract syntax tree (AST), a representation that V8 can easily work with. Note that alongside the AST, V8 also generates the relevant scopes, which hold the variables declared in them.
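Here is a minimal sketch of how parsing can yield both an AST and a scope. It assumes a tiny subset of JavaScript: only `let name = expr;` statements, where `expr` combines integers and identifiers with `+`. The node shapes (`LetDeclaration`, `BinaryExpression`, and so on) and the flat `Scope` type are hypothetical. They are far simpler than V8's parser, which handles nested scopes, functions, and much more.

```typescript
// Hypothetical AST node shapes for a tiny subset of JavaScript.
type Expr =
  | { type: "Literal"; value: number }
  | { type: "Identifier"; name: string }
  | { type: "BinaryExpression"; operator: "+"; left: Expr; right: Expr };

interface LetDeclaration {
  type: "LetDeclaration";
  name: string;
  init: Expr;
}

// A single flat scope: the variables declared at the top level, in source order.
interface Scope {
  declared: string[];
}

function parseProgram(source: string): { ast: LetDeclaration[]; scope: Scope } {
  // Tokenize into identifiers, integers, and the punctuation "=", "+", ";".
  const tokens: string[] = source.match(/[A-Za-z_$][\w$]*|\d+|[=+;]/g) ?? [];
  if (tokens.join("") !== source.replace(/\s+/g, "")) throw new Error("unsupported character in source");
  let pos = 0;

  const isIdentifier = (tok: string): boolean => /^[A-Za-z_$][\w$]*$/.test(tok);
  const next = (): string => {
    const tok = tokens[pos++];
    if (tok === undefined) throw new Error("unexpected end of input");
    return tok;
  };
  const expect = (expected: string): void => {
    const tok = next();
    if (tok !== expected) throw new Error(`expected "${expected}" but found "${tok}"`);
  };
  const parsePrimary = (): Expr => {
    const tok = next();
    if (/^\d+$/.test(tok)) return { type: "Literal", value: Number(tok) };
    if (isIdentifier(tok)) return { type: "Identifier", name: tok };
    throw new Error(`unexpected token "${tok}"`);
  };
  const parseExpr = (): Expr => {
    let left = parsePrimary();
    while (tokens[pos] === "+") {
      pos++;
      left = { type: "BinaryExpression", operator: "+", left, right: parsePrimary() };
    }
    return left;
  };

  const ast: LetDeclaration[] = [];
  const scope: Scope = { declared: [] };
  while (pos < tokens.length) {
    expect("let");
    const name = next();
    if (!isIdentifier(name)) throw new Error(`invalid variable name "${name}"`);
    expect("=");
    const init = parseExpr();
    expect(";");
    if (scope.declared.indexOf(name) !== -1) throw new Error(`"${name}" is already declared`);
    scope.declared.push(name); // the scope is filled in while the AST is built
    ast.push({ type: "LetDeclaration", name, init });
  }
  return { ast, scope };
}

const { ast, scope } = parseProgram("let a = 1; let b = a + 2;");
console.log(scope.declared); // [ 'a', 'b' ]
console.log(JSON.stringify(ast[1].init));
// {"type":"BinaryExpression","operator":"+","left":{"type":"Identifier","name":"a"},"right":{"type":"Literal","value":2}}
```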

With the AST and scopes in place, V8 can generate bytecode, a form of code that sits between the AST and machine code. Bytecode is not tied to any particular type of machine code. The interpreter can execute it directly, or the optimizing compiler can compile it into binary machine code and execute that.
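The sketch below shows one way to turn such an AST into bytecode. The instruction names loosely echo those of V8's interpreter, Ignition, which is a register machine with an accumulator. However, this four-instruction set is an assumption made for illustration, not Ignition's real bytecode. Each declared variable gets a fixed register, and a temporary register holds the left operand of `+`. To inspect V8's real bytecode, Node.js accepts the V8 flag `--print-bytecode`.

```typescript
// Hypothetical AST, in the shape a parser might hand over.
type Expr =
  | { type: "Literal"; value: number }
  | { type: "Identifier"; name: string }
  | { type: "BinaryExpression"; operator: "+"; left: Expr; right: Expr };
interface LetDeclaration { type: "LetDeclaration"; name: string; init: Expr }

// Toy accumulator-style bytecode (illustrative, not Ignition's instruction set).
type Bytecode =
  | { op: "LdaConst"; value: number } // acc = value
  | { op: "Ldar"; reg: number }       // acc = r[reg]
  | { op: "Star"; reg: number }       // r[reg] = acc
  | { op: "Add"; reg: number };       // acc = r[reg] + acc

function generateBytecode(ast: LetDeclaration[], declared: string[]): Bytecode[] {
  const code: Bytecode[] = [];
  // Each declared variable owns a fixed register; temporaries are allocated above them.
  const regOf = (name: string): number => {
    const r = declared.indexOf(name);
    if (r === -1) throw new Error(`undeclared variable ${name}`);
    return r;
  };
  let nextTemp = declared.length;

  const emitExpr = (e: Expr): void => {
    if (e.type === "Literal") {
      code.push({ op: "LdaConst", value: e.value });
    } else if (e.type === "Identifier") {
      code.push({ op: "Ldar", reg: regOf(e.name) });
    } else {
      emitExpr(e.left);                     // acc = left operand
      const temp = nextTemp++;              // reserve a temporary register
      code.push({ op: "Star", reg: temp }); // save left operand
      emitExpr(e.right);                    // acc = right operand
      code.push({ op: "Add", reg: temp });  // acc = left + right
      nextTemp--;                           // release the temporary
    }
  };

  for (const decl of ast) {
    emitExpr(decl.init);
    code.push({ op: "Star", reg: regOf(decl.name) });
  }
  return code;
}

// Human-readable listing, e.g. "Star r0".
function disassemble(code: Bytecode[]): string[] {
  return code.map((b) => ("value" in b ? `${b.op} ${b.value}` : `${b.op} r${b.reg}`));
}

// AST for: let a = 1; let b = a + 2;
const ast: LetDeclaration[] = [
  { type: "LetDeclaration", name: "a", init: { type: "Literal", value: 1 } },
  {
    type: "LetDeclaration",
    name: "b",
    init: {
      type: "BinaryExpression",
      operator: "+",
      left: { type: "Identifier", name: "a" },
      right: { type: "Literal", value: 2 },
    },
  },
];
console.log(disassemble(generateBytecode(ast, ["a", "b"])).join("\n"));
// LdaConst 1
// Star r0
// Ldar r0
// Star r2
// LdaConst 2
// Add r2
// Star r1
```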

Once the bytecode is generated, the interpreter takes the stage. It executes the bytecode instruction by instruction and outputs the results. As you may have noticed, the figure shows a monitor robot next to the interpreter. This represents a module that monitors the interpreter’s execution. While the bytecode is being interpreted, if a piece of code is found to run many times, the monitor marks it as hot code.
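The monitor can be sketched as a counter on each function that the interpreter bumps on every call. In this minimal sketch, the bytecode format, the `BytecodeFunction` shape, and `HOT_THRESHOLD = 3` are all hypothetical. V8's real heuristics for deciding what is hot are more involved than a plain call count.

```typescript
// Toy accumulator-style bytecode (hypothetical).
type Bytecode =
  | { op: "LdaConst"; value: number } // acc = value
  | { op: "Ldar"; reg: number }       // acc = r[reg]
  | { op: "Star"; reg: number }       // r[reg] = acc
  | { op: "Add"; reg: number }        // acc = r[reg] + acc
  | { op: "Return" };                 // return acc

interface BytecodeFunction {
  name: string;
  code: Bytecode[];
  callCount: number; // how many times the interpreter has run this function
  hot: boolean;      // set by the monitor once callCount reaches the threshold
}

// Hypothetical threshold chosen for the demo.
const HOT_THRESHOLD = 3;

function interpret(fn: BytecodeFunction, args: number[]): number {
  // Monitor: count executions and mark frequently run code as hot.
  fn.callCount++;
  if (!fn.hot && fn.callCount >= HOT_THRESHOLD) fn.hot = true;

  const r = args.slice(); // arguments occupy the first registers
  let acc = 0;
  // Execute instructions in order (this toy format has no jumps).
  for (const instr of fn.code) {
    switch (instr.op) {
      case "LdaConst": acc = instr.value; break;
      case "Ldar": acc = r[instr.reg]; break;
      case "Star": r[instr.reg] = acc; break;
      case "Add": acc = r[instr.reg] + acc; break;
      case "Return": return acc;
    }
  }
  return acc;
}

// Bytecode for add(x, y) { return x + y; }, written by hand: r0 = x, r1 = y.
const add: BytecodeFunction = {
  name: "add",
  code: [{ op: "Ldar", reg: 1 }, { op: "Add", reg: 0 }, { op: "Return" }],
  callCount: 0,
  hot: false,
};
for (let i = 0; i < 5; i++) {
  const result = interpret(add, [i, 10]);
  console.log(`add(${i}, 10) = ${result}, hot = ${add.hot}`);
}
```

In this run, `hot` flips to `true` on the third call. An engine would use that flag as the trigger for handing the function to its optimizing compiler.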

When a piece of code is marked as hot, V8 hands its bytecode to the optimizing compiler. The compiler turns it into binary machine code in the background and optimizes that code, which makes it much more efficient. The next time this code is executed, V8 uses the optimized machine code instead, which makes execution much faster.
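Tiering up can be sketched as follows: interpret a function until it becomes hot, compile it once, and route every later call to the compiled version. In this hypothetical sketch, the "optimizing compiler" simply turns the bytecode into JavaScript source and hands it to `new Function`, so the host engine does the actual machine-code generation. V8's optimizing compiler emits machine code itself. The threshold and all names are assumptions.

```typescript
// Toy bytecode; arguments are addressed directly as registers.
type Bytecode =
  | { op: "LdaConst"; value: number } // acc = value
  | { op: "Ldar"; reg: number }       // acc = args[reg]
  | { op: "Add"; reg: number }        // acc = args[reg] + acc
  | { op: "Return" };                 // return acc

interface TieredFunction {
  code: Bytecode[];
  callCount: number;
  optimized?: (args: number[]) => number; // filled in once the function is hot
}

const HOT_THRESHOLD = 3; // hypothetical
let interpretedCalls = 0;
let optimizedCalls = 0;

function interpret(code: Bytecode[], args: number[]): number {
  let acc = 0;
  for (const instr of code) {
    if (instr.op === "LdaConst") acc = instr.value;
    else if (instr.op === "Ldar") acc = args[instr.reg];
    else if (instr.op === "Add") acc = args[instr.reg] + acc;
    else return acc;
  }
  return acc;
}

// Stand-in "optimizing compiler": translate the bytecode once into JavaScript source.
function optimize(code: Bytecode[]): (args: number[]) => number {
  const body: string[] = ["let acc = 0;"];
  for (const instr of code) {
    if (instr.op === "LdaConst") body.push(`acc = ${instr.value};`);
    else if (instr.op === "Ldar") body.push(`acc = args[${instr.reg}];`);
    else if (instr.op === "Add") body.push(`acc = args[${instr.reg}] + acc;`);
    else body.push("return acc;");
  }
  body.push("return acc;");
  return new Function("args", body.join("\n")) as (args: number[]) => number;
}

function call(fn: TieredFunction, args: number[]): number {
  const fast = fn.optimized;
  if (fast) {
    // Prefer optimized code whenever it exists.
    optimizedCalls++;
    return fast(args);
  }
  interpretedCalls++;
  const result = interpret(fn.code, args);
  fn.callCount++;
  if (fn.callCount >= HOT_THRESHOLD) {
    // V8 can do this work on a background thread; here it happens synchronously.
    fn.optimized = optimize(fn.code);
  }
  return result;
}

const add: TieredFunction = {
  code: [{ op: "Ldar", reg: 1 }, { op: "Add", reg: 0 }, { op: "Return" }],
  callCount: 0,
};
let sum = 0;
for (let i = 0; i < 10; i++) sum += call(add, [i, 1]);
console.log(sum, interpretedCalls, optimizedCalls); // 55 3 7
```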

However, unlike static languages, JavaScript is very flexible and dynamic. The structure and properties of objects can be modified at run time, while the optimizing compiler optimizes code for a fixed structure. Once an object’s structure changes during execution, the optimized code becomes invalid. V8 must then deoptimize it, and the next execution of that code falls back to the interpreter.
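Deoptimization hinges on a guard: optimized code checks that an object still has the structure it was specialized for, and bails out if not. The minimal sketch below models this with a hypothetical object layout. A shared `Shape`, a rough stand-in for what V8 calls hidden classes (maps), lists property names, and a slot array holds the values. Once hot, the code for `o.x` is specialized to a fixed slot. Adding a property at run time changes the shape, so the guard fails. The optimized code is then discarded and execution falls back to the generic lookup.

```typescript
// Hypothetical object model: a shared shape (property names) plus a slot array of values.
interface Shape { props: string[] }
interface ToyObject { shape: Shape; slots: number[] }

// Objects with the same property names in the same order share one Shape.
const shapeCache: { [key: string]: Shape | undefined } = {};
function shapeFor(props: string[]): Shape {
  const key = props.join(",");
  let shape = shapeCache[key];
  if (shape === undefined) {
    shape = { props: props.slice() };
    shapeCache[key] = shape;
  }
  return shape;
}

function makeObject(props: string[], values: number[]): ToyObject {
  return { shape: shapeFor(props), slots: values.slice() };
}

// Adding a property at run time moves the object to a different shape.
function addProperty(o: ToyObject, name: string, value: number): void {
  o.shape = shapeFor(o.shape.props.concat(name));
  o.slots.push(value);
}

// Generic path (what the interpreter does): look the property up by name every time.
function genericGet(o: ToyObject, name: string): number {
  const i = o.shape.props.indexOf(name);
  if (i === -1) throw new Error(`missing property ${name}`);
  return o.slots[i];
}

const HOT_THRESHOLD = 3; // hypothetical
// State for one property-access site, e.g. the `o.x` inside some function.
const site: {
  calls: number;
  deopts: number;
  optimized: ((o: ToyObject) => number | undefined) | undefined;
} = { calls: 0, deopts: 0, optimized: undefined };

function getX(o: ToyObject): number {
  const fastPath = site.optimized;
  if (fastPath !== undefined) {
    const fast = fastPath(o);
    if (fast !== undefined) return fast;
    // Guard failed: the shape no longer matches, so deoptimize and fall back.
    site.optimized = undefined;
    site.calls = 0;
    site.deopts++;
  }
  site.calls++;
  const value = genericGet(o, "x");
  if (site.calls >= HOT_THRESHOLD) {
    // Specialize for the shape seen now: a fixed slot index behind a shape check.
    const expected = o.shape;
    const slot = expected.props.indexOf("x");
    site.optimized = (obj: ToyObject): number | undefined =>
      obj.shape === expected ? obj.slots[slot] : undefined;
  }
  return value;
}

const point = makeObject(["x", "y"], [1, 2]);
for (let i = 0; i < 5; i++) getX(point); // becomes hot, then runs the specialized code
addProperty(point, "z", 3);              // the object's structure changes at run time
console.log(getX(point), site.deopts);   // 1 1
```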

Conclusion

V8 is an open-source JavaScript engine developed by Google. It is also a virtual machine: it simulates the functions of a real computer in order to compile and execute code. So to understand how V8 works internally, we can start by analyzing how a computer compiles and executes a language.

Since computers can only understand binary instructions, there are usually two ways to get a computer to execute a high-level language. One is to convert the high-level code into binary code and have the computer execute it directly. The other is to install an interpreter on the computer and let the interpreter execute the code.

Interpreted execution and compiled execution each have their own strengths and weaknesses. Interpreted execution starts quickly but runs slowly, while compiled execution starts slowly but runs quickly. To make the most of the strengths of both and avoid their weaknesses, V8 adopts a trade-off strategy. It interprets code at startup, but once a piece of code runs more often than a certain threshold, V8 uses the optimizing compiler to compile it into more efficient machine code.

With this in mind, we can trace the main steps V8 goes through to execute a piece of JavaScript code:

  1. Initialize the basic environment;
  2. Parse the source code to generate the AST and scopes;
  3. Generate bytecode from the AST and scopes;
  4. Interpret the bytecode and monitor for hot code;
  5. Compile hot code into optimized binary machine code;
  6. Deoptimize the optimized machine code when its assumptions no longer hold.

It is important to note that JavaScript is a dynamic language. At run time, the program may modify the structure of objects that optimized code was built around. This invalidates the previously optimized code, and V8 must then deoptimize it.