
No piece of code can run on a computer without three steps: parse -> compile -> execute. “Compiler”, “translator (transpiler)” and “interpreter” are distinguished by how they process the code, while “compiled language” and “interpreted language” are labels based on how a language is implemented.

So what’s the difference? What are the ways in which the language is implemented? I need to get this straight.

What’s the difference between “compile” and “interpret”?

  • Compilation is the process of translating code from one language into another. Translating a high-level language into a low-level or machine language, or into another high-level language, both count as compilation. Compilation returns code, which still has to be executed by a machine to produce results.
  • Interpretation is the process of reading code and executing it to produce results. Interpretation returns the result of the execution.

Compilation and interpretation are often presented as opposites, an either-or choice. Real implementations are not so absolute, and the two can work together.

Whether compiling or interpreting, parsing is the first step. Let’s look at the parser first.

Parser

What parsers do is parse source code into an abstract syntax tree (AST), transforming human-readable source code into a tree structure that computers can process.

Parsing in the broad sense includes three steps: lexical analysis, syntactic analysis and semantic analysis.

  • Lexical analysis splits the source code into tokens, recognizing each word with a finite state machine, and finally produces a token stream;
  • Syntactic analysis transforms the token stream into an AST (Abstract Syntax Tree), a tree that represents the syntactic structure of the code;
  • Semantic analysis performs semantic checks and information extraction on the AST, such as
    • Variable type checking
    • Validation of control flow: for example, Java’s continue statement can only be used in the body of a loop
    • Definite assignment: a variable must be assigned an initial value before it is used
    • No repeated assignment of immutable variables: immutable variables (such as Java final variables) can be assigned at most once
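The three steps above can be sketched for a tiny arithmetic language. This is only a minimal illustration: the token rules, the node classes (Num, Var, BinOp) and the function names are all made up for this example, and a real parser handles far more cases.

```python
import re
from dataclasses import dataclass

# Hypothetical token rules for a toy arithmetic language (illustration only).
TOKEN_RE = re.compile(r"\s*(?:(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[-+*/()]))")

@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: object
    right: object

def tokenize(src):
    """Lexical analysis: source text -> list of (kind, text) tokens."""
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def parse(tokens):
    """Syntactic analysis: token stream -> AST, by recursive descent."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():  # expr := term (('+' | '-') term)*
        node = term()
        while peek()[1] in ("+", "-"):
            op = advance()[1]
            node = BinOp(op, node, term())
        return node

    def term():  # term := atom (('*' | '/') atom)*
        node = atom()
        while peek()[1] in ("*", "/"):
            op = advance()[1]
            node = BinOp(op, node, atom())
        return node

    def atom():  # atom := NUM | ID | '(' expr ')'
        kind, text = advance()
        if kind == "NUM":
            return Num(int(text))
        if kind == "ID":
            return Var(text)
        if text == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree

def check_defined(node, defined):
    """Semantic analysis: every variable must be assigned before it is used."""
    if isinstance(node, Var) and node.name not in defined:
        raise NameError(f"variable {node.name!r} used before assignment")
    if isinstance(node, BinOp):
        check_defined(node.left, defined)
        check_defined(node.right, defined)

tree = parse(tokenize("1 + x * 2"))
check_defined(tree, {"x"})  # passes; an empty set would raise NameError
```

Here the tree for `1 + x * 2` nests the multiplication under the addition, which is how the AST encodes operator precedence.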

If you’re making a whole roast chicken, parsing is the plucking and gutting: an essential step, but how the chicken is cooked afterwards is none of the parser’s business.

Parsing is always required; what happens next is where the approaches diverge. Depending on how the AST is processed, there are three kinds of tools:

  • Compiler: outputs code in a target language (e.g., bytecode, machine code, or another language) based on the AST
  • Translator (transpiler): transforms the AST, then outputs target code
  • Interpreter: interprets the input, executes it, and outputs the result

Compiler

The compiler takes the AST produced by the parser and outputs code in another language.

  1. Parsing: parse the source code into an abstract syntax tree (AST);
  2. Linearization: if the compilation target is a low-level instruction language (assembly code or bytecode), the tree structure must be linearized first. Traverse the AST recursively, translating each node into linear IR (linear intermediate code), and then optimize the linear IR;
  3. Generation: generate low-level instruction code from the linear IR. If the target is an executable file, this code still has to be translated into machine code (for example, by an assembler) and then linked with the standard library to produce the executable.
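The linearization step can be illustrated with a small sketch. It borrows Python’s standard ast module as the parser and emits instructions for a made-up stack machine; the instruction names (PUSH, LOAD, ADD, ...) and the helper names are assumptions for this example, not a real instruction set.

```python
import ast
import operator

# Hypothetical instruction names for a toy stack machine (not a real ISA).
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}
# Operations that are safe to evaluate at compile time in this sketch.
FOLDABLE = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

def linearize(node, out):
    """Post-order walk of the AST: emit operands first, then the operator."""
    if isinstance(node, ast.Expression):
        linearize(node.body, out)
    elif isinstance(node, ast.Constant):
        out.append(("PUSH", node.value))
    elif isinstance(node, ast.Name):
        out.append(("LOAD", node.id))
    elif isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
        linearize(node.left, out)
        linearize(node.right, out)
        out.append((OPCODES[type(node.op)], None))
    else:
        raise NotImplementedError(type(node).__name__)
    return out

def fold_constants(ir):
    """Optimize the linear IR: PUSH a, PUSH b, OP becomes PUSH (a OP b)."""
    out = []
    for op, arg in ir:
        if (op in FOLDABLE and len(out) >= 2
                and out[-1][0] == "PUSH" and out[-2][0] == "PUSH"):
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(("PUSH", FOLDABLE[op](a, b)))
        else:
            out.append((op, arg))
    return out

tree = ast.parse("x + 2 * 3", mode="eval")   # step 1: parse
ir = linearize(tree, [])                     # step 2: tree -> linear IR
optimized = fold_constants(ir)               # step 2: optimize the IR
# ir:        LOAD x, PUSH 2, PUSH 3, MUL, ADD
# optimized: LOAD x, PUSH 6, ADD
```

Working on a flat instruction list makes local optimizations like this constant folding straightforward, which is one reason for lowering the tree before generating code.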

Transpiler

Transpiling converts one AST into another AST. A transpiler works as follows:

  1. Parsing: the parser parses the source code into an abstract syntax tree (AST)
  2. Transformation: the transformer adds, deletes and modifies AST nodes
  3. Generation: the generator produces target-language code from the AST
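The same parse, transform, generate pipeline can be tried out with Python’s standard ast module (ast.unparse requires Python 3.9 or later). The rewrite rule below, turning `x ** 2` into `x * x`, is a hypothetical rule chosen only to show the shape of a transformer; it is not taken from any of the tools mentioned in this article.

```python
import ast

class PowToMul(ast.NodeTransformer):
    """Transformation step: rewrite `name ** 2` into `name * name`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform child nodes first
        if (isinstance(node.op, ast.Pow)
                and isinstance(node.left, ast.Name)      # only plain names, so no
                and isinstance(node.right, ast.Constant)  # side effects are duplicated
                and isinstance(node.right.value, int)
                and node.right.value == 2):
            copy = ast.Name(id=node.left.id, ctx=ast.Load())
            return ast.BinOp(left=node.left, op=ast.Mult(), right=copy)
        return node

def transpile(source):
    tree = ast.parse(source)           # 1. parsing: source -> AST
    tree = PowToMul().visit(tree)      # 2. transformation: AST -> AST
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)           # 3. generation: AST -> source

print(transpile("y = x ** 2 + 1"))  # prints: y = x * x + 1
```

Restricting the rule to plain variable names is a deliberate choice: rewriting `f() ** 2` to `f() * f()` would call the function twice and change the program’s behavior.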

This process is common in front-end tooling, such as Babel, TypeScript, ESLint, and Prettier:

  • Babel parses ES6 code into an AST, uses a transformer to produce a processed AST, and prints out ES5 code
  • TypeScript converts the source AST into an AST for the target version and regenerates code from it
  • ESLint checks the AST against its rules, and with the --fix option it can also rewrite the code to repair violations
  • Prettier parses the code into an AST and prints the code again according to its formatting rules

Open-source JavaScript parsers currently include acorn, @babel/parser, and espree. Using several parsers together raises consistency issues. For example, in front-end projects where Babel and ESLint work together, the two use different parsers that can produce different ASTs, so unifying the parsers can take real effort.

Beyond JavaScript, tools for Markdown, CSS, GraphQL, Webpack, and so on all apply the same idea. The online parsing tool astexplorer.net can be a great help when you need to do related development.

Interpreter

The defining characteristic of an interpreter is that it can output the result directly, combining the abilities to “understand” and “execute”. Once it has the AST, it can execute directly from the AST, or first transform the AST into intermediate code and then execute that, possibly compiling parts of it into machine code along the way. The interpreter itself is a program written in a high-level language; the Google V8 engine, for example, is implemented in C++.
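The simplest form of “understand + execute” is a tree-walking interpreter, which evaluates the AST directly without producing any code. The sketch below uses Python’s standard ast module as the parser and supports only constants, variables and four arithmetic operators; the names evaluate and interpret are made up for this example.

```python
import ast
import operator

# Map AST operator node types to the functions that carry them out.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(node, env):
    """Walk the AST and compute the value of each node."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, env)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]  # look the variable up in the environment
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        left = evaluate(node.left, env)
        right = evaluate(node.right, env)
        return OPS[type(node.op)](left, right)
    raise NotImplementedError(type(node).__name__)

def interpret(source, env=None):
    """Source code in, result out: nothing intermediate is returned."""
    return evaluate(ast.parse(source, mode="eval"), env or {})

print(interpret("1 + 2 * 3"))              # prints: 7
print(interpret("(x + 1) * 2", {"x": 4}))  # prints: 10
```

Note that every call to interpret parses and walks the tree again, which is the repeated work that compilation avoids.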

Compiled language? Interpreted language?

In some articles, languages are divided into compiled and interpreted languages. In fact, a language itself is neither compiled nor interpreted. What differs is the mainstream implementation of the language.

To be usable, a programming language needs an implementation that can both “understand” and “execute” it. The common implementations can be summarized as “separate compilation and execution”, “compile, then interpret” and “interpreted execution”.

Separate compilation and execution

The compiler produces an executable file (machine code) that is persisted in storage. At run time, the executable is loaded and executed to get the results. A typical example is C/C++, which compiles to an executable such as an EXE file. In this mode you only need to compile once, and no time or space is spent compiling again at run time, hence the description “Write Once, Run Forever”.

This mechanism of compiling into machine code before running is called ahead-of-time (AOT) compilation.

Compile, then interpret

In this pattern, instead of generating executable code, the compiler generates an intermediate format (such as AST or bytecode). The execution phase takes on some of the work that would otherwise be part of the compilation phase. For example, the compiler only compiles to the AST, and the interpreter does the semantic analysis, executes, and outputs the results.

A typical example is Java: javac compiles source code into .class bytecode files, which the JVM loads, links, and executes to produce results.

AOT saves machine code and runs more efficiently with less repetitive work, so why does Java opt for compile-interpret execution?

To be cross-platform. Different platforms have different instruction sets and APIs, so machine code compiled for one platform cannot run on another, and a single compiled artifact cannot run everywhere. Java’s goal is “Write Once, Run Anywhere”: it leaves the translation into machine code to the JVM, which is itself written in C/C++ and is provided in a different build for each platform.

In compile-then-interpret execution, the compilation half produces intermediate code so that work is not repeated, and the interpretation half provides dynamism. Of course, this requires an intermediate format between source code and machine code that the JVM can recognize: bytecode.
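The split between a compile phase and an execute phase can be tried out in CPython, which follows a similar model with its own bytecode. This is an analogy only, not how javac or the JVM work: the built-in compile, marshal and exec are real, while the helper names compile_to_bytes and run_bytes are made up for this sketch.

```python
import marshal

def compile_to_bytes(source):
    """Compile phase: source code -> bytecode, serialized so it can be stored."""
    code = compile(source, "<demo>", "exec")
    return marshal.dumps(code)  # bytes that could be written to a file

def run_bytes(blob):
    """Execute phase: load the stored bytecode and run it on the virtual machine."""
    code = marshal.loads(blob)
    namespace = {}
    exec(code, namespace)
    return namespace

blob = compile_to_bytes("total = sum(n * n for n in range(4))")
print(run_bytes(blob)["total"])  # prints: 14
```

The source text is not needed in the execute phase, only the bytecode. One difference from Java worth noting: marshal’s format is specific to the CPython version, whereas .class files are designed to be portable across JVMs.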

This achieves cross-platform support but sacrifices runtime efficiency, so the JVM has been optimized in turn, for example by adding a just-in-time (JIT) compiler alongside the interpreter.

When the program starts, the interpreter comes into play first, so the code can be executed immediately. Over time, the just-in-time compiler kicks in, compiling more and more code into optimized native code for higher execution efficiency. The interpreter then serves as a fallback: if an aggressive optimization turns out to be invalid, execution switches back to interpretation and the program keeps running correctly.
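The tiering described above can be mimicked with a toy model. Everything here is an assumption made for illustration: the threshold value, the class name TieredExpr, and the “integers only” guard are invented, and Python’s built-in compile stands in for a JIT compiler even though it produces CPython bytecode rather than native code. A real JVM is far more sophisticated.

```python
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
HOT_THRESHOLD = 3  # hypothetical: calls before an expression counts as "hot"

def evaluate(node, env):
    """Slow path: tree-walking interpreter."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](evaluate(node.left, env), evaluate(node.right, env))
    raise NotImplementedError(type(node).__name__)

class TieredExpr:
    def __init__(self, source):
        self.tree = ast.parse(source, mode="eval")
        self.calls = 0
        self.fast = None  # compiled code, created once the expression is hot

    def run(self, **env):
        """Return (value, tier) where tier names the path that was taken."""
        self.calls += 1
        if self.fast is None and self.calls > HOT_THRESHOLD:
            # "JIT" step: compile the hot expression (stand-in for native code).
            self.fast = compile(self.tree, "<jit>", "eval")
        if self.fast is not None:
            # Guard: this sketch pretends the fast code assumes integer inputs.
            if all(type(v) is int for v in env.values()):
                return eval(self.fast, {"__builtins__": {}}, env), "compiled"
            # Assumption violated: discard the fast code, fall back to interpreting.
            self.fast = None
            self.calls = 0
        return evaluate(self.tree.body, env), "interpreted"

expr = TieredExpr("a * b + 1")
tiers = [expr.run(a=2, b=3)[1] for _ in range(5)]
# tiers: three "interpreted" runs, then two "compiled" runs
fallback = expr.run(a=2.5, b=2)
# fallback: (6.0, "interpreted") because the guard failed
```

The guard plus fallback is the part that corresponds to switching back to the interpreter: the optimized code is only trusted while its assumptions hold.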

Starting with Java 9, AOT compilation was also provided to precompile bytecode into machine code. Java thus supports both AOT and JIT.

Interpreted execution

Interpreted execution does not return an intermediate product: it takes source code as input and outputs the result. The most typical example is JavaScript.

Google V8 has also made many optimizations in this process:

  • Source -> AST: the parser parses source code into an AST
  • AST -> Bytecode: the Ignition interpreter generates bytecode from the AST as intermediate code kept in memory, and then executes it. Why is the intermediate bytecode needed? Machine code is larger than bytecode and takes up more memory, which is a heavy burden for mobile devices with little memory. Ignition is all about reducing memory usage.
  • Bytecode -> Optimized Code: code that is executed repeatedly (hot code) is identified dynamically, compiled into machine code, and cached. TurboFan is responsible for this JIT mechanism.

Conclusion

“Compile”, “translate” and “interpret”?

  • Compile: generate and return code in another language from the AST
  • Translate: transform the AST, then generate and return code
  • Interpret: produce the execution result directly; intermediate code may be generated internally but is not returned as a result

“Compiled language” or “interpreted language”?

A language itself is neither compiled nor interpreted; these labels are just a way of summarizing its mainstream implementation.

If we classify by whether an artifact is returned or a result is produced directly, Java can be called a “compiled language” because compilation yields .class files, while JavaScript can be called an “interpreted language” because it produces results directly from the source code.

The relationship between compile and interpret

The advantage of compilation is that work is not repeated. The advantage of interpretation is that it is dynamic: it can optimize at run time and adapt flexibly to different platforms and environments, at the cost of repeated interpretation and lower runtime efficiency. In practice, the two ideas are often combined so that each offsets the other’s weaknesses.
