Preface
This article summarizes the first chapter of Programming Language Pragmatics, which explains the difference between compilation and interpretation and walks through the phases of a typical compiler.
Compilation and Interpretation
Borrowing the book’s diagrams, anyone with some coding experience can quickly grasp the difference between compilation and interpretation:
Compilation:
Interpretation:
The difference between compilation and interpretation seems clear-cut, but many languages, such as Java, mix the two, compiling source code to an intermediate form that is then interpreted:
The above covers only the basic distinction between compilation and interpretation; real implementations are more complex. For example, some languages have a preprocessor, and modern Java implementations include a JIT compiler.
These details are beyond the scope of this article; interested readers can look them up on their own.
General compilation process
The book gives a diagram of the compilation process:
The main tasks of each phase are explained next.
Scanner (Lexical Analysis)
In the scanner phase, the compiler reads the character stream of the source program and groups the characters into tokens, the smallest meaningful units of a program. This stage is also called lexical analysis.
Take, for example, the lexical analysis of a C program that computes the greatest common divisor.
Here’s the source code:
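The screenshot is from the book; for reference, a standard C version of Euclid’s algorithm in the same spirit looks like this (the book reads and writes integers through simple helper routines; standard I/O is substituted here so the sketch compiles as-is):

```c
#include <stdio.h>

/* Euclid's algorithm for the greatest common divisor, in the spirit of the
 * book's example; standard I/O is used so the sketch is self-contained. */
int main(void) {
    int i, j;
    if (scanf("%d %d", &i, &j) != 2) return 1;
    while (i != j) {            /* subtract the smaller from the larger */
        if (i > j) i = i - j;
        else       j = j - i;
    }
    printf("%d\n", i);
    return 0;
}
```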
After lexical analysis, the program is broken into the following tokens:
Note: a token is not just a cut-out substring. A token consists of a token name and an optional token value. For example, x = a + b * 2; is tokenized as: [(identifier, x), (operator, =), (identifier, a), (operator, +), (identifier, b), (operator, *), (literal, 2), (separator, ;)]. This example is taken from Lexical Analysis (Token) – Wikipedia.
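As a minimal sketch of what a scanner does (the token kinds and function names below are invented for illustration, not taken from the book), the following C program tokenizes the Wikipedia example above:

```c
#include <ctype.h>
#include <stdio.h>

/* A toy scanner: groups characters into identifier, number, and
 * single-character operator/separator tokens, skipping whitespace. */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_OTHER, TOK_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];                 /* the lexeme (the token's value) */
} Token;

Token next_token(const char **p) {
    Token t = { TOK_EOF, "" };
    size_t n = 0;
    while (isspace((unsigned char)**p)) (*p)++;        /* drop whitespace */
    if (**p == '\0') return t;
    if (isalpha((unsigned char)**p)) {                 /* identifier */
        t.kind = TOK_IDENT;
        while (isalnum((unsigned char)**p) && n < sizeof t.text - 1)
            t.text[n++] = *(*p)++;
    } else if (isdigit((unsigned char)**p)) {          /* number literal */
        t.kind = TOK_NUMBER;
        while (isdigit((unsigned char)**p) && n < sizeof t.text - 1)
            t.text[n++] = *(*p)++;
    } else {                                           /* operator/separator */
        t.kind = TOK_OTHER;
        t.text[n++] = *(*p)++;
    }
    t.text[n] = '\0';
    return t;
}

int main(void) {
    const char *src = "x = a + b * 2;";
    for (Token t = next_token(&src); t.kind != TOK_EOF; t = next_token(&src))
        printf("(%d, \"%s\")\n", t.kind, t.text);
    return 0;
}
```

Running it prints one (kind, lexeme) pair per token, matching the list in the note above.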
The main purpose of this phase is to simplify the work of the parser in the next phase, chiefly by:
- Reducing the size of the parser’s input (tokens rather than individual characters)
- Removing irrelevant characters, such as whitespace
- Removing comments
- Recording the line and column at which each token appears in the source code, and so on
Parser (Syntax Analysis)
The parser performs syntax analysis: it organizes the tokens produced by lexical analysis into a parse tree, which shows how the tokens fit together to form a program.
The parse tree for the greatest common divisor program is as follows:
Parts A and B of the tree are as follows:
The construction of the entire tree is driven by a set of potentially recursive rules known as a context-free grammar. For example, the grammar has a rule for while loops:
The statement on its right-hand side is in turn expanded by further recursive rules:
The loop in part A of the tree corresponds to these rules:
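To make the recursion concrete, here is a hypothetical recursive-descent fragment in C (the token names, toy grammar, and helper functions are all invented for this illustration; this is not the book’s code). Each grammar rule becomes a function, so a rule like "statement -> while ( expression ) statement" turns directly into a recursive call:

```c
#include <stdio.h>
#include <stdlib.h>

typedef enum { TOK_WHILE, TOK_LPAREN, TOK_RPAREN, TOK_IDENT, TOK_SEMI, TOK_EOF } TokKind;

/* A canned token stream standing in for the scanner: while ( x ) y ; */
static TokKind input[] = { TOK_WHILE, TOK_LPAREN, TOK_IDENT, TOK_RPAREN,
                           TOK_IDENT, TOK_SEMI, TOK_EOF };
static int pos = 0;

static TokKind peek(void) { return input[pos]; }
static void expect(TokKind k) {
    if (input[pos++] != k) { fprintf(stderr, "syntax error\n"); exit(1); }
}

static void expression(void) { expect(TOK_IDENT); }  /* toy rule: expression -> identifier */

static void statement(void) {
    if (peek() == TOK_WHILE) {    /* statement -> while ( expression ) statement */
        expect(TOK_WHILE);
        expect(TOK_LPAREN);
        expression();
        expect(TOK_RPAREN);
        statement();              /* the recursion that produces the nested tree */
    } else {                      /* toy rule: statement -> expression ; */
        expression();
        expect(TOK_SEMI);
    }
}

int main(void) {
    statement();                  /* parses: while ( x ) y ; */
    puts("parsed OK");
    return 0;
}
```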
A parse tree is also often called a concrete syntax tree, because it records exactly how a specific sequence of tokens is derived from the context-free grammar.
Semantic Analysis and Intermediate Code Generation
Semantic analysis is the phase in which the compiler discovers the meaning of a program. Among other things, the semantic analyzer recognizes when multiple occurrences of the same identifier refer to the same program entity. In most languages, it also tracks the types of identifiers and expressions.
To do this, the semantic analyzer builds and maintains a **symbol table** data structure, which records information about each identifier, such as its type, internal structure, and scope.
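As an illustration, a symbol table entry in C might look like the sketch below (the field names are made up for the example; production compilers typically hash names and keep one table, or one nesting level, per scope):

```c
#include <string.h>
#include <stddef.h>

/* Illustrative symbol table entry, chained in a simple linked list. */
typedef enum { TYPE_INT, TYPE_FLOAT } Type;

typedef struct Symbol {
    const char    *name;   /* the identifier itself        */
    Type           type;   /* its type, for type checking  */
    int            scope;  /* nesting level of its scope   */
    struct Symbol *next;   /* next entry in the list       */
} Symbol;

/* Look a name up; a NULL result lets the analyzer report a
 * "use before declaration" error (one of the checks listed below). */
Symbol *lookup(Symbol *table, const char *name) {
    for (Symbol *s = table; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}
```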
With the help of the symbol table, the semantic analyzer can enforce a large number of rules that cannot be expressed in a context-free grammar or checked on a concrete syntax tree alone. In C, for example, it can check that:
- Every identifier is declared before it is used
- An identifier is used only within its scope
- Subroutine calls provide the correct number of arguments
- The case labels of a switch statement are distinct constants, and so on
Of course, not all semantic rules can be checked at compile time. Rules that the compiler can enforce at compile time are called static semantics; those that must be checked at run time are called dynamic semantics (array-bounds checking in Java, for example, happens at run time).
In this stage, the semantic analyzer also removes redundant information from the concrete syntax tree, transforming it into an abstract syntax tree (AST), and annotates the remaining nodes with useful information, such as pointers from identifiers to their symbol table entries.
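As an illustration (the node kinds and field names are invented for this example), an AST node in C might look like this, with identifier nodes pointing at their symbol table entries as described above:

```c
/* Illustrative AST node; which fields are meaningful depends on the kind. */
typedef enum { NODE_WHILE, NODE_IF, NODE_ASSIGN,
               NODE_BINOP, NODE_IDENT, NODE_LITERAL } NodeKind;

typedef struct Node {
    NodeKind       kind;
    struct Node   *child[3];  /* e.g. condition and body for NODE_WHILE */
    struct Symbol *sym;       /* NODE_IDENT: its symbol table entry     */
    int            value;     /* NODE_LITERAL: the constant's value     */
    char           op;        /* NODE_BINOP: '+', '-', '*', ...         */
} Node;
```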
The figure shows the AST of the greatest common divisor program together with its symbol table:
Target Code Generation
In the code generation phase, the compiler translates the intermediate code into the target language. To generate assembly or machine language, the code generator traverses the symbol table to assign an address to each variable, then walks the intermediate code in sequence, generating loads and stores for variable references, and so on.
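As a toy illustration of that pattern (the instruction set below is invented pseudo-assembly, not the book’s target machine), naive code generation for an assignment such as a = b + c emits a load for every use of a variable and a store for every definition:

```c
#include <stdio.h>

/* Emit naive pseudo-assembly for "dst = lhs + rhs". */
void gen_assign_add(const char *dst, const char *lhs, const char *rhs) {
    printf("    load  r1, %s\n", lhs);   /* load each operand from memory */
    printf("    load  r2, %s\n", rhs);
    printf("    add   r1, r1, r2\n");    /* compute the sum in a register */
    printf("    store r1, %s\n", dst);   /* store the result back         */
}

int main(void) {
    gen_assign_add("a", "b", "c");
    return 0;
}
```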
For the greatest common divisor example above, the compiler generates the following assembly (the comments in the screenshot were added by hand; the code generator does not produce them):
Code Improvement
Code improvement, also known as optimization, is an optional phase of the compilation process. Its main purpose is to make the generated program more efficient: faster, or more frugal with memory.
Some optimizations are machine independent and operate on the intermediate code produced by semantic analysis; others require knowledge of the target machine and are applied to the generated target program. This is why code improvement appears twice in the process above, and why both appearances are optional.
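As a source-level illustration of two common machine-independent optimizations (the function names are just for the example):

```c
/* What the programmer wrote: */
int before(int n) {
    int x = 4 * 2;         /* constant folding: 4 * 2 becomes 8 at compile time       */
    return n * x + n * x;  /* common subexpression elimination: n * x is computed once */
}

/* What the optimizer effectively produces: */
int after(int n) {
    int t = n * 8;
    return t + t;
}
```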
The following figure shows an optimized assembly version of the greatest common divisor program above, which is much shorter:
Modern processors have very complex internal behavior, and compiler-generated assembly is often more efficient than assembly written by hand.
General interpretation process
The general process of interpretation is shown below:
As you can see, the front end of an interpreter is exactly the same as that of a compiler. Tree-walk routines are not covered in detail in Chapter 1 of the book, so they are not covered here either.
Conclusion
After reading the first chapter of this book, I have a clear picture of the overall process of compilation and of what each phase is for.
References
Lexical Analysis (Token) – Wikipedia