An overview of the
There will be an ongoing compiler topic here.
Self-study materials are from Google and Dragon Book.
The deadline is 20211231
Different compilers for different languages
roots
The compiler
Small demo
Assuming we enter a = b + 1 in the editor, the following figure shows the entire compilation process for the compiler.
Lexical analysis
If you think of the compiler as a black box, it can map source programs to semantically equivalent target programs.
First our source program will enter the lexical analysis phase. Lexical analysis reads the character stream input by the source program and generates the following lexical units:
. In this lexical unit, the first component token-name serves as an abstract symbol to be used in the grammar analysis stage, and the second component attribute-name refers to the explanatory information about this lexical unit in the symbol table. The symbol table is written when the compiler is made.
Suppose we enter the source program statement: A = b + 1
This assignment statement is combined into the following morphemes during the lexical analysis phase:
- A is a morpheme mapped to the lexical unit
,>
, where ID represents the abstract symbol of the identitifier and 1 points to the entry corresponding to A in the symbol table. A symbol table entry for an identifier stores information about the identifier, such as its name and type.
- The assignment symbol = is a morpheme mapped to a lexical unit <=>. Because this lexical unit does not require attribute values, we omit the second component. We could also use an abstract notation like Assign, but we use = for convenience.
- B is also a morpheme, mapped to the lexical unit
,>
, where 2 points to the corresponding entry for B in the symbol table.
- + is a morpheme that maps to <+>.
- 1 is a morpheme that maps to <1>.
Syntax analysis
The second step of the compiler is called parsing or parsing.
In the grammar analysis stage, the lexical units generated by the grammar analysis are used to construct the grammar analysis tree. Each inner node in the tree represents an operation, and the leaf nodes of that node represent the components of that operation.
Semantic analysis
The semantic parser uses information from the syntax tree and symbol table to check whether the source program is consistent with the semantics defined by the language. It also collects type information and stores this information in a grammar book or symbol table for later use.
An important stage of semantic analysis is type checking. The compiler checks whether each operator has a matching operation component. For example, the definition in many programming languages requires that the subscript of an array be an integer, and the compiler must report an error if a floating-point number is used as the subscript.
Programming languages also allow some type conversions, called automatic type conversions. For example, when an integer is added to a floating point number, the integer is automatically converted to a floating point number for calculation. In the program a = b + 1, we assume that b is a floating-point number, so inttofloat conversion occurs in the semantic analysis phase.
Intermediate code generation
This is the process of translating the source program into object code. A compiler may construct one or more intermediate representations. A syntax tree is an intermediate representation that is usually generated during parsing and semantic parsing.
After the source program has been parsed and semantically, many compilers generate an explicit intermediate representation of a low-level machine-like language. We can think of this as the program of some abstract machine. This intermediate representation has two important properties: it is easy to generate and can be easily translated into target machine code.
As shown in the figure above, we generally consider a three-address code.
Code optimization
After the optimization phase, better object code is generated. Better means: less energy. This stage is very important.
Code generation
The code generator takes an intermediate representation of the source program as input and maps it to the target machine language. If the target machine language is machine code, a register or memory location must be selected for each variable used by the program. A critical step in code generation is the proper allocation of registers and memory space for programs.
So that’s kind of an overview of how compilation works, and then we’ll go into the world of compilation with Fredrik Kjolstad.
Thank you
I would like to thank a professor at Zheng University for opening a window on how code is implemented.
Thanks to Assistant Professor Fredrik Kjolstad and For your public handout.
reference
-
Compilation Principles second edition
-
Handout by Assistant Professor Fredrik Kjolstad