Disclaimer: this is a translation by the simviso team. Original video: www.bilibili.com/video/av683…
Contents
What is lexical analysis
What is grammatical analysis
What is semantic analysis
Optimization and code generation
A compiler has five stages: lexical analysis, syntax (grammatical) analysis, semantic analysis, optimization, and code generation.
What is lexical analysis
We'll use an analogy: how a human understands English compared with how a compiler understands a program.
Whether you are a compiler or a human, the first step in understanding a program is understanding its words.
This is a sentence
The English sentence above contains four words: “This”, “is”, “a”, and “sentence”.
To read it, we rely on separators (spaces, and punctuation such as the period) and on cues such as capital letters. These help us split the string of letters into words we can understand.
Now look at a harder version of the same sentence: ist his ase nte nce
It is hard to read at a glance, even though it contains exactly the same letters; only the separators have moved.
The goal of lexical analysis is to split the program text into words (tokens) according to the rules of the language; in other words, it is how the compiler recognizes words.
For example, if x==y then z=1; else z=2;
Here we can see keywords (if, then, else), variable names (x, y, z), and constants (1, 2).
The operators (==, =), the punctuation (;), and the separating spaces must be recognized too: together they let the compiler break the program text into a sequence of words it can understand.
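To make the idea concrete, here is a minimal lexer sketch for the example statement. The token class names (KEYWORD, NAME, NUMBER, OPERATOR, PUNCT) and the regular expressions are illustrative assumptions, not taken from any particular compiler. Whitespace is recognized as a separator and then discarded, and "==" is listed before "=" so the longer operator wins.

```python
import re

KEYWORDS = {"if", "then", "else"}

# Token classes, tried in order at each position (illustrative names).
TOKEN_SPEC = [
    ("WHITESPACE", r"\s+"),           # separators: recognized, then skipped
    ("NUMBER",     r"\d+"),           # constants such as 1, 2
    ("NAME",       r"[A-Za-z_]\w*"),  # keywords or variable names
    ("OPERATOR",   r"==|="),          # '==' first so it is not read as two '='
    ("PUNCT",      r";"),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def lex(source):
    """Split source into (kind, text) tokens, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        if kind == "NAME" and text in KEYWORDS:
            kind = "KEYWORD"  # reserved words are a separate token class
        if kind != "WHITESPACE":
            tokens.append((kind, text))
        pos = match.end()
    return tokens

for token in lex("if x==y then z=1; else z=2;"):
    print(token)  # ('KEYWORD', 'if'), ('NAME', 'x'), ('OPERATOR', '=='), ...
```

Real lexers are usually generated from similar token specifications, but this hand-written loop is enough to show how separators turn a flat string into words.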
What is grammatical analysis
Once the words are understood, the next step is to understand the structure of the sentence.
Let's look at this example: This line is a longer sentence.
The first step in parsing is to identify the role each word plays in the sentence. For example:
This -> article
line -> noun
is -> verb
a -> article
longer -> adjective
sentence -> noun
The real work of parsing comes next: grouping these words into higher-level structures.
For example, this sentence contains a subject (“This line”), a verb (“is”), and an object (“a longer sentence”).
Together, the subject, verb, and object form a complete sentence.
Parsing English sentences is very similar to parsing program code.
if x==y then z=1; else z=2;
Parsed, this is simply an if-then-else statement, so if-then-else is the root of the parse tree.
First, analyze the condition: it contains two variables, x and y, and the comparison operator ==. Together they form a relational expression, which evaluates to a Boolean value that decides whether the then branch or the else branch runs. The relational expression, the then-statement (the assignment z=1), and the else-statement (the assignment z=2) become the three children of the if-then-else node in the parse tree.
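The parse tree described above can be built with a small recursive-descent parser. This is a sketch under assumptions: the toy grammar in the docstring is hypothetical (written to accept just this kind of statement), the tokenizer is deliberately simplified, and the tree is encoded as nested Python tuples rather than node objects.

```python
import re

def tokenize(source):
    # Simplified tokenizer: '==', a single '=' or ';', or a run of word characters.
    return re.findall(r"==|[=;]|\w+", source)

class Parser:
    """Recursive-descent parser for a hypothetical toy grammar:

        stmt := 'if' expr 'then' stmt 'else' stmt
              | ID '=' expr ';'
        expr := atom ('==' atom)?
        atom := ID | INT
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def stmt(self):
        if self.peek() == "if":
            self.advance()
            cond = self.expr()             # the relational expression
            self.expect("then")
            then_branch = self.stmt()
            self.expect("else")
            else_branch = self.stmt()
            return ("if-then-else", cond, then_branch, else_branch)
        name = self.advance()              # assignment: ID '=' expr ';'
        self.expect("=")
        value = self.expr()
        self.expect(";")
        return ("assign", name, value)

    def expr(self):
        left = self.atom()
        if self.peek() == "==":
            self.advance()
            return ("==", left, self.atom())
        return left

    def atom(self):
        tok = self.advance()
        return ("int", int(tok)) if tok.isdigit() else ("id", tok)

def parse(source):
    parser = Parser(tokenize(source))
    tree = parser.stmt()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse("if x==y then z=1; else z=2;"))
# ('if-then-else', ('==', ('id', 'x'), ('id', 'y')), ('assign', 'z', ('int', 1)), ('assign', 'z', ('int', 2)))
```

Each grammar rule becomes one method, and the nesting of the returned tuples mirrors the tree: the if-then-else node at the root, with the relational expression and the two assignments as its children.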
What is semantic analysis
Once we understand the structure of the sentence, the next step is to understand what it means. Lexical and syntax analysis are comparatively easy; understanding meaning is much harder, and compilers can perform only limited semantic analysis.
For a compiler, semantic analysis mainly means catching inconsistencies in the program. If the program contradicts itself, the compiler can usually detect it, but the compiler does not know what the program is actually meant to do.
Let's continue with an English sentence.
Example: Jack said Jerry left his assignment at home.
This sentence is ambiguous: “his” in “his assignment” could refer to either Jack or Jerry, and without more information we cannot tell which.
What about this one: Jack said Jack left his assignment at home?
Here we don't even know whether there is one Jack or two, or which of them “his” refers to.
In a program, the analogous problem is variable binding. Suppose a variable jack is declared in an outer block, a second jack is declared in an inner block, and jack is then printed: which jack is meant? Some languages, such as Java, reject the second declaration during semantic analysis with an error saying that jack is already defined. Others, such as C++, accept it and bind the name to the innermost declaration, so the inner jack is printed.
Either way, the language specification defines strict binding rules, so the compiler never has to guess.
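Binding rules like these are commonly enforced with a symbol table that has one dictionary per nested scope. The sketch below is hypothetical (the class and method names are invented for illustration) and treats every scope as a local block scope. With reject_shadowing=True it behaves Java-like and refuses to redeclare a name visible from an enclosing scope; with the default it behaves C++-like and lets the inner declaration shadow the outer one. The stored values 3 and 4 simply stand in for whatever the compiler records about each declaration.

```python
class ScopedSymbolTable:
    """Hypothetical symbol table with nested block scopes."""

    def __init__(self, reject_shadowing=False):
        self.scopes = [{}]                  # innermost scope is the last dict
        self.reject_shadowing = reject_shadowing

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, info):
        if name in self.scopes[-1]:
            raise NameError(f"{name} is already defined in this scope")
        if self.reject_shadowing and any(name in scope for scope in self.scopes):
            raise NameError(f"{name} is already defined in an enclosing scope")
        self.scopes[-1][name] = info

    def lookup(self, name):
        # Search from the innermost scope outward; the first match is the binding.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name} is not defined")

# C++-like: the inner declaration shadows the outer one.
cpp_like = ScopedSymbolTable()
cpp_like.declare("jack", 3)
cpp_like.enter_scope()
cpp_like.declare("jack", 4)
print(cpp_like.lookup("jack"))  # 4
cpp_like.exit_scope()
print(cpp_like.lookup("jack"))  # 3

# Java-like: the second declaration is rejected.
java_like = ScopedSymbolTable(reject_shadowing=True)
java_like.declare("jack", 3)
java_like.enter_scope()
try:
    java_like.declare("jack", 4)
except NameError as err:
    print(err)  # jack is already defined in an enclosing scope
```

In both modes the answer to "which jack?" is fully determined by the rules, which is exactly what the semantic analyzer needs.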
Optimization and code generation
Optimization, the fourth stage of the compiler, has no close counterpart in everyday English, but it is a bit like editing. In fact, it is a lot like what a professional editor does when cutting an article to fit within a word limit.
For example, take the phrase: But a little bit like editing
If we think it is too long, we can replace the four words “a little bit like” with the two words “akin to”: But akin to editing
The goal of program optimization is to modify the code so that it uses fewer resources. Maybe we want it to run faster, or maybe we want it to use less memory so we can store more data.
X=Y*0 is the same as X=0
This rule says that X=Y*0 can be replaced by X=0, and at first glance it seems fine. Unfortunately, it is not valid in general: it holds for integers, but not for floating-point numbers, because NaN (Not a Number) multiplied by 0 is still NaN, not 0: NaN*0 = NaN.
So if the compiler applied this optimization to floating-point code, it could break important algorithms that rely on NaN propagating through a computation.
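A minimal Python sketch of the point above: Python floats follow IEEE 754, so NaN * 0 (and also infinity * 0) is NaN, while any integer times 0 is 0. The rewrite rule is written as a hypothetical optimizer pass over a made-up tuple IR node ('mul', operand, constant), guarded so it only fires for integers.

```python
import math

def simplify_mul_by_zero(expr, var_type):
    """Hypothetical peephole rule: rewrite ('mul', operand, 0) to 0 only for ints."""
    op, operand, constant = expr
    if op == "mul" and constant == 0 and var_type == "int":
        return 0      # safe: for every integer y, y * 0 == 0
    return expr       # unsafe for floats: NaN * 0 and inf * 0 are NaN

# Why the floating-point case must be left alone:
print(float("nan") * 0)               # nan
print(math.isnan(float("inf") * 0))   # True
print(7 * 0)                          # 0

print(simplify_mul_by_zero(("mul", "y", 0), "int"))    # 0
print(simplify_mul_by_zero(("mul", "y", 0), "float"))  # ('mul', 'y', 0)
```

The design choice is the type check: an optimization is only legal if it preserves the program's behavior for every possible value of the operand type.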
The final stage of the compiler is code generation, often called codegen. It can produce assembly code; more generally, it translates the program from the source language into another language, the target language, which is the most basic job of a compiler.
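As an illustration, here is a sketch of a code generator for the if-then-else example. The target is a hypothetical stack machine: the instruction names (load, push, eq, jumpfalse, jump, store) are invented for this example and do not belong to any real instruction set, and the input is a tuple encoding of the parse tree.

```python
import itertools

def generate(tree):
    """Return a list of hypothetical stack-machine instructions for tree."""
    code = []
    labels = itertools.count()  # fresh label numbers for each if-then-else

    def emit(node):
        kind = node[0]
        if kind == "var":              # push the value of a variable
            code.append(f"  load {node[1]}")
        elif kind == "int":            # push a constant
            code.append(f"  push {node[1]}")
        elif kind == "eq":             # pop two values, push 1 if equal else 0
            emit(node[1])
            emit(node[2])
            code.append("  eq")
        elif kind == "assign":         # evaluate right side, pop it into the variable
            emit(node[2])
            code.append(f"  store {node[1]}")
        elif kind == "if":             # condition, then branch, else branch
            n = next(labels)
            emit(node[1])
            code.append(f"  jumpfalse else{n}")
            emit(node[2])
            code.append(f"  jump end{n}")
            code.append(f"else{n}:")
            emit(node[3])
            code.append(f"end{n}:")
        else:
            raise ValueError(f"unknown node kind: {kind}")

    emit(tree)
    return code

# Parse tree for: if x==y then z=1; else z=2;
tree = ("if", ("eq", ("var", "x"), ("var", "y")),
        ("assign", "z", ("int", 1)),
        ("assign", "z", ("int", 2)))
print("\n".join(generate(tree)))
#   load x
#   load y
#   eq
#   jumpfalse else0
#   push 1
#   store z
#   jump end0
# else0:
#   push 2
#   store z
# end0:
```

A real code generator targets a concrete instruction set and must also handle registers and memory layout, but the shape is the same: walk the tree and emit target-language instructions for each node.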