preface
After the introduction and practice of a series of preliminary code compilation articles [public number: Front-end xiaocainiao 001], we should have a certain understanding and understanding of the implementation and execution process of compiler.
Let’s take a closer look at how compilers convert between languages.
(Language A => language B and does not affect the correct execution of the program)
The body of the
To get A clearer picture of how the compiler converts language A to language B, let’s use an overview diagram.
(For the convenience of this article, language A is abbreviated as L-A, and language B is abbreviated as L-B)
-
1. L-A / L-B
First of all, let’s introduce the differences between the original language A and the target language B:
// L-A
(add 2 (subtract 4 2))
// L-B
add(2, subtract(4, 2))
Copy the code
Its corresponding AST syntax tree structure:
Smart people should be able to see that in the process of realizing code conversion to different languages, the core work is not only the common lexical analysis, syntax analysis, but also the transformation of AST and the target code generation of the new AST. Each step is described below.
-
2. Tokenizer
The process of lexical analysis has been introduced and practiced many times in previous articles,
We won’t discuss it in detail here.
Tokens are as follows:
const input = '(add 2 (subtract 4 2))';
const tokens = [
{ type: 'paren'.value: '(' },
{ type: 'name'.value: 'add' },
{ type: 'number'.value: '2' },
{ type: 'paren'.value: '(' },
{ type: 'name'.value: 'subtract' },
{ type: 'number'.value: '4' },
{ type: 'number'.value: '2' },
{ type: 'paren'.value: ') ' },
{ type: 'paren'.value: ') '}];function tokenizer(input) {
let tokens = []
let current = 0;
while(current < input.length) {
let char = input[current]
// '(')' parentheses
if(char === '(' || char === ') ') {
tokens.push({
type: 'para'.value: char
})
current++;
continue;
}
/ / space
let WHITESPACE = /\s/;
if(WHITESPACE.test(char)) {
current++;
continue;
}
/ / 0-9 Numbers
let NUMBERS = / [0-9];
if(NUMBERS.test(char)) {
/ /...
let value = ' ';
while(NUMBERS.test(char)) {
value += char;
char = input[++current]
}
tokens.push({
type: 'number',
value
})
continue;
}
// string "xx" the character string
if(char === '"') {
// ...
}
// name
let LETTERS = /[a-z]/i;
if(LETTERS) {
// ...
}
throw new TypeErroe(char)
}
return tokens
}
Copy the code
-
3. Parser
The Tokens obtained by lexical analysis and the l-A AST syntax structure described above can be programmed to convert tokens into the AST syntax tree corresponding to L-A.
function parser(tokens) {
const current = 0;
// Define the AST root
let ast = {
type: 'Program'.body: [],};while(current < tokens.length) {
ast.body.push(walk());
}
// Define a recursively called 'walk' function
function walk() {
// Get the token currently being processed
let token = tokens[current];
// Returns an AST node if the current value is number or string
if (token.type === 'number') {current++;return {
type: 'NumberLiteral'.value: token.value,
};
}
if (token.type === 'string') {
current++;
return {
type: 'StringLiteral'.value: token.value,
};
}
// If the parentheses are '(' then a 'CallExpression' ast node is created
if (token.type === 'paren' && token.value === '(') {
// Get the token after the '(' parentheses
token = tokens[++current];
// Creates a 'CallExpression' ast node with the current token set to its name
let node = {
type: 'CallExpression'.name: token.value,
params: [],};// Get the token after name
token = tokens[++current];
// Processes descendants of the CallExpression node
while((token.type ! = ='paren') ||
(token.type === 'paren'&& token.value ! = =') ')) {// Recursively call 'walk' to attach the node it returns to Node.params
node.params.push(walk());
token = tokens[current];
}
// Skip the ')' parentheses
current++;
returnnode; }}return ast;
}
// Final AST result
const ast = {
type: 'Program'.body: [{
type: 'CallExpression'.name: 'add'.params: [{
type: 'NumberLiteral'.value: '2'
}, {
type: 'CallExpression'.name: 'subtract'.params: [{
type: 'NumberLiteral'.value: '4'
}, {
type: 'NumberLiteral'.value: '2'}}}}]]];Copy the code
At this point, we have the AST of L-A, which will be followed by the all-important Transform.
-
4. Translation (transformer)
Transformation phase, which takes the AST from the last step above and makes changes to it. It can manipulate the AST as if it were the same language, or it can translate it into an entirely new language.
Let’s look at how to convert the AST.
You may have noticed that the AST contains elements that all look similar. These objects all have a Type attribute, and each belongs to an AST node on which the attribute defined nodes describe a separate part of the tree.
When transforming an AST, nodes can be manipulated by adding/removing/replacing properties, adding new nodes, deleting nodes, or discarding an existing AST and creating a brand new AST based on it.
Because the goal here is to convert to a new language, the focus is on creating a brand new AST for the target language.
In order to access all nodes, a depth-first algorithm is used to traverse the entire AST.
If we were to manually manipulate this AST(add, subtract, modify) instead of creating a separate AST, we might introduce all sorts of abstractions, but for what we’re going to do here, we just need to visit each node in the tree one by one.
Visitor is defined as follows:
Var visitor = {NumberLiteral(node, parent) {}, CallExpression(node, parent) {}, }; // AST structure-program-callexpression - NumberLiteral - CallExpression - NumberLiteral - NumberLiteral // AST depth first algorithm processing // Backtrace -> Program (enter) -> CallExpression (enter) -> Number Literal (enter) < -number Literal (exit) -> CallExpression (enter) -> Number Literal (enter) <- Number Literal (exit) -> Number Literal (enter) <- Number Literal (exit) <- CallExpression (exit) < -callexpression (exit) < -program (exit) // To support the Enter /exit processing, Var visitor = {NumberLiteral: {enter(node, parent) {}, exit(node, parent) {},} //... };Copy the code
Now, with that visitor we can process different AST Node-types,
Let’s define a traverser function that accepts an AST and visitors.
function traverser(ast, visitor) {
// Support array type traversal
function traverseArray(array, parent) {
array.forEach(child= > {
traverseNode(child, parent);
});
}
//
function traverseNode(node, parent) {
let methods = visitor[node.type];
// The node enters the hook
if (methods && methods.enter) {
methods.enter(node, parent);
}
switch (node.type) {
case 'Program':
traverseArray(node.body, node);
break;
//
case 'CallExpression':
traverseArray(node.params, node);
break;
// NumberLiteral and StringLiteral types do not do any node processing
case 'NumberLiteral':
case 'StringLiteral':
break;
}
// The hook for node exit
if(methods && methods.exit) { methods.exit(node, parent); }}// Iterate over the AST node
traverseNode(ast, null);
}
Copy the code
Next, implement a Transformer and build a new AST
function transformer(ast) {
// Create a new AST root
let newAst = {
type: 'Program'.body: [],};// For convenience, copy the body reference to ast._context
ast._context = newAst.body;
/ / traverse the ast
traverser(ast, {
NumberLiteral: {
enter(node, parent) {
parent._context.push({
type: 'NumberLiteral'.value: node.value, }); }},StringLiteral: {
enter(node, parent) {
parent._context.push({
type: 'StringLiteral'.value: node.value, }); }},CallExpression: {
enter(node, parent) {
// Create a 'CallExpression'
let expression = {
type: 'CallExpression'.callee: {
type: 'Identifier'.name: node.name,
},
arguments: [],};// Define a new reference on the 'CallExpression' node
node._context = expression.arguments;
//
if(parent.type ! = ='CallExpression') {
// wrap the 'CallExpression' node with 'ExpressionStatement'.
expression = {
type: 'ExpressionStatement'.expression: expression, }; } parent._context.push(expression); }}})return newAst;
}
const newAst = {
type: 'Program'.body: [{
type: 'ExpressionStatement'.expression: {
type: 'CallExpression'.callee: {
type: 'Identifier'.name: 'add'
},
arguments: [{
type: 'NumberLiteral'.value: '2'
}, {
type: 'CallExpression'.callee: {
type: 'Identifier'.name: 'subtract'
},
arguments: [{
type: 'NumberLiteral'.value: '4'
}, {
type: 'NumberLiteral'.value: '2'}]}]}}]};Copy the code
-
5. Code-generator
The final stage is code generation, where the code generator calls itself recursively to output each node in the tree as a long string.
function codeGenerator(node) {
switch (node.type) {
// If it is 'Program', the map will traverse its body property and insert a newline character
case 'Program':
return node.body.map(codeGenerator)
.join('\n');
// If 'ExpressionStatement' is wrapped in parentheses and nested, call code generator expressions with a semicolon
case 'ExpressionStatement':
return (
codeGenerator(node.expression) +
'; '
);
// For 'CallExpression', it prints' callee 'and adds an open parenthesis,
// I will then iterate over each node in the 'arguments' array and run them through the code generator,
// Then connect them with a comma, and add a close parenthesis. Wrap it in parentheses
case 'CallExpression':
return (
codeGenerator(node.callee) +
'(' +
node.arguments.map(codeGenerator)
.join(', ') +
') '
);
case 'Identifier':
return node.name;
// For 'NumberLiteral' returns directly
case 'NumberLiteral':
return node.value;
// For 'StringLiteral', wrap it with ""
case 'StringLiteral':
return '"' + node.value + '"'; }}const output = "add(2, subtract(4, 2))";
Copy the code
At this point, the compiler is fully equipped to convert L-A to L-B. From input to output, the compiler’s code conversion is finally completed through four stages.
1. input => tokenizer => tokens
2. tokens => parser => ast
3. ast => transformer => newAst
4. newAst => generator => output
Copy the code
If you’re already here, congratulations, you’ve surpassed 80% of your readers
conclusion
So, this is the end of the preliminary code compilation series, I think if you can understand all four articles and implement the code involved by hand, it will give you a new understanding of daily front-end development, whether it is engineering using Babel, ESLint, etc., the underlying principles are very similar.
References:
- github.com/babel/babel
Ps: If there are any deficiencies welcome to correct
A front end little novice | eager | dream and love do not live up to