Compilation principle
It is recommended to learn the-super-tiny-Compiler before learning Webpack, mainly because its code is relatively short and easy to understand. What Webpack mainly does is to convert the relevant syntax sugar (JSX/ ES6 etc.) we wrote into the code that the browser engine can execute. In Webpack, Babel works on the same principle as “the-super-tiny-Compiler”. By learning the the-super-tiny-Compiler, Understand AST and compilation principles. If we enter the following code block
const input = '(add 2 (subtract 4 2))'
Copy the code
The expected conversion output is
const output = 'add(2, subtract(4, 2)); '
Copy the code
By studying the “the-super-tiny-compiler”, you will know which steps the input can take to output the output
Lexical analysis tokenizer
{type: ‘type’, value: ‘value’}
function tokenizer(input) {
// The current variable is used to record the position of the code, similar to the cursor concept
let current = 0;
// Tokens are the results of recording input input transformations
let tokens = [];
// Start traversing the string input
while (current < input.length) {
// char Records the characters traversed to the current position
let char = input[current];
// Check if it is left parenthesis' (')
if (char === '(') {
// The type of the left parenthesis is indicated by 'paren'
tokens.push({
type: 'paren'.value: '('});/ / pointer + 1
current++;
continue;
}
// The closing parenthesis' (' is handled the same way as the opening parenthesis above
if (char === ') ') {
tokens.push({
type: 'paren'.value: ') '}); current++;continue;
}
// Handle Spaces. For Spaces that do not need to be stored in tokens, move the current position
let WHITESPACE = /\s/;
if (WHITESPACE.test(char)) {
current++;
continue;
}
// Process data
// (add 123 456)
/ / ^ ^ ^ ^ ^ ^
let NUMBERS = / [0-9];
if (NUMBERS.test(char)) {
// Record the number
let value = ' ';
// Check whether the char in the next position is a number
while (NUMBERS.test(char)) {
value += char;
char = input[++current];
}
// Store continuous numbers in tokens. Type is number
tokens.push({ type: 'number', value });
continue;
}
// Process strings
// (concat "foo" "bar")
// ^^^ ^^^ string tokens
// Use double quotation marks to start and end the number
if (char === '"') {
let value = ' ';
char = input[++current];
while(char ! = ='"') {
value += char;
char = input[++current];
}
char = input[++current];
tokens.push({ type: 'string', value });
continue;
}
// The method name is handled in the same way as the above string, except that the above string can be interpreted as a variable and needs to be wrapped in double quotation marks
// Name token
// (add 2 4)
// ^^^
let LETTERS = /[a-z]/i;
if (LETTERS.test(char)) {
let value = ' ';
while (LETTERS.test(char)) {
value += char;
char = input[++current];
}
tokens.push({ type: 'name', value });
continue;
}
// Only parentheses/Spaces/numbers/strings/function names are handled
throw new TypeError('I dont know what this character is: ' + char);
}
return tokens;
}
Copy the code
Tokens = Tokenizer (‘(add 2 (Subtract 4 2))’
tokens = [
{ type: 'paren'.value: '(' },
{ type: 'name'.value: 'add' },
{ type: 'number'.value: '2' },
{ type: 'paren'.value: '(' },
{ type: 'name'.value: 'subtract' },
{ type: 'number'.value: '4' },
{ type: 'number'.value: '2' },
{ type: 'paren'.value: ') ' },
{ type: 'paren'.value: ') '}];Copy the code
Parser
The main function of the Parser is to convert tokens into AST structures
function parser(tokens) {
// current is still the position of the current pointer
let current = 0;
// Walk is a recursive process
function walk() {
let token = tokens[current];
// Handle the token for number
if (token.type === 'number') {
current++;
// The ast uses NumberLiteral to record types of type number
return {
type: 'NumberLiteral'.value: token.value,
};
}
// Handle the token that type is string
if (token.type === 'string') {
current++;
// Type of string in the AST is recorded with StringLiteral
return {
type: 'StringLiteral'.value: token.value,
};
}
// process the expressions, type: 'paren', starting with open parentheses, and continue to search until close parentheses end. Ast is CallExpressions record type, where node has params parameter
if (
token.type === 'paren' &&
token.value === '('
) {
token = tokens[++current];
let node = {
type: 'CallExpression'.name: token.value,
params: [],}; token = tokens[++current];// Find tokens whose type is paren and end with a close parenthesis.
while( (token.type ! = ='paren') ||
(token.type === 'paren'&& token.value ! = =') ')) {// Perform recursive processing on non-end tokens
node.params.push(walk());
token = tokens[current];
}
current++;
return node;
}
// Handle the number/string/paren type exception
throw new TypeError(token.type);
}
// Create the root node of the AST. Type is Program
let ast = {
type: 'Program'.body: [],};// ast.body is a recursive walk of the token
while (current < tokens.length) {
ast.body.push(walk());
}
return ast;
}
Copy the code
The ast data is obtained by executing ast = Parser (token). The result is shown below
const ast = {
type: 'Program'.body: [{
type: 'CallExpression'.name: 'add'.params: [{
type: 'NumberLiteral'.value: '2'
}, {
type: 'CallExpression'.name: 'subtract'.params: [{
type: 'NumberLiteral'.value: '4'
}, {
type: 'NumberLiteral'.value: '2'}}}}]]];Copy the code
traverser
The traversal here is mainly called in Transformer, so understand how it works in advance. Take two parameters ast and visitor, where the structure of parameter AST is shown above, and the roughly structure of visitor is shown below
visitor = {
NumberLiteral: {enter(node, parent) {}},
StringLiteral: {enter(node, parent) {}},
CallExpression: {enter(node, parent){}}},Copy the code
The traverser method essentially traverses the old AST, performing the Enter method passed by the visitor according to the corresponding type
function traverser(ast, visitor) {
// Execute the traverseNode method for array arrays
function traverseArray(array, parent) {
array.forEach(child= > {
traverseNode(child, parent);
});
}
function traverseNode(node, parent) {
// methods are mainly methods to get the type of visitor
let methods = visitor[node.type];
// Execute methods
if (methods && methods.enter) {
methods.enter(node, parent);
}
// traverseArray processes each type of child node in turn
switch (node.type) {
/ / the root node
case 'Program':
traverseArray(node.body, node);
break;
// Expression node
case 'CallExpression':
traverseArray(node.params, node);
break;
// Number/String No child node, no action required
case 'NumberLiteral':
case 'StringLiteral':
break;
default:
throw new TypeError(node.type);
}
if (methods && methods.exit) {
methods.exit(node, parent);
}
}
traverseNode(ast, null);
}
Copy the code
Convert the transformer
Transformer mainly processes the data from the AST into a new AST
function transformer(ast) {
// Create the root node of the new AST
let newAst = {
type: 'Program'.body: [],};// Attach the new AST to the _context of the old AST (this changes the structure of the old AST and causes data pollution). The tricky point here is the following operation node._context, which looks like the operation of the old AST. So newAst's code will change
ast._context = newAst.body;
// Execute the above traverser method
traverser(ast, {
NumberLiteral: {
enter(node, parent) {
// Insert child nodes of type number into parent's _context property
parent._context.push({
type: 'NumberLiteral'.value: node.value, }); }},StringLiteral: {
enter(node, parent) {
// Insert child nodes of type string into parent's _context property
parent._context.push({
type: 'StringLiteral'.value: node.value, }); }},CallExpression: {
enter(node, parent) {
// Add callee to record name/type,
let expression = {
type: 'CallExpression'.callee: {
type: 'Identifier'.name: node.name,
},
arguments: [],}; node._context = expression.arguments;if(parent.type ! = ='CallExpression') {
// Add a type that represents an expression statement
expression = {
type: 'ExpressionStatement'.expression: expression, }; } parent._context.push(expression); }}});return newAst;
}
Copy the code
Which newAst = transformer (ast), and the result is shown below, a change in the structure of CallExpression, much ExpressionStatement/Identifier type at the same time
const newAst = {
type: 'Program'.body: [{
type: 'ExpressionStatement'.expression: {
type: 'CallExpression'.callee: {
type: 'Identifier'.name: 'add'
},
arguments: [{
type: 'NumberLiteral'.value: '2'
}, {
type: 'CallExpression'.callee: {
type: 'Identifier'.name: 'subtract'
},
arguments: [{
type: 'NumberLiteral'.value: '4'
}, {
type: 'NumberLiteral'.value: '2'}]}]}}]};Copy the code
The generated code
The last step is to iterate through the new newAst, recursively processing the types inside, and converting them to the final type result that the machine, etc., can recognize or expect
function codeGenerator(node) {
switch (node.type) {
case 'Program':
return node.body.map(codeGenerator)
.join('\n');
// Add semicolons to expressions
case 'ExpressionStatement':
return (
codeGenerator(node.expression) +
'; '
);
case 'CallExpression':
return (
codeGenerator(node.callee) +
'(' +
node.arguments.map(codeGenerator)
.join(', ') +
') '
);
case 'Identifier':
return node.name;
case 'NumberLiteral':
return node.value;
case 'StringLiteral':
return '"' + node.value + '"';
default:
throw new TypeError(node.type); }}Copy the code
The result is output = add(2, subtract(4, 2)), and the compiler method can be seen in the code
function compiler(input) {
let tokens = tokenizer(input);
let ast = parser(tokens);
let newAst = transformer(ast);
let output = codeGenerator(newAst);
// and simply return the output!
return output;
}
Copy the code
In fact, it is mainly divided into the following four parts to complete the compilation
- Change the input string to tokens according to lexical analysis
- Convert tokens into simple AST structures based on syntax analysis
- The AST structure is modified based on the desired visitor to generate a new newAst that more closely matches the structure of the AST site
- Finally, convert newAst to
Refer to the link
- The super – tiny – a compiler (github.com/jamiebuilds…).
- ast(astexplorer.net/#)
- Compilation principles (www.ayqy.net/blog/the-su…