Preface
I recently gave my team a talk on how Babel works, and I have written it up as a blog post. It runs to 6059 words (including code): about 3 minutes to skim, or 5 minutes to read at a normal pace. If you are interested, check out my GitHub blog.
Babel
Let’s look at some code:
[1, 2, 3].map(n => n + 1);
After Babel, the code looks like this:
[1, 2, 3].map(function (n) {
  return n + 1;
});
Behind Babel
Babel works in three stages: parse, transform, generate.
Sitting in the middle of these stages is one more thing: the abstract syntax tree (AST).
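To make the three stages a little more concrete, here is a minimal sketch of the transform and generate steps for the arrow function from the example above. It is not how Babel is implemented. The AST is written by hand to stand in for the parse step, the node names follow the ESTree convention, and `transform` and `generate` are hypothetical helpers that only cover the few node types needed here. It also ignores real differences between arrow functions and ordinary functions, such as how `this` is bound.

```javascript
// Hand-written AST for `n => n + 1`, standing in for the output of parsing.
const arrowAst = {
  type: 'ArrowFunctionExpression',
  params: [{ type: 'Identifier', name: 'n' }],
  body: {
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'Identifier', name: 'n' },
    right: { type: 'Literal', value: 1 },
  },
};

// Transform: rewrite an arrow function into a function expression whose
// body returns the original expression.
function transform(node) {
  if (node.type !== 'ArrowFunctionExpression') return node;
  return {
    type: 'FunctionExpression',
    params: node.params,
    body: {
      type: 'BlockStatement',
      body: [{ type: 'ReturnStatement', argument: node.body }],
    },
  };
}

// Generate: turn the handful of node types used here back into source text.
function generate(node) {
  switch (node.type) {
    case 'Identifier':
      return node.name;
    case 'Literal':
      return String(node.value);
    case 'BinaryExpression':
      return `${generate(node.left)} ${node.operator} ${generate(node.right)}`;
    case 'ReturnStatement':
      return `return ${generate(node.argument)};`;
    case 'BlockStatement':
      return `{ ${node.body.map(generate).join(' ')} }`;
    case 'FunctionExpression':
      return `function (${node.params.map(generate).join(', ')}) ${generate(node.body)}`;
    default:
      throw new Error('Unsupported node type: ' + node.type);
  }
}

const output = generate(transform(arrowAst));
console.log(output); // function (n) { return n + 1; }
```

The point of the sketch is only that the transform step never touches source text: it swaps one tree shape for another, and the generator prints whatever tree it is given.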
AST parsing process
How does a JS statement get parsed into an AST? There are two steps along the way: the first is tokenization (lexical analysis), and the second is semantic analysis (more commonly called syntactic analysis). How should we understand these two?
- Tokenization
What is tokenization?
When we read a sentence, we split it into words as well. For example, “The weather is nice today” gets cut into “today,” “the weather,” and “nice.”
Applied to a JS parser, take the statement console.log(1);. The parser sees it as console, ., log, (, 1, ), and ;.
So a token is the smallest lexical unit that the JS parser can recognize.
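As a quick illustration of that split, a single regular expression is enough to cut the example statement into the same pieces. This is only a sketch: `TOKEN_PATTERN` and `splitIntoTokens` are names made up for this example, and the pattern knows nothing about strings, decimals, or multi-character operators.

```javascript
// One alternative per kind of token: identifiers first, then integers, then
// any other single non-whitespace character (treated as punctuation).
const TOKEN_PATTERN = /[A-Za-z_$][A-Za-z0-9_$]*|[0-9]+|\S/g;

function splitIntoTokens(code) {
  return code.match(TOKEN_PATTERN) || [];
}

const pieces = splitIntoTokens('console.log(1);');
console.log(pieces); // [ 'console', '.', 'log', '(', '1', ')', ';' ]
```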
Of course, we can easily implement such a tokenizer ourselves.
// Takes the source as a string and checks it one character at a time. The tokens are stored in an array, with special handling for identifiers and numbers
function tokenCode(code) {
  const tokens = [];
  // Loop over the string
  for (let i = 0; i < code.length; i++) {
    let currentChar = code.charAt(i);
    // Semicolons, parentheses, braces, and other single-character punctuation
    if (currentChar === ';' || currentChar === '(' || currentChar === ')' || currentChar === '}' || currentChar === '{' || currentChar === '.' || currentChar === '=') {
      // A single-character syntax unit is added to the result directly
      tokens.push({
        type: 'Punctuator',
        value: currentChar,
      });
      continue;
    }
    // Operators
    if (currentChar === '>' || currentChar === '<' || currentChar === '+' || currentChar === '-') {
      // Same as the previous step, except for the syntax unit type
      tokens.push({
        type: 'operator',
        value: currentChar,
      });
      continue;
    }
    // Double or single quotes
    if (currentChar === '"' || currentChar === '\'') {
      // A quotation mark indicates the beginning of a string
      const token = {
        type: 'string',
        value: currentChar, // Record the current contents of the syntax unit
      };
      tokens.push(token);
      const closer = currentChar;
      // Use a nested loop to find the end of the string
      for (i++; i < code.length; i++) {
        currentChar = code.charAt(i);
        // Add the current character unconditionally to the contents of the string
        token.value += currentChar;
        if (currentChar === closer) {
          break;
        }
      }
      continue;
    }
    if (/[0-9]/.test(currentChar)) {
      // Numbers start with a character from 0 to 9
      const token = {
        type: 'number',
        value: currentChar,
      };
      tokens.push(token);
      for (i++; i < code.length; i++) {
        currentChar = code.charAt(i);
        if (/[0-9.]/.test(currentChar)) {
          // The character is still part of the number (0 through 9 or a decimal point)
          // Multiple decimal points and other bases are not considered here
          token.value += currentChar;
        } else {
          // Stop at the first character that is not part of the number.
          // That character still has to be processed, so step back one position
          i--;
          break;
        }
      }
      continue;
    }
    if (/[a-zA-Z$_]/.test(currentChar)) {
      // Identifiers start with a letter, $, or _
      const token = {
        type: 'identifier',
        value: currentChar,
      };
      tokens.push(token);
      // Same approach as for numbers
      for (i++; i < code.length; i++) {
        currentChar = code.charAt(i);
        if (/[a-zA-Z0-9$_]/.test(currentChar)) {
          token.value += currentChar;
        } else {
          i--;
          break;
        }
      }
      continue;
    }
    if (/\s/.test(currentChar)) {
      // Consecutive whitespace characters are grouped together.
      // The token is never pushed to the result, so whitespace is dropped
      const token = {
        type: 'whitespace',
        value: currentChar,
      };
      // Same approach as for numbers
      for (i++; i < code.length; i++) {
        currentChar = code.charAt(i);
        if (/\s/.test(currentChar)) {
          token.value += currentChar;
        } else {
          i--;
          break;
        }
      }
      continue;
    }
    throw new Error('Unexpected ' + currentChar);
  }
  return tokens;
}
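The number, identifier, and whitespace branches of the tokenizer all share one shape: keep consuming characters while they match, then stop at the first one that does not. The tokenizer handles the stop by stepping back with `i--` so the outer loop picks that character up again. One alternative, sketched here with a hypothetical `readWhile` helper that is not part of the original code, is to return the index of the first unconsumed character instead.

```javascript
// Hypothetical helper: starting at `start`, collect characters for as long
// as `accept` returns true. Returns the collected text and the index of the
// first character that was NOT consumed.
function readWhile(code, start, accept) {
  let end = start;
  while (end < code.length && accept(code.charAt(end))) {
    end++;
  }
  return { text: code.slice(start, end), next: end };
}

const isNumberPart = (ch) => /[0-9.]/.test(ch);
const isIdentifierPart = (ch) => /[a-zA-Z0-9$_]/.test(ch);

const numberRun = readWhile('3.14+x', 0, isNumberPart);
console.log(numberRun); // { text: '3.14', next: 4 }

const nameRun = readWhile('count1 = 2', 0, isIdentifierPart);
console.log(nameRun); // { text: 'count1', next: 6 }
```

Either way the idea is the same: the character that ends a token belongs to the next token, so it must not be thrown away.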
- Semantic analysis
Semantic analysis is harder. Why?
Because, unlike tokenization, there is no standard recipe to follow; some of it you have to work out for yourself.
Semantic analysis actually splits into two parts: statements and expressions.
What is a statement? What is an expression?
Expressions are things like a > b or a + b. They can be nested, and they can also be used inside statements.
Something like var a = 1, b = 2, c = 3; is what we call a statement. It is similar to a sentence in Chinese.
Of course, one might ask: what is console.log(1); then?
This case is classified as an expression statement. You can view it either as an expression or as a statement: it is an expression that has become a statement.
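The difference is easiest to see in the tree itself. Below are hand-written ESTree-style nodes (the format tools such as Esprima and Babel are based on) for the examples above. The variable names are only for illustration.

```javascript
// `a > b` on its own is an expression node...
const comparison = {
  type: 'BinaryExpression',
  operator: '>',
  left: { type: 'Identifier', name: 'a' },
  right: { type: 'Identifier', name: 'b' },
};

// ...and `a > b;` is that same expression wrapped in a statement node.
const comparisonStatement = {
  type: 'ExpressionStatement',
  expression: comparison,
};

// `console.log(1);` follows the same pattern: a call expression wrapped in
// an expression statement.
const logStatement = {
  type: 'ExpressionStatement',
  expression: {
    type: 'CallExpression',
    callee: {
      type: 'MemberExpression',
      object: { type: 'Identifier', name: 'console' },
      property: { type: 'Identifier', name: 'log' },
      computed: false,
    },
    arguments: [{ type: 'Literal', value: 1 }],
  },
};

console.log(comparisonStatement.expression.type); // BinaryExpression
console.log(logStatement.expression.type); // CallExpression
```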
With that covered, we can try writing a simple parser for statements here, such as the var declaration statement or the more complex if block.
For the shape of the generated AST, you can refer to this site, and some AST syntax can be tried out on this site.
// Define a method that parses expressions and a method that parses statements, including single-expression statements. The whole process is split into steps; what matters most is keeping control of the read pointer.
function parse (tokens) {
  // Stack of saved positions, used whenever we may need to return to an earlier position
  const stashStack = [];
  let i = -1; // The current position in the token list
  let curToken; // The current token
  // Save the current position
  function stash () {
    stashStack.push(i);
  }
  // Move the read pointer forward
  function nextToken () {
    i++;
    curToken = tokens[i] || { type: 'EOF' };
  }
  function parseFalse () {
    // Parsing failed: return to the most recently saved position
    i = stashStack.pop();
    curToken = tokens[i];
  }
  function parseSuccess () {
    // Parsing succeeded: discard the saved position, no return is required
    stashStack.pop();
  }
  const ast = {
    type: 'Program',
    body: [],
    sourceType: 'script',
  };
  // Read the next statement
  function nextStatement () {
    // Save the current i so we can return to it if no statement keyword matches
    stash();
    // Read the next token
    nextToken();
    if (curToken.type === 'identifier' && curToken.value === 'if') {
      // Parse the if statement
      const statement = {
        type: 'IfStatement',
      };
      // if must be followed by (
      nextToken();
      if (curToken.type !== 'Punctuator' || curToken.value !== '(') {
        throw new Error('Expected ( after if');
      }
      // The following expression is the condition of the if
      statement.test = nextExpression();
      // It must be followed by )
      nextToken();
      if (curToken.type !== 'Punctuator' || curToken.value !== ')') {
        throw new Error('Expected ) after if test expression');
      }
      // The next statement is executed when the condition is true
      statement.consequent = nextStatement();
      // Peek at the next token: else means there is a branch for when the condition fails
      stash();
      nextToken();
      if (curToken.type === 'identifier' && curToken.value === 'else') {
        parseSuccess();
        statement.alternate = nextStatement();
      } else {
        // Not an else, so put the token back
        parseFalse();
        statement.alternate = null;
      }
      parseSuccess();
      return statement;
    }
    // A block wrapped in curly braces
    if (curToken.type === 'Punctuator' && curToken.value === '{') {
      // A leading { indicates a code block
      const statement = {
        type: 'BlockStatement',
        body: [],
      };
      while (i < tokens.length) {
        // Check whether the next token is }
        stash();
        nextToken();
        if (curToken.type === 'Punctuator' && curToken.value === '}') {
          // } indicates the end of the code block
          parseSuccess();
          break;
        }
        // Otherwise return to the saved position and add the next parsed statement to the body
        parseFalse();
        statement.body.push(nextStatement());
      }
      // The block statement has been parsed, so return the result
      parseSuccess();
      return statement;
    }
    // No special statement keyword was found, so return to the beginning of the statement
    parseFalse();
    // Try to parse a single expression statement
    const statement = {
      type: 'ExpressionStatement',
      expression: nextExpression(),
    };
    if (statement.expression) {
      // Step past the token that ends the statement (normally ;)
      nextToken();
      return statement;
    }
  }
  // Read the next expression
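  function nextExpression () {
    nextToken();
    if (curToken.type === 'identifier' && curToken.value === 'var') {
      // A var declaration
      const variable = {
        type: 'VariableDeclaration',
        declarations: [],
        kind: curToken.value,
      };
      stash();
      nextToken();
      // A semicolon right after var means nothing was declared, which is an error
      if (curToken.type === 'Punctuator' && curToken.value === ';') {
        parseSuccess();
        throw new Error('Expected a variable name after var');
      } else {
        // Loop until the end of the declaration
        while (i < tokens.length) {
          if (curToken.type === 'identifier') {
            // Each name starts a new declarator
            variable.declarations.push({
              type: 'VariableDeclarator',
              id: { type: 'Identifier', name: curToken.value },
              init: null,
            });
          }
          if (curToken.type === 'Punctuator' && curToken.value === '=') {
            const declarator = variable.declarations[variable.declarations.length - 1];
            if (!declarator) {
              throw new Error('Expected a variable name before =');
            }
            nextToken();
            // Attach the initializer (kept as the raw token text) to the latest declarator
            declarator.init = { type: 'Literal', value: curToken.value };
          }
          // Peek at the next token and stop before ; so that nextStatement consumes it
          stash();
          nextToken();
          if (curToken.type === 'Punctuator' && curToken.value === ';') {
            parseFalse();
            break;
          }
          parseSuccess();
        }
      }
      parseSuccess();
      return variable;
    }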
    // Literal expression
    if (curToken.type === 'number' || curToken.type === 'string') {
      const literal = {
        type: 'Literal',
        // Numbers are converted with Number(); strings lose their surrounding quotes
        value: curToken.type === 'number' ? Number(curToken.value) : curToken.value.slice(1, -1),
      };
      // If the next token is an operator, this is a binary expression
      // Chained operators and variable operands are not considered here
      stash();
      nextToken();
      if (curToken.type === 'operator') {
        parseSuccess();
        return {
          type: 'BinaryExpression',
          operator: curToken.value,
          left: literal,
          right: nextExpression(),
        };
      }
      parseFalse();
      return literal;
    }
    if (curToken.type !== 'EOF') {
      throw new Error('Unexpected token ' + curToken.value);
    }
  }
  // Parse the top-level statements one at a time
  while (i < tokens.length) {
    const statement = nextStatement();
    if (!statement) {
      break;
    }
    ast.body.push(statement);
  }
  return ast;
}
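The part of the parser that is easiest to get lost in is the position stack, so here it is in isolation. This is a small sketch of the same idea, not a piece of the parser above: `createCursor` and its method names are made up for this example, with `rewind` playing the role of parseFalse and `commit` the role of parseSuccess.

```javascript
// Hypothetical cursor mirroring the stash / parseFalse / parseSuccess helpers.
function createCursor(tokens) {
  const saved = [];
  let index = -1; // index of the most recently consumed token

  return {
    // Consume and return the next token
    next() {
      index++;
      return tokens[index] || { type: 'EOF' };
    },
    // Remember the current position
    stash() { saved.push(index); },
    // Give up and go back to the last remembered position
    rewind() { index = saved.pop(); },
    // Keep the progress and forget the remembered position
    commit() { saved.pop(); },
    // Look at the next token without consuming it
    peek() {
      this.stash();
      const token = this.next();
      this.rewind();
      return token;
    },
  };
}

const cursor = createCursor([
  { type: 'number', value: '1' },
  { type: 'operator', value: '+' },
  { type: 'number', value: '2' },
]);

const first = cursor.next();  // consumes 1
const peeked = cursor.peek(); // looks at + but stays put
const second = cursor.next(); // consumes +
console.log(first.value, peeked.value, second.value); // 1 + +
```

Every place in the parser that needs to ask "is the next token X?" is a stash followed by either a rewind (no, put it back) or a commit (yes, keep going).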
As for transformation and generation, I am still studying them. Generation is essentially the reverse of the parsing process. Transformation is well worth digging into, because the AST is used in many places, such as:
- ESLint: checks code for errors and style problems to find potential bugs
- IDEs: error reporting, formatting, highlighting, autocomplete, and so on
- UglifyJS: code minification
- webpack: code bundling
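What these tools have in common is that they walk the tree and react to certain node types. As a rough sketch of that idea (not how ESLint itself is implemented), here is a tiny walker plus a made-up rule that collects every name declared with `var`. The AST is written by hand in ESTree style, and `walk`, `findVarNames`, and `declaration` are hypothetical helpers.

```javascript
// Visit every node of an ESTree-style tree, depth first.
function walk(node, visit) {
  if (!node || typeof node.type !== 'string') return;
  visit(node);
  for (const key of Object.keys(node)) {
    const child = node[key];
    if (Array.isArray(child)) {
      child.forEach((item) => walk(item, visit));
    } else if (child && typeof child === 'object') {
      walk(child, visit);
    }
  }
}

// Hypothetical lint rule: collect the names declared with `var`.
function findVarNames(ast) {
  const names = [];
  walk(ast, (node) => {
    if (node.type === 'VariableDeclaration' && node.kind === 'var') {
      node.declarations.forEach((declarator) => names.push(declarator.id.name));
    }
  });
  return names;
}

// Small helper that builds the AST for `<kind> <name> = <value>;`.
function declaration(kind, name, value) {
  return {
    type: 'VariableDeclaration',
    kind,
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name },
      init: { type: 'Literal', value },
    }],
  };
}

// Hand-written AST for: var a = 1; { let b = 2; var c = 3; }
const program = {
  type: 'Program',
  body: [
    declaration('var', 'a', 1),
    {
      type: 'BlockStatement',
      body: [declaration('let', 'b', 2), declaration('var', 'c', 3)],
    },
  ],
};

const varNames = findVarNames(program);
console.log(varNames); // [ 'a', 'c' ]
```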
That is the end of this article. It does not matter if you do not follow all of the code; grasping the overall approach is enough.