Preface

I recently gave my team a talk on how Babel works, and I have written it up as a blog post. It runs to 6059 words (including code): about 3 minutes to skim or 5 minutes to read at a normal pace. If you are interested, check out my GitHub blog.

Babel

Let’s look at some code:

[1, 2, 3].map(n => n + 1);

After Babel, the code looks like this:

[1, 2, 3].map(function (n) {
  return n + 1;
});

Behind Babel

Babel works in three stages: parse, transform, generate.

The thing all three stages share is the abstract syntax tree (AST): parsing produces it, transformation rewrites it, and generation turns it back into code.
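To make the transform and generate stages concrete, here is a minimal sketch that works on a hand-written AST for n => n + 1, standing in for the parser's output. The names transformArrow and generate are made up for this illustration and are not Babel's API. The sketch also ignores everything the real arrow-function transform has to care about, such as this.

```javascript
// Hand-written stand-in for the parser's output for: n => n + 1
const arrowAst = {
  type: 'ArrowFunctionExpression',
  params: [{ type: 'Identifier', name: 'n' }],
  body: {
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'Identifier', name: 'n' },
    right: { type: 'Literal', value: 1 },
  },
};

// Transform (hypothetical helper): rewrite an arrow function with an expression
// body into a function expression whose block body returns that expression.
function transformArrow(node) {
  if (node.type !== 'ArrowFunctionExpression') return node;
  return {
    type: 'FunctionExpression',
    params: node.params,
    body: {
      type: 'BlockStatement',
      body: [{ type: 'ReturnStatement', argument: node.body }],
    },
  };
}

// Generate (hypothetical helper): turn the tree back into source text.
// Only the node types used above are supported.
function generate(node) {
  switch (node.type) {
    case 'Identifier':
      return node.name;
    case 'Literal':
      return JSON.stringify(node.value);
    case 'BinaryExpression':
      return `${generate(node.left)} ${node.operator} ${generate(node.right)}`;
    case 'ReturnStatement':
      return `return ${generate(node.argument)};`;
    case 'BlockStatement':
      return `{ ${node.body.map(generate).join(' ')} }`;
    case 'FunctionExpression':
      return `function (${node.params.map(generate).join(', ')}) ${generate(node.body)}`;
    default:
      throw new Error('Unsupported node type: ' + node.type);
  }
}

const output = generate(transformArrow(arrowAst));
console.log(output); // function (n) { return n + 1; }
```

The transform never touches source text. It only swaps one tree shape for another, which is why the AST sits in the middle of the pipeline.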

AST parsing process

How is a JS statement parsed into an AST? There are two steps: first tokenization (lexical analysis), then syntactic analysis. What do these two steps mean?

  • Tokenization

What is tokenization?

We do the same thing when we read a sentence. Take "The weather today is nice": we cut it into "today", "the weather", and "nice".

Now put a JS parser in the reader's place. Given the statement console.log(1); it sees the tokens console, ., log, (, 1, ), and ;.

So a token is the smallest lexical unit that the JS parser can recognize.
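As data, the token stream for that statement might look like the sketch below. The type names are an assumption, chosen to match the tokenizer later in this post; real parsers use their own labels.

```javascript
const source = 'console.log(1);';

// Hypothetical token records: one object per smallest lexical unit.
const tokens = [
  { type: 'identifier', value: 'console' },
  { type: 'Punctuator', value: '.' },
  { type: 'identifier', value: 'log' },
  { type: 'Punctuator', value: '(' },
  { type: 'number', value: '1' },
  { type: 'Punctuator', value: ')' },
  { type: 'Punctuator', value: ';' },
];

// No characters are lost: concatenating the token values gives back the source.
const rebuilt = tokens.map((token) => token.value).join('');
console.log(rebuilt === source); // true
```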

A tokenizer like this is easy to implement ourselves.

// Takes the source string and checks it one character at a time. The tokens are
// stored in an array; strings, numbers, and identifiers get special handling
// because they span several characters.
function tokenCode(code) {
    const tokens = [];
    // Loop over the string
    for (let i = 0; i < code.length; i++) {
        let currentChar = code.charAt(i);

        // Semicolons, brackets, braces, dots, commas, and the equals sign
        if (';(){}.,='.includes(currentChar)) {
            // A single-character token is added directly to the result
            tokens.push({ type: 'Punctuator', value: currentChar });
            continue;
        }

        // Operators
        if ('><+-'.includes(currentChar)) {
            // Same as the previous step except for the token type
            tokens.push({ type: 'operator', value: currentChar });
            continue;
        }

        // Double or single quote
        if (currentChar === '"' || currentChar === '\'') {
            // A quotation mark indicates the beginning of a string
            const token = { type: 'string', value: currentChar };
            tokens.push(token);

            const closer = currentChar;

            // Nested loop to find the end of the string
            for (i++; i < code.length; i++) {
                currentChar = code.charAt(i);
                // Every character up to and including the closing quote belongs to the string
                token.value += currentChar;
                if (currentChar === closer) {
                    break;
                }
            }
            continue;
        }

        if (/[0-9]/.test(currentChar)) {
            // Numbers start with a character from 0 to 9
            const token = { type: 'number', value: currentChar };
            tokens.push(token);

            for (i++; i < code.length; i++) {
                currentChar = code.charAt(i);
                if (/[0-9.]/.test(currentChar)) {
                    // Still part of the number (0 through 9 or a decimal point).
                    // Multiple decimal points and other bases are not considered.
                    token.value += currentChar;
                } else {
                    // Not part of the number: step back so the outer loop parses it
                    i--;
                    break;
                }
            }
            continue;
        }

        if (/[a-zA-Z$_]/.test(currentChar)) {
            // Identifiers start with a letter, $, or _
            const token = { type: 'identifier', value: currentChar };
            tokens.push(token);

            // Same approach as for numbers
            for (i++; i < code.length; i++) {
                currentChar = code.charAt(i);
                if (/[a-zA-Z0-9$_]/.test(currentChar)) {
                    token.value += currentChar;
                } else {
                    i--;
                    break;
                }
            }
            continue;
        }

        if (/\s/.test(currentChar)) {
            // Whitespace separates tokens but is not emitted,
            // because the parser does not expect whitespace tokens
            continue;
        }

        throw new Error('Unexpected ' + currentChar);
    }
    return tokens;
}
  • Syntactic analysis

This step, which builds a tree out of the tokens, is harder. Why?

Because, unlike tokenization, it has no single standard recipe to follow; some of it you have to work out for yourself.

In fact, this analysis deals with two kinds of construct: statements and expressions.

What is a statement? What is an expression?

Expressions are things like a > b or a + b. They can be nested inside one another, and they can also appear inside statements.

Something like var a = 1, b = 2, c = 3; is a statement. A statement plays the role that a sentence plays in a natural language such as Chinese.

Of course, one might ask: what about console.log(1);?

This is an expression statement: a single expression standing on its own as a statement. You can view it either way, because here the expression becomes the statement.
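The distinction is easier to see in the tree itself. The sketch below writes out by hand the ESTree-style nodes for console.log(1); and var a = 1;. The isStatement helper is hypothetical and relies on a rough naming heuristic (an assumption for this sketch), not on anything a real parser provides.

```javascript
// AST for: console.log(1);
// The call is an expression; the ExpressionStatement wrapper makes it a statement.
const expressionStatement = {
  type: 'ExpressionStatement',
  expression: {
    type: 'CallExpression',
    callee: {
      type: 'MemberExpression',
      object: { type: 'Identifier', name: 'console' },
      property: { type: 'Identifier', name: 'log' },
      computed: false,
    },
    arguments: [{ type: 'Literal', value: 1 }],
  },
};

// AST for: var a = 1;
const declaration = {
  type: 'VariableDeclaration',
  kind: 'var',
  declarations: [{
    type: 'VariableDeclarator',
    id: { type: 'Identifier', name: 'a' },
    init: { type: 'Literal', value: 1 },
  }],
};

// Hypothetical helper, rough heuristic only: treat node types ending in
// "Statement" or "Declaration" as statements.
function isStatement(node) {
  return /(Statement|Declaration)$/.test(node.type);
}

console.log(isStatement(expressionStatement));            // true
console.log(isStatement(expressionStatement.expression)); // false
console.log(isStatement(declaration));                    // true
```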

With that settled, we can try writing a simple parser that handles the var declaration statement and the more complex if block.

For the shape of the generated AST, refer to this site; you can also try out AST syntax on this site.

// One function parses statements and another parses expressions; an expression
// followed by a semicolon becomes an expression statement. Parsing sometimes has
// to look ahead and then back up, so the read position is saved on a stack.

function parse (tokens) {
  // Stack of saved positions, used whenever we need to return to an earlier one
  const stashStack = [];
  let i = -1;     // Index of the current token
  let curToken;   // The current token

  // Save the current position
  function stash () {
    stashStack.push(i);
  }

  // Move the read pointer forward by one token
  function nextToken () {
    i++;
    curToken = tokens[i] || { type: 'EOF' };
  }

  // Parsing failed: return to the last saved position
  function parseFalse () {
    i = stashStack.pop();
    curToken = tokens[i];
  }

  // Parsing succeeded: discard the saved position
  function parseSuccess () {
    stashStack.pop();
  }

  // Whether the current token is the given punctuator
  function isPunctuator (value) {
    return curToken.type === 'Punctuator' && curToken.value === value;
  }

  // Value of a number or string token, computed without eval
  function literalValue (token) {
    if (token.type === 'number') return Number(token.value);
    if (token.type === 'string') return token.value.slice(1, -1);
    throw new Error('Expected a literal');
  }

  const ast = {
    type: 'Program',
    body: [],
    sourceType: 'script',
  };

  // Read the next statement
  function nextStatement () {
    // Save the current position and return to it if no statement keyword matches
    stash();

    // Read the next token
    nextToken();

    if (curToken.type === 'identifier' && curToken.value === 'if') {
      // Parse the if statement
      const statement = { type: 'IfStatement' };

      // if must be followed by (
      nextToken();
      if (!isPunctuator('(')) {
        throw new Error('Expected ( after if');
      }

      // The following expression is the condition of the if
      statement.test = nextExpression();

      // It must be followed by )
      nextToken();
      if (!isPunctuator(')')) {
        throw new Error('Expected ) after if test expression');
      }

      // The next statement is executed when the condition is true
      statement.consequent = nextStatement();

      // Peek at the next token: else introduces the statement to run otherwise.
      // ESTree names this property `alternate`.
      stash();
      nextToken();
      if (curToken.type === 'identifier' && curToken.value === 'else') {
        parseSuccess();
        statement.alternate = nextStatement();
      } else {
        parseFalse();
        statement.alternate = null;
      }
      parseSuccess();
      return statement;
    }

    if (curToken.type === 'identifier' && curToken.value === 'var') {
      // Parse the var declaration
      const variable = {
        type: 'VariableDeclaration',
        declarations: [],
        kind: curToken.value,
      };
      nextToken();
      // var must be followed by at least one name
      if (isPunctuator(';')) {
        throw new Error('Expected a variable name after var');
      }
      // Read declarators until the closing semicolon; commas are skipped
      while (i < tokens.length && !isPunctuator(';')) {
        if (curToken.type === 'identifier') {
          variable.declarations.push({
            type: 'VariableDeclarator',
            id: { type: 'Identifier', name: curToken.value },
            init: null,
          });
        }
        if (isPunctuator('=')) {
          // Only literal initializers are supported
          nextToken();
          const last = variable.declarations[variable.declarations.length - 1];
          last.init = { type: 'Literal', value: literalValue(curToken) };
        }
        nextToken();
      }
      parseSuccess();
      return variable;
    }

    if (isPunctuator('{')) {
      // { starts a code block
      const statement = { type: 'BlockStatement', body: [] };
      while (i < tokens.length) {
        // Peek at the next token
        stash();
        nextToken();
        if (isPunctuator('}')) {
          // } ends the code block
          parseSuccess();
          break;
        }
        // Go back and add the next parsed statement to the body
        parseFalse();
        const child = nextStatement();
        if (!child) {
          throw new Error('Expected } at end of block');
        }
        statement.body.push(child);
      }
      // The block statement is parsed; return the result
      parseSuccess();
      return statement;
    }

    // No statement keyword was found. Return to the beginning of the statement
    parseFalse();

    // Try to parse an expression statement
    const statement = {
      type: 'ExpressionStatement',
      expression: nextExpression(),
    };
    if (statement.expression) {
      // Consume the token after the expression, assumed to be the semicolon
      nextToken();
      return statement;
    }
  }

  // Read the next expression
  function nextExpression () {
    nextToken();

    // Literal expression
    if (curToken.type === 'number' || curToken.type === 'string') {
      const literal = { type: 'Literal', value: literalValue(curToken) };
      // If the next token is an operator, the literal is the left side of a binary
      // expression. Operator precedence and variables are not considered.
      stash();
      nextToken();
      if (curToken.type === 'operator') {
        parseSuccess();
        return {
          type: 'BinaryExpression',
          operator: curToken.value,
          left: literal,
          right: nextExpression(),
        };
      }
      parseFalse();
      return literal;
    }

    if (curToken.type !== 'EOF') {
      throw new Error('Unexpected token ' + curToken.value);
    }
  }

  // Parse the top-level statements one at a time
  while (i < tokens.length) {
    const statement = nextStatement();
    if (!statement) {
      break;
    }
    ast.body.push(statement);
  }
  return ast;
}

As for transformation and generation, I am still studying them. Generation is essentially the parsing process in reverse. Transformation deserves a closer look, because ASTs are used in many places, such as:

  • ESLint, which checks code for errors and style problems to find potential bugs
  • IDEs, for error reporting, formatting, highlighting, autocomplete, and so on
  • UglifyJS, which minifies code
  • webpack, the code bundling tool
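Most of these tools boil down to walking the tree and reacting to certain node types. As a rough illustration of the linting case, the sketch below reports every var declaration in a hand-written AST. The traverse, literal, varDeclaration, and findVarDeclarations functions are made-up names for this example, not ESLint's actual rule API.

```javascript
// Visit every node (any object with a string `type`) in a tree of plain objects.
function traverse(node, visit) {
  if (Array.isArray(node)) {
    node.forEach((child) => traverse(child, visit));
    return;
  }
  if (node === null || typeof node !== 'object') return;
  if (typeof node.type === 'string') visit(node);
  Object.values(node).forEach((child) => traverse(child, visit));
}

// Small builders to keep the hand-written AST short (hypothetical helpers).
function literal(value) {
  return { type: 'Literal', value };
}
function varDeclaration(name, value) {
  return {
    type: 'VariableDeclaration',
    kind: 'var',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name },
      init: literal(value),
    }],
  };
}

// Hand-written AST for: var a = 1; if (1 > 0) { var b = 2; }
const program = {
  type: 'Program',
  body: [
    varDeclaration('a', 1),
    {
      type: 'IfStatement',
      test: { type: 'BinaryExpression', operator: '>', left: literal(1), right: literal(0) },
      consequent: { type: 'BlockStatement', body: [varDeclaration('b', 2)] },
      alternate: null,
    },
  ],
};

// Toy lint rule: collect the names declared with `var`.
function findVarDeclarations(ast) {
  const names = [];
  traverse(ast, (node) => {
    if (node.type === 'VariableDeclaration' && node.kind === 'var') {
      node.declarations.forEach((declarator) => names.push(declarator.id.name));
    }
  });
  return names;
}

console.log(findVarDeclarations(program)); // [ 'a', 'b' ]
```

A minifier or a bundler follows the same pattern, except that the visitor rewrites nodes or records dependencies instead of collecting names.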

That is the end of this article. It does not matter if you did not follow all of the code; getting the overall idea is enough.