Compilation principle

It is recommended to learn the-super-tiny-Compiler before learning Webpack, mainly because its code is relatively short and easy to understand. What Webpack mainly does is to convert the relevant syntax sugar (JSX/ ES6 etc.) we wrote into the code that the browser engine can execute. In Webpack, Babel works on the same principle as “the-super-tiny-Compiler”. By learning the the-super-tiny-Compiler, Understand AST and compilation principles. If we enter the following code block

    const input = '(add 2 (subtract 4 2))'
Copy the code

The expected conversion output is

   const output = 'add(2, subtract(4, 2)); '
Copy the code

By studying the “the-super-tiny-compiler”, you will know which steps the input can take to output the output

Lexical analysis tokenizer

{type: ‘type’, value: ‘value’}

function tokenizer(input) {

  // The current variable is used to record the position of the code, similar to the cursor concept
  let current = 0;

  // Tokens are the results of recording input input transformations
  let tokens = [];
  
  // Start traversing the string input
  while (current < input.length) {
    // char Records the characters traversed to the current position
    let char = input[current];
    // Check if it is left parenthesis' (')
    if (char === '(') {
      // The type of the left parenthesis is indicated by 'paren'
      tokens.push({
        type: 'paren'.value: '('});/ / pointer + 1
      current++;
      continue;
    }

    // The closing parenthesis' (' is handled the same way as the opening parenthesis above
    if (char === ') ') {
      tokens.push({
        type: 'paren'.value: ') '}); current++;continue;
    }

    // Handle Spaces. For Spaces that do not need to be stored in tokens, move the current position
    let WHITESPACE = /\s/;
    if (WHITESPACE.test(char)) {
      current++;
      continue;
    }

    // Process data
    // (add 123 456)
    / / ^ ^ ^ ^ ^ ^
  
    let NUMBERS = / [0-9];
    if (NUMBERS.test(char)) {
      // Record the number
      let value = ' ';
      // Check whether the char in the next position is a number
      while (NUMBERS.test(char)) {
        value += char;
        char = input[++current];
      }
      // Store continuous numbers in tokens. Type is number
      tokens.push({ type: 'number', value });
      continue;
    }

    // Process strings
    // (concat "foo" "bar")
    // ^^^ ^^^ string tokens
    // Use double quotation marks to start and end the number
  
    if (char === '"') {
      let value = ' ';
      char = input[++current];
      while(char ! = ='"') {
        value += char;
        char = input[++current];
      }
      char = input[++current];
      tokens.push({ type: 'string', value });
      continue;
    }

    // The method name is handled in the same way as the above string, except that the above string can be interpreted as a variable and needs to be wrapped in double quotation marks
    // Name token
    // (add 2 4)
    //    ^^^

    let LETTERS = /[a-z]/i;
    if (LETTERS.test(char)) {
      let value = ' ';
      while (LETTERS.test(char)) {
        value += char;
        char = input[++current];
      }
      tokens.push({ type: 'name', value });

      continue;
    }

    // Only parentheses/Spaces/numbers/strings/function names are handled
    throw new TypeError('I dont know what this character is: ' + char);
  }

  return tokens;
}
Copy the code

Tokens = Tokenizer (‘(add 2 (Subtract 4 2))’

tokens = [
    { type: 'paren'.value: '('        },
    { type: 'name'.value: 'add'      },
    { type: 'number'.value: '2'        },
    { type: 'paren'.value: '('        },
    { type: 'name'.value: 'subtract' },
    { type: 'number'.value: '4'        },
    { type: 'number'.value: '2'        },
    { type: 'paren'.value: ') '        },
    { type: 'paren'.value: ') '}];Copy the code

Parser

The main function of the Parser is to convert tokens into AST structures

function parser(tokens) {

  // current is still the position of the current pointer
  let current = 0;
  
  // Walk is a recursive process
  function walk() {
    let token = tokens[current];
    
    // Handle the token for number
    if (token.type === 'number') {
      current++;
      // The ast uses NumberLiteral to record types of type number
      return {
        type: 'NumberLiteral'.value: token.value,
      };
    }

    // Handle the token that type is string
    if (token.type === 'string') {
      current++;
      // Type of string in the AST is recorded with StringLiteral
      return {
        type: 'StringLiteral'.value: token.value,
      };
    }

    // process the expressions, type: 'paren', starting with open parentheses, and continue to search until close parentheses end. Ast is CallExpressions record type, where node has params parameter
    if (
      token.type === 'paren' &&
      token.value === '('
    ) {

      token = tokens[++current];
   
      let node = {
        type: 'CallExpression'.name: token.value,
        params: [],}; token = tokens[++current];// Find tokens whose type is paren and end with a close parenthesis.
      while( (token.type ! = ='paren') ||
        (token.type === 'paren'&& token.value ! = =') ')) {// Perform recursive processing on non-end tokens
        node.params.push(walk());
        token = tokens[current];
      }
      current++;
      return node;
    }

    // Handle the number/string/paren type exception
    throw new TypeError(token.type);
  }

  // Create the root node of the AST. Type is Program
  let ast = {
    type: 'Program'.body: [],};// ast.body is a recursive walk of the token
  while (current < tokens.length) {
    ast.body.push(walk());
  }
  return ast;
}
Copy the code

The ast data is obtained by executing ast = Parser (token). The result is shown below

  const ast = {
    type: 'Program'.body: [{
      type: 'CallExpression'.name: 'add'.params: [{
        type: 'NumberLiteral'.value: '2'
      }, {
        type: 'CallExpression'.name: 'subtract'.params: [{
          type: 'NumberLiteral'.value: '4'
        }, {
          type: 'NumberLiteral'.value: '2'}}}}]]];Copy the code

traverser

The traversal here is mainly called in Transformer, so understand how it works in advance. Take two parameters ast and visitor, where the structure of parameter AST is shown above, and the roughly structure of visitor is shown below

    visitor = {
     NumberLiteral: {enter(node, parent) {}},
     StringLiteral: {enter(node, parent) {}},
     CallExpression: {enter(node, parent){}}},Copy the code

The traverser method essentially traverses the old AST, performing the Enter method passed by the visitor according to the corresponding type

function traverser(ast, visitor) {

  // Execute the traverseNode method for array arrays
  function traverseArray(array, parent) {
    array.forEach(child= > {
      traverseNode(child, parent);
    });
  }

  function traverseNode(node, parent) {

    // methods are mainly methods to get the type of visitor
    let methods = visitor[node.type];

    // Execute methods
    if (methods && methods.enter) {
      methods.enter(node, parent);
    }

  
    // traverseArray processes each type of child node in turn
    switch (node.type) {
      / / the root node
      case 'Program':
        traverseArray(node.body, node);
        break;
      // Expression node
      case 'CallExpression':
        traverseArray(node.params, node);
        break;
      // Number/String No child node, no action required
      case 'NumberLiteral':
      case 'StringLiteral':
        break;
      default:
        throw new TypeError(node.type);
    }

    if (methods && methods.exit) {
      methods.exit(node, parent);
    }
  }
  traverseNode(ast, null);
}
Copy the code

Convert the transformer

Transformer mainly processes the data from the AST into a new AST

function transformer(ast) {

  // Create the root node of the new AST
  let newAst = {
    type: 'Program'.body: [],};// Attach the new AST to the _context of the old AST (this changes the structure of the old AST and causes data pollution). The tricky point here is the following operation node._context, which looks like the operation of the old AST. So newAst's code will change
  ast._context = newAst.body;

  // Execute the above traverser method
  traverser(ast, {
  
    NumberLiteral: {
      enter(node, parent) {
        // Insert child nodes of type number into parent's _context property
        parent._context.push({
          type: 'NumberLiteral'.value: node.value, }); }},StringLiteral: {
      enter(node, parent) {
        // Insert child nodes of type string into parent's _context property
        parent._context.push({
          type: 'StringLiteral'.value: node.value, }); }},CallExpression: {
      enter(node, parent) {

        // Add callee to record name/type,
        let expression = {
          type: 'CallExpression'.callee: {
            type: 'Identifier'.name: node.name,
          },
          arguments: [],}; node._context = expression.arguments;if(parent.type ! = ='CallExpression') {
          // Add a type that represents an expression statement
          expression = {
            type: 'ExpressionStatement'.expression: expression, }; } parent._context.push(expression); }}});return newAst;
}
Copy the code

Which newAst = transformer (ast), and the result is shown below, a change in the structure of CallExpression, much ExpressionStatement/Identifier type at the same time

 const newAst = {
    type: 'Program'.body: [{
      type: 'ExpressionStatement'.expression: {
        type: 'CallExpression'.callee: {
          type: 'Identifier'.name: 'add'
        },
        arguments: [{
          type: 'NumberLiteral'.value: '2'
        }, {
          type: 'CallExpression'.callee: {
            type: 'Identifier'.name: 'subtract'
          },
          arguments: [{
            type: 'NumberLiteral'.value: '4'
          }, {
            type: 'NumberLiteral'.value: '2'}]}]}}]};Copy the code

The generated code

The last step is to iterate through the new newAst, recursively processing the types inside, and converting them to the final type result that the machine, etc., can recognize or expect

function codeGenerator(node) {

  switch (node.type) {

    case 'Program':
      return node.body.map(codeGenerator)
        .join('\n');

    // Add semicolons to expressions
    case 'ExpressionStatement':
      return (
        codeGenerator(node.expression) +
        '; '
      );
    
    case 'CallExpression':
      return (
        codeGenerator(node.callee) +
        '(' +
        node.arguments.map(codeGenerator)
          .join(', ') +
        ') '
      );
 
    case 'Identifier':
      return node.name;

    case 'NumberLiteral':
      return node.value;

    case 'StringLiteral':
      return '"' + node.value + '"';

    default:
      throw new TypeError(node.type); }}Copy the code

The result is output = add(2, subtract(4, 2)), and the compiler method can be seen in the code

function compiler(input) {
  let tokens = tokenizer(input);
  let ast    = parser(tokens);
  let newAst = transformer(ast);
  let output = codeGenerator(newAst);

  // and simply return the output!
  return output;
}
Copy the code

In fact, it is mainly divided into the following four parts to complete the compilation

  1. Change the input string to tokens according to lexical analysis
  2. Convert tokens into simple AST structures based on syntax analysis
  3. The AST structure is modified based on the desired visitor to generate a new newAst that more closely matches the structure of the AST site
  4. Finally, convert newAst to

Refer to the link

  1. The super – tiny – a compiler (github.com/jamiebuilds…).
  2. ast(astexplorer.net/#)
  3. Compilation principles (www.ayqy.net/blog/the-su…