Recently I have seen a flood of ES2020 articles on Juejin and in front-end WeChat public accounts. All I can say is: let go of me, I can still keep learning!

Can you really get away from ES6 in daily work?

1. Foreword

For front-end developers, the compiler may be like a magic box 🎁: plain on the surface, but it often gives us surprises.

A compiler, as the name suggests, compiles. Compiles what? Code, of course 🌹.



In fact, we often see the use of compiler scenarios:

  • React JSX converts to JS code;
  • Convert ES6 and above specification code to ES5 code via Babel;
  • Convert Less/Scss code to browser-supported CSS code using various Loaders;
  • Convert TypeScript to JavaScript code;
  • and so on…

There are more scenarios than I can count on my fingers. 😄 While the community offers plenty of tools that handle this for us, it is still important to understand how compilation works. Which brings us to the subject of this article: implementing a code compiler in 200 lines of JS.

2. Compiler introduction

2.1 Program running mode

Modern programs have two main compilation modes: static compilation and dynamic interpretation. The recommended article Angular 2 JIT vs AOT covers this in great detail.

Static compilation

Known as AOT (Ahead-Of-Time) compilation: a statically compiled program uses a specified compiler to translate all of its code into machine code before execution.



(Image from:Segmentfault.com/a/119000000…)



The development process in Angular's AOT-compiled mode is as follows:

  • Develop Angular applications using TypeScript
  • Run NGC to compile the application
    • Templates are compiled using the Angular Compiler and typically output TypeScript code
    • Run TSC to compile TypeScript code
  • Build projects using other tools such as Webpack or Gulp, such as code compression, consolidation, etc
  • The deployment of application

Dynamic interpretation

Known as JIT (Just-In-Time) compilation: a dynamically interpreted program compiles and executes the code at the same time, using a specified interpreter.

(Image from:Segmentfault.com/a/119000000…)



The development process in Angular's JIT-compiled mode is as follows:

  • Develop Angular applications using TypeScript
  • Run TSC to compile TypeScript code
  • Build projects using other tools such as Webpack or Gulp, such as code compression, consolidation, etc
  • The deployment of application

AOT vs JIT

AOT compilation process:(Image from:Segmentfault.com/a/119000000…)

JIT compilation process:(Image from:Segmentfault.com/a/119000000…)

| Feature | AOT | JIT |
| --- | --- | --- |
| Compile platform | Server | Browser |
| Compile timing | Build phase | Runtime |
| Bundle size | Smaller | Larger |
| Performance | Better | Worse |
| Startup time | Shorter | Longer |

In addition, AOT has the following advantages:

  • We don't need to ship the bulky Angular compiler to the client, which reduces the size of our JS bundle
  • AOT-compiled applications no longer contain HTML template fragments; the templates are compiled into generated TypeScript code, so the TypeScript compiler can detect template errors ahead of time. In short, with AOT compilation our templates are type-safe.

2.2 Modern compiler workflow

Excerpt from wikipedia’s description of the compiler workflow:

The main workflow of a modern compiler is as follows: source code → preprocessor → compiler → assembler → object code → linker → executables. The executables are the final files that a computer can read and execute.

The compiler’s role is highlighted here: it takes the original program as input and translates it into its equivalent in the target language.

Most modern compilers today have similar workflows, consisting of three core phases:

  1. Parsing: parse the original code string into an Abstract Syntax Tree (AST) through lexical analysis and syntactic analysis;
  2. Transformation: transform the abstract syntax tree;
  3. Code Generation: generate code strings in the target language from the transformed AST.
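The three phases above can be sketched as a simple pipeline. In the sketch below each phase is a trivial stand-in (JSON.parse as "parsing", a property merge as "transformation", JSON.stringify as "code generation"); the real implementations follow in the next section:

```javascript
// A minimal sketch of the three-phase pipeline; each phase is a stand-in,
// not a real compiler implementation.
const parse = (source) => JSON.parse(source);             // parsing: string -> tree
const transform = (tree) => ({ ...tree, visited: true }); // transformation: tree -> tree
const generate = (tree) => JSON.stringify(tree);          // code generation: tree -> string

// A compiler is just the composition of the three phases
const compile = (source) => generate(transform(parse(source)));

console.log(compile('{"type":"Program"}')); // {"type":"Program","visited":true}
```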

3. Compiler implementation

This article walks through the source code of The Super Tiny Compiler to learn how to implement a lightweight compiler, one that ultimately compiles the following raw code strings (Lisp-style function calls) into executable JavaScript code.

| Expression | Lisp style (before compilation) | JavaScript style (after compilation) |
| --- | --- | --- |
| 2 + 2 | (add 2 2) | add(2, 2) |
| 4 - 2 | (subtract 4 2) | subtract(4, 2) |
| 2 + (4 - 2) | (add 2 (subtract 4 2)) | add(2, subtract(4, 2)) |


Incidentally, The Super Tiny Compiler claims to be "probably the smallest compiler ever built", and its author, James Kyle, is one of Babel's active maintainers.



Let’s get started

3.1 The Super Tiny Compiler workflow

With the three core compiler stages in mind, let's look at The Super Tiny Compiler's core workflow:



The detailed process in the figure is as follows:

  1. Execute the entry function, passing in the original code string as an argument;

```lisp
;; Raw code string
(add 2 (subtract 4 2))
```

  2. Enter the Parsing phase: the original code string is converted into an array of lexical units (tokens) by the Tokenizer, and the Parser then converts the tokens into an Abstract Syntax Tree (AST) and returns it;

  3. Enter the Transformation phase: the AST generated in the previous step is passed into the Transformer, where the Traverser converts it into the new AST we need;

  4. Enter the Code Generation phase: the new AST returned by the previous step is converted into JavaScript code by the CodeGenerator;

  5. When code generation finishes, the JavaScript code is returned.





Keep your head clear and get an idea of the process. Let’s read the code:

3.2 Entry Method

First, we define an entry method compiler that accepts the original Code string as an argument and returns the final JavaScript Code:

```js
// Compiler entry method. Argument: raw code string `input`
function compiler(input) {
  let tokens = tokenizer(input);
  let ast    = parser(tokens);
  let newAst = transformer(ast);
  let output = codeGenerator(newAst);
  return output;
}
```

3.3 Parsing Phase

In the parsing phase, we define the lexical analyzer method tokenizer and the parser method parser, then implement each of them:

```js
// Lexical analyzer. Argument: raw code string `input`
function tokenizer(input) {};

// Parser. Argument: array of lexical units `tokens`
function parser(tokens) {};
```

Lexical analyzer

The lexical analyzer method tokenizer's main task: traverse the raw code string, convert it into an array of lexical units (tokens), and return that array.

During traversal, each character is matched and processed into a lexical unit, which is pushed into the array of lexical units. For example, when the left parenthesis ( is matched, a lexical unit object {type: 'paren', value: '('} is pushed into tokens.



```js
// Lexical analyzer. Argument: raw code string `input`
function tokenizer(input) {
  let current = 0;  // Index of the character currently being parsed, used as a cursor
  let tokens = [];  // Initialize the array of lexical units
  // Loop through the raw code string, filling the array of lexical units
  while (current < input.length) {
    let char = input[current];
    // Match an open parenthesis; on success, push {type: 'paren', value: '('}
    if (char === '(') {
      tokens.push({
        type: 'paren',
        value: '(',
      });
      current++;
      continue; // current has been advanced; end this iteration and start the next
    }
    // Match a close parenthesis; on success, push {type: 'paren', value: ')'}
    if (char === ')') {
      tokens.push({
        type: 'paren',
        value: ')',
      });
      current++;
      continue;
    }

    // Match whitespace characters; on success, skip them
    // \s matches spaces, tabs, form feeds, line feeds, vertical tabs, etc.
    let WHITESPACE = /\s/;
    if (WHITESPACE.test(char)) {
      current++;
      continue;
    }
    // Match numeric characters with [0-9]; on success, push {type: 'number', value: value}
    // For example, in (add 123 456), 123 and 456 are two numeric lexical units
    let NUMBERS = /[0-9]/;
    if (NUMBERS.test(char)) {
      let value = '';
      // Accumulate consecutive digits into one numeric value
      while (NUMBERS.test(char)) {
        value += char;
        char = input[++current];
      }
      tokens.push({ type: 'number', value });
      continue;
    }
    // Match a string wrapped in double quotes; on success, push {type: 'string', value: value}
    // For example, "foo" and "bar" in (concat "foo" "bar") are two string lexical units
    if (char === '"') {
      let value = '';
      char = input[++current]; // Skip the opening double quote
      // Collect all characters between the two double quotes
      while (char !== '"') {
        value += char;
        char = input[++current];
      }
      char = input[++current]; // Skip the closing double quote
      tokens.push({ type: 'string', value });
      continue;
    }
    // Match letters with [a-z] and the i flag (case-insensitive);
    // on success, push {type: 'name', value: value}
    // For example, in (add 2 4), add is a name lexical unit
    let LETTERS = /[a-z]/i;
    if (LETTERS.test(char)) {
      let value = '';
      // Collect consecutive letters
      while (LETTERS.test(char)) {
        value += char;
        char = input[++current];
      }
      tokens.push({ type: 'name', value });
      continue;
    }
    // When an unrecognized character is encountered, throw an error and exit
    throw new TypeError('I dont know what this character is: ' + char);
  }
  // The lexical analyzer finally returns the array of lexical units
  return tokens;
}
```
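To check our understanding of what the tokenizer produces, here is a condensed, regex-driven sketch of the same idea (a hypothetical alternative, not the article's code), applied to the running example:

```javascript
// Condensed tokenizer sketch: a table of regexes replaces the
// character-by-character loop. Illustrative only.
function tinyTokenizer(input) {
  const spec = [
    [/^\s+/, null],       // whitespace: skip
    [/^[()]/, 'paren'],   // parentheses
    [/^[0-9]+/, 'number'],// integer literals
    [/^"[^"]*"/, 'string'],// double-quoted strings
    [/^[a-z]+/i, 'name'], // identifiers / function names
  ];
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    let matched = false;
    for (const [re, type] of spec) {
      const m = re.exec(rest);
      if (m) {
        if (type) {
          // Strip the surrounding quotes from string tokens
          const value = type === 'string' ? m[0].slice(1, -1) : m[0];
          tokens.push({ type, value });
        }
        rest = rest.slice(m[0].length);
        matched = true;
        break;
      }
    }
    if (!matched) throw new TypeError('Unknown character: ' + rest[0]);
  }
  return tokens;
}

console.log(tinyTokenizer('(add 2 (subtract 4 2))')); // 9 tokens in total
```

For `(add 2 (subtract 4 2))` this yields the same nine lexical units as the article's tokenizer: two `paren` "(", the names `add` and `subtract`, the numbers `2`, `4`, `2`, and two `paren` ")".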

Parser

The parser method parser's main task: convert the array of lexical units returned by the lexical analyzer into an intermediate form that describes the grammatical components and their relationships, i.e. an Abstract Syntax Tree (AST).



```js
// Parser. Argument: array of lexical units `tokens`
function parser(tokens) {
  let current = 0; // Index of the lexical unit currently being parsed, used as a cursor
  // Walk recursively (since function calls can be nested), turning lexical units into LISP AST nodes
  function walk() {
    // Get the lexical unit at the current index
    let token = tokens[current];

    // Numeric lexical unit
    if (token.type === 'number') {
      current++; // Advance the cursor
      // Generate an AST node 'NumberLiteral', representing a numeric literal
      return {
        type: 'NumberLiteral',
        value: token.value,
      };
    }

    // String lexical unit
    if (token.type === 'string') {
      current++;
      // Generate an AST node 'StringLiteral', representing a string literal
      return {
        type: 'StringLiteral',
        value: token.value,
      };
    }

    // Function-call lexical unit
    if (token.type === 'paren' && token.value === '(') {
      // Skip the open parenthesis and take the next lexical unit as the function name
      token = tokens[++current];

      let node = {
        type: 'CallExpression',
        name: token.value,
        params: [],
      };

      // Advance current again to reach the parameter lexical units
      token = tokens[++current];

      // Walk each lexical unit to collect function arguments until the close parenthesis ")" appears
      while ((token.type !== 'paren') || (token.type === 'paren' && token.value !== ')')) {
        node.params.push(walk());
        token = tokens[current];
      }

      current++; // Skip the close parenthesis
      return node;
    }
    // Unrecognized lexical unit: throw an error
    throw new TypeError(token.type);
  }

  // Initialize the AST root node
  let ast = {
    type: 'Program',
    body: [],
  };

  // Loop to fill ast.body
  while (current < tokens.length) {
    ast.body.push(walk());
  }

  // Finally return the ast
  return ast;
}
```
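The original article's AST illustration did not survive extraction, so here is the LISP-style AST that the parser produces for the running example `(add 2 (subtract 4 2))`, written out by hand (note that the values are strings, since the tokenizer accumulates characters):

```javascript
// The LISP-style AST for "(add 2 (subtract 4 2))"
const ast = {
  type: 'Program',
  body: [
    {
      type: 'CallExpression',
      name: 'add',
      params: [
        { type: 'NumberLiteral', value: '2' },
        {
          // Nested call: the second argument of add is itself a CallExpression
          type: 'CallExpression',
          name: 'subtract',
          params: [
            { type: 'NumberLiteral', value: '4' },
            { type: 'NumberLiteral', value: '2' },
          ],
        },
      ],
    },
  ],
};

console.log(ast.body[0].name, ast.body[0].params.length); // add 2
```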

3.4 Conversion Phase

In the transformation phase, a transformer function is defined that takes the LISP AST object returned by the parser as its parameter and converts it into a new AST object. To keep the code organized, we also define a traverser method that handles the per-node operations.

```js
// Traverser. Arguments: `ast` and `visitor`
function traverser(ast, visitor) {
  // Define the method traverseArray
  // It traverses an array of AST nodes, calling traverseNode for each element
  function traverseArray(array, parent) {
    array.forEach(child => {
      traverseNode(child, parent);
    });
  }

  // Define the method traverseNode
  // It handles a single AST node, taking the node and its parent as arguments
  function traverseNode(node, parent) {
    // Get the methods object for this node type on the visitor
    let methods = visitor[node.type];
    // Call the visitor's enter method to process the current node
    if (methods && methods.enter) {
      methods.enter(node, parent);
    }

    switch (node.type) {
      // Root node
      case 'Program':
        traverseArray(node.body, node);
        break;
      // Function call
      case 'CallExpression':
        traverseArray(node.params, node);
        break;
      // Numbers and strings have no children, so there is nothing to traverse
      case 'NumberLiteral':
      case 'StringLiteral':
        break;

      // When an unrecognized node type is encountered, throw an error and exit
      default:
        throw new TypeError(node.type);
    }

    // Call the visitor's exit method after the children have been traversed
    if (methods && methods.exit) {
      methods.exit(node, parent);
    }
  }

  // Kick off the traversal from the root node
  traverseNode(ast, null);
}
```
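To see the visitor pattern in isolation, here is a self-contained demo (the traverser is repeated so the snippet runs on its own) that counts the CallExpression nodes in a hand-written AST for the running example:

```javascript
// Visitor-pattern demo: apply the traverser to a hand-written AST
// and count how many CallExpression nodes it contains.
function traverser(ast, visitor) {
  function traverseArray(array, parent) {
    array.forEach((child) => traverseNode(child, parent));
  }
  function traverseNode(node, parent) {
    const methods = visitor[node.type];
    if (methods && methods.enter) methods.enter(node, parent);
    switch (node.type) {
      case 'Program': traverseArray(node.body, node); break;
      case 'CallExpression': traverseArray(node.params, node); break;
      case 'NumberLiteral':
      case 'StringLiteral': break;
      default: throw new TypeError(node.type);
    }
    if (methods && methods.exit) methods.exit(node, parent);
  }
  traverseNode(ast, null);
}

// Hand-written AST for "(add 2 (subtract 4 2))"
const demoAst = {
  type: 'Program',
  body: [{
    type: 'CallExpression', name: 'add',
    params: [
      { type: 'NumberLiteral', value: '2' },
      { type: 'CallExpression', name: 'subtract',
        params: [
          { type: 'NumberLiteral', value: '4' },
          { type: 'NumberLiteral', value: '2' },
        ] },
    ],
  }],
};

let calls = 0;
traverser(demoAst, { CallExpression: { enter() { calls++; } } });
console.log(calls); // 2
```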

When looking at the traverser method, it is recommended to read it in conjunction with the transformer method described below:

```js
// Transformer. Argument: `ast`
function transformer(ast) {
  // Create the new AST; like the old one, 'Program' is its root node
  let newAst = {
    type: 'Program',
    body: [],
  };

  // Link the old and new ASTs through _context. Note that _context is a
  // reference from the old AST into the new AST.
  ast._context = newAst.body;

  // Process the old AST by traversing it
  traverser(ast, {
    // A number is inserted into the new AST as-is, with type 'NumberLiteral'
    NumberLiteral: {
      enter(node, parent) {
        parent._context.push({
          type: 'NumberLiteral',
          value: node.value,
        });
      },
    },
    // A string is inserted into the new AST as-is, with type 'StringLiteral'
    StringLiteral: {
      enter(node, parent) {
        parent._context.push({
          type: 'StringLiteral',
          value: node.value,
        });
      },
    },
    // Function call
    CallExpression: {
      enter(node, parent) {
        // Create an AST node with a different shape
        let expression = {
          type: 'CallExpression',
          callee: {
            type: 'Identifier',
            name: node.name,
          },
          arguments: [],
        };

        // Function calls can have children, so establish a node mapping for the child nodes to use
        node._context = expression.arguments;

        // A top-level function call is a statement, so wrap it in a special AST node
        if (parent.type !== 'CallExpression') {
          expression = {
            type: 'ExpressionStatement',
            expression: expression,
          };
        }

        parent._context.push(expression);
      },
    },
  });

  return newAst;
}
```

Importantly, the _context reference is what links the old AST to the new one. It keeps the bookkeeping simple, at the small cost of attaching an extra property to the old AST nodes.
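For the running example, the transformer produces the following new, JavaScript-style AST (written out by hand to show the shape; note that the top-level call is wrapped in an ExpressionStatement):

```javascript
// The JavaScript-style AST the transformer produces for "(add 2 (subtract 4 2))"
const newAst = {
  type: 'Program',
  body: [
    {
      type: 'ExpressionStatement', // top-level calls become statements
      expression: {
        type: 'CallExpression',
        callee: { type: 'Identifier', name: 'add' },
        arguments: [
          { type: 'NumberLiteral', value: '2' },
          {
            // Nested calls stay plain CallExpressions
            type: 'CallExpression',
            callee: { type: 'Identifier', name: 'subtract' },
            arguments: [
              { type: 'NumberLiteral', value: '4' },
              { type: 'NumberLiteral', value: '2' },
            ],
          },
        ],
      },
    },
  ],
};

console.log(newAst.body[0].expression.callee.name); // add
```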

3.5 Code Generation

In the final step, we define a codeGenerator method that recursively converts each node of the new AST into a JavaScript code string.

```js
// Code generator. Argument: a node of the new AST
function codeGenerator(node) {

  switch (node.type) {
    // Root node: map over the nodes in the body property, recursively calling
    // codeGenerator, and join the results line by line
    case 'Program':
      return node.body.map(codeGenerator)
        .join('\n');

    // Expression statement: process the inner expression and end with a semicolon
    case 'ExpressionStatement':
      return (
        codeGenerator(node.expression) +
        ';'
      );

    // Function call: add the parentheses, with arguments separated by commas
    case 'CallExpression':
      return (
        codeGenerator(node.callee) +
        '(' +
        node.arguments.map(codeGenerator)
          .join(', ') +
        ')'
      );

    // Identifier: return its name
    case 'Identifier':
      return node.name;

    // Number: return its value
    case 'NumberLiteral':
      return node.value;

    // String: return its value wrapped in double quotes
    case 'StringLiteral':
      return '"' + node.value + '"';

    // When an unrecognized node type is encountered, throw an error and exit
    default:
      throw new TypeError(node.type);
  }
}
```

3.6 Compiler tests

With the previous step done, we have finished developing our simple compiler. Now let's test it against the original requirement:

```js
const add = (a, b) => a + b;
const subtract = (a, b) => a - b;
const source = "(add 2 (subtract 4 2))";
const target = compiler(source); // "add(2, subtract(4, 2));"

const result = eval(target); // Ok, result is 4
```

3.7 Work process summary

A summary of The Super Tiny Compiler's workflow:

  1. input => tokenizer => tokens
  2. tokens => parser => ast
  3. ast => transformer => newAst
  4. newAst => generator => output

Most compilers follow much the same workflow.

4. Handwritten Webpack compiler

Having walked through The Super Tiny Compiler's core workflow, handwriting a Webpack compiler next will feel silky smooth ~

You know, some interviewers like to ask this question. More importantly, writing it by hand gives us a better understanding of the Webpack build process, which we'll briefly cover in this section.

4.1 Analysis of Webpack construction process

A sequence of steps from starting the build to producing the results:

  1. Initialize parameters

Parse the Webpack configuration parameters, merging the parameters passed in from the shell with those configured in the webpack.config.js file, to form the final configuration.

  2. Begin compilation

Initialize the Compiler object with the parameters obtained in the previous step and register all configured plugins. The plugins listen to the event hooks of the Webpack build lifecycle and react accordingly. Then execute the object's run method to start compiling.

  3. Determine the entry point

Starting from the configured entry, parse the files to build the AST syntax tree, find the dependencies, and recurse onward.

  4. Compile the modules

During the recursion, call all configured Loaders to transform the files according to the file type and Loader configuration, then find the modules that each module depends on, and repeat this step until every dependent file has been processed.

  5. Complete module compilation and output

At the end of the recursion, we have the result for each file, including each module and the dependencies between them, and chunks of code are generated according to the entry configuration.

  6. Output complete

Write all chunks to the file system.



Note: throughout the build lifecycle there are many plugins doing the right thing at the right time. For example, UglifyPlugin takes the result after the recursive loader transformations, compresses it with UglifyJS, and overwrites the previous result.

4.2 Code Implementation

Handwriting Webpack needs to implement the following three core methods:

  • createAssets: code for collecting and processing files;
  • createGraph: According to the entry file, return all file dependency graph;
  • bundle: Output the entire code according to the dependency graph;

1. createAssets

```js
// Dependencies used below: @babel/parser, @babel/traverse, and @babel/core
const fs = require("fs");
const path = require("path");
const parser = require("@babel/parser");
const traverse = require("@babel/traverse").default;
const babel = require("@babel/core");

let moduleId = 0; // Auto-incrementing module id

function createAssets(filename) {
    const content = fs.readFileSync(filename, "utf-8"); // Read the file contents by file name

    // Convert the code we just read into an AST
    const ast = parser.parse(content, {
        sourceType: "module" // Specify the source type
    });

    const dependencies = []; // Collect the paths the file depends on

    // Traverse the AST, collecting the dependency path of every import declaration
    traverse(ast, {
        ImportDeclaration: ({ node }) => {
            dependencies.push(node.source.value);
        }
    });

    // Convert the ES6 code to ES5 code via the AST
    const { code } = babel.transformFromAstSync(ast, null, {
        presets: ["@babel/preset-env"]
    });

    let id = moduleId++;
    return {
        id,
        filename,
        code,
        dependencies
    };
}
```

2. createGraph

```js
function createGraph(entry) {
    const mainAsset = createAssets(entry); // Get the content of the entry file
    const queue = [mainAsset];
    // `queue` grows as children are pushed, so this loop walks the whole dependency graph
    for (const asset of queue) {
        const dirname = path.dirname(asset.filename);
        asset.mapping = {};
        asset.dependencies.forEach(relativePath => {
            const absolutePath = path.join(dirname, relativePath); // Convert the file path to an absolute path
            const child = createAssets(absolutePath);
            asset.mapping[relativePath] = child.id;
            queue.push(child); // Push child assets so their dependencies are processed too
        });
    }
    return queue;
}
```

3. bundle

```js
function bundle(graph) {
    let modules = "";
    graph.forEach(item => {
        modules += `
            ${item.id}: [
                function (require, module, exports) {
                    ${item.code}
                },
                ${JSON.stringify(item.mapping)}
            ],
        `;
    });
    // Wrap all modules in an IIFE with its own require implementation,
    // then kick things off from module 0 (the entry)
    return `
        (function (modules) {
            function require(id) {
                const [fn, mapping] = modules[id];
                function localRequire(relativePath) {
                    return require(mapping[relativePath]);
                }
                const module = { exports: {} };
                fn(localRequire, module, module.exports);
                return module.exports;
            }
            require(0);
        })({${modules}})
    `;
}
```

5. Summary

This article starts with compiler concepts and basic workflows, then walks through The Super Tiny Compiler's source code, covering the implementation of the core workflow in detail: the lexical analyzer, parser, traverser, and transformer, and finally the code generator, which ties the stages together to realize this "probably the smallest compiler ever built".

This article also briefly introduced a handwritten Webpack implementation, which readers should refine and dig into further. Isn't it amazing?



Of course, what this article covers is only the edge of compiler-related knowledge; there is still a lot to learn, but a good start can fuel our motivation. Keep going!



Finally, the code described in this article is stored on Github:

  1. [learning]the-super-tiny-compiler.js
  2. [writing]webpack-compiler.js

6. References

  1. The Super Tiny Compiler
  2. The smallest compiler source code parsing ever
  3. Angular 2 JIT vs AOT