Babel, Part 1: Architecture and Principles, with Hands-On Practice

I wrote this article in scraps of spare time over the National Day holiday. I'm not sure whether anyone reads during a long holiday, so let's test the waters!

This series takes you through the fundamentals of how Babel works. It is divided into two parts. The first part focuses on the architecture and principles of Babel, followed by some hands-on practice. The second part covers babel-plugin-macros, which can be used to write macros in JavaScript.

✨ Packed with substance, not to be missed. Writing is not easy, and a like is the greatest encouragement.

Note: This is not a basic tutorial on using Babel! If you're not familiar with Babel, check out the official website or this user manual.

The next post has been published: Babel: Plugin, Macros. It has been a little quiet so far, so a like would be nice. Reprints are welcome so that more people can see the article; please credit the source.

Article outline

Babel process
Babel's architecture
Visitor pattern
- Traversal of nodes
- Context of a node
- Treatment of side effects
- Scope handling
Writing a plug-in
Conclusion
Further reading

Babel process


The above illustration shows Babel’s process, which should be quite familiar to those who have studied the principles of compilers.

The process starts with parsing the source code. Parsing consists of two steps:

1️⃣ Lexical Analysis: At this stage the Tokenizer converts the code, which is just a string, into Tokens. Tokens can be regarded as an array of syntax fragments. For example, consider how for (const item of items) {} is tokenized:

As you can see from the figure above, each Token contains a syntax fragment, location information, and some type information. This information is useful for subsequent parsing.
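To make the idea of Tokens more concrete, here is a minimal tokenizer sketch. It is a toy written for illustration, not how @babel/parser works internally; the function name tokenize and the type names Keyword, Identifier, and Punctuator are assumptions of this sketch.

```javascript
// Toy tokenizer (assumption: handles only identifiers, three keywords, and ( ) { }).
// Anything it does not recognize simply ends the scan.
const KEYWORDS = new Set(['for', 'const', 'of'])

function tokenize(source) {
  const tokens = []
  // Sticky regex: skip whitespace, then capture one identifier or one punctuator
  const pattern = /\s*([A-Za-z_$][\w$]*|[(){}])/y
  let match
  while ((match = pattern.exec(source)) !== null) {
    const value = match[1]
    const end = pattern.lastIndex
    let type = 'Punctuator'
    if (KEYWORDS.has(value)) type = 'Keyword'
    else if (/^[A-Za-z_$]/.test(value)) type = 'Identifier'
    // Each token carries a syntax fragment, its type, and its location
    tokens.push({ type, value, start: end - value.length, end })
  }
  return tokens
}

const tokens = tokenize('for (const item of items) {}')
console.log(tokens.map((token) => `${token.type}:${token.value}`).join(' '))
```

Each token here records the same three things described above: the fragment itself, its position, and a type.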

2️⃣ Syntactic Analysis: At this stage the Parser transforms the Tokens into an Abstract Syntax Tree (AST).

What is an AST?

It is an 'object tree' that represents the syntactic structure of the code. For example, console.log('hello world') parses into:

Program, CallExpression, and Identifier are all types of nodes, and each node is a meaningful syntactic unit. These node types define attributes that describe information about the node.
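Written out by hand, a simplified version of that tree looks roughly like the object below. This is a sketch: location fields (start, end, loc) and other metadata that the real parser attaches are left out, and the variable name helloAst is only for illustration.

```javascript
// Simplified AST for: console.log('hello world')
// Assumption: location info and other metadata are omitted for brevity.
const helloAst = {
  type: 'File',
  program: {
    type: 'Program',
    body: [
      {
        type: 'ExpressionStatement',
        expression: {
          type: 'CallExpression',
          callee: {
            type: 'MemberExpression',
            computed: false,
            object: { type: 'Identifier', name: 'console' },
            property: { type: 'Identifier', name: 'log' },
          },
          arguments: [{ type: 'StringLiteral', value: 'hello world' }],
        },
      },
    ],
  },
}

// Every node is a plain object whose `type` says which syntactic unit it is
const call = helloAst.program.body[0].expression
console.log(call.type, call.callee.object.name, call.callee.property.name, call.arguments[0].value)
```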

JavaScript syntax has become increasingly complex, and in addition to the latest JavaScript specification syntax, Babel supports JSX, Flow, and now TypeScript. Imagine how many node types the AST has. We don't need to memorize all of them, and we couldn't anyway. Plug-in developers use ASTExplorer to inspect the parsed AST, which is very powerful 👍.

The AST is the core data structure for Babel translation, and subsequent operations depend on the AST.

Next comes the Transform stage, which traverses the AST and adds, removes, and modifies nodes along the way. All Babel plug-ins work in this stage, for example syntax transformation and code minification.

JavaScript in, JavaScript out. The final stage, Generate, converts the AST back into JavaScript in string form. This stage also generates the Source Map.
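The generation stage can be pictured as a recursive function over node types. The sketch below is a toy, not @babel/generator: the name generateCode is an assumption, it supports only the handful of node types used in the example, and it produces no Source Map.

```javascript
// Toy code generator: turns a tiny AST back into a source string.
// Assumption: only the node types listed in the switch are supported.
function generateCode(node) {
  switch (node.type) {
    case 'Program':
      return node.body.map(generateCode).join('\n')
    case 'ExpressionStatement':
      return generateCode(node.expression) + ';'
    case 'CallExpression':
      return `${generateCode(node.callee)}(${node.arguments.map(generateCode).join(', ')})`
    case 'MemberExpression':
      return `${generateCode(node.object)}.${generateCode(node.property)}`
    case 'Identifier':
      return node.name
    case 'StringLiteral':
      return JSON.stringify(node.value)
    default:
      throw new Error(`Unsupported node type: ${node.type}`)
  }
}

const program = {
  type: 'Program',
  body: [
    {
      type: 'ExpressionStatement',
      expression: {
        type: 'CallExpression',
        callee: {
          type: 'MemberExpression',
          object: { type: 'Identifier', name: 'console' },
          property: { type: 'Identifier', name: 'log' },
        },
        arguments: [{ type: 'StringLiteral', value: 'hi' }],
      },
    },
  ],
}
console.log(generateCode(program))
```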

Babel's architecture

In Looking beyond the superficial: Common Front-end Architectural Styles and Examples 🔥, I mentioned that both Babel and Webpack use a microkernel architectural style to accommodate complex customization requirements and frequent functional changes. This means they have a very small core, and most of their functionality is provided through plug-ins.

So a brief overview of Babel's architecture and some basic concepts is helpful, both for understanding the later articles in this series and for using Babel itself.

A picture is worth a thousand words. Those of you who have read my articles carefully will notice my style: use a picture rather than words where possible, and words rather than code. Although my original articles are long, the pictures are worth looking at.

Babel is a monorepo project, but it is organized very clearly. The following is a breakdown of the modules we can see in the source code, which together with the architecture diagram above should give you a general idea of Babel:

1️⃣ Core (@babel/core):

This is the 'kernel' of the 'microkernel' architecture mentioned above. In Babel, the kernel does these things:

Load and process the configuration (config)
Load plug-ins
Call the Parser to parse the source code and generate the AST
Call the Traverser to traverse the AST, applying plug-ins to transform it using the visitor pattern
Generate code, including Source Map transformation and source code generation
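The kernel's responsibilities listed above can be sketched as a small pipeline. This is only a toy under stated assumptions: createCore is a hypothetical name, and parse, traverse, and generate are injected stand-ins rather than Babel's real modules.

```javascript
// Toy "kernel": wires parse -> traverse (apply plug-ins) -> generate.
// Assumption: the three stages are passed in, so the kernel itself stays tiny.
function createCore({ parse, traverse, generate }) {
  return function transform(source, config = {}) {
    const plugins = config.plugins || []                     // load configuration and plug-ins
    const ast = parse(source)                                // parse the source into an AST
    traverse(ast, plugins.map((plugin) => plugin.visitor))   // apply plug-ins in one traversal
    return generate(ast)                                     // generate code from the AST
  }
}

// Deliberately trivial stand-ins: the "AST" is a flat list of word nodes
const core = createCore({
  parse: (source) => source.split(' ').map((value) => ({ type: 'Word', value })),
  traverse: (ast, visitors) => {
    for (const node of ast) {
      for (const visitor of visitors) {
        if (visitor[node.type]) visitor[node.type](node)
      }
    }
  },
  generate: (ast) => ast.map((node) => node.value).join(' '),
})

// A hypothetical plug-in that upper-cases every word
const shout = {
  visitor: {
    Word(node) {
      node.value = node.value.toUpperCase()
    },
  },
}
console.log(core('hello babel', { plugins: [shout] }))
```

The point of the design is that the kernel knows nothing about any particular transformation; everything interesting lives in the plug-ins it is handed.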

2️⃣ Core peripheral support

Parser (@babel/parser): Parses source code into an AST. It has built-in support for a lot of syntax, including JSX, TypeScript, Flow, and the latest ECMAScript specification. Currently, for the sake of efficiency, the parser does not support extensions and is maintained by the core team. If you want to support custom syntax you can fork it, but this is rarely needed.
Traverser (@babel/traverse): Implements the visitor pattern and traverses the AST. Plug-ins use it to pick out the AST nodes they are interested in and operate on them. The visitor pattern is described in more detail below.
Generator (@babel/generator): Converts the AST back into source code, with Source Map support.

3️⃣ Plug-ins

Open the Babel source code and you'll find several types of 'plug-ins'.

Syntax plug-ins (@babel/plugin-syntax-*): As stated above, @babel/parser already supports many JavaScript syntax features and does not support extensions. So a plugin-syntax-* plug-in really just enables or configures a feature of the Parser.

The average user doesn't need to worry about these: the transformation plug-ins already include the relevant plugin-syntax-* plug-ins. Users can also configure the Parser directly through the parserOpts configuration item.
Transformation plug-ins: Used to transform the AST, for example converting to ES5 code, minification, or feature enhancement. The Babel repository divides transformation plug-ins into two types (the difference is only in naming):
- @babel/plugin-transform-*: Ordinary transformation plug-ins
- @babel/plugin-proposal-*: Currently available language features that are still in the 'proposal stage' (not yet official)
Presets (@babel/preset-*): A collection or group of plug-ins, mainly to make it easier for users to manage and use plug-ins. For example, preset-env contains all the latest standard features; another example is preset-react, which contains all React-related plug-ins.

4️⃣ Plug-in development helpers

@babel/template: In some scenarios manipulating the AST directly is too cumbersome, just as manipulating the DOM directly is, so Babel implements a simple template engine that converts code in string form into an AST. It is used, for example, when generating helper code.
@babel/types: AST node constructors and assertions. Used very frequently in plug-in development.
@babel/helper-*: Helpers that assist plug-in development, for example by simplifying AST operations.
@babel/helpers: Helper code. Pure syntax conversion is sometimes not enough to make the code run. For example, older browsers do not recognize the class keyword, so helper code must be added to emulate classes.

5️⃣ Tools

@babel/node: A Node.js CLI that directly runs JavaScript files that need Babel processing.
@babel/register: Patches the require method of Node.js so that JavaScript modules that need Babel processing can be loaded.
@babel/cli: The CLI tool.

Visitor pattern

The transformer traverses the AST, finds the node types it is interested in, and performs the transformation. This process is similar to operating on a DOM tree, only for a different purpose. AST traversal and transformation typically use the visitor pattern.

Babel has a great many plug-ins. Imagine if each of them traversed the AST on its own, performing different operations on different nodes and maintaining its own state. Not only would this be inefficient, but the logic would be scattered all over the place, making the system difficult to understand and debug, and the plug-ins would end up tangled with one another.

Therefore the transformer generally operates on the AST using the visitor pattern. The visitor (1) performs a unified traversal, (2) provides methods for operating on nodes, and (3) maintains the relationships between nodes in a reactive manner. A plug-in (called a 'concrete visitor' in design-pattern terms) simply defines the node types it is interested in, and when the visitor reaches such a node, it calls the plug-in's visit method.
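Stripped to its essentials, the division of labor looks like the sketch below: one generic walker owns the traversal, and a concrete visitor only declares the node types it cares about. This is a toy, not @babel/traverse; the names walk and collectIdentifiers are assumptions, and handlers receive the bare node instead of a Path.

```javascript
// Toy traverser: walks a tree depth-first and calls the visitor for node types it declares.
// Assumption: any object with a string `type` property counts as a node.
const isNode = (value) =>
  value !== null && typeof value === 'object' && typeof value.type === 'string'

function walk(node, visitor) {
  const handler = visitor[node.type] || {}
  if (handler.enter) handler.enter(node)
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value]
    for (const child of children) {
      if (isNode(child)) walk(child, visitor)
    }
  }
  if (handler.exit) handler.exit(node)
}

// A "concrete visitor" only declares the node types it is interested in
const seen = []
const collectIdentifiers = {
  Identifier: {
    enter(node) {
      seen.push(node.name)
    },
  },
}

walk(
  {
    type: 'CallExpression',
    callee: { type: 'Identifier', name: 'hello' },
    arguments: [{ type: 'Identifier', name: 'v' }],
  },
  collectIdentifiers
)
console.log(seen)
```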

Traversal of nodes

Suppose our code looks like this:

function hello(v) {
  console.log('hello' + v + '!')
}

The AST structure after parsing is as follows:

File
  Program (program)
    FunctionDeclaration (body)
      Identifier (id)  #hello
      Identifier (params[0])  #v
      BlockStatement (body)
        ExpressionStatement ([0])
          CallExpression (expression)
            MemberExpression (callee)  #console.log
              Identifier (object)  #console
              Identifier (property)  #log
            BinaryExpression (arguments[0])
              BinaryExpression (left)
                StringLiteral (left)  #'hello'
                Identifier (right)  #v
              StringLiteral (right)  #'!'

The visitor traverses the AST in a depth-first, or recursive, order, as shown in the figure below:

In the figure above, the green line indicates entering a node and the red line indicates leaving a node. Let's write a super simple 'concrete visitor' to reproduce the traversal above:

const babel = require('@babel/core')
const traverse = require('@babel/traverse').default

// The source code to traverse
const code = `function hello(v) {
  console.log('hello' + v + '!')
}`
const ast = babel.parseSync(code)

let depth = 0
// Indent each log line according to the current depth
const indent = () => ' '.repeat(depth * 2 + 1)
traverse(ast, {
  enter(path) {
    console.log(`${indent()}enter ${path.type}(${path.key})`)
    depth++
  },
  exit(path) {
    depth--
    console.log(`${indent()}exit ${path.type}(${path.key})`)
  },
})

The output is as follows:

 enter Program(program)
   enter FunctionDeclaration(0)
     enter Identifier(id)
     exit Identifier(id)
     enter Identifier(0)
     exit Identifier(0)
     enter BlockStatement(body)
       enter ExpressionStatement(0)
         enter CallExpression(expression)
           enter MemberExpression(callee)
             enter Identifier(object)
             exit Identifier(object)
             enter Identifier(property)
             exit Identifier(property)
           exit MemberExpression(callee)
           enter BinaryExpression(0)
             enter BinaryExpression(left)
               enter StringLiteral(left)
               exit StringLiteral(left)
               enter Identifier(right)
               exit Identifier(right)
             exit BinaryExpression(left)
             enter StringLiteral(right)
             exit StringLiteral(right)
           exit BinaryExpression(0)
         exit CallExpression(expression)
       exit ExpressionStatement(0)
     exit BlockStatement(body)
   exit FunctionDeclaration(0)
 exit Program(program)

The enter method is called when the visitor enters a node, and the exit method is called when it leaves the node. In general, plug-ins do not use the enter method directly and only care about a few node types, so a concrete visitor can also declare visit methods like this:

traverse(ast, {
  // Visit identifiers
  Identifier(path) {
    console.log(`enter Identifier`)
  },
  // Visit call expressions
  CallExpression(path) {
    console.log(`enter CallExpression`)
  },
  // The forms above are shorthand for enter; use the object form if you also want to handle exit
  // Binary expressions
  BinaryExpression: {
    enter(path) {},
    exit(path) {},
  },
  // More advanced: use the same method to visit several node types
  'ExportNamedDeclaration|Flow'(path) {},
})
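One way to picture these shorthand forms is as input to a normalization step that expands them into a uniform { enter, exit } shape per node type. The sketch below is a simplified stand-in with a hypothetical name, explodeVisitor; it handles only type keys, and unlike the real traverser it does not expand aliases such as Flow into the concrete node types they cover.

```javascript
// Toy normalization of a visitor object.
// Assumption: every key is a node type or several types joined with "|".
function explodeVisitor(visitor) {
  const exploded = {}
  for (const [key, handler] of Object.entries(visitor)) {
    // A bare function is shorthand for { enter: fn }
    const hooks = typeof handler === 'function' ? { enter: handler } : handler
    // "A|B" registers the same hooks for every listed type
    for (const type of key.split('|')) {
      exploded[type] = { ...exploded[type], ...hooks }
    }
  }
  return exploded
}

const noop = () => {}
const exploded = explodeVisitor({
  Identifier: noop,
  BinaryExpression: { exit: noop },
  'ExportNamedDeclaration|Flow': noop,
})
console.log(Object.keys(exploded))
```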

So how are Babel plug-ins applied?

Babel applies visit methods in the order in which the plug-ins are defined. For example, if you register multiple plug-ins, the data structure that babel-core finally passes to the visitor looks something like this:

{
  Identifier: {
    enter: [plugin-xx, plugin-yy] // An array of visit methods
  }
}
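The merged structure above can be sketched in a few lines. This is a toy under stated assumptions: mergeVisitors is a hypothetical name, each plug-in is a { name, visitor } object, and every handler is treated as an enter function.

```javascript
// Toy merge: combine the visitors of several plug-ins into one lookup table.
// Assumption: handlers are plain enter functions keyed by node type.
function mergeVisitors(plugins) {
  const merged = {}
  for (const plugin of plugins) {
    for (const [type, enter] of Object.entries(plugin.visitor)) {
      if (!merged[type]) merged[type] = { enter: [] }
      merged[type].enter.push(enter)
    }
  }
  return merged
}

const calls = []
const merged = mergeVisitors([
  { name: 'plugin-xx', visitor: { Identifier: () => calls.push('plugin-xx') } },
  { name: 'plugin-yy', visitor: { Identifier: () => calls.push('plugin-yy') } },
])

// Entering one Identifier node runs the handlers in registration order
for (const enter of merged.Identifier.enter) {
  enter({ type: 'Identifier', name: 'v' })
}
console.log(calls)
```

Because the handlers are merged up front, the tree is walked once no matter how many plug-ins are registered.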

When entering a node, these plug-ins are executed in the order in which they were registered. For most plug-ins the developer does not need to care about the order of definition. A few cases require a little attention, such as plugin-proposal-decorators:

{
  "plugins": [
    "@babel/plugin-proposal-decorators", // This must precede plugin-proposal-class-properties
    "@babel/plugin-proposal-class-properties"
  ]
}

By convention, newer or experimental plug-ins are defined first and older plug-ins are defined later. This is because a new plug-in may need to transform the AST first so that the older plug-ins can recognize the syntax (backward compatibility). The following is an official configuration example. To ensure compatibility, the plug-ins in stage-* are executed first:

{
  "presets": ["es2015", "react", "stage-2"]
}

Note that presets are applied in reverse order (last to first); see the official documentation.
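The ordering rules can be summarized in a small sketch. Per the Babel documentation, plug-ins run before presets, plug-ins run first to last, and presets run last to first. Everything else here is an assumption for illustration: the function name resolvePluginOrder, the shape of the preset objects, and the plug-in names inside each preset.

```javascript
// Toy ordering: plug-ins first (first to last), then presets (last to first).
// Assumption: each preset is { name, plugins } with its plug-ins already resolved.
function resolvePluginOrder({ plugins = [], presets = [] }) {
  const fromPresets = [...presets].reverse().flatMap((preset) => preset.plugins)
  return [...plugins, ...fromPresets]
}

const order = resolvePluginOrder({
  plugins: ['decorators', 'class-properties'],
  presets: [
    { name: 'es2015', plugins: ['arrow-functions'] },
    { name: 'react', plugins: ['jsx'] },
    { name: 'stage-2', plugins: ['some-proposal'] },
  ],
})
console.log(order)
```

With this configuration the stage-2 plug-in runs before the react and es2015 ones, which is the behavior the official example relies on.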

Context of a node

When the visitor visits a node, it calls the enter method indiscriminately. So how do we know where the node is and how it relates to other nodes?

Each visit method receives a Path object. You can think of it as a 'context' object, similar to a jQuery object (const $el = $('.el')), which contains a lot of information:

Information about the current node
Node relationships: parent, children, siblings, and so on
Scope information
Context information
Node operations: adding, removing, querying, and modifying nodes
Assertion methods: isXXX, assertXXX

Here is its main structure:

export class NodePath<T = Node> {
    constructor(hub: Hub, parent: Node);
    parent: Node;
    hub: Hub;
    contexts: TraversalContext[];
    data: object;
    shouldSkip: boolean;
    shouldStop: boolean;
    removed: boolean;
    state: any;
    opts: object;
    skipKeys: object;
    parentPath: NodePath;
    context: TraversalContext;
    container: object | object[];
    listKey: string; // If the node is in an array, this is the key of that array
    inList: boolean;
    parentKey: string;
    key: string | number; // The node's key or index
    node: T; // 🔴 The current node
    scope: Scope; // 🔴 The scope the current node belongs to
    type: T extends undefined | null ? string | null : string; // 🔴 The node type
    typeAnnotation: object;
    // ... plus many methods for adding, removing, querying, and modifying nodes
}

You can refer to this handbook to learn how to transform the AST through Path. There are also code examples later in this article, so I won't go into the details here.
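To get a feel for why a 'context' wrapper is useful, here is a drastically reduced sketch of the idea. ToyPath is a hypothetical class that keeps only a handful of the fields listed above; it is not NodePath and its method bodies are my own simplification.

```javascript
// Toy path: wraps a node together with where it sits in its parent.
// Assumption: only node, parentPath, container, and key are modelled.
class ToyPath {
  constructor(node, parentPath, container, key) {
    this.node = node
    this.parentPath = parentPath
    this.container = container // object or array that holds the node
    this.key = key             // property name or array index inside the container
  }
  get type() {
    return this.node.type
  }
  get parent() {
    return this.parentPath ? this.parentPath.node : null
  }
  // Build the path of a child stored under a property name
  get(key) {
    return new ToyPath(this.node[key], this, this.node, key)
  }
  // Swap the wrapped node in place, keeping the link to the parent intact
  replaceWith(newNode) {
    this.container[this.key] = newNode
    this.node = newNode
  }
}

const statement = {
  type: 'ExpressionStatement',
  expression: { type: 'Identifier', name: 'foo' },
}
const root = new ToyPath(statement, null, null, null)
const expr = root.get('expression')
expr.replaceWith({ type: 'Identifier', name: 'bar' })
console.log(expr.parent.type, expr.key, statement.expression.name)
```

Because the path remembers its container and key, a node can be replaced without the caller having to know where it lives in the tree.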

Treatment of side effects

In fact, the visitor's job is more complicated than we might think. The example above is a static AST traversal, but AST transformation itself has side effects. For example, when a plug-in replaces an old node, the visitor no longer needs to descend into the old node and instead continues by visiting the new node, as in the following code:

const t = require('@babel/types')

traverse(ast, {
  ExpressionStatement(path) {
    // Replace console.log('hello' + v + '!') with return 'hello' + v
    const rtn = t.returnStatement(t.binaryExpression('+', t.stringLiteral('hello'), t.identifier('v')))
    path.replaceWith(rtn)
  },
})

The code above replaces the console.log('hello' + v + '!') statement with return 'hello' + v;. The traversal then proceeds as follows:

We can perform any operation on the AST, such as deleting a sibling of the parent node, deleting the first child node, adding a sibling node, and so on. When these operations 'pollute' the AST, the visitor needs to record these states and update the relationships between Path objects in a reactive manner, so as to guarantee the correct traversal order and hence the correct translation result.
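The 'continue with the new node' behavior can be imitated with a small sketch. This is a toy, not how @babel/traverse tracks changes: walkAndReplace is a hypothetical name, and handlers signal a replacement by returning a node, whereas Babel uses path.replaceWith.

```javascript
// Toy traversal that tolerates replacement: if a handler returns a new node,
// the walk swaps it into its container and continues inside the new node, not the old one.
const isNode = (value) =>
  value !== null && typeof value === 'object' && typeof value.type === 'string'

function walkAndReplace(container, key, visitor, log) {
  let node = container[key]
  const handler = visitor[node.type]
  const replacement = handler ? handler(node) : undefined
  if (isNode(replacement)) {
    container[key] = replacement
    node = replacement
  }
  log.push(node.type)
  for (const [childKey, value] of Object.entries(node)) {
    if (Array.isArray(value)) {
      value.forEach((child, index) => {
        if (isNode(child)) walkAndReplace(value, index, visitor, log)
      })
    } else if (isNode(value)) {
      walkAndReplace(node, childKey, visitor, log)
    }
  }
}

const block = {
  type: 'BlockStatement',
  body: [{ type: 'ExpressionStatement', expression: { type: 'CallExpression' } }],
}
const visited = []
walkAndReplace(
  { root: block },
  'root',
  {
    ExpressionStatement: () => ({
      type: 'ReturnStatement',
      argument: { type: 'StringLiteral', value: 'hello' },
    }),
  },
  visited
)
console.log(visited)
```

Note that the old CallExpression never shows up in the log: once its parent statement is replaced, the walk only sees the new subtree.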

Scope handling

The visitor ensures that nodes are traversed and modified correctly, but the other tricky part of a transformer is scope, and that burden falls on the plug-in developer. Plug-in developers must handle scope very carefully so as not to break the execution logic of existing code.

const a = 1, b = 2
function add(foo, bar) {
  console.log(a, b)
  return foo + bar
}

For example, suppose you want to rename foo, the first parameter of the add function, to a. You need to recursively traverse the subtree, find all references to the identifier foo, and replace them:

const t = require('@babel/types')
const generate = require('@babel/generator').default

traverse(ast, {
  // Rename the first parameter to a
  FunctionDeclaration(path) {
    const firstParams = path.get('params.0')
    if (firstParams == null) {
      return
    }

    const name = firstParams.node.name
    // Nested traversal: a common pattern in plug-ins that avoids affecting the outer scope
    path.traverse({
      Identifier(path) {
        if (path.node.name === name) {
          path.replaceWith(t.identifier('a'))
        }
      },
    })
  },
})

console.log(generate(ast).code)
// function add(a, bar) {
//   console.log(a, b);
//   return a + bar;
// }

🤯 Wait a minute, it's not that easy. After foo is replaced with a, the behavior of console.log(a, b) is broken. So you can't use a here; you have to use a different identifier, such as c.

This is the scope problem that a transformer needs to consider. The premise of AST transformation is that the correctness of the program is preserved. When we add or modify references, we need to make sure they do not conflict with any existing references. Babel itself cannot detect this kind of problem, so it is up to the plug-in developer to handle it with care.

JavaScript uses lexical scope, that is, scope is determined by the lexical structure of the source code:

In a lexical block, identifiers created as a result of new variables, functions, classes, function parameters, etc., belong to this block scope. These identifiers are also called bindings, and the use of these bindings is called references

In Babel, Scope is represented by Scope objects. We can get the scope object of the current node from the scope field of the Path object. Its structure is as follows:

{
  path: NodePath;
  block: Node;         // The lexical block node, such as a function node or a conditional statement node
  parentBlock: Node;   // The parent lexical block node
  parent: Scope;       // ⚛️ Points to the parent scope
  bindings: { [name: string]: Binding; }; // ⚛️ All bindings in this scope (that is, identifiers created by this scope)
}

The Scope object is similar to the Path object in that it contains the relationships between scopes (parent refers to the parent), collects all bindings under the Scope, and provides rich methods for scope-only operations.
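The parent chain and binding lookup can be sketched with a tiny class. ToyScope is hypothetical; its method names mirror the Scope methods used later in this article (hasOwnBinding, hasBinding, parentHasBinding, getBinding), but the bodies are my own simplification.

```javascript
// Toy scope chain: each scope knows its parent and the bindings it creates.
// Assumption: a binding is just { name, references } here.
class ToyScope {
  constructor(parent = null, names = []) {
    this.parent = parent
    this.bindings = Object.fromEntries(names.map((name) => [name, { name, references: 0 }]))
  }
  hasOwnBinding(name) {
    return Object.prototype.hasOwnProperty.call(this.bindings, name)
  }
  // Look in this scope first, then walk up the parent chain
  getBinding(name) {
    for (let scope = this; scope; scope = scope.parent) {
      if (scope.hasOwnBinding(name)) return scope.bindings[name]
    }
    return undefined
  }
  hasBinding(name) {
    return this.getBinding(name) !== undefined
  }
  parentHasBinding(name) {
    return this.parent !== null && this.parent.hasBinding(name)
  }
}

// Scopes for: const a = 1, b = 2; function add(foo, bar) { ... }
const programScope = new ToyScope(null, ['a', 'b', 'add'])
const addScope = new ToyScope(programScope, ['foo', 'bar'])
console.log(addScope.hasOwnBinding('a'), addScope.parentHasBinding('a'), addScope.hasBinding('foo'))
```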

We can obtain all bindings (identifiers) in the current scope through the bindings property. Each binding is represented by the Binding class:

export class Binding {
  identifier: t.Identifier;
  scope: Scope;
  path: NodePath;
  kind: "var" | "let" | "const" | "module";
  referenced: boolean;
  references: number;              // The number of references
  referencePaths: NodePath[];      // ⚛️ The paths of all nodes that reference this identifier
  constant: boolean;               // Whether it is constant
  constantViolations: NodePath[];
}

The Binding object allows us to determine if the identifier is referenced.

Ok, with Scope and Binding, you now have the ability to implement secure variable renaming transformations. To better illustrate scope interaction, let’s add a little more difficulty to the code above:

const a = 1, b = 2
function add(foo, bar) {
  console.log(a, b)
  return () => {
    const a = '1' // A variable declaration has been added
    return a + (foo + bar)
  }
}

Now, to rename the function parameter foo, you must consider not only the outer scope but also the bindings of inner scopes, and make sure there is no conflict with either.

The above code scope and identifier references look like the figure below:

Go ahead, take the challenge and try renaming the first argument of the function to a shorter identifier:

// Used to generate candidate identifiers: _, _1, _2, ...
const getUid = () => {
  let uid = 0
  return () => `_${uid++ || ''}`
}

const ast = babel.parseSync(code)
traverse(ast, {
  FunctionDeclaration(path) {
    // Get the first parameter
    const firstParam = path.get('params.0')
    if (firstParam == null) {
      return
    }

    const currentName = firstParam.node.name
    const currentBinding = path.scope.getBinding(currentName)
    const gid = getUid()
    let sname

    // Loop until we find a variable name that is not taken
    while (true) {
      sname = gid()

      // 1️⃣ First check whether a parent scope already defines the variable
      if (path.scope.parentHasBinding(sname)) {
        continue
      }

      // 2️⃣ Check whether the current scope defines the variable
      if (path.scope.hasOwnBinding(sname)) {
        // Already taken
        continue
      }

      // Check the current references of the first parameter.
      // If a reference sits in a scope that defines a variable with the same name, we have to give up
      if (currentBinding.references > 0) {
        let findIt = false
        for (const refNode of currentBinding.referencePaths) {
          if (refNode.scope !== path.scope && refNode.scope.hasBinding(sname)) {
            findIt = true
            break
          }
        }
        if (findIt) {
          continue
        }
      }
      break
    }

    // Start the substitution
    const i = t.identifier(sname)
    currentBinding.referencePaths.forEach(p => p.replaceWith(i))
    firstParam.replaceWith(i)
  },
})

console.log(generate(ast).code)
// const a = 1,
//       b = 2;
//
// function add(_, bar) {
//   console.log(a, b);
//   return () => {
//     const a = '1'; // A variable declaration has been added
//
//     return a + (_ + bar);
//   };
// }

The example above, while not very useful and still buggy (labels are not considered), illustrates the complexity of scope handling.

Babel's Scope object actually provides a generateUid method for generating unique, non-conflicting identifiers. Let's use it to simplify the code above:

traverse(ast, {
  FunctionDeclaration(path) {
    const firstParam = path.get('params.0')
    if (firstParam == null) {
      return
    }
    const i = path.scope.generateUidIdentifier('_') // You can also use generateUid
    const currentBinding = path.scope.getBinding(firstParam.node.name)
    currentBinding.referencePaths.forEach(p => p.replaceWith(i))
    firstParam.replaceWith(i)
  },
})

Can it be made even shorter?

traverse(ast, {
  FunctionDeclaration(path) {
    const firstParam = path.get('params.0')
    if (firstParam == null) {
      return
    }
    const i = path.scope.generateUid('_') // generateUid returns the new name as a string
    path.scope.rename(firstParam.node.name, i)
  },
})

Here is the implementation of generateUid:

generateUid(name: string = "temp") {
  name = t
    .toIdentifier(name)
    .replace(/^_+/, "")
    .replace(/[0-9]+$/g, "");

  let uid;
  let i = 0;
  do {
    uid = this._generateUid(name, i);
    i++;
  } while (
    this.hasLabel(uid) ||
    this.hasBinding(uid) ||
    this.hasGlobal(uid) ||
    this.hasReference(uid)
  );

  const program = this.getProgramParent();
  program.references[uid] = true;
  program.uids[uid] = true;

  return uid;
}

Pretty neat, huh? The most typical scenario for scope manipulation is code minification, which shortens variable names, function names, and so on. In practice, however, very few plug-ins need complex interaction with scopes, so I'll stop here.
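The same try-until-free loop can be run in isolation with a small sketch. toyGenerateUid is a hypothetical function: the Set named taken stands in for the label, binding, global, and reference checks, and the numbering scheme for candidates (_temp, _temp2, _temp3, ...) is my own assumption rather than a statement about _generateUid.

```javascript
// Toy unique-name generator: keep trying candidates until one is free.
// Assumption: `taken` replaces all of the scope lookups done by the real method.
function toyGenerateUid(taken, name = 'temp') {
  // Strip leading underscores and trailing digits, as in the code above
  const base = name.replace(/^_+/, '').replace(/[0-9]+$/g, '')
  let uid
  let i = 0
  do {
    uid = i === 0 ? `_${base}` : `_${base}${i + 1}`
    i++
  } while (taken.has(uid))
  taken.add(uid) // reserve the name so the next call cannot return it again
  return uid
}

const taken = new Set(['_temp'])
const first = toyGenerateUid(taken)
const second = toyGenerateUid(taken)
const third = toyGenerateUid(taken, '__x1')
console.log(first, second, third)
```

Reserving the generated name at the end plays the role of the program.references and program.uids bookkeeping in the real implementation.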

Writing a plug-in

Wait, don't go. It's not over yet; we're only two thirds of the way through. Having learned all of the above, shouldn't we write a toy plug-in to test the waters?

Now let's write a minimalist plug-in that mimics babel-plugin-import to implement on-demand importing of modules. In this plug-in, we will transform an import statement like this:

import {A, B, C as D} from 'foo'

into:

import A from 'foo/A'
import 'foo/A/style.css'
import B from 'foo/B'
import 'foo/B/style.css'
import D from 'foo/C'
import 'foo/C/style.css'

First, take a look at the AST node structure of the import statement using AST Explorer:

From the result shown above, we need to handle the ImportDeclaration node type, take out its specifiers, and iterate over them. In addition, if the user uses a default import or a namespace import, we throw an error to remind the user that these cannot be used.

The basic implementation is as follows:

// The module to match
const MODULE = 'foo'
traverse(ast, {
  // Visit import declarations
  ImportDeclaration(path) {
    if (path.node.source.value !== MODULE) {
      return
    }

    // If the import has no specifiers, remove it directly
    const specs = path.node.specifiers
    if (specs.length === 0) {
      path.remove()
      return
    }

    // Check for default imports and namespace imports
    if (specs.some(i => t.isImportDefaultSpecifier(i) || t.isImportNamespaceSpecifier(i))) {
      // Throw an error; Babel will display the offending code frame
      throw path.buildCodeFrameError('Cannot use default import or namespace import')
    }

    // Convert named imports
    const imports = []
    for (const spec of specs) {
      const named = MODULE + '/' + spec.imported.name
      const local = spec.local
      imports.push(t.importDeclaration([t.importDefaultSpecifier(local)], t.stringLiteral(named)))
      imports.push(t.importDeclaration([], t.stringLiteral(`${named}/style.css`)))
    }

    // Replace the original import statement
    path.replaceWithMultiple(imports)
  }
})

The logic is fairly simple; babel-plugin-import is much more complicated than that.
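The core of the rewrite is easy to check on plain data, without any AST involved. The sketch below is only an illustration of the mapping: expandImports is a hypothetical helper, and specifiers are given as { imported, local } name pairs instead of AST nodes.

```javascript
// Toy version of the rewrite on plain data: one named import in, two import lines out.
// Assumption: `imported` is the exported name and `local` is the name used in this file.
function expandImports(moduleName, specifiers) {
  return specifiers.flatMap(({ imported, local }) => [
    `import ${local} from '${moduleName}/${imported}'`,
    `import '${moduleName}/${imported}/style.css'`,
  ])
}

// Equivalent of: import {A, B, C as D} from 'foo'
const lines = expandImports('foo', [
  { imported: 'A', local: 'A' },
  { imported: 'B', local: 'B' },
  { imported: 'C', local: 'D' },
])
console.log(lines.join('\n'))
```

The module path is built from the imported name while the binding keeps the local name, which is why C as D becomes import D from 'foo/C'.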

Next, we'll wrap it up as a standard Babel plug-in. By convention, the package name should be prefixed with babel-plugin-:

mkdir babel-plugin-toy-import
cd babel-plugin-toy-import
yarn init -y
touch index.js

You can also generate a project template using generator-babel-plugin.

Fill in our code in the index.js file. index.js default-exports a function with the following structure:

// Receives a babel-core object
export default function(babel) {
  const {types: t} = babel
  return {
    pre(state) {
      // Optional pre-operation; can be used to prepare some resources
    },
    visitor: {
      // Our visitor code goes here
      ImportDeclaration(path, state) {
        // ...
      }
    },
    post(state) {
      // Optional post-operation
    }
  }
}

We can get the options passed in by the user from state, the second parameter of the visit method. Suppose the user's configuration is:

{
  plugins: [['toy-import', {name: 'foo'}]]
}

We can get the option passed in by the user like this:

export default function(babel) {
  const {types: t} = babel
  return {
    visitor: {
      ImportDeclaration(path, state) {
        const mod = state.opts && state.opts.name
        if (mod == null) {
          return
        }
        // ...
      }
    }
  }
}

All done 🙏, time to publish!

yarn publish # good luck

Conclusion

The door to a new world has been opened: ⛩

This article mainly introduced the architecture and principles of Babel, along with some hands-on Babel plug-in development. If you have read this far, you have stepped through Babel's door.

Next, you can read the Babel handbook, which is the best tutorial so far, and ASTExplorer is the best playground for practice. Write more code and think more. You can also study the official Babel plug-in implementations to take things to the next level.

This article has a sequel, in which I will introduce babel-plugin-macros. Stay tuned!

A like is the best encouragement for me.

Further reading

ASTExplorer
babel-handbook
generator-babel-plugin
the-super-tiny-compiler