preface

At the weekend, WHEN I was happily writing a small book, I accidentally knocked over my drink and spilled some on the keyboard. Although I cleaned it up quickly, the computer was suddenly shut down. I tried to restart it, but it didn’t work, and I finally confirmed that it was broken.

It’s not the computer that I’m most worried about, it’s the fact that I promised a lot of readers that I’d put the book online next week, so I can’t do it anymore, but now I have to because the code is all on that computer.

I was thinking about how to make up for it. I remembered that a lot of people were looking forward to the final case of “Easy To Write Babel”. I happened to be writing that case recently, so I thought I might share my thoughts in advance. Some kind of compensation (also announced in a few days).

The overall train of thought

Babel compilation process

We know that Babel’s main compilation processes are Parse, Transform, and generate.

  • Parse is about turning source code into an AST
  • Transform is to add or delete the AST
  • Generate prints the AST into object code and generates the Sourcemap

Packages built into Babel7

Babel 7 puts implementations of these features in different packages:

  • @babel/parserParse the source code into an AST, which corresponds to the Parse phase
  • @babel/traverseTraverse the AST and call the Visitor function, corresponding to the Transform phase
  • @babel/generatePrint the AST, generate the object code and sorucemap, corresponding to the Generate phase

Where, the AST needs to be created during traversal, which is used:

  • @babel/typesCreate and determine the AST
  • @babel/templateCreate the AST in batches based on modules

The above is the function of each stage, and the entrance to Babel’s overall function is at:

  • @babel/coreParse configuration, application Plugin, preset, whole compilation process

There are some common functions between plugins. These are in:

  • @babel/helpersUsed to transform the TEMPLATES created AST required by es Next code, such as _typeof, _defineProperties, and so on
  • @babel/helper-xxxCommon functions that other plug-ins share to manipulate the AST

Of course, in addition to compile-time conversions, there are also run-time public functions, which are placed in:

  • @babel/runtimeIt mainly includes corejs, helpers and Regenerator.
    • Helper: Runtime version of the helper function (not injected via AST, but runtime imported code)
    • Corejs: Implementation of es Next’s API, Corejs 2 only supports static methods, and Corejs 3 also supports instance methods
    • Regenerator: Implementation of async await, maintained by Facebook

(Babel does its own syntax transformation as a helper, but it does its own polyfill using third-party corejs and regenerators.)

Which packages are we going to implement

These are some of the packages that are built into Babel completion. If we were to write a simple Babel, we would have to implement these packages, but we could simplify a bit.

  • Parser packageYes. Babel Parser is based on Acorn fork, and we’re extending it a little bit. Complete the conversion from source to AST.
  • Traverse by packageIt’s traversing the AST, and you need to know which keys are traversed by the different types of AST, which is defined in the @babel/types package, and we do it in a similar way, and we call the corresponding visitor, Some apis that implement Path and path.scope are then passed in.
  • Generate packagesIs to print the AST into object code and generate sourcemap. The sourcemap is generated using the source-map package, and each mapping is generated using the loC recorded during parse and the location calculated during print.
  • Types packageFor creating an AST, the API for creating and judging the various AST’s is maintained, and the attributes that each AST needs to traverse are provided for the traverse process
  • The template packageHere we implement a simple version that passes a string and parses it back as an AST.
  • The core packageIt is a cascade of the entire process, supporting plugins and presets, invoking plug-ins, merging into final visitors, and then traverses.
  • Helper packageWe will also implement one, because plugins are supported, so there are some common functions that can be reused
  • The runtime packageWe also provide them, but just add some auxiliary functions for doing syntax conversions

This is the sort of thing that we might do, and it’s a fairly complete Babel to implement all of this. In doing so, we will learn more about Babel, about translators, and not just about Babel itself.

Here’s a detailed analysis of the specific ideas of each step:

Code implementation

(Because the code in the broken computer can not be taken out, plus this is not in the booklet, so will only provide ideas, when the booklet online will provide the full source code)

To keep things simple, we don’t do any subcontracting. We implement all the code in one package.

parser

Mainstream Parsers include Esprima, Acorn, etc. Acorn is the most popular, and The Babel Parser is a fork of Acorn with many changes. We don’t need to fork, just make some extensions based on acorn’s socket machine.

The AST that Acorn parsed, for example, had only Literal types, with no distinction between strings, numbers, or Boolean literals. Babel Parser refines them into StringLiteral, NumericLiteral, BooleanLiteral, etc.

Let’s implement a parser that does this extension to the AST.

Let’s start with the original Acorn Parser:

const acorn = require("acorn");

const Parser = acorn.Parser;

const ast = Parser.parse(` const a = 1; `);
console.log(JSON.stringify(ast, null.2));
Copy the code

Print as follows:

You can see that the numeric Literal parse results in Literal, so judging the type requires looking at the type of the value to determine what Literal it is, which is cumbersome. That’s why Babel refined them.

Let’s also elaborate:

Acorn extends by inheriting + overriding, inheriting the previous Parser, overriding some methods, and returning a new Parser.

const acorn = require("acorn");

const Parser = acorn.Parser;

var literalExtend = function(Parser) {
  return class extends Parser { parseLiteral (... args) {const node = super.parseLiteral(... args);switch(typeof node.value) {
            case 'number':
                node.type = 'NumericLiteral';
                break;
            case 'string':
                node.type = 'StringLiteral';
                break;
        }
        returnnode; }}}const newParser = Parser.extend(literalExtend);

const ast = newParser.parse(` const a = 1; `);
console.log(JSON.stringify(ast, null.2));

Copy the code

When parse, we determine the type of the literal and set the type.

Try the effect:

In this way, we have implemented an extension similar to Babel Parser for Acorn.

Of course, there are many extensions to Babel Parser, but here we simply implement them to clarify our thinking.

traverse

Traversing the AST is a depth-first search process, and we need to know how to continue traversing the child AST nodes when we get to the specific AST node.

The Babel Types package defines how the different AST is traversed (visitor), created (Builder), validated (fidelds.validate), and alias (alias).

Here we also need to maintain data on how each AST traverses:

const AST_DEFINATIONS_MAP = new Map(a); AST_DEFINATIONS_MAP.set('Program', {
    visitor: ['body']}); AST_DEFINATIONS_MAP.set('VariableDeclaration', {
    visitor: ['declarations']}); AST_DEFINATIONS_MAP.set('VariableDeclarator', {
    visitor: ['id'.'init']}); AST_DEFINATIONS_MAP.set('Identifier'{}); AST_DEFINATIONS_MAP.set('NumericLiteral'{});Copy the code

The AST is then depth-first traversed based on this data:

function traverse(node) {
    const defination = astDefinationsMap.get(node.type);

    console.log(node.type);

    if (defination.visitor) {
        defination.visitor.forEach(key= > {
            const prop = node[key];
            if (Array.isArray(prop)) { // If the property is an array
                prop.forEach(childNode= >{ traverse(childNode); })}else{ traverse(prop); }}}})Copy the code

The print result is as follows:

Compared to the AST structure just now, it does implement a depth-first traversal.

visitor

After traversal, we shall realize the function of visitors and add, delete and modify AST during traversal. This is how to call that visitor function based on node.type during the traversal:

function traverse(node, visitors) {
    const defination = astDefinationsMap.get(node.type);

    const visitorFunc = visitors[node.type];

    if(visitorFunc && typeof visitorFunc === 'function') {
        visitorFunc(node);
    }


    if (defination.visitor) {
        defination.visitor.forEach(key= > {
            const prop = node[key];
            if (Array.isArray(prop)) { // If the property is an array
                prop.forEach(childNode= >{ traverse(childNode, visitors); })}else{ traverse(prop, visitors); }}}})Copy the code

Let’s try it out:

traverse(ast, {
    Identifier(node) {
        node.name = 'b'; }});Copy the code

When I look at the AST again, I find that the Identifier name has changed from A to B

Babel’s visitor also supports specifying enter and exit to select calls before and after traversing the child node, and if a function is passed, it is treated as an Enter:

function traverse(node, visitors) {
    const defination = astDefinationsMap.get(node.type);

    let visitorFuncs = visitors[node.type] || {};

    if(typeof visitorFuncs === 'function') {
        visitorFuncs = {
            enter: visitorFuncs
        }
    }

    visitorFuncs.enter && visitorFuncs.enter(node);

    if (defination.visitor) {
        defination.visitor.forEach(key= > {
            const prop = node[key];
            if (Array.isArray(prop)) { // If the property is an array
                prop.forEach(childNode= >{ traverse(childNode, visitors); })}else {
                traverse(prop, visitors);
            }
        })
    }
    visitorFuncs.exit && visitorFuncs.exit(node);

}
Copy the code

Thus, our incoming visitor can also be written like this:

traverse(ast, {
    Identifier: {
        exit(node) {
            node.name = 'b'; }}});Copy the code

Is called after the child node has been traversed.

path

The visitor we implemented is the incoming node directly, but there is no information about the parent node in the AST, so we will pass the parent node in as well.

Babel provides the capability of path, which is a path from the current node to the root node, concatenated by parent.

We encapsulate a NodePath class:

class NodePath {
    constructor(node, parent, parentPath) {
        this.node = node;
        this.parent = parent;
        this.parentPath = parentPath; }}Copy the code

The path object is created when the visitor is called:

function traverse(node, visitors, parent, parentPath) {
    const defination = astDefinationsMap.get(node.type);

    let visitorFuncs = visitors[node.type] || {};

    if(typeof visitorFuncs === 'function') {
        visitorFuncs = {
            enter: visitorFuncs
        }
    }
    const path = new NodePath(node, parent, parentPath);

    visitorFuncs.enter && visitorFuncs.enter(path);

    if (defination.visitor) {
        defination.visitor.forEach(key= > {
            const prop = node[key];
            if (Array.isArray(prop)) { // If the property is an array
                prop.forEach(childNode= >{ traverse(childNode, visitors, node, path); })}else {
                traverse(prop, visitors, node, path);
            }
        })
    }
    visitorFuncs.exit && visitorFuncs.exit(path);
}
Copy the code

This way, we can get the parent node, the parent node of the parent node, in the visitor. Let’s try it:

traverse(ast, {
    Identifier: {
        exit(path) {
            path.node.name = 'b';
            let curPath = path;
            while (curPath) {
                console.log(curPath.node.type); curPath = curPath.parentPath; }}}});Copy the code

The print result is as follows:

The AST is available from the current node to the root node.

The path of the API

Parent can save, so can sibling, which means we can get all the AST through path. But working with the AST directly is a little tricky, so we’ll provide some apis to make it easier.

The first thing we need to do is to save the keys of the attributes of the AST we’re traversing and the list keys of the attributes we’re traversing if it’s an array.

class NodePath {
    constructor(node, parent, parentPath, key, listKey) {
        this.node = node;
        this.parent = parent;
        this.parentPath = parentPath;
        this.key = key;
        this.listKey = listKey; }}function traverse(node, visitors, parent, parentPath, key, listKey) {
    const defination = astDefinationsMap.get(node.type);

    let visitorFuncs = visitors[node.type] || {};

    if(typeof visitorFuncs === 'function') {
        visitorFuncs = {
            enter: visitorFuncs
        }
    }
    const path = new NodePath(node, parent, parentPath, key, listKey);

    visitorFuncs.enter && visitorFuncs.enter(path);

    if (defination.visitor) {
        defination.visitor.forEach(key= > {
            const prop = node[key];
            if (Array.isArray(prop)) { // If the property is an array
                prop.forEach((childNode, index) = >{ traverse(childNode, visitors, node, path, key, index); })}else {
                traverse(prop, visitors, node, path, key);
            }
        })
    }
    visitorFuncs.exit && visitorFuncs.exit(path);
}
Copy the code

Then we implement the replaceWith and Remove apis based on key and listKey:

class NodePath {
    constructor(node, parent, parentPath, key, listKey) {
        this.node = node;
        this.parent = parent;
        this.parentPath = parentPath;
        this.key = key;
        this.listKey = listKey;
    }
    replaceWith(node) {
        if (this.listKey) {
            this.parent[this.key].splice(this.listKey, 1, node);
        }
        this.parent[this.key] = node
    }
    remove () {
        if (this.listKey) {
            this.parent[this.key].splice(this.listKey, 1);
        }
        this.parent[this.key] = null; }}Copy the code

Effect under test:

traverse(ast, {
    NumericLiteral(path) {
        path.replaceWith({ type: 'Identifier'.name: 'bbbbbbb'}); }});Copy the code

The result is:

NumericLiteral is replaced with Identifier. We successfully implemented Path.replacewith.

path.scope

Path. scope is scoped information that records the binding of declared variables, their references, where they were modified (constantViolations), and the parent scope. Is an implementation of a static scope chain.

Implementation ideas:

First of all, functions, blocks and modules will generate scopes. When these AST are processed, a Scope object will be created, which has bindings attribute. A binding is created for each declaration (such as VariableDeclaration VariableDeclaration, function declaration FuncitonDeclaration, parameter, import, etc.)

References will be recorded when these scope binding are referenced through Identifier. If modified, the path corresponding to the AST of the modified statement will be recorded, such as the assignment statement.

You also need to provide a set of apis to simplify the analysis and operation of scopes, such as finding getBinding, removing removeBinding, renaming rename, and so on.

We won’t do the implementation here, but the complete implementation will be found in the Babel Plugin Playbook.

types

On traverse we implement the path.replaceWith API, which replaces the AST with a new AST. We pass literal objects directly in, which is a bit cumbersome. Babel provides the ability to create aN AST through the types package.

Creating the AST node is also a recursive process, and we need to make sure that every part is correct. We save the keys of the visitor while traversing, and we still create the AST corresponding to these keys at creation time, but we need to check the parameters passed in.

defineType("BinaryExpression", {
    builder: ["operator"."left"."right"].fields: {
      operator: {
        validate: assertOneOf(... BINARY_OPERATORS), },left: {
        validate: assertNodeType("Expression"),},right: {
        validate: assertNodeType("Expression"),}},visitor: ["left"."right"].aliases: ["Binary"."Expression"]});Copy the code

Babel internally defines the AST type creation logic with the defineType method, where the Fileds attribute contains what attributes the AST needs and how each attribute is checked. After the verification, the AST is created according to the corresponding parameters.

template

The Babel Template creates the AST in batches using strings, and we can implement a simple template based on Parser

function template(code) {
    return parse(code);
}
template.expression = function(code) {
    return template(code).body[0].expression;
}
Copy the code

The above code can be:

traverse(ast, {
    NumericLiteral(path) {     
        path.replaceWith(template.expression('bbb')); }});Copy the code

generate

The AST is generated, and the AST is printed as object code.

It’s just a concatenation of strings:

class Printer {
    constructor () {
        this.buf = ' ';
    }

    space() {
        this.buf += ' ';
    }

    nextLine() {
        this.buf += '\n';
    }

    Program (node) {
        node.body.forEach(item= > {
            this[item.type](item) + '; ';
            this.nextLine();
        });

    }
    VariableDeclaration(node) {
        this.buf += node.kind;
        this.space();
        node.declarations.forEach((declaration, index) = > {
            if(index ! =0) {
                this.buf += ', ';
            }
            this[declaration.type](declaration);
        });
        this.buf += '; ';
    }
    VariableDeclarator(node) {
        this[node.id.type](node.id);
        this.buf += '=';
        this[node.init.type](node.init);
    }
    Identifier(node) {
        this.buf += node.name;
    }
    NumericLiteral(node) {
        this.buf += node.value; }}class Generator extends Printer{

    generate(node) {
        this[node.type](node);
        return this.buf; }}function generate (node) {
    return new Generator().generate(node);
}
Copy the code

Let’s try it out:

const sourceCode = `
const a = 1,b=2,c=3;
const d=4,e=5;
`;

ast = parse(sourceCode);
traverse(ast, {
    NumericLiteral(path) {
        if (path.node.value === 2) {
            path.replaceWith(template.expression('aaaaa')); }}})console.log(generate(ast));
Copy the code

The print result is as follows:

const a=1,b=aaaaa,c=3;
const d=4,e=5;
Copy the code

We successfully implemented the generate method.

sourcemap

In addition to printing the object code, the generator generates the Sourcemap, which is an important function of the translator.

The sourcemap implementation is also simple:

After parse, the AST retains the location information (column and column numbers) in the source code. The new column and column numbers are computed when the object code is printed. With the new and old column numbers, sourcemap can be generated using the source-Map API.

var map = new SourceMapGenerator({
  file: "source-mapped.js"
});

map.addMapping({
  generated: {
    line: 10.column: 35
  },
  source: "foo.js".original: {
    line: 33.column: 2
  },
  name: "christopher"
});

console.log(map.toString());
// '{"version":3,"file":"source-mapped.js",
// "sources":["foo.js"],"names":["christopher"],"mappings":";;;;;;;;; mCAgCEA"}'
Copy the code

core

We have implemented the full flow function above, but usually we rarely use API, more use @babel/core of the whole flow, so we will implement the core package based on the above package, then support plugin and PRESET.

function transformSync(code, options) {
    const ast = parse(code);

    const pluginApi = {
        template
    }
    const visitors = {};
    options.plugins.forEach(([plugin, options]) = > {
        const res = plugin(pluginApi, options);
        Object.assign(visitors, res.visitor);
    })

    traverse(ast, visitors);
    return generate(ast);
}
Copy the code

The plugin supports passing in options, and the API and options are available in the plugin. The return value is a visitor function:

const sourceCode = `
const a = 1;
`;

const code = transformSync(sourceCode, {
    plugins: [[function plugin1(api, options) {
                return {
                    visitor: {
                        Identifier(path) {
                                // path.node.value = 2222;path.replaceWith(api.template.expression(options.replaceName)); }}}}, {replaceName: 'ddddd'}}]]);console.log(code);

Copy the code

The result is:

const ddddd=1;
Copy the code

At this point we have completed a simple version implementation of all Babel’s built-in features. Helper is a package that holds public functions, and Runtime is an API that is introduced at runtime. These two packages are simpler, but not implemented. This will be implemented in detail in the Babel Plugin Secret Book.)

conclusion

We combed through Babel’s compilation process and the functionality of the built-in packages, and then identified the packages we wanted to implement: Parser, Traverse, generate, types, Template, core. Next in turn to do the realization or comb the realization of ideas.

The Parser package is based on Acorn, Babel is a fork from Acorn, and we are modifying the AST directly based on the Acorn plug-in. We implemented an extension to Literal AST.

Traverse packets are responsible for traversing the AST. We implement a depth-first traversal of the AST by recording the visitor key and calling the visitor during the traversal, as well as supporting the enter and exit phases. The arguments passed in to the visitor support path, get the parent, and call apis such as replaceWith and Remove. We also combed through the ideas for implementing Scope.

Types and template are used to create aN AST. We have combed through the implementation of types, which is to recursively create an AST and then assemble it. We have implemented a simple template by parse directly from strings.

The generate package prints the modified AST into object code and generates the Sourcemap, which we do. The sourcemap thread is combed.

The core package is an integration of the entire build process and supports plugins and preset, we have implemented the transformSync API and support for plugin calls.

The above is Babel implementation idea, refinement is able to achieve a complete function of Babel.

This is the final example of the Babel Plugin tutorial. The implementation ideas and code in the tutorial will be clearer, and the source code will also be available.

The computer broke down this weekend, and the code may have been lost, so it has to be kept for a while. But a lot of people are looking forward to the release of this little book, I really don’t want to, so you are interested in “Easy to write Babel” implementation ideas, hope to help you better grasp Babel and similar compilers. (I’ll finish the book as soon as MY computer is fixed.)