Overview

Babel can compile ECMAScript 2015+ syntax into ES5 syntax. For example:

const square = n => n * n;

is converted to:

"use strict";

var square = function square(n) {
  return n * n;
};

You can try this in the Babel REPL.

How does Babel do it?

The natural idea is that JS code is really just a long string, and Babel replaces one string with another, so a string substitution program should be able to do the job.

But once you start writing it, you will find you cannot get anywhere. First, the substitution rules would have to be very complicated, and doing the substitution with regular expressions makes those expressions very hard to write and maintain. Second, ES6 has some complex syntactic sugar, such as class. How would you implement that? Simple substitution cannot handle it easily.
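To make the problem concrete, here is a minimal sketch of the string substitution idea. The helper naiveArrowToFunction and its regular expression are made up for illustration and are not part of Babel. It handles the one-line example, but it also rewrites text inside a string literal, because a regular expression has no idea which part of the source is code and which part is data.

```javascript
// Naive approach: treat the source as a plain string and rewrite arrows with a regex.
// naiveArrowToFunction is a hypothetical helper for illustration only.
function naiveArrowToFunction(source) {
  // Matches `param => expression;` for a single bare parameter.
  return source.replace(
    /(\w+)\s*=>\s*([^;]+);/g,
    "function ($1) { return $2; };"
  );
}

const simple = "const square = n => n * n;";
const inString = 'const msg = "use n => n * n; here";';

// The simple case looks fine.
console.log(naiveArrowToFunction(simple));
// The second case is broken: the contents of the string literal get rewritten too.
console.log(naiveArrowToFunction(inString));
```

Patching the expression to skip strings, comments, template literals, nested arrows and so on quickly turns into writing a parser by hand.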

So we need the help of a data structure: the abstract syntax tree (AST).

Babel works in three phases: parse, transform, and generate.

Clone the Babel repository locally and build it from the repository root.

yarn install
npm run build

Create a new file named test.js in the repository root:

const { parse } = require("./packages/babel-parser");
const traverse = require("./packages/babel-traverse").default;
const generate = require("./packages/babel-generator").default;

const code = "const square = n => n * n";

// parse the code -> ast
const ast = parse(code);

// transform the ast
traverse(ast, {
  enter(path) {
    // only arrow functions are rewritten; every other node is left alone
    if (path.node.type !== "ArrowFunctionExpression") return;

    path.arrowFunctionToExpression({
      allowInsertArrow: false,
      noNewArrows: true,
      specCompliant: false,
    });
  },
});

// generate code <- ast
// generate(ast, options, code): the options object comes before the source
const output = generate(ast, {}, code);
console.log(output.code);
// const square = function (n) {
//   return n * n;
// };

Run this code and the converted code is printed to the console. parse, traverse and generate correspond to the three stages of the conversion process. I will now try to explain each of these stages.

code -> AST -> transformed AST -> transformed code

Parsing

The parsing stage has two steps, lexical analysis and syntax analysis, which together turn a JS file into an abstract syntax tree (AST).

Here is our first term: what is an AST? If you have read The Definitive Guide to VS Code, you probably remember it. An AST contains all the information needed to analyze a piece of code (keywords, variable names, variable values, etc.) and drops the information that is not needed (punctuation, comments, etc.). Seeing is believing, so let's start by generating an AST with Babel's parser. This part of the code lives in babel/packages/babel-parser.

We print out the AST in test.js:

console.log(ast)

You can see the AST that the Babel parser outputs. It is very long: our source is a single one-line declaration, yet the corresponding AST is almost 200 lines. Let's look at a simplified version with the code location information removed.

{
    "type": "File",
    "program": {
        "type": "Program",
        "sourceType": "script",
        "body": [{
            "type": "VariableDeclaration",
            "declarations": [{
                "type": "VariableDeclarator",
                "id": { "type": "Identifier", "name": "square" },
                "init": {
                    "type": "ArrowFunctionExpression",
                    "id": null,
                    "generator": false,
                    "async": false,
                    "params": [{ "type": "Identifier", "name": "n" }],
                    "body": {
                        "type": "BinaryExpression",
                        "left": { "type": "Identifier", "name": "n" },
                        "operator": "*",
                        "right": { "type": "Identifier", "name": "n" }
                    }
                }
            }],
            "kind": "const"
        }]
    }
}

Each node has a type field that identifies what kind of node it is, such as Program, VariableDeclaration (variable declaration), ArrowFunctionExpression (arrow function), BinaryExpression (binary expression), etc. Each kind of node has its own structure. For example, BinaryExpression contains left, right, and operator, representing n * n, and ArrowFunctionExpression contains params and body, representing (...params) => body.

So now we know what an AST is:

  • It is an N-ary tree.
  • Each node contains at least two kinds of information: type, the node type, and the information needed to describe a node of that type (this is enough to regenerate the code later in the AST -> code phase).
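To see why type plus the per-type fields are enough to regenerate code, here is a minimal sketch of a code generator for the simplified AST shown earlier. The function toCode is made up for illustration and covers only the node types in that example. It is not how babel-generator is implemented, and it ignores details such as operator precedence and formatting.

```javascript
// toCode is a hypothetical, minimal code generator covering only the node
// types that appear in the simplified AST of `const square = n => n * n`.
function toCode(node) {
  switch (node.type) {
    case "File":
      return toCode(node.program);
    case "Program":
      return node.body.map(toCode).join("\n");
    case "VariableDeclaration":
      return `${node.kind} ${node.declarations.map(toCode).join(", ")};`;
    case "VariableDeclarator":
      return `${toCode(node.id)} = ${toCode(node.init)}`;
    case "ArrowFunctionExpression":
      return `(${node.params.map(toCode).join(", ")}) => ${toCode(node.body)}`;
    case "BinaryExpression":
      return `${toCode(node.left)} ${node.operator} ${toCode(node.right)}`;
    case "Identifier":
      return node.name;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// The simplified AST, written by hand.
const n = { type: "Identifier", name: "n" };
const ast = {
  type: "File",
  program: {
    type: "Program",
    body: [{
      type: "VariableDeclaration",
      kind: "const",
      declarations: [{
        type: "VariableDeclarator",
        id: { type: "Identifier", name: "square" },
        init: {
          type: "ArrowFunctionExpression",
          params: [n],
          body: { type: "BinaryExpression", left: n, operator: "*", right: n },
        },
      }],
    }],
  },
};

console.log(toCode(ast)); // const square = (n) => n * n;
```

The punctuation in the output (=, =>, ;, parentheses) is not stored anywhere in the tree. It is put back by the generator based on each node's type.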

My recent work is developing a low-code platform, so this structure feels very familiar: it looks a lot like the underlying data structure of our platform's configuration items (if you are not familiar with low-code platforms, you can have a look here). It also suddenly reminded me of how a browser parses HTML documents and CSS files into a DOM tree and a CSSOM tree. These two trees could also be seen as ASTs.

Any language can be parsed into an AST; it is an intermediate product of parsing and compiling in many languages. How is it generated? Note: the following process is theoretical and is not exactly how Babel implements it.

Two steps: lexical analysis and syntax analysis

Lexical analysis, as the name implies, works out what each word means. First, a scanner scans the code and splits it into lexemes, such as words and punctuation: const square = n => n * n; is split into [const, square, =, n, =>, n, *, n, ;]. At this point it does not matter what language the code is written in. The tokenizer then interprets the lexemes according to the language in use: const is identified as a keyword and => is identified as an arrow. const is a keyword in JS, but not necessarily in other languages. The resulting tokens are:

[
    {
        "type": "Keyword",
        "value": "const"
    },
    {
        "type": "Identifier",
        "value": "square"
    },
    {
        "type": "Punctuator",
        "value": "="
    },
    {
        "type": "Identifier",
        "value": "n"
    },
    {
        "type": "Punctuator",
        "value": "=>"
    },
    {
        "type": "Identifier",
        "value": "n"
    },
    {
        "type": "Punctuator",
        "value": "*"
    },
    {
        "type": "Identifier",
        "value": "n"
    },
    {
        "type": "Punctuator",
        "value": ";"
    }
]
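The two lexical steps can be sketched in a few lines. The function tokenize and the small KEYWORDS and PUNCTUATORS tables below are made up for illustration and only cover what the example needs. A real tokenizer, including Babel's, also handles numbers, strings, comments, regular expressions and much more.

```javascript
// A deliberately tiny tokenizer, just enough for `const square = n => n * n;`.
const KEYWORDS = new Set(["const", "let", "var", "function", "return"]);
// Longer punctuators come first so that "=>" wins over "=".
const PUNCTUATORS = ["=>", "=", "*", ";"];

function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];

    // Whitespace separates lexemes but produces no token.
    if (/\s/.test(ch)) {
      i++;
      continue;
    }

    // Words: scan the whole lexeme, then classify it as keyword or identifier.
    if (/[A-Za-z_$]/.test(ch)) {
      let j = i + 1;
      while (j < source.length && /[A-Za-z0-9_$]/.test(source[j])) j++;
      const value = source.slice(i, j);
      tokens.push({ type: KEYWORDS.has(value) ? "Keyword" : "Identifier", value });
      i = j;
      continue;
    }

    // Punctuation: take the first (longest) punctuator that matches here.
    const punct = PUNCTUATORS.find((p) => source.startsWith(p, i));
    if (punct) {
      tokens.push({ type: "Punctuator", value: punct });
      i += punct.length;
      continue;
    }

    throw new Error(`Unexpected character "${ch}" at index ${i}`);
  }
  return tokens;
}

const tokens = tokenize("const square = n => n * n;");
console.log(tokens);
```

The language-specific part lives in the tables: a different keyword set would classify const as an ordinary identifier.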

Once we have tokens, we can do syntax analysis. The parser converts the tokens into a parse tree, also known as a concrete syntax tree (CST).

If we look at the CST carefully, we can see a lot of useless information. For example, many nodes have only one child. Such nodes can be collapsed away entirely, because they give us no additional information.

If we look at the tree after this compression, we can see that some punctuation marks and operators are already expressed by the structure of the N-ary tree itself, so we can simplify it a little more.

We end up with the structure we wanted: an AST, which is much more abstract and compact than the CST.
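As a rough illustration of syntax analysis, the sketch below is a tiny recursive descent parser that turns the token list for our example into an AST. It builds the AST directly and skips the intermediate CST. The function parseTokens and the three grammar rules in its comments are assumptions made for this example only, and Babel's parser is far more complete.

```javascript
// Hand-written token list for: const square = n => n * n;
const tokens = [
  { type: "Keyword", value: "const" },
  { type: "Identifier", value: "square" },
  { type: "Punctuator", value: "=" },
  { type: "Identifier", value: "n" },
  { type: "Punctuator", value: "=>" },
  { type: "Identifier", value: "n" },
  { type: "Punctuator", value: "*" },
  { type: "Identifier", value: "n" },
  { type: "Punctuator", value: ";" },
];

// parseTokens is a hypothetical parser for this tiny grammar only.
function parseTokens(tokens) {
  let pos = 0;
  const peek = () => tokens[pos];
  const expect = (type, value) => {
    const token = tokens[pos];
    if (!token || token.type !== type || (value !== undefined && token.value !== value)) {
      throw new Error(`Expected ${value || type} at token ${pos}`);
    }
    pos++;
    return token;
  };

  const parseIdentifier = () => ({ type: "Identifier", name: expect("Identifier").value });

  // multiplicative := identifier ("*" identifier)*
  const parseMultiplicative = () => {
    let left = parseIdentifier();
    while (peek() && peek().type === "Punctuator" && peek().value === "*") {
      pos++; // the "*" token becomes the operator field, not a node of its own
      left = { type: "BinaryExpression", left, operator: "*", right: parseIdentifier() };
    }
    return left;
  };

  // expression := identifier "=>" expression | multiplicative
  const parseExpression = () => {
    const next = tokens[pos + 1];
    if (peek() && peek().type === "Identifier" && next && next.value === "=>") {
      const param = parseIdentifier();
      expect("Punctuator", "=>");
      return { type: "ArrowFunctionExpression", params: [param], body: parseExpression() };
    }
    return parseMultiplicative();
  };

  // declaration := "const" identifier "=" expression ";"
  const kind = expect("Keyword", "const").value;
  const id = parseIdentifier();
  expect("Punctuator", "=");
  const init = parseExpression();
  expect("Punctuator", ";");
  return { type: "VariableDeclaration", kind, declarations: [{ type: "VariableDeclarator", id, init }] };
}

const ast = parseTokens(tokens);
console.log(JSON.stringify(ast, null, 2));
```

The tokens =, => and ; are consumed but never stored: where each node sits in the tree already says what they said.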

You can go to this website and type in any code to see its tokens and AST.

Once we have the AST, we can operate on it and transform it into the structure we want for our code. This part is called transformation…
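As a taste of what operating on the AST means, here is a minimal sketch that rewrites arrow functions into function expressions by walking a plain object tree. The helper transformArrows is made up for illustration. Unlike Babel's arrowFunctionToExpression, it does nothing about this or arguments, so it is only safe for arrows that do not use them.

```javascript
// transformArrows is a hypothetical helper: it walks an AST-like object and
// returns a copy in which arrow functions are replaced by function expressions.
function transformArrows(node) {
  if (Array.isArray(node)) return node.map(transformArrows);
  if (node === null || typeof node !== "object") return node;

  // Transform the children first, then look at the node itself.
  const copy = {};
  for (const [key, child] of Object.entries(node)) {
    copy[key] = transformArrows(child);
  }
  if (copy.type !== "ArrowFunctionExpression") return copy;

  // `n => n * n` has an expression body; a function needs `{ return n * n; }`.
  const body = copy.body.type === "BlockStatement"
    ? copy.body
    : { type: "BlockStatement", body: [{ type: "ReturnStatement", argument: copy.body }] };
  return { ...copy, type: "FunctionExpression", body };
}

// Hand-written AST for `n => n * n`.
const arrow = {
  type: "ArrowFunctionExpression",
  params: [{ type: "Identifier", name: "n" }],
  body: {
    type: "BinaryExpression",
    left: { type: "Identifier", name: "n" },
    operator: "*",
    right: { type: "Identifier", name: "n" },
  },
};

const fn = transformArrows(arrow);
console.log(fn.type, fn.body.body[0].type);
```

The input tree is left untouched and a new tree is returned, which keeps the sketch easy to reason about. Babel's traverse instead mutates the tree in place through path objects.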
