What is an abstract syntax tree (Abstract Syntax Tree ,AST)?

Baidu Baike explains it this way:

In computer science, an Abstract Syntax Tree (AST), or Syntax Tree for short, is an Abstract representation of the syntactic structure of source code. It represents the syntactic structure of a programming language as a tree, with each node in the tree representing a structure in the source code.

It still sounds convoluted, but you can simply think of it as a tree-like, structured representation of the code you’re writing.

With this tree, we can manipulate this tree, precise positioning to statement, assignment statement, operation statement and so on, to achieve code analysis, optimization, change and other operations.

AST may be hard to cover in your daily business, you probably haven’t heard of it yet, but in many cases you already use it, you just don’t pay much attention to it. Popular plugins and packages such as WebPack, ESLint, etc

What can an abstract syntax tree do?

When it comes to the uses of AST, there are a lot of them. Here are a few:

  • IDEError prompt, code formatting, code highlighting, code completion, etc
  • JSLint,JSHintCode errors or style checks, etc
  • webpack,rollupCode packaging and so on
  • CoffeeScript,TypeScript,JSXAnd so on into the originalJavascript

If you want to write awesome frameworks like React and Vue, or build your own automated front-end packaging tools like WebPack and Rollup, then you must understand AST.

How do I generate an AST?

Before you know how to generate an AST, it is important to know about Parser (common parsers are Esprima, Traceur, Acorn, Shift, etc.)

The JS Parser is a Parser that converts JS source code into an abstract syntax tree (AST).

The whole parsing process is divided into the following two steps:

  • Word segmentation: Splits the entire code string into an array of minimal syntax units
  • Grammatical analysis: to establish and analyze the relationships between grammatical units based on word segmentation

What is a grammar unit?

A grammatical unit is the smallest unit of parsed grammar that has practical meaning and is simply understood as a word in a natural language.

Take, for example, the following:

2019 marks the 70th anniversary of our motherland

We can break this sentence down into the smallest units, namely: 2019, is, motherland, 70th anniversary.

This is what we call a participle, and it’s the smallest unit, because if we break it up again, it doesn’t really make any sense.

Syntax units in Javascript code include the following:

  • Keywords: for examplevar,let,constEtc.
  • Identifier: A contiguous character that is not enclosed in quotation marks, which may or may not be a variableif,elseThese keywords, or maybetrue,falseThese built-in constants
  • Operator:+,-,*,/
  • Numbers: syntax such as hexadecimal, decimal, octal, and scientific expressions
  • String: Because the contents of a string are evaluated or displayed by the computer
  • Space: successive Spaces, line feeds, indentation, etc
  • Comment: A line comment or a block comment is the smallest syntax unit that cannot be split
  • Others: braces, parentheses, semicolons, colons, etc

If we take the simplest copy statement as an example, right?

    var a = 1;
Copy the code

Through word segmentation, we can get the following results:

[{"type": "Keyword"."value": "var"
    },
    {
        "type": "Identifier"."value": "a"
    },
    {
        "type": "Punctuator"."value": "="
    },
    {
        "type": "Numeric"."value": "1"
    },
    {
        "type": "Punctuator"."value": ";"}]Copy the code

To keep things simple, I’m using esprima/ Parser

What is grammatical analysis?

We have obtained the results of word segmentation above, which need to carry out a three-dimensional combination of words, determine the relationship between words and determine the final expression meaning of words.

Simply put, parsing is a recursive process of identifying statements and expressions and determining their relationships.

Through the above grammatical analysis, we can get the following results:

{
    "type": "Program"."body": [{"type": "VariableDeclaration"."declarations": [{"type": "VariableDeclarator"."id": {
                        "type": "Identifier"."name": "a"
                    },
                    "init": {
                        "type": "Literal"."value": 1,
                        "raw": "1"}}]."kind": "var"}]."sourceType": "script"
}
Copy the code

This is the AST converted by var a = 1;

Astexplorer is a visual tool that can directly convert code to AST

How does AST work?

There is a lot of talk about what AST is and how it is generated, but at the end of the day, I still don’t know what AST is for and how to use it.

Ok ~ let’s witness the miracle together.

I’m sure most of you are familiar with the Babel library, which is a necessary part of the front-end modular development process, because it can help you convert ECMAScript 2015+ code into backward compatible JavaScript syntax. In order to be able to run on current and older browsers or other environments, you don’t have to worry about compatibility with the new syntax

In fact, much of Babel’s functionality is implemented by modifying the AST.

First, let’s look at a simple example of how we can convert an arrow function in ES6 to a normal function in ES5:

const sum=(a,b)=>a+b;
Copy the code

How can we convert the simple sum arrow function above to the following form:

const sum = function(a,b){
    return a+b;
}
Copy the code

Think about it. What’s the idea?

If you don’t know AST, this is a very difficult problem to start with. If you know AST, this is a very easy example.

Now let’s see how do we do that?

Install dependencies

To manipulate the AST code, we need to use two libraries, @babel/core and babel-types.

Where @babel/core is the core library of Babel, used to realize the core conversion engine, babel-types type judgment, used to generate AST parts

CD to a directory you like, after initialization with NPM -y init, install dependencies with NPM i@babel /core babel-types -d

Target analysis

Before we can do the transformation, we need to do some analysis. First, let’s use AstExplorer to see how the target code differs from our current code.

The AST structure of the source code is as follows:

// Source code AST {"type": "Program"."start": 0."end": 21."body": [{"type": "VariableDeclaration"."start": 0."end": 21."declarations": [{"type": "VariableDeclarator"."start": 6,
                    "end": 20."id": {
                        "type": "Identifier"."start": 6,
                        "end": 9,
                        "name": "sum"
                    },
                    "init": {
                        "type": "ArrowFunctionExpression"."start": 10,
                        "end": 20."id": null,
                        "expression": true."generator": false."async": false."params": [{"type": "Identifier"."start": 11."end": 12."name": "a"
                            },
                            {
                                "type": "Identifier"."start": 13."end": 14."name": "b"}]."body": {
                            "type": "BinaryExpression"."start": 17."end": 20."left": {
                                "type": "Identifier"."start": 17."end": 18."name": "a"
                            },
                            "operator": "+"."right": {
                                "type": "Identifier"."start": 19,
                                "end": 20."name": "b"}}}}],"kind": "const"}]."sourceType": "module"
}
Copy the code

The AST structure of the object code is as follows:

// Object code 'AST' {"type": "Program"."start": 0."end": 48."body": [{"type": "VariableDeclaration"."start": 0."end": 48."declarations": [{"type": "VariableDeclarator"."start": 6,
                    "end": 47."id": {
                        "type": "Identifier"."start": 6,
                        "end": 9,
                        "name": "sum"
                    },
                    "init": {
                        "type": "FunctionExpression"."start": 12."end": 47."id": null,
                        "expression": false."generator": false."async": false."params": [{"type": "Identifier"."start": 22."end": 23."name": "a"
                            },
                            {
                                "type": "Identifier"."start": 25."end": 26."name": "b"}]."body": {
                            "type": "BlockStatement"."start": 28."end": 47."body": [{"type": "ReturnStatement"."start": 32."end": 45,
                                    "argument": {
                                        "type": "BinaryExpression"."start": 39,
                                        "end": 44,
                                        "left": {
                                            "type": "Identifier"."start": 39,
                                            "end": 40."name": "a"
                                        },
                                        "operator": "+"."right": {
                                            "type": "Identifier"."start": 43."end": 44,
                                            "name": "b"}}}]}}}],"kind": "const"}]."sourceType": "module"
}
Copy the code

We don’t care about the start and end, they just mark where the code is.

We care about what’s inside the body, and the main difference is init, one is rowFunctionExpression, the other is FunctionExpression, We just need to change the contents under ArrowFunctionExpression to the same FunctionExpression.

A profound

We build an arrow.js file and import the above two libraries, i.e

Const Babel = require(const Babel = require('@babel/core'Const types = require(const types = require('babel-types') // Source code const code = 'const sum=(a,b)=>a+b; '// Object code const sum =function(a,b){ return a + b }
Copy the code

Here we need to use the transform method in Babel, which can transform JS code into AST. In the process, we can use plugins to transform AST, and finally generate new AST and JS code. A relatively appropriate figure on the Internet shows the whole process:

For details on how to use Babel Transform, there is not much discussion here. If you are interested, go to the official website for more information

Its main uses are as follows:

// Babel converts code to ast, iterates, and outputs codelet result = babel.transform(code,{
    plugins:[
    	{
            visitor
    	}
    ]
})

Copy the code

The core lies in the implementation of the plug-in visitor.

It is a plug-in object that can process nodes of a specific type. The node we need to process here is ArrowFunctionExpression, which can be configured in two common ways:

One is a single processing, structured as follows, where path represents the current path traversed. Path. node represents the node of the current variable

let visitor = {
    ArrowFunctionExpression(path){
    
    }
}
Copy the code

The other is used for input and output bidirectional processing, the structure is as follows, the parameter node represents the node currently traversed

let visitor = {
     ArrowFunctionExpression : {
    	enter(node){
    	    
    	},
     	leave(node){
     	    
     	}
    }
}
Copy the code

We’re only going to do it once, so we’re going to do it the first way.

By analyzing the AST of the object code, we find that we need a FunctionExpression, in which case we need babel-types, which helps us produce these nodes

The parameters required to build FunctionExpression are as follows:

By referring to the AST, we can see that its ID is null, params is params in the original ArrowFunctionExpression, and body is a blockStatement. We can also check the babel-types document. Use T.lock Statement(body, directives) and so on and so on and the result is as follows:

/ / the original paramsletparams = path.node.params; Create a blockStatementlet blockStatement = types.blockStatement([
    	types.returnStatement(types.binaryExpression(
            '+',
            types.identifier('a'),
            types.identifier('b')))); // Create a functionlet func = types.functionExpression(null, params, blockStatement, false.false);
Copy the code

Finally pass path.replacewith (func); Just replace it;

The completion code is as follows:

Const Babel = require(const Babel = require('@babel/core'Const types = require(const types = require('babel-types') // Source code const code = 'const sum=(a,b)=>a+b; '// Object code const sum =function(a,b){ returnA + b} // plug-in object that can handle nodes of a specific typeletVisitor = {// represents processing ArrowFunctionExpression node ArrowFunctionExpression(path){letparams = path.node.params; Create a blockStatementlet blockStatement = types.blockStatement([
            types.returnStatement(types.binaryExpression(
            	'+',
            	types.identifier('a'),
            	types.identifier('b')))); // Create a functionlet func = types.functionExpression(null, params, blockStatement, false.false); / / replace path. ReplaceWith (func); // Babel converts code to ast, iterates, and outputs codelet result = babel.transform(code,{
    plugins:[
        {
            visitor
        }
    ]
})

console.log(result.code);

Copy the code

Execute the code and print the following result:

At this point, our function conversion is complete, achieving the desired effect.

How about it? It’s amazing!!

In fact, it is not that complicated, mainly to analyze the structure of its AST, as long as we understand what we need to do, then go ahead and do it

Pass: the above code can be optimized, because our returnStatement is the same as the source code returnStatement, so we just need to reuse it. Therefore, the above code can also be changed to the following:

    let blockStatement = types.blockStatement([
        types.returnStatement(path.node.body)
    ]);
Copy the code

Strike while the iron is hot

Now that you’ve seen how to modify code using AST, try this problem again.

How to convert es6 class to ES5 function

The source code

Class Person {constructor(name) {this.name=name; }sayName() {
      returnthis.name; }}Copy the code

The target code

// Object codefunction Person(name) {
    this.name = name;
}

Person.prototype.getName = function () {
    return this.name;
};

Copy the code

With the above foundation, follow the cat’s path, here I will not repeat, the process is very important, here I only paste the core conversion code for reference.

ClassDeclaration (path) {
    letnode = path.node; // The current nodeletid = node.id; / / the node idletmethods = node.body.body; // Array of methodsletconstructorFunction = null; // constructorlet functions= []; ForEach (method => {// if it is a constructorif ( method.kind === 'constructor' ) {
    		constructorFunction = types.functionDeclaration(id, method.params, method.body, false.false);
    		functions.push(constructorFunction)
    	} else{// Common methodlet memberExpression = types.memberExpression(types.memberExpression(id, types.identifier('prototype'), false), method.key, false);
    		let functionExpression = types.functionExpression(null, method.params, method.body, false.false);
    		let assignmentExpression = types.assignmentExpression('=', memberExpression, functionExpression); functions.push(types.expressionStatement(assignmentExpression)); }}) // replaceWithMultiple for multiple substitutionsif(functions.length === 1){
    	path.replaceWith(functions[0])}else{
    	path.replaceWithMultiple(functions)}}Copy the code

conclusion

In our daily work, most of the time, we only focus on the JS code itself, rather than using AST to re-understand and interpret our own code ~

This article is only through some introduction to AST, it will serve as a primer for you to have a preliminary understanding of it, not feel so strange to it.

The quest for code is endless

If you want, you can build any JS code you want from it

Come on!