What is an abstract syntax tree (AST)?

As I got further down the front-end road, I started to discover all sorts of great build tools: webpack, Vite, ESLint, Babel… All of them are built on a data structure called the AST. So what is an AST? AST stands for Abstract Syntax Tree.

What is an abstract syntax tree, one might ask? Why is it abstract?

Before answering, let’s look at how a piece of code is organized:

const a = 'hello world';

The code above is ordinary JavaScript of the kind we write every day. An observant reader will notice that its left and right sides are joined by the = sign.

Now look at the AST generated from this code:

{
  "type": "Program",
  "start": 0,
  "end": 24,
  "body": [
    {
      "type": "VariableDeclaration",
      "start": 0,
      "end": 24,
      "declarations": [
        {
          "type": "VariableDeclarator",
          "start": 6,
          "end": 23,
          "id": {
            "type": "Identifier",
            "start": 6,
            "end": 7,
            "name": "a"
          },
          "init": {
            "type": "Literal",
            "start": 10,
            "end": 23,
            "value": "hello world",
            "raw": "'hello world'"
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "module"
}

Comparing the code before and after the conversion shows why the tree is called abstract.

The original code is held together by delimiters such as quotes, the = sign, and the semicolon. The converted form drops those delimiters and records only the structure of the code, described here as JSON. It abstracts away how the code was written and keeps what the code is made of.

Put simply, an AST is a tree of nodes (a multi-way tree) that represents source code. The code string is parsed into this cleanly organized structure, shown here as JSON, so that tools can work on the structure instead of manipulating the string directly.
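
To make the contrast with raw string manipulation concrete, here is a small sketch. The sample source and the simplified, hand-written tree are assumptions made for illustration, not real parser output. It renames the variable a to b twice: once by editing the string, once by editing a tree.

```javascript
// Naive approach: rename the variable `a` to `b` by editing the raw string.
const source = "const a = 'a banana';";
const naive = source.replace(/\ba\b/g, 'b');
console.log(naive); // const b = 'b banana';  (the string literal changed too)

// Structured approach: a hand-written, simplified tree for the same code
// (hypothetical shape, flattened for brevity).
const ast = {
  type: 'VariableDeclaration',
  kind: 'const',
  id: { type: 'Identifier', name: 'a' },
  init: { type: 'Literal', value: 'a banana' },
};

// Only the Identifier node is touched; the Literal node is left alone.
if (ast.id.type === 'Identifier' && ast.id.name === 'a') {
  ast.id.name = 'b';
}
console.log(ast.id.name, '|', ast.init.value); // b | a banana
```

The string version cannot tell a variable name from text inside a string literal, while the tree version can, because each node carries its own type.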

Tools for converting JavaScript to an AST

JavaScript code can be converted to an AST with a JavaScript parser. The most common options are:

Esprima (a popular standalone parser)
Babylon (used in Babel, now published as @babel/parser)
Acorn (used in webpack)
Espree (derived from Acorn, used in ESLint)
AST Explorer (an online tool that shows the AST in real time and lets you switch between JavaScript parsers)

The examples in this article are implemented using Esprima.

How is code converted to an AST?

There are two important stages in transforming code into an AST: Lexical Analysis and Syntax Analysis.

Lexical analysis

Also known as tokenization or scanning, lexical analysis is the process of converting a string of code into a sequence of tokens. A token is a short string that forms the smallest meaningful unit of source code, similar to a word in English, so lexical analysis is like grouping letters into words. It does not concern itself with the relationships between words. For example, parentheses are recognized as tokens during lexical analysis, but whether they are correctly matched is not checked.

Tokens in JavaScript mainly fall into the following types:

Keywords: var, let, const, etc.
Identifiers: contiguous runs of characters not enclosed in quotes, typically variable names. A simple tokenizer may first read words such as if, else, true, or false this way and then reclassify them as keywords or built-in constants
Operators: +, -, *, /, etc.
Numbers: hexadecimal, decimal, octal, and scientific notation
Strings: quoted text, such as the value assigned to a variable
Whitespace: consecutive spaces, line breaks, indentation, etc.
Comments: line comments and block comments, each treated as a single unit that is not split further
Punctuation: braces, parentheses, semicolons, colons, etc.

For example, lexical analysis of const a = 'hello world' produces the following tokens:

[
    {
        "type": "Keyword",
        "value": "const"
    },
    {
        "type": "Identifier",
        "value": "a"
    },
    {
        "type": "Punctuator",
        "value": "="
    },
    {
        "type": "String",
        "value": "'hello world'"
    }
]
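
A token stream like this one can be produced by a small hand-written tokenizer. The sketch below is only an illustration of the idea and is not how Esprima is implemented: the keyword set, the regular expressions, and the names KEYWORDS, RULES, and tokenize are all assumptions, and comments, template strings, and most operators are left out.

```javascript
// A minimal tokenizer sketch. Type names follow the token list shown above;
// the keyword set and the patterns are simplified assumptions.
const KEYWORDS = new Set(['var', 'let', 'const', 'if', 'else', 'function', 'return']);

// Each rule is tried in order at the current position (sticky `y` flag).
const RULES = [
  { type: null, regex: /\s+/y }, // whitespace is skipped
  { type: 'String', regex: /'[^'\n]*'|"[^"\n]*"/y },
  { type: 'Numeric', regex: /\d+(\.\d+)?/y },
  { type: 'Identifier', regex: /[A-Za-z_$][\w$]*/y },
  { type: 'Punctuator', regex: /[=+\-*\/;,(){}]/y },
];

function tokenize(code) {
  const tokens = [];
  let pos = 0;
  while (pos < code.length) {
    let matched = false;
    for (const { type, regex } of RULES) {
      regex.lastIndex = pos;
      const m = regex.exec(code);
      if (!m) continue;
      matched = true;
      pos = regex.lastIndex;
      if (type !== null) {
        const value = m[0];
        // Reserved words look like identifiers, so reclassify them.
        const isKeyword = type === 'Identifier' && KEYWORDS.has(value);
        tokens.push({ type: isKeyword ? 'Keyword' : type, value });
      }
      break;
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character "${code[pos]}" at position ${pos}`);
    }
  }
  return tokens;
}

console.log(tokenize("const a = 'hello world'"));
```

Note that tokenize("((") returns two Punctuator tokens without complaint. Checking that parentheses match is left to the next stage.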

Syntax analysis

Also known as parsing, syntax analysis is the process of converting the tokens produced by lexical analysis into an AST according to a formal grammar. This is like putting words together into sentences. The syntax is validated along the way, and a syntax error is thrown if the tokens do not form a valid program.

Syntax analysis of const a = 'hello world' produces the following AST:

{
  "type": "Program",
  "body": [
    {
      "type": "VariableDeclaration",
      "declarations": [
        {
          "type": "VariableDeclarator",
          "id": {
            "type": "Identifier",
            "name": "a"
          },
          "init": {
            "type": "Literal",
            "value": "hello world",
            "raw": "'hello world'"
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "script"
}
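
To show how tokens become a tree like this, here is a minimal parser sketch for a single statement shape: a declaration keyword, an identifier, an = sign, and a string literal. It is an assumption-laden illustration, not Esprima's algorithm; the names parse and expect are hypothetical, and a real parser handles the full grammar.

```javascript
// A minimal parser sketch for one statement shape:
//   (const | let | var) <identifier> = <string literal> [;]
// It consumes tokens in the { type, value } format shown earlier.
function parse(tokens) {
  let pos = 0;

  // Consume the next token if it matches, otherwise report a syntax error.
  function expect(type, value) {
    const token = tokens[pos];
    const ok = token && token.type === type &&
      (value === undefined || token.value === value);
    if (!ok) {
      const wanted = value === undefined ? type : value;
      const found = token ? token.value : 'end of input';
      throw new SyntaxError(`Expected ${wanted} but found ${found}`);
    }
    pos += 1;
    return token;
  }

  const keyword = expect('Keyword');
  if (!['const', 'let', 'var'].includes(keyword.value)) {
    throw new SyntaxError(`Unexpected keyword ${keyword.value}`);
  }
  const id = expect('Identifier');
  expect('Punctuator', '=');
  const str = expect('String');
  // A trailing semicolon is optional in this sketch.
  if (tokens[pos] && tokens[pos].value === ';') pos += 1;
  if (pos < tokens.length) {
    throw new SyntaxError(`Unexpected token ${tokens[pos].value}`);
  }

  return {
    type: 'Program',
    body: [{
      type: 'VariableDeclaration',
      declarations: [{
        type: 'VariableDeclarator',
        id: { type: 'Identifier', name: id.value },
        // Strip the surrounding quotes to get the literal's value.
        init: { type: 'Literal', value: str.value.slice(1, -1), raw: str.value },
      }],
      kind: keyword.value,
    }],
    sourceType: 'script',
  };
}

const tokens = [
  { type: 'Keyword', value: 'const' },
  { type: 'Identifier', value: 'a' },
  { type: 'Punctuator', value: '=' },
  { type: 'String', value: "'hello world'" },
];
console.log(JSON.stringify(parse(tokens), null, 2));
```

If a token is missing or out of place, for example when the = sign is dropped, expect throws a SyntaxError. This is the validation step described above.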

Once we have the AST, we can analyze it and build our own tooling on top of it. A simple example is renaming a variable in the code.

Practice

Let’s rename the variable a defined in the code above to b. To do this, we convert the source code into an AST, modify the contents of the tree, and then convert the AST back into code. In other words, we go through three steps: parsing -> transformation -> generation.

First we need to see how the AST of the source code differs from the AST of the target code. The AST generated from const b = 'hello world' is as follows:

{
  "type": "Program",
  "body": [
    {
      "type": "VariableDeclaration",
      "declarations": [
        {
          "type": "VariableDeclarator",
          "id": {
            "type": "Identifier",
            "name": "b" // this is the difference
          },
          "init": {
            "type": "Literal",
            "value": "hello world",
            "raw": "'hello world'"
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "script"
}

Comparing the two trees shows that the only difference is the value of the name property on the id node, whose type is Identifier. We can therefore get the result we want by modifying that node in the AST.

Besides esprima, we need to install the packages estraverse (for traversing the AST) and escodegen (for generating JavaScript from the AST).
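
Before reaching for a library, it may help to see what traversing the AST means. The sketch below is a simplified depth-first walk written for illustration. It is not estraverse's implementation, and the names walk and visited are hypothetical. It calls enter on every object that has a type property.

```javascript
// A simplified depth-first walk over an AST, for illustration only.
function walk(node, visitor) {
  if (node === null || typeof node !== 'object') return;
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visitor));
    return;
  }
  // Anything with a string `type` is treated as an AST node.
  if (typeof node.type === 'string') visitor.enter(node);
  // Recurse into every property that may hold child nodes.
  for (const key of Object.keys(node)) {
    walk(node[key], visitor);
  }
}

// The AST of const a = 'hello world', as shown earlier.
const tree = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'a' },
      init: { type: 'Literal', value: 'hello world', raw: "'hello world'" },
    }],
    kind: 'const',
  }],
  sourceType: 'script',
};

const visited = [];
walk(tree, { enter(node) { visited.push(node.type); } });
console.log(visited.join(' > '));
// Program > VariableDeclaration > VariableDeclarator > Identifier > Literal
```

A traversal library adds more on top of this idea, such as knowing which properties hold child nodes for each node type, but the visiting order of parents before children is the part that matters for the rename.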

const esprima = require('esprima');
const estraverse = require('estraverse');
const escodegen = require('escodegen');

const program = "const a = 'hello world'";

// 1. Parse: source code -> AST
const ASTree = esprima.parseScript(program);

// 2. Transform: rename the identifier a to b
function changeAToB(node) {
  if (node.type === 'Identifier' && node.name === 'a') {
    node.name = 'b';
  }
}

estraverse.traverse(ASTree, {
  enter(node) {
    changeAToB(node);
  },
});

// 3. Generate: AST -> source code
const codeAfterChange = escodegen.generate(ASTree);

console.log(codeAfterChange); // const b = 'hello world';

As you can see, the rename takes only a few lines. With a working knowledge of ASTs we can do a lot more, and the various Babel plugins are built in the same way, just with different libraries.
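
As a rough sketch of that claim, the same rename can be written in the shape of a Babel plugin: a function that returns an object with a visitor, whose methods receive a path wrapping the AST node. The plugin name and the hand-built fakePath object below are assumptions made so the example runs without Babel installed. In real use Babel creates the path objects and calls the visitor itself.

```javascript
// Sketch of the a -> b rename in the shape of a Babel plugin.
function renameAToBPlugin() {
  return {
    name: 'rename-a-to-b', // hypothetical plugin name
    visitor: {
      // Called for every Identifier node; `path.node` is the AST node.
      Identifier(path) {
        if (path.node.name === 'a') {
          path.node.name = 'b';
        }
      },
    },
  };
}

// Stand-in for the path object Babel would normally provide.
const fakePath = { node: { type: 'Identifier', name: 'a' } };
renameAToBPlugin().visitor.Identifier(fakePath);
console.log(fakePath.node.name); // b
```

With @babel/core installed, a function like this would typically be passed in the plugins option when transforming code. That step is not exercised here.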

To learn how to implement a Babel plugin, see the official Babel plugin handbook.

References

AST Abstract syntax tree
Abstract syntax tree AST
Transformation of mediocre front-end code farmers – AST
Manual of the Babel plug-in
