What is an AST

An Abstract Syntax Tree (AST for short) is an abstract representation of the syntactic structure of source code. It represents that structure as a tree, with each node in the tree corresponding to a construct in the source code.

What is an AST useful for

ASTs are widely used, for example:

  • Editors use them for error prompts, code formatting, code highlighting, and code auto-completion;
  • ESLint and Prettier use them to check code for errors and style;
  • webpack uses Babel to transpile JavaScript syntax;

And if you want to understand how JS is built and executed, you need to know about ASTs.

How an AST is generated

The first step of JS execution is to read the character stream of the JS file. Lexical analysis then turns the characters into tokens, the parser turns the tokens into an AST, and finally the engine generates executable code (bytecode or machine code, depending on the engine) from the AST and runs it.

The parsing process itself is divided into the following two steps:

  • Lexical analysis (tokenization): splits the entire code string into an array of minimal syntactic units
  • Syntax analysis: builds on the tokens to establish and analyze the relationships between those units

A JS parser converts JS source code into an AST. Common parsers include Esprima, Traceur, Acorn, and Shift.

Lexical analysis

Lexical analysis, also known as scanning, simply means reading the source character by character (for example by repeatedly calling a next() method) and comparing the characters with the defined JavaScript keywords and symbols to generate the corresponding tokens. A token is the smallest indivisible unit:

For example, the three characters var only make sense as a whole and cannot be broken down any further semantically, so they form a single token.

In lexical analysis, each keyword is a token, each identifier is a token, each operator is a token, and each punctuation mark is a token. In addition, comments and whitespace characters (newlines, spaces, tabs, and so on) are filtered out of the source program.

In the end, the entire code is split into a list of tokens (a one-dimensional array).
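The scanning loop described above can be sketched in a few lines. This is only a minimal illustration, not how Acorn or any other real parser is implemented: the tokenize function, the KEYWORDS set, and the token type names are assumptions made for this example, and strings, comments, and most operators are left out.

```javascript
// Minimal tokenizer sketch (hypothetical helper, for illustration only).
const KEYWORDS = new Set(["var", "let", "const", "function", "return"]);

function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    const ch = source[pos];
    if (/\s/.test(ch)) {
      pos++; // whitespace is skipped, it never becomes a token
    } else if (/[A-Za-z_$]/.test(ch)) {
      // read a whole word, then decide whether it is a keyword or an identifier
      let end = pos;
      while (end < source.length && /[A-Za-z0-9_$]/.test(source[end])) end++;
      const word = source.slice(pos, end);
      tokens.push({ type: KEYWORDS.has(word) ? "Keyword" : "Identifier", value: word });
      pos = end;
    } else if (/[0-9]/.test(ch)) {
      let end = pos;
      while (end < source.length && /[0-9]/.test(source[end])) end++;
      tokens.push({ type: "Numeric", value: source.slice(pos, end) });
      pos = end;
    } else if (source.startsWith("=>", pos)) {
      tokens.push({ type: "Punctuator", value: "=>" });
      pos += 2;
    } else if ("=+-*/;(){},.".includes(ch)) {
      tokens.push({ type: "Punctuator", value: ch });
      pos++;
    } else {
      throw new SyntaxError(`Unexpected character "${ch}" at position ${pos}`);
    }
  }
  return tokens;
}

const sampleTokens = tokenize("var a = 1;");
console.log(sampleTokens.map(t => `${t.type}:${t.value}`).join(" "));
// prints "Keyword:var Identifier:a Punctuator:= Numeric:1 Punctuator:;"
```

Note that var comes out as one Keyword token, and that the spaces between the words do not appear in the result at all.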

Syntax analysis

Syntax analysis transforms the tokens produced by lexical analysis into an abstract syntax tree that carries grammatical meaning. It also validates the syntax and throws a syntax error if it finds one.
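To make this step concrete, here is a minimal recursive-descent sketch that turns a token list into an ESTree-style node and throws a SyntaxError on invalid input. It only understands statements of the form `var <name> = <operand> + <operand> ... ;`. The parseVarDeclaration function and the token shape ({ type, value }) are assumptions made for this example, not the API of any real parser.

```javascript
// Minimal parser sketch (hypothetical helper): tokens in, AST node out.
function parseVarDeclaration(tokens) {
  let pos = 0;
  const peek = () => tokens[pos];

  // Consume the next token if it matches, otherwise report a syntax error.
  const expect = (type, value) => {
    const tok = tokens[pos];
    if (!tok || tok.type !== type || (value !== undefined && tok.value !== value)) {
      const found = tok ? `"${tok.value}"` : "end of input";
      throw new SyntaxError(`Expected ${value ?? type} but found ${found}`);
    }
    pos++;
    return tok;
  };

  // An operand is either a number literal or an identifier.
  const parsePrimary = () => {
    const tok = peek();
    if (tok && tok.type === "Numeric") {
      pos++;
      return { type: "Literal", value: Number(tok.value) };
    }
    return { type: "Identifier", name: expect("Identifier").value };
  };

  // a + b + c is grouped to the left: (a + b) + c
  const parseAdditive = () => {
    let left = parsePrimary();
    while (peek() && peek().type === "Punctuator" && peek().value === "+") {
      pos++;
      left = { type: "BinaryExpression", operator: "+", left, right: parsePrimary() };
    }
    return left;
  };

  expect("Keyword", "var");
  const id = { type: "Identifier", name: expect("Identifier").value };
  expect("Punctuator", "=");
  const init = parseAdditive();
  expect("Punctuator", ";");
  return {
    type: "VariableDeclaration",
    kind: "var",
    declarations: [{ type: "VariableDeclarator", id, init }],
  };
}

// Tokens for: var a = 1 + b;
const declarationTokens = [
  { type: "Keyword", value: "var" },
  { type: "Identifier", value: "a" },
  { type: "Punctuator", value: "=" },
  { type: "Numeric", value: "1" },
  { type: "Punctuator", value: "+" },
  { type: "Identifier", value: "b" },
  { type: "Punctuator", value: ";" },
];
const declarationAst = parseVarDeclaration(declarationTokens);
console.log(JSON.stringify(declarationAst, null, 2));
```

Dropping the final `;` token from the input makes the function throw a SyntaxError, which is the validation part of syntax analysis.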

With that said, here is what a few simple JavaScript snippets look like once they are converted into an AST.

🌰 example 1

const fn = a => a;

From this syntax tree we can clearly see what each part of the code means and which syntax, methods, and so on it uses.

Here, const fn is initialized with an arrow function expression whose parameter is a and whose function body is a.
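For reference, the tree for this line can also be written out as a plain object. The following is a hand-written, abridged sketch of the ESTree-style shape that a parser such as Acorn produces for example 1; position fields (start, end, loc) and a few other properties are left out, and the variable names are only for illustration.

```javascript
// Hand-written, abridged ESTree-style AST for: const fn = a => a;
// Position fields (start, end, loc) are omitted for readability.
const example1Ast = {
  type: "Program",
  sourceType: "script",
  body: [
    {
      type: "VariableDeclaration",
      kind: "const",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "fn" },
          init: {
            type: "ArrowFunctionExpression",
            params: [{ type: "Identifier", name: "a" }],
            body: { type: "Identifier", name: "a" },
            expression: true, // the body is an expression, not a { ... } block
            async: false,
          },
        },
      ],
    },
  ],
};

const arrowNode = example1Ast.body[0].declarations[0].init;
console.log(arrowNode.type, arrowNode.params[0].name, arrowNode.body.name);
// prints "ArrowFunctionExpression a a"
```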

🌰 example 2

const fn = a => {
  let i = 1;
  return a + i;
};

Let’s look at the body:
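Written out by hand, the body of this arrow function has roughly the following ESTree-style shape. This is an abridged sketch with position fields omitted. The number is shown as a Literal node, which is what ESTree parsers such as Acorn use; Babel's parser names the same node NumericLiteral.

```javascript
// Hand-written, abridged ESTree-style AST for the body of example 2:
// { let i = 1; return a + i; }
const example2Body = {
  type: "BlockStatement",
  body: [
    {
      type: "VariableDeclaration",
      kind: "let",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "i" },
          init: { type: "Literal", value: 1, raw: "1" },
        },
      ],
    },
    {
      type: "ReturnStatement",
      argument: {
        type: "BinaryExpression",
        operator: "+",
        left: { type: "Identifier", name: "a" },
        right: { type: "Identifier", name: "i" },
      },
    },
  ],
};

console.log(example2Body.body.map(statement => statement.type).join(", "));
// prints "VariableDeclaration, ReturnStatement"
```

Unlike example 1, the function body is now a BlockStatement that holds a list of statements instead of a single expression.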

🌰 example 3

A function call

function test() {
  let a = 1;
  console.log(a)
}

The main thing to notice here is the MemberExpression node.

The screenshots above were produced with Acorn. I chose Acorn because, as far as I know, it is regarded as one of the fastest parsers. webpack uses it, and Babel's parser is based on it as well.

The properties shown in the screenshots are only part of the AST; its nodes carry many more properties. For example:

  • VariableDeclaration: a variable declaration
  • VariableDeclarator: the description of a single declared variable
  • Expression: an expression node

To learn about more properties:

  1. Go to AST Explorer to view online the ASTs that different parsers generate from JavaScript code.
  2. See the complete ESTree specification on GitHub.
  3. Read the document "Abstract syntax tree (AST) introduction" for an introduction to the properties.

Practical AST application

The task

Use the AST to add the name of the enclosing function in front of the arguments of every console.log(xx) call, so that the printed output shows which function made the call.

For example:

// The source code
function getData() {
  console.log("data");
}

// --------------------

// The converted code
function getData() {
  console.log("getData", "data");
}

Introduction

Let's start with the tools we need, which all come from Babel:

  • @babel/parser: parses JS code into an AST (abstract syntax tree);
  • @babel/traverse: recursively traverses the AST nodes;
  • @babel/types: helps build and modify specific AST nodes;
  • @babel/generator: generates new JS code from the AST;

Why Babel? Mainly because it is relatively easy to use (and it is the only one I am really familiar with 😭).

@babel/parser, which is based on Acorn, parses the JS code into an AST.
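The recursive traversal with visitors can be pictured with a dependency-free sketch. The simpleTraverse function below is a hypothetical helper written for this illustration. It is not the @babel/traverse API, which passes a path object with many more capabilities instead of the bare node.

```javascript
// Dependency-free sketch of visitor-style traversal over an ESTree-like tree.
// `simpleTraverse` is a hypothetical helper, not the @babel/traverse API.
function simpleTraverse(node, visitor, parent = null) {
  // Only objects with a string `type` are treated as AST nodes.
  if (!node || typeof node.type !== "string") return;
  const visit = visitor[node.type];
  if (visit) visit(node, parent);
  // Recurse into every property that holds a node or a list of nodes.
  for (const key of Object.keys(node)) {
    const child = node[key];
    if (Array.isArray(child)) {
      child.forEach(item => simpleTraverse(item, visitor, node));
    } else if (child && typeof child === "object") {
      simpleTraverse(child, visitor, node);
    }
  }
}

// Hand-written, abridged tree for the statement: console.log(a)
const callStatement = {
  type: "ExpressionStatement",
  expression: {
    type: "CallExpression",
    callee: {
      type: "MemberExpression",
      object: { type: "Identifier", name: "console" },
      property: { type: "Identifier", name: "log" },
      computed: false,
    },
    arguments: [{ type: "Identifier", name: "a" }],
  },
};

const visitedNames = [];
simpleTraverse(callStatement, {
  Identifier(node) {
    visitedNames.push(node.name);
  },
});
console.log(visitedNames.join(", ")); // prints "console, log, a"
```

The visitor object is keyed by node type, so adding a handler for another type such as CallExpression is just one more method on the same object.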

Start coding

  1. Create a new folder, open a console in it, and install the required packages:
cnpm i @babel/parser @babel/traverse @babel/types @babel/generator -D
  2. Create a JS file and write the rough skeleton of the AST workflow as follows:
const generator = require("@babel/generator");
const parser = require("@babel/parser");
const traverse = require("@babel/traverse");
const types = require("@babel/types");

function compile(code) {
  // 1. Parse code into an abstract syntax tree (AST)
  const ast = parser.parse(code);

  // 2. Traverse the AST and transform it (no visitors yet)
  traverse.default(ast, {});

  // 3. Generator converts the AST back into code
  return generator.default(ast, {}, code);
}

const code = `
function getData() {
  console.log("data")
}
`;
const newCode = compile(code);
// The generator returns an object; the generated source is in its code property
console.log(newCode.code);

Run the file with Node. Since nothing is transformed yet, the output is the same as the original code.

Improving the compile method

function compile(code) {
  // 1. Parse
  const ast = parser.parse(code);

  // 2. Traverse
  const visitor = {
    CallExpression(path) {
      // Get the callee of the call
      const { callee } = path.node;
      // Check whether this is a console.log call:
      // 1. the callee is a member expression node (see the screenshot above)
      // 2. its object is console
      // 3. its property is log
      const isConsoleLog =
        types.isMemberExpression(callee) &&
        callee.object.name === "console" &&
        callee.property.name === "log";
      if (isConsoleLog) {
        // Find the nearest parent node that is a function declaration
        const funcPath = path.findParent(p => {
          return p.isFunctionDeclaration();
        });
        // Skip calls that are not inside a function declaration
        if (!funcPath) return;
        // Take the function name
        const funcName = funcPath.node.id.name;
        // Use types to build a string literal and put it in front of the call arguments
        path.node.arguments.unshift(types.stringLiteral(funcName));
      }
    }
  };
  // Traverse the AST and apply the visitor
  traverse.default(ast, visitor);

  // 3. Generate code from the AST again
  return generator.default(ast, {}, code);
}

Pure code can be a little hard to follow, so here is the path.node from above written out to a file to show its data format.

{
  "type": "CallExpression", "start": 24, "end": 43,
  "loc": {
    "start": { "line": 3, "column": 2 },
    "end": { "line": 3, "column": 21 }
  },
  "callee": {
    "type": "MemberExpression", "start": 24, "end": 35,
    "loc": {
      "start": { "line": 3, "column": 2 },
      "end": { "line": 3, "column": 13 }
    },
    "object": {
      "type": "Identifier", "start": 24, "end": 31,
      "loc": {
        "start": { "line": 3, "column": 2 },
        "end": { "line": 3, "column": 9 },
        "identifierName": "console"
      },
      "name": "console"
    },
    "property": {
      "type": "Identifier", "start": 32, "end": 35,
      "loc": {
        "start": { "line": 3, "column": 10 },
        "end": { "line": 3, "column": 13 },
        "identifierName": "log"
      },
      "name": "log"
    },
    "computed": false
  },
  "arguments": [
    {
      "type": "StringLiteral", "start": 36, "end": 42,
      "loc": {
        "start": { "line": 3, "column": 14 },
        "end": { "line": 3, "column": 20 }
      },
      "extra": { "rawValue": "data", "raw": "'data'" },
      "value": "data"
    }
  ]
}

With the unnecessary start, end, and loc attributes removed, the data can be read at a glance against the code.
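One way to strip those attributes is a small recursive copy. The stripPositions helper below is a hypothetical function written for this illustration, and callNode is an abridged, hand-copied version of the CallExpression data rather than real parser output.

```javascript
// Hypothetical helper: returns a copy of an AST node without position fields.
const POSITION_KEYS = new Set(["start", "end", "loc"]);

function stripPositions(value) {
  if (Array.isArray(value)) return value.map(stripPositions);
  if (value === null || typeof value !== "object") return value;
  const result = {};
  for (const [key, child] of Object.entries(value)) {
    if (!POSITION_KEYS.has(key)) result[key] = stripPositions(child);
  }
  return result;
}

// Abridged copy of the CallExpression data, used here only for illustration.
const callNode = {
  type: "CallExpression",
  start: 24,
  end: 43,
  callee: {
    type: "MemberExpression",
    start: 24,
    end: 35,
    object: { type: "Identifier", start: 24, end: 31, name: "console" },
    property: { type: "Identifier", start: 32, end: 35, name: "log" },
    computed: false,
  },
  arguments: [{ type: "StringLiteral", start: 36, end: 42, value: "data" }],
};

console.log(JSON.stringify(stripPositions(callNode), null, 2));
```

What remains is the part the visitor actually reads: a callee that is a MemberExpression with the object console and the property log, plus the list of arguments.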

Run the file again:

The console.log call now has the function name added in front of its arguments.

For your convenience, the following is the complete code

const generator = require("@babel/generator");
const parser = require("@babel/parser");
const traverse = require("@babel/traverse");
const types = require("@babel/types");
const fs = require("fs");


function compile(code) {
  // 1. Parse
  const ast = parser.parse(code);

  // 2. Traverse
  const visitor = {
    CallExpression(path) {
      const { callee } = path.node;
      const isConsoleLog =
        types.isMemberExpression(callee) &&
        callee.object.name === "console" &&
        callee.property.name === "log";
      if (isConsoleLog) {
        const funcPath = path.findParent(p => {
          return p.isFunctionDeclaration();
        });
        // Skip calls that are not inside a function declaration
        if (!funcPath) return;
        const funcName = funcPath.node.id.name;
        // Dump the function node to a file to inspect its data format.
        // writeFileSync is synchronous: it takes no callback and throws on failure.
        fs.writeFileSync("./funcPath.json", JSON.stringify(funcPath.node));
        console.log("Write succeeded");
        path.node.arguments.unshift(types.stringLiteral(funcName));
      }
    }
  };
  traverse.default(ast, visitor);

  // 3. Generate
  return generator.default(ast, {}, code);
}

const code = `
function getData() {
  console.log('data')
}
`;
console.log(compile(code).code);

If you are comfortable with all of this, you now have a good grasp of what an AST is and how Babel compiles code, and writing webpack configurations will feel less unfamiliar in the future.

Conclusion

We usually rely on webpack (through Babel) to compile our code and downgrade ES6 syntax for compatibility with older browsers, for example turning arrow functions into normal functions or changing const and let declarations to var. All of this is done through the AST. The real implementation is more complicated and refined, but the principle is the same:

  1. Parse the JS code into an AST;
  2. Modify the AST;
  3. Generate JS code from the AST;

Finally

If you have time, try some common code transformations yourself, such as turning an arrow function into a normal function. It is a good way to deepen your understanding.
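As a starting point for that exercise, here is a dependency-free sketch of the node rewrite on a plain ESTree-style object. The arrowToFunction helper is hypothetical, and it deliberately ignores the this and arguments binding differences between arrow functions and normal functions, which real tools such as Babel have to handle separately.

```javascript
// Hypothetical helper: rewrites an ESTree-style ArrowFunctionExpression node
// in place into a FunctionExpression. It does not handle `this` or
// `arguments`, so it is only safe for arrow functions that use neither.
function arrowToFunction(node) {
  if (node.type !== "ArrowFunctionExpression") return node;
  node.type = "FunctionExpression";
  node.id = null; // the resulting function expression is anonymous
  if (node.body.type !== "BlockStatement") {
    // `a => a` has an expression body; wrap it as `{ return a; }`
    node.body = {
      type: "BlockStatement",
      body: [{ type: "ReturnStatement", argument: node.body }],
    };
  }
  node.expression = false;
  return node;
}

// Hand-written node for: a => a
const arrowFn = {
  type: "ArrowFunctionExpression",
  params: [{ type: "Identifier", name: "a" }],
  body: { type: "Identifier", name: "a" },
  expression: true,
};

arrowToFunction(arrowFn);
console.log(arrowFn.type, arrowFn.body.body[0].type);
// prints "FunctionExpression ReturnStatement"
```

In a Babel-based version, the same rewrite would live in an ArrowFunctionExpression visitor, following the parse, modify, generate steps listed above.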

If anything in this article is wrong or not rigorous enough, please do point it out. Thank you!

References

  • ESTree on GitHub

  • Babel’s official website

  • Abstract syntax tree