Before you read this article, open up your package.json project and you’ll find that many tools are already in every corner of your development routine: JavaScript translation, CSS preprocessing, code compression, ESLint, Prettier, and so on. Most of these tool modules will not be delivered into production, but their presence is essential to our development.

Have you ever wondered how the functionality of these tools is implemented? Yes, the Abstract Syntax Tree is the cornerstone of this tool.

What is AST & how is it generated

The AST is a tree representation of the abstract syntactic structure of source code. Each node in the tree represents a construct that appears in the source code.

So how is AST generated? Why do YOU need AST?

Those of you who know compilation know that a computer needs to go through a “long” analysis process to understand a string of source code:

  1. Lexical Analysis
  2. Syntax Analysis
  3. .
  4. Code Generation

  • Lexical analysis

In the lexical analysis stage, input source code strings are scanned to generate a series of tokens, including numbers, punctuation marks, and operators. Lexical units are independent of each other, meaning that at this stage we don’t care how each line of code is put together.

Lexical analysis stage — as when we first learned English, we divided a sentence into many independent words. We first remembered the type and meaning of each word, but did not care about the specific relationship between words.

  • Syntax analysis

Then, the syntax analysis phase transforms the token list generated in the previous phase into the AST shown on the right side of the figure below. Based on this data structure, you can roughly see the basic structure of the source code before conversion.

Grammar analysis stage – the teacher teaches us the specific role and meaning of each word in the context of the whole sentence.

  • Code generation

Finally, there is the code generation phase, which is a very free-form process that can consist of multiple steps. At this stage we can iterate over the initial AST, modify its structure, and then generate the corresponding code string from the modified structure.

Code generation phase – we know the grammatical structure of each sentence and how to write a grammatically correct English sentence. With this basic structure, we can perfectly convert an English sentence into a Chinese sentence or classical Chinese (if you can).

Basic structure of AST

Regardless of the specific compiler and programming language, everything in the AST world is Node. Nodes of different types are nested into each other to form a complete tree structure.

{
  "program": {
    "type": "Program"."sourceType": "module"."body": [{"type": "FunctionDeclaration"."id": {
          "type": "Identifier"."name": "foo"
        },
        "params": [{"type": "Identifier"."name": "x"}]."body": {
          "type": "BlockStatement"."body": [{"type": "IfStatement"."test": {
                "type": "BinaryExpression"."left": {
                  "type": "Identifier"."name": "x"
                },
                "operator": ">"."right": {
                  "type": "NumericLiteral"."value": 10}}}]}... }... ] }Copy the code

The structure of AST varies from language compiler to language compiler, from compiler to compiler, and even from language version to language version. Here are some basic definitions of AST structure in ESTree, the current common specification for JavaScript compilers. Different compilation tools are based on the corresponding extension of this structure.

AST usage & Combat 🌰

Application scenario and usage

After understanding the concept and structure of AST, you may ask: What are the application scenarios of AST and how to use it?

As mentioned in the beginning, the dependencies and VSCode plug-ins in our current project have revealed the answer. AST can be used in a wide range of scenarios, such as front-end development:

  • Code highlighting, formatting, error prompts, autocomplete, etc. : ESlint, Prettier, Vetur, etc.
  • Code compression confusion: uglifyJS etc.
  • Code translation: Webpack, Babel, TypeScript, etc.

As for how to use the AST, it can be summarized into the following steps:

  1. Resolution (Parsing): This process is implemented by the compiler and goes throughLexical analysis process and grammatical analysisProcess, thereby generatingAST.
  1. Read/Traverse: Depth-first traversalASTTo access the information of each Node in the tree.
  1. Modify/Transform: You can modify node information to generate new nodes during traversalAST.
  1. Output (Printing): the initialASTAfter the transformation, depending on the scenario, you can directly output the newAST, can also be translated into new code blocks.

Typically with an AST, we focus on steps 2 and 3. Tools such as Babel, ESLint, and others expose the common ability to access and modify the original AST.

The two-step implementation is based on a design pattern called the Visitor pattern, which defines a visitor object on which access methods to various types of nodes are defined so that different processing can be done for different nodes. For example, writing the Babel plug-in is essentially constructing a visitor instance to process the individual node information to produce the desired result.

const visitor = {

    CallExpression(path){... }FunctionDeclaration(path){... }ImportDeclaration(path){... }... } traverse(AST, visitor)Copy the code

In actual combat

In the last part of the series, we’ll look at how Bable can be used to manipulate the AST.

The development tools

  • AST Explorer: An online AST conversion tool that integrates multiple languages and parsers
  • @babel/ Parser: Parses JS code into the corresponding AST
  • Babel /traverse: Recursively traverse AST nodes
  • @babel/types: integrates methods to quickly build, modify, and delete AST nodes
  • Babel /generator: Generates new JS code based on the modified AST

🌰

Objective: Convert normal log printing in all functions to error printing, and append a string of function names to the print

// Before

function add(a, b) {

    console.log(a + b)

    return a + b

}



// => After

function add(a, b) {

    console.error('add', a + b)

    return a + b

}
Copy the code

Ideas:

  • Iterates through all function CallExpression (CallExpression) nodes
  • Change the attribute of the function call method from log to error
  • Find the FunctionDeclaration parent node and extract the function name information
  • The function name information is wrapped as StringLiteral nodes and inserted into the array of parameter nodes in the function call expression
const compile = (code) = > {

  // 1. tokenizer + parser

  const ast = parser.parse(code)

  // 2. traverse + transform

  const visitor = {

    // Access the function call expression

    CallExpression(path) {

      const { callee } = path.node

      if (types.isCallExpression(path.node) && types.isMemberExpression(callee)) {

        const { object, property } = callee

        // Change the attribute of the member expression to log -> error

        if (object.name === 'console' && property.name === 'log') {

          property.name = 'error'

        } else {

          return

        }

        // Walk up to find the function declaration node in the parent node of the function calling node

        const FunctionDeclarationNode = path.findParent(parent= > {

          return parent.type === 'FunctionDeclaration'

        })

        // Extract the function name information, wrap it into a string literal node, and insert it into the parameter array of the current node

        constfuncNameNode = types.stringLiteral(FunctionDeclarationNode.node.id.name) path.node.arguments.unshift(funcNameNode) } } }  traverse.default(ast, visitor)// 3. code generator

  const newCode = generator.default(ast, {}, code).code

}
Copy the code

🌰 🌰

Goal: Add error capture for all functions and implement custom processing operations during the capture phase

// Before

function add(a, b) {

  console.log('23333')

  throw new Error('233 Error')

  return a + b;

}



// => After

function add(a, b) {

  // Only synchronous code execution errors can be caught

  try {    

    console.log('23333')

    throw new Error('233 Error')

    return a + b;

  } catch (myError) {

      mySlardar(myError) // Custom processing (eg: function error automatically reported)}}Copy the code

Ideas:

  • Iterate over the FunctionDeclaration node
  • Extract the entire code block node under this node and process the contents of the block as a try statement (tryStatement)
  • Construct a custom catch clause node as part of the try exception handling block
  • Treat the entire try statement node as a child of a new function declaration node and replace the original function declaration node with the newly generated node
const compile = (code) = > {

  // 1. tokenizer + parser

  const ast = parser.parse(code)

  // utils.writeAst2File(ast) // View ast results

  // 2. traverse

  const visitor = {

    FunctionDeclaration(path) {

      const node = path.node

      const { params, id } = node // Function parameters and function body points

      const blockStatementNode = node.body

      // Stop traversal of try-catch blocks to prevent circle loops

      if (blockStatementNode.body && types.isTryStatement(blockStatementNode.body[0]) {return

      }

      // Construct a cATH block node

      const catchBlockStatement = types.blockStatement(

        [types.expressionStatement(

          types.callExpression(types.identifier('mySlardar'), [types.identifier('myError'))))// Catch clause node

      const catchClause = types.catchClause(types.identifier('myError'), catchBlockStatement)

      // Try statement node

      const tryStatementNode = types.tryStatement(blockStatementNode, catchClause)

      // The try-catch node declares the node as a new function

      const tryCatchFunctionDeclare = types.functionDeclaration(id, params, types.blockStatement([tryStatementNode]))

      path.replaceWith(tryCatchFunctionDeclare)

    }

  }

  traverse.default(ast, visitor)

  // 3. code generator

  const newCode = generator.default(ast, {}, code).code

}
Copy the code

🌰 🌰 🌰

Goal: Implement on-demand import of imports in Webpack (beggar version babel-import-plugin)

// Before

import { Button as Btn, Dialog } from '233_UI'

import { HHH as hhh } from '233_UI'Set custom parameters:(moduleName) = > `233_UI/lib/src/${moduleName}/${moduleName} `



// => After

import { Button as Btn } from "233_UI/lib/src/Button/Button"

import { Dialog } from "233_UI/lib/src/Dialog/Dialog"

import { HHH as hhh } from "233_UI/lib/src/HHH/HHH"
Copy the code

Ideas:

  • Specify a custom find file path rule in the context state in which the plug-in is running
  • Traverse the import declaration node (ImportDeclaration)
  • Extract all imported variable nodes from the import node (ImportSpecifier)
  • The value of this node is generated by finding the file path rule to generate a new import source path, there are several import nodes have several new source paths
  • Combine the imported node with the source path node to generate a new import declaration node and replace it
// Beggar edition introduces the Babel plugin on demand

const visitor = ({types}) = > {

  return {

    visitor: {

      ImportDeclaration(path, {opts}) {

        const _getModulePath = opts.moduleName // Get the module's specified path, passed in by the plug-in's arguments

        

        const importSpecifierNodes = path.node.specifiers // The object node to import

        const importSourceNode = path.node.source // Import the source node

        const sourceNodePath = importSourceNode.value

        // Nodes that have been successfully replaced are no longer traversed

        if(! opts.libaryName || sourceNodePath ! == opts.libaryName) {return

        }

        

        const modulePaths = importSpecifierNodes.map(node= > {

          return _getModulePath(node.imported.name)

        })

        const newImportDeclarationNodes = importSpecifierNodes.map((node, index) = > {

          return types.importDeclaration([node], types.stringLiteral(modulePaths[index]))

        })

        path.replaceWithMultiple(newImportDeclarationNodes)

      }

    }

  }

}



const result = babel.transform(code, {

  plugins: [

    [

      visitor,

      {

        libaryName: '233_UI'.moduleName: moduleName= > `233_UI/lib/src/${moduleName}/${moduleName}`}}]])Copy the code

Detailed code and repository addresses for the above three 🌰 examples can be found at github.com/xunhui/ast_…

conclusion

We probably don’t spend a lot of time working with aN AST in our daily lives, and we don’t care much about the underlying compiler principles of the language, but knowing an AST can help us better understand the principles of everyday development tools, and make it easier to work with the apis exposed by those tools.

Every day at work, we pour out our joys and sorrows to the machine in front of us through line after line of code. How it reads your feelings and responds to them is something worth exploring 🙂

reference

ASTs – What are they and how to use them

AST implementation function errors are automatically reported

Babel Handbook