Before you read this article, open up your package.json project and you’ll find that many tools are already in every corner of your development routine: JavaScript translation, CSS preprocessing, code compression, ESLint, Prettier, and so on. Most of these tool modules will not be delivered into production, but their presence is essential to our development.
Have you ever wondered how the functionality of these tools is implemented? Yes, the Abstract Syntax Tree is the cornerstone of this tool.
What is AST & how is it generated
The AST is a tree representation of the abstract syntactic structure of source code. Each node in the tree represents a construct that appears in the source code.
So how is AST generated? Why do YOU need AST?
Those of you who know compilation know that a computer needs to go through a “long” analysis process to understand a string of source code:
- Lexical Analysis
- Syntax Analysis
- .
- Code Generation
- Lexical analysis
In the lexical analysis stage, input source code strings are scanned to generate a series of tokens, including numbers, punctuation marks, and operators. Lexical units are independent of each other, meaning that at this stage we don’t care how each line of code is put together.
Lexical analysis stage — as when we first learned English, we divided a sentence into many independent words. We first remembered the type and meaning of each word, but did not care about the specific relationship between words.
- Syntax analysis
Then, the syntax analysis phase transforms the token list generated in the previous phase into the AST shown on the right side of the figure below. Based on this data structure, you can roughly see the basic structure of the source code before conversion.
Grammar analysis stage – the teacher teaches us the specific role and meaning of each word in the context of the whole sentence.
- Code generation
Finally, there is the code generation phase, which is a very free-form process that can consist of multiple steps. At this stage we can iterate over the initial AST, modify its structure, and then generate the corresponding code string from the modified structure.
Code generation phase – we know the grammatical structure of each sentence and how to write a grammatically correct English sentence. With this basic structure, we can perfectly convert an English sentence into a Chinese sentence or classical Chinese (if you can).
Basic structure of AST
Regardless of the specific compiler and programming language, everything in the AST world is Node. Nodes of different types are nested into each other to form a complete tree structure.
{
"program": {
"type": "Program"."sourceType": "module"."body": [{"type": "FunctionDeclaration"."id": {
"type": "Identifier"."name": "foo"
},
"params": [{"type": "Identifier"."name": "x"}]."body": {
"type": "BlockStatement"."body": [{"type": "IfStatement"."test": {
"type": "BinaryExpression"."left": {
"type": "Identifier"."name": "x"
},
"operator": ">"."right": {
"type": "NumericLiteral"."value": 10}}}]}... }... ] }Copy the code
The structure of AST varies from language compiler to language compiler, from compiler to compiler, and even from language version to language version. Here are some basic definitions of AST structure in ESTree, the current common specification for JavaScript compilers. Different compilation tools are based on the corresponding extension of this structure.
AST usage & Combat 🌰
Application scenario and usage
After understanding the concept and structure of AST, you may ask: What are the application scenarios of AST and how to use it?
As mentioned in the beginning, the dependencies and VSCode plug-ins in our current project have revealed the answer. AST can be used in a wide range of scenarios, such as front-end development:
- Code highlighting, formatting, error prompts, autocomplete, etc. : ESlint, Prettier, Vetur, etc.
- Code compression confusion: uglifyJS etc.
- Code translation: Webpack, Babel, TypeScript, etc.
As for how to use the AST, it can be summarized into the following steps:
- Resolution (Parsing): This process is implemented by the compiler and goes throughLexical analysis process and grammatical analysisProcess, thereby generating
AST
.
- Read/Traverse: Depth-first traversal
AST
To access the information of each Node in the tree.
- Modify/Transform: You can modify node information to generate new nodes during traversal
AST
.
- Output (Printing): the initial
AST
After the transformation, depending on the scenario, you can directly output the newAST
, can also be translated into new code blocks.
Typically with an AST, we focus on steps 2 and 3. Tools such as Babel, ESLint, and others expose the common ability to access and modify the original AST.
The two-step implementation is based on a design pattern called the Visitor pattern, which defines a visitor object on which access methods to various types of nodes are defined so that different processing can be done for different nodes. For example, writing the Babel plug-in is essentially constructing a visitor instance to process the individual node information to produce the desired result.
const visitor = {
CallExpression(path){... }FunctionDeclaration(path){... }ImportDeclaration(path){... }... } traverse(AST, visitor)Copy the code
In actual combat
In the last part of the series, we’ll look at how Bable can be used to manipulate the AST.
The development tools
- AST Explorer: An online AST conversion tool that integrates multiple languages and parsers
- @babel/ Parser: Parses JS code into the corresponding AST
- Babel /traverse: Recursively traverse AST nodes
- @babel/types: integrates methods to quickly build, modify, and delete AST nodes
- Babel /generator: Generates new JS code based on the modified AST
🌰
Objective: Convert normal log printing in all functions to error printing, and append a string of function names to the print
// Before
function add(a, b) {
console.log(a + b)
return a + b
}
// => After
function add(a, b) {
console.error('add', a + b)
return a + b
}
Copy the code
Ideas:
- Iterates through all function CallExpression (CallExpression) nodes
- Change the attribute of the function call method from log to error
- Find the FunctionDeclaration parent node and extract the function name information
- The function name information is wrapped as StringLiteral nodes and inserted into the array of parameter nodes in the function call expression
const compile = (code) = > {
// 1. tokenizer + parser
const ast = parser.parse(code)
// 2. traverse + transform
const visitor = {
// Access the function call expression
CallExpression(path) {
const { callee } = path.node
if (types.isCallExpression(path.node) && types.isMemberExpression(callee)) {
const { object, property } = callee
// Change the attribute of the member expression to log -> error
if (object.name === 'console' && property.name === 'log') {
property.name = 'error'
} else {
return
}
// Walk up to find the function declaration node in the parent node of the function calling node
const FunctionDeclarationNode = path.findParent(parent= > {
return parent.type === 'FunctionDeclaration'
})
// Extract the function name information, wrap it into a string literal node, and insert it into the parameter array of the current node
constfuncNameNode = types.stringLiteral(FunctionDeclarationNode.node.id.name) path.node.arguments.unshift(funcNameNode) } } } traverse.default(ast, visitor)// 3. code generator
const newCode = generator.default(ast, {}, code).code
}
Copy the code
🌰 🌰
Goal: Add error capture for all functions and implement custom processing operations during the capture phase
// Before
function add(a, b) {
console.log('23333')
throw new Error('233 Error')
return a + b;
}
// => After
function add(a, b) {
// Only synchronous code execution errors can be caught
try {
console.log('23333')
throw new Error('233 Error')
return a + b;
} catch (myError) {
mySlardar(myError) // Custom processing (eg: function error automatically reported)}}Copy the code
Ideas:
- Iterate over the FunctionDeclaration node
- Extract the entire code block node under this node and process the contents of the block as a try statement (tryStatement)
- Construct a custom catch clause node as part of the try exception handling block
- Treat the entire try statement node as a child of a new function declaration node and replace the original function declaration node with the newly generated node
const compile = (code) = > {
// 1. tokenizer + parser
const ast = parser.parse(code)
// utils.writeAst2File(ast) // View ast results
// 2. traverse
const visitor = {
FunctionDeclaration(path) {
const node = path.node
const { params, id } = node // Function parameters and function body points
const blockStatementNode = node.body
// Stop traversal of try-catch blocks to prevent circle loops
if (blockStatementNode.body && types.isTryStatement(blockStatementNode.body[0]) {return
}
// Construct a cATH block node
const catchBlockStatement = types.blockStatement(
[types.expressionStatement(
types.callExpression(types.identifier('mySlardar'), [types.identifier('myError'))))// Catch clause node
const catchClause = types.catchClause(types.identifier('myError'), catchBlockStatement)
// Try statement node
const tryStatementNode = types.tryStatement(blockStatementNode, catchClause)
// The try-catch node declares the node as a new function
const tryCatchFunctionDeclare = types.functionDeclaration(id, params, types.blockStatement([tryStatementNode]))
path.replaceWith(tryCatchFunctionDeclare)
}
}
traverse.default(ast, visitor)
// 3. code generator
const newCode = generator.default(ast, {}, code).code
}
Copy the code
🌰 🌰 🌰
Goal: Implement on-demand import of imports in Webpack (beggar version babel-import-plugin)
// Before
import { Button as Btn, Dialog } from '233_UI'
import { HHH as hhh } from '233_UI'Set custom parameters:(moduleName) = > `233_UI/lib/src/${moduleName}/${moduleName} `
// => After
import { Button as Btn } from "233_UI/lib/src/Button/Button"
import { Dialog } from "233_UI/lib/src/Dialog/Dialog"
import { HHH as hhh } from "233_UI/lib/src/HHH/HHH"
Copy the code
Ideas:
- Specify a custom find file path rule in the context state in which the plug-in is running
- Traverse the import declaration node (ImportDeclaration)
- Extract all imported variable nodes from the import node (ImportSpecifier)
- The value of this node is generated by finding the file path rule to generate a new import source path, there are several import nodes have several new source paths
- Combine the imported node with the source path node to generate a new import declaration node and replace it
// Beggar edition introduces the Babel plugin on demand
const visitor = ({types}) = > {
return {
visitor: {
ImportDeclaration(path, {opts}) {
const _getModulePath = opts.moduleName // Get the module's specified path, passed in by the plug-in's arguments
const importSpecifierNodes = path.node.specifiers // The object node to import
const importSourceNode = path.node.source // Import the source node
const sourceNodePath = importSourceNode.value
// Nodes that have been successfully replaced are no longer traversed
if(! opts.libaryName || sourceNodePath ! == opts.libaryName) {return
}
const modulePaths = importSpecifierNodes.map(node= > {
return _getModulePath(node.imported.name)
})
const newImportDeclarationNodes = importSpecifierNodes.map((node, index) = > {
return types.importDeclaration([node], types.stringLiteral(modulePaths[index]))
})
path.replaceWithMultiple(newImportDeclarationNodes)
}
}
}
}
const result = babel.transform(code, {
plugins: [
[
visitor,
{
libaryName: '233_UI'.moduleName: moduleName= > `233_UI/lib/src/${moduleName}/${moduleName}`}}]])Copy the code
Detailed code and repository addresses for the above three 🌰 examples can be found at github.com/xunhui/ast_…
conclusion
We probably don’t spend a lot of time working with aN AST in our daily lives, and we don’t care much about the underlying compiler principles of the language, but knowing an AST can help us better understand the principles of everyday development tools, and make it easier to work with the apis exposed by those tools.
Every day at work, we pour out our joys and sorrows to the machine in front of us through line after line of code. How it reads your feelings and responds to them is something worth exploring 🙂
reference
ASTs – What are they and how to use them
AST implementation function errors are automatically reported
Babel Handbook