What is an AST
An Abstract Syntax Tree (AST for short) is an abstract representation of the syntactic structure of source code. It represents that structure as a tree, with each node in the tree corresponding to a construct in the source code.
What is an AST useful for
ASTs are widely used, for example:
- Editor error prompts, code formatting, syntax highlighting, and code auto-completion;
- eslint and prettier check code for errors and style;
- webpack uses babel to transpile JavaScript syntax;
And if you want to understand how JS code is compiled and executed, you need to know about the AST.
How an AST is generated
To execute JS, the engine first reads the character stream of the JS file, turns it into tokens through lexical analysis, builds an AST from the tokens with a parser, and finally generates bytecode or machine code to execute.
The whole parsing process is divided into the following two steps:
- Tokenization (lexical analysis): splits the entire code string into an array of minimal syntax units
- Syntax analysis: builds on the tokens to establish and analyze the relationships between syntax units
A JS parser converts JS source code into an AST. Common parsers include Esprima, Traceur, Acorn, Shift, etc.
Lexical analysis
Lexical analysis, also called scanning, reads the source one character at a time (conceptually by repeatedly calling a next() method) and matches the characters against JavaScript's keywords and symbols to produce the corresponding tokens. A token is the smallest indivisible unit:
For example, the three characters var can only be treated as a whole and cannot be broken down any further without losing their meaning. Therefore, var is a single token.
In lexical analysis, each keyword is a token, each identifier is a token, each operator is a token, and each punctuation mark is a token. In addition, comments and whitespace characters (newlines, spaces, tabs, and so on) are filtered out of the source program.
Eventually, the entire code is split into a list of tokens (or a one-dimensional array).
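To make the token list concrete, here is a minimal tokenizer sketch. It is a hypothetical helper written for this article, not how Acorn or any real parser is implemented, and it only understands a handful of keywords, identifiers, integers, and single-character punctuators.

```javascript
// Minimal tokenizer sketch (hypothetical, for illustration only).
const KEYWORDS = new Set(["var", "let", "const", "function", "return"]);

function tokenize(code) {
  const tokens = [];
  let i = 0;
  while (i < code.length) {
    const ch = code[i];
    // Whitespace is skipped and never becomes a token.
    if (/\s/.test(ch)) {
      i++;
      continue;
    }
    // Identifiers and keywords: a letter, _ or $ followed by word characters.
    if (/[A-Za-z_$]/.test(ch)) {
      let word = "";
      while (i < code.length && /[\w$]/.test(code[i])) word += code[i++];
      tokens.push({ type: KEYWORDS.has(word) ? "Keyword" : "Identifier", value: word });
      continue;
    }
    // Numeric literals (integers only in this sketch).
    if (/[0-9]/.test(ch)) {
      let num = "";
      while (i < code.length && /[0-9]/.test(code[i])) num += code[i++];
      tokens.push({ type: "Numeric", value: num });
      continue;
    }
    // Everything else is treated as a one-character punctuator.
    tokens.push({ type: "Punctuator", value: ch });
    i++;
  }
  return tokens;
}

const tokens = tokenize("var a = 1;");
console.log(tokens.map(t => `${t.type}:${t.value}`).join(" "));
// Keyword:var Identifier:a Punctuator:= Numeric:1 Punctuator:;
```

Note that the spaces in the input never appear in the result, which matches the filtering described above.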
Syntax analysis
Syntax analysis transforms the tokens produced by lexical analysis into an abstract syntax tree that carries grammatical meaning. It also validates the syntax and throws a syntax error if anything is wrong.
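As a rough sketch of that step, the hypothetical parser below accepts exactly one statement shape, var <name> = <number>;, and either returns an ESTree-like node or throws a SyntaxError. Real parsers handle the full grammar; the function name and token format here are assumptions made for the example.

```javascript
// Tokens as a lexer might produce them for: var a = 1;
const tokens = [
  { type: "Keyword", value: "var" },
  { type: "Identifier", value: "a" },
  { type: "Punctuator", value: "=" },
  { type: "Numeric", value: "1" },
  { type: "Punctuator", value: ";" },
];

// Hypothetical parser for a single `var <name> = <number>;` statement.
function parseVariableDeclaration(tokens) {
  let pos = 0;
  // Consume the next token, or throw if it is not what the grammar expects.
  function expect(type, value) {
    const token = tokens[pos];
    if (!token || token.type !== type || (value !== undefined && token.value !== value)) {
      throw new SyntaxError(`Unexpected token at position ${pos}`);
    }
    pos++;
    return token;
  }
  const kind = expect("Keyword", "var").value;
  const name = expect("Identifier").value;
  expect("Punctuator", "=");
  const init = expect("Numeric").value;
  expect("Punctuator", ";");
  return {
    type: "VariableDeclaration",
    kind,
    declarations: [
      {
        type: "VariableDeclarator",
        id: { type: "Identifier", name },
        init: { type: "Literal", value: Number(init) },
      },
    ],
  };
}

const ast = parseVariableDeclaration(tokens);
console.log(JSON.stringify(ast, null, 2));
```

Passing an incomplete token list, for example only the first two tokens, makes expect throw, which is the syntax-error path.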
With that said, let’s look at what happens when a JavaScript snippet is converted into an AST, starting with a single line of code.
🌰 example 1
const fn = a => a;
From the AST we can see exactly what each piece of the code means and which syntax constructs it uses.
The declaration const fn is initialized with an arrow function expression whose parameter is a and whose body is a.
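For readers who prefer data to pictures, the tree for example 1 can be written out by hand. This is a simplified ESTree-style sketch with position fields (start, end, loc) left out, not the literal output of any parser.

```javascript
// Hand-written, simplified ESTree-style AST for: const fn = a => a;
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "const",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "fn" },
          init: {
            type: "ArrowFunctionExpression",
            params: [{ type: "Identifier", name: "a" }],
            // The body is a bare expression here, not a block statement.
            body: { type: "Identifier", name: "a" },
          },
        },
      ],
    },
  ],
};

const declarator = ast.body[0].declarations[0];
console.log(declarator.id.name); // fn
console.log(declarator.init.type); // ArrowFunctionExpression
console.log(declarator.init.params[0].name); // a
```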
🌰 example 2
const fn = a => {
  let i = 1;
  return a + i;
};
Let’s look at the body:
🌰 example 3
A function call
function test() {
  let a = 1;
  console.log(a);
}
The main thing to notice here is the MemberExpression node, which represents the console.log property access.
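The call in example 3 can be sketched as plain data to show where the MemberExpression sits. The node below is hand-written and simplified (no position fields), and memberName is a hypothetical helper added only to read the node back.

```javascript
// Hand-written, simplified node for the call: console.log(a)
const callNode = {
  type: "CallExpression",
  callee: {
    type: "MemberExpression",
    object: { type: "Identifier", name: "console" },
    property: { type: "Identifier", name: "log" },
    computed: false, // dot access (console.log), not bracket access (console["log"])
  },
  arguments: [{ type: "Identifier", name: "a" }],
};

// Hypothetical helper: rebuild the dotted name of a non-computed member access.
function memberName(node) {
  if (node.type === "Identifier") return node.name;
  if (node.type === "MemberExpression" && !node.computed) {
    return `${memberName(node.object)}.${memberName(node.property)}`;
  }
  return "<unknown>";
}

console.log(memberName(callNode.callee)); // console.log
```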
The screenshots above were produced with Acorn. I chose Acorn because, as far as I know, it is regarded as one of the fastest parsers; webpack uses it, and Babel's parser is based on it.
The node types shown in the screenshots are only a few of the many that an AST can contain:
- VariableDeclaration: a variable declaration
- VariableDeclarator: describes a single declared variable within a declaration
- Expression: an expression node
- …
More properties:
- Use AST Explorer to view online the AST that different parsers generate from JavaScript code.
- See the full ESTree specification on GitHub.
- Documentation: an introduction to abstract syntax tree (AST) properties.
Practical AST application
The task
Use the AST to rewrite every console.log(xx) call so that the name of the enclosing function is passed as the first argument, letting the user see which function produced the output.
For example:
// Source code
function getData() {
  console.log("data");
}
// --------------------
// Converted code
function getData() {
  console.log("getData", "data");
}
Introduction
Let’s start with the tool we need to use, Babel
- @babel/parser: parses JS code into an AST;
- @babel/traverse: recursively traverses the AST nodes;
- @babel/types: checks and builds specific AST nodes, used when modifying the tree;
- @babel/generator: turns the AST back into new JS code;
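To give a feel for what recursive traversal with a visitor means, here is a minimal sketch without Babel. The walk function is hypothetical and far simpler than @babel/traverse (it passes raw nodes rather than path objects), and the AST is hand-written for function getData() { console.log("data") }.

```javascript
// Hypothetical mini-traverser: calls visitor[node.type](node, parent) for every node.
function walk(node, visitor, parent = null) {
  // Only objects with a string `type` are treated as AST nodes.
  if (!node || typeof node.type !== "string") return;
  const handler = visitor[node.type];
  if (handler) handler(node, parent);
  // Recurse into every child node or array of child nodes.
  for (const key of Object.keys(node)) {
    const child = node[key];
    if (Array.isArray(child)) {
      child.forEach(c => walk(c, visitor, node));
    } else if (child && typeof child === "object") {
      walk(child, visitor, node);
    }
  }
}

// Hand-written, simplified AST for: function getData() { console.log("data") }
const ast = {
  type: "Program",
  body: [
    {
      type: "FunctionDeclaration",
      id: { type: "Identifier", name: "getData" },
      params: [],
      body: {
        type: "BlockStatement",
        body: [
          {
            type: "ExpressionStatement",
            expression: {
              type: "CallExpression",
              callee: {
                type: "MemberExpression",
                object: { type: "Identifier", name: "console" },
                property: { type: "Identifier", name: "log" },
                computed: false,
              },
              arguments: [{ type: "StringLiteral", value: "data" }],
            },
          },
        ],
      },
    },
  ],
};

const seen = [];
walk(ast, {
  FunctionDeclaration(node) {
    seen.push(`function ${node.id.name}`);
  },
  CallExpression(node, parent) {
    seen.push(`call inside ${parent.type}`);
  },
});
console.log(seen); // [ 'function getData', 'call inside ExpressionStatement' ]
```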
Why Babel? Mainly because it is relatively easy to use (and, honestly, it is the only one I am familiar with 😭).
@babel/parser, which is based on Acorn, parses JS code into an AST.
Start coding
- Create a new folder, open a console in it, and install the required packages
cnpm i @babel/parser @babel/traverse @babel/types @babel/generator -D
- Create a JS file and write the basic skeleton of the AST pipeline as follows
const generator = require("@babel/generator");
const parser = require("@babel/parser");
const traverse = require("@babel/traverse");
const types = require("@babel/types");

function compile(code) {
  // 1. Parse the code into an abstract syntax tree (AST)
  const ast = parser.parse(code);
  // 2. Traverse the AST (the empty visitor changes nothing yet)
  traverse.default(ast, {});
  // 3. Generate code from the AST again
  return generator.default(ast, {}, code);
}

const code = ` function getData() { console.log("data") } `;
const newCode = compile(code);
// The generator returns an object; the generated source is in its code property
console.log(newCode.code);
Run the file with Node. Since the visitor does nothing yet, the output is equivalent to the original code.
Improve compile method
function compile(code) {
  // 1. parse
  const ast = parser.parse(code);
  // 2. traverse
  const visitor = {
    CallExpression(path) {
      // Get the callee of this call
      const { callee } = path.node;
      // Check whether console.log is being called:
      // 1. the callee is a member expression node
      // 2. the object is console
      // 3. the property of the object is log
      const isConsoleLog =
        types.isMemberExpression(callee) &&
        callee.object.name === "console" &&
        callee.property.name === "log";
      if (isConsoleLog) {
        // Walk up from the console.log call to the nearest enclosing function declaration
        const funcPath = path.findParent(p => {
          return p.isFunctionDeclaration();
        });
        // Skip calls that are not inside a function declaration
        if (!funcPath) return;
        // Take the function name
        const funcName = funcPath.node.id.name;
        // Build a string literal with types and put it in front of the call's arguments
        path.node.arguments.unshift(types.stringLiteral(funcName));
      }
    },
  };
  traverse.default(ast, visitor);
  // 3. generate: convert the AST back into code
  return generator.default(ast, {}, code);
}
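The findParent step is the least obvious part, so here is a rough sketch of the idea using plain objects. makePath and findParent below are hypothetical stand-ins, not Babel's implementation; they only model the fact that each path links to its parent path and that the search starts at the parent and walks upward.

```javascript
// Hypothetical stand-in for a Babel path: a node plus a link to the parent path.
function makePath(node, parentPath = null) {
  return { node, parentPath };
}

// Walk up the parent chain and return the first path the predicate accepts.
function findParent(path, predicate) {
  let current = path.parentPath;
  while (current) {
    if (predicate(current)) return current;
    current = current.parentPath;
  }
  return null; // reached the root without a match
}

// Parent chain for: function getData() { console.log("data") }
const programPath = makePath({ type: "Program" });
const funcPath = makePath(
  { type: "FunctionDeclaration", id: { type: "Identifier", name: "getData" } },
  programPath
);
const blockPath = makePath({ type: "BlockStatement" }, funcPath);
const stmtPath = makePath({ type: "ExpressionStatement" }, blockPath);
const callPath = makePath({ type: "CallExpression" }, stmtPath);

const found = findParent(callPath, p => p.node.type === "FunctionDeclaration");
console.log(found.node.id.name); // getData

// A call at the top level has no enclosing function, so the result is null.
const none = findParent(programPath, p => p.node.type === "FunctionDeclaration");
console.log(none); // null
```

The null case is why it is worth checking the result before reading node.id.name from it.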
This is a little hard to follow from the code alone, so I wrote the path.node above into a file to show its data format:
{
  "type": "CallExpression", "start": 24, "end": 43,
  "loc": {
    "start": { "line": 3, "column": 2 },
    "end": { "line": 3, "column": 21 }
  },
  "callee": {
    "type": "MemberExpression", "start": 24, "end": 35,
    "loc": {
      "start": { "line": 3, "column": 2 },
      "end": { "line": 3, "column": 13 }
    },
    "object": {
      "type": "Identifier", "start": 24, "end": 31,
      "loc": {
        "start": { "line": 3, "column": 2 },
        "end": { "line": 3, "column": 9 },
        "identifierName": "console"
      },
      "name": "console"
    },
    "property": {
      "type": "Identifier", "start": 32, "end": 35,
      "loc": {
        "start": { "line": 3, "column": 10 },
        "end": { "line": 3, "column": 13 },
        "identifierName": "log"
      },
      "name": "log"
    },
    "computed": false
  },
  "arguments": [
    {
      "type": "StringLiteral", "start": 36, "end": 42,
      "loc": {
        "start": { "line": 3, "column": 14 },
        "end": { "line": 3, "column": 20 }
      },
      "extra": { "rawValue": "data", "raw": "'data'" },
      "value": "data"
    }
  ]
}
If we ignore the start, end, and loc position attributes, which are not needed here, the code can be read at a glance against the data.
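To read the visitor logic against the data directly, the same check and modification can be sketched on a plain object. The node below is the data above with start, end, loc, and extra stripped by hand, and the function name getData is hard-coded purely for illustration.

```javascript
// path.node from above, reduced by hand to the fields the visitor actually reads.
const callNode = {
  type: "CallExpression",
  callee: {
    type: "MemberExpression",
    object: { type: "Identifier", name: "console" },
    property: { type: "Identifier", name: "log" },
    computed: false,
  },
  arguments: [{ type: "StringLiteral", value: "data" }],
};

// The same three checks the visitor performs, written against plain data.
const { callee } = callNode;
const isConsoleLog =
  callee.type === "MemberExpression" &&
  callee.object.name === "console" &&
  callee.property.name === "log";

if (isConsoleLog) {
  // A node of the same shape that types.stringLiteral("getData") builds.
  callNode.arguments.unshift({ type: "StringLiteral", value: "getData" });
}

console.log(callNode.arguments.map(arg => arg.value)); // [ 'getData', 'data' ]
```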
Run the file again
In the output, the function name has now been inserted as the first argument of console.log.
For your convenience, the following is the complete code
const generator = require("@babel/generator");
const parser = require("@babel/parser");
const traverse = require("@babel/traverse");
const types = require("@babel/types");
const fs = require("fs");

function compile(code) {
  // 1. parse
  const ast = parser.parse(code);
  // 2. traverse
  const visitor = {
    CallExpression(path) {
      const { callee } = path.node;
      const isConsoleLog =
        types.isMemberExpression(callee) &&
        callee.object.name === "console" &&
        callee.property.name === "log";
      if (isConsoleLog) {
        const funcPath = path.findParent(p => {
          return p.isFunctionDeclaration();
        });
        // Skip calls that are not inside a function declaration
        if (!funcPath) return;
        const funcName = funcPath.node.id.name;
        // writeFileSync is synchronous and takes no callback; it throws on failure
        fs.writeFileSync("./funcPath.json", JSON.stringify(funcPath.node));
        console.log("Write succeeded");
        path.node.arguments.unshift(types.stringLiteral(funcName));
      }
    },
  };
  traverse.default(ast, visitor);
  // 3. generate
  return generator.default(ast, {}, code);
}

const code = ` function getData() { console.log('data') } `;
console.log(compile(code).code);
If you are comfortable with this, you have a good grasp of the AST and of how Babel compiles code, and webpack configuration should feel less mysterious in the future.
Conclusion
We usually rely on webpack and Babel to compile our code, downgrading ES6 syntax for compatibility with older browsers, for example turning arrow functions into normal functions and changing const and let declarations to var. All of this is done through the AST. The real implementations are more complicated and refined, but they follow the same steps:
- Parse the JS source into an AST;
- Modify the AST;
- Generate JS code from the AST;
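The last of these steps, going from a tree back to text, can be sketched with a tiny generator. This hypothetical generate function covers only the node types that appear in this article's console.log example; @babel/generator handles every node type plus formatting, comments, and source maps.

```javascript
// Hypothetical mini-generator for the few node types used in the console.log example.
function generate(node) {
  switch (node.type) {
    case "Identifier":
      return node.name;
    case "StringLiteral":
      // JSON.stringify adds the double quotes and escapes special characters.
      return JSON.stringify(node.value);
    case "MemberExpression":
      return `${generate(node.object)}.${generate(node.property)}`;
    case "CallExpression":
      return `${generate(node.callee)}(${node.arguments.map(generate).join(", ")})`;
    case "ExpressionStatement":
      return `${generate(node.expression)};`;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// Hand-written AST of the already-modified call.
const statement = {
  type: "ExpressionStatement",
  expression: {
    type: "CallExpression",
    callee: {
      type: "MemberExpression",
      object: { type: "Identifier", name: "console" },
      property: { type: "Identifier", name: "log" },
      computed: false,
    },
    arguments: [
      { type: "StringLiteral", value: "getData" },
      { type: "StringLiteral", value: "data" },
    ],
  },
};

console.log(generate(statement)); // console.log("getData", "data");
```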
Finally
If you have time, try implementing some common code transformations yourself, such as converting an arrow function into a normal function; it is a good way to deepen your understanding.
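As a starting point for that exercise, here is a hedged sketch of the core node rewrite on plain data. arrowToFunction is a hypothetical helper, and it deliberately ignores the hard part of the real transformation: arrow functions capture this and arguments from the enclosing scope, which a correct conversion has to preserve.

```javascript
// Hypothetical sketch: rewrite an ArrowFunctionExpression node as a FunctionExpression.
// Assumption: the arrow function does not use `this` or `arguments`.
function arrowToFunction(node) {
  if (node.type !== "ArrowFunctionExpression") return node;
  // An expression body (a => a) must be wrapped as { return a; }.
  const body =
    node.body.type === "BlockStatement"
      ? node.body
      : {
          type: "BlockStatement",
          body: [{ type: "ReturnStatement", argument: node.body }],
        };
  return { type: "FunctionExpression", id: null, params: node.params, body };
}

// Hand-written node for: a => a
const arrow = {
  type: "ArrowFunctionExpression",
  params: [{ type: "Identifier", name: "a" }],
  body: { type: "Identifier", name: "a" },
};

const fnNode = arrowToFunction(arrow);
console.log(fnNode.type); // FunctionExpression
console.log(fnNode.body.body[0].type); // ReturnStatement
```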
If anything in this article is wrong or imprecise, please do point it out. Thank you!
References
- ESTree on GitHub
- Babel’s official website
- Abstract syntax tree