dpjeep.com/gozhi-di-ce…
background
Recently, we need to do some automation tools based on AST, so we also need to do some research on this magic weapon. This article also intends to briefly explain the following two parts:
- Parse a Go program through the AST
- This AST is then analyzed through the Go standard library
AST
What is AST? It is short for Abstract Syntax Tree. It represents the syntactic structure of a programming language as a tree, with each node in the tree representing a structure in the source code. The syntax is “abstract” because it does not represent every detail that occurs in real grammar.
The main course
Appetizers
The following content is a bit long, want to buy some melon seeds first, while kowtowing while watching?
The build process
To cover the related AST part, let’s briefly explain what we know about the compilation process:
- Lexical analysis
- Syntax analysis
- Semantic analysis and intermediate code generation
- Compiler optimization
- Object code generation and what we’re going to use now is a very friendly chain of lexical and grammar analysis tools that Google has prepared for us to build cars.
Code sample
Examples are provided in Golang’s official documentation, so we will not post the source of the documentation, but only release some use cases
// This example shows what an AST looks like when printed for debugging.
func ExamplePrint(a) {
// src is the input for which we want to print the AST.
src := ` package main func main() { println("Hello, World!" )} `
// Create the AST by parsing src.
fset := token.NewFileSet() // positions are relative to fset
f, err := parser.ParseFile(fset, "", src, 0)
iferr ! =nil {
panic(err)
}
// Print the AST.
ast.Print(fset, f)
// Output:
// 0 *ast.File {
// 1 . Package: 2:1
// 2 . Name: *ast.Ident {
// 3 . . NamePos: 2:9
// 4 . . Name: "main"
/ / 5.}
// 6 . Decls: []ast.Decl (len = 1) {
// 7 . . 0: *ast.FuncDecl {
// 8 . . . Name: *ast.Ident {
// 9 . . . . NamePos: 3:6
// 10 . . . . Name: "main"
// 11 . . . . Obj: *ast.Object {
// 12 . . . . . Kind: func
// 13 . . . . . Name: "main"
// 14 . . . . . Decl: *(obj @ 7)
// 15....}
// 16..}
// 17 . . . Type: *ast.FuncType {
// 18 . . . . Func: 3:1
// 19 . . . . Params: *ast.FieldList {
// 20 . . . . . Opening: 3:10
// 21 . . . . . Closing: 3:11
// 22....}
// 23..}
// 24 . . . Body: *ast.BlockStmt {
// 25 . . . . Lbrace: 3:13
// 26 . . . . List: []ast.Stmt (len = 1) {
// 27 . . . . . 0: *ast.ExprStmt {
// 28 . . . . . . X: *ast.CallExpr {
// 29 . . . . . . . Fun: *ast.Ident {
// 30 . . . . . . . . NamePos: 4:2
// 31 . . . . . . . . Name: "println"
// 32.......}
// 33 . . . . . . . Lparen: 4:9
// 34 . . . . . . . Args: []ast.Expr (len = 1) {
// 35 . . . . . . . . 0: *ast.BasicLit {
// 36 . . . . . . . . . ValuePos: 4:10
// 37 . . . . . . . . . Kind: STRING
// 38 . . . . . . . . . Value: "\"Hello, World! \ ""
// 39........}
// 40.......}
// 41 . . . . . . . Ellipsis: -
// 42 . . . . . . . Rparen: 4:25
// 43......}
// 44.....}
// 45....}
// 46 . . . . Rbrace: 5:1
// 47..}
// 48..}
/ / 49.}
// 50 . Scope: *ast.Scope {
// 51 . . Objects: map[string]*ast.Object (len = 1) {
// 52 . . . "main": *(obj @ 11)
// 53..}
/ / 54.}
// 55 . Unresolved: []*ast.Ident (len = 1) {
// 56 . . 0: *(obj @ 29)
/ / 57.}
58} / /
}
Copy the code
Do you feel dizzy when you see the print above? Ha ha, me too. There are so many interesting elements hidden in a simple Hello World, functions, variables, comments, imports, etc. How can we extract the data we want from it? To do this, we need to use the Go/Parser package Golang provides for us:
// Create the AST by parsing src.
fset := token.NewFileSet() // positions are relative to fset
f, err := parser.ParseFile(fset, "", src, 0)
iferr ! =nil {
panic(err)
}
Copy the code
The first line references the Go/Token package to create a new source file FileSet for parsing. ParseFile returns a structure of type ast.file (the original document). If you look back at the log print above, the meaning of each field element should be clear to you. The structure is defined as follows:
type File struct {
Doc *CommentGroup // associated documentation; or nil
Package token.Pos // position of "package" keyword
Name *Ident // package name
Decls []Decl // top-level declarations; or nil
Scope *Scope // package scope (this file only)
Imports []*ImportSpec // imports in this file
Unresolved []*Ident // unresolved identifiers in this file
Comments []*CommentGroup // list of all comments in the source file
}
Copy the code
So now we’re going to do a small code example using this structure, and we’ll parse the following file ast_traversal.go:
package ast_demo
import "fmt"
type Example1 struct {
// Foo Comments
Foo string `json:"foo"`
}
type Example2 struct {
// Aoo Comments
Aoo int `json:"aoo"`
}
// print Hello World
func PrintHello(a){
fmt.Println("Hello World")}Copy the code
We can already parse this File using the ast.file structure mentioned above, such as listing the referenced packages using F.iPorts:
for _, i := range f.Imports {
t.Logf("import: %s", i.Path.Value)
}
Copy the code
Similarly, we can filter out comments, functions, etc., such as:
for _, i := range f.Comments {
t.Logf("comment: %s", i.Text())
}
for _, i := range f.Decls {
fn, ok := i.(*ast.FuncDecl)
if! ok {continue
}
t.Logf("function: %s", fn.Name.Name)
}
Copy the code
*ast.FucDecl is used for functions. At this point, step to the top of this article and look at the print of the AST tree. [] ast.decl is stored in the form of an array, and it stores various types of Nodes. Here, the cast is used to detect whether a certain type exists. If so, it will be printed according to the structure of the type. The above way has been able to meet our basic needs, specific to a type can be resolved. However, there is always a but, ha ha, through the above way to a resolution is not a little trouble? No problem, Google dad gave us another quick and easy way to do this with the Go/AST package:
// Inspect traverses an AST in depth-first order: It starts by calling
// f(node); node must not be nil. If f returns true, Inspect invokes f
// recursively for each of the non-nil children of node, followed by a
// call of f(nil).
//
func Inspect(node Node, f func(Node) bool) {
Walk(inspector(f), node)
}
Copy the code
The general usage of this method is that the entire AST passed in is parsed depth-first, starting with a call to f(node); The node cannot be zero. If f returns true, Inspect recursively calls f for each non-zero child of the node and then calls f(nil). Related uses are as follows:
ast.Inspect(f, func(n ast.Node) bool {
// Find Return Statements
ret, ok := n.(*ast.ReturnStmt)
if ok {
t.Logf("return statement found on line %d:\n\t", fset.Position(ret.Pos()).Line)
printer.Fprint(os.Stdout, fset, ret)
return true
}
// Find Functions
fn, ok := n.(*ast.FuncDecl)
if ok {
var exported string
if fn.Name.IsExported() {
exported = "exported "
}
t.Logf("%sfunction declaration found on line %d: %s", exported, fset.Position(fn.Pos()).Line, fn.Name.Name)
return true
}
return true
})
Copy the code
Afterword.
At this point, you may have run out of melon seeds in your hand. AST is very useful, and what we have talked about above is only a small part of AST. Many of the underlying analysis tools are based on it for grammar analysis. In the future, some small tools based on Go AST will be updated. I hope I can realize it as soon as possible, haha 😆. The following is the test case used in the above and the source code for field parsing of the structure using AST. I have submitted it to Github. If you are interested, you can go to see it
- ast-demo
- parseStruct