dpjeep.com/gozhi-di-ce…

background

Recently, we need to do some automation tools based on AST, so we also need to do some research on this magic weapon. This article also intends to briefly explain the following two parts:

  • Parse a Go program through the AST
  • This AST is then analyzed through the Go standard library

AST

What is AST? It is short for Abstract Syntax Tree. It represents the syntactic structure of a programming language as a tree, with each node in the tree representing a structure in the source code. The syntax is “abstract” because it does not represent every detail that occurs in real grammar.

The main course

Appetizers

The following content is a bit long, want to buy some melon seeds first, while kowtowing while watching?

The build process

To cover the related AST part, let’s briefly explain what we know about the compilation process:

  • Lexical analysis
  • Syntax analysis
  • Semantic analysis and intermediate code generation
  • Compiler optimization
  • Object code generation and what we’re going to use now is a very friendly chain of lexical and grammar analysis tools that Google has prepared for us to build cars.

Code sample

Examples are provided in Golang’s official documentation, so we will not post the source of the documentation, but only release some use cases

// This example shows what an AST looks like when printed for debugging.
func ExamplePrint(a) {
	// src is the input for which we want to print the AST.
	src := ` package main func main() { println("Hello, World!" )} `

	// Create the AST by parsing src.
	fset := token.NewFileSet() // positions are relative to fset
	f, err := parser.ParseFile(fset, "", src, 0)
	iferr ! =nil {
		panic(err)
	}

	// Print the AST.
	ast.Print(fset, f)

	// Output:
	// 0 *ast.File {
	// 1 . Package: 2:1
	// 2 . Name: *ast.Ident {
	// 3 . . NamePos: 2:9
	// 4 . . Name: "main"
	/ / 5.}
	// 6 . Decls: []ast.Decl (len = 1) {
	// 7 . . 0: *ast.FuncDecl {
	// 8 . . . Name: *ast.Ident {
	// 9 . . . . NamePos: 3:6
	// 10 . . . . Name: "main"
	// 11 . . . . Obj: *ast.Object {
	// 12 . . . . . Kind: func
	// 13 . . . . . Name: "main"
	// 14 . . . . . Decl: *(obj @ 7)
	// 15....}
	// 16..}
	// 17 . . . Type: *ast.FuncType {
	// 18 . . . . Func: 3:1
	// 19 . . . . Params: *ast.FieldList {
	// 20 . . . . . Opening: 3:10
	// 21 . . . . . Closing: 3:11
	// 22....}
	// 23..}
	// 24 . . . Body: *ast.BlockStmt {
	// 25 . . . . Lbrace: 3:13
	// 26 . . . . List: []ast.Stmt (len = 1) {
	// 27 . . . . . 0: *ast.ExprStmt {
	// 28 . . . . . . X: *ast.CallExpr {
	// 29 . . . . . . . Fun: *ast.Ident {
	// 30 . . . . . . . . NamePos: 4:2
	// 31 . . . . . . . . Name: "println"
	// 32.......}
	// 33 . . . . . . . Lparen: 4:9
	// 34 . . . . . . . Args: []ast.Expr (len = 1) {
	// 35 . . . . . . . . 0: *ast.BasicLit {
	// 36 . . . . . . . . . ValuePos: 4:10
	// 37 . . . . . . . . . Kind: STRING
	// 38 . . . . . . . . . Value: "\"Hello, World! \ ""
	// 39........}
	// 40.......}
	// 41 . . . . . . . Ellipsis: -
	// 42 . . . . . . . Rparen: 4:25
	// 43......}
	// 44.....}
	// 45....}
	// 46 . . . . Rbrace: 5:1
	// 47..}
	// 48..}
	/ / 49.}
	// 50 . Scope: *ast.Scope {
	// 51 . . Objects: map[string]*ast.Object (len = 1) {
	// 52 . . . "main": *(obj @ 11)
	// 53..}
	/ / 54.}
	// 55 . Unresolved: []*ast.Ident (len = 1) {
	// 56 . . 0: *(obj @ 29)
	/ / 57.}
	58} / /
}
Copy the code

Do you feel dizzy when you see the print above? Ha ha, me too. There are so many interesting elements hidden in a simple Hello World, functions, variables, comments, imports, etc. How can we extract the data we want from it? To do this, we need to use the Go/Parser package Golang provides for us:

// Create the AST by parsing src.
fset := token.NewFileSet() // positions are relative to fset
f, err := parser.ParseFile(fset, "", src, 0)
iferr ! =nil {
    panic(err)
}
Copy the code

The first line references the Go/Token package to create a new source file FileSet for parsing. ParseFile returns a structure of type ast.file (the original document). If you look back at the log print above, the meaning of each field element should be clear to you. The structure is defined as follows:

type File struct {
        Doc        *CommentGroup   // associated documentation; or nil
        Package    token.Pos       // position of "package" keyword
        Name       *Ident          // package name
        Decls      []Decl          // top-level declarations; or nil
        Scope      *Scope          // package scope (this file only)
        Imports    []*ImportSpec   // imports in this file
        Unresolved []*Ident        // unresolved identifiers in this file
        Comments   []*CommentGroup // list of all comments in the source file
}
Copy the code

So now we’re going to do a small code example using this structure, and we’ll parse the following file ast_traversal.go:

package ast_demo

import "fmt"

type Example1 struct {
	// Foo Comments
	Foo string `json:"foo"`
}

type Example2 struct {
	// Aoo Comments
	Aoo int `json:"aoo"`
}

// print Hello World
func PrintHello(a){
	fmt.Println("Hello World")}Copy the code

We can already parse this File using the ast.file structure mentioned above, such as listing the referenced packages using F.iPorts:

for _, i := range f.Imports {
	t.Logf("import:	%s", i.Path.Value)
}
Copy the code

Similarly, we can filter out comments, functions, etc., such as:

for _, i := range f.Comments {
	t.Logf("comment:	%s", i.Text())
}

for _, i := range f.Decls {
	fn, ok := i.(*ast.FuncDecl)
	if! ok {continue
	}
	t.Logf("function:	%s", fn.Name.Name)
}
Copy the code

*ast.FucDecl is used for functions. At this point, step to the top of this article and look at the print of the AST tree. [] ast.decl is stored in the form of an array, and it stores various types of Nodes. Here, the cast is used to detect whether a certain type exists. If so, it will be printed according to the structure of the type. The above way has been able to meet our basic needs, specific to a type can be resolved. However, there is always a but, ha ha, through the above way to a resolution is not a little trouble? No problem, Google dad gave us another quick and easy way to do this with the Go/AST package:

// Inspect traverses an AST in depth-first order: It starts by calling
// f(node); node must not be nil. If f returns true, Inspect invokes f
// recursively for each of the non-nil children of node, followed by a
// call of f(nil).
//
func Inspect(node Node, f func(Node) bool) {
	Walk(inspector(f), node)
}
Copy the code

The general usage of this method is that the entire AST passed in is parsed depth-first, starting with a call to f(node); The node cannot be zero. If f returns true, Inspect recursively calls f for each non-zero child of the node and then calls f(nil). Related uses are as follows:

ast.Inspect(f, func(n ast.Node) bool {
	// Find Return Statements
	ret, ok := n.(*ast.ReturnStmt)
	if ok {
		t.Logf("return statement found on line %d:\n\t", fset.Position(ret.Pos()).Line)
		printer.Fprint(os.Stdout, fset, ret)
		return true
	}

	// Find Functions
	fn, ok := n.(*ast.FuncDecl)
	if ok {
		var exported string
		if fn.Name.IsExported() {
			exported = "exported "
		}
		t.Logf("%sfunction declaration found on line %d: %s", exported, fset.Position(fn.Pos()).Line, fn.Name.Name)
		return true
	}

	return true
})
Copy the code

Afterword.

At this point, you may have run out of melon seeds in your hand. AST is very useful, and what we have talked about above is only a small part of AST. Many of the underlying analysis tools are based on it for grammar analysis. In the future, some small tools based on Go AST will be updated. I hope I can realize it as soon as possible, haha 😆. The following is the test case used in the above and the source code for field parsing of the structure using AST. I have submitted it to Github. If you are interested, you can go to see it

  • ast-demo
  • parseStruct