Many automated code generation tools rely on syntax tree analysis, such as goimports, gomock, wire, and others. Many interesting and useful tools can be built on top of syntax tree analysis. This article uses examples to show how to manipulate syntax trees with the standard go/ast package.

A complete example of the code in this article can be found here: ast-example

Quick Start

First, let’s take a look at what a syntax tree looks like. The following code prints the syntax tree of the ./demo.go file:

package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"log"
	"path/filepath"
)

func main() {
	fset := token.NewFileSet()
	// The path can be relative or absolute
	path, _ := filepath.Abs("./demo.go")
	f, err := parser.ParseFile(fset, path, nil, parser.AllErrors)
	if err != nil {
		log.Println(err)
		return
	}
	// Print the syntax tree
	ast.Print(fset, f)
}

demo.go:

package main

import (
	"context"
)

// Foo structure
type Foo struct {
	i int
}

// Bar interface
type Bar interface {
	Do(ctx context.Context) error
}

// main method
func main() {
	a := 1
}

The demo.go file has been simplified as much as possible, but the syntax tree output is still huge. Let’s look at some excerpts and explain them briefly.

The package name to which the file belongs and where it is declared in the file:

 0  *ast.File {
     1  .  Package: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:1:1
     2  .  Name: *ast.Ident {
     3  .  .  NamePos: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:1:9
     4  .  .  Name: "main"
     5  .  }
 ...

Next comes Decls, the list of declarations, which contains variables, functions, interfaces, and so on:

 ...
     6  .  Decls: []ast.Decl (len = 4) {
     7  .  .  0: *ast.GenDecl {
     8  .  .  .  TokPos: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:3:1
     9  .  .  .  Tok: import
    10  .  .  .  Lparen: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:3:8
    11  .  .  .  Specs: []ast.Spec (len = 1) {
    12  .  .  .  .  0: *ast.ImportSpec {
    13  .  .  .  .  .  Path: *ast.BasicLit {
    14  .  .  .  .  .  .  ValuePos: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:4:2
    15  .  .  .  .  .  .  Kind: STRING
    16  .  .  .  .  .  .  Value: "\"context\""
    17  .  .  .  .  .  }
    18  .  .  .  .  .  EndPos: -
    19  .  .  .  .  }
    20  .  .  .  }
    21  .  .  .  Rparen: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:5:1
    22  .  .  }
 ....

You can see that the syntax tree contains four Decl records. Take the first one, which is of type *ast.GenDecl, as an example. It clearly corresponds to the import block in our code. The starting position (TokPos), the positions of the left and right parentheses (Lparen, Rparen), and the imported packages (Specs) can all be read from the syntax tree.
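
To make the four Decl records concrete, here is a minimal sketch that walks f.Decls and reports the kind of each one. The source is passed as a string instead of being read from ./demo.go, and the helper name declKinds is made up for this illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src mirrors demo.go from the Quick Start section.
const src = `package main

import (
	"context"
)

// Foo structure
type Foo struct {
	i int
}

// Bar interface
type Bar interface {
	Do(ctx context.Context) error
}

// main method
func main() {
	a := 1
}
`

// declKinds (hypothetical helper) returns one short description per
// top-level declaration in the given source.
func declKinds(source string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", source, parser.AllErrors)
	if err != nil {
		return nil, err
	}
	var kinds []string
	for _, decl := range f.Decls {
		switch d := decl.(type) {
		case *ast.GenDecl:
			// Tok tells import, const, type, and var declarations apart.
			kinds = append(kinds, "GenDecl:"+d.Tok.String())
		case *ast.FuncDecl:
			kinds = append(kinds, "FuncDecl:"+d.Name.Name)
		}
	}
	return kinds, nil
}

func main() {
	kinds, err := declKinds(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	for i, k := range kinds {
		fmt.Println(i, k)
	}
}
```

For this input the sketch should report one import GenDecl, two type GenDecls, and one FuncDecl, matching the len = 4 shown in the printed tree.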

The printed syntax tree comes from the ast.File struct:

$GOROOT/src/go/ast/ast.go

// This struct is defined in the standard package file go/ast/ast.go
type File struct {
	Doc        *CommentGroup   // associated documentation; or nil
	Package    token.Pos       // position of "package" keyword
	Name       *Ident          // package name
	Decls      []Decl          // top-level declarations; or nil
	Scope      *Scope          // package scope (this file only)
	Imports    []*ImportSpec   // imports in this file
	Unresolved []*Ident        // unresolved identifiers in this file
	Comments   []*CommentGroup // list of all comments in the source file
}

The comments and field names give a rough sense of what each field means. Let’s take a closer look at the structure of the syntax tree.
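
As a small illustration of the File fields, the sketch below reads Name and Imports directly. It assumes only the package clause and imports are of interest, so it uses the parser.ImportsOnly mode; the helper name fileSummary and the sample source are made up for this example.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

const src = `package main

import (
	"context"
	"fmt"
)

func main() {}
`

// fileSummary (hypothetical helper) returns the package name and the
// unquoted import paths of the given source.
func fileSummary(source string) (string, []string, error) {
	fset := token.NewFileSet()
	// ImportsOnly stops parsing after the import declarations.
	f, err := parser.ParseFile(fset, "demo.go", source, parser.ImportsOnly)
	if err != nil {
		return "", nil, err
	}
	var paths []string
	for _, imp := range f.Imports {
		// Path.Value is the literal as written, including the quotes.
		p, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return "", nil, err
		}
		paths = append(paths, p)
	}
	return f.Name.Name, paths, nil
}

func main() {
	name, paths, err := fileSummary(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("package:", name)
	fmt.Println("imports:", paths)
}
```

Note that Imports is a flat list of every ImportSpec in the file, so there is no need to dig through Decls when only the imports matter.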

Node

The syntax tree is made up of different kinds of nodes. The go/ast source comments describe three main classes:

There are 3 main classes of nodes: Expressions and type nodes, statement nodes, and declaration nodes.

Detailed specifications and descriptions of these node types can be found in the Go Language Specification. Interested readers can study it in depth; we will not go into it here.

In the actual code, however, there is a fourth class of node, the Spec node. Each class has its own interface definition:

$GOROOT/src/go/ast/ast.go

...
// All node types implement the Node interface.
type Node interface {
	Pos() token.Pos // position of first character belonging to the node
	End() token.Pos // position of first character immediately after the node
}

// All expression nodes implement the Expr interface.
type Expr interface {
	Node
	exprNode()
}

// All statement nodes implement the Stmt interface.
type Stmt interface {
	Node
	stmtNode()
}

// All declaration nodes implement the Decl interface.
type Decl interface {
	Node
	declNode()
}
...

// A Spec node represents a single (non-parenthesized) import,
// constant, type, or variable declaration.
//
type (
	// The Spec type stands for any of *ImportSpec, *ValueSpec, and *TypeSpec.
	Spec interface {
		Node
		specNode()
	}
....
)

You can see that all nodes implement the Node interface, which reports the start and end positions of a node. Remember Decls from the Quick Start example? Its elements are declaration nodes. Besides the four classes above, which are each identified by an interface, some nodes define no additional interface and only implement Node. For ease of description, this article calls them common nodes. $GOROOT/src/go/ast/ast.go lists the implementations of all nodes, so let’s pick a few examples to get a feel for the differences.
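
Pos() and End() return compact token.Pos values; the token.FileSet used for parsing turns them into line and column numbers. The following sketch shows this for top-level declarations. The helper name declSpans and the sample source are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

const src = `package main

type Foo struct {
	i int
}

func main() {}
`

// declSpans (hypothetical helper) reports "startLine:startCol-endLine:endCol"
// for every top-level declaration.
func declSpans(source string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, err
	}
	var spans []string
	for _, decl := range f.Decls {
		// Pos is the first character of the node; End is the position
		// immediately after its last character.
		start := fset.Position(decl.Pos())
		end := fset.Position(decl.End())
		spans = append(spans, fmt.Sprintf("%d:%d-%d:%d",
			start.Line, start.Column, end.Line, end.Column))
	}
	return spans, nil
}

func main() {
	spans, err := declSpans(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, s := range spans {
		fmt.Println(s)
	}
}
```

Because every node implements Node, the same two calls work on any expression, statement, declaration, or spec.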

Expression and Type

Let’s start with expression nodes.

$GOROOT/src/go/ast/ast.go

...
	// An Ident node represents an identifier.
	Ident struct {
		NamePos token.Pos // identifier position
		Name    string    // identifier name
		Obj     *Object   // denoted object; or nil
	}
...

Ident (identifier) represents an identifier. For example, the Name field that holds the package name in the Quick Start example is an expression node:

 0  *ast.File {
     1  .  Package: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:1:1
     2  .  Name: *ast.Ident { <----
     3  .  .  NamePos: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:1:9
     4  .  .  Name: "main"
     5  .  }
 ...
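
Identifiers show up far more often than the printed excerpt suggests. As a rough sketch, the code below collects every *ast.Ident in a file using ast.Inspect, a helper from the standard go/ast package that this article does not otherwise cover. The helper name identNames and the sample source are made up for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

const src = `package main

func main() {
	a := 1
	_ = a
}
`

// identNames (hypothetical helper) returns the names of all identifiers
// in traversal order.
func identNames(source string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	ast.Inspect(f, func(n ast.Node) bool {
		// The package name, the function name, and both uses of "a"
		// are all *ast.Ident nodes.
		if id, ok := n.(*ast.Ident); ok {
			names = append(names, id.Name)
		}
		return true // keep descending
	})
	return names, nil
}

func main() {
	names, err := identNames(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(names)
}
```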

Next comes the Type node.

$GOROOT/src/go/ast/ast.go

...
	// A StructType node represents a struct type.
	StructType struct {
		Struct     token.Pos  // position of "struct" keyword
		Fields     *FieldList // list of field declarations
		Incomplete bool       // true if (source) fields are missing in the Fields list
	}

	// Pointer types are represented via StarExpr nodes.

	// A FuncType node represents a function type.
	FuncType struct {
		Func    token.Pos  // position of "func" keyword (token.NoPos if there is no "func")
		Params  *FieldList // (incoming) parameters; non-nil
		Results *FieldList // (outgoing) results; or nil
	}

	// An InterfaceType node represents an interface type.
	InterfaceType struct {
		Interface  token.Pos  // position of "interface" keyword
		Methods    *FieldList // list of methods
		Incomplete bool       // true if (source) methods are missing in the Methods list
	}
...

Type nodes are easy to understand: they cover composite types such as StructType, FuncType, and InterfaceType, all of which appear in the Quick Start example.
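
One way to see type nodes in action is to switch on the Type field of each TypeSpec. The sketch below does this for a few sample declarations; the helper name typeKinds and the sample source are assumptions for this illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

const src = `package main

type Foo struct{ i int }

type Bar interface{ Do() error }

type Handler func(int) error

type ID int
`

// typeKinds (hypothetical helper) maps each declared type name to the
// kind of type node on its right-hand side.
func typeKinds(source string) (map[string]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, err
	}
	kinds := map[string]string{}
	for _, decl := range f.Decls {
		gd, ok := decl.(*ast.GenDecl)
		if !ok || gd.Tok != token.TYPE {
			continue
		}
		for _, spec := range gd.Specs {
			ts, ok := spec.(*ast.TypeSpec)
			if !ok {
				continue
			}
			switch ts.Type.(type) {
			case *ast.StructType:
				kinds[ts.Name.Name] = "struct"
			case *ast.InterfaceType:
				kinds[ts.Name.Name] = "interface"
			case *ast.FuncType:
				kinds[ts.Name.Name] = "func"
			default:
				// For example a plain identifier such as int.
				kinds[ts.Name.Name] = "other"
			}
		}
	}
	return kinds, nil
}

func main() {
	kinds, err := typeKinds(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(kinds)
}
```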

Statement

Assignment statements and control statements (if, else, for, select, …) are all statement nodes.

$GOROOT/src/go/ast/ast.go

...
	// An AssignStmt node represents an assignment or
	// a short variable declaration.
	//
	AssignStmt struct {
		Lhs    []Expr
		TokPos token.Pos   // position of Tok
		Tok    token.Token // assignment token, DEFINE
		Rhs    []Expr
	}
...
	// An IfStmt node represents an if statement.
	IfStmt struct {
		If   token.Pos // position of "if" keyword
		Init Stmt      // initialization statement; or nil
		Cond Expr      // condition
		Body *BlockStmt
		Else Stmt // else branch; or nil
	}
...

For example, in the Quick Start example, the statement in main that assigns to a is an AssignStmt:

 ...
   174  .  .  .  Body: *ast.BlockStmt {
   175  .  .  .  .  Lbrace: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:18:13
   176  .  .  .  .  List: []ast.Stmt (len = 1) {
   177  .  .  .  .  .  0: *ast.AssignStmt { <--- here
   178  .  .  .  .  .  .  Lhs: []ast.Expr (len = 1) {
   179  .  .  .  .  .  .  .  0: *ast.Ident {
   180  .  .  .  .  .  .  .  .  NamePos: /usr/local/gopath/src/github.com/DrmagicE/ast-example/quickstart/demo.go:19:2
   181  .  .  .  .  .  .  .  .  Name: "a"
 ...
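
The Tok field is what separates a short variable declaration from a plain or compound assignment. Here is a minimal sketch that lists each assignment target together with its token; the helper name assignments and the sample source are made up for this example, and ast.Inspect is used for the traversal.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

const src = `package main

func main() {
	a := 1
	a = 2
	a += 3
}
`

// assignments (hypothetical helper) returns "name token" for every
// identifier on the left-hand side of an assignment statement.
func assignments(source string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	ast.Inspect(f, func(n ast.Node) bool {
		as, ok := n.(*ast.AssignStmt)
		if !ok {
			return true
		}
		for _, lhs := range as.Lhs {
			if id, ok := lhs.(*ast.Ident); ok {
				// Tok is token.DEFINE for ":=", token.ASSIGN for "=", and so on.
				out = append(out, id.Name+" "+as.Tok.String())
			}
		}
		return true
	})
	return out, nil
}

func main() {
	out, err := assignments(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, line := range out {
		fmt.Println(line)
	}
}
```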

Spec Node

There are only three types of Spec nodes: ImportSpec, ValueSpec, and TypeSpec:

$GOROOT/src/go/ast/ast.go

	// An ImportSpec node represents a single package import.
	ImportSpec struct {
		Doc     *CommentGroup // associated documentation; or nil
		Name    *Ident        // local package name (including "."); or nil
		Path    *BasicLit     // import path
		Comment *CommentGroup // line comments; or nil
		EndPos  token.Pos     // end of spec (overrides Path.Pos if nonzero)
	}

	// A ValueSpec node represents a constant or variable declaration
	// (ConstSpec or VarSpec production).
	//
	ValueSpec struct {
		Doc     *CommentGroup // associated documentation; or nil
		Names   []*Ident      // value names (len(Names) > 0)
		Type    Expr          // value type; or nil
		Values  []Expr        // initial values; or nil
		Comment *CommentGroup // line comments; or nil
	}

	// A TypeSpec node represents a type declaration (TypeSpec production).
	TypeSpec struct {
		Doc     *CommentGroup // associated documentation; or nil
		Name    *Ident        // type name
		Assign  token.Pos     // position of '=', if any
		Type    Expr          // *Ident, *ParenExpr, *SelectorExpr, *StarExpr, or any of the *XxxTypes
		Comment *CommentGroup // line comments; or nil
	}

ImportSpec represents a single import, ValueSpec represents a constant or variable declaration, and TypeSpec represents a type declaration. For example, ImportSpec and TypeSpec both appear in the Quick Start example:

import (
	"context" // <-- here is an ImportSpec node
)

// Foo structure
type Foo struct { // <-- here is a TypeSpec node
	i int
}

The corresponding output appears in the printed syntax tree; try locating it yourself.
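
The three Spec types can be told apart with a type switch over GenDecl.Specs. The sketch below is one way to do it; the helper name specSummary and the sample source are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

const src = `package main

import "context"

const Limit = 10

var x, y int

type Foo struct{}
`

// specSummary (hypothetical helper) returns one line per declared name
// or import found in the GenDecl specs of the file.
func specSummary(source string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, decl := range f.Decls {
		gd, ok := decl.(*ast.GenDecl)
		if !ok {
			continue
		}
		for _, spec := range gd.Specs {
			switch s := spec.(type) {
			case *ast.ImportSpec:
				out = append(out, "import "+s.Path.Value)
			case *ast.ValueSpec:
				// One ValueSpec may declare several names; the enclosing
				// GenDecl's Tok says whether it is const or var.
				for _, name := range s.Names {
					out = append(out, gd.Tok.String()+" "+name.Name)
				}
			case *ast.TypeSpec:
				out = append(out, "type "+s.Name.Name)
			}
		}
	}
	return out, nil
}

func main() {
	out, err := specSummary(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, line := range out {
		fmt.Println(line)
	}
}
```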

Declaration Node

There are only three types of Declaration nodes:

$GOROOT/src/go/ast/ast.go

...
type (
	// A BadDecl node is a placeholder for declarations containing
	// syntax errors for which no correct declaration nodes can be
	// created.
	//
	BadDecl struct {
		From, To token.Pos // position range of bad declaration
	}

	// A GenDecl node (generic declaration node) represents an import,
	// constant, type or variable declaration. A valid Lparen position
	// (Lparen.IsValid()) indicates a parenthesized declaration.
	//
	// Relationship between Tok value and Specs element type:
	//
	//	token.IMPORT  *ImportSpec
	//	token.CONST   *ValueSpec
	//	token.TYPE    *TypeSpec
	//	token.VAR     *ValueSpec
	//
	GenDecl struct {
		Doc    *CommentGroup // associated documentation; or nil
		TokPos token.Pos     // position of Tok
		Tok    token.Token   // IMPORT, CONST, TYPE, VAR
		Lparen token.Pos     // position of '(', if any
		Specs  []Spec
		Rparen token.Pos // position of ')', if any
	}

	// A FuncDecl node represents a function declaration.
	FuncDecl struct {
		Doc  *CommentGroup // associated documentation; or nil
		Recv *FieldList    // receiver (methods); or nil (functions)
		Name *Ident        // function/method name
		Type *FuncType     // function signature: parameters, results, and position of "func" keyword
		Body *BlockStmt    // function body; or nil for external (non-Go) function
	}
)
...

BadDecl represents a declaration that contains syntax errors; GenDecl represents import, const, type, or variable declarations; FuncDecl represents function declarations. Both GenDecl and FuncDecl appear in the Quick Start example.
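
FuncDecl covers both plain functions and methods; the Recv field is what distinguishes them. A minimal sketch, with the helper name funcNames and the sample source invented for this example:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

const src = `package main

type T struct{}

func (t T) Hello() {}

func main() {}
`

// funcNames (hypothetical helper) splits the FuncDecl nodes of a file
// into plain functions and methods.
func funcNames(source string) (funcs, methods []string, err error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, nil, err
	}
	for _, decl := range f.Decls {
		fd, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // a GenDecl (or BadDecl)
		}
		if fd.Recv == nil {
			funcs = append(funcs, fd.Name.Name)
		} else {
			// A non-nil receiver list marks a method.
			methods = append(methods, fd.Name.Name)
		}
	}
	return funcs, methods, nil
}

func main() {
	funcs, methods, err := funcNames(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("functions:", funcs)
	fmt.Println("methods:", methods)
}
```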

Common Node

Besides the four categories above, some nodes belong to none of them:

$GOROOT/src/go/ast/ast.go

// Comment is a comment node that represents a single //-style or /*-style comment.
type Comment struct {
    ...
}
...
// CommentGroup is a comment block node that contains multiple consecutive comments.
type CommentGroup struct {
    ...
}

// Field is a field node. It can represent a field in a struct definition, a method in an interface's method list, or a parameter or result in a function signature.
type Field struct {
    ...
}
...
// FieldList contains multiple Fields.
type FieldList struct {
    ...
}

// File represents a file node.
type File struct {
    ...
}

// Package represents a package node.
type Package struct {
    ...
}

The Quick Start example contains most of the nodes listed above, so you can look for them yourself. See the source code for more detailed comments and the specific struct fields.
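
Field and FieldList are the common nodes you will touch most often. The sketch below lists the fields of a struct; it uses types.ExprString from the standard go/types package to render the type expression as text. The helper name structFields and the sample source are assumptions for this illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

const src = `package main

type Point struct {
	X, Y  int
	Label string
	fmt.Stringer
}
`

// structFields (hypothetical helper) returns "name type" for every field
// of every struct type in the source.
func structFields(source string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	ast.Inspect(f, func(n ast.Node) bool {
		st, ok := n.(*ast.StructType)
		if !ok {
			return true
		}
		for _, field := range st.Fields.List {
			typ := types.ExprString(field.Type)
			// An embedded field has no names.
			if len(field.Names) == 0 {
				out = append(out, "(embedded) "+typ)
				continue
			}
			// One Field can declare several names, as in "X, Y int".
			for _, name := range field.Names {
				out = append(out, name.Name+" "+typ)
			}
		}
		return true
	})
	return out, nil
}

func main() {
	out, err := structFields(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, line := range out {
		fmt.Println(line)
	}
}
```

The same Field and FieldList shapes are used for interface methods and for function parameters and results, which is why the later example can treat a method's parameter list much like a list of struct fields.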

That covers the node categories in broad strokes. There are many more concrete node types than can be listed one by one, but they all follow the same pattern and the source comments are fairly clear, so you can look them up when you need them. Now that we have a basic understanding of how the syntax tree is built, let’s go through a practical example to illustrate its use.

Example

Add context parameters to all interface methods in the file

To do this, we need four steps:

  1. Walk through the syntax tree
  2. Check whether the context package has been imported; if not, import it
  3. Iterate through all interface methods and check whether the parameter list contains the context.Context type; if not, add it as the first parameter of the method
  4. Convert the modified syntax tree into Go code and print it

Walk through the syntax tree

The syntax tree is deeply nested and the relationships between nodes are complex. Without a full grasp of those relationships and nesting rules, it is hard to write a correct traversal by hand. Fortunately, the go/ast package already provides one:

$GOROOT/src/go/ast/walk.go

func Walk(v Visitor, node Node)

type Visitor interface {
	Visit(node Node) (w Visitor)
}

Walk traverses the syntax tree in depth-first order, so we only need to implement the Visitor interface according to our needs. For every node it reaches, Walk calls Visitor.Visit with that node. If Visit returns nil, Walk does not descend into the children of the current node. The Visitor implementation for this example is as follows:

// Visitor
type Visitor struct {
}

func (v *Visitor) Visit(node ast.Node) ast.Visitor {
	switch n := node.(type) {
	case *ast.GenDecl:
		// Look for the import declaration
		if n.Tok == token.IMPORT {
			v.addImport(n)
			// No need to traverse the subtree
			return nil
		}
	case *ast.InterfaceType:
		// Handle every interface type
		addContext(n)
		// No need to traverse the subtree
		return nil
	}
	return v
}
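
To see the effect of returning nil from Visit in isolation, here is a small sketch that counts nodes twice: once over the whole tree and once without descending into block statements. The counter type, the countNodes helper, and the sample source are made up for this illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

const src = `package main

func main() {
	a := 1
	_ = a
}
`

// counter (hypothetical type) counts visited nodes. When skipBlocks is
// set, it does not descend into *ast.BlockStmt nodes.
type counter struct {
	skipBlocks bool
	nodes      int
}

func (c *counter) Visit(node ast.Node) ast.Visitor {
	if node == nil {
		// Walk calls Visit(nil) after the children of a node are done.
		return nil
	}
	c.nodes++
	if _, ok := node.(*ast.BlockStmt); ok && c.skipBlocks {
		// Returning nil prunes the subtree below this node.
		return nil
	}
	return c
}

// countNodes (hypothetical helper) parses the source and walks it.
func countNodes(source string, skipBlocks bool) (int, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return 0, err
	}
	c := &counter{skipBlocks: skipBlocks}
	ast.Walk(c, f)
	return c.nodes, nil
}

func main() {
	full, err := countNodes(src, false)
	if err != nil {
		fmt.Println(err)
		return
	}
	pruned, err := countNodes(src, true)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("all nodes:", full, "without block contents:", pruned)
}
```

The pruned count is smaller because the statements inside the function body are never visited, which is the same mechanism the Visitor above uses to skip the subtrees it has already handled.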

Add the import

// addImport adds the context package to the import declaration if it is missing
func (v *Visitor) addImport(genDecl *ast.GenDecl) {
	// Whether the context package has already been imported
	hasImported := false
	for _, spec := range genDecl.Specs {
		imptSpec := spec.(*ast.ImportSpec)
		// If "context" has already been imported
		if imptSpec.Path.Value == strconv.Quote("context") {
			hasImported = true
		}
	}
	// If the context package has not been imported, import it
	if !hasImported {
		genDecl.Specs = append(genDecl.Specs, &ast.ImportSpec{
			Path: &ast.BasicLit{
				Kind:  token.STRING,
				Value: strconv.Quote("context"),
			},
		})
	}
}

Add parameters for the interface method

// addContext adds a context parameter to every method of the interface
func addContext(iface *ast.InterfaceType) {
	// If the interface has methods, iterate over them
	if iface.Methods != nil && iface.Methods.List != nil {
		for _, field := range iface.Methods.List {
			// Embedded interfaces are not *ast.FuncType; skip them
			ft, ok := field.Type.(*ast.FuncType)
			if !ok {
				continue
			}
			hasContext := false
			// Check whether the parameters include the context.Context type
			for _, param := range ft.Params.List {
				if expr, ok := param.Type.(*ast.SelectorExpr); ok {
					if ident, ok := expr.X.(*ast.Ident); ok {
						if ident.Name == "context" {
							hasContext = true
						}
					}
				}
			}
			// Add a context parameter to methods that do not have one
			if !hasContext {
				ctxField := &ast.Field{
					Names: []*ast.Ident{
						ast.NewIdent("ctx"),
					},
					Type: &ast.SelectorExpr{
						X:   ast.NewIdent("context"),
						Sel: ast.NewIdent("Context"),
					},
				}
				list := []*ast.Field{
					ctxField,
				}
				ft.Params.List = append(list, ft.Params.List...)
			}
		}
	}
}

Convert the syntax tree into Go code

The go/format package provides the conversion function for us. format.Node prints the syntax tree in gofmt style:

...
	var output []byte
	buffer := bytes.NewBuffer(output)
	err = format.Node(buffer, fset, f)
	if err != nil {
		log.Fatal(err)
	}
	// Print the Go code
	fmt.Println(buffer.String())
...

The following output is displayed:

package main

import (
        "context"
)

type Foo interface {
        FooA(ctx context.Context, i int)
        FooB(ctx context.Context, j int)
        FooC(ctx context.Context)
}

type Bar interface {
        BarA(ctx context.Context, i int)
        BarB(ctx context.Context)
        BarC(ctx context.Context)
}

You can see that every interface method now takes context.Context as its first parameter. It is a good idea to print the syntax tree of the example and read it alongside the code, which makes it easier to understand.
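
The parse, modify, and print round trip can also be tried on its own. The sketch below parses a poorly formatted snippet, renames the package as a stand-in for a real modification, and renders the result with format.Node. The helper name render, the new package name demo, and the sample source are assumptions for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
	"go/parser"
	"go/token"
)

// src is deliberately badly formatted.
const src = "package main\nfunc   Add(a int,b int)int{return a+b}\n"

// render (hypothetical helper) parses the source, renames the package,
// and prints the modified tree back as gofmt-style Go code.
func render(source string) (string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return "", err
	}
	// A trivial modification: change the package name in place.
	f.Name.Name = "demo"

	var buf bytes.Buffer
	if err := format.Node(&buf, fset, f); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := render(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(out)
}
```

The printed code reflects both the modification and the normalized spacing, since format.Node applies the same formatting rules as gofmt.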

Pitfalls and limitations

We have now finished parsing, traversing, modifying, and printing the syntax tree. As you may have noticed, though, the file in the example does not contain a single comment. This is deliberate: once comments are added, they end up in the wrong places in the generated file, like stray lambs. For example:

// Before
type Foo interface {
    FooA(i int)
    // FooB
    FooB(j int)
    FooC(ctx context.Context)
}

// After
type Foo interface {
    FooA(ctx context.
            // FooB
            Context, i int)

    FooB(ctx context.Context, j int)
    FooC(ctx context.Context)
}

The reason is that comments in the syntax tree produced by the go/ast package are “free-floating”. Remember that every node has Pos() and End() methods that identify its position? Non-comment nodes end up in the right place when the tree is printed, but the positions of comment nodes are not adjusted automatically. For a comment to appear in the correct place, we have to set the Pos and End of the affected nodes by hand. The source code comments address this issue:

Whether and how a comment is associated with a node depends on the interpretation of the syntax tree by the manipulating program: Except for Doc and Comment comments directly associated with nodes, the remaining comments are “free-floating” (see also issues #18593, #20744).
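
To see how comments are stored, the following sketch parses a file with parser.ParseComments and lists every comment group in File.Comments together with its own recorded line number. The helper name commentReport and the sample source are made up for this illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

const src = `package main

// Foo is documented.
type Foo interface {
	FooA(i int)
	// FooB
	FooB(j int)
}
`

// commentReport (hypothetical helper) returns the line of every comment
// group in the file and the text of the doc comments attached to GenDecls.
func commentReport(source string) (lines []int, docs []string, err error) {
	fset := token.NewFileSet()
	// Without parser.ParseComments the parser drops comments.
	f, err := parser.ParseFile(fset, "demo.go", source, parser.ParseComments)
	if err != nil {
		return nil, nil, err
	}
	// File.Comments holds every comment group, whether or not it is also
	// attached to a node as Doc or Comment. Each keeps its own position.
	for _, cg := range f.Comments {
		lines = append(lines, fset.Position(cg.Pos()).Line)
	}
	for _, decl := range f.Decls {
		if gd, ok := decl.(*ast.GenDecl); ok && gd.Doc != nil {
			docs = append(docs, gd.Doc.Text())
		}
	}
	return lines, docs, nil
}

func main() {
	lines, docs, err := commentReport(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("comment lines:", lines)
	fmt.Printf("doc comments: %q\n", docs)
}
```

Since each comment group carries a position that is independent of the nodes around it, inserting a new node without positions gives the printer nothing to realign the comments against.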

The issues contain a detailed discussion. The Go team acknowledges that this is a design flaw, but it has not been fixed yet. One developer who could not wait published his own solution:

github.com/dave/dst

If you really need to modify a syntax tree that contains comments, give it a try. Although syntax trees are hard to modify, they are sufficient for most code generation that is based on syntax tree analysis (gomock, wire, etc.).

References

syslog.ravelin.com/how-to-make…
medium.com/@astrid.deg…
stackoverflow.com/questions/3…
github.com/golang/go/i…
github.com/golang/go/i…
golang.org/src/go/ast/…