A few days ago I finished wrapping up the lexer and named it BKLexer. BKLexer currently supports Go, C++, and Python.
The code is on GitHub: see the project page.
You can learn the usage from the try_lexer code in each language version. Take Go as an example:
package main

import (
    "fmt"
    "strconv"

    "./bklexer"
)

func main() {
    fmt.Println("Test Code:")
    code := "declare variable = PI * 100 - fda\n1024 * 4 * 3.14 ### \n123"
    fmt.Println(code)
    fmt.Println("--------------------------------")

    lexer := BKLexer.NewLexer()
    lexer.AddRule("\\d+\\.\\d*", "FLOAT")
    lexer.AddRule("\\d+", "INT")
    lexer.AddRule("[\\p{L}\\d_]+", "NAME")
    lexer.AddRule("\\+", "PLUS")
    lexer.AddRule("\\-", "MINUS")
    lexer.AddRule("\\*", "MUL")
    lexer.AddRule("/", "DIV")
    lexer.AddRule("=", "ASSIGN")
    lexer.AddRule("#[^\\r\\n]*", "COMMENT")
    lexer.AddIgnores("[ \\f\\t]+")

    lexer.Build(code)

    for {
        token := lexer.NextToken()
        if token.TType != BKLexer.TOKEN_TYPE_EOF {
            fmt.Printf("%s\t%s\tt%d\t%d\t%d,%d\n",
                token.Name, strconv.Quote(token.Source),
                token.TType, token.Position, token.Row, token.Col)
        }
        if token.TType == BKLexer.TOKEN_TYPE_EOF || token.TType == BKLexer.TOKEN_TYPE_ERROR {
            break
        }
    }
}
First, import the required packages, including bklexer:
import (
"fmt"
"strconv"
"./bklexer"
)
- fmt is used for printing output
- strconv is used to display token literals more readably (e.g. quoting newlines)
- ./bklexer imports the bklexer package
Instantiate the lexer and set the rules
lexer := BKLexer.NewLexer()
lexer.AddRule("\\d+\\.\\d*", "FLOAT")
lexer.AddRule("\\d+", "INT")
lexer.AddRule("[\\p{L}\\d_]+", "NAME")
lexer.AddRule("\\+", "PLUS")
lexer.AddRule("\\-", "MINUS")
lexer.AddRule("\\*", "MUL")
lexer.AddRule("/", "DIV")
lexer.AddRule("=", "ASSIGN")
lexer.AddRule("#[^\\r\\n]*", "COMMENT")
lexer.AddIgnores("[ \\f\\t]+")
- NewLexer instantiates the lexer.
- AddRule adds a matching rule; its parameters are a regular expression and the corresponding type name.
- AddIgnores sets the pattern of characters to be ignored.
Build and match in a loop
lexer.Build(code)

for {
    token := lexer.NextToken()
    if token.TType != BKLexer.TOKEN_TYPE_EOF {
        fmt.Printf("%s\t%s\tt%d\t%d\t%d,%d\n",
            token.Name, strconv.Quote(token.Source),
            token.TType, token.Position, token.Row, token.Col)
    }
    if token.TType == BKLexer.TOKEN_TYPE_EOF || token.TType == BKLexer.TOKEN_TYPE_ERROR {
        break
    }
}
Call the Build method with code as the argument, then call NextToken in a loop to fetch each token and print its details. Note that the token's type must be checked: when it is EOF or ERROR, the loop terminates.
Running it produces the following output:

Test Code:
declare variable = PI * 100 - fda
1024 * 4 * 3.14 ### 
123
--------------------------------
NAME	"declare"	t3	0	0,0
NAME	"variable"	t3	7	0,3
ASSIGN	"="	t8	14	0,6
NAME	"PI"	t3	16	0,8
MUL	"*"	t6	19	0,11
INT	"100"	t2	21	0,13
MINUS	"-"	t5	25	0,17
NAME	"fda"	t3	27	0,19
NEWLINE	"\n"	t0	30	0,22
INT	"1024"	t2	31	1,0
MUL	"*"	t6	36	1,5
INT	"4"	t2	38	1,7
MUL	"*"	t6	40	1,9
FLOAT	"3.14"	t1	42	1,11
COMMENT	"### "	t9	47	1,16
NEWLINE	"\n"	t0	51	1,20
INT	"123"	t2	52	2,0
The next article, “Implementing CALC with a Recursive Descent Algorithm”, is coming soon; welcome to follow along.