A few days ago I finished wrapping up the lexer and named it BKLexer. BKLexer currently supports Go, C++, and Python.
The code is on GitHub: see the project page.
You can learn the usage from the try_lexer code in each language version. Take Go as an example:
package main

import (
    "fmt"
    "strconv"

    "./bklexer"
)

func main() {
    fmt.Println("Test Code:")
    code := "declare variable = PI * 100 - fda\n1024 * 4 * 3.14 ### \n123"
    fmt.Println(code)
    fmt.Println("--------------------------------")

    lexer := BKLexer.NewLexer()
    lexer.AddRule("\\d+\\.\\d*", "FLOAT")
    lexer.AddRule("\\d+", "INT")
    lexer.AddRule("[\\p{L}\\d_]+", "NAME")
    lexer.AddRule("\\+", "PLUS")
    lexer.AddRule("\\-", "MINUS")
    lexer.AddRule("\\*", "MUL")
    lexer.AddRule("/", "DIV")
    lexer.AddRule("=", "ASSIGN")
    lexer.AddRule("#[^\\r\\n]*", "COMMENT")
    lexer.AddIgnores("[ \\f\\t]+")

    lexer.Build(code)

    for {
        token := lexer.NextToken()
        if token.TType != BKLexer.TOKEN_TYPE_EOF {
            fmt.Printf("%s\t%s\tt%d\t%d\t%d,%d\n",
                token.Name, strconv.Quote(token.Source),
                token.TType, token.Position, token.Row, token.Col)
        }
        if token.TType == BKLexer.TOKEN_TYPE_EOF || token.TType == BKLexer.TOKEN_TYPE_ERROR {
            break
        }
    }
}
First, import the required packages, including bklexer:
import (
"fmt"
"strconv"
"./bklexer"
)
- fmt is used for printing output
- strconv is used to display token literals more readably (e.g. quoting newlines)
- ./bklexer imports the bklexer package
Instantiate the lexer and set the rules
lexer := BKLexer.NewLexer()
lexer.AddRule("\\d+\\.\\d*", "FLOAT")
lexer.AddRule("\\d+", "INT")
lexer.AddRule("[\\p{L}\\d_]+", "NAME")
lexer.AddRule("\\+", "PLUS")
lexer.AddRule("\\-", "MINUS")
lexer.AddRule("\\*", "MUL")
lexer.AddRule("/", "DIV")
lexer.AddRule("=", "ASSIGN")
lexer.AddRule("#[^\\r\\n]*", "COMMENT")
lexer.AddIgnores("[ \\f\\t]+")
- NewLexer instantiates the lexer.
- AddRule adds a matching rule; its parameters are a regular expression and the corresponding type name.
- AddIgnores sets the pattern of characters to be ignored.
Build and match in a loop
lexer.Build(code)

for {
    token := lexer.NextToken()
    if token.TType != BKLexer.TOKEN_TYPE_EOF {
        fmt.Printf("%s\t%s\tt%d\t%d\t%d,%d\n",
            token.Name, strconv.Quote(token.Source),
            token.TType, token.Position, token.Row, token.Col)
    }
    if token.TType == BKLexer.TOKEN_TYPE_EOF || token.TType == BKLexer.TOKEN_TYPE_ERROR {
        break
    }
}
Call the Build method with code as the argument, then call NextToken in a loop to fetch each token and print its details. Note that the token's type must be checked: when it is EOF or ERROR, the loop terminates.
Running it produces the following output:

Test Code:
declare variable = PI * 100 - fda
1024 * 4 * 3.14 ### 
123
--------------------------------
NAME	"declare"	t3	0	0,0
NAME	"variable"	t3	7	0,3
ASSIGN	"="	t8	14	0,6
NAME	"PI"	t3	16	0,8
MUL	"*"	t6	19	0,11
INT	"100"	t2	21	0,13
MINUS	"-"	t5	25	0,17
NAME	"fda"	t3	27	0,19
NEWLINE	"\n"	t0	30	0,22
INT	"1024"	t2	31	1,0
MUL	"*"	t6	36	1,5
INT	"4"	t2	38	1,7
MUL	"*"	t6	40	1,9
FLOAT	"3.14"	t1	42	1,11
COMMENT	"### "	t9	47	1,16
NEWLINE	"\n"	t0	51	1,20
INT	"123"	t2	52	2,0
The next article, “Implementing CALC with a Recursive Descent Algorithm”, is coming soon; welcome to follow along.