ANTLR4
ANTLR (ANother Tool for Language Recognition, formerly known as PCCTS) is an open source parser generator. Given a grammar description, it automatically generates a recognizer (a lexer and a parser) that builds a syntax tree from the input, and it can display that tree visually. It can generate code in target languages including Java, C++, and C#, which makes it a framework for building parsers and translators.
ANTLR is a powerful cross-language tool for reading, processing, executing, or translating structured text or binary files.
How ANTLR runs
- Lexical analysis (receives the source text, outputs a token stream, and generates the symbol table)
- Parsing (receives the token stream and generates a syntax tree)
- Semantic analysis (converts the syntax tree into intermediate code that can be executed)
- Interpretation (calls the host language, or executes the code directly in a virtual machine)
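To make these stages concrete, here is a hand-written miniature of the same pipeline in plain TypeScript. It is only an illustrative sketch, not how ANTLR is implemented: the names `lexSketch`, `parseSketch`, and `evalSketch` are invented for this example, and the toy language supports only integers, `+`, and `*`.

```typescript
// Stage 1: lexical analysis. Turn the source text into a token stream.
type SketchToken = { kind: "INT" | "ADD" | "MUL"; text: string };

function isDigitSketch(ch: string): boolean {
  return ch >= "0" && ch <= "9";
}

function lexSketch(source: string): SketchToken[] {
  const tokens: SketchToken[] = [];
  let i = 0;
  while (i < source.length) {
    const ch = source.charAt(i);
    if (ch === " " || ch === "\t") { i++; continue; } // skip whitespace
    if (ch === "+") { tokens.push({ kind: "ADD", text: ch }); i++; continue; }
    if (ch === "*") { tokens.push({ kind: "MUL", text: ch }); i++; continue; }
    if (isDigitSketch(ch)) {
      let j = i;
      while (j < source.length && isDigitSketch(source.charAt(j))) j++;
      tokens.push({ kind: "INT", text: source.slice(i, j) });
      i = j;
      continue;
    }
    throw new Error("Unexpected character: " + ch);
  }
  return tokens;
}

// Stage 2: parsing. Turn the token stream into a syntax tree.
type SketchTree =
  | { kind: "int"; value: number }
  | { kind: "add" | "mul"; left: SketchTree; right: SketchTree };

function parseSketch(tokens: SketchToken[]): SketchTree {
  let pos = 0;
  function parseAtom(): SketchTree {
    const tok = tokens[pos];
    if (!tok || tok.kind !== "INT") throw new Error("Expected an integer");
    pos++;
    return { kind: "int", value: parseInt(tok.text, 10) };
  }
  function parseProduct(): SketchTree {
    let tree = parseAtom();
    while (pos < tokens.length && tokens[pos].kind === "MUL") {
      pos++;
      tree = { kind: "mul", left: tree, right: parseAtom() };
    }
    return tree;
  }
  function parseSum(): SketchTree {
    let tree = parseProduct();
    while (pos < tokens.length && tokens[pos].kind === "ADD") {
      pos++;
      tree = { kind: "add", left: tree, right: parseProduct() };
    }
    return tree;
  }
  const tree = parseSum();
  if (pos !== tokens.length) throw new Error("Unexpected trailing token");
  return tree;
}

// Stages 3 and 4: walk the tree and interpret it in the host language.
function evalSketch(tree: SketchTree): number {
  if (tree.kind === "int") return tree.value;
  const left = evalSketch(tree.left);
  const right = evalSketch(tree.right);
  return tree.kind === "add" ? left + right : left * right;
}

const sketchResult = evalSketch(parseSketch(lexSketch("1 + 2 * 3")));
console.log(sketchResult); // 7
```

With ANTLR, the first two stages are generated from the grammar file, and only the tree-walking part is written by hand.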
Install ANTLR
Open the official ANTLR4 website.
On macOS, run the following commands:
$ cd /usr/local/lib
$ sudo curl -O https://www.antlr.org/download/antlr-4.9.2-complete.jar
$ export CLASSPATH=".:/usr/local/lib/antlr-4.9.2-complete.jar:$CLASSPATH"
$ alias antlr4='java -jar /usr/local/lib/antlr-4.9.2-complete.jar'
$ alias grun='java org.antlr.v4.gui.TestRig'
If you are a Windows user:
- Download the ANTLR4 jar
- Add antlr-4.9.2-complete.jar to the CLASSPATH environment variable
After installation, you can verify it by running the antlr4 command.
Initialize the TS project
Here we write the ANTLR4 project in TypeScript.
- Create a new directory (we will call the language ALang) and initialize npm
mkdir ALang
cd ALang
npm init -y
- Install antlr4ts, the runtime library used by the code generated from the .g4 grammar file, and antlr4ts-cli, the command-line tool that generates that code
yarn add antlr4ts
yarn add -D antlr4ts-cli
- Create a directory for the grammar file. Here I put the grammar file in a new directory
ALang/src/antlr/ALang.g4
- Add a script to package.json that runs antlr4ts on ALang.g4 with the -visitor option, so that a visitor interface is generated
"scripts": {
"antlr4ts": "antlr4ts -visitor src/antlr/ALang.g4"
}
- Create an entry file app.ts
touch app.ts
- Run the script; it processes the .g4 file and generates the associated TS files in the src/antlr directory
npm run antlr4ts
The initialization task is done, and it’s time to start writing the code
Grammar file
grammar ALang;

prog: stat+ ;

// ------------- label each alternative
stat: expr NEWLINE          # printExpr
    | ID '=' expr NEWLINE   # assign
    | NEWLINE               # blank
    ;

// Alternatives listed earlier bind tighter, so this order gives MUL > ADD > DIV > SUB
expr: expr MUL expr         # Multiplication
    | expr ADD expr         # Addition
    | expr DIV expr         # Division
    | expr SUB expr         # Subtraction
    | INT                   # int
    | ID                    # id
    | BooleanLiteral        # BooleanExpr
    | '(' expr ')'          # parens
    ;

// ------------- name the operator symbols; these also become lexer tokens
MUL : '*' ;
DIV : '/' ;
ADD : '+' ;
SUB : '-' ;
BooleanLiteral: 'true' | 'false' ;

// ------------- the remaining lexer rules are the same as before
ID : [\u4e00-\u9fa5_a-zA-Z]+ ;  // identifier: one or more letters, underscores, or CJK characters
INT : [0-9]+ ;
NEWLINE : '\r'? '\n' ;          // newline
WS : [ \t]+ -> skip ;           // skip spaces and tabs
- grammar ALang declares the name of the grammar (ANTLR expects it to match the file name, ALang.g4)
- prog: stat+ is the start rule, one or more statements; app.ts calls it as parser.prog()
- stat and expr are parser rules; they specify how the language is written
- The bottom of the file holds the lexer rules, which describe the words of the language. For example, MUL stands for the multiplication sign "*". ANTLR processes these rules to generate the corresponding tokens
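The way lexer rules compete for the input can be sketched in plain TypeScript. ANTLR's lexer picks the rule that matches the longest text, and on a tie it picks the rule listed first, which is why BooleanLiteral has to come before ID in the grammar. The rule table and the `tokenizeSketch` function below are hypothetical names for this illustration, not generated code, and unlike ANTLR the sketch simply throws on unrecognized input.

```typescript
// Hypothetical rule table mirroring the tokens of ALang.g4, in grammar order.
// Literals used inside parser rules ('=', '(', ')') also become tokens.
const sketchRules: { tokenName: string; pattern: RegExp; skip?: boolean }[] = [
  { tokenName: "'='", pattern: /^=/ },
  { tokenName: "'('", pattern: /^\(/ },
  { tokenName: "')'", pattern: /^\)/ },
  { tokenName: "MUL", pattern: /^\*/ },
  { tokenName: "DIV", pattern: /^\// },
  { tokenName: "ADD", pattern: /^\+/ },
  { tokenName: "SUB", pattern: /^-/ },
  { tokenName: "BooleanLiteral", pattern: /^(true|false)/ },
  { tokenName: "ID", pattern: /^[\u4e00-\u9fa5_a-zA-Z]+/ },
  { tokenName: "INT", pattern: /^[0-9]+/ },
  { tokenName: "NEWLINE", pattern: /^\r?\n/ },
  { tokenName: "WS", pattern: /^[ \t]+/, skip: true },
];

function tokenizeSketch(source: string): { tokenName: string; text: string }[] {
  const tokens: { tokenName: string; text: string }[] = [];
  let rest = source;
  while (rest.length > 0) {
    let bestName = "";
    let bestText = "";
    let bestSkip = false;
    for (const rule of sketchRules) {
      const match = rule.pattern.exec(rest);
      // Only a strictly longer match replaces the current best,
      // so on a tie the rule listed first is kept.
      if (match !== null && match[0].length > bestText.length) {
        bestName = rule.tokenName;
        bestText = match[0];
        bestSkip = rule.skip === true;
      }
    }
    if (bestText.length === 0) {
      throw new Error("No rule matches: " + rest.charAt(0));
    }
    if (!bestSkip) tokens.push({ tokenName: bestName, text: bestText });
    rest = rest.slice(bestText.length);
  }
  return tokens;
}

const sketchTokenNames = tokenizeSketch("truex = true\n").map((t) => t.tokenName);
console.log(sketchTokenNames.join(" ")); // ID '=' BooleanLiteral NEWLINE
```

Here "truex" becomes an ID because ID matches more characters than BooleanLiteral, while "true" alone is a tie that BooleanLiteral wins by being listed first.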
Entry file
Here the code follows ANTLR's execution order, step by step:
- Lexical analysis (receives the source text, outputs a token stream, and generates the symbol table)
- Parsing (receives the token stream and generates a syntax tree)
- Semantic analysis (converts the syntax tree into intermediate code that can be executed)
- Interpretation (calls the host language, or executes the code directly in a virtual machine)
The complete code is as follows:
import {
ANTLRInputStream, BufferedTokenStream, CharStream, CommonTokenStream
} from "antlr4ts";
import { ALangLexer } from "./antlr/ALangLexer";
import { ALangParser } from "./antlr/ALangParser";
// Wrap the source text in a character stream for the lexer
let inputStream: CharStream = new ANTLRInputStream("a=1+2\nb=a*2+1\nc=a*3+2*b\n");
// lexical analysis
let lexer: ALangLexer = new ALangLexer(inputStream);
// Generate the token stream
let tokenStream: BufferedTokenStream = new CommonTokenStream(lexer);
// Receives tokens and generates a syntax tree
let parser = new ALangParser(tokenStream);
// Execute the parser
let tree = parser.prog();
Implement the visitor methods
Of the code generated by the ANTLR tool, we have used two files so far: ALangLexer and ALangParser. Two others are still unused: ALangListener and ALangVisitor.
Here we use ALangVisitor; the visitor pattern is a better fit for the tree traversal we need.
Open the ALangVisitor file to see its source code:
// Generated from src/antlr/ALang.g4 by ANTLR 4.9.0-SNAPSHOT
import { ParseTreeVisitor } from "antlr4ts/tree/ParseTreeVisitor";
import { PrintExprContext } from "./ALangParser";
import { AssignContext } from "./ALangParser";
import { BlankContext } from "./ALangParser";
import { MultiplicationContext } from "./ALangParser";
import { AdditionContext } from "./ALangParser";
import { DivisionContext } from "./ALangParser";
import { SubtractionContext } from "./ALangParser";
import { IntContext } from "./ALangParser";
import { IdContext } from "./ALangParser";
import { BooleanExprContext } from "./ALangParser";
import { ParensContext } from "./ALangParser";
import { ProgContext } from "./ALangParser";
import { StatContext } from "./ALangParser";
import { ExprContext } from "./ALangParser";

export interface ALangVisitor<Result> extends ParseTreeVisitor<Result> {
  visitPrintExpr?: (ctx: PrintExprContext) => Result;
  visitAssign?: (ctx: AssignContext) => Result;
  visitBlank?: (ctx: BlankContext) => Result;
  visitMultiplication?: (ctx: MultiplicationContext) => Result;
  visitAddition?: (ctx: AdditionContext) => Result;
  visitDivision?: (ctx: DivisionContext) => Result;
  visitSubtraction?: (ctx: SubtractionContext) => Result;
  visitInt?: (ctx: IntContext) => Result;
  visitId?: (ctx: IdContext) => Result;
  visitBooleanExpr?: (ctx: BooleanExprContext) => Result;
  visitParens?: (ctx: ParensContext) => Result;
  visitProg?: (ctx: ProgContext) => Result;
  visitStat?: (ctx: StatContext) => Result;
  visitExpr?: (ctx: ExprContext) => Result;
}
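The source is an interface whose methods are all optional, so we need to implement the ones we use. Create a new file, ALangBaseVisitor.ts, to implement the interface. The methods described below all go inside this class.
import { AbstractParseTreeVisitor } from "antlr4ts/tree";
import { ALangVisitor } from "./antlr/ALangVisitor";
import { AssignContext, IdContext, IntContext, MultiplicationContext, ParensContext, PrintExprContext } from "./antlr/ALangParser";

export default class ALangBaseVisitor extends AbstractParseTreeVisitor<number> implements ALangVisitor<number> {
  // Variable values by name, used by visitAssign and visitId
  memory: { [id: string]: number } = {};

  // Starting value for nodes without their own visit method (such as prog)
  protected defaultResult(): number {
    return 0;
  }
}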
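Because every method of the visitor interface is optional, it helps to see what happens for a node that has no visit method of its own. The following plain TypeScript sketch models that fallback as I understand it from the antlr4ts base class: the children are visited in order, starting from defaultResult(), and by default the result of the last child is kept. All names here (`SketchNode`, `SketchVisitor`, `handlers`) are hypothetical and exist only for this illustration.

```typescript
// A toy tree node: a label, its source text, and its children.
interface SketchNode {
  label: string; // for example "prog", "add", "int"
  text: string;
  children: SketchNode[];
}

class SketchVisitor {
  // Optional per-label handlers, playing the role of the optional visitXxx methods.
  handlers: { [label: string]: ((node: SketchNode) => number) | undefined } = {};

  // Starting value when a node is handled by visitChildren.
  defaultResult(): number {
    return 0;
  }

  // Combine results of successive children; keeping the latest one is the default here.
  aggregateResult(_previous: number, next: number): number {
    return next;
  }

  visit(node: SketchNode): number {
    const handler = this.handlers[node.label];
    return handler ? handler(node) : this.visitChildren(node);
  }

  visitChildren(node: SketchNode): number {
    let result = this.defaultResult();
    for (const child of node.children) {
      result = this.aggregateResult(result, this.visit(child));
    }
    return result;
  }
}

const sketchVisitor = new SketchVisitor();
sketchVisitor.handlers["int"] = (node) => parseInt(node.text, 10);
sketchVisitor.handlers["add"] = (node) =>
  node.children.reduce((sum, child) => sum + sketchVisitor.visit(child), 0);

const leafInt = (text: string): SketchNode => ({ label: "int", text, children: [] });

// "prog" has no handler, so it falls back to visitChildren.
const sketchProg: SketchNode = {
  label: "prog",
  text: "",
  children: [
    { label: "add", text: "1+2", children: [leafInt("1"), leafInt("2")] },
    leafInt("5"),
  ],
};

const sketchProgResult = sketchVisitor.visit(sketchProg);
console.log(sketchProgResult); // 5, the value of the last child
```

This is why defaultResult() should return a usable value: a version that throws would fail as soon as a node without its own handler is visited.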
- visitPrintExpr: recursively evaluates the expression, gets its final value, and prints the expression text together with the value
visitPrintExpr(ctx: PrintExprContext) {
const value: number = this.visit(ctx.expr());
const exprString: string = ctx.expr().text;
console.log(exprString+":"+value.toString());
return value;
}
- visitAssign: the assignment statement
  - Get the name of the variable being assigned
  - Get the value to assign
  - Store the computed value in memory for later calculations
  - Once all calculations are finished, the memory needs to be cleared
visitAssign(ctx: AssignContext) {
const id: string = ctx.ID().text;
const value: number = this.visit(ctx.expr());
this.memory[id]=value;
return value;
}
- visitMultiplication: a multiplication expression has an expression on each side, so both sides must be visited recursively. visitAddition, visitDivision, and visitSubtraction work the same way
visitMultiplication(ctx: MultiplicationContext){
const left: number = this.visit(ctx.expr(0));
const right: number = this.visit(ctx.expr(1));
return left*right;
};
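The other three binary operations have the same shape: evaluate both sides, then combine the two numbers. As a sketch that does not depend on the generated context classes, the combining step can be written as a small table. `binaryOpsSketch` and `applyBinarySketch` are hypothetical helpers for illustration; in the visitor, the two arguments would be the results of visiting the two sub-expressions.

```typescript
type BinaryOpSketch = "*" | "+" | "/" | "-";

// One combining function per operator symbol in the grammar.
const binaryOpsSketch: Record<BinaryOpSketch, (left: number, right: number) => number> = {
  "*": (left, right) => left * right,
  "+": (left, right) => left + right,
  "/": (left, right) => left / right, // plain JavaScript division: 7 / 2 is 3.5
  "-": (left, right) => left - right,
};

function applyBinarySketch(op: BinaryOpSketch, left: number, right: number): number {
  return binaryOpsSketch[op](left, right);
}

console.log(applyBinarySketch("-", 7, 2)); // 5
console.log(applyBinarySketch("/", 7, 2)); // 3.5
```

Note that the grammar only has integer literals, but division in a number-based visitor can still produce fractions.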
- visitInt: in the grammar file, a single number is also an expression, so I parse it and return its value as is
visitInt (ctx: IntContext){
return parseInt(ctx.INT().text);
};
- visitId: an ID stands for the value stored under that name, so this method reads the value from memory and returns it, or returns 0 if nothing is stored
visitId (ctx: IdContext){
const id: string = ctx.ID().text;
if(this.memory[id] != null){
return this.memory[id];
}
return 0;
};
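The memory handling described above (store on assignment, read with a default of 0, clear when done) can be sketched on its own. This version uses a Map instead of a plain object, which is my own substitution rather than what the original code does: with a plain object, a name such as toString would find an inherited property instead of being treated as unassigned. `VariableStoreSketch` is a hypothetical name.

```typescript
class VariableStoreSketch {
  private values = new Map<string, number>();

  // Store a value under a variable name and return it, like visitAssign.
  assign(id: string, value: number): number {
    this.values.set(id, value);
    return value;
  }

  // Read a variable, falling back to 0 when it was never assigned, like visitId.
  read(id: string): number {
    const value = this.values.get(id);
    return value === undefined ? 0 : value;
  }

  // Forget all variables once the calculation is finished.
  clear(): void {
    this.values.clear();
  }
}

const storeSketch = new VariableStoreSketch();
storeSketch.assign("a", 3);
console.log(storeSketch.read("a"));        // 3
console.log(storeSketch.read("toString")); // 0, never assigned
storeSketch.clear();
console.log(storeSketch.read("a"));        // 0
```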
- visitParens: a parenthesized expression; just recurse into the expression inside the parentheses
visitParens(ctx: ParensContext){
return this.visit(ctx.expr());
};
Now that the visitor is written, we can use it in the entry file to walk the syntax tree generated earlier and run the visitor code to get the result.
app.ts
import { ANTLRInputStream, BufferedTokenStream, CharStream, CommonTokenStream } from "antlr4ts";
import ALangBaseVisitor from "./ALangBaseVisitor";
import { ALangLexer } from "./antlr/ALangLexer";
import { ALangParser } from "./antlr/ALangParser";

let inputStream: CharStream = new ANTLRInputStream("a=1+2\nb=a*2+1\nc=a*3+2*b\n");
let lexer: ALangLexer = new ALangLexer(inputStream);
let tokenStream: BufferedTokenStream = new CommonTokenStream(lexer);
let parser = new ALangParser(tokenStream);
let tree = parser.prog();

// Walk the syntax tree with the visitor and print the final value
const exprBaseVisitor: ALangBaseVisitor = new ALangBaseVisitor();
const result: number = exprBaseVisitor.visit(tree);
console.log("result: ", result);
As expected!
This demo is a very small ANTLR language-parsing example. It uses the visitor pattern to walk the generated syntax tree recursively, with very little intrusion into the syntax tree itself.
The source code:
Java version: github.com/chesongsong…
Typescript version: github.com/chesongsong…
About me
WeChat: cjs764901388
Public account: XSTxoo
My official account: Komatsu student oh
You can follow me to learn front-end knowledge together. I also like to record where technology shows up in everyday life.