1. Background
In front-end development, features such as syntax parsing are usually implemented by the back end and exposed through an interface that the front end calls; when something has to be executed, it is simply sent to the server. In some cases, however, such as when building an editor, the front end itself needs features like error hints and auto-completion. Ready-made editors exist, but in special or complex business scenarios they cannot meet our needs, so custom development is required. The SQL editor used in our business is one such case.
2. Meet ANTLR4
Introduction to ANTLR4
ANTLR4 is a powerful parser generator that can be used to read, process, execute, or translate structured text or binary files. The code ANTLR4 generates includes a lexer, which converts the input character sequence into a token sequence, and a parser, which converts the token sequence into a syntax tree.
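To make the two stages concrete, here is a minimal hand-written sketch in JavaScript. It is not ANTLR output and does not use the ANTLR runtime; the function names tokenize and parseStatement and the one-rule toy grammar (SELECT * FROM identifier) are assumptions made purely for illustration.

```javascript
// Toy illustration of the lexer and parser stages. Hand-written, NOT generated by ANTLR.

// Stage 1 (lexer): character sequence -> token sequence.
function tokenize(input) {
  const rules = [
    { type: 'SELECT', pattern: /^SELECT\b/ },
    { type: 'FROM', pattern: /^FROM\b/ },
    { type: 'STAR', pattern: /^\*/ },
    { type: 'IDENTIFIER', pattern: /^[A-Za-z_][A-Za-z0-9_]*/ },
  ];
  const tokens = [];
  let rest = input.trimStart();
  while (rest.length > 0) {
    // The first rule that matches at the current position wins.
    const rule = rules.find((r) => r.pattern.test(rest));
    if (!rule) throw new Error(`Unexpected character: ${rest[0]}`);
    const text = rest.match(rule.pattern)[0];
    tokens.push({ type: rule.type, text });
    rest = rest.slice(text.length).trimStart();
  }
  return tokens;
}

// Stage 2 (parser): token sequence -> syntax tree.
// Toy rule: statement : SELECT STAR FROM IDENTIFIER ;
function parseStatement(tokens) {
  let pos = 0;
  const expect = (type) => {
    const token = tokens[pos];
    if (!token || token.type !== type) {
      throw new Error(`Expected ${type} at token index ${pos}`);
    }
    pos += 1;
    return token;
  };
  const tree = {
    rule: 'statement',
    children: [expect('SELECT'), expect('STAR'), expect('FROM'), expect('IDENTIFIER')],
  };
  if (pos !== tokens.length) throw new Error('Unexpected trailing input');
  return tree;
}

const demoTokens = tokenize('SELECT * FROM courses');
const demoTree = parseStatement(demoTokens);
```

ANTLR4 generates both stages from a grammar file, so in practice neither function is written by hand; the sketch only shows what each stage consumes and produces.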
Installing ANTLR4
For full installation instructions, refer to Getting Started with ANTLR v4 on GitHub; they are not repeated in detail here. The steps for macOS are:
- **Prepare the Java environment and install the Java SDK.** It can be downloaded and installed directly from the official website: oracle.com/java/techno…
- Download the ANTLR package:
$ cd /usr/local/lib
$ curl -O https://www.antlr.org/download/antlr-4.7.1-complete.jar
- Add the package to CLASSPATH:
$ export CLASSPATH=".:/usr/local/lib/antlr-4.7.1-complete.jar:$CLASSPATH"
- Create aliases for the ANTLR Tool and TestRig:
$ alias antlr4='java -Xmx500M -cp "/usr/local/lib/antlr-4.7.1-complete.jar:$CLASSPATH" org.antlr.v4.Tool'
$ alias grun='java -Xmx500M -cp "/usr/local/lib/antlr-4.7.1-complete.jar:$CLASSPATH" org.antlr.v4.gui.TestRig'
- Verify that the installation is correct:
$ java org.antlr.v4.Tool
3. Steps
With the environment set up, using ANTLR4 basically comes down to the following three steps.
- Write a custom G4 grammar file
ANTLR4 grammar rules are divided into lexer rules and parser rules. Lexer rules define how to convert the character sequence of the code into a token sequence; parser rules define how to convert the token sequence into a syntax tree. Lexer rule names begin with an uppercase letter, while parser rule names begin with a lowercase letter. Grammar definitions for the major languages can be found in the ANTLR grammar repository.
- Generate the lexer and parser code for the target programming language with ANTLR4; supported targets include Java, JavaScript, Python, C#, and C++.
- Traverse the syntax tree; ANTLR4 supports two modes: Visitor and Listener.
4. Implement the DSQL editor
What is DSQL?
Cross-database query (DSQL) provides real-time join queries across online heterogeneous data sources in different environments. Regardless of whether the database is MySQL, SQL Server, PostgreSQL, or Redis, and regardless of the region or environment in which the instance is deployed, these databases can be joined in a single SQL statement. You can try it interactively at dms.aliyun.com/ (see help.aliyun.com/document_de…).
Next, we use the SQL editor in DSQL as an example to describe in detail how ANTLR4 is applied. Let's first look at the final result:
Writing the G4 file
Because the file is too long, only an excerpt is shown here. The file is named SqlBase.g4:
grammar SqlBase;

tokens {
    DELIMITER
}

singleStatement
    : statement EOF
    ;

singleExpression
    : expression EOF
    ;

statement
    : query                                          #statementDefault
    | USE schema=identifier                          #use
    | USE catalog=identifier '.' schema=identifier   #use
    | CREATE SCHEMA (IF NOT EXISTS)? qualifiedName
    ....
Generating the syntax tree with Java
In the directory containing SqlBase.g4, run:
$ antlr4 SqlBase.g4
This generates the Java sources for the lexer and parser (note that the antlr4 alias is equivalent to running org.antlr.v4.Tool). Then compile the generated code in the same directory:
$ javac SqlBase*.java
This produces a set of compiled class files. We can now enter DSQL statements to check that the syntax tree is generated correctly. First, we enter a valid SQL statement:
$ grun SqlBase statement -tree
(Now enter some SQL like below)
SELECT * FROM `adb_mysql_dblink`.`adb_mysql_1124qie`.`courses`
(now,do:)
^D
(The output:)
((statement (query (queryNoWith (queryTerm (queryPrimary (querySpecification SELECT (selectItem *) FROM (relation (sampledRelation (aliasedRelation (relationPrimary (qualifiedName (identifier `adb_mysql_dblink`) . (identifier `adb_mysql_1124qie`) . (identifier `courses`)))))))))))))
// gui
$ grun SqlBase statement -gui
SELECT * FROM `adb_mysql_dblink`.`adb_mysql_1124qie`.`courses`
^D
GUI syntax tree
What if our SQL does not conform to the specification?
$ grun SqlBase statement -gui
(Now enter some SQL like below)
SELECT * FROM aa where id = 1
(now,do:)
^D
(The output:)
line 1:14 mismatched input 'a' expecting {'(', 'ADD', 'ALL', 'ANALYZE', 'ANY', 'ARRAY', 'ASC', 'AT', 'BERNOULLI', 'CALL', 'CASCADE', 'CATALOGS', 'COLUMN', 'COLUMNS', 'COMMENT', 'COMMIT', 'COMMITTED', 'CURRENT', 'DATA', 'DATE', 'DAY', 'DESC', 'DISTRIBUTED', 'EXCLUDING', 'EXPLAIN', 'FILTER', 'FIRST', 'FOLLOWING', 'FORMAT', 'FUNCTIONS', 'GRANT', 'GRANTS', 'GRAPHVIZ', 'HOUR', 'IF', 'INCLUDING', 'INPUT', 'INTERVAL', 'ISOLATION', 'LAST', 'LATERAL', 'LEVEL', 'LIMIT', 'LOGICAL', 'MAP', 'MINUTE', 'MONTH', 'NFC', 'NFD', 'NFKC', 'NFKD', 'NO', 'NULLIF', 'NULLS', 'ONLY', 'OPTION', 'ORDINALITY', 'OUTPUT', 'OVER', 'PARTITION', 'PARTITIONS', 'POSITION', 'PRECEDING', 'PRIVILEGES', 'PROPERTIES', 'PUBLIC', 'RANGE', 'READ', 'RENAME', 'REPEATABLE', 'REPLACE', 'RESET', 'RESTRICT', 'REVOKE', 'ROLLBACK', 'ROW', 'ROWS', 'SCHEMA', 'SCHEMAS', 'SECOND', 'SERIALIZABLE', 'SESSION', 'SET', 'SETS', 'SHOW', 'SOME', 'START', 'STATS', 'SUBSTRING', 'SYSTEM', 'TABLES', 'TABLESAMPLE', 'TEXT', 'TIME', 'TIMESTAMP', 'TO', 'TRANSACTION', 'TRY_CAST', 'TYPE', 'UNBOUNDED', 'UNCOMMITTED', 'UNNEST', 'USE', 'VALIDATE', 'VERBOSE', 'VIEW', 'WORK', 'WRITE', 'YEAR', 'ZONE', IDENTIFIER, DIGIT_IDENTIFIER, BACKQUOTED_IDENTIFIER}
For the input SELECT * FROM aa where id = 1, an incorrect syntax tree is generated.
The token that follows FROM must be one of the symbols or keywords listed in the braces, so the parser reports a syntax error.
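An editor needs these errors as data rather than as console text. The following is a minimal sketch of one way to collect them, not the implementation used in DSQL. The names SyntaxErrorCollector and expectedTokens are hypothetical. The class is duck-typed so that it runs without the ANTLR runtime; in a real setup it would presumably extend antlr4.error.ErrorListener and be registered with parser.removeErrorListeners() followed by parser.addErrorListener(collector). The helper assumes the default message format shown above, "... expecting {...}".

```javascript
// Extract the expected-token set from a default ANTLR error message.
// Assumption: the message ends with "expecting {tok1, tok2, ...}".
function expectedTokens(message) {
  const match = /expecting \{(.*)\}$/.exec(message);
  if (!match) return [];
  // Tokens are either quoted literals such as 'ADD' or bare names such as IDENTIFIER.
  const parts = match[1].match(/'[^']*'|\w+/g) || [];
  return parts.map((part) => part.replace(/^'|'$/g, ''));
}

// Hypothetical collector with the same method names an ANTLR error listener has.
class SyntaxErrorCollector {
  constructor() {
    this.errors = [];
  }

  // Same parameter order that the ANTLR runtime passes to syntaxError().
  syntaxError(recognizer, offendingSymbol, line, column, msg, e) {
    this.errors.push({ line, column, message: msg, expected: expectedTokens(msg) });
  }

  // The remaining listener hooks are not needed for this sketch.
  reportAmbiguity() {}
  reportAttemptingFullContext() {}
  reportContextSensitivity() {}
}

// Simulate the call the parser would make for the error shown above (message shortened).
const collector = new SyntaxErrorCollector();
collector.syntaxError(null, null, 1, 14,
  "mismatched input 'a' expecting {'(', 'ADD', IDENTIFIER}", null);
```

Each collected entry carries a line and column that can be mapped to an editor marker, and the expected list is a possible starting point for completion candidates.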
With the commands above we can test the G4 file as we write it. You can also use the ANTLR v4 plugin for IntelliJ IDEA to view the syntax tree; we won't cover that here, but feel free to try it.
Generating the JS lexer and parser
Now we come to the focus of this article: how to generate the parser files for the front end. Using ANTLR4 in the front end is actually quite simple; see the official tutorial at github.com/antlr/antlr… Here we focus on how to use it. Run the following command:
$ antlr4 -Dlanguage=JavaScript MyGrammar.g4
This generates the corresponding parser files (the lexer, parser, and listener).
Now we can write code to build the ParseTree syntax tree:
/* eslint-disable react-hooks/rules-of-hooks */
import antlr4 from 'antlr4';
import { SqlBaseLexer } from './antlr4/SqlBaseLexer';
import { SqlBaseParser } from './antlr4/SqlBaseParser';
// Custom listener defined later in this article; the import path is an assumption.
import DsqlListener from './DsqlListener';

// handlers holds the business-logic callbacks: { enterAliasedRelation, enterQualifiedName }
function parseTree(sql, handlers = {}) {
  // sql = "SELECT * FROM `adb_mysql_dblink`.`adb_mysql_1124qie`.`courses`"
  const chars = new antlr4.InputStream(sql);
  const lexer = new SqlBaseLexer(chars);
  const tokens = new antlr4.CommonTokenStream(lexer);
  const parser = new SqlBaseParser(tokens);
  parser.buildParseTrees = true;
  // Parse starting from the "statement" rule of the grammar
  const tree = parser.statement();
  // Custom Listener mode
  const extractor = new DsqlListener({
    enterAliasedRelation: handlers.enterAliasedRelation, // the specific business logic
    enterQualifiedName: handlers.enterQualifiedName,
  });
  // Walk the tree with the default walker provided by the ANTLR runtime
  antlr4.tree.ParseTreeWalker.DEFAULT.walk(extractor, tree);
}
Two ways to traverse the ParseTree
- The Listener pattern
1) Listener methods are called automatically by the walker object provided by ANTLR; as the walker encounters different nodes, it calls the corresponding methods of the supplied listener.
2) Listener methods return no value, so intermediate results have to be stored in variables.
3) The Listener pattern traverses the entire tree.
- The Visitor pattern
1) With a visitor, you trigger the visits to child nodes yourself; in practice you only need to implement the visit methods for the node types you are interested in.
2) Visit methods can return custom values.
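The difference can be sketched without the ANTLR runtime. The example below is an illustration under assumptions: sampleTree is a hand-built object standing in for a parse tree, and walk, methodSuffix, and countingVisitor are hypothetical names, not ANTLR APIs.

```javascript
// Hand-built tree standing in for an ANTLR parse tree (hypothetical shape).
const sampleTree = {
  rule: 'query',
  children: [
    { rule: 'qualifiedName', text: 'db.courses', children: [] },
    {
      rule: 'expression',
      children: [{ rule: 'qualifiedName', text: 'db.students', children: [] }],
    },
  ],
};

// "qualifiedName" -> "QualifiedName", used to build enterXxx / exitXxx / visitXxx names.
const methodSuffix = (rule) => rule[0].toUpperCase() + rule.slice(1);

// Listener style: the walker drives the traversal of the WHOLE tree and fires
// callbacks; return values are ignored, so results go into outside variables.
function walk(listener, node) {
  const suffix = methodSuffix(node.rule);
  if (typeof listener[`enter${suffix}`] === 'function') listener[`enter${suffix}`](node);
  node.children.forEach((child) => walk(listener, child));
  if (typeof listener[`exit${suffix}`] === 'function') listener[`exit${suffix}`](node);
}

const collectedNames = [];
walk({ enterQualifiedName: (node) => collectedNames.push(node.text) }, sampleTree);

// Visitor style: each visit method decides whether to descend and what to return.
const countingVisitor = {
  visit(node) {
    const method = this[`visit${methodSuffix(node.rule)}`];
    return method ? method.call(this, node) : this.visitChildren(node);
  },
  visitChildren(node) {
    return node.children.reduce((sum, child) => sum + this.visit(child), 0);
  },
  visitQualifiedName() {
    return 1; // count this node
  },
  visitExpression() {
    return 0; // deliberately do not descend into expressions
  },
};

const topLevelNameCount = countingVisitor.visit(sampleTree);
```

The listener sees both qualified names because the walker visits every node, while the visitor counts only one because visitExpression chooses not to descend.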
For our business scenario we chose the Listener pattern and defined our own listener, DsqlListener. Part of the code is as follows:
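import { SqlBaseListener } from './antlr4/SqlBaseListener';

class DsqlListener extends SqlBaseListener {
  constructor(opts) {
    super();
    this.configs = opts || {};
  }
  // Called by the walker when it enters a qualifiedName node
  enterQualifiedName(ctx) {
    const { enterQualifiedName } = this.configs;
    setFunction(enterQualifiedName, ctx);
  }
  // Called by the walker when it enters an aliasedRelation node
  enterAliasedRelation(ctx) {
    const { enterAliasedRelation } = this.configs;
    setFunction(enterAliasedRelation, ctx);
  }
}
// Invoke the callback only if one was actually supplied
function setFunction(fun, params) {
  if (fun && typeof fun === 'function') {
    fun(params);
  }
}
export default DsqlListener;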
Note: the method names enterQualifiedName and enterAliasedRelation are generated from the rule names defined in the G4 file.
5. Conclusion
The above is the general approach we use to implement intelligent syntax hints in the DSQL editor with ANTLR4. If you want to know more about the internal implementation, you are welcome to join us.
References: github.com/antlr/gramm… github.com/antlr/antlr… The Definitive ANTLR 4 Reference