Formula calculation (or expression calculation) is usually applied in low code, script plug-ins and other functions. The advantage is that the business logic of software can be customized without the participation of professional programmers, which reduces the development cost of software and meets diversified needs. There are some excellent formula calculation engines such as Aviotor, Power Fx, etc.
Review of compilation principle
The principles of formula calculation are essentially the same as those of programming languages, or more precisely, interpreted languages such as JavaScript, Perl, Python, etc., which are also simple scripting languages. It just strips away most of the syntactic functionality of normal programming languages, making it easier for non-specialists to learn and use. So developing a formula calculation engine still requires some knowledge of compilation principles, so let’s review the compilation principles before doing so.
Crafting Interpreters is a tutorial written by Robert Nystrom (currently working on the Google Dart team). It’s a great idea to get into Crafting Interpreters. The author explains the process of compiling and running a programming language step by step. The result is a new programming language that makes it easier for developers to understand the theory of how compilation works. This section is basically a short summary overview of the Crafting Interpreters section.
The compilation process for modern programming languages is usually divided into two parts: Front-end compilation compiles the source code into intermediate code IR, and back-end compilation compiles the intermediate code IR into machine code of the target platform. In this way, we can reuse each part of the function. For example, if we want to add a compiler for a language, we can only develop the front-end compiler for that language. The compiled intermediate code can be passed directly to the backend compiler of the existing platform, without redeveloping the entire compilation process. The same is true for a new target platform compiler, which can be developed only for the target platform. For formula calculation engine, it is mainly the development of front-end compilation. The front-end compilation process roughly has the following steps:
- Participles scanning
- Syntax parsing
- Static semantic analysis
- Source code optimization
- Generative syntax tree
1. Segmentation scan
Lexical scanning, also known as Lexical Analysis, is the process of segmenting source code into Token by Token:
The word segmentation scan removes some useless characters and keeps a clean and meaningful card sequence, such as the one generated by the code in the above image:
["var", "average", "=", "(", "min", "+", "max", ")", "/", "2", ";"]
Copy the code
The order of the characters in the sequence is the same as the reading direction from left to right and top to bottom in the source code.
2. Grammar analysis
After the word segmentation scan is completed in the previous step, the generated token sequence can be parsed. The result of parsing will construct a Parse Tree or Abstract Syntax Tree:
3. Static semantic analysis
In the phase of static semantic analysis, the meaning of the syntax tree generated in the previous phase needs to be determined. For example, the expression: A +b, in the previous step we can distinguish a, +, b, but do not know what a and B represent, are local variables? Global variables? Where are they defined? And so on.
If it is a statically typed language, you can determine the type of the variable to determine whether it can be used in the current expression. If not, a type error will be thrown.
The definition, type and other information of the above variable declarations need a storage location for easy lookup and judgment during semantic analysis:
- Typically, this information can be stored on a syntactic number node as Attributes, which initially have no values and are gradually populated with information by Parse.
- On the other hand, a Symbol Table can be established to store the definition information of variables. When a variable is used in an expression, the type and scope of the variable definition can be determined by searching the Symbol Table.
4. Source code optimization
If we can replace one piece of code with another that has the same semantics but is more efficient, this step is optimization.
Such as the following source code:
pennyArea = 3.14159 * (0.75 / 2) * (0.75 / 2);
Copy the code
After optimization, it can be replaced with the following code:
pennyArea = 0.4417860938;
Copy the code
5. Generate a syntax tree
After optimizing the source code, a new syntax tree is generated. Abstract syntax tree (AST) is widely used at present.
Dart Analyzer code analysis
Dart’s official code analysis tool analyzer provides complete lexical, syntactic, semantic analysis and syntax tree generation, so first analyze the code implementation of Analyzer. The code is in the Dart SDK project directory PKG /_fe_analyzer_shared and PKG/Analyzer.
1. Segmentation scan
This part mainly looks at two parts of the code, one is the definition of Token, the other is the implementation logic of Scan, the corresponding code files are respectively:
pkg/_fe_analyzer_shared/lib/src/scanner/token.dart
pkg/_fe_analyzer_shared/lib/src/scanner/abstract_scanner.dart
To illustrate some of the core code, let’s first look at the Token definition:
abstract class Token implements SyntacticEntity {
/**
* Initialize a newly created token to have thegiven [type] and [offset]. */
factory Token(TokenType type, int offset, [CommentToken? preceedingComment]) =
SimpleToken;
/ * *
* Initialize a newly created end-of-file token to have the given [offset].
* /
factory Token.eof(int offset, [CommentToken? precedingComments]) {
Token eof = new SimpleToken(TokenType.EOF, offset, precedingComments);
// EOF points to itself so there's always infinite look-ahead.
eof.previous = eof;
eof.next = eof;
return eof;
}
/**
* The number of characters parsed by this token. */
int get charCount;
/ * *
* The character offset of the start of this token within the source text.
* /
int get charOffset;
/**
* The character offset of the end of this token within the source text. */
int get charEnd;
@override
int get end;
/ * *
* The token that corresponds to this token, or `null` if this token is not
* the first of a pair of matching tokens (such as parentheses).
* /
Token? get endGroup => null;
/**
* Return `true` if this token represents an end of file. */
bool get isEof;
@override
int get length;
/ * *
* Return the lexeme that represents this token.
*
* For [StringToken]s the [lexeme] includes thequotes, explicit escapes, etc. */
String get lexeme;
/**
* Return the next token in thetoken stream. */
Token? get next;
/ * *
* Return the next token in the token stream.
* /
void set next(Token? next);
@override
int get offset;
/**
* Set the offset from the beginning of the file to the first character in * the token to the given [offset].
* /
void set offset(int offset);
/** * Return the previous token in the token stream. */
Token? get previous;
/ * * * Return the type of the token.
* /
TokenType gettype; . }Copy the code
Part of the Token definition code is captured above. The attributes you need to pay attention to are:
lexeme
:Token
The character content of;offset
和charOffset
: the currentToken
Location in the source code,offset
Fields andcharOffset
Field, the two values are the same,offset
Mainly for implementationSyntacticEntity
Interface method;isEof
: Is it the endToken
;previous
: the currentToken
Former oneToken
;next
: the currentToken
The nextToken
;
It can be seen from the next and Previous fields that the Token sequence after word segmentation in the Analyzer is organized in a linked list. Such a Token sequence will have a starting Token and an end Token, and usually the end Token must be isEof = true.
Here is the code implementation of the scanning part:
abstract class AbstractScanner implements Scanner {... Token tokenize() {while(! atEndOfFile()) {int next = advance();
// Scan the header looking for a language version
if(! identical(next, $EOF)) { Token oldTail = tail; next = bigHeaderSwitch(next);if(! identical(next, $EOF) && tail.kind == SCRIPT_TOKEN) { oldTail = tail; next = bigHeaderSwitch(next); }while (!identical(next, $EOF) && tail == oldTail) {
next = bigHeaderSwitch(next);
}
}
while(! identical(next, $EOF)) { next = bigSwitch(next); }if (atEndOfFile()) {
appendEofToken();
} else{ unexpectedEof(); }}// Always pretend that there's a line at the end of the file.
lineStarts.add(stringOffset + 1);
returnfirstToken(); }...int bigSwitch(int next) {
beginToken();
if (identical(next, $SPACE) ||
identical(next, $TAB) ||
identical(next, $LF) ||
identical(next, $CR)) {
appendWhiteSpace(next);
next = advance();
// Sequences of spaces are common, so advance through them fast.
while (identical(next, $SPACE)) {
// We don't invoke [:appendWhiteSpace(next):] here for efficiency,
// assuming that it does not do anything for space characters.
next = advance();
}
return next;
}
int nextLower = next | 0x20;
if ($a <= nextLower && nextLower <= $z) {
if (identical($r, next)) {
return tokenizeRawStringKeywordOrIdentifier(next);
}
return tokenizeKeywordOrIdentifier(next, /* allowDollar = */ true);
}
if (identical(next, $CLOSE_PAREN)) {
return appendEndGroup(TokenType.CLOSE_PAREN, OPEN_PAREN_TOKEN);
}
if (identical(next, $OPEN_PAREN)) {
appendBeginGroup(TokenType.OPEN_PAREN);
return advance();
}
if (identical(next, $SEMICOLON)) {
appendPrecedenceToken(TokenType.SEMICOLON);
// Type parameters and arguments cannot contain semicolon.
discardOpenLt();
return advance();
}
if (identical(next, $PERIOD)) {
return tokenizeDotsOrNumber(next);
}
if (identical(next, $COMMA)) {
appendPrecedenceToken(TokenType.COMMA);
return advance();
}
if (identical(next, $EQ)) {
return tokenizeEquals(next);
}
if (identical(next, $CLOSE_CURLY_BRACKET)) {
return appendEndGroup(
TokenType.CLOSE_CURLY_BRACKET, OPEN_CURLY_BRACKET_TOKEN);
}
if (identical(next, $SLASH)) {
return tokenizeSlashOrComment(next);
}
if (identical(next, $OPEN_CURLY_BRACKET)) {
appendBeginGroup(TokenType.OPEN_CURLY_BRACKET);
return advance();
}
if (identical(next, $DQ) || identical(next, $SQ)) {
return tokenizeString(next, scanOffset, /* raw = */ false);
}
if (identical(next, $_)) {
return tokenizeKeywordOrIdentifier(next, /* allowDollar = */ true);
}
if (identical(next, $COLON)) {
appendPrecedenceToken(TokenType.COLON);
return advance();
}
if (identical(next, $LT)) {
return tokenizeLessThan(next);
}
if (identical(next, $GT)) {
return tokenizeGreaterThan(next);
}
if (identical(next, $BANG)) {
return tokenizeExclamation(next);
}
if (identical(next, $OPEN_SQUARE_BRACKET)) {
return tokenizeOpenSquareBracket(next);
}
if (identical(next, $CLOSE_SQUARE_BRACKET)) {
return appendEndGroup(
TokenType.CLOSE_SQUARE_BRACKET, OPEN_SQUARE_BRACKET_TOKEN);
}
if (identical(next, $AT)) {
return tokenizeAt(next);
}
if (next >= $1 && next <= $9) {
return tokenizeNumber(next);
}
if (identical(next, $AMPERSAND)) {
return tokenizeAmpersand(next);
}
if (identical(next, $0)) {
return tokenizeHexOrNumber(next);
}
if (identical(next, $QUESTION)) {
return tokenizeQuestion(next);
}
if (identical(next, $BAR)) {
return tokenizeBar(next);
}
if (identical(next, $PLUS)) {
return tokenizePlus(next);
}
if (identical(next, $$)) {
return tokenizeKeywordOrIdentifier(next, /* allowDollar = */ true);
}
if (identical(next, $MINUS)) {
return tokenizeMinus(next);
}
if (identical(next, $STAR)) {
return tokenizeMultiply(next);
}
if (identical(next, $CARET)) {
return tokenizeCaret(next);
}
if (identical(next, $TILDE)) {
return tokenizeTilde(next);
}
if (identical(next, $PERCENT)) {
return tokenizePercent(next);
}
if (identical(next, $BACKPING)) {
appendPrecedenceToken(TokenType.BACKPING);
return advance();
}
if (identical(next, $BACKSLASH)) {
appendPrecedenceToken(TokenType.BACKSLASH);
return advance();
}
if (identical(next, $HASH)) {
return tokenizeTag(next);
}
if (next < 0x1f) {
return unexpected(next);
}
next = currentAsUnicode(next);
returnunexpected(next); }}Copy the code
The entry method for scanning part is tokenize(). In this method, every character in the source code is scanned through a while loop, and then the Ascii code of the characters is judged in bigSwitch(). The characters that meet the word segmentation rules are put into a Token, and finally a Token chain is output.
2. Syntax/semantic analysis, syntax tree generation
After completing the lexical analysis, the following is the grammar/semantic analysis of the Token chain. The corresponding core code files are mainly as follows:
pkg/_fe_analyzer_shared/lib/src/parser/parser_impl.dart
pkg/analyzer/lib/src/dart/ast/ast.dart
pkg/analyzer/lib/src/dart/ast/ast_factory.dart
pkg/analyzer/lib/src/fasta/ast_builder.dart
The analysis generates an abstract syntax tree, whose nodes are defined in the ast.dart file. The entry method for analysis is the parseUnit() method in parser_impl.dart:
/// Parse a compilation unit.
///
/// This method is only invoked from outside the parser. As a result, this
/// method takes the next token to be consumed rather than the last consumed
/// token and returns the token after the last consumed token rather than the
/// last consumed token.
///
/// ` ` `
/// libraryDefinition:
/// scriptTag?
/// libraryName?
/// importOrExport*
/// partDirective*
/// topLevelDefinition*
/// ;
///
/// partDeclaration:
/// partHeader topLevelDefinition*
/// ;
/// ` ` `
Token parseUnit(Token token) {
...
while(! token.next! .isEof) {finalToken start = token.next! ; token = parseTopLevelDeclarationImpl(token, directiveState); listener.endTopLevelDeclaration(token.next!) ; count++;if (start == token.next!) {
// Recovery:
// If progress has not been made reaching the end of the token stream,
// then report an error and skip the current token.token = token.next! ; listener.beginMetadataStar(token); listener.endMetadataStar(/* count = */ 0); reportRecoverableErrorWithToken( token, codes.templateExpectedDeclaration); listener.handleInvalidTopLevelDeclaration(token); listener.endTopLevelDeclaration(token.next!) ; count++; }}... }Copy the code
The while loop is used to traverse the Token chain, analyze the syntactic meaning of each Token, and verify whether it is valid. If it is illegal, an Error exception is thrown. If it is valid, the relevant method in ast_build.dart is called to construct AST. Dart code definition:
/// A parser listener that builds the analyzer's AST structure.
class AstBuilder extends StackListener {... }Copy the code
AstBuilder inherits StackListener, which in turn inherits StackListener, which defines methods (in part) similar to the following:
/// A parser event listener that does nothing except throw exceptions
/// on parser errors.
///
/// Events are methods that begin with one of: `begin`.`end`, or `handle`.
///
/// Events starting with `begin` and `end` come in pairs. Normally, a
/// `beginFoo` event is followed by an `endFoo` event. There's a few exceptions
/// documented below.
///
/// Events starting with `handle` are used when isn't possible to have a begin
/// event.
class Listener implements UnescapeErrorListener {.../// Handle the beginning of a class declaration.
/// [begin] may be the same as [name], or may point to modifiers
/// (or extraneous modifiers in the case of recovery) preceding [name].
///
/// At this point we have parsed the name and type parameter declarations.
void beginClassDeclaration(
Token begin, Token? abstractToken, Token? macroToken, Token name) {}
/// Handle the end of a class declaration. Substructures:
/// - class header
/// - class body
void endClassDeclaration(Token beginToken, Token endToken) {
logEvent("ClassDeclaration"); }...void beginDoWhileStatement(Token token) {}
void endDoWhileStatement(
Token doKeyword, Token whileKeyword, Token endToken) {
logEvent("DoWhileStatement"); }.../// This method is invoked when parser finishes parsing the corresponding
/// expression of the expression function body.
void handleExpressionFunctionBody(Token arrowToken, Token? endToken) {
logEvent("ExpressionFunctionBody"); }... }Copy the code
The AstBuilder implements these methods in the Listener, and then constructs the AST node using the stack structure to get context information about the current syntax. This stack is defined in the StackListener:
abstract class StackListener extends Listener {...final Stack stack = debugStack ? new DebugStack() : new StackImpl();
void discard(int n) {
for (int i = 0; i < n; i++) { pop(); }}...void push(Object? node) {
if (node == null) {
internalProblem(
templateInternalProblemUnhandled.withArguments("null"."push"),
/* charOffset = */ - 1,
uri);
}
stack.push(node);
}
void pushIfNull(Token? tokenOrNull, NullValue nullValue) {
if (tokenOrNull == null) stack.push(nullValue);
}
Object? peek() => stack.isNotEmpty ? stack.last : null;
Object? pop([NullValue? nullValue]) {
return stack.pop(nullValue);
}
Object? popIfNotNull(Object? value) {
return value == null ? null: pop(); }... }Copy the code
To see what this stack does, take a look at the AstBuilder implementation example:
@override
void endDoWhileStatement(
Token doKeyword, Token whileKeyword, Token semicolon) {
assert(optional('do', doKeyword));
assert(optional('while', whileKeyword));
assert(optional('; ', semicolon));
debugEvent("DoWhileStatement");
var condition = pop() as ParenthesizedExpression;
var body = pop() as Statement;
push(ast.doStatement(
doKeyword,
body,
whileKeyword,
condition.leftParenthesis,
condition.expression,
condition.rightParenthesis,
semicolon));
}
Copy the code
The above example expresses the syntax node construction of a do-while:
do{
//body. }while(condition?)
Copy the code
When constructing the AST node of the do-while syntax, the Condition of the judgment Condition of the do-while and the content Body of the code block executed can be obtained by pushing two elements out of the stack. Only by using the information of these two nodes can the grammar node of the Do-while be constructed completely. When all nodes are constructed, the top of the stack becomes the root of the entire AST tree.
So how do you traverse these AST nodes next? Analyzer uses the visitor pattern, defined (in part) as:
abstract class AstVisitor<R> {... R? visitBinaryExpression(BinaryExpression node); R? visitBlock(Block node); R? visitBlockFunctionBody(BlockFunctionBody node); R? visitBreakStatement(BreakStatement node); R? visitClassDeclaration(ClassDeclaration node); . }Copy the code
Each AST node implements the Accept (AstVisitor) method:
/// A node in the AST structure for a Dart program.
///
/// Clients may not extend, implement or mix-in this class.
abstract class AstNode implements SyntacticEntity {.../// Use the given [visitor] to visit this node.
///
/// Return the value returned by the visitor as a result of visiting this
/// node.E? accept<E>(AstVisitor<E> visitor); . }Copy the code
By implementing the AstVisitor
interface method, create a Visitor instance and pass it as a parameter to the accept(AstVisitor) method on the root node, where each AST node can be iterated and processed. Here is a default Visitor (part) provided by Analyzer:
/// An AST visitor that will recursively visit all of the nodes in an AST
/// structure (like instances of the class [RecursiveAstVisitor]). In addition,
/// when a node of a specific type is visited not only will the visit method for
/// that specific type of node be invoked, but additional methods for the
/// superclasses of that node will also be invoked. For example, using an
/// instance of this class to visit a [Block] will cause the method [visitBlock]
/// to be invoked but will also cause the methods [visitStatement] and
/// [visitNode] to be subsequently invoked. This allows visitors to be written
/// that visit all statements without needing to override the visit method for
/// each of the specific subclasses of [Statement].
///
/// Subclasses that override a visit method must either invoke the overridden
/// visit method or explicitly invoke the more general visit method. Failure to
/// do so will cause the visit methods for superclasses of the node to not be
/// invoked and will cause the children of the visited node to not be visited.
///
/// Clients may extend this class.
class GeneralizingAstVisitor<R> implements AstVisitor<R> {
/// Initialize a newly created visitor.
constGeneralizingAstVisitor(); .@override
R? visitBinaryExpression(BinaryExpression node) => visitExpression(node);
@override
R? visitBlock(Block node) => visitStatement(node);
@override
R? visitBlockFunctionBody(BlockFunctionBody node) => visitFunctionBody(node);
@override
R? visitBreakStatement(BreakStatement node) => visitStatement(node);
@overrideR? visitClassDeclaration(ClassDeclaration node) => visitNamedCompilationUnitMember(node); . }Copy the code
Third, engine development
1. Formula calculation of functional requirements
Before development, we should first clarify the functional requirements of the formula calculation engine:
- Support four general operations, such as:
"1 + 2 times 3-4/5"
; - Support for built-in functions such as:
"MAX(1, 2, 3, 4)"
; - Support variables such as:
"$a$ + $b$ + 3"
; - Support assignment expressions, such as:
"$result$ = 1 + 2 * 3"
;
2. Engine flow chart
The “lexical, syntactic, and semantic” analysis of the flow chart has been introduced in the previous part. For the content of the AST Runtime, please refer to my previous series of articles, Thoughts and Practice on Flutter Dynamic thermal update (3) —- to parse the AST Runtime, you need to take the code from that part.
3. Analyzer code clipping
The core compilation part of the engine mainly uses Analyzer code. As formula calculation is a relatively simple class programming language, it does not need too many syntax functions, so the analyzer code is cut and extracted, and only the core key part of the code is used. The clipping idea is as follows:
- Extract the code of the word segmentation scan part and cut out the word segmentation scan of irrelevant grammar keywords, such as:
at
.is
$
Etc, while adding support for our own variable declarationsToken
:$abc$
; - Extract syntactic/semantic analysis part of the code
Parser
andAST
Constructor part codeAstBuilder
As well asAstNode
Node definition, clipping the classclass
And the functionfunction
Related to syntactic/semantic logic and responseAstNode
; - Realize one’s own
Visitor
In the traversalAstNode
When constructed customAST
Data, for incomingAstRuntime
In the implementation;
4. The Api encapsulation
Define apis open to the outside world:
// Run the formula expression and return the result
dynamic fx(String expression) {
}
///
/// Run the include variable declaration ($... $), and returns the result
/// Expression: formula expression
/// Envs: variable value object {}
///
dynamic fxWithEnvs(String expression, Map envs) {
}
///
/// Run the include variable declaration ($... $), and the result of the assignment is updated in`envs`In the
/// Expression: expression of the assignment formula, for example, $a.b$=1+2+3
/// Envs: variable value object {}
///
dynamic fxAssignment(String expression, Map envs,
{void Function(List<String>)? leftEnvFields}) {
}
Copy the code
5. JavaScript support
With Dart tool Dart 2JS, you can compile a JS library for use by front-end partners, all with the same set of code logic. Dart = jsfx.dart = jsfx.dart = jsfx.dart = jsfx.dart
import 'dart:js';
import 'dart:js' as js;
import 'dartfx_main.dart';
import 'util/jsvalue2dart.dart';
void main() {
js.context['jsfx'] = fx;
js.context['jsfxWithEnvs'] = jsfxWithEnvs;
js.context['jsfxAssignment'] = jsfxAssignment;
js.context['jsSetFunctionResolver'] = jsSetFunctionResolver;
}
dynamic jsSetFunctionResolver(JsFunction jsFunction) {
fxSetFunctionResolver((name, arguments) {
return jsFunction.apply([name, ...arguments]);
});
}
dynamic jsfxWithEnvs(String expression, JsObject envs) {
var envsMap = jsValueToDart(envs);
if (envsMap is Map) {
return fxWithEnvs(expression, envsMap);
} else {
return null; }}dynamic jsfxAssignment(String expression, JsObject envs) {
var envsMap = jsValueToDart(envs);
if (envsMap is Map) {
var obj = envs;
var valueKey = "";
var rightValue = fxAssignment(expression, envsMap, leftEnvFields: (fields) {
for (var i = 0; i < fields.length - 1; i++) {
obj = obj[fields[i]];
}
valueKey = fields.last;
});
obj[valueKey] = rightValue;
returnrightValue; }}Copy the code
Run the following command on the terminal:
dart2js -O2 --omit-implicit-checks -o js/fx.js lib/src/jsfx.dart
Copy the code
The resulting Fx.js is our final JS library. You can verify that the Api is exported and executed properly with a section of test code:
require('./d8');
require('./fx');
console.log(jsfx("1 + 2 * 3-2"));
Copy the code
The./d8 in the code above is a dependency to support js code execution at the terminal using NPM
Four,
I also learned a lot from developing and implementing such a formula calculation engine, especially the compilation principle part. The following part of development is not introduced too much, mainly about the ideas and some explanations, if you are interested in the project home page to see the code, you will have a better understanding. After the completion of the whole project, I felt it was interesting to compile a natural language into a language that the computer could read and run. After mastering this principle, I could compile any language into a computer language, such as the ancient Chinese programming language that was circulated on the Internet before. With all this enthusiasm, you also made a JSON editor project that allows you to check in real time that your JSON content is compliant with the specification. Finally, I hope this article will give you a better understanding of the compilation process, solve some problems in actual work projects, and enjoy compiling.
dartfx
json_editor