About the author:

Lei Ting is a front-end architect at Nobeltech in Beijing. He has 17 years of experience in front-end development and architecture, and specializes in front-end visualization and front-end communication.

Introduction: This article is an entry-level, popular-science introduction to productions, suitable for two kinds of readers:

  • Readers with a computer science background: a quick review of the key concepts;
  • Readers without a computer science background: a general understanding of how grammars are written as productions, which makes it easier to understand what grammars do and lays a foundation for further self-study.

Why learn productions? Here is an analogy:

  • If you use Vue or React, all you need to do is learn the framework's features and write a good app. However, you will only ever be an app developer working inside the framework.
  • If you want to write apps better than anyone else, you have to look behind Vue/React to see how they work and understand them at a higher level, just as a high school teacher needs to have studied at university level, with a bachelor's or master's degree.

Likewise, productions are the tool used to define a language. Mastering them lets you see from the top down how JavaScript is defined, which leaves you better prepared to take on more competitive technical challenges in the future.

To learn more about productions, consult a textbook on compiler principles.

1 Informal and formal languages

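As the picture above shows, each pair below gives two different Chinese wordings of the same phrase (in English they translate identically):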

  • Do something -> do something;
  • Have something for lunch -> have something for lunch;
  • What to do -> what to do;

The sentences above are Chinese, specifically the northern dialect we use in everyday conversation, yet southerners understand them too. The language we use every day has no strict grammatical definition: we can express the same idea in different ways depending on whether we are happy or unhappy. This kind of language is classified as an "informal language".

Formal languages: most computer programming languages are "formal languages". They have strict grammatical definitions, and programs must be written according to these rules; otherwise the program may fail to compile or may not behave as expected at runtime. Formal languages are classified according to the Chomsky hierarchy:

  • Type 0: unrestricted grammar
  • Type 1: context-sensitive grammar
  • Type 2: context-free grammar
  • Type 3: regular grammar

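The four types are nested: the languages each type can describe are a subset of those the previous type can describe. Type 0 is therefore the most general and contains types 1, 2 and 3, while type 3 is the most restrictive.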
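To make the nesting concrete, here is a small TypeScript sketch. The function names are illustrative and not taken from any library. The language a*b* (any run of a's followed by any run of b's) is regular, so a regular expression is enough to recognize it. The language a^n b^n (equal numbers of a's and b's) is generated by the context-free production S -> "a" S "b" | ε. It is a subset of a*b*, but it is not regular: recognizing it needs unbounded memory. Here a counter stands in for the stack that a pushdown automaton would use.

```typescript
// Type 3 (regular): a*b*, any number of 'a' followed by any number of 'b'.
// A regular expression, i.e. a finite automaton, is enough to recognize it.
function isAStarBStar(s: string): boolean {
  return /^a*b*$/.test(s);
}

// Type 2 (context-free) but not regular: a^n b^n, generated by S -> "a" S "b" | ε.
// Recognizing it needs unbounded memory; the counter plays the role of the
// stack a pushdown automaton would use.
function isAnBn(s: string): boolean {
  let depth = 0;
  let seenB = false;
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    if (ch === "a") {
      if (seenB) return false; // an 'a' after a 'b' is not allowed
      depth++;
    } else if (ch === "b") {
      seenB = true;
      depth--;
      if (depth < 0) return false; // more 'b's than 'a's so far
    } else {
      return false; // symbol outside the alphabet {a, b}
    }
  }
  return depth === 0;
}

console.log(isAStarBStar("aab"), isAnBn("aab"));   // true false
console.log(isAStarBStar("aabb"), isAnBn("aabb")); // true true
```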

2 Syntax, semantics, and grammar

  • Syntax

Describes the correct form of programs in the language; syntax is aimed at human users.

  • Semantics

Defines the meaning of a program, that is, what each syntactic construct does when the program executes.

  • Grammar (types 0, 1, 2 and 3 mentioned above)

The formal rules that describe the structure of a language's syntax; grammars are aimed at the compiler. Grammars include:

See the JavaScript LL(1) grammar.

For the official JavaScript syntactic and lexical grammar, refer to the Chinese translation of the specification.

LL(1): the first "L" means the input is scanned from left to right; the second "L" means the parser produces a leftmost derivation; the "1" means that at each step only one lookahead input symbol is needed to decide the parsing action.
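The following is a minimal recursive-descent sketch of an LL(1) parser in TypeScript. It handles a hypothetical toy grammar for lists of numbers such as [1, 2, 3], and it is not taken from any real JavaScript parser. Each nonterminal becomes one function. The single next token decides which alternative to expand, so the parser never needs to backtrack.

```typescript
// Hypothetical toy grammar, already in LL(1) form:
//   List  -> "[" Items "]"
//   Items -> NUM Rest | ε
//   Rest  -> "," NUM Rest | ε
interface Token { kind: string; text: string } // kind: "[", "]", ",", "num" or "eof"

function tokenize(src: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  while (i < src.length) {
    const ch = src.charAt(i);
    if (ch === " ") {
      i++;
    } else if (ch === "[" || ch === "]" || ch === ",") {
      tokens.push({ kind: ch, text: ch });
      i++;
    } else if (ch >= "0" && ch <= "9") {
      let j = i;
      while (j < src.length && src.charAt(j) >= "0" && src.charAt(j) <= "9") j++;
      tokens.push({ kind: "num", text: src.slice(i, j) });
      i = j;
    } else {
      throw new SyntaxError(`Unexpected character "${ch}" at ${i}`);
    }
  }
  tokens.push({ kind: "eof", text: "" });
  return tokens;
}

// One function per nonterminal; the next token alone (the "1" in LL(1))
// decides which alternative to use.
function parseList(src: string): number[] {
  const tokens = tokenize(src);
  let pos = 0;
  function expect(kind: string): Token {
    const t = tokens[pos];
    if (t.kind !== kind) throw new SyntaxError(`Expected ${kind} but found ${t.kind}`);
    pos++;
    return t;
  }
  function list(): number[] {      // List -> "[" Items "]"
    expect("[");
    const items = itemsRule();
    expect("]");
    return items;
  }
  function itemsRule(): number[] { // Items -> NUM Rest | ε
    if (tokens[pos].kind === "num") {
      const first = Number(expect("num").text);
      return [first].concat(rest());
    }
    return []; // ε alternative: the lookahead should be "]"
  }
  function rest(): number[] {      // Rest -> "," NUM Rest | ε
    if (tokens[pos].kind === ",") {
      expect(",");
      const n = Number(expect("num").text);
      return [n].concat(rest());
    }
    return [];
  }
  const result = list();
  expect("eof"); // the whole input must be consumed
  return result;
}

console.log(parseList("[1, 2, 3]").join(" ")); // 1 2 3
```

Parsing [1, 2] follows the leftmost derivation List => [ Items ] => [ NUM Rest ] => [ NUM , NUM Rest ] => [ NUM , NUM ]. At each step the parser expands the leftmost nonterminal first, which is what the second "L" in LL(1) refers to.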

BNF (Backus-Naur Form): BNF is a formal notation used to describe the grammar of a given language. It is a metalanguage, a set of symbols that can express grammar rules rigorously. The grammars it describes are context-free.

  • BNF metacharacters and their meanings are as follows:
    • A word in double quotes ("word") stands for those characters themselves; double_quote stands for the double quotation mark itself;
    • Words outside quotation marks (possibly containing underscores) stand for syntactic parts;
    • Angle brackets < > enclose required items;
    • Square brackets [ ] enclose optional items;
    • Braces { } enclose items that can be repeated zero or more times;
    • Parentheses ( ) group items into one unit to control precedence;
    • A vertical bar | separates two alternatives and means "OR";
    • ::= means "is defined as";
    • A period . terminates a rule.
  • How BNF represents grammar rules:
    • Nonterminals are enclosed in angle brackets;
    • The left-hand side of each rule is a nonterminal, and the right-hand side is a string of nonterminals and terminals; the two sides are separated by "::=", ":=" or "->", as shown in the following figure

  • Rules with the same left-hand side can be combined into one rule, with their right-hand sides separated by "|";
  • Nonterminal: a symbol that can be derived further (replaced) using a production; every symbol that is not a terminal is a nonterminal;
  • Terminal: a symbol that cannot be derived any further, such as the keywords for, let and const.
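The rules above can be mirrored in a small data structure. The TypeScript layout below is an assumption of this example and not a standard format. Each production has one nonterminal on the left. Alternatives separated by "|" become separate arrays. Terminals are simply every symbol that never appears on a left-hand side.

```typescript
// One production: a nonterminal on the left and its alternatives on the right.
// Each alternative is a sequence of symbols; BNF's "|" becomes separate entries.
interface Production { lhs: string; alternatives: string[][] }

// Hypothetical grammar, written in BNF as:
//   <declaration> ::= <kind> <identifier> ";"
//   <kind>        ::= "let" | "const" | "var"
//   <identifier>  ::= "x" | "y"
const grammar: Production[] = [
  { lhs: "declaration", alternatives: [["kind", "identifier", ";"]] },
  { lhs: "kind", alternatives: [["let"], ["const"], ["var"]] },
  { lhs: "identifier", alternatives: [["x"], ["y"]] },
];

// Nonterminals are the symbols defined on some left-hand side;
// every other symbol that appears on a right-hand side is a terminal.
function classifySymbols(g: Production[]): { nonterminals: string[]; terminals: string[] } {
  const nonterminals = g.map(p => p.lhs);
  const terminals: string[] = [];
  for (const p of g) {
    for (const alt of p.alternatives) {
      for (const sym of alt) {
        if (nonterminals.indexOf(sym) === -1 && terminals.indexOf(sym) === -1) terminals.push(sym);
      }
    }
  }
  return { nonterminals, terminals };
}

const { nonterminals, terminals } = classifySymbols(grammar);
console.log(nonterminals.join(" ")); // declaration kind identifier
console.log(terminals.join(" "));    // ; let const var x y
```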

3 Productions

During lexical and syntax analysis, the compiler checks source code against the language's grammar rules. Each such rule is called a production: it states that the nonterminal on its left-hand side can be replaced by the sequence of symbols on its right-hand side. BNF is a common notation for writing productions, and EBNF and ABNF both extend BNF's syntax. In general, each language standard has its own custom style for writing productions. JavaScript has its own notation too, for example:

// Function production in JavaScript (ES5)
Element:
  function Identifier ( ParameterListOpt ) CompoundStatement
  Statement

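The first line, without indentation, is the nonterminal on the left-hand side of the production, followed by a colon. Each alternative right-hand side then follows on its own line, indented by two spaces. The "Opt" suffix (a subscript in the standard) marks an optional symbol, so ParameterListOpt may be omitted. In the JavaScript standard itself, symbols are also distinguished by typeface: nonterminals are set in italics, and terminals are set in a bold fixed-width font.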

For a more in-depth look at the JavaScript language, see the linked resource.

For a more concise description of grammar productions, refer to the article

For how to use an LL(1) parser to generate an AST, refer to

4 JavaScript ESTree

This part reviews the previous article. ESTree is a community standard followed across the industry. It defines how every JavaScript syntax construct is represented as an AST node, giving syntax elements a unified, standard description. ESTree is also updated as ECMAScript continues to evolve.

Key points:

  • Parsing JavaScript produces an AST;
  • Syntax analysis feeds the tokens produced by lexical analysis into the productions (BNF), replacing nonterminals step by step (the substitution follows the LL(1) rules);
  • The resulting AST structure is defined by ESTree.
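The three points above can be seen end to end in a toy TypeScript sketch. A regular-expression tokenizer performs lexical analysis. A recursive-descent parser then builds nodes shaped like the ESTree Literal and BinaryExpression types listed below, deciding each step with one token of lookahead in the LL(1) style. The sketch is hypothetical: it handles only numbers, + - * / and parentheses, and it omits loc. Real ESTree parsers such as Acorn or Esprima are far more complete.

```typescript
// Minimal subset of the ESTree node types listed below (loc omitted for brevity).
interface Literal { type: "Literal"; value: number }
interface BinaryExpression {
  type: "BinaryExpression";
  operator: "+" | "-" | "*" | "/";
  left: Expression;
  right: Expression;
}
type Expression = Literal | BinaryExpression;

// Lexical analysis: numbers, the four operators and parentheses become tokens;
// any other non-space character becomes a token that the parser will reject.
function tokenize(src: string): string[] {
  const tokens = src.match(/\d+(?:\.\d+)?|[-+*\/()]|\S/g);
  return tokens === null ? [] : tokens;
}

// Syntax analysis: one function per production (left recursion rewritten as loops):
//   Expr   -> Term (("+" | "-") Term)*
//   Term   -> Factor (("*" | "/") Factor)*
//   Factor -> NUMBER | "(" Expr ")"
function parse(src: string): Expression {
  const tokens = tokenize(src);
  let pos = 0;

  function expr(): Expression {
    let left = term();
    while (tokens[pos] === "+" || tokens[pos] === "-") {
      const operator = tokens[pos++] as "+" | "-";
      left = { type: "BinaryExpression", operator, left, right: term() };
    }
    return left;
  }
  function term(): Expression {
    let left = factor();
    while (tokens[pos] === "*" || tokens[pos] === "/") {
      const operator = tokens[pos++] as "*" | "/";
      left = { type: "BinaryExpression", operator, left, right: factor() };
    }
    return left;
  }
  function factor(): Expression {
    const tok: string | undefined = tokens[pos++];
    if (tok === "(") {
      const inner = expr();
      if (tokens[pos++] !== ")") throw new SyntaxError("Expected )");
      return inner;
    }
    if (tok !== undefined && /^\d/.test(tok)) return { type: "Literal", value: Number(tok) };
    throw new SyntaxError(`Unexpected token ${tok === undefined ? "end of input" : tok}`);
  }

  const tree = expr();
  if (pos !== tokens.length) throw new SyntaxError(`Unexpected token ${tokens[pos]}`);
  return tree;
}

console.log(JSON.stringify(parse("1 + 2 * 3")));
```

For 1 + 2 * 3, the * node becomes the right child of the + node, because Term binds more tightly than Expr. This is how operator precedence falls out of the productions.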

The rest of this article lists the ESTree data structures. There is no need to read it closely now; refer to it when you need it.

Node

Node plays a role similar to JavaScript's Object, the parent of all objects: it is the base interface of every AST node and carries the node's type and source-location information.

interface Node {
    type: string;
    loc: SourceLocation | null;
}

interface SourceLocation {
    source: string | null;
    start: Position;
    end: Position;
}

interface Position {
    line: number; // >= 1
    column: number; // >= 0
}
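As a small sketch of how loc is used in practice, the TypeScript below restates these three interfaces and adds a hypothetical containsPosition helper. This is the kind of check an editor plugin might use to find the node under the cursor. The types are renamed SourcePosition and AstNode here only to avoid clashing with DOM globals such as Node. ESTree defines end as the position just after the node, so the check treats end as exclusive.

```typescript
// The ESTree location types above, renamed to avoid clashing with DOM globals.
interface SourcePosition { line: number; column: number } // line >= 1, column >= 0
interface SourceLocation { source: string | null; start: SourcePosition; end: SourcePosition }
interface AstNode { type: string; loc: SourceLocation | null }

// Order two positions: negative if a comes first, 0 if equal, positive if a comes later.
function comparePositions(a: SourcePosition, b: SourcePosition): number {
  return a.line !== b.line ? a.line - b.line : a.column - b.column;
}

// Hypothetical helper: is `pos` inside the node? Checks start <= pos < end,
// because `end` is the position just after the node.
function containsPosition(node: AstNode, pos: SourcePosition): boolean {
  if (node.loc === null) return false; // this node carries no location information
  return comparePositions(node.loc.start, pos) <= 0 && comparePositions(pos, node.loc.end) < 0;
}

// An Identifier `foo` on line 2, columns 4 to 6 (end column 7 is exclusive).
const foo: AstNode = {
  type: "Identifier",
  loc: { source: null, start: { line: 2, column: 4 }, end: { line: 2, column: 7 } },
};
console.log(containsPosition(foo, { line: 2, column: 6 })); // true
console.log(containsPosition(foo, { line: 2, column: 7 })); // false
```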

Identifier

Identifiers are user-defined names, such as variable, function, class, and parameter names.

interface Identifier <: Expression, Pattern {
    type: "Identifier";
    name: string;
}

Literal

Literal nodes describe literal values of the various data types.

interface Literal <: Expression {
    type: "Literal";
    value: string | boolean | null | number | RegExp | bigint;
}

interface RegExpLiteral <: Literal {
  regex: {
    pattern: string;
    flags: string;
  };
}

interface BigIntLiteral <: Literal {
  bigint: string;
}
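The regex and bigint fields exist because value cannot always hold the literal: ESTree allows value to be null when the runtime cannot represent a regular expression or BigInt natively. The hypothetical literalToSource helper below is a TypeScript sketch showing why a code generator checks those fields first. BigInt values are left out of the value type so that the sketch does not depend on an ES2020 compilation target. Strings are quoted with JSON.stringify for simplicity.

```typescript
// The Literal shapes above (loc omitted). RegExpLiteral adds `regex`,
// BigIntLiteral adds `bigint` (digits without the trailing "n").
interface LiteralNode {
  type: "Literal";
  value: string | boolean | null | number | RegExp;
  regex?: { pattern: string; flags: string };
  bigint?: string;
}

// Hypothetical helper: print a Literal node back as JavaScript source.
// `regex` and `bigint` are checked first because `value` may be null when the
// runtime cannot represent the literal.
function literalToSource(node: LiteralNode): string {
  if (node.regex) return `/${node.regex.pattern}/${node.regex.flags}`;
  if (node.bigint !== undefined) return `${node.bigint}n`;
  if (typeof node.value === "string") return JSON.stringify(node.value);
  return String(node.value); // number, boolean, null (or a RegExp object)
}

console.log(literalToSource({ type: "Literal", value: null, regex: { pattern: "ab+c", flags: "gi" } })); // /ab+c/gi
console.log(literalToSource({ type: "Literal", value: null, bigint: "10" }));                          // 10n
console.log(literalToSource({ type: "Literal", value: true }));                                         // true
```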

Programs

sourceType indicates whether the source was parsed as an ES module ("module") or as a classic script ("script").

interface Program <: Node {
    type: "Program";
    sourceType: "script" | "module";
    body: [ Statement | ModuleDeclaration ];
}

Functions

Describes a function. This interface is not used directly; it is the parent of FunctionDeclaration, FunctionExpression and ArrowFunctionExpression.

interface Function <: Node {
    id: Identifier | null;
    async: boolean;
    generator: boolean;
    params: [ Pattern ];
    body: FunctionBody;
}

Statements

Definitions of the statement node types

interface ExpressionStatement <: Statement {
    type: "ExpressionStatement";
    expression: Expression;
}

interface Directive <: Node {
    type: "ExpressionStatement";
    expression: Literal;
    directive: string;
}

interface BlockStatement <: Statement {
    type: "BlockStatement";
    body: [ Statement ];
}

interface FunctionBody <: BlockStatement {
    body: [ Directive | Statement ];
}

interface EmptyStatement <: Statement {
    type: "EmptyStatement";
}

interface DebuggerStatement <: Statement {
    type: "DebuggerStatement";
}

interface WithStatement <: Statement {
    type: "WithStatement";
    object: Expression;
    body: Statement;
}

interface ReturnStatement <: Statement {
    type: "ReturnStatement";
    argument: Expression | null;
}

interface LabeledStatement <: Statement {
    type: "LabeledStatement";
    label: Identifier;
    body: Statement;
}

interface BreakStatement <: Statement {
    type: "BreakStatement";
    label: Identifier | null;
}

interface IfStatement <: Statement {
    type: "IfStatement";
    test: Expression;
    consequent: Statement;
    alternate: Statement | null;
}

interface SwitchStatement <: Statement {
    type: "SwitchStatement";
    discriminant: Expression;
    cases: [ SwitchCase ];
}

interface SwitchCase <: Node {
    type: "SwitchCase";
    test: Expression | null;
    consequent: [ Statement ];
}

interface ThrowStatement <: Statement {
    type: "ThrowStatement";
    argument: Expression;
}

interface TryStatement <: Statement {
    type: "TryStatement";
    block: BlockStatement;
    handler: CatchClause | null;
    finalizer: BlockStatement | null;
}

interface CatchClause <: Node {
    type: "CatchClause";
    param: Pattern | null;
    body: BlockStatement;
}

interface WhileStatement <: Statement {
    type: "WhileStatement";
    test: Expression;
    body: Statement;
}

interface DoWhileStatement <: Statement {
    type: "DoWhileStatement";
    body: Statement;
    test: Expression;
}

interface ForStatement <: Statement {
    type: "ForStatement";
    init: VariableDeclaration | Expression | null;
    test: Expression | null;
    update: Expression | null;
    body: Statement;
}

interface ForInStatement <: Statement {
    type: "ForInStatement";
    left: VariableDeclaration |  Pattern;
    right: Expression;
    body: Statement;
}

interface ForOfStatement <: ForInStatement {
    type: "ForOfStatement";
    await: boolean;
}
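Every statement node carries a type field and keeps its children in ordinary properties, so a generic traversal is easy to write. The TypeScript sketch below is hypothetical. It treats any object with a string type as a node; production traversal libraries usually keep an explicit list of child keys per node type instead. The example counts node types in a hand-built AST for a small while loop.

```typescript
// Any object with a string `type` field is treated as an ESTree node.
type AnyNode = { type: string; [key: string]: unknown };

function isNode(x: unknown): x is AnyNode {
  return typeof x === "object" && x !== null && typeof (x as { type?: unknown }).type === "string";
}

// Depth-first, pre-order walk: visit the node, then every child node or node array.
function walk(node: AnyNode, visit: (n: AnyNode) => void): void {
  visit(node);
  for (const key of Object.keys(node)) {
    if (key === "loc") continue; // location info, not a child node
    const child = node[key];
    if (Array.isArray(child)) {
      for (const item of child) if (isNode(item)) walk(item, visit);
    } else if (isNode(child)) {
      walk(child, visit);
    }
  }
}

// Hand-built AST for:
//   while (i < 3) {
//     if (i === 1) break;
//     i++;
//   }
const ident = (): AnyNode => ({ type: "Identifier", name: "i" });
const ast: AnyNode = {
  type: "WhileStatement",
  test: { type: "BinaryExpression", operator: "<", left: ident(), right: { type: "Literal", value: 3 } },
  body: {
    type: "BlockStatement",
    body: [
      {
        type: "IfStatement",
        test: { type: "BinaryExpression", operator: "===", left: ident(), right: { type: "Literal", value: 1 } },
        consequent: { type: "BreakStatement", label: null },
        alternate: null,
      },
      {
        type: "ExpressionStatement",
        expression: { type: "UpdateExpression", operator: "++", prefix: false, argument: ident() },
      },
    ],
  },
};

const counts: Record<string, number> = {};
walk(ast, n => { counts[n.type] = (counts[n.type] || 0) + 1; });
console.log(counts.Identifier, counts.BinaryExpression, counts.BreakStatement); // 3 2 1
```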

Declarations

Declarations of functions and variables

interface FunctionDeclaration <: Function, Declaration {
    type: "FunctionDeclaration";
    id: Identifier;
}

// The whole declaration statement, e.g. let a = 1, b;
interface VariableDeclaration <: Declaration {
    type: "VariableDeclaration";
    kind: "var" | "let" | "const";
    declarations: [ VariableDeclarator ];
}

// A single declarator such as a = 1, with its optional initializer (init)
interface VariableDeclarator <: Node {
    type: "VariableDeclarator";
    id: Pattern;
    init: Expression | null;
}
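The split between VariableDeclaration and VariableDeclarator is easiest to see by generating code from it. In this TypeScript sketch, one declaration node holds a list of declarators, and each declarator supplies one "name = value" part. The generate function is hypothetical and handles only Identifier ids and numeric Literal initializers.

```typescript
// Simplified declaration nodes (loc omitted; only Identifier ids and
// numeric Literal initializers are supported in this sketch).
interface Identifier { type: "Identifier"; name: string }
interface NumberLiteral { type: "Literal"; value: number }
interface VariableDeclarator { type: "VariableDeclarator"; id: Identifier; init: NumberLiteral | null }
interface VariableDeclaration {
  type: "VariableDeclaration";
  kind: "var" | "let" | "const";
  declarations: VariableDeclarator[];
}

// One VariableDeclaration -> one statement; each VariableDeclarator -> one "name = value" part.
function generate(decl: VariableDeclaration): string {
  const parts = decl.declarations.map(d =>
    d.init === null ? d.id.name : `${d.id.name} = ${d.init.value}`);
  return `${decl.kind} ${parts.join(", ")};`;
}

// AST for: let a = 1, b;
const decl: VariableDeclaration = {
  type: "VariableDeclaration",
  kind: "let",
  declarations: [
    { type: "VariableDeclarator", id: { type: "Identifier", name: "a" }, init: { type: "Literal", value: 1 } },
    { type: "VariableDeclarator", id: { type: "Identifier", name: "b" }, init: null },
  ],
};
console.log(generate(decl)); // let a = 1, b;
```

The sketch does not check that const declarators have an initializer; a real code generator or validator would need to.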

Expressions

Expressions, for example:

  • In var a = 1 + 1; the 1 + 1 on the right is a BinaryExpression;
  • In const fn = function () {}, the function () {} part is a FunctionExpression.
interface Expression <: Node { }

interface SpreadElement <: Node {
    type: "SpreadElement";
    argument: Expression;
}

// the this in this.fn is a ThisExpression
interface ThisExpression <: Expression {
    type: "ThisExpression";
}

// [1,2,3] is an array expression
interface ArrayExpression <: Expression {
    type: "ArrayExpression";
    elements: [ Expression | SpreadElement | null ];
}

interface ObjectExpression <: Expression {
    type: "ObjectExpression";
    properties: [ Property | SpreadElement ];
}

interface Property <: Node {
    type: "Property";
    key: Expression;
    value: Expression;
    kind: "init" | "get" | "set";
    method: boolean;
    shorthand: boolean;
    computed: boolean;
}

interface FunctionExpression <: Function, Expression {
    type: "FunctionExpression";
}

interface UnaryExpression <: Expression {
    type: "UnaryExpression";
    operator: UnaryOperator;
    prefix: boolean;
    argument: Expression;
}

enum UnaryOperator {
    "-" | "+" | "!" | "~" | "typeof" | "void" | "delete"
}

interface UpdateExpression <: Expression {
    type: "UpdateExpression";
    operator: UpdateOperator;
    argument: Expression;
    prefix: boolean;
}

enum UpdateOperator {
    "++" | "--"
}

interface BinaryExpression <: Expression {
    type: "BinaryExpression";
    operator: BinaryOperator;
    left: Expression;
    right: Expression;
}

enum BinaryOperator {
    "==" | "!=" | "===" | "!=="
         | "<" | "<=" | ">" | ">="
         | "<<" | ">>" | ">>>"
         | "+" | "-" | "*" | "**" | "/" | "%"
         | "|" | "^" | "&" | "in"
         | "instanceof"
}

interface AssignmentExpression <: Expression {
    type: "AssignmentExpression";
    operator: AssignmentOperator;
    left: Pattern | Expression;
    right: Expression;
}

enum AssignmentOperator {
    "=" | "+=" | "-=" | "*=" | "**=" | "/=" | "%="
        | "<<=" | ">>=" | ">>>="
        | "|=" | "^=" | "&="
        | "||=" | "&&=" | "??="
}

interface LogicalExpression <: Expression {
    type: "LogicalExpression";
    operator: LogicalOperator;
    left: Expression;
    right: Expression;
}

enum LogicalOperator {
    "||" | "&&" | "??"
}

interface MemberExpression <: Expression, Pattern, ChainElement {
    type: "MemberExpression";
    object: Expression | Super;
    property: Expression;
    computed: boolean;
}

interface ConditionalExpression <: Expression {
    type: "ConditionalExpression";
    test: Expression;
    alternate: Expression;
    consequent: Expression;
}

interface CallExpression <: Expression, ChainElement {
    type: "CallExpression";
    callee: Expression | Super;
    arguments: [ Expression | SpreadElement ];
}

interface NewExpression <: Expression {
    type: "NewExpression";
    callee: Expression;
    arguments: [ Expression | SpreadElement ];
}

interface SequenceExpression <: Expression {
    type: "SequenceExpression";
    expressions: [ Expression ];
}

interface ArrowFunctionExpression <: Function, Expression {
    type: "ArrowFunctionExpression";
    body: FunctionBody | Expression;
    expression: boolean;
}

interface YieldExpression <: Expression {
    type: "YieldExpression";
    argument: Expression | null;
    delegate: boolean;
}

interface AwaitExpression <: Expression {
    type: "AwaitExpression";
    argument: Expression;
}

interface ChainExpression <: Expression {
  type: "ChainExpression"
  expression: ChainElement
}

interface ChainElement <: Node {
  optional: boolean
}

interface ImportExpression <: Expression {
  type: "ImportExpression";
  source: Expression;
}
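Each expression node nests its operands as child nodes, so evaluating an expression is a recursive walk over the tree. The TypeScript sketch below is a hypothetical evaluator for a handful of the node types above: Literal, UnaryExpression, BinaryExpression, LogicalExpression and ConditionalExpression. It only handles numbers, booleans and null, so it does not reproduce JavaScript's full coercion rules.

```typescript
// A subset of the expression nodes above (loc omitted; values limited to
// numbers, booleans and null to keep the sketch small).
type Value = number | boolean | null;
type Expr =
  | { type: "Literal"; value: Value }
  | { type: "UnaryExpression"; operator: "-" | "+" | "!"; prefix: true; argument: Expr }
  | { type: "BinaryExpression"; operator: "+" | "-" | "*" | "/" | "**" | "<" | "==="; left: Expr; right: Expr }
  | { type: "LogicalExpression"; operator: "&&" | "||" | "??"; left: Expr; right: Expr }
  | { type: "ConditionalExpression"; test: Expr; consequent: Expr; alternate: Expr };

// Hypothetical tree-walking evaluator: one case per node type.
function evaluate(node: Expr): Value {
  switch (node.type) {
    case "Literal":
      return node.value;
    case "UnaryExpression": {
      const v = evaluate(node.argument);
      if (node.operator === "!") return !v;
      return node.operator === "-" ? -Number(v) : Number(v);
    }
    case "BinaryExpression": {
      const a = evaluate(node.left);
      const b = evaluate(node.right);
      switch (node.operator) {
        case "===": return a === b;
        case "<": return Number(a) < Number(b);
        case "+": return Number(a) + Number(b);
        case "-": return Number(a) - Number(b);
        case "*": return Number(a) * Number(b);
        case "/": return Number(a) / Number(b);
        case "**": return Math.pow(Number(a), Number(b));
      }
      throw new Error("Unsupported binary operator");
    }
    case "LogicalExpression": {
      const left = evaluate(node.left);
      // Short-circuit: the right operand is evaluated only when needed.
      if (node.operator === "&&") return left ? evaluate(node.right) : left;
      if (node.operator === "||") return left ? left : evaluate(node.right);
      return left !== null ? left : evaluate(node.right); // "??"
    }
    case "ConditionalExpression":
      return evaluate(node.test) ? evaluate(node.consequent) : evaluate(node.alternate);
  }
  throw new Error("Unknown node type");
}

// AST for: null ?? 2 ** 3
const example: Expr = {
  type: "LogicalExpression",
  operator: "??",
  left: { type: "Literal", value: null },
  right: {
    type: "BinaryExpression",
    operator: "**",
    left: { type: "Literal", value: 2 },
    right: { type: "Literal", value: 3 },
  },
};
console.log(evaluate(example)); // 8
```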

Patterns

Note that Identifier is also a subclass of Pattern

interface Pattern <: Node { }

interface AssignmentProperty <: Property {
    type: "Property"; // inherited
    value: Pattern;
    kind: "init";
    method: false;
}

interface ObjectPattern <: Pattern {
    type: "ObjectPattern";
    properties: [ AssignmentProperty | RestElement ];
}
// e.g. the {a, b} in let {a, b} = c is an ObjectPattern

interface ArrayPattern <: Pattern {
    type: "ArrayPattern";
    elements: [ Pattern | null ];
}
// e.g. the [a, ...c] in let [a, ...c] = b is an ArrayPattern

interface RestElement <: Pattern {
    type: "RestElement";
    argument: Pattern;
}
// e.g. the ...c in function foo(a, ...c) {} is a RestElement

interface AssignmentPattern <: Pattern {
    type: "AssignmentPattern";
    left: Pattern;
    right: Expression;
}
// e.g. the a = 1 in function foo(a = 1, b) {} is an AssignmentPattern

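A common task with patterns is collecting the names a declaration binds, for example to build a scope table. The boundNames function below is a hypothetical TypeScript sketch that covers the pattern types listed above, with Property keys simplified. It assumes declaration patterns only, so it ignores the MemberExpression targets that can appear in assignment patterns.

```typescript
// Pattern shapes from the listing above (loc omitted; Property keys simplified to unknown).
interface RestElement { type: "RestElement"; argument: Pattern }
interface AssignmentProperty { type: "Property"; key: unknown; value: Pattern }
type Pattern =
  | { type: "Identifier"; name: string }
  | { type: "ObjectPattern"; properties: Array<AssignmentProperty | RestElement> }
  | { type: "ArrayPattern"; elements: Array<Pattern | null> }
  | RestElement
  | { type: "AssignmentPattern"; left: Pattern; right: unknown };

// Collect every name a declaration pattern binds, in source order.
function boundNames(p: Pattern, out: string[] = []): string[] {
  switch (p.type) {
    case "Identifier":
      out.push(p.name);
      break;
    case "ObjectPattern":
      for (const prop of p.properties) {
        // a Property binds its value pattern; a RestElement binds its argument
        boundNames(prop.type === "Property" ? prop.value : prop.argument, out);
      }
      break;
    case "ArrayPattern":
      for (const el of p.elements) {
        if (el !== null) boundNames(el, out); // null is a hole, as in [, x]
      }
      break;
    case "RestElement":
      boundNames(p.argument, out);
      break;
    case "AssignmentPattern":
      boundNames(p.left, out); // the default value (right) binds nothing
      break;
  }
  return out;
}

const ident = (name: string): Pattern => ({ type: "Identifier", name });

// Pattern for: let {a, b: [c, ...d], e = 1} = obj
const pattern: Pattern = {
  type: "ObjectPattern",
  properties: [
    { type: "Property", key: ident("a"), value: ident("a") },
    { type: "Property", key: ident("b"), value: { type: "ArrayPattern", elements: [ident("c"), { type: "RestElement", argument: ident("d") }] } },
    { type: "Property", key: ident("e"), value: { type: "AssignmentPattern", left: ident("e"), right: { type: "Literal", value: 1 } } },
  ],
};
console.log(boundNames(pattern).join(", ")); // a, c, d, e
```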

Template Literals

interface TemplateLiteral <: Expression {
    type: "TemplateLiteral";
    quasis: [ TemplateElement ];
    expressions: [ Expression ];
}

interface TaggedTemplateExpression <: Expression {
    type: "TaggedTemplateExpression";
    tag: Expression;
    quasi: TemplateLiteral;
}

interface TemplateElement <: Node {
    type: "TemplateElement";
    tail: boolean;
    value: {
        cooked: string | null;
        raw: string;
    };
}
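In a TemplateLiteral, quasis and expressions alternate, beginning and ending with a quasi. As a result, quasis always has one more element than expressions, and only the last TemplateElement has tail set to true. The hypothetical interpolate helper below is written in TypeScript. It uses that invariant to rebuild the string from the cooked parts once the expressions have been evaluated. cooked is null only for invalid escape sequences, which are allowed in tagged templates.

```typescript
// Template shapes from the listing above (loc omitted; expressions left untyped).
interface TemplateElement {
  type: "TemplateElement";
  tail: boolean;
  value: { cooked: string | null; raw: string };
}
interface TemplateLiteral { type: "TemplateLiteral"; quasis: TemplateElement[]; expressions: unknown[] }

// Hypothetical helper: interleave the cooked quasis with already-evaluated
// expression values: quasi0 value0 quasi1 value1 ... quasiN.
function interpolate(node: TemplateLiteral, values: unknown[]): string {
  if (node.quasis.length !== node.expressions.length + 1) throw new Error("malformed TemplateLiteral");
  if (values.length !== node.expressions.length) throw new Error("need one value per expression");
  let out = "";
  node.quasis.forEach((q, i) => {
    if (q.value.cooked === null) throw new Error("invalid escape sequence (tagged templates only)");
    out += q.value.cooked;
    if (i < values.length) out += String(values[i]);
  });
  return out;
}

// AST for: `Hello, ${name}!`
const tpl: TemplateLiteral = {
  type: "TemplateLiteral",
  quasis: [
    { type: "TemplateElement", tail: false, value: { cooked: "Hello, ", raw: "Hello, " } },
    { type: "TemplateElement", tail: true, value: { cooked: "!", raw: "!" } },
  ],
  expressions: [{ type: "Identifier", name: "name" }],
};
console.log(interpolate(tpl, ["ESTree"])); // Hello, ESTree!
```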

Classes

interface Super <: Node {
    type: "Super";
}

interface Class <: Node {
    id: Identifier | null;
    superClass: Expression | null;
    body: ClassBody;
}

interface ClassBody <: Node {
    type: "ClassBody";
    body: [ MethodDefinition ];
}

interface MethodDefinition <: Node {
    type: "MethodDefinition";
    key: Expression;
    value: FunctionExpression;
    kind: "constructor" | "method" | "get" | "set";
    computed: boolean;
    static: boolean;
}

interface ClassDeclaration <: Class, Declaration {
    type: "ClassDeclaration";
    id: Identifier;
}

interface ClassExpression <: Class, Expression {
    type: "ClassExpression";
}

interface MetaProperty <: Expression {
    type: "MetaProperty";
    meta: Identifier;
    property: Identifier;
}

Modules

Node types whose names end in Declaration can stand alone as statements, while node types ending in Specifier are parts of a Declaration node. The official documentation lists both kinds at the same level; this article places the Specifier types at a sub-level to make them easier to tell apart.

interface ModuleDeclaration <: Node { }

interface ModuleSpecifier <: Node {
    local: Identifier;
}

interface ImportDeclaration <: ModuleDeclaration {
    type: "ImportDeclaration";
    specifiers: [ ImportSpecifier | ImportDefaultSpecifier | ImportNamespaceSpecifier ];
    source: Literal;
}

interface ImportSpecifier <: ModuleSpecifier {
    type: "ImportSpecifier";
    imported: Identifier;
}

interface ImportDefaultSpecifier <: ModuleSpecifier {
    type: "ImportDefaultSpecifier";
}

interface ImportNamespaceSpecifier <: ModuleSpecifier {
    type: "ImportNamespaceSpecifier";
}

interface ExportNamedDeclaration <: ModuleDeclaration {
    type: "ExportNamedDeclaration";
    declaration: Declaration | null;
    specifiers: [ ExportSpecifier ];
    source: Literal | null;
}

interface ExportSpecifier <: ModuleSpecifier {
    type: "ExportSpecifier";
    exported: Identifier;
}

interface AnonymousDefaultExportedFunctionDeclaration <: Function {
    type: "FunctionDeclaration";
    id: null;
}

interface AnonymousDefaultExportedClassDeclaration <: Class {
    type: "ClassDeclaration";
    id: null;
}

interface ExportDefaultDeclaration <: ModuleDeclaration {
    type: "ExportDefaultDeclaration";
    declaration: AnonymousDefaultExportedFunctionDeclaration | FunctionDeclaration | AnonymousDefaultExportedClassDeclaration | ClassDeclaration | Expression;
}

interface ExportAllDeclaration <: ModuleDeclaration {
    type: "ExportAllDeclaration";
    source: Literal;
    exported: Identifier | null;
}
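The three import specifier types differ only in what their local name refers to. The hypothetical describeImports helper below is a TypeScript sketch (with loc omitted) that makes this difference explicit for a single ImportDeclaration.

```typescript
// Import node shapes from the listing above (loc omitted).
interface Identifier { type: "Identifier"; name: string }
type ImportSpecifierNode =
  | { type: "ImportSpecifier"; local: Identifier; imported: Identifier }
  | { type: "ImportDefaultSpecifier"; local: Identifier }
  | { type: "ImportNamespaceSpecifier"; local: Identifier };
interface ImportDeclaration {
  type: "ImportDeclaration";
  specifiers: ImportSpecifierNode[];
  source: { type: "Literal"; value: string };
}

// Hypothetical helper: explain what each local binding refers to.
function describeImports(decl: ImportDeclaration): string[] {
  const from = decl.source.value;
  return decl.specifiers.map((s): string => {
    if (s.type === "ImportDefaultSpecifier") return `${s.local.name}: default export of ${from}`;
    if (s.type === "ImportNamespaceSpecifier") return `${s.local.name}: namespace object of ${from}`;
    return `${s.local.name}: export "${s.imported.name}" of ${from}`; // ImportSpecifier
  });
}

const id = (name: string): Identifier => ({ type: "Identifier", name });

// AST for: import x, { a as b } from "./mod.js";
const decl: ImportDeclaration = {
  type: "ImportDeclaration",
  specifiers: [
    { type: "ImportDefaultSpecifier", local: id("x") },
    { type: "ImportSpecifier", local: id("b"), imported: id("a") },
  ],
  source: { type: "Literal", value: "./mod.js" },
};
console.log(describeImports(decl).join("\n"));
// x: default export of ./mod.js
// b: export "a" of ./mod.js
```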