1. An abstract syntax tree (AST) is a tree representation of the abstract syntactic structure of source code. Each node in the tree represents a construct that occurs in the source code. The syntax is "abstract" because the tree does not represent every detail of the real syntax: for example, nested parentheses are implicit in the structure of the tree and are not presented as nodes. Once source code has been converted into an AST, a program can inspect it, rewrite it, and generate new code from it, which is why understanding the principles of compilation opens up so many possibilities.
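To make the point about implicit parentheses concrete, here is a small sketch. The node shapes (`BinaryExpression`, `NumberLiteral`) and the `evaluate` helper are assumptions made up for illustration, not the format of any particular parser. The tree for `(1 + 2) * 3` has no node for the parentheses; the grouping is carried by the nesting alone.

```javascript
// Hypothetical AST for the source text "(1 + 2) * 3".
// There is no "parentheses" node: the grouping is expressed by nesting.
const ast = {
  type: 'BinaryExpression',
  operator: '*',
  left: {
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'NumberLiteral', value: 1 },
    right: { type: 'NumberLiteral', value: 2 },
  },
  right: { type: 'NumberLiteral', value: 3 },
};

// Recursively evaluate the tree; the tree shape alone fixes the order.
function evaluate(node) {
  switch (node.type) {
    case 'NumberLiteral':
      return node.value;
    case 'BinaryExpression': {
      const left = evaluate(node.left);
      const right = evaluate(node.right);
      if (node.operator === '+') return left + right;
      if (node.operator === '*') return left * right;
      throw new TypeError('Unknown operator: ' + node.operator);
    }
    default:
      throw new TypeError('Unknown node type: ' + node.type);
  }
}

console.log(evaluate(ast)); // 9
```

Without the parentheses in the source, the `+` node would sit on the right of a `*` node instead, and the same evaluator would give a different result.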
the-super-tiny-compiler is a tiny example compiler written in JavaScript.
- Most compilers break down into three primary stages: Parsing, Transformation, and Code Generation.
(1) Parsing takes raw code and turns it into a more abstract representation of the code. (2) Transformation takes this abstract representation and manipulates it to do whatever the compiler wants it to. (3) Code Generation takes the transformed representation of the code and turns it into new code.
Parsing
Parsing typically gets broken down into two phases: Lexical Analysis and Syntactic Analysis. (1) Lexical Analysis takes the raw code and splits it apart into pieces called tokens, using a tokenizer (or lexer). (2) Syntactic Analysis takes the tokens and reformats them into a representation that describes each part of the syntax and their relation to one another. This is known as an intermediate representation or Abstract Syntax Tree. An Abstract Syntax Tree, or AST for short, is a deeply nested object that represents code in a way that is both easy to work with and tells us a lot of information.
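The two phases can be sketched for a tiny Lisp-like input such as `(add 2 (subtract 4 2))`. This is a minimal sketch, not a full implementation: the token shapes (`paren`, `number`, `name`) and the function names `tokenizer` and `parser` are assumptions chosen for illustration, and only parentheses, integers, and alphabetic names are handled.

```javascript
// Lexical analysis: split the raw source into a flat array of tokens.
function tokenizer(input) {
  const tokens = [];
  let current = 0;
  while (current < input.length) {
    const char = input[current];
    if (char === '(' || char === ')') {
      tokens.push({ type: 'paren', value: char });
      current++;
    } else if (/\s/.test(char)) {
      current++; // whitespace only separates tokens
    } else if (/[0-9]/.test(char)) {
      let value = '';
      while (current < input.length && /[0-9]/.test(input[current])) {
        value += input[current++];
      }
      tokens.push({ type: 'number', value });
    } else if (/[a-z]/i.test(char)) {
      let value = '';
      while (current < input.length && /[a-z]/i.test(input[current])) {
        value += input[current++];
      }
      tokens.push({ type: 'name', value });
    } else {
      throw new TypeError('Unexpected character: ' + char);
    }
  }
  return tokens;
}

// Syntactic analysis: turn the flat token array into a nested AST.
function parser(tokens) {
  let current = 0;

  function walk() {
    const token = tokens[current];
    if (token === undefined) throw new SyntaxError('Unexpected end of input');
    if (token.type === 'number') {
      current++;
      return { type: 'NumberLiteral', value: token.value };
    }
    if (token.type === 'paren' && token.value === '(') {
      const nameToken = tokens[++current];
      if (!nameToken || nameToken.type !== 'name') {
        throw new SyntaxError('Expected a function name after "("');
      }
      const node = { type: 'CallExpression', name: nameToken.value, params: [] };
      current++;
      // Collect arguments until the matching ")".
      while (tokens[current] &&
             !(tokens[current].type === 'paren' && tokens[current].value === ')')) {
        node.params.push(walk());
      }
      if (!tokens[current]) throw new SyntaxError('Missing ")"');
      current++; // skip the closing ")"
      return node;
    }
    throw new TypeError('Unexpected token: ' + token.type);
  }

  const ast = { type: 'Program', body: [] };
  while (current < tokens.length) ast.body.push(walk());
  return ast;
}

const tokens = tokenizer('(add 2 (subtract 4 2))');
console.log(tokens.length); // 9
console.log(JSON.stringify(parser(tokens), null, 2));
```

The tokenizer knows nothing about nesting; it only labels pieces of text. The nesting appears in the parser, where each `(` starts a recursive call that ends at the matching `)`.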
Transformation
- The next stage for a compiler is transformation. Again, this just takes the AST from the last step and makes changes to it. It can manipulate the AST in the same language or it can translate it into an entirely new language.
- When transforming the AST we can manipulate nodes by adding/removing/replacing properties, we can add new nodes, remove nodes, or we could leave the existing AST alone and create an entirely new one based on it.
- Since we’re targeting a new language, we’re going to focus on creating an entirely new AST that is specific to the target language.
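Creating an entirely new AST for the target language could look roughly like the sketch below. It assumes a Lisp-style input tree (`name`/`params`) and a C-style output tree (`callee`/`arguments`, with top-level calls wrapped in an `ExpressionStatement`). These target node shapes and the helper names are illustrative assumptions, and the sketch uses plain recursion rather than a visitor-based traverser.

```javascript
// Input: a Lisp-style AST for (add 2 (subtract 4 2)).
const lispAst = {
  type: 'Program',
  body: [{
    type: 'CallExpression',
    name: 'add',
    params: [
      { type: 'NumberLiteral', value: '2' },
      {
        type: 'CallExpression',
        name: 'subtract',
        params: [
          { type: 'NumberLiteral', value: '4' },
          { type: 'NumberLiteral', value: '2' },
        ],
      },
    ],
  }],
};

// Build a brand-new target-language node from a source node.
// The input tree is only read, never modified.
function transformNode(node) {
  switch (node.type) {
    case 'NumberLiteral':
      return { type: 'NumberLiteral', value: node.value };
    case 'CallExpression':
      return {
        type: 'CallExpression',
        callee: { type: 'Identifier', name: node.name },
        arguments: node.params.map(transformNode),
      };
    default:
      throw new TypeError('Unknown node type: ' + node.type);
  }
}

function transformer(ast) {
  return {
    type: 'Program',
    // Top-level calls become statements in the target language.
    body: ast.body.map((node) => {
      const expression = transformNode(node);
      return node.type === 'CallExpression'
        ? { type: 'ExpressionStatement', expression }
        : expression;
    }),
  };
}

console.log(JSON.stringify(transformer(lispAst), null, 2));
```

Because every output node is created fresh, the original tree stays available for other passes, at the cost of allocating a second tree.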
AST tree demo:
    {
      type: 'Program',
      body: [{
        type: 'CallExpression',
        name: 'add',
        params: [{
          type: 'NumberLiteral',
          value: '2'
        }, {
          type: 'CallExpression',
          name: 'subtract',
          params: [{
            type: 'NumberLiteral',
            value: '4'
          }, {
            type: 'NumberLiteral',
            value: '2'
          }]
        }]
      }]
    }
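A tree like the one above is usually processed by walking it depth first and reacting to each node type. The sketch below is one way to do that; `traverse` and the visitor object are hypothetical names for illustration, and the child lookup assumes only the three node types that appear in the demo.

```javascript
// The demo tree from above.
const ast = {
  type: 'Program',
  body: [{
    type: 'CallExpression',
    name: 'add',
    params: [{
      type: 'NumberLiteral',
      value: '2'
    }, {
      type: 'CallExpression',
      name: 'subtract',
      params: [{
        type: 'NumberLiteral',
        value: '4'
      }, {
        type: 'NumberLiteral',
        value: '2'
      }]
    }]
  }]
};

// Call visitor[node.type](node, parent) for every node, parents before children.
function traverse(node, visitor, parent = null) {
  const visit = visitor[node.type];
  if (visit) visit(node, parent);

  // Only Program and CallExpression have children in this tree.
  let children = [];
  if (node.type === 'Program') children = node.body;
  if (node.type === 'CallExpression') children = node.params;

  for (const child of children) traverse(child, visitor, node);
}

const calls = [];
const numbers = [];
traverse(ast, {
  CallExpression(node) { calls.push(node.name); },
  NumberLiteral(node) { numbers.push(Number(node.value)); },
});

console.log(calls);   // [ 'add', 'subtract' ]
console.log(numbers); // [ 2, 4, 2 ]
```

The visitor only lists the node types it cares about, so new kinds of analysis can be added without touching the traversal itself.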
Code Generation
- The final phase of a compiler is code generation. Sometimes compilers will do things that overlap with transformation, but for the most part code generation just means take our AST and string-ify code back out.
- Code generators work in several different ways: some compilers will reuse the tokens from earlier, others will have created a separate representation of the code so that they can print nodes linearly, but from what I can tell most will use the same AST we just created, which is what we’re going to focus on.
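Printing code back out of an AST can be sketched as one recursive function that returns a string per node type. This is a minimal sketch under assumptions: the C-style node shapes (`ExpressionStatement`, `callee`, `arguments`, `Identifier`) and the name `codeGenerator` are chosen for illustration, and the sample tree is written by hand.

```javascript
// Hand-written C-style AST for the call add(2, subtract(4, 2)).
const targetAst = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'CallExpression',
      callee: { type: 'Identifier', name: 'add' },
      arguments: [
        { type: 'NumberLiteral', value: '2' },
        {
          type: 'CallExpression',
          callee: { type: 'Identifier', name: 'subtract' },
          arguments: [
            { type: 'NumberLiteral', value: '4' },
            { type: 'NumberLiteral', value: '2' },
          ],
        },
      ],
    },
  }],
};

// Turn each node into source text, recursing into its children.
function codeGenerator(node) {
  switch (node.type) {
    case 'Program':
      // One statement per line.
      return node.body.map(codeGenerator).join('\n');
    case 'ExpressionStatement':
      return codeGenerator(node.expression) + ';';
    case 'CallExpression':
      return (
        codeGenerator(node.callee) +
        '(' + node.arguments.map(codeGenerator).join(', ') + ')'
      );
    case 'Identifier':
      return node.name;
    case 'NumberLiteral':
      return String(node.value);
    default:
      throw new TypeError('Unknown node type: ' + node.type);
  }
}

console.log(codeGenerator(targetAst)); // add(2, subtract(4, 2));
```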
Acorn: a tiny, fast JavaScript parser written in JavaScript.
Babylon: a JavaScript parser used in Babel (now published as @babel/parser), heavily based on acorn and acorn-jsx.
(1) Interface: parse(input, options) is the main interface to the library. The return value will be an abstract syntax tree object as specified by the ESTree spec.
    // Parse a one-line script; ecmaVersion selects the language version to accept
    let acorn = require("acorn")
    console.log(acorn.parse("let a = 3", {ecmaVersion: 2020}))

output:

    {
      "type": "Program",
      "start": 0,
      "end": 9,
      "body": [
        {
          "type": "VariableDeclaration",
          "start": 0,
          "end": 9,
          "declarations": [
            {
              "type": "VariableDeclarator",
              "start": 4,
              "end": 9,
              "id": { "type": "Identifier", "start": 4, "end": 5, "name": "a" },
              "init": { "type": "Literal", "start": 8, "end": 9, "value": 3, "raw": "3" }
            }
          ],
          "kind": "let"
        }
      ],
      "sourceType": "script"
    }
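Once you have a tree like the output above, a common next step is to walk it. The sketch below hard-codes that output so it runs without acorn installed. The `walk` helper is a hypothetical, simplified walker that treats any object with a string `type` as a node; for real work the separate acorn-walk package is likely the better choice.

```javascript
// The tree printed above, copied in as a literal.
const tree = {
  type: 'Program', start: 0, end: 9,
  body: [{
    type: 'VariableDeclaration', start: 0, end: 9,
    declarations: [{
      type: 'VariableDeclarator', start: 4, end: 9,
      id: { type: 'Identifier', start: 4, end: 5, name: 'a' },
      init: { type: 'Literal', start: 8, end: 9, value: 3, raw: '3' },
    }],
    kind: 'let',
  }],
  sourceType: 'script',
};

// Visit every node depth first. A "node" here is any object with a string `type`.
function walk(node, visit) {
  visit(node);
  for (const value of Object.values(node)) {
    const candidates = Array.isArray(value) ? value : [value];
    for (const child of candidates) {
      if (child !== null && typeof child === 'object' && typeof child.type === 'string') {
        walk(child, visit);
      }
    }
  }
}

const seen = [];
walk(tree, (node) => seen.push(`${node.type}@${node.start}-${node.end}`));
console.log(seen.join('\n'));
```

Every node carries `start` and `end` offsets into the source text, which is what lets tools map a node back to the exact characters it came from.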
(2) Acorn library source code
Reading the source is a good way to become familiar with all of JavaScript's keywords and reserved words, and to gain an in-depth understanding of the features the language provides.
    // Reserved word lists for various dialects of the language
    var reservedWords = {
      3: "abstract boolean byte char class double enum export extends final float goto implements import int interface long native package private protected public short static super synchronized throws transient volatile",
      5: "class enum extends super const export import",
      6: "enum",
      strict: "implements interface let package private protected public static yield",
      strictBind: "eval arguments"
    };

    // And the keywords
    var ecma5AndLessKeywords = "break case catch continue debugger default do else finally for function if return switch throw try var while with null true false instanceof typeof void delete new in this";

    var keywords = {
      5: ecma5AndLessKeywords,
      "5module": ecma5AndLessKeywords + " export import",
      6: ecma5AndLessKeywords + " const class extends export import super"
    };

    // Matches the relational keyword operators "in" and "instanceof"
    var keywordRelationalOperator = /^in(stanceof)?$/;
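Space-separated word lists like these are convenient to write, but a tokenizer needs a fast membership test. One way to get it, sketched below, is to compile each list into an anchored regular expression. The helper name `wordsToRegExp` is made up for this example, and only two of the lists are copied in.

```javascript
// Two of the space-separated lists quoted above.
const ecma5AndLessKeywords = "break case catch continue debugger default do else finally for function if return switch throw try var while with null true false instanceof typeof void delete new in this";
const strictReserved = "implements interface let package private protected public static yield";

// Hypothetical helper: compile a word list into a RegExp that matches
// exactly one whole word from the list.
function wordsToRegExp(words) {
  return new RegExp('^(?:' + words.trim().split(/\s+/).join('|') + ')$');
}

const isKeyword = wordsToRegExp(ecma5AndLessKeywords);
const isStrictReserved = wordsToRegExp(strictReserved);

console.log(isKeyword.test('while'));      // true
console.log(isKeyword.test('whilst'));     // false
console.log(isKeyword.test('let'));        // false
console.log(isStrictReserved.test('let')); // true
```

The `^` and `$` anchors matter: without them `in` would also match inside `int` or `instanceof`, which is the same reason `keywordRelationalOperator` is anchored.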