First, let me explain where this idea came from. I have been learning Swift and SwiftUI in my spare time recently, and I found myself frequently using a piece of syntax called the trailing closure, which I find quite interesting. I had also come across jamiebuilds' the-super-tiny-compiler a long time ago, so I wondered whether I could build a similarly fun and simple compiler myself. The result is the js-trailing-closure-toy-compiler project, and today's article.

For those unfamiliar with Swift, let me explain what a trailing closure is. Put simply, if the last argument of a function is itself a function, we can pass that last argument as a trailing closure. Take a look at the following examples:

// a represents a function

// #1: the simple version
// Swift
a(){
  // The contents of the trailing closure
}
// JavaScript
a(() => {})

// #2: the function takes arguments
// (parameter types are ignored)
a(1, 2, 3){
  // The contents of the trailing closure
}
// JavaScript
a(1, 2, 3, () => {})

// #3: the trailing closure takes parameters
// (parameter types are ignored)
a(1, 2, 3){ arg1, arg2 in
  // The contents of the trailing closure
}
// JavaScript
a(1, 2, 3, (arg1, arg2) => {})


If you have any questions about Swift trailing closures, take a look at the Closures chapter in the official documentation, which explains them quite clearly.

I read the source code of the-super-tiny-compiler a long time ago, but only skimmed it, and I thought I understood it. When I tried to turn this idea into code, though, I found that I had not really mastered the methods and techniques used in that project. So I decided to study its source code properly and reimplement a sample with the same functionality as the original project before starting on my own small compiler.

A friendly reminder: this article is fairly long, so you may want to bookmark it and read it carefully later.

Compiler implementation process

From the-super-tiny-compiler we can see that, for a typical compiler, the compilation process consists of four main steps:

  • Tokenizer: converts the source code string into meaningful units (i.e. tokens), such as if, "hello", 123, let, const, and so on.
  • Parser: converts the tokens obtained in the previous step into an abstract syntax tree (AST) of the source language. Why do this? Once we have the tree, we know the order and hierarchy of the statements in the code, and therefore the execution order, the context, and so on.
  • Transformer: converts the AST obtained in the previous step into the AST of the target language. Why is this step needed? Statements that do the same thing are usually written with different syntax in different languages, so their abstract syntax trees differ as well. We therefore need a transformation that prepares for generating code in the target language.
  • CodeGenerator: this step is relatively simple. Since we know the syntax of the target language, we can generate target-language code from the new abstract syntax tree produced in the previous step.

The steps above give an overview of a compiler's workflow, but knowing them is not enough on its own. If you are interested, you can visit the JavaScript Trailing Closure Toy Compiler here to see what the final implementation looks like. If you are interested in how it is implemented, read on. You will get a lot out of it, and you may even want to implement an interesting compiler of your own.

Tokenizer: converts the code string into tokens

The first thing to understand is why we convert the string into individual tokens. Without this conversion we cannot tell what the program means, because tokens are a prerequisite for understanding a program.

Take console.log("Hello World!") as an example. We know what it does at a glance, but how do we actually arrive at that understanding? First comes console, which we know is the console object. Then comes ., which we know is the property access operator. Next is the log method. Calling the method requires ( (the opening parenthesis), followed by the string "Hello World!" as the argument, and finally ) (the closing parenthesis).

So the purpose of converting the string into tokens is to let us know what the program is meant to express. From the value and position of each token, we can tell exactly what the token represents and what it does.

For our compiler, the first step is to decide which tokens we need. From the code examples above, we can see that we need the following token types:

  • Number: for instance 1, 66, and so on.
  • String: for instance "hello", and so on.
  • Identifier: for instance a. In the context of our compiler, this is usually the name of a function or variable.
  • Parentheses: ( and ), used here to denote a function call.
  • Curly braces: { and }, used here to denote a function body.
  • Comma: , is used to separate parameters.
  • Whitespace: used to separate one token from another.

Since our compiler only aims to support trailing closures for now, these are the only token types we need to care about for the time being.
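The single-character token types in this list can also be kept in a lookup table instead of being checked one by one. The following is only a minimal sketch of that idea; the names PUNCTUATION and punctuationToken are made up for illustration, and the type names are assumed to match the ones used by the tokenizer in this article.

```javascript
// Hypothetical lookup table: single character -> token type.
// The type names are assumed to match the tokenizer in this article.
const PUNCTUATION = {
    '(': 'ParenLeft',
    ')': 'ParenRight',
    '{': 'BraceLeft',
    '}': 'BraceRight',
    ',': 'Comma'
};

// Returns a token for a punctuation character, or null for anything else.
const punctuationToken = (ch) => {
    if (Object.prototype.hasOwnProperty.call(PUNCTUATION, ch)) {
        return { type: PUNCTUATION[ch], value: ch };
    }
    return null;
};

console.log(punctuationToken('{'));
console.log(punctuationToken('a'));
```

A table like this keeps the list of supported characters in one place, at the cost of being a little less explicit than one check per character.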

This step is actually quite simple: we read tokens in a loop according to our requirements. The code is as follows:

// Parse the input string into tokens
const tokenizer = (input) => {
    // Simple regular expressions
    const numReg = /\d/;
    const idReg = /[a-z]/i;
    const spaceReg = /\s/;

    // Tokens array
    const tokens = [];

    // Determine the length of the input
    const len = input.length;
    if (len > 0) {
        let cur = 0;
        while(cur < len) {
            let curChar = input[cur];

            // Check if it is a number
            if (numReg.test(curChar)) {
                let num = '';
                while(numReg.test(curChar) && curChar) {
                    num += curChar;
                    curChar = input[++cur];
                }
                tokens.push({
                    type: 'NumericLiteral',
                    value: num
                });
                continue;
            }

            // Check whether it is an identifier
            if (idReg.test(curChar)) {
                let idVal = '';
                while(idReg.test(curChar) && curChar) {
                    idVal += curChar;
                    curChar = input[++cur];
                }

                // Check if it is the in keyword
                if (idVal === 'in') {
                    tokens.push({
                        type: 'InKeyword',
                        value: idVal
                    });
                } else {
                    tokens.push({
                        type: 'Identifier',
                        value: idVal
                    });
                }
                continue;
            }

            // Check if it is a string
            if (curChar === '"') {
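                let strVal = '';
                curChar = input[++cur];
                // Stop at the closing quote; the undefined check prevents an
                // endless loop when the string is never closed
                while (curChar !== undefined && curChar !== '"') {
                    strVal += curChar;
                    curChar = input[++cur];
                }
                tokens.push({
                    type: 'StringLiteral',
                    value: strVal
                });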
                // The last double quotation mark of the string needs to be handled
                cur++;
                continue;
            }

            // Check if it is an open parenthesis
            if (curChar === '(') {
                tokens.push({
                    type: 'ParenLeft',
                    value: '('
                });
                cur++;
                continue;
            }

            // Check if it is a close parenthesis
            if (curChar === ')') {
                tokens.push({
                    type: 'ParenRight',
                    value: ')'
                });
                cur++;
                continue;
            }

            // Check whether it is an open curly brace
            if (curChar === '{') {
                tokens.push({
                    type: 'BraceLeft',
                    value: '{'
                });
                cur++;
                continue;
            }

            // Check if it is a close curly brace
            if (curChar === '}') {
                tokens.push({
                    type: 'BraceRight',
                    value: '}'
                });
                cur++;
                continue;
            }

            // Check whether it is a comma
            if (curChar === ',') {
                tokens.push({
                    type: 'Comma',
                    value: ','
                });
                cur++;
                continue;
            }

            // Check for whitespace
            if (spaceReg.test(curChar)) {
                cur++;
                continue;
            }

            throw new Error(`${curChar} is not a good character`);
        }
    }
    console.log(tokens, tokens.length);
    return tokens;
};

The code above is not very complicated, but a few points need attention. If you are not careful, it is easy to make mistakes or end up in an endless loop. Here are the places where I think problems are likely:

  • The outer loop is a while loop, and each iteration starts by reading the character at the current index. A for loop is not used because the index of the current character, cur, is advanced by the checks inside the loop body, so while is more convenient.
  • When a string, number, or identifier is being read, an inner loop keeps reading until the next character is no longer of the expected kind. Because such a token may span more than one character, we have to check whether each following character still matches the current type. If it does not, reading of the current token ends, and we leave the inner loop and continue with the next iteration of the outer loop.
  • For strings, the " at the beginning and end of the string must be skipped and not included in the string value. Whitespace is simply skipped.
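The inner loops described in these points all follow the same shape, so they can be expressed as one small helper. This is only a sketch under my own assumptions: readWhile is a hypothetical name that does not appear in the project, and the bounds check is placed first so that the loop always stops at the end of the input.

```javascript
// Hypothetical helper: starting at `start`, consume characters while `test`
// accepts them. Returns the text that was read and the index of the first
// character that was not consumed.
const readWhile = (input, start, test) => {
    let cur = start;
    let value = '';
    // The bounds check comes first so the loop always stops at the end of input
    while (cur < input.length && test(input[cur])) {
        value += input[cur];
        cur++;
    }
    return { value, next: cur };
};

const digits = readWhile('123)', 0, (ch) => /\d/.test(ch));
console.log(digits); // { value: '123', next: 3 }

// For a string, start after the opening quote and read up to the closing one
const str = readWhile('"hi" ', 1, (ch) => ch !== '"');
console.log(str); // { value: 'hi', next: 3 }
```

In the string case the caller still has to step over the closing quote itself, which corresponds to the extra cur++ in the tokenizer.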

None of this is technically difficult; it just takes a little patience. Once the implementation is complete, we can test it:

tokenizer(`a(1){}`)

You can see the output as follows:

(6) [{...}, {...}, {...}, {...}, {...}, {...}]
0: {type: "Identifier", value: "a"}
1: {type: "ParenLeft", value: "("}
2: {type: "NumericLiteral", value: "1"}
3: {type: "ParenRight", value: ")"}
4: {type: "BraceLeft", value: "{"}
5: {type: "BraceRight", value: "}"}

The output is exactly what we want, and we are 25% of the way there. The next step is to convert the resulting token array into an AST (abstract syntax tree).

Parser: converts the token array into an AST (abstract syntax tree)

In the previous step we turned the code string into meaningful tokens. With these tokens, we can derive the whole abstract syntax tree from the meaning of each token.

For example, when we encounter {, we know that all the tokens up to the matching } represent the body of a function (ignoring other cases for now).
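One of the "other cases" is a closure nested inside another closure, where the first } is not the one that closes the outer body. The article's parser does not look ahead for the matching brace; it relies on recursion instead. Purely to illustrate what "matching" means, here is a sketch that counts nesting depth. The helper name findMatchingBrace is hypothetical, and the token shapes are assumed to be those produced by the tokenizer above.

```javascript
// Hypothetical helper: given the index of a BraceLeft token, return the index
// of the BraceRight token that closes it, or -1 if it is never closed.
// Precondition: tokens[openIndex] is a BraceLeft token.
const findMatchingBrace = (tokens, openIndex) => {
    let depth = 0;
    for (let i = openIndex; i < tokens.length; i++) {
        if (tokens[i].type === 'BraceLeft') depth++;
        if (tokens[i].type === 'BraceRight') depth--;
        if (depth === 0) return i;
    }
    return -1;
};

// Tokens for `a{ b{} }`, written by hand
const nested = [
    { type: 'Identifier', value: 'a' },
    { type: 'BraceLeft', value: '{' },
    { type: 'Identifier', value: 'b' },
    { type: 'BraceLeft', value: '{' },
    { type: 'BraceRight', value: '}' },
    { type: 'BraceRight', value: '}' }
];

console.log(findMatchingBrace(nested, 1)); // 5
console.log(findMatchingBrace(nested, 3)); // 4
```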

Take the tokens from the example above. The program statement they correspond to is:

a(1) {
  // block
};

The corresponding abstract syntax tree would look something like this:

{
  "type": "Program",
  "body": [
    {
      "type": "CallExpression",
      "value": "a",
      "params": [
        {
          "type": "NumericLiteral",
          "value": "1",
          "parentType": "ARGUMENTS_PARENT_TYPE"
        }
      ],
      "hasTrailingBlock": true,
      "trailingBlockParams": [],
      "trailingBody": []
    }
  ]
}

Let's take a brief look at the abstract syntax tree above. The outermost node has the type Program, and its body holds the content of our code. Here the body array has just one element, a CallExpression, which represents a function call.

The function name of this CallExpression is a. The function's first argument has the type NumericLiteral and the value 1. The parent type of this argument is ARGUMENTS_PARENT_TYPE, which is explained below. The CallExpression has hasTrailingBlock set to true, indicating that the call has a trailing closure. The empty trailingBlockParams means the trailing closure takes no parameters, and the empty trailingBody means the trailing closure has no content.

That was only a brief explanation. The detailed code is as follows:

// Convert tokens to an AST

// Parent-type markers read later by the transformer. The string values are
// assumed here; the first one matches the AST example above.
const ARGUMENTS_PARENT_TYPE = 'ARGUMENTS_PARENT_TYPE';
const BLOCK_PARENT_TYPE = 'BLOCK_PARENT_TYPE';

const parser = (tokens) => {
    const ast = {
        type: 'Program',
        body: []
    };
    let cur = 0;

    const walk = () => {
        let token = tokens[cur];

        // A number is returned directly
        if (token.type === 'NumericLiteral') {
            cur++;
            return {
                type: 'NumericLiteral',
                value: token.value
            };
        }

        // A string is returned directly
        if (token.type === 'StringLiteral') {
            cur++;
            return {
                type: 'StringLiteral',
                value: token.value
            };
        }

        // A comma is skipped; the undefined it yields is filtered out later
        if (token.type === 'Comma') {
            cur++;
            return;
        }

        // An identifier is either a function call or a plain identifier,
        // so we need to check which token follows it
        if (token.type === 'Identifier') {
            const callExp = {
                type: 'CallExpression',
                value: token.value,
                params: [],
                hasTrailingBlock: false,
                trailingBlockParams: [],
                trailingBody: []
            };
            // Specify the type of the parent node
            const specifyParentNodeType = () => {
                // Filter out the undefined entries left by commas
                callExp.params = callExp.params.filter(p => p);
                callExp.trailingBlockParams = callExp.trailingBlockParams.filter(p => p);
                callExp.trailingBody = callExp.trailingBody.filter(p => p);

                callExp.params.forEach((node) => {
                    node.parentType = ARGUMENTS_PARENT_TYPE;
                });
                callExp.trailingBlockParams.forEach((node) => {
                    node.parentType = ARGUMENTS_PARENT_TYPE;
                });
                callExp.trailingBody.forEach((node) => {
                    node.parentType = BLOCK_PARENT_TYPE;
                });
            };
            const handleBraceBlock = () => {
                callExp.hasTrailingBlock = true;
                // Collect the parameters of the closure
                token = tokens[++cur];
                const params = [];
                const blockBody = [];
                let isParamsCollected = false;
                while (token.type !== 'BraceRight') {
                    if (token.type === 'InKeyword') {
                        callExp.trailingBlockParams = params;
                        isParamsCollected = true;
                        token = tokens[++cur];
                    } else {
                        if (!isParamsCollected) {
                            params.push(walk());
                            token = tokens[cur];
                        } else {
                            // Process the content inside the curly braces
                            blockBody.push(walk());
                            token = tokens[cur];
                        }
                    }
                }
                // If isParamsCollected is still false, the closure has no parameters
                if (!isParamsCollected) {
                    // Without parameters, everything collected so far is the closure body
                    callExp.trailingBody = params;
                } else {
                    callExp.trailingBody = blockBody;
                }
                // Skip the right curly brace
                cur++;
            };
            // Check whether the next token is '(' or '{' to decide whether
            // the current token is a function call or a plain identifier.
            // `next` is undefined when the identifier is the last token.
            const next = tokens[cur + 1];
            if (next && (next.type === 'ParenLeft' || next.type === 'BraceLeft')) {
                token = tokens[++cur];
                if (token.type === 'ParenLeft') {
                    // Collect the arguments of the function call
                    // until the next token is ')'
                    token = tokens[++cur];
                    while (token.type !== 'ParenRight') {
                        callExp.params.push(walk());
                        token = tokens[cur];
                    }
                    // Skip the right parenthesis
                    cur++;
                    // Get the token after ')'
                    token = tokens[cur];
                    // Handle the trailing closure; the token may not exist, as in `func()`
                    if (token && token.type === 'BraceLeft') {
                        handleBraceBlock();
                    }
                } else {
                    handleBraceBlock();
                }
                // Specify the parent type of the child nodes
                specifyParentNodeType();
                return callExp;
            } else {
                cur++;
                return {
                    type: 'Identifier',
                    value: token.value
                };
            }
        }

        throw new Error(`this ${token.type} is not a good token`);
    };

    while (cur < tokens.length) {
        ast.body.push(walk());
    }

    console.log(ast);
    return ast;
};

To make this easier to follow, I have added comments at the key points. Here is a brief walk-through of the code above.

We need to iterate through the tokens array. First, we define the outermost structure of the abstract syntax tree:

const ast = {
  type: 'Program',
  body: []
};

It is defined this way so that subsequent node objects can be added to the abstract syntax tree according to fixed rules.

We then define a walk function to work through the elements of the tokens array. For numbers and strings, walk returns the corresponding node directly; for a comma, it skips the token and returns nothing. When the token is an identifier, there are more cases to consider.

For an identifier, there are two situations to handle:

  • One is a standalone identifier, where the token that follows is neither ( nor {.
  • The other is a function call. For a function call, we need to consider the following cases:
    • The function has only a trailing closure and no other arguments, such as a{};
    • The function is called without a trailing closure, with or without arguments, such as a() or a(1);
    • The function call includes a trailing closure, such as a{}, a(){}, a(1){}, and so on. For trailing closures, we also need to consider whether the closure takes parameters, such as a(1){b, c in }.

Next, a brief explanation of how identifier tokens are handled. If the token is an identifier, we first define an object callExp of type CallExpression, which is the syntax tree node used to represent a function call. This object has the following properties:

  • type: the node type
  • value: the node name, which here is the function name
  • params: the arguments of the function call
  • hasTrailingBlock: whether the function call has a trailing closure
  • trailingBlockParams: the parameters of the trailing closure
  • trailingBody: the contents of the trailing closure

Next, we check the type of the token that follows the identifier. For a function call, the token after the current one must be ( or {. If it is neither, we simply return an Identifier node.
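This look-ahead is small enough to isolate. The following is a minimal sketch, assuming tokens shaped like the tokenizer's output; isCallStart is a hypothetical name, and the guard covers an identifier that happens to be the last token.

```javascript
// Hypothetical helper: does the identifier at `index` start a function call?
const isCallStart = (tokens, index) => {
    const next = tokens[index + 1];
    // `next` is undefined when the identifier is the last token
    return Boolean(next) && (next.type === 'ParenLeft' || next.type === 'BraceLeft');
};

// Tokens for `a(b)`, written by hand
const callTokens = [
    { type: 'Identifier', value: 'a' },
    { type: 'ParenLeft', value: '(' },
    { type: 'Identifier', value: 'b' },
    { type: 'ParenRight', value: ')' }
];

console.log(isCallStart(callTokens, 0)); // true: `a` is followed by '('
console.log(isCallStart(callTokens, 2)); // false: `b` is followed by ')'
```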

If it is a function call, we need to do two things: collect the arguments of the call, and determine whether the call has a trailing closure. Collecting the arguments is relatively simple. We first check whether the token after the current one is (. If so, we start collecting arguments until we encounter a token of type ), which marks the end of the argument list. Note also that an argument may itself be a function call, so we call walk again while collecting arguments to handle them recursively.

The next step is to determine whether the function call has a trailing closure. There are two cases to consider: in one, the call has an argument list followed by a trailing closure; in the other, the call has no argument list and consists only of a trailing closure. Both cases need to be handled.

Since trailing closures are handled in two places, we extract this logic into the handleBraceBlock function. Here is how it works.

If the next token is {, we need to process a trailing closure. We first set the hasTrailingBlock property of the callExp object to true. Then we determine whether the trailing closure takes parameters, and process the contents of the closure.

How do we collect the parameters of a trailing closure? We check whether there is an in keyword inside the closure. If there is, the closure takes parameters; if not, it takes none.

Since we do not know in advance whether the closure contains an in keyword, what we collect at first may be either the closure's contents or its parameters. So if we reach the closing } without having seen an in keyword, everything collected so far is the contents of the closure.
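The "parameters or contents" decision can be shown on its own. The sketch below is a simplification and not how the parser is written: it assumes the items between the braces have already been collected into a flat list with no nested closures, whereas the parser interleaves this decision with its recursive walk. The name splitClosure is hypothetical.

```javascript
// Hypothetical helper: `items` is everything collected between '{' and '}'.
// An item with type 'InKeyword' separates the parameters from the body.
const splitClosure = (items) => {
    const inIndex = items.findIndex((item) => item.type === 'InKeyword');
    if (inIndex === -1) {
        // No `in`: nothing that was collected is a parameter
        return { params: [], body: items };
    }
    return {
        params: items.slice(0, inIndex),
        body: items.slice(inIndex + 1)
    };
};

// Items for `{ b, c in d }` with the commas already dropped
const withIn = splitClosure([
    { type: 'Identifier', value: 'b' },
    { type: 'Identifier', value: 'c' },
    { type: 'InKeyword', value: 'in' },
    { type: 'Identifier', value: 'd' }
]);
console.log(withIn.params.length, withIn.body.length); // 2 1

// Items for `{ d }`
const withoutIn = splitClosure([{ type: 'Identifier', value: 'd' }]);
console.log(withoutIn.params.length, withoutIn.body.length); // 0 1
```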

We use the walk function to recursively collect both the parameters and the contents of the trailing closure, since the contents are not necessarily simple numeric values. (To keep things simple, we also use walk recursively for the parameters of the trailing closure.)

Before returning the callExp object, we do some extra processing with the specifyParentNodeType helper. First, it removes the empty entries that comma tokens leave in the collected arrays. Second, it specifies the parent type for the nodes in the params, trailingBlockParams, and trailingBody properties of the callExp object. For params and trailingBlockParams, the parent type is ARGUMENTS_PARENT_TYPE; for trailingBody, it is BLOCK_PARENT_TYPE. This makes the next step easier, and we will come back to it there.
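The two jobs of this helper, dropping the gaps left by commas and tagging the remaining nodes, can be sketched in a few lines. This is an illustration under assumptions: tagChildren is a hypothetical name, the constant values are assumed, and unlike the article's code this version returns tagged copies instead of modifying the nodes in place.

```javascript
const ARGUMENTS_PARENT_TYPE = 'ARGUMENTS_PARENT_TYPE'; // assumed value
const BLOCK_PARENT_TYPE = 'BLOCK_PARENT_TYPE'; // assumed value

// Hypothetical helper: drop the undefined entries that commas leave behind
// and record on each remaining node which part of the call it belongs to.
const tagChildren = (nodes, parentType) =>
    nodes
        .filter((node) => node !== undefined)
        .map((node) => ({ ...node, parentType }));

// What walk() collects for the arguments of `a(1, 2)`: the comma yields undefined
const taggedParams = tagChildren(
    [
        { type: 'NumericLiteral', value: '1' },
        undefined,
        { type: 'NumericLiteral', value: '2' }
    ],
    ARGUMENTS_PARENT_TYPE
);
console.log(taggedParams);

const taggedBody = tagChildren([{ type: 'Identifier', value: 'b' }], BLOCK_PARENT_TYPE);
console.log(taggedBody);
```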

Transformer: converts the original AST into the AST of the target language

The next step is to convert the original AST into the AST of the target language. Why do we do this? Because the same program logic is expressed differently in different languages. So we convert the original AST into the AST of our target language.

So how do we do that? The original AST is a tree, and we need to traverse it. The traversal has to be depth-first, because in a nested structure the outer content can only be determined once the inner content has been determined.

The design pattern we use for traversing the tree is the visitor pattern. We traverse the tree depth-first with a visitor object, which holds handler functions for the different node types. When we reach a node, we look up the handler for that node's type on the visitor object and use it to process the node.
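Before looking at the real traversal, the pattern itself can be seen on a toy tree. This is only a sketch: the children property and the traverse function are assumptions made for the example, since the article's AST stores its children under params and body instead.

```javascript
// A tiny tree: each node has a type and, optionally, children.
// The `children` property is an assumption made for this example.
const tree = {
    type: 'Program',
    children: [
        { type: 'CallExpression', children: [{ type: 'NumericLiteral' }] }
    ]
};

// Depth-first traversal: enter a node, visit its children, then exit it.
const traverse = (node, parent, visitor) => {
    const methods = visitor[node.type];
    if (methods && methods.enter) methods.enter(node, parent);
    (node.children || []).forEach((child) => traverse(child, node, visitor));
    if (methods && methods.exit) methods.exit(node, parent);
};

const log = [];
traverse(tree, null, {
    CallExpression: {
        enter: () => log.push('enter CallExpression'),
        exit: () => log.push('exit CallExpression')
    },
    NumericLiteral: {
        enter: () => log.push('enter NumericLiteral')
    }
});
console.log(log);
// [ 'enter CallExpression', 'enter NumericLiteral', 'exit CallExpression' ]
```

The exit handler of the call runs only after its argument has been visited, which is the depth-first order described above.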

Let's start with how to traverse the original tree, in which each child is either a node object of a specific type or an array of nodes. We handle the two cases separately. The code for the traversal is as follows:

// Traverse the nodes
const traverser = (ast, visitor) => {
    const traverseNode = (node, parent) => {

        const method = visitor[node.type];
        if (method && method.enter) {
            method.enter(node, parent);
        }

        const t = node.type;
        switch (t) {
            case 'Program':
                traverseArr(node.body, node);
                break;
            case 'CallExpression':
                // Handle ArrowFunctionExpression
                // TODO: consider a trailing closure inside the body
                if (node.hasTrailingBlock) {
                    node.params.push({
                        type: 'ArrowFunctionExpression',
                        parentType: ARGUMENTS_PARENT_TYPE,
                        params: node.trailingBlockParams,
                        body: node.trailingBody
                    });
                    traverseArr(node.params, node);
                } else {
                    traverseArr(node.params, node);
                }
                break;
            case 'ArrowFunctionExpression':
                traverseArr(node.params, node);
                traverseArr(node.body, node);
                break;
            case 'Identifier':
            case 'NumericLiteral':
            case 'StringLiteral':
                break;
            default:
                throw new Error(`this type ${t} is not a good type`);
        }

        if (method && method.exit) {
            method.exit(node, parent);
        }
    };
    const traverseArr = (arr, parent) => {
        arr.forEach((node) => {
            traverseNode(node, parent);
        });
    };
    traverseNode(ast, null);
};

Let me briefly explain the traverser function. It defines two helper functions, traverseNode and traverseArr. When we are dealing with an array, traverseArr processes each node in the array individually.

The main processing logic is in traverseNode. What does this function do? First, it looks up the handler for the node's type on the visitor object. Then it checks the node type. If the node is a basic type, no further traversal is needed. If the node is an ArrowFunctionExpression (an arrow function), its params and body properties are traversed in turn. If the node is a CallExpression, the node is a function call, and we need to check whether the call has a trailing closure. If it does, the call needs an extra argument, which is an arrow function. That is why there is code like the following:

// ...
if (node.hasTrailingBlock) {
    node.params.push({
        type: 'ArrowFunctionExpression',
        parentType: ARGUMENTS_PARENT_TYPE,
        params: node.trailingBlockParams,
        body: node.trailingBody
    });
    traverseArr(node.params, node);
} else {
    traverseArr(node.params, node);
}
// ...

Next, the params property of the CallExpression node is traversed. When a function call has a trailing closure, we add an object of type ArrowFunctionExpression to the node's params, with parentType set to ARGUMENTS_PARENT_TYPE. This way we know which kind of parent the object belongs to, which is convenient when converting the syntax tree.
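The traverser pushes the arrow function directly into node.params, so the original AST is changed while it is being traversed. As a possible alternative, and only as a sketch, the same step can be written as a function that returns a new parameter list and leaves the node untouched. The name paramsWithTrailingClosure and the constant value are assumptions.

```javascript
const ARGUMENTS_PARENT_TYPE = 'ARGUMENTS_PARENT_TYPE'; // assumed value

// Hypothetical helper: return the parameter list a call node ends up with
// once its trailing closure is appended as a final arrow function argument.
// It does not modify the node it is given.
const paramsWithTrailingClosure = (node) => {
    if (!node.hasTrailingBlock) {
        return node.params;
    }
    return [
        ...node.params,
        {
            type: 'ArrowFunctionExpression',
            parentType: ARGUMENTS_PARENT_TYPE,
            params: node.trailingBlockParams,
            body: node.trailingBody
        }
    ];
};

// The CallExpression node for `a(1){}`
const call = {
    type: 'CallExpression',
    value: 'a',
    params: [{ type: 'NumericLiteral', value: '1', parentType: ARGUMENTS_PARENT_TYPE }],
    hasTrailingBlock: true,
    trailingBlockParams: [],
    trailingBody: []
};

const result = paramsWithTrailingClosure(call);
console.log(result.length, call.params.length); // 2 1
```

Leaving the input untouched means the same AST can be traversed more than once without the arrow function being appended again.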

The next step is to define the handlers for the different node types on the visitor object. The code is as follows:

const transformer = (ast) => {
    const newAst = {
        type: 'Program',
        body: []
    };
    ast._container = newAst.body;

    const getNodeContainer = (node, parent) => {
        const parentType = node.parentType;
        if (parentType) {
            if (parentType === BLOCK_PARENT_TYPE) {
                return parent._bodyContainer;
            }
            if (parentType === ARGUMENTS_PARENT_TYPE) {
                return parent._argumentsContainer;
            }
        } else {
            return parent._container;
        }
    };

    traverser(ast, {
        NumericLiteral: {
            enter: (node, parent) => {
                getNodeContainer(node, parent).push({
                    type: 'NumericLiteral',
                    value: node.value
                });
            }
        },
        StringLiteral: {
            enter: (node, parent) => {
                getNodeContainer(node, parent).push({
                    type: 'StringLiteral',
                    value: node.value
                });
            }
        },
        Identifier: {
            enter: (node, parent) => {
                getNodeContainer(node, parent).push({
                    type: 'Identifier',
                    name: node.value
                });
            }
        },
        CallExpression: {
            enter: (node, parent) => {
                // TODO: optimize
                const callExp = {
                    type: 'CallExpression',
                    callee: {
                        type: 'Identifier',
                        name: node.value
                    },
                    arguments: [],
                    blockBody: []
                };
                // Attach the containers that child nodes will be added to
                node._argumentsContainer = callExp.arguments;
                node._bodyContainer = callExp.blockBody;
                getNodeContainer(node, parent).push(callExp);
            }
        },
        ArrowFunctionExpression: {
            enter: (node, parent) => {
                // TODO: optimize
                const arrowFunc = {
                    type: 'ArrowFunctionExpression',
                    arguments: [],
                    blockBody: []
                };
                // Attach the containers that child nodes will be added to
                node._argumentsContainer = arrowFunc.arguments;
                node._bodyContainer = arrowFunc.blockBody;
                getNodeContainer(node, parent).push(arrowFunc);
            }
        }
    });
    console.log(newAst);
    return newAst;
};

We first define the outer structure of newAst, and then set ast._container = newAst.body. This links the outermost layer of the old AST to the new AST: because we traverse the old AST, the _container property gives us a reference into the new one. When we add elements to _container, we are actually adding nodes to the new AST, which keeps the handlers simple.
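This works because assigning an array to a property stores a reference to it and does not make a copy. A minimal sketch of the effect, using two hand-written objects in place of the real trees:

```javascript
const newAst = { type: 'Program', body: [] };
const oldAst = { type: 'Program', body: [] };

// Both names now refer to the same array object
oldAst._container = newAst.body;

// Pushing through the old tree's _container...
oldAst._container.push({ type: 'NumericLiteral', value: '1' });

// ...is visible on the new tree, because no copy was ever made
console.log(newAst.body.length); // 1
console.log(oldAst._container === newAst.body); // true
```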

Then there is the getNodeContainer function, which returns the container on the parent node that the current node should be added to. If the parentType property of the current node is set, the node is either an argument of a function call or part of the contents of a trailing closure, and node.parentType tells us which one: _argumentsContainer or _bodyContainer. If parentType is not set, the container is the parent node's _container property.

The next step is to handle the different node types on the visitor object. For the basic types, we simply add the corresponding node to the container. CallExpression and ArrowFunctionExpression need some additional processing.

For an ArrowFunctionExpression node, we first declare an arrowFunc object. We then point the node's _argumentsContainer property at the arguments property of arrowFunc, and its _bodyContainer property at the blockBody property of arrowFunc. Finally, we get the container from the parent of the current node and add arrowFunc to it. CallExpression nodes are handled in a similar way, except that the object we define has an additional callee property holding the name of the function being called.

At this point the conversion of the old AST to the new AST is complete.

CodeGenerator: traverses the new AST to generate code

This step is relatively simple: we concatenate the code that corresponds to each node type. The detailed code is shown below:

const codeGenerator = (node) => {
    const type = node.type;
    switch (type) {
        case 'Program':
            return node.body.map(codeGenerator).join(';\n');
        case 'Identifier':
            return node.name;
        case 'NumericLiteral':
            return node.value;
        case 'StringLiteral':
            return `"${node.value}"`;
        case 'CallExpression':
            return `${codeGenerator(node.callee)}(${node.arguments.map(codeGenerator).join(', ')})`;
        case 'ArrowFunctionExpression':
            return `(${node.arguments.map(codeGenerator).join(', ')}) => {${node.blockBody.map(codeGenerator).join('; ')}}`;
        default:
            throw new Error(`this type ${type} is not a good type`);
    }
};

The main thing to note is the handling of CallExpression and ArrowFunctionExpression nodes. For a CallExpression, we emit the function name followed by the arguments of the call. For an ArrowFunctionExpression, we emit the parameters of the arrow function and then the contents of the function body. Compared with the previous three steps, this one is relatively simple.

The last step is to combine these four steps, and the simple compiler is complete. The code is as follows:

// Assemble the four steps
const compiler = (input) => {
    const tokens = tokenizer(input);
    const ast = parser(tokens);
    const newAst = transformer(ast);
    return codeGenerator(newAst);
};

// Export the corresponding module
module.exports = {
    tokenizer,
    parser,
    transformer,
    codeGenerator,
    compiler
};

A brief summary

If you have had the patience to read this far, you will see that building a simple compiler is not that complicated. We need to understand what each of the four steps does, pay attention to the places that need special handling, and be a little patient.

Of course, this implementation only covers the small piece of functionality we set out to build; a real compiler has far more to consider. The code above is not very polished in many places. In this first version I focused on getting it to work and did not think much about details or maintainability. If you have good ideas or find mistakes, please open an issue or a pull request on the project to help make it better. You are also welcome to leave a comment below the article and see whether we can spark some new ideas.

Some of you might ask what the point of learning this is. In fact, it has many uses. First of all, front-end build tooling largely relies on Babel for support of new JavaScript features. Babel is essentially a compiler that converts new language features into syntax that current browsers support, so that we can use the new syntax with ease. This also takes some of the burden off front-end developers.

On the other hand, once you know these principles, you can not only read the source code of Babel's syntax transform plugins more easily, but also implement a simple syntax transformer or some interesting plugins yourself. This will give your front-end skills a big boost.

How time flies: it has been two months since my last article 😂. This is also my first article since the new year, and I hope to keep producing high-quality articles. The Design Pattern Adventure series will of course continue to be updated, and you are welcome to keep following it.

That is all for today's article. If you have any comments or suggestions, feel free to leave a comment below or raise them here. You are also welcome to follow my official account, Guanshan is not difficult. If you think this article is well written or helpful, please like and share it ~

Reference:

  • the-super-tiny-compiler